Lexicalist Machine Translation of Spatial
Prepositions
Indalecio Arturo Trujillo
Trinity Hall
University of Cambridge
April 1995
Thesis submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy at the University of Cambridge
Preface
This dissertation is not substantially the same as any that I have submitted for a degree
or diploma or other qualification at any other university or similar institution.
No part of this dissertation has already been or is concurrently being submitted for any
such degree, diploma or other qualification.
This dissertation is the result of my own work and includes nothing which is the outcome
of work done in collaboration. Where I have adapted the theories or results of others, I
have stated this clearly in the text.
Arturo Trujillo
April 1995
Acknowledgments
There are many people whom I would like to mention. First and foremost I would like
to thank my supervisor Ted Briscoe, without whom this thesis would not have been possible.
His sound advice and patience have provided a reliable source of guidance throughout. Five
people have also greatly influenced the contents herein. Ann Copestake, who made available
to me her implementation of the LKB and who answered many a question on TFSs. Antonio
Sanfilippo, whose knowledge of linguistic theory was an invaluable light at the beginning of this
long tunnel. John Beaven, who introduced me to the Shake-and-Bake approach to MT. Valeria
de Paiva, who explained complex mathematical ideas with clarity and enthusiasm. John Carroll,
whose good humour and faultless technical knowledge brought much needed clarity to obscure
problems. My gratitude also goes to the Faculty and staff in the Computer Laboratory. In
particular, I wish to thank Roger Needham who, as Head of Department, made working in the
Lab a rewarding experience. Karen Sparck Jones answered questions on topics too numerous
to mention. Talks with Julia Galliers were much appreciated. Thanks also go to the staff in
the Computer Laboratory, and in particular to Graham Titmus, Margaret Levitt and Angela
Leeke. The following researchers also gave their assistance. Manny Rayner helped me with the
CLE and BCI. The accurate insights of Dick Crouch were never amiss. Benjamín Macias gave me
advice on more than one occasion. The Master, Fellows and MCR in Trinity Hall made my stay
in Cambridge a very pleasant one. The research students in the Lab were irreplaceable during
the course of my studies. Eirik Hektoen supplied the parse-tree macros and the Norwegian data.
Tung-Ho Shih clarified some points about Chinese. George Kiraz and Edmund Grimley-Evans
were superb office-mates. Ellen Germain was an essential source of moral support. Tanya Bowden
and Malgorzata Stys helped me in many different ways. For the Hungarian data, Zsuzsanna Varga
was an extremely patient informant. I benefited enormously from my period in the Translation
Division of the European Parliament; warm thanks go to its director, Barry Wilson, and to the
rest of the staff. I am also extremely grateful to David Harper and the rest of the faculty
and staff at The Robert Gordon University for helping me complete this thesis; I should mention
in particular Gareth Palmer, Malcom Souter, David Hendry, Robin Boswell, Dean Henderson
and Iain MacKenzie. Many improvements and clarifications in the thesis are due to Martin
Kay and Steve Pulman, my two examiners. I thank also my friends over the years: Nicolas
Zala Flores, Perlina Montilla de Zala, Miles Osborne, David Plowman, Molly Andrews, Claudia
Medina Frías, Carlos Castaño Bernard, Ignasi Forcada i Miro, Ornella Maietta, Emma Sangster,
Frank Liddiard and Frida Knight. No other person could have been a better proof-reader nor a
better friend than Vanessa Knights; her ability and kindness will be forever remembered. Finally,
I owe a great deal of gratitude to my family; I hope to be able to repay them one day: Ana Ines
Hernandez, Ron Franklin, Gonzalo and Alvaro Trujillo and George Franklin.
In the company of all these people, it has been a privilege to work.
Ana Ines Hernandez
Raquel Awad
Ron Franklin
Gonzalo Trujillo (1941-1975)
Abstract
This thesis proposes a strongly lexicalist approach to machine translation and applies
it to the translation of spatial prepositions and prepositional expressions between English
and Spanish. Bilingual contrastive knowledge resides solely in the bilingual lexicon and
is structured in the form of correspondences between sets of source and target language
lexemes related through indices. The resulting architecture maximizes the independence
of the monolingual and bilingual components. This independence is demonstrated by developing a grammar of Spanish which is significantly different in its constructions from its
analogous English grammar. In particular, relative clauses are analysed through a single
rule that allows gaps in subject position, while clitic climbing and doubling are handled
through mechanisms not normally found in grammatical descriptions of English. Bilingual
lexical rules, in conjunction with the bilingual lexicon, constitute a single, motivated and
well-defined mechanism for encoding bilingual knowledge. It is shown how most translation problems found in the literature can be handled with bilingual entries and bilingual
lexical rules. These problems include head switching, argument switching, lexical gaps
and lexicalization differences. Algorithms for lexicalist transfer and generation are given
and their computational properties considered. A classification of spatial relations is proposed in the form of a type hierarchy. Each node in the hierarchy is associated with a
number of properties which are inherited uniformly by its subnodes. The cross-linguistic
validity of the classification is supported by data from English, Spanish and Hungarian.
The hierarchy is used to establish correspondences between prepositional expressions in
English and Spanish. Nouns are assigned a locative type to account for deviations from
the literal meaning of certain spatial prepositional phrases such as `on the bus/in the car';
this assignment is based on the lexicalized preposition most commonly associated with
the noun. It is shown that there is a well-defined and highly restricted set of lexicalized
prepositions. It emerges that information from different sources is needed for target language disambiguation. These sources include the complement noun, the preposition, the
modified constituent and the located object; they are incorporated into the translation
procedure by using target language filtering, a technique in which monolingual information in the target language is used to select between different translations. An evaluation
of the system based on a small corpus of examples suggests that intelligibility of translations with the approach is good, while accuracy remains comparable to that achieved by
other systems. An extensive review of recent transfer-based machine translation systems
is included in the introduction.
Contents
1 Introduction
1.1 Machine Translation
1.1.1 Motivation for Transfer
1.2 Problems in Translation and Preposition Translation
1.2.1 Selection of Problem
1.2.2 Main Problems in Preposition Translation
1.3 Summary of Solutions
1.4 Review of Transfer Based Systems
1.4.1 Metal
1.4.2 The CAT Formalism
1.4.3 Environnement Linguistique d'Unification
1.4.4 Structural Correspondences in LFG
1.4.5 Type Rewriting
1.4.6 The Bilingual Conversation Interpreter
1.4.7 Rosetta
1.4.8 Shake-and-Bake
1.4.9 Indexed Logic Transfer
1.4.10 Transfer in the LKB
1.4.11 Statistical Machine Translation
1.5 Properties of Transfer in MT
1.6 Translation and Theories of Prepositions
1.6.1 Systran
1.6.2 Metal
1.6.3 Eurotra
1.6.4 Hjelmslev's Theory of Cases
1.6.5 Herskovits' Theory of PP Meanings
1.6.6 Conclusion
1.7 Overview of the Thesis
2 Representation for Transfer
2.1 Separation of KR Formalism from Transfer Representation
2.2 Representation for Transfer
2.2.1 Original Motivation for IL Lists
2.2.2 Indexed Lexemes
2.2.3 Formal Properties of IL Lists
2.2.4 Transfer and Generation with IL lists
2.3 The Role of IL Lists in Transfer
2.3.1 Indices in IL Lists
2.3.2 Determiners
2.3.3 Argument Switching
2.3.4 Passives
2.3.5 Dative Verbs
2.3.6 Adjectives
2.3.7 Copulas
2.3.8 Relative Clauses
2.3.9 Sentential Complements
2.3.10 Head Switching
2.3.11 Bilexical Rules in Other Problems
2.3.12 Lexical Gaps
2.3.13 Anaphora Resolution
2.3.14 IL Lists and Logical Forms
2.4 Adequacy of the Representation
2.5 Conclusion
3 Analysis and Grammars
3.1 Parsing with TFSs
3.1.1 Algorithm
3.1.2 Rules as TFSs
3.1.3 Example
3.2 English Grammar
3.2.1 PP Structure
3.2.2 TFSs for Categories
3.3 Spanish Grammar
3.3.1 Phrase Structure Grammar of Spanish
3.3.2 Clitic Doubling
3.3.3 Clitic Climbing
3.3.4 Relative Clauses
3.4 IL Lists as TFSs
3.5 Conclusion
4 Lexicalist Transfer and Generation
4.1 Transfer
4.1.1 Transfer Algorithm
4.1.2 ILs in Tlinks
4.1.3 A Modification to the Tlink Rules
4.2 Generation
4.2.1 Brew's Algorithm
4.2.2 Constructing FOLLOW for TFS Grammars
4.2.3 Reachability Constraints
4.2.4 Connectivity Constraints
4.2.5 Results
4.2.6 Remaining Problems
4.3 Conclusion
5 Classification of Spatial Relations
5.1 Properties and Semantics of Spatial Prepositions
5.1.1 Vendler Classes
5.1.2 Lexical Decomposition in Montague Grammar
5.1.3 Reference to Locations
5.1.4 Paths and Journeys
5.1.5 Lexicalization Patterns
5.2 Multilingual Spatial Relations
5.2.1 Spatial Relations
5.2.2 Dynamic Relations
5.2.3 Static Relations
5.3 Ambiguity and Other Relations
5.3.1 Path and Goal Alternations
5.3.2 Path End Static Relations
5.4 Description of Hungarian
5.4.1 Classification of Postpositions
5.5 Bilingual Correspondences
5.5.1 Simple Equivalence
5.5.2 Translation of Regular Alternations
5.5.3 Translation of Irregular Alternations
5.6 Conclusion
6 Translation and Disambiguation of Prepositions
6.1 Lexicalist Translation of Prepositions
6.2 Disambiguation during Generation
6.2.1 TL Filtering
6.3 Noun Knowledge
6.3.1 Pustejovsky's Levels of Representation
6.3.2 Qualia Structure
6.3.3 Locative Type
6.4 Target Language Disambiguation
6.4.1 Translation of Lexicalized Relations
6.4.2 Lexicalized Relations in Other Contexts
6.4.3 Differences in Lexicalized Relations
6.4.4 Disambiguation of Path-End Relations
6.4.5 Disambiguation Based on Complement Noun
6.4.6 Disambiguation Based on Measure Phrases
6.4.7 Disambiguation Based on Modified Constituent
6.4.8 Disambiguation Based on Modified Constituent and Complement
6.4.9 Disambiguation Through Discourse Semantics
6.5 Conclusion
7 Evaluation
7.1 Evaluation of Translation Systems
7.1.1 Test Suite Evaluation
7.1.2 Corpus Evaluation
7.2 Experiment
7.3 Results
7.3.1 Parsing
7.3.2 Scoring
7.3.3 Analysis of Failures
7.4 Comparison with Other Systems
7.5 Scaling
7.6 Relationship to Other Translation Problems
7.7 General Problems
7.7.1 Grammar Coverage
7.7.2 Disambiguation of Nouns, Verbs, Adjectives and Adverbs
7.7.3 Inadequacies in the Spatial Relations Hierarchy
7.8 Conclusion
8 Conclusion
8.1 Key Ideas
8.2 Principal Characteristics
8.3 Objections
8.4 Future Research
A Sentences for Development
B Questionnaire for Sentence Construction
C Questionnaire for Assessing Translation Quality
D Unanalysed Sentences (for Testing Scalability)
E Testing Scaled System: Input/Output
Bibliography
List of Figures
1.1 Approaches to machine translation.
1.2 Result of parsing in Metal.
1.3 Generation with restructuring in Metal.
1.4 C-, f- and semantic structure for `the baby fell'.
1.5 Structural correspondences at different linguistic levels.
1.6 Translation path in Rosetta.
1.7 English, Interlingua and Dutch isomorphic structures.
1.8 Simple typed feature structure.
1.9 Invalid type hierarchy.
1.10 Simple type hierarchy.
1.11 Extending type sign.
1.12 TFS and associated DAG with reentrant features.
1.13 A simple example of unification.
1.14 Meat - vlees tlink-rule.
1.15 Structure of `property' feature system.
1.16 Structure of `entity' feature system.
1.17 Semantic features for PP translation.
1.18 Hjelmslev's Three Dimensional Theory of Case.
3.1 Recognition with an active chart.
3.2 Simple rules implemented as TFSs.
3.3 Outline of the English grammar used.
3.4 Pollard and Sag (1987) signs and signs used in the system.
3.5 CF back-bone of the Spanish grammar used.
3.6 Portion of the Spanish type hierarchy.
3.7 Simple sentence with corresponding rule and category.
3.8 Lexical entry for ve.
3.9 Fuller TFS for ve.
3.10 Lexical entry for da.
3.11 Parse tree for John le da el dinero a ella.
3.12 Spanish clitic climbing.
3.13 Analysis of: le quiere intentar poder dar el dinero.
3.14 Castel's and GKPS's analyses respectively.
3.15 Castel's analysis of María compró un libro.
3.16 Trees for "the dog that is in the park" and "the dog that I saw".
3.17 Indexed lexemes for `the cat'.
3.18 Index types: event, object, relation.
4.1 Cover-SL-List algorithm.
4.2 Computing Cover-SL-List([go, outside]).
4.3 Direct bilexical entry.
4.4 Tlink for `young bull ↔ novillo'.
4.5 Bilexical rule for translating between `just' and acabar de.
4.6 Bilexical entry for `just arrived ↔ acaba de llegar'.
4.7 Unconstrained adjacency graph.
4.8 Middle and final stages in constraint propagation.
4.9 Applying the negative restrictor = {orth}.
4.10 Constraint propagation with unordered adjectives.
4.11 Arrows arising from PP modification.
4.12 Connectivity graph.
4.13 Connectivity graph for two impossible subparses.
5.1 Simple event type hierarchy.
5.2 Type hierarchy of spatial relations (arrows indicate lexical rules).
5.3 Lexical type for Spanish preposition desde.
5.4 Lexical type for Spanish preposition hasta.
5.5 Indices for `across'.
5.6 Indices for `towards'.
5.7 TFS for the preposition `in'.
5.8 Different points of view for the same scene.
5.9 TFS for `in front of'.
5.10 Lexical rule for intrinsic to path alternation.
5.11 Lexical rule type p-lexicalized-2-goal.
5.12 Path to path-end lexical rule.
5.13 Hungarian spatial relations hierarchy.
5.14 Spanish spatial relations hierarchy.
5.15 Bilingual lexical entry for `inside - dentro de'.
5.16 Bilexical rule for regular translation mapping.
5.17 Summary of translation equivalences.
6.1 Parse tree for `Mary waits inside the hotel'.
6.2 IL list after analysis.
6.3 Copy of bilexical entry for `inside - dentro de'.
6.4 Bag after transfer.
6.5 Asher and Sablayrolles' seven generic locations.
List of Tables
1.1 Parallel derivations from Interlingua structure.
3.1 Relative clauses, clitic climbing and clitic doubling data.
4.1 Effect of pruning technique on different constructions.
5.1 Co-occurrence for some nouns and lexicalized prepositions.
5.2 Frequencies for determining the locative type of a noun.
5.3 Hungarian locative cases and postpositions.
5.4 Hungarian common nouns and lexicalized prepositions.
5.5 Regular translations for path and goal alternations.
6.1 Comparison between Asher and Sablayrolles and present proposal.
7.1 Intelligibility percentages: out of all sentences, and up to score 4.
7.2 Accuracy percentages: out of all sentences, and up to score 5.
7.3 Intelligibility percentages compared: for all sentences.
7.4 Intelligibility percentages compared: up to 4.
7.5 Accuracy percentages compared: for all sentences.
7.6 Accuracy percentages compared: up to 5.
7.7 Percentage of correct or acceptable translations for two systems.
List of Main Abbreviations
AAAI American Association for Artificial Intelligence
AI Artificial Intelligence
ACL Association for Computational Linguistics
BCI Bilingual Conversation Interpreter
BNF Backus-Naur form
CAT Constructor, Atom, Translator
CF, CFG Context-Free, Context-Free Grammar
CFPSG Context-Free Phrase Structure Grammar
CG Categorial Grammar
CL Computational Linguistics
CLE Core Language Engine
DAG Directed Acyclic Graph
DRS, UDRS (Underspecified) Discourse Representation Structure
ELU Environnement Linguistique d'Unification
FOL First Order Logic
GPSG Generalized Phrase Structure Grammar
HPSG Head-driven Phrase Structure Grammar
ICSLP International Conference on Spoken Language Processing
IJCAI International Joint Conference on Artificial Intelligence
IL Indexed Lexeme
IS Interface Structure
KB Knowledge-Based
LDC Linguistic Data Consortium
LF Logical Form
LFG Lexical-Functional Grammar
LKB Lexical Knowledge Base
LR Left-to-right Rightmost
MT Machine Translation
MRS Minimal Recursion Semantics
MUC Message Understanding Conference
NLP Natural Language Processing
P-A Predicate-Argument
PP, NP, VP, S Prepositional Phrase, Noun Phrase, Verb Phrase, Sentence
PS Phrase Structure
QLF Quasi-Logical Form
SB Shake-and-Bake
SL Source Language
SOV, SVO Subject-Object-Verb, Subject-Verb-Object
TFS, FS Typed Feature Structure, Feature Structure
TL Target Language
TMI Theoretical and Methodological Issues in Machine Translation
TREC Text REtrieval Conference
UCG Unification Categorial Grammar
Chapter 1
Introduction
This thesis describes an approach to the Machine Translation (MT) of spatial prepositional
expressions which is modular and practical, and which overcomes certain difficulties found
in previous approaches to this problem. The distinguishing features of the proposal are a
strongly lexicalist perspective, a transfer architecture which maximizes the independence
of monolingual components, a motivated classification of the spatial relations found in
natural languages, and a modular disambiguation method for translation.
In this chapter I introduce the field of MT, describe the problems considered in the
thesis, and outline the solutions pursued. I will begin by describing the strategies generally
adopted in tackling the translation problem and justify the one I have adopted. Then I
present the problems that arise in the translation of spatial expressions, indicating why this
particular problem offers a good case for research in MT. There follows a brief introduction
to the solutions to these problems as developed in the thesis, together with indications
about their novelty and efficacy. The rest of the chapter concentrates on existing transfer
systems, pointing out their main inadequacies and how these bear on the development of
a new system. The chapter also includes a description of systems that have considered
the problem of translating prepositions and the compromises that have been required to
tackle this difficult task. An overview of the remainder of the thesis concludes the chapter.
1.1 Machine Translation
One of the first applications in symbolic processing was the translation of human languages,
as the early references noted by Hutchins (1986) suggest. Initially, early techniques of code
breaking were used to translate texts, hoping that this problem was an instance of the
deciphering problem; however, it was quickly realized that the translation problem was
much more involved than this. In fact, it is so complex that the initial goals of fully
automatic, high quality MT are now seen as distant goals which may or may not be
achieved.
In order to understand the various MT paradigms, Figure 1.1 is often used; this diagram,
adapted from Hutchins and Somers (1992:107) but first used by Vauquois (1968), shows
the amount of analysis and generation that each paradigm requires, together with the
degree of similarity between the source and target representation. For instance, in the
direct approach only a small amount of analysis is performed, with the consequence that
the translation step has to perform very complicated operations and rearrangements. I
will briefly describe the development of MT systems using Figure 1.1 as the format for the
order of the description.
[Figure 1.1: Approaches to machine translation. Diagram labels: SL, analysis, direct, syntactic transfer, semantic transfer, interlingua, generation, TL.]
Early approaches to MT involved very restricted computational resources and almost
non-existent formal linguistic theories: a system would have had approximately 250 source
language (SL) words and half a dozen rules for disambiguation and TL rearrangement.
Translation was performed on an essentially word-for-word basis. This approach led to
unsatisfactory results and ultimately to disappointment. After the ALPAC report in 1966,
MT funding disappeared almost completely in the United States, save for a few research
centres such as the LRC in Austin, TX, due to these negative experiences; in the rest of the
world the conclusions of the report were also influential but to a lesser degree; for example,
research continued in France and in Germany on a number of systems including GETA
and SUSY. Nevertheless, most MT funding was diverted into Computational Linguistics
and Artificial Intelligence.
Systran, one of the first commercial MT systems to appear, was a development of the
pioneering efforts in the 50s and early 60s, especially those by the team at Georgetown
University prior to the ALPAC report. Systran has evolved over three decades to become
one of the most widely used MT systems in the market. Its success derives from its
extensive phrasal dictionaries and from its efficient coding of various translation routines.
Systran's architecture is usually classified as first generation or direct, in which the distinguishing feature is the absence of a separate analysis stage prior to transfer. Although
the system now includes features of second generation systems, the notion of first generation MT is still relevant not only historically but also because it best describes much
current translation software for use with personal computers. In its purest form a direct
approach involves word-for-word translation with extensive string matching and reordering. One problem which is inherent to direct MT systems is that they do not take into
consideration the syntactic structure of the sentence nor the semantic relationships that
exist between words. In addition there is no way of ensuring the well-formedness of target
language (TL) expressions due to the absence of grammatical rules. Thus a sentence such
as Juan la vio could be translated as `John the saw' by a direct system instead of the
correct `John saw her' due to the homonymy of Spanish la. However, one property of
direct approaches which will be relevant to this thesis is that translation equivalences are
established on the basis of directly observable pairings such as would be found in a bilingual dictionary, phrase book or language teaching textbook. I will be contrasting these
equivalences against those based on more abstract representations such as the syntactic
or predicate argument structure of a sentence.
The many shortcomings of first generation MT systems, coupled with developments
in Natural Language Processing (NLP), Computer Science and Theoretical Linguistics,
led to `indirect' approaches to translation of which the first to become viable was the
transfer paradigm. The most important property of a transfer system is the existence of
a transfer module which maps SL intermediate representations into TL ones; it is this
additional module that gives these systems (and to a certain extent direct systems) their
characteristic $O(n^2)$ growth in the number of languages, since for $n$ languages there will
have to be at least $\binom{n}{2} = \frac{n^2 - n}{2}$ transfer modules. That is, each language will have analysis
and generation components, and, in addition, there will be a transfer module (two in the
case of unidirectional systems) for each pair of languages. This polynomial increase in
system size is one of the greatest drawbacks of transfer systems and much care must be
taken in the design of a transfer system to ensure that the compilation of the transfer
module is as inexpensive as possible.
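The growth argument above can be made concrete with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it simply counts modules under the assumptions stated in the text (one analysis and one generation component per language, one transfer module per language pair, or two if transfer modules are unidirectional). The interlingua count anticipates the architecture with no transfer modules at all.

```python
def transfer_modules(n: int, bidirectional: bool = True) -> int:
    """Transfer modules needed to connect every pair of n languages."""
    pairs = n * (n - 1) // 2  # n choose 2 = (n^2 - n) / 2
    return pairs if bidirectional else 2 * pairs


def monolingual_modules(n: int) -> int:
    """One analysis and one generation component per language."""
    return 2 * n


def transfer_system_size(n: int, bidirectional: bool = True) -> int:
    """Total module count for a transfer architecture."""
    return monolingual_modules(n) + transfer_modules(n, bidirectional)


def interlingua_system_size(n: int) -> int:
    """An interlingua architecture needs only the monolingual components."""
    return monolingual_modules(n)


if __name__ == "__main__":
    for n in (2, 3, 5, 9):
        print(n, transfer_system_size(n), interlingua_system_size(n))
```

For nine languages the sketch gives 36 bidirectional transfer modules (72 if unidirectional) on top of the 18 monolingual components, which is the quadratic cost that the design of the transfer module must keep cheap.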
One may distinguish three types of transfer system: syntactic, semantic and mixed;
I will consider each in turn. The principal operations in syntactic transfer systems are
tree-to-tree transformations which convert SL syntactic structures into TL ones. Good
examples of the syntactic transfer approach are the Metal and ARIANE systems; in the
former tree-to-tree transformations are used by indexing them to individual grammar
rules and applying them after analysis. In the latter, an interface structure is constructed
during analysis which is then transferred into a target interface structure and subjected to
further restructuring (syntactic generation) before morphological generation is performed
(Vauquois and Boitet 1988). The Metal system will be described in more detail in Section
1.4.1, given its increasing use as a commercial system (Fontenelle et al. 1994).
Semantic transfer systems construct a representation of the meaning of a sentence which
is nevertheless dependent on the SL. This representation may take the form of predicate-argument structures or some other formalized representation of meaning. Transfer in such
systems involves principally predicate translation, although structural transfer rules are
also used to overcome discrepancies in SL and TL representations. An example of semantic
transfer is the BCI as described by Alshawi et al. (1992); in this system transfer is effected
at the level of Quasi-Logical Form (QLF), a logical representation derived mainly from
the syntactic structure of the sentence. In QLF syntactic and semantic relations in the
SL are identified and made explicit; also, many structural differences between SL and TL
disappear. For example, a sentence such as `Mary gives John the dog' is given a canonical
representation in which `Mary' is the subject of `gives' and where `the dog' and `John'
are the direct and indirect objects respectively. Transfer of this canonical structure would
only require predicate translation before being passed to the TL generator. The BCI will
be considered in detail in Section 1.4.6.
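The idea that transfer over a canonical structure reduces to predicate translation can be sketched as follows. This is a minimal illustration and not the QLF formalism of the BCI: the `PredArg` class, the `transfer` function and the two dictionary entries are assumptions introduced here, and the canonical structure is built by hand rather than by an analyser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredArg:
    """A flat predicate-argument structure (a stand-in for a logical form)."""
    predicate: str
    subject: str
    direct_object: str
    indirect_object: str


# Toy English-to-Spanish predicate dictionary (assumed entries).
PREDICATE_TABLE = {"give": "dar", "dog": "perro"}


def transfer(source: PredArg, table: dict) -> PredArg:
    """Translate each predicate, leaving the argument structure untouched."""
    def translate(term: str) -> str:
        return table.get(term, term)  # proper names pass through unchanged

    return PredArg(
        translate(source.predicate),
        translate(source.subject),
        translate(source.direct_object),
        translate(source.indirect_object),
    )


# Hand-built canonical analysis of `Mary gives John the dog'.
canonical = PredArg("give", "Mary", "dog", "John")

if __name__ == "__main__":
    print(transfer(canonical, PREDICATE_TABLE))
```

No structural rule is needed in this case because the grammatical roles have already been made explicit during analysis; the TL generator would then decide word order.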
In mixed transfer systems, syntactic, functional, semantic and sometimes pragmatic
information is brought to bear on the expression of transfer relations. By using multiple
levels of information, mixed transfer systems can encode translation equivalences at the
level which is most appropriate to the languages and construction at hand. For instance,
the translation of a passive sentence into a language which has passives can best be effected at the grammatical functional level rather than at the predicate-argument level in
order to preserve grammatical structure. Instances of this approach include the structural
correspondences of Kaplan et al. (1989) and the type rewriting mechanism of Zajac (1989),
both of which will be elaborated on in Sections 1.4.4 and 1.4.5 respectively.
The other alternative to direct MT is the interlingua approach where the main goal
is the elimination of the transfer component such that the addition of new languages to
the system only requires the construction of new analysis and generation modules for the
new language. Although the interlingua idea was tried relatively early on in the history
of MT, it is only recently that this approach has been used for practical systems. At
the heart of an interlingua system is a language independent representation into which
the analyser maps the input sentence and from which the generator constructs the TL
sentence. For example, Dorr (1992) uses an extended version of the lexical conceptual
structure (LCS) of Jackendoff (1990) as an interlingua into which sentences in the SL
are mapped and out of which TL sentences are generated. Elimination of the transfer
step leads to an increase in the complexity of the analysis and generation modules: the
analysis module must be capable of constructing a representation of a sentence equivalent
to that derived by other language modules, while the generation module has to be able to
construct a sentence from the interlingua produced by other language modules. Carbonell
et al. (1992) distinguish between interlingua and knowledge-based (KB) systems. The
former require a language independent representation which need not have knowledge
of the subject domain. By contrast, KB translation uses lexical, syntactic, semantic,
pragmatic and domain knowledge in order to disambiguate and generate TL sentences.
An interlingua may take the form either of a natural language such as English or Esperanto
as in the DLT project (Schubert 1988), or a formalism like Montague's Intensional Logic
as used in the Rosetta project (Landsbergen 1987); on the other hand, KB systems use
AI techniques such as semantic networks and frames to derive the interpretation of the SL
sentence before translating it (Nirenburg et al. 1992).
1.1.1 Motivation for Transfer
It has been a long-standing debate whether transfer or interlingua is the best approach
to MT. It has also been argued that the interlingua/transfer distinction is not a real
distinction; in other words, it is said that MT systems can in principle span a spectrum
of designs ranging from pure interlingua, through systems using language dependent and
language independent predicates, to systems effecting transfer between purely language
dependent predicates and structures. However, most systems, be they experimental or
commercial, have in the main adopted one or the other paradigm and then have evolved
to include design decisions from the alternative paradigm (some recent systems do not
follow this line, as I will show); this is the strategy to be followed here.
I have chosen a transfer design for the following reasons. Firstly, the specification of
an appropriate interlingua which will include all the necessary information for deriving a
sentence in any language is beyond the capabilities of current technology. For example,
what is the interlingua representation of `blue'? If one only considers English and Spanish
its representation could be some constant c1, since both languages have equivalent words
for this notion. However, it is well known that Russian does not have a single word for
`blue', having instead the two words goluboi (pale blue) and sinii (dark blue). To cope with
this situation one would have to specify two interlingua constants, corresponding to each
of the Russian words. Unfortunately, this would not ensure that these two constants were
sufficient for any other language that one might like to add to the system. Furthermore,
similar situations arise with almost any word thus undermining the possibility of specifying
a cross-linguistically valid interlingua once and for all. In a transfer system, where equivalences are established at the bilingual level, discrepancies between languages are localised
in the transfer component and therefore only become an issue when translating between
the languages concerned. In other words, in a transfer system, disambiguation of `blue'
need only be carried out for English-Russian translation and not for English-Spanish.
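The localisation of discrepancies in the transfer component can be illustrated with a toy bilexicon keyed by language pair. The sketch below is an assumption-laden illustration: the data structure and function names are hypothetical, and the Spanish entry `azul' is supplied here as the equivalent word that the text says exists.

```python
# Toy bilexicon: one table per (source language, target language) pair.
BILEXICON = {
    ("en", "es"): {"blue": ["azul"]},
    ("en", "ru"): {"blue": ["goluboi", "sinii"]},  # pale blue, dark blue
}


def translations(word: str, sl: str, tl: str) -> list:
    """Look up the TL candidates for a SL word in the pair-specific table."""
    return BILEXICON[(sl, tl)].get(word, [])


def needs_disambiguation(word: str, sl: str, tl: str) -> bool:
    """A choice arises only where the pair-specific table offers more than one candidate."""
    return len(translations(word, sl, tl)) > 1


if __name__ == "__main__":
    print(needs_disambiguation("blue", "en", "es"))
    print(needs_disambiguation("blue", "en", "ru"))
```

Adding a further language only adds new pair-specific tables; the English-Spanish table is left untouched, whereas an interlingua constant for `blue' would have to be revised for every language.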
Another problem is that of defining an algorithm which will construct a canonical
interlingua representation for an arbitrary sentence. Since the canonical interlingua must
serve as input to all the generation components, each monolingual component must be
capable of mapping from this representation into an appropriate TL sentence. However,
to achieve this on a cross-linguistic basis is difficult because there is as yet no practical
way of determining what constitutes a canonical representation. For instance it could be
argued that one sense of `across' and its Spanish translation al otro lado de should not
result in the same interlingua representations. For one thing, there is a noun in the Spanish
phrase which is not present in its English equivalent; moreover, al otro lado de is one of the
translations of other prepositions including `through' and `over'. It may be replied that the
problem here is one of a mismatch between syntax and semantics, and that al otro lado de
is simply the syntactic correlate of a semantic structure corresponding to that of `across'.
However, this argument would lead to the construction of equal interlingua representations
for `across' and `on the other side of', since al otro lado de can also translate as `on the
other side of'. But `across' and `on the other side of' are not perfect synonyms, since the
following two sentences do not mean the same:
He put his chewing gum on the other side of the window.
He put his chewing gum across the window.
Although subsequent chapters will show that there are indeed certain commonalities between `across' and al otro lado de, the interlingua requirement that their representations
be equivalent is too strong. I have adopted the weaker position of monolingually motivating a representation for al otro lado de and then equating this representation with that for
`across'.
A related problem is that one interlingua structure usually corresponds to more than
one linguistic expression. For instance, Dorr (1992:160) assigns `I gave John the gift' and
`I gave the gift to John' the same LCS. Thus, if her system is fully reversible, generation
from this single LCS should result in those two sentences. This means that the equivalent
of either sentence in a language such as Hungarian, which allows this type of dative shift,
would result in two English sentences, thus introducing unnecessary ambiguity into the
translation task.
The system that I will develop incorporates some of the advantages of different approaches to translation. Firstly, the system is of the transfer type with cross-linguistic
knowledge residing solely in the bilingual lexicon (bilexicon henceforth); this makes the
construction of the transfer module less time consuming and more independent of monolingual linguistic descriptions whilst also establishing transfer relations at a level close to
the surface form of a sentence, which is one of the main commending properties of first
generation MT systems. Secondly, translationally relevant syntactic and semantic relations within a sentence are identified and used during transfer and generation, such that
they obtain in the TL expression. Finally, the individual equivalences established in the
bilexicon between the spatial prepositions of English and Spanish are circumscribed by a
cross-linguistically valid classification of the spatial relations found in natural languages;
in this way, the generalizations advocated by the interlingua philosophy are not ignored.
1.2 Problems in Translation and Preposition Translation
Having described the historical background against which the developments in this thesis
take place, and the general approach to MT adopted, I now move on to a more specific
description of the problems tackled here. These problems comprise only a very
limited subset of the range of issues relevant to MT in general, given that a complete
resolution of these would require answers to long-standing philosophical, theoretical and
practical questions in NLP and AI. For example, on the philosophical side there is the
thesis of the indeterminacy of translation of Quine (1960) who argues that there are
uncountably many translation procedures giving incompatible translations between two
languages, yet with each procedure in accordance with all the translational evidence
available. Theoretically, the specification of an adequate semantic theory for the expression
of linguistic and world knowledge and its application to translation has not yet been
achieved. In practical terms current theories of meaning, language and translation pose
a number of formal, computational and descriptive difficulties which, although currently
under investigation, are far from being resolved. Consequently, the effort undertaken here
constitutes a fraction of what is required to solve the MT problem.
Ambiguity is the biggest problem in MT. It affects the translation of most words, as
the following example, taken from García-Pelayo (1988), shows:
mesa: table, bureau, desk, writing desk, board, general committee, plateau, flat, etc.
The selection of a single translation from the many translations possible is a very difficult
problem since it involves complex interactions between syntactic, semantic, collocational,
pragmatic, stylistic and world knowledge.
A number of other linguistic problems in MT stem from the different grammatical
mechanisms used by languages in expressing equivalent meanings. As an example consider
the use of articles in English and Spanish. Although in many cases the two languages
coincide in their use of these, often an article is appropriate in one language but not in
the other:
Eng: Mary runs in the park. Babies cry.
Spa: María corre en el parque. Los bebés lloran.
Many other more specific problems arise in MT; these are adequately documented by
Hutchins and Somers (1992) and Arnold et al. (1994).
1.2.1 Selection of Problem
I have chosen to investigate the problem of translating spatial expressions. These expressions are used to indicate the location of an object or event in space and usually involve
prepositions or prepositional expressions. Some typical examples of spatial expressions
include (Sparck-Jones and Boguraev 1987):
Mr Brown is at the office
We walked along the river bank
He swam across the river
They lost themselves in the fog
The plane flew over the town
We set sail from Liverpool
There are three main reasons for investigating spatial prepositions. Firstly, prepositions have both syntactic and semantic characteristics which require an approach to their
translation to include considerations of both of these issues. That is, the translation of
these words is not completely syntactic, because prepositions obviously contribute to the
meaning of a sentence:
He puts it on the box ≠ He puts it in the box.
Compare this with the near synonyms:
She sits on the bus.
She sits in the bus.
in which the particular preposition makes a reduced contribution to meaning. There also
appear to be certain syntactic constraints on prepositional usage which need to be taken
into account in any description of their behaviour:
They stood two metres inside King's Chapel.
* They stood two metres in King's Chapel.
Secondly, meanings in the spatial domain are more concrete than in other semantic
fields, allowing a more direct verification of translation equivalence. For example, one can
define the meaning of a spatial expression as the correct behaviour of a motional agent
such as a robot acting on that expression and use this definition to establish whether two
sentences in different languages have the same meaning. Alternatively, one could provide
an arrangement of objects and attempt to describe the position of one object in relation to another; expressions which described the same arrangement would be considered
equivalent.
Thirdly, since prepositions are a relatively closed class, a fairly complete description of
their spatial subset can be achieved, thus providing an overall view of their behaviour and
main features.
1.2.2 Main Problems in Preposition Translation
Many problems in MT need to be tackled when translating spatial prepositional phrases
(PPs). These problems include:
Unavailable Distinction A distinction made in one language is not made in another
language. For example, spatial `in' and `on' correspond roughly to Spanish en which
is insensitive to whether its complement is a surface or a volume.
Lexicalization Patterns Meaning components are distributed differently in the lexical
items of different languages. In an expression such as `he walked behind the lamppost
(to the shop)' the notion of the path described by the walking and of the position of
this path in relation to the lamppost are both encoded in the preposition `behind'.
By contrast, the Spanish translation of this sentence, caminó por detrás del farol,
has two prepositions, one encoding the concept of path, and another describing that
of behindness.
Ambiguity A word in one language has more than one translation into another language.
This is probably the biggest translation problem. A simple example is the case of
the spatial preposition entre in Spanish which can translate either as `among' or
`between':
She is between John and Peter
Está entre Juan y Pedro
She is among the crowd
Está entre la muchedumbre
That entre is in fact ambiguous, and not just underspecific, can be corroborated by
the reluctance of its two senses to conjoin:
* Está entre Juan y Pedro y la muchedumbre.
Lexical Gaps The concept expressed by one word in one language can only be expressed
by a phrase in another language. For example, the sense of `across' in `the house is
across the street' translates into Spanish as the phrase al otro lado de. The main
problem is with the representation that should be assigned to the Spanish multi-word
expression al otro lado de. In a transfer system, it will be necessary to represent it
either as an idiom or by using a representation which combines structural and lexical
information. The first alternative is unsatisfactory because this expression is not
perceived as an idiom in Spanish; in fact, its meaning is partly compositional (cf.
`on the other side of'). The second alternative makes the transfer component depend
on the structure assigned to this phrase by the Spanish grammar thus affecting the
modularity of the transfer component. As for using an interlingua, some of the
incumbent difficulties were outlined in Section 1.1.1.
Object Knowledge Information regarding the lexical semantics of a word is needed for
translation. For example, `under' can translate as either bajo or debajo de, but in
`under the sun' it should translate as bajo; part of the reason for this is that `sun' is
a celestial object. The problem in this case is the incorporation of this knowledge in
a well-defined and efficient representation which can be motivated monolingually.
The above problems will be described in more detail in the main body of the thesis. Another problem which is specic to PP translation is the problem of preposition stacking,
as described by Durand et al. (1991:119). Preposition stacking occurs when the complement of a preposition is itself a prepositional phrase. Some examples taken from Durand
et al. (1991) are given below.
Out from under the bed.
Researchers from within the community.
The difficulty here is deciding what representation to assign to these expressions. In a syntactic transfer approach, the problem of preposition stacking could be tackled by writing
tree-to-tree transformations which would derive the appropriate TL syntactic structure.
However, this is time consuming because these transformations tend to become a repetition
of the source and target grammars, leading to redundancy and possible inconsistency. For
example, the phrase `from within' could be translated via the transfer rule (@x represents
a translation variable):
(PP from (PP within @x)) ↔ (PP de (PP dentro de @x))
The problem with this is that additions to either the source or target grammar would
require changes to the transfer component. Thus, coverage of additional phenomena such
as adverbial modication of PPs, as in:
They came directly from within the community.
would not only involve a bilexical entry to translate `directly', but also a rule to allow
construction of an appropriate TL syntactic tree for the equivalent structure in Spanish.
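The brittleness of such rules can be seen in a small sketch of tree-to-tree matching. The code below is not taken from any of the systems discussed: trees are encoded as nested tuples, `match`, `instantiate` and `apply_rule` are hypothetical helpers, and the subtree bound to @x is copied unchanged rather than being translated recursively as a real system would do.

```python
def match(pattern, tree, bindings=None):
    """Match a pattern against a tree, binding @-variables to subtrees."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str):
        if pattern.startswith("@"):
            if pattern in bindings and bindings[pattern] != tree:
                return None
            bindings[pattern] = tree
            return bindings
        return bindings if pattern == tree else None
    if not isinstance(tree, tuple) or len(pattern) != len(tree):
        return None
    for sub_pattern, sub_tree in zip(pattern, tree):
        bindings = match(sub_pattern, sub_tree, bindings)
        if bindings is None:
            return None
    return bindings


def instantiate(template, bindings):
    """Build the TL tree by replacing variables with their bound subtrees."""
    if isinstance(template, str):
        return bindings.get(template, template)
    return tuple(instantiate(part, bindings) for part in template)


# (PP from (PP within @x)) <-> (PP de (PP dentro de @x)), read left to right.
RULE = (
    ("PP", "from", ("PP", "within", "@x")),
    ("PP", "de", ("PP", "dentro de", "@x")),
)


def apply_rule(rule, tree):
    """Return the transferred tree, or None if the rule does not apply."""
    source, target = rule
    bindings = match(source, tree)
    return None if bindings is None else instantiate(target, bindings)


plain = ("PP", "from", ("PP", "within", ("NP", "the", "community")))
modified = ("PP", ("Adv", "directly"), "from",
            ("PP", "within", ("NP", "the", "community")))

if __name__ == "__main__":
    print(apply_rule(RULE, plain))
    print(apply_rule(RULE, modified))  # None: the rule no longer matches
```

The rule applies to the plain phrase but fails once the grammar admits the adverb, so a second rule mirroring the extended grammar would be needed; this is the redundancy described above.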
When transfer takes place at the semantic level, such problems may be avoided, but
then the problem that has to be tackled is the precise description of the semantic representation of prepositions and the way analysis, generation and transfer operate and interact.
This issue will receive further attention later.
Finally, another problem which will play a prominent role in this thesis is the `bus/car'
problem, exemplified by the phrases:
Spa: Viajan en el bus.
Viajan en el coche.
Eng: They travel on the bus. They travel in the car.
The difficulty here arises when selecting the appropriate preposition for translating Spanish
en. There are three aspects to this problem:
1. `On' is the preposition normally associated with `bus' whilst `in' is associated with
`car', as shown in the example above. In this case the problem is that of adequately
characterizing the differences between `bus' and `car' that lead to these preferences.
2. The expression `on the bus' is ambiguous, and may be paraphrased as either `inside
the bus' or `on top of the bus'. This ambiguity is not possible with `on the car'
which only allows the paraphrase `on top of the car'. Here one has to ask whether
this ambiguity is a property of `bus' and `car', or of `on' or both.
3. There is a certain amount of freedom regarding the use of `on' and `in' with the noun
`bus': `You can eat in/on the bus'. The issue here is how this ambiguity is to be
represented.
One possible approach to the rst aspect would be to adopt the Lexical Functions of
Mel'čuk and Zholkovsky (1988) who argue that in cases where equal meanings are expressed by different word combinations of the type just shown, a lexical function is at
play. There are different lexical functions for different meanings; for instance, Loc_in (p.
57) when applied to either `bus' or `car' would return the appropriate preposition for the
noun: Loc_in(bus) = `on', Loc_in(car) = `in'. The solution to be developed in Section 5.2.3
is compatible with this view of lexical relationships.
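Read computationally, a lexical function of this kind is a lookup from a noun to the preposition conventionally paired with it. The sketch below is only an illustration of that reading: the table contains just the two values given above, and the noun dictionary and the `translate_en_pp` helper are hypothetical.

```python
# The lexical function as a table: noun -> conventional locative preposition.
LOC_IN = {"bus": "on", "car": "in"}

# Toy Spanish-to-English noun dictionary (assumed entries).
NOUNS_ES_EN = {"bus": "bus", "coche": "car"}


def loc_in(noun: str) -> str:
    """Return the preposition that the lexical function assigns to the noun."""
    return LOC_IN[noun]


def translate_en_pp(spanish_noun: str) -> str:
    """Translate `en el <noun>' by consulting the TL noun, not the SL preposition."""
    english_noun = NOUNS_ES_EN[spanish_noun]
    return f"{loc_in(english_noun)} the {english_noun}"


if __name__ == "__main__":
    print(translate_en_pp("bus"))
    print(translate_en_pp("coche"))
```

The choice of English preposition is driven entirely by the target noun, which is why Spanish en can be left undifferentiated in the source analysis.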
1.3 Summary of Solutions
As an indication to the goals and achievements of the thesis, and also as an introduction
to the MT system implemented, I summarize below the principal contributions I hope to
make with this work.
The two main guiding principles for the system developed here have been practicality
and modularity. While the interlingua ideal should not be abandoned, especially since
there has been much progress in the specification of interlingua-like formalisms, particularly in the realm of tense and aspect (Allegranza et al. 1991:44-65), and also because, as
Dorr (1992:142-43) rightly points out, an interlingua can capture cross-linguistic generalizations in a much more concise manner than a transfer system, most practical systems
that exist have found the transfer approach the best paradigm for achieving a reasonable
degree of coverage, maintainability and generality.
Modularity is an important goal from both the theoretical and engineering points of
view. Theoretically, the identification and formal description of similarities and differences
between languages is best done in a separate transfer module containing precise representations of these relationships such that they may be tested and studied by applying them
to the translation task. That is, even the specification and construction of an interlingua
will be advanced by the study and refinement of transfer relations, since such a study is
the most accurate way of isolating the generalizations that an interlingua presupposes.
From an engineering perspective, a system must be such that changes to one component
should have minimum effect on other components. In particular, a change in a monolingual module should not induce changes in the transfer component and certainly not imply
changes in the other monolingual modules. Achieving modularity with a transfer architecture requires a transfer representation which is independent of monolingual grammatical
descriptions and which is powerful enough to express cross-linguistic generalizations.
To achieve practicality and modularity, I have adopted the lexicalist MT paradigm
described by Whitelock (1992) and Beaven (1992a) and have enhanced it with a version
of the bilingual lexical rules of Copestake et al. (1993) interpreted in the way suggested
by Trujillo (1992) in order to express cross-linguistic generalizations in a succinct manner.
In the lexicalist approach to MT, transfer is restricted to the mapping of the SL lexical
items occurring in the input into sets of TL lexical items. During analysis syntactic and
semantic relations are established between the lexemes in a sentence; then, the resulting set
of interrelated lexical items is mapped into its corresponding set of TL lexical items which
then forms the input to the generation module. This module is a modified parser which,
using the TL grammar, orders the output of transfer into a valid TL sentence; grammars
are therefore fully reversible. To this basic strategy I have added bilingual lexical rules.
These rules encode regularities between items in the bilexicon by establishing mappings
between input and output entries. A modification to the rules of Copestake et al. (1993)
interprets the output of such rules as sets of lexemes rather than as phrasal signs. For
example, from the bilingual pair `apple - manzana' a bilingual lexical rule derives the
related pair `apple tree - manzano' automatically. The original formulation of lexicalist
MT will be described in full in Section 1.4.8 under the title of Shake-and-Bake MT.
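The overall flow of lexicalist transfer can be sketched in a few lines. This is a deliberately crude illustration and not the implementation developed in this thesis: the bilexicon entries, the category labels and the single ordering constraint that stands in for the TL grammar are all assumptions, generation is done by brute-force permutation rather than by a modified parser, and `fruit_to_tree` is a hypothetical rule whose morphological step (final -a to -o) is chosen only to reproduce the `apple - manzana' example.

```python
from itertools import permutations

# Toy bilexicon: each entry maps a tuple of SL lexemes to a tuple of TL lexemes.
BILEXICON = {
    ("John",): ("Juan",),
    ("saw",): ("vio",),
    ("her",): ("la",),
}

# Toy TL lexicon and one word-order constraint standing in for a grammar.
TL_CATEGORY = {"Juan": "NAME", "vio": "VERB", "la": "CLITIC"}
TL_ORDER = ("NAME", "CLITIC", "VERB")


def transfer(sl_lexemes):
    """Map a bag of SL lexemes to a bag of TL lexemes, entry by entry."""
    bag = []
    for lexeme in sl_lexemes:
        bag.extend(BILEXICON[(lexeme,)])
    return bag


def generate(tl_bag):
    """Return every ordering of the bag that the toy TL grammar accepts."""
    return [
        " ".join(order)
        for order in permutations(tl_bag)
        if tuple(TL_CATEGORY[word] for word in order) == TL_ORDER
    ]


def fruit_to_tree(entry):
    """Hypothetical bilingual lexical rule: fruit pair -> tree pair."""
    sl, tl = entry
    return (sl + ("tree",), (tl[0][:-1] + "o",))


if __name__ == "__main__":
    print(generate(transfer(["John", "saw", "her"])))
    print(fruit_to_tree((("apple",), ("manzana",))))
```

Transfer never states that the clitic precedes the verb; that ordering comes from the TL side alone, which is what keeps the bilexicon independent of the monolingual grammars. Note also that the output of the rule is a set of lexemes (`apple', `tree') rather than a phrasal sign.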
At the level of PP translation this thesis will develop solutions to the problems introduced in Section 1.2.2 above; a summary of these is given below.
Unavailable Distinction To cope with the fact that the `in/on' distinction is not made
in Spanish, I have treated the use of these two prepositions as a highly language specific characteristic. Thus, each noun in a language is marked with the preposition
which, when combined with the noun, gives rise to a non-predictable meaning; this
marking is called the locative type of the noun and indicates that the interpretation
of a PP with a locative preposition is not compositional, being defined instead by
convention in a partially non-compositional way. Motivation for the locative type is
derived from a hierarchical classification of the different spatial relations expressed
by prepositions in natural languages; this classification leads to a very limited and
well-defined set of prepositions which are used in defining locative types. Since the
value of a locative type is a preposition, difficulties found in previous approaches
which assigned an abstract conceptualization or dimensionality to a noun are overcome. These difficulties included deciding what the appropriate conceptualization
of a noun was, specifying its consequences and encoding the conceptualization in a
computationally tractable manner.
Lexicalization Differences The discrepancies in meaning components encoded in prepositions in different languages are overcome by relating sets of lexical entries in the
bilexicon. Thus, in the bilingual entry `behind - por detrás de' the fact that one sense
of `behind' encodes path and location is mirrored by the use of two prepositions in
Spanish each of which contains just one of these meaning components. Meaning
components are motivated through a range of tests which lead to the hierarchical
classification mentioned above. Bilingual generalizations over sets of prepositions are
encoded using bilingual lexical rules, which construct new bilingual entries from existing ones. With these rules, a number of regularities identified by other researchers
are formalized.
Ambiguity Most approaches to TL disambiguation only implicitly acknowledge that the
range of possible translations of a given SL lexeme is limited. In the approach to
be presented, the range of allowed alternatives found in PP translation is strongly
delimited by the spatial relations hierarchy. In addition, the sources of knowledge
needed for effecting disambiguation are identified and a mechanism for using this
knowledge is adopted. This mechanism relies on TL filtering whereby the TL generator must discard some of the invalid translations produced by the transfer module;
invalid sentences are detected by using the Qualia structures of Pustejovsky (1991a)
in order to specify restrictions on certain lexical combinations. The structure and
type of the restrictions are organised around the relations hierarchy.
Lexical Gaps Lexical gaps are handled by equating a lexeme with a set of lexemes.
Thus, the entry `across - al otro lado de' in the bilexicon overcomes a lexical gap
in Spanish by directly translating `across' into a TL set of lexical entries. In doing
so difficulties arising from both the interlingua and syntactic/semantic transfer approaches are overcome. For example, unlike interlingua representations, the English
and Spanish expressions do not have to result in equivalent structures; also, equating
lexical items in SL and TL offers more flexibility regarding particular syntactic or
semantic analyses of these expressions. Furthermore, this treatment of lexical gaps
is modular in the sense that it allows the incorporation of other languages into the
system without propagating the gaps between one pair of languages and another.
For example, if Portuguese is added to the system, the Spanish-Portuguese bilexicon
can express the translation of al otro lado de as a compositional translation since for
this phrase there is a one-to-one correspondence with the Portuguese ao otro lado
do.
Object Knowledge It is shown that by using the theory of Qualia of Pustejovsky (1991a),
enhanced with the notion of a locative type, enough information is available for resolving many instances of TL ambiguity in the spatial domain. Hence the possibility
of using a well structured and linguistically motivated theory of noun knowledge in
MT is demonstrated. This is in contrast to other representations of knowledge in
transfer systems which normally rely on the meaning of the word naming a semantic feature to convey the interpretation and usage of that feature. Such knowledge
representations can lead to circularity.
Preposition Stacking An extra argument added to the predicate of a preposition is used
as a referent for the prepositional phrase; this argument is equated with the spatial
relation expressed by the PP and allows prepositional stacking and prepositional
modifiers to be uniformly treated. For instance, the preposition `within' is given
the representation `within′(x,y,z)' where `y' and `z' are the located and locator entities as in standard approaches and `x' corresponds to the spatial relation expressed
by `within'. This additional argument is used by the stacking preposition as the
object argument in its representation. Without quantifiers, the representation is:
from′(w,y,x) & within′(x,w,z).
The `bus/car' Problem Each of the aspects of the `bus/car' problem is dealt with using
information which is useful for other purposes. The solution relies on a refinement of
what may be described as an intuitive idea that speakers of English have regarding
the reason why one says `on the bus' but `in the car'. One informally justifies this
use of `on' by saying that a bus is wider than a car, or that it has a platform-like
area in which there are seats, or that buses are used as public transport more often
than cars, etc. The refinement I will make consists in interpreting this intuition as a
locative type based on frequency of co-occurrence of a preposition and a noun, and
in differences between expected and actual meaning of certain PPs.
This concludes the overview of the solutions to be developed.
1.4 Review of Transfer Based Systems
In this section I will describe and comment on a wide and heterogeneous sample of transfer
based systems in order to justify and consolidate the decision to adopt a lexicalist approach
to MT and to set the criteria by which the properties of this approach are to be judged.
This survey will also place this study in the context of contemporary transfer based systems. The systems and strategies to be described are: Metal, Eurotra's CAT formalism,
the ELU system from ISSCO, Structural Correspondences using Lexical-Functional Grammar, Type Rewriting MT from ATR, the BCI from SRI International, the Rosetta system
from Philips Research Laboratories, Shake-and-Bake MT, systems based on Indexed Logic,
the tlinks and tlink-rules developed as part of the LKB in the ACQUILEX project, and
statistical MT.
Before starting this review of systems, I should mention one problem regarding the
availability of up-to-date or sufficiently detailed information in English in this area. Although good surveys of MT systems and research have appeared recently (Hutchins and
Somers 1992; Arnold et al. 1994; Kay et al. 1994), much information regarding the current
state of various MT projects is not easily available. One reason might be that publication
of descriptions is not a priority for companies developing MT software. Another is that
publication is often in the form of technical reports produced at different times and places
and containing different types of information. Hutchins and Somers (1992:274) point out
that recent developments in Metal are not well documented even though these developments have affected the design of the system quite substantially. Another example is the
Eurotra project: perhaps because of its geographically diverse development, reports on
different aspects of the system are not easily available as a single document nor are they
always mutually compatible (see description of Eurotra below). Also, the specific aspects
of preposition translation are in most cases left as a matter to be resolved during system
development, and in the rare cases when explicit mention is made of this issue, only a brief
indication (with a few notable exceptions) is made about how this problem is tackled.
1.4.1 Metal
The Metal MT system was introduced in Section 1.1; I will now consider it in more detail, basing my description on those of Bennett and Slocum (1988), Schneider (1991) and
Hutchins and Somers (1992:260-74). Although the origins of Metal included considerations
of bidirectional and interlingua MT, the first commercial version of Metal was a unidirectional syntactic transfer system for translating German into English which first appeared
in 1989. As a transfer system, this version did not clearly separate its transfer
and generation components, having in fact an overloaded transfer module which reduced
generation to morphological synthesis. With the prospective addition of new language
pairs this situation has changed and a more modular architecture is in place. I will discuss
this later version of the system.
In Metal, translation follows the sequence: morphological analysis; parsing; transfer
including lexical and structural or tree-to-tree rules, and synthesis. Parsing is done with
a context-free (CF) grammar augmented with tests on features and with transformations.
Transfer rules take SL syntactic trees and transform them, in conjunction with the
bilexicon, into trees appropriate for the TL generator. Generation involves further structural transformations to construct a valid TL structure. There are many other specific
features of Metal such as its lexical compilation and grammar writing tools, and particularly its formatting and deformatting algorithms. However, these are of no concern to the
present discussion.
Since the notation in Metal is not very self-explanatory, I will elaborate on the above
steps by describing an example of translation with a more abstract notation but maintaining
the main features of the approach; the example is an adaptation of that given by
Hutchins and Somers (1992:273-74). Take the sentence
It will have been tested.
After parsing, a structure equivalent to that shown on the left in Figure 1.2 is constructed.
One important feature of this tree is that any structural transfer rules that need to be applied are appended to the node to which they apply. These rules are specified for each rule
in the SL grammar and are added to the tree as it is constructed. Thus, for node pred the
leftmost daughter will become rightmost after transfer. Application of the transfer routine to this parse tree results in the structure on the right in which lexical and structural
transfer has taken place. The transfer routine uses the bilexicon in conjunction with the
structural transformations attached to the nodes to derive a TL tree. Local rearrangement
of constituents is not the only type of transfer rule; more complex structural transformations are allowed although for this particular example complex transformations only take
place during generation. Transfer proceeds from the top node downwards, applying transfer rules as they are found and effecting lexical transfer on the leaves of the tree when they
are reached. The transferred tree then serves as input to the generator; during generation
further transformations may be applied in order to construct an appropriate TL tree (see
Figure 1.3). Morphological synthesis is then applied to the leaves of this tree.
[Figure 1.2: Result of parsing in Metal. Left panel (Analysis): SL tree for `It will have been tested', with the rule PRED: 1 2 3 => 2 3 1 attached to node PRED. Right panel (Transfer): the tree after lexical and structural transfer, with leaves wird, sein, worden and geprüft.]
[Figure 1.3: Generation with restructuring in Metal. Under PRED -> WERDEN (wird) X NFPRED, the NFPRED daughters SEIN WERDEN V (sein worden geprüft) are reordered as V WERDEN SEIN (geprüft worden sein).]
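The top-down transfer pass described above can be sketched as follows. This is a minimal illustration in Python rather than Metal's own notation: the `Node` class, the `transfer` and `leaves` functions and the toy bilexicon entries are all assumptions introduced here, and the tree is deliberately flatter than the one in Figure 1.2.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy bilexicon; the entries are illustrative assumptions, not Metal's lexicon.
BILEXICON = {"it": "es", "will": "wird", "tested": "geprüft"}

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    # Structural transfer rule attached to the node during parsing:
    # (2, 3, 1) reads "daughters 1 2 3 are reordered as 2 3 1".
    reorder: Optional[tuple] = None

def transfer(node):
    """Top-down transfer: reorder daughters where a rule is attached,
    recurse, and apply lexical transfer when a leaf is reached."""
    if not node.children:
        return Node(BILEXICON.get(node.label, node.label))
    kids = node.children
    if node.reorder is not None:
        kids = [kids[i - 1] for i in node.reorder]
    return Node(node.label, [transfer(k) for k in kids])

def leaves(node):
    """Collect the leaf labels of a tree from left to right."""
    if not node.children:
        return [node.label]
    return [w for k in node.children for w in leaves(k)]

tree = Node(
    "PRED",
    [Node("NP", [Node("it")]),
     Node("AUX", [Node("will")]),
     Node("V", [Node("tested")])],
    reorder=(2, 3, 1),
)
print(leaves(transfer(tree)))
```

The leftmost daughter ends up rightmost, as the rule on PRED requires. The result is only the input to generation; the further restructuring shown in Figure 1.3 is not modelled in this sketch.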
Comment
One problem with Metal is that it is not bidirectional; this is because transfer rules are
specified in relation to particular SL grammar rules and are interpreted in a procedural
manner. Thus, a transfer rule has a clear input and output structure whose input side
includes a number of tests on the SL node which must be satisfied before the rule can
apply; the output side consists solely of the portion of the TL structure to be constructed
and therefore does not contain sufficient information to be used in the reverse direction.
Procedurality is also manifest in the transfer and generation algorithms leading to
complex interactions between the bilexicon, the structural transfer rules and the rules for
generation; in turn, this leads to difficulties in the debugging, maintenance and expandability of the system. For instance, to transfer the arguments of a verb it is possible to
attach to a grammar rule a transfer rule which will convert the frame of an SL verb into
a TL verb frame. The resulting structure will have to interact with the generation rules
and subcategorization information in the TL verb in order to ensure appropriate ordering
and form of complements.
Another problem arises with the transformations allowed during generation. These are
designed to convert a structure which preserves characteristics of the SL into its most appropriate equivalent in the TL. While this is characteristic of syntactic transfer systems, it
nevertheless diminishes the modularity of the design because specific generation rules have
to be specified for each SL in order to overcome structural mismatches; such generation
transformations would make the size and complexity of every generator grow considerably
with the addition of new SLs.
There is also a problem of monolingual reversibility: the grammars used for analysis are
different from the transformations used for generation. This creates an enormous amount
of overlap in information content between the analysis and generation rules, which in
turn leads to inconsistencies in coverage. Although the Metal developers claim that the
generation rules can cover a narrower set of expressions than the analysis grammar, the
cost of implementing separate analysis and generation components is certainly more than
that of implementing a single reversible module.
Finally, it is worth noting that the preceding description does not highlight the preponderance of the lexical component in Metal. As Magnusdottir (1993) points out, Metal's
extensive lexical encoding is a very important feature of its operation.
1.4.2 The CAT Formalism
Unlike Metal, which is now a commercial product, the system described in this section
is still of a mainly experimental nature. The Eurotra MT project was funded by the
European Union (then European Community) with the aim of developing a multilingual
transfer system comprising nine languages. The project, and particularly its formalism,
have gone through a number of phases (Bech and Nygaard 1988; Crookston 1990; Pulman 1991) and so I will only concentrate on the Constructor, Atom, Translator (CAT)
description language, which was probably the first fully developed formalism. There are
two reasons for considering CAT in detail. Firstly, analysis of the CAT framework as a
whole will help in avoiding many of its disadvantages; secondly, some of the solutions that
have been proposed in the past for certain problems of concern here have been described
using CAT.
As described by Arnold et al. (1986), Arnold and des Tombe (1987) and Schmidt (1988),
the CAT formalism views the translation process as a series of successive transformations
which start with the source language (SL) surface form and end with the TL surface form.
This strategy can be represented as
$SL \rightarrow RL_1 \rightarrow RL_2 \rightarrow \cdots \rightarrow RL_n \rightarrow TL$
where $RL_i$ is the representation formalism $i$. In Eurotra there are four levels of representation for each language. Eurotra Morphological Structure (EMS) describes the surface
features of words. Eurotra Configurational Structure (ECS) is basically a constituent or
phrase structure description of the input sentence. Eurotra Relational Structure (ERS)
applies valency restrictions to lexical units that take arguments and makes them explicit;
the notions relevant to this level are subcategorization, complements and modifier constituents. ERS is similar to f-structure in LFG and includes such concepts as governor,
subject, object, indirect object, modifier, etc. Eurotra Interface Structure (IS) contains
thematic roles such as agent, patient, experiencer, time, place, etc. and also semantic features such as time references, animacy and abstractness; it constitutes the level at which
bilingual transfer is carried out. Note that IS has information accumulated from all the
other levels.
The following sequence shows the stages in English-German translation.
$SL(GB) \rightarrow EMS(GB) \rightarrow ECS(GB) \rightarrow ERS(GB) \rightarrow IS(GB) \rightarrow IS(D) \rightarrow$
$ERS(D) \rightarrow ECS(D) \rightarrow EMS(D) \rightarrow TL(D)$
GB marks English descriptions and D marks German ones.
The CAT formalism is used both for representing information and for transforming
between representations. It would be too time consuming to consider here the variety of
nomenclature and notational variants that resulted from the various stages and projects
within Eurotra. Therefore certain simplifications have been made in the description below
while preserving as many of the distinguishing features of CAT as possible.
CAT rules are used for analysis, transfer and generation. Constructors and Atoms
(sometimes collectively referred to as b-rules) are used for building structures, whilst
Translators are used for effecting transfer, not only between languages but also between
representational levels. Constructors are akin to phrase structure rules, consisting of a
mother and a list of daughters, whilst Atoms are akin to the preterminals in a phrase
structure grammar. Some simple Atoms and Constructors are shown below. For other
examples see Varile and Lav (1988:167).
Atoms:
{cat=det, lex=the, def=yes}
{cat=n, lex=bicycle, num=sing}
Constructors:
{cat=np, def=X} [{cat=det, def=X}, {cat=n}]
{cat=s} [{cat=np}, {cat=vp}]
{cat=vp} [{cat=v}, {cat=np}]
The trees constructed by the CAT formalism may be divided into two types: derivation trees and representations (Arnold and des Tombe 1987:121-22). Derivation trees are
structures encoding a Constructor and the possible categories that may unify with its
daughters; these trees are analogous to lambda terms where no β-reductions have been
performed. Representations are the result of unifying the daughters of a constructor with
a list of categories in a way analogous to β-reduction in the lambda calculus. For example
the ECS derivation tree and representation for `the bicycle' are, respectively:
a) < {r-name=cnp, cat=np, def=X, person=3} [ {cat=det, def=X}, {cat=noun} ],
{cat=det, lex=the, def=yes},
{cat=noun, lex=bicycle, number=sing} >
b) {r-name=cnp, cat=np, def=yes, person=3}
[{cat=det, lex=the, def=yes}, {cat=noun, lex=bicycle, number=sing}]
Here angle brackets delimit the derivation tree, curly brackets contain bundles of feature-value pairs and square brackets delimit the list of daughters in a structure. Note that the
mother is placed to the left of the square brackets. During analysis a derivation tree is
constructed and then passed onto a general mechanism which reduces it to a representation
using the unification operation.
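The reduction of a derivation tree to a representation can be illustrated with a small sketch. It is written in Python rather than in CAT notation and makes several simplifying assumptions that are not part of the formalism as described: feature bundles are flat dictionaries, variables are strings beginning with an upper-case letter, and the function names `unify` and `reduce_tree` are hypothetical.

```python
def is_var(value):
    """Assumption of this sketch: variables are capitalised strings such as 'X'."""
    return isinstance(value, str) and value[:1].isupper()

def unify(pattern, bundle, bindings):
    """Unify a daughter pattern with a ground feature bundle.
    Returns the unified bundle, or None if two atomic values clash."""
    result = dict(bundle)
    for feat, val in pattern.items():
        if is_var(val):
            val = bindings.get(val, val)        # dereference a bound variable
        if is_var(val):                          # still unbound
            if feat in bundle:
                bindings[val] = bundle[feat]     # bind it to the daughter's value
            continue
        if bundle.get(feat, val) != val:
            return None
        result[feat] = val
    return result

def reduce_tree(mother, patterns, atoms):
    """Reduce a derivation tree (Constructor plus candidate daughters)
    to a representation; None signals that the tree is invalid."""
    if len(patterns) != len(atoms):
        return None
    bindings, daughters = {}, []
    for pattern, atom in zip(patterns, atoms):
        unified = unify(pattern, atom, bindings)
        if unified is None:
            return None
        daughters.append(unified)
    resolved = {f: (bindings.get(v, v) if is_var(v) else v) for f, v in mother.items()}
    return resolved, daughters

mother = {"r-name": "cnp", "cat": "np", "def": "X", "person": 3}
patterns = [{"cat": "det", "def": "X"}, {"cat": "noun"}]
atoms = [{"cat": "det", "lex": "the", "def": "yes"},
         {"cat": "noun", "lex": "bicycle", "number": "sing"}]
print(reduce_tree(mother, patterns, atoms))
```

Applied to derivation tree a), the sketch binds X to `yes` through the determiner and so yields the mother of representation b). A daughter whose category clashes with the Constructor makes the reduction fail.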
Since generalizations over feature bundles cannot be expressed in the CAT formalism,
a set of rules called a-rules are used for describing percolation and filtering of features and
other operations. Depending on whether the generalizations that a-rules express hold in a
representation, they can leave a representation unchanged, replace it with the unification of
its features, or delete the complete structure which the representation belongs to (Steiner
et al. 1988b:13). Thus, the purpose of a-rules is to check representations reduced from
derivation trees. Example a-rules are:
a-rules
To percolate tense from a daughter to its mother (called a feature a-rule):
:f: {cat=s, tense=T} [{cat=np}, {cat=vp, tense=T}]
To delete any structure which has an auxiliary and is tensed (called a killer a-rule; ^ =
optional, * = any number of structures):
:k: {cat=s, tensed=yes} [^{cat=np}, ^{cat=np}, {cat=vrb, aux=yes}), *]
This example shows two types of a-rules. Feature rules are used to filter and percolate
those feature values which have not been bound within a representation. Killer rules are
used to delete any representation which is inconsistent because of its values. There are two
kinds of killer rules: one destroys a representation if it unifies with the rule, the other works
the opposite way and deletes a representation if unification fails. The first type expresses
negative conditions such as: delete if `tensed=yes' and `v-form=participle'. Unless stated
otherwise, this version of the killer rules will be used throughout this section. The second
type is used to enforce constraints on structures; for example, delete if `def≠yes' and
`lex=the'.
Translators are transformation rules (t-rules) with a left hand side (lhs) and a right
hand side (rhs); they express a transformation from the left representation to the right
representation. One corollary of this is that the CAT formalism is unidirectional. The
following t-rules map a determiner and a noun from ECS to ERS (rules 1 and 2) and a
verb from ERS to IS (rule 3; sf=syntactic function):
1) {cat=det, lex=the, def=yes} => {cat=det, lex=the, sf=mod}
2) {cat=noun, lex=bicycle} => {cat=noun, lex=bicycle, sf=gov}
3) {cat=v, lex=like, frame=subj-obj} => {cat=v, lex=like, frame=exp-pat}
There are two general requirements on Translators: they must be one shot and they must
be compositional. One shot means that they must produce target structures in one step
only without an intermediary representation; this implies that the rhs of a t-rule must be
an expression of the target formalism. Compositionality means that the transformation of
a complex structure is a function of its substructures. The following t-rule maps the ECS
for `the bicycle' above into its ERS derivation tree.
4) {cat=np} [1:{cat=det}, 2:{cat=noun}]
=>
<{r-name=rnp}, 2, 1>
The lhs of this t-rule will unify with b) above to build the first part of a derivation tree. The
rest of this tree is found by applying t-rules to transfer variables 1 and 2 in 4) recursively.
Assuming t-rules 1) and 2) above for `the' and `bicycle' the resulting derivation tree is:
<{r-name=rnp, cat=np}[{cat=noun, sf=gov}, {cat=det, sf=mod}],
{cat=noun, sf=gov, lex=bicycle},
{cat=det, sf=mod, lex=the}>
This tree is then reduced to give the required ERS representation. It is possible that during
reduction a certain derivation tree is found to be invalid. For example, it could be that
Constructor `rnp' requires the determiner to be demonstrative. If the input determiner
does not have this feature, the construction of the appropriate representation will fail.
Comment
There are several disadvantages with the CAT formalism. Firstly, it is unidirectional and
hence effort is wasted in writing two sets of t-rules for each pair of levels. This is further
compounded by allowing t-rules to delete, insert and rearrange representations, making
t-rules procedural and even more unlikely to be reversible.
Secondly, the notation is rather verbose and inefficient. For example, consider the
ungrammatical string * `we eats'. Assuming correct lexical entries for both words, 3
different items are necessary to reject such an expression: a Constructor and two killer
a-rules, as shown below.
Constructor:
{cat=s} [ {cat=np}, {cat=vp} ]
Killer Rules:
{cat=s} [ {cat=np, person=P}, {cat=vp, person≠P} ]
{cat=s} [ {cat=np, number=N}, {cat=vp, number≠N} ]
The CAT mechanism builds a representation using the Constructor, only for it to be
deleted by one of the killer rules. Unfortunately, the two killer rules cannot be merged
into one because such a rule would only delete structures where both person and number
agreement were violated. If the feature values were bound in the b-rule, the string could
be rejected by the Constructor alone, but in that case the appropriate bindings would
have to be encoded explicitly for all Constructors since no inheritance of constraints is
available within the framework. On the other hand, if feature rules are used to generalize
the binding of agreement features, killer rules would still be necessary because feature
rules apply only after a representation has been built, and if they fail the representation
is not deleted but left unchanged.
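The reason why the two killer rules cannot be merged can be made concrete with a short sketch. The Python below is only an illustration of the argument: the lexical feature values for `we', `they' and `eats' and the function names are assumptions introduced here, not CAT entries.

```python
# Assumed lexical feature bundles for the illustration.
WE = {"cat": "np", "person": 1, "number": "plur"}
THEY = {"cat": "np", "person": 3, "number": "plur"}
EATS = {"cat": "vp", "person": 3, "number": "sing"}

def killer(feature):
    """Build a killer rule that deletes an S whose NP and VP disagree on `feature`."""
    return lambda np, vp: np[feature] != vp[feature]

# The two separate killer rules of the text.
SEPARATE = [killer("person"), killer("number")]

def merged_killer(np, vp):
    """A single merged rule: it only matches when BOTH agreements are violated."""
    return np["person"] != vp["person"] and np["number"] != vp["number"]

def survives(np, vp, killers):
    """A representation survives if no killer rule matches it."""
    return not any(rule(np, vp) for rule in killers)

print(survives(WE, EATS, SEPARATE))           # `we eats' is deleted
print(survives(THEY, EATS, SEPARATE))         # `they eats' is deleted
print(survives(THEY, EATS, [merged_killer]))  # `they eats' wrongly survives
```

With the merged rule, a string that violates only number agreement is not deleted, which is why both killer rules have to be stated alongside the Constructor.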
Thirdly, because there is no interaction between representational levels, a large amount
of ambiguity is present at earlier levels in the analysis sequence. For instance, by placing
verb subcategorization information at the ERS level, the ECS level has to generate the
two analyses shown below for the sentence `he runs in London'.
a) {cat=s} [{cat=np, lex=he},
{cat=vp} [{cat=v, lex=run}],
{cat=pp} [{cat=p, lex=in},
{cat=np, lex=London}]]
b) {cat=s} [{cat=np, lex=he},
{cat=vp} [{cat=v, lex=run},
{cat=pp} [{cat=p, lex=in},
{cat=np, lex=London}]]]
The first analysis indicates that the PP is a modifier of the verb phrase, while the other
analysis indicates, incorrectly, that it is a verb complement. Disambiguation between these
two analyses is only effected by ERS rules.
Fourthly, the types of representations that can appear on either side of a t-rule are too
unconstrained. That is, each side of a rule can have a representation with any number
and depth of sub-representations. Thus it is possible to specify structures which refer
not only to features in a local tree (i.e. in a mother and/or its daughters) but also to
those in trees below them. This property is introduced for dealing with idioms and lexical
phrases but has the disadvantage of making t-rules difficult to manage and of making the
constraints on transfer rules less well defined. For example, one can write an ECS-to-ERS
t-rule which checks to see if a sentence has a verb with an NP sister in order to select a
transitive constructor in ERS ( = node is not used in mapping):
:t: S:{cat=s} [1:{cat=np},
:{cat=vp} [2:{cat=v}, 3:{cat=np}]]
=>
S:{cat=s} <2, 1, 3>
This rule involves checking the structure of the VP even though this structure is not part of
the local S tree. One could also inspect the daughters of the NP, and also their daughters,
and so on and end up with an unwieldy set of t-rules.
Fifthly, the semantics of the formalism is procedural and non-monotonic. The main
disadvantages with this are that the order of application of rules is important for the
correctness of results, and that the addition of new rules can cause some previously valid
constructions to become invalid. Procedurality arises for example in requiring feature rules
to be applied before killer rules; this is because feature values have to propagate through
a representation in order for killer rules to detect as many inconsistencies as possible.
Non-monotonicity is evident from the operation of killer rules, since adding a new killer
rule can lead to a previously valid representation becoming invalid. Both of these factors
have serious consequences for system development.
Sixthly, the number of t-rules explodes combinatorially. Take the following ERS-ECS
t-rule (adapted from Crookston (1990)):
1) S:{cat=s}[1:{cat=verb, sf=gov}, 2:{sf=sub}, 3:{sf=obj}]
=>
<S:{cat=s}, 2, <{cat=vp}, 1, 3>>
If negative sentences were to be added to this one-rule grammar the feature-value pair
`neg=no' would need to be added to non-negative sentences. This would require the
addition of the following t-rule in order to cope with auxiliary `do':
2) S:{cat=s}[1:{cat=verb, sf=gov, neg=yes}, 2:{sf=sub}, 3:{sf=obj}]
=>
<S:{cat=s}, 2, {aux=do, neg=yes}[<{cat=vp}, 1, 3>]>
Adding future tense sentences would involve two more t-rules: one for affirmative and one
for negative future sentences. If one then added passive sentences, a further four rules
would be needed to handle future and negative passives. This rapid expansion of rules is
clearly undesirable.
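The growth described above is exponential in the number of binary distinctions. The following Python sketch simply enumerates the feature combinations, on the assumption (taken from the argument above, not from any Eurotra rule set) that every combination needs a t-rule of its own; the function name and feature labels are hypothetical.

```python
from itertools import product

def rules_needed(features):
    """One t-rule per combination of values of the given binary features."""
    return [dict(zip(features, values))
            for values in product(("no", "yes"), repeat=len(features))]

for feats in ([], ["neg"], ["neg", "future"], ["neg", "future", "passive"]):
    print(feats, len(rules_needed(feats)))
```

The counts are 1, 2, 4 and 8, matching the one rule of the original grammar, the second rule for negation, the two further rules for future tense and the four further rules for passives.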
Finally, the output side of the t-rules simply repeats the Constructors of the target
level representation and thus, indirectly, makes the Translator dependent on the grammar
of the target representation.
It would seem that the idea of using the same mechanism for analysis and transfer
has made the CAT formalism inefficient: while the main advantage of dividing analysis,
transfer and generation into several stages of transformations is to provide modularity
for a project the size of Eurotra, it seems that the price thereby incurred might be too
great. Even perfect modularity may not have been achieved since, for example, the ECS
level needs to know what sort of subcategorization frames are used in the ERS in order to
generate sufficient subcategorization patterns.
1.4.3 Environnement Linguistique d'Unification
The `Environnement Linguistique d'Unification' (ELU) is the system developed at ISSCO
for experimenting in NLP (Estival 1990). It is similar to the PATR-II system described by
Shieber (1986) but has the following extra features and operations: disjunction over feature
structures (FSs), atomic negation, tree and list valued features, operations on list valued
features such as append and member, variable path names, multiple default inheritance
for lexical entries, relational abstractions, and transformations between representations.
This last addition allows transformations between FSs and is of particular importance for
MT (Estival et al. 1990; Russell et al. 1991).
The formalism for effecting FS transformations is declarative, bidirectional, and local.
Analysis of the SL results in a FS which incorporates, among other things, semantic
and functional information about a sentence. This FS forms the input to the transfer
component and, by symmetry, to the generator. A simplified example of a FS is shown
below for the sentence `Mary likes to swim':
[ head sem [ pred like
             args <1> [ head sem pred Mary ]
                  <2> [ head sem [ pred swim
                                   args <1> ] ] ] ]
The translation system is divided into three modules: the two grammars for the languages in question and a set of transfer rules. An example of a fairly complex transfer
rule is that needed to transfer `like' into German gern:
:T: gern-like
:L1: <* head sem pred>= Rg
<* head sem args>= [Ag | Tg]
<* head sem mod>= gern
:L2: <* head sem pred>= like
<* head sem args>= [Ae, Ve]
<Ve head sem pred>= Re
<Ve head sem args>= [Ae | Te]
:X: Rg = Re
Ag = Ae
Tg = Te
In this example * represents the root or dominating path of the current FS; :T: specifies
the name of the transfer rule; :L1: and :L2: establish the path descriptions or feature-values that must be present in the German and English FS, and :X: indicates the transfer
correspondences between :L1: and :L2:. In this example the correspondences established
may be paraphrased as:
1. The predicate in the German sentence is the transfer of the English subordinate
predicate.
2. The first argument of the German predicate is the transfer of the first argument of
the English predicate. Note reentrancy in the English FS.
3. Any other arguments of the German predicate are the transfer of any other arguments
of the English subordinate predicate.
Transfer rules are bidirectional and therefore there is no reference to source or target
language. However, before application, transfer rules are compiled into unidirectional
rules to make transfer more efficient.
For a rule to apply, its source FS or S(FS) must subsume the input FS. When it
does, the rule succeeds if both its source and target FS or T(FS) unify with the input
and output FSs respectively. Unification with the output FS ensures that the T(FS) is
compatible with the FSs transferred by other rules. It should be clear that a recursive call
to the transfer algorithm is introduced via the :X: component of a transfer rule through
the expression of further transfer relations. Also note that the equal sign (=) has been
overloaded, expressing value sharing and unication on the one hand, and transfer relations
through the :X: component on the other.
The transfer relation differs from value sharing and unification in that value sharing
requires values to unify. By contrast, the transfer relation requires that there exist a
transfer rule such that each of its sides unifies with the appropriate side of the transfer
relation. For example, given the transfer rule above, a further transfer rule such as
:T: karl-charles
:L1: <* head sem pred>= karl
:L2: <* head sem pred>= charles
:X:
would make the relation expressed by Ag=Ae succeed.
In order to avoid certain types of ambiguities, rules are applied to the input FS following
a partial ordering on rules; that is, rules higher in the order are tried before rules lower
down. The partial ordering is given by a `specificity' relation which uses subsumption and
the number of variables in the transfer component of a rule as a measure of specificity.
Specic rules are tried before less specic ones. A rule is more specic than another rule if
its S(FS) is subsumed by the other rule's S(FS) or if there are more variable bindings in its
:X: component. To check completeness of transfer, all the S(FS)s which have successfully
applied must unify and the result must match the input FS. If reentrancy is present in the
input FS but is not explicitly specified in a transfer rule then it is not preserved across
transfer.
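The applicability test and the specificity ordering can be sketched as follows. This is a hedged Python illustration, not ELU code: FSs are modelled as nested dictionaries without lists or reentrancy, the rule records and their `bindings` counts are invented for the example, and the rule name `like-moegen' and the feature `vcomp' are hypothetical.

```python
def subsumes(general, specific):
    """True if every feature path in `general` also occurs in `specific`
    with a value that `general`'s value subsumes."""
    if isinstance(general, dict):
        return (isinstance(specific, dict)
                and all(k in specific and subsumes(v, specific[k])
                        for k, v in general.items()))
    return general == specific

def more_specific(a, b):
    """Rule a is more specific than rule b if a's S(FS) is subsumed by b's
    S(FS), or if a has more variable bindings in its :X: component."""
    subsumed = a["source"] != b["source"] and subsumes(b["source"], a["source"])
    return subsumed or a["bindings"] > b["bindings"]

def rules_to_try(rules, input_fs):
    """Applicable rules that no other applicable rule is more specific than."""
    candidates = [r for r in rules if subsumes(r["source"], input_fs)]
    return [r for r in candidates
            if not any(more_specific(o, r) for o in candidates if o is not r)]

# Hypothetical rules: a general one for `like' and a more specific one
# that also requires a verbal complement.
GENERAL = {"name": "like-moegen", "source": {"pred": "like"}, "bindings": 2}
SPECIFIC = {"name": "gern-like", "source": {"pred": "like", "vcomp": {}}, "bindings": 3}

likes_to_swim = {"pred": "like", "subj": {"pred": "Mary"}, "vcomp": {"pred": "swim"}}
likes_tea = {"pred": "like", "subj": {"pred": "Mary"}, "obj": {"pred": "tea"}}

print([r["name"] for r in rules_to_try([GENERAL, SPECIFIC], likes_to_swim)])
print([r["name"] for r in rules_to_try([GENERAL, SPECIFIC], likes_tea)])
```

In the first case both rules subsume the input but only the more specific one is tried; in the second case only the general rule is applicable. The sketch does not model the completeness check or the loss of unspecified reentrancy.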
Comment
One problem with ELU is that it disallows a certain amount of ambiguity which is necessary
for the translation process to be complete. This is because a transfer rule which applies
successfully blocks the application of rules lower down in the lattice, thus restricting the
number of possible alternatives. These alternatives would be necessary if a phrase were
ambiguous between an idiomatic reading and a literal reading for example. This rule
application strategy has more serious implications in the case of invalid T(FS)s because if
they are rejected by the generator no backtracking through the lattice is permitted. While
backtracking could be incorporated into the system, it is worth making this point because
this method of indirect TL disambiguation contrasts with the method I will adopt in this
thesis.
The fact that FSs in ELU are not typed is also a disadvantage when developing an
MT system. By typing a FS, the set of possible features that may appear in combination
with each other is restricted (see Section 1.4.10). The importance of this restriction lies
in helping ensure that the results of operations over FSs are in fact valid FSs. The main
consequences of untyped FSs are found in system development and in the conciseness of
descriptions. For system development the type mechanism allows automatic checking of
a range of inconsistencies that might go undetected in an untyped system. For example,
with typed unification it is possible to guarantee that all path values in a transfer rule are
appropriate for the FSs at hand. Conciseness is achieved through types by establishing
a hierarchy of transfer relations in which common bindings and relations in a number of
rules may be generalized and applied uniformly throughout the bilexicon. For example,
the fact that the subject of a verb normally translates as the subject of a TL verb can be
encoded once in the type hierarchy and then applied to the translation of different kinds
of verbs. This property may look superfluous if macros are allowed in the specification
of transfer rules, but it must be added that a typed system allows additional consistency
checking.
A number of other issues which would need to be discussed in this section are also
relevant to other formalisms and approaches discussed in future chapters and therefore
will not be described here (see Section 2.3).
Despite the above comments, it is clear that the ELU formalism is a powerful tool for
carrying out research in MT, and that the problems noted above could be overcome by
making a few modications. The main insight taken from ELU is that it proposes the
establishment of correspondences between FSs as the basis for an MT paradigm.
1.4.4 Structural Correspondences in LFG
Kaplan et al. (1989) and Kaplan and Wedekind (1993) propose an approach to MT in
which the transfer relation is seen as a set of correspondences between different levels of
linguistic analysis, expressed as structural correspondences between different levels of linguistic description in the theory of Lexical-Functional Grammar (LFG). They distinguish
this view from other approaches to transfer in which a single level of representation is unnaturally required to hold all the information required during transfer. I will introduce the
LFG formalism and then show how different levels of description may be used to overcome
different problems in translation.
As described in Kaplan et al. (1989), LFG consists of three levels of description: c-structure, f-structure and semantic structure. C-structure corresponds to the phrase-structure description of a sentence and is equivalent to the syntactic trees associated with
it. In f-structure, grammatical relations such as subject, object, tense and definiteness are
explicitly encoded. In semantic structure predicate argument relationships and quantifier
scopes are established. Figure 1.4 shows the c-, f- and semantic structure for the sentence
`the baby fell'; value sharing, or reentrancy, is expressed by boxed integers.
[Figure 1.4: C-, f- and semantic structure for `the baby fell'. The c-structure is the tree [S [NP [Det the] [N baby]] [VP [V fell]]]; the f-structure is [pred = `fall[$\uparrow$ subj]', tense = past, subj = [pred = baby, numb = sg, spec = [def = +, pred = the]]].]
Structures at each level are obtained by a mapping from a less abstract level; for example, f-structure is
obtained from c-structure through the function $\phi$. Function $\phi$ is implemented as a series
of equations attached to each phrase structure rule, as shown below.
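\[
S \;\longrightarrow\; \underset{(\uparrow)\ \textit{subj} \,=\, \downarrow}{NP} \quad \underset{\uparrow \,=\, \downarrow}{VP}
\]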
This states that the f-structure of the mother node S ($\uparrow$) has its subj feature set to the
f-structure of the NP ($\downarrow$), and that the f-structure of the S as a whole is that of the VP.
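The effect of such annotations can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: nested dictionaries stand in for f-structures, the tuple encoding of the tree, the annotation label "head" (for the equation in which the mother and daughter f-structures are identified) and the functions `merge` and `phi` are hypothetical names introduced here, and the lexical f-structures are supplied by hand rather than taken from an LFG lexicon.

```python
# Sketch only: nested dicts stand in for f-structures; all names are hypothetical.

def merge(target, source):
    """Unify source into target in place; raise ValueError on a value clash."""
    for feat, val in source.items():
        if feat not in target:
            target[feat] = val
        elif isinstance(target[feat], dict) and isinstance(val, dict):
            merge(target[feat], val)
        elif target[feat] != val:
            raise ValueError(f"clash on {feat}: {target[feat]!r} vs {val!r}")
    return target


def phi(node):
    """Map an annotated c-structure node to an f-structure.

    A node is either ("lex", fstructure) or (category, daughters), where
    daughters is a list of (annotation, node) pairs. The annotation "head"
    identifies the daughter's f-structure with the mother's; any other
    annotation f makes the daughter's f-structure the mother's value for f.
    """
    if node[0] == "lex":
        return dict(node[1])
    _, daughters = node
    fs = {}
    for annotation, daughter in daughters:
        daughter_fs = phi(daughter)
        if annotation == "head":
            merge(fs, daughter_fs)
        else:
            merge(fs.setdefault(annotation, {}), daughter_fs)
    return fs


# `the baby fell', with hand-written lexical f-structures.
np = ("NP", [("head", ("lex", {"spec": {"def": "+", "pred": "the"}})),
             ("head", ("lex", {"pred": "baby", "numb": "sg"}))])
vp = ("VP", [("head", ("lex", {"pred": "fall<subj>", "tense": "past"}))])
s = ("S", [("subj", np), ("head", vp)])

f_structure = phi(s)
```

Here the S rule contributes the two annotations discussed above: the NP daughter fills the subj feature and the VP daughter is identified with the sentence as a whole.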
Semantic structure is derived in a similar way through a mapping from c- and f-structure;
although f-structure is the principal source for mapping into semantic structure, c-structure
is needed for the correct interpretation of certain constructions such as coordination.
In the f-structure above, the value of the feature pred is an expression to be interpreted
during semantic mapping; this expression identifies the predicate of the verb and the position of its arguments within the rest of the attribute-value matrix (AVM); the meaning
of the other features should be self-explanatory. The semantic structure is based on Situation Semantics as formalized by Fenstad et al. (1987); in this example, this structure
represents some unspecified situation which contains a relation fall with one argument
corresponding to `the baby'; this relation is at a spatiotemporal location ind-loc which
precedes a discourse-defined spatiotemporal location loc-d; its positive polarity 1 indicates that the relation holds in the relevant situation. A more detailed description of this
semantic formalism is beyond the scope of this thesis; relevant and useful treatments may
be found in Fenstad et al. (1987), Devlin (1991) and Rosner and Johnson (1992).
In the conception of Kaplan et al. (1989) the structural correspondence approach to
transfer may be represented by Figure 1.5. This diagram indicates that transfer mappings
are established between f-structures and semantic structures, as indicated by the arrows;
the diagram is not to be confused with that in Figure 1.1 in which only one level of
representation is involved during transfer.
[Figure 1.5: Structural correspondences at different linguistic levels. Source and Target each comprise a c-structure, an f-structure and a semantic structure; arrows from Source to Target link the two f-structures and the two semantic structures.]
To see how different levels are appropriate for different translation problems, I will
describe two examples, one where transfer is effected at the f-structure level and the other
where it is mostly done at the semantic structure level. For the first example, consider the
sentences:
Eng: John saw Mary.
Spa: Juan vio a María.
Transfer at a level higher than f-structure would lead to undue ambiguity during generation since decisions would have to be made regarding passivization, topicalization, etc.
Therefore transfer is established between the f-structures of the two sentences. The English
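f-structure is as follows:
\[
\begin{bmatrix}
\textit{pred} = \text{`saw}[\uparrow \textit{subj}, \uparrow \textit{obj}]\text{'} \\
\textit{subj} = \text{John} \\
\textit{obj} = \text{Mary}
\end{bmatrix}
\]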
while the target f-structure to be constructed by transfer is:
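\[
\begin{bmatrix}
\textit{pred} = \text{`vio}[\uparrow \textit{subj}, \uparrow \textit{aobj}]\text{'} \\
\textit{subj} = \text{Juan} \\
\textit{aobj} = \begin{bmatrix} \textit{pred} = \text{`a}[\uparrow \textit{obj}]\text{'} \\ \textit{obj} = \text{María} \end{bmatrix}
\end{bmatrix}
\]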
To obtain this structure, the following mappings in the bilexicon entry for `saw' would
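be necessary (direct equivalences for `John' and `Mary' are also assumed).
\[
\begin{aligned}
(\tau\uparrow)\ \textit{pred} &= \text{vio} \\
(\tau\uparrow)\ \textit{subj} &= \tau(\uparrow \textit{subj}) \\
(\tau\uparrow)\ \textit{aobj}\ \textit{obj} &= \tau(\uparrow \textit{obj})
\end{aligned}
\]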
Equation $(\tau\uparrow)\ \textit{subj} = \tau(\uparrow \textit{subj})$, for example, indicates that the subject of the TL
structure (lhs) is set to the translation of the SL subject (rhs). During transfer, the above
equations, in conjunction with the bilingual entries for `John' and `Mary' and the equations
in the Spanish lexical entry for vio, would be solved by a process of step-wise constraint
resolution (see Kaplan and Bresnan (1982) for details) to find the smallest f-structure which
satisfies all the equations; this f-structure would constitute the result of transfer, namely
the TL f-structure sought.
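The way such equations determine a TL f-structure can be sketched in Python. This is a simplified illustration, not the constraint resolution procedure of Kaplan and Bresnan (1982): the equations are evaluated by plain recursion rather than solved as constraints, the contribution of the Spanish lexical entry for vio (for instance the pred of aobj) is omitted, proper names are represented as f-structures with a pred feature, and the names `BILEXICON`, `tau`, `get_path` and `set_path` are hypothetical.

```python
# Sketch only: recursive evaluation stands in for step-wise constraint resolution.

def get_path(fs, path):
    """Follow a sequence of features through a nested dict."""
    for feat in path:
        fs = fs[feat]
    return fs


def set_path(fs, path, value):
    """Set the value at the end of a feature path, creating structure as needed."""
    for feat in path[:-1]:
        fs = fs.setdefault(feat, {})
    fs[path[-1]] = value


# Each equation is (target path, kind, argument):
#   "const" sets the target path to a constant,
#   "tau"   sets it to the translation of the source path given as argument.
BILEXICON = {
    "saw": [(("pred",), "const", "vio"),
            (("subj",), "tau", ("subj",)),
            (("aobj", "obj"), "tau", ("obj",))],
    "John": [(("pred",), "const", "Juan")],
    "Mary": [(("pred",), "const", "María")],
}


def tau(source):
    """Build the smallest target structure required by the bilexicon equations."""
    target = {}
    for path, kind, arg in BILEXICON[source["pred"]]:
        value = arg if kind == "const" else tau(get_path(source, arg))
        set_path(target, path, value)
    return target


english = {"pred": "saw", "subj": {"pred": "John"}, "obj": {"pred": "Mary"}}
spanish = tau(english)
```

In this sketch the third equation is what places the translation of the English object inside aobj, mirroring the structural difference between `saw Mary' and vio a María.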
Transfer in which mapping at the level of semantic structure is required proceeds in a
similar fashion. For example, consider the translation pair:
Eng: I think that the baby just fell.
Spa: Yo creo que el bebé acaba de caerse.
Sadler and Thompson (1991) show that structural correspondences at the level of f-structure are not satisfactory for this type of sentence because they have subordinate
clauses with different heads: in the English case the head is `fell' whereas in Spanish it
is the translation of `just' namely the verb acaba de. The problem arises when having
to provide, at f-structure, a compositional translation of `just fell' which will embed the
translation of `just' into that of `fell'. In the most adequate grammatical analysis of this
phrase, `just' acts as an adjunct to `fell'; given this, the transfer of `just' as the syntactic
head of the Spanish sentence does not lead to correct TL f-structures. For example, from
the highly simplified SL f-structure shown below:
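\[
\begin{bmatrix}
\textit{pred} = \text{think} \\
\textit{comp} = \begin{bmatrix} \textit{pred} = \text{fell} \\ \textit{subj} = \text{baby} \\ \textit{sadj} = \text{just} \end{bmatrix}
\end{bmatrix}
\]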
issues of control require the translation of `fell' (caerse) and that of its subject `baby'
(bebé) to be in the complement structure of the translation of `just' (acaba de); this need
leads Kaplan et al. (1989) to formulate the following equation:
\[
(1)\quad S \;\longrightarrow\; NP \quad \underset{(\tau\uparrow) \,=\, \tau(\uparrow \textit{sadj})\ \textit{xcomp}}{ADVP} \quad VP
\]
This says that the TL f-structure of the S node (lhs of the equation) is mapped into the
complement of the translation of the adverb (xcomp). However, issues of compositionality
require that the complement clause also be the transfer of this S node; the equation for
effecting this would be:
\[
(2)\quad VP \;\longrightarrow\; \text{Vsent} \quad \underset{(\tau\uparrow)\ \textit{comp} \,=\, \tau(\uparrow \textit{comp})}{S}
\]
Note that the value of comp is the f-structure for S, which has already been assigned a
value in (1). The crucial question now is: what is the transfer value of the embedded
f-structure required by the translations of both `think' and `just'? On the one hand, the
TL equivalent of `think' requires its complement to be the translation of `the baby just
fell'; on the other hand, the TL equivalent of `just' requires only the transfer of `the baby
fell'. Resolving this conflict via constraint resolution of equations leads to an f-structure
disconnected from the rest of the TL f-structure because of the equation in rule (1) which
makes ADVP, rather than VP, the top-most structure of the complement clause. In other
words, both the VP and the ADVP will construct topmost TL nodes. The net effect of
this is the construction of the two f-structures below, neither of which is contained by the
other (note that this is the only example in which reentrancy between different AVMs is
intended):
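\[
\begin{bmatrix}
\textit{pred} = \text{`pienso}[\uparrow \textit{subj}, \uparrow \textit{comp}]\text{'} \\
\textit{subj} = \text{yo} \\
\textit{comp} = \boxed{0}\begin{bmatrix} \textit{pred} = \text{`caerse}[\uparrow \textit{subj}]\text{'} \\ \textit{subj} = \boxed{1}\begin{bmatrix} \textit{pred} = \text{bebé} \end{bmatrix} \end{bmatrix}
\end{bmatrix}
\qquad
\begin{bmatrix}
\textit{pred} = \text{`acaba de}[\uparrow \textit{xcomp}]\uparrow \textit{subj}\text{'} \\
\textit{subj} = \boxed{1} \\
\textit{xcomp} = \boxed{0}
\end{bmatrix}
\]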
As the result of transfer, this is incorrect since it does not represent a valid f-structure.
Kaplan and Wedekind (1993) suggest the following semantic structure for `I think that
the baby just fell' which enables a direct and compositional solution to the problem just
described:
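\[
\begin{bmatrix}
\textit{rel} = \text{think} \\
\textit{arg1} = \text{I} \\
\textit{arg2} = \begin{bmatrix} \textit{rel} = \text{just} \\ \textit{arg1} = \begin{bmatrix} \textit{rel} = \text{fall} \\ \textit{arg1} = \text{baby} \end{bmatrix} \end{bmatrix}
\end{bmatrix}
\]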
The equivalent Spanish semantic structure for this sentence is isomorphic to the English
one, requiring lexical transfer only via the function $\tau'$:
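\[
\begin{bmatrix}
\textit{rel} = \text{pienso} \\
\textit{arg1} = \text{yo} \\
\textit{arg2} = \begin{bmatrix} \textit{rel} = \text{acaba de} \\ \textit{arg1} = \begin{bmatrix} \textit{rel} = \text{caerse} \\ \textit{arg1} = \text{bebé} \end{bmatrix} \end{bmatrix}
\end{bmatrix}
\]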
The key issue in this solution lies in the function $\sigma$ which maps the English f-structure
into its semantic structure. Since `just' is a sister to the verb it modifies, the task of
the function $\sigma$ will be that of constructing a hierarchical semantic structure from a flat
f-structure; this situation is shown below in a highly abstract form:
\[
\begin{bmatrix} \textit{pred} = \text{fell} \\ \textit{subj} = \text{baby} \\ \textit{sadj} = \text{just} \end{bmatrix}
\;\overset{\sigma}{\longmapsto}\;
\begin{bmatrix} \textit{rel} = \text{just} \\ \textit{arg1} = \begin{bmatrix} \textit{rel} = \text{fall} \\ \textit{arg1} = \text{baby} \end{bmatrix} \end{bmatrix}
\]
This mapping requires that the whole of the f-structure, except for the adverb, be mapped
into the argument slot of the semantic relation just. In order to achieve this, but also
motivated by certain syntactic phenomena occurring in some languages, Kaplan and
Wedekind (1993) introduce a new device which they call a `restrictor'. The basic intuition
of a restrictor is that it deletes a specified feature-value pair from a feature structure.
Below are shown a simplified AVM before and after restricting on the feature sadj:
\[
\begin{bmatrix} \textit{pred} = \text{fell} \\ \textit{subj} = \text{baby} \\ \textit{sadj} = \text{just} \end{bmatrix}
\;\overset{\text{restriction}}{\Longrightarrow}\;
\begin{bmatrix} \textit{pred} = \text{fell} \\ \textit{subj} = \text{baby} \end{bmatrix}
\]
Using this restriction, the adverb is first made the head of the semantic structure and then
the restricted feature structure is made its complement. I will not go through the details
of the mechanism since they add little to the present discussion.
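The intuition behind restriction can nevertheless be conveyed by a short Python sketch. It makes several simplifying assumptions that are not part of Kaplan and Wedekind (1993): f-structures are plain dictionaries with atomic values, only the features pred, subj and sadj are handled, the verb is given in its base form `fall', and the function names `restrict` and `sigma` are hypothetical.

```python
# Sketch only: a toy restrictor and a toy f-structure-to-semantics mapping.

def restrict(fs, feature):
    """Return a copy of fs with the given feature-value pair removed."""
    return {f: v for f, v in fs.items() if f != feature}


def sigma(fs):
    """Map a flat f-structure to a hierarchical semantic structure.

    If a sentential adjunct (sadj) is present it becomes the head relation,
    and the f-structure restricted on sadj is mapped into its argument slot.
    """
    if not isinstance(fs, dict):
        return fs  # atomic values are passed through unchanged
    if "sadj" in fs:
        return {"rel": fs["sadj"], "arg1": sigma(restrict(fs, "sadj"))}
    sem = {"rel": fs["pred"]}
    if "subj" in fs:
        sem["arg1"] = sigma(fs["subj"])
    return sem


f_structure = {"pred": "fall", "subj": "baby", "sadj": "just"}
restricted = restrict(f_structure, "sadj")
semantic = sigma(f_structure)
```

The original f-structure is left untouched by `restrict`, which reflects the fact that restriction describes a second structure rather than destructively modifying the first.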
Comment
While the above approach to transfer is attractive, it has a number of shortcomings. One
problem is that it relies rather heavily on a particular linguistic theory, namely LFG.
There are two main disadvantages with this. Firstly, each language in the system needs to
adopt the same formalism; unfortunately there are many linguistic formalisms at present
and it is not yet clear which one is the best for describing all the languages in a system. Secondly, commitment to a particular theory implies that changes in the treatment
of different monolingual phenomena will require changes to the transfer module, thus
diminishing the modularity of the design. For example, Kaplan et al. (1989) proposed
transfer of head switching constructions at the f-structure level. However, in Kaplan and
Wedekind (1993) they prefer to effect transfer at the semantic level by augmenting their
semantic interpretation function (see above). In a multilingual system such a change could
induce modifications in all the transfer components in which English was involved.
Although the flexibility of specifying transfer relations at different levels may be desirable, it can also lead to compromises having to be made. Put differently, the question
is, why not effect all transfer at the semantic level? One could argue that this level would
not preserve grammatical information useful for translation between related languages,
but equally, one could say that this grammatical information is the result of more abstract levels of interpretation which could be used in translation. For instance, f-structure
may be argued to be the best level for translating passives between English and Spanish.
However, the use of passives is the result of factors such as the informational structure of
the text, its style and the formal devices available in the language. Thus, the expression
of transfer relations at f-structure may be seen as a pragmatic decision whose usefulness
depends on the level of abstraction available in the grammatical theory adopted. This
means that the transfer component will have to be modified considerably in the face of
developments in syntactic and semantic theory.
Another point is that phenomena as pervasive in translation as lexical gaps and differences in lexicalization patterns are not naturally handled by the above approach. For
instance, transfer relations of the type `commit suicide - suicidarse' are considered by
Sadler et al. (1990); they propose a separate transfer lexicon in which this type of correspondence would be stored and which would override other applicable transfer relations.
Theoretically, this points to a shortcoming in the approach since special mechanisms have
to be included to cope with such prevalent transfer phenomena. Another difficulty with
a separate lexicon of this sort is that it cannot encode regularities which exist between
different types of lexical transfer relations. For example, patterns such as `apple - manzana' and `apple tree - manzano' and similar ones are not uncommon, and these would be
overlooked in the special bilexicon suggested by Sadler et al. (1990).
Finally, a number of observations regarding the use of predicate-argument structure as
a representation for transfer are also applicable to the structural correspondences approach
but they will be considered in Section 2.3.
1.4.5 Type Rewriting
The Type Rewriting formalism for MT described in Zajac (1989) and Emele et al. (1992)
uses Typed FSs (TFSs) and an associated rewriting mechanism for effecting analysis,
transfer and generation; I will concentrate on the transfer step.
At the heart of a Type Rewriting Transfer (TRT) system is a set of rewriting rules
consisting of a type on the lhs and its supertype on the rhs. Rewriting proceeds by unifying
the features of a type with those of its supertype and assigning the resulting FS the type
of the supertype. This process is then repeated with the supertype of the supertype and
so on until no further rewritings can be made. In addition, any types specified within a
FS recursively undergo type rewriting in this manner. The result of this is a FS whose
type has no supertypes (this mechanism should not be confused with that described in
Section 1.4.10, in which a resolved TFS is assigned a more specific type). The description
of TRT below uses English and Japanese as example languages following the presentation
of Zajac (1989).
Transfer is achieved by defining a set of rules consisting of transfer types; these types
have at least two features, one for the SL and one for the TL. Transfer types in general
encode transfer relations, bilexical entries and generalizations over them. An example of
a very simple transfer type and its corresponding rule is shown below.
\[
\textbf{speaker} \;=\;
\begin{bmatrix}
\textbf{noun} \\
\textit{jap} = \text{j-speaker} \\
\textit{eng} = \text{e-speaker}
\end{bmatrix}
\]
Bilexical entries inherit the constraints of their supertypes via the rewriting mechanism,
which in effect unifies the constraints of a type and a supertype to give a more specific
TFS. Thus, speaker above inherits the constraints of noun plus the (trivial?) fact that
the speaker of a Japanese utterance translates as the speaker of an English one.
Before transfer takes place, the SL input is analysed using the SL grammar in order
to derive the transfer representation of the sentence; this representation is assigned to
the SL feature of a transfer type. Based on this partially instantiated type the rewriting
mechanism is invoked; its effect is to maximally expand the type until its TL feature is
fully instantiated. The final value of the TL feature effectively constitutes the result of
transfer which can then form the basis for the generation phase.
A trivial example of lexical transfer using this procedure is shown below.
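\[
\begin{bmatrix}
\textbf{prop} \\
\textit{jap} = \begin{bmatrix} \textit{pred} = \text{okuru-1} \end{bmatrix}
\end{bmatrix}
\;\sqcap\;
\begin{bmatrix}
\textbf{send} \\
\textit{jap} = \begin{bmatrix} \textit{pred} = \text{okuru-1} \end{bmatrix} \\
\textit{eng} = \begin{bmatrix} \textit{pred} = \text{send-1} \end{bmatrix}
\end{bmatrix}
\;=\;
\begin{bmatrix}
\textbf{send} \\
\textit{jap} = \begin{bmatrix} \textit{pred} = \text{okuru-1} \end{bmatrix} \\
\textit{eng} = \begin{bmatrix} \textit{pred} = \text{send-1} \end{bmatrix}
\end{bmatrix}
\]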
The left hand TFS is the input to transfer; its type is prop which has as supertypes all
the bilexical entries. On rewriting, this TFS is unified with the transfer type send from
the following rewriting rule.
prop = speaker | hearer | book | send | ...
This unification will cause the feature eng in the input TFS to be instantiated to the
English equivalent of Japanese okuru. This is the only successful rewriting that can occur
since the value of pred, okuru-1, will be the only one to unify with the Japanese side of
the transfer types.
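The rewriting step just described can be approximated by the following Python sketch. It is an illustration under simplifying assumptions rather than the TFS system of Zajac (1989): feature structures are nested dictionaries without reentrancy, embedded types are not rewritten, every transfer type is given pred features on both sides for uniformity, and the names `REWRITES`, `DEFINITIONS`, `unify` and `rewrite` are hypothetical.

```python
# Sketch only: rewriting a type by unifying with each alternative of its rule.

FAIL = object()  # sentinel returned when unification fails


def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves)."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for feat, val in b.items():
            if feat in out:
                sub = unify(out[feat], val)
                if sub is FAIL:
                    return FAIL
                out[feat] = sub
            else:
                out[feat] = val
        return out
    return a if a == b else FAIL


# prop = speaker | hearer | book | send
REWRITES = {"prop": ["speaker", "hearer", "book", "send"]}

DEFINITIONS = {
    "speaker": {"jap": {"pred": "j-speaker"}, "eng": {"pred": "e-speaker"}},
    "hearer": {"jap": {"pred": "j-hearer"}, "eng": {"pred": "e-hearer"}},
    "book": {"jap": {"pred": "hon-1"}, "eng": {"pred": "book-1"}},
    "send": {"jap": {"pred": "okuru-1"}, "eng": {"pred": "send-1"}},
}


def rewrite(type_name, fs):
    """Return every (type, FS) pair reachable by rewriting until no rule applies."""
    alternatives = REWRITES.get(type_name)
    if alternatives is None:  # no rewriting rule for this type
        return [(type_name, fs)]
    results = []
    for alt in alternatives:
        unified = unify(fs, DEFINITIONS[alt])
        if unified is not FAIL:
            results.extend(rewrite(alt, unified))
    return results


transferred = rewrite("prop", {"jap": {"pred": "okuru-1"}})
```

Only the alternative send survives, because the other transfer types clash with the input on the value of pred.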
Clearly this is an extremely simple example. To make this scheme work in general it is
necessary to introduce a way of translating not only a predicate, but also any arguments
that it may have. Consider for instance the translation of the verb `send'. This verb has
three arguments identified by the features agent, recipient and object. To translate
the value of these, a recursive step must be introduced which will invoke the rewriting
mechanism. This is achieved by adding the transfer features trans-ag, trans-rec, and
trans-obj (agent, recipient and object respectively) to the transfer type for `send' which
will act as a working area for establishing the respective transfer relationships. Prop is
the type of these features which enables whichever predicate that appears as argument to
`send' to recursively invoke the transfer process. The next example is more realistic. The
input TFS on the left, which corresponds to the analysis of the Japanese for `you send me
the book', effectively causes okuru-1 to be transferred as `send'; this in turn invokes the
rewriting mechanism on the value of the transfer (trans-) features. The initial situation
is depicted below:
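\[
\begin{bmatrix}
\textbf{prop} \\
\textit{jap} = \begin{bmatrix}
  \textbf{j-okuru} \\
  \textit{pred} = \text{okuru-1} \\
  \textit{agent} = \text{j-hearer} \\
  \textit{recipient} = \text{j-speaker} \\
  \textit{object} = \text{hon-1}
\end{bmatrix}
\end{bmatrix}
\;\sqcap\;
\begin{bmatrix}
\textbf{send} \\
\textit{jap} = \begin{bmatrix}
  \textbf{j-okuru} \\
  \textit{pred} = \text{okuru-1} \\
  \textit{agent} = \boxed{1} \\
  \textit{recipient} = \boxed{2} \\
  \textit{object} = \boxed{3}
\end{bmatrix} \\
\textit{eng} = \begin{bmatrix}
  \textbf{e-send} \\
  \textit{pred} = \text{send-1} \\
  \textit{agent} = \boxed{4} \\
  \textit{recipient} = \boxed{5} \\
  \textit{object} = \boxed{6}
\end{bmatrix} \\
\textit{trans-ag} = \begin{bmatrix} \textbf{prop} \\ \textit{jap} = \boxed{1} \\ \textit{eng} = \boxed{4} \end{bmatrix} \\
\textit{trans-rec} = \begin{bmatrix} \textbf{prop} \\ \textit{jap} = \boxed{2} \\ \textit{eng} = \boxed{5} \end{bmatrix} \\
\textit{trans-obj} = \begin{bmatrix} \textbf{prop} \\ \textit{jap} = \boxed{3} \\ \textit{eng} = \boxed{6} \end{bmatrix}
\end{bmatrix}
\]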
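The intermediate and final values of the transfer TFS are:
\[
\begin{bmatrix}
\textbf{send} \\
\textit{jap} = \begin{bmatrix}
  \textbf{j-okuru} \\
  \textit{pred} = \text{okuru-1} \\
  \textit{agent} = \text{j-hearer} \\
  \textit{recipient} = \text{j-speaker} \\
  \textit{object} = \text{hon-1}
\end{bmatrix} \\
\textit{eng} = \begin{bmatrix}
  \textbf{e-send} \\
  \textit{pred} = \text{send-1} \\
  \textit{agent} = \boxed{4} \\
  \textit{recipient} = \boxed{5} \\
  \textit{object} = \boxed{6}
\end{bmatrix} \\
\textit{trans-ag} = \begin{bmatrix} \textbf{prop} \\ \textit{jap} = \text{j-hearer} \\ \textit{eng} = \boxed{4} \end{bmatrix} \\
\textit{trans-rec} = \begin{bmatrix} \textbf{prop} \\ \textit{jap} = \text{j-speaker} \\ \textit{eng} = \boxed{5} \end{bmatrix} \\
\textit{trans-obj} = \begin{bmatrix} \textbf{prop} \\ \textit{jap} = \text{hon-1} \\ \textit{eng} = \boxed{6} \end{bmatrix}
\end{bmatrix}
\]
\[
\begin{bmatrix}
\textbf{send} \\
\textit{jap} = \begin{bmatrix}
  \textbf{j-okuru} \\
  \textit{pred} = \text{okuru-1} \\
  \textit{agent} = \text{j-hearer} \\
  \textit{recipient} = \text{j-speaker} \\
  \textit{object} = \text{hon-1}
\end{bmatrix} \\
\textit{eng} = \begin{bmatrix}
  \textbf{e-send} \\
  \textit{pred} = \text{send-1} \\
  \textit{agent} = \text{e-hearer} \\
  \textit{recipient} = \text{e-speaker} \\
  \textit{object} = \text{book-1}
\end{bmatrix} \\
\textit{trans-ag} = \begin{bmatrix} \textbf{hearer} \\ \textit{jap} = \text{j-hearer} \\ \textit{eng} = \text{e-hearer} \end{bmatrix} \\
\textit{trans-rec} = \begin{bmatrix} \textbf{speaker} \\ \textit{jap} = \text{j-speaker} \\ \textit{eng} = \text{e-speaker} \end{bmatrix} \\
\textit{trans-obj} = \begin{bmatrix} \textbf{book} \\ \textit{jap} = \text{hon-1} \\ \textit{eng} = \text{book-1} \end{bmatrix}
\end{bmatrix}
\]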
It is worth emphasising that the rewriting system allows the generalization of certain
transfer relations. For example, the following type specifies that there is a transfer relation
between the agents of English and Japanese verbs.
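\[
\textbf{ag-verb} \;=\;
\begin{bmatrix}
\textbf{verb} \\
\textit{jap} = \begin{bmatrix} \textit{agent} = \boxed{1} \end{bmatrix} \\
\textit{eng} = \begin{bmatrix} \textit{agent} = \boxed{2} \end{bmatrix} \\
\textit{trans-ag} = \begin{bmatrix} \textbf{prop} \\ \textit{jap} = \boxed{1} \\ \textit{eng} = \boxed{2} \end{bmatrix}
\end{bmatrix}
\]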
This has to be stated only once in the type hierarchy; via the type inheritance mechanism,
all verbs with ag-verb as one of their supertypes will inherit this property. Thus TRT
allows conciseness and modularity in the description of transfer relations.
Comment
As was the case with ELU, certain issues which are relevant to TRT will be left for the
moment; they will be taken up in Section 2.3. Here I will consider one limitation of the
TRT system just described. The problem is that this implementation of the system does
not support defaults which are necessary to stop the proliferation of distinct types. For
instance, in a lexicon organized as a network the relations between lexical entries should be
as general as possible; that is, they should apply to as many lexical classes as permitted by
the behaviour that needs to be captured. In a system without default inheritance, lexical
relations that differ minimally in some respect have to be encoded as completely different
relations. As an example consider the English nouns which denote animals. Many of them
can mean either the animal or the meat of the animal (e.g. `lamb', `chicken', `turkey',
`pheasant', `salmon') as noted for example by Copestake and Briscoe (forthcoming). The
same holds true, to a large extent, of their Spanish equivalents (i.e. cordero, pollo, pavo,
faisán and salmón). It would therefore be appropriate to establish a transfer relation indicating this similarity between the two languages. In a system without default inheritance,
the noun `pig' would then have to be defined as having a special status in the class of
animal-denoting nouns because the noun describing its meat is actually `pork', whereas in
Spanish the animal-meat relation also holds for cerdo. With a default system, it is only
necessary to override the value indicating the relevant alternation in the lexical entry for
`pig'. Thus most of the generalization would still be captured while accounting for the
particular behaviour of `pig'.
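The contrast can be made concrete with a small Python sketch of default inheritance by overriding. It is not the default unification of any particular system: the hierarchy, the feature names `meat_noun` and `spanish`, the value "same" (meaning that the animal noun itself also denotes the meat) and the function `resolve` are all hypothetical.

```python
# Sketch only: defaults are inherited from the root and overridden lower down.

HIERARCHY = {
    # Default for animal-denoting nouns: the same noun also denotes the meat.
    "animal-noun": {"parent": None, "features": {"meat_noun": "same"}},
    "lamb": {"parent": "animal-noun", "features": {"spanish": "cordero"}},
    "chicken": {"parent": "animal-noun", "features": {"spanish": "pollo"}},
    # `pig' overrides only the value that differs from the default.
    "pig": {"parent": "animal-noun",
            "features": {"spanish": "cerdo", "meat_noun": "pork"}},
}


def resolve(entry, hierarchy):
    """Collect features from the root down, letting specific entries override."""
    chain = []
    current = entry
    while current is not None:
        chain.append(current)
        current = hierarchy[current]["parent"]
    features = {}
    for name in reversed(chain):  # most general first, most specific last
        features.update(hierarchy[name]["features"])
    return features


lamb = resolve("lamb", HIERARCHY)
pig = resolve("pig", HIERARCHY)
```

Without overriding, `pig' would need a class of its own that repeats everything stated for animal-denoting nouns except the meat alternation.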
For the purpose of this thesis, feature and constraint inheritance along with the use of a
type hierarchy to structure the transfer relations that embody cross-linguistic knowledge,
are the main ideas taken from TRT.
1.4.6 The Bilingual Conversation Interpreter
The Bilingual Conversation Interpreter (BCI) is an application of the Core Language
Engine (CLE) described in Alshawi (1992) for use in interactive MT. Its purpose is to
enable communication between two monolingual humans using typed text in two different
languages. One development from the BCI has been the introduction of speech recognition
and synthesis for speech-to-speech translation (Rayner et al. 1993; Rayner et al. 1994).
The CLE is a general purpose system for mapping between natural language sentences
and logical form representations of their meanings. The BCI, as described by Alshawi
et al. (1992), uses one of the CLE's representation levels as the basis for an English-Swedish transfer MT system. In the BCI, transfer is effected at the level of Quasi-Logical
Form (QLF), which is a representation intermediate between the syntactic structure and
the logical form (LF) of a sentence.
Alshawi et al. (1992) argue that there are several advantages for effecting transfer at
this level rather than at more or less abstract levels of representations. At the syntactic
level, the description of a sentence requires complex transformations to map it into the
syntactic representation of another language; in addition, it becomes quite tempting in
syntactic based transfer to adapt monolingual syntactic descriptions to those of the target
language (TL) in order to ease transfer. At the level of LF, so much information which
is not truth conditional has been lost that a TL sentence generated from such a level
will not be judged to be an appropriate translation of the SL expression. For instance,
topicalization information or the form of noun phrases will have been lost in LF.
Another level available in the CLE is that of resolved QLF. However, it seems that
this level is not optimal for transfer either, since it appears to make translations unnatural
because of resolved ellipsis and definite descriptions. In addition, resolved QLF could
result in unreliable transfer structures by not achieving accurate resolutions in a principled
manner; for example, a resolved relation in the SL may not make a fine enough distinction
for constructing a resolved QLF in the TL.
Since QLF is a superset of LF I shall briefly describe LF (and how it differs from first
order logic - FOL) and then show in turn how QLF differs from LF. I will follow this with
a description of the transfer procedure in the BCI.
Logical Form
To describe LF, consider the following sentence and its corresponding LF translation (Alshawi 1992:22).
John designed a college in Cambridge
dcl(quant(λnλm.(m > 0), a, college_place(a),
past(quant(λlλk.(k > 0), b, event(b),
design1(b, john1, a) & in_location(b, cambridge1)))))
The first difference to note between FOL and LF is the use of intensional operators such
as dcl; these operators appear to serve a mainly operational purpose in the CLE in the
sense that applications interpret them according to their requirements. In the BCI they
act as interlingual markers of mood.
Generalized Quantifiers and their scope are represented with the predicate quant; its
first argument indicates the form of the quantifier, which in this case is the equivalent
of the FOL quantifier ∃. In this example the quantifier is encoded as follows: λn binds
n to the cardinality of the set college_place, whilst λm binds m to the cardinality of the
intersection between this set and the set denoted by the formula in the last argument;
this last formula stands for the set of all things `designed by john1 in Cambridge'. If m
is greater than zero, that is, if there exists at least one element in common between the set of
colleges and the set of things designed by John, then the relevant formula is true.
The second argument indicates that the variable a is bound by this quantifier, which
means that value assignments to this variable are used in determining the cardinalities
required by m and n. Predicate college_place is true of entities which are college locations
as opposed to institutions.
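The cardinality-based reading of quant can be illustrated with a few lines of Python. This is only a sketch over finite sets: the function `quant` below takes the restriction and body as sets rather than as formulae with a bound variable, the sets themselves are invented for the example, and the quantifier `every` is added purely for contrast and is not taken from the example above.

```python
# Sketch only: a quantifier is a function from n to a function from m to a truth value.

def quant(quantifier, restriction, body):
    """Evaluate a generalized quantifier over finite sets.

    n is the cardinality of the restriction set and m the cardinality of
    its intersection with the body set, as in the description above.
    """
    n = len(restriction)
    m = len(restriction & body)
    return quantifier(n)(m)


def exists(n):
    """Counterpart of the quantifier form in which m must exceed zero."""
    return lambda m: m > 0


def every(n):
    """Hypothetical universal quantifier, included only for contrast."""
    return lambda m: m == n


# Invented toy model: two colleges, one of which John designed.
colleges = {"college_a", "college_b"}
designed_by_john = {"college_a", "bridge_c"}

some_college_designed = quant(exists, colleges, designed_by_john)
every_college_designed = quant(every, colleges, designed_by_john)
```

Encoding the quantifier as a condition on n and m is what allows determiners other than the existential one to be represented with the same predicate.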
Past is an operator implicitly representing first order quantification over events and
time intervals. In the final version of the CLE the LF of a sentence such as `John slept' is
quant(exists, e, event(e) & before(e, t1), sleep(e, john))
where t1 is a contextually determined point in time (Alshawi 1992:215). In this latter
view the QLF represents tense as anaphoric relations to be resolved during QLF to LF
translation. In this example the tense anaphoric relation has been resolved to λe.before(e,
t1).
The event predicate restricts b to events. The arguments to design1 are an event, the
agent `John' and the direct object of the sentence, `a college'. The last predicate in the
formula is the relation in_location indicating the locative sense of the preposition `in'. Its
arguments are a designing event b and the constant cambridge1. The reading here is one
where the PP attaches to the VP rather than to the noun.
Apart from Generalized Quantifiers and intensional operators such as dcl, there are two
other important ways in which LF differs from FOL. First, lambda abstracted terms (e.g.
properties) are allowed as arguments to predicates. This is useful in the representation of
sentences such as `it is nice to swim'. Simplifying somewhat, the translation of `nice' in this
context is the predicate nice1_property, which takes as argument the function associated
with the verb `swim' but without a bound subject:
nice1_property(λa.∃e (event(e) & swim1(e,a)))
The second extension is that, within quantifiers, variables can range over sets of individuals as well as individuals. Suffice it to say that the motivation for this has to do with
the interpretation of distributive/collective distinctions in certain expressions.
Quasi-Logical Form
One way of thinking about QLF is as a semantic representation derived purely from the
syntactic structure of a sentence. QLF differs from LF in that quantified expressions
are not assigned scope, anaphoric terms are not resolved, and implicit relations are not
disambiguated. This results in a representation which is closer than LF to the surface
form of a sentence. One novel aspect of this representation is that it contains grammatical
information such as number and syntactic category. This information is used by other
rules in the system in the process of converting QLF into LF.
While the QLF formalism has undergone a number of changes recently (Agnäs et al. 1994),
for the purpose of this thesis the somewhat simpler version described in Alshawi (1992)
will be sufficient to explain the main features of QLF. The three most important additions to the QLF syntax are the constructs headed by qterm (quantifier term), a_term
(anaphoric term) and a_form (anaphoric formula). The remaining additions to the language are a_index (anaphoric index), term_coord (term coordination) and island (which
prevents quantifiers from having scope outside the formulae they mark).
To explain qterms consider the (tense- and mood-less) QLF of `some bishops gathered':
gather(qterm(<t=quant, n=sing, l=ex>, e, event(e)),
qterm(<t=quant, p=det, n=plur, l=some>, a, bishop1(a)))
Here gather is a two place predicate having as its first argument an event variable, and
as its second argument the representation of its agent. The first qterm corresponds to the
gathering event, and its first argument is the category of the determiner applied to this
event. In this case the pseudo-determiner ex(istential) has been inserted as the value of the
feature l(exical) and the qterm has been assigned singular number. More information is
present in the category feature bundle for `some'. This information is used by the scoping
and reference resolution rules to derive the scope of determiners, the quantifier it translates
to given the context, and whether a collective interpretation of the expression is possible.
The second argument of a qterm is the variable bound by the determiner, whilst the third
argument expresses restrictions on this variable.
To understand a_terms consider the representation for `her' in `Mary thinks that John
likes her':
a_term(<t=ref, p=pro, l=her, n=sing, a=[john1, mary1]>, x,
female(x) & personal(x))
The first argument indicates the category of the pronoun; its value for the feature a is a
list of possible antecedents within the same sentence. The second argument is a variable
standing for the object referred to by the pronoun, while the last argument contains the
restrictions placed on this object. All this information is available from the syntactic
structure of the sentence.
Unresolved relations or a_forms occur in several contexts: in phrases with genitives,
relational `of', in various prepositional phrases, in compound nominals and in expressions
with unresolved ellipticals. For instance, the following is the QLF for `a computer message':
qterm(<t=quant, p=det, n=sing, l=a>, x,
a_form(<t=pred, p=nn>, R,
message(x) & R(kind(y, computer_thing(y)), x)))
The category in the a_form includes in its value the type of syntactic construction from
which the unresolved relation is derived; here it is a compound nominal or nn. Its second
argument is a variable, R, ranging over relations (i.e. a second order variable). Its third
argument is the formula in which R is an unresolved relation. Predicate kind is used to
restrict y to a typical computer_thing. During QLF-to-LF translation all of this information
can be used to direct the translation procedure. It is worth noting that, as implemented
in the CLE, higher order variables are instantiated using first order unification in Prolog.
This is done by treating predicates as lists, in which the first element is the predicate name
and the other elements are its arguments. For instance, the a_form in the example above
has a variable R which will unify with an atom standing for a relation name. Thus, in the
QLF notation, R(kind(y,computer_thing(y))) is represented as the following Prolog list:
[R,kind(Y,[computer_thing,Y])]
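The point of the list encoding can be illustrated outside Prolog with a small first order unifier in Python. This is a sketch under stated assumptions, not the CLE implementation: terms are Python lists and strings, variables are instances of a hypothetical class `Var`, there is no occurs check, and the relation name `for_1` to which R is resolved is invented for the example.

```python
# Sketch only: first order unification over predicates encoded as lists.

class Var:
    """A logic variable; two variables are the same only if they are the same object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(x, y, subst):
    """Return a substitution extending subst that unifies x and y, or None."""
    x, y = walk(x, subst), walk(y, subst)
    if x is y or (not isinstance(x, (Var, list)) and x == y):
        return subst
    if isinstance(x, Var):
        return {**subst, x: y}
    if isinstance(y, Var):
        return {**subst, y: x}
    if isinstance(x, list) and isinstance(y, list) and len(x) == len(y):
        for xi, yi in zip(x, y):
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None


R = Var("R")
Y = Var("Y")

# [R, kind(Y, [computer_thing, Y])] with kind(...) itself encoded as a list.
unresolved = [R, ["kind", Y, ["computer_thing", Y]]]
# A candidate resolution in which the relation name is an ordinary atom.
candidate = ["for_1", ["kind", Y, ["computer_thing", Y]]]

bindings = unify(unresolved, candidate, {})
```

Because the predicate position is just the first element of a list, the variable R is bound like any other first order variable.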
Transfer
Transfer in the BCI has as input the QLF of the SL sentence and as output a TL QLF;
generation from this QLF results in the TL sentence. The relation between the two source
and target QLFs is established via a number of transfer rules of the form:
trans(<QLF pattern 1>
<Operator>
<QLF pattern 2>)
A transfer rule relates portions of the source and target QLF; the patterns identifying these
portions may be atoms, predicate structures or other more complex formulae. Although
most transfer rules are bidirectional, it is possible to restrict their application to one
direction only by using one of the operators => or <=; bidirectionality is represented by
the operator <=>. During transfer, the SL QLF is traversed recursively from the top
downwards, following the recursive structure of the QLF. At each stage the SL side of
every transfer rule is compared against the current portion of the SL QLF; if there is a
match, the TL side of the rule is used to build the target QLF. Recursion is introduced in
a rule by using transfer variables; for example, in the transfer rule below, the two transfer
variables tr(cat) and tr(rest) recursively invoke the transfer algorithm with whatever
appears in their place in the SL QLF:
trans(qterm(tr(cat),X,tr(rest))
<=>
qterm(tr(cat),X,tr(rest))).
This rule states that qterms translate as qterms and that their first and third arguments
should be translated by invoking the transfer rules on their values.
The following is an example of transfer at QLF for the translation of Spanish el gatito
duerme to `the little cat sleeps'. The Spanish QLF is:
[dcl,
[duerme1,
qterm(<t=quant,n=sing,l=ex>,E,
[and,[event,E],
a_form(<tense=pres,aspect=imperf>,P,[P,E])]),
qterm(<t=ref,p=def,l=el,n=sing>,N,[gatito1,N])]]
Note here that tense and aspect are represented as unresolved predicates within an a_form.
This overcomes certain difficulties arising from the mismatch of tense and aspectual scope
in different languages; the predicate P would be resolved to a temporal relation in
QLF-to-LF translation.
The following transfer rules would be necessary for translating the above QLF:
trans([dcl, tr(for)]                  <=>  [dcl, tr(for)])                   (1)
trans([duerme1, tr(arg1), tr(arg2)]   <=>  [sleep1, tr(arg1), tr(arg2)])     (2)
trans(qterm(tr(cat),X,tr(rest))       <=>  qterm(tr(cat),X,tr(rest)))        (3)
trans(<t=quant,n=sing,l=ex>           <=>  <t=quant,n=sing,l=ex>)            (4)
trans([and, tr(first), tr(rest)]      <=>  [and, tr(first), tr(rest)])       (5)
trans([event,X]                       <=>  [event,X])                        (6)
trans(a_form(tr(cat),X,tr(rest))      <=>  a_form(tr(cat),X,tr(rest)))       (7)
trans(<tense=pres,aspect=imperf>      <=>  <tense=pres,aspect=imperf>)       (8)
trans([P,X]                           <=>  [P,X])                            (9)
trans(<t=ref,p=def,l=el,n=sing>       <=>  <t=ref,p=def,l=the,n=sing>)       (10)
trans([gatito1, X]                    <=>  [and,[little1,X],[cat1,X]])       (11)
The first transfer rule that matches is (1), which recursively invokes the transfer procedure
on the formula covered by tr(for); rule (2) then matches, again invoking all the
transfer rules on the arguments of duerme1, and so on. The resulting English QLF is:
[dcl,
[sleep1,
qterm(<t=quant,n=sing,l=ex>,E,
[and,[event,E],
a_form(<tense=pres,aspect=imperf>,P,[P,E])]),
qterm(<t=ref,p=def,l=the,n=sing>,N,[and,[little1,N],[cat1,N]])]]
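The recursive traversal described above can be sketched in Python. This is an illustrative simplification and not the BCI implementation: QLFs are encoded as nested lists (qterm and a_form included), categories as opaque strings, rules are tried in order with the first match winning, and material matched by no rule is copied unchanged, so the identity rules (4) and (8) are left implicit. The names Tr, Var, match, build and transfer are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tr:
    """Transfer variable tr(name): its binding is transferred recursively."""
    name: str

@dataclass(frozen=True)
class Var:
    """Ordinary rule variable: its binding is copied unchanged."""
    name: str

def match(pattern, qlf, env):
    """Match one side of a rule against a QLF fragment, recording bindings in env."""
    if isinstance(pattern, (Tr, Var)):
        env[pattern] = qlf
        return True
    if isinstance(pattern, list):
        return (isinstance(qlf, list) and len(pattern) == len(qlf)
                and all(match(p, q, env) for p, q in zip(pattern, qlf)))
    return pattern == qlf

def build(pattern, env, rules):
    """Instantiate the target side; bindings of Tr variables are transferred."""
    if isinstance(pattern, Tr):
        return transfer(env[pattern], rules)
    if isinstance(pattern, Var):
        return env[pattern]
    if isinstance(pattern, list):
        return [build(p, env, rules) for p in pattern]
    return pattern

def transfer(qlf, rules):
    """Apply the first rule whose source side matches; otherwise copy the input."""
    for source, target in rules:
        env = {}
        if match(source, qlf, env):
            return build(target, env, rules)
    return qlf

QUANT = "<t=quant,n=sing,l=ex>"
TENSE = "<tense=pres,aspect=imperf>"
X, P = Var("X"), Var("P")
ident = lambda pat: (pat, pat)   # a rule whose two sides are identical
RULES = [
    ident(["dcl", Tr("for")]),                                     # (1)
    (["duerme1", Tr("arg1"), Tr("arg2")],
     ["sleep1", Tr("arg1"), Tr("arg2")]),                          # (2)
    ident(["qterm", Tr("cat"), X, Tr("rest")]),                    # (3)
    ident(["and", Tr("first"), Tr("rest")]),                       # (5)
    ident(["event", X]),                                           # (6)
    ident(["a_form", Tr("cat"), X, Tr("rest")]),                   # (7)
    ("<t=ref,p=def,l=el,n=sing>", "<t=ref,p=def,l=the,n=sing>"),   # (10)
    (["gatito1", X], ["and", ["little1", X], ["cat1", X]]),        # (11)
    ident([P, X]),                                                 # (9), most general, so last
]
SPANISH = ["dcl",
           ["duerme1",
            ["qterm", QUANT, "E",
             ["and", ["event", "E"], ["a_form", TENSE, "P", ["P", "E"]]]],
            ["qterm", "<t=ref,p=def,l=el,n=sing>", "N", ["gatito1", "N"]]]]
ENGLISH = transfer(SPANISH, RULES)
```

Rule (9) is placed last only because this sketch stops at the first match and [P,X] would otherwise pre-empt the more specific two-element patterns.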
Generation of the TL sentence from this QLF is done by using an adaptation of the
algorithm of Shieber et al. (1990). Note that there is no guarantee that the TL QLF
corresponds to a QLF derivable from the TL grammar. In some instances this is done
deliberately in order to simplify the transfer component; what happens in such cases is that
a number of TL QLFs are produced during transfer and are filtered by the TL grammar.
In doing this, modularity is enhanced since the translation of a fragment of QLF might
depend on information solely present in the TL grammar. For example, in English-Swedish
translation the article `the' can translate as nothing or as den/det, unless it comes before
an adjective in which case it must translate as an article. To incorporate such information
into the transfer rule for `the' would be difficult and it would diminish the modularity of
the system; therefore, transfer produces QLFs with and without articles and it is left to
the generation step to select between them. This process is called TL filtering and will
play an important role in the system to be developed in this thesis.
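The division of labour in TL filtering can be sketched as generate-and-filter. The code below is a toy under stated assumptions: the TRANSFER table, the category labels ART, ADJ and N, and the single constraint in tl_grammar_accepts are invented stand-ins for the transfer rules and the TL grammar, and token sequences replace QLFs.

```python
from itertools import product

# Hypothetical transfer table: each SL item maps to its TL alternatives.
# An empty tuple means "translates as nothing".
TRANSFER = {
    "the":   [(), ("ART",)],
    "red":   [("ADJ",)],
    "house": [("N",)],
}

def transfer(sl_items):
    """Yield every TL candidate obtained by choosing one alternative per item."""
    for choice in product(*(TRANSFER[item] for item in sl_items)):
        yield tuple(tok for alt in choice for tok in alt)

def tl_grammar_accepts(candidate):
    """Toy TL grammar: an adjective must be immediately preceded by an article."""
    for i, tok in enumerate(candidate):
        if tok == "ADJ" and (i == 0 or candidate[i - 1] != "ART"):
            return False
    return True

def translate(sl_items):
    """Transfer overgenerates; the TL grammar filters the candidates."""
    return [c for c in transfer(sl_items) if tl_grammar_accepts(c)]
```

With these assumptions, translate(["the", "red", "house"]) keeps only the candidate containing the article, while translate(["the", "house"]) keeps both candidates; the transfer table itself never mentions adjectives.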
Ambiguity is a serious problem for translation; it is present during analysis, transfer
and generation.
Disambiguation during analysis is done by interacting with the SL user. Two types of
ambiguity are handled in this way: for structural ambiguities the user is shown bracketed
expressions from which a selection must be made; for word sense ambiguity, paraphrase
options of each word are offered and from these a single TL expression ensues. Later
versions of the BCI have used statistical techniques in the selection of appropriate readings
and translations (Alshawi et al. 1992).
During transfer, an important source of ambiguity is the mismatch between word
meanings in different languages. One type of mismatch arises when a word has more than
one translation but one of these translations can be used in all contexts. The
clearest example here might be that of synonyms:
trans(alrededor_de <=> around)
trans(alrededor_de <= round)
In this case ambiguity is eliminated by disallowing alrededor de from translating as `round'.
The other case of ambiguity during transfer arises when a distinction is made in one
language and not in the other. For instance, the Spanish word reloj can translate as
either `clock' or `watch'. In such cases the BCI would generate one sentence with both
possibilities in it by using the string clock/watch.
Finally, much of the ambiguity encountered during generation in the BCI consists of
stylistic variants with the same or almost the same meaning; this is because generally,
sentences deriving from the same QLF will have equivalent meanings. In the BCI this
type of ambiguity is resolved by presenting the first sentence produced by the generator.
Comment
Two important characteristics of the BCI are: a) transfer is deliberately effected at a level
which is closer to the surface form of the sentence; b) TL filtering is used to maintain
modularity between the monolingual and transfer components. Point a) indicates that, at least
from the point of view of this system, a transfer representation should not be maximally
removed from the properties of the surface form of a sentence; instead morphological,
syntactic and lexical-semantic information are all useful during transfer.
As I will be considering the consequences of effecting transfer using predicate-argument
structures in Section 2.3, no discussion in this respect will take place in this section. Instead
I will only point out that as described above the BCI has no way of encoding relationships
in the bilexicon of the type `apple - manzana' and `apple tree - manzano'.
1.4.7 Rosetta
Although Rosetta is classified as an interlingua MT system, it is worth considering some
of its main features, because it is an example of a system in which SL and TL grammars
are deliberately constructed such that they match each other. The following description is
based on those of Appelo et al. (1987), Odijk (1989) and Hutchins and Somers (1992:279-96).
As an interlingua system the distinguishing feature of Rosetta is its use of isomorphic
derivations, in addition to interlingua representations of predicates, as language independent representations of a sentence. That is, in Rosetta the analysis stage of translation
not only constructs an interlingua structure but also builds a tree of the transformations
used to construct this structure; this tree of applied transformations is called the semantic
derivation tree. Since each transformation in the SL has an equivalent transformation in
the TL, generation proceeds from the interlingua structure by applying in reverse order
the TL transformations corresponding to those in the SL semantic derivation tree. This
is the basic intuition in the Rosetta approach and it is depicted in Figure 1.6, where Ps
and Pt indicate SL and TL parse trees respectively and Ts and Tt are the corresponding
semantic derivation trees (translation in this diagram is shown clockwise).
[SL: s-sentence -> (analysis) -> Ps -> Ts -> interlingua -> Tt -> Pt -> (generation) -> TL: t-sentence]
Figure 1.6: Translation path in Rosetta.
During analysis and generation, a number of different representations are constructed,
but to simplify the exposition I will consider the two most important ones: syntactic
structure in the form of a parse tree and semantic structure based on Montague semantic
formulae without a model-theoretic interpretation. Translation starts with morphological
analysis followed by syntactic parsing; then transformations apply to the parse tree in
order to construct the interlingua structure; generation then applies these stages in reverse
to derive a TL expression. To effect these steps, the grammar of each language consists of
three components: morphological, syntactic and semantic; it is the semantic component
that contains the transformations that build the interlingua representation. I will
concentrate on the application of transformations, since they constitute the most distinctive
feature of the approach; furthermore, since the transformations are reversible, I will talk
of generation rather than analysis. Another important feature of Rosetta which I will be
ignoring for the most part is that it divides the semantic component of a grammar into a
number of projection subgrammars, each corresponding to a major syntactic category in
the language.
[Three derivation trees: the English tree, with R- and T- nodes over he, like and swim; the
interlingua tree, with I- nodes over B1, B3 and B2; the Dutch tree, with R- and T- nodes
over hij, graag and zwem.]
Figure 1.7: English, Interlingua and Dutch isomorphic structures.
Transformations in Rosetta can delete and add constituents, reorder them and change
the values in their nodes; their application is not free but is determined by an allowed
sequence of rule applications stated as a control expression (Appelo et al. 1987). There are
two types of transformation: meaningful and meaningless. The former can form part of
the semantic derivation tree of a sentence and therefore must have a counterpart in the TL.
The latter are not included in semantic derivation trees and are intended to effect
transformations which are highly language specific; therefore they do not have a counterpart
in the TL. To avoid terminological confusion I should point out that ``transformations'' in
Rosetta correspond to what I call meaningless transformations, while ``rules'' are what I
call meaningful transformations; unfortunately the term ``rule'' is already overloaded, and
therefore I have opted for distinguishing transformations using an adjective.
There follows an example to clarify all of the above notions; the example is taken from
Odijk (1989) and it involves the well known case of `like - graag' head switching where a
verb translates as an adverb. The two sentences to be constructed are:
English: He likes to swim.
Dutch: Hij zwemt graag (He swims likingly).
It is easier to understand translation in Rosetta by starting with an interlingua expression
and deriving its syntactic trees in each language in parallel; however it should not be
forgotten that translation would first construct an interlingua structure and associated
semantic derivation tree and then generate a TL expression, as shown in Figure 1.6 above.
The interlingua and language specific semantic derivation trees for the two sentences
are shown in Figure 1.7; the semantic derivation trees would have been obtained by a
top-down procedure which operated on the SL parse tree. Meaningful transformations are
prefixed by R- whereas meaningless ones are prefixed by T-. The interlingua derivation
tree shown in the middle is obtained by ignoring meaningless transformations; its nodes,
prefixed by I-, serve as links between the transformations in the two languages and
therefore do not correspond to any operations. Note that the monolingual predicates are also
substituted by language independent predicates, B1, B2 and B3. The insertion of
meaningless transformations in the TL semantic derivation tree is determined by the control
expressions of the TL.

Transformation        English                             Dutch
substitute B2         swim                                zwem
R-arg1                VP(x1 swim)                         VP(x1 zwem)
R-active              VP(x1 swim)                         VP(x1 zwem)
R-infinitival         VP(x1 to swim)                      VP(x1 zwemmen)
R-subordinate         SENT(x1 to swim)                    SENT(x1 zwemmen)
substitute B3         like                                graag
R-arg1-arg2           VP(x1 like x2)                      AdvP(x1 graag x2)
R-active              VP(x1 like x2)                      AdvP(x1 graag x2)
R-sent&special-comp   VP(x1 like SENT(x1 to swim))        VP(x1 AdvP(graag) zwem)
T-control             VP(x1 like SENT(to swim))           -
R-tense&aspect        VP(x1 like SENT(to swim))           VP(x1 AdvP(graag) zwem)
substitute B1         he                                  hij
R-pron                NP(he)                              NP(hij)
R-subj-inc            SENT(NP(he) likes SENT(to swim))    SENT(NP(hij) AdvP(graag) zwemt)
R-decl-sent           SENT(NP(he) likes SENT(to swim))    SENT(NP(hij) AdvP(graag) zwemt)
T-verb-second                                             SENT(NP(hij) zwemt AdvP(graag))
T-topicalization                                          SENT(NP(hij) zwemt AdvP(graag))
Output                SENT(NP(he) likes SENT(to swim))    SENT(NP(hij) zwemt AdvP(graag))
Table 1.1: Parallel derivations from Interlingua structure.
While construction of the semantic derivation tree and therefore of the interlingua
representation proceeds top-down from the parse tree, generation operates bottom-up. In
this example of parallel generation, rules will apply in the order shown in the English and
Dutch trees. The step-by-step effect of these operations is shown, in a radically simplified
form, in Table 1.1. Each meaningful transformation achieves translationally equivalent
changes in each language. For example, R-infinitival constructs infinitives in different
ways depending on the language but its effect is the same, namely constructing a tenseless
expression. Perhaps the least self-explanatory of these transformations is the Dutch
R-special-comp which corresponds to the English R-sent-comp. This Dutch transformation
is special because it achieves the head switch required by the translation through complex
operations. Head switching takes place as follows: first the rule deletes the AdvP node
and replaces it by a VP node; then it changes the function of the adverb graag from head
to modifier; next, it makes the phrase VP(x1 zwemmen) the head of this VP node and
then it deletes its x1 variable; finally it deletes subordination and infinitival morphological
features to yield the structure VP(x1 AdvP(graag) zwem). It is worth noting that although
graag is an adverb, it has two arguments, one corresponding to the verb phrase it modifies
and the other to the subject of this verb phrase; this representation is equivalent to that
for `likes', thus making the isomorphism between the semantic derivation trees easier to
construct. Furthermore, it is claimed that motivation for this analysis of graag is based
not only on its English counterpart but also on monolingual criteria including the fact that
graag places syntactic and semantic restrictions on the subject of its verb: ? het regent
graag (? it likes to rain).
Comment
Using (meaningful) transformations as part of interlingua representations is a flexible way
of defining language independent structures; the main advantage is that a transformation
can be language specific while at the same time having a language independent function.
However the single most significant drawback of this approach is that it requires tuning of the
source and target grammars to produce isomorphic semantic derivation trees. This means
that the construction of the grammar of one language is not only motivated by monolingual
criteria but also by the semantic derivation trees constructed by other grammars in the
system. This leads to serious losses in the modularity of the monolingual components and
magnifies the effect that changes to one grammar have on the rest of the system.
In addition, the power of transformations makes it necessary to introduce strategies for
the control of their application. These strategies include dividing grammars into different
projection subgrammars and specifying valid application sequences. In such regimes the
addition of new rules involves careful consideration of their interaction with the rest of the
rule system, thus diminishing the locality of rule descriptions.
Finally, the comments made with regard to interlingua systems, especially those which
apply to interlingua predicates, are also applicable to the Rosetta approach.
1.4.8 Shake-and-Bake
Isomorphic UCGs
Beaven and Whitelock (1988) apply the notion of isomorphism to Unification Categorial
Grammars (UCGs) in the construction of a bilingual MT system for English-Spanish
translation. Although subscribing to the idea of equivalent derivation trees, their approach
differs from that of Rosetta in that Beaven and Whitelock do not include a separate stage
in which transformations are applied to a parse tree in order to construct a semantic
representation. Instead, they exploit the information contained in UCG lexical entries to
effect translation while parsing. I will briefly explain how their mechanism works before
concentrating on the Shake-and-Bake technique as it developed from this work.
For this brief description the relevant ideas in UCG are: a category is either a basic
category or of the form A/B where A and B are categories, and where B is called the active
category; a lexical category is just the category assigned to a lexical item; the combination
of two categories, a functor A/B and an argument B', takes place if B unifies with B', and
results in an expression of category A'; during parsing an additional restriction applies
which is that A/B and B' must be adjacent: B' must be to the left of A/B if B is marked
by the feature-value pair ord=post or to the right if marked by ord=pre. Now, in
isomorphic UCG translation each lexical item is paired with its counterpart in the TL.
During analysis, each time a SL functor combines with an argument, their corresponding
TL functor and argument are also combined but this time their order is not determined by
their relative positions but by the value of ord. For example, to translate `red book' into
the Spanish libro rojo the bilingual lexicon would have the following associations (square
brackets indicate the AVM of a category):
red              rojo
n/n[ord=pre]     n/n[ord=post]

book             libro
n[ord=X]         n[ord=X]
During parsing, a combination of `red' and `book' would trigger a combination of rojo and
libro which would be attempted both as rojo libro and as libro rojo. Of these only the
latter combination would succeed thanks to the value of ord in rojo. This would result
in the correct translation libro rojo.
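The role of ord in this example can be made concrete with a small Python sketch. It is a drastic simplification and not Beaven and Whitelock's system: categories are restricted to n and n/n, unification is reduced to string comparison, the variable ord value of the nouns is modelled as None, and the names EN, ES, PAIRS, combines and translate_pair are hypothetical.

```python
# Monolingual lexical entries: orthography, category and the ord value
# carried by the active category of a functor (None for non-functors).
EN = {
    "red":  {"orth": "red",  "cat": "n/n", "ord": "pre"},
    "book": {"orth": "book", "cat": "n",   "ord": None},
}
ES = {
    "rojo":  {"orth": "rojo",  "cat": "n/n", "ord": "post"},
    "libro": {"orth": "libro", "cat": "n",   "ord": None},
}
# Bilingual pairing of lexical items.
PAIRS = {"red": "rojo", "book": "libro"}

def combines(left, right):
    """True if the adjacent signs left and right combine as functor and argument."""
    if left["cat"] == "n/n" and right["cat"] == "n":
        return left["ord"] == "pre"     # ord=pre: argument to the right of the functor
    if left["cat"] == "n" and right["cat"] == "n/n":
        return right["ord"] == "post"   # ord=post: argument to the left of the functor
    return False

def translate_pair(w1, w2):
    """Mirror an SL combination in the TL, trying both TL orders."""
    if not combines(EN[w1], EN[w2]):
        return None
    t1, t2 = ES[PAIRS[w1]], ES[PAIRS[w2]]
    for left, right in ((t1, t2), (t2, t1)):
        if combines(left, right):
            return f'{left["orth"]} {right["orth"]}'
    return None
```

Under these assumptions translate_pair("red", "book") first rejects rojo libro, because rojo carries ord=post, and then accepts libro rojo.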
An important problem with the isomorphic UCG approach is that constituents in the
SL and TL have to combine in the same inside-outside order (cf. the so-called syntax
mobile). This is a problem in cases where a particular ordering is not available in the TL.
For example, the sentence `John gave Mary the cat' can only translate as John le dio el
gato a Mary where a Mary is the last constituent to combine into the VP. However, during
parsing the combination `gave' + `Mary' is not matched by a Spanish combination dio + a
Mary because el gato needs to be combined rst with dio in the TL. Accommodating such
discrepancies in this approach would require tailoring the Spanish categorial signs to the
English ones in order to overcome differences in constituent combination; one consequence
of this would be a reduction in the modularity of the system. Note that this problem
could not be resolved by lexical transformations on the categorial sign for the verb dar
because of the adjacency constraint on TL derivations, which would state that combined
constituents result in adjacent TL strings.
Analysis, Lexical Transfer and Generation
The problem with the isomorphic UCGs approach was overcome by separating the
construction of a TL expression from the analysis of the SL (Whitelock 1992; Beaven 1992a;
Beaven 1992b); this led to the development of what is known as Shake-and-Bake (SB)
MT. In this approach, the analysis phase of translation constructs a logical formula from
the SL. After construction, uninstantiated variables corresponding to arguments in
predicates are assigned unique values; these instantiations permeate to the leaves of the analysis
tree which in turn are used for transfer via the bilingual lexicon. Since only the bilingual
lexicon is used, the result of transfer is a bag of TL lexical signs where a bag (or multiset),
as defined by Knuth (1981:454, 636), is a set in which repeated elements are significant.
The task of the generator is then to reorder this bag into a valid TL string.
Generation is by a process similar to parsing but in which the input is treated as a bag
rather than as a list, such that the order of the input lexical signs is insignificant. In
this way, the difficulties with isomorphic UCGs are overcome since generation does not
proceed in step with analysis but instead constitutes a different process. It also ensures
that exactly the same grammar is used for analysis and generation.
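The distinction between a bag and a set can be illustrated with Python's collections.Counter. This is only a sketch: lexical signs are reduced to plain strings, which is an assumption made for brevity, and same_bag is a hypothetical helper.

```python
from collections import Counter

# A bag of TL lexical signs, reduced here to plain strings.
bag = Counter(["the", "dog", "chased", "the", "cat"])

def same_bag(words_a, words_b):
    """True if both sequences contain the same items with the same counts."""
    return Counter(words_a) == Counter(words_b)

# Order is irrelevant to a bag, but repetition is not.
reordered = same_bag("the dog chased the cat".split(),
                     "the cat chased the dog".split())
dropped = same_bag(["the", "dog", "the"], ["the", "dog"])
```

Here reordered is True and dropped is False, whereas a set comparison would not distinguish the two lists in the second call.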
The following example shows the steps needed for translating un hombre duerme into
English. I will use an informal AVM notation which allows λ-functions (Dowty et al. 1981)
to be encoded as the value of an attribute and with shared values between SL and TL
signs represented by boxed integers; feature structures in general will be described in more
detail in Section 1.4.10. Assume the following bilexicon:

[ orth = un                           ]       [ orth = a                            ]
[ syn  = Det                          ]  ⇔   [ syn  = Det                          ]
[ sem  = λSλP(∃[0]. S([0]) & P([0]))  ]       [ sem  = λSλP(∃[0]. N([0]) & V([0]))  ]

[ orth = hombre       ]       [ orth = man       ]
[ syn  = N            ]  ⇔   [ syn  = N         ]
[ sem  = hombre([1])  ]       [ sem  = man([1])  ]

[ orth = duerme       ]       [ orth = sleeps      ]
[ syn  = Vtr          ]  ⇔   [ syn  = Vtr         ]
[ sem  = dormir([2])  ]       [ sem  = sleep([2])  ]
Analysis of the SL sentence using lexemes equivalent to the SL side of the bilexicon results
in the structure:

[ orth = un hombre duerme                 ]
[ syn  = S                                ]
[ sem  = ∃[3](hombre([3]) & duerme([3]))  ]
The next step in SB is to assign a unique constant to the shared variable [3]; this step may
be thought of as assigning a unique range or type to the variable which would disallow
variable binding; for simple cases, this process may also be thought of as skolemisation. It
is essential that this assigned value permeates through the analysis tree down to the lexical
signs at the leaves. Then, given the value sharings between SL and TL in the bilexical
entries, unification of each SL leaf with the relevant SL side in the bilexicon will result in
a unique constant also being assigned to the TL side of the bilexicon. The result of these
two operations is the set of bilingual entries shown below, where the unique constant is
indicated by 0.

[ orth = un                             ]       [ orth = a                            ]
[ syn  = Det                            ]  ⇔   [ syn  = Det                          ]
[ sem  = λSλP(∃[0]. S([0]:0) & P([0]))  ]       [ sem  = λSλP(∃[0]. N([0]) & V([0]))  ]

[ orth = hombre         ]       [ orth = man       ]
[ syn  = N              ]  ⇔   [ syn  = N         ]
[ sem  = hombre([1]:0)  ]       [ sem  = man([1])  ]

[ orth = duerme         ]       [ orth = sleeps      ]
[ syn  = Vtr            ]  ⇔   [ syn  = Vtr         ]
[ sem  = dormir([2]:0)  ]       [ sem  = sleep([2])  ]
Generation using the TL side of these bilexical entries would take place by effectively
attempting to parse all possible permutations of the TL lexemes. The result would be the
structure:

[ orth = a man sleeps                     ]
[ syn  = S                                ]
[ sem  = ∃[4]:0. man([4]) & sleeps([4])   ]
The transfer and generation algorithms in SB will be considered in more detail in
Chapter 4. Although SB generation is NP-complete, and a polynomial-time algorithm is
therefore unlikely to exist, some techniques will also be described for alleviating this problem.
Since generation in SB effectively attempts every possible linear ordering of TL lexical
entries, the assignment of constants is crucial for preserving argument ordering in a
sentence. For instance, in a simplified Prolog-like notation one could express the result of
parsing `Mary hits John' as the bag of lexemes (upper case indicates variables):
{ mary(X), hits(X,Y), john(Y) }
Without constant assignment the corresponding Spanish bag would be:
{ mary(X), golpea_a(X,Y), john(Y) }
Now, note that variables X and Y can unify. Since SB attempts every possible ordering
of the TL bag, it will construct the two successful analyses
mary(X) golpea_a(X,Y) john(Y)
john(X) golpea_a(X,X) mary(X)
the second of which is incorrect. Instantiation to a constant ensures that value sharings
not explicitly defined during analysis are not permitted during generation:
{ mary(c1), golpea_a(c1,c2), john(c2) } ⇒ mary(c1) golpea_a(c1,c2) john(c2)
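The effect of constant assignment can be reproduced with a toy generator in Python. This is a sketch under strong assumptions and not the SB algorithm considered in Chapter 4: the only TL "grammar rule" is the pattern NP V NP, signs are tuples of a predicate name and its indices, strings beginning with an upper-case letter are variables, and the names is_var, walk, unify and generate are hypothetical.

```python
from itertools import permutations

def is_var(term):
    # Prolog convention: variables start with an upper-case letter.
    return term[0].isupper()

def walk(term, subst):
    # Follow bindings until reaching an unbound variable or a constant.
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Unify two indices; return the extended substitution or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    return None

def generate(bag):
    """Try every ordering of a three-sign bag against the pattern NP V NP."""
    results = []
    for subj, verb, obj in permutations(bag):
        if len(subj) != 2 or len(verb) != 3 or len(obj) != 2:
            continue  # the middle sign must be the two-place verb
        subst = unify(subj[1], verb[1], {})      # subject index = first argument
        if subst is not None:
            subst = unify(obj[1], verb[2], subst)  # object index = second argument
        if subst is not None:
            results.append(" ".join(sign[0] for sign in (subj, verb, obj)))
    return results

with_variables = generate([("mary", "X"), ("golpea_a", "X", "Y"), ("john", "Y")])
with_constants = generate([("mary", "c1"), ("golpea_a", "c1", "c2"), ("john", "c2")])
```

With variables, both orderings survive because X and Y unify; with the constants c1 and c2, the ordering that puts john in subject position fails to unify and only the intended string remains.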
Comment
SB has an important advantage over all the translation strategies considered so far, which
is that it relates source and target structures via the bilexicon only, eliminating many
problems arising from structural differences in the source and target language. For
example, the `likes - graag' case, as in `John likes swimming', can be tackled using the following
(simplified) bilingual entries (Whitelock 1992) and the SB generation algorithm:
john(X) ⇔ jan(X)
enjoy(X,Y) ⇔ graag(X,Y)
swim(X,Y) ⇔ zwemt(X,Y)
However note that, as in the Rosetta system, the adverb graag has to be made a two-place
predicate. Since no specific structural description of the SL or TL is assumed, maximum
modularity is achieved between the monolingual and bilingual components.
Nonetheless, there remain certain issues to be addressed in the SB approach. The first
one is the effect of constant assignment on the semantic representation of the sentence.
Apart from existential quantifiers, which one can delete when the variable is assigned a
constant, there remain other interactions, such as those with determiners and lambda
terms, which are not made explicit. For example, what exactly should be done with nouns
specified by a definite article? Is the logical representation of determiners necessary for
translation to be accurate? I will argue for a negative answer to the last question in Section
2.3.2 and in doing so will make redundant an answer to the former.
The second issue is the status of function words. Whitelock (1992), following Calder
et al. (1989), treats prepositions such as French à as having `identity semantics' when used
in expressions such as plaît à (likes); he includes them as part of the bilexicon and indexes
them to the object of the verb. Nevertheless, it is not clear what the treatment of other
function words such as `that', pleonastic `it' and auxiliary verbs should be.
The third issue is whether complex structural dierences between languages can be
handled through lexical transfer alone. Whitelock (1992) analyses the Dutch adverb graag
as having two arguments in order to cope with head switching. However, it is necessary
to show that other structural problems can be overcome through bilexical entries.
Finally, like the other systems considered up to now, there is no way in SB of capturing
lexical relationships within the bilexicon.
1.4.9 Indexed Logic Transfer
Phillips (1993) and Kinoshita et al. (1992) describe a transfer-based system in which ideas
from Kamp (1981), Parsons (1990) and Dowty (1989) are used in the description of an
Indexed Logic for use as a transfer representation. An Indexed Logic formula consists
of terms, each term being a predicate over an index or a relation between indices. For
example:
Eng: A boy runs.
Indexed Logic: e: boy(b) & run(e) & tense(e,pres) & actor(e,b)
Index `e' is the discourse referent for the complete sentence; the relation `tense' encodes
temporal information; `actor' indicates the thematic role of the boy in the event of running,
while `boy' and `run' are the properties that hold of the indices `b' and `e' respectively.
Transfer in this system is effected principally at the level of the bilexicon, in a manner
comparable to that in SB. TL sentences are generated from Indexed Logic representations
via a procedure also similar to SB: the Indexed Logic representation is used to select the
TL lexical entries which are then rearranged using a modied chart parser in which the
input is treated as a bag rather than as a string. It is argued that adoption of the Indexed
Logic and associated generation algorithm avoids many of the problems associated with
other generation algorithms operating on alternative representations. In particular the
independence of the monolingual grammars is guaranteed because generation from the
Indexed Logic is not dependent on the order of its predicates. Also, Phillips (1993:221)
argues that ``for this particular formalism [Indexed Logic] in this particular application
[Machine Translation], a generator which ignores the order of predicates in a logical form
thereby copes with logical equivalence.'' By this I will assume that he means that the
generation module constructs all sentences which give rise to the same predicates but not
in the same order.
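One way to picture an order-independent Indexed Logic formula is as a set of terms. The Python sketch below is an illustration only and does not reproduce the representation used by Phillips or Kinoshita et al.; the names Term, formula and properties_of are assumptions.

```python
from typing import NamedTuple

class Term(NamedTuple):
    """A predicate over one index, or a relation between indices or values."""
    pred: str
    args: tuple

def formula(*terms):
    """An Indexed Logic formula as an unordered collection of terms."""
    return frozenset(terms)

# e: boy(b) & run(e) & tense(e,pres) & actor(e,b)
a_boy_runs = formula(
    Term("boy", ("b",)),
    Term("run", ("e",)),
    Term("tense", ("e", "pres")),
    Term("actor", ("e", "b")),
)
# The same terms listed in a different order.
reordered = formula(
    Term("actor", ("e", "b")),
    Term("tense", ("e", "pres")),
    Term("run", ("e",)),
    Term("boy", ("b",)),
)

def properties_of(f, index):
    """Names of the one-place predicates that hold of the given index."""
    return sorted(t.pred for t in f if t.args == (index,))
```

Because the two formulae compare equal, a generator that consumes such a structure cannot depend on the order in which the predicates were written.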
Comment
The Indexed Logic approach to transfer differs from the SB approach in that in the latter
transfer relations are between lexical signs rather than between predicates as in Indexed
Logic. However, it seems that this difference is more in form than in content. Although
conjunctions are used to associate predicates their main purpose seems to be that of
separators between predicates, while implying that the order of predicates is not important
to the interpretation of the expression. Thus, Indexed Logic formulae could be recast as
SB bags with only a (possible) loss in the explicitness of temporal and other relations.
There are certain disadvantages with the Indexed Logic approach to transfer: 1) lack of
determiners and other function words causes ambiguity and information loss, especially in
cases where the same Indexed Logic representation corresponds to more than one surface
string; thus, `every' and `all' normally give rise to the same Indexed Logic formula, although
their respective translations into Spanish cada and todos are not the same (see Section
2.3.1). 2) SL word order, which in certain cases can help disambiguation, is not available
from the Indexed Logic formulae; for example, the following English sentence would give
rise to two Spanish translations in the Indexed Logic approach.
Eng: The black and grey cat.
1) Spa: El gato negro y gris.
2) Spa: El gato gris y negro.
In fact there is sufficient information in the English sentence to decide that 1) is the closer
translation of the two. While for this example the problem is not very significant, if the
number of adjectives were increased the number of TL sentences would grow rapidly.
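The growth can be made concrete if one assumes, as a simplification, that every ordering of the conjoined adjectives yields a distinct TL candidate, so that n adjectives give n! noun phrases. The helper names coordinate and candidate_nps in the following Python sketch are hypothetical.

```python
from itertools import permutations

def coordinate(adjectives):
    """Join adjectives Spanish-style: 'a', 'a y b', 'a, b y c'."""
    if len(adjectives) == 1:
        return adjectives[0]
    return ", ".join(adjectives[:-1]) + " y " + adjectives[-1]

def candidate_nps(noun, adjectives):
    """One candidate noun phrase per ordering of the adjectives."""
    return [f"El {noun} {coordinate(order)}" for order in permutations(adjectives)]

two = candidate_nps("gato", ["negro", "gris"])
four = candidate_nps("gato", ["negro", "gris", "blanco", "grande"])
```

Two adjectives give the two candidates shown above, while four adjectives already give 24.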
The Indexed Logic approach follows the view that a transfer representation is not
necessarily equivalent to the representation needed for other NLP tasks such as database
querying, and it sets about finding an adequate representation containing sufficient
information for effecting transfer. I will argue a similar case and develop a formalism which
combines the indexing idea with the lexical grounding of SB.
1.4.10 Transfer in the LKB
A system for stating cross-linguistic correspondences independently of a particular MT
system is that embodied in the Lexical Knowledge Base (LKB) (Copestake et al. 1993). In
this section I will describe the typed feature structure (TFS) formalism used in this system,
its notational devices and the way in which translation correspondences and relationships
are expressed. Such a detailed exposition will prove useful for subsequent chapters since I
will be developing a transfer MT system using the LKB.
Typed Feature Structures
The TFS formalism in the LKB follows the graph unification tradition, as exemplified by
the PATR-II system (Shieber 1986), in the sense that features in a TFS do not need to be
specified in any particular order and that TFSs may be consistently extended by adding
new features (with the one proviso described below). Compare this with term unification
in Prolog, where terms of different arities may never unify successfully. Graph unification
derives its name from the data structure normally used to represent it, namely a directed
graph.
Let me start the description of types and feature structures by considering the TFS
shown in Figure 1.8. A feature structure (FS) is a collection of feature-value pairs to which
no type is assigned; the value of a feature may be an atomic value or another FS containing
further feature-value pairs. In comparison with TFSs, FSs are relatively unconstrained,
and this can be a disadvantage for the reasons explained below.

[ sign          ]
[ ORTH = orth   ]
[ SYN  = syn    ]
[ SEM  = sem    ]
Figure 1.8: Simple typed feature structure.
There are three features in this structure, labelled with small capital letters. Feature
orth has type orth as its value which will generally be instantiated to the orthographic
form of a linguistic expression. The value of feature syn is of type syn, which is intended
to unify with information on the syntactic properties of an expression. Finally, sem has
as value the semantic content of the expression and it is of type sem.
The idea of grouping different types of information into one single structure is taken
from Pollard and Sag (1987) who trace it back to the view of de Saussure (1916) who
interprets a linguistic sign (e.g. a word) as consisting of a signifier (i.e. the orthographic
form of the word) and a signified (i.e. its semantics) between which there is no motivated
relationship. Thus, I have typed as sign the collection of the aforementioned features; this
type is expressed as the first line of the AVM.
Having briefly introduced TFSs I will now present a description of types and how they
relate to TFSs. The notion of type in the LKB builds on the work of Carpenter (1992)
and others. In a TFS formalism all FSs must have a type from a predefined set TYPE; in
addition, the value of each feature must also be a TFS. An important property of TYPE
is that it is organized as a partial order by the relation ⊑. There are three conditions for
a relation to qualify as a partial order: a) reflexivity: for any element t in TYPE, t ⊑ t; b)
anti-symmetry: if t ⊑ s and s ⊑ t then s = t; c) transitivity: if t1 ⊑ t2 and t2 ⊑ t3 then
t1 ⊑ t3. In addition, a type hierarchy is required to have a unique greatest lower bound for
every consistent subset of types. That is, partial orders such as the one shown in Figure
1.9 are disallowed. In order to complete the specification of the TFS in Figure 1.8, it is
necessary to specify the type hierarchy shown in Figure 1.10.

[Diagram: hierarchy over the types top, t1, t2, t3 and t4]
Figure 1.9: Invalid type hierarchy.

[Diagram: hierarchy over the types top, sign, orth, syn and sem]
Figure 1.10: Simple type hierarchy
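The greatest-lower-bound condition can be checked mechanically. The following is a minimal Python sketch, not part of the LKB: the dictionary encoding, the function names and both example hierarchies are assumptions made for illustration. In particular, `simple` assumes that sign, orth, syn and sem are all immediate subtypes of top, and `two_shared` is a hypothetical hierarchy in which t1 and t2 share two incomparable subtypes. For a finite hierarchy it is enough to test pairs of types.

```python
from itertools import combinations

def descendants(hierarchy, t):
    """All types at or below t (reflexive, transitive closure of 'subtype')."""
    seen, stack = {t}, [t]
    while stack:
        for child in hierarchy.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def glb(hierarchy, a, b):
    """Return the unique greatest lower bound of a and b.

    Returns None if a and b have no common subtype (they are inconsistent)
    and raises ValueError if common subtypes exist but none is greatest.
    """
    common = descendants(hierarchy, a) & descendants(hierarchy, b)
    if not common:
        return None
    greatest = [t for t in common if common <= descendants(hierarchy, t)]
    if len(greatest) != 1:
        raise ValueError(f"no unique glb for {a} and {b}: {sorted(common)}")
    return greatest[0]

def is_valid(hierarchy):
    """True if every consistent pair of types has a unique glb."""
    types = set(hierarchy) | {c for cs in hierarchy.values() for c in cs}
    try:
        for a, b in combinations(sorted(types), 2):
            glb(hierarchy, a, b)
    except ValueError:
        return False
    return True

# Each key maps a type to its immediate subtypes (assumed encodings).
simple = {"top": ["sign", "orth", "syn", "sem"]}
two_shared = {"top": ["t1", "t2"], "t1": ["t3", "t4"], "t2": ["t3", "t4"]}

print(is_valid(simple))      # every consistent pair has a unique glb
print(is_valid(two_shared))  # t1 and t2 have lower bounds t3 and t4, neither greatest
```

In the second hierarchy t1 and t2 are consistent, since they have common subtypes, but neither t3 nor t4 is below the other, so no single type can serve as the result of unifying t1 with t2.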
Thus, there are two components to a TFS system: a type hierarchy and its associated
TFSs, both of which are very closely interrelated. However, the relationship as described
so far is not constrained enough, since in principle it would be possible to assign arbitrary
types to a FS to get a TFS. To prevent this, and also for other reasons described later,
every type in the hierarchy is associated with a TFS called its constraint. The constraint
of a type is a TFS of that type. For example, the constraint of the type sign in Figure
1.10 may be defined to be the TFS in Figure 1.8. The main import of this constraint is
that any TFS of type sign must have the three features orth, syn and sem for which a
value must be defined. It is important to distinguish constraints from TFSs. The former
are definitions specifying restrictions on a structure, whereas the latter are instantiations
of these definitions. By a Well Formed Feature Structure (WFFS) is meant a TFS which
satisfies the constraints of its type, and whose features have WFFSs as their value.
In the LKB types may be atomic or non-atomic, depending on whether their constraints
contain features or not. Typical atomic types include 1, 2 and 3 with supertype person; pl
and sg with supertype number, and + and - with supertype boolean. Their constraints
are TFSs of their respective types with no features in them. In general, I will indicate
atomic types with parentheses when describing TFSs in the LKB in order to emphasize
the lack of complexity of such types.
I mentioned earlier that, in the context of graph unification, structures may be expanded consistently by adding further features. New features are introduced into a TFS
in the LKB by specifying further subtypes which incorporate the new feature as part of
their constraints. For example one may add the feature sense-id to the type sign to define
a subtype lex-sign. Figure 1.11 shows the type definition and path equations necessary
to specify the constraints of lex-sign in the LKB, together with the AVM representing its
constraint.

    lex-sign (sign)
    <sense-id> = top.

    [ lex-sign
      orth     = orth
      syn      = syn
      sem      = sem
      sense-id = top ]

Figure 1.11: Extending type sign.
A path (Shieber 1986:14) is a list of features specifying a particular portion of a TFS,
which will in itself be a TFS. Thus, in Figure 1.11 the path <sense-id> specifies a TFS
of type top, while <syn> specifies a TFS of type syn. Paths consisting of more than one
feature are specified using a colon (:) as in <feature1:feature2:feature3>; examples
of such paths will be given shortly. I will usually represent a path with a slightly different
notation as in feature1:feature2:feature3. A path equation indicates the TFS value
that a particular path should have; this value may be specified explicitly, or by equating
it with the value of another path, e.g. sense-id = top and feature1:feature2 =
feature3:feature4. When a path is equated with another path, the assigned value is
called a shared or reentrant value.
New features can be introduced into the type system by one, and only one, new type;
this implies that, given a feature, the type that introduced it as part of its constraint
may be deduced. This property is called type inference here; its main advantages are that
it simplifies the description of paths and that it allows deterministic typing of untyped
FSs. TFSs inherit the constraint of their supertypes. Thus, lex-sign above inherits the
constraints of sign; similarly any subtype of lex-sign would in turn inherit its constraint.
Inheritance of constraints can also be from more than one supertype as long as the constraints from all the supertypes are consistent. For example, one could define an English
sign as:

    eng-sign (sign)
    <lang> = english.

    [ eng-sign
      orth = orth
      syn  = syn
      sem  = sem
      lang = (english) ]

Then, a definition of eng-lex-sign can be concisely expressed as:

    eng-lex-sign (eng-sign lex-sign).

    [ eng-lex-sign
      orth     = orth
      syn      = syn
      sem      = sem
      lang     = (english)
      sense-id = top ]

Reentrancy is an important and useful relation between features. If TFSs are viewed as
directed graphs, where nodes correspond to types and the edges leaving the node are the
constraint features of that type, reentrancy can be viewed as two or more edges pointing to
the same node. One consequence of reentrancies is that cycles may be introduced into the
graph; that is, it becomes possible to define a type constraint which has a feature whose
value is the type of which it is a part, thus introducing an infinite expansion into the type's
constraint. Such cyclic TFSs are disallowed in the LKB and therefore TFSs are restricted
to directed acyclic graphs (DAGs). Figure 1.12 shows a TFS and its corresponding DAG.
Note that the reentrancy is expressed by a boxed integer in the TFS. The relevant path
equation is sem:sem-hd:lex = sense-id:fs-id. It is worth mentioning at this point that
while the type (string) is atomic, it allows any string of characters to be its subtype;
this property is particularly useful for specifying orthography values, predicate names and
various other types of information, as I will show.

    [ lex-sign
      orth     = orth
      syn      = syn
      sem      = [ sem
                   sem-hd = [ ind-lex
                              lex  = [0] lexeme
                              ind1 = entity ] ]
      sense-id = [ sense-id
                   fs-id      = [0]
                   dictionary = (string) ] ]

    lex-sign --ORTH-------> orth
    lex-sign --SYN--------> syn
    lex-sign --SEM--------> sem
    sem      --SEM-HD-----> ind-lex
    ind-lex  --IND1-------> entity
    ind-lex  --LEX--------> lexeme
    lex-sign --SENSE-ID---> sense-id
    sense-id --FS-ID------> lexeme  (the same node as that reached by LEX)
    sense-id --DICTIONARY-> (string)

Figure 1.12: TFS and associated DAG with reentrant features.
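The graph view of reentrancy can be made concrete with a short Python sketch. This is an illustration only and not how the LKB stores TFSs: the `Node` class, the `follow` method and the `is_acyclic` helper are hypothetical names. Reentrancy is modelled as two feature edges pointing at the very same object, and the acyclicity requirement as a depth-first check.

```python
class Node:
    """A TFS node: a type name plus outgoing feature edges."""

    def __init__(self, type_, features=None):
        self.type = type_
        self.features = features or {}

    def follow(self, path):
        """Return the node reached by a colon-separated path."""
        node = self
        for feature in path.split(":"):
            node = node.features[feature]
        return node

def is_acyclic(root):
    """Depth-first check that no node is reachable from itself."""
    on_path, done = set(), set()

    def visit(node):
        if id(node) in on_path:
            return False          # found a back edge, hence a cycle
        if id(node) in done:
            return True           # shared node already checked
        on_path.add(id(node))
        ok = all(visit(child) for child in node.features.values())
        on_path.discard(id(node))
        done.add(id(node))
        return ok

    return visit(root)

# The structure of Figure 1.12: one lexeme node shared by two paths.
lexeme = Node("lexeme")
lex_sign = Node("lex-sign", {
    "orth": Node("orth"),
    "syn": Node("syn"),
    "sem": Node("sem", {
        "sem-hd": Node("ind-lex", {"lex": lexeme, "ind1": Node("entity")}),
    }),
    "sense-id": Node("sense-id", {"fs-id": lexeme, "dictionary": Node("string")}),
})

shared = lex_sign.follow("sem:sem-hd:lex") is lex_sign.follow("sense-id:fs-id")
print(shared, is_acyclic(lex_sign))
```

Because the two paths lead to one object, any information later added to the value of sem:sem-hd:lex is automatically visible at sense-id:fs-id, which is the behaviour that the path equation describes.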
The most useful operations on TFSs are subsumption and unification. Intuitively,
subsumption indicates whether a TFS is more or less informative than another. The
subsumption relation, ⊑ `is subsumed by' (not to be confused with the ordering relation
for TYPE) establishes a specificity ordering on TFSs. This ordering is partial in the sense
that certain TFSs cannot be ordered using ⊑. One way of understanding subsumption is
by noting that if t ⊑ s, then any TFS that unifies with t also unifies with s. A definition of
subsumption for the TFSs in the LKB is given by de Paiva (1993:170-71) using the notion
of feature structure morphisms.
Intuitively the unification operation combines the informational content of two TFSs
to give a more specific TFS. It is defined as follows (although see de Paiva (1993) for an
alternative definition): in D′ ⊓ D″ = D, D is the unification of D′ and D″ if D is the most
general TFS such that D ⊑ D′ and D ⊑ D″. During untyped unification the features of
each FS are compared: if they match, unification proceeds recursively through the values
of those features; if they do not the resulting FS will contain the unmatched features of
each FS. Typed unification has the additional condition that the resulting FS must be
typed. This means that any allowed combination of features must have a type, and that
the value of these features is consistent with the constraint of that type. In addition, any
two types to be unified must have a greatest lower bound in the type hierarchy irrespective
of whether their features match or not. A simple example of unification is shown in Figure
1.13.

    [ agr         ]     [ agr        ]     [ agr         ]
    [ pers = pers ]  ⊓  [ pers = (3) ]  =  [ pers = (3)  ]
    [ num  = (pl) ]     [ num  = num ]     [ num  = (pl) ]

Figure 1.13: A simple example of unification.
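The unification of Figure 1.13 can be reproduced with a small Python sketch. It is deliberately simplified and is not the LKB algorithm: reentrancy and type constraints are ignored, a TFS is encoded as a pair of a type name and a feature dictionary, and the hierarchy in `SUBTYPES` as well as all function names are assumptions made for the example.

```python
# Immediate subtypes; atomic values such as 3 and pl are types without features.
SUBTYPES = {
    "top": ["agr", "pers", "num"],
    "pers": ["1", "2", "3"],
    "num": ["sg", "pl"],
}

def below(t):
    """All types at or below t."""
    out = {t}
    for child in SUBTYPES.get(t, ()):
        out |= below(child)
    return out

def type_glb(a, b):
    """Greatest lower bound of two types, or None if they are inconsistent."""
    common = below(a) & below(b)
    for t in common:
        if common <= below(t):
            return t
    return None

def unify(x, y):
    """Unify two TFSs given as (type, {feature: TFS}); None signals failure."""
    t = type_glb(x[0], y[0])
    if t is None:
        return None
    feats = dict(x[1])
    for feature, value in y[1].items():
        if feature in feats:
            unified = unify(feats[feature], value)
            if unified is None:
                return None
            feats[feature] = unified
        else:
            feats[feature] = value   # unmatched features are simply carried over
    return (t, feats)

def subsumed_by(x, y):
    """x ⊑ y: x carries at least the information in y."""
    if x[0] not in below(y[0]):
        return False
    return all(f in x[1] and subsumed_by(x[1][f], v) for f, v in y[1].items())

def atom(t):
    return (t, {})

left = ("agr", {"pers": atom("pers"), "num": atom("pl")})
right = ("agr", {"pers": atom("3"), "num": atom("num")})
result = unify(left, right)
print(result)
print(subsumed_by(result, left), subsumed_by(result, right))
```

The result is subsumed by both inputs, as the definition of unification requires, while an attempt such as `unify(atom("sg"), atom("pl"))` fails because the two types have no common subtype.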
Two additional features of the LKB which are important are psorts and default unification. A psort is simply a TFS which has been given a name (Copestake et al. 1993:155);
its main purpose is to reduce the number of types that need to be specified at any particular time, since an excessive number of types would have adverse consequences for the
efficiency of the system. In the LKB psorts are used mainly for defining lexical items. For
example, assuming the TFS of Figure 1.12, one could define the lexical entry for `dog' as
a psort named dog_1:

    dog 1 (lex-sign)
    <orth> = "dog"
    <sem:sem-hd:lex> = "dog1".

This definition states that the TFS

    [ lex-sign
      orth     = (dog)
      syn      = syn
      sem      = [ sem
                   sem-hd = [ ind-lex
                              lex  = (dog1)
                              ind1 = entity ] ]
      sense-id = [sense-id] ]

has been given the name dog_1 (the underscore is introduced by the input function).
Defining this lexical entry as a psort avoids specifying a type dog which would only be
used for the lexical entry of `dog'. The boxed value sense-id indicates that the complete
TFS for this feature has been hidden or "shrunk" in order to improve readability; shrunk
values are not to be confused with reentrancy markers: reentrancy is indicated by a boxed
integer, whereas shrunk values are indicated by a boxed type. Another advantage of psorts
is that, in general, TFSs for lexical entries cannot be predicted in advance; that is, it is
impractical to predict what the constraint of each possible lexical sign will be. Finally,
psorts enable the incorporation of default inheritance into the system in a practical manner.
Default inheritance is an operation between TFSs which differs from the inheritance
relation in the type hierarchy in that values may be overridden. In the LKB it takes place
between a non-defeasible TFS and a defeasible one, and it is achieved by an operation of
default unification in which only values in the defeasible TFS which are compatible with
the non-defeasible TFS are unified (Copestake 1993b). Psorts are the only structures which
may be assigned defeasible status, while both psorts and TFSs can express non-defeasible
knowledge.
Defaults allow a motivated structuring of linguistic information in a knowledge system
such that generalizations may be expressed even if they do not hold for some particular
cases. For example, given the following naive psort for `bird' (shown as name & TFS)
indicating that generally birds fly

    bird_1   [ animate
               nature = animal
               mammal = (no)
               fly    = (yes) ]

a psort for `penguin' could be defined in the LKB as:

    penguin 1
    <> < bird_1 <>
    <fly> = no.

in which the default value fly = yes is overridden. In this definition <> indicates the
empty path, namely the root node which stands for the whole TFS of the psort, while the
operator < effects default inheritance. In this case the defeasible TFS corresponds to the
root node of the psort bird_1. Non-default inheritance from psorts is also possible and it
is indicated by the operator <=, in which case values in the psort cannot be overridden.
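The effect of the penguin definition can be approximated in a few lines of Python. This is only a rough sketch of the idea and much weaker than the default unification of Copestake (1993b): types and reentrancy are left out, feature structures are plain dictionaries with atomic string values, and the function name `default_inherit` is hypothetical.

```python
def default_inherit(strict, defeasible):
    """Combine a non-defeasible FS with a defeasible one.

    Every value in `strict` survives. A value from `defeasible` is added only
    where `strict` says nothing about that feature; nested dictionaries are
    combined recursively. Conflicting defaults are silently dropped.
    """
    result = dict(strict)
    for feature, default in defeasible.items():
        if feature not in strict:
            result[feature] = default
        elif isinstance(strict[feature], dict) and isinstance(default, dict):
            result[feature] = default_inherit(strict[feature], default)
        # otherwise the strict value wins over the default
    return result

bird_1 = {"nature": "animal", "mammal": "no", "fly": "yes"}

# penguin_1 states fly = no strictly and inherits the rest from bird_1 by default.
penguin_1 = default_inherit({"fly": "no"}, bird_1)
print(penguin_1)
```

Non-default inheritance, as expressed by the operator <=, would instead correspond to ordinary unification, under which the conflicting values for fly would make the definition fail rather than be overridden.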
Lexical Rules
TFSs in the LKB are used to encode grammar rules, lexical rules, tlinks and tlink-rules.
Grammar rules are described in Section 3.1.2. Lexical rules define a mapping between an
input and an output lexical sign. This relationship is normally of a linguistic character
and can embody a variety of properties. In the LKB lexical rules are defined as a subtype
of rule:

    [ lexical-rule ]     [ rule     ]
    [ 0 = lex-sign ]  ⊑  [ 0 = sign ]
    [ 1 = lex-sign ]     [ 1 = sign ]

A simple lexical rule is that expressing the relation between an animal and its meat (only
relevant paths shown):
    animal-meat ⊑ lexical-rule
    [ 0 = [ lex-noun
            orth     = [0]
            syn:type = (mass) ]
      1 = [ lex-noun
            orth     = [0]
            syn:type = (count) ] ]

This rule takes as input a count noun (feature 1) and results in a mass noun with the
same orthography.
Tlinks
There are two mechanisms for expressing cross-linguistic generalizations in the LKB. The
first one, described by Copestake et al. (1993:158), uses tlinks. A tlink is a TFS establishing a correspondence between lexical or phrasal rules in the source and target languages.
In the simplest of cases these rules are simply identity rules:

    simple-tlink ⊑ tlink
    [ sfs = [ id-lex-rule
              0 = [0] lex-sign
              1 = [0] ]
      tfs = [ id-lex-rule
              0 = [1] lex-sign
              1 = [1] ] ]

This simple tlink is used to represent the type of bilexical entry found in most MT systems.
The equivalence between the source and target lexical entry is expressed in the following
way: to translate a source lexical sign, unify with the value of path sfs:1; if unification
succeeds, the target lexical sign is the value of tfs:0. In other words, it is the TFS value
of paths sfs:0 and tfs:0 which actually stand in the transfer relation in a tlink. The
transfer direction can be completely reversed by switching sfs and tfs. Based on this
tlink type, an English-Spanish bilingual entry can be entered into the LKB as:
dog_1 / perro_1 :
simple-tlink.
This tlink equates equivalent English and Spanish lexical entries (which are psorts unied
with the value of feature 1 in the respective side of the tlink).
Having lexical rules in the source and target TFSs within a tlink allows the encoding of
certain lexically bound transfer relations. For example, Copestake et al. (1993:160) define
the transfer equivalence of certain words as a relation between the output of monolingual
lexical rules. Thus, the translation of the word `furniture' is defined as the output of the
pluralization lexical rule applied to its Spanish pseudo-translation mueble such that the
correct transfer relation `furniture - muebles' is established. The derivation of this bilingual
entry could be represented schematically as:

    SFS:0 furniture_1    ↔    TFS:0 mueble_1 + PLU
        ↑ id-lex-rule             ↑ pluralization
    SFS:1 furniture_1         TFS:1 mueble_1

The corresponding, simplified TFS is:
    singular-plural-tlink ⊑ tlink
    [ sfs = [ id-lex-rule
              0 = [0] lex-noun
              1 = [0] ]
      tfs = [ pluralization
              0 = [ lex-noun
                    orth        = [1]
                    syn:agr:num = (pl) ]
              1 = [ lex-noun
                    orth        = [1]
                    syn:agr:num = (sg) ] ] ]

As input to the LKB, this TFS is specied as:
furniture_1 / mueble_1 :
tlink
<sfs> = id-lex-rule
<tfs> = pluralization.
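The way a tlink relates the outputs of a source and a target lexical rule, and the way its direction can be reversed, may be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the LKB: signs are flat dictionaries, matching is done by equality rather than unification, and the `Tlink` class together with the toy rule `pluralize_es` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

def identity(sign):
    """Counterpart of id-lex-rule: output equals input."""
    return sign

def pluralize_es(sign):
    """Hypothetical toy rule: add -s after a vowel, -es otherwise."""
    orth = sign["orth"]
    suffix = "s" if orth[-1] in "aeiou" else "es"
    return {**sign, "orth": orth + suffix, "num": "pl"}

@dataclass
class Tlink:
    source: dict                  # sign supplied as SFS:1
    target: dict                  # sign supplied as TFS:1
    srule: Callable = identity    # maps SFS:1 to SFS:0
    trule: Callable = identity    # maps TFS:1 to TFS:0

    def transfer(self, sign):
        """Return the TFS:0 sign if `sign` equals SFS:0, otherwise None."""
        if self.srule(self.source) == sign:
            return self.trule(self.target)
        return None

    def reversed(self):
        """Swap the source and target sides, as when switching sfs and tfs."""
        return Tlink(self.target, self.source, self.trule, self.srule)

dog = {"orth": "dog", "num": "sg"}
perro = {"orth": "perro", "num": "sg"}
furniture = {"orth": "furniture", "num": "sg"}
mueble = {"orth": "mueble", "num": "sg"}

simple = Tlink(dog, perro)
singular_plural = Tlink(furniture, mueble, trule=pluralize_es)

print(simple.transfer(dog)["orth"])
print(singular_plural.transfer(furniture)["orth"])
print(singular_plural.reversed().transfer({"orth": "muebles", "num": "pl"})["orth"])
```

Keeping the lexical rules as part of the entry means that the plural form on the Spanish side never has to be listed in the bilexicon; it is derived from the singular entry each time the tlink is used, in either direction.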
Tlink-rules
The other mechanism for expressing cross-linguistic correspondences is the tlink-rule. Tlink-rules
relate an input tlink to an output tlink in order to capture regularities in the bilexicon.
That is, a set of tlinks in the bilexicon can give rise to new tlinks through the application of
a tlink-rule. An example from Copestake et al. (1993:160-61) should clarify. It was already
noted that animal nouns give rise to nouns denoting their meat; such sense alternations
have analogous, albeit different, processes in other languages. A case in point is Dutch, in
which the animal noun lam (lamb) is used to generate its meat denoting noun lamsvlees.
Thus, from the tlink associating `lamb - lam' in the bilexicon, the tlink associating `lamb
- lamsvlees' can be derived through a regular process encoded as a tlink-rule in which
corresponding English and Dutch animal-meat alternations are encoded. This mapping is
expressed schematically by:

    e-noun_animal    ↔    d-noun_animal
        ↓ animal-meat         ↓ vlees-rule
    e-noun_meat      ↔    d-noun_vlees

A TFS encoding this relationship is shown in Figure 1.14: t0 and t1 indicate input and
output tlinks respectively while srule and trule indicate the source and target lexical
rules that link the two tlinks. This tlink-rule would not only apply to `lamb - lam' but also
to any other bilingual entry for which the appropriate pattern held. Clearly, there will
be translation pairs for which this pattern is not appropriate and these would have to be
ruled out by some monolingually specified restriction on either the English or the Dutch
lexical rule.

    meat-trule ⊑ tlink-rule
    [ t0    = [ simple-tlink
                sfs = [ id-lex-rule
                        0 = [0] e-noun-animal
                        1 = [0] ]
                tfs = [ id-lex-rule
                        0 = [1] d-noun-animal
                        1 = [1] ] ]
      t1    = [ simple-tlink
                sfs = [ id-lex-rule
                        0 = [2] e-noun-meat
                        1 = [2] ]
                tfs = [ id-lex-rule
                        0 = [3] d-noun-vlees
                        1 = [3] ] ]
      srule = [ animal-meat
                0 = [2]
                1 = [0] ]
      trule = [ vlees-rule
                0 = [3]
                1 = [1] ] ]

Figure 1.14: Meat - vlees tlink-rule.
Comment
The LKB formalism, in the form of TFSs, has two principal advantages over untyped
FSs. Firstly, constraints on feature structures can be specified and enforced more easily.
This enables automatic checking of TFSs, facilitating debugging and development; similar
facilities are provided by programming languages, especially those used in the development
of large software systems. Secondly, the hierarchical structure of types and the inheritance
relation between constraints allows complex generalizations to be expressed economically
and in a computationally tractable formalism, avoiding redundancy in the description
of linguistic phenomena. In addition, default inheritance offers a powerful mechanism for
simplifying even further the description of exceptions and of the relation between structures
that differ minimally.
Tlinks and tlink-rules allow the description of cross-linguistic correspondences independently of specific system design and grammatical formalism. Tlinks overcome lexical
mismatches of the `furniture - muebles' type in a well-defined and motivated way, while
tlink-rules express relationships between bilexical entries which capture linguistically valid
generalizations.
However, problems arise with the suggested technique for overcoming lexical gaps
(Copestake 1993a; Copestake and Sanfilippo 1993), which consists of using phrasal signs
in the tlinks making up a tlink-rule. The difficulties are best explained with the aid of
the following example. To translate the Spanish novillo into English `young bull' a tlink
similar to the one below would be proposed:

    [ phrasal-rule        ]         [ identity-rule   ]
    [ 0      = [N1]       ]    ↔    [ 0 = [novillo_1] ]
    [ inc1:0 = [young_1]  ]
    [ inc2:0 = [bull_1]   ]

Shrunk types in this example in fact represent complex structures which I have abbreviated. Transfer from English into Spanish in this approach takes place by unifying the
value of feature 0 with the corresponding node in the SL parse tree and ensuring that the
lexical signs in inc1:0 and inc2:0 are part of this phrase. For example, the parse tree for
`the young bull sleeps' might be:
    S
      NP
        Det  the
        N1
          AP  young
          N1
            N  bull
      VP  sleeps

To transfer `young bull' the source side of the tlink above unifies with the topmost N1; in
addition, the transfer algorithm checks that `young' and `bull' occur as leaves of this node.
If these two conditions are satisfied, the output is the lexeme on the Spanish side of the
tlink.
The problem with this approach is that ambiguity is introduced because the phrasal
side of the tlink will unify with any N1 in the parse tree which includes the relevant lexical
signs. For instance, in most analyses there are at least two N1 nodes in `young bull in the
field':

    N1
      N1
        AP  young
        N1
          N  bull
      PP  in the field

Under the phrasal approach this expression will result in two derivations of novillo, one for
each of the two topmost N1 nodes shown, leading to two identical TL sentences. It would
be possible to modify the English analysis to avoid this problem but such modifications
should be seen as interfering with the monolingual component, and this should be avoided.
The ambiguity could also be avoided by only applying phrasal tlinks at the lowest matching
node. In such cases there is an increase in the computational complexity of the transfer
algorithm, which would then have to consider all the nodes in the analysis tree, in addition
to the lexical leaves.
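The source of the ambiguity, and the cost of the lowest-node remedy, can be seen in a short Python sketch. The tuple encoding of trees and the function names are assumptions made for the illustration; matching is reduced to a label test plus a check on the leaves, whereas the approach discussed above unifies TFSs.

```python
def leaves(tree):
    """Words dominated by a node; a tree is (label, child, ...) or a word."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]

def matching_nodes(tree, label, words):
    """All nodes with the given label whose leaves include every word."""
    if isinstance(tree, str):
        return []
    found = []
    if tree[0] == label and set(words) <= set(leaves(tree)):
        found.append(tree)
    for child in tree[1:]:
        found += matching_nodes(child, label, words)
    return found

def lowest_matches(tree, label, words):
    """Matches that do not dominate any other match."""
    matches = matching_nodes(tree, label, words)
    return [m for m in matches
            if not any(matching_nodes(child, label, words) for child in m[1:])]

# `young bull in the field'
phrase = ("N1",
          ("N1", ("AP", "young"), ("N1", ("N", "bull"))),
          ("PP", "in", "the", "field"))

print(len(matching_nodes(phrase, "N1", ["young", "bull"])))   # candidate N1 nodes
print(len(lowest_matches(phrase, "N1", ["young", "bull"])))   # after the restriction
```

Both functions walk the whole tree, and `lowest_matches` additionally re-examines the subtree below every candidate, which illustrates why restricting phrasal tlinks to the lowest matching node makes transfer inspect internal nodes as well as lexical leaves.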
Finally, the proposal of Copestake and Sanfilippo (1993) makes the transfer component depend on the grammatical descriptions of individual languages since phrasal signs,
intimately associated with the grammar of the language, are included in tlinks; such dependencies are best avoided if possible in order to increase modularity of the monolingual
and transfer components.
1.4.11 Statistical Machine Translation
A statistical model of translation in which separate monolingual and bilingual sources of
knowledge are used is that described by Brown et al. (1990) and Brown et al. (1993).
In their conception of MT, there is a statistical language model which contains monolingual information and a statistical translation model which contains bilingual information.
Translation then requires: a) a method for computing the probability of a string being
the translation of a SL string, b) a method for computing the probability of a TL string being a valid TL sentence, and c) a technique to search for the TL sentence which maximizes these probabilities.
Mathematically, the relationship between these three processes may be expressed as:

\[ \hat{t} = \operatorname*{argmax}_{t} \Pr(t)\,\Pr(s \mid t) \tag{1} \]

This says that given a SL sentence $s$, its translation $\hat{t}$ is the sentence which is most
likely both as a TL sentence ($\Pr(t)$) and as a translation of $s$ (i.e. which maximizes the
conditional probability, $\Pr(s \mid t)$, of $s$ occurring given that $t$ occurred).
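Equation 1 can be read as an application of Bayes' rule. The derivation is not spelled out in the text above, so the following is offered only as a sketch; the quantity $\Pr(t \mid s)$, the probability that t is the translation given the SL sentence s, is the one the search would ideally maximize.

```latex
\begin{align*}
\hat{t} &= \operatorname*{argmax}_{t} \Pr(t \mid s) \\
        &= \operatorname*{argmax}_{t} \frac{\Pr(t)\,\Pr(s \mid t)}{\Pr(s)} \\
        &= \operatorname*{argmax}_{t} \Pr(t)\,\Pr(s \mid t)
\end{align*}
```

The last step holds because $\Pr(s)$ is the same for every candidate t and so does not affect which candidate attains the maximum. This is what allows the language model $\Pr(t)$ and the translation model $\Pr(s \mid t)$ to be estimated separately.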
The monolingual language model is based on bigrams (more recently a trigram model
has been used, Brown et al. (1992)) from which the likelihood of a string of words being a
valid sentence can be computed. By contrast the translation model determines the most
likely translation of each SL word(s) and its (their) position within the TL string. One way
of implementing the translation model is by using three measures. One is the probability
of a given TL word being the translation of a SL word; another measure is the probability
of 0, 1, 2 or more SL words giving rise to a single TL word; the third measure is the
probability of a TL word occupying a particular position in the TL string. The TL string
which maximizes all of these probabilities will be a good candidate for being a translation
of the SL sentence. Other translation models are given by Brown et al. (1993) each varying
in its complexity and number of linguistic assumptions.
An example will help consolidate the above description (taken from Brown et al. (1990)).
To determine the probability of `John does beat the dog' being a translation of Le chien
est battu par Jean would involve the following computation. First, the probability from
the translation model would be calculated as follows:

\[
\begin{aligned}
&\Pr(\text{fertility}=1 \mid \text{John})\,\Pr(\text{Jean} \mid \text{John}) \\
&\times \Pr(\text{fertility}=0 \mid \text{does}) \\
&\times \Pr(\text{fertility}=2 \mid \text{beat})\,\Pr(\text{est} \mid \text{beat})\,\Pr(\text{battu} \mid \text{beat}) \\
&\times \Pr(\text{fertility}=1 \mid \text{the})\,\Pr(\text{Le} \mid \text{the}) \\
&\times \Pr(\text{fertility}=1 \mid \text{dog})\,\Pr(\text{chien} \mid \text{dog}) \\
&\times \Pr(\text{fertility}=1 \mid \text{empty word})\,\Pr(\text{par} \mid \text{empty word})
\end{aligned}
\]

Fertility here indicates the number of French words (SL) giving rise to an English word
(TL); the last two factors are conditioned on the empty English word, onto which par translates. The fertility values here are
just one possible (although probable) assignment for this pair of sentences; in the search
process many more would be tried in order to find the one that maximized the value of
Equation 1 above. The above term would be further multiplied by the probabilities of the
position of a TL word in the TL string, and by the probabilities derived from the English
bigram model:

\[
\begin{aligned}
&\Pr(1 \mid 6, 5)\,\Pr(2 \mid 0, 5)\,\Pr(3 \mid 3, 5)\,\Pr(3 \mid 4, 5)\,\Pr(4 \mid 1, 5)\,\Pr(5 \mid 2, 5) \\
&\times \Pr(\text{John} \mid \text{start})\,\Pr(\text{does} \mid \text{John})\,\Pr(\text{beat} \mid \text{does})\,\Pr(\text{the} \mid \text{beat})\,\Pr(\text{dog} \mid \text{the})
\end{aligned}
\]

The first set of probabilities have the format $\Pr(i \mid j, l)$, where i is the position of the
TL word, j is the position of the SL word that gives rise to it, and l is the length of
the TL string. As I mentioned earlier, more complex models which include more realistic
approximations to the exact position of the TL words are possible; these are described in
Brown et al. (1993). The second set of products represents the bigram probabilities for
English. The result of these and the preceding terms would be compared with those for
other TL strings and their arrangements in order to select the most likely translation. Since
searching for an optimal TL string which maximizes the probabilities of the monolingual
and transfer models is computationally impractical, a suboptimal search algorithm is used
which proceeds stepwise through a number of hypotheses, pursuing at each point those
whose extension is most promising.
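The comparison between candidate TL strings can be illustrated with a toy Python sketch. It is far simpler than the models of Brown et al.: fertility and position probabilities are left out, SL and TL words are paired one-to-one in the same order, the search is an exhaustive comparison of a given candidate list, and every probability in `BIGRAM` and `TRANS` is an invented number used purely for illustration.

```python
import math

# Hypothetical bigram probabilities Pr(word | previous word) for the TL.
BIGRAM = {
    ("<s>", "the"): 0.5,
    ("the", "dog"): 0.2,
    ("the", "hound"): 0.01,
    ("dog", "sleeps"): 0.1,
    ("hound", "sleeps"): 0.1,
}

# Hypothetical translation probabilities Pr(SL word | TL word).
TRANS = {
    ("le", "the"): 0.6,
    ("chien", "dog"): 0.8,
    ("chien", "hound"): 0.8,
    ("dort", "sleeps"): 0.7,
}

UNSEEN = 1e-6  # small floor for events missing from the toy tables

def log_lm(t_words):
    """log Pr(t) under the bigram model."""
    pairs = zip(["<s>"] + t_words, t_words)
    return sum(math.log(BIGRAM.get(pair, UNSEEN)) for pair in pairs)

def log_tm(s_words, t_words):
    """log Pr(s | t), assuming a one-to-one alignment in the same order."""
    return sum(math.log(TRANS.get((s, t), UNSEEN))
               for s, t in zip(s_words, t_words))

def best_translation(s_words, candidates):
    """Candidate maximizing Pr(t) Pr(s | t), computed in log space."""
    return max(candidates, key=lambda t: log_lm(t) + log_tm(s_words, t))

source = ["le", "chien", "dort"]
candidates = [["the", "dog", "sleeps"], ["the", "hound", "sleeps"]]
print(best_translation(source, candidates))
```

With these numbers the translation model cannot separate the two candidates, so the choice is made by the language model alone. Working with sums of logarithms rather than products of probabilities avoids numerical underflow on longer sentences.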
Comment
Statistical approaches such as these are appealing for a number of reasons: they require
minimal linguistic knowledge and therefore, it is claimed, are applicable to any language
pair; they are trained on corpora, which makes them better adapted to translating real
texts; and, by their very nature, they cope with large volumes of data.
However, their lack of linguistic sophistication makes it difficult for pure statistical approaches to handle phenomena which cannot be modelled easily in terms of textual words.
For instance, since structural information is normally not available in these approaches,
it is difficult to correctly translate sentences in which such information is necessary. For
example, in the translation of:
SL Eng: The city that I went to.
TL Spa: La ciudad a la que fui.
the exact position of the translation of `to' in the TL string has to be determined with
reference to the structure of the TL because Spanish does not allow preposition stranding;
calculating this position for the general case cannot be done simply by inspecting the
surface form of the SL or TL string. In addition, lack of sufficient data can be a problem
for these approaches, even when large corpora are used. For example, using word forms
instead of morphologically analysed lexical entries reduces the quantity of data for inflected
languages such as French and Spanish, leading to a decrease in the accuracy of their co-occurrence matrices. This problem has been addressed by Brown et al. (1992) where
morphological analysis, reordering and other types of operations are performed on SL
and TL texts before the monolingual and translation models are calculated; nevertheless,
structural problems such as the one just described remain.
Another problem is that the grammaticality of a translation cannot be guaranteed;
this is because the TL model is a bigram model which cannot account for several syntactic
restrictions present in natural languages. For instance, pure bigram models cannot cope
with subject-verb agreement when the noun in the subject is separated by an unbounded
number of elements from the inflected verb:
The boy runs.
The boy by the river runs.
The boy by the river which flooded runs.
etc.
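The limitation can be demonstrated with a maximum-likelihood bigram model trained on a four-sentence toy corpus. The Python sketch below is an illustration under stated assumptions: the corpus, the function names and the absence of smoothing are all choices made for the example and do not reflect the models of Brown et al.

```python
import math
from collections import Counter

def tokens(sentence):
    """Lower-cased words with a start marker; the final full stop is dropped."""
    return ["<s>"] + sentence.lower().rstrip(".").split()

def train_bigrams(sentences):
    """Return a function giving the relative-frequency estimate Pr(b | a)."""
    pair_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        words = tokens(sentence)
        for a, b in zip(words, words[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1

    def prob(a, b):
        if context_counts[a] == 0:
            return 0.0
        return pair_counts[(a, b)] / context_counts[a]

    return prob

def sentence_prob(prob, sentence):
    """Product of the bigram probabilities along the sentence."""
    words = tokens(sentence)
    return math.prod(prob(a, b) for a, b in zip(words, words[1:]))

corpus = [
    "The boy runs.",
    "The boys run.",
    "The boy by the river runs.",
    "The boys by the river run.",
]
prob = train_bigrams(corpus)

# Adjacent subject and verb: agreement is captured.
print(prob("boy", "runs"), prob("boy", "run"))

# Subject and verb separated: the model only sees `river' before the verb.
print(sentence_prob(prob, "The boy by the river runs."))
print(sentence_prob(prob, "The boy by the river run."))
```

Once a prepositional phrase intervenes, the verb is conditioned on `river' alone, so the grammatical and the ungrammatical sentence receive the same probability even though every training sentence shows correct agreement. Moving to trigrams only postpones the problem, since the intervening material can be made arbitrarily long.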
Availability of bilingual corpora is another factor which affects the viability of statistical
MT, at least at present, mainly because much translation either starts from
a non-electronic version of the SL text, or is done by a large number
of independent translators whose output cannot be easily collected into one place.
Although the above issues are important, most of them could be overcome, if not in
practice, at least in principle. Thus, one can imagine a sophisticated statistical technique
which took account of structural differences between languages, or of a language model
which allowed for unbounded syntactic dependencies. In his defense of KB approaches
to MT, Farwell (1992) argues that KB systems are better suited to the study of non-linguistic context, especially because this context is not normally present in the strings
that constitute a text. His argument is that given a SL text, there are potentially many
high quality TL texts which are translations of it; this is because of the underdeterminacy
of a text; that is to say, there is a one-to-many mapping of a SL text into its interpretation.
In addition, from each interpretation there is a one-to-many mapping into an appropriate
expression in the TL. For example, whether the name of a company is used to refer to the
company, to its board of directors, to its product or to some other entity associated with
it can only be inferred with a model of the non-linguistic properties of the text. Farwell
concedes however that a demonstration of the advantages of KB over statistical MT is not
possible because for any phenomena accounted for with a KB system, a statistical system
can be conceived which could account for the same phenomena.
I believe that the conclusion to be drawn from the preceding paragraphs is that, given
the generality of statistical MT in particular, and of so-called empirical approaches in
general, their main benet appears to lie in their motivated integration into KB or rule
based methodologies. In other words, statistical techniques, particularly those relevant to
MT, are not incompatible with rule based approaches, and insights and solutions from
the one can be incorporated into the other, as long as they are combined in a well-defined and motivated manner. For example, statistics on the co-occurrence of lexical
entries can improve the quality of translations in cases where rule based approaches cannot
be expected to detect very ne meaning distinctions and word associations. Thus, in a
situation where both `on' and `in' are appropriate translations of Spanish en, one of these
prepositions can be selected based on the context surrounding the preposition. More will
be said on this point later.
1.5 Properties of Transfer in MT
The systems just described constitute a wide range of transfer based approaches and
strategies to MT, each with its own advantages and disadvantages. Below, I summarize
the principal characteristics of each of the systems:
Metal Advantages: execution speed; extensively tested and developed.
Disadvantages: uni-directional; transfer rules are procedural; analysis and generation
use dierent grammars; not modular because of interaction between analysis, transfer
and generation; semantics of the formalism not well-dened due to the use of CF
grammars enhanced by local tests on features; structural transformations used during
analysis, transfer and generation.
CAT Advantages: multilingual; well-dened representation for transfer; modularity pre-
served between monolingual components.
Disadvantages: uni-directional; inecient and repetitive notation due to use of multiple and independent levels of analysis and representation; procedural and nonmonotonic formalism; non-reversible grammars.
ELU Advantages: declarative, well-defined and flexible formalism; bi-directional transfer;
reversible grammars; transfer relations expressed statically as correspondences
between FSs; default inheritance in the lexicon; separate analysis, transfer and generation modules; translation is efficient.
Disadvantages: efficiency achieved at the expense of completeness; untyped FSs;
transfer module highly dependent on SL analysis and therefore on its grammar.
LFG Structural Correspondences Advantages: different levels of linguistic information
constrain the translation relation; declarative representation of cross-linguistic
information; same grammar used for analysis and generation; modular.
Disadvantages: same grammatical theory must be used for all languages; transfer
module totally dependent on SL analysis and therefore on its grammar; representation for transfer unstable as it depends on available semantic analyses; lexical gaps
and other mismatches not naturally expressible.
Type Rewriting Advantages: well-defined, declarative, powerful formalism with general
inheritance mechanism; type system aids consistency checking; bi-directional transfer and reversible grammars; different sources of linguistic knowledge constrain the
translation relation; modular.
Disadvantages: structural transfer relationships make transfer dependent on source
and target grammar; no default inheritance implemented.
BCI Advantages: transfer at a single, well-defined level of representation; few structural
transfer rules; inference possible; determiners, pronouns and other lexical categories
available for transfer; bi-directional transfer and reversible grammars; efficient implementation; modular.
Disadvantages: untyped FSs; no general inheritance mechanism available; some surface form information such as function words is lost; no mechanism available for
expressing relationships in the bilexicon.
Rosetta Advantages: multilingual system; reversible grammars.
Disadvantages: grammars have to be tuned to each other in order to construct
isomorphic derivation trees; transformations in the grammars require explicit application orderings to be specified.
Shake-and-Bake and Indexed Logic Advantages: bi-directional transfer and reversible
grammars; modularity through independence of SL and TL grammars; lexical gaps
and structural mismatches handled in a unified way; no structural transfer rules necessary; no tactical generation problem (i.e. turning logical form into lexical entries
(van Noord 1991)).
Disadvantages: no mechanism for expressing regular relationships within the bilexicon; generation algorithm is exponential in the number of words in the worst case.
LKB Translation Links Advantages: well-defined, efficient and powerful formalism;
typed FSs; inheritance and default inheritance defined; can represent cross-linguistic
generalizations independently of system design; mechanism for expressing relationships within the bilexicon.
Disadvantages: not designed as an MT system; lexical gaps handled by relating
phrasal signs to lexical items.
Statistical MT Advantages: no explicit linguistic rules necessary; easily adaptable to
different subject domains and language pairs; robust; suited to translating real
texts; collocations and co-occurrences are captured uniformly within the same framework.
Disadvantages: structural conditions on transfer cannot be encoded; quality of TL
output limited because of n-gram model; TL ordering of words not accurately determined; improvements in quality require linguistic rules; requires large amounts of
bilingual corpora for training.
An ideal transfer system would incorporate all the desirable features from the above
systems, and avoid all the undesirable ones. As argued in Section 1.4.11 the starting point
for the present work will be a rule based system. With this in mind, I will now propose a
number of properties desirable of a rule based transfer MT system. Although the properties
are stated in relation to the transfer component, in many cases similar properties are also
desirable of the analysis and generation modules. The first four properties are adapted from
Rich (1983:201-2), where they are proposed as characteristic of good rule based systems.
Alshawi et al. (1992:294) also outline a number of properties desirable of an MT system,
including expressivity, compositionality, simplicity, reversibility and monotonicity; these
may be seen as a subset of the properties below. It will also be illuminating to consider
whether the system I will build upon, the LKB, satisfies these properties.
Representational Adequacy A transfer system must be capable of representing all
relevant cross-linguistic knowledge and generalizations. By exploiting the notions of
unification, subsumption and inheritance, tlinks in the LKB can express bilingual
correspondences concisely and with a sufficient level of generality and uniformity.
Inferential Adequacy New cross-linguistic relations should be derivable from existing
ones. For example, new bilingual entries should be derivable from existing entries.
The LKB is the only system of those considered which can infer new cross-linguistic correspondences from existing ones. It achieves this through tlink-rules
which expand the bilexicon by constructing new tlinks from existing ones.
Inferential Efficiency Existing cross-linguistic knowledge should guide the construction
of new cross-linguistic relations. Thus, the derivation of new bilingual entries should
not lead to exponential, or worse, computational behaviour. The time taken to
derive new tlinks through tlink-rules is proportional to the size of the bilexicon and
therefore efficiently computable.
Acquisitional Efficiency Cross-linguistic knowledge should be easily extended. Ideally,
the system should be able to acquire this knowledge (semi-)automatically (Trujillo
and Plowman 1991; Copestake et al. 1992; van der Eijk 1993). As a general system,
the acquisitional efficiency of the LKB depends to a certain extent on the application that uses it. If phrasal signs are allowed to appear on either side of a tlink,
acquisitional efficiency will be reduced compared with the case where only lexical
signs are allowed.
Declarativity Ordering of transfer relations should not affect the net result of transfer.
In addition, the representation formalism should allow the expression of all and only
the knowledge that is necessary for transfer, regardless of the particular algorithm
used to interpret it. Since TFSs have an implementation independent semantics,
tlinks, themselves TFSs, will also be implementation independent. In addition, their
operation is bidirectional, thus reinforcing their declarativity.
Monotonicity At least for research purposes, the effect of a transfer rule should not be
blocked by the addition of another transfer rule. Although this property may be
relaxed for practical systems in order to reduce TL ambiguity, the actual mechanism for achieving this can be made application specific. This property is to be
distinguished from the non-monotonic description of a particular transfer relation;
that is, the transfer module as a whole should be monotonic, whereas the description of particular transfer rules need not be. The reason is that default inheritance,
implemented as a non-monotonic operation, is a very useful mechanism for the description of particular transfer rules. In the LKB, the addition of a new tlink to the
bilexicon can at most increase the number of possible translation relations since the
interpretation of a tlink is independent of other tlinks.
Reversibility The system should translate to and from each language using the same
transfer component. This has already been mentioned as a feature of tlinks.
Modularity As far as possible, the transfer component should be separate from the analysis
and generation modules, particularly from their grammars and specific design.
This property is not relevant to the LKB as an independent system.
Transparency The structure of the transfer rules should reflect the structure of cross-linguistic
patterns. In other words, problems should not arise because of the formalism used for transfer but because of the complexity of the phenomena (cf. head
switching in LFG). Lexical rules and tlinks offer motivated and effective solutions to
a number of problems in lexical transfer.
Locality Transfer rules should only refer to the smallest context in which they are applicable
(Estival et al. 1990). Locality ensures maximum generality for a rule by
defining only the essential characteristics of the source and target structures which
enable the transfer relation to hold, therefore maximizing the number of such structures that are consistent with it. Compositionality may be seen as an instance of
locality: for a system to be compositional, the application of transfer rules must be as
independent from its context as possible. The application of tlinks and tlink-rules is
largely independent of context; therefore they operate with a high degree of locality.
Uniformity The representation necessary for transferring a particular construction should
be well-defined; if possible there should be a single level of description at which transfer is effected and which can encode the required transfer relations. Since all transfer
is specified in the bilexicon through a single operation on the tlinks, uniformity of
representation and processing is assured.
Consistency Checking Inconsistencies in the description of cross-linguistic knowledge
such as incorrect value assignments and structural descriptions, should be detected
as early as possible. This is an important and useful property available from the type
system in the LKB. The features which can appear at the same level in a FS must
be licensed by type constraints. It is thus possible to ensure that only linguistically
coherent features appear together in a FS.
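The kind of check described under Consistency Checking can be illustrated with a minimal sketch in Python. The type names, the feature names and the function `unlicensed` are hypothetical and chosen only for illustration; they are not the LKB's actual type system or interface.

```python
# Hypothetical type constraints: each type licenses a fixed set of features.
LICENSED = {
    "lex-sign": {"orth", "cat", "sem"},
    "tlink": {"source", "target"},
}

def unlicensed(fs_type, fs):
    """Return, in sorted order, the features of fs not licensed by fs_type."""
    return sorted(set(fs) - LICENSED[fs_type])

good = {"source": "en", "target": "on"}
bad = {"source": "en", "target": "on", "orth": "en"}
print(unlicensed("tlink", good))  # nothing to report
print(unlicensed("tlink", bad))   # the stray feature 'orth' is reported
```

Because the check only inspects the features present at one level of a structure, an incoherent description can be rejected when it is entered rather than when it is first used in transfer.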
Some of these properties will be considered again in Chapter 2 when assessing the
suitability of the representation for transfer developed.
1.6 Translation and Theories of Prepositions
The preceding section laid the foundations for the development of a transfer based MT
system and of a formalism for expressing cross-linguistic relations, both of which will
be carried out in Chapter 2. In this section I describe a number of proposals for the
translation of prepositions, progressing towards more linguistically oriented descriptions.
At the end of the section, I describe two theories which although not concerned with
translation will suggest some of the structures and properties that need to be captured
in a cross-linguistically valid description of spatial prepositions. Formal proposals for the
semantics of prepositions from a monolingual stance are not considered in this section;
their contribution will be indicated in Section 5.1.
Descriptions of the translation of prepositions are even more scant than those of transfer
MT systems in general. This may be due to the complexity of the problem or to the lack
of a research tradition in this area. Hence, the description below can only hope to be
representative of previous approaches. The systems considered are: Systran, Metal and
Eurotra. As for the theories described, I discuss that of Hjelmslev (1935) for the structure
of case, and the theory of spatial prepositions of Herskovits (1986).
1.6.1 Systran
The historical background to Systran was briefly presented in Section 1.1. Although Systran is not a
proper transfer system, some attention has been paid within its design to the translation of
prepositions. The description to follow is based on that of Hutchins and Somers (1992:177-86). In Systran, every SL lexical entry is assigned a set of semantic markers (e.g. `physical property', `container'), a semantic type if it is a noun (e.g. `animate', `countable'),
and a TL equivalent. The first step after input of the SL string is the translation of idioms, where an idiom may be any pattern that can be translated without further analysis.
Processing of the rest of the input then follows the sequence: morphology; compound
noun processing; homograph resolution; segmentation into main and subordinate clauses;
detection of main syntactic constituents (roughly NPs, VPs, etc.); identification of coordinated structures and other list-like expressions; identification of subject and object; and
finally identification of deep case relations such as agent and patient. Transfer proceeds
as follows: transfer of idiomatic expressions that assume some form of analysis; transfer of
prepositions; structural transfer; default transfer of remaining expressions; morphological
generation of word forms; reordering into TL order and other changes to the surface form
of the TL expression.
Translation of a preposition involves testing a semantic marker in the verb or noun
following or preceding the preposition. For example, in Russian-English translation, the
preposition do is translated as `up to' if the preceding verb or noun has the semantic
marker `+increase'. However, there is no general guidance for the assignment of semantic
markers nor are they based on any theoretical framework; their ad hoc nature can be seen
from the need to add the marker `+decrease' to translate the same Russian preposition
as `down to'. With such a solution there is no limit to the number of semantic markers
that may be added, and in principle the set of markers could grow indefinitely with the
addition of new vocabulary. This situation is not only undesirable from an engineering
point of view, but also from a linguistic stance. If semantic markers are developed on
a case-by-case basis their discriminatory properties will diminish since there will be no
guarantee that a marker will be required in all its relevant environments. For instance, a
semantic feature such as `+decrease' cannot be used in other contexts, even if relevant,
unless its precise meaning is indicated. Furthermore, given a lexical entry, a large number
of unrelated semantic markers might lead to inconsistencies since the presence of one
marker may imply the absence of another. Thus, there is nothing to prevent a noun
being assigned both `+increase' and `+decrease' markers other than the carefulness of the
linguist who does the coding; in cases where marking is done by different people, it is very
difficult to even approximate some coherent marking scheme in this way.
The system of semantic marking is not hierarchically organized either. For instance,
the marker HUM(an) is not subordinated to the marker AN(imate) and both must be
included in human nouns such as `teacher' and `doctor'. This lack of structure leads
to inefficient and redundant codings, and to complex conditions in the transfer lexicon.
To see this, consider the four unstructured markers `+living', `+animate', `+human' and
`+inanimate'. A verb such as `run', which requires an animate subject, will have to test
for the two features `+animate' and `+human'; this is because each can in principle be
used independently of the other. Even worse, a transfer rule for `die', which requires a
`+living' subject, will have to include checking all four features. Alternatively, all related
features could be present in a lexical entry such that only one marker needed to be tested
for, but this means that any new markers that are added to the system will have to be
included in every lexical entry or rule in which that marker is required. Thus, the addition
of the marker `+professional' will need to be explicitly added to a large number of lexical
entries and transfer rules; in addition, every noun that is marked `+professional' will also
require the markers `+human' `+animate' and `+living' to be explicitly encoded.
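The redundancy just described can be illustrated with a minimal sketch in Python. The marker names are those of the example above; the parent table and the functions `subsumes` and `flat_test_set` are hypothetical and do not reflect Systran's actual implementation.

```python
# Hypothetical hierarchy over the markers of the example (child -> parent).
PARENT = {
    "professional": "human",
    "human": "animate",
    "animate": "living",
    "living": None,
    "inanimate": None,
}

def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT[specific]
    return False

def flat_test_set(required):
    """Markers an unstructured rule has to list in order to cover `required`."""
    return sorted(m for m in PARENT if subsumes(required, m))

# With a hierarchy, a noun carries one marker and a rule makes one test.
print(subsumes("living", "professional"))
# Without one, every compatible marker must be enumerated in the rule.
print(flat_test_set("animate"))
print(flat_test_set("living"))
```

In the structured version, adding a marker such as `professional` requires one new line in the parent table; in the unstructured version every rule whose test set it enters has to be revised.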
To summarize, the problems with the translation of prepositions in Systran are unstructured knowledge representations and lack of generic procedures for manipulating this
knowledge. Part of the reason for such deficiencies is the lack of a coherent theory of the
semantics of verbs, nouns and prepositions. Unstructured and unmotivated knowledge
representations can work for specific cases, but they cannot form the basis for a generic
MT system.
1.6.2 Metal
An overview of Metal was given in Section 1.4.1; in this section I present its approach
to preposition translation. During transfer, prepositions not contained in the frame of
a word are treated as modifiers or adverbials; if they have alternative translations they
are translated based on the semantics of the governed noun. This is done by encoding
with each SL preposition its translation and the semantic restrictions that its complement
should satisfy. For example, the following is one of the several entries for German vor:
(vor (PREP ALL) 20 before (PREP ALL) 0
GC D
TY ABS PNT)
In this entry, 20 is the preference value of this translation indicating how appropriate
`before' is as a translation of vor. GC is the grammatical code of the German complement
of the preposition; in this case it indicates that the complement noun has to be in the
D(ative) case for translation to take place. TY is the complement's semantic type which
here may be either an ABS(tract) or PuNcTual NP. PREP ALL indicates the subject fields
in which this transfer rule may be used (all in this case). The 0 is a dummy preference
value not used during lexical transfer. At the point of lexical transfer, prepositions and
cases bound by a valency frame are handled by lexical transfer rules attached to the
bilingual entry to which the preposition belongs; these rules may check the lexical value
of the preposition in the subcategorized PP to decide on an appropriate translation (e.g.
bestehen translates as `consist of' if the PP complement has the preposition = aus). The
Metal project has tried to minimize the use of semantic markers for TL selection. Thus,
whenever possible, purely syntactic information has been used to translate a word. When
present, markers are simple and general, including values such as TEMPoral, CAUSative,
SPAtial and the like; their principal purpose is preposition disambiguation (Bennett and
Slocum 1988:117).
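The way such an entry constrains lexical transfer can be sketched in Python as follows. This is an illustration under stated assumptions, not Metal's implementation: the class `PrepEntry` and the function `translate_prep` are hypothetical, the second entry and its type `CONC` are invented to give the selection an alternative, and I assume that entries with a higher preference value are tried first.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrepEntry:
    source: str
    target: str
    preference: int    # assumption: higher values are tried first
    case: str          # GC: grammatical case required of the complement
    types: frozenset   # TY: admissible semantic types of the complement

ENTRIES = [
    # Mirrors the entry for 'vor' given in the text.
    PrepEntry("vor", "before", 20, "D", frozenset({"ABS", "PNT"})),
    # Invented entry, included only so that there is a choice to make.
    PrepEntry("vor", "in front of", 10, "D", frozenset({"CONC"})),
]

def translate_prep(prep, case, sem_type):
    """Return the target of the first entry, in preference order, whose tests pass."""
    candidates = sorted(
        (e for e in ENTRIES if e.source == prep),
        key=lambda e: e.preference,
        reverse=True,
    )
    for entry in candidates:
        if entry.case == case and sem_type in entry.types:
            return entry.target
    return None

print(translate_prep("vor", "D", "PNT"))
print(translate_prep("vor", "D", "CONC"))
```

Note that only properties of the complement are consulted, and that the result depends on the order in which entries are tried.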
Although the design of Metal improves considerably on that of Systran, its semantic
features are still not sufficiently motivated by a linguistic theory. Thus, what counts as
a punctual noun is more of an operational than an ontological decision, especially if this
decision is not supported by a set of consequences that are implied by the corresponding
marker. In other words, in Metal the German lexicon would classify `gate' as `punctual'
but not `house' in order to handle the translations below:
Ger: Vor unserem Hause stand eine Strassenlampe.
Eng: In front of our house stood a lamppost.
Ger: Die Soldaten sind vor dem Tore der Stadt.
Eng: The soldiers are before the gate of the city.
However, without the English translation there would be no clear way of motivating this
distinction.
Since the preference value determines the order in which different transfer rules are
tried, procedurality is another shortcoming of the transfer module. This means that new
lexical transfer rules can only be added after careful consideration of their interaction with
the rest of the transfer module.
Finally, because only the complement noun of a preposition is tested for markers in
the lexical transfer rules, restrictions originating in the modified phrase are not taken into
account. For example, movement verbs allow a range of translations not possible with
non-movement ones. The following example is taken from Asher and Sablayrolles (forthcoming):
Fre: Jean a couru dans le jardin.
Eng 1: John ran into the garden.
Eng 2: John ran in the garden.
These alternatives would not be possible given the approach to preposition translation
implemented in Metal.
1.6.3 Eurotra
Proposals for the translation of prepositions within Eurotra are varied. However, they all
rely on the notion of semantic features for the selection of the appropriate preposition.
In this section, I will first introduce the general approach to PP translation in Eurotra,
and then evaluate one system of semantic features developed under this project. Finally I
describe two specic proposals for PP translation: one approach using interlingua relations
in transfer and two approaches using semantic features and lexemes.
Approach to PP Translation
As noted in Section 1.4.2, transfer is effected in Eurotra at the level of IS. At this level prepositions in argument PPs are deleted and recoded on the verb which subcategorizes for them
(Durand et al. 1991:114). For example `depend on' becomes {lu=depend, pform-of-arg2=on}
in IS. Modifier or adjunct PPs, as described by Steiner et al. (1988b:64, 128), have
their prepositions encoded as part of the governed NP. For instance, a German PP of the
form `zu NP' becomes {cat=np, spec={lu=zu, space=goal}} in IS. A conflicting strategy
is described in Durand et al. (1991:119) where they are left as governors of the modifier
phrase; that is, they are not incorporated, or `featurized', into the NP; it seems that the
latter has been adopted as the official encoding at IS. Whichever way they are encoded,
modifier PPs are then transferred into the target IS depending on the preposition used
and using features percolated to the IS representation from the governing predicate and
complement NP. It should be pointed out that at IS, modifiers are sisters to the arguments
of a verb (Durand et al. 1991:110).
Semantic Features
An example system of semantic features used for transfer in Eurotra is the one described
by Zelinsky-Wibbelt (1988) who proposes four inventories of features for the semantic
classication of words. These semantic features are used in the disambiguation of word
senses, both in the construction of interface structures and in the selection of TL predicates.
The four inventories are: `situations' which are used with verbal predicates, `entities' which
classify nominal predicates, `properties' which describe adjective predicates, and `specifiers'
which describe adverbs, including prepositions and conjunctions. Of these four, she only
gives the entity and property feature systems. A generic feature system is also proposed
which applies to all lexical items; this system is summarized below:
(concrete vs. abstract) - situations, entities and properties.
(countable vs. mass) - entities.
(perfective vs. imperfective) - situations, properties.
These distinctions are based on the work of Vendler (1967), Dowty (1979), Talmy (1985)
and others. Thus, `mass' marks nouns such as `industry', `advance', `progress', `capacity'
and `range'.
The `property' feature system shown in Zelinsky-Wibbelt (1988:117) is given in Figure
1.15 in the form of a type hierarchy (see Section 1.4.10) with the exception that type e
corresponds to a null type. Such a detailed hierarchy of the semantic features needed
for translation would be very useful. Unfortunately, Zelinsky-Wibbelt (1988) gives very
few examples of how to use these features; in addition, the abbreviations adopted are not
explained and this leads to the uncertainty about some of the type names in the diagram.
[Figure 1.15: Structure of `property' feature system. Type hierarchy rooted in property, with features abstraction, temp_distr and gradedness. Type names in the diagram: top, property, abstraction, temp_distr, gradedness, phenomenon, semiotic, abstract, cognitive, perceptive, emotive, concrete, material, shape, prov(erbial?), characteristic, static, evaluative, quantitative, scale, qualitative, relation, dynamic, manner, process, graded, restrictive, nongraded, comp(arative?), locational, temporal, additive, emphasis, intensifying, amplification, downton(ing?), e.]
An analogous hierarchy to that presented in Zelinsky-Wibbelt (1988:123) for the `entity'
features is shown in Figure 1.16, but again, practically no examples are given for any of
the features described. In some cases it is not obvious how to classify a given noun; for
example is `pain' perceptive or emotive? Is `dream' active or passive? No criterion
is given for assigning nouns to any of these categories. There is no explanation of what
it means for a noun to be of type semiotic for instance, or of what can be inferred from
the type privative. If no explanations of this sort are given, it is difficult to imagine
how consistent marking can be achieved within the same language, let alone across many
languages. No doubt information of the above nature is needed for MT; what is overlooked
is that the English labels given to the types above are really only as good as the tests that
can be used to distinguish between them and, for MT in particular, the degree to which
such tests can be `translated' and applied consistently in another language.
Interlingua Relations in Transfer
The first approach to PP translation in Eurotra that I will consider is that described
by Zelinsky-Wibbelt (1990); it relies on assuming an interlingua locative relation which
is instantiated to a particular preposition by the TL grammar. As an example of this
approach consider the translation of Juan viaja en el bus as `John travels on the bus'.
[Figure 1.16: Structure of `entity' feature system. Type hierarchy rooted in entity. Features declared in the diagram: abstraction, quantification, humanness, origin, information, temp_dist, perspective, boundedness, dividedness, complexity, distribution, mental_mode, consciousness, definitional, spatial. Type names in the diagram: top, entity, abstraction, concrete, abstract, phenomena, origin, quantification, definitional, count, boundedness, dividedness, complexity, distribution, spatial, individual, collective, partitive, sortal, numerative, privative, mental_mode, social, convention, institution, locative, temporal, cognitive, perceptive, emotive, consciousness, active, passive, information, semiotic, temp_dist, static, dynamic, discontinous, continuous, humanness, scale, mental, mensural, mass, determinate, nonhuman, human, artifact, natural, perspective, activity, accomplishment, achievement, inchoative, resultative, e.]
Analysis of the Spanish sentence results in an IS structure in which the feature-value
pair place = part of lm represents the interlingua relation just mentioned. Transfer rules
would map the feature bundles of the Spanish IS structure into English IS structure.
Instantiation of the lexical item `on' in the English grammar would then proceed from
this IS structure as follows. The feature-value pair typical function = large vehicle in the
structure for `bus', in conjunction with a verb of motion, causes the feature-value pair
salient = transportability to be added to the subject of the verb, as well as adding the
feature-value pair salient shape = surface to the object of the PP (the bus). Another
instantiation rule adds the feature-value pair relevant = support to the feature bundle
containing the interlingua relation. Finally, a lexical entry sensitive to the feature-value
pair relevant = support adds the lexical string `on' as the appropriate English preposition
that expresses this interlingua relation. These steps are made explicit in the simplified
example below. In representing the interlingua relation, the feature-value pair place = ..
part of lm indicates that the location of John is in some part of the landmark `the bus'.
The structures below have been flattened to make them easier to follow; curly brackets
represent feature-value bundles.
The following instantiation rules add the feature-value pair in bold when the rest of
the bundle unifies with the input representation. The lexical entry for `on' is also shown
as part of this group of structures.
Rule I) {cat = v, activity = motion}, {cat = np, salient = transportable}, {cat = p, place = .. part of lm}, {cat = n, typical function = large vehicle, salient shape = surface}
Rule II) {cat = v}, {cat = np, salient = transportable}, {cat = p, place = .. part of lm, relevant = support}, {cat = n, salient shape = surface}
{cat = p, lex = on, relevant = support}
A step by step example follows (bold type represents added information).
1. Input: Juan viaja en el bus
2. Spanish Interface Structure: {cat = v, lex = viajar, activity = motion}, {cat = np, lex = juan}, {cat = p, place = .. part of lm}, {cat = n, typical function = large vehicle}
3. English Interface Structure: {cat = v, lex = travel, activity = motion}, {cat = np, lex = john}, {cat = p, place = .. part of lm}, {cat = n, lex = bus, typical function = large vehicle}
4. After instantiation rule I: {cat = v, lex = travel, activity = motion}, {cat = np, lex = john, salient = transportability}, {cat = p, place = .. part of lm}, {cat = n, lex = bus, typical function = large vehicle, salient shape = surface}
5. After instantiation rule II: {cat = v, lex = travel, activity = motion}, {cat = np, lex = john, salient = transportability}, {cat = p, place = .. part of lm, relevant = support}, {cat = n, lex = bus, typical function = large vehicle, salient shape = surface}
6. At some point during generation: {cat = p, lex = on, place = .. part of lm, relevant = support}
7. Output: John travels on the bus
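Steps 3 to 7 above can be sketched in Python as follows. This is a simplified illustration and not Eurotra's rule formalism: feature bundles are modelled as dictionaries, matching is by equality rather than unification, the value of place is abbreviated to `part_of_lm`, and the function names are hypothetical.

```python
def find(bundles, **tests):
    """Return the first feature bundle that satisfies every feature-value test."""
    for bundle in bundles:
        if all(bundle.get(k) == v for k, v in tests.items()):
            return bundle
    return None

def rule_1(bundles):
    # A motion verb plus a large vehicle makes transportability and surface salient.
    verb = find(bundles, cat="v", activity="motion")
    subj = find(bundles, cat="np")
    noun = find(bundles, cat="n", typical_function="large_vehicle")
    if verb and subj and noun:
        subj["salient"] = "transportability"
        noun["salient_shape"] = "surface"

def rule_2(bundles):
    # A transportable subject and a salient surface make support relevant.
    subj = find(bundles, cat="np", salient="transportability")
    prep = find(bundles, cat="p", place="part_of_lm")
    noun = find(bundles, cat="n", salient_shape="surface")
    if subj and prep and noun:
        prep["relevant"] = "support"

LEXICON = [{"cat": "p", "lex": "on", "relevant": "support"}]

def realise_prep(bundles):
    """Return the lexical string whose entry matches the preposition bundle."""
    prep = find(bundles, cat="p")
    for entry in LEXICON:
        if prep and all(prep.get(k) == v for k, v in entry.items() if k != "lex"):
            return entry["lex"]
    return None

english_is = [
    {"cat": "v", "lex": "travel", "activity": "motion"},
    {"cat": "np", "lex": "john"},
    {"cat": "p", "place": "part_of_lm"},
    {"cat": "n", "lex": "bus", "typical_function": "large_vehicle"},
]
rule_1(english_is)
rule_2(english_is)
print(realise_prep(english_is))
```

The sketch makes visible that the choice of preposition rests entirely on the values assigned to the noun and the verb, with nothing in the rules themselves saying when those values apply.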
Zelinsky-Wibbelt's solution has one main disadvantage and that is that it relies crucially on the values large vehicle and surface without indicating how one might decide
when it is appropriate to assign either value to a noun. Neither is it clear how the various
properties and semantic features interact with each other. For instance, the sentence `the
passengers walk inside the ferry' could not be generated given Zelinsky-Wibbelt's description because the preposition `on' would be selected instead of `inside'. The criticism here
is not that this problem cannot be overcome, but that there is no clear definition of the required semantic features nor of their interaction with each other. Finally, the notion of an
interlingua locative relation is vague since there is no restriction on the spatial prepositions
that can express this relation.
Semantic Features and Lexemes
The approach just described consists in deriving an interlingua representation of the meaning of the preposition and letting the TL grammar instantiate the appropriate TL preposition. By contrast, the two approaches described by Durand (1992), Badia et al. (1990)
and Melero et al. (1990) both retain the preposition that gives rise to the relations and
augment it with a number of semantic features.
The first approach does not use semantic features in transfer rules, relying instead on
a separate mapping which will copy the semantic features of the source IS into the target
IS.
{lu=for} => {lu=por}
{lu=for} => {lu=para}
Thus, one source IS will transfer into two target ISs, one of which is discarded by the TL
grammar using the semantic features copied from the SL. In other words, the source and
target IS will have identical semantic features. For example, the value of modsr in the IS
below would be copied into the corresponding Spanish IS.
{cat=pp, modsr=cause}
    {role=gov, modsr=cause}
        {lu=for}
    {role=arg1, cat=np}
        lack-of-funds
In the second approach, semantic features are used to select the corresponding TL
preposition during transfer:
{lu=for, modsr=cause} => {lu=por}
{lu=for, modsr=aim} => {lu=para}
Such rules would effect disambiguation at the point of transfer, based on the features
percolated from the complement and governing phrases of the preposition.
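The difference between the two approaches can be sketched in Python as follows. The lexemes and modsr values are those of the rules above; the table `TL_ACCEPTS`, which stands in for the filtering done by the TL grammar, and the function names are hypothetical.

```python
# First approach: transfer rules mention lexemes only.
CANDIDATES = {"for": ["por", "para"]}
# Hypothetical stand-in for the TL grammar: the modsr value each lexeme accepts.
TL_ACCEPTS = {"por": "cause", "para": "aim"}
# Second approach: the semantic feature is tested in the transfer rule itself.
TRANSFER = {("for", "cause"): "por", ("for", "aim"): "para"}

def approach_1(lu, modsr):
    """Transfer every candidate, copy modsr across, let the TL side filter."""
    targets = [{"lu": t, "modsr": modsr} for t in CANDIDATES[lu]]
    return [t["lu"] for t in targets if TL_ACCEPTS[t["lu"]] == t["modsr"]]

def approach_2(lu, modsr):
    """Select the TL preposition at the point of transfer."""
    return TRANSFER[(lu, modsr)]

print(approach_1("for", "cause"))
print(approach_2("for", "aim"))
```

Both functions reach the same lexeme for a given feature value; they differ only in which component holds the knowledge that pairs feature values with TL prepositions.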
As far as the semantic features are concerned, a tentative inventory is proposed of
which a portion is shown in Figure 1.17. An example definition of one of these features is
that for place path (Durand 1992:11):
If a modifier of a predicative head denotes the locale which is traversed or passed along in
the course of the event (or process) described [including figurative senses] it has the semantic
relation place path, i.e. {modsr=place path}. [Examples:]
The project has progressed a lot along these lines.
The wind whistled across the prairies.
Through the window one can see the railway station.
modsr
    place
        position
            in
            on
        goal
        origin
        path
    cause
    aim
    concern
    accompaniment
    instrument
    ...
Figure 1.17: Semantic features for PP translation.
Determining whether a PP expresses one semantic feature or another is done through
instantiation rules which operate on IS structures. Thus, a simplified rule which adds the
semantic feature modsr=cause to an English IS representation would be:
{modsr=cause, role=mod, cat=pp}
[...{cat=np, ..., human=no}...]
This rule would apply to phrases such as `for lack of funds', adding the feature cause.
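A minimal Python sketch of how such an instantiation rule might be applied is given below. The dictionary encoding of IS nodes, the "daughters" key and the function name add_cause are assumptions made for illustration; the actual rule formalism is not specified in this much detail above.

```python
def add_cause(node):
    """Return a copy of `node` with modsr=cause added when the simplified rule matches.

    The rule matches a PP modifier dominating a non-human NP daughter
    (hypothetical dict encoding of an IS node).
    """
    is_pp_modifier = node.get("role") == "mod" and node.get("cat") == "pp"
    has_nonhuman_np = any(
        d.get("cat") == "np" and d.get("human") == "no"
        for d in node.get("daughters", [])
    )
    if is_pp_modifier and has_nonhuman_np:
        return {**node, "modsr": "cause"}
    return node


# `for lack of funds' as a PP modifier with a non-human NP complement.
pp = {
    "role": "mod",
    "cat": "pp",
    "daughters": [
        {"role": "gov", "lu": "for"},
        {"role": "arg1", "cat": "np", "human": "no", "lu": "lack-of-funds"},
    ],
}
print(add_cause(pp)["modsr"])
```

Because the rule is simplified, the sketch marks every PP modifier with a non-human NP complement as a cause; a realistic rule set would need further conditions.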
Although this description is very brief it shows the main features of each approach.
One problem with both proposals is that the definition of each feature is at best circular
and does not indicate what properties each feature implies. For instance, should `round'
be assigned a place path feature? A positive answer is not more plausible than a similar
one for `in' which intuitively should not receive this feature. As for the benefits of either
approach, each seems suited to particular situations, and since they are compatible with
each other there is no need to totally exclude either. Thus, tests during transfer can be
made when a distinction in the TL cannot be made by the SL grammar.
This concludes the discussion of PP translation in transfer MT systems. Both of the
following sections deal with the general structure of spatial prepositions.
1.6.4 Hjelmslev's Theory of Cases
The theory of cases of Hjelmslev (1935) is relevant to this thesis not only because of its
historical value but because it proposes a structure for case and prepositional systems
which is supported by data from very different languages. In what follows, I will reference
the Spanish translation of Hjelmslev's study as Hjelmslev (1978). Hjelmslev's conception
of case is that of a relation between two objects both of which can be of a nominal or a
verbal nature; in addition a case may be expressed by an inflection or by a preposition and
may indicate a complement or an adverbial phrase (Hjelmslev 1978:135-37). The theory
proposes that the system of cases in any language expresses combinations of meanings
ordered along three dimensions. This suggestion is based on data from languages, such as
the Caucasian language Tabasaran, with a large number of cases (Tabasaran has around
50 cases). The three meaning dimensions he proposes are: 1) direction, 2) coherence, 3)
subjectivity. Each of these dimensions has three values: positive, negative and neutral.
1) The direction dimension expresses the notions of approach, neutrality and separation
in this respect. This dimension can clearly be seen in the case system of Hungarian, where
several cases and postpositions have three forms: the `to' form (approach), the locative
form (neutral) and the `from' form (separation).
2) The coherence dimension can have two versions: inherence and adherence. The
inherence version has the notions of being inside, being outside or being neutral to either.
These can be exemplified by the three English prepositions in, out and between (that is,
between seems to mean enclosure and non-enclosure at once). The adherence version of
this dimension has the notions of contact, neutral and non-contact; for instance, the three
prepositions on, under and above express these distinctions.
3) The subjectivity dimension expresses the notions of objective, neutral and subjective
point of view. Hjelmslev's examples along this dimension are not particularly clear. For
instance, keeping the first two dimensions constant, he offers the following variations in
meaning: being under (objective), being between (neutral) and being behind (subjective).
That is, an object can be under another object irrespective of the point of view taken
(at least in most cases). On the other hand, things are behind other things in relation to
a given point of view or in relation to the object itself. The reason for `between' being
classified as neutral with respect to subjectivity is not as easy to explain, and I will refrain
from attempting to do so here.
Figure 1.18: Hjelmslev's Three Dimensional Theory of Case.
Figure 1.18 summarizes Hjelmslev's ideas; this diagram, adapted from Hjelmslev (1978:
182), should be interpreted as follows: the direction dimension is represented by the vertical
axis running from top to bottom, coherence is represented by the horizontal axis going
from left to right, and subjectivity is represented by the three separate boxes. I have
included sample English prepositions in each box to give an idea of what the combination
at a certain square indicates; these examples are not to be taken as definitive, especially
since Hjelmslev does not provide examples for all the squares. Where a sample English
preposition is an original example from Hjelmslev, I have indicated this with (h); I have
constructed the other examples either by translating prepositions from other languages
or by compositionally combining English prepositions. For example, in the rightmost
box, on the top, middle square is the notion of "going behind" or in other words the
meaning expressed by the English phrase `to behind'. This value is a combination of
the notions of approach (dimension one), neutrality with regards to touching or being
contained (dimension two) and being behind, that is to say subjective (dimension three).
Although this might seem like a complex combination, such a value for a case is supported
by one of the cases in Tabasaran in which the suffix -qc -indi expresses this notion (the
non-standard phonetic notation is irrelevant for the purpose here): fu're-qc-indi `to behind
the car' (Hjelmslev 1978:202).
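The three-dimensional organization described above can be made concrete with a small Python sketch. The class and member names are illustrative assumptions rather than Hjelmslev's terminology, and the coherence dimension is reduced to its three abstract values without distinguishing the inherence and adherence versions.

```python
from enum import Enum
from itertools import product
from typing import NamedTuple


class Direction(Enum):
    APPROACH = "approach"
    NEUTRAL = "neutral"
    SEPARATION = "separation"


class Coherence(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


class Subjectivity(Enum):
    OBJECTIVE = "objective"
    NEUTRAL = "neutral"
    SUBJECTIVE = "subjective"


class CaseValue(NamedTuple):
    """One cell of the three-dimensional case space (hypothetical encoding)."""
    direction: Direction
    coherence: Coherence
    subjectivity: Subjectivity


# Three values on each of three dimensions give 27 possible combinations.
ALL_CELLS = [CaseValue(*cell) for cell in product(Direction, Coherence, Subjectivity)]

# `to behind': approach, neutral with regard to coherence, subjective.
TO_BEHIND = CaseValue(Direction.APPROACH, Coherence.NEUTRAL, Subjectivity.SUBJECTIVE)


def vary_direction(value):
    """Hold coherence and subjectivity constant and enumerate the direction values."""
    return [value._replace(direction=d) for d in Direction]


for cell in vary_direction(TO_BEHIND):
    print(cell.direction.value, cell.coherence.value, cell.subjectivity.value)
```

Varying only the direction of TO_BEHIND gives the three-way paradigm (roughly `to behind', `behind', `from behind') of the kind noted above for Hungarian cases and postpositions.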
Strictly speaking, the values within a dimension may occur simultaneously in a case.
For example, the nominative case has, according to Hjelmslev, a directional component
which includes the values of going, being and furthering, all at once. Since I am not
concerned with purely grammatical cases, and also because most of the properties I wish
to capture may be explained without assigning multiple values within the same dimension,
I will not investigate these combinations any further.
The main import of this theory is that it organizes the spatial relations found in
the case system of several languages in a structured way. Although Hjelmslev (1935)
intended this classification of cases to apply to other semantic fields, the actual connection
is tenuous; thus, for the above Tabasaran example, the same case would be used for the
PP in `look behind you' but the notion of movement is less clear. The main disadvantage
with the theory is that it is not sufficiently explicit and there is no formalization of the
various values and dimensions, making classification difficult. For example, the distinctions
between `along', `between' and `through' in the middle column of the middle box seem
arbitrary.
1.6.5 Herskovits' Theory of PP Meanings
Herskovits (1986) proposes a theory for the semantics and pragmatics of locative expressions of which typical examples are the teapot on the table and the man behind the counter.
One of the purposes of the theory concerns the process of mapping into and out of the
meaning representations of such expressions using different types of knowledge. I will give
an overview of the theory going into some detail in those areas which will be relevant to
the thesis.
Ideal Meanings, Use Types, Situation Types
Herskovits' theory of spatial PP meaning is centred around the notions of Ideal Meanings.
Prepositions have Ideal Meanings, similar to the way that nouns have prototype meanings.
For example, the ideal meaning of the preposition `in' is (1986:48):
Ideal Meaning of: in
Inclusion of a geometric construct in a one-, two-, or three-dimensional geometric construct.
Each preposition has one Ideal Meaning, and associated with it are a number of Use Types.
Use Types encode the various uses (senses) that the preposition has in addition to its Ideal
Meaning. For example, the following are some of the Use Types of `in', together with an
example of their use (Herskovits 1986:149-55):
Use Types of: in
a) Spatial entity in container: the preserves in the jar
b) Gap/object "embedded" in physical object: the fish in the water
etc.
A Situation Type is the result of combining a Use Type with the nouns in an expression
in order to construct a representation of its meaning in predicate argument form. For
example, the Situation Type for Use Type a) above would be:
Situation Type for: the preserves in the jar
A(Included) [Place(Preserves),Interior(Place(Jar))]
where `A' is a tolerance shift which operates over the relation `Included'. The geometric
functions `Place' and `Interior' map objects and places into places and interior volumes
respectively; this follows the idea that real world objects used in spatial expressions are
mapped into geometric objects. A Situation Type is derived through a process of analysis
involving complex interactions between linguistic, object and world knowledge, extensive
use of typical functional and spatial relationships between objects, and notions of salient
parts and possible conceptualizations of objects. Situation Types represent "those aspects
of meaning that are the inalienable property of the expression alone" (Herskovits 1986:4).
That is, they do not take into account the actual context in which the utterance occurs.
Another way of understanding the difference between Ideal Meanings, Use Types and
Situation Types is to see them as expressing increasingly less abstract information, with
Ideal Meanings as the most abstract representation and Situation Types as the least abstract representation; thus, Situation Types contain the most specific information regarding a spatial scene. Although fairly concrete, Situation Types are still ambiguous in many
respects. Consequently, Herskovits defines Normal Situation Types, which are Situation
Types where objects behave and interact in normal ways. The processes of analysis and
generation in Herskovits' theory map into and out of Situation Types; these mappings
use different types of knowledge and assume various relations and functions which are
described below.
Knowledge Types
Information for analysis and generation is organized in the form of a knowledge hierarchy with three levels. The highest level of knowledge deals with general properties about
physical entities: matter, gravity, balance, movement and appearance; for example, it
allows normal interpretations of an expression such as the condition that in `Mary is at
the gate', Mary and the gate cannot occupy the same volume. Next there is an account
of the general properties of solids, liquids and gases. For example, the fact that water
can contain objects allows the correct interpretation of `the fish is in the water' (cf. `the
fish is in the net'). The lowest level consists of knowledge regarding objects in the real
world; this knowledge describes the following: shape of an object, its size, gravitational
properties, characteristic orientation, geometric conceptualizations, typical physical context, function, action performed with the object, normal interaction with another object,
and interactionally salient parts. For instance, the characteristic orientation of a table
determines that the expression `the cup on the table' means that the cup is on the top
surface of the table, with this surface supported by its legs.
Relations and Attributes
Apart from the above types of knowledge, the theory contains a list of relations and
attributes which appear in the Ideal Meaning of a preposition. This list is divided into five
classes of concepts and includes examples of prepositions requiring each concept in their
meaning. The five classes are as follows. 1) Topological: these are properties which encode
features which remain constant under topological transformations; they are often explained
using the rubber sheet analogy: if a figure is drawn on a rubber sheet and this sheet is
stretched in various ways, any property of the figure which remains constant is a topological
property. For example, shape and size are not topological properties but separation is:
two points drawn on a rubber sheet cannot be merged by merely stretching the sheet; after
stretching, the two points remain separated, hence separation is a topological property.
2) Geometrical: these concepts rely on Euclidean geometry for their interpretation. 3)
Physical: concepts in this class involve notions such as gravitation, support and force.
4) Projective concepts involve the idea of projecting a human frame of reference onto an
object. For example, in saying that the ball is in front of the car the notion of front is
projected onto the car based on its shape and normal direction of movement. 5) Metric
concepts involve notions such as small, large, bigger than, equal, etc. Below I show the
part of this list which contains geometric objects and relations (Herskovits 1986:55):
(2) geometrical:
objects
straight line: along, behind, ...
plane (horizontal plane with behind, in front of, ...)
cross axis of a two dimensional strip: across
relations
alignment of points: between, behind, ...
parallelism of lines: along
alignment with direction (e.g. vertical): over, under, ...
orthogonality of lines: to the right/left
When used in a spatial expression, these relations and attributes become part of the
expression's Situation Type. Thus, the physical relation `support' would be involved in
the Situation Type for the phrase `the cup on the table':
A(Support) [Place(Cup),TopSurface(Place(Table))]
Geometric Descriptions
The final set of concepts in the theory are geometric descriptions. Each of the relations
above takes as arguments regions of space, which are obtained by applying geometric
description functions to the objects in the scene. A Situation Type is constructed from
combining one of the relations in the previous section and a geometric function. For
example, the Situation Type of `the bird is in the bush' is shown below:
Included(Part(Place(Bird)),
(Interior(Outline(VisiblePart(Place(Bush))))))
The geometric functions here are Part, Place, Interior, Outline and VisiblePart. All
geometric functions have as arguments and as result regions of space, except for the
function Place, which takes spatial entities as arguments and returns regions of space.
Herskovits (1986:64) proposes an inventory of geometric functions which map regions of
space into their parts (e.g. edge), their idealizations (e.g. point), good forms (e.g. outline),
adjacent volumes (e.g. surface lamina), axes (e.g. main axis) or projection (e.g. projection
on the ground).
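The way a Situation Type is built by composing a relation with geometric description functions can be sketched in Python as nested terms. The Term class, the fn helper and the well_typed check are hypothetical names introduced for illustration; the sketch only records the structure of the expression and the argument constraint stated above (Place takes spatial entities, every other function takes regions), and it computes no actual geometry.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A functor applied to arguments; string arguments stand for spatial entities."""
    functor: str
    args: Tuple[Union["Term", str], ...]

    def __str__(self):
        return f"{self.functor}({','.join(str(a) for a in self.args)})"


def fn(name):
    """Build a constructor for terms headed by `name` (illustrative helper)."""
    return lambda *args: Term(name, args)


Place, Part, Interior, Outline, VisiblePart, Included = map(
    fn, ["Place", "Part", "Interior", "Outline", "VisiblePart", "Included"]
)


def well_typed(term):
    """Place must apply to entities; everything else must apply to regions (terms)."""
    if term.functor == "Place":
        return all(isinstance(a, str) for a in term.args)
    return all(isinstance(a, Term) and well_typed(a) for a in term.args)


# Situation Type for `the bird is in the bush'.
bird_in_bush = Included(
    Part(Place("Bird")),
    Interior(Outline(VisiblePart(Place("Bush")))),
)
print(bird_in_bush)
print(well_typed(bird_in_bush))
```

A term such as Interior("Bush") would fail the check, since Interior is applied directly to an entity rather than to the region returned by Place.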
Comment
Although the above theory is one of the most detailed and encompassing descriptions of
the meaning and usage of spatial prepositions, there are serious limitations on its applicability to MT. For example, the system described by Japkowicz and Wiebe (1991), which
embodies some of Herskovits' ideas, shows that the range and quantity of information,
and the sorts of operations carried out on it for the interpretation of spatial expressions,
make it difficult to extend it beyond prototype domains.
One therefore has to ask whether a theory such as that of Herskovits is the most appropriate one for PP translation. The main issue is that, whereas the goal of the above theory
is to construct meaning representations of spatial expressions, an MT system is concerned
with converting a spatial expression in one language into an equivalent expression in another language. Whether the type of understanding envisaged by Herskovits is necessary
for translation is a contentious issue. There are many reasons for not choosing a full understanding approach to translation and choosing a more limited representation instead;
such reasons were mentioned in the context of interlingua MT systems in Section 1.1.1. I
will add here that in the above theory, whenever a concept or predicate is introduced, its
properties and meaning are defined from a language independent point of view (as in the
case of gravity, properties of liquids, etc.) or from an English monolingual point of view
(as in the case of disallowing point conceptualizations of countries, e.g. at England, but
not of train stations; e.g. PtApprox(Place(VictoriaStation)), Herskovits (1986:67)).
However, what is needed in MT are structures motivated using both multilingual and
monolingual criteria. That is, using those structures or features which are common and
those which are different between languages. This in effect rejects the need for language
independent world knowledge if it has no bearing on the linguistic aspects of translation.
For instance, a full analysis as proposed by Herskovits (1986) would involve computing the
distance between the cabin and the lake when interpreting `the cabin at the lake' (Olivier
and Tsujii 1994); but this distance would be irrelevant for translating this phrase into
Spanish because it has no effect on the selection of the appropriate TL preposition.
Another important problem is that of precision. Herskovits' descriptions and explanations of the various types of knowledge, relations and functions are vague. For example, she
suggests that the geometric conceptualization of a road is that of a line, but there is no
mention of how this conclusion is arrived at. In other words, there is no mechanism for
determining whether a noun such as `pavement' should also be conceptualized as a line or
not. Part of the reason for this lack of precision is that Herskovits is not attempting to
give a formal account of spatial expressions but a preliminary survey of the issues involved.
Nevertheless, this leaves in doubt the possibility of a computational account of the theory.
1.6.6 Conclusion
The principal shortcomings in all the previous systems and theories can be attributed
to the lack of precision in the description of semantic features and of spatial relations.
For example, semantic markers in Systran are continually created in order to cope with
new disambiguation problems, whilst semantic features in Eurotra are quite sophisticated
but their definitions lack a consistent and precise formulation. While Durand (1992)
attempts to give more precise meanings for his semantic relations, the definitions fall
short of providing sufficiently independent evidence for their existence (see definition of
place path in Section 1.6.3). Finally, the case system proposed by Hjelmslev (1935) does
not supply the mechanisms by which values from different dimensions are combined, whilst
Herskovits' theory, although attractive in principle, is not computationally feasible.
An ideal solution to these problems would have a computationally tractable description
of the semantics of a sentence which would be sufficiently detailed for translating sentences
appropriately, but which was not so far removed from the linguistic form of a sentence
that translation was reduced to paraphrase. Such a solution would require appropriate
and precise semantics for nouns, verbs and spatial prepositions, a description of the way
these semantic descriptions are to be encoded and used, and a demonstration of how
such encodings would be applied algorithmically to the translation and disambiguation of
spatial expressions. The work reported here is a step towards these goals.
1.7 Overview of the Thesis
The remainder of this thesis is structured as follows. Chapter 2 has as its main purpose
that of motivating the representation used by the transfer module when mapping one language into another. The next two chapters describe analysis, transfer and generation in
that order: Chapter 3 describes the grammars and how they are used in parsing, while
Chapter 4 concentrates on transfer and generation. Chapter 5 develops a classification
of spatial relations and shows how they are incorporated into the MT system proposed.
Chapter 6 gives an example of the complete translation process and then describes how
disambiguation of spatial prepositions takes place by adopting a specic noun knowledge
representation. Chapter 7 describes an evaluation carried out on translation quality and
scalability for the system implemented. Chapter 8 is the conclusion and includes a summary of the main ideas developed in the thesis, together with answers to some possible
objections to this work and suggestions for future research.
Chapter 2
Representation for Transfer
The construction of a transfer module can be usefully described as consisting of two devices:
a knowledge representation (KR) formalism and a transfer representation. In this chapter
I will use this division to describe and motivate the transfer representation that I have
developed. First, I elaborate on the KR formalism/transfer representation distinction and
suggest TFSs as the appropriate formalism in which to develop an MT system. Then, I
present a lexically based transfer representation which overcomes a number of difficulties
found in other approaches, particularly those based on recursive predicate-argument (P-A) structures. The proposed representation satisfies most of the criteria for an adequate
transfer system as proposed in Section 1.5.
2.1 Separation of KR Formalism from Transfer Representation
One of the main disadvantages of some of the earlier MT systems was the conflation of
linguistic and algorithmic knowledge within the same system. An important reason for
this was the lack of a linguistically adequate framework in which to express linguistic facts
separately from procedural information. Thus, transfer rules in Metal mix Lisp function
names with system-defined procedures to produce a very unconstrained formalism whose
semantics is ill-defined.
Of the first large MT projects to follow a strict division between linguistic and algorithmic knowledge, the Ariane-GETA project is probably the largest (Vauquois and
Boitet (1988) and references therein). In this system a special language for the description
of tree-to-tree transformations, Robra, was developed which allowed linguists to describe
transfer mappings without much concern for the way these mappings were executed. This
meant that the algorithm which applied the rules could be modied without requiring
changes in the contrastive knowledge base. Most recent MT systems have adopted a strict
separation between linguistic and algorithmic knowledge. In addition, the latest systems
have witnessed an additional level of separation, this time between a general KR formalism and the linguistic knowledge that it expresses. The formalism is feature-based, which
makes it largely independent of any programming language or particular linguistic theory; in addition, a wide range of linguistic theories and phenomena can be expressed in
it. Eurotra's feature bundles belong to this family of feature-based formalisms. Feature
bundles are used in Eurotra to describe analysis, transfer and generation rules and in the
description of structures at different representation levels. ELU uses feature structures
to express translation correspondences at a single and unied level of analysis, whereas
in Type Rewriting systems, typed feature structures are used for carrying out analysis,
transfer and generation. From now on, I will only be concerned with the separation of
feature-based formalisms from the linguistic objects that they encode. This separation is
inspired by the comments of Shieber (1987) who argues for a distinction between linguistic
theories and the formalisms used to describe those theories.
Kay (1979) introduced the use of unification into Computational Linguistics. Since
then, the computational and linguistic developments of this operation and associated formalisms have been substantial. Feature-based formalisms were first successfully applied to
the analysis problem in NLP, and in particular to the syntactic description of languages;
computationally tractable descriptions of topicalization, relative clauses, control, agreement, complementation and other phenomena have been achieved using feature-based or
unification-based frameworks (Shieber 1986; Pollard and Sag 1994). Apart from having
their own semantics and therefore being largely implementation independent, feature-based formalisms are declarative (except in certain extensions to the formalism) and offer
a well-defined and efficient way for describing linguistic objects (see Section 1.4.10). Another characteristic of their development has been the construction of grammatical theories
which incorporate different levels of linguistic analysis within a uniform framework (Pollard and Sag 1994). Grammatical theories of this kind are called sign based (see Section
1.4.10) and normally include information from morphological, syntactic and semantic levels of analysis.
Feature-based formalisms, then, are a well established and developed computational
tool for implementing NLP and MT systems. As part of this family, the TFSs in the
LKB have a base of empirical support for their viability. In Section 1.5 I gave some of the
reasons why TFSs also satisfy other, more theoretical criteria expected of a KR formalism
for transfer MT. For these reasons, the LKB has been adopted as the basic framework for
the system I have developed.
2.2 Representation for Transfer
A KR formalism does not constrain transfer sufficiently; there must also be a transfer representation which limits the many structures that can be represented with the formalism.
Apart from being compatible with the desirable properties of a transfer system given in
Section 1.5, a transfer representation has to satisfy the following two conditions:
Target Adequacy The representation must contain sufficient information for the generation
of translationally equivalent TL sentences. Target adequacy ensures that the
TL sentence generated is a faithful translation of the SL sentence.
Independence The representation of a sentence must not contain information which
cannot be motivated on monolingual grounds, as this would pose difficulties for the
writers of monolingual grammars.
The problem for transfer representations is that these two conditions normally stand in
opposition to one another. In other words, the more adequate a representation is for
target generation, the less independent it is from the TL. A simple example is that of
gender: the translation of `a friend' into Spanish requires a decision regarding the sex of
the person denoted, but such information would normally be missing from a representation
produced by a typical English analyser, or in other words information from the SL will be
insufficient for TL generation. This problem of information mismatch is not peculiar to
transfer systems but it is intrinsic to the translation task; in fact, some researchers have
described it as the key problem in translation (Kameyama et al. 1991). In view of this,
the representation to be developed can only be a small step towards a uniform solution.
In the rest of this section I propose a transfer representation which tries to address
the conflict between target adequacy and independence. The representation is strongly
lexicalist and leads to most of the properties desirable of a transfer system; it is called
indexed lexeme (IL) lists and consists of lexical signs related to each other by way of
indices. I will rst introduce IL lists. Then I describe the notation that I will use to
represent them and supply their BNF syntax. Finally, I give a formal description of their
construction based on the definition of context-free grammars (CFGs). Justification for
the indices and the structure of IL lists in the context of transfer MT is given in Section
2.3.
2.2.1 Original Motivation for IL Lists
IL lists have their basis in the representational decisions made in two of the systems
introduced in Section 1.4. The first one is the use of lexical bags in SB in which transfer is
effected exclusively through the bilingual lexicon, after certain variable instantiations have
been carried out on the logical form of an expression. The other decision is that made
in the BCI whereby transfer is effected at the level of QLF rather than at levels further
removed from the surface form of a sentence. Thus, Alshawi et al. (1992:280) argue that
both form and content need to be considered in designing a representation for transfer.
For example, in QLF pronouns are not resolved and determiners are not reduced to their
logical equivalents.
However, one problem with the SB approach is that its operation of variable instantiation is not very well-defined. Another problem is that the actual treatment of function
words and other minor categories is not made explicit nor justified. These problems seem to
indicate that using lexemes for transfer while preserving their inter-relationships through
logical formulae does not lead to a clear definition of a transfer representation. On the
other hand the QLF structure relies to a considerable extent on the analysis grammar,
deriving a complex P-A representation which is recursively structured and from which
TL QLFs are constructed and later used for generation. This representation leads to loss
of independence between the transfer component and the monolingual grammars; it also
leads to other problems which will be discussed below.
2.2.2 Indexed Lexemes
If one adopts the SB restriction of effecting transfer at the lexical level only and further
restricts this level to one where the logical form of a sentence plays a less prominent
role (e.g. not representing determiners by their logical form) there emerges a transfer
representation which consists purely of lexical signs associated with variables indicating
the entities related to those signs. To develop IL lists from this emerging representation
additionally involves replacing variables by indices, where an index is defined here as a non-quantified variable or a constant, behaving similarly to Prolog variables and constants with
respect to unification. IL lists are structures isomorphic to the surface form of a sentence,
in which every lexical sign (lexeme henceforth) corresponding to an input word is assigned
a number of indices. These indices correspond to the variables used to denote individuals
in logical formulae; they are based on the following proposals for variable assignment and
structure: Dowty et al. (1981) for variables in predicates deriving from nouns, adjectives
and verbs; Davidson (1967) and Parsons (1990) for event and state variables in verb
predicates; Sondheimer (1978) for place object variables in preposition predicates (I will
largely ignore the latter until Chapter 5); in addition, function words such as articles,
complementizers and case markers are co-indexed to the lexeme or lexemes on which they
syntactically depend. One may think of indices as corresponding to both universally
and existentially quantied variables, with the distinction assumed not to be relevant for
translation purposes, and therefore ignored in ILs. Furthermore, for the purposes of this
thesis, issues of scope will be largely ignored, thus further simplifying the representation
of determiners and other scoping elements. Nevertheless, it should be stressed that the
goal here will be to construct as theory neutral a representation as possible and use it to
develop solutions to a range of translation problems, trying to avoid problems introduced
by a particular theoretical framework. Approaches to translation somewhat similar to
IL lists have been proposed recently: Indexed Logic (Phillips 1993), Lexical Relation
Graphs (Alshawi 1994) and Minimal Recursion Semantics (Copestake et al. 1995). These
proposals point in the direction of an MT representation incorporating aspects of surface
form, syntactic relations and event-based P-A structure not normally available in a single
representation.
By way of introduction, consider the simplified IL list for `a boy runs' which is shown
below in Prolog list notation (each predicate in fact corresponds to a lexical TFS):
[a(X), boy(X), runs(E,X)]
The significant aspects of this representation are that it is a list of items directly corresponding to those in the original sentence, that the indices of the determiner (i.e. X) and of
`boy' are shared, and that this index is bound to the second index of `runs', indicating that
the individual associated with `a boy' is the subject of the verb. The first index of `runs'
corresponds to the running event, while the second index is associated with the individual
`a boy'. This representation differs from more standard FOL or P-A representations in
that IL lists have neither explicit quantifiers nor logical operators, and that the ordering
of the input sentence is preserved. In addition, IL lists are linear structures, while P-A
structures are recursive; thus, as used here, ILs include no scoping information (scoping
mechanisms such as those of Copestake et al. (1995) could be added, however). A more
detailed comparison between ILs and P-A representations will be presented in Section 2.3.
Before defining IL lists more formally, consider another example. The IL list for `the
woman that John saw gave a sweet to the boy' is:
[the(A), woman(A), that(A), john(C), saw(B,C,A), gave(D,A,E,F), a(E), sweet(E), to(D,F),
the(F), boy(F)]
This example shows how closely IL lists mirror the input sentence. Particularly relevant
here are `that' and `to': the former indicates the relativized entity `the woman', while
the latter represents an implicit dative relationship between the event of giving and the
recipient of that giving. From now on I will use a more concise notation for IL lists in which
lexemes are subscripted by their indices, rather than taking indices as their arguments.
Thus, the notation for the above two examples would be, respectively:
[a_x, boy_x, runs_{e,x}]
[the_a, woman_a, that_a, john_c, saw_{b,c,a}, gave_{d,a,e,f}, a_e, sweet_e, to_{d,f}, the_f, boy_f]
This notation reinforces the fact that there is no quantification over indices nor explicit
scoping over predicates. In some cases I will forgo the brackets and commas to improve
readability.
2.2.3 Formal Properties of IL Lists
An IL list has two purposes. Firstly, it indicates which lexemes in the SL are associated
with which SL individuals; secondly, it makes it possible to relate the source and target
IL lists through the bilexicon. These two purposes dictate the formal properties of the IL
formalism.
The syntax of IL lists is fairly straightforward; it consists of a list of lexemes and
associated indices. In BNF notation, the IL language is:
IL-list → [ Indexed-lexemes ]
Indexed-lexemes → Indexed-lexeme | Indexed-lexeme , Indexed-lexemes
Indexed-lexeme → Lexeme ( Indices )
Indices → Index | Index , Indices
Lexeme → the | boy | runs | that | to ...
Index → a | b | c | ... | 1 | 2 | 3
For the abbreviated notation one would replace the production `Indexed-lexeme' with:
Indexed-lexeme → Lexeme_Indices
In the `Index' production, letters denote variables while integers denote constants; this
distinction is necessary for modelling the effects of variable instantiation required by SB.
That is, variables can unify with other variables or with any constant, whereas constants
can only unify with a variable or with an identical constant.
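The unification behaviour just described can be sketched in Python. This is an illustration only and is not taken from the implementation described in this thesis: the encoding of variables as strings and constants as integers, and the function names `walk` and `unify`, are assumptions made for the sketch.

```python
from typing import Dict, Optional, Union

Index = Union[str, int]  # assumed encoding: str = variable, int = constant


def walk(index: Index, bindings: Dict[str, Index]) -> Index:
    """Follow variable bindings until an unbound variable or a constant is reached."""
    while isinstance(index, str) and index in bindings:
        index = bindings[index]
    return index


def unify(a: Index, b: Index, bindings: Dict[str, Index]) -> Optional[Dict[str, Index]]:
    """Return the (possibly extended) bindings if a and b unify, otherwise None."""
    a, b = walk(a, bindings), walk(b, bindings)
    if a == b:                      # same variable or identical constants
        return bindings
    if isinstance(a, str):          # a is an unbound variable: bind it to b
        return {**bindings, a: b}
    if isinstance(b, str):          # b is an unbound variable: bind it to a
        return {**bindings, b: a}
    return None                     # two distinct constants never unify
```

Under these assumptions, a variable unifies with any constant or variable, while two constants unify only when they are identical, which is the behaviour that later blocks incorrect co-indexing once indices have been instantiated.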
A very important property of the IL representation is that every lexeme must be
connected to every other lexeme. By connected I mean the following: if two lexemes share
one index, then they stand in a connected relation; furthermore, the connected relation
is transitive. In other words there must be a chain of (possibly distinct) paired indices
between any two lexemes in the IL list. What this is intended to capture is that an indexed
lexeme cannot be unrelated to the rest of the sentence. For example, an IL list such as:
[the_a, boy_a, runs_{e,b}]
is an invalid IL-list because `runs' is not connected to `the' nor to `boy' since it does not
share any indices with them. On the other hand the IL list
[a_x, boy_x, runs_{e,x}]
is valid because `a' is connected to `boy' and to `runs' through index x. The linguistic justification for this restriction is that every word in a sentence must in some way contribute
to the overall linguistic content of the sentence. However, for transfer, this content need
not be made explicit; instead, transfer requires explicit descriptions of two relations: that
which exists between lexemes and that which exists between entities. The former is encoded through co-indexing different lexemes, while the latter is encoded through the same
lexeme being indexed by different indices. Thus, in the IL list above `a' and `boy' are
related lexemes by sharing index x, while `runs' establishes a relation between the event e
and the individual x. These relations may be contrasted with those necessary for semantic
representations in which notions of scope, entailment, equivalence, truth and referentiality
are much more important and therefore require additional structures.
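The connectedness requirement can be checked mechanically. The following Python sketch is an illustration under assumptions of my own choosing (an IL is encoded as a pair of a lexeme and a tuple of index names, and the function name `is_connected` is hypothetical); it simply grows the set of lexemes reachable from the first one through shared indices.

```python
from typing import List, Tuple

IndexedLexeme = Tuple[str, Tuple[str, ...]]  # assumed encoding: (lexeme, indices)


def is_connected(il_list: List[IndexedLexeme]) -> bool:
    """True if every lexeme is linked to every other by a chain of shared indices."""
    if not il_list:
        return True                        # vacuously connected
    reached = {0}                          # positions of lexemes reached so far
    seen_indices = set(il_list[0][1])      # indices carried by the reached lexemes
    changed = True
    while changed:
        changed = False
        for pos, (_, indices) in enumerate(il_list):
            if pos not in reached and seen_indices & set(indices):
                reached.add(pos)
                seen_indices |= set(indices)
                changed = True
    return len(reached) == len(il_list)


valid = [("a", ("x",)), ("boy", ("x",)), ("runs", ("e", "x"))]
invalid = [("the", ("a",)), ("boy", ("a",)), ("runs", ("e", "b"))]
```

Here `is_connected(valid)` holds because all three lexemes share x, whereas `is_connected(invalid)` fails because `runs' shares no index with the other two lexemes.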
Although IL lists constitute a language which is a strict superset of the CF languages
(e.g. IL lists can encode unbounded dependencies) I will give a construction of IL lists
based on CFGs in order to show how they are derived. However, analogous constructions can be established for indexed grammars and related formalisms. For any CFG
(N, T, P, S), where N is the set of non-terminals, T is the set of terminals, P is the set of
CF productions and S is the start symbol, the IL list representations of the sentences in
the corresponding language are constructed by appending the IL list of daughters in a rule
and assigning the resulting IL list to the mother category (this description is formalized
below). An example construction for `the boy' would be:
        NP [the_x, boy_x]
       /                 \
     Det                  N
   [the_x]             [boy_x]
This process is identical to the construction of orth values in sign based approaches
to grammar (Pollard and Sag 1987); I will explain the precise implementation of this
construction in the next chapter.
To ensure that the resulting IL list is connected, it is necessary to associate a distinguished indexed lexeme with every element of the alphabet N ∪ T. For terminal symbols
this will be the terminal symbol plus indices; for non-terminals it will be the distinguished
indexed lexeme of one of its daughters in a production. In addition, the distinguished
lexemes within the daughters of every rule must be connected. For example, the following rule makes the distinguished lexeme of the mother N1, dist, the same as that of the
daughter N1:
N1[dist = [0] N_x]  ⟹  AP[dist = A_x]  N1[dist = [0]]
Furthermore, the rule co-indexes the distinguished indexed lexeme of AP with that of the
N1 phrase. Distinguished lexemes correspond to the semantic heads of a phrase; I will say
more about this later.
To conclude this section I formalize the construction of IL lists and give some related
definitions. Given a CFG backbone (N, T, P, S), the IL lists it generates may be defined
with the aid of the three additional functions: Distinguished-IL which returns the distinguished IL of an element of N or T; IL-list which returns the IL list of an element of N
or T; Append which takes two IL lists and returns their concatenation. Distinguished-IL
is defined as follows: for all t ∈ T, Distinguished-IL(t) = t′, where t′ is terminal symbol
t plus indices; also for all rules n → α in P, Distinguished-IL(n) = Distinguished-IL(α_i)
where α = ...α_i... Then, IL-list is defined as follows: for all t ∈ T, IL-list(t) = [t′]; also,
for all n → α in P, IL-list(n) = Append(IL-list(α_1), Append(..., Append(IL-list(α_{l-1}), IL-list(α_l)))) where α = α_1...α_l.
The relation Connected is defined thus: for any two ILs, p and q, Connected(p,q) is
true if there is at least one index in p which is identical to an index in q; in addition, if
Connected(p,q) and Connected(q,r) then Connected(p,r). Now, for every rule n → α_1...α_l
in P, it must be the case that Connected(Distinguished-IL(α_i), Distinguished-IL(α_j)) for
all 1 ≤ i, j ≤ l.
That the above definitions of Distinguished-IL, IL-list, Append and Connected yield a
connected IL list can be demonstrated by noting that a) the distinguished ILs of daughters
are connected, b) the distinguished IL of a mother is the same as that of one of its
daughters, c) the connected relation is transitive and therefore any distinguished
ILs connected within a rule will also be connected to the IL lists comprising each of the
daughter categories.
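The definitions above can be made concrete with a small Python sketch over parse trees. It is not the implementation described in the next chapter. The `Node` class, the `head` field (which daughter supplies the distinguished IL) and the reading of Connected as reachability among the daughters' distinguished ILs are all assumptions made for illustration, as is the toy rule S → NP V.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

IndexedLexeme = Tuple[str, Tuple[str, ...]]  # assumed encoding: (lexeme, indices)


@dataclass
class Node:
    category: str
    lexeme: Optional[IndexedLexeme] = None          # set for terminals only
    daughters: List["Node"] = field(default_factory=list)
    head: int = 0                                   # daughter supplying the distinguished IL


def distinguished_il(node: Node) -> IndexedLexeme:
    """Terminal: the indexed lexeme itself; non-terminal: that of the chosen daughter."""
    if node.lexeme is not None:
        return node.lexeme
    return distinguished_il(node.daughters[node.head])


def il_list(node: Node) -> List[IndexedLexeme]:
    """Append the IL lists of the daughters, left to right."""
    if node.lexeme is not None:
        return [node.lexeme]
    result: List[IndexedLexeme] = []
    for daughter in node.daughters:
        result = result + il_list(daughter)
    return result


def daughters_connected(node: Node) -> bool:
    """Rule constraint: the daughters' distinguished ILs are linked by shared indices."""
    heads = [distinguished_il(d) for d in node.daughters]
    if not heads:
        return True
    reached, indices = {0}, set(heads[0][1])
    grew = True
    while grew:
        grew = False
        for pos, (_, idx) in enumerate(heads):
            if pos not in reached and indices & set(idx):
                reached.add(pos)
                indices |= set(idx)
                grew = True
    return len(reached) == len(heads)


det = Node("Det", lexeme=("the", ("x",)))
noun = Node("N", lexeme=("boy", ("x",)))
np = Node("NP", daughters=[det, noun], head=1)      # assumes the noun is the semantic head
verb = Node("V", lexeme=("runs", ("e", "x")))
s = Node("S", daughters=[np, verb], head=1)
```

With these toy trees, `il_list(s)` concatenates the daughters' lists in surface order and `distinguished_il(s)` is the IL of `runs'; replacing the verb's second index by an unshared index makes `daughters_connected` fail for the S rule.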
2.2.4 Transfer and Generation with IL lists
To use IL lists in translation, the SB techniques for transfer and generation have been
adopted. Transfer involves no structural transformations at all; instead the bilexicon
describes correspondences between sets of ILs in the SL and TL. Based on the bilexicon
the transfer algorithm converts an SL IL list into a TL bag. Generation involves the
reordering of the TL bag into a string licensed by the TL grammar. Although the details
of transfer and generation will be discussed in depth in Chapter 4, I will give a simple
example of translation through IL lists in order to supply sufficient background for the
comparisons to follow.
1. Assume as input the string
The young boy loves the little dog.
2. Analysis produces the IL list:
[the_1, young_1, boy_1, loves_{2,1,3}, the_3, little_3, dog_3]
In this IL list the indices have been instantiated to unique constants such that
different constants block incorrect co-indexing during generation.
3. The bilingual lexicon has the following equivalences:
{the_x} ⇔ {el_x}
{young_x} ⇔ {joven_x}
{boy_x} ⇔ {niño_x}
{loves_{x,y,z}} ⇔ {ama_{x,y,z}}
{little_x, dog_x} ⇔ {perrito_x}
Note that `little' may be an indefinite distance from `dog' in the input IL list.
4. Using these translation equivalences, the transfer stage produces the TL bag (note
curly brackets for bags, square brackets for lists):
Spa: {el_1, joven_1, niño_1, ama_{2,1,3}, el_3, perrito_3}
5. This representation is used as input to the generator which reorders the Spanish ILs
to give:
El niño joven ama el perrito.
Note also that the generator has as input all the lexemes that need to appear in the TL
sentence and that structure in the transfer representation is kept to a minimum.
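The transfer step of this example (steps 2 to 4) can be approximated by the following Python sketch. It is a deliberately naive backtracking cover of the source IL list by bilexical entries, written for illustration only; it is not the transfer algorithm of Chapter 4, and the names `BILEXICON`, `match`, `match_all` and `transfer` are hypothetical. Variables are encoded as strings and instantiated indices as integers.

```python
from typing import Dict, List, Optional, Tuple

IL = Tuple[str, Tuple]  # (lexeme, indices); str indices are variables, ints constants

BILEXICON = [
    ([("the", ("x",))], [("el", ("x",))]),
    ([("young", ("x",))], [("joven", ("x",))]),
    ([("boy", ("x",))], [("niño", ("x",))]),
    ([("loves", ("x", "y", "z"))], [("ama", ("x", "y", "z"))]),
    ([("little", ("x",)), ("dog", ("x",))], [("perrito", ("x",))]),
]


def match(pattern: IL, item: IL, env: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Bind the pattern's variables to the item's constants, or return None."""
    if pattern[0] != item[0] or len(pattern[1]) != len(item[1]):
        return None
    env = dict(env)
    for var, const in zip(pattern[1], item[1]):
        if env.setdefault(var, const) != const:
            return None
    return env


def match_all(patterns: List[IL], items: List[IL], env: Dict[str, int]):
    """Match every pattern against a distinct item; return (env, remaining items) or None."""
    if not patterns:
        return env, items
    for i, item in enumerate(items):
        new_env = match(patterns[0], item, env)
        if new_env is not None:
            result = match_all(patterns[1:], items[:i] + items[i + 1:], new_env)
            if result is not None:
                return result
    return None


def transfer(source: List[IL]) -> Optional[List[IL]]:
    """Cover the whole source IL list with bilexical entries; the result is read as a bag."""
    if not source:
        return []
    head, rest = source[0], source[1:]
    for sl_side, tl_side in BILEXICON:
        for k, pattern in enumerate(sl_side):
            env = match(pattern, head, {})
            if env is None:
                continue
            found = match_all(sl_side[:k] + sl_side[k + 1:], rest, env)
            if found is None:
                continue
            env, remaining = found
            tail = transfer(remaining)
            if tail is not None:
                return [(lex, tuple(env[v] for v in idx)) for lex, idx in tl_side] + tail
    return None


source = [("the", (1,)), ("young", (1,)), ("boy", (1,)), ("loves", (2, 1, 3)),
          ("the", (3,)), ("little", (3,)), ("dog", (3,))]
```

Because the members of an entry's SL side are matched against any remaining items, `little' and `dog' are paired regardless of how far apart they stand in the input; the order of the returned list carries no meaning, since the TL representation is a bag.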
2.3 The Role of IL Lists in Transfer
Having introduced the IL list representation I will now outline the motivation for index
assignment and sharing, and justify the linear structure of IL lists. The motivation for
index assignment lies in logical representations of sentence meaning and in particular
in the number, position, and binding pattern of arguments. As for the structure of IL
lists, their linearity avoids problems with recursive descent transfer algorithms, including
difficulties arising from head switching and discontinuous constituents, while their surface-like form preserves surface information which is relevant to translation. As the description
progresses I will outline the advantages of IL lists over alternative representations. Section
2.3.1 considers index assignment, while Sections 2.3.2 to 2.3.14 concentrate on the structure
of IL lists by relating and comparing it with that of other representations, including event-based semantics, P-A structure, f-structure, FOL and Intensional Logic.
2.3.1 Indices in IL Lists
As I said earlier, indices correspond to variables ranging over events, objects or place objects. I will now explain how this correspondence is established. Consider events first,
which correspond to the indices assigned to verbs. Davidson (1967) analyses action sentences by treating events as individuals appearing as an extra argument in a verb predicate.
The justification for this treatment is based on a number of observations including: a) linguistic reference to events through pronouns, b) the multiplicity of adverbial modification
in events, c) the logical entailments that are permitted by sentences containing events
(Castañeda 1967). For example, Davidson (1967:93) offers the following representation:
Eng: I flew my spaceship to the Morning Star.
Davidson: ∃e(Flew(I,my spaceship,e) & To(the Morning Star,e))
where variable `e' stands for the event of my flying my spaceship to the Morning Star.
One feature of this analysis is that, for sentences such as `I did it slowly', it explains what
kind of entity is referred to by pronouns such as `it'. This treatment of events also allows
formal descriptions of inferences such as:
I flew my spaceship to the Morning Star ⇒ I flew my spaceship
through and-elimination. Event analyses permit multiple adverbial modification without
invalidating such entailments. Thus, the addition of another PP such as `in the year 2050',
or indeed of an adverb, would not induce changes in the logical entailments permitted
by the sentence. For example, modification by `recently' leaves the above entailment
unchanged:
∃e[Flew(I,my spaceship,e) & To(the Morning Star,e) & Recently(e)]
Parsons (1990:186-206) further argues for treating states as individuals. Thus, the following sentence receives the analysis shown:
Eng: Brutus is under the tree.
Parsons: ∃s[Under(s,the tree) & Subj(s,Brutus) & Holds(s,now)]
Such state variables correspond to indices in state and copula verbs in IL lists.
Parsons (1980, 1990) extends the notion of an event modifier to participants as well
as adverbial modifiers. Thus, instead of incorporating arguments into the verb predicate,
Parsons relates them to the event variable via thematic roles. Thus, he gives the following
sentence the analysis shown (I include Davidson's analysis for comparison):
Eng: Brutus stabbed Caesar.
Parsons: ∃e[Stabbing(e) & Agent(e,Brutus) & Theme(e,Caesar)]
Davidson: ∃e[Stabbing(e,Brutus,Caesar)]
Parsons argues that his analysis is more adequate than Davidson's incorporation analysis
for sentences with non-existent participants. In the case of objects he notes that Davidson's
translation of `Brutus stabbed' would be:
Davidson: ∃e∃y[Stabbing(e,Brutus,y)]
which he argues is incorrect because it can only be true if Brutus stabbed something,
when in fact the sentence is also true if Brutus missed Caesar and stabbed nothing. By
contrast, Parsons' analysis involves omitting the `Theme' role to represent this sentence,
and thus requires no assumptions about the existence of a stabbed object. Justification
for the `Agent' predicate relies on the description of unreal situations such as dreams,
in which stabbings can take place without having been performed by anyone; omission
of the Agent role in Parsons' analysis results in an appropriate formula. However, there
are problems with this sort of reasoning (Parsons 1990:297 note 25) including the fact
that interpretation of unreal situations actually presupposes real situations in which all
stabbings have an agent. Nevertheless, Parsons' idea of treating event participants as
modifiers will be used to justify certain types of IL lists.
The variables found in the FOL translation of nouns provide the basis for common noun
indices. Following Dowty et al. (1981), a transitive sentence would receive the following
translation into FOL:
Eng: A man loves a woman.
FOL: ∃x∃y[man(x) & woman(y) & love(x,y)]
This FOL formula represents the truth-conditions of this sentence; the sentence is true if
there is a man who loves someone, and that someone is a woman. These facts are indicated
by the predicates acting over both variables. One of the reasons for representing `love' as
a two place relation is that such a representation can express the similarity in meaning
that exists between active sentences and their corresponding passives. Thus, the passive
of the above sentence has a truth-conditionally identical FOL representation:
Eng: A woman is loved by a man.
FOL: ∃x∃y[man(x) & woman(y) & love(x,y)]
Here, the fact that transitive verbs are represented as two place relations motivates the use
of corresponding indices in the representation of such verbs. The indices of common nouns
are also motivated by this type of formula: the index associated with a noun corresponds
to the variable appearing in its predicate.
In standard FOL proper names are translated as constants. However, this treatment
does not form the basis for the representation of proper names in IL lists. Instead, proper
names are assigned indices just as if they were represented as predicates over variables, as
in:
Eng: Sam sleeps.
FOL′: ∃x[Sam(x) & sleep(x)]
This treatment draws support from that adopted in Situation Semantics. As Barwise
and Perry (1983:165-67) note, names are not unique since there are many people called
`Sam' for example; in addition, proper names can be used as common nouns, as in `all
Sams must go to the first desk'. Barwise and Perry argue that the use of a name implies
an associated property of being an entity with that name. Devlin (1991:225) gives the
following representation of a proper name in his version of Situation Semantics:
r ⊨ ≪named, p, Jan, 1≫
The important feature of this expression is that an individual p enters in the `named'
relation with the name `Jan'. This approach is taken up by Pollard and Sag (1994:27) for
their representation of proper names. Thus they give the following FS for the background
value (that is, the "conditions on anchors [i.e. assignment functions] that correspond to
presuppositions or conventional implicatures") for the name NP `John':
[ relation = naming
  bearer   = [1]
  name     = John ]
Here [1] is the parameter to which the anchor assigns an individual. A simplified version
of this representation in which no `naming' relation is assumed forms the basis for the ILs
of proper names.
The following example summarizes the assignment of indices just discussed.
Eng: Mary loves a man.
IL list: [Mary_x, loves_{e,x,y}, a_y, man_y]
Thus, `Mary' is assigned an index corresponding to the parameter in the FS above; `loves'
has indices corresponding to events and the parameters and variables associated with
`Mary' and `man'. The representation of `a' and other word classes and constructions will
be discussed subsequently.
Before concluding this section, I will give reasons why truth-conditional and model-theoretic representations are not very appropriate for translation. FOL, Intensional Logic,
Situation Semantics and related theories that model the structural and logical properties
of a sentence do not seem to be the most relevant for transfer. That is to say, a close investigation of their properties would not lead to a significant clarification of the problems
of concern in this thesis. Thus, I agree with Rupp et al. (1992:196-97) who argue that
model-theoretic semantics fails as a representation for transfer principally because it does
not take into account the informational content of an expression; they note that formulae
in Intensional Logic, if taken seriously, fail to distinguish between the interpretation of
logically equivalent sentences. In addition, the notions of extension and intension in such
approaches to meaning do not address the problems found in MT. For example, the extension of `blue' in English, normally associated with all the individuals which are blue, does
not help in the task of transfer: such an extension is not matched by a similar extension
in Russian where there are two different adjectives neither of which corresponds directly
to the meaning of `blue'. Furthermore, this extension does not give any help regarding a
decision on how to translate `blue'. A well-defined interpretation is a useful property for
making explicit the meaning of a formal language, but it is of little help in selecting the
appropriate translation of `blue'; for this task what is important is the sense of blue used
in the metalanguage in which model-theoretic definitions are made. That is not to deny
the merits of formal and explicit theories of meaning but rather to argue that the goals of
such theories are not always in correspondence with the goals of MT.
2.3.2 Determiners
In this section I justify the representation of determiners in IL lists. Consider the following
sentence and its FOL and IL list representation:
Eng: Every man talks.
FOL: ∀x[man(x) → talk(x)]
IL-list: [every_x, man_x, talks_{e,x}]
The two representations differ mainly in their treatment of `every': while in the FOL
formula `every' has been translated as a universal quantifier ∀ together with the implication
logical connector (→), the lexeme of the determiner has been left unchanged in the IL list.
This representation of determiners is an important characteristic of IL lists. The relevance
of this representation for translation may be appreciated from the following example:
Eng:     Every man talks.                   All men talk.
FOL:     ∀x[man(x) → talk(x)]               ∀x[man(x) → talk(x)]
IL-list: [every_x, man_x, talks_{e,x}]      [all_x, men_x, talk_{e,x}]
Spa:     Cada hombre habla.                 Todos los hombres hablan.
While the truth conditions of the Spanish translations are the same, the intuition is that
neither is interchangeable with the other. It is clear from the FOL formulae that information relevant to translation has been lost. This is evident from the fact that there is a
many-to-one mapping from English into FOL during which the lexical form of the determiner is lost. By contrast, the IL list retains the determiner and therefore has sufficient
information to construct an appropriate translation. Morphological information, which I
have emphasised by leaving each lexeme in inflected form, is also preserved in the IL list.
It would be possible to refine the definition of `all' and `every' in the logical formula
to reflect the relevant meaning differences and hence avoid the loss of information just
described. From a monolingual point of view this would be advantageous since it would
capture subtle distinctions in interpretation. However, as far as translation is concerned,
the benet gained thereby would be to approximate the surface form of the SL sentence.
In IL lists, the indices associated with nouns can correspond to a single object, a group
of objects, the denotation of a mass noun or some other noun denotation. In other words,
the important property for translation is that `man' and `talk' have a common index. The
position of an index is also important. In this example, the noun index occupies the first
position after the event in the verb IL, which means that it is associated with the `talker'
in the corresponding event.
Comparing the IL list representation of determiners with both the representation used
in LFG transfer and in the BCI shows interesting similarities. In LFG transfer, determiners
are represented in f-structure by a predicate very similar to the lexical item from which
they are derived; in addition, their scope within a sentence is not disambiguated. Thus
the f-structure for `every student' would be:
[ pred = student
  num  = sg
  spec = [ pred = every ] ]
In this respect there is agreement between f-structure and IL list representations. An
analogous situation is found in the BCI where the QLF for `every student' is:
qterm(<every>, x, student(x))
This is a qterm (unresolved quantified term) in which a structure corresponding directly
to the determiner is included and where no logical translation nor scope disambiguation
has been performed.
2.3.3 Argument Switching
Sometimes it is the case that the subject in one language corresponds to the object in
another language and vice-versa. The solution to this problem in a hypothetical logic-based MT system is essentially similar to the IL list solution and therefore I will only
describe the latter. Argument switching is handled by simply changing the position of
indices in the bilexical entry of verbs which require this switching. For example, in the
sentence below the subject in English actually corresponds to the object in the Spanish
sentence:
Eng: The boy likes the girl.
Spa: La niña le gusta al niño.
The entry for `likes' in the bilexicon is (omitting the clitic le and the accusative marker a
for clarity):
likes_{e,x,y} ⇔ gusta_{e,y,x}
This indexing, in conjunction with the transfer and generation algorithms, ensures appropriate grammatical relations and word ordering for the Spanish translation. The role of
each participant in the event is defined by its position in the verb's IL as defined by its
predicate in FOL representations. The exact nature of this role does not affect translation
and therefore it is left implicit.
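The effect of such an entry can be illustrated with a short Python sketch. The encoding of ILs as pairs and the names `SOURCE_PATTERN`, `TARGET_PATTERN` and `apply_entry` are assumptions made for this illustration; the sketch only shows how binding the SL indices and re-emitting them in the TL order switches the arguments.

```python
from typing import Dict, Tuple

IL = Tuple[str, Tuple]  # assumed encoding: (lexeme, indices)

# Bilexical entry with switched argument positions
# (clitic and case marker omitted, as in the text).
SOURCE_PATTERN: IL = ("likes", ("e", "x", "y"))
TARGET_PATTERN: IL = ("gusta", ("e", "y", "x"))


def apply_entry(item: IL) -> IL:
    """Bind the entry's variables to the source indices and emit them in TL order."""
    lexeme, indices = item
    if lexeme != SOURCE_PATTERN[0] or len(indices) != len(SOURCE_PATTERN[1]):
        raise ValueError(f"entry does not apply to {lexeme!r}")
    env: Dict[str, object] = dict(zip(SOURCE_PATTERN[1], indices))
    return TARGET_PATTERN[0], tuple(env[v] for v in TARGET_PATTERN[1])
```

If the event is instantiated as 2, the boy as 1 and the girl as 3, then `apply_entry(("likes", (2, 1, 3)))` yields `("gusta", (2, 3, 1))`, so that the girl's index now occupies the first position after the event.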
2.3.4 Passives
Consider now passive sentences. Event based semantics and FOL representations of passives are normally equivalent to the respective expressions for actives (Partee 1976b:65-66; Halvorsen 1983:590; Parsons 1990:92). By contrast the IL list representation of
passives is different from that of corresponding actives.
Eng:     John kicked Pluto.
Parsons: ∃e[Kicking(e) & Agent(e,John) & Theme(e,Pluto)]
FOL:     kick(john,pluto)
IL-list: [John_x, kicked_{e,x,y}, Pluto_y]
Spa:     John pateó a Pluto

Eng:     Pluto was kicked by John.
Parsons: ∃e[Kicking(e) & Agent(e,John) & Theme(e,Pluto)]
FOL:     kick(john,pluto)
IL-list: [Pluto_y, was_{f,y,e}, kicked_{e,x,y}, by_{e,x}, John_x]
Spa:     Pluto fue pateado por John.
In this example the many-to-one mapping into event-based formulae and FOL leads once
more to a loss of information with respect to the translation of the sentence into Spanish;
this loss is avoided in the IL list by preserving the copula `was' and the preposition `by'.
The representation of `was' and `by' requires some comment. Including both lexemes
avoids one source of ambiguity during translation. To appreciate this, imagine that the
English analyser constructed the FOL representation:
kick(john,pluto)
This could have arisen either from a passive or from an active sentence. Now, a Spanish
generator would normally construct at least two sentences from this representation, one
active and one passive (that two sentences are generated can be justied on grammar
reversibility grounds). Thus, from a single English sentence, whether active or passive,
two Spanish sentences would be constructed, when intuitively, and as far as the IL list
representation is concerned, only one translation is appropriate. This example and others
like it justify the retention of `was' and `by'. Note that although the justification is a
consequence of the purpose of the representation, namely transfer-based MT, each lexeme
is trivially motivated on monolingual grounds alone by the presence of the lexeme in the
input string. As far as `was' is concerned, there are certain semantic distinctions between
active and passive sentences which additionally support a treatment of actives and passives
with different representations. Bennett (1976:144) notes that intuitively 1) and 2) below
are equivalent but that 3) is not equivalent to 1):
1) Mary voluntarily loves John.
2) John is loved voluntarily by Mary.
3) John voluntarily is loved by Mary.
Sentence 1) is about a voluntary act by Mary, whereas 3) states that John voluntarily
is the object of Mary's love. Intuitively it seems that in 3) `voluntarily' modifies the
event denoted by `is loved', whereas in 1) it only modifies `loved'. This intuition can be
represented using an index to stand for the event of `is loved', as depicted below; a similar
index is included in the lexeme `was' in the previous example.
[John_x, voluntarily_f, is_{f,x,e}, loved_{e,x,y}, by_{f,y}, Mary_y]
As far as `by' is concerned, its index assignment is analogous to the association of
arguments in what Dowty (1989:83) calls a \Neo-Davidsonian System" of thematic roles
and which was described above in the context of Parsons' treatment of event participants.
Under Parsons' account `Agent' and `Theme' are thematic roles in English. Since in the
IL list notation only lexemes arising directly from surface entities are included, predicates
such as `Agent' and `Theme' do not correspond to anything in IL lists. However, Parsons'
treatment of associating participants with events through explicit relations is analogous
to the IL `by_{f,x}', which indicates that the participant in a passive event f is related to an
individual corresponding to the index x. This primitive discussion of passives suffices to
demonstrate the differences between IL lists and logic-based representations.
2.3.5 Dative Verbs
One more example of information loss is incurred by standard treatments of dative verbs.
Consider the example below, which compares Parsons' treatment of datives with IL lists.
Note that a similar situation arises with standard FOL treatments of datives such as that
of Bennett (1976:137-38) where dative shift is effected by a meaning-preserving transformation.
Eng:     He teaches German to the boy
Parsons: ∃e[Teaching(e) & Agent(e,He) & Theme(e,German) & Goal(e,the boy)]
IL-list: [he_x, teaches_{e,x,y,z}, German_y, to_{e,z}, the_z, boy_z]

Eng:     He teaches the boy German
Parsons: ∃e[Teaching(e) & Agent(e,He) & Goal(e,the boy) & Theme(e,German)]
IL-list: [he_x, teaches_{e,x,y,z}, the_z, boy_z, German_y]
The property to note here is that Parsons' formulae are essentially the same, indicating a
loss of surface information. Although for the purposes of reasoning such a loss may not be
important, for translation it is relevant. For example, consider the following translation
equivalences between English and Hungarian, taken from Zsilka (1967:55-57).
Eng: He teaches the boy German.
Hun: Tanítja a fiú-t német-re.
Eng: He teaches German to the boy.
Hun: Tanítja német-et a fiú-nak.
It is clear that each English sentence gives rise to different Hungarian sentences. However,
a logical representation such as Parsons' above would not be sufficient for rendering the
appropriate translation because there is no way of telling from the formula alone whether
the original English sentence was dative shifted or not. By contrast, the IL list contains
sufficient information for correct translation since the representation distinguishes between
the two dative sentences. Just as with passive uses of `by', `to' indicates an implicit case
relation which is manifest in the surface form of the sentence.
LFG and IL lists coincide in that they both preserve surface information such as the
dative marker `to'. Thus, Kaplan and Bresnan (1982:177, 179) assign different f-structures
to the two sentences below (bracketed values such as [a girl] indicate that the complete structure is not shown):
1) A girl handed the baby a toy.
[ subj  = [a girl]
  tense = past
  pred  = `hand[(↑ subj)(↑ obj2)(↑ obj)]'
  obj   = [the baby]
  obj2  = [a toy] ]
2) A girl handed a toy to the baby.
[ subj  = [a girl]
  tense = past
  pred  = `hand[(↑ subj)(↑ obj)(↑ to obj)]'
  obj   = [a toy]
  to    = [ pcase = to
            obj   = [the baby] ] ]
The treatment of the dative object in the value of pred in the f-structure for 2)
is not to be confused with that of the direct object in the Spanish f-structure which I
showed on page 36. Based on these f-structures, the system of Kaplan et al. (1989) could
construct unique and distinct Hungarian translations for dative sentences.
2.3.6 Adjectives
Considering intersective adjectives first, their logic-based representation is that of a predicate on the variable of the noun they modify (Dowty et al. 1981:110, 144); in the IL list
formalism, the representation is similar in that adjectives are co-indexed with the modified
noun. For example:
Eng: A black cat sleeps.
FOL: ∃x[black(x) & cat(x) & sleep(x)]
IL-list: [a_x, black_x, cat_x, sleeps_{e,x}]
While in principle there is sufficient information in the Montague formula and in the IL list
for correct generation of the surface form, many computational representations of conjunctive expressions involve recursive structures which increase the complexity of the transfer
component. I will use QLF representations for the next example but the same problem
would aect other systems whose transfer component recursively traverses the transfer
representation (e.g. Metal, Eurotra, ELU, Type Rewriting). Take the following simplified QLF representation of the NP `female English teachers', shown in Prolog notation to
emphasize the computational character of the problem:
[and, [female,X], [and, [english,X], [teachers,X]]]
Translation of this phrase into Spanish would require the pairing of `female' and `teachers'
in one QLF translation rule to give maestras, and a different rule for `English' to give
inglesas:
trans([and, [female,X], [teachers,X]] <=> [maestras,X])
trans(english <=> inglesas)
However, these rules would not work in the appropriate way for the above example since
the predicate `English' stands between `female' and `teacher'. The problem arises because
recursive transfer algorithms cannot easily cope with discontinuities in the transfer structure. The transfer algorithm for IL lists to be described in Section 4.1.1 handles examples
like this by making no distinction between continuous and discontinuous transfer units.
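The point about discontinuous transfer units can be illustrated with a minimal Python sketch. It is not the algorithm of Section 4.1.1; it only assumes, for illustration, that an IL is a (lexeme, index) pair and that a bilexical entry pairs a bag of SL lexemes with a bag of TL lexemes sharing one index. The names `BILEXICON` and `transfer` are hypothetical.

```python
from collections import Counter

# Hypothetical bilexicon: a bag of SL lexemes paired with a bag of TL lexemes.
BILEXICON = [
    (("female", "teachers"), ("maestras",)),
    (("english",), ("inglesas",)),
]

def transfer(il_list):
    """Map a source IL list to a bag of target ILs, ignoring adjacency.

    il_list is a list of (lexeme, index) pairs. An entry applies when all
    of its source lexemes are still uncovered and carry the same index.
    Simplification: each entry is applied at most once per call.
    """
    remaining = list(il_list)
    target = []
    for source_side, target_side in BILEXICON:
        for index in {i for _, i in remaining}:
            wanted = [(lex, index) for lex in source_side]
            # Empty difference means every wanted IL is still available.
            if not (Counter(wanted) - Counter(remaining)):
                for il in wanted:
                    remaining.remove(il)
                target.extend((lex, index) for lex in target_side)
                break
    if remaining:
        raise ValueError(f"untranslated ILs: {remaining}")
    return target

# `english' stands between `female' and `teachers', yet the first entry matches.
source = [("female", "x"), ("english", "x"), ("teachers", "x")]
result = transfer(source)
```

Because matching is done against a bag rather than a tree, the intervening `english' IL does not block the `female'/`teachers' pairing; ordering the target ILs is left to the TL generator.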
A somewhat related problem arises when multiple adjectives have to occur in a particular order in a noun phrase. Again I use the BCI to describe the problem but this is
because it has the most explicit description of a transfer representation; the same problem would arise with other representations if worked out in sufficient detail. Consider for
example the following NPs:
Eng: The fierce black cat.
Spa: El gato negro y feroz.
(fierce `feroz', black `negro', cat `gato'). In this translation the linear ordering of the adjectives is different and, more importantly, cannot be altered without affecting the grammaticality
of each sentence. An analysis component would, under standard assumptions, construct
different QLFs for each of these sentences. In particular, the trees encoding the modification of the respective nouns would not be isomorphic. Now, a system in which transfer
mapped a QLF formula in the SL into a QLF in the TL would have problems translating between these two phrases because although the two QLFs have equivalent meanings,
their logical forms are not identical. Possible representations constructed by the respective
analysers might be:
Eng: qterm(<the>, X, [and, [fierce, X], [and, [black, X], [cat, X]]])
Spa: qterm(<el>, X, [and, [gato, X],
[and, [negro, X], [feroz, X]]])
It should be clear that a mapping between these two QLFs will either involve structural
transformations, or require the SL analyser to construct a QLF that resembles the representation constructed by the TL analyser, or involve a generator which could take proper
account of the logical equivalence of syntactically different logical expressions. Each of
these options is problematic: the first one substantially increases the complexity and non-perspicuity of the transfer component; the second one diminishes the modularity of each
monolingual component; the third option is a special case of the logical equivalence problem (Shieber 1993) to be considered in Section 2.3.14. In the IL list approach structural
differences are not problematic because the generation component treats the output of
transfer as a bag of ILs rather than as a recursive representation.
Non-intersective adjectives such as `clever' are discussed by Parsons (1990:43-44) among
others. Parsons notes that the following inference is not valid:
John is a clever teacher & John is a parent ⇏ John is a clever parent
From this he concludes that clever should not be treated as an intersective modifier, as
such a treatment would allow the above inference. Instead he suggests that the meaning
of `clever' is always `clever for an F', where F is contextually supplied as the standard
of cleverness. I will adopt a representation based on this analysis by using an additional
index in the adjective corresponding to this contextually determined standard. Thus, the
following are translationally equivalent IL lists:
Eng. IL-list: [John_x, is_{e,x,y}, a_y, clever_{f,y}, teacher_y]
Spa. IL-list: [Juan_x, es_{e,x,y}, un_y, maestro_y, inteligente_{f,y}]
Intensional adjectives such as `former' have a similar representation but this time the
additional index in the adjective's IL stands for the temporally restricted intension of the
noun it modifies (Dowty et al. 1981:163):
Eng. IL-list: [the_x, former_{y,x}, teacher_x]
Spa. IL-list: [el_x, ex-_{y,x}, maestro_x]
2.3.7 Copulas
The indexing of copula verbs such as `be' is based on the proposal by Parsons (1990)
who suggests the introduction of individuals which denote states. However, the format of
IL lists differs from the corresponding predicates suggested by Parsons. The differences
between the two approaches can be illustrated with the following example.
Eng: Brutus is under the tree.
Parsons: ∃s[Under(s,the tree) & Subj(s,Brutus) & Holds(s,now)]
IL list: [Brutus_x, is_{s,x,r}, under_{r,s,y}, the_y, tree_y]
The place index r will be motivated in Section 5.1.3. The main source of discrepancy is
that the verb `is' has a corresponding IL whereas there is no such predicate in Parsons'
formula. On the other hand, the relation `Holds', indicating that a state holds at a given
time, is not present in the IL list, although the same information is present in the tense
of the lexeme `is_{s,x,r}'. Including `is' in the IL list discriminates between PP and relative
clause modification of nouns. Thus, the representations of:
The dog under the tree sleeps.
The dog which is under the tree sleeps.
will be distinct and therefore each sentence will translate differently in cases where the TL
allows both structures.
2.3.8 Relative Clauses
Dowty et al. (1981:211-15) describe a treatment of relative clauses in which relative clauses
are interpreted as intersective modifiers of the noun and use this result as the argument
to determiners in order to effect quantifier scoping. A similar approach is adopted by
Parsons (1990:301 note 6), for his event-based meaning representations. The IL list treatment of relative clauses is not explicit as to whether the relative clause is an intersective
modifier or not, nor does it resolve the scoping domain of determiners. For example:
Eng: A man who(m) John saw died.
FOL: ∃x[[man(x) & saw(j,x)] → die(x)]
IL-list: [a_x, man_x, who(m)_x, John_y, saw_{e,y,x}, died_{d,x}]
The main new IL in this example is `who(m)_x', the index of which is associated with the
relativized noun. This indexing is based on the syntactic and semantic role that a number
of grammatical theories such as those of Gazdar et al. (1985) and Chomsky (1981) assign
to this preposed pronoun, namely that of gap filler. The pronoun is co-indexed with the
verb index corresponding to the missing complement in order to reflect this analysis.
Including the relative pronoun in the IL list overcomes one type of ambiguity in the
same way that `was', `by' and `to' did. The following example from Norwegian exemplifies
this. In both Norwegian and English relative pronouns may be omitted from relative
clauses:
Eng: The man (whom) John saw died.
Nor: Mannen (som) John så døde.
That is, the Norwegian translation can approximate the use of a pronoun in English. This
approximation is achieved in IL lists by preserving the relative pronoun in the transfer
representation; its main advantage lies in the translation of sentences in which a pronoun
is preferred on criteria other than syntactic or semantic. For example, long sentences seem
to read more easily when a relative pronoun is used:
The man I thought Mary wanted to play the piano with died.
vs.
The man who(m) I thought Mary wanted to play the piano with died.
One slight difficulty with this analysis is that an IL list such as:
[the_x, man_x, that_x, talks_{e,x}, laughs_{d,x}]
would result, if used for generation, in the two sentences:
1) the man that talks laughs
2) the man that laughs talks
This problem can readily be overcome by including the index of the relative clause verb
in the IL of `that', to give:
[the_x, man_x, that_{x,e}, talks_{e,x}, laughs_{d,x}]
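The effect of the extra index can be sketched in Python. The sketch assumes, purely for illustration, that an IL is a (lexeme, indices) pair and that verbs are exactly the ILs (other than `that') with two indices, an event index followed by a subject index; the function name `relative_clause_verbs` is hypothetical.

```python
def relative_clause_verbs(il_list):
    """Return the verbs that `that` could introduce in an IL list.

    Each IL is (lexeme, indices). Verbs are assumed, for this sketch only,
    to be the ILs other than `that` that carry exactly two indices.
    """
    that_indices = next(ix for lex, ix in il_list if lex == "that")
    noun_index = that_indices[0]
    verbs = [(lex, ix) for lex, ix in il_list
             if lex != "that" and len(ix) == 2]
    # A verb is a candidate if the relativized noun is one of its arguments.
    candidates = [v for v in verbs if noun_index in v[1][1:]]
    if len(that_indices) > 1:
        # `that` also carries the event index of the relative clause verb.
        candidates = [v for v in candidates if v[1][0] == that_indices[1]]
    return [lex for lex, _ in candidates]

ambiguous = [("the", ("x",)), ("man", ("x",)), ("that", ("x",)),
             ("talks", ("e", "x")), ("laughs", ("d", "x"))]
resolved = [("the", ("x",)), ("man", ("x",)), ("that", ("x", "e")),
            ("talks", ("e", "x")), ("laughs", ("d", "x"))]
```

With the single index on `that', both verbs remain candidates for the relative clause; with the event index added, only `talks' does.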
The linear structure of the IL list can also simplify the translation of sentences in which
the structure of the relative clause is not best analysed as embedded in the main clause.
Consider for example the following sentence and its translation into Hindi, a language with
large numbers of speakers in northern India (Keenan 1985:164):
Eng:   I saw the man whose dog is sick.
Hindi: Jus   a:dmi ka  kutta bema:r hai, us   a:dmi ko mai ne  dekha
Gloss: corel man   gen dog   sick   is,  that man   do I   erg saw
Lit:   Which man's dog is sick, that man I saw.
As the literal translation suggests, the interesting property of the Hindi sentence is that
the relative clause does not appear to be subordinated to the noun it modifies. Such
constructions are called corelatives (corel) and their particular feature is that what in
English would be a NP plus relative clause followed by a VP, becomes a sentential structure
followed by a main clause. Keenan (1985:164) suggests that the structure of corelatives
can be depicted as:

                 S
               /   \
          S_rel     S_main
          /   \        |
      corel  np_rel  np_ana

where corel marks the noun within S_rel which is modified by the relative clause. Now,
a compositional construction in LFG of the Hindi corelative's f-structure could be represented, in a drastically simplified form, as:

    [ rel  = which man's dog is sick
      main = that man I saw ]

By contrast the f-structure of its English counterpart would be:

    [ subj = I
      pred = saw
      obj  = [ head = the man
               mod  = whose dog is sick ] ]

The problem with effecting transfer between these two structures is similar to the head
switching problem of Section 1.4.4: the root structure in the English f-structure (the
main clause) is not the root structure in the Hindi f-structure (that corresponding to
the S node in Keenan's tree); this situation can lead to doubly rooted f-structures when
sentences appear as sentential complements. The way this problem is overcome in the IL
list representation will be described after considering sentential complements.
2.3.9 Sentential Complements
Parsons (1990:17) analyses certain kinds of sentential complementation as involving two
events, one for the main clause and one for the complement clause. This treatment forms
the basis of the IL representation for this type of sentence. For example:
Eng: Mary saw Brutus stab Caesar.
Parsons: ∃e[Seeing(e) & Subj(e,Mary) & ∃e′[Stabbing(e′) & Subj(e′,Brutus) & Obj(e′,Caesar)
& Obj(e,e′)]]
IL List: [Mary_x, saw_{e,x,e′}, Brutus_y, stab_{e′,y,z}, Caesar_z]
Parsons justifies this analysis by arguing that what Mary sees is an eventuality rather than
the participants in that eventuality. Index e′ in the IL `saw_{e,x,e′}' corresponds to this event.
The representation of other sentential complements is similar, and I will only comment on
the representation of sentences with `believe'.
In Intensional Logic verbs such as `believe', which result in referentially opaque constructions, are analysed as predicates over the intension of their complement. Consider
the example below, taken from Dowty et al. (1981:207), and its corresponding IL list representation (the operator ^ returns the intension of its argument, that is, its extension in
all possible worlds):
Eng: John believes that a fish walks.
Intens. Logic: believe(j,^∃x[fish(x) & walk(x)])
IL-list: [John_x, believes_{e,x,d}, that_d, a_y, fish_y, walks_{d,y}]
The Intensional Logic analysis accounts for the difference in meaning between sentences
such as:
John believes the Morning Star is the Morning Star.
John believes the Morning Star is the Evening Star.
However, a distinction between these two sentences is already a feature of their IL lists
because each sentence is assigned a different representation.
The Intensional Logic formula and the IL list differ in that the complement sentence
in the former is recursively included in the predicate for `believe', whereas in the IL list
representation of `John believes that a fish walks' the relation between `believe' and its
complement is established through index d from the IL `walks_{d,y}'. The intuition here is
that the IL `believes_{e,x,d}' relates index x, which indexes the lexeme representing the believer, with index d, which indexes the IL representing the belief. Davidson (1984) and
Hand (1993) argue on monolingual grounds that in reported speech and propositional attitude sentences (e.g. with `believe') the complementizer `that' acts as a demonstrative
which refers to the utterance appearing as complement; their treatment leads to a paratactic analysis in which there is no embedding; instead the complementizer `that' acts as
a pronoun which refers to the complement clause. For example, Davidson (1984) would
analyse the sentence `Galileo said that the earth moves' as consisting of the two clauses:
Galileo said that.
The earth moves.
where `that' refers to `the earth moves'. Davidson therefore proposes that the two sentences
are semantically and logically unconnected, and that their relationship is explained through
discourse relations rather than through logical form. If this view of attitude verbs is
accepted, the index assigned to `that' in an IL list can be motivated by its referent,
namely the event associated with the subordinate clause.
From a multilingual point of view there is also evidence that non-embedded representations are valid for sentential complements. For example, Noonan (1985:55-56, 76)
describes paratactic constructions found in certain languages of the world. Paratactic
constructions are used in the expression of sentential complements in a form essentially
analogous to the non-embedded analysis of Davidson (1984). Thus, Noonan (1985:55)
offers the following example from Lango, a language spoken in Uganda (the transcription
includes two minor typographical changes):
Lango: Atin opoyo okworo kal.
Gloss: child remembered-3sg sifted-3sg millet
Lit. trans.: The child remembered it, he sifted millet.
Eng: The child remembered to sift the millet.
From the literal translation of the Lango sentence it seems that its structure is that
of two clauses neither of which is embedded in the other. This is unlike its English
counterpart in which there is a subordinate clause which forms part of the verb phrase.
What paratactic sentences suggest is that the IL list representation, although ultimately
grounded in monolingual event-based semantic theory, is reflected in the syntax of some
languages.
2.3.10 Head Switching
The problem of head switching in recursive structures was already considered in Section
1.4.4 within the context of Structural Correspondence transfer. Problems arose in the
case of sentential complements where the treatment of Kaplan et al. (1989) led to doubly
rooted structures. Although the IL list formalism alone cannot handle head switching
appropriately, the addition of one general mechanism, that of bilingual lexical rules, does
allow an adequate and novel solution to the problem. Consider the sentences that were
problematic for Kaplan et al. (1989) together with their respective IL lists:
Eng: I think that the baby just fell.
Spa: Yo creo que el bebé acaba de caerse.
Eng IL-list: [I_x, think_{e,x,d}, that_d, the_y, baby_y, just_d, fell_{d,y}]
Spa IL-list: [yo_x, creo_{e,x,f}, que_f, el_y, bebé_y, acaba de_{f,y,d}, caerse_{d,y}]
Unlike the semantic solution to head switching suggested by Kaplan and Wedekind (1993),
expressing transfer representations in the IL list notation suggests a lexically based solution to the problem. The main idea is that instead of treating the translation as purely
compositional, such that `just' were translated as acaba de independently of the verb it
modifies, the bilexical entry for this adverb should also contain the verb it modifies:
{just_d, fell_{d,y}} ⇔ {acaba de_{d,y,d}, fell_{d,y}}
By doing this, the problem of deciding on the syntactic head of the construction is left to
the respective grammars; consequently, transfer is only concerned with the lexical equivalences between source and target IL lists and not with their structural organization.
At first sight it would seem that there are two problems with this solution. The first
one is a technical point: the IL `acaba de_{d,y,d}' in the bilexical entry shares its own event
index d with that of its complement; this would lead to a clash in the instantiation of
indices when translating from Spanish into English, since during analysis of the Spanish
sentence, these indices would not be bound, making unification into the bilingual entry
after instantiation fail. This problem can be overcome in two ways: a) one can assign
the English IL a value corresponding to the disjunction of the two separate, instantiated
Spanish indices (Copestake et al. 1995):
{just_{1∨2}, fell_{1∨2,y}} ⇔ {acaba de_{1,y,2}, fell_{2,y}}
b) one can slightly modify the SB translation process such that instantiation of indices is
delayed until after transfer; only this latter alternative has been implemented here.
The second problem is perhaps more important: the bilexical entry above implies a
considerable expansion in the size of the bilexicon, since now every verb needs an extra
entry when it appears with the adverb `just'. For example (indices omitted):
just arrived  ↔  acaba de llegar
just fell     ↔  acaba de caerse
just left     ↔  acaba de irse
just died     ↔  acaba de morir
...
Again, there are two solutions to this problem. One involves developing bilexical patterns,
exemplied below:
just_d, (V_{d,y}) ⇔ acaba de_{d,y,d}, (V′_{2,y})
The parentheses here indicate ILs which are not actually translated by this bilexical entry,
but simply serve to provide appropriate index bindings for `just' and acaba de; other
bilexical rules would be involved in the translation of the main verbs (see also Section
7.5). The other solution is to describe the above pattern using a bilexical rule. Bilexical
rules are a modification of the tlink-rules in the LKB; the main difference is that they
are interpreted as mapping between bilexical entries containing lexical signs only, unlike
tlink-rules for which multiple lexemes on either side of a tlink are analysed as involving
phrasal signs. This difference can be elucidated with the following schematic comparison:
With tlink-rules: VP(just arrived) ⇔ s-VP(acaba de llegar)
With bilexical rules: {just, arrived} ⇔ {acaba de, llegar}
Here, `VP(just arrived)' indicates that only VPs including the lexical signs for `just' and
`arrived' can match the bilexical entry. The shortcomings of the phrasal sign approach to
multi-lexeme equivalences were described in Section 1.4.10.
Returning to the `just' example, a single bilexical rule would map the bilexical entry
of each verb to construct a new bilexical entry which included the adverb `just' and its
translation acaba de. The schematic structure of such a rule would be:
Ve_{d,y}  ↔  Vs_{d,y}
          ⇓
just_d Ve_{d,y}  ↔  acaba de_{d,y,d} Vs_{d,y}
It is worth emphasizing that morphological considerations have been ignored; in particular, the usual tense on the English side, namely past tense, is not mirrored by the Spanish
translation in which present tense in the finite verb (i.e. acabar de) is the correct equivalent.
Given the bilexical entry `fall - caerse' this rule would generate the new bilexical entry
`just fell - acaba de caerse'. The implementation and operation of bilexical rules will be
described in Section 4.1. However one implementation detail I will mention now is that
new bilexical entries are constructed dynamically, based on the SL sentence. In other
words, after analysis, the SL IL list is used to collect all bilexical entries that could be
used in its translation. The bilexical rules are then applied to these collected entries in
order to temporarily expand the bilexicon prior to transfer.
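The dynamic expansion just described can be sketched in Python. This is only an illustration under stated assumptions, not the implementation of Section 4.1: an IL is taken to be a (lexeme, indices) pair, a bilexical entry a pair of IL lists, and a verb-verb entry is recognised crudely as a one-to-one entry whose ILs share at least two indices. The names `just_rule` and `expand_bilexicon` are hypothetical, and morphology is ignored, as in the text.

```python
def just_rule(entry):
    """Derive a `just V <-> acaba de V` entry from a verb-verb entry.

    An entry is (source_ils, target_ils); each IL is (lexeme, indices).
    Returns None when the entry is not a one-to-one verb equivalence.
    """
    source, target = entry
    if len(source) != 1 or len(target) != 1:
        return None
    (en_verb, en_ix), (es_verb, es_ix) = source[0], target[0]
    if len(en_ix) < 2 or en_ix != es_ix:
        return None
    event, subject = en_ix[0], en_ix[1]
    new_source = [("just", (event,)), (en_verb, en_ix)]
    # `acaba de` shares its event index with its complement verb.
    new_target = [("acaba de", (event, subject, event)), (es_verb, es_ix)]
    return (new_source, new_target)

def expand_bilexicon(bilexicon, sl_lexemes, rules):
    """Collect entries usable for the SL sentence, then apply each rule once."""
    collected = [e for e in bilexicon
                 if all(lex in sl_lexemes for lex, _ in e[0])]
    derived = [rule(e) for rule in rules for e in collected]
    return collected + [d for d in derived if d is not None]

bilexicon = [
    ([("fell", ("d", "y"))], [("caerse", ("d", "y"))]),
    ([("baby", ("y",))], [("bebé", ("y",))]),
    ([("arrived", ("d", "y"))], [("llegar", ("d", "y"))]),
]
expanded = expand_bilexicon(bilexicon, {"the", "baby", "just", "fell"}, [just_rule])
```

Only entries whose source lexemes occur in the SL sentence are collected, so the entry for `arrived' is neither collected nor expanded here; the temporary bilexicon therefore grows with the sentence rather than with the full verb inventory.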
Derived verbs whose translation is not directly encoded in the bilexicon would require
a two phase application of bilexical rules. For example, from the noun-noun equivalence:
defeat_x ⇔ derrota_x
a bilexical rule would construct a corresponding verb-verb equivalence:
defeat_x  ↔  derrota_x
          ⇓
defeat_{e,y,z}  ↔  derrotar_{e,y,z}
Then, the `just - acabar de' bilexical rule would apply to the output of this rule to derive
the equivalence `just defeated - acaba de derrotar'.
It is worth noting that the solution of Kaplan and Wedekind (1993) (Section 1.4.4)
results in the loss of a logical entailment. Their logical form for a sentence containing
`just' is:
Eng: John just fell.
Kaplan et al.: just(fell(john))
The problem is that the inference:
John just fell → John fell.
is no longer a logical inference but instead needs to be stated as a meaning postulate for
the adverb `just'. Informally, the postulate might be:
just(X) ⇒ X
By contrast, an event-based treatment would allow the inference to follow from the logical
formula alone:
∃x[fell(x,john) & just(x)] ⟹ ∃x[fell(x,john)]
This observation supports the event-based treatment that is the basis for the IL list of
such sentences.
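The event-based entailment can be spelled out as a short natural-deduction derivation. The LaTeX sketch below uses only the predicates of the formula above; the witness name $a$ is introduced here purely for illustration.

```latex
\begin{align*}
1.\;& \exists x\,[\mathit{fell}(x,\mathit{john}) \wedge \mathit{just}(x)]
      && \text{premise}\\
2.\;& \mathit{fell}(a,\mathit{john}) \wedge \mathit{just}(a)
      && \text{assumption for $\exists$-elimination on 1, $a$ fresh}\\
3.\;& \mathit{fell}(a,\mathit{john})
      && \text{$\wedge$-elimination, 2}\\
4.\;& \exists x\,[\mathit{fell}(x,\mathit{john})]
      && \text{$\exists$-introduction, 3}\\
5.\;& \exists x\,[\mathit{fell}(x,\mathit{john})]
      && \text{$\exists$-elimination, 1, 2--4}
\end{align*}
```

Each step uses only standard rules for the conjunction and the existential quantifier, whereas deriving fell(john) from just(fell(john)) requires the additional meaning postulate for `just'.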
Adverbs such as `nearly', for which the above inference does not hold, would nevertheless be represented similarly to `just'. The purpose of the above discussion, however, is
to suggest that the treatment of `just' proposed by Kaplan and Wedekind is not entirely
satisfactory from a semantic point of view. If the reason behind their treatment, however,
were to construct a structure which enabled a purely compositional transfer of `just', then
a similar effect can be achieved by using bilexical rules, namely that only one piece of
contrastive knowledge (i.e. one bilexical rule) needs to be stated.
2.3.11 Bilexical Rules in Other Problems
Consider now the Hindi corelative clause. Below I give a possible IL list for it (without
trivializing the example, I present Hindi lexemes in English italics):
Eng: I saw the man whose dog is sick.
Hindi: Which man's dog is sick, that man I saw.
Eng. IL-list: [I_x, saw_{e,x,y}, the_y, man_y, whose_{y,z}, dog_z, is_{d,z,z′}, sick_{z′}]
Hin. IL-list: [which_y, man_y, 's_{y,z}, dog_z, is_{d,z,z′}, sick_{z′}, that_y, man_y, I_x, saw_{e,x,y}]
The enabling equivalence between the two IL lists is:
{the_y, man_y, whose_{y,z}} ⇔ {which_y, man_y, 's_{y,z}, that_y, man_y}
which could be generated from the bilexical entry `man - man' by the bilexical rule (indices
omitted):
Ne  ↔  Nh
    ⇓
the Ne whose  ↔  which Nh 's that Nh
The purpose of this example is not to claim that this rule has been implemented or that
this is the most adequate way of overcoming the structural mismatch; instead, it shows
how a purely lexicalist solution can be formulated for transfer between quite different
constructions.
Another problem which can be tackled with bilexical rules is what Talmy (1985) describes as differences in the lexicalization patterns of languages. By this it is meant that
different aspects of the meaning of a sentence are encoded by different words. Take the
following expressions for example:
Eng: John swam across the river.
Spa: Juan cruzó el río nadando.
Lit: John crossed the river swimming.
In the English sentence, the verb and preposition indicate bounded extent and manner of
movement. However, in the Spanish version the verb indicates a bounded extent while the
manner of motion is conveyed by the gerundive verb.
A way of tackling these translation mismatches is described by Sanfilippo et al. (1992).
The basic idea there is that of establishing a relationship between movement verbs and
their respective translations in bounded path contexts. This proposal can be expressed
using bilexical rules which effect the following mapping (indices omitted):
Ve  ↔  Vs
    ⇓
Ve across  ↔  cruzar Vs+gerundive
This rule derives the translation of a bounded extent verb complex in English by mapping
bilingual entries for movement verbs into entries in which there are two Spanish verbs in the
TL. An example of its application would be the following mapping (without morphological
synthesis):
swim  ↔  nadar
      ⇓
swim across  ↔  cruzar nadar+ando
Finally, consider the transfer relation found in the following pair of phrases:
Eng: The dog in the park.
Spa: El perro que está en el parque.
Since the IL approach assumes that all lexical items necessary for generation are made
available by the transfer component, the Spanish lexemes que está would have to be introduced during transfer. This is achieved by the following bilexical rule (index bindings
suppressed):
p-lex  ↔  s-p-lex
       ⇓
p-lex  ↔  que está s-p-lex
which introduces the appropriate lexical entries into the Spanish side of the output bilexical
entry. In conjunction with the Spanish grammar, this rule leads to appropriate translations
of phrases containing nouns modified by PPs. These three examples demonstrate that
bilexical rules with lexical insertion can account for many translation mismatches.
There is an apparent shortcoming with the use of the bilexical rules just described and
this is that complexity in the system has not been reduced, but instead it has been shifted
from the analyser to the bilexicon and the bilexical rules. However, there are reasons
why this shift is appropriate in a transfer MT system. Firstly, the patterns described with
these rules are closer to the word patterns used by translators, as pointed out by Kawasaki
et al. (1992). That is, lexical patterns such as those expressed by bilexical rules and entries
express more naturally the type of knowledge that translators possess, including knowledge of accepted translations, cultural transplants, intended rather than literal meanings,
register and style equivalences, and idiomaticity of equivalent expressions. Also, as has
been shown by the preceding discussion, surface information does play a role in the translation process, and bilexical rules of the above form preserve this information. Finally,
the cost of compiling bilexical rules can be offset by the use of statistical techniques. For
example, in the statistical approach to MT (Section 1.4.11) of Brown et al. (1993) groups
of words in the SL and TL which are translations of each other are automatically extracted
from bilingual corpora; based on these groupings bilexical rules may be constructed in a
semi-automatic fashion.
2.3.12 Lexical Gaps
Whenever the meaning of a lexeme in one language has to be approximated by multiple
lexemes in another language it is said that a lexical gap exists. Below are some examples:
Eng: get up early   young bull   along
Spa: madrugar       novillo      a lo largo de
A possible treatment in event-based semantics of the first example is similar to the IL list
treatment where lexical gaps are transferred by equating a single IL with multiple ILs.
For example:
Eng: John got up early.
Spa: John madrugó.
The simplified event-based and IL list representations of these sentences are:
Eng. event-based: ∃e[Got up(e,John) & early(e)]
Spa. event-based: ∃e[Madrugó(e,John)]
Eng. IL-list: [John_x, got up_{e,x}, early_e]
Spa. IL-list: [John_x, madrugó_{e,x}]
Possible bilexical entries to overcome this lexical gap might be:
Event-based: Got up(e,X) & early(e) ⇔ Madrugó(e,X)
IL-list: {got up_{e,x}, early_e} ⇔ {madrugó_{e,x}}
Apart from the recursive character in the event-based representation, there are no substantial differences between the way the two approaches overcome this particular gap.
However, the other examples are more problematic. Take the translation `along - a lo
largo de'. One problem with this translation equivalence is determining the semantic representation of the Spanish phrase. A representation such as a-lo-largo-de(x,y) to mirror
`along(x,y)' seems to be the best option even though monolingually the Spanish phrase
does not appear to be totally fixed or idiomatic (e.g. a lo ancho de - breadthwise). Unfortunately, to obtain this predicate, somewhere in the Spanish grammar there would have
to be a rule along the lines of:
PP[sem: a-lo-largo-de( ,x)] → a lo largo de NP[sem: x]
The problem is that any other gaps between Spanish and another language would also
have to be added to the Spanish grammar in the form of similar rules. This would mean loss
of modularity, as the addition of new languages to the system would require modifications
to the monolingual components. Any alternatives to this approach within an event-based
framework or in other logic-based approaches would involve a detailed analysis of the
semantic structure of a lo largo de, including appropriate predicates and associated entailments for each of its lexemes.
In the IL list approach lexical gaps of the `along' type are overcome in the transfer
module alone by equating the corresponding ILs:
{along_{x,y}} ⇔ {a_{x,z}, lo_z, largo_z, de_{z,y}}
Another problem with the logical form approach to lexical gaps is that it cannot cope
easily with discontinuous gaps. This problem is similar to the one described for adjectives.
For example, the equivalence `young bull - novillo' would not normally be detected by
recursive descent transfer algorithms in phrases such as:
Eng: The young black bull.
Spa: El novillo negro.
The reader is referred to Section 2.3.6 for further discussion.
2.3.13 Anaphora Resolution
An important problem in translation is that of anaphora resolution; the following example
illustrates the issue.
Spa: María bostezó. Estaba aburrida.
Eng: Mary yawned. She was bored.
Translation into English requires the sex of the subject of `was' to be determined in order
to select the appropriate pronoun.
Anaphora resolution has been and continues to be a subject of intensive research in
Computational Linguistics (see Hirst (1981) for a survey; also Carter (1987), Aone and McKee (1993)). Many algorithms have been developed to tackle this problem both in CL and
in MT. In this section I show briefly how a relatively simple pronoun resolution strategy
may be incorporated into the IL list approach. However, the purpose of the section is not to
argue that this is the best way of incorporating anaphora resolution in transfer-based MT
and much less to claim that IL lists are the most adequate representation for modelling
pronoun resolution.
Grishman (1986:131) notes that the pronoun resolution algorithm of Hobbs (1976) is
quite successful despite its simplicity. The algorithm works by searching the parse trees of
preceding sentences in a top-down, breadth-first, left-to-right fashion until an appropriate
referent is found for a pronoun. Adapting this algorithm for searching through the IL lists
can be done by approximating each of the three search strategies as follows:
Top-down: Starting with the distinguished IL for the whole sentence, inspect all ILs directly
connected to it. Repeat the process recursively for all appropriate ILs (i.e. major categories)
thus found.
Breadth-first: In the top-down search, consider all the appropriate ILs which are directly
connected to the IL being considered before applying the procedure recursively.
Left-to-right: Use the ordering information in the IL to restrict the search for an antecedent
to proceed left-to-right.
The idea of a distinguished IL, as required by this algorithm, is modelled on the notion of
a "nucleus" in HPSG (Pollard and Sag 1994:320). The HPSG conception of a nucleus is
that of a quantifier-free state of affairs (e.g. a non-quantified predicate) which embodies
the semantic head of a phrase. Distinguished ILs differ from nuclei in that the latter
recursively allow other nuclei to be part of their arguments, whereas ILs are not recursive
structures; in addition, the distinguished IL of a modified structure is not the head of the
modifier, as is the case with nuclei, but that of the modified constituent.
To select the pronoun `she' in the translation above would require inspection of the
first sentence in the SL, namely Spanish, in order to determine the sex of the subject in
the second sentence. The algorithm starts with the distinguished IL `bostezó_{e,x}'. For this
simple text, the first and only IL to be considered is `María_x' because: a) it is directly
connected to the distinguished IL through the index x, and b) it is the leftmost, appropriate
IL that is thus connected. The algorithm will declare this IL as determining the properties
of the pronoun necessary for the English subject.
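The adapted search can be sketched in Python as follows. This is a minimal illustration rather than Hobbs's algorithm or the thesis implementation: it assumes that an IL is a (lexeme, indices, features) triple, that two ILs are directly connected when they share an index, and that a hypothetical feature dictionary records category and gender. Unlike the description above, it queues every connected IL rather than only major categories.

```python
from collections import deque

def find_antecedent(il_list, distinguished, is_candidate):
    """Breadth-first search over ILs linked by shared indices.

    il_list: list of (lexeme, indices, features) in surface order.
    distinguished: position of the distinguished IL in il_list.
    is_candidate: predicate on features selecting acceptable antecedents.
    """
    queue = deque([distinguished])
    seen = {distinguished}
    while queue:
        current = queue.popleft()
        current_indices = set(il_list[current][1])
        # Left-to-right: neighbours are visited in surface order.
        for pos, (lexeme, indices, features) in enumerate(il_list):
            if pos in seen or not current_indices & set(indices):
                continue
            seen.add(pos)
            if is_candidate(features):
                return il_list[pos]
            queue.append(pos)  # explored only after all closer ILs
    return None

# First sentence of the example: María bostezó.
first_sentence = [
    ("María", ("x",), {"cat": "noun", "gender": "fem"}),
    ("bostezó", ("e", "x"), {"cat": "verb"}),
]
antecedent = find_antecedent(first_sentence, 1,
                             lambda f: f.get("cat") == "noun")
pronoun = "she" if antecedent[2]["gender"] == "fem" else "he"
```

Starting from the distinguished IL for the verb, the only IL sharing index x is the one for the subject noun, so it is returned as the antecedent and its gender feature selects the English pronoun.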
2.3.14 IL Lists and Logical Forms
Whilst indexing in IL lists follows the arity and binding structure of logical forms, the
structure of IL lists only allows quite rudimentary forms of inferencing. In view of this it
is worth considering the following valid objection: the effort needed to build a grammar
is very great; therefore, it seems wasteful to discard logical forms as a representation for
transfer and instead use a representation that closely mirrors the surface structure of the
sentence. In particular, since a logical formula may be constructed with the same grammar
rules as those required for syntactic analysis, it would be worth using logical formulae as
a uniform representation for transfer and also for any sort of reasoning that the output of
the analyser may be used for.
The response to this criticism has to do with the generation of TL sentences from logical
form. Shieber (1993:180) argues that a range of meaning representation formalisms cause
what he calls the "problem of logical form equivalence, [i.e.] the problem of constructing
a generator that can generate not just from canonical logical forms [in the broad meaning
of this term, which would include many of the transfer representations and logical forms
considered in the preceding discussion] but from all logical forms that mean the same",
where the standard for meaning identity is taken to be natural language meaning identity
(i.e. not necessarily logical equivalence). The problem of generation from a transferred
structure falls under this category of generation since it is generally the case that the SL and
TL grammars can construct arbitrarily different logical forms with equivalent meanings.
It is argued by Shieber that under such conditions, a generator would have to solve the AI
problem if it were to solve the problem of logical form equivalence. This is because any
system that purports to reason in any way will require a notion of semantic equivalence
which roughly coincides with human notions of equivalence. Semantic representations at
present only approximate such notions and hence cannot generally generate sentences with
equivalent meanings. Furthermore, for some of these representations, including full first-order
logic, it is undecidable whether two expressions are equivalent. With IL lists this
problem is avoided by sacrificing general reasoning capabilities and using a representation
which almost trivially corresponds to surface form. That is, IL representations are distinct
for different surface forms of a sentence and hence any translational equivalences that exist
between two languages have to be expressed explicitly in the bilexicon and the bilexical
rules, instead of following from the logical structure of a sentence (even assuming lexical
transfer and restricted structural transformations).
One can also view the distinction between transfer based on logical form and transfer
based on IL lists in a slightly different way. Generation from logical form avoids decisions
on what to generate (the strategic generation problem) but it normally has to tackle the
problem of how to say it and with which lexemes (the tactical generation problem). By
contrast, the IL list approach avoids strategic and tactical generation problems by having
as input to the generator all the lexemes that need to appear in the TL sentence; its disadvantages are a small expansion in the size of the bilexicon and a seemingly exponential
worst-case complexity in the generation algorithm. The first problem is not very significant, especially with the use of bilexical rules. As for the second problem, I will describe
in Sections 4.2.1 and 4.2.3 a number of techniques for improving the execution time during
generation.
2.4 Adequacy of the Representation
I will now consider some of the desirable properties of a transfer system as proposed in
Section 1.5 in the light of the IL list representation just presented. Properties not included
here are discussed in their original formulation in Section 1.5.
Representational and Inferential Adequacy
On the representational side, the bilexical rule mechanism with lexical insertion offered a
solution to head switching and correlative constructions which demonstrated the power of
this mechanism for expressing complex transfer relations.
Inferential adequacy is also achieved through bilexical rules. Consider the following
translational regularities:
          Fruit                         Tree
English   Spanish          English        Spanish
almond    almendra         almond tree    almendro
apple     manzana          apple tree     manzano
cherry    cereza           cherry tree    cerezo
orange    naranja          orange tree    naranjo
plum      ciruela          plum tree      ciruelo
For words such as `apple', adding the word `tree' is the most natural way of translating
the corresponding Spanish word. The equivalences on the right-hand columns are a clear
case of knowledge which should be inferred from existing knowledge. A bilexical rule such
as
Ne_x           ↔  Ns_x
  ⇓                 ⇓ fruit-tree
Ne_x tree_x    ↔  Ns_x
encodes the relevant relationship. This says that any bilexical entry relating the name of
fruits in English (Ne) and Spanish (Ns) gives rise to another bilexical entry relating the
name of their respective trees; this is achieved by adding the lexical entry for `tree' in the
English side and applying a fruit-to-tree lexical rule to the Spanish noun. For example:
apple_x           ↔  manzan-a_x
  ⇓                    ⇓ fruit-tree
apple_x tree_x    ↔  manzan-o_x
I have not made explicit the effect that this lexical rule has on the semantic aspects of the
representations. In particular, the index in the input and output bilexical entries would
be associated with logical form variables ranging over distinct objects: fruits on the input
side and trees on the output side. To overcome these difficulties one could either eliminate
index bindings from input to output in bilexical rules, or adopt a coercing mechanism such
as that of Pustejovsky and Boguraev (1993).
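The effect of the fruit-tree rule on bilexical entries can be sketched as follows. This is a minimal illustration, not the LKB encoding: bilexical entries are assumed to be pairs of lists of (lexeme, index) tuples, and `fruit_to_tree_es` is a hypothetical stand-in for the Spanish fruit-to-tree lexical rule, covering only nouns ending in -a. The index is simply copied from input to output, which is exactly the simplification discussed above.

```python
def fruit_to_tree_es(noun):
    """Hypothetical Spanish lexical rule: fruit noun in -a gives tree noun in -o."""
    if not noun.endswith("a"):
        raise ValueError(f"rule not applicable to {noun!r}")
    return noun[:-1] + "o"


def apply_fruit_tree_rule(entry):
    """Derive a tree entry from a fruit entry (English side, Spanish side)."""
    english, spanish = entry
    (e_lexeme, e_index), = english   # exactly one IL expected on each side
    (s_lexeme, s_index), = spanish
    new_english = [(e_lexeme, e_index), ("tree", e_index)]  # insert the lexeme `tree'
    new_spanish = [(fruit_to_tree_es(s_lexeme), s_index)]   # apply the lexical rule
    return (new_english, new_spanish)


apple = ([("apple", "x")], [("manzana", "x")])
print(apply_fruit_tree_rule(apple))
```

The same function applied to the remaining rows of the fruit table yields the corresponding rows of the tree table.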
Inferential Efficiency
Inferential efficiency is a property of bilexical rules. There are three restrictions to guarantee this property. Firstly, bilexical rules apply dynamically on demand, expanding the
lexicon just sufficiently to activate all new bilexical entries likely to be necessary for the
current sentence; this minimizes unnecessary bilexical entries. Secondly, bilexical rules
are restricted to be non-recursive in the sense that a rule cannot apply to its output; this
avoids infinite expansions of the bilexicon (Carpenter 1991). Finally, the interpretation of
bilexical entries as sets is suspended in order to achieve a more efficient list interpretation
during rule application; this is so that the rules which are applicable to a given bilexical
entry can be determined by linear matching. The last condition is included to optimize
the application of rules with more than one IL in their input. For example, the interaction
of causative verbs (e.g. John marched the soldiers) with `just' can lead to multiple ILs on
the input side of a rule:
          Input                              Output
English   Spanish            English        Spanish
march     hacer marchar      just marched   acaba de hacer marchar
fly       hacer volar        just flew      acaba de hacer volar
work      hacer trabajar     just worked    acaba de hacer trabajar
A possible rule for this relationship would be:
Ve        ↔  hacer Vs
          ⇓
just Ve   ↔  acaba de hacer Vs
By treating the input bilexical entries as lists, it is possible to determine efficiently which
rules are applicable to a bilexical entry.
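Linear matching of a rule against a bilexical entry treated as a list can be sketched as below. The representation is an assumption made for illustration only: each side of an entry is a list of lexemes, variables in a rule are strings beginning with `?`, and inflection (march versus marched) is ignored.

```python
def match_side(pattern, lexemes, bindings):
    """Match one side of a rule position by position; return bindings or None."""
    if len(pattern) != len(lexemes):
        return None
    new_bindings = dict(bindings)
    for item, lexeme in zip(pattern, lexemes):
        if item.startswith("?"):                      # variable over lexemes
            if new_bindings.setdefault(item, lexeme) != lexeme:
                return None
        elif item != lexeme:                          # constant lexeme
            return None
    return new_bindings


def apply_rule(rule, entry):
    """Apply a bilexical rule to an entry; return the new entry or None."""
    (in_eng, in_spa), (out_eng, out_spa) = rule
    bindings = match_side(in_eng, entry[0], {})
    if bindings is None:
        return None
    bindings = match_side(in_spa, entry[1], bindings)
    if bindings is None:
        return None
    return ([bindings.get(item, item) for item in out_eng],
            [bindings.get(item, item) for item in out_spa])


# Ve <-> hacer Vs  gives  just Ve <-> acaba de hacer Vs
JUST_CAUSATIVE = ((["?Ve"], ["hacer", "?Vs"]),
                  (["just", "?Ve"], ["acaba", "de", "hacer", "?Vs"]))

march = (["march"], ["hacer", "marchar"])
derived = apply_rule(JUST_CAUSATIVE, march)
print(derived)
print(apply_rule(JUST_CAUSATIVE, derived))
```

Because matching proceeds position by position, its cost is linear in the length of the entry. In this sketch the rule also fails to match its own output, which is in keeping with the non-recursiveness restriction.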
Acquisitional Efficiency and Transparency
The format and content of bilexical rules is very similar to the knowledge obtained for
transfer models in statistical approaches to MT. Therefore many of the techniques used
there can be adapted to the semi-automatic acquisition of bilexical entries and rules.
Also, results from Computational Lexicography can be used directly as part of the cross-linguistic knowledge base (Sanfilippo et al. 1992). As far as transparency is concerned
it was already noted above that translators appear to make use of lexical patterns and
standard string equivalences. A representation such as the IL list, which expresses these
equivalences directly, is adequately transparent.
Modularity
One important property of IL lists is that they do not assume any specific syntactic description of the sentence except for that which is needed for the indexing of lexemes. While
independence of monolingual and transfer modules is not fully achieved (or even achievable) the IL list assumes very little structure. Not only does this allow for very different
syntactic and semantic descriptions of the languages involved but also for extensions and
refinements to be made to these descriptions without affecting the overall structure of the
transfer component.
Uniformity
IL lists are an explicit and uniform level of representation in which the majority of translation correspondences between different languages can be effectively expressed. Thus,
lexical gaps, argument switching, head switching and differences in complementation patterns and structural descriptions can all be handled.
2.5 Conclusion
The main purpose of this chapter has been to describe and motivate the IL list representation used for transfer. IL lists a) are maximally lexical linear structures which, in
conjunction with bilexical rules, can handle a wide range of linguistic and translation
phenomena including argument switching, head switching and lexical gaps; b) consist of
lexemes and indices; these indices correspond to variables in a semantic representation
of meaning but without the associated quantications; c) express the relations between
indices and between lexemes, which, it is argued, are sufficient for many translation phenomena. Furthermore, the structure of IL lists is linear, as opposed to recursive, and this
enables a uniform treatment of a large range of translation problems while maintaining
modularity and perspicuity.
The next chapter concentrates on the analysis component of the MT system, showing:
how grammar rules are encoded in the LKB, how the parser uses these rules and how
IL lists are represented and built. The English and Spanish grammars will be developed
independently of each other and separately from the requirements of the transfer module.
Chapter 3
Analysis and Grammars
In the previous chapter I proposed TFSs as an adequate formalism for developing a transfer-based
MT system, and motivated IL lists as a representation for effecting transfer. In this
chapter I merge these two notions and show how the analysis phase of the present system
works.
I start by showing how phrase structure (PS) rules expressed as TFSs are used in
parsing; this description will clarify the fragments of the English and Spanish grammars
that follow. Each fragment was developed independently both of each other and of the
transfer representation used; this is reflected in the structure of the chapter where I have
motivated each grammar without reference to the IL lists or to the TL into which it
translates. In the final section of the chapter I describe how IL lists are extracted from a
parse tree.
3.1 Parsing with TFSs
3.1.1 Algorithm
The parsing algorithm is a direct adaptation of the parsing techniques originating in the
work of Earley (1970), Kaplan (1973) and Kay (1973). In essence, these techniques store
the result of intermediate stages in the analysis process in order to avoid their reduplication. The outline of the algorithm shown in Figure 3.1 is taken from Winograd (1983:120).
An active chart is a data structure in which all constituents and partial constituents produced during parsing are recorded. It consists of: vertices, indicating a position in the
sequence of words being parsed; edges, representing a constituent or partial constituent
and its position in the input; a list of pending edges not yet entered into the chart. The
initializing step creates edges for the input words, while the combining step consists principally of the fundamental rule of chart parsing defined as follows (taken from Gazdar and
Mellish (1989:197) with two minor modifications):
Fundamental Rule
If the chart contains edges <i, j, A → W1 . B W2> and <j, k, B' → W3 .>, where A and
B = B' are categories and W1, W2 and W3 are (possibly empty) sequences of categories or
words, then add edge <i, k, A → W1 B . W2> to the chart.
Note here the use of dotted rules in which a dot indicates how much of the rule has been
satisfied by the parser. The proposing step has as input a category which, in conjunction
Purpose: Test whether a sequence of words is a sentence in the language defined by a context-free
grammar
Inputs: A sequence of words
Background: a context-free grammar and a dictionary
Working Structures:
Chart: an active chart
New Edge: an edge
Basic Method:
Set the chart to the result of initializing a chart for the input sequence.
Keep repeating:
- Remove any member of the pending edges of the chart and assign it as the new edge.
- Combine the new edge with the chart.
  This may produce new pending edges.
- If the new edge is active, Propose the first symbol of its remainder at its ending vertex
  in the chart.
  This may produce new pending edges.
Conditions: Succeed if at any time there is a complete edge in the edges or pending edges of the chart
with:
- starting vertex = the first element of the vertices of the chart
- ending vertex = the last element of the vertices of the chart
- label = the distinguished symbol of the grammar
If there are no pending edges when one is to be chosen, fail.
Figure 3.1: Recognition with an active chart.
with the grammar rules, is used to add new active edges looking for constituents of that
category (i.e. whose mother is equal to it); these edges are added to the pending edges
list unless identical edges have already been added.
Since Winograd's algorithm in Figure 3.1 is a recognizer, changing it to work as a
parser involves modifying the success statement in the Conditions field to read:
Succeed if there are no pending edges and there is at least one complete edge in the chart
with: (see Figure 3.1)
Retrieving a parse tree from the chart requires keeping a record of the edges used in the
construction of a given edge.
In addition, the algorithm needs to be changed in order to operate on feature-based
grammars. There are two main changes. First, instead of testing the equality of B and
B' in the fundamental rule, the two are unified; the added edge then is <i, k, A' →
W1' B'' . W2'>, where B'' is the result of unifying B and B'. The second change is to the
proposing step. Grammar rules for constructing active edges are selected if their mother
unifies with the input category. However, when the active edge is constructed the rule
used is the original rule without the changes resulting from unification. Furthermore, if an
edge to be added during the proposing step is subsumed by an edge in the chart or in the
pending edges list (e.g. the edge has been proposed before) then no addition takes place.
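The recognizer of Figure 3.1 can be sketched compactly in Python. This is a minimal illustration with atomic categories only, not the LKB parser: the names `Edge`, `recognize`, `GRAMMAR` and `LEXICON` are assumptions, the success condition is checked once no pending edges remain, and the equality test in the fundamental rule marks the point where the feature-based version would use unification instead.

```python
from collections import namedtuple

# A dotted rule lhs -> found . rest spanning vertices start..end
Edge = namedtuple("Edge", "start end lhs found rest")


def recognize(words, grammar, lexicon, top="S"):
    chart, pending = set(), []

    def add(edge):
        # Edges identical to ones already recorded are not added again.
        if edge not in chart and edge not in pending:
            pending.append(edge)

    def propose(category, vertex):
        # Add active edges looking for constituents of the given category.
        for lhs, rhs in grammar:
            if lhs == category:
                add(Edge(vertex, vertex, lhs, (), tuple(rhs)))

    # Initializing step: one inactive edge per lexical category of each word.
    for i, word in enumerate(words):
        for category in lexicon.get(word, ()):
            add(Edge(i, i + 1, category, (word,), ()))
    propose(top, 0)

    while pending:
        new = pending.pop()
        chart.add(new)
        # Combining step: the fundamental rule, with the new edge in either role.
        for other in list(chart):
            for active, inactive in ((new, other), (other, new)):
                if (active.rest and not inactive.rest
                        and active.end == inactive.start
                        and active.rest[0] == inactive.lhs):  # unify here for TFSs
                    add(Edge(active.start, inactive.end, active.lhs,
                             active.found + (inactive.lhs,), active.rest[1:]))
        # Proposing step for active edges.
        if new.rest:
            propose(new.rest[0], new.end)

    return any(e.start == 0 and e.end == len(words)
               and e.lhs == top and not e.rest for e in chart)


GRAMMAR = [("S", ("NP", "VP")), ("VP", ("V",))]
LEXICON = {"John": ["NP"], "runs": ["V"]}
print(recognize(["John", "runs"], GRAMMAR, LEXICON))
print(recognize(["runs", "John"], GRAMMAR, LEXICON))
```

Turning this into a parser would additionally require each edge to record the edges from which it was built.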
I have used the chart parser implemented by Copestake (1993b) for the analysis component. Before giving an example of parsing with TFS grammars I will explain how rules
are represented in the LKB.
3.1.2 Rules as TFSs
Grammar rules, like lexical rules, are a subtype of rule. The mother in a grammar rule
is conventionally associated with feature 0 while the daughters are assigned to feature 1
to n, where n is the maximum rule size. Because rules differ in their number of daughters,
and therefore of features, a different type is defined for each possible rule size. The
type constraint for a rule with two daughters is:
binary-rule ⊑ rule
[ 0 = sign
  1 = sign
  2 = sign ]
There is a function in the LKB which, given a rule, determines the linear ordering of its
daughters; at the moment this function returns daughters in numerical order. Thus, in
the binary rule above, the daughter with feature 1 is the leftmost daughter, and so on.
Given this, I will sometimes represent a rule with the more usual arrow notation:
sign ⇒ sign sign
where the leftmost daughter corresponds to the value of feature 1 and so on. As input to
the LKB, a rule is specified as follows:
s-np-vp binary-rule
<0> = s
<1> = np
<2> = vp.
where s-np-vp is used as an identifier for indexing the rule. This definition would correspond to the usual topmost rule S ⇒ NP VP. Since the rule is a TFS, variable bindings
and path value assignments can be included in its definition.
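The convention of numbered features can be illustrated with a small Python sketch. The dictionary encoding and the function names are assumptions made for exposition and do not reproduce the LKB's internal representation: feature 0 holds the mother, and daughters are read off in numerical order of their features.

```python
# Hypothetical encoding of the rule s-np-vp: type plus numbered features.
S_NP_VP = {"type": "binary-rule", 0: "s", 1: "np", 2: "vp"}


def mother(rule):
    """The mother is the value of feature 0."""
    return rule[0]


def daughters(rule):
    """Daughters in linear order, i.e. in numerical order of their features."""
    features = sorted(k for k in rule if isinstance(k, int) and k > 0)
    return [rule[k] for k in features]


print(mother(S_NP_VP), "=>", " ".join(daughters(S_NP_VP)))
```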
3.1.3 Example
An example will show how rules apply. The grammar in Figure 3.2 analyses simple intransitive sentences. To analyse the sentence `John runs' the parser first initializes the chart
CF back-bone: S ⇒ NP VP
[s  orth = orth, syn = v, sem = sem]
  ⇒ [np  orth = orth, syn = [n  agr = 0], sem = sem]
     [vp  orth = orth, syn = [v  agr = 0 agr], sem = sem]

CF back-bone: VP ⇒ Vint
[vp  orth = 0, syn = 1, sem = 2]
  ⇒ [v-int  orth = 0 orth, syn = 1 syn, sem = 2 sem]

Lexical entries: John, runs.
[np  orth = (John), syn = [n  agr = [agr  num = sg, per = 3]], sem = sem]
[v-int  orth = (runs), syn = [v  agr = [agr  num = sg, per = 3]], sem = sem]

Figure 3.2: Simple rules implemented as TFSs.
by creating edges for the lexical signs corresponding to `John' and `runs', and by adding
to the pending edges an edge containing the S rule with a dot in left most position. Next,
this edge is made the new edge and combined with the chart. The result of combination
is an active edge with the dotted rule (1):
(1) [s  orth = orth, syn = v, sem = sem]
      ⇒ [np  orth = orth, syn = [n  agr = 0 [agr  num = sg, per = 3]], sem = sem]
         [vp  orth = orth, syn = [v  agr = 0], sem = sem]
This edge is added to the pending edges list and in turn becomes the new edge, leading to
the construction and addition to the pending edges list of an edge with the VP rule and
with starting vertex between `John' and `runs'. On combination with the chart (i.e. with
`runs'), this edge results in the dotted rule
(2) [vp  orth = 0, syn = 1, sem = 2]
      ⇒ [v-int  orth = 0 orth, syn = 1 [v  agr = [agr  num = sg, per = 3]], sem = 2 sem]
and a corresponding inactive edge. Finally, this inactive edge is combined with that for
rule (1) by unifying the mother of the former with the active daughter of the latter, thus
ensuring that the agreement features unify. No more edges are left in the pending edges list
and an edge with top category spans the whole input. Analysis of `John runs' is therefore
completed successfully.
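The role of unification in the final step can be illustrated with a deliberately simplified sketch. Feature structures are assumed here to be nested Python dictionaries with atomic values; types and re-entrancy (shared tags), both essential to TFSs proper, are left out, and the function name is hypothetical.

```python
def unify(a, b):
    """Unify two untyped feature structures; return the result or None on failure.

    Assumes atomic values are never None, since None signals failure.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                unified = unify(result[feature], value)
                if unified is None:
                    return None          # conflicting values for this feature
                result[feature] = unified
            else:
                result[feature] = value  # information only present in b
        return result
    return a if a == b else None         # atoms unify only if identical


john_agr = {"num": "sg", "per": 3}
runs_agr = {"num": "sg", "per": 3}
run_agr = {"num": "pl"}

print(unify({"agr": john_agr}, {"agr": runs_agr}))
print(unify({"agr": john_agr}, {"agr": run_agr}))
```

The second call fails, corresponding to the rejection of a string in which the subject and the verb disagree in number.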
The grammar in this example is very naive. A slightly more elaborate grammar is now
presented.
3.2 English Grammar
The grammar of English I have used in this thesis has as its main purpose that of testing
the MT system as a whole; therefore, some compromises were made, particularly in terms
of coverage, conciseness and type of analysis. In this section I outline the aspects of the
grammar which are relevant for the main body of the thesis and comment on some of its
relevant properties.
The CF back-bone of the grammar is shown in Figure 3.3. The preterminal categories
S  ⇒ NP VP        VP ⇒ V
NP ⇒ Prn          VP ⇒ VP {N/P}P
N1 ⇒ Ncom         VP ⇒ VP PP
N1 ⇒ AP N1        PP ⇒ AdvP PP
N1 ⇒ N1 PP        PP ⇒ Pnp NP
NP ⇒ Det N1       PP ⇒ Ppp PP
AP ⇒ A
Figure 3.3: Outline of the English grammar used.
here are: Prn (proper name), Ncom (common noun), Det (determiner), A (adjective),
V (intransitive, transitive or dative verb), Pnp (preposition taking a NP complement)
and Ppp (preposition taking a PP complement).
3.2.1 PP Structure
Most of the rules in Figure 3.3 are standard descriptions of the phrase structure of English.
I will only comment on the PP rules.
The analysis of adverbial modication of PPs follows Grover et al. (1993) who propose
the following structure for P2 constituents:
          P2
         /  \
 A2[+ADV]    P1
            /  \
           P    <complements>
The bottom two PP rules may be compared with the structure of PPs proposed by
Jackendoff (1973) and Jackendoff (1977:79) in which the structure of PPs is described as:
P' ⇒ P - (NP) - (PP)
where parentheses indicate optional daughters. The main difference between his rules and
those above is that I do not allow the variant where both NP and PP are sisters; that is,
PP ⇒ P NP PP, which Jackendoff (1977) suggests for analysing `the flight from Boston
to Chicago', is not in the rule set. Instead, following Parsons (1990:47), I analyze phrases
such as `the flight from Boston to Chicago' by making each PP a modifier of the noun
`flight'.
Below are some phrases of the type analysed by the PP rules.
PP ) AdvP PP
(right)(inside the house)
(directly)(above the stadium)
PP ) Pnp NP
(in)(the park)
(on)(the horse which John likes)
PP ) Ppp PP
(from)(behind the church)
(at)(about four o'clock)
Thus the syntactic trees for `in the park' and `from behind the church' are:
     PP                    PP
    /  \                  /  \
  Pnp   NP             Ppp    PP
   |    |               |    /  \
  in  the park        from Pnp   NP
                            |    |
                        behind  the church
It seems that, at least in the spatial domain, PP complementation of prepositions is most
common with `from', although other prepositions also allow PP complementation in certain
contexts:
over in Asia, the computer industry is thriving (Durand 1992:14)
Trevor walked across to the post office (Bennett 1975:78)
Trevor went to behind the door (Bennett (1975:22) although regarded as "not very good
stylistically")
Down from above the alter (Jackendoff 1973:79)
However, I have investigated `from' as the only preposition having category Ppp, especially
because it is not clear that all the above examples constitute instances of PP complementation as opposed to an instance of a verb particle followed by a PP.
One issue regarding the two PP rules in question is overgeneration since it seems that
the ungrammatical `* from from behind the door' is accepted. While this problem is not too
damaging for MT, it can be solved by introducing restrictions based on the classification
of spatial relations presented in Section 5.2. To anticipate, restricting PP complements to
static spatial relations disallows ungrammatical sequences, because `from' is not a static
relation and therefore cannot occur as its own complement.
3.2.2 TFSs for Categories
Signs in the grammars are loosely based on those of Pollard and Sag (1987), with the
most notable difference being that there is no equivalent to their sem feature. The reason
for this is that ILs are intended to be as theory neutral as possible, such that alternative
semantic frameworks can be adopted and experimented with without affecting the translation mechanism. Thus, whether one uses Situation Semantics (Pollard and Sag 1994)
or instead uses Underspecified Discourse Representation Theory structures (Frank and
Reyle 1995), transfer at the level of IL lists can remain relatively unmodied (the main
change would be index assignment to lexemes to correspond to the argument structures
adopted in the respective semantic theory).
Figure 3.4 shows the principal distinctions between the signs of Pollard and Sag (1987)
and mine. Starting with the TFS from Pollard and Sag (1987) on the left, type loc(al)
Left, Pollard and Sag (1987):
[sign
  orth = orth
  syn  = [syn
           loc     = [loc
                       head   = head
                       lex    = lex
                       subcat = list]
           binding = binding]
  sem  = sem]

Right, signs used in the system:
[sign
  orth     = orth
  syn      = [syn
               loc      = [loc
                            head   = head
                            subcat = list]
               nonlocal = nonlocal]
  trans    = trans
  lan      = lan
  sense-id = sense-id]
Figure 3.4: Pollard and Sag (1987) signs and signs used in the system.
contains information relevant to the local context of a rule; thus, head has features which,
following the head feature principle, are shared between a head daughter and its mother in
a rule; type lex, used by constituent ordering principles, marks lexical signs; list is a list
which contains the categories that a lexeme subcategorizes for. The type binding stores
slash values for treating unbounded dependency constructions. Finally, sem holds the
Situation Semantics representation of an expression. This organization of features differs
from those used in the present system in four ways: a) there is no lex feature; instead
lexical signs are distinguished by the presence of the feature sense-id which contains
information regarding the source of a lexical entry; b) the name of bindings is replaced
by non-local to reflect the contrast with the feature loc; c) sem is replaced by the
feature trans which stores the indices from the distinguished IL; d) feature lan indicates
the language of the sign.
Type loc has two subtypes: major, for the TFSs of nouns, verbs, adjectives and
prepositions, and minor, for determiners, complementizers, etc. Below, I give an example
of the verbal and nominal subtypes of major:
nmajor ⊑ major            vmajor ⊑ major
[ head   = head           [ head   = head
  qualia = qualia           subcat = list ]
  subcat = list ]
The type qualia will be explained in Section 6.3.2; its main purpose is the description of
the lexical semantics of nouns. The type head contains information such as agreement
and case.
While there have been developments and changes in the organization of information in
HPSG (Pollard and Sag 1994), most of what will be said in the thesis could be adapted
to their new sign structure.
Rules
I will now illustrate the structure of a more complex rule by describing how subcategorization is handled. The basic idea, described by Shieber (1986:29-32) but traceable to
Dahl (1981), assumes that every verb encodes the TFSs of its complements in the subcat
list. During parsing, the head of the list is unified with the first complement in the input,
while the rest of the list is shared with the value of subcat in the mother. The process is
initiated with a single rule (only relevant paths shown):
(3) CF back-bone: VP ⇒ V
[vp  syn:loc = 0] ⇒ [v  syn:loc = 0 vmajor]
Application of this rule constructs a VP out of any verbal lexical sign. To combine the
resulting VP with one of the verbal complements, the following rule is added:
(4) CF back-bone: VP ⇒ VP {N/P}P
[vp  syn:loc = [vmajor  head = 0, subcat = 1]]
  ⇒ [vp  syn:loc = [vmajor  head = 0 v,
                            subcat = [cons  car = 2, cdr = 1 list]]]
     2 sign
Features in the type list follow Lisp nomenclature: car is the head of the list while
cdr is its tail; cons, continuing with the Lisp analogy, is the type of lists with at least
one element. If an adequate sign unifies with the leftmost daughter of this rule, the
resulting dotted rule would have the value of 2 instantiated to the sign at the head of
the subcategorization list. For example, an entry for a transitive verb is:
[v-tra  syn:loc = [vmajor
                    head:agr = 0
                    subcat   = [cons
                                 car = np
                                 cdr = [cons
                                         car = [np  syn:loc:head:agr = 0 agr]
                                         cdr = (end)]]]]
Note in this entry that the subject sign in the subcategorization list appears last, and that
agreement between the subject and the verb takes place in the lexical entry for the verb.
During analysis, rule (3) is applied to the verb lexical sign above to construct a sign of
type vp; this sign unies with the left daughter of rule (4) to give the dotted rule:
(5) CF back-bone: VP ⇒ VP . NP
[vp  syn:loc = [vmajor  head = 0, subcat = 1]]
  ⇒ [vp  syn:loc = [vmajor
                     head   = 0 [v  agr = 2]
                     subcat = [cons
                                car = 3
                                cdr = 1 [cons
                                          car = [np  syn:loc:head:agr = 2 agr]
                                          cdr = (end)]]]]
     3 np
Combination with a sign of type np to the right would then complete the VP. A similar
procedure is applied for subjects, but in that case there is an additional rule which combines
with an NP to the left.
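The way rules (3) and (4) consume the subcat list can be sketched as follows. This is a simplification for exposition: categories are atomic strings, equality stands in for unification, Python lists stand in for cons/car/cdr structures, and the function names are hypothetical.

```python
def vp_from_verb(verb):
    """Rule (3): build a VP sharing the verb's head and subcat information."""
    return {"cat": "vp", "head": verb["head"], "subcat": list(verb["subcat"])}


def combine_complement(vp, complement):
    """Rule (4): match the complement against the head (car) of the subcat list.

    The mother VP keeps the head features and the rest (cdr) of the list.
    Returns None if the list is empty or the complement does not match.
    """
    if not vp["subcat"] or vp["subcat"][0] != complement:
        return None
    return {"cat": "vp", "head": vp["head"], "subcat": vp["subcat"][1:]}


# Transitive verb: object first, subject last in the subcat list.
ama = {"cat": "v-tra", "head": "v", "subcat": ["np", "np"]}

vp = vp_from_verb(ama)
vp_with_object = combine_complement(vp, "np")
print(vp_with_object["subcat"])
```

After the object has been combined, the single remaining element of the list corresponds to the subject, which is handled by the additional rule mentioned above.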
The subcategorization mechanism just described will be used extensively in the development of the Spanish grammar now presented.
3.3 Spanish Grammar
The grammar of Spanish I have developed covers a range of structures, including relative
clauses and various types of clitic constructions, which are not treated in the literature
within a single system. These structures have been considered because of their high
frequency and because they were necessary for translating the sentences accepted by the
English grammar described in the previous section.
PP Modification of Nouns
One difference between the present English and Spanish fragments is that the latter includes relative clauses whereas the former does not. This difference arises because, in
general, Spanish does not easily allow spatial PP modification of nouns. Instead it uses
relative clauses with the verb estar (to be) followed by the spatial PP: el perro que está
en el parque (the dog that is in the park).
The restricted character of PP modification of nouns in Spanish has been noted in
various studies, including Hickey (1993) and de Carlos and Pountain (1993). In these
the acceptability of locative Spanish Noun-Prep-NP sequences was tested against native
speakers' intuitions. It seems that the majority of informants considered as awkward or
ungrammatical certain types of PPs if used as noun modifiers, particularly when appearing
in subject position. The following examples, arranged in decreasing order of grammaticality, are taken from de Carlos and Pountain (1993); I have added an English translation
and a more natural Spanish version.
Eng: The bar across the street
Spa: ? El bar al otro lado de la calle
Spa: El bar [que está al otro lado de la calle]

Eng: The man in front of the window
Spa: ?? El hombre delante de la ventana
Spa: El hombre [que está delante de la ventana]

Eng: The room at the top of the house
Spa: * La habitación en lo alto de la casa
Spa: La habitación [que está en lo alto de la casa]

Eng: The stall in the market
Spa: * La tienda en el mercado
Spa: La tienda [que está en el mercado]
These phrases include only one type of relative clause; other types treated in this section
are given at the top of Table 3.1.
Clitic Constructions
In addition to relative clauses, both clitic climbing and clitic doubling are handled by
the Spanish grammar. By clitic climbing I mean the possibility of accusative and dative
pronouns appearing next to the verb that introduced them or, alternatively, next to any
subject equi verb that dominates this verb. Examples of clitic climbing are given in
Rel Vestar PP                  que está en el parque (that is in the park)
Rel Vtra                       que ama (that loves)
Rel ProDat Vdat DatA NP        que le da a John (that gives to John)
Rel NP VP/NP                   que John ama (that John loves)
Vvinf Vtra-ProAcc              quiere leer-lo (wants to read it)
Vvinf Vdat-ProDat-ProAcc       quiere dar-se-lo (wants to give it to her/him)
ProAcc Vvinf Vtra              lo quiere leer (wants to read it)
ProDat ProAcc Vvinf Vdat       se lo quiere dar (wants to give it to her/him)
Vint                           duerme (sleeps)
NP VP                          John duerme (John sleeps)
Vtra NP                        lee el libro (reads the book)
ProAcc Vtra                    lo lee (reads it)
ProAcc Vtra AccA ProPre        la ve a ella (sees her)
ProDat Vdat NP                 le da el libro (gives her/him the book)
ProDat Vdat NP DatA NP         le da el libro a John (gives the book to John)
ProDat ProAcc Vdat DatA NP     se lo da a John (gives it to John)
ProDat ProAcc Vdat             se lo da (gives it to her/him)
ProDat Vdat NP DatA NP         les da el dinero a ellos (gives the money to them)
Table 3.1: Relative clauses, clitic climbing and clitic doubling data.
the middle part of Table 3.1. In Spanish, clitic climbing can only take place when the
complement verb is non-finite.
Clitic doubling means that the occurrence of a pronoun with dative or accusative
case does not preclude the occurrence of an additional non-pronominal complement in
a sentence. That is, the verb seems to have two complements, one pronominal and one
non-pronominal. The particular case in Spanish (or at least in Colombian Spanish) is
that the dative pronoun must always appear with a dative verb, with the non-pronominal
indirect object as optional. The accusative pronoun behaves differently: it does not allow
doubling; thus, either a pronoun appears in a verb phrase, or a non-pronominal direct
object does, but not both. There is only one caveat to this general rule: it is possible
to have doubling with transitive verbs by having an accusative pronoun as before and a
personal pronoun headed by the case marker a. For example:
Spa: La veo a ella
Lit: Her see-1s hum/def her
Eng: I see her
Further examples of clitic doubling are shown at the bottom of Table 3.1.
Uses of a
This Spanish word has a variety of senses of which I have considered three: as a dative and
accusative marker and as a locative preposition; I will consider the last of these senses in
Chapter 5. Only one property is shared by the three senses: they all assign prepositional
case to their complement NP. Otherwise, their distribution is quite complementary.
As dative case marker, a is relatively unproblematic: it always appears when the
indirect object of a verb is not a clitic. For instance:
Spa: Le da el dinero a John/ella
Eng: S/he gives the money to John/her
The situation with accusative a is different since a transitive verb may or may not mark
its direct object with a depending on a number of factors (Butt and Benjamin 1994:312)
roughly corresponding to the humanness and definiteness of the complement.
Spa: Vi el perro          Vi a Fido                    Vi a John
Eng: I saw the dog        I saw Fido                   I saw John
Spa: Llamé un amigo       Llamé a un amigo             Llamé a John
Eng: I called a friend    I called a (certain) friend  I called John
These examples are for expository purposes only since the exact distribution of accusative
a is much more difficult to capture, as testified by the studies of Calvo-Perez (1991) and
de Kock (1992). For this thesis I have adopted the following convention: human nouns,
proper names and personal pronouns with prepositional case take accusative a; other nouns
do not.
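The convention just stated can be illustrated with a minimal Python sketch. The class and function names are hypothetical and are not part of the system described in this thesis; the sketch only encodes the three-way test (human noun, proper name, personal pronoun with prepositional case) and nothing about the finer distribution of accusative a.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    """Hypothetical stand-in for the head noun of a direct object."""
    form: str
    human: bool = False
    proper_name: bool = False
    personal_pronoun: bool = False  # pronoun bearing prepositional case


def takes_accusative_a(noun: Noun) -> bool:
    # Convention adopted in the text: human nouns, proper names and
    # personal pronouns with prepositional case take accusative a.
    return noun.human or noun.proper_name or noun.personal_pronoun


def direct_object(noun_phrase: str, head: Noun) -> str:
    """Prefix the noun phrase with the case marker when the head requires it."""
    return f"a {noun_phrase}" if takes_accusative_a(head) else noun_phrase
```

Under these assumptions, `direct_object("el perro", Noun("perro"))` leaves the phrase unmarked, whereas a head flagged as a proper name or a personal pronoun yields a phrase introduced by a.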
3.3.1 Phrase Structure Grammar of Spanish
The grammar of Spanish I have developed builds on the work of Beaven (1990), who gives
a Categorial Grammar (CG) account of clitic climbing, clitic doubling and dislocated noun
phrases. However, unlike him, I develop phrase structure rules. Therefore I will start by
giving some justification as to why his treatment cannot be converted directly into a phrase
structure description.
CGs and CF phrase structure grammars (CFPSGs) are weakly equivalent in the sense
of Hays (1964:519). That is, they can both generate the same, context-free languages,
as proved by Bar-Hillel et al. (1960). Unfortunately the equivalence does not extend to
the structures assigned by the two formalisms because of the binary structure of CGs. A
trivial example of this inequivalence is the description of the one string language, abc with
category C. In a CFPSG the simplest way of encoding this is with the rule C ⇒ a b c. In
CG, by contrast, one must construct a binary derivation tree; for example, the following
category when assigned to a generates the relevant string: (C/b)/c. However, a derivation
from such a category will not be isomorphic to the one from the PS rule. This fact makes
it difficult to obtain optimal PS rules directly from categorial signs.
Despite this mismatch I will adopt two assumptions from Beaven (1990). The first
follows Whitelock (1988) in treating subjects as sentence modifiers; the second is similar
to proposals presented by Sanfilippo (1990) and Balari (1992) in which a verb is assigned
a θ-domain, which is a list of possible event participants represented as a case or theta role
list, in addition to a syntactic subcategorization list (subcat list henceforth). The idea
is that the first assumption captures subject PRO-drop, whilst the second allows clitic
doubling by having two different slots, one of which is consumed by the clitic and the
other by the non-pronominal complement. Clitic climbing is handled by inheriting the
subcat and case lists from verbal complements upwards through the analysis tree.
One important distinction between Beaven's formalism and mine is that he assumes
the operations of union and unification for sets of feature structures. Johnson (1991:137)
notes that, although Rounds (1988:33) formalizes the unification of set valued feature
structures, the computational complexity of this type of unification has not been addressed
properly. In addition, any approximation to set valued FSs using disjunction runs into
efficiency problems; this follows from the observation made by Kasper and Rounds (1986)
that unification algorithms for FSs with disjunction probably have no polynomial solution
(i.e. are NP-complete). Kasper (1987) and Eisele and Dorre (1988) address this issue, but
the use of union operations disregards much ordering information necessary for generation
of grammatical sentences. Moreover, the treatment of Pulman (1994) using a process
analogous to gap threading for encoding sets does not allow a straightforward implementation
of set union of the type Beaven assumes. Therefore, I have left sets and disjunctive
unification and values out of the system. This decision led to the explicit encoding of unifications
and, in particular, of ordering constraints on constituents, left largely unaccounted for by
Beaven (1990).
The CF back-bone of the portion of the grammar discussed in this chapter is shown
in Figure 3.5. The sections of the type hierarchy pertaining to the grammar are shown in
Figure 3.6; the types differ slightly from grammar categories in order to make the grammar
easier to read. A brief description of some of the types will help in the explanation of
clitics and relative clauses. The notation NPpro:dat in Figure 3.5 indicates that some
path in type NPpro:dat has value dat. A successful parse must unify with the constraint
of type sn. Although this type does not appear as a mother type, the hierarchy shows
that it has the two subtypes v2 and v1-phr, which correspond to complete sentences with
and without explicit subjects. Type v1 allows verb phrases with non-empty subcat lists,
whereas v1-phr requires an empty subcat list. Type n2 stands for noun phrases which
may or may not be marked by the case marker a; it has two subtypes np and cp (case
phrase). A case phrase (CP) consists of a case marker (i.e. a) followed by a noun phrase.
There are NPs with different cases, including nominative, prepositional, accusative and
dative; morphological differences due to case are only apparent in pronouns. Finally, in
the grammar, ε represents the empty string as needed by the gap hypothesis.

Srel ⇒ rel Sn            V1 ⇒ v
NP ⇒ Nprn                V2 ⇒ NPnom V1phr
N1 ⇒ Ncom                V1 ⇒ Vestar PP
N1 ⇒ AP N1               V1 ⇒ V N2acc
N1 ⇒ N1 AP               V1 ⇒ NPpro V1
N1 ⇒ N1 Srel             V1 ⇒ Vvinf V1inf   (left)
NP ⇒ Det N1              V1 ⇒ Vvinf V1inf   (middle)
CP ⇒ Case NPpre          V1 ⇒ Vvinf V1inf   (end)
AP ⇒ A                   V1phr ⇒ V1phr CP
PP ⇒ Pnp NPpre           V1phr ⇒ V1phr PP
PP ⇒ Ppp PP              Vinf ⇒ Vinf NPpro:dat
PHR-SIGN ⇒ ε             Vinf ⇒ Vinf NPpro:acc
Figure 3.5: CF back-bone of the Spanish grammar used.

[Tree diagram. Node labels, roughly from root to leaves: sign; lex-sign, n-sign, v-sign,
phr-sign; n, v, n2, v1, case, sent; sn, clex-a, clex-de, v-dat, v-tra-ph, v-tra, v-vinf,
v-int, v-estar, np, srel, cp; cp-a, cp-de, v1-phr, v2, np-nom, np-pre, np-acc, np-pro;
paccnh, pdat, pacch.]
Figure 3.6: Portion of the Spanish type hierarchy.
3.3.2 Clitic Doubling
Beaven (1990), using ideas from Jaeggli (1986), handles clitic doubling by requiring certain
clitics to absorb case when they combine with verbs. In this section I present a treatment
of clitic doubling which improves on Beaven's solution in three main ways: firstly, it
handles constructions with the accusative marker a; secondly, it does not rely on operations
on sets of feature structures; finally, it makes explicit the ordering of clitics in various
environments. In the rest of this section, types and features have been preceded by s- or
s- respectively to distinguish them from their English counterparts.
Now, consider the simplest Spanish sentence, consisting of a finite intransitive verb,
and the rule and category that describe it, as shown in Figure 3.7.

Spa: Corre
Eng: S/he runs

(6)  s-v1 [ syn = [0] s-syn ]  ⇒  s-v [ syn = [0] ]

s-v1-phr
[ orth = (corre)
  syn  = s-syn
         [ s-loc = s-vmajor
                   [ s-head      = s-v
                     s-subcat    = (end)
                     s-nom-case  = s-nom-role
                     s-case-list = (end)
                     s-junk-list = (end) ]
           s-nonlocal = s-nonlocal ]
  trans = trans
  lan   = (spanish) ]
Figure 3.7: Simple sentence with corresponding rule and category.

In this sentence there is no explicit subject, but the fact that s-v1-phr is a subtype
of s-sn lets this phrase count as a complete sentence. However, since the nominative case
slot (θ-role) s-nom-case is non-null, it is still possible for this category to combine with
exactly one nominative NP to give an s-v2 phrase. Since the resulting TFS is also unifiable
with the constraint of s-sn, it would also count as a complete sentence. A simplified s-v2
rule is shown below to indicate how this combination would take place.
(7)  s-v2
     [ syn:s-loc:s-subcat    = [0] (end)
       syn:s-loc:s-nom-case  = s-null-cat
       syn:s-loc:s-case-list = [1] (end) ]
     ⇒
     s-np-nom
     [ syn:s-loc:s-head = [2] s-nom-role ]
     s-v1-phr
     [ syn:s-loc:s-subcat    = [0]
       syn:s-loc:s-nom-case  = [2]
       syn:s-loc:s-case-list = [1] ]
Note here that the value of s-nom-case in s-v2 is s-null-cat. This prevents the occurrence
of more than one subject in a sentence. I have factored the nominative case out of the
case list and placed it on a separate feature to allow combination of subjects regardless of
whether other optional phrases appear in a sentence. This is analogous to the proposals
by Borsley (1987) and Balari (1991) where the subject category is stored as a distinct
feature, separate from the main subcat list. An essential property of the mechanism above
is that the lexical entry for the intransitive verb corre must be specified as having empty
subcat and case lists, and that its nominative case feature s-nom-case has a nominative
role.
For verbs which allow clitic doubling, namely dative verbs and transitive verbs with
personal pronoun objects, the lexical entry will have parallel values in the subcat and case
lists. As will become apparent, items in the case list stand for, and unify with, optional
arguments (i.e. modifiers under the present treatment), whereas items in the subcat list
indicate obligatory arguments.
To see how clitic doubling works, consider transitive verbs with personal pronoun
objects first (e.g. `He saw her'). The lexical entry for ve (sees) is shown in Figure 3.8 (I shall
omit the features s-junk-list and s-nonlocal for clarity; they will become relevant in
subsequent sections).

s-v-tra-ph
[ orth = (ve)
  syn  = s-syn
         [ s-loc = s-vmajor
                   [ s-head      = s-v
                     s-subcat    = s-pacch-phr ; (end)
                     s-nom-case  = s-nom-role
                     s-case-list = s-acc-role ; (end) ]
           s-nonlocal = s-nonlocal ]
  trans = trans
  lan   = (spanish) ]
Figure 3.8: Lexical entry for ve.

Here, the subcat value can unify with any of the clitics lo, los, la
and las when they appear to the left of the verb. This unification would be effected by
the following rule:
(8)  s-v1
     [ syn:s-loc:s-subcat    = [0] list
       syn:s-loc:s-nom-case  = [1] s-nom-role
       syn:s-loc:s-case-list = [2] list ]
     ⇒
     [3] s-phr-pro
     s-v1
     [ syn:s-loc:s-subcat    = [3] ; [0]
       syn:s-loc:s-nom-case  = [1]
       syn:s-loc:s-case-list = [2] ]
Binding 0 in this rule shows that although the subcat list in s-v1 could be left empty
after combination with the clitic, the case list and nominative features are copied complete
from head daughter to mother (I have left out the step that applies rule (6)). The effect
of this rule is to make the parse of la ve (s/he sees it) a complete sentence in Spanish
whilst allowing the optional combination of two more phrases. One is a case phrase with
a personal pronoun using the rule
(9)  s-v1
     [ syn:s-loc:s-case-list = [0] list ]
     ⇒
     s-v1
     [ syn:s-loc:s-case-list = [1] ; [0] ]
     s-cp
     [ syn:s-loc:s-head = [1] s-acc-role ]
which allows sentences such as la ve a ella (s/he sees her). The other is a nominative phrase
using rule (7) to allow, for example, John la ve a ella (John sees her).
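How rules (7), (8) and (9) consume the three features can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: categories are plain strings rather than TFSs, unification is replaced by equality on the first list element, and all names (`V1`, `rule8_clitic`, and so on) are hypothetical. The subject rule here only requires an empty subcat list; any further constraint imposed by the simplified rule (7) is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class V1:
    words: Tuple[str, ...]
    subcat: Tuple[str, ...]     # obligatory arguments still required
    case_list: Tuple[str, ...]  # optional case phrases still allowed
    nom_case: Optional[str]     # "nom" until a subject has been attached

    def is_sentence(self) -> bool:
        # A phrase with an empty subcat list counts as a complete sentence.
        return not self.subcat


def rule8_clitic(clitic: str, cat: str, v: V1) -> V1:
    """A clitic to the left consumes the first subcat item (cf. rule (8))."""
    if not v.subcat or v.subcat[0] != cat:
        raise ValueError("clitic is not licensed by the subcat list")
    return V1((clitic,) + v.words, v.subcat[1:], v.case_list, v.nom_case)


def rule9_case_phrase(v: V1, phrase: str, role: str) -> V1:
    """A case phrase to the right consumes the first case-list item (cf. rule (9))."""
    if not v.case_list or v.case_list[0] != role:
        raise ValueError("case phrase is not licensed by the case list")
    return V1(v.words + tuple(phrase.split()), v.subcat, v.case_list[1:], v.nom_case)


def rule7_subject(subject: str, v: V1) -> V1:
    """A nominative NP to the left fills the nominative slot once (cf. rule (7))."""
    if v.subcat or v.nom_case is None:
        raise ValueError("no open nominative slot on a saturated phrase")
    return V1((subject,) + v.words, v.subcat, v.case_list, None)


# Derivation of "John la ve a ella" from a toy entry for ve.
ve = V1(("ve",), ("acc-clitic",), ("acc-role",), "nom")
la_ve = rule8_clitic("la", "acc-clitic", ve)
la_ve_a_ella = rule9_case_phrase(la_ve, "a ella", "acc-role")
full = rule7_subject("John", la_ve_a_ella)
```

In this sketch `la_ve` already counts as a sentence, the case phrase and the subject are optional extensions, and a second subject is rejected because the nominative slot has been emptied.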
Implicit in Figure 3.8 are the exact bindings between the subcat and case lists. These
bindings allow agreement between a clitic and its co-referring optional case phrase:
Spa: La ve a ella     * La ve ellas             Las ve a ellas
Eng: S/he sees her    (S/he her sees them)      S/he sees them
They also allow index sharing between the index of a case phrase and that of an argument
in the verb:
Spa: John la ve                  John la ve a ella
IL:  John_1 la_2 ve_3,1,2        John_1 la_2 ve_3,1,2 a_2,3 ella_2
Lit: John her sees               John her sees per/acc her
A fuller representation of the TFS for ve is shown in Figure 3.9. Note in this TFS that
value 0 is shared with the head feature of the first item in the subcat list. Since TFS 1 is
shared with the third index of the verb, any phrase that unifies with TFS 0 (e.g. a clitic)
will have its index bound to the third index in the verb (i.e. the direct object/patient)
and to the index of the non-clitic pronoun.

[Attribute-value matrix of type s-v-tra-ph-3s with orth = (ve) and s-head = s-v. The
s-subcat list holds one s-pacch-phr item whose s-head is tagged 0 and whose index
s-n-ind is tagged 1 (s-human); s-nom-case carries an index tagged 2; s-case-list has
car = 0; trans:dist is an ind-lex with ind1 = e, ind2 = 2 and ind3 = 1.]
Figure 3.9: Fuller TFS for ve.
Now consider dative verbs. The entry for da (gives) is shown in Figure 3.10.

s-v-dat
[ orth = (da)
  syn  = s-syn
         [ s-loc = s-vmajor
                   [ s-head      = s-v
                     s-subcat    = s-np-acc ; s-pdat-phr ; (end)
                     s-nom-case  = s-nom-role
                     s-case-list = s-dat-role ; (end) ]
           s-nonlocal = s-nonlocal ]
  trans = trans ]
Figure 3.10: Lexical entry for da.

The flexibility of having two lists can now be exploited. On the one hand, the subcat list
specifies that an accusative object (clitic or non-clitic) and a dative clitic are obligatory
complements in dative Spanish sentences, whilst on the other hand, the case list indicates
that dative CPs are optional. The order of list elements is important in this framework:
a verb in Spanish normally combines first with the direct object (clitic or non-clitic) and
then with the dative clitic.
To analyse the sentence le da el dinero (s/he gives the money to her/him), first the
verb combines with a non-pronominal direct object to the right with rule (10) below:

(10) s-v1
     [ syn:s-loc:s-subcat    = [0] list
       syn:s-loc:s-nom-case  = [1] s-nom-role
       syn:s-loc:s-case-list = [2] list ]
     ⇒
     s-v
     [ syn:s-loc:s-subcat    = [3] ; [0]
       syn:s-loc:s-nom-case  = [1]
       syn:s-loc:s-case-list = [2] ]
     [3] s-np-acc
This results in a single element in the subcat list of the mother. Then, using rule (8), the
dative clitic le combines with da el dinero to complete the analysis. Note that the case list
and nominative feature are left unmodified by both rules. Modifiers may then combine
with rules (7) or (9).
A similar process takes place in the analysis of se lo da (s/he gives it to her/him),
but this time the rules that apply are (6) once, and (8) twice. The result of yet another
example is the parse tree for John le da el dinero a ella (John gives the money to her)
shown in Figure 3.11. The parse tree shows pictorially how the nucleus of the Spanish
sentence, the lower s-v1-phr phrase, is extended by case phrases and nominative subjects.

[V2 [NP-nom [prn John]]
    [V1-phr [V1-phr [NP-pro [prn le]]
                    [V1 [v-dat da]
                        [NP-acc [det el] [N1 [n-com dinero]]]]]
            [CP [case a] [NP-pro ella]]]]
Figure 3.11: Parse tree for John le da el dinero a ella.
3.3.3 Clitic Climbing
John ↑ quiere intentar ↑ poder ↑ dar -se -lo
John ↑ quiere intentar ↑ poder ↑ dar -le el dinero
John wants to try to be-able to give him it (the money)
Figure 3.12: Spanish clitic climbing.
In this section I will show a mechanism for handling clitic climbing which generates
grammatical expressions only, assuming just two minor morphological operations: 1) le is
changed to se if it is followed by the accusative clitic; 2) clitics following a non-finite
verb are orthographically appended to the verb - dar se lo ⇒ darselo. The characteristics
of clitic climbing in Spanish (disregarding dialectal differences) may be described with the
aid of Figure 3.12, in which the arrows indicate alternative clitic positions.
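The two morphological operations can be pictured with a small Python sketch. The function names and the clitic inventories are assumptions made for illustration; in particular, treating les like le is an addition not stated in the text, and no accent placement is attempted on the resulting word.

```python
from typing import List

ACCUSATIVE = {"lo", "la", "los", "las"}
DATIVE = {"le", "les"}  # "les" is included here as an assumption


def le_to_se(clitics: List[str]) -> List[str]:
    """Operation 1: a dative clitic becomes se when an accusative clitic follows it."""
    out = list(clitics)
    for i in range(len(out) - 1):
        if out[i] in DATIVE and out[i + 1] in ACCUSATIVE:
            out[i] = "se"
    return out


def attach(verb: str, clitics: List[str], finite: bool) -> str:
    """Operation 2: clitics precede a finite verb as separate words and are
    appended orthographically to a non-finite one."""
    cluster = le_to_se(clitics)
    if finite:
        return " ".join(cluster + [verb])
    return verb + "".join(cluster)
```

For instance, `attach("dar", ["le", "lo"], finite=False)` builds the single word used in the text for dar se lo, while the same clitics before finite da stay as separate words.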
The condition for climbing is that either all or none of the clitics can climb. In the first
sentence, it would be ungrammatical for each clitic to be associated with different verbs.
For example:
Spa: * John quiere intentar-se poder dar-lo.
Lit: John wants to-try-CliDat to-be-able to-give-CliAcc.
However, as the second sentence of Figure 3.12 shows, it is grammatical for the dative
clitic to undergo climbing even when a non-pronominal direct object remains with the
dative verb.
Spa: John le quiere intentar poder dar el dinero.
Lit: John CliDat wants to-try to-be-able to-give the money.
Of course the clitics that can appear at any point in the sentence depend on the verb that
introduces them; if the dative verb is replaced by a transitive one, only an accusative clitic
may appear:
Spa: John quiere intentar-lo poder ver.
Lit: John wants to-try-CliAcc to-be-able to-see.
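The two conditions just described (all-or-none climbing, and licensing by the verb that introduces the clitics) can be stated as a simple check. The sketch below is not the grammar mechanism of this chapter; it is a hypothetical Python predicate over a list of clitic sites, with made-up labels "dat" and "acc" and an assumed table of which verb class licenses which clitics.

```python
from typing import List, Sequence

# Assumed licensing table: which clitics the lowest verb may introduce.
LICENSED = {"dative": {"dat", "acc"}, "transitive": {"acc"}}


def climbing_ok(sites: Sequence[List[str]], verb_class: str) -> bool:
    """sites lists the clitics found at each possible position, left to right.

    All clitics must sit at a single site, none may be repeated, and each
    must be licensed by the class of the verb that introduces them.
    """
    occupied = [site for site in sites if site]
    if len(occupied) > 1:            # clitics split across different verbs
        return False
    clitics = occupied[0] if occupied else []
    if len(set(clitics)) != len(clitics):
        return False
    return set(clitics) <= LICENSED[verb_class]


# Sites: before quiere, after intentar, after poder, after the lowest verb.
split = [[], ["dat"], [], ["acc"]]        # * quiere intentar-se poder dar-lo
together = [[], ["dat", "acc"], [], []]   # quiere intentar-se-lo poder dar
dative_only = [["dat"], [], [], []]       # le quiere intentar poder dar el dinero
```

Here `climbing_ok(split, "dative")` is false and the other two configurations are accepted; a lone dative clitic with a transitive lowest verb is rejected because that verb does not license it.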
The mechanism for handling these conditions relies on maintaining a list for analysed
clitics encoded as the value of the feature s-junk-list. I will illustrate this mechanism by
describing the analysis of the sentence quiere intentar-se-lo poder dar (s/he wants to try
to be able to give her/him it). Consider the following schema of possible clitic positions
as indicated by the square brackets:
[ ] ... intentar [ ] ... [ ]
For any subject equi verb, clitics may appear at any point before it, immediately after it
as a clitic, or at any point after it as the clitic of another verb. Note that clitics follow
non-finite verbs and precede finite ones.
To combine a clitic immediately to the right of intentar (try) it is necessary to determine
a) whether all the clitics have undergone climbing (see restrictions on climbing above) from
the verb that introduced them, and b) whether some clitic has climbed. A combination of
four rules achieves this. One rule combines intentar with the clitics. Another rule passes
the list of allowed clitics upwards through the analysis tree. One more rule ensures that
the clitics combined with intentar are licensed by a transitive or dative verb. Finally,
another rule blocks the occurrence of spurious clitics in positions to the left of intentar.
The first rule is given below; its main purpose is to add to the junk list any clitics that
have formed a constituent with intentar (there is a dative and accusative version of this
rule due to different clitic orderings in infinitive verbs).

(11) s-v-lex
     [ syn:s-loc:s-subcat    = [0] list
       syn:s-loc:s-junk-list = [1] ; [2] list ]
     ⇒
     s-v-lex
     [ syn:s-loc:s-head:s-vform = s-inf
       syn:s-loc:s-subcat       = [1] ; [0]
       syn:s-loc:s-junk-list    = [2] ]
     [1] s-np-pro
Note that the mother and the head daughter are both lexical categories. This reflects the
fact that clitics in Spanish form an orthographic and phonological word when combined
with non-finite verbs. This rule will handle both of the following combinations: intentar
and se lo to give the climbed clitic expression intentar-se-lo (to-try him it), and dar and
se lo to give the non-climbed clitic expression dar-se-lo (to-give him it).
The next rule follows Beaven (1990) in passing the subcat value through the subject
equi verbs in order to make it available at the point where the clitics occur.

(12) s-v1
     [ syn:s-loc:s-subcat    = [0] cons
       syn:s-loc:s-junk-list = [1] (end) ]
     ⇒
     s-v-vinf
     [ syn:s-loc:s-junk-list = [1]
       syn:s-loc:s-vcomp     = [2] ]
     [2] s-v1
     [ syn:s-loc:s-head:s-vform = s-inf
       syn:s-loc:s-subcat       = [0]
       syn:s-loc:s-junk-list    = [1] ]
In this rule, value 0 is passed from the right daughter to the mother. The rule also shows
that the verbal complement unifies with the value of the feature s-vcomp and that the
junk list is also passed from head daughter to mother. The rule will combine poder and
dar into poder dar (to-be-able to-give), such that the resulting phrase still requires clitic
complements in order to form a grammatical sentence. The value (end) in the junk list
ensures that all clitics undergo climbing.
As the next rule will show, specifying a verbal complement in a separate feature rather
than as an element of the subcat list allows the s-v-vinf verb to combine to its right with
either a V1 or a clitic.

(13) s-v1
     [ syn:s-loc:s-subcat    = [0] (end)
       syn:s-loc:s-junk-list = [1] cons ]
     ⇒
     s-v-vinf
     [ syn:s-loc:s-vcomp     = [2]
       syn:s-loc:s-junk-list = [1] ]
     [2] s-v1
     [ syn:s-loc:s-head:s-vform = s-inf
       syn:s-loc:s-subcat       = [1]
       syn:s-loc:s-junk-list    = [0] ]
This rule combines intentar-se-lo (to-try him it) and poder dar to give intentar-se-lo poder
dar. By unifying the subcat list of the dative verb, obtained through (12), with the list
of clitics accumulated via rule (11) in the junk list, it is ensured that only licensed clitics
appear with intentar.
To complete the analysis, a rule combines an s-v-vinf verb with a phrase in which all
clitic positions (i.e. subcat items) in its complement have been saturated, or in other
words, through which no clitic can climb:

(14) s-v1
     [ syn:s-loc:s-subcat    = [0] (end)
       syn:s-loc:s-junk-list = [0] ]
     ⇒
     s-v-vinf
     [ syn:s-loc:s-subcat    = [0]
       syn:s-loc:s-junk-list = [0]
       syn:s-loc:s-vcomp     = [2] ]
     [2] s-v1
     [ syn:s-loc:s-head:s-vform = s-inf
       syn:s-loc:s-subcat       = [0]
       syn:s-loc:s-junk-list    = list ]
This rule combines quiere (wants) and intentar-se-lo poder dar to give the sentence quiere
intentar-se-lo poder dar.
To go through another example, consider the sentence le quiere intentar poder dar el
dinero. A right to left derivation of this sentence is shown in Figure 3.13, where `junk'
and `subcat' indicate the current value of these two features for the rightmost edge in the
analysis.

Analysis                                  Rule             Notes
le quiere intentar poder dar el dinero
le quiere intentar poder dar NP           usual NP rules
le quiere intentar poder V1:inf           (10)             junk= [ ], subcat= [ProDat]
le quiere intentar V1:inf                 (12)             junk= [ ], subcat= [ProDat]
le quiere V1:inf                          (12)             junk= [ ], subcat= [ProDat]
le V1:fin                                 (12)             junk= [ ], subcat= [ProDat]
V1:fin                                    (8)              junk= [ProDat], subcat= [ ]
Figure 3.13: Analysis of: le quiere intentar poder dar el dinero.
To summarize, four different rules are needed to handle clitic climbing correctly. One
rule, (11), combines infinitives with clitics to the right, while the other three, (12), (13)
and (14), handle the interaction between the junk and subcat lists during infinitival
complementation; each of these rules applies depending on whether a clitic appears at the
beginning, in the middle or at the end of the verb phrase.
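The interaction between the junk and subcat lists in rules (12), (13) and (14) can be approximated in Python as follows. This is a rough sketch under stated assumptions, not the rules themselves: unification of list values is replaced by equality of tuples, the order of items on the lists is taken as given, the s-vform and s-vcomp constraints are omitted, and the names `V`, `rule12`, `rule13` and `rule14` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class V:
    subcat: Tuple[str, ...] = ()  # clitic positions still to be filled
    junk: Tuple[str, ...] = ()    # clitics already attached to this verb


def rule12(head: V, comp: V) -> V:
    """No clitic met yet: pass the complement's non-empty subcat list upwards."""
    if not comp.subcat or head.junk or comp.junk:
        raise ValueError("rule (12) does not apply")
    return V(subcat=comp.subcat, junk=())


def rule13(head: V, comp: V) -> V:
    """Head carries climbed clitics: they must equal what the complement requires."""
    if not head.junk or comp.junk or head.junk != comp.subcat:
        raise ValueError("rule (13) does not apply")
    return V(subcat=(), junk=head.junk)


def rule14(head: V, comp: V) -> V:
    """Complement is saturated, so no clitic can climb through it."""
    if comp.subcat or head.subcat or head.junk:
        raise ValueError("rule (14) does not apply")
    return V()


# quiere intentar-se-lo poder dar
dar = V(subcat=("dat", "acc"))
poder = V()
intentar_se_lo = V(junk=("dat", "acc"))  # assumed result of applying rule (11)
quiere = V()

poder_dar = rule12(poder, dar)
middle = rule13(intentar_se_lo, poder_dar)
sentence = rule14(quiere, middle)
```

Because rule13 demands that the junk list match the whole subcat list, a head carrying only one of the two clitics fails to combine, which mirrors the all-or-none condition on climbing.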
3.3.4 Relative Clauses
The approach I have adopted for relative clauses relies on a simplified gap threading or
difference list mechanism first described in Pereira (1981) and later modified in Pereira
and Shieber (1987:125). In the version used here only one gap may accumulate in the
difference list.
I discuss relative clauses because of their usefulness in translating nouns modified by
PPs, and also because the analysis I have adopted for subjects implies that gaps may
appear in subject position without being subcategorized by a substantive head (Pollard
and Sag 1994:172).
I have partly followed the analysis of Castel (1990), who takes a different line from
that of Gazdar et al. (1985) (GKPS henceforth). Castel proposes that the Lexical Head
Constraint of Gazdar et al. (1985:158), which restricts the occurrence of gaps in subject
position, is not appropriate for describing Spanish embedded questions. That is, a gap in
Spanish does not have to be introduced by a lexical element and may therefore occur in
subject position. Figure 3.14 shows Castel's and GKPS's analysis for equivalent sentences.

[VP [V pregunta]
    [S[+INV] [NP[+Q] quién]
             [S/NP [VP [V vio] [NP a María]]
                   [NP/NP e]]]]

[VP [V he-asks]
    [S[+Q] [NP[+Q] who]
           [VP [V saw] [NP Mary]]]]
Figure 3.14: Castel's and GKPS's analyses respectively.

The explanation Castel gives for introducing an alternative analysis is relevant here, since
I adopt part of the analysis but do not agree with the reasoning behind it. Castel first
assumes that an Immediate Dominance rule for Spanish such as:
ID: S → NP, VP
would yield both of the following strings if not restricted in some way (Vswh = verb with
embedded question e.g. `ask'):
1) Vswh quién vio a María.
2) * Vswh vio a María quién.
Eng: Vswh who saw Mary
To prevent 2) he proposes the Linear Precedence (LP) rule:
LP1: [+Q] < VP
to disallow question words from following verb phrases. But he notices further that even
this LP rule would allow (note que `that', qué `what')
* Vswh qué María compró.
Vswh what did Mary buy.
To stop this sentence he proposes another LP rule:
LP2: VP < NP
implying that subjects in Spanish follow their predicates. Because of this rule he is forced
to give embedded questions the analysis of Figure 3.14. This is because both LP rules
above come into play when the subject is a question word, making an extraction analysis
the only one possible. I note in passing that his analysis of simple Spanish sentences
requires extraposition, as shown in Figure 3.15.
[S [NP María]
   [S/NP [VP [V compró] [NP un libro]]
         [NP/NP e]]]
Figure 3.15: Castel's analysis of María compró un libro.
My principal disagreement is with Castel's LP2. The data below show that in certain
sentence types, subjects cannot follow their predicate, and that when subjects follow their
predicate there is usually a version of the sentence in which the subject precedes it.
Indicative:
a. María quiere comprar un libro grande y rojo.
b. * Quiere comprar un libro grande y rojo María.
Eng: Mary wants to buy a big red book.
Interrogative:
a. ¿María compró un libro?
b. ¿Compró un libro María?
Eng: Did Mary buy a book?
With clitics:
a. María lo compró.
b. Lo compró María.
Eng: Mary bought it.
Relative clauses:
a. El perro que María compró duerme.
b. El perro que compró María duerme.
Eng: The dog that Mary bought sleeps.
Thus, I assume that in general, subjects precede their predicates in Spanish, just as in
English, with certain exceptions which include embedded questions.
Despite the above criticism, I have followed Castel in allowing subject extraction. By
doing this a uniform treatment of relativization can be achieved which consists of the two
rules:
(15) s-n1
     [ syn = [0] s-nmajor ]
     ⇒
     s-n1
     [ syn = [0]
       syn:s-loc:s-head:s-n-agr  = [1] (s-agr)
       syn:s-loc:s-head:s-n-type = [2] (s-n-type) ]
     s-srel-phr
     [ syn:s-nonlocal:s-gapin:s-gap-cat:s-n-agr  = [1]
       syn:s-nonlocal:s-gapin:s-gap-cat:s-n-type = [2]
       syn:s-nonlocal:s-gapout:s-gap-cat = s-null-cat ]

(16) s-srel-phr
     [ syn:s-loc:s-head     = [0] s-v
       syn:s-loc:s-nonlocal = [1] s-nonlocal ]
     ⇒
     s-rel-lex
     s-sn-phr
     [ syn:s-loc:s-head = [0]
       syn:s-nonlocal   = [1] ]
Rule (15) restricts the gap to one which agrees with the filler and shares its noun type.
Rule (16) binds the head features and the non-local values between head daughter and
mother to construct a phrase of type s-srel.
The reasons for adopting this analysis instead of the one described by GKPS now
follow. English relative clauses in GKPS are analysed using two rules. One is an extension
of the rule S ⇒ NP VP in which the relativizer is treated as the subject NP. This prevents
gaps in subject position, thus disallowing strings like:
* Who do you think that will come?
In Spanish, however, such sentences are grammatical:
¿Quién crees que vendrá?
The other rule from GKPS is an extension of the preposed constituent rule S ⇒ NP
S/NP. This rule assumes that the mechanism for preposed constituents is similar to that
of non-subject relatives, which seems to be the case for English:
Mary, John likes
The woman that John likes
However, in Spanish this is not so:
Mary, John la quiere.
La mujer que John quiere .
These two Spanish sentences suggest that preposed constituents in Spanish leave a clitic
whereas relative clauses leave a gap.
Thus, a GKPS-like treatment of relative clauses in Spanish would disallow sentences
with subject gaps, and incorrectly predict the occurrence of gaps with preposed
constituents. One more justification for rules (15) and (16) is that Spanish relative clauses
always contain a relativizer:
Eng: The man I saw sleeps.
Spa: * El hombre vi duerme.
Spa: El hombre que vi duerme.
The two rules above handle both subject and non-subject relative clauses, thus further
justifying their descriptive validity.
In Figure 3.16 I give the parse trees for two sentences to demonstrate the analyses
resulting from the above rules; they conclude the description of the Spanish grammar.

[NP [Det el]
    [N1 [N1 [N perro]]
        [Srel/NP [Rel que]
                 [V2/NP [NP/NP e]
                        [V1 [V está] [PP en el parque]]]]]]

[NP [Det el]
    [N1 [N1 [N perro]]
        [Srel/NP [Rel que]
                 [V1-phr/NP [V vi] [NP e]]]]]
Figure 3.16: Trees for "the dog that is in the park" and "the dog that I saw".

3.4 IL Lists as TFSs
Section 2.2.3 described IL lists and their construction from an abstract point of view. In
this section I present their implementation. An IL list can be built in two ways. One
way is to parse the input, reconstruct the parse tree from the chart by performing all
unifications necessary for correct instantiation, and then use the leaves of the tree, in the
order in which they occur in the input, to construct the IL list. The alternative is to
construct the IL list as part of the TFS through a feature with value isomorphic to the
value of orth. I have adopted the former mechanism as it requires the fewest assumptions
about the rules and categories.
The IL list extraction algorithm constructs an IL list for each inactive edge which spans
the input string by essentially performing all the implicit unifications in the analysis tree.
Each inactive edge will have an instantiated rule either with phrasal sign daughters and
associated pointers to their edges, or with lexical signs. If the edge consists of phrasal
daughters, they are unified with the mothers of their respective edges, and the algorithm
is applied recursively. If the edge consists of lexical signs, or is itself a lexical edge, the
recursion stops, and the lexical sign is added to the result IL list.
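The recursive shape of this algorithm can be sketched in Python. The sketch leaves out the essential part, namely the unification of each phrasal daughter with the mother of the edge it points to, and shows only the traversal that collects lexical signs in input order; the `Edge` class and its fields are hypothetical stand-ins for chart edges and TFSs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edge:
    sign: str                                   # stand-in for the sign's TFS
    daughters: List["Edge"] = field(default_factory=list)


def extract_il(edge: Edge) -> List[str]:
    """Collect the lexical signs under a spanning edge, left to right."""
    if not edge.daughters:
        # Lexical edge: the recursion stops and the sign joins the IL list.
        return [edge.sign]
    result: List[str] = []
    for daughter in edge.daughters:
        # In the full algorithm the daughter would first be unified with
        # the mother of the edge it points to; that step is omitted here.
        result.extend(extract_il(daughter))
    return result


np_edge = Edge("np", [Edge("the"), Edge("n1", [Edge("cat")])])
il_list = extract_il(np_edge)
```

For the toy edge above the traversal visits the determiner first and then descends through n1 to the noun, so the resulting list follows the order of the input string.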
For example, assume that rule (17) is a complete parse, forming part of a spanning edge.

(17) np
     [ orth = the, cat
       trans:dist = [0] ind-lex [ ind1 = [1] obj ] ]
     ⇒
     det
     [ orth = the
       trans:dist = ind-lex [ ind1 = [1] ] ]
     n1
     [ orth = cat
       trans:dist = [0] ]

Extraction of the corresponding IL list starts by noting that the first daughter (det) is a
lexical sign; it is therefore added to the result IL list; this ensures that index 1 is bound
to the determiner in the IL list. The second daughter (n1) is then unified with the mother
of the rule in the edge to which it points:

(18) n1
     [ orth = cat
       trans:dist = [0] ind-lex [ ind1 = obj ] ]
     ⇒
     n
     [ orth = cat
       trans:dist = [0] ]

which leads to the noun being bound to index 1, and to the IL for `cat' being added to
the result IL list. Figure 3.17 shows the two ILs for this phrase.

det
[ orth = the
  trans:dist = ind-lex [ ind1 = [0] obj ] ]
n
[ orth = cat
  trans:dist = ind-lex [ ind1 = [0] ] ]
Figure 3.17: Indexed lexemes for `the cat'.
Indices are encoded in the value of dist: feature ind1 has as value the first index in a
lexeme, and so on. For example, in the sign for `cat' the type ind-lex includes the index
associated with this noun. Indices are typed according to whether they are associated
with an event, an object or with a spatial relation of the kind proposed in Section 5.2;
the portion of the type hierarchy describing the top level index types is shown in Figure
3.18. Every index has a feature index-id which is used for instantiation during transfer.
[Type hierarchy: top > entity > {e, obj, rel}]
Figure 3.18: Index types: event, object, relation.
If the value of this feature is (string) then the index acts as a variable (i.e. a letter in
the abbreviated notation for IL lists); if its value is an integer then it acts as a constant
(i.e. an integer in the abbreviated notation). For the purposes of analysis, this feature is
concealed from the grammar writer and it is only accessible to the transfer module.
3.5 Conclusion
This chapter had two purposes. The first was to introduce the analysis mechanisms used in
the MT system being presented. Many aspects of syntax can be described efficiently using
TFSs arranged as CF rules with a single mother and a number of daughters. By using
unification and structure sharing many syntactic phenomena can be described elegantly,
including subcategorization and agreement, while maintaining the advantages of efficient
parsing using chart based techniques.
The second goal of the chapter was to motivate a grammar of Spanish for certain
phenomena particular to this language or whose analysis differs from comparable analyses
in English. Thus, clitic doubling is treated by proposing a case list, in addition to a
subcategorization list, such that a clitic unifies with an item from the subcategorization
list while its doubled non-clitic NP unifies with an item from the case list. Clitic climbing
is handled by using an auxiliary list (feature s-junk-list) in which analysed clitics can
be passed up through the verb chain. Since Spanish and English differ in the structure of
their relative clauses, particularly in the case of relativized subjects, it was proposed that
all relative clauses be analysed as involving a gap, even in subject position; this analysis
required arguing against an English-like approach in which subject relatives do not leave
a gap.
The modularity offered by the IL list approach enabled grammars to be written independently
of the transfer representation and of the TL. After developing the grammars
it was shown how the IL lists were derived from the parse tree of a sentence. Indices in
lexemes consisted of a type and at least one feature, index-id, in which the status of the
index as variable or constant was encoded.
This is the first of two chapters describing the operation of the translation system.
Issues relating to transfer and generation were largely ignored as they are taken up in the
next chapter.
Chapter 4
Lexicalist Transfer and Generation
In Section 2.2.4 I gave an example of the overall operation of the MT system developed
here. In this chapter I shall concentrate on the modules that effect the transfer and
generation steps. These modules take as input an IL list as constructed from the grammars
given in the previous chapter and convert it, first into a TL bag and then into a TL
sentence. Algorithms similar to those used by Whitelock (1992) and Beaven (1992a) (see
Section 1.4.8) form the starting point for the main procedures; the transfer algorithm
has been extended to cope with bilexical rules while the execution time of the generation
algorithm has been improved.
4.1 Transfer
Transfer in lexicalist MT takes the form of an algorithm which maps an IL list in the SL
into a bag of ILs in the TL. The algorithm groups all the source ILs in such a way that
they match the SL side of the bilexicon. If this is achieved the output of transfer is the
union of all the TL sides of the bilingual entries used. Originally the source side of bilexical
entries was interpreted as a bag; however, this proved inefficient and the final version of
the algorithm assumes that the input IL list preserves the relative order in the source side of
bilexical entries.
4.1.1 Transfer Algorithm
To describe the algorithm, consider the following schematic bilexicon.
<{go}, {ir}>
<{outside}, {fuera, de}>
<{go, outside}, {salir}>
<{young, bull}, {novillo}>
<{the}, {el}>
The following examples show some possible mappings:
[the, young, bull] maps to {el, novillo}
[go, outside] maps to {ir, fuera, de} and {salir}
[young] maps to (no output)
The functions to achieve this mapping are now described. Cover-SL-List takes as input
an IL list and returns a list of covers, where a cover is a set of the bilingual entries which
effect one possible translation of the input. For the first example above, the relevant cover
would be:
{ <{the}, {el}>, <{young, bull}, {novillo}> }
Make-TL-Bags takes each cover in the output of Cover-SL-List and returns the TL bag
which constitutes a possible translation. These bags become the input to the generator.
The algorithm for Cover-SL-List is given in Figure 4.1. Intuitively, Cover-SL-List
works by matching the SL side of a bilexical entry against the input IL list; if matching
is successful, then matched ILs are removed from the IL list and the algorithm is applied
recursively to the ILs that remain. The output of Cover-SL-List serves as input to the
function Make-TL-Bags which steps through a covers list and, for each cover, constructs
a TL bag by taking the union of all of the TL sides of its bilingual entries. I will not give
the details of this algorithm as it is quite straightforward. Instead I present an example
which will illustrate the important features of Cover-SL-List.
Example
Given the list
[go, outside]
and using the bilexicon above, the step-by-step computation of the covers list:
[cover 1: { <{go}, {ir}>, <{outside}, {fuera, de}> },
cover 2: { <{go, outside}, {salir}> } ]
by Cover-SL-List is shown in Figure 4.2. This list becomes the input to Make-TL-Bags
which produces the TL bags:
[bag 1: {ir, fuera, de},
bag 2: {salir}]
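The two functions can be sketched in a few lines, here for the list-based version of the algorithm given in Figure 4.1. This is only an illustrative sketch under stated assumptions: ILs are reduced to plain strings with no indices, covers and TL bags are represented as Python lists rather than sets and bags, and the helper `remove_in_order`, which removes the leftmost in-order occurrence of a source list, is a hypothetical name whose leftmost-match policy is an assumption.

```python
from typing import List, Optional, Sequence, Tuple

# A bilingual entry pairs an ordered source list with a target bag.
Entry = Tuple[Tuple[str, ...], Tuple[str, ...]]

BILEXICON: List[Entry] = [
    (("go",), ("ir",)),
    (("outside",), ("fuera", "de")),
    (("go", "outside"), ("salir",)),
    (("young", "bull"), ("novillo",)),
    (("the",), ("el",)),
]


def remove_in_order(source: Sequence[str], il_list: Sequence[str]) -> Optional[List[str]]:
    """Remove `source` from `il_list` as an in-order subsequence (leftmost match).

    Returns the remaining ILs, or None if the source list does not occur.
    """
    rest, pos = [], 0
    for il in il_list:
        if pos < len(source) and il == source[pos]:
            pos += 1
        else:
            rest.append(il)
    return rest if pos == len(source) else None


def cover_sl_list(il_list: Sequence[str], bilexicon: List[Entry] = BILEXICON) -> List[List[Entry]]:
    """Return every cover of `il_list`; each cover is a list of bilingual entries."""
    if not il_list:
        return [[]]                      # base case: a single empty cover
    result: List[List[Entry]] = []
    current_il = il_list[0]
    for entry in bilexicon:
        source_list = entry[0]
        if source_list[0] != current_il:
            continue                     # entry must start with the current IL
        partial_il_list = remove_in_order(source_list, il_list)
        if partial_il_list is None:
            continue                     # source list not present in order
        for cover in cover_sl_list(partial_il_list, bilexicon):
            result.append(cover + [entry])
    return result


def make_tl_bags(covers: List[List[Entry]]) -> List[List[str]]:
    """Union the TL sides of each cover; a sorted list stands in for a bag."""
    return [sorted(t for _source, target in cover for t in target) for cover in covers]


covers = cover_sl_list(["go", "outside"])
bags = make_tl_bags(covers)
```

Sorting the target lexemes is only a convenient way of making bag equality testable; it carries no ordering information for generation.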
Purpose: given a source IL list, construct all possible covers that the list can have using a bilexicon
which defines allowed lexical correspondences
Inputs: an IL list
Results: a covers list
Background: a bilingual lexicon
Working Structures:
  Current IL: an IL
  Partial IL List: an IL list constructed by removing elements from the input IL list
  Source List: a list of SL indexed lexemes taken from a bilingual entry
  Current Bilingual Entry: a pair <source list, target bags> containing ILs standing in the
    transfer relation; its source side contains one of the ILs in the input
  Cover: a set of bilingual entries describing the translation of an IL list
  Result: a list of all the covers possible from a source IL list; used as final value of Cover-SL-List
  Partial Result: just like result but used as a temporary store for the value of the recursive step
Basic Method: Iteration through bilingual entries, recursive call on part of the input, iteration through
result of recursion:
  If the input is the empty list then assign to result an empty cover as its only element and
  return result; this covers the base case of the recursion.
  Otherwise initialize result to the empty list in order to accumulate results in subsequent steps
  Take the first member of the input list and assign it to the current IL.
  For each bilingual entry in the bilingual lexicon whose SL side contains the current IL as its
  first element do:
    - Make this the current bilingual entry
    - Assign to source list the SL side of current bilingual entry
    - Remove every element of source list from the input IL list, if they occur in the same relative
      order, and assign this value to partial IL list
    - Assign partial result the value of applying Cover-SL-List to partial IL list
    - For each cover in partial result do:
        Complete the result of the recursive step by adding the current bilingual entry to
        this cover, thus incorporating the bilingual entry used for the current level call
        Append the cover thus obtained to result, thus collecting all possible translations
        of the input
  Return result
Figure 4.1: Cover-SL-List algorithm.
Step 1 Entering Cover-SL-List with input [go, outside]
Step 2 After initialization, result = [], current IL = go
Step 3 Two bilingual entries have go as first element, <{go}, {ir}> and <{go, outside}, {salir}>
Step 4 Make current bilingual entry <{go}, {ir}>, and source list [go]
Step 5 Remove go from [go, outside]; partial IL list = [outside]
Step 6 Entering Cover-SL-List with input [outside]
  Step 6.1 result = []; current IL = outside
  Step 6.2 One bilingual entry has outside as first element, <{outside}, {fuera, de}>
  Step 6.3 Current bilingual entry = <{outside}, {fuera, de}>
  Step 6.4 Remove outside from [outside]; partial IL list = []
  Step 6.5 Recursive call to Cover-SL-List leads to partial result = [{}]
  Step 6.6 For each cover in partial result, add current bilingual entry; this leads to result =
    [{<{outside}, {fuera, de}>}]
  Step 6.7 No more bilingual entries; hence return [{<{outside}, {fuera, de}>}]
Step 7 Partial result, from recursion, = [{<{outside}, {fuera, de}>}]
Step 8 For each cover in partial result, add current bilingual entry: this leads to result =
  [{<{outside}, {fuera, de}>, <{go}, {ir}>}]
Step 9 Next, current bilingual entry = <{go, outside}, {salir}>
Step 10 Remove go, outside from [go, outside]; partial IL list = []
Step 11 Recursive call to Cover-SL-List leads to partial result = [{}]
Step 12 For each cover in partial result, add current bilingual entry: [{<{go, outside}, {salir}>}]
Step 13 Append this value to result to give result =
  [{<{outside}, {fuera, de}>, <{go}, {ir}>}, {<{go, outside}, {salir}>}]
Step 14 No more bilingual entries; return result.
Figure 4.2: Computing Cover-SL-List([go, outside]).
The original transfer algorithm, which assumed unordered SL IL bags, allowed transfer
relations to be expressed economically. Thus, whether the input was `to John Mary gives
the flowers' or `Mary gives the flowers to John', a bilexical rule such as:
<{gives, to}, {le, da, a}>
would have achieved transfer. Unfortunately, if the Cover-SL-List algorithm treated
the SL side of bilexical entries as a bag, the procedure would solve a type of problem
comparable to certain problems in set theory. For example, Garey and Johnson (1979:221),
following Karp (1972), describe the problem of exact cover by 3-sets. This problem
involves a decision on whether or not a set of 3q elements, where q is an integer, can be
exactly covered with subsets of three elements where every element of the set is in exactly
one of the subsets. This is a special case of the original lexicalist transfer problem in
which every source bag consists of three elements and the input sentence always contains
a number of ILs which is a multiple of three. The fact that exact cover by 3-sets is
shown to be NP-complete implies that the original formulation for lexicalist transfer may
have no polynomially bounded algorithm in the worst case.
There are two main ways of overcoming this inefficiency. One is to restrict bags in
the bilexicon to one or two ILs, which renders the problem solvable in polynomial time
(Garey and Johnson 1979:221); the problem with this is that many lexical gaps require
more than two ILs to be equated. The other option, adopted here, interprets the source
side of bilexical entries as a list; the disadvantage of this solution is that the bilexical entry
for `gives to' above could not translate `to John Mary gives the flowers'. To translate this
sentence one could specify an additional bilexical entry with the correct ordering. A better
alternative would be to allow a mixture of bag-like and list-like structures in the bilexicon.
While this solution would again make the problem NP-complete in the worst case, performance
might be much better than this if most bilexical entries are list-like. This option needs to
be investigated.
There now follows a description of how bilexical entries are encoded as tlinks relating
ILs, and how the interpretation of tlink-rules has been modified to implement a purely
lexicalist view of transfer.
4.1.2 ILs in Tlinks
I describe in this section how bilexical entries consisting of ILs are encoded and how
index sharing between source and target ILs is implemented. I have used the tlinks of
Copestake et al. (1993) (Section 1.4.10) as the basis for the representation of bilexical
entries. However, there are some differences between the two representations which make
it desirable to use a term other than tlink. In simple cases their approach does not differ
substantially from the one I have taken. Thus, a simplified bilexical entry for
`Mary_x ↔ María_x' is shown in Figure 4.3. Of relevance to this section is the bound value [2].
While feature index-id was hidden from the analysis module, it is accessible during transfer; its
purpose is to model index binding across languages. That is, monolingual index binding
is kept separate from bilingual index binding in order to maximize the independence of
type assignments to indices. For instance, in Figure 4.3 the English index for `Mary' has
type obj whereas in Spanish María has index type s-human indicating the fact that this
noun requires the preposition a when it occurs as direct object (see Section 3.3). Since
index binding between languages is not affected by the type assigned to an index, the
monolingual grammars are thus largely independent of each other.
1l-1l-1i-1i-t
  sfs = identity-rule
          0 = [0] n-prn
                    orth  = (mary)
                    syn   = syn
                    trans = trans[dist = ind-lex[lex = (mary 1), ind1 = obj[index-id = [2] (string)]]]
          1 = [0]
  tfs = identity-rule
          0 = [4] s-n-prn
                    orth  = (maría)
                    syn   = s-syn
                    trans = trans[dist = ind-lex[lex = (maría 1), ind1 = s-human[index-id = [2]]]]
          1 = [4]
Figure 4.3: Direct bilexical entry.
Turning now to lexical gaps and multiword correspondences, I proposed in Section
2.3.12 a mechanism whereby lexemes were placed into translation correspondence. For
example, the novillo lexical gap in English is overcome with the entry:
young_x bull_x ↔ novillo_x
(If `young' is analyzed as a non-intersective modifier, the description below should be
modified accordingly; see Section 2.3.6.) To implement entries like this using tlinks
requires a modification to their interpretation. The new interpretation is this: lexemes
from the SL bag are unified with features 1 to n on the SL side of the TFS, using the
algorithm of Section 4.1.1. If all the source side lexemes are unified, the result of transfer
for that particular TFS corresponds to the values of features 1 to m on the target side.
That is, the main difference between tlinks and bilexical entries is that the feature 0 no
longer constitutes the point at which translation correspondence is defined. Figure 4.4
shows the simplified bilexical entry for the correspondence above.
2l-1l-1i-1i-t
  sfs = 2l-rule
          1 = adj-lex
                orth  = (young)
                syn   = syn
                trans = trans[dist = ind-lex[lex = (young 1), ind1 = [0] obj[index-id = [1] (string)]]]
          2 = n-com-sg
                orth  = (bull)
                syn   = syn
                trans = trans[dist = ind-lex[lex = (bull 1), ind1 = [0]]]
  tfs = identity-rule
          1 = s-n-com-sg
                orth  = (novillo)
                syn   = s-syn
                trans = trans[dist = ind-lex[lex = (novillo 1), ind1 = obj[index-id = [1]]]]
Figure 4.4: Tlink for `young bull ↔ novillo'
4.1.3 A Modification to the Tlink Rules
Section 1.4.10 described the tlink-rules mechanism of Copestake et al. (1993:120) and
considered some of the problems arising from using phrasal signs within tlinks for overcoming
lexical gaps and similar problems. In Sections 2.3.10 and 2.3.11 I showed how a variety
of translation problems could be overcome by using bilexical rules in which the output
bilexical entry did not contain phrasal information. In this section I describe how bilexical
rules are implemented through a modified interpretation of tlink-rules. I do this by
considering the rule for `just' reproduced schematically below:
          Ve_{d,y} ↔ Vs_{d,y}
                   ⇓
just_d Ve_{d,y} ↔ acabar de_{d,y,d} Vs_{d,y}
The modified interpretation consists of viewing tlinks as establishing a translation relation
between sets of lexemes, and not between the TFSs appearing as the value of feature 0 in
the tlink (see also Section 4.1.2). Figure 4.5 shows the simplified TFS encoding the `just'
bilexical rule, with the direction of the mapping shown as an arrow.
[Attribute-value matrix: the input tlink, of type 1l-1l-2i-2i-t, pairs a source v-lex with a
target s-v-lex, each under identity-rule; the arrow maps it to the output tlink, of type
2l-2l-2i-3i-t, whose source side (2l-rule) contains adv-lex `just' (lex = (just 1)) and the
source verb, and whose target side (2l-rule) contains s-v-vinf-3sg `acaba de'
(lex = (acaba de 1), with indices ind1, ind2 and ind3) and the target verb.]
Figure 4.5: Bilexical rule for translating between `just' and acabar de.
Application of a bilexical rule is the same as for tlink-rules and involves unifying a
bilexical entry with the value of t0 to give as output the TFS at t1. The above rule
for `just', applied to `arrived - llegar' (morphological inequalities ignored), results in the
bilexical entry `just arrived - acaba de llegar' shown in Figure 4.6. The only connection
between the source and target sides of this bilexical entry is via the feature index-id, which
is instantiated to a unique integer after Cover-SL-List is applied. This instantiation has
the effect of disallowing unification of indices not explicitly bound either during analysis
or during transfer (see Sections 1.4.8 and 2.3.10).
[Attribute-value matrix of type 2l-2l-2i-3i-t: the source side (2l-rule) contains adv-lex
`just' (lex = (just 1)) and v-int-3s `arrived' (lex = (arrived 1)); the target side (2l-rule)
contains s-v-vinf-3s `acaba de' (lex = (acaba de 1), with indices ind1, ind2 and ind3) and
s-v-int `llegar' (lex = (llegar 1)); source and target indices are linked only through shared
index-id values of type (string).]
Figure 4.6: Bilexical entry for `just arrived ↔ acaba de llegar'.
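The effect of instantiating index-id can be illustrated with a small sketch. It is a simplified reading rather than the implementation used here: an IL is assumed to be a pair of a lexeme and a tuple of index-ids, a string index-id is taken to be a variable and an integer a constant, and the function name `instantiate_indices` is hypothetical.

```python
from itertools import count
from typing import Dict, List, Tuple, Union

IndexId = Union[str, int]                 # str: variable, int: constant
IL = Tuple[str, Tuple[IndexId, ...]]


def instantiate_indices(ils: List[IL]) -> List[IL]:
    """Replace each index variable by a fresh integer, consistently across ILs.

    Fresh integers start above any constant already present, so two indices
    end up equal only if they were the same variable or the same constant.
    """
    constants = [i for _lex, idxs in ils for i in idxs if isinstance(i, int)]
    fresh = count(max(constants, default=0) + 1)
    binding: Dict[str, int] = {}
    out: List[IL] = []
    for lexeme, indices in ils:
        new_indices = []
        for idx in indices:
            if isinstance(idx, str):
                if idx not in binding:
                    binding[idx] = next(fresh)   # first occurrence: new constant
                new_indices.append(binding[idx])
            else:
                new_indices.append(idx)
        out.append((lexeme, tuple(new_indices)))
    return out


tl_bag = instantiate_indices([("acaba de", ("e", "x", "f")), ("llegar", ("f", "x"))])
```

Once every index is a distinct constant unless explicitly shared, a later unification step cannot accidentally identify two unrelated indices.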
A final remark on the application of bilexical rules. Before Cover-SL-List is called,
the bilexical rules are applied to all the bilexical entries which contain ILs in the source IL
list. This results in a set of new bilexical entries. It is the union of these entries with the
existing entries which constitutes the bilexicon used for transfer. The bilexical rules may
be applied once again to the set of new entries in order to derive further bilexical entries;
more applications are also possible, but care must be taken to avoid non-termination
(Carpenter 1991).
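One simple way of guaranteeing termination is to bound the number of rounds of rule application, as in the following sketch. Everything in it is an illustrative assumption: entries are pairs of string tuples, a bilexical rule is modelled as a function returning a new entry or None, the set `HYPOTHETICAL_VERBS` stands in for a real category test, and the bound `max_rounds` is not a parameter of the system described here.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Entry = Tuple[Tuple[str, ...], Tuple[str, ...]]   # (source list, target bag)
Rule = Callable[[Entry], Optional[Entry]]

HYPOTHETICAL_VERBS = {"arrived"}   # stand-in for checking that a sign is a verb


def just_rule(entry: Entry) -> Optional[Entry]:
    """Toy version of the `just' rule: V <-> V' becomes just V <-> acaba de V'."""
    source, target = entry
    if len(source) == 1 and source[0] in HYPOTHETICAL_VERBS:
        return (("just",) + source, ("acaba de",) + target)
    return None


def expand_bilexicon(bilexicon: Iterable[Entry], rules: List[Rule],
                     il_list: List[str], max_rounds: int = 2) -> Set[Entry]:
    """Apply rules to entries relevant to the input, for at most `max_rounds` rounds."""
    known: Set[Entry] = set(bilexicon)
    # Only entries sharing an IL with the source IL list are fed to the rules.
    frontier = {e for e in known if set(e[0]) & set(il_list)}
    for _ in range(max_rounds):
        derived: Set[Entry] = set()
        for entry in frontier:
            for rule in rules:
                new_entry = rule(entry)
                if new_entry is not None and new_entry not in known:
                    derived.add(new_entry)
        if not derived:
            break                       # nothing new: stop early
        known |= derived
        frontier = derived              # next round applies rules to new entries only
    return known


base = [(("arrived",), ("llegar",)), (("the",), ("el",))]
expanded = expand_bilexicon(base, [just_rule], ["just", "arrived"])
```

The bound trades completeness for termination; rules that can feed themselves indefinitely simply stop contributing after the last round.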
4.2 Generation
The last translation step is generation. The input to generation is a bag of target ILs
with instantiated indices (i.e. integer values for the index-id feature) and its output is a
TL sentence and associated parse tree. Generation is achieved by modifying a standard
chart parsing algorithm, such as that described in Section 3.1.1, to allow it to cope with
the lack of ordering information in the input bag. The modification has been to let the
fundamental rule of chart parsing apply depending, not on the adjacency of substrings in
the chart, but on whether the edges to be combined share any ILs. That is, edges which
include the same ILs cannot combine. The modified version of the rule is given below with
the changes shown in bold face.
Modified Fundamental Rule
If the chart contains edges <l, A → W1 . B W2> and <m, B → W3 .>, where A and B
are categories and W1, W2 and W3 are (possibly empty) sequences of categories or words,
and l ∩ m = ∅, where l and m are subsets of the input ILs analysed by each
edge, then add edge <m ∪ l, A → W1 B . W2> to the chart.
Other functions in the algorithm are essentially left unchanged. The effect of this
modification is to construct all orderings of the TL bag which are grammatical sentences of the
TL. However, this change also has the effect of changing the algorithm from a polynomial
to an exponential time algorithm. In the rest of this section I will elaborate on this increase
in complexity and on the way it was ameliorated.
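The modified fundamental rule can be sketched as follows. The sketch assumes atomic categories compared by equality, whereas the system described here unifies TFSs; the class `GenEdge` and the function `fundamental_rule` are hypothetical names, and ILs are identified by integer positions in the input bag.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class GenEdge:
    ils: FrozenSet[int]          # identifiers of the input ILs covered by the edge
    lhs: str                     # category A in A -> W1 . B W2
    found: Tuple[str, ...]       # W1: daughters already found
    needed: Tuple[str, ...]      # B W2: daughters still needed (empty if inactive)

    @property
    def is_inactive(self) -> bool:
        return not self.needed


def fundamental_rule(active: GenEdge, inactive: GenEdge) -> Optional[GenEdge]:
    """Combine an active and an inactive edge if they share no ILs."""
    if active.is_inactive or not inactive.is_inactive:
        return None
    if active.needed[0] != inactive.lhs:      # assumption: atomic category match
        return None
    if active.ils & inactive.ils:             # l and m must be disjoint
        return None
    return GenEdge(
        ils=active.ils | inactive.ils,        # m union l
        lhs=active.lhs,
        found=active.found + (inactive.lhs,),
        needed=active.needed[1:],
    )


np_active = GenEdge(frozenset({0}), "NP", ("Det",), ("N",))
n_cat = GenEdge(frozenset({1}), "N", ("cat",), ())
combined = fundamental_rule(np_active, n_cat)
```

Because adjacency is no longer checked, the only thing preventing an IL from being used twice in one analysis is the disjointness test on the two IL sets.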
Brew (1992) shows that the problem of bag generation, that is, the problem of
constructing a parse tree from an unordered sequence of lexemes, is NP-complete. This follows
from the fact that a polynomial time reduction into this type of generation may be
established for a known NP-complete problem. A sketch of his demonstration now follows. The
3-dimensional matching problem is NP-complete, as proved, for example, by Garey
and Johnson (1979:50-53). In 3-dimensional matching it is asked whether or not the
elements of three equally sized sets can be grouped into triples having one element from
each set, such that each triple in the solution is a member of a set of restrictions of
allowed triples. This problem can be converted to bag generation by defining the input
to the generator as being all the elements of the three sets. Restrictions can then be
encoded as ternary rules of the form x → a_i b_j c_k, where <a_i, b_j, c_k> is a restriction of
the 3-dimensional matching problem. Furthermore, the single, top rule x → x x is
introduced to construct valid parses. Thus, a complete valid parse of the input bag would
be equivalent to a "yes" answer in the original NP-complete problem, while if no parses
were possible, the answer would be "no", and the NP-complete problem would be solved.
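The construction used in this reduction can be made concrete with a short sketch. The function names are hypothetical, the elements of the three sets are assumed to be distinct strings, and `bag_has_parse` is a brute-force check written only to illustrate the equivalence, not a generation algorithm: with the rules x → a_i b_j c_k and x → x x, some ordering of the bag parses as x exactly when the bag can be partitioned into allowed triples.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Tuple[str, Tuple[str, ...]]


def to_bag_generation(a: Iterable[str], b: Iterable[str], c: Iterable[str],
                      allowed: Set[Triple]) -> Tuple[List[str], List[Rule]]:
    """Build the input bag and the grammar for a 3-dimensional matching instance."""
    bag = list(a) + list(b) + list(c)
    rules: List[Rule] = [("x", triple) for triple in sorted(allowed)]
    rules.append(("x", ("x", "x")))          # the single top rule
    return bag, rules


def bag_has_parse(bag: List[str], rules: List[Rule]) -> bool:
    """Exponential check: can the bag be split into right-hand sides of ternary rules?"""
    ternary = [rhs for _lhs, rhs in rules if len(rhs) == 3]

    def cover(rest: FrozenSet[str]) -> bool:
        if not rest:
            return True
        first = min(rest)                    # always cover a fixed element next
        return any(first in t and set(t) <= rest and cover(rest - set(t))
                   for t in ternary)

    return bool(bag) and cover(frozenset(bag))


A, B, C = ["a1", "a2"], ["b1", "b2"], ["c1", "c2"]
yes_bag, yes_rules = to_bag_generation(
    A, B, C, {("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a1", "b2", "c2")})
no_bag, no_rules = to_bag_generation(
    A, B, C, {("a1", "b1", "c1"), ("a1", "b2", "c2")})
```

The construction itself is linear in the size of the matching instance; only the brute-force check is exponential.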
4.2.1 Brew's Algorithm
In view of the above result, some form of heuristic is necessary to improve generation.
Brew (1992) provides the basis for the algorithm developed here. At the heart of his
proposal is the observation that any valid path must include all the lexical entries in the
input bag. He notes that the main source of inefficiency in lexicalist generation is the
construction of substrings which cannot lead to a successful parse. For instance, given the
bag
{the, fierce, little, brown, cat, sleeps}
he observes that in bag generation, at least the following substrings are constructed, none
of which will ever lead to a complete parse:
The fierce little cat
The fierce brown cat
The little brown cat
The fierce cat
The little cat
The brown cat
The cat
Each of these will further combine with `sleeps' to form strings which will never lead to
complete parses either.
The specifics of Brew's proposal rely to some extent on the use of categorial grammar
signs for TL lexemes; by decomposing these into their basic constituent signs, he is able to
construct a graph indicating, by way of edges, the possible functor-argument relationships
between TL lexemes. This graph is refined with a constraint propagation algorithm to
delete functor-argument relations which cannot possibly lead to complete parses. He then
applies a modified shift-reduce parser to the TL bag in order to construct a complete parse,
making use of the reduced functor-argument graph to minimize the number of reductions
made during execution. In addition, every time a reduction is actually carried out, the
graph is updated using constraint propagation; this allows any decisions on the order of
the TL lexemes to constrain the application of future reductions.
In adapting the above proposal for use with PS grammars, Brew's graph has been
reinterpreted as an adjacency graph in which nodes correspond to lexical entries and edges
represent possible adjacencies in a valid parse. For example, the graph shown in Figure
4.7 is the unconstrained graph that results from the input bag above. The grammar that
gives rise to these adjacencies is Brew's adjective grammar, in which certain adjectives
must precede certain others. This graph shows that an adjective like `fierce' must precede
every other adjective, `brown' must follow every other adjective, and `little' must precede
`brown' but follow `fierce'. Given an adjacency graph, a valid parse will trace a Hamiltonian
circuit through it; that is, it will trace a circuit in which every node in the graph is visited
exactly once. Nodes start and end are added to simplify the algorithm; the dotted arrow
between them is included to comply with the definition of a Hamiltonian circuit (Garey
and Johnson 1979:47).
[Graph with nodes START, the, fierce, little, brown, cat, sleeps and END]
Figure 4.7: Unconstrained adjacency graph.
A constraint propagation algorithm such as that of Waltz (Winston 1984:72) deletes
those edges which cannot be part of a complete parse. The algorithm works by iteratively
deleting edges from the graph. An edge is deleted if: a) its target node has an incoming
edge which is the only outgoing edge of some node, or b) its source node has an outgoing
edge which is the only incoming edge of some node. The iteration stops when no more
changes are made to the graph. For this example with ordered adjectives, the intuitive
effect is that of using the ordering information in the grammar to linearize the input bag.
However, the graph and constraint propagation technique are useful in other circumstances
as I will show in Section 4.2.3.
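The two deletion conditions can be sketched directly over a set of directed edges. The function name `propagate` is hypothetical, and the edge set used for the example is an assumption reconstructed from the prose description of the adjective ordering (the determiner may precede any adjective or the noun, and each adjective may precede any later adjective or the noun); it is not read off Figure 4.7.

```python
from typing import Dict, Set, Tuple

EdgeT = Tuple[str, str]


def propagate(edges: Set[EdgeT]) -> Set[EdgeT]:
    """Iteratively delete edges that cannot lie on a Hamiltonian circuit."""
    edges = set(edges)
    while True:
        out_deg: Dict[str, int] = {}
        in_deg: Dict[str, int] = {}
        for u, v in edges:
            out_deg[u] = out_deg.get(u, 0) + 1
            in_deg[v] = in_deg.get(v, 0) + 1
        # forced_into[v]: sources w whose only outgoing edge is (w, v)
        # forced_from[u]: targets x whose only incoming edge is (u, x)
        forced_into: Dict[str, Set[str]] = {}
        forced_from: Dict[str, Set[str]] = {}
        for u, v in edges:
            if out_deg[u] == 1:
                forced_into.setdefault(v, set()).add(u)
            if in_deg[v] == 1:
                forced_from.setdefault(u, set()).add(v)
        doomed = {
            (u, v) for u, v in edges
            if (forced_into.get(v, set()) - {u})      # condition a)
            or (forced_from.get(u, set()) - {v})      # condition b)
        }
        if not doomed:
            return edges
        edges -= doomed


unconstrained = {
    ("START", "the"),
    ("the", "fierce"), ("the", "little"), ("the", "brown"), ("the", "cat"),
    ("fierce", "little"), ("fierce", "brown"), ("fierce", "cat"),
    ("little", "brown"), ("little", "cat"),
    ("brown", "cat"),
    ("cat", "sleeps"), ("sleeps", "END"), ("END", "START"),
}
constrained = propagate(unconstrained)
```

Edges are deleted in batches computed from the current graph; for graphs in which two forced edges compete for the same node this may delete both, which corresponds to there being no Hamiltonian circuit.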
Applying the constraint propagation technique to the graph in Figure 4.7 can be
explained as follows. The only way `brown' can be part of a Hamiltonian circuit is if it
precedes `cat', or in other words, the edge from `brown' to `cat' has to be in any valid
path. This means that one may delete the edges leading to `cat' from `the', `fierce' and
`little', since they will never be used according to the definition of a Hamiltonian circuit.
This in turn leads to other nodes having just one incoming or departing edge thus
allowing further edge deletion. For this particular case, constraint propagation alone leads to
a single path through the graph. Intermediate and final graphs are shown in Figure 4.8.
In general however, a single path will not be obtained by constraint propagation alone, in
which case the bag generation algorithm will inspect and update the graph before applying
the fundamental rule. In doing this, many possible applications of the rule are blocked:
first, if an edge in the graph has been deleted and the proposed active-inactive pair assume
that edge (i.e. their right most and left most leaves respectively correspond to the nodes of
the deleted edge) then application is disallowed. Similarly, if application of the rule, and
hence selection of a particular sub-path in the graph, leads to an impasse, application is
also disallowed. The construction of an adjacency graph is the concern of the next section.
[Two graphs over the nodes START, the, fierce, little, brown, cat, sleeps and END]
Figure 4.8: Middle and final stages in constraint propagation.
4.2.2 Constructing FOLLOW for TFS Grammars
To construct the sort of adjacency graph required for improving bag generation, it is
necessary to have information regarding which lexical categories may follow which others.
In a PS grammar this information must be extracted from the rules using an algorithm of
the sort used in the construction of predictive parsers. In addition, for the case of TFS PS
grammars the resulting adjacency information must contain as many of the index sharings
as possible such that the adjacency graph is maximally restrictive. For instance, given the
bag:
{the_1, fat_1, dog_1, chased_{2,1,3}, the_3, white_3, cat_3}
the fact that `fat' and `dog' must share indices disallows an edge going from `white' to `dog'
in the adjacency graph. If these bindings are not available when computing the possible
adjacency edges, the graph will lead to few improvements.
I have modified the algorithm presented in Aho et al. (1986:189) in order to apply
it to the domain of TFSs. Aho et al. describe an algorithm originally at the heart of
the pioneering analysis procedures of Knuth (1965) and Earley (1970). The algorithm
takes as input a CFPS grammar and returns a function called FOLLOW which defines,
for each category in the grammar, all the lexical categories that may immediately follow
it in a valid derivation. The algorithm is divided into two steps: the first one computes
the function FIRST which defines for each non-terminal category all the lexical categories
which may appear leftmost in a derivation of that category. The other step uses FIRST
to construct the value of FOLLOW for every category.
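For atomic categories the two steps can be sketched as a pair of fixed-point computations. This is the familiar context-free version under simplifying assumptions, not the TFS version developed in this section: categories are plain strings, the grammar has no empty productions, and the end-of-input marker `$` and the function names are choices made for the sketch.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]


def first_sets(grammar: Grammar) -> Dict[str, Set[str]]:
    """FIRST: lexical categories that may appear leftmost under each non-terminal."""
    first: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def follow_sets(grammar: Grammar, start: str, first: Dict[str, Set[str]],
                end: str = "$") -> Dict[str, Set[str]]:
    """FOLLOW: lexical categories that may immediately follow each category."""
    symbols = set(grammar) | {s for ps in grammar.values() for rhs in ps for s in rhs}
    follow: Dict[str, Set[str]] = {s: set() for s in symbols}
    follow[start].add(end)
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]          # whatever can start the next daughter
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]         # last daughter inherits from mother
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


toy_grammar: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("Vint",)],
}
FIRST = first_sets(toy_grammar)
FOLLOW = follow_sets(toy_grammar, "S", FIRST)
```

With atomic categories nothing links a category to the members of its FOLLOW set beyond set membership, so index sharings of the kind needed for the adjacency graph cannot be expressed in this version.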
Difficulties arise in trying to adapt an algorithm for CF grammars to one using TFSs.
One difficulty is found in establishing a finite set of TFSs on which to operate. For example,
consider the following noun phrases and their corresponding values for orth.
  np[orth = <the, dog>, syn = syn, sem = sem]
  np[orth = <the, fat, dog>, syn = syn, sem = sem]
  np[orth = <the, big, fat, dog>, syn = syn, sem = sem]
Each of these TFSs is distinct from the others, yet one should expect the value of FOLLOW
for all three to be the same. Thus, some mechanism is needed which would allow all
three TFSs to be grouped together. Such a mechanism is proposed by Shieber (1985)
who introduces the idea of a restrictor. A restrictor is a set of paths indicating what
should be taken into account when determining whether two FSs are equivalent. In the
previous example, the restrictor would be dened as = fsyn,semg, implying that only
information in these features is used to determine equivalence between FSs. The version
of restriction used in this thesis is the more powerful negative restriction introduced by
Harrison and Ellison (1992), in which a restrictor is a set of paths to be disregarded in the
process of establishing an equivalence. Its formal denition is
p?P l i (9r 2 P; r < p) ) 9q 2 P; p q
which states the following: path p may be extended by feature l if p is not the result of
extending a path in the restrictor; if it is, then the whole of p must be part of a path in
the restrictor in order for pl to be included in the restricted FS; otherwise do not include
path pl in the restricted FS. In the case of TFSs, one extra condition must be satisfied
and this is that the resulting TFS must be well-typed; this means that the value of a
path in the resulting TFS will be a type specified in the relevant type constraint. Figure
4.9 shows a TFS before and after applying the negative restrictor {orth}. The square
np[orth = cons[car = the,
               cdr = cons[car = fat,
                          cdr = cons[car = dog,
                                     cdr = end]]],
   syn = syn,
   sem = sem]
restricts to
np[orth = [orth], syn = syn, sem = sem]
Figure 4.9: Applying the negative restrictor {orth}.
brackets around orth indicate that the complete TFS for orth is shown.
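The effect of negative restriction on the three noun phrases above can be illustrated with a minimal Python sketch. It assumes that TFSs are encoded as nested dictionaries and that restrictor paths are tuples of feature names; the helpers orth_list, np and restrict, and the table GENERAL_TYPES giving the most general type for a restricted path, are hypothetical names introduced only for this illustration and are not part of the LKB.

```python
def orth_list(words):
    """Encode a word sequence as a cons/car/cdr list, as in Figure 4.9."""
    node = "end"
    for word in reversed(words):
        node = {"type": "cons", "car": word, "cdr": node}
    return node


def np(words):
    """Hypothetical helper: an np sign with unanalysed syn and sem values."""
    return {"type": "np", "orth": orth_list(words), "syn": "syn", "sem": "sem"}


def restrict(fs, restrictor, general_types, path=()):
    """Cut off every path in the negative restrictor, replacing its value
    with the most general type for that path so the result stays well-typed."""
    if not isinstance(fs, dict):
        return fs
    result = {}
    for feature, value in fs.items():
        here = path + (feature,)
        if here in restrictor:
            result[feature] = general_types[here]
        else:
            result[feature] = restrict(value, restrictor, general_types, here)
    return result


# Assumed restrictor {orth} and the type it restricts to.
RESTRICTOR = {("orth",)}
GENERAL_TYPES = {("orth",): "orth"}

phrases = [["the", "dog"], ["the", "fat", "dog"], ["the", "big", "fat", "dog"]]
restricted = [restrict(np(p), RESTRICTOR, GENERAL_TYPES) for p in phrases]
print(restricted[0] == restricted[1] == restricted[2])
```

Under these assumptions the three distinct signs collapse into a single restricted structure, which is what allows them to share one FOLLOW value.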
One more problem arises when trying to compute FOLLOW using the basic CF algorithm, which is that the bindings between a category and its FOLLOW value cannot be
preserved. That is, because the algorithm assumes atomic categories, information cannot
be shared between a category and its FOLLOW value.
Modified Algorithms
The solution that follows is a slightly augmented version of that presented in Trujillo (1994). It consists of viewing the value of FIRST and FOLLOW as a set of pairs
of TFSs rather than as sets of values associated with each category. Thus, the bindings
between a category and its FOLLOW value can be preserved for each distinct pair of
categories.
As an example of the results produced by the algorithm, a grammar such as
S ⇒ NP[dist:ind1=[0]] VP[dist:ind2=[0]]
NP[dist:ind1=[0]] ⇒ Det[dist:ind1=[0]] N[dist:ind1=[0]]
VP[dist:ind1=[0],dist:ind2=[1]] ⇒ Vint[dist:ind1=[0],dist:ind2=[1]]
results in the following value for FIRST:
FIRST = { (S,Det), (NP[dist:ind1=[0]],Det[dist:ind1=[0]]),
(VP[dist:ind1=[0],dist:ind2=[1]],Vint[dist:ind1=[0],dist:ind2=[1]]) }
Based on this set, computing the value of FOLLOW results in (end of input is represented
by $):
FOLLOW = { (S,$), (NP[dist:ind1=[0]],Vint[dist:ind1=[0]]), (VP,$),
(Det[dist:ind1=[0]],N[dist:ind1=[0]]), (N[dist:ind1=[0]],Vint[dist:ind1=[0]]), (Vint,$) }
The bindings presented here are rather more constrained than those one would get from
a more realistic grammar. This is especially so in the case of noun-verb index bindings
since, in fact, it is not obligatory for a noun to be the subject of the verb it precedes,
e.g. `the cat in the park sleeps'. However, preserving index bindings is very important for
the effectiveness of the constraint propagation algorithm described earlier, particularly for
pruning the search space when determiners and adjectives are involved.
The algorithms for computing the set of pairs for FIRST and FOLLOW assume an
operation + which constructs a set S′ = S + p in the following way: if pair p subsumes
an element a of S then S′ = S − a + p; if p is subsumed by an element of S then S′
= S; else S′ = S + p. The algorithms work iteratively, computing new pairs in each
iteration. A new pair is constructed as follows: unification of a daughter with the lhs of
an existing pair in FIRST or FOLLOW results in a modified rule and a modified pair in
which bindings between the mother category and the rhs of the pair are established. The
modified mother and rhs are then used to construct the new pair which is then added to
the respective set. For instance, given rule X → Y and pair (L, R), unify Y and L to give
X′ → Y′ and (L′, R′); from these the pair (X′, R′) is constructed and added to the set. It
should be noted that the pairs constituting the value of a set can themselves be compared
using the subsumption relation, in which reentrant values are subsumed by non-reentrant
ones, and combined using the unification operation. Thus, in the principal step of the
algorithm, a new pair is constructed as described above, a restrictor is applied to it, and
the resulting, restricted pair is added to the set using the operation +.
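The operation + can be sketched in Python as follows. This is a simplification under stated assumptions: categories are flat sets of feature-value pairs without reentrancy, so that subsumption reduces to set inclusion; the names fs, subsumes and add_pair are hypothetical and chosen only for this illustration.

```python
def fs(**features):
    """A flat, hashable feature structure (no reentrancy is modelled)."""
    return frozenset(features.items())


def subsumes(general, specific):
    """A pair subsumes another if both of its members are at least as general."""
    return general[0] <= specific[0] and general[1] <= specific[1]


def add_pair(pairs, p):
    """The operation +: keep only the most general pairs in the set."""
    covered = {q for q in pairs if subsumes(p, q)}
    if covered:
        # p subsumes existing elements: they are replaced by p.
        return (pairs - covered) | {p}
    if any(subsumes(q, p) for q in pairs):
        # p is subsumed by an existing element: the set is unchanged.
        return set(pairs)
    return pairs | {p}


specific = (fs(cat="NP", num="sg"), fs(cat="Vint", num="sg"))
general = (fs(cat="NP"), fs(cat="Vint"))
other = (fs(cat="Det"), fs(cat="N"))

s1 = add_pair(set(), specific)
s2 = add_pair(s1, general)   # the more general pair replaces the specific one
s3 = add_pair(s2, specific)  # adding the specific pair again changes nothing
s4 = add_pair(s3, other)     # an unrelated pair is simply added
print(len(s1), len(s2), len(s3), len(s4))
```

Because the set only ever holds mutually incomparable pairs, the iteration in the algorithms reaches a fixed point once no strictly more general pair can be produced.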
FIRST
Computation of FIRST takes the form of an iteration over all the rules in the grammar,
treating the mother of each rule as the category for which a FIRST value is sought. The
algorithm is as follows:
1. Initialize First = {}.
2. Go through all the daughters in the grammar. If X is pre-terminal, then First = First +
(X, X)! (where (X, X)! means apply the negative restrictor to the pair (X, X)).
3. For each rule in the grammar with mother X, apply steps 4 and 5 until no more changes
are made to First.
4. If the rule is X → ε, then First = First + (X, ε)!.
5. If the rule is X → Y_1 .. Y_i .. Y_k, then First = First + (X′, a)! if (Y′_i, a) has successfully unified with an element of First, and (Y′_1, ε_1) ... (Y′_{i−1}, ε_{i−1}) have all successfully and simultaneously unified with members of First. Also, First = First + (X′, ε)! if (Y′_1, ε_1) ... (Y′_k, ε_k)
have all successfully and simultaneously unified with elements of First.
6. Now, for any string of categories X_1 .. X_i .. X_n, First = First + (X′_1 ... X′_n, a)! if (X′_1, a)
has successfully unified with an element of First, and a ≠ ε. Also, for i = 2 ... n,
First = First + (X′_1 ... X′_n, a)! if (X′_i, a) has successfully unified with an element of
First, a ≠ ε, and (X′_1, ε_1) ... (X′_{i−1}, ε_{i−1}) have all successfully and simultaneously unified
with members of First. Finally, First = First + (X′_1 ... X′_n, ε)! if (X′_1, ε_1) ... (X′_n, ε_n) have
all successfully and simultaneously unified with members of First. (This step may be
computed on demand.)
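Steps 1 to 5 can be illustrated with a minimal Python sketch for the degenerate case of atomic categories, in which unification reduces to equality, restriction is the identity, and + is plain set insertion. The function name compute_first and the rule encoding are assumptions made for this illustration; the sketch is not the TFS implementation described in the text.

```python
EPS = "ε"


def compute_first(rules, preterminals):
    """FIRST as a set of (category, lexical category) pairs, atomic categories only."""
    first = {(x, x) for x in preterminals}            # step 2
    changed = True
    while changed:                                    # step 3: iterate to a fixed point
        changed = False
        for mother, daughters in rules:
            new = set()
            prefix_nullable = True
            for daughter in daughters:                # step 5
                if not prefix_nullable:
                    break
                new |= {(mother, a) for (lhs, a) in first
                        if lhs == daughter and a != EPS}
                prefix_nullable = (daughter, EPS) in first
            if prefix_nullable:                       # step 4, and the ε case of step 5
                new.add((mother, EPS))
            if not new <= first:
                first |= new
                changed = True
    return first


RULES = [("S", ["NP", "VP"]), ("NP", ["Det", "N"]), ("VP", ["Vint"])]
FIRST = compute_first(RULES, {"Det", "N", "Vint"})
print(sorted(FIRST))
```

For the example grammar given earlier, stripped of its index bindings, the result contains the pairs (S,Det), (NP,Det) and (VP,Vint), together with the pre-terminal pairs introduced by step 2.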
The last action of steps 5 and 6 adds ε as a possible value of FIRST; such a value results
when all daughters or categories have ε as their FIRST value. Since most grammatical
descriptions assign a category to ε (e.g. to bind onto it the information necessary for
correct gap threading), the pairs (X′, ε) or (X′_1 ... X′_n, ε) should have bindings between their
two elements. This creates the problem of deciding which of the εs in the FIRST pairs
to use, since it is possible in principle that each of these will have a different value. In
the present implementation, the pair added to First in these situations consists of the
mother category or the string of categories, and the most general category for ε as defined
by the grammar. This value effectively ignores any bindings that ε may have within the
constructed pair. A more accurate solution would have been to compute multiple pairs
with ε, construct their least upper bound, and then add this to First.
An additional step, applied before the computation of FIRST and FOLLOW, was added to the algorithm to handle verb signs containing subcategorization lists and other such
mechanisms. The problem arises because a rule such as:
vp[syn:loc:subcat = [1]] ⇒ vp[syn:loc:subcat = cons[car = [2], cdr = [1] list]] [2] sign
will result in a FOLLOW pair (V,sign) which would allow any lexical item whatever to
follow a verb. The additional step computes equivalence classes for lexical categories
using negative restriction, and then expands the grammar by unifying each rule with
each equivalence class, where appropriate. The result is a grammar which includes lexical
information in order to restrict the type of pairs computed.
FOLLOW
The solution obtained for FIRST forms the basis for the computation of FOLLOW, which
proceeds thus:
1. Initialize Follow = { [S,$] }
2. For all rules in the grammar apply the following steps until no more changes are made to Follow.
3. If there is a production A → αBβ, then Follow = Follow + [B′, a]! for all a such that [β′, a]
successfully unifies with an element of First.
4. If there is a production A → αB, or a production A → αBβ where [β, ε] successfully unifies with
an element of First (i.e. β ⇒* ε), then Follow = Follow + [B′, a]! for all a such that [A′, a]
successfully unifies with an element of Follow.
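The FOLLOW steps can be sketched in the same simplified setting: atomic categories, unification as equality, no restriction, and + as set insertion. The names compute_follow and first_of_string are hypothetical, and the FIRST pairs are supplied by hand from the earlier example with the index bindings removed.

```python
EPS, END = "ε", "$"


def first_of_string(cats, first):
    """FIRST of a string of categories; the empty string yields {ε}."""
    out = set()
    for cat in cats:
        out |= {a for (lhs, a) in first if lhs == cat and a != EPS}
        if (cat, EPS) not in first:
            return out
    out.add(EPS)
    return out


def compute_follow(rules, first, start="S"):
    follow = {(start, END)}                              # step 1
    changed = True
    while changed:                                       # step 2
        changed = False
        for mother, daughters in rules:
            for i, b in enumerate(daughters):
                beta = first_of_string(daughters[i + 1:], first)
                new = {(b, a) for a in beta if a != EPS}  # step 3
                if EPS in beta:                           # step 4
                    new |= {(b, a) for (lhs, a) in follow if lhs == mother}
                if not new <= follow:
                    follow |= new
                    changed = True
    return follow


RULES = [("S", ["NP", "VP"]), ("NP", ["Det", "N"]), ("VP", ["Vint"])]
FIRST = {("Det", "Det"), ("N", "N"), ("Vint", "Vint"),
         ("NP", "Det"), ("VP", "Vint"), ("S", "Det")}
FOLLOW = compute_follow(RULES, FIRST)
print(sorted(FOLLOW))
```

With these inputs the sketch yields the same six category pairs as the FOLLOW value listed for the example grammar, minus the index bindings that only the TFS version can preserve.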
Using the value of FOLLOW an adjacency graph may be constructed by unifying
lexemes on the left and right side of a pair. If unification is successful, the left lexeme
may be followed by the right lexeme, thus introducing an edge into the adjacency graph.
If unification fails with every pair in FOLLOW, the left lexeme cannot be followed by the
right lexeme in a valid parse.
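A minimal sketch of this construction, again assuming atomic lexical categories so that unification against a FOLLOW pair becomes a membership test. The lexicon, the bag and the function name adjacency_graph are illustrative assumptions, and the bag is assumed to contain no repeated lexemes.

```python
# FOLLOW pairs between lexical categories, taken from the example grammar
# with index bindings removed.
FOLLOW = {("Det", "N"), ("N", "Vint"), ("Vint", "$")}

# Hypothetical lexicon mapping each lexeme to an atomic category.
LEXICON = {"the": "Det", "dog": "N", "sleeps": "Vint"}


def adjacency_graph(bag, lexicon, follow):
    """Return the set of edges (left, right) licensed by some FOLLOW pair."""
    edges = set()
    for left in bag:
        for right in bag:
            if left != right and (lexicon[left], lexicon[right]) in follow:
                edges.add((left, right))
    return edges


EDGES = adjacency_graph(["sleeps", "the", "dog"], LEXICON, FOLLOW)
print(sorted(EDGES))
```

In the TFS setting the membership test would be replaced by unification of both lexemes with the two sides of a pair, so that index bindings can rule out further edges.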
4.2.3 Reachability Constraints
So far the generation algorithm described does not differ substantially from Brew's: the
grammars for which it is most effective are those where a strict ordering of constituents
is necessary. For example, the efficacy of the technique for the sentence `the fierce little
brown cat sleeps' relies on the TL grammar requiring the adjectives to appear in this
order. If this condition is relaxed, the algorithm is less effective, as an example with {the,
thirsty, hungry, sleepy, dog} will show. Assuming there are no ordering constraints on the
three adjectives in this bag, Figure 4.10 shows its adjacency graph before parsing starts,
and after the subparse for `the sleepy dog' has been constructed. The problem here is that,
as described in Section 4.2.1, Brew's algorithm cannot detect that a parse is impossible once
`thirsty' and `hungry' are no longer linked to the main graph. The reason is that his algorithm only
disallows reductions (or the application of the fundamental rule as the case may be) if
a node is left without edges going into or out of it. Since in a cycle every node has at
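[Figure: two adjacency graphs over the nodes S/E, the, thirsty, hungry, sleepy and dog, one before parsing starts and one after the subparse for `the sleepy dog' has been constructed.]
Figure 4.10: Constraint propagation with unordered adjectives.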
least one incoming and outgoing edge, an impossible partial parse goes undetected. The
solution to this problem relies on augmenting the constraint propagation algorithm with a
reachability check which efficiently tests the graph to ensure that every node is reachable
from the start node. The test takes place after the constraint propagation phase has been
performed. This can be done by recursively taking the union of all the possible target
nodes for every node, beginning with the start node. If the resulting set of nodes contains
every node in the graph, then every node is reachable from the start node and a parse
may be possible from the graph.
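The reachability check can be sketched in Python as a traversal from the start node. The graph encoding (a dictionary from node to target nodes) and the edges below are assumptions made for illustration; they approximate, rather than reproduce, the graphs of Figure 4.10.

```python
def all_reachable(graph, start):
    """True if every node of the graph can be reached from the start node."""
    reached = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for target in graph.get(node, ()):
            if target not in reached:
                reached.add(target)
                frontier.append(target)
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    return reached == nodes


ADJECTIVES = ["thirsty", "hungry", "sleepy"]

# Before parsing: any adjective may follow `the' or another adjective.
before = {"S/E": ["the"], "the": ADJECTIVES, "dog": ["S/E"]}
for adj in ADJECTIVES:
    before[adj] = [a for a in ADJECTIVES if a != adj] + ["dog"]

# After building `the sleepy dog': `thirsty' and `hungry' only point at each other.
after = {"S/E": ["the"], "the": ["sleepy"], "sleepy": ["dog"], "dog": ["S/E"],
         "thirsty": ["hungry"], "hungry": ["thirsty"]}

print(all_reachable(before, "S/E"), all_reachable(after, "S/E"))
```

In the second graph every node still has an incoming and an outgoing edge, so the degree-based test passes, while the traversal shows that the cycle is cut off from the start node.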
4.2.4 Connectivity Constraints
Another important source of inefficiency is the presence of modifiers such as PPs and relative clauses
(Phillips 1993). The problem here is that neither constraint propagation nor reachability
constraints prevent the construction of invalid parses. For example, a bag such as:
{the, dog, in, the, park, by, the, station, chases, the, cat}
with intended interpretation `the dog in the (park by the station) chases the cat' results
in arrows connecting `the cat' and `by the station' (actually, they connect `cat' and `by'
but this may be ignored) among other pairs, as shown on the left graph in Figure 4.11
(only relevant arrows shown). This is because the restrictions in FOLLOW for pairs with
[Figure: two adjacency graphs over the nodes S/E, the dog, in the park, by the station, chases and the cat; only relevant arrows shown.]
Figure 4.11: Arrows arising from PP modification.
prepositions are not sufficiently constraining to detect incorrect modification. The problem
with the resulting graphs is that even after several applications of the fundamental rule,
impossible parses go undetected because there are always sufficient arrows to ensure that
every node is reachable from the start node. The graph on the right in Figure 4.11 shows
the state of the adjacency graph after the subparse for `the dog in the park chases the cat'
has been constructed; this subparse can never be part of a complete sentence because the
PP that modifies `park' is missing. In fact, all of the following sentences are constructed,
but only the last one is complete:
The dog chases the cat.
The dog in the park chases the cat.
The dog in the park by the station chases the cat.
The solution to this problem exploits the requirement that the distinguished ILs in a
rule must be connected (see Section 2.2.3). In this solution, the construction of a subparse
is followed by a check to ensure that the distinguished IL of its root node is connected to the
remaining ILs in the TL bag. This is a valid condition on which to base a decision on the
possibility of a subparse because, once a subparse has been constructed, its ILs can only be
connected to the rest of the bag through its distinguished IL. Since the distinguished IL in
any rule in which the subparse is a constituent must come from the remaining ILs, which
in turn are all connected, it must be the case that the distinguished IL of the subparse is
connected to the remaining ILs.
With these restrictions, the invalid subparse shown on the right in Figure 4.11 above
can now be detected and avoided as follows. First, in the construction of the adjacency
graph, a parallel connectivity graph is derived which determines which lexical signs in
the input bag are directly connected (i.e. share a variable). During lexicalist generation,
whenever the fundamental rule is applied, a check is made to determine whether the lexical
signs not covered by the resulting edge plus the edge's distinguished lexical sign form a
connected graph. If this is not the case, then the edge is discarded. For example, after
variable instantiation, the above sentence may have the following constant assignments in
its semantic representation:
{the_1, dog_1, in_{1,2}, the_2, park_2, by_{2,3}, the station_3, chases_{1,4}, the_4, cat_4}
Its simplified connectivity graph is shown in Figure 4.12 (double arrows indicate directly
connected lexical signs). Importantly, `by the station' is not directly connected to `chases'.
[Figure: connectivity graph over the nodes the dog, in the park, by the station, chases and the cat.]
Figure 4.12: Connectivity graph.
To discard the edge for `the dog in the park chases the cat', it is necessary to verify that
the lexical sign `chases', which is the distinguished lexical sign for the sentence, and the
lexical signs remaining in the input bag (i.e. `by' and `the station') are not connected.
Since this is the case, the edge is discarded. The left of Figure 4.13 shows the connectivity
graph after this edge is constructed, with the connected lexical signs encompassed by the
line. The graph to the right of Figure 4.13 shows an invalid subparse which is allowed
[Figure: two connectivity graphs over the nodes the dog, in the park, by the station, chases and the cat, with the connected lexical signs encompassed by a line.]
Figure 4.13: Connectivity graph for two impossible subparses.
even with connectivity checks. This example is discussed in Section 4.2.6 below.
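The connectivity check can be sketched as follows, assuming each lexical sign is represented only by the set of index constants it carries after variable instantiation. The sign labels, the table INDICES and the functions connected and edge_allowed are hypothetical names for this illustration.

```python
# Index constants for each lexical sign in the example bag.
INDICES = {
    "the_1": {1}, "dog": {1}, "in": {1, 2}, "the_2": {2}, "park": {2},
    "by": {2, 3}, "the station": {3}, "chases": {1, 4}, "the_4": {4}, "cat": {4},
}


def connected(signs, indices):
    """True if the signs form one connected graph, where two signs are
    directly connected when they share an index."""
    signs = list(signs)
    if not signs:
        return True
    reached = {signs[0]}
    frontier = [signs[0]]
    while frontier:
        sign = frontier.pop()
        for other in signs:
            if other not in reached and indices[sign] & indices[other]:
                reached.add(other)
                frontier.append(other)
    return len(reached) == len(signs)


def edge_allowed(covered, distinguished, indices):
    """Keep an edge only if the uncovered signs plus the edge's
    distinguished sign form a connected graph."""
    remaining = [s for s in indices if s not in covered]
    return connected(remaining + [distinguished], indices)


# `the dog in the park chases the cat': leaves `by' and `the station' stranded.
edge1 = set(INDICES) - {"by", "the station"}
# `the dog chases the cat': `chases' still reaches the rest through `in'.
edge2 = {"the_1", "dog", "chases", "the_4", "cat"}

print(edge_allowed(edge1, "chases", INDICES), edge_allowed(edge2, "chases", INDICES))
```

Under these assumptions the first edge is discarded and the second is kept, which matches the behaviour described for the two subparses of Figure 4.13.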
4.2.5 Results
Table 4.1 shows the results of testing the three pruning techniques above on a version of
the LKB running on a Macintosh LC. One can see from the timings that initialization and
maintenance of the various graphs is fairly expensive in the sense that for simple sentences,
the overhead incurred causes generation with pruning to be slower than without. However,
for more complicated sentences with several modifiers, there is a marked improvement in
execution time. The column headed + Pruning indicates that a lazy evaluation of the
conjunction of all three algorithms introduced above (i.e. adjacency constraints, reachability and connectivity) is used to determine whether to allow successful application of the
fundamental rule in the modified chart parser.
                                          Modified chart parser    + Pruning
Bag                                       Time    Wfss             Time    Wfss
The dog chases the cat                      10      14               17      14
The fat dog chases the little cat           24      31               30      22
The fat brown dog chases the little
  tame cat                                  78      86               60      47
The dog in the park by the station
  chases the cat                            47      43               64      38
The fat dog in the big park by the
  noisy station chases the tame cat        321     233              137      80
Table 4.1: Effect of pruning technique on different constructions.
4.2.6 Remaining Problems
Even with the connectivity constraints presented in Section 4.2.4, there remain invalid
parses which cannot be detected. For example, during generation from the bag `the dog
in the park by the station chases the cat' the edge `the dog chases the cat' is constructed
because `chases' remains connected to the remains of the bag through `in the park', as
shown on the right of Figure 4.13. A solution to this source of inefficiency may be to
restrict the indices included in the distinguished lexeme of an edge to be those strictly
necessary for further semantic binding. Thus, if the distinguished IL made available only
the index of the main verb, as in `sleeps_e', the subparse
The dog sleeps.
would also be blocked because in that case `in the park' would be disconnected from the
subparse. Such mechanisms could keep under control the inefficiencies just described.
Although this requires further restrictions on the grammars, it may be that automatic
checking for the relevant restrictions is possible.
Another solution to the modifier problem suggested by Phillips (1993:225) is to maintain a queue in which to store modifiable constituents (e.g. N1s) in order to delay their
combination with other constituents until modifiers (e.g. PPs) have been analysed; the
two constituents would then be combined and treated as a single constituent. While this
algorithm works in practice, it leads to a loss in completeness of the generator.
A substantially different approach to the lexicalist generation problem has been developed by Poznanski et al. (1995). Their approach, which can find a valid TL sentence
in time O(n^4), proceeds by first attempting to parse an arbitrary permutation of the TL
bag. If this fails, an iterative algorithm is invoked which rewrites and reparses this initial
permutation and associated partial analysis until a valid TL sentence is found. In the
following example of their technique, constituents are shown without labels for simplicity.
Assume the following initial, arbitrary TL ordering:
the dog big brown barked
Parsing of this list of lexemes results in only one valid constituent, `(the dog)':
(the dog) big brown barked
Since this constituent does not include all the TL lexemes, the rewriting step is invoked.
This step searches for two constituents, either lexical or phrasal, which can be combined;
assume `(the dog)' and `big' are chosen. They are combined by inserting `big' before
`dog' and reparsing to give `(the (big dog))'. Combination involves reordering constituents
and/or inserting constituents into other constituents. The state of the generator at this
point may be depicted as follows:
(the (big dog)) brown barked
Again, the rewriting step is invoked. Imagine `(big dog)' and `brown' are chosen for
combination. Assuming an English grammar which disallows `brown big dog', combination
involves inserting `brown' between `big' and `dog' in the constituent `(big dog)', to give
`(big (brown dog))'. This time, reanalysis of the TL bag results in:
(the (big (brown dog)) barked)
which is a complete parse. Generation then terminates.
The algorithm by Poznanski et al. (1995) requires special data structures to be maintained in order to record the structure of the current permutation of the lexemes (i.e. the
state of the generator). In addition, it imposes two restrictions on the TL grammar. One
is that the grammaticality of a constituent can only be a function of its daughters; this
means that unbounded dependencies are not allowed. The other restriction is that the
grammaticality of a constituent must be preserved whenever a subconstituent is enlarged
by insertion; this implies for example, that when trying to generate `I did not drink any
water' insertion of `any' into `(I (did (drink water)))' must be allowed, even though one
could argue that it should not, given that `I did drink any water' is ungrammatical on
monolingual grounds alone, i.e. the TL grammar should reject such a sentence. These
restrictions may not be unsurmountable, and this approach could prove satisfactory.
4.3 Conclusion
This chapter concludes the presentation started in Chapter 3 on the computational mechanisms that form the basis of the present translation system. One goal of the chapter was
to describe the transfer and generation algorithms in some detail and discuss a number
of complexity issues. While it is possible that the transfer algorithm has no polynomial
solution, its behaviour in practice depends more on the linguistic aspects of the problem
than on purely computational ones.
The generation algorithm used a modified chart parser which did not take ordering
information into account. This placed the algorithm in NP-complete space. To improve
on its computational characteristics, the constraint propagation technique of Brew (1992)
was implemented. The technique was adapted to PS grammars from which adjacency
graphs were built and used to restrict the number of subparses constructed. For this, it
was necessary to extend an algorithm that operated over CF grammars to compute its
FOLLOW function; the extension enabled operation over feature-based grammars. The
characteristic feature of the extended algorithm was its use of a) pairs of feature structures
to encode the value of functions over categories and of b) an addition operation which was
sensitive to subsumption relations between pairs. In addition, an efficient reachability test
was added to Brew's technique in order to extend its applicability to a wider range of
constructions. Finally, a connectivity check was added to prune further the search space.
However, there remain efficiency problems that need to be tackled. Modifiers such as
PPs and relative clauses are particularly difficult to handle in the current framework. Some
instances of this problem were handled by using connectivity constraints. However, constraints applicable to the general case need to be found. Three techniques were suggested
which could cope with these constructions: restricting the indices passed to mothers, using
a queue of modifiers, and placing constraints on the grammars. Unfortunately, none of
these solutions is completely satisfactory and further research needs to be carried out to
determine better alternatives.
Another goal of the chapter was to describe the implementation of bilexical rules.
The structure of bilexical rules is a modified version of the tlink-rules of Copestake
et al. (1993). The modification interpreted tlinks as establishing correspondences between bags of lexemes rather than between single, phrasal TFSs. Therefore, bilexical rules
mapped from bilexical entries without phrasal signs into entries also without phrasal signs.
The bilexical entries which resulted from this modification effected bindings between indices across languages through a feature index-id which was not available to the analysis
module and which allowed the type of indices to be monolingually defined.
Chapter 5
Classification of Spatial Relations
While previous chapters have described the mechanisms and structure of translation using
IL lists, in this chapter and the next the focus of attention shifts to the other principal
concern in this thesis, namely spatial prepositions. The two issues to consider are representation and disambiguation. This chapter concentrates on the former; it develops a
hierarchical classification of spatial relations whose cross-linguistic validity is supported
by successfully applying it to the classification of Hungarian spatial relations. The classification is then used to specify bilingual entries and rules between English and Spanish.
Chapter 6 considers the issue of disambiguation.
A classification of spatial relations is useful to transfer-based MT for a number of
reasons. Firstly, it delimits the domain of study to be considered; this is particularly
relevant for investigating prepositions, whose highly polysemous behaviour needs to be
circumscribed for an appropriate study. Secondly, such a classification can be used to
describe the mismatches between languages and thus highlight the similarities and differences between the spatial domains of different languages. Finally, each relation has a
particular semantic content which, while not necessarily expressible in another language
directly, restricts the potential ambiguity that the relation gives rise to, thus simplifying
the description of disambiguation conditions.
However, whilst on the one hand most of the spatial classifications used in MT, such
as the one shown in Figure 1.17, are neither sufficiently detailed nor precise enough to
achieve the purposes just mentioned, on the other hand, monolingual descriptions of the
properties and semantics of spatial prepositions, particularly those of Sondheimer (1978),
Cresswell (1985) and Asher and Sablayrolles (forthcoming), do not address a number of
issues of concern in MT such as lexical gaps, unavailable distinctions in a language and
translational inequivalence. In what follows I will try to bridge the gap between the needs
of MT and the precision of formal semantics.
5.1 Properties and Semantics of Spatial Prepositions
This section surveys some of the criteria that have proven useful in describing the properties and semantics of spatial prepositions. While most of the accounts surveyed have
formalized their claims in terms of logical formulae or model theoretic interpretations I
will be concentrating on their linguistic aspects and especially on the tests used to identify
the presence or otherwise of a property or semantic form in a spatial preposition. Such an
exercise will supply a number of tests and classifications directly relevant to the hierarchy
I will develop.
5.1.1 Vendler Classes
While the verb classification of Vendler (1967) is not directly concerned with the semantics
of spatial prepositions, there are common elements, both in method and in substance,
between his classification and the one used here.
The distinction between verbs that entail some form of end or result and verbs that do
not can ultimately be traced back to Aristotle's Metaphysics (Dowty 1979:52). However, it
is Vendler (1967) who first proposes four distinct classes of verbs based on their restrictions
on time adverbials, tenses and logical entailments. The four classes are: states (e.g.
`know'), activities (e.g. `run'), accomplishments (e.g. `paint a picture') and achievements
(e.g. `reach'). Vendler (1967) notes that whereas states and achievements lack progressive
tenses (e.g. `? I am knowing'), activities and accomplishments allow them (e.g. `I am
running'). He also notes that accomplishments normally take temporal adverbials with
`in' while activities take temporal adverbials with `for'.
He painted a picture in an hour    ? He painted a picture for an hour
He ran for an hour                 ? He ran in an hour
Furthermore, if an activity is stopped, then the activity has taken place. This is not so
with accomplishments. Thus, if one stops running, one has run, but if one stops drawing
a circle, one has not drawn a circle.
Dowty (1979:60) further refines this classification by adding a number of criteria, including possibility of modification by adverbs and ambiguity in certain contexts. Among
these criteria are certain entailments which distinguish between verbs; these now follow.
States and activities allow the following type of entailment:
He ran for an hour ⇒ He ran at all times in the hour
whereas the other two classes do not. Also, activities allow entailments of the form:
He is running ⇒ He has run
while the other three classes do not. Accomplishments are the only class to allow:
He painted a picture in an hour ⇒ He was painting a picture during that hour
Finally, movement activities, when occurring with a `to' PP or with a direct object, behave
like accomplishments:
He walked to the park in an hour.
He walked a mile in an hour.
Much of the work which has evolved from the classification of Vendler (1967) is concerned with aspectual and temporal issues (Verkuyl 1972; Pustejovsky 1991b) and I will
therefore not consider it in detail. Nevertheless, it is appropriate at this point to define
the verbal classification I will be assuming throughout; this is shown in Figure 5.1. The
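[Figure: type hierarchy with root e; homogeneous and eve below e; state below homogeneous; activity below both homogeneous and eve; accomplishment and achievement below eve; movement and non-movement below activity.]
Figure 5.1: Simple event type hierarchy.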
classification is a simplified version of that developed in Sanfilippo (1990) which is ultimately derived from Vendler's. Type homogeneous includes states and activities and
type eve(ntuality) includes activities, accomplishments and achievements. I have
extended the type activity with movement and non-movement verbs. Movement
verbs include `run', `walk', `swim', etc. They normally involve a change in the location
of the subject and may be characterized by the presence of one or more of the following
meaning components: causation, path, moving object, a reference location (e.g. origin
or destination) and manner of movement (Talmy 1985). I will not be concerned with
causation nor manner of movement.
Verbs of type non-movement include `play the piano', `act', `sit' and `watch'. Asher
and Sablayrolles (forthcoming) distinguish between verbs of change of location (e.g. `enter
the kitchen'), verbs of change of position (e.g. `run on the street') and verbs of change
of posture (e.g. `lean against the wall'). In terms of these distinctions, movement verbs
in the present classication correspond to verbs of change of location and position, while
non-movement verbs correspond to verbs of change of posture.
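The event type hierarchy just described can be written down as a small Python sketch in which each type lists its immediate supertypes. The root type e and the exact shape of the table PARENTS are assumptions read off the description of Figure 5.1, and is_subtype is a hypothetical helper.

```python
# Immediate supertypes of each event type (activity has two parents).
PARENTS = {
    "homogeneous": {"e"},
    "eve": {"e"},
    "state": {"homogeneous"},
    "activity": {"homogeneous", "eve"},
    "accomplishment": {"eve"},
    "achievement": {"eve"},
    "movement": {"activity"},
    "non-movement": {"activity"},
}


def is_subtype(event_type, ancestor):
    """True if event_type equals ancestor or inherits from it."""
    if event_type == ancestor:
        return True
    return any(is_subtype(parent, ancestor)
               for parent in PARENTS.get(event_type, ()))


print(is_subtype("movement", "homogeneous"),
      is_subtype("movement", "eve"),
      is_subtype("state", "eve"))
```

On this encoding a movement verb such as `run' counts both as homogeneous and as an eventuality, while a state counts only as homogeneous.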
5.1.2 Lexical Decomposition in Montague Grammar
Dowty (1979:207-19) proposes formal semantics for a number of prepositions as an extension to Montague semantics. He rst notes that while the following two accomplishments
are derived from activities by adding a PP, the two sentences differ from each other
in that the first sentence indicates movement of the subject entity, while in the second one
it is the direct object that is moved:
John walked to Chicago.
John moved a rock to the fence.
A variant of this observation will form the basis for disambiguating the prepositions
`among' and `between' to be presented in Section 6.4.8.
Next, Dowty (1979:214) decomposes the meaning of a number of prepositions. Thus
his translation of `John walked to Chicago' into logical form would be:
Dowty: walk(j) & ∃z[become ¬be-at(j,z)] & become be-at(j,c)
The definition of become is based on the interval semantics of Dowty (1979). Thus,
become be-at(j,c) may be loosely paraphrased as saying that given three consecutive
time intervals, a, b and c, it is true that John is not `at' Chicago during a and that he is
`at' Chicago during c; certain technical conditions apply during b which are not relevant to
the discussion. This formula also shows that Dowty (1979) introduces an implicit location
(variable z) from which movement originates. Similar decompositions are also proposed
for `into' and `onto' based on the predicates `be-in' and `be-on', and for `away from', `out
of' and `off of' using the negations `¬be-at(x,y)', `¬be-in(x,y)' and `¬be-on(x,y)'. I should
note that Dowty (1979) neither offers an appropriate semantics for `be-at' and the other
predicates nor specifies when any of `to', `into' or `onto' can be used.
The basic intuition for the above decomposition has been proposed many times before.
For example, in Leech (1969:191) it is argued that ``the prepositions to, onto, and into are
respectively the dynamic equivalents of at, on, and in''. That is, they allow the following
inferences (Ibid.):
`He has gone to the station' implies `He is at the station'.
`He has gone onto the platform' implies `He is on the platform'.
`He has gone into the restaurant' implies `He is in the restaurant'.
Furthermore, Leech (1969:163) analyses `away from', `off' and `out of' as `not at', `not
on' and `not in', just as in Dowty (1979). Bennett (1975:73) also presents similar pairs of
related prepositions:
away from    to
off          onto
out of       into
For example, one says `off/onto the bus', `out of/into the house'.
Bennett (1975:50-57) also extends these types of distinctions to prepositions such as
`over' and `under' which have senses implying a nal destination. Thus compare:
My hand is over the table.
Please put the lamp over the counter. (i.e. to)
The second sentence implies that the position of the lamp changes from not being over
the counter to being over the counter.
5.1.3 Reference to Locations
Sondheimer (1978) offers an analysis of sentences containing spatial PPs in which the PP
refers to the location of an event or state. He introduces a ``place object'' as the location
in which an event is located. Thus, Sondheimer (1978:244) proposes that the FOL formula
for `John stumbled in the park', with one inconsequential modification, is:
Sondheimer: ∃x∃p[Stumbling(x) & S(x,John) & P(x,p) & IN(p,the park)]
This formula assumes an event based analysis with a treatment of participants similar to
other Neo-Davidsonian approaches (see Section 2.3.1), but with some differences in the
actual thematic relations which I will ignore. The relevant features of his analysis are
case `P' and place object `p'. Informally, `P' is the place case which relates the event of
stumbling with the volume the event occupies during its lifespan. Place object `p' is a set
of volume-time pairs in and at which this event takes place. The predicate `IN(p,the park)'
is true just in case every volume in `p' with time within the duration of the stumbling, is
in the relation `IN' to the park. Sondheimer notes that this analysis overcomes a number
of difficulties with previous approaches. For example, he notes that the predicate analysis
of Lakoff (1970) for multiple PP modifiers is unsatisfactory:
John stumbled in the park under a tree.
Lakoff: UNDER(IN(John stumbled, the park), a tree)
The problem here is that `UNDER' locates not the event of John's stumbling but the
relation of `being in something'. By contrast, a place object analysis describes the relevant
modification directly:
Sondheimer: ∃x∃p[Stumbling(x) & S(x,John) & P(x,p) & IN(p,the park) & UNDER(p,the tree)]
This expression uniformly expresses the contribution of both PPs to the sentence. Place
object analyses also overcome some difficulties with treatments of the type Davidson (1967)
suggests, particularly in the case of motional contexts. For example, in the original proposal of Davidson (1967), the following sentence would receive the analysis shown:
I walked across the street to the barber's.
Davidson: ∃e[Walking(I,e) & Across(e,the street) & To(e,the barber's)]
Such analyses have no way of expressing the fact that `e' takes place first `across the
street' and then `to the barber's', since each PP modifies exactly the same event. Using a
place object, and exploiting the fact that these objects consist of volume-time pairs, it is
possible to encode the relevant conditions as (in simplified form and without quantifiers):
Sondheimer: GOING(x) & S(x,I) & P(x,p) & ACROSS(SEGMENT(p,t1,t2),the street) &
TO(UNIT(p,t),the barber's)
Here, SEGMENT is a function which returns a subset of `p' with times ranging from `t1'
to `t2' (i.e. the volumes occupied while crossing the street), while UNIT returns a subset of
pairs from `p' with times less than or equal to `t' (i.e. the volumes occupied while moving
to the barber's).
Further evidence for the existence of place objects may be added to Sondheimer's justifications above in the form of deictic reference. Recall that part of the motivation for event
entities is that they can be referred to by pronouns (Section 2.3.1). Jackendoff (1983:50-55) notes that expressions such as `here', `there', `everywhere' and `somewhere' refer to
neither events nor objects but to regions in space. In the place object analysis, the referents
of these pronouns and adverbs of place can be adequately identified with place objects and
have as their denotation the volumes that place objects stand for.
I will be adopting the notion of a place object as an index for prepositional phrases
in order to represent prepositional stacking and modification and also as a place holder in
which to indicate the type of spatial relation that a preposition has.
5.1.4 Paths and Journeys
Bennett
Bennett (1975) decomposes the meaning of a number of spatial prepositional senses into
their basic semantic components in order to account for part of their behaviour. Following
Fillmore (1971) he uses five cases: `locative', `extent', and the three cases `source',
`path' and `goal' called directionals, in order to develop his analysis. His `source' and
`goal' prepositions include `to' and `out of' (see Section 5.1.2); as for `extent', it is used to
describe the spatial sense of `for' as in `we walked along the road for three miles' which I
will not be considering in this thesis. His analyses of locative senses suffer from the same
problem as those described in Section 1.6; that is, in some cases they are rather circular.
For example, take his analysis of `over' shown below:
My hand is over the table.
Bennett: [L [superior of table] place]
where `L' indicates locative case and `place' states that this is a spatial, as opposed to
temporal, locative case. Bennett supplies no precise or independent definition of `superior'. While I will not resolve such circularities, I will claim that they do not need to
be made explicit in MT, and that other characteristics of the preposition are of greater
relevance. Somewhat similarly, there are problems with his decomposition of `at', `in' and
`on', resembling those of Herskovits (1986). Thus, he suggests the following for `on':
The ball on the grass.
Bennett: [L [surface of grass] place]
without indicating what counts as a surface and which nouns can reasonably be expected
to give rise to surfaces.
Bennett's analysis of `path' prepositions is more relevant to the classification to follow.
He identifies `path' senses in the following prepositions: `across', `along', `past', `through'
and `via'. Thus, `across' is analysed as (`P' indicates case `path'):
She walked across the road to the bank.
Bennett: [P [L [transverse of road] place]]
In his view, a path provides information about the route taken during a movement; he
notes that paths may be modified by sources and goals:
She walked from her house across the road to the bank.
In addition he proposes that paths are normally specified in relation to the location their
route takes. Thus, Bennett (1975:19, 84) proposes that in the sentence
Gwyneth walked through the kitchen to the hall.
Gwyneth walked via the interior of the kitchen. In other words, part of the path described
by the walking takes place in the kitchen, thus suggesting that the path described by
`through' shares the notion of interior with `in'. `Through' is not the only path to share
semantic components with other prepositions. Bennett (1975:86, 60) associates `via' with
`at', and `by' with `past'. Indeed, he shows that there are a number of prepositions,
including `under', `over', `behind', `in front of' and `by' which, apart from their purely
locative sense as in
The ball is over the table.
also have path senses such as:
The glass slid over the table to the other side.
Here the glass slides through a path that goes over the table. Paths can occur more than
once as modifiers of the same verb phrase, as pointed out by Fillmore (1971). For example:
He walked across the bridge along the river to the mill.
While Bennett (1975) prefers conjunctions separating each occurrence of the path PP, the
possibility of repetition is clear. I agree with these analyses of path prepositions and will
be using them in the classification that follows.
The last relevant observation is from Bennett (1975:35-36) and concerns expressions
such as:
The post office is over the hill.
It is argued that the sense of `over' in this sentence is different from its standard locative
sense. The sense is this: the location specified by the preposition is identified "by indicating
the journey one would have to take in order to get there" (1975:36). In other words,
the post office is at the end of a journey going over the hill; I will refer to this sense as
a path-end sense. Bennett (1975) identifies two properties of path-end senses. The first is
that they may be paraphrased as `on the other side of':
The post office is on the other side of the hill.
The second property is that path-end senses allow the overt specification of the point at
which the path starts. Thus, in
The post office is over the hill from here.
the PP `from here' specifies the starting point of a journey that goes over the hill to where
the post office is. To summarize the observations made by Bennett (1975), one can say
that a preposition such as `over' has four senses which could be loosely paraphrased by:
at a place over, to the place over, via a path going over, on the other side of. These
distinctions will be important for the classification to be developed.
Cresswell
Based primarily on the analyses of Bennett (1975), Cresswell (1985) formalizes the compositional structure of sentences containing spatial prepositions with particular emphasis
on the path and path-end prepositions just described. Cresswell's purpose is to show how
knowledge of a word is used in the interpretation of a sentence; thus, in his definition
of `across', for example, he does not elucidate the lexical semantics of this word, but
rather shows how a knowledge of this semantics is used in the interpretation of sentences containing `across'. His semantics for this preposition relies crucially on the definition
of a relation R_across(p,a,w) where p is a path, a is an object and w is a world. Cresswell
defines a path as a function from moments in time to spaces such that a time interval
consisting of moments traces a path on a region in space; in this sense, his definition is
similar to the SEGMENTs of Sondheimer above (Section 5.1.3). The relation R_across(p,a,w) is
true just in case path p is across a in world w. However, Cresswell indicates that the aim
of his analysis "is not to define R_across but rather to show how the ability to recognize
this relation is used by native speakers of L [i.e. a subset of English] when they employ
sentences containing the word across" (1985:103). Such a goal, while relevant to MT, is
not its main concern; that is, the definition just given does not help in deciding when to
use `across' instead of `through', for example, or which of the possible interpretations of
`across' are plausible for any given sentence, or what types of expressions are used in other
languages for expressing the meaning of `across'.
Despite this mismatch in goals, certain criteria used by Cresswell (1985) have been used
in this thesis. He notes that paths can be modified by phrases referring to the intrinsic
temporal structure of a path:
Two days across the desert we ran out of water.
In its path-end sense, `across' may be paraphrased as:
Arabella walks across a meadow from Bill.
Arabella walks at the end of a journey across a meadow from Bill.
which makes explicit the notion of a position at the end of a journey or path. Other
sentences are possible which make reference to the end of a path:
Arabella walks from across a meadow from Bill.
Here, the source of movement required by `from' is supplied by the path-end sense of
`across'.
Finally, he notes that certain prepositions allow measure modification:
Three yards behind the bush a band is playing excerpts from Trial by Jury.
?Three yards by the bush a band is playing excerpts from Trial by Jury.
I have supplied the second sentence to show the difference in behaviour of `behind' and
`by'.
5.1.5 Lexicalization Patterns
A cross-linguistic study of motion verbs is presented by Talmy (1985), where it is argued
that different languages lexicalize meanings in different ways. Talmy (1985:59) defines lexicalization as the regular association of one or more meaning components with a morpheme.
For example, it is common for English verbs to incorporate two meaning components, `Motion' and `Manner', in a single morpheme (Ibid. p. 62):
The rock rolled/bounced down the hill.
Here, the verbs `roll' and `bounce' encode both that movement took place and that the
movement was in a rolling or bouncing manner. Contrast this with a verb such as `enter' in
which `Manner' is not encoded, but rather the course and direction of movement (Talmy's
`Path') is lexicalized.
Talmy's principal insight is that different languages prefer different lexicalization patterns. In other words, while English normally expresses `Motion' and `Manner' in the verb,
Spanish tends to lexicalize `Motion' and `Path'. In fact, Spanish does not easily lexicalize
`Motion' and `Manner':
Spa: * La roca rodó/rebotó abajo de la colina.
Gloss: The rock rolled/bounced down of the hill.
Instead, the three meaning components of the English sentence, `Motion', `Manner' and
`Path' are distributed differently within the sentence:
Spa: La roca bajó la colina rodando/rebotando.
Lit: The rock went down the hill rolling/bouncing.
In the Spanish sentence, `Motion' has been lexicalized with `Path' as a verb, while `Manner'
has been expressed by a gerundive modifier.
The notion of lexicalization will be used when analysing certain differences between
various classes of English and Spanish prepositions. Such analysis will lend support to the
notion of lexicalization and will demonstrate its practicality in MT.
This concludes the survey of treatments and proposals on which I will be building
directly. The next section describes a classification of spatial relations which is justified
on monolingual and multilingual grounds.
5.2 Multilingual Spatial Relations
None of the theories described so far has provided an overall framework for the description of spatial relations. In this section I will motivate a hierarchical classification for a
significant proportion of the spatial prepositions in English and Spanish using the criteria isolated from the theories just described, augmented with further tests which I have
constructed, and including some of the insights of Herskovits and Hjelmslev presented in
Section 1.6.
The classification developed here will take the form of a type hierarchy, as shown in
Figure 5.2, in which types correspond to different classes of relations. Any property of
a relation in the hierarchy is inherited by all the relations lower down. Certain nodes in
the hierarchy introduce features into the relations. The properties associated with these
features seem to apply to relations in a way that depends little on the inheritance structure
of the hierarchy. For example, the scalar feature (i.e. `inside' is scalar: `2 metres inside
the Round Church') applies to some vertical relations (e.g. `below') and not to others (e.g.
`under'); it also applies to one non-lexicalized relation (i.e. `inside') and some intrinsic ones
(i.e. `in front of'). This type of distribution is better captured with features rather than
types.
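The division of labour between types and features can be pictured with a small Python sketch. It is only an analogy: Python class inheritance stands in for the typed feature structure hierarchy, and the class names, the attribute modifies_movement, and the placement of `below' and `under' directly under an external type are assumptions made for illustration. Only the scalar values are taken from the examples just given.

```python
class Relation:
    """Root type; every property stated here is inherited lower down."""


class Dynamic(Relation):
    # Hypothetical property: dynamic relations go with movement verbs.
    modifies_movement = True


class Static(Relation):
    modifies_movement = False
    # Features introduced at this node. Their values are set relation by
    # relation, independently of where the relation sits in the hierarchy.
    scalar = False
    transitive = False


class Internal(Static):
    pass


class External(Static):
    pass


class Inside(Internal):
    scalar = True       # `2 metres inside the Round Church'
    transitive = True


class Below(External):
    scalar = True


class Under(External):
    scalar = False


# Inherited property: fixed by the position in the hierarchy.
print(Below.modifies_movement, Under.modifies_movement)
# Feature: varies between relations that share a parent type.
print(Below.scalar, Under.scalar, Inside.scalar)
```

The sketch shows why a feature is the more economical device here: `below' and `inside' agree on scalar although they share no type below the static node, while `below' and `under' differ although they are close neighbours.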
The description of the hierarchy begins with an explanation of how a spatial relation
is identied, and of how relations are represented within the lexical entry of a preposition.
This is followed by a discussion of each node, working left to right, depth rst through
the hierarchy. To anticipate, the main types of spatial relations may be exemplified as
follows: dynamic relations include prepositions such as `to', `from' and `along'; internal
relations include `in', `on' and `inside', while external relations are exemplified by `behind',
`under' and `next to'. The intuition behind these classes is that dynamic prepositions are
associated with movement verbs, while the other two classes are used to indicate the
location of events and objects. Internal relations make implicit or explicit reference to the
internal or functional structure of their complements, while external relations do not. A
slightly expanded version of this section appears as Trujillo (forthcoming).
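[Figure 5.2 is a tree diagram whose layout did not survive extraction. Nodes and features, as far as they can be read:
relation
  dynamic: limit [limit-path, limit-internal], bound, unbound, path-like; below these: goal (to), source (from), path [path-loc, parallel] (along, across), direction (towards)
  static [transitive, scalar]: internal, external
Further node labels: lexicalized, non-lexicalized, in-on, path-end [path-type], intrinsic, extrinsic, vertical, proximity, vague, non-vague, graded, non-graded.
Further example prepositions: at, in, on, inside, behind, under, next to, near, by.]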
Figure 5.2: Type hierarchy of spatial relations (arrows indicate lexical rules).
5.2.1 Spatial Relations
There are a number of ways of identifying the presence of a spatial relation in a PP. Firstly,
spatial expressions can be appropriately used as answers to `where' questions:
Where did she go? She went to London/inside the room/behind the curtain.
Secondly, spatial PPs may be modified by the word `right' to indicate precision.
She is galloping right along the river/right on his field/right behind you.
While `right' can also modify temporal uses of a preposition (e.g. `right in time'), in
conjunction with the previous criterion it can further support the existence of a spatial
relation. A third criterion is that spatial PPs may be replaced by the adverb `there' to
become deictic representations of space.
The baby played in the garden/there
In terms of their incorporation into the lexical sign of a preposition, I follow Sondheimer (1978) (see Section 5.1.3) in including a reference to place objects in the representation of a preposition. In terms of the IL list framework the place object corresponds to
a third index in the preposition. However, unlike Sondheimer I do not assume a separate
`Place' case but instead include all the relevant indices with the preposition's IL. Thus, an
IL for `by' will not only have an index for its complement and for the entity it modies,
but also for the relation it contains. Thus, its lexical sign is:
[p-by
  orth  = orth
  syn   = syn
  trans = [trans
           dist = [ind-lex
                   ind1 = relation
                   ind2 = entity
                   ind3 = obj]]]
where ind1 indicates the type of relation found in the preposition.
The advantage of this approach is that it solves the preposition stacking problem by
allowing a compositional derivation of the representation of stacked prepositions. Thus,
the IL representation of a stacked PP, as described in Section 1.2.2, is:
[from_{p,e,q}, within_{q,p,z}, the_z, community_z]
In this bag the indices p and q are introduced by the two prepositions; these correspond
to the place objects that Sondheimer (1978) proposes, although for the purposes of this
thesis they become place holders for spatial relations. As the example shows, the ordering
of indices in a preposition is: spatial relation, index of modifiee, index of complement. For
example, index e corresponds to the event modified by the stacked structure.
Compositionality is preserved by virtue of there being just one grammar rule which
combines a preposition with a PP, namely the rule PP → P PP (see Section 3.2.1). An
additional property of the extra index is that it allows prepositional modifiers to be bound
to the preposition:
[right_y, within_{y,w,z}, the_z, community_z]
thus preserving the essence of the IL approach. The discussion of the properties of each
spatial relation in Figure 5.2 now follows.
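The role of the extra index in a stacked PP can be made concrete with a short Python sketch. The Sign class, the helper names and the lookup by first index are hypothetical and are not the implementation used in this thesis; the index names are those of the bag for `from within the community', and the index order is the one stated above (spatial relation, modifiee, complement).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sign:
    """A lexical item in an IL bag together with its indices."""
    word: str
    indices: Tuple[str, ...]


def preposition(word: str, relation: str, modifiee: str, complement: str) -> Sign:
    """Build a preposition sign with the index order used in the text."""
    return Sign(word, (relation, modifiee, complement))


def complement_signs(bag: List[Sign], prep: Sign) -> List[Sign]:
    """Signs whose first index is the complement index of the preposition."""
    target = prep.indices[2]
    return [s for s in bag if s is not prep and s.indices[0] == target]


# Bag for `from within the community'.
from_ = preposition("from", "p", "e", "q")
within = preposition("within", "q", "p", "z")
bag = [from_, within, Sign("the", ("z",)), Sign("community", ("z",))]

# The complement of `from' is the relation introduced by `within'.
print([s.word for s in complement_signs(bag, from_)])
# The complement of `within' is the noun phrase indexed by z.
print([s.word for s in complement_signs(bag, within)])
```

Because the complement of `from' is found through the relation index q of `within', the stacked structure needs no special treatment beyond what an ordinary preposition and noun phrase already require.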
5.2.2 Dynamic Relations
This is the first main subtype in the hierarchy; it corresponds roughly to the direction dimension of Hjelmslev (see Section 1.6.4) and covers the directional cases of Bennett (1975).
Dynamic relations involve the notion of movement; they are normally found in prepositions
such as `from', `to', `along', and `towards', usually modifying movement verb phrases. It is
this property, and the converse difficulty of modifying nouns with dynamic relations, that
is their most distinctive feature. Within the group of dynamic relations, it is possible to
identify four subtypes of relations, described below.
Limit, Bound, Unbound and Path-like Relations
Limit
The relations associated with `from' and `to', termed here limit relations, have the
following characteristic: many languages of the world use prepositions (or their analogues)
to express these notions. For example, Japanese has e (to) and kara (from). It is interesting
to note also that in many languages these prepositions may combine with other types of
spatial relation to give more refined meanings (e.g. to under, from in front, etc.). The
case of Hungarian will be considered later. Other examples include Malay, where ka- (to)
and dari- (from) may combine with other prepositions to give, for instance, dari bawah
(from under), and Turkish, where -e/a (to) and -den/dan (from) can also combine with other
postpositions, as in önünden (from in front).
Another property of limit relations is that they seldom appear more than once in an
expression:
1) * She walked from King's from the chapel.
2) * They travelled to Scotland to Aberdeen.
Finally, in English they may select for certain properties of the following noun phrase.
This property is associated with the feature limit-internal and will be explained in
more detail in Section 5.3.
Bound
Bound relations are those which limit in some way the spatial extent of an action. Their
distinguishing feature is that they induce telic events for simple past tense sentences,
as may be verified by their compatibility with temporal PP modifiers headed by `in'
(Aske 1989:6):
She ran to the store in 5 minutes.
This test is taken from Vendler for the identification of accomplishments (see Section 5.1.1)
and it brings out the bounded character of the movements induced by these relations.
Unbound
Unbound relations do not restrict the spatial extent of an action in the same way that
bound relations do. The main consequence of this property is that they may be modified
by temporal PPs with `for':
She ran from/along/towards the store for 5 minutes.
This test, introduced by Vendler (Ibid.), is used here to capture the unbounded character of
the relation. Under certain circumstances an expression containing an unbound preposition
may be coerced into a telic expression. For example,
She ran from the store in 5 minutes
is grammatical under the meaning `she ran from the store to here in 5 minutes'. Whether
the telic character of such expressions is induced by the spatial or the temporal PP is
unclear. For instance, the temporal PP alone does not induce a telic reading:
? She ran in 5 minutes
On the other hand, modification by `for' blocks further modification by `in':
* She ran from the store for 5 minutes in 5 minutes.
It might be that unbound relations are, in effect, underspecified as to telicity (cf. Copestake and Briscoe (forthcoming)). Under this analysis, unbound relations would allow both
`in' and `for' modification, while bound relations with simple past verbs would only be
modifiable by `in'.
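The tentative underspecification analysis can be summarized in a minimal Python sketch. The function names and the two word sets are assumptions made for illustration, restricted to the prepositions used in the examples above; the sketch encodes only the three observations made in the text: bound relations accept `in', unbound relations accept `for' or `in', and `for' blocks a further `in'.

```python
# Hypothetical classification, limited to the prepositions in the examples.
BOUND = {"to"}
UNBOUND = {"from", "along", "towards"}


def allowed_temporal_modifiers(prep):
    """Temporal prepositions compatible with a dynamic preposition
    in a simple past sentence, under the underspecification analysis."""
    if prep in BOUND:
        return {"in"}
    if prep in UNBOUND:
        return {"for", "in"}
    raise ValueError(f"unclassified preposition: {prep}")


def acceptable(prep, temporal_preps):
    """Check a sequence of temporal modifiers against a spatial preposition."""
    allowed = allowed_temporal_modifiers(prep)
    if any(t not in allowed for t in temporal_preps):
        return False
    # Modification by `for' blocks further modification by `in'.
    return not ("for" in temporal_preps and "in" in temporal_preps)


print(acceptable("to", ["in"]))           # She ran to the store in 5 minutes.
print(acceptable("from", ["for"]))        # She ran from the store for 5 minutes.
print(acceptable("from", ["for", "in"]))  # ... for 5 minutes in 5 minutes.
```

The sketch does not decide whether the telic reading of `She ran from the store in 5 minutes' comes from the spatial or the temporal PP, a question the text leaves open.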
Path-like
These relations have the property of being able to modify source and path relations
(the use of path here and in what follows is not to be confused with the notion of `Path'
used by Talmy (1985)):
1) She walked from the shop along the pavement/towards the car park.
2) She walked along the pavement through the tunnel/towards the car park.
Thus in 1) `walked from the shop', which contains a source relation, is modified by the
path-like relations `along' or `towards' giving perfectly acceptable sentences. The same
applies to `walked along the pavement'.
Source, Goal, Path and Direction Relations
These four relations constitute the lower end of the dynamic relations sub-hierarchy.
Source
This relation is expressed by `from' for example, and it indicates the place of origin of a
movement. An important feature of this relation in English is that it allows its complement
to be a PP: `he walked from behind the shop'. Furthermore, PP's with source relations
are not allowed as complements of other prepositions: * `he walked at from the shop'.
In Spanish the source relation is slightly more complex than in English since there are
two prepositions expressing this notion: de and desde. The distinction made by them is one
speakers of English are not accustomed to, as explained by Butt and Benjamin (1994:431):
"Desde stresses the idea of movement or distance more than de. It is therefore appropriate
when motion `from' a place requires some unusual effort, or when the point of origin is
mentioned but not the destination...". The following sentences make this more precise.
1) * Corrí dos millas de mi casa.
2) Corrí dos millas desde mi casa.
I ran two miles (all the way) from my house.
Since de does not emphasize the notion of distance, 1) is not appropriate; the correct
preposition for conveying the meaning intended here is desde. Another effect of this
difference may be seen in the following existential sentences:
3) * Hay faroles de mi casa.
4) Hay faroles desde mi casa.
There are lampposts (all the way) from my house
In 4) the lampposts are placed along a path from my house up to some unspecied place.
3), if meaningful at all, means that there are lampposts belonging to my house.
The above distinction is expressed in the form of a feature in the type associated
with Spanish limit relations. The resulting lexical type for desde is shown in Figure 5.3.
The important TFS in Figure 5.3 is the one with type s-source. It contains the feature
[s-p-desde
  orth  = orth
  syn   = s-syn
  trans = [trans
           il-list = (null-lex)
           dist = [ind-lex
                   lex  = (desde 1)
                   ind1 = [s-source
                           index-id   = (string)
                           limit-path = (+)]
                   ind2 = entity
                   ind3 = entity]]]
Figure 5.3: Lexical type for Spanish preposition desde.
limit-path with value (+) to distinguish it from the corresponding structure for de with
limit-path value (-). The representation of `from' is similar to that in Figure 5.3 but
without the feature limit-path since the relevant distinction is not made in English.
Goal
This relation, present in the preposition `to', expresses the final destination of a movement. One of the main properties of this relation is that it induces the bound movements
described in Section 5.2.2. Other properties include inferences of the following type: in a
sentence of the form `Subj V-ed to Y', where V is a movement verb such as `walk', one can
infer that for some time t `Subj was in/on/at Y'; this type of inference is similar to those
suggested by Dowty (see Section 5.1.2) and it is also discussed by Parsons (1990:78). An
associated condition is that for all times t' < t, one may not infer `Subj was in/on/at Y' at
t'. For instance, in `Mary ran to the shop' one can infer that Mary was in/at the shop at
some time in the past, but that she was not in/at the shop during the running.
As with source relations, goal relations in Spanish are slightly more complex than in
English since there are again two prepositions that express goal: a and hasta. Fortunately
the distinction is similar to the one for source. The following two sentences exemplify the
differences:
1) * Hay coches a la entrada.
2) Hay coches hasta la entrada.
There are cars (all the way) to the entrance.
Here a in 1) cannot be interpreted as a goal relation and must instead have a static meaning
translated as `there are cars at the entrance'. By contrast hasta can convey the meaning
of goal and the path along which the cars are located. The entry for hasta is shown in
Figure 5.4.
[s-p-hasta
  orth  = orth
  syn   = s-syn
  trans = [trans
           il-list = (null-lex)
           dist = [ind-lex
                   lex  = (hasta 1)
                   ind1 = [s-goal
                           index-id   = (string)
                           limit-path = (+)]
                   ind2 = entity
                   ind3 = entity]]]
Figure 5.4: Lexical type for Spanish preposition hasta
Again, the structure for a will have (-) as its value of limit-path; English `to' will not
have this feature at all.
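The consequence of this asymmetry for translation can be sketched in a few lines of Python. The tables and the function spanish_candidates are hypothetical and are not the bilingual lexicon of this thesis; they encode only what has been stated so far, namely that Spanish distinguishes de/desde and a/hasta by the value of limit-path, while English `from' and `to' carry no such feature.

```python
# Spanish limit relations keyed by (relation type, limit-path value).
SPANISH_LIMIT = {
    ("source", "+"): "desde",
    ("source", "-"): "de",
    ("goal", "+"): "hasta",
    ("goal", "-"): "a",
}

# English limit relations carry no limit-path feature.
ENGLISH_LIMIT = {"from": "source", "to": "goal"}


def spanish_candidates(english_prep, limit_path=None):
    """Spanish prepositions compatible with an English limit preposition.

    If limit_path is None the feature is unspecified, as it is for the
    English entry, and both Spanish alternatives remain available.
    """
    relation = ENGLISH_LIMIT[english_prep]
    return sorted(
        word
        for (rel, value), word in SPANISH_LIMIT.items()
        if rel == relation and limit_path in (None, value)
    )


print(spanish_candidates("from"))
print(spanish_candidates("from", limit_path="+"))
print(spanish_candidates("to", limit_path="-"))
```

With the feature unspecified, both Spanish prepositions survive for each English one; choosing between them requires information from elsewhere in the sentence, such as the existential contexts illustrated above.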
Path
This type of relation is present in prepositions such as `along' and one sense each of
`through' and `across'. Since path relations are a subtype of path-like relations, it follows
that paths may be repeated in a sentence, as mentioned in Section 5.1.4.
In addition to path-like properties, paths have other characteristics. Unlike directions,
they can be modified by goals:
He walked along the road to the river.
They also allow inferences, at any point throughout the duration of a movement, of
the form `Subj V-ed through/across/along Y' ⇒ `Subj was in/on/at Y'; this is in contrast
to goal relations where the inference is only valid on termination of the event. In addition,
they allow temporal modification as noted by Cresswell (see Section 5.1.4):
Fifteen minutes through the tunnel he got sick.
The distinctions between the various path relations (i.e. those expressed by `through',
`across' and `along') are studied by Bennett (1975). Firstly, that between `through' on
the one hand, and `across' and `along' on the other, is due to the relation between each
point in the path and the object of the preposition, which in the case of `through' is one
of containment while in `across' and `along' is not; that is to say, each point in space of
a `through' path is contained by the object of the PP. This may be seen in the sort of
inference allowed in typical uses of `through':
The train went through the tunnel ⇒ the train was in the tunnel.
Compare this with the sentence `the boat went across the river' where concluding that
`the boat was in the river' is not permitted (this sort of inference is also noted by
Sjöström (1990:32) for certain Swedish prepositions). In the case of `across' and `along'
the relation is less specific (Bennett 1975:86); I will assume that their relations are of
position on a plane (or rather, of position on something) in the case of `across', and of
general positioning in the neighbourhood of the object in the case of `along'. Although
this may not be correct in all cases (e.g. `he jumped across the stream') it is sufficient for
demonstrating the overall structure of these prepositions.
Secondly, to distinguish between `across' and `along' recall that Herskovits includes the
relation `parallelism of lines' as a component of `along' (see page 83). The import of this
relation may be seen in the inferences available from the following two sentences:
1) She walked along the street to that place.
2) She walked across the street to that place.
In 1) the expression `that place' cannot refer to a location on the opposite side of the
street from where the walking originated. Conversely, in 2) the place referred to must be
on the other side of the street relative to the point where movement started. Thus, the
interpretation of `along' requires the notion of a path which is parallel to the object of
the preposition whereas `across' does not. Instead, `across' takes a point of reference and
describes a path in terms of a line which is perpendicular to some contextually specified
plane. It should also be stressed that the term `parallel' in this context is used mnemonically and no direct relation to the geometric concept is intended; thus the path described
by `along' and that denoted by the object of the preposition need not be straight lines,
nor are they required to intersect only at infinity.
In view of the distinctions between `through', `across' and `along', it is necessary to
include two features in the type path, as shown in Figure 5.5 (the relevant TFS has been
extracted for clarity). path-loc indicates the type of relation the path has with respect
to the object of the preposition. In this case the relation has not been specified, but it
will be one of the internal relations proposed later (note that in general path relations
may have any type of static relation as the value for path-loc). The feature parallel
[ind-lex
  lex  = (across 1)
  ind1 = [path
          index-id = (string)
          path-loc = internal
          parallel = (-)]
  ind2 = entity
  ind3 = entity]
Figure 5.5: Indices for `across'.
has the value (-) to express the non-parallel interpretation this preposition requires. The
corresponding representation for `along' would have this value set to (+).
Direction
The direction relation is found in the preposition `towards'. It indicates the orientation
a certain movement has taken but without stating what the final destination of that
movement is. Directions can act neither as argument nor as functor of other relations:
1) * She drove from towards the tower.
2) * She drove towards under the bridge.
Like paths, they can modify source and path relations, but unlike them it is not possible to
modify direction relations with goal relations: `?she walked towards the bridge to college'.
They also differ from paths in that no inferences like those allowed by paths are possible:
She jogged towards Grantchester ⇏ She was at/in/on Grantchester.
The relevant TFS for `towards' is given in Figure 5.6.

    [ind-lex
     lex  = (towards 1)
     ind1 = [direction
             index-id = (string)]
     ind2 = entity
     ind3 = entity]

Figure 5.6: Indices for `towards'.
5.2.3 Static Relations
Static relations are primarily associated with objects or with the location of the whole of
an event. They correspond to Hjelmslev's coherence and subjectivity dimensions; in fact
the main two divisions of static relations, internal and external, correspond approximately
to these two dimensions.
The most important property of these relations is that they induce inferences of the
following form:
She ran in the park for 2 mins ⇒ She was in the park for 2 mins.
That is, the complete running event takes place in the park. This is unlike dynamic
relations where this type of inference is disallowed:
She cycled through the tunnel for 2 mins ⇒ ? She was through the tunnel for 2 mins.
It is this property that distinguishes static from dynamic prepositions and the one that
motivates the static node in Figure 5.2.
Static relations have two features to mark certain properties. The scalar feature
marks whether the relation may be modified by a measure phrase or not. For example:
* She saw a stain 4 in. on the table/next to the cup/under the margin.
She saw a stain 4 in. along the line/inside the tank/below the handle.
The meaning added by measure phrases is one indicating the distance at which an object
is located with respect to a point of reference. For example, in the case of `along the line'
a stain is on the line, 4 inches away from some unspecified point.
The transitive feature derives its name from transitive relations in mathematics, for
which the following holds: ∀x,y,z: xRy & yRz ⇒ xRz. Prepositions expressing this type of
relation include `in' and `inside'. Thus, the following inference holds:
If the toy is in the bag & the bag is in the house then the toy is in the house.
Prepositions such as `on' are not transitive, however:
The book is on the table & the table is on the floor ⇏ the book is on the floor.
The notion of transitivity is a language-independent test for distinguishing between different types of spatial relations.
There are three types of static relations: internal with subtypes lexicalized and non-lexicalized, external with subtypes intrinsic and extrinsic, and path-end. The last of these
will be considered in Section 5.3.2.
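
The role of the transitive feature can be illustrated with a small Python sketch. It is not part of the implementation described in this thesis: the table TRANSITIVE and the functions transitive_closure and infer are hypothetical names introduced only for illustration, and facts are simplified to (located object, reference object) pairs.

```python
from itertools import product

def transitive_closure(facts):
    """Return all (x, z) pairs derivable by chaining xRy and yRz into xRz."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (x, y), (y2, z) in product(list(closure), repeat=2):
            if y == y2 and (x, z) not in closure:
                closure.add((x, z))
                changed = True
    return closure

# Hypothetical feature table: only relations marked transitive license chaining.
TRANSITIVE = {"in": True, "inside": True, "on": False}

def infer(prep, facts):
    """Chain facts for transitive prepositions; leave the others untouched."""
    return transitive_closure(facts) if TRANSITIVE[prep] else set(facts)

in_facts = {("toy", "bag"), ("bag", "house")}
on_facts = {("book", "table"), ("table", "floor")}

print(("toy", "house") in infer("in", in_facts))   # True
print(("book", "floor") in infer("on", on_facts))  # False
```

Under these assumptions the toy is inferred to be in the house, while no corresponding inference is drawn for the book and the floor.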
Internal, Lexicalized and Non-lexicalized Relations
Internal
Internal relations generally make reference to certain aspects of their complement NP.
Typical internal relations include `at', `in' and `on'. One says for example `at the theatre'
when the functionality of the theatre is important, or `in the box' to refer to the interior
of the box, or `on the table' to highlight the top surface of the table.
There are two properties associated with these relations. First, in Herskovits (1986:34)
the following effect is observed for `at', `in' and `on'; in a sentence such as:
The bedroom is a pleasant place to work.
one may infer that the place referred to is `in' the bedroom. It is suggested by Herskovits
that it is only with these three prepositions that this sort of inference is possible. Here I
will extend her claim by including `inside' and `on top of' as prepositions exhibiting this
property.
The second property of internal prepositions has to do with the way they induce distributive interpretations in certain expressions. Take the following two sentences:
1) There is a tablecloth on all the tables.
2) There is a tablecloth over all the tables.
In 1) a distributive interpretation, where there is a tablecloth on each table, is the preferred
one. On the other hand a collective interpretation, where there is only one tablecloth covering all the tables, is the preferred one in 2), even though pragmatically one may argue
that a distributive interpretation should be preferred. The reason for this difference is
that somehow the preposition `on', tied to some aspect of the object `table', gives rise
to a distributive reading, because its application depends on individual properties of its
complement. By contrast, `over' does not discriminate in relation to the structure of its
complement and therefore readily takes wide scope over the NP.
Lexicalized
The three prepositions `at', `in' and `on' are identified by Herskovits as having a particular place in the system of prepositions in English (Herskovits 1986:127). In this section
I shall give substance to this claim by classifying them as lexicalized relations and showing what their distinguishing properties are. They are called lexicalized because, as will
become clear, they are best treated as part of the lexical entry of a noun.
A very important feature of lexicalized relations is that they co-occur with certain
generic nouns in a way which can be almost idiomatic. For example, one says `in the
centre', `in the middle', `at the beginning', `at the end', `on this spot', `in this place', `on
that side', `on the edge', `at this point', etc. Table 5.1 gives the approximate frequency
with which certain generic nouns co-occur in the LOB corpus as complements of these
prepositions.

        beginning  end  point  centre  middle  place  spot  side  edge
    at         24   95     62       8       1      9     4    20     6
    in          0   21      0      13      29     51     2     4     1
    on          0    5     14       0       0      1     4   137    18

Table 5.1: Co-occurrence for some nouns and lexicalized prepositions.

It should be noted that spatial, temporal and other senses of these words have
been grouped together in deriving the above table. Nevertheless, the relative frequency of
certain preposition-noun pairs is quite high.
Another property of lexicalized relations is that they occur neither as complement nor as
functor of other prepositions (especially if one does not view `into' as being a syntactic
combination of `in+to' but as a separate preposition). For example, neither of the following
sentences is grammatical:
1) * The bird flew from in the cage.
2) * The train stopped at by the platform.
A third property of lexicalized relations is that they do not allow measure modification:
3) * The town is 2 miles in the border.
4) * The plate is 5 inches on the table.
Thus the feature scalar will be marked (-).
Lexicalized relations select their complement NPs based on what I have called the
noun's locative type (for example, `bus' has locative type `on', `car' has type `in'). I follow
Hawkins (1988:247) in using the co-occurrence of a noun with certain prepositions (in this
case `at', `in' and `on') to define the noun's locative type. This idea also follows the proposal
by Grimaud (1988) that speakers of different languages, because of socio-historical and
cultural differences, emphasize different aspects of an object, thus causing discrepancies in
prepositional usage. Furthermore, Bowerman (forthcoming) shows that from a very early
age, children learning different languages classify spatial situations involving motion verbs
in language-specific ways; for example, Korean children learn that notions of tight and
loose fit are relevant to activity classification, while English children learn that surface and
containment are relevant (Bowerman 1989:157). It is plausible to assume that prepositions
such as `on' and `in' are associated with particular nouns during language learning, as a
language-specific categorization. In the present framework, this categorization is reflected
by the locative type of a noun, indicated by the feature locative-type in its lexical
entry (see Section 6.3.3 for further discussion on locative types from the point of view of
the noun). The assignment of locative types will be explained shortly.

        bus  coach  car  building  house  station  table (fur.)  chair  seat  window
    at    0      0    0         1      8       10            22      1     1       7
    in    3      3   30         8     52        1             0     17     7       8
    on    2      2    2         0      2        0            28      4     9       1

Table 5.2: Frequencies for determining the locative type of a noun.
In addition to the general properties of lexicalized relations, one can group `in' and
`on' into a distinct type. This grouping is reflected by the following test:
The book is partly on the table.
The toy is partly in the box.
? The train is partly at the station.
This suggests that there is a degree of precision required by `at' which is not needed
for `in' and `on'. To reflect this distinction, a subdivision in the lexicalized relations is
introduced into the type hierarchy; this distinction is compatible with that presented in
Durand (1992) (see Figure 1.17).
Computing the locative type of a noun is done by a statistical process similar to that
used in constructing Table 5.1: the number of times a noun occurs with these prepositions
in spatial contexts is recorded and the preposition with the greatest number of occurrences
is used as an indicator of the type of the noun. Table 5.2 shows some example results using
this procedure.
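
The selection step just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name locative_type and the dictionary COUNTS are hypothetical, the counts for the three nouns follow Table 5.2, and the treatment of ties (the first preposition listed wins) is an arbitrary choice of the sketch rather than something specified in the text.

```python
def locative_type(counts):
    """Return the preposition with the highest spatial co-occurrence count.

    `counts` maps each of `at', `in' and `on' to the number of times the
    noun occurs as its complement. On a tie, max() keeps the first
    preposition encountered; this is an assumption of the sketch.
    """
    prep, _ = max(counts.items(), key=lambda item: item[1])
    return prep

# Counts for three nouns, following Table 5.2.
COUNTS = {
    "car":   {"at": 0, "in": 30, "on": 2},
    "house": {"at": 8, "in": 52, "on": 2},
    "table": {"at": 22, "in": 0, "on": 28},
}

for noun, counts in COUNTS.items():
    print(noun, locative_type(counts))
```

With these counts `car' and `house' receive locative type `in' and `table' receives `on'. A noun whose counts are close, such as `bus', would need additional treatment of the kind discussed below.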
The locative type of a noun effectively encodes statistical information. This raises the
question of why not use a purely statistical approach to the translation of PPs. Part of the
answer to this question was already given in Section 1.4.11 in the context of statistical MT.
To the arguments presented there may be added the following. In the simplest approach
to statistical MT, probabilities are computed for a SL word by initially assuming that
it can translate into any of the TL words on an aligned sentence. Based on this initial
guess, probabilities are refined through an iterative process. When the translation and
language models are more sophisticated, the possible translations in the original guess
decrease because of the addition of more structure or better heuristics. It would seem that
in the limit, this addition of structure leads to representations and classifications similar
to those developed in many knowledge based approaches. The response to the original
question, then, is that this knowledge is already available in the form of a classification of
spatial relations and that its use can greatly improve the efficiency of a statistical model.
Put differently, it is advantageous to restrict the use of the statistical machinery to those
phenomena where KB approaches seem inadequate. It is the contention in this thesis
that for translating spatial prepositions the relevant phenomenon for which statistical
techniques are most appropriate is the translation of lexicalized relations.
Returning now to locative types, the case of `coach' and `bus' is interesting as they
seem to be ambivalent between type `in' and type `on'. The following examples are taken
from the LOB corpus:
"Can't go lighting bonfires on this bus!" ..
.. but I went on the bus about six o'clock.
".. It [the printed page].. can speak .. in the railway carriage or in the bus .."
He spent the hour-long journey in the bus trying over a dozen different speeches ..
The uncertainty over the preferred preposition for these nouns may be explained partly by
the observation made in Procter (1978) under the usage notes for `train' where "one travels
in (or, esp. AmE, on a train, bus...)". That is, there seems to be a dialectal difference
regarding the locative type of these nouns. This gives further evidence for the usage based
selection of these prepositions, and hence for marking each noun with its locative type.
The fact that `in' and `on' are related will form part of the treatment for this type of
ambivalence.
The TFS for the preposition `in' is given in Figure 5.7. The subcat list of the preposition
states that its NP complement must be of type np-in; that is, it must be a noun
phrase with its feature locative-type set to the relation `in'. The way locative type is
encoded in the TFS of a noun will be explained in Section 6.3.3.

    [p-in
     orth  = orth
     syn   = [syn
              loc = [head   = [major = p]
                     subcat = [cons
                               car = np-in
                               cdr = (end)]]]
     trans = [trans
              il-list = (null-lex)
              dist    = [ind-lex
                         lex  = (in 1)
                         ind1 = [in
                                 index-id   = (string)
                                 transitive = (+)
                                 scalar     = (-)]
                         ind2 = entity
                         ind3 = entity]]]

Figure 5.7: TFS for the preposition `in'.
Non-lexicalized
Prepositions such as `inside' and `on top of', although satisfying the criteria associated
with internal relations, are not as strongly associated with their complement nouns. For
this reason they are classified as non-lexicalized prepositions. Consider `inside' first: the
main difference in syntactic behaviour between `inside' and its synonym `in' is that `inside'
may be modified by measure phrases, whereas `in' may not:
1) The village was two miles inside the border.
2) * The village was two miles in the border.
The TFS for `inside' will thus be marked (+) for both the transitivity and scalar
features.
Regarding `on top of' and `on', they differ in that the latter is less specific about the
particular spatial situation described:
1) The mug on the book is on the table.
2) The mug on top of the book is on top of the table.
Here, 2) is slightly anomalous as it requires that the mug be simultaneously on the top
surface of the book and on the top surface of the table. By contrast, 1) allows the mug
to be on the top surface of the book but not on the top surface of the table and still be
appropriate.
External, Intrinsic and Extrinsic Relations
External
There are many prepositions for which the properties associated with internal relations
do not hold and therefore they have been classified as external relations. Examples in this
group include `behind', `below', `under', `to the left of', `near', and their antonyms and
synonyms. They are called external because they usually take the complement NP as a
point of reference for the location they represent rather than indicating a location which
is a functional part of their complement.
Intrinsic
Intrinsic relations can impose an orientation on an object which then serves as reference
for determining the object or action located by the spatial expression. An example should
make this clearer. In Figure 5.8, different points of view give rise to different descriptions
of the same scene.
[Diagram: the scene with two marked viewpoints, Point A and Point B]
Figure 5.8: Different points of view for the same scene.
In this diagram the following expressions are appropriate descriptions depending on
whether a human-like orientation on the house is imposed or not: the car is to the right of
the house (orientation is imposed on the house); the car is to the left of the house (from
the reader's point of view, i.e. no orientation imposed). In the same way the following are
possible: the house is behind the car (from A with no orientation imposed); the house is
in front of the car (from B, and also with respect to the orientation of the car).
One effect of this multiplicity of descriptions is that an isolated expression can be
ambiguous depending on the interpretation intended. For example, if somebody said that
a tree was to the right of the house it would not be possible to determine whether it was on
the side of point A or on the side of point B. It is this sort of ambiguity that characterizes
intrinsic relations: they always have at least two interpretations.
Testing for transitivity shows that interpretations in which a human-like orientation is
imposed on an object do not allow transitive inferences. The following invalid inference is
based on Figure 5.8 with orientations imposed on the car and on the house.
The dog is to the right of the car & The car is to the right of the house ⇏ The dog is
to the right of the house.
Here, the dog may actually be in front of the house (i.e. on the side of the house with the
door and windows) and still count as being to the right of the car (i.e. on the side of the
car where the driver sits in Britain).
On the other hand, if intrinsic expressions are interpreted without an imposed orientation, transitivity is possible. For instance, taking A as the point of view in Figure 5.8,
the following holds:
The car is in front of the house & The house is in front of B ⇒ The car is in front of B.
It is very difficult to give formal definitions of intrinsic relations without being circular
in some way, and therefore I will not do so here. Instead I will assume that notions such as
`behindness' and the like are found in many languages, and furthermore, that there is no
need to distinguish formally between a particular intrinsic relation and its opposite. That
is, I contend that any ambiguity that arises in the translation of a spatial relation will not
involve choosing between `in front of' and `behind' nor between `in front of' and `to the left
of' (at least for English-Spanish translation; this may not hold for certain languages such
as those mentioned in Bowerman (forthcoming) and those studied by Levinson (1991));
rather, disambiguation in such cases will require selecting between synonyms like `in front
of' and `at the front of'. I will therefore describe the type of difference found between
pairs of synonymous prepositions.
Consider the differences between `in front of' and `at the front of'. Firstly, their
behaviour with respect to modication by measure phrases is as follows:
They broke down 10 metres in front of/*at the front of the petrol station.
This means that scalar should have the value (-) for `at the front of', while `in front of'
should be marked (+). This raises the issue of compositionality, since I have marked `in'
as scalar = (-). To avoid any potential clash it is necessary to treat `in front of' as an
idiom whose semantics is not compositional and which is entered as a single unit in the
lexicon.
Secondly, both phrases allow transitive and non-transitive interpretations. Hence the
main distinction between `in front of' and `at the front of' is measure modication. Figure
5.9 gives the entry for `in front of'. Note that in this TFS transitive has value boolean
to encode the ambiguity just described.

    [p-in front of
     orth  = orth
     syn   = syn
     trans = [trans
              il-list = (null-lex)
              dist    = [ind-lex
                         lex  = (in front of 1)
                         ind1 = [intrinsic
                                 index-id   = (string)
                                 transitive = boolean
                                 scalar     = (+)]
                         ind2 = entity
                         ind3 = entity]]]

Figure 5.9: TFS for `in front of'.

Extrinsic
As mentioned earlier, the main difference between intrinsic and extrinsic relations is
that the latter are not systematically ambiguous in the same way as the former:
Room T46 is above/next to John's office.
In these sentences there is only one interpretation regarding the relative position of the
rooms. That is, T46 is above (next to) John's office whether an orientation is imposed on
the latter or whether an external point of view is taken.
Vertical, Proximity, Vague and Graded Relations
Vertical
The first subgroup of extrinsic relations includes four main prepositions: `above', `over',
`below' and `under'. The reason for calling them vertical should be clear from their meanings. The first thing to note is that vertical relations are transitive:
The bird is over/above the chimney & The chimney is over/above my room ⇒ The bird
is over/above my room.
This is one of the features that distinguishes vertical from proximity relations.
Modification by measure phrases, however, divides these prepositions into two groups:
`above' and `below' are in one, and `over' and `under' are in the other:
The apple was 2 feet above/*over me.
The diver was 20 feet below/*under us.
Again, I will not try to characterize relations with opposite meanings but rather present
the distinguishing features between synonyms.
Measure modification is just one difference between `above' and `over' (and `below'
and `under'). In Bennett (1975:57) other distinctions are presented. One is that `above'
sounds strange when used to denote a path.
1) The bird flew over/?above the hill to its nest.
Another difference that Bennett observes is that unlike `above', `over' can have a deictic
meaning involving the end of a path over an object:
2) The helicopter is over/*above the hill.
If this sentence means that the helicopter is at the end of a path going over the hill
from an unspecied point, then use of `above' is not possible. That is, `above' can only be
interpreted as vertically superior, whereas `over' additionally has a path-end interpretation
(see Section 5.1.4).
Encoding of measure modification is done through the scalar feature using (+) for
`above/below' and (-) for `over/under'. However, I have not explained how a static preposition is associated with behaviour characterized as typical of dynamic prepositions as in
example 1), or what the relation in 2) is. Both of these issues will be resolved in Section 5.3.
Proximity
As its name implies this type of relation is used to express proximity relationships
between objects. The prepositions in this group include `near', `close to', `by' and `next
to'. One property of these relations is that they are not transitive:
Fido is near the tree & The tree is near the house ⇏ Fido is near the house.
However, proximity relations of this type are commutative in the following sense:
Fido is near the tree ⇔ The tree is near Fido.
Vague
Proximity prepositions can be divided into two classes depending on whether they may
be modified by the words `somewhere/anywhere' or not. The first group is called vague
relations and includes `near', `close to' and `by':
She lives somewhere near/close to/by the Cambridge Drama Centre.
Outside this type is the preposition `next to', which is classed as a non-vague relation
for which `somewhere/anywhere' modication is not common.
Graded
A further distinction may be made within the vague relations depending on whether
they can be modified by the adverb `very':
She reads very near/close to/*by the radio.
The first group is called graded relations and it is not to be confused with the measure
modification prepositions for which any measure modifier, but not `very', is applicable (e.g.
`two inches above the table', `? very above the table'). The second group, containing `by',
constitutes the non-graded relations. Any other distinction within the graded relations
is difficult to establish. Perhaps the only observation in this respect is that `close to'
indicates a smaller distance than `near':
1) The bike is near the bench but not close to it.
2) ? The bike is close to the bench but not near it.
In 1) it is possible to negate that the bike is close to the bench without making the sentence
abnormal. In 2) on the other hand, if the bike is not near the bench it may not be close
to it either. This ordering of meaning is useful for establishing the correct translation of
these prepositions but it will not be encoded in their respective TFSs.
This concludes the survey of the basic spatial relations.
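
The classification of static relations surveyed in this section can be summarized as a small type hierarchy. The Python sketch below is one reading of the preceding subsections, not a reproduction of Figure 5.2: the names PARENT, PREPOSITION_TYPE and subsumes are hypothetical, and the labels non-vague and non-graded are taken from the running text rather than from the figure.

```python
# Each type points to its immediate supertype; `static' is the root here.
PARENT = {
    "internal": "static", "external": "static", "path-end": "static",
    "lexicalized": "internal", "non-lexicalized": "internal",
    "intrinsic": "external", "extrinsic": "external",
    "vertical": "extrinsic", "proximity": "extrinsic",
    "vague": "proximity", "non-vague": "proximity",
    "graded": "vague", "non-graded": "vague",
}

# Example prepositions mentioned in the text, with their most specific type.
PREPOSITION_TYPE = {
    "in": "lexicalized",
    "inside": "non-lexicalized",
    "in front of": "intrinsic",
    "above": "vertical",
    "next to": "non-vague",
    "near": "graded",
    "by": "non-graded",
}

def subsumes(general, specific):
    """True if `general' is `specific' itself or one of its supertypes."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = PARENT.get(current)  # None once the root is passed
    return False

print(subsumes("external", PREPOSITION_TYPE["near"]))   # True
print(subsumes("internal", PREPOSITION_TYPE["above"]))  # False
```

A check of this kind is what allows a statement about external relations to apply to `near' without being restated for each subtype.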
5.3 Ambiguity and Other Relations
In the preceding sections I have tried to describe those spatial relations which cannot be
further decomposed into more basic ones. However, a number of spatial meanings cannot
be captured by these relations alone and therefore mechanisms have to be introduced to
express them.
5.3.1 Path and Goal Alternations
First, I will consider some typical alternations in prepositional meaning as described by
Bennett (1975:50-83). It is argued there that `over', `under', `in front of', `behind', `inside'
and `outside' all have senses in which the spatial relation expressed is either a static one,
a path or a goal. I will exemplify each sense with `under' although alternative examples
can be constructed with the other prepositions. The static sense of `under' is exemplified
by the sentence:
The dog is under the table.
Its path sense can be induced by modication with a `to' PP, as in:
The dog ran under the table to the other side of the kitchen.
This sentence means that part of the path followed by the dog was under the table (this
part may be only a proper subset of the total path described by the running); in addition,
the position of the dog at the end of the running is not under the table but at the other
side of the kitchen. By contrast, in its goal sense, `under' does identify the final position
of the moving object but not its path:
The dog ran under the table in five seconds.
In this sentence the final position of the dog is under the table and nothing is said about
the path it takes to get there.

    [p-simp
     trans:dist = [ind-lex
                   lex  = [1] (string)
                   ind1 = [3] [intrinsic
                               index-id   = (string)
                               transitive = boolean
                               scalar     = (+)]
                   ind2 = entity
                   ind3 = entity]]
        |->
    [p-simp
     trans:dist = [ind-lex
                   lex  = [1]
                   ind1 = [path
                           index-id = (string)
                           path-loc = [3]
                           parallel = boolean]
                   ind2 = entity
                   ind3 = entity]]

Figure 5.10: Lexical rule for intrinsic to path alternation.
It is clear that there is some relationship between the static, path and goal senses in
such expressions since in one way or another they all involve the notion of being under.
They differ in that in the static sense, the complete action takes place under the object of
the preposition, while in the path sense only part of the path followed by the movement is
under the object, and in the case of a goal sense only the destination is under the object.
It would be unenlightening and inefficient to have unrelated, separate entries for each
of these senses because they would not capture the regularities in the meaning alternations
of the six prepositions in question. Instead I will relate the various alternations by way of
lexical rules (see Copestake et al. (1993) and Section 1.4.10). The lexical rule for mapping
intrinsic relations (e.g. `in front of', `behind') to their corresponding path relations is
shown in Figure 5.10 (only relevant paths shown).
The important value to note is re-entrant structure 3 which, being an intrinsic type,
becomes the value of the feature path-loc. That is, the path described by the output
preposition is defined in terms of the value of the spatial relation from which it derives.
For example, the spatial relation on the output TFS after applying the rule to the lexical
entry of `in front of' is shown by the feature ind1 below:

    [ind-lex
     lex  = (in front of 1)
     ind1 = [path
             index-id = (string)
             path-loc = [intrinsic
                         index-id   = (string)
                         transitive = boolean
                         scalar     = (+)]
             parallel = boolean]
     ind2 = entity
     ind3 = entity]

In a similar fashion, `behind' would result in a TFS such as this but with the type (behind 1) as the value of lex. Thus, a single rule derives path senses for all intrinsic
prepositions. Two more rules would be necessary to capture the path senses of `over' and
`under' and of `inside' and `outside'. To derive the goal senses of these prepositions, three
additional rules are needed: one for the intrinsic `in front of' and `behind', one for vertical
`over' and `under' and one for non-lexicalized `inside' and `outside'. The reason three rules
are needed for both path and goal mappings is that, as presented, each rule only applies
to one pair of prepositions and to no others. Since the input side of a rule can only contain the specification of one type of relation, different rules with different relation types
in the input are necessary. This proliferation of rules could be easily overcome by using
disjunctive feature structures as the value of the input spatial relation or by introducing
meta-lexical rules which constructed goal deriving rules from path deriving ones.
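
The effect of the intrinsic to path rule can be approximated with plain Python dictionaries standing in for TFSs. This is a rough sketch only: it does not use the LKB, the function name intrinsic_to_path and the "type" key are hypothetical conventions, and re-entrancy is modelled simply by reusing the same Python object.

```python
import copy

def intrinsic_to_path(ind_lex):
    """Derive the path sense of an ind-lex whose ind1 is an intrinsic relation.

    Returns None when the rule does not apply (ind1 is not intrinsic).
    """
    relation = ind_lex["ind1"]
    if relation.get("type") != "intrinsic":
        return None
    derived = copy.copy(ind_lex)  # shallow copy: lex, ind2 and ind3 carry over
    derived["ind1"] = {
        "type": "path",
        "index-id": "string",
        "path-loc": relation,     # shared with the input, like the tagged value
        "parallel": "boolean",
    }
    return derived

# Indices of `in front of', following its lexical entry.
in_front_of = {
    "type": "ind-lex",
    "lex": "in front of 1",
    "ind1": {"type": "intrinsic", "index-id": "string",
             "transitive": "boolean", "scalar": "+"},
    "ind2": "entity",
    "ind3": "entity",
}

derived = intrinsic_to_path(in_front_of)
print(derived["ind1"]["type"])                              # path
print(derived["ind1"]["path-loc"] is in_front_of["ind1"])   # True
```

Because the function inspects only the type of ind1, the same code would apply unchanged to an entry for `behind', which mirrors the observation that a single rule covers all intrinsic prepositions.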
Instead of elaborating on the goal deriving rules just mentioned, I will describe the
derivation of goal senses for the lexicalized relations `at', `in' and `on'. These rules need
additional commentary because, unlike the corresponding rules for `over' in which no
morphological changes take place, the goal sense of a lexicalized relation is orthographically
irregular (e.g. `in' results in `into', `at' results in `to'). That this relationship does in fact
exist, however, was discussed in Section 5.1.2.
The problem posed by these derivations arises in the values of orth and lex because
they are not identical in the input and output signs in the lexical rule. While one may
argue that `in' and `into' are closely related, the fact that the meaning I am trying to
capture is more like `to in' than `in to', and that there is no orthographic relation between
`at' and `to', makes an orthographically related analysis unappealing.
To encode the goal sense derivation for lexicalized prepositions, I will define a general
lexical rule which can be minimally modied to derive each of `to', `onto' and `into'. The
rule, which constructs goal prepositions out of lexicalized ones, is shown in Figure 5.11.
The important re-entrant structures to note are 0 and 3. The first one forces the same
type of noun phrase on both subcat values. For example, the fact that the lexical entry for
`on' requires NPs of locative type `on' is carried forward into the derived goal preposition
`onto' by virtue of this binding. Re-entrancy 3 preserves the properties of the input
preposition in terms of the final location of the moved object. That is, `onto something'
must encode the fact that after completion of movement, the moved object will be `on
something'. This is achieved by encoding the sort of limit implied by the relation in the
feature limit-internal.

    [p-simp
     syn:loc:subcat:car = [0] np-acc
     trans:dist = [ind-lex
                   lex  = (string)
                   ind1 = [3]
                   ind2 = [2]
                   ind3 = [4]]]
        |->
    [p-simp
     syn:loc:subcat:car = [0]
     trans:dist = [ind-lex
                   lex  = (string)
                   ind1 = [goal
                           index-id       = (string)
                           limit-internal = [3] [lexicalized
                                                 index-id   = (string)
                                                 transitive = boolean
                                                 scalar     = (-)]]
                   ind2 = [2] entity
                   ind3 = [4] entity]]

Figure 5.11: Lexical rule type p-lexicalized-2-goal.
Specializing this rule such that the relationship between, for example, `on' and `onto'
is made explicit in the system, is done by specifying the input and output orthographies
and predicates. In the LKB notation (see Section 1.4.10) this can be expressed very
economically as:
p-on-2-onto p-lexicalized-2-goal
<1> <= on_1 <>
<0:orth> = ``onto''
<0:trans:dist:lex> = ``onto_1''.
This says that the lexical rule p-on-2-onto is defined by restricting the input side of type
p-lexicalized-2-goal to be the lexical entry for `on' and the orth and lex features on
the output to have the string type (onto) and (onto 1) respectively. Application of this
rule to `on' results in the appropriate sign for `onto'. Similar rules can be defined which
derive static and source senses of `off' and `out of' from `on' and `in' (see Section 5.5.3).
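
The way a general rule is specialized by fixing its input entry and output strings can be mimicked in Python with a function that builds rules. The sketch is illustrative only and does not reflect how the LKB represents rules: make_goal_rule, the flattened key "subcat-car" and the value "np-on" (formed by analogy with np-in) are assumptions, and the check that the input relation is of a lexicalized type is omitted for brevity.

```python
def make_goal_rule(input_lex, output_orth, output_lex):
    """Specialize the general lexicalized-to-goal rule for one preposition."""
    def rule(sign):
        dist = sign["trans"]["dist"]
        if dist["lex"] != input_lex:
            return None  # the specialized rule only accepts its own input entry
        return {
            "orth": output_orth,
            "subcat-car": sign["subcat-car"],  # same NP requirement (tag 0)
            "trans": {"dist": {
                "lex": output_lex,
                "ind1": {
                    "type": "goal",
                    "index-id": "string",
                    "limit-internal": dist["ind1"],  # shared relation (tag 3)
                },
                "ind2": dist["ind2"],
                "ind3": dist["ind3"],
            }},
        }
    return rule

# Simplified sign for `on'; the feature values follow the discussion above.
on = {
    "orth": "on",
    "subcat-car": "np-on",
    "trans": {"dist": {
        "lex": "on_1",
        "ind1": {"type": "on", "index-id": "string",
                 "transitive": "-", "scalar": "-"},
        "ind2": "entity",
        "ind3": "entity",
    }},
}

p_on_2_onto = make_goal_rule("on_1", "onto", "onto_1")
onto = p_on_2_onto(on)
print(onto["orth"], onto["subcat-car"])  # onto np-on
```

Rules for `into' and `to' would be obtained in the same way by calling make_goal_rule with the corresponding input and output strings.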
5.3.2 Path End Static Relations
There is one more type of spatial relation to consider; it is the one found in the sense of
`across' discussed by Bennett (1975) and which I referred to as the path-end sense (see
Section 5.1.4). It is exemplied by the following sentence:
The shop is across the road.
To interpret this sentence one imagines a path going across the road, and at the end of
that path one finds the shop. According to Bennett (1975) the following prepositions also
allow path-end senses: `over', `under', `through', `along', `round', `past'.
The interpretation of the sense of `across' just introduced cannot be covered by any
of the relations discussed so far. Therefore it will be necessary to introduce a relation
called path-end which is used only on the output side of lexical rules. In other words this
relation does not occur in the lexicon but is instead derived from prepositions denoting
paths. While nothing crucial depends on this pattern of occurrence, the point is worth
making in order to justify the delayed discussion of this spatial relation.
Using the criteria for static relations presented in Section 5.2.3, it is possible to establish
that path-end relations are a type of static relation. That is, the following inference holds
when `across' is interpreted as indicating a position at the end of a path:
She walked across the river for 1 minute ⇒ She was across the river for 1 minute.
However, it is difficult to classify path-end relations under a more specific type in the
hierarchy. Thus, `across' as a path-end cannot be an internal relation because it does not
allow certain default inferences. Compare the following two sentences:
The river is a good place for swimming.
The river is a good place for walking.
In the first sentence, one assumes that the location of the swimming is in the river, because
as an internal relation, `in' refers to locations functionally and conventionally associated
with its object. By contrast, the meaning of the second sentence does not coincide with
the meaning which would be predicted if `across' were an internal relation. Such a meaning
would have to be that the walking necessarily takes place on the other side of the river
from where the speaker is.
Similarly, `across' is not an external relation because it prefers distributive readings:
There is a battalion across all the bridges.
In this sentence the preferred reading seems to be one where there are several battalions,
one at the end of every bridge. For these reasons it has been necessary to classify
path-end as a sister type to internal and external relations, as shown in Figure 5.2. I will now
describe some other properties of path-end relations.
Path-end relations, because of their deictic character, may be modified by the phrase
`from here' (see Section 5.1.4):
The post office is across the road from here.
or by other PPs with `from'. An example with the path-end sense of `over' is:
The bird is over the fence from where the cat is.
* The bird is above the fence from where the cat is.
In addition, the following paraphrase preserves the basic meaning of the sentence:
The post office is across the road from here.
The post office is at the end of a journey across the road from here.
With these tests, the existence of a path-end sense may be established.
The construction of path-end from path prepositions is done through a lexical rule
whose output contains a relation of type path-end. The rule is given in Figure 5.12.
[Figure 5.12: feature-structure diagram not reproduced. The input is a p-simp sign whose
trans:dist value is an ind-lex with lex = 0 (string), ind1 = 1, ind2 = entity and
ind3 = entity. In the output p-simp sign, ind1 is a relation of type path-end whose
path-type value is 1, a path with the features index-id, path-loc and parallel.]
Figure 5.12: Path to path-end lexical rule.
This rule shows that, just as with paths and goals, the input relation is assigned to the
value of a feature in the output relation. This time, the value of path-type in the output
preposition is taken from the input preposition. That is, a preposition such as `across'
applied as input to this rule results in a sense of `across' in which the location described
by the preposition is at the end of a path which is across (as opposed to e.g. through) its
complement. In this way, distinctions such as:
Her car is through the tunnel.
Her car is across the meadow.
can be described.
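The effect of the rule can be sketched in Python as follows. This is only an illustration under simplifying assumptions: dictionaries replace typed feature structures, and the entries and their `index-id` values are hypothetical rather than taken from the lexicon.

```python
def path_end_rule(entry):
    """Derive a path-end sense from a path sense (simplified sketch)."""
    relation = entry["ind1"]
    if relation["type"] != "path":
        return None  # only prepositions denoting paths feed the rule
    return {
        "lex": entry["lex"],
        # The whole input relation becomes the value of path-type.
        "ind1": {"type": "path-end", "path-type": relation},
        "ind2": entry["ind2"],
        "ind3": entry["ind3"],
    }

# Hypothetical path senses of `across' and `through'.
ACROSS_PATH = {"lex": "across_1", "ind1": {"type": "path", "index-id": "across"},
               "ind2": "entity", "ind3": "entity"}
THROUGH_PATH = {"lex": "through_1", "ind1": {"type": "path", "index-id": "through"},
                "ind2": "entity", "ind3": "entity"}
IN_STATIC = {"lex": "in_1", "ind1": {"type": "lexicalized"},
             "ind2": "entity", "ind3": "entity"}

across_end = path_end_rule(ACROSS_PATH)
through_end = path_end_rule(THROUGH_PATH)
```

Because the input relation is kept inside the output, the two derived senses remain distinct, which is what allows `through the tunnel' and `across the meadow' to be told apart.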
Unclassified Prepositions
There are a number of spatial prepositions in English which do not fit well into the type
hierarchy just developed. `Outside', `against' and `opposite' are just three of them. The
meaning of `outside' seems to be related to the object it modifies. That is, its object
must be a volume or a type `in' noun. Therefore, it should be possible to analyse it as an
internal relation, but this would mean that `outside' could be inferred in certain contexts.
However, this is not possible since a sentence such as `the best place to hide the toy is the
cupboard' does not usually mean that the place referred to is outside the cupboard.
`Against' appears to indicate some sort of contact between surfaces, so perhaps one
would guess it was an internal relation. But again, an expression such as `there is a ladder
against all the walls' is decidedly odd, and an interpretation where there is a ladder placed
sideways against a series of walls is probably the preferred one. Thus `against' does not
seem to induce distributive interpretations in the same way that `in' or `on' do.
Finally, `opposite' is in some ways similar to `in front of' but it cannot qualify as an
intrinsic relation because it only appears to have oriented meanings. For example, the
vehicle in Figure 5.8 cannot be said to be opposite the house, whereas it can be said to
be in front of the house if one takes point of view B. A similar problem is posed by the
Spanish for `opposite', namely frente a.
5.4 Description of Hungarian
In this section I apply the type hierarchy to Hungarian, a language which is quite unrelated
to English and Spanish. The main reason for doing this is that, since Hungarian is a
Finno-Ugric language and therefore not Indo-European, the cross-linguistic validity of the
hierarchy above would be supported if spatial relations in this language could be classified
into the same types.
Before justifying the classification, I will give a very brief sketch of the Hungarian
language in order to place the system of postpositions into context. Most of the information
here has been gathered from informal descriptions of the language such as those of Arthur
and Ginever (1909), Payne (1987) and Erdős et al. (1990).
Morphologically the most important feature of Hungarian is that it has vowel harmony.
Very roughly this means that, in general, words only contain vowels from one of two
groups: front and back. In addition, any suffix attached to the word must conform with
this harmony; that is, a word consisting of front vowels will require a suffix with front
vowels.
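As a rough illustration of this description, the following Python sketch chooses between the back and front variants of a suffix. It is a deliberate oversimplification and not part of the system described in this thesis: the vowel groups, the treatment of i and í as neutral, the last-vowel heuristic and the function names are all assumptions, and irregular words are not handled.

```python
BACK_VOWELS = set("aáoóuú")
FRONT_VOWELS = set("eéöőüű")
# i and í are treated as neutral in this sketch and are skipped.

def harmonic_suffix(word, back_form, front_form):
    """Pick the suffix variant that matches the last non-neutral vowel."""
    for ch in reversed(word.lower()):
        if ch in BACK_VOWELS:
            return back_form
        if ch in FRONT_VOWELS:
            return front_form
    return front_form  # fallback for words with neutral vowels only

def inessive(word):
    """Attach the `in' case suffix in its back (-ban) or front (-ben) form."""
    return word + "-" + harmonic_suffix(word, "ban", "ben")

examples = [inessive(w) for w in ("ház", "kert", "kocsi", "park", "falu")]
```

With these inputs the sketch yields ház-ban and kert-ben, and back-vowel forms for kocsi, park and falu, in line with forms such as kocsi-ban and park-ban used later in this section.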
Turning now to morpho-syntax, and to the noun phrase in particular, Hungarian has
no gender distinctions for nouns, adjectives or articles. It has no number agreement in the
noun phrase between the adjective and the noun, nor between the article and the noun.
However, in sentences in which adjectives act as predicates (for example `the car is red')
the adjective agrees with the subject in number. There are definite and indefinite articles,
the latter being somewhat similar to the English `one'; when an indeterminate meaning
is expressed, it is common to leave the noun phrase determinerless. As in English, the
adjective precedes the noun. Any predicative sentence which in English would be formed
with `to be' has no copula in Hungarian. There are postpositions instead of prepositions to
indicate the relation of an NP to a sentence. Most grammatical relations of NPs, however,
are expressed by cases rather than postpositions or word order. Postpositional phrases
may not modify nouns directly but rather they must be placed in a relative clause to the
left of the noun. Possession may be marked both on the possessor or on the possessed
noun.
Regarding the verb, Hungarian is a `pro-drop' language meaning that sentences need
not express their subject explicitly. Verbs conjugate for person, number, and, in the case
of transitive verbs, for definiteness of the direct object. Preverbs may be prefixed to verbs
to alter the verb's meaning and/or aspect (cf. English particles). Most verbs inflect for the
following tenses and moods: present indicative, past indicative, imperative, subjunctive
and present conditional. There is also an infinitive form for verbs. Other tenses and
moods are expressed by combinations of these conjugations and auxiliary verbs. Modality
and causation are expressed by suffixes added to the verb. There is no equivalent for the
possessive sense of English `have'; instead van (to be) is used, with the possessor in the
dative case. Finally, there is no passive voice in Hungarian.
Word order in Hungarian is looser than in English, depending mostly on informational
structure. A phrase which is immediately before the verb usually receives emphasis. Normally, Hungarian exhibits SOV and SVO word order, depending frequently on whether
the object has a determiner or not.
Here are some examples to illustrate the above description:
Hun: Tanulok. (Pro-drop)
Eng: Work-I = I work.
Hun: Péter-nek a kert-je szép. (No copula in predicates)
Eng: Peter-DAT the garden-his beautiful = Peter's garden is beautiful.
Hun: Péter-nek van háza. (No direct equivalent for possessive `have')
Eng: Peter-DAT is house = Peter has a house.
Hun: Pénz-t ad Éva-nak. (SOV order with determinerless objects)
Eng: Money-ACC gives Eva-DAT = s/he gives Eva some money.
Hun: A tanító a fiú-val ír-at-ja a leveleket. (Suffixes express causation; definite conjugation
of verbs)
Eng: The teacher the boy-with write-makes-DET the letters = the teacher makes the boy
write the letters.
Hun: Ír-hat-ok. (Modality is expressed with suffixes)
Eng: Write-may-I = I may write.
Hun: Az egér fut a macska elől. (Certain postpositions have nominative NPs)
Eng: The mouse runs the cat from = the mouse runs from the cat.
Hun: A kocsi-ban ülő fiú sír. (PostP modifying nouns go in a relative clause)
Eng: The car-in sits boy cries = the boy (sitting) in the car cries.
Table 5.3 shows all the place and direction cases and postpositions, classified according
to whether they answer questions with `where?', `to where?' or `from where?'. Some
postpositions govern cases other than the nominative.
5.4.1 Classification of Postpositions
Using the spatial relations hierarchy it is possible to order a large proportion of the above
set of cases and postpositions, as shown in Figure 5.13. If the predictions associated with
each type in the hierarchy are borne out, then support for the multilingual validity of the
classification scheme I have proposed will be strengthened. Each of the following examples
brings out a prediction from each type of relation. They have all been checked for
grammaticality with a native speaker.

                   Where?          To where?    From where?
at                 -nál            -hoz         -tól
in                 -ban            -ba          -ból
on                 -n              -ra          -ról
under              alatt           alá          alól
above              fölött          fölé         fölől
between/among      között          közé         közül
in front of        előtt           elé          elől
behind             mögött          mögé         mögül
beside             mellett         mellé        mellől
near               közel -hoz      ...          közel-ről -hoz
below              -n alul         ...          -n alul-ról
inside             -n belül        ...          -n belül-ről
above              -n fölül        ...          -n fölül-ről
outside            -n kívül        ...          -n kívül-ről
beyond             -n túl          ...          -n túl-ról
by (proximity)     körül           ...          ...
next to            közvetlenül     ...          ...
against            ellen           ...          ...
as far as          -ig             ...          ...
along              mentén          ...          ...
across             -n át           ...          ...
over               -n által        ...          ...
through            -n keresztül    ...          ...
this side of       -n innen        ...          ...
opposite to        -val szemben    ...          ...
facing             -val szembe     ...          ...
towards            ...             felé         ...
away from          ...             ...          felől
Table 5.3: Hungarian locative cases and postpositions.

[Figure 5.13: tree diagram not reproduced. It shows the type hierarchy rooted in relation
(static, dynamic, limit, bound, unbound, goal, source, path-like, path, direction, internal,
external, lexicalized, non-lexicalized, intrinsic, extrinsic, vertical, proximity, vague,
non-vague, graded, non-graded), with the Hungarian cases and postpositions of Table 5.3
attached to its types and arrows marking lexical rules.]
Figure 5.13: Hungarian spatial relations hierarchy.
Goal
Bounded activity and telic inference are exemplified by the following sentences.
Hun: 5 óra alatt utazott Budapest-re.
Eng: 5 hours in travelled-3sg B.-onto = s/he travelled to B. in 5 hours.
Hun: Utazott Budapest-re ⇒ B.-en volt.
Eng: Travelled-3sg to B. ⇒ s/he was in B.
Source
One can see from Table 5.3 that source relations are the most common after static ones,
confirming the prediction that they allow spatial arguments more readily. Their unbound
character is brought out by `for 5 hours' modification:
Hun: 5 órá-t utazott Budapest-ről.
Eng: 5 hours-ACC travelled-3sg B.-from = s/he travelled from B. for 5 hours.
Path
Paths allow path and goal modification.
Hun: A folyó mentén sétált a ház-ba.
Eng: The river along walked-3sg the house-into = s/he walked along the river to the house.
Hun: Az alagút-on keresztül sétált a folyó mentén.
Eng: The tunnel-on through walked-3sg the river along = s/he walked along the river through
the tunnel.
Paths also allow inferences at any point during movement with a lexicalized relation:
Hun: Ment az alagút-on keresztül ⇒ az alagút-ban volt.
Eng: S/he went through the tunnel ⇒ s/he was in the tunnel.
Direction
Among other things, directions do not allow spatial complements, goal modification,
or `telic inference':
Hun: * Ment a ház-ba felé.
Eng: * Went-3sg the house-into towards = s/he went towards into the house.
Hun: * Ment a falu felé a ház-ba.
Eng: * Went-3sg the village towards the house-into = s/he went towards the village to the
house.
Hun: Ment a falu felé ⇏ a falu-ban volt.
Eng: S/he went towards the village ⇏ s/he was in the village.
Lexicalized
Certain spatial common nouns have strong preferences for lexicalized prepositions (see
Table 5.4).

Hun   elej-én            a vég-én     ebben a kérdés-ben   a közep-én
Eng   at the beginning   at the end   on this point        in the middle
Table 5.4: Hungarian common nouns and lexicalized prepositions.

In addition, lexicalized relations do not allow measure modification:
Hun: * A falu 20 mérföld-re van a ország-ban.
Eng: * The village 20 miles-onto is the country-in = the village is 20 miles in the country.
Furthermore, like all static relations, they imply that the complete duration of an
action takes place in the specied locations:
Hun: 5 perce-t sétáltam a park-ban ⇒ a park-ban voltam 5 perce-t.
Eng: I walked in the park for 5 minutes ⇒ I was in the park for 5 minutes.
Non-lexicalized
The equivalent of `inside' in Hungarian can be modified by measure phrases:
Hun: A játék 2 méter-re van a kör-ön belül.
Eng: The toy 2 metre-onto is the circle-on inside = the toy is 2 metres inside the circle.
Intrinsic
The defining feature of these relations was ambiguity which depended on whether an
orientation was imposed on the complement NP or not. Given Figure 5.8 the following
two sentences are valid descriptions of the position of the house:
Hun: A ház a kocsi előtt van. (Orientation imposed on the car)
Eng: The house is in front of the car.
Hun: A ház a kocsi mögött van. (No orientation imposed, from A)
Eng: The house is behind the car.
Vertical
In addition to transitivity, which also holds for these relations in Hungarian, measure
modification is a distinguishing factor between synonymous relations:
Hun: A hal/madár 20 méter-re van a csónak alatt/fölött.
Eng: The fish/bird 20 metre-onto is the boat below/above = The fish/bird is 20 metres
below/above the boat.
Hun: * A hal/madár 20 méter-re van a csónak-on alul/fölül.
Eng: * The fish/bird is 20 metres under/over the boat.
Graded, Non-graded and Non-vague
As in English, graded relations in Hungarian may be modified by an intensifier:
Hun: A könyv nagyon közel van az asztal-hoz.
Eng: The book very near is the table-to = the book is very near the table.
Similarly, there should be non-graded vague relations and indeed this is so. Below are
examples of relations which allow `somewhere/anywhere' modification.
Hun: A könyv valahol az asztal körül/mellett van.
Eng: The book somewhere the table by/beside is = the book is somewhere by the table.
Hun: * A könyv az asztal nagyon körül van.
Eng: * The book the table very by is = the book is very by the table.
Hun: * A könyv az asztal nagyon mellett van.
Eng: * The book the table very beside is = the book is very close to the table.
In addition, one would expect the existence of a non-vague proximity relation. There
is one such relation:
Hun: * A könyv valahol az asztal közvetlenül van.
Eng: * The book somewhere the table next-to is = the book is somewhere next to the table.
Lexical Rules and Ambiguities
I have indicated in Figure 5.13 the lexical rules that connect the various postpositions
and cases in Table 5.3, thus covering most of the postpositions I left out in the exposition
above. For instance, the static to source rule (indicated by the arrow leaving the static
node in Figure 5.13) is intended to cover all the expressions under the column labelled
From where? in Table 5.3.
Finally, Hungarian allows path alternations for some of its static postpositions:
Hun: A híd alatt sétált a park-ba.
Eng: S/he walked under the bridge to the park.
Hun: A templom mögött ment a játszótér-re.
Eng: S/he went behind the church to the playground.
But unlike English it does not have path-end relations in the sense that there is no single
lexical entry which conveys this relation:
Hun: Az utca másik oldalá-n lakik.
Eng: The street other side-on lives = s/he lives on the other side of the street.
5.5 Bilingual Correspondences
Having described the encoding of bilingual entries in the LKB and the spatial relations
hierarchy, it is now possible to show how cross-linguistic spatial knowledge is encoded.
Recall that in lexicalist MT the bilingual lexicon is the only source of contrastive information; hence, all cross-linguistic knowledge will reside there or in bilexical rules operating
over it. In this section I present the various correspondences between English and Spanish
using the spatial relations hierarchy as a tool for describing differences and similarities
between the two languages. The relations hierarchy for Spanish, shown in Figure 5.14,
will serve as a guide to the discussion that follows.
5.5.1 Simple Equivalence
To begin with, consider a fairly unambiguous spatial preposition, namely `inside', and
its Spanish translation dentro de. Figure 5.15 shows the bilexical entry that encodes the
equivalence between these two lexemes. It will be noticed that the values of ind1 in the
source and target TFSs are identical, reflecting the synonymity of these two prepositions.
In such cases there is no translation mismatch in the sense of Kameyama et al. (1991)
between the two languages; that is, transfer does not result in a loss of information in
either direction.
[Figure 5.14: tree diagram not reproduced. It shows the type hierarchy rooted in relation,
with the subtypes dynamic (limit [limit-path, limit-internal]) and static [transitive, scalar],
and the Spanish prepositions a, hasta, de, desde, por, hacia, en, dentro de, fuera de,
delante de, debajo de, junto a, cerca de and al otro lado de attached to its types;
al otro lado de appears under path-end.]
Figure 5.14: Spanish spatial relations hierarchy.
[Figure 5.15: feature-structure diagram not reproduced. The entry is of type 1l-1l-3i-3i-t.
Its source side (sfs:1) is the sign p-inside and its target side (tfs:1) is the sign
s-p-dentro2de. Each has a trans:dist value of type ind-lex, with lex = (inside 1) and
lex = (dentro de 1) respectively, ind1 = non-lexicalized with index-id = (string),
transitive = (+) and scalar = (+), ind2 = entity and ind3 = obj.]
Figure 5.15: Bilingual lexical entry for `inside - dentro de'.
5.5.2 Translation of Regular Alternations
In Trujillo (1992) it was observed that certain spatial prepositions in English have
predictable translations in Spanish. This predictability arises from the different
senses of the same preposition, as identified by Bennett (1975). The following example
illustrates this regularity.
Eng: She ran under_static the bridge (in circles).
Spa: Corrió debajo del puente (en círculos).
Eng: She ran under_path the bridge (to the other side).
Spa: Corrió por debajo del puente (hasta el otro lado).
Eng: She ran under_goal the bridge (and stopped there).
Spa: Corrió hasta debajo del puente (y allí se detuvo).
Depending on the sense of `under', its equivalent in Spanish varies, but this variation is
regular. The regularity can be more clearly appreciated if other prepositions are compared,
as shown in Table 5.5. The pattern that emerges is that the translation of the path sense
requires the introduction of the preposition `por' while that of the goal sense requires
`hasta'.

P             static        path `along P'    goal `to P'
behind        detrás de     por detrás de     hasta detrás de
in front of   delante de    por delante de    hasta delante de
inside        dentro de     por dentro de     hasta dentro de
under         debajo de     por debajo de     hasta debajo de
over          encima de     por encima de     hasta encima de
near          cerca de                        hasta cerca de
Table 5.5: Regular translations for path and goal alternations.
Translational patterns of this sort can be captured using the bilexical rules described
in Section 4.1.3 which create new bilexical entries from existing ones and which allow
lexical introduction into the output. Figure 5.16 shows the general structure of one of the
bilexical rules needed to encode the regularities in question. Application of this rule to
the bilexical entry `under - debajo de' effects the mapping:
under          ↔   debajo de
  ⇓ path-rule        ⇓
under          ↔   por debajo de
This bilexical rule states that an application of the English lexical rule path-rule to an
English preposition requires the addition of the preposition por to its Spanish translation
to obtain an expression of equivalent meaning. Such a mapping reflects the lack of path
lexicalization (Talmy 1985) for many Spanish prepositions.
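A minimal Python sketch of this mapping is given below. It is an illustration only, not the bilexical rule formalism of Section 4.1.3: the `BilexicalEntry` tuple, the sense labels and the function name are assumptions, and the Spanish side is modelled as a plain tuple standing in for a bag of lexemes.

```python
from typing import NamedTuple, Optional, Tuple

class BilexicalEntry(NamedTuple):
    english: str               # English preposition
    sense: str                 # sense of the English preposition
    spanish: Tuple[str, ...]   # bag of Spanish lexemes; order is left to generation

def path_bilexical_rule(entry: BilexicalEntry) -> Optional[BilexicalEntry]:
    """Apply path-rule on the English side and introduce `por' on the Spanish side."""
    if entry.sense != "static":
        return None  # the rule only takes static senses as input
    return BilexicalEntry(entry.english, "path", ("por",) + entry.spanish)

under_static = BilexicalEntry("under", "static", ("debajo de",))
under_path = path_bilexical_rule(under_static)
```

The sketch does not model any further restrictions on the input, such as the absence of a path translation for `near' in Table 5.5.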
[Figure 5.16: feature-structure diagram not reproduced. The rule is a tlink-rule with
tlink-rule-id = (e-s-vert-path). Its input t0 is a bilexical entry of type 1l-1l-3i-3i-t
relating an English p-simp to a Spanish s-p-simp; its output t1 is of type 1l-2l-3i-3i-t,
whose Spanish side adds s-p-por through a 2l-rule. The srule feature is path-rule and the
trule feature is 0-top-rule.]
Figure 5.16: Bilexical rule for regular translation mapping.
5.5.3 Translation of Irregular Alternations
Although many relations are covered by a few rules like the one in Figure 5.16, there are
other relations in English which are not easily captured in a regular way. Of particular
interest are path to path-end alternations. For example, consider the two senses of `across'
and their translations:
She walked across the meadow.
Caminó a través de la pradera.
The shop is across the street.
La tienda está al otro lado de la calle.
The Spanish prepositional expressions in each translation are very different, thus seeming
idiosyncratic. However, considering the translation of other path-end relations, one
immediately notices that Spanish has just one way of naturally expressing such notions:
Her office is through the corridor.
Su despacho está al otro lado del pasillo.
The lake is over the hill.
El lago está al otro lado de la colina.
That is, fine meaning distinctions allowed by the English prepositions are not captured
easily in Spanish. One explanation for this would be that, because Spanish does not
distinguish between three different locative types, as English does, there is a lack of
discrimination in other spatial relations too, including path-end relations. Given this observation,
it is possible to write one bilexical rule to derive the necessary English-Spanish bilexical
entries. The general structure of such a bilexical rule is shown below:
p-simp_path        ↔   s-p-simp_path
  ⇓ path-end-rule        ⇓
p-simp_path-end    ↔   a el otro lado de
This says that a path-end reading derived from a path preposition in English translates
as the bag al otro lado de in Spanish. Unfortunately, this bilexical rule is not very general
because it assumes that the path preposition in Spanish consists of one lexeme, which is
not usually the case. For example, the translation of `over' in its path sense is the
two-lexeme expression por encima de, whose bilexical entry `over - por encima de' does not
match the input side of the rule. There are only two single lexeme path prepositions in
Spanish, por and a través de, and therefore the rule as it stands only applies to these two.
Hence, an additional rule such as:
p-simp_path        ↔   por s-p-simp
  ⇓ path-end-rule        ⇓
p-simp_path-end    ↔   a el otro lado de
has to be specied. This rule says that any bilingual entry relating path prepositions in
which the Spanish side has por as one of its lexemes, gives rise to a new bilingual entry
relating the path-end sense of the preposition in English with the bag a el otro lado de in
Spanish. For instance:
over_path        ↔   por encima de
  ⇓ path-end-rule      ⇓
over_path-end    ↔   a el otro lado de
The rule can be made to operate over the output of other bilexical rules such as those of
Section 5.5.2 to construct path-end bilexical entries. The two-step mapping would be:
over_vertical    ↔   encima de
  ⇓ path-rule          ⇓
over_path        ↔   por encima de
  ⇓ path-end-rule      ⇓
over_path-end    ↔   a el otro lado de
This mechanism would establish a large number of translation relations with minimal
encoding.
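The way a few rules generate many correspondences can be sketched in Python by applying the rules repeatedly until no new entry appears. This is an illustrative sketch, not the system's rule interpreter: the `Entry` tuple, the sense labels, the three rule functions and the `closure` helper are all assumptions introduced here.

```python
from typing import Callable, List, NamedTuple, Tuple

class Entry(NamedTuple):
    english: str
    sense: str
    spanish: Tuple[str, ...]   # bag of Spanish lexemes

PATH_END_BAG = ("a", "el", "otro", "lado", "de")

def path_rule(e: Entry) -> List[Entry]:
    """Static or vertical sense to path sense: introduce `por' in Spanish."""
    if e.sense in ("static", "vertical"):
        return [Entry(e.english, "path", ("por",) + e.spanish)]
    return []

def path_end_single(e: Entry) -> List[Entry]:
    """Path-end rule for Spanish path prepositions consisting of one lexeme."""
    if e.sense == "path" and len(e.spanish) == 1:
        return [Entry(e.english, "path-end", PATH_END_BAG)]
    return []

def path_end_por(e: Entry) -> List[Entry]:
    """Path-end rule for Spanish sides that have `por' among their lexemes."""
    if e.sense == "path" and "por" in e.spanish:
        return [Entry(e.english, "path-end", PATH_END_BAG)]
    return []

def closure(seed: List[Entry], rules: List[Callable[[Entry], List[Entry]]]) -> List[Entry]:
    """Apply every rule to every entry until nothing new is produced."""
    found, agenda = list(seed), list(seed)
    while agenda:
        entry = agenda.pop()
        for rule in rules:
            for new in rule(entry):
                if new not in found:
                    found.append(new)
                    agenda.append(new)
    return found

seed = [Entry("over", "vertical", ("encima de",)),
        Entry("across", "path", ("a través de",))]
derived = closure(seed, [path_rule, path_end_single, path_end_por])
```

Starting from two hand-written entries, the closure also contains the path and path-end entries for `over' and the path-end entry for `across', mirroring the two-step mapping above.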
To summarize some of the preceding discussion one can say that path-end relations in
Spanish are expressed by al otro lado de regardless of the locative relationship the path
has with its complement, whereas in English path-end relations can express finer meaning
distinctions.
Another case that needs to be included in this section is that of the prepositions `on'
and `in' and their counterparts `off' and `out of' in their static senses. For example:
The boat is off the coast.
The dog is out of the house.
Since the relationship between `on/in' and `off/out of' is morphologically irregular, it
will be necessary to specify different bilexical rules for each output entry, just as if their
output had been encoded in the bilexicon directly. However, it is important to capture any
connections that exist within the bilexicon even if they do not reduce the total number of
information structures. The relevant bilexical rules are:
on        ↔   en
  ⇓ on-2-off    ⇓ en-2-fuera
off       ↔   fuera de

in        ↔   en
  ⇓ in-2-out    ⇓ en-2-fuera
out of    ↔   fuera de
For example, given the bilexical entry for `on - en', a new entry `off - fuera de' is
constructed. By expressing the bilingual entries for `off' and `out of' in this way, the fact that
the output English preposition requires its complement NP to be of a specific locative type
is made explicit. That is, `off' requires its complement to be of locative type `on', whereas
`out of' requires an `in' type noun.
The patterns expressed by these two rules are interesting in two respects. Firstly, they
show that the lack of an `on/in' distinction in Spanish is reflected in the translation of
`off' and `out of'. Secondly they make use of the type of rule defined in Section 5.3.1,
thus exploiting the relationship established there between prepositions which have no
orthographic relation.
To conclude the presentation of bilexical entries and rules required, I give summary
tables in Figure 5.17 of a number of translations of English and Spanish spatial
expressions. The tables show simple and multi-word equivalences, equivalences obtained through
morphologically regular bilexical rules, equivalences which require irregular morphological
modifications, and the relationship between the lexicalized prepositions in the two
languages. The latter are discussed in the next chapter in the context of their
disambiguation.

Single lexeme
above          encima de
above          sobre
across         a través de
against        contra
among          entre
behind         detrás de
below          debajo de
between        entre
from           de
in front of    delante de
in front of    frente a
inside         dentro de
near           cerca de
next to        junto a
on top of      sobre
over           encima de
over           sobre
through        por
to             a
towards        hacia
up to          hasta
under          debajo de
under          bajo

Multi-lexeme
along          a lo largo de
beyond         más allá de
beside         al lado de
by             al lado de
next to        al lado de

Regular
behind         por detrás de
in front of    por delante de
inside         por dentro de
over           por encima de
under          por debajo de
behind         hasta detrás de
inside         hasta dentro de
near           hasta cerca de
over           hasta encima de
under          hasta debajo de

Irregular
across         al otro lado de
through        al otro lado de
over           al otro lado de
across         hasta el otro lado de
through        hasta el otro lado de
over           hasta el otro lado de
from           desde
to             hasta
into           hasta
onto           hasta
off            fuera de
out of         fuera de

Lexicalized
at             a
at             en
in             a
in             en
on             a
on             en
Figure 5.17: Summary of translation equivalences.
5.6 Conclusion
The purpose of this chapter was to motivate a classification of spatial relations and to
use this classification as a framework in which to describe the translation correspondences
between English and Spanish prepositions. The classification took the form of a type
hierarchy of relations augmented with a number of features, where each type and feature
was associated with a number of defining properties.
Certain prepositions were analysed as ambiguous between senses involving different
types of relation. I captured this ambiguity by establishing a number of lexical rules
which mapped between senses; the way preposition disambiguation takes place will be
considered in detail in the next chapter.
To demonstrate the cross-linguistic validity of the spatial relations hierarchy, the spatial
system of Hungarian was classified using tests similar to those developed for English. This
led to a fruitful description of the locative system of this language and to the identification
of the lexical processes associated with its spatial postpositions.
A number of bilexical rules were defined which constructed new bilingual correspondences
from existing ones. I presented regular patterns relating the path and goal sense
of a number of English prepositions and showed how they could be used for relating them
to their Spanish counterparts. I also described two types of irregular alternation: one in
which Spanish only has one expression for translating different prepositions in English, as
with path-end relations (e.g. `across/through - al otro lado de'); the other in which a
regularity in the meanings expressed was not manifest in the orthography of either language
(e.g. `on - en' and `off - fuera de').
It emerged that English and Spanish differ in their sensitivity to the `in/on' distinction
and this is reflected in their prepositional systems. For example, only one preposition,
fuera de, expresses the meaning of both `out of' and `off'. Similarly the distinction
between `across' and `through' is lost in Spanish: the most adequate translation for both
prepositions in their path-end senses is al otro lado de. Furthermore, the lexicalization of
path and location exemplified by the path sense of `under', for example, was not possible
in Spanish, and instead the phrase por debajo de was necessary.
Chapter 6
Translation and Disambiguation of
Prepositions
This chapter brings together the ideas presented in Chapters 2 to 5 to show how the
notions of IL representation, bilexical entries and rules, lexicalist transfer and generation,
and the classification of spatial relations, are used to translate spatial prepositions.
Although the exposition is centred around English and Spanish translation, the approach
extends to other languages without posing great theoretical difficulties, especially given
the generality of the assumptions made about the transfer representation and the
cross-linguistic adequacy of the spatial hierarchy. I begin by working through an example of
PP translation in order to consolidate the mechanics of the system. This is followed by a
description of the general strategy for disambiguation and of the framework for
representing the knowledge necessary for translation. In the rest of the chapter I present different
types of ambiguity and the sources of information needed for resolving it.
6.1 Lexicalist Translation of Prepositions
I will illustrate the operation of the translation system by translating the sentence below,
indicating at various points where different mechanisms such as lexical rules or index
instantiation apply:
Mary waits inside the hotel.
For the sake of conciseness I will only present the relevant paths in TFSs. The first step
in translation is analysis, which applies the active chart parsing algorithm presented in
Section 3.1 to construct a parse tree from which the source IL list is extracted (Section 3.4).
During parsing, lexical rules are applied as if they were grammar rules with one daughter;
for this example, however, none of the lexical rules apply. The parse tree constructed for
this sentence is shown in Figure 6.1. Its IL list is shown in Figure 6.2. Note that all these
ILs may be considered part of a single TFS in which all reentrancies are preserved.
Transfer is based on the algorithm described in Section 4.1.1. Before transfer takes
place, all the bilexical rules are applied to bilexical entries containing any of `Mary',
`waits', `inside', `the' or `hotel'; for this simple example, the new bilexical entries thus
constructed are not used. I will only describe the unification of elements in the source IL
[S [NP [Nprn Mary]]
   [VP [VP [Vint waits]]
       [PP [Pnp inside]
           [NP [Det the]
               [N1 [N hotel]]]]]]
Figure 6.1: Parse tree for `Mary waits inside the hotel'.
< [n-prn
   trans:dist = [ind-lex
                 lex = (mary 1)
                 ind1 = [0] obj]],
  [v-int
   trans:dist = [ind-lex
                 lex = (waits 1)
                 ind1 = [1] non-movement
                 ind2 = [0]]],
  [p-np
   trans:dist = [ind-lex
                 lex = (inside 1)
                 ind1 = non-lexicalized
                 ind2 = [1]
                 ind3 = [2]]],
  [det
   trans:dist = [ind-lex
                 lex = (the 1)
                 ind1 = [2] obj]],
  [n-com
   trans:dist = [ind-lex
                 lex = (hotel 1)
                 ind1 = [2]]] >
Figure 6.2: IL list after analysis.
list with entries in the bilexicon. It will be sufficient to consider `inside'. First of all, the
algorithm implemented requires making an identical copy of the bilexical entry shown in
Figure 5.15, to give the TFS shown in Figure 6.3. Unification of the IL for `inside' (Figure
6.2) with the value of sfs:1 achieves two purposes. First, it binds the value of features
ind2 and ind3 in the copy of the bilexical entry, to those of `waits' and `hotel'. In other
words, it binds the indices in the bilexical entry with those in the input IL list. The other,
related objective is to bind, via the values of the feature index-id in the various indices,
the source ILs with the target side of the bilexical entry. These two types of binding
may be depicted by the following summary of the state of the system after one of these
unifications:
Source IL list: [ ..., waits_{e, ...}, inside_{r:Z, e:X, o:Y}, ..., hotel_o ]
Unified bilexical entry: { inside_{r:Z, e:X, o:Y} } ↔ { dentro de_{r':Z, e':X, o':Y} }
where e:X stands for index-id = X in index e. The notation is intended to show bindings
of English indices (e.g. e) as separate from those expressed in the bilexical entry (e.g. X
and Y) and in the Spanish lexeme (e.g. e'). Once such unifications are carried out for all
the source ILs, instantiation of indices takes place. For `inside' the result is:
Instantiated bilexical entry: { inside_{r:3, e:1, o:2} } ↔ { dentro de_{r':3, e':1, o':2} }
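The binding and instantiation steps just described can be illustrated with a minimal Python sketch. It is not the LKB unification algorithm: indices are modelled as shared Python objects rather than reentrant TFSs, and the names `Index`, `bind_entry` and `instantiate` are hypothetical. The integers it assigns are arbitrary; what matters is that shared indices receive the same integer on the source and target sides.

```python
from itertools import count

class Index:
    """Toy stand-in for a TFS index; sharing one object models a reentrancy."""
    def __init__(self, sort):
        self.sort = sort
        self.index_id = None  # unbound until instantiation

def bind_entry(source_il, entry_source_lex, entry_target_lex):
    """Bind a copy of a bilexical entry to a source IL.

    The target side reuses the Index objects of the source IL, which mimics
    the effect of unifying the IL with the source side of the entry.
    """
    lex, indices = source_il
    if lex != entry_source_lex:
        return None  # the entry does not apply to this IL
    return (entry_target_lex, indices)

def instantiate(il_list):
    """Give every distinct index a unique integer, in order of first use."""
    fresh = count()
    for _, indices in il_list:
        for idx in indices:
            if idx.index_id is None:
                idx.index_id = next(fresh)

# Indices for 'Mary waits inside the hotel' (sorts follow Figure 6.2).
m, e = Index("obj"), Index("non-movement")
r, o = Index("non-lexicalized"), Index("obj")
source_ils = [("mary", [m]), ("waits", [e, m]), ("inside", [r, e, o]),
              ("the", [o]), ("hotel", [o])]

target_il = bind_entry(source_ils[2], "inside", "dentro de")
instantiate(source_ils)
target_ids = [idx.index_id for idx in target_il[1]]
```

Because `target_il` shares its index objects with the source IL for `inside', instantiating the source list also fixes the identifiers seen by dentro de.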
[1l-1l-3i-3i-t
 sfs:1 = [p-np
          trans:dist = [ind-lex
                        lex = (inside 1)
                        ind1 = [non-lexicalized
                                index-id = [0] (string)
                                transitive = (+)
                                scalar = (+)]
                        ind2 = [entity
                                index-id = [1] (string)]
                        ind3 = [obj
                                index-id = [2] (string)]]]
 tfs:1 = [s-p-np
          trans:dist = [ind-lex
                        lex = (dentro de 1)
                        ind1 = [non-lexicalized
                                index-id = [0]
                                transitive = (+)
                                scalar = (+)]
                        ind2 = [entity
                                index-id = [1]]
                        ind3 = [obj
                                index-id = [2]]]]]
Figure 6.3: Copy of bilexical entry for `inside - dentro de'.
Construction of TL bags then proceeds by taking the union of the TL side of all the
bilexical entries used. For this example the TL bag is shown in Figure 6.4. Using the
generation algorithm discussed in Section 4.2 the following parse tree is constructed:
[S [NP [Nprn María]]
   [VP [VP [Vint espera]]
       [PP [Pnp dentro de]
           [NP [Det el]
               [N1 [N hotel]]]]]]
The fact that the values of index-id are instantiated to integers disallows the construction
of el hotel espera dentro de María (the hotel waits inside Mary). When several TL bags are
produced by transfer, only those which are licensed by the TL grammar will be allowed;
this achieves TL disambiguation through TL filtering as described in the next section.
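As a rough illustration of the two points above (the TL bag as a union of target sides, and instantiated indices ruling out unwanted orderings), consider the following Python sketch. It is a simplification under stated assumptions: ILs are reduced to (lexeme, index-id tuple) pairs using the integers of Figure 6.4, and `consistent` is a hypothetical check standing in for the constraints that the TL grammar enforces during generation.

```python
from collections import Counter

def build_tl_bag(target_sides):
    """Union (as a multiset) of the target sides of the bilexical entries used."""
    bag = Counter()
    for side in target_sides:
        bag.update(side)
    return bag

def consistent(subject, verb, prep, complement):
    """True if the index ids license 'subject verb prep complement'."""
    _, (verb_event, verb_subject) = verb
    _, (_, prep_event, prep_object) = prep
    return (subject[1] == (verb_subject,)
            and prep_event == verb_event
            and complement[1] == (prep_object,))

# (lexeme, index ids), following Figure 6.4.
maria = ("maría", (0,))
espera = ("espera", (1, 0))          # event 1, subject 0
dentro_de = ("dentro de", (3, 1, 2)) # relation 3, event 1, object 2
el = ("el", (2,))
hotel = ("hotel", (2,))

tl_bag = build_tl_bag([[maria], [espera], [dentro_de], [el], [hotel]])
good = consistent(maria, espera, dentro_de, hotel)  # María espera dentro de el hotel
bad = consistent(hotel, espera, dentro_de, maria)   # el hotel espera dentro de María
```

Only the first arrangement respects the instantiated identifiers, so the second is never generated.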
6.2 Disambiguation during Generation
Ambiguity is one of the most difficult problems in Natural Language Processing and MT.
Therefore, the solutions presented here can only go part of the way to solving it. In
the rest of this chapter I will only consider TL disambiguation, defined as the selection of
a restricted number of translations from a SL word. I have said a restricted number of
translations to allow for certain unavoidable ambiguities which arise when, for example,
{ [s-n-prn
   trans:dist = [ind-lex
                 lex = (maría 1)
                 ind1 = [s-human
                         index-id = [0] (0)]]],
  [s-v-int
   trans:dist = [ind-lex
                 lex = (espera 1)
                 ind1 = [non-movement
                         index-id = [1] (1)]
                 ind2 = [obj
                         index-id = [0]]]],
  [s-p-np
   trans:dist = [ind-lex
                 lex = (dentro de 1)
                 ind1 = [non-lexicalized
                         index-id = (3)
                         transitive = (+)
                         scalar = (+)]
                 ind2 = [non-movement
                         index-id = [1]]
                 ind3 = [obj
                         index-id = [2]]]],
  [s-det
   trans:dist = [ind-lex
                 lex = (el 1)
                 ind1 = [obj
                         index-id = [2]]]],
  [s-n-com
   trans:dist = [ind-lex
                 lex = (hotel 1)
                 ind1 = [obj
                         index-id = [2] (2)]]] }
Figure 6.4: Bag after transfer.
the SL does not make a distinction which is necessary in the TL. Thus, the Spanish
reloj can mean `watch' or `clock', but the necessary distinction is irrelevant in Spanish
and very difficult to retrieve from context. I will not consider this type of ambiguity;
instead I assume that some form of interaction with a user is necessary to resolve it, as
has been done in some versions of the BCI (Alshawi et al. 1992). I will also assume that
structural ambiguities such as those arising from PP attachment have been resolved during
analysis, and that lexemes other than prepositions are unambiguous. Therefore I will be
concentrating on the selection of one TL spatial preposition from a number of alternatives.
6.2.1 TL Filtering
In transfer-based MT, disambiguation can take place during analysis, during transfer, during generation, or in any combination thereof. In direct systems such as Systran,
and even in second generation systems such as Metal, the disambiguation of a SL word
was guided by the possible translations that the word had in the TL. As far as
modularity is concerned this is an undesirable strategy because it makes it difficult to add
new languages to the system. Disambiguation during transfer maintains the modularity
of the SL component but requires a large amount of effort which grows polynomially with
the number of languages in the system. Disambiguation during generation overcomes the
problems with these two options. Firstly, it preserves the independence of the monolingual
grammars since selection has to be made based on monolingual distinctions. Secondly, the
transfer module is simplified because it does not encode restrictions on individual bilingual
entries. This approach to disambiguation has been adopted in certain versions of the BCI,
where it is referred to as TL filtering (Alshawi et al. 1992); it is also used in Eurotra in the
context of preposition translation (Section 1.6.3) and for the translation of collocations
such as `be thirsty' and `be hungry' (Danlos and Samvelian 1992).
In TL filtering the transfer component is deliberately vague, allowing overgeneration.
It is up to the TL expert, namely the TL grammar, to reject some translations on the
grounds that they cannot lead to valid TL sentences. As a naive but illustrative example,
the transfer step would produce both el and la as possible translations of the English
determiner `the'. During generation, the Spanish TL grammar would select one of these
translations and reject the other depending on the gender of the noun associated with
the determiner. From this proposal it follows that TL filtering requires a large amount of
information in the monolingual components. However, in many cases the TL grammar will
independently need this information in order to avoid overgeneration. This information
would be compiled on the basis of monolingual criteria only, but would be applicable
to translation from any SL. In the present system, this information takes the form of
selectional restrictions involving verb and noun knowledge.
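The el/la example can be made concrete with a small Python sketch of TL filtering. The lexicon, the gender table and the function names below are toy assumptions introduced for illustration; gender agreement stands in for the much richer TL grammar.

```python
from itertools import product

# Toy bilingual and monolingual data, assumed for illustration only.
TRANSFER = {"the": ["el", "la"], "hotel": ["hotel"], "house": ["casa"]}
GENDER = {"el": "masc", "la": "fem", "hotel": "masc", "casa": "fem"}

def transfer(source_words):
    """Deliberately vague transfer: every combination of candidates."""
    return [list(c) for c in product(*(TRANSFER[w] for w in source_words))]

def licensed(det_noun):
    """Stand-in for the TL grammar: determiner and noun agree in gender."""
    det, noun = det_noun
    return GENDER[det] == GENDER[noun]

def translate(source_words):
    """Overgenerate in transfer, then filter with monolingual TL knowledge."""
    return [c for c in transfer(source_words) if licensed(c)]

candidates = transfer(["the", "hotel"])   # both determiners survive transfer
filtered = translate(["the", "hotel"])    # only the agreeing one survives
```

Note that `licensed` uses Spanish information only, so the same filter would apply unchanged to translation from any SL.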
Restrictions involving verb knowledge have used the classification described in Section
5.1.1; restrictions involving noun knowledge have used the theory of Qualia structure
described below. These restrictions may be seen as a development of selectional restrictions
as used widely in Computational Linguistics and NLP. Selectional restrictions may be traced
back to the work of Katz and Fodor (1963) but their use in NLP is normally associated with
the work of Wilks (1975). The approach taken here differs from many implementations of
selectional restrictions mainly in the degree of structure and motivation of the knowledge
representation involved.
6.3 Noun Knowledge
In this section I describe a theory of noun knowledge and extend it in order to incorporate
information about the locative type of a noun. The theory is the Qualia structure of
Pustejovsky (1991a) and has been developed independently of the purpose to which it will
be applied here. I have adopted this theory not only because it has applications other than
TL disambiguation, but also because of its well-defined structure and affinity with TFSs.
Indeed, Briscoe et al. (1990) and Copestake and Briscoe (forthcoming) use a computational
implementation of Qualia structures using FSs as part of a sentence analysis system.
Qualia structure is described by Pustejovsky (1991a) who presents it as part of a
framework for a theory of lexical semantics. One of the distinctive features of his proposal
is the view that much of the polysemy in natural language may be attributed to systematic
variations in the meanings of words, especially that of nouns and adjectives, and not to
different senses of a verb. For example, in the two sentences:
a) Mary enjoyed reading the book
b) Mary enjoyed the book
classic semantic theory would argue that there were two predicates, and therefore two
senses, associated with the verb `enjoy': one which takes an object (e.g. a book) and
another which takes an event as complement. According to Pustejovsky (1991a) this
solution is not satisfactory: the verb `enjoy' has one entry only in which its complement
is an event. He therefore proposes that in the case of b) grammaticality is explained by
associating with the noun `book' the notion of reading and then applying a type coercing
mechanism within the theory to allow the book to stand for the event of reading it, thus
complying with the subcategorization needs of `enjoy'. Before describing Qualia structure
in more detail I will give an overview of Pustejovsky's proposal.
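The coercion idea can be sketched in Python as follows. This is only an illustration of the principle, not Pustejovsky's formal mechanism: the classes `Noun` and `Event`, the field `associated_event` and the function names are hypothetical, and the event associated with `book' is simply stipulated.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Noun:
    lemma: str
    associated_event: Optional[str] = None  # e.g. 'read' for 'book' (assumed)

@dataclass(frozen=True)
class Event:
    predicate: str
    theme: Noun

def coerce_to_event(complement: Union[Noun, Event]) -> Event:
    """Return the complement as an event, coercing a noun where possible."""
    if isinstance(complement, Event):
        return complement
    if complement.associated_event is None:
        raise TypeError(f"'{complement.lemma}' cannot be coerced to an event")
    return Event(complement.associated_event, complement)

def enjoy(experiencer: str, complement: Union[Noun, Event]):
    """Single entry for 'enjoy': its complement is always an event."""
    return ("enjoy", experiencer, coerce_to_event(complement))

book = Noun("book", associated_event="read")
sentence_a = enjoy("Mary", Event("read", book))  # Mary enjoyed reading the book
sentence_b = enjoy("Mary", book)                 # Mary enjoyed the book
```

Both sentences receive the same analysis from one verb entry; a noun with no associated event would be rejected rather than given a second sense of the verb.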
6.3.1 Pustejovsky's Levels of Representation
Qualia structure is one of four types of information associated with a word. The four types
are:
1. Argument Structure: The predicate-argument structure of a word specifies the
number, organization, syntactic realization and type of the arguments of a predicate.
2. Event Structure: Relating mainly to verbs, the event structure specifies the temporal and aspectual behaviour of words.
3. Qualia Structure: Nouns and adjectives (and verbs) possess attributes which allow
them to be interpreted and used appropriately during semantic interpretation.
4. Inheritance Structure: A word in a language is related to other words in that language through a variety of relations. Such relations are encoded through inheritance
relations, thus structuring lexical knowledge.
Since the main concern here is knowledge associated with nouns, I will only describe
Qualia structure in detail.
6.3.2 Qualia Structure
The Qualia structure of a noun determines the properties which allow it to be used appropriately during semantic interpretation. Qualia structure consists of four roles:
1. Constitutive Role: Relation between the object and its constituents or proper parts,
representing in some sense the object's internal relations; it roughly answers the question:
what does X have?
Material. Weight. Parts and component elements.
2. Formal Role: Distinguishes the object within a larger domain by considering how it
relates to other objects; it roughly answers the question: how is X?
Orientation. Magnitude. Shape. Dimensionality. Color. Position.
3. Telic Role: Determines the purpose and function of the object: it is the object's likely
contexts forward in time from the time of its creation, or the object's abstract effects; it
roughly answers the question: what does X do?
Purpose that an agent has in performing an act. Built-in function or aim that specifies certain activities.
4. Agentive Role: Contains the factors involved in the origin or coming into being of the
object: it specifies the likely contexts or situations backward in time from the time of the
object's creation, or the abstract causes of the object; it roughly answers the question:
what made X?
Creator. Artifact. Natural Kind. Causal Chain.
Consider the following example. The Qualia structure of the word `novel' is the following
(Pustejovsky 1991a:427):
novel(x1)
Constitutive: narrative(x1)
Formal: book(x1), disk(x1)
Telic: read(T,y,x1)
Agentive: artifact(x1), write(T,z,x1)
This says that a novel x1 consists of a narrative; it is normally found in the form of a book;
it is typically used for reading (a transition event T going from book x1 not having been
read by agent y to having been read by that same agent), and it is an artifact created
by an event of writing (a transition event going from book x1 not having been written to
having been written by agent z). Thus, whenever the noun `novel' is used in a sentence, it
will be possible to retrieve notions associated with its meaning; this allows various types of
interpretation such as those using logical metonymy (roughly the selection of a predicate
given one of its arguments) and other forms of lexical polysemy.
In the representation I will adopt, the above structure would be expressed as the
following TFS:
[qualia
 constitutive = `narrative'
 formal = `book, disk'
 telic = `read'
 agentive = `artifact, write']
where the shrunk types are abbreviations for structures containing a value similar to that
of dist. For instance, `narrative' is:
[ind-lex
 lex = (narrative 1)
 ind1 = obj]
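For readers more familiar with programming notation than with TFSs, the four roles can be pictured as a record with four fields. The Python sketch below is an assumption-laden simplification: role values are plain strings rather than typed structures, and `Qualia` and `roles_of` are hypothetical names.

```python
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class Qualia:
    """Four-role Qualia structure with role values abbreviated to strings."""
    constitutive: List[str] = field(default_factory=list)
    formal: List[str] = field(default_factory=list)
    telic: List[str] = field(default_factory=list)
    agentive: List[str] = field(default_factory=list)

# The entry for 'novel', after Pustejovsky (1991a:427).
NOVEL = Qualia(constitutive=["narrative"],
               formal=["book", "disk"],
               telic=["read"],
               agentive=["artifact", "write"])

def roles_of(qualia, value):
    """Names of the roles in which a given value appears."""
    return [role for role, values in asdict(qualia).items() if value in values]
```

A lookup such as `roles_of(NOVEL, "read")` retrieves the role under which a notion is stored, which is the kind of access that interpretation by logical metonymy relies on.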
6.3.3 Locative Type
One item of information not present in the Qualia structure and which is relevant for preposition translation is the idealization or conceptualization of an object. Herskovits (1986:67)
argues on monolingual grounds for geometric functions which map onto the idealization of
an object (e.g. to a point, line, etc.); she also argues that this idealization should be part
of the object's knowledge (see Section 1.6.5). From a bilingual stance, Grimaud (1988)
suggests that cultural, historical and psychological considerations lead to a particular
preposition being associated with a noun. Thus, Grimaud notes that while English and
French differ in the prepositions they use for streets, the use of the preposition will be
consistent with the conceptual structure of the situation described. For example, consider
the following PPs:
Eng: on the street.
Fre: dans la rue. (in the street)
Grimaud (1988) proposes that while the expressions differ, both are compatible with a
view of streets either as roadways or ``as a kind of U-shaped container that includes the
buildings on either side'' (1988:56). Therefore, Grimaud continues, one can understand
the difference in prepositional usage as a difference in the conceptualization of an object
in different language communities. From the perspective of language learning, the findings
of Bowerman (forthcoming) were already described in Section 5.2.3. They indicated that
children learn early on different spatial categorizations which are language specific.
I agree with the intuition of the views above, but to apply it to MT it is necessary to
formulate a coherent lexical structure in which to represent it. Qualia structure, with its
emphasis on the interaction of a noun with its context, seems an adequate location for
encoding idealizations or learnt classications. That is, by analogy with Qualia structure,
idealizations would determine the behaviour of a noun with respect to spatial prepositions,
particularly those of the lexicalized type. In the context of language learning, one can
model the language specific spatial features of nouns through locative types in Qualia
structures; for example, native speakers of English learn to associate `on' with tables and
`in' with cupboards. Thus, just as speakers of languages such as Spanish have to learn the
gender of nouns, for spatial expressions speakers of a language need to learn the particular
lexicalized relation associated with the noun.
Furthermore, as Herskovits (1986) suggests, idealizations should be part of the knowledge of a noun and as such they should be included in the same structure that encodes
other types of object knowledge. However, because the word idealization has geometric
connotations which I do not wish to imply, and also because Grimaud's highlighted aspects of a noun are not well circumscribed, I will use the term `locative type' to indicate
the preposition used by a noun when determining that noun's conventionalized use as a
location.
If locative types are included in Qualia structures, then they must be part of some role.
It appears that the Formal role is the appropriate place for two reasons. Firstly, locative
types distinguish an object within the domain of prepositional usage, as in:
She sat on the sofa.
She sat in the armchair.
Secondly, in some sense the locative type can form part of an answer to the question `how
is X?' introduced in Section 6.3.2. For instance:
How is a sofa (viewed)? Like (as) a long seat (e.g. ``A comfortable seat ... wide enough for
usu[ally] 2 or 3 people'' Procter (1978))
In other words, the locative type `on' of `sofa' is a reflection of this view of the noun's
meaning.
Following Hawkins (1988) (see page 179), the locative type of a noun is determined by
taking the frequency of co-occurrence of a lexicalized preposition and its complement in
spatial PPs, rather than by assigning it a geometric idealization. The reason for adopting
this strategy rests on the assumption that semantically related words will co-occur more
frequently than unrelated ones. If one assumes the locative type analysis, then a noun will
be semantically related to the preposition in its locative type, and therefore co-occurrence
will be empirical evidence for a particular type. Clearly, a criterion like this has the drawback
that a particular corpus might spuriously associate a noun with an incorrect lexicalized
preposition. To detect and avoid such situations, one would need a more formal definition
of what constitutes a locative type. As a first step towards this goal, one may define
the interpretation of a locative type as a noun-specific function which maps the meaning
structure of a noun into a specific spatial relationship. For example, if the locative type
of `armchair' is `in', the spatial relationship of `in the armchair' is given by in(armchair).
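The frequency criterion described above can be sketched in a few lines of Python. The input format (preposition, head noun pairs already extracted from spatial PPs), the function name and the sample counts are all assumptions made for illustration; they are invented and are not the corpus figures of Table 5.2.

```python
from collections import Counter, defaultdict

LEXICALIZED = {"at", "in", "on"}

def locative_types(spatial_pps):
    """Map each noun to its most frequent lexicalized preposition.

    spatial_pps: iterable of (preposition, head_noun) pairs, assumed to have
    been extracted from spatial PPs beforehand.
    """
    counts = defaultdict(Counter)
    for prep, noun in spatial_pps:
        if prep in LEXICALIZED:          # ignore non-lexicalized prepositions
            counts[noun][prep] += 1
    return {noun: c.most_common(1)[0][0] for noun, c in counts.items()}

# Invented toy sample.
sample = [("in", "house"), ("in", "house"), ("at", "house"),
          ("at", "station"), ("on", "table"), ("under", "table")]
types = locative_types(sample)
```

As the text notes, a small or skewed sample could assign a spurious type, which is why the count is treated only as an initial indicator.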
This definition is vague to allow for different representations of nouns and spatial
relationships in different applications. For example, the representation of a noun in a
robotics application could be different to that used in the manipulation of objects on a
computer screen. To appreciate the import of locative types, imagine a robot to which the
command `put the brick in the armchair' has been issued. Clearly one does not wish the
brick to be placed inside the armchair; instead the brick should be put on its seating area.
If one interprets locative types as functions, then in(armchair) will return the appropriate
spatial relationship with respect to the armchair. Based on this relationship the robot
could execute the command appropriately.
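A minimal sketch of the functional view, assuming a toy representation in which a spatial relationship is just a named region of the object, might look as follows in Python. The region labels, the tables and the name `locative_reading` are hypothetical; the definition above deliberately leaves the actual representation open.

```python
# Locative types as given in the text; the region labels are invented.
LOCATIVE_TYPE = {"armchair": "in", "sofa": "on"}
NOUN_SPECIFIC_REGION = {
    ("in", "armchair"): "seating area",
    ("on", "sofa"): "seating area",
}

def locative_reading(prep, noun):
    """Noun-specific interpretation of a lexicalized preposition.

    Defined only when the preposition is the noun's locative type; returns
    None otherwise.
    """
    if LOCATIVE_TYPE.get(noun) != prep:
        return None
    return NOUN_SPECIFIC_REGION[(prep, noun)]

# 'put the brick in the armchair': the robot targets the seating area.
target_region = locative_reading("in", "armchair")
```

In a robotics setting the returned value would be a geometric region rather than a label, but the lookup would be keyed in the same way.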
The locative type of a noun restricts the locative type functions which can apply to
that noun. For example, since `armchair' has type `in', the function in will be applicable,
but the function on will not. This raises the issue of interpreting expressions such as:
The plank is on the armchair.
in which the locative type of the noun does not coincide with the preposition used. I
will argue that in expressions such as this, there is a context independent meaning of `on'
which is at play, roughly synonymous with `on top of'. This sense is present in what may
be termed the literal interpretation of a lexicalized preposition:
The fly on the box (i.e. the fly on top of the box)
The bird on the car (i.e. the bird on top of the car)
Thus, the lexicalized prepositions have two mechanisms for interpretation: one is through
a function which is dependent on the semantics of the noun it dominates, the other is
independent of context and is the same for all objects. For example, the phrase:
There is a coin in the armchair.
can mean that the coin is on top of the seating area of the armchair (the locative type
reading) or inside the structure of the armchair (the literal reading). This type of ambiguity
is not easily available with
There is a coin in the chair.
which prefers a literal reading, or else signals inappropriate use of `in'. There are parallels
between this dual character of lexicalized prepositions and the notions of Ideal Meaning
and Use Type formulated by Herskovits (see Section 1.6.5); for example, the literal interpretation of a preposition is analogous to its Ideal Meaning, while its locative type
interpretation could be compared to its Use Types, since Use Types ``correspond to various classes of uses distinguished by different conventions'' (Herskovits 1986:3). However,
Use Types differ from locative type interpretations in that the former manifest ``the ideal
meaning [of the preposition] in some way'' (Herskovits 1986:18), while the latter do not
assume such a manifestation.
To summarize, the locative type of a noun specifies the preposition which gives rise to
a noun dependent interpretation of a spatial PP. The initial indicator of a noun's locative
type is the most frequently co-occurring lexicalized preposition in spatial contexts. In the
case of general nouns such as `side' and `middle', this is sufficient for determining their
locative type; for more specific nouns such as `armchair' and `bus', the locative type is the
lexicalized preposition which gives rise to a different interpretation to that predicted by
the context independent meaning of the preposition; when noun specic interpretations
are not induced by a PP (e.g. `in the car', `on the chair') the locative type of the noun
defaults to the most frequent lexicalized preposition co-occurring with the noun in spatial
contexts.
Locative types are encoded as a feature in the Formal role of the noun's Qualia structure
with a lexicalized relation as its value. The Qualia structure for `chair' illustrates this.
[qualia
 constitutive = constitutive
 formal = [formal
           ...
           locative-type = [on
                            index-id = (string)
                            transitive = (-)
                            scalar = (-)]
           ...]
 telic = telic
 agentive = agentive]
A number of grammar rules are simplified if the Qualia structure is a feature in the value
of loc. For instance, the main features in the lexeme `chair' are:
[n-com
 orth = (chair)
 syn = [syn
        loc = [major
               head = n
               qualia = qualia
               subcat = list]]
 trans = trans]
This arrangement permits the Qualia structure to be available at NP nodes through sharing
of loc values between mother and head daughters in grammar rules. This concludes the
introduction to the noun knowledge representation formalism adopted.
6.4 Target Language Disambiguation
It is now possible to describe the different strategies and sources of information for disambiguating expressions containing spatial prepositions. Different types of ambiguity are
defined based on the spatial relations hierarchy and different sentence constituents. The
sources of information needed for disambiguation are the NP complement in the PP, the
preposition itself, the phrase modified by the PP and the entity located by the PP.
6.4.1 Translation of Lexicalized Relations
Translating the prepositions `at', `in' and `on' into and from Spanish a and en is an important problem in preposition translation because of the seemingly unpredictable character
of their usage. Justification for grouping these prepositions as a separate, language dependent category was already given under the lexicalized relations described on page 179.
The mechanism for translating these prepositions is as follows. During analysis, lexicalized prepositions are converted into generic prepositional ILs. Thus, English `at', `in'
and `on' are all replaced by `e-prep-lexicalized' (cf. Section 1.6.3), while Spanish en and
a are replaced by `s-prep-lexicalized'. They are equated in the bilexicon for use during
transfer as:
{ e-prep-lexicalized_{x,y,z} } ↔ { s-prep-lexicalized_{x,y,z} }
The actual representation of this entry in the LKB is:
e-prep-lexicalized_1 / s-prep-lexicalized :
1l-1l-3i-3i-t.
Here the type 1l-1l-3i-3i-t contains the appropriate index bindings for this bilexical entry. After transfer and bag generation, the generic preposition is replaced with the TL
lexicalized preposition appropriate for the complement noun.
An example will show how this mechanism operates. To translate the phrase
Tomé el bus en la estación.
into English, the following SL bag is constructed:
SL bag: { tomé_{2,1,3} el_3 bus_3 s-prep-lexicalized_{4,2,5} la_5 estación_5 }
(I am ignoring the literal reading bag for this sentence; see the next section). Using the
bilexical entry for `s-prep-lexicalized', together with the translation of the other lexemes
in the SL sentence, the following TL bag is constructed:
TL bag: { I_1 caught_{2,1,3} the_3 bus_3 e-prep-lexicalized_{4,2,5} the_5 station_5 }
Generation from this bag results in the sequence
I caught the bus e-prep-lexicalized the station.
As Table 5.2 shows, the locative type of `station' is `at'. Thus, prior to output, the generic
lexicalized preposition is replaced by the locative type of its complement, to give:
I caught the bus at the station.
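The two ends of this mechanism, generalization during analysis and substitution prior to output, can be sketched in Python as below. This is a deliberately naive illustration: sentences are flat token lists, transfer and bag generation are skipped, the complement noun is assumed to be the first following token with a known locative type, and every function and table name is hypothetical.

```python
E_GENERIC = "e-prep-lexicalized"
S_GENERIC = "s-prep-lexicalized"

# Locative types mentioned in the text for these two nouns.
EN_LOCATIVE_TYPE = {"station": "at"}
ES_LOCATIVE_TYPE = {"entrada": "a"}

def generalize(tokens, lexicalized, generic):
    """Analysis: replace lexicalized prepositions by the generic IL."""
    return [generic if tok in lexicalized else tok for tok in tokens]

def substitute(tokens, generic, locative_type):
    """Output: replace the generic preposition by the locative type of the
    first following token that has one (assumed to be its complement noun)."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == generic:
            noun = next(t for t in tokens[i + 1:] if t in locative_type)
            tok = locative_type[noun]
        out.append(tok)
    return out

sl = generalize("tomé el bus en la estación".split(), {"a", "en"}, S_GENERIC)
generated = "I caught the bus e-prep-lexicalized the station".split()
english = " ".join(substitute(generated, E_GENERIC, EN_LOCATIVE_TYPE))
```

Because substitution consults only the TL noun, the choice between `at', `in' and `on' never has to be encoded in the bilexicon.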
A similar procedure would translate English into Spanish. For instance, the English
sentence below would give rise to the bag shown:
Eng: There is a guard at the entrance.
SL bag: { there_1 is_{2,1,3} a_3 guard_3 e-prep-lexicalized_{4,2,5} the_5 entrance_5 }
Transfer would result in the Spanish bag:
TL bag: { hay_{2,1,3} un_3 guardia_3 s-prep-lexicalized_{4,2,5} la_5 entrada_5 }
In the Spanish lexicon, entrada would have locative type `a', which would lead to the
appropriate lexicalized preposition being substituted:
Spa: Hay un guardia a la entrada.
The mechanism above can be used to resolve one aspect of the `car/bus' problem
described in Section 1.2.2, in which each of these two nouns required different prepositions
to indicate essentially equivalent meanings. The example given there is reproduced below.
Spa: Juan viaja en el bus.      Juan viaja en el coche.
Eng: John travels on the bus.   John travels in the car.
In translating from Spanish into English, the generic lexicalized preposition `e-prep-lexicalized' would form part of the English (TL) bag. After bag generation, the preposition
would be replaced either by `in' if its complement was `car', or by `on' if its complement
was `bus'.
6.4.2 Lexicalized Relations in Other Contexts
Considering now the translation of lexicalized relations which appear with nouns of a
different locative type, there are two cases to consider: those in which the meaning of the
PP is different from the locative type meaning, and those where the meaning is the same.
Meaning Changes
Take for example the case of `house' with locative type `in'. Table 5.2 shows that this
noun also occurs with `at' fairly frequently. A typical example taken from the LOB corpus
is:
Nancy was found yesterday at the house in Chambres-road...
Unlike the alternative version with `in', this sentence does not require that Nancy be inside
the house; she might have been found in the garden for example. Thus, as expected, there
is a difference in meaning between the locative type interpretation (i.e. `in the house') and
the literal interpretation (i.e. `at the house') of the PP.
Whenever a lexicalized preposition appears in a context other than with a noun of the
respective locative type, the preposition itself will appear in the bilexicon, together with
restrictions on its use. For the example above, the following equivalence holds:
at_1 / en_1 :
1l-1l-3i-3i-t
<sfs:0:syn:loc:subcat:car:syn:loc:qualia:formal:locative-type> = in.
This entry states that `at' can translate as en only when the complement in English is
of locative type `in' (e.g. house). The equivalence expresses the intuitive loss of information caused by the lack of an appropriate Spanish equivalent for `at'. Translation into a
language such as Hungarian with a similar tripartite lexicalized system to that of English
might not show this loss of information:
at_1 / -na1l_1 :
1l-1l-3i-3i-t.
Returning to the English-Spanish example, translation begins with the analysis of the
English sentence to give the IL list:
SL bag: { Nancy_1 was_{2,1,3} found_{3,6,1,4} yesterday_3 at_{4,3,5} the_5 house_5 ... }
Note that this time `at' is not represented as `e-prep-lexicalized' because `house' does not
have the necessary locative type. Transfer of this IL list results in the Spanish bag:
TL bag: { Nancy_1 encontraron_{3,6,1,4} a_1 ayer_3 en_{4,3,5} la_5 casa_5 ... }
from which the following sentence is generated:
Spa: Ayer encontraron a Nancy en la casa ...
Translation in the Spanish-English direction proceeds similarly. Thus, starting with
the sentence
Spa: Encontraron a Nancy en la casa.
analysis results in two SL bags, one for a locative type reading, and one for a literal
reading:
SL bag: { encontraron_{3,6,1,4} a_1 Nancy_1 { en_{4,3,5} | s-prep-lexicalized_{4,3,5} } la_5 casa_5 }
(for brevity, the two bags are collapsed into one bag, with their different prepositions
separated by |). Each of these SL bags will lead to a different English bag:
TL bag: { Nancy_1 was_{2,1,3} found_{3,6,1,4} { at_{4,3,5} | e-prep-lexicalized_{4,3,5} } the_5 house_5 }
Generation and preposition substitution leads to two English sentences:
Nancy was found at the house.
Nancy was found in the house.
Such TL ambiguity is unavoidable and even desirable. It is unavoidable because both
sentences are appropriate translations of the original Spanish sentence due to the underspecificity of the original Spanish preposition. It is desirable because it ensures the
completeness of the translation process and allows for independent mechanisms to be used
in selection. For instance, one might prefer a translation with a locative type reading (e.g.
`in the house'), or allow human intervention in the selection process.
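The way a restricted bilexical entry and the generic preposition jointly yield two English readings can be sketched as follows. The Python below is a simplification under stated assumptions: the path equation on the entry is reduced to a single required locative type, analysis is assumed always to offer the generic reading for Spanish a and en, and the names `RESTRICTED_ENTRIES` and `english_readings` are hypothetical.

```python
# English locative types used in the examples of this chapter.
EN_LOCATIVE_TYPE = {"house": "in", "station": "at"}
ES_LEXICALIZED = {"a", "en"}

# (English preposition, Spanish preposition, locative type required of the
# English complement), mirroring the at_1 / en_1 entry above.
RESTRICTED_ENTRIES = [("at", "en", "in")]

def english_readings(spanish_prep, english_noun):
    """All (reading kind, English preposition) pairs licensed for a PP."""
    noun_type = EN_LOCATIVE_TYPE[english_noun]
    readings = []
    for e_prep, s_prep, required in RESTRICTED_ENTRIES:
        if s_prep == spanish_prep and noun_type == required:
            readings.append(("literal", e_prep))
    if spanish_prep in ES_LEXICALIZED:
        # Generic preposition, later replaced by the noun's locative type.
        readings.append(("locative type", noun_type))
    return readings

en_la_casa = english_readings("en", "house")
en_la_estacion = english_readings("en", "station")
```

For en la casa both readings survive, giving `at the house' and `in the house'; for a noun of type `at' the restriction on the entry fails and only the locative type reading remains.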
Consider another example:
Eng: The toy is on the box.
Spa: El juguete está encima de la caja.
where `on' translates as the non-lexicalized preposition encima de `on top of' (note that
sobre could also be used). This situation arises because `on the box' cannot receive a
locative type interpretation since `box' has locative type `in'. Thus a literal interpretation is induced.
The appropriate bilexical entry to establish this translation is:
on_1 / encima_de_1 :
1l-1l-3i-3i-t
<sfs:0:syn:loc:subcat:car:syn:loc:qualia:formal:locative-type> = in.
which states that when `on' appears with an NP object of type `in', its translation is
encima de. Translation of
The toy is in the grass.
proceeds similarly; since `grass' has locative type `on', a literal interpretation of the expression is obtained. The bilexical entry
in_1 / dentro_de_1 :
1l-1l-3i-3i-t
<sfs:0:syn:loc:subcat:car:syn:loc:qualia:formal:locative-type> = on.
which specifies that `in' translates as dentro de (inside) when its complement is of type
`on', allows the appropriate Spanish translation
El juguete está dentro del pasto.
to be generated.
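As a rough illustration of how such restricted bilexical entries behave, the following Python sketch models an entry as a source preposition, a target preposition and an optional constraint on the locative type of the complement. The class `BilexicalEntry`, the function `literal_translations` and the toy lexicon are assumptions made for this example; the actual entries are stated in LKB notation and applied by unification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BilexicalEntry:
    source: str
    target: str
    # Required locative type of the complement noun; None means unrestricted.
    complement_locative_type: Optional[str] = None

# The two entries discussed in the text.
ENTRIES = [
    BilexicalEntry("on", "encima de", "in"),
    BilexicalEntry("in", "dentro de", "on"),
]

# Toy lexicon (assumed): locative types of the complement nouns.
LOCATIVE_TYPE = {"box": "in", "grass": "on"}

def literal_translations(prep, noun):
    """Return the targets of all entries whose restriction the noun satisfies."""
    loc = LOCATIVE_TYPE[noun]
    return [e.target for e in ENTRIES
            if e.source == prep and e.complement_locative_type in (None, loc)]
```

With these toy data, `literal_translations("on", "box")` yields only encima de and `literal_translations("in", "grass")` yields only dentro de; `on the grass' matches neither entry because it receives a locative type reading instead.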
Solution of the second aspect of the `car/bus' problem follows from the above distinction. Translation of `on the bus' as both encima del bus (on top of the bus) and en el
bus (inside the bus) is achieved through the literal and locative type reading respectively
of `on the bus'. The ambiguity of `on the bus' introduced by the two readings may only
be resolved by using contextual, domain or other types of knowledge. Again, one could
decide to select a locative type reading over a literal reading since the former involves a
stronger semantic relationship; this heuristic could be overridden by other factors in order
to select the literal reading.
To finish this section, consider the following sentence:
The fly is on the car.
Since the locative type of `car' is `in', this expression should take the literal interpretation
`the fly is on top of the car'; in fact, its paraphrase could be `the fly is on the side of the
car'. It seems that the located object has induced a different interpretation. One way
of approaching this problem might be to introduce a feature into the Formal role of the
Qualia structure called located-type. This feature would be analogous to the locative
type, but would define, for every noun, whether a non-literal interpretation was necessary
when the noun was modified by a particular lexicalized preposition. For example, the noun
`fly' would have `on' as its located type such that whenever `fly' was modified by a spatial
PP headed by `on', an interpretation paraphrasable as `on the side of' would be obtained.
Other nouns with this particular located type would be `poster', `sticker', `picture', etc.
For transfer a translation pattern would be defined (see Sections 2.3.10 and 7.5) which
identified the occurrence of a noun modified by its located type:
{ N[located-type = on]_x  on_{y,x,z} } <=> { N'[located-type = en]_x  en_{y,x,z} }
With this pattern, the appropriate preposition would be chosen to give the correct translation:
Spa: La mosca está en el coche.
Note, however, that the translation in which the fly is on top of the car can still be
obtained. Disambiguation in this case may be motivated by the presence of a located type
reading, which would override the locative type reading.
The purpose of the previous paragraph is to outline a strategy for translating expressions such as `the fly on the car'. However, further work is necessary to determine when
exactly a meaning difference is due to a locative or a located type, and whether located
type interpretations do in fact override locative ones.
Meaning Invariant
As an example in which different lexicalized prepositions negligibly affect the meaning of
an expression consider:
The car is on the main road.
Inspection of the LOB corpus reveals that `road' has locative type `on', but that it also
occurs with `in' fairly frequently, as in:
The car stayed in the road.
In both of these sentences the corresponding Spanish preposition is en:
Spa: El coche está en la carretera principal.
Spa: El coche permaneció en la carretera.
The problem in this case is that while the locative type of `road' is `on', the expression
`in the road' does not seem to have a literal reading, preferring instead an interpretation
similar to that of `on the road'. In other words, it seems that `road' has two locative types
`in' and `on'.
One can view this situation as a case of vagueness with respect to the locative type of
`road'. Following Copestake and Briscoe (forthcoming), I will argue that the locative type
of `road' needs to be specialized to either `in' or `on' by the preposition that governs it.
The locative type of `road' is therefore assigned the value in-on:
[ n-com
  syn:loc:qualia:formal:locative-type = in-on ]
That `road' is vague with respect to its locative type may be ascertained by using the
co-predication test suggested by Pustejovsky (forthcoming):
The car is on and will stay in the road.
Compare this with `house' which does not exhibit such vagueness:
? The bicycle is at and will stay in the house.
In this case, co-predication is disallowed because `house' has a non-vague locative type,
namely `in'.
Translation from English into Spanish proceeds as before: the preposition is represented as a generic lexicalized preposition which is translated into its Spanish counterpart;
after bag generation, the appropriate Spanish preposition is substituted, depending on the
locative type of the complement noun.
While the co-predication test brings out the vagueness of the locative type for `road',
I must note that no discernible meaning change appears to occur, unlike the cases of
vagueness that Copestake and Briscoe discuss. Also, as in the case of `at/in' above,
ambiguity will be present during Spanish-English translation because both prepositions
will be translations of Spanish en. Given the character of this type of ambiguity, the
best way to resolve it would be by selecting the most frequent preposition for the dialect
or register at hand (see Copestake and Briscoe (forthcoming) for further discussion on
statistics in lexical semantics).
Consider now the third aspect of the `car/bus' problem, which is that `in the bus' is
synonymous with `on the bus' even though `bus' has locative type `on'. This is a case of
vagueness similar to that of `road'; that is, `bus' is underspecific as to whether its type
is `in' or `on'. Consequently, contrary to what had been suggested in Section 6.4.1, the
locative type of `bus' is in-on in order to account for this vagueness. The entry for `car',
however, will have locative type in and therefore parsing and generation will construct
adequate translations in both directions. The selection of `in' or `on' as the preposition for
`bus' is left as a dialectal distinction (e.g. British English will select `in') to be resolved
within a particular implementation.
To summarize, the three aspects of the `bus/car' problem are solved as follows:
1. The locative type of `bus' and `car' allows the appropriate preposition to be associated with each noun: `on the bus' but `in the car'.
2. Two interpretations of `on the bus', one literal and another based on the locative
type of `bus', account for the two different interpretations of `on the bus': `on top of
the bus' and `inside the bus'.
3. Vagueness, encoded as the type in-on, in the locative type of `bus' gives `in the bus'
a synonymous interpretation to that of `on the bus'.
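The role of the vague type in-on can be pictured with a small Python sketch of the type hierarchy. This is only an approximation under stated assumptions: `SUBTYPES`, `specializes` and `has_locative_reading` are hypothetical names, and a simple lookup stands in for the typed unification that the LKB actually performs.

```python
# Assumed fragment of the locative type hierarchy: in-on subsumes in and on.
SUBTYPES = {"in-on": {"in", "on"}, "in": set(), "on": set(), "at": set()}

# Locative types as assigned in this section.
LOCATIVE_TYPE = {"bus": "in-on", "road": "in-on", "car": "in", "house": "in"}

def specializes(general, specific):
    """True if `specific` equals `general` or is one of its subtypes."""
    return general == specific or specific in SUBTYPES.get(general, set())

def has_locative_reading(prep, noun):
    """A lexicalized preposition gets a locative type reading when it can
    specialize the (possibly vague) locative type of its complement noun."""
    return specializes(LOCATIVE_TYPE[noun], prep)
```

Under these assumptions both `on the bus' and `in the bus' receive a locative type reading, `in the car' does too, and `on the car' is left with the literal reading only.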
6.4.3 Differences in Lexicalized Relations
It might have become apparent that the assumption so far has been that the meaning
expressed by locative type interpretations is implicitly preserved during transfer. In other
words, the precise meaning in such cases is only preserved by a combination of a noun and
a neutral lexicalized preposition. However, consider the following examples:
Spa: Hay varios puentes sobre el desfiladero. Hay varios platos sobre la mesa.
Eng: There are several bridges over the gorge. There are several plates on the table.
Here, the Spanish non-lexicalized preposition sobre translates as the vertical preposition
`over' but also as the lexicalized preposition `on' with locative type interpretation (i.e. the
locative type of `table' is `on'). The problem here is that, according to the above assumption, `on the table' should have originated from a lexicalized preposition with locative type
reading; instead a non-lexicalized preposition is present in the SL (for the sake of argument
I am ignoring the lexicalized alternative, en la mesa).
To account for such translations, sobre will have to be mapped to both `over' and `on'
in the bilexicon. However, as the following sentence shows, mapping sobre into `on' is not
always correct:
? There are several bridges on the gorge.
This mistranslation would be a consequence of indiscriminately mapping the non-lexicalized
preposition sobre into lexicalized `on' with locative type interpretation; the result is that
the translation equivalence will only be a limited approximation. Furthermore, such an
approximation will be incorrect in a number of cases including the example with `gorge'
above. At the heart of this problem is the mismatch between the lexicalized relations
in different languages and between the ways these relations are used to describe scenes
in the real world. The solution to this problem, in the form of selectional restrictions,
is therefore encoded in the bilexicon, where such differences between languages are overcome. The restrictions indicate the range of contexts for which a particular lexicalized to
non-lexicalized translation relation holds. Thus, the bilexical entry for `on - sobre' needed
to translate sobre la mesa as `on the table' is:
e-prep-lexicalized_1 / sobre_1 :
1l-1l-3i-3i-t
<sfs:1:syn:loc:subcat:car:syn:loc:qualia:formal:locative-type> = on
<sfs:1:syn:loc:subcat:car:syn:loc:qualia:formal>
<= furniture_1 <syn:loc:qualia:formal>.
This entry restricts the equivalence to contexts in which the complement NP is an item
of furniture in order to disallow puentes sobre el desfiladero translating as `bridges on the
gorge'. To complete the translations of `over', a bilexical entry without restrictions is
dened for the equivalence `over - sobre':
over_1 / sobre_1 :
1l-1l-3i-3i-t.
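The effect of the two entries for sobre can be sketched in Python as follows. The is-a table, the function names and the toy lexicon are assumptions introduced for illustration; in particular, `is_a` only imitates the subsumption constraint on the Formal role expressed by `<=` in the entry above.

```python
# Assumed toy is-a hierarchy over Formal roles.
ISA = {"table": "furniture", "furniture": "artifact", "gorge": "landform"}

# Assumed locative types; nouns not listed have no relevant locative type here.
LOCATIVE_TYPE = {"table": "on"}

def is_a(noun, ancestor):
    """Walk up the hierarchy from `noun` looking for `ancestor`."""
    while noun is not None:
        if noun == ancestor:
            return True
        noun = ISA.get(noun)
    return False

def translate_sobre(noun):
    """English candidates for Spanish sobre, given the complement noun."""
    candidates = []
    # Restricted entry: lexicalized `on' only for furniture of locative type on.
    if LOCATIVE_TYPE.get(noun) == "on" and is_a(noun, "furniture"):
        candidates.append("on")
    # Unrestricted entry: `over' is always available.
    candidates.append("over")
    return candidates
```

With these data `translate_sobre("table")` offers both `on' and `over', while `translate_sobre("gorge")` offers only `over', which blocks `bridges on the gorge'.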
6.4.4 Disambiguation of Path-End Relations
The locative type of a noun can also be used to disambiguate path-end relations. Consider
again the sentences of Section 5.5.3.
Spa: La tienda está al otro lado de la calle. Su despacho está al otro lado del pasillo.
Eng: The shop is across the road.
Her office is through the corridor.
Since al otro lado de can translate as either `across' or `through', TL filtering must select
between these two prepositions during generation. In the present framework, selection of
the correct preposition is made by using the locative type of the complement NP.
To show how this is implemented, first consider the lexemes for the path senses of
`across' and `through' on which their path-end senses are based:
[ p-across
  trans:dist:ind1:path-loc = [0] on
  syn:loc:subcat:car:syn:loc:qualia:formal:locative-type = [0] ]
[ p-through
  trans:dist:ind1:path-loc = [0] in
  syn:loc:subcat:car:syn:loc:qualia:formal:locative-type = [0] ]
As can be seen, these entries restrict their complements to those of the appropriate type.
During path to path-end mapping, these restrictions would be inherited by the path-end
prepositions, which would in turn restrict the complements of the latter; during generation, one of these prepositions would be discarded depending on the locative type of the
complement. The rule to generate the appropriate path-end senses could be specied as
follows (see also Figure 5.12). Note in particular the last path equation which binds the
locative type of the complement with that of the path in the output relation.
path-2-path-end pre-lex-rule
<1:trans:dist:ind1> = path
<0:trans:dist:ind1> = path-end
<0:trans:dist:ind1:path-type> =
<1:trans:dist:ind1>
<0:trans:dist:ind1:path-type:path-loc> =
<0:syn:loc:subcat:car:syn:loc:qualia:formal:locative-type>.
Given a path sense of `across', this rule constructs a path-end sense (i.e. one that can be
paraphrased as `on the other side of') which requires an NP of type `on'. Since translation
from the Spanish al otro lado de results in both `through' and `across', the locative type
of the complement will enable TL filtering to select between the two.
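The filtering step can be pictured with a short Python sketch. It assumes a table `PATH_LOC` recording the path-loc value of each preposition and a toy noun lexicon; both tables and the function `tl_filter` are hypothetical, and the vague in-on type proposed for `road' in Section 6.4.2 is deliberately ignored to keep the example small.

```python
# path-loc value inherited by each path-end preposition (from the lexemes above).
PATH_LOC = {"across": "on", "through": "in"}

# Assumed, simplified locative types for the complement nouns.
LOCATIVE_TYPE = {"road": "on", "corridor": "in"}

def tl_filter(candidates, noun):
    """Keep only the prepositions whose path-loc matches the noun's locative type."""
    return [p for p in candidates if PATH_LOC[p] == LOCATIVE_TYPE[noun]]

# Both English prepositions are produced by transfer of `al otro lado de'.
transferred = ["across", "through"]
```

Here `tl_filter(transferred, "road")` keeps only `across' and `tl_filter(transferred, "corridor")` keeps only `through'.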
6.4.5 Disambiguation Based on Complement Noun
Although lexicalized prepositions account for a substantial number of ambiguities, other
relations also give rise to multiple TL translations. In this section I consider those for which
disambiguation is based solely on the complement NP. Consider the following example:
Eng: They slept under the sheets.
They slept under the night sky.
Spa: Durmieron debajo de las sábanas. Durmieron bajo el cielo nocturno.
There is a very strong preference in Spanish for choosing bajo when the complement NP
is a light source or a celestial body:
Spa: Esperaban bajo un farol.
El pájaro vuela bajo las nubes.
Eng: They waited under a street lamp. The bird flies under the clouds.
This preference may also stem from an intuitive association of bajo with notions of influence
and affectedness: John está bajo/*debajo de su control `John is under her control'.
On the other hand, debajo de prefers items of furniture and other artifacts with clearly
defined lower sections:
Spa: El perro está debajo de la mesa. La escoba está debajo de la escalera.
Eng: The dog is under the table.
The broom is under the ladder.
Spa: La mosca está debajo del cuadro. La foto está debajo del periódico.
Eng: The fly is under the picture.
The photograph is under the newspaper.
Unfortunately, notions such as influence and affectedness, or even lower sections are
difficult to predict and identify because they usually rely on metaphors, analogies and
conventions for which independent motivation is not easily obtainable. Below I will use
Qualia structure in order to arrive at more concrete conditions on the application of these
prepositions. Admittedly, it would be preferable to use the idea of influence to predict
when either preposition could be used, but a study of these phenomena would be beyond
the scope of this thesis.
In addition to the relatively well-defined usages above, there are certain types of nouns
which can appear with either preposition. Some examples with trees and body parts
include:
Spa: Caminaron bajo/debajo de los eucaliptos. Tenía un libro bajo/debajo del brazo.
Eng: They walked under the eucalyptus.
She had a book under her arm.
Such alternative renderings will be discussed below.
To restrict the application of bajo to light sources or celestial bodies, the following two
lexical signs are dened.
[ s-p-bajo
  syn:s-loc:s-subcat:car:syn:s-loc:s-qualia:telic:purpose:lex = (iluminar_1) ]
[ s-p-bajo
  syn:s-loc:s-subcat:car:syn:s-loc:s-qualia:agentive:nature:lex = (celestial_1) ]
These TFSs restrict the use of bajo to complements whose purpose is iluminar `to illuminate' or whose nature in the Agentive role of the Qualia is that of a celestial body. As for
debajo de, only one lexical sign is needed:
[ s-p-debajo_de
  syn:s-loc:s-subcat:car:syn:s-loc:s-qualia:agentive:nature:lex = (artefacto_1) ]
This says that debajo de is used with complements of type `artifact' such as ladders, tables
and pictures.
To capture the ambiguous condition in which both bajo and debajo de may be used,
two lexical rules are dened which accept either bajo or debajo de and derive lexemes which
take plant or body part complements. In the LKB notation, the rules are dened as:
s-vert-sub-2-plant pre-lex-rule
<1:syn:s-loc:s-head:s-pform> = (OR bajo debajo_de)
<1:trans:dist:ind1> = vertical
<0:trans:dist:ind1> = vertical
<0:syn:s-loc:s-subcat:car:syn:s-loc:s-qualia:formal> <=
planta_1 <syn:s-loc:s-qualia:formal>.
s-vert-sub-2-bodypart pre-lex-rule
<1:syn:s-loc:s-head:s-pform> = (OR bajo debajo_de)
<1:trans:dist:ind1> = vertical
<0:trans:dist:ind1> = vertical
<0:syn:s-loc:s-subcat:car:syn:s-loc:s-qualia:agentive:nature:lex> =
(parte-del-cuerpo_1).
These rules accept any of the three preposition lexemes defined above and return a similar preposition
but with different restrictions on its complement. Translation of a sentence such as `they
walked under the eucalyptus' starts with the analyser producing one TL bag with bajo
and another one with debajo de. During generation the lexical rules above apply allowing
plant complements for both bajo and debajo de, each of which leads to a valid Spanish
sentence. Selection between these two translations can then be made by a human user or
through some other application-specific procedure.
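The combined effect of the three lexical signs and the two lexical rules can be approximated by the following Python sketch. The `QUALIA` table and the function `spanish_under` are assumptions made for illustration, and each noun is given only the single Qualia value relevant to the example, which is a considerable simplification of the real lexical entries.

```python
# Assumed toy Qualia values for Spanish complement nouns.
QUALIA = {
    "farol": {"telic": "iluminar"},
    "cielo": {"agentive": "celestial"},
    "mesa": {"agentive": "artefacto"},
    "eucalipto": {"formal": "planta"},
    "brazo": {"agentive": "parte-del-cuerpo"},
}

def spanish_under(noun):
    """Return the Spanish prepositions licensed for `under' with this noun."""
    q = QUALIA[noun]
    options = set()
    # Lexical signs for bajo: light sources and celestial bodies.
    if q.get("telic") == "iluminar" or q.get("agentive") == "celestial":
        options.add("bajo")
    # Lexical sign for debajo de: artifacts.
    if q.get("agentive") == "artefacto":
        options.add("debajo de")
    # Lexical rules: plants and body parts allow either preposition.
    if q.get("formal") == "planta" or q.get("agentive") == "parte-del-cuerpo":
        options |= {"bajo", "debajo de"}
    return sorted(options)
```

For `eucalipto' and `brazo' both prepositions are returned, so the final choice is left to a user or to some other selection procedure, as described above.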
6.4.6 Disambiguation Based on Measure Phrases
It was noted in Section 5.2.3 that some prepositions allow measure modification phrases,
while others do not:
Her bike is 5 metres in front of/*opposite the shop.
Exploiting this distinction can lead to disambiguation when a preposition has two TL
equivalents, one of which does not allow measure modification. Consider the following
sentences:
Eng: Her bike is in front of the shop.                  Her bike is 5 metres in front of the shop.
Spa: Su bicicleta está frente a/delante de la tienda.   Su bicicleta está a 5 metros delante de la tienda.
The fact that certain dialects would prefer en frente de instead of delante de in both translations does not affect the essence of the argument. As the right column in the example
shows, the presence of a measure modification phrase forces the selection of one preposition, thus effectively performing TL disambiguation. To encode this condition in the
present framework requires a rule in which a measure modifier phrase selects for a PP
which allows measure modification.
[ s-pp-phr ] ==> [ s-pp-a
                   syn:s-loc:s-head:s-pcomp-qualia:telic:purpose = (medir_1) ]
                 [ s-pp-phr
                   trans:dist:ind1:scalar = (+) ]
This rule states that when the modifier phrase has Telic role medir `measure', as in the
noun metro `metre', the spatial relation in the PP must allow measure modification. Note
that the relevant measure phrases in Spanish are prepositional phrases headed by a, and
that the Qualia structure of the complement NP is made available through the feature
s-pcomp-qualia. This rule enables parsing of
a 5 metros delante de la tienda.
but not of
* a 5 metros frente a la tienda.
since the latter contains a preposition marked (-) for the feature scalar.
Although disambiguation based on measure phrases is less frequent than that based
on the complement and modified constituent, examples of such expressions do occur. The
following sentences are taken from the LOB corpus.
The nearest light was about six feet in front of the car.
He was observed 150 yards in front of the engine.
The corresponding translations are:
La luz más cercana estaba a unos seis pies delante del coche.
Se le vio a 150 yardas delante de la locomotora.
Such translations would be correctly predicted by the rule above in conjunction with the
scalar feature.
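A minimal Python sketch of this filter is given below. The `SCALAR` table and the function `filter_by_measure` are hypothetical; they merely mimic the way the rule above rejects a measure phrase combined with a preposition marked (-) for scalar.

```python
# Assumed values of the scalar feature for two Spanish prepositions.
SCALAR = {"delante de": True, "frente a": False}

def filter_by_measure(candidates, has_measure_phrase):
    """Discard prepositions marked (-) for scalar when a measure phrase is present."""
    if not has_measure_phrase:
        return list(candidates)
    return [p for p in candidates if SCALAR[p]]

# Both prepositions are transfer candidates for `in front of'.
candidates = ["frente a", "delante de"]
```

Without a measure phrase both candidates remain; with one, only delante de survives, as in the LOB examples above.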
6.4.7 Disambiguation Based on Modified Constituent
Apart from the complement NP, there is another important determining factor in the
selection of a preposition, namely the constituent modied by the PP. I will consider verb
modication rst and leave the case of noun modication, or more precisely noun location,
for the section that follows.
The clearest example of a preposition depending on a verb is when a PP is one of the
verb's complements:
She looked at the girl. They relied on the weather.
Translation in such cases can be treated at the bilexical level by including the complement
preposition with the verb. Unfortunately, the issue of determining whether a PP is a
complement or a modifier is far from settled. For example, if one used passivization to
argue that the PPs above are complements (i.e. `the girl was looked at', `the weather was
relied on'), the PP in `he went to the shop' would be classified as a modifier since `?the
shop was gone to' is odd. However, this PP qualifies as a complement with other tests:
`it was the shop he went to'. Different tests suggest different analyses, making it difficult
to achieve consistent representations. Further tests and treatments of complementation in
the context of MT are described by Somers (1987), Steiner et al. (1988a) and Allegranza
et al. (1991) amongst others. I will consider PPs which are relatively clearly modifiers and
show how TL disambiguation is effected through monolingual restrictions.
I consider disambiguation involving movement and non-movement verbs as described
in Section 5.1.1. These two classes are used for disambiguating static and dynamic prepositions. As illustrated in Table 5.5 a number of prepositions in English give rise to multiple
translations in Spanish. For example:
Eng: The mouse ran under the table.              The mouse is under the table.
Spa: El ratón corrió debajo de la mesa.          El ratón está debajo de la mesa.
Eng: The mouse ran under_path the table.         * The mouse is under_path the table.
Spa: El ratón corrió por debajo de la mesa.      * El ratón está por debajo de la mesa.
However, as the left hand column shows, the ambiguity is only possible for movement
verbs; non-movement verbs do not readily allow path modication. This restriction can
be included in the present framework as a restriction on the application of the lexical rule
that derives path relations.
vert-2-path prep-lex-rule
<1:trans:dist:ind1> = vertical
<1:trans:dist:ind1:measure> = <0:trans:dist:ind1> = path
<0:trans:dist:ind1:path-loc> = <1:trans:dist:ind1>
<0:trans:dist:ind2> = movement.
This entry defines vert-2-path as a lexical rule whose output is a path preposition
restricted to modifying an event of type movement. Thus, given the input
The dog sleeps under the table.
the parser will construct just one interpretation in which `under' expresses a vertical relation, which on input to the transfer component results in a single translation for `under'.
A restriction to movement verbs for dynamic relations is further corroborated by the usage
of goal senses:
Eng: The mouse ran under the table.                 * The mouse is under_goal the table.
Spa: El ratón corrió hasta debajo de la mesa.       * El ratón está hasta debajo de la mesa.
Only movement verbs allow goal interpretations, and hence disambiguation takes place as
for path relations.
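The restriction of dynamic senses to movement verbs can be summarized in a small Python sketch. The verb classes, the sense labels and the functions `under_senses` and `translate_under` are assumptions for this illustration; in the framework itself the restriction is stated inside the lexical rules that derive path and goal senses.

```python
# Assumed verb classification (Section 5.1.1 distinguishes movement verbs).
VERB_CLASS = {"run": "movement", "be": "non-movement", "sleep": "non-movement"}

# Spanish rendering of each sense of `under'.
SPANISH = {
    "vertical": "debajo de",
    "path": "por debajo de",
    "goal": "hasta debajo de",
}

def under_senses(verb):
    """Senses of `under' available when the PP modifies this verb."""
    senses = ["vertical"]
    # Path and goal senses are derived only for events of type movement.
    if VERB_CLASS[verb] == "movement":
        senses += ["path", "goal"]
    return senses

def translate_under(verb):
    return [SPANISH[sense] for sense in under_senses(verb)]
```

For `sleep' and `be' only debajo de is produced, whereas `run' keeps all three candidates and therefore remains ambiguous.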
6.4.8 Disambiguation Based on Modified Constituent and Complement
The extrinsic prepositions `among' and `between' are two possible translations for Spanish
entre, and in this section I will consider ways of disambiguating between them. The first
difference to note between these two prepositions is that `between' appears with conjoined
nouns much more often than `among':
The Scottish border lies between/*among Berwick-upon-Tweed and Gretna.
? I saw her among Mary and John.
One could argue that this is a consequence of the meaning of `between' in which two objects
are needed for defining the location identified by the preposition, and that therefore one
could disambiguate based on the presence of a conjunct. A related observation is that
`between' frequently occurs with the numeric determiner `two', as in
The road extends between those two cities.
The park is between the two buildings.
However, these observations do little to resolve ambiguities in which no conjunction or
determiner `two' appears. It is therefore worth searching for more general conditions on
the use of these prepositions.
One interesting distinction between the spatial senses of `among' and `between' is that,
in a number of cases, the object located by `among' is required to be part of the complement
NP, whereas no such requirement holds for `between'.
1) Mary sat among friends.
2) Mary sat between friends.
In 1), Mary must be one of the friends indicated by the complement of `among', while in 2)
she may or may not be.
There are two main sources of information required to exploit this property of `among'
in disambiguation: the complement NP and the constituent modied by the PP. There
are also two main types of condition in such cases. Firstly, the complement and modified constituent of `among' must be compatible at a certain level because of the part-of
relationship that tends to exist between them. Secondly, the noun that is modified by `between' is, in many cases, a connecting or separating noun such as `gap', `border', `space',
`tunnel', `line' and `connection'.
For the first condition, consider the following translations:
Spa: El martillo está entre mis herramientas. El ciprés creció entre aquellos árboles.
Eng: The hammer is among my tools.
The cypress grew among those trees.
Although in each of these pairs `between' is a possible translation for entre, one would tend
to translate the two as shown. Now, in both cases the following relationship holds between
the two nouns involved: the genus term of the subject is the same as the elements denoted
by the plural complement NP. For example, a hammer is-a tool, while a cypress is-a tree.
The presence of such relationships in expressions involving entre is a strong indicator for
a translation using `among'.
When the relationship does not hold, there is no particular preference for either translation:
Spa: Hay una casa entre los árboles.
Eng: There is a house among/between the trees.
In the rest of this section I describe how the above heuristics for the use of `among'
and `between' can be encoded and used in order to achieve TL disambiguation. Before
proceeding, however, consider one more example:
Spa: Encontré el martillo entre mis herramientas.
Eng: I found the hammer among my tools.
In this translation, the is-a relation required by `among' exists between its complement
and the complement of `found'. This is different from the previous two examples in which
the relation held between the subject of the intransitive verb and the prepositional complement. A related observation was presented in the context of movement verbs in Section
5.1.2.
Two mechanisms are necessary for selecting `among'. Firstly, the located noun is
determined by the verb (i.e. whether its subject or its object is located). Secondly, the
is-a relationship between this noun and the complement of `among' must be encoded in
the lexical entry for `among'. I will outline an implementation of these mechanisms.
For the first one, the located object is determined by the verb's lexical entry, and its
Qualia structure is bound to a new feature in the verb's TFS. The located object would
be the subject of the verb in the case of intransitive verbs or copulas, or the direct object
in the case of `find', `keep', `hide' and `sit'. The new feature is passed from daughter to
mother in the VP rules in order to make it available at the point of PP modification.
For the second mechanism, there is one obstacle to dening an appropriate entry for
`among' with the necessary is-a restrictions, which is that the Qualia of the located noun
is not available in the lexical entry of `among'. One way of overcoming this problem is
by sharing Qualia information about the located object with the spatial PP via another
feature in the preposition's TFS. Sharing then takes place through a rule of the form:
vp-phr ==> [ vp-phr
             syn:loc:head:loc-comp = [0] qualia ]
           [ pp-phr
             syn:loc:head:loc-modi = [0] ]
in which the Qualia value of the object to be located is passed from the VP onto the PP.
Thus, restrictions made on the value of loc-modi in the lexeme of `among' permeate to
the value of loc-comp and thence onto the located object. A possible entry for `among'
is:
[ p-among
  syn:loc:head:loc-modi:formal = [0] formal
  syn:loc:subcat:car:syn:loc:head:qualia:formal = [0] ]
This entry ensures, albeit indirectly, that the located object stands in the is-a relation to
the prepositional complement.
Assuming the above mechanism, it is possible to define an entry for `between' that
incorporates the second type of information mentioned above, namely that this preposition normally modifies connecting or separating nouns. If nouns such as `border', `gap',
`line' and possibly `space' were dened as having a separating Telic role, and `tunnel' and
`connection' had a connecting role, the two monolingual lexemes for `between' might be:
[ p-between
  syn:loc:head:loc-modi:telic:purpose:lex = (separate_1) ]
[ p-between
  syn:loc:head:loc-modi:telic:purpose:lex = (connect_1) ]
With these entries, the following disambiguation would be effected:
Spa: Hay un túnel entre los valles más importantes.
Eng: There is a tunnel between/*among the most important valleys.
since a tunnel is not a kind of valley. It may be argued that `between' is selected in such
cases because one infers that there are two connected valleys. However, it seems that it is
the lack of a common genus that triggers this inference and induces selection of `between'.
In cases where the conditions for both `among' and `between' are satisfied, both prepositions are possible. In the example below I have followed Procter (1978) in classifying
tunnels as a kind of passage, although using other nouns such as `route' would give the
same results:
Spa: Hay un túnel entre los pasajes más importantes.
Eng: There is a tunnel between/among the most important passages.
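The two heuristics for entre can be put together in a short Python sketch. Everything in it is an assumption made for illustration: the `ISA` and `TELIC` tables, the function `translate_entre`, and the decision to return only the preferred preposition when exactly one condition holds, even though the text notes that `between' remains possible in the is-a cases.

```python
# Assumed genus terms (Formal role) of located nouns.
ISA = {"hammer": "tool", "cypress": "tree", "tunnel": "passage"}

# Assumed Telic roles marking connecting or separating nouns.
TELIC = {"tunnel": "connect", "connection": "connect",
         "border": "separate", "gap": "separate", "line": "separate"}

def translate_entre(located, complement_genus):
    """Preferred English translations of entre.

    located:          the noun located by the PP
    complement_genus: the kind of thing denoted by the plural complement
    """
    options = []
    # `among': the located object is-a member of the complement's kind.
    if ISA.get(located) == complement_genus:
        options.append("among")
    # `between': the located noun connects or separates.
    if TELIC.get(located) in {"connect", "separate"}:
        options.append("between")
    # With no evidence either way, neither translation is preferred.
    return options or ["among", "between"]
```

For example, a hammer among tools yields `among', a tunnel between valleys yields `between', and a tunnel located with respect to passages satisfies both conditions, so both prepositions are returned.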
The next section discusses the last type of disambiguation considered.
6.4.9 Disambiguation Through Discourse Semantics
Up to this point I have only dealt with disambiguations that required no logical inference.
In this section I set this assumption aside and discuss a type of ambiguity that requires inferencing at the discourse level. The problem I consider involves movement verbs modified
by prepositions that allow static and goal alternations:
Eng: The cat ran under the table.
1) Spa: El gato corrió debajo de la mesa.
2) Spa: El gato corrió hasta debajo de la mesa.
Selecting between 1) and 2) is not possible in the present system because disambiguation
of this type requires contextual reasoning. However, I consider these ambiguities because
proposals have been made for their resolution, and it is important to see how compatible
these proposals are with the present one.
Before considering one such proposal, I will make one clarification. The IL framework
attempts to provide a theory neutral representation with which to investigate translation
problems. Furthermore, by only using lexical signs during transfer the modularity of the
monolingual and multilingual components is increased. However, there are many issues
in translation that need more elaborate structures for their solution, including the type
of disambiguation just described. In such cases, construction of a semantic representation
can take place without jeopardizing the modularity of the system if transfer is kept at
the lexical level. For example, the structures of Asher and Sablayrolles (forthcoming)
(AS94 henceforth), to be described next, can be constructed during analysis and used to
disambiguate the SL sentence before lexicalist translation takes place.
With the goal of dealing with the spatio-temporal structure of a text, AS94 propose
a compositional theory of meaning for motion complexes in French which is used for predicting a number of properties of a text. For instance, one of the goals of the theory is
to calculate the position of a discourse participant at a certain time. The theory consists
of the following components: the formal language described by Asher (1993), based on
the Discourse Representation Theory of Kamp (1981), which includes a number of axioms
establishing the discourse relations that exist between sentences in a text; a typology of
spatial entities consisting of locations, positions and postures; a classification of motion verbs into verbs of change of location, change of position, inertial change
of position and change of posture; a set of functions which return the source, path,
goal and spatio-temporal referent of an event, the event's moving entity and its reference
location; seven spatial relations matched by the seven generic locations: Z-inner-halo,
Z-contact, Z-outer-halo, Z-outer-most, Z-inner-transit, Z-contact-transit and
Z-outer-transit; polarity functions that determine whether a verb or preposition is
initial (source), medial (path) or final (goal), and a number of axioms encoding the
compositional rules governing the combination of verbs and PPs. In addition, AS94 offer a
classification of French prepositions into positional (static) and directional (dynamic),
further subclassifying the latter depending on the preposition's polarity and the generic
locations associated with it. Finally, a subclassification for verbs of change of location
is also proposed based on the generic location involved in the meaning of the verb.
Rather than explain each of these points in detail, I will give an overview of how AS94
effect the disambiguation in question and then compare the classifications they propose
with the one presented here. Consider the following two-sentence discourse:
Discourse 1
Fre: Jean a couru dans le jardin. Il a vu le chat à travers la fenêtre et a voulu l'attraper.
Lit. John ran into the garden. He saw the cat through the window and wanted to catch it.
In the French version, the first sentence is ambiguous between `John ran in the garden'
and `John ran into the garden'. It is only through the second sentence and the semantics
of the discourse that disambiguation can be achieved. To select the intended reading, the
first step in the theory of AS94 is to construct the semantic representation of the two
sentence readings; these readings are that John ran in the garden and that he ran from
outside the garden and into it. Next, the discourse relation between the two sentences is
established; for this short discourse it is Explanation. In addition, a series of inferences
leads to John's running being interpreted as part of his goal to catch the cat. After this,
the initial position of John is established to be indoors (not in the garden), because of his
seeing the cat through the window and other properties of the two sentences. From this,
it is deduced that the running does not start in the garden. Finally, further reasoning
reveals that John's destination is where the cat is, namely the garden. Hence the running
is into the garden and disambiguation is achieved.
Although the above theory is quite different in purpose from that proposed in this thesis,
the two are compatible to a certain extent. I shall compare them in terms of their preposition and verb classifications only. Their similarities and differences are best discussed
with the aid of Table 6.1. For a pictorial comparison with Figure 5.2, the diagram for the
seven generic locations of AS94 is shown in Figure 6.5.
One of the main differences between AS94 and the spatial classification I have proposed
is that the latter does not incorporate the distinction between location, position and posture.
One reason for this is that these notions are intrinsically associated with the semantics
of the verb, and since verb meaning was only partly considered in this thesis, it is not
surprising that AS94's classification is more detailed in this respect. Another difference
is their use of transit zones in describing the semantics of change of location verbs.
Again this is a verb semantics issue, as may be confirmed by noting that AS94 do not use
transit zones to describe the semantics of any preposition. On the other hand, AS94
do not consider path-end relations in any detail, nor do they explain how the selection of
prepositions closely associated with the noun they modify (i.e. the lexicalized relations
AS94
directional prepositions
positional prepositions
initial polarity (prep)
medial polarity (prep)
final polarity (prep)
inner-halo/contact zones
outer-halo zone
transit zones
outer-most zone
location
position/posture
change of location verbs
change of position verbs
change of posture verbs
Proposal Here
dynamic relations
static relations
source relations
path relations
goal/direction relations
internal relations
external relations
path-end relation
spatial relations
transitive movement verbs
movement verbs
non-movement verbs
Table 6.1: Comparison between Asher and Sablayrolles and present proposal.
Figure 6.5: Asher and Sablayrolles' seven generic locations. [Diagram labels: Reference location; proximity limits; Z-inner-halo; Z-contact; Z-outer-halo; Z-outer-most; Z-inner-transit; Z-contact-transit; Z-outer-transit.]
and related path and path-end relations) is effected. For instance, it is not clear that AS94
can treat the three aspects of the `bus/car' problem appropriately.
Discourse semantics of the AS94 type can be incorporated into the lexicalist framework
advocated here by constructing DRSs in parallel with IL representations. This might seem
redundant but one must remember that the IL representation is based on the very lexemes
that serve as input to the parser, which means that they have to be available during
analysis anyway. Lexical disambiguation of prepositions through discourse semantics can
then proceed in the way described above either in the SL, in the TL module or in neither.
For example, if a language could preserve the path and goal ambiguities that English has,
disambiguation of the above kind would not have to be carried out; in this way, the same
level of specificity in source and target language could be maintained.
6.5 Conclusion
This chapter showed how the IL representation together with lexicalist generation, Qualia
structures and the spatial relations hierarchy are combined into a single system for translating and disambiguating prepositions.
I proposed Qualia structure as an appropriate theory for expressing noun knowledge
and showed how locative types could be incorporated into it. Qualia structure can be
represented naturally as TFSs; it is also independently motivated, having arisen from
a desire to explain certain types of polysemy. Locative types embody the insights of a
number of researchers but in a much more uniform and practical way by using frequency
of co-occurrence to define the type of a noun and using the lexicalized relations as the
value of this type.
In the latter part of the chapter, different types of disambiguation arising in PP translation were considered. As a framework in which to discuss the various ambiguities arising
during translation, the spatial relations hierarchy proved useful, particularly in the case
of the lexicalized relations. It was also used for deciding on the possible translations a
preposition could have. TL filtering was the key technique for disambiguation, allowing
complete independence of the monolingual modules and significantly reducing the complexity of the transfer module. Information from different sources was necessary for TL
selection; most of this information was made available through the same mechanisms of
unification and binding that are necessary for parsing and generation. In cases where inferencing was unavoidable, it was argued that the IL representation could be constructed
independently of the logic or semantic formalism adopted.
Chapter 7
Evaluation
The evaluation of machine translation systems is a topic of major importance which has
spurred much recent research (Arnold et al. 1993b). It is also a difficult issue due in
large part to the many factors which have to be taken into account when determining the
quality of a translation and its appropriateness. In this chapter I consider the issue of
MT evaluation, and describe the method adopted and the results of an evaluation of the
system proposed here as applied to a wider range of real data than that used in the rest of
the thesis. In addition, I assess the scalability of the system by describing an experiment
to determine the effect that system expansion has on translation quality. This is followed
by a simple example of how translation problems caused by ellipsis resolution could be
approached within the IL framework. Finally, some problems identified
during evaluation and scalability assessment are discussed.
There are many issues common to MT and NLP systems evaluation; these include
the degree of coverage of the system, its robustness, the amount of human intervention
required, efficiency, total cost, maintainability, extendability, ease of integration with other
information products such as Desktop Publishers and communication networks, possibility
of integration with other tasks such as speech processing, reasoning, machine learning,
information retrieval, expert systems and database technology.
7.1 Evaluation of Translation Systems
As far as MT is concerned, the distinguishing factor in evaluation is determining the quality and appropriateness of the translations produced by the system. However, assessing
translation quality is difficult because in general a given text will have more than one
correct translation, and because translation mismatches between languages lead to TL
sentences having different amounts of information from those in the SL. In general, different translations arise from imperfect equivalences between source and target sentences, or
from synonymy relations in the TL. Mismatches often arise because of missing senses in a
language which cannot easily be approximated through paraphrase. For example, the first
pair of alternative translations below is due to lack of a direct equivalent between English
and Spanish; the second pair is due to synonymy between the two Spanish sentences,
while the third pair gives alternative translations, neither of which can convey the notions
of `pouring onto' and `pouring out' simultaneously.
Eng: She swam across the river.
Spa. 1: Cruzó el río nadando.
Spa. 2: Nadó hasta el otro lado del río.
Eng: They crowd into congested areas.
Spa. 1: Se amontonan en áreas congestionadas.
Spa. 2: Llenan áreas congestionadas.
Eng: Lava poured out onto the plain.
Spa. 1: La lava salía a raudales a la planicie.
Spa. 2: La lava caía a raudales sobre la planicie.
For these reasons, different strategies for determining whether a translation is good
or not have been used. The most common approach for determining translation quality
is to ask a translator or other expert to rank the output texts for quality, using some
predefined notion of quality (King 1991). Other mechanisms include measuring editing
time or throughput, counting and weighting different types of errors in the output, using
TL patterns to automatically score translations, and computing statistical measures of
the distance between system output and human translations. All these mechanisms have
their own advantages and disadvantages, and consequently, they are more or less suited
to different evaluation objectives. It is a consequence of their diversity that evaluations
using one approach cannot easily be compared to evaluations using other approaches, and
hence care must be taken when considering the results available in the literature.
Another issue that complicates MT evaluation is that translations may be appropriate
for some purposes, but not for others. Thus, for information gathering or scanning, very
bad translations are frequently acceptable. On the other hand, machine translated text
intended for publication needs to be of a fairly high standard to justify post-editing.
Apart from the problem of determining translation quality, MT evaluations normally
require a strategy for selecting evaluation material. While there is no generally agreed
method for doing this, two widely used approaches may be identified.
7.1.1 Test Suite Evaluation
Test suite evaluations involve a carefully constructed set of examples, each testing a particular linguistic or translation problem. For example, Arnold et al. (1993a) automatically
construct test suites which systematically test various linguistic phenomena. Similarly,
Gambäck et al. (1991) test the degree of compositionality of their MT system by constructing sentences in which different known transfer problems are included in different
linguistic contexts. For instance, translation of `like' into the Swedish phrasal verb tycka
om (lit. `think about') is tested in contexts such as negation, wh-questions and others.
Test suite evaluations are useful for assessing the scalability and generality of MT systems.
However, one problem with test suite evaluations is that they assume that the behaviour
of a system can be projected, from carefully constructed examples, to real texts. In
other words, test suite evaluations only indirectly evaluate the behaviour of the system on
naturally occurring sentences.
7.1.2 Corpus Evaluation
In corpus evaluations, a possibly specialized corpus of texts is used as input to the MT
system. Such evaluations have been performed for commercial and experimental systems.
For example, Bennett and Slocum (1988) report on ongoing evaluations of the METAL
system with approximately 1,000 pages of actual texts over a period of 5 years; rates of
between 45% and 85% of translations requiring no post-editing are achieved (p. 128). The
spoken language translation system described by Rayner et al. (1993) has been evaluated
using a set of 633 sentences selected from the ATIS corpus. Rates of 41.8% acceptable
translations, based on the scores of bilingual judges, are reported. Unfortunately, for the
purposes of this thesis, the speech recognition error rate (which may be as high as 37.4%)
cannot be factored out of this value due to strong dependencies between the components
of pipelined NLP systems (Rayner et al. 1994).
Corpus evaluation is not completely satisfactory either. One problem is that it does not
systematically test all possible sources of incorrect translations; instead it considers the
most commonly occurring constructions, since these are likely to abound in any randomly
selected text. Ideally, one should apply a judicious mixture of both methods in order to
gain insight from both. For example, King and Falkedal (1990) propose two sets of test
data: one based on the type of text the system is expected to translate, and the other
based on more general examples such as may be constructed without any particular text
type in mind. Such tests would evaluate both suitability to actual texts and extendability
of the system, but of course they are lengthier and more expensive to conduct.
7.2 Experiment
An evaluation of the system presented here was conducted to assess its scalability and
adequacy at handling phenomena which occur in real texts. It is clear that for this type
of evaluation, a corpus approach is the most suitable, since it will reflect such texts more
faithfully. However, given the limited resources and scope of this study, a more restricted
form of corpus evaluation had to be performed, in which elements of test suite evaluation
were included.
To begin with, 20 sentences were selected from the LOB corpus. Each sentence contained at least one preposition from the following selection: above, across, along, behind,
below, in front of, inside, into, near, off, onto, over, through, under, within. The sentences
were chosen semi-randomly: spatial senses were identified manually to obtain a pool of
sentences containing spatial expressions, and then 20 sentences were selected randomly
from this pool.
Some of these sentences were then edited to facilitate initial development of the grammars and lexicons. For example, an original sentence such as:
I grasped Seona's arm, pulled her with me as I went silently along the passage keeping close
to the wall.
was edited to:
I went silently along the passage.
After this editing phase, the sentences were translated manually into Spanish. Then the
English and Spanish grammars were updated, together with the monolingual and bilingual
lexicons, morphological rules, and the bilexical rules. Additionally, simple heuristics for
parse tree ranking (rightmost attachment), TL bag selection (smallest cover set), and selection of generated sentences (selecting those with smallest modifiers, measured as number
of lexemes, closest to modifiee) were implemented.
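The TL bag selection heuristic can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the representation of a bilexical rule as a pair of lexeme tuples, the brute-force search, the function name smallest_cover and the toy Spanish equivalents are hypothetical and are not taken from the implemented system. The sketch only shows the idea of preferring the smallest set of rules whose SL sides jointly cover the SL bag.

```python
from itertools import combinations


def smallest_cover(sl_bag, rules):
    """Return the smallest combination of rules whose SL sides cover sl_bag exactly.

    Each rule is a hypothetical (sl_lexemes, tl_lexemes) pair of tuples.
    Combinations are tried in order of increasing size, so the first match
    uses the fewest rules. Returns None if no combination covers the bag.
    """
    target = sorted(sl_bag)
    for size in range(1, len(rules) + 1):
        for combo in combinations(rules, size):
            covered = sorted(lex for sl_side, _ in combo for lex in sl_side)
            if covered == target:
                return list(combo)
    return None


# Toy bilexical rules (illustrative equivalences only).
rules = [
    (("go",), ("ir",)),
    (("along",), ("por",)),
    (("go", "along"), ("recorrer",)),
    (("passage",), ("pasillo",)),
]

cover = smallest_cover(["go", "along", "passage"], rules)
tl_bag = [lex for _, tl_side in cover for lex in tl_side]
print(tl_bag)
```

With these toy rules, the two-rule cover using the multi-lexeme entry is found before the three-rule cover that translates each lexeme separately.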
Following this, a further 24 sentences were added in a similar fashion, but this time,
each sentence contained at least one of the prepositions `at', `in' or `on' in their spatial
senses. For this set of sentences, however, the editing was less stringent, allowing a fuller
range of grammatical phenomena to be included. For example, a sentence such as:
During our Christmas holiday on the Solway we had heard rumours that very large numbers
of geese assembled at the head of the great estuary upon their first arrival from the Arctic
in late September.
was entered in the development corpus as:
Large numbers of geese assembled at the head of the great estuary
In other words, much more of the original sentence containing the preposition has been
left unedited. The full list of sentences employed is given in Appendix A.
At the end of this system expansion process, a list of words was compiled and arranged into a questionnaire (see Appendix B), which was distributed to 8 native speakers
of English. The questionnaire asked for 5 to 10 short sentences each of which should have
included at least one spatial preposition. 6 questionnaires were returned, containing 51
sentences. Of these, 1 questionnaire with 9 sentences contained quite complex constructions which were only used for further extension of the system. An example of one of these
sentences is:
A crowd of North of England businessmen marched along silently slipping quietly through
the brown lands and on to the Scotch Corner Hotel
There is no theoretical reason for excluding such a sentence from the final results; given
sufficient time and resources, this and many other constructions could have been handled.
However, the purpose of this exercise was not to produce a wide coverage grammar, but
to test the potential for application to real texts of the ILs and the relations hierarchy.
7.3 Results
7.3.1 Parsing
42 of the newly constructed sentences were submitted to the system for translation into
Spanish. 29 sentences were parsed after only some minor changes to the grammars and
lexicons; this form of system updating is carried out in many instances of MT evaluation
as it resembles more closely real translation environments and thus is a better reflection
of system performance. All 29 sentences were translated.
In order to give an indication of the effort required to cover the 13 unanalyzed sentences
I will give a summary of the shortcomings of the grammar which caused these omissions.
Incomplete verb subcategorization alternations caused 6 omissions; for instance, the phrase
`I felt inside his pocket' is analyzed, but not `I feel different on top of the world'. The
following phenomena resulted in one failure each: lack of passivisation of dative verbs; lack
of an adverbial sentence modication rule; absence of imperatives; the noun `drink' was
used as a verb; use of certain semi-idiomatic PPs such as `by accident'; absent phrasal verb
`slip in'; incomplete NP rules (e.g. for `inside Cambridge station' cf. `inside the station').
The 29 translated sentences were distributed to a group of 6 native or near native
speakers of Spanish who were fluent speakers of English. They were asked to score the
translations for both intelligibility and accuracy.
7.3.2 Scoring
The scoring scale given to the Spanish speakers was the Nagao scale (Nagao et al. 1988),
which is used to determine the quality of translations produced by MT systems. The scale
was adopted because it has been used by other groups, thus offering the possibility of
comparison between their results and those obtained in the present evaluation.
Under this scheme translated sentences are scored for intelligibility and accuracy.
Roughly, intelligibility, measured on a scale of 1 to 5, indicates how easily the translated text can be understood by a TL speaker. Accuracy, on a scale of 0 to 6, tests how
faithfully the translation conveys the meaning of the original sentence. The questionnaire
for assessing translation quality is given in Appendix C, and includes a more detailed
description of each value in the two scales just mentioned. Four questionnaires were returned, and the score for each sentence was taken as the average of the four scores given
by the respondents.
Score   No.    %   cum. %   % up to 4   cum. % up to 4
1         8   19     19        27            27
2        17   40     59        59            86
3         3    7     66        10            96
4         1    2     69         3           100
5        13   31    100
Table 7.1: Intelligibility percentages: out of all sentences, and up to score 4.
Score   No.    %   cum. %   % up to 5   cum. % up to 5
0         9   21     21        31            31
1         9   21     42        31            62
2         4   10     52        14            76
3         3    7     59        10            86
4         4   10     69        14           100
5         0    0     69         0           100
6        13   31    100
Table 7.2: Accuracy percentages: out of all sentences, and up to score 5.
The results of the evaluation are given in Tables 7.1 and 7.2. Percentages and cumulative percentages for each category are calculated in two ways: out of all sentences, with
unparsed sentences given the lowest score, and out of all sentences excluding the lowest
ranking ones (i.e. excluding unparsed sentences). In this way, the lower and upper bounds
of quality for the current proposal can be estimated, and a more meaningful comparison
with other systems can be performed, as will become clear below.
7.3.3 Analysis of Failures
Before comparing the results just described with those found in the literature, I will
consider the causes for unintelligible and inaccurate translations. Intelligibility, associated
with monolingual sentence quality, suffered most when sentences could not be associated
with everyday situations. For example, one of the sentences translated was:
Eng: The lizard head was well-oiled
Trans: La cabeza de lagarto fue lubricado
The problem here was that during intelligibility scoring informants had no access to the
source sentences; therefore, some Spanish translations seemed very odd, and were consequently given a low score. The reason informants produced such sentences might be
attributed to the necessarily reduced vocabulary used.
Incorrect translation of nouns and verbs also resulted in unintelligible sentences. For
example:
Eng: The water went beneath the sand
Trans: el agua caminó debajo de la arena
Here, the Spanish reads as the water walked beneath the sand, which is obviously wrong.
Morphological incompleteness in the generation phase also degraded the quality of
several translations. These arose because of certain simplifications made in the morphological generator. That is, morphological generation was performed through lexical rules,
but during morphological synthesis, only one iteration of rule application was allowed. For
example, adjectives in Spanish must agree with their noun in number and gender, so two
morphological rules were implemented: plural and feminine. However, during generation
only one was allowed to apply, resulting in errors such as:
las debutantes *lubricada/lubricadas
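The effect of the single-iteration restriction can be illustrated with a minimal sketch. The rule functions, the feature names and the synthesize helper are hypothetical and greatly simplified; they are not the lexical rules of the system, and only show how capping rule application at one iteration yields the singular feminine form where the plural feminine form is required.

```python
def feminine(form):
    """Toy feminine rule: replace a final -o with -a."""
    return form[:-1] + "a" if form.endswith("o") else form


def plural(form):
    """Toy plural rule: add -s after a vowel, -es otherwise."""
    return form + "s" if form[-1] in "aeiou" else form + "es"


# Hypothetical mapping from agreement features to morphological rules.
RULES = {"fem": feminine, "pl": plural}


def synthesize(stem, features, max_iterations):
    """Apply at most max_iterations of the rules requested by features, in order."""
    form = stem
    for feature in features[:max_iterations]:
        form = RULES[feature](form)
    return form


one_pass = synthesize("lubricado", ["fem", "pl"], max_iterations=1)
two_pass = synthesize("lubricado", ["fem", "pl"], max_iterations=2)
print(one_pass, two_pass)
```

Allowing a second iteration is enough to produce the agreeing form in this toy setting; the sketch says nothing about how rule ordering was handled in the actual generator.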
A somewhat related issue was that of phonological conventions in Spanish, where the
preposition plus article combinations a el and de el must be contracted to al and del
respectively. Incorrect determiners were another cause of problems:
.. amo *oxígeno/el oxígeno
as was the absence of reflexive pronouns:
maría {* rompió/se rompió} su brazo
Finally, due to the simplistic model of accusative marking in the Spanish grammar,
omission of the preposition a before certain kinds of objects caused further decreases in
the quality of certain Spanish sentences:
Trans: persiguió {? maría/ a maría}
Turning now to accuracy, or how meaning-preserving the translations are, one source
of inaccuracy was low intelligibility (i.e. not fully grammatical Spanish sentences). Thus,
incorrectly inflected adjectives and missing case markers and reflexive pronouns all resulted
in a decrease in perceived accuracy. Certain idiomatic expressions also caused difficulties:
Eng: The dancer marched in time to the crowd
Trans: el bailarín marchó en el tiempo a la muchedumbre
Spa: el bailarín marchó al compás de la muchedumbre
Here, the expression `in time to' acquires a particular meaning which has an accurate,
although non-compositional, translation.
Lastly, the incorrect translation of one preposition resulted in the following inaccuracy:
Eng: We are different across Europe
Trans: somos diferente a cada lado de Europa
Spa: somos diferente a través de/por toda Europa
This and other problems are considered in Section 7.7.
7.4 Comparison with Other Systems
Care must be taken when comparing MT systems based on the results of their evaluations since there is a large variety of systems, each with its own particular objectives
and problems. Furthermore, evaluations differ in their scope and level of generality, and
it is difficult to identify a common level of performance at which a system should aim.
One can appreciate this dilemma more clearly by considering the range of percentages for
translation quality reported by different groups; these values can range from 40% to 100%
under some given measure of translation quality. For example, Ikehara et al. (1991:105)
report a sufficient degree of accuracy being preserved for 40 to 50% of unseen sentences,
drawn from a corpus of newspaper lead sentences. Brown et al. (1992:83) claim 60% correct translations from randomly selected sentences from the Canadian Hansard corpus.
Sato (1993:67) reports 90% accuracy for the translation of technical terms in Computer
Science. Mitamura et al. (1991:59) obtained 100% accuracy and good quality translations
for 200 sentences from the domain of technical electronic manuals. Clearly the specific
testing method, test data, domain and size of the system are all factors which strongly influence these results; nevertheless, the range of figures highlights the difficulty in deciding
on an appropriate benchmark.
While the most informative approach to comparing different systems would be to subject them to identical tests under identical conditions, as in the evaluations reported in
Jordan et al. (1993), this is not often practical. Access to complete MT systems is normally restricted for geographical, legal or financial reasons, and by issues of compatibility
and specificity such as language pair available, subject domain, grammar coverage, and
lexicon size.
I have taken the approach of comparing the present results with those which have used a
similar scale of quality, namely those from the corpus evaluations of Nagao et al. (1988:175),
which involved 1,300 sentences from scientific abstracts, translated from Japanese into
English, using an example-based MT system. A comparison of their results and the
present ones is given in Table 7.3.
Score   ILs cum. %   Nagao's cum. %   Nagao's No.
1           19             24             318
2           59             54             381
3           66             83             384
4           69             95             162
5          100             98              35
Other                     100              20
Table 7.3: Intelligibility percentages compared: for all sentences.
Since the present approach is rule based, and therefore does not have fail-soft behaviour,
a sentence either translates completely or not at all. This is not the case in Nagao's
system, which will produce at least incomplete translations for almost all inputs. This
can be appreciated from the smoother tail-off in intelligibility exhibited by Nagao's
system. For this reason, a second list of cumulative percentages, shown in Table 7.4, was
calculated including only those sentences which were translated by the present system,
matched against Nagao et al.'s translations which achieved a score of 4 or better.
Score   ILs cum. %   Nagao's cum. %
1           27             26
2           86             56
3           96             87
4          100            100
Table 7.4: Intelligibility percentages compared: up to 4.
For these sentences (Table 7.4) the quality of the TL sentences through ILs is markedly
better than Nagao et al.'s. A simple χ² test shows that the difference between the IL values
and Nagao's is significant at the 0.005 level, meaning that it is safe to assume that the difference
between the two sets of values is not due to chance (for these calculations, expected IL
results were assumed to be proportional to Nagao's; also, categories 3 and 4 had to be
combined for both sets to avoid bad approximation to a χ² distribution).
Turning now to accuracy, the cumulative percentages for ILs and Nagao et al.'s system
are given in Tables 7.5 and 7.6; these results show a partially different picture.
Score   ILs cum. %   Nagao's cum. %   Nagao's No.
0           21             19             257
1           42             41             275
2           52             60             248
3           59             73             174
4           69             88             190
5           69             92              61
6          100             97              66
Other                     100              29
Table 7.5: Accuracy percentages compared: for all sentences.
Score   ILs cum. %   Nagao's cum. %
0           31             21
1           62             44
2           76             65
3           86             79
4          100             94
5          100            100
Table 7.6: Accuracy percentages compared: up to 5.
Whilst cumulative percentages for all sentences display a similar pattern to those for
intelligibility, those for translations with accuracy of 5 or better (i.e. for sentences actually
translated using ILs) are roughly equivalent. In fact, a χ² test similar to that for intelligibility shows that the difference between the two sets is not significant even at 0.05; in
other words, the differences are likely to be due to chance.
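The two χ² comparisons reported in this section can be approximated with a short sketch. It is a reconstruction under stated assumptions rather than the original calculation: the observed IL counts are taken from the No. columns of Tables 7.1 and 7.2 (translated sentences only), the reference counts from the Nagao's No. columns of Tables 7.3 and 7.5, and expected IL counts are made proportional to Nagao's. Categories 3 and 4 are combined for intelligibility as described above; the text does not say whether any accuracy categories were combined, so all six are kept here. The function name chi_square is illustrative, and the critical values are the standard tabulated ones.

```python
def chi_square(observed, reference):
    """Pearson chi-square statistic with expected counts proportional to reference."""
    total_obs = sum(observed)
    total_ref = sum(reference)
    expected = [total_obs * r / total_ref for r in reference]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Intelligibility: scores 1, 2, and 3+4 combined (Tables 7.1 and 7.3).
il_intelligibility = [8, 17, 3 + 1]
nagao_intelligibility = [318, 381, 384 + 162]

# Accuracy: scores 0 to 5, no categories combined (Tables 7.2 and 7.5).
il_accuracy = [9, 9, 4, 3, 4, 0]
nagao_accuracy = [257, 275, 248, 174, 190, 61]

# Standard chi-square critical values.
CRITICAL_DF2_ALPHA_0005 = 10.597   # 2 degrees of freedom, alpha = 0.005
CRITICAL_DF5_ALPHA_005 = 11.070    # 5 degrees of freedom, alpha = 0.05

stat_intelligibility = chi_square(il_intelligibility, nagao_intelligibility)
stat_accuracy = chi_square(il_accuracy, nagao_accuracy)

print(round(stat_intelligibility, 2), stat_intelligibility > CRITICAL_DF2_ALPHA_0005)
print(round(stat_accuracy, 2), stat_accuracy > CRITICAL_DF5_ALPHA_005)
```

Under these assumptions the intelligibility statistic exceeds the 0.005 critical value while the accuracy statistic stays below the 0.05 critical value, which agrees with the conclusions stated in the text. Several expected accuracy counts are small, so the approximation to a χ² distribution is rough for that comparison.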
Thus as a preliminary conclusion one can say that while the intelligibility of the Spanish
is high, its accuracy is comparable to that achieved with Nagao et al.'s system, which may
be taken as acceptable.
Before ending this section I will briefly compare the above results with those obtained
for the KANT system, as presented by Nyberg et al. (1994:98). Their results are for a
system under development for the translation of heavy equipment documentation from
English into French, and may be considered as the best automatically produced translations that can be achieved at present. A corpus of 608 sentences chosen at random was
used as input to KANT; 546 sentences were translated, of which 491 received correct or
acceptable translations. Nyberg et al.'s definitions of correct and acceptable are (p. 97):
Correct The output sentence is completely correct; it preserves the meaning of the input sentence
completely, is understandable without difficulty, and does not violate any rules of grammar.
Acceptable The sentence is complete and easily understandable, but is not completely grammatical or
violates some SGML tagging convention.
Thus one may equate such correct and acceptable sentences with sentences which
achieved scores of 1 or 2 for intelligibility and 0 or 1 for accuracy. There were 18 sentences
with such a combined average score in the evaluation presented here. A comparison of
their results with mine is given in Table 7.7.
                         ILs   Nyberg et al.
% of all sentences        43        81
% of trans. sentences     62        90
Table 7.7: Percentage of correct or acceptable translations for two systems.
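The figures in Table 7.7 follow from the counts given in the text, as the short check below shows. The helper name pct and the rounding to the nearest whole percent are assumptions made for this sketch; the counts themselves (18 of 42 submitted and 29 translated sentences for ILs, 491 of 608 input and 546 translated sentences for KANT) are those reported above.

```python
def pct(part, whole):
    """Percentage of whole represented by part, rounded to the nearest integer."""
    return round(100 * part / whole)


# ILs: 18 correct or acceptable sentences out of 42 submitted, 29 translated.
ils = (pct(18, 42), pct(18, 29))

# KANT (Nyberg et al. 1994): 491 correct or acceptable out of 608 input, 546 translated.
kant = (pct(491, 608), pct(491, 546))

print(ils, kant)
```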
These results indicate that much work is needed to increase the coverage of the English
and Spanish grammars and to improve the disambiguation of TL verbs, nouns, adjectives
and adverbs. The next section considers this and related issues.
7.5 Scaling
It is generally agreed that the transition from a prototype MT system to a system which can
be useful in real applications is an arduous one, and that it usually involves unpredictable
difficulties and forces compromises in the theoretical rigour of a design. It is therefore
relevant to consider the ease with which the present system can be scaled up to handle a
wider range of texts.
While the only way of being certain that a system can scale up is by actually scaling
it up, some indication of the difficulty in doing this may be obtained through indirect
means. The use of test suites in this respect has already been noted, as has the proposal
by Gambäck et al. (1991) for measuring compositionality. Due to the restricted number
of phenomena considered in this thesis, it was decided to probe scalability by counting the
changes to the grammar rules, lexical categories, lexical entries, lexical rules, bilingual
entries and bilexical rules necessary for achieving average scores of 2 or less for intelligibility
and of 1 or less for accuracy for all 42 sentences used in the experiment above; this includes
extending the grammars and lexicons to actually translate the 13 sentences which failed
in the original evaluation (see Appendix D). A smaller test on unseen sentences was then
performed to estimate the generality of the changes thus made.
The number of additions and modifications is given in Table 7.5. The table distinguishes between lexical categories (e.g. adjectives) and entries (e.g. little).
                          Modifications   Additions
Eng. PS rules                   1             9
Spa. PS rules                   9             6
Eng. lexical categories         1             1
Spa. lexical categories         4             6
Eng. lexical entries            1             9
Spa. lexical entries            3            13
Eng. lexical rules              1             5 (2 morph)
Spa. lexical rules              1
Bilexical entries               2            26
Bilexical rules                 8             5
Also, in order
to improve efficiency, a context matching mechanism was implemented, which allows several translation equivalences to be expressed more easily:
Eng: I drink on the plane to sleep
Spa: Bebo en el avión para dormir
The pattern which allows the highlighted translation is:
$\mathrm{to}_{e,x,f}\ (\mathrm{V}_{f,x}) \Leftrightarrow \mathit{para}_{\ ,\ ,f}\ (\mathrm{V}_{f,x})$
The IL within parentheses is used to provide the appropriate index bindings to `to' and
para but it is not actually translated; only `to' is translated by this pattern. Other bilexical
entries are used to translate the verb. This mechanism is of an experimental nature, and
certainly requires further elaboration for applicability to a wider range of cases. The actual
number of such patterns is included in the count for the bilexical entries.
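The context matching idea can be illustrated with a minimal sketch. It assumes ILs are (lexeme, category, indices) triples; the `IL` type, the category labels, the function name `to_as_para` and the indices given to `on the plane' are hypothetical, and the two index slots that are blank in the printed pattern are simply left unbound. It is not the implementation used in the thesis.

```python
from typing import NamedTuple, Optional

class IL(NamedTuple):
    lexeme: str
    category: str      # hypothetical coarse category label, e.g. "V" or "P"
    indices: tuple     # e.g. ("e", "x", "f")

def to_as_para(source: list) -> Optional[IL]:
    """Return `para` for `to` when a context verb V_{f,x} is present.

    The verb only supplies index bindings; it is left for other
    bilexical entries to translate.
    """
    for il in source:
        if il.lexeme != "to" or len(il.indices) != 3:
            continue
        _, x, f = il.indices
        has_context = any(
            ctx.category == "V" and ctx.indices[:2] == (f, x)
            for ctx in source
        )
        if has_context:
            # Slots left blank in the printed pattern stay unbound (None).
            return IL("para", "P", (None, None, f))
    return None

sentence = [
    IL("I", "Pron", ("x",)),
    IL("drink", "V", ("e", "x")),
    IL("on", "P", ("e", "p")),
    IL("the", "Det", ("p",)),
    IL("plane", "N", ("p",)),
    IL("to", "P", ("e", "x", "f")),
    IL("sleep", "V", ("f", "x")),
]
print(to_as_para(sentence))
```

Only `to' is rewritten; `sleep' is consulted for its indices and then left untouched, mirroring the description above.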
Further to these additions, other heuristics were incorporated to improve translation
quality. Two of these heuristics disambiguated analyses produced by the parser: one
allowed the specification of tree patterns which gave preference to certain SL syntactic
configurations, while the other identified preferred readings for particular lexemes. For
example, a tree pattern used was:
(e-p-lex-entry_1 (time_1 (to_2 *)))
This pattern gave preference to analyses of the form `they marched in time to the crowd',
where the `to' PP must be a modifier of `time' rather than of `march'. An example of a
preferred reading heuristic is one which selects dynamic readings of a preposition when
used in the context of movement verbs:
Eng: the goose chased the cat through the park.
Trans: el ganso persiguió el gato por el parque.
(The static reading would be paraphrased as `the goose chased the cat on the other side of
the park'). The actual mechanism to achieve this simply inspected the TFS of prepositions
in all the analyses of a sentence, and gave preference to those with a dynamic spatial
relation modifying a movement event.
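A minimal sketch of such a preference, assuming each analysis exposes its prepositions as small records; the dictionary keys `relation` and `modifies` are hypothetical stand-ins for the corresponding TFS features:

```python
def rank_analyses(analyses):
    """Stable sort: analyses with a dynamic preposition on a movement event first."""
    def preferred(analysis):
        return any(
            p["relation"] == "dynamic" and p["modifies"] == "movement"
            for p in analysis["prepositions"]
        )
    # False sorts before True, so preferred analyses come first.
    return sorted(analyses, key=lambda a: not preferred(a))

analyses = [
    {"id": "static",
     "prepositions": [{"lexeme": "through", "relation": "static",
                       "modifies": "movement"}]},
    {"id": "dynamic",
     "prepositions": [{"lexeme": "through", "relation": "dynamic",
                       "modifies": "movement"}]},
]
best = rank_analyses(analyses)[0]
```

Because the sort is stable, analyses that the heuristic does not distinguish keep the order in which the parser produced them.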
A third heuristic was included during TL bag selection (i.e. after transfer and before
generation) which gave preference to the co-occurrence of semantically compatible classes
of lexical entries. For example, when translating:
History is the hinge of time
`hinge' may translate as eje (axis) or gozne (hinge). However, the occurrence of eje in
close proximity to tiempo (time) was preferred to that of gozne (hinge); this is because eje
can have abstract readings, while gozne generally cannot. The algorithm inspected pairs
of lexical TFS, searching for manually specified compatibility classes. For instance, the
occurrence of a concrete noun and an artifact (see Section 6.3.2) increases the likelihood
of a TL bag being an appropriate translation. Proximity is measured using the notion
of connectedness (see Section 2.2.3): directly connected lexemes are deemed closest, thus
making a larger contribution to the total value of the semantic correlation heuristic.
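The scoring can be sketched as follows, under loudly stated assumptions: the compatibility classes, the use of 1/distance as the contribution of a pair, and the bag representation (lexeme mapped to classes and indices) are all illustrative choices, not values taken from the system.

```python
from collections import deque

# Hypothetical compatibility classes; the system's own are specified manually.
COMPATIBLE = {frozenset({"abstract"})}   # abstract co-occurring with abstract

def distance(bag, start, goal):
    """Number of index-sharing steps between two lexemes (breadth-first search)."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        current, steps = queue.popleft()
        if current == goal:
            return steps
        for other in bag:
            if other not in seen and set(bag[current][1]) & set(bag[other][1]):
                seen.add(other)
                queue.append((other, steps + 1))
    return None

def correlation(bag):
    """Sum 1/distance over pairs of lexemes with compatible classes."""
    total, names = 0.0, list(bag)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            compatible = any(
                frozenset({ca, cb}) in COMPATIBLE
                for ca in bag[a][0] for cb in bag[b][0]
            )
            steps = distance(bag, a, b)
            if compatible and steps:
                total += 1.0 / steps
    return total

# lexeme -> (semantic classes, indices)
eje_bag = {"eje": ({"abstract", "concrete"}, ("a",)),
           "de": (set(), ("a", "b")),
           "tiempo": ({"abstract"}, ("b",))}
gozne_bag = {"gozne": ({"concrete"}, ("a",)),
             "de": (set(), ("a", "b")),
             "tiempo": ({"abstract"}, ("b",))}
```

With these toy classes the bag containing eje scores higher than the one containing gozne, because eje and tiempo are compatible and linked through de.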
To test the generality of these changes, a further 20 sentences (see Appendix E) were
collected, translated, and scored in the manner already described. 15 sentences were parsed
and 13 translated (no additions to the system were made during scalability evaluation).
The average intelligibility score for translated sentences was 1.5, while their accuracy
was 1.2. In other words, while the quality of the Spanish sentences produced remained
high, the meaning of the original English sentences was not conveyed very accurately:
intelligibility remains at an average of 2 or better (1.5 in fact), whereas accuracy falls from
an average of 1 or better to 1.2 for unseen sentences. To maintain accuracy would require
a more powerful disambiguation mechanism, and a larger idioms database. Thus, as in
most MT systems, a large amount of knowledge is required for high quality translations,
but unlike in many such systems, the grammaticality of the TL sentences remains acceptable
after expansion.
7.6 Relationship to Other Translation Problems
While disambiguation is probably the most pervasive problem in MT, there are other
issues which require consideration if improvements in quality are to be achieved in the
long term. Among these problems one may identify text categorization and subject area
identification, anaphora resolution, discourse reasoning, scoping and ellipsis resolution.
A possible starting point for anaphora resolution in the IL framework was presented in
Section 2.3.13; however, as stated there this is a difficult problem which requires elaborate
semantic representations for its solution. In the area of Information Retrieval, techniques
are being developed for automatic subject area identification (Apte et al. 1994) which
may be useful for disambiguation; however, the integration of such techniques into the IL
framework needs to be investigated, particularly as it concerns the integration of symbolic
and numerical text categories, and the application of such categories to disambiguation.
A possible discourse reasoning framework applicable to the translation of spatial prepositions was introduced in Section 6.4.9; incorporation into the IL approach would involve
considerable work in order to increase its coverage and applicability to naturally occurring sentences. Scoping in a framework compatible with the IL approach is described by
Copestake et al. (1995), in which the idea of Minimal Recursion Semantics is proposed; the
basic strategy of that approach is that of introducing extra variables in order to capture
scoping distinctions, in a manner similar to the use of labels by Frank and Reyle (1995).
This technique could be incorporated into the IL lists by creating additional indices, one
for each scoping domain, and using these indices during transfer in order to achieve correct
TL scoping.
I will now describe a possible approach to ellipsis resolution in the IL framework, based
on the proposal for the interpretation of elliptical constructions of Dalrymple et al. (1991).
This discussion, however, should not be taken as indicative of the adequacy of the IL list
framework for dealing with ellipsis; instead, its goal is to show that, given a situation in
which ellipsis resolution is called upon for correct translation, the mechanism described
below could be used or augmented.
Dalrymple et al. (1991) use a restricted form of second order unification to compute a
function based on the complete constituent (the source clause) in an elliptical construction,
which is then applied to the parallel elements in the constituent with the elided material (the target clause) in order to arrive at a semantic interpretation for the complete
expression. For example, to derive the logical form of:
Dan likes golf, and George does too.
The equation:
$P(\mathit{dan}) = \mathit{like}(\underline{\mathit{dan}}, \mathit{golf})$
is solved for P. Here, the underlined element indicates that it is a primary occurrence. A
primary occurrence of an element in a semantic form is one which is directly associated
with an element in the source clause (op. cit. §2.5). The solutions for P in this equation are:
$P \mapsto \lambda x.\mathit{like}(\underline{\mathit{dan}}, \mathit{golf})$
$P \mapsto \lambda x.\mathit{like}(x, \mathit{golf})$
An imposed restriction on possible solutions is that they do not contain primary occurrences; hence, only the second function is selected as a solution. This function is applied
to the parallel element `george' to construct the interpretation of the target clause, which
is finally combined with the semantics of the source clause to give:
$\mathit{like}(\mathit{dan}, \mathit{golf}) \wedge \mathit{like}(\mathit{george}, \mathit{golf})$
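The derivation can be mimicked with a small sketch. It is a deliberate simplification: terms are flat (a predicate with atomic arguments), so enumerating the subsets of occurrences to abstract stands in for the restricted second order matching of Dalrymple et al.; the function name `solve` and the `primary` parameter are hypothetical.

```python
from itertools import product

def solve(pred, args, element, primary):
    """Solve P(element) = pred(*args) for P over a flat term.

    Each solution abstracts some subset of the occurrences of `element`;
    solutions that leave a primary occurrence in place are discarded.
    `primary` is the set of argument positions holding primary occurrences.
    """
    positions = [i for i, arg in enumerate(args) if arg == element]
    solutions = []
    for mask in product([False, True], repeat=len(positions)):
        abstracted = {p for p, chosen in zip(positions, mask) if chosen}
        if not primary <= abstracted:
            continue  # a primary occurrence would survive in the body
        def P(x, abstracted=frozenset(abstracted)):
            return (pred, tuple(x if i in abstracted else a
                                for i, a in enumerate(args)))
        solutions.append(P)
    return solutions

source = ("like", ("dan", "golf"))
(P,) = solve("like", ("dan", "golf"), "dan", primary={0})
target = P("george")
logical_form = ("and", source, target)
```

Of the two candidate abstractions, the vacuous one is rejected because it retains the primary occurrence of `dan', leaving the single solution that is then applied to `george'.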
It is clear that this approach to ellipsis resolution relies on the identification of parallel
and primary elements; this problem is rather complex and, as Dalrymple et al. argue,
involves more than a consideration of the syntactic structure of the sentence. A proper
evaluation of this issue is beyond the scope of this thesis. I will also be ignoring problems
caused by quantifier scoping, sloppy versus strict readings, cascaded ellipsis, etc. Instead,
I will simply describe a mechanism for tackling ellipsis resolution modelled on the one just
presented; I adopt a crude solution to the identification of parallel and primary elements in
order to illustrate the mechanism.
Consider the following translations:
Eng: Dan likes golf, and George does too.
Spa. 1: A Dan le gusta el golf, y a George también.
Spa. 2: A Dan le gusta el golf, y a George también le gusta.
Spa. 3: * A Dan le gusta el golf, y a George también le gusta el golf.
Here, 1 is similar to the original English sentence, while 2 and 3 have elided material
restored. Although the best translation here is 1, translation 2 is acceptable; however, 3
is ungrammatical. For expository purposes, assume that we wish to obtain translations 1
and 2, but not 3.
Furthermore, assume that parallel elements are identified syntactically; that is, grammar rules will mark the subjects of the conjuncts as parallel:
S ⇒ S[subj-parallel = +, ellided = none] Conj S[subj-parallel = +, ellided = vp]
S[subj-parallel = X, ellided = Y] ⇒ NP[parallel = X] VP[ellided = Y]
VP[ellided = vp] ⇒ [does, too]
Here, the value of the feature `subj-parallel' is intended to percolate downwards to the
lexical entries comprising the subject NP. Obviously these rules are oversimplified but
they will serve to demonstrate the main idea. Assume that after analysis, the following
IL representation is constructed:
$[\mathrm{Dan}_a, \mathrm{likes}_{b,a,c}, \mathrm{golf}_c, \mathrm{and}_{b,d}, \mathrm{George}_e, \mathrm{does}_{d,e}, \mathrm{too}_d]$
with `Dan' identified as the primary element and `George' as its parallel element. Application of the conjunction rule triggers a mechanism analogous to ellipsis resolution, but
operating over ILs. The equation to solve would be:
$P(\mathrm{Dan}_a) = [\mathrm{Dan}_a, \mathrm{likes}_{b,a,c}, \mathrm{golf}_c]$
Now, instead of applying second order matching, primary occurrences of an element are
replaced by variables ranging over ILs. In addition, bindings between indices are preserved:
$P \mapsto \lambda X_x\, [X_x, \mathrm{likes}_{z,x,y}, \mathrm{golf}_y]$
(This expression is simply a convenient notation for the process at hand; no model-theoretic
semantic interpretation is being attributed to the expression.)
Application of P to `George$_e$' results in the reconstructed IL list:
1) $[\mathrm{George}_x, \mathrm{likes}_{z,x,y}, \mathrm{golf}_y]$
while the original IL list, less the parallel element `George$_e$', is:
2) $[\mathrm{Dan}_a, \mathrm{likes}_{b,a,c}, \mathrm{golf}_c, \mathrm{and}_{b,d}, \mathrm{does}_{d,e}, \mathrm{too}_d]$
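The abstraction and application steps can be sketched over IL lists as follows. ILs are assumed to be (lexeme, indices) pairs, index renaming is done by priming each index (where the text uses fresh letters), and the boolean flag marking reconstructed material is a hypothetical encoding of the marking that transfer would inherit; the mechanism itself has not been implemented.

```python
def abstract(source, primary):
    """Build P from the source ILs: the primary lexeme becomes a variable
    and every index is renamed consistently, so bindings are preserved."""
    def P(lexeme):
        return [
            (lexeme if il == primary else il[0],
             tuple(i + "'" for i in il[1]),
             il != primary)          # True marks reconstructed material
            for il in source
        ]
    return P

il_list = [("Dan", ("a",)), ("likes", ("b", "a", "c")), ("golf", ("c",)),
           ("and", ("b", "d")), ("George", ("e",)), ("does", ("d", "e")),
           ("too", ("d",))]
primary, parallel = il_list[0], il_list[4]

P = abstract(il_list[:3], primary)
reconstructed = P(parallel[0])                       # list 1)
remaining = [il for il in il_list if il != parallel]  # list 2)
```

Here `reconstructed` corresponds to list 1), with `likes' and `golf' flagged as not originating from the input, and `remaining` corresponds to list 2).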
During transfer both lists 1) and 2) are used, but ILs not present in the original sentence
(that is, `likes' and `golf' in list 1) are marked as such; this marking is inherited by their
TL translations, reflecting the fact that these ILs arise from material not present in the
original input. In order to generate all and only the grammatical Spanish translations
given above, the success condition in the generation algorithm is slightly modified to allow
as valid sentences those which do not include all, or even any, of the marked lexemes just
mentioned. For example, one sentence generated will be:
A Dan le gusta el golf, y a George también with remainder {le, gusta, el, golf}
Since the remainder elements would be marked as not having originated from original
English lexemes but from reconstructed ones, the Spanish sentence would be deemed
valid. Similarly, the sentence
A Dan le gusta el golf, y a George también le gusta with remainder {el, golf}
would also be output. Finally, the ungrammatical
Spa. 3: * A Dan le gusta el golf, y a George también le gusta el golf.
would need to be disallowed by the Spanish grammar, and therefore not constructed. The
grammar could reject such sentences by using the fact that también generally cannot occur
without elided material.
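The modified success condition can be sketched as a check on the remainder of the TL bag. The representation of lexemes as (word, reconstructed) pairs and the function name `is_valid_output` are assumptions made for illustration; grammaticality is taken to be decided elsewhere by the TL grammar.

```python
from collections import Counter

def is_valid_output(bag, used):
    """Accept a generated sentence if every unused lexeme is marked.

    `bag` and `used` are lists of (word, reconstructed) pairs.
    """
    remainder = Counter(bag) - Counter(used)
    overuse = Counter(used) - Counter(bag)   # lexemes not available in the bag
    return not overuse and all(marked for (_, marked) in remainder)

original = [(w, False) for w in
            "a Dan le gusta el golf y a George también".split()]
restored = [(w, True) for w in "le gusta el golf".split()]
bag = original + restored

translation_1 = original
translation_2 = original + [("le", True), ("gusta", True)]
```

Note that this condition alone would also admit a sentence that consumes the whole bag; as stated above, excluding translation 3 is left to the Spanish grammar.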
It is worth stressing that this mechanism has not been implemented, and that it is
therefore difficult to see whether it would be useful in practice. It could be that some
translations require ellipsis resolution which can only be achieved with a fully developed
semantic theory.
7.7 General Problems
The main problems identified through the above evaluation were:
Coverage of the grammars is very narrow.
Disambiguation of nouns, verbs, adjectives and adverbs needs refining.
Certain inadequacies of the relations hierarchy were identified.
In this section I present possible directions for further work in order to tackle these problems. However, I should emphasize that the points made below do not constitute full
solutions, and that addressing each problem adequately would require a whole thesis devoted to each.
7.7.1 Grammar Coverage
A survey of wide coverage syntactic analysis systems is given by Black (1993). Black
reports Brill (1993) as achieving 29% accurate syntactic analyses of 500 unseen sentences
of up to 25 words in length, with a grammar automatically learned from 250 sentences.
For a large rule based system he reports 41% accuracy for grammatical analyses of 100
unseen sentences from the Wall Street Journal (WSJ). Statistical approaches reach 78%
parsing accuracy on 1,100 sentences after training on 28,000 sentences drawn from the
WSJ.
From the point of view of the present work it may be possible to adapt these techniques
for use in IL based translation. For instance, following Black (1993), the analysis produced
by Brill's system for the sentence `the boy saw the girl' looks like this:
[Sentence
[Noun Phrase The_ARTICLE boy_NOUN]
[Verb Phrase saw_VERB
[Noun Phrase the_ARTICLE girl_NOUN]]]
Automatic assignment of indices could exploit general properties of nouns, verbs, adjectives, adverbs and prepositions. For example, one can assume that the noun and the
article within the same noun phrase are co-indexed, and that verbs have an index for
each of their subcategorized constituents. Further, from the connectivity constraint, sister
constituents need to be connected, leading to co-indexation between a subject and its VP.
For the above example, indexing might result in:
[Sentence
[Noun Phrase The_ARTICLE_{a} boy_NOUN_{a}]
[Verb Phrase saw_VERB_{e,a,b}
the_ARTICLE_{b} girl_NOUN_{b}]]
From such a tree, an IL list could be extracted and used for transfer.
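A minimal sketch of this extraction, restricted to the transitive configuration of the example; the nested-tuple tree encoding, the fixed index names and the function name `extract_il_list` are assumptions, and nothing here addresses control, unbounded dependencies or anaphora.

```python
def extract_il_list(sentence):
    """Index a [Sentence [NP ...] [VP V [NP ...]]] tree and flatten it."""
    _, (subject, verb_phrase) = sentence
    _, (verb, obj) = verb_phrase
    subj_index, obj_index, event_index = "a", "b", "e"
    # Words inside the same noun phrase are co-indexed.
    il_list = [(word, (subj_index,)) for word in subject[1]]
    # The verb carries an event index plus one index per argument.
    il_list.append((verb, (event_index, subj_index, obj_index)))
    il_list += [(word, (obj_index,)) for word in obj[1]]
    return il_list

tree = ("Sentence", [
    ("Noun Phrase", ["The_ARTICLE", "boy_NOUN"]),
    ("Verb Phrase", ["saw_VERB",
                     ("Noun Phrase", ["the_ARTICLE", "girl_NOUN"])]),
])
il_list = extract_il_list(tree)
```

The resulting list pairs each tagged word with its indices in surface order, which is the shape an IL list takes before transfer.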
Obviously, there are many difficulties with this proposal, not least that of establishing general rules of co-indexation between lexemes; it is unlikely that the above mechanisms alone are sufficient for establishing the necessary bindings, especially for phenomena
that extend beyond local trees such as control, unbounded dependencies and pronominal
anaphora. An alternative approach would be to adapt the probabilistic content derivation
models of Alshawi (1994) and use them to construct IL representations.
7.7.2 Disambiguation of Nouns, Verbs, Adjectives and Adverbs
The disambiguation of verbs, nouns and other content words could take place by expanding
the Qualia structures discussed in Section 6.3.2. Indeed, during evaluation, these structures
proved useful for a number of simple cases. Consider the following translations:
Eng: Mary broke her arm.
Mary broke the chair.
Spa: María se rompió el brazo.
María rompió la silla.
In the context of a body part, the verb `break' translates as the ditransitive romperse
(Fontenelle et al. 1994) whereas for other objects, its translation is romper. There are a
number of ways of tackling this phenomenon, but they all must rely to a greater or lesser
extent on the marking of brazo (arm) as a body part. This type of information can be
naturally encoded in the Qualia structure.
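As a minimal illustration, assuming the Qualia structure exposes a body-part marking (the `body_part` feature name and the `QUALIA` table are hypothetical), verb selection could consult it as follows:

```python
# Hypothetical Qualia fragments; `body_part` is an illustrative feature name.
QUALIA = {
    "arm": {"body_part": True},
    "chair": {"body_part": False},
}

def translate_break(object_noun):
    """Choose the Spanish verb for `break' from the object's Qualia structure."""
    if QUALIA.get(object_noun, {}).get("body_part", False):
        return "romperse"
    return "romper"
```

Nouns without an entry default to romper, reflecting the fact that the body-part reading is the marked case.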
However, problems arise when assigning Qualia structures to nouns. In the case of
body parts, the situation may be relatively clear, but for other types of information such
as Telic roles, exact values are difficult to determine. For example, distinguishing between
purpose and function in the telic role requires careful argumentation. Even assuming
that the distinction is one of volition (e.g. one has intended purposes but not intended
functions) it is not clear what counts as a purpose: is the purpose of a spider's web-weaving
to catch flies, or to prevent it from starving, or to preserve the species? They all seem
equally valid purposes; their difference is one of perspective. Perhaps the obvious purpose
is to catch flies, but how could this be shown to be the correct answer? Would such a
purpose be useful during semantic interpretation or translation? These are all issues that
need to be addressed, but that must remain as further work.
7.7.3 Inadequacies in the Spatial Relations Hierarchy
Finally, some inadequacies of the spatial relations hierarchy were identied. Below, the
starred translations are incorrectly predicted by the hierarchy.
Eng: 1) John connected them across the resistor.
Spa: 1') Juan los conectó *al otro lado de/a cada lado del reóstato.
Eng: 2) We are different across Europe.
Spa: 2') Somos diferentes *al otro lado de/a través de/por toda Europa.
Eng: 3) The office is along the corridor.
Spa: 3') La oficina está *al otro lado de/a lo largo del pasillo.
The general problem here is that, since `across' and `along' modify static verbs, the prepositions incorrectly receive path-end readings. In the case of 1) the correct interpretation
seems to depend strongly on the semantics of the modiee; for example, in the area of
electronics the word `across' appears to have a particular meaning:
The initial voltage across the resistor.
Since a voltage is defined with reference to two points, the expression describes a situation
paraphrasable as `the initial voltage between one side of the resistor and the other'. Thus,
it seems that the spatial interpretation of such phrases is linked to the semantics of the
modified verb or noun. Further investigation is required to ascertain the frequency of such
readings of `across' and the conditions under which they are appropriate.
Sentence 2) may be paraphrased as `we are different {in the whole of/all over} Europe'.
This sense of `across' is found in expressions such as:
There was soup across the table.
John spread paint across the wall.
The first example involves an existential sentence, while the second example is reminiscent
of locative alternation verbs (Rappaport et al. 1993); these facts may again suggest a verb
induced interpretation for `across'. However, the precise behaviour of such interpretations
has yet to be identified.
Finally, in 3) the interpretation is one where the office is at some point along the corridor. However, there is no such relation in the spatial relations hierarchy, thus highlighting
an important inadequacy. To redress this omission, one could proceed as follows. Assume
the label path-middle for this spatial relation, which indicates a location somewhere in
a path except at one of its ends. In Section 5.3.2 it was argued that prepositions such
as `across' in `the shop across the street' are static prepositions, but neither internal nor
external, and that for this reason they required a separate node in the hierarchy, namely
`path-end'. One property of path-end relations is that they allow explicit specification of
the origin of their reference path:
The post office is across the road from here.
Such modifications are allowed by path-middle prepositions too:
The office is along the corridor from here.
Thus, in order to incorporate path-middle relations into the hierarchy, one could create a
node which subsumes both path-end and path-middle and whose distinguishing property
was modication by `from here'; call this node path-referent. Then, both path-end and
path-middle could be subtypes of this single type, capturing a useful generalization.
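The proposed extension can be pictured with a small sketch in which Python classes stand in for types in the hierarchy; the class names and the `allows_from_origin` attribute are illustrative assumptions, and the actual hierarchy is expressed with typed feature structures rather than classes.

```python
class SpatialRelation:
    allows_from_origin = False   # can the reference path's origin be stated?

class PathReferent(SpatialRelation):
    allows_from_origin = True    # distinguishing property: `from here'

class PathEnd(PathReferent):     # `the shop across the street'
    pass

class PathMiddle(PathReferent):  # `the office along the corridor'
    pass

def accepts_from_here(relation):
    """True if a preposition with this relation can take a `from here' modifier."""
    return relation.allows_from_origin
```

Both subtypes inherit the property from the shared supertype, which is the generalization the new node is meant to capture.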
7.8 Conclusion
In this chapter, an evaluation of the IL system was performed in order to test its applicability to a wider range of real texts. The evaluation consisted of two main experiments:
one tested the adequacy of the framework in handling more complex constructions than
those presented in the rest of the thesis, while the other indirectly examined the system's
potential for scalability. A preliminary result is that ILs seem to achieve good intelligibility, which may be attributable to the generation algorithm constructing sentences fully
constrained by the TL grammar. Accuracy, however, appears to suffer due to the impoverished semantic structures adopted. These results seem to be confirmed by the scalability
experiment, which shows that after increasing the coverage of the system, intelligibility
remains high, while accuracy is slightly diminished.
Only a limited number of translation problems could be tackled in the evaluation,
and therefore many issues were left unresolved. However, the potential for integrating
the present approach with the solutions to other problems in NLP was investigated by
adapting an ellipsis resolution technique to the IL framework and applying it to simple
translations. While for simple examples the strategy presented may be appropriate, the
complexity of the syntactic and semantic issues involved and the need for complex transfer
relations may require a more sophisticated semantic representation for use during transfer.
Various areas for future research became apparent during the evaluation, in particular, grammar coverage, verb and noun disambiguation and completeness of the spatial
relations hierarchy. Possible lines of inquiry were suggested for these problems, including
adapting wide coverage grammar techniques, enhancing Qualia structures and extending
the relations hierarchy.
On a more general level, it became apparent during evaluation that adequate comparisons between MT systems are difficult (or at the very least expensive) to perform within
the type of time span and resources available for the present type of project. This observation agrees with that of Jordan et al. (1993), who found the comparison of MT systems an
expensive and, in the case of research systems, not completely satisfactory process, even
though for their evaluation a team of researchers was employed to work on a dedicated
evaluation project. Part of this difficulty stems from the lack of generally agreed evaluation measures which can be meaningfully related to what is expected in MT. Even within
the restricted objective of translation quality adopted in this thesis, preferences vary regarding the appropriate scale to use. Furthermore, while specialized test suite corpora are
being collected for the purpose of evaluating NLP systems (Nerbonne et al. 1993; Neal
et al. 1993), their application to MT is not straightforward because such test suites do not
necessarily probe issues involved in translation. Thus, Nerbonne et al. (1993:91) restrict
the vocabulary of their German test suite in order to concentrate on testing syntactic
phenomena; clearly, such a suite would not be adequate for testing TL disambiguation.
Using corpora of texts, such as those obtainable through the Linguistic Data Consortium
(LDC), for MT evaluations is not easy either, mainly because of the large effort involved in
developing appropriate monolingual and transfer components, and because of the difficulty
in assessing the intelligibility and accuracy of the translations produced.
Finally, there is as yet no general forum comparable to the Message Understanding
Conference (MUC) or the Text REtrieval Conference (TREC) in which to objectively
evaluate and compare the performance of MT systems; perhaps this is a reection on
the diculty in accurately evaluating MT output. Yet, the eld would certainly benet
from an agreed infrastructure and set of standards, benchmarks and data similar to those
adopted in tasks such as text retrieval; however, these are not currently available in any
coherent form.
Chapter 8
Conclusion
In this chapter I will highlight the key issues covered by the thesis, respond to some possible
criticisms, and indicate a few directions in which the present system may be extended.
8.1 Key Ideas
Two of the main contributions made by this thesis are: a) a proposal for a transfer representation which is largely independent of monolingual syntactic descriptions and which can
tackle a range of problems described in the MT literature; b) a classification of the spatial
relations expressed by the prepositions in a language which is linguistically motivated and
shown to be useful for MT.
A Strongly Lexicalist Representation
Indexed lexeme lists, or IL lists for short, constitute the transfer representation. They are
strongly lexicalist and have a linear structure which makes them largely independent from
monolingual syntactic analyses; that is, because IL lists are minimally recursive, problems
of head switching, discontinuous translation units and intricate structural mismatches are
avoided. IL lists consist of lexical signs augmented with indices corresponding to variables
in an event based approach to semantics.
The representation is complemented with modified tlink-rules called bilexical rules. IL
lists in combination with bilexical rules can handle: argument switching, lexical gaps, lexicalization patterns, head switching, word order differences and idiomatic expressions; all
within a single and unied framework, even in cases in which the individual monolingual
analyses are widely divergent. In addition, bilexical rules are, by their very nature, capable of expressing regularities in the bilingual lexicon, not only in the domain of spatial
relations, but also in other semantic elds.
A Cross-linguistically Valid Spatial Hierarchy
The spatial relations hierarchy is used to assign each preposition a relation in the form of
an additional index in its indexed lexeme. Based on this relation, many of the properties
of the preposition can be inferred. A variety of tests are used to motivate the spatial
hierarchy, thus bringing out the insights of previous theories of spatial meaning and representation. These tests are shown to have analogues in other languages and to lead to
similar hierarchies in the respective spatial domains. When deciding on the translation of
a preposition, the relation hierarchy is used to select appropriate TL equivalents; if this is
not possible, a lexical gap is present in either language and the appropriate bilexical entry
will have a set of lexemes on the side of the language with the gap.
Correspondence between the spatial hierarchies was good, but there were also important differences in the spatial structure of the languages considered. However, these
differences were adequately described using the hierarchy; for example, the lack of an
`on/in' distinction in Spanish extends to other prepositions such as `across' and `through',
for which no equivalent expressions exist. This is explained by the structure of these relations whose TFSs contain the values `on' and `in' respectively, thereby implying that a
similar distinction cannot be made in Spanish. These differences have their root in the
lexicalized relations, i.e. `at', `in' and `on', whose interpretation is highly dependent on
the particular language and on the noun which they modify.
8.2 Principal Characteristics
A number of other characteristics and solutions either follow directly from, or are naturally
associated with these two main ideas.
Translation consists of three steps: 1) parsing, 2) transfer based solely on the bilexicon, and 3) bag generation. Since the transfer algorithm operates over structures very
similar to the surface form of a sentence, few problems arise with respect to the development and extension of linguistic analyses and semantic representations. As long as the
surface forms of different languages remain translationally equivalent, the bilingual lexicon will be able to establish a correspondence between source and target sentences. This
also implies that the semantic structures and mechanisms of different systems need not
necessarily coincide for translation to take place.
Almost no assumptions about the English grammar are made by the corresponding
Spanish grammar. In particular, the number of indices and their position within the
IL lists is quite distinct for translationally equivalent expressions, including PPs. Such
discrepancies are overcome in the bilexicon, where dissimilarities in the two languages are
reconciled by using index sharing within a bilexical entry. Instantiation of these indices
to constants ensures that during generation only values explicitly shared during analysis
or transfer are unied.
Some of the novelties of the Spanish grammar developed in Chapter 3 are: a treatment
of relative clauses in which subject and object relative clauses are handled uniformly; a
deterministic implementation of clitic climbing and doubling which accounts for all and
only the relevant data; a refinement of previous approaches to subject `pro-drop' which
encodes the subject's θ-role as a separate feature in the lexical entry of the verb. The
rules proposed to implement this grammar are summarized in Figure 3.5, while the data
covered by those rules is shown in Table 3.1.
Having an additional index as the referent of a preposition leads to a compositional
derivation of the IL representation of stacked prepositions, simplifies the analysis of prepositional
modifiers and acts as an index to which spatial pronouns such as `there' may be
bound. The extra index is also motivated on monolingual grounds, principally because its
structure as a set of space-time pairs simplifies the interpretation of sentences containing
spatio-temporal expressions (e.g. the dynamic relations).
Locative types, whose value is a lexicalized relation, imply a classication of nouns
which is directly related to the data and not to notions such as idealizations and conceptualizations, for which empirical consequences are difficult to assess. Thus, the relatively high
co-occurrence of `in' and `armchair' suggests `in' as the locative type for this noun. Certain
nouns such as `bus' were classified as vague in the sense of Copestake and Briscoe (forthcoming); this vagueness was treated as underspecification of its locative type. The fact
that British English prefers `in the bus' to `on the bus' was considered a matter of style
which, it was suggested, is best treated at the application specific level. This vagueness,
in conjunction with the distinction between locative type reading and literal reading of
`on the bus' accounts for the three aspects of the `bus/car' problem.
Exploiting the notions of adjacency derived from Brew (1992), and of connectivity
and reachability of ILs led to considerable improvements in the heuristics for pruning
the search space of the generator. Areas such as adjectival and adverbial modification,
for which Brew's algorithm was not general enough, were covered. In applying Brew's
technique to phrase structure grammars, an algorithm which operated over CF rules was
extended to handle feature-based grammars. The distinguishing features of this adaptation
are the use of pairs of TFSs to encode the value of the functions FIRST and FOLLOW,
and the use of an addition operation which is sensitive to subsumption relations.
Disambiguation of spatial prepositions took place during generation using TL filtering. This increased the modularity of the system by separating monolingual from bilingual
information. Furthermore, it was shown how information from different sources was necessary for selecting appropriate preposition translations. However, it was also noticed that
in the absence of contextual and other types of information, certain ambiguities could not
be resolved, even by human translators.
An evaluation of the system's scalability and applicability to real data was performed.
One preliminary conclusion was that TL quality is good, but that accuracy suffered after
scaling because of the impoverished knowledge structures. Expansion of the system by
enlarging the grammars, lexicons, idioms database and semantic classication of content
words should lead to increased performance.
8.3 Objections
There are a number of objections and issues that may be raised regarding this work and
which may have remained unanswered in the rest of the thesis. I will consider some of the
most important ones.
Lack of Interpretation for ILs
The IL framework does not have an interpretation procedure for the order of ILs.
The main answer to this is that the IL representation was deliberately kept as theory
neutral as possible; by preserving the surface order of a sentence and the lexical entries
that it contains no information from the input is lost, and hence any semantic theory can
be invoked in order to construct an interpretation. Thus, an interpretation mechanism may
discard ILs in the input such as relativizers and case markers, but since all input lexemes
are available in an IL list, interpretation will be possible. In other words, a semantic
interpretation of a sentence can be extracted from an IL list, whereas the converse is not
always the case. That an interpretation can be specified follows from the assumption
that a semantic theory will have a lexicon from which semantic interpretations are built
through a composition procedure based on the order of the input lexemes. The main
consequence of adopting a fully specified interpretation procedure would be a change in
the index allocation of lexemes, as different theories may assume different argument and
predicate structures from that implicit in the analyses given in this thesis.
Semantic interpretation should increase the quality of the translations produced with
the IL lists alone, but the need for interpretation may be skewed should this prove adequate
for a particular application or language pair. For example, it may be the case that Human
Assisted Machine Translation applications can forgo semantic inference, or that Catalan-Spanish translation may proceed efficiently without formal semantic interpretation.
No Independently Motivated Argument Structure
The argument structures on which indexes are assigned are motivated monolingually, causing mismatches at the level of the bilexicon.
While independently motivated argument structures are clearly a desirable objective, in
the IL framework I have not assumed that this can be achieved for all predicates. In Section 6.4.7 I gave an example of how subcategorization was difficult to determine accurately.
Furthermore, languages differ as to their arguments for verbs with equivalent meaning:
Eng: I broke your arm.
Spa: Te rompí el brazo.
Lit: I broke you the arm.
Here, the English verb has two arguments, while the Spanish verb has three. Under such
circumstances, modularity is best preserved if argument structure is motivated monolingually, and if differences in argument structure are overcome in the bilexicon. The
appropriate bilexical entry for the previous example might look like this:
$\mathit{break}_{e,x,y}\ (\mathit{Pos}_{z,y}) \;\leftrightarrow\; \mathit{romper}_{e,x,z,y}\ (\mathit{Pro}_{z})$
where `Pos' matches a possessive pronoun, while `Pro' matches a pronoun (see Section 7.5
for explanation of this type of pattern). Similar patterns can be used to overcome other
differences in the argument structure of predicates in SL and TL.
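The index sharing in such an entry can be sketched in a few lines. This is an illustration only: the `Lexeme` class, the dictionary layout and the functions `bind` and `instantiate` are hypothetical, the labels `Pos` and `Pro` are treated as plain category names rather than as the patterns of Section 7.5, and source lexemes are matched positionally instead of as an unordered bag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    word: str
    indices: tuple  # index names such as "e", "x", "y"


# Hypothetical encoding of the break/romper entry with shared index variables.
BREAK_ROMPER = {
    "source": [Lexeme("break", ("e", "x", "y")), Lexeme("Pos", ("z", "y"))],
    "target": [Lexeme("romper", ("e", "x", "z", "y")), Lexeme("Pro", ("z",))],
}


def bind(entry, source_lexemes):
    """Bind the entry's index variables to the indices found in the input.

    Returns None if a word differs or a variable would need two values.
    """
    bindings = {}
    for pattern, lexeme in zip(entry["source"], source_lexemes):
        if pattern.word != lexeme.word or len(pattern.indices) != len(lexeme.indices):
            return None
        for var, value in zip(pattern.indices, lexeme.indices):
            if bindings.setdefault(var, value) != value:
                return None
    return bindings


def instantiate(entry, bindings):
    """Build the target lexemes with the bound indices substituted in."""
    return [
        Lexeme(lex.word, tuple(bindings[var] for var in lex.indices))
        for lex in entry["target"]
    ]


# "I broke your arm": event e1, breaker x1, arm y1, possessor z1.
source = [Lexeme("break", ("e1", "x1", "y1")), Lexeme("Pos", ("z1", "y1"))]
bindings = bind(BREAK_ROMPER, source)
target = instantiate(BREAK_ROMPER, bindings)
```

In this sketch the possessor index `z1` of the English possessive reappears as the third argument of `romper` and as the index of the Spanish pronoun, so the difference in argument structure is handled entirely inside the bilexical entry.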
Insufficient Cross-linguistic Generalizations
The IL list representation does not express cross-linguistic generalizations, and the spatial
relations hierarchy has not been applied to a sufficient number of languages.
The former criticism is only valid because the thesis has only considered the spatial domain, and has ignored other phenomena such as negation, tense and aspect, animacy
and causation, all of which may reveal further cross-linguistic generalizations. Such generalizations could be captured through the mechanisms of bilexical rules
or translation patterns, and would result in further reductions in the size of the bilexicon
(see Section 2.4 for an example of a rule that captures a transfer generalization involving
English causatives). This approach has the advantage over Interlingua systems of allowing gradual investigation of systematic lexical differences between languages by identifying
those generalizations which are useful to translation.
As far as the spatial hierarchy is concerned, its applicability to more languages is
certainly an important issue that needs to be addressed. Unfortunately, considering more
languages would have required a disproportionate amount of effort given the present limitations. Nevertheless, the application of the hierarchy to Hungarian showed that the spatial
system of languages outside the Indo-European family could be successfully understood
through the hierarchy.
Multi-predicate Lexemes
How are lexemes with more than one predicate handled in the IL framework?
In response to this criticism I will consider the analysis of `stab' below adapted from
Dorr (1994:614):
cause(X,G) & go(G,K,T) & knife-wound(K) & toward(T,K,Y)
In this expression, X and Y are the subject and object respectively of `stab'. The analysis
is used in an interlingua MT system and it is intended to capture useful generalizations
about the semantics of this verb. This type of analysis can be accommodated in the
IL approach by including in the bilexicon the lexical entry for `stab' with its indices for
subject and object only, and its multi-lexeme Spanish translation:
$\mathit{stab}_{x,y} \;\leftrightarrow\; \mathit{le}_{y},\ \mathit{dar}_{x,k,y},\ \mathit{puñaladas}_{k},\ \mathit{a}_{y}$
Under this approach, the internal structure of `stab' would be effectively ignored and
translation would depend purely on lexical relationships. The consequence of this is that
some generality is lost because the lexeme puñaladas (knife-wounds) has to be specified
explicitly in the translation of `stab', rather than originating independently through the
`knife-wound' predicate in the internal representation. One way of avoiding this redundancy would be to allow the internal predicate `knife-wound' to give rise to the Spanish
lexeme puñaladas and vice versa; however, this would complicate the transfer algorithm
and the specification of bilingual correspondences. Another approach would be to effect all
transfer at the level of internal predicates using a representation similar to that described
by Copestake et al. (1995). These two options would need to be investigated.
8.4 Future Research
Although improvements were achieved by using connectivity checks, further work is required to alleviate inefficiencies which occur during generation. An important problem
arises with modifiers, such as PPs and relative clauses, because of their unrestricted
adjacency values which give rise to large numbers of adjacency edges in the constraint
propagation graph. It was suggested that this problem may be tackled by exploiting the
connectedness requirement in IL lists in order to detect many impossible subparses early.
However, in order to make full use of this property, some form of precompilation would
be necessary in order to determine connected relationships between phrasal and lexical
entries.
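As an illustration of how a connectedness requirement could rule out subparses early, the sketch below treats a group of lexemes as connected when every lexeme can be reached from every other through shared indices. This reading of connectedness, the function name `is_connected` and the toy index assignments are assumptions made for the example, not the definitions used in the thesis.

```python
def is_connected(lexemes):
    """Check whether lexemes form one component under index sharing.

    `lexemes` is a list of (word, set_of_indices) pairs. Two lexemes are
    linked if they share at least one index.
    """
    if not lexemes:
        return True
    reached = {0}
    frontier = [0]
    while frontier:
        current = frontier.pop()
        for other, (_, indices) in enumerate(lexemes):
            if other not in reached and lexemes[current][1] & indices:
                reached.add(other)
                frontier.append(other)
    return len(reached) == len(lexemes)


# Toy indices for "the dog chased the cat".
sentence = [
    ("the", {"x"}), ("dog", {"x"}),
    ("chased", {"e", "x", "y"}),
    ("the", {"y"}), ("cat", {"y"}),
]
# A candidate constituent built from `dog' and `cat' alone shares no index.
fragment = [("dog", {"x"}), ("cat", {"y"})]

sentence_ok = is_connected(sentence)
fragment_ok = is_connected(fragment)
```

A generator could apply a test of this kind before attempting to combine a set of edges, discarding combinations such as `fragment` whose members cannot be linked; precompiling which phrasal and lexical entries can be connected would avoid repeating the graph search at run time.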
Ways of incorporating the results of automatically acquired bilingual terminology and
phraseology need to be investigated. Using IL representations should ease the incorporation of these results into a rule based system, because of the close correspondence between the bags in the bilexicon and the type of bilingual data obtained with corpus based
techniques. Semi-automatic incorporation could be achieved by automatically proposing
source and target bags and then manually assigning them a bilexical entry type in which
the relevant index bindings were made. This approach need not require the semantic
structure of the acquired pair of strings to be made explicit. Instead, the two strings
need only be substitutionally equivalent in the sense that replacing one by the other in
the TL should result in a grammatical sentence that preserves translationally relevant
meaning (e.g. literal meaning, register, subject field, pragmatic and logical entailments,
connotations, frequency, naturalness, etc.).
The modularity of the IL approach should be tested by developing grammars for languages other than English and Spanish, using the most relevant syntactic or semantic
theories, and then incorporating the language into the system by writing the appropriate
bilexicons. In addition, the description of the language should involve an analysis of its
spatial system in order to verify its resemblance to the spatial hierarchy constructed here.
A longer term goal would be to elaborate the classification into a theory by postulating a
model (i.e. some form of mathematical structure denotation) which accounts for all and
only the data that arises from the various tests suggested for each spatial relation.
Selectional restrictions were used to effect TL disambiguation. However, a number
of contemporary approaches to this problem use statistical techniques based on the use
of corpora. As I have argued in this thesis, certain cases of ambiguity do not appear
amenable to a knowledge based solution; for example, the selections between `on/in the bus'
and `on/in the road' are good candidates for statistical disambiguation within a KB system.
On the other hand, a well-motivated model of spatial knowledge will supply the kind of
initial structure which current statistical techniques cannot arrive at, but which they could
further refine in order to adapt a system to a particular application or sublanguage.
A number of further lines of inquiry were identified in Sections 7.6 and 7.7. However,
those proposals need to be tried and tested on actually occurring data and real translation
problems before their feasibility can be determined. For example, one should identify
the variety of ellipsis resolution problems described in the literature (Crouch 1995) and
then consider whether these could only be resolved within a complete semantic theory, or
whether an adequate solution could be implemented based on the IL framework.
Appendix A
Sentences for Development
0 the passing broke_down inside their own half
1 I went silently along the passage
2 it lives under the surface
3 we got near someone
4 they lived at Moleigh near Oban
5 they crowd into congested areas
6 the anode and cathode were connected across a resistor
7 the men tear across the sand
8 he took the hat off the chair
9 she slipped off her shoes
10 the bombers roared over Katanga
11 she got into the ricksha
12 he gets_in through an open door
13 the planes fly over parts of Europe
14 I felt inside his jacket
15 someone slipped into Lord_Moynihan 1s box
16 he flung his arm possessively across her
17 time existed only inside the thing carried
18 she pulled them over her ankles
19 the Church marches through our secular world
20 that gay picnic at The_Bell at Edmonton
21 the event was held at the Scotch_Corner_Hotel
22 large numbers of geese assembled at the head of the great estuary
23 she is now studying history at Cambridge
24 the debutantes 1s ball at the Palace_of_Versailles
25 a thin trickle of moisture at the corner of Jesty 1s lips
26 she is at the clinic
27 the contingent in the Panama region
28 it moved too rapidly in the south
29 it is found in different parts of the country
30 fatal accidents in the home rose
31 communions in South_America
32 he invites her for a drink in a Knightsbridge pub
33 it is awfully hard to walk in the water
34 a businessman in the North_of_England
35 a frame of certain length on each side
36 solid acetylene on oxygen evaporator tubes
37 it swung quietly on sleek well-oiled hinges
38 she wrote it on a slip of paper
39 his teeth were kicked_in by the dancers on that spot
40 I went_out on a ship
41 expensive site subsidies on ordinary land in certain areas
42 lizards are roasted on the point of a spear
43 an order form is on that page
Appendix B
Questionnaire for Sentence Construction
In order to test my English-Spanish machine translation system, I would
be most grateful if you could construct 5 to 10 sentences, not too long,
using at least ONE PREPOSITION and other words from the lists below. If
possible, the sentence should indicate the position or locations of
someone or something.
Example 1: they lived at Moleigh near Oban
Example 2: he took the hat off the chair
****************************************************************************
****************************************************************************
NOUNS
=====
accident acetylene ankle anode area arm, ball (dance), bomber boy box
businessman Cambridge cat cathode chair Church clinic communion
contingent corner country crowd dancer debutantes dog door drink
Edmonton estuary evaporator event Europe frame goose half hat head
hinge history home jacket Jesty John Katanga Knightsbridge land length
lip lizard Lord_Moynihan man Mary Moleigh North_of_England number
Oban, order (i.e. order form), oxygen page Palace_of_Versailles Panama
paper park part passage passing picnic plane point pub region resistor
ricksha sand Scotch_Corner_Hotel ship side site shoe, slip (or paper),
someone south South_America spear spot station subsidy surface
The_Bell thing time tooth trickle tube water woman world
VERBS
=====
arrive assemble are were was is break break_down carry chase connect
exist feel find fling fly get get_in give go go_out hold invite
kick_in live love march move pull rise roar roast sleep slip study
swing take tear think walk write
ADJECTIVES
==========
big brown certain congested different expensive fat fatal fierce, gay
(happy), great happy hard large little moisture noisy open order
ordinary our own oxygen page Palace_of_Versailles open ordinary own
secular sleek solid tame thin well-oiled yellow
ADVERBS
=======
awfully just now obviously only possessively quietly rapidly silently
too
ARTICLES AND OTHER WORDS
========================
a and each our, 's (e.g. John's cat), that the their
PRONOUNS
========
he her his I it she they them we
PREPOSITIONS
============
above across along among at behind below beneath beside by by from for
in in_front_of inside near next_to of on on_top_of over through to
towards under within
Appendix C
Questionnaire for Assessing Translation Quality
In order to evaluate the quality of the output of my English-Spanish
translation system, I would be most grateful if you could score each of the
29 sentences below for intelligibility, as described by each point on the
scale below: 1 = good, 5 = bad.
Intelligibility:
================
1 - The meaning of the sentence is clear, and there are no questions. Grammar,
word usage, and/or style are all appropriate, and no rewriting is needed.
2 - The meaning of the sentence is clear, but there are some problems in
grammar, word usage, and/or style, making the overall quality less than 1.
3 - The basic thrust of the sentence is clear, but you are not sure of some
detailed parts because of grammar and word usage problems. You would
need to look at the original English sentence to clarify the meaning.
4 - The sentence contains many grammatical and word usage problems, and
you can only guess at the meaning after careful study, if at all.
5 - The sentence cannot be understood at all.
-----------------------------------------------------------------------------
la mujer lustrosa de surame1rica amo1 oxi1geno:
el agua camino1 debajo_de la arena:
la cabeza de lagarto fue lubricado:
una humedad peque~a es terriblemente ordinaria:
el barco lubricado en la superficie de el agua estuvo junto_a el
scotch_corner_hotel:
el bailari1n marcho1 en el tiempo a la muchedumbre:
las debutantes lubricada llegaron de surame1rica:
lord_moynihan persiguio1 mari1a a_trave1s_de el norte_de_inglaterra:
juan volo1 a panama en surame1rica:
somos diferente a cada lado de europa:
la convide1 a mi hogar en knightsbridge:
lord_moynihan volo1 a panama en un bombardero ruidoso:
dormi1 en un compartimiento detra1s_de el scotch_corner_hotel:
estuvo en cambridge:
estuvo debajo_de la chaqueta de juan:
volaron a_trave1s_de el estuario:
mari1a rompio1 su brazo cerca_de la cli1nica:
el ganso feroz persiguio1 el gato gordo a el otro lado de el parque:
una muchedumbre de debutantes camino1 hacia el palacio_de_versalles:
vive en un bar detra1s_de el scotch_corner_hotel:
la cosa se movio1 rapidamente debajo_de la superficie de el agua:
encontro1 un gran lagarto amarillo dentro_de el tubo:
su chaqueta de bombardero estuvo sobre la silla:
juan estuvo detra1s_de la puerta:
estuvieron junto_a el a1rea de merienda:
juan estuvo entre la muchedumbre:
el bailari1n estuvo frente_a la iglesia:
el gato marro1n grande estuvo debajo_de la silla:
el perro feroz estuvo a el otro lado de el pasaje:
*******************************************************************************
*******************************************************************************
Now, please consider each of the 29 English sentences below, and score
its machine translated equivalent (the same as above) for accuracy, as
described by each point on the scale below: 0 = good, 6 = bad.
Accuracy:
=========
0 - The content of the English sentence is faithfully conveyed to the Spanish
sentence. The translated sentence is clear to a native speaker of Spanish
and no rewriting is needed.
1 - The content of the English sentence is faithfully conveyed to the Spanish
sentence, and can be clearly understood by a native speaker, but some
rewriting is needed.
2 - The content of the English sentence is faithfully conveyed in the Spanish
sentence, but some changes are needed in word order.
3 - While the content of the English sentence is generally conveyed faithfully
in the Spanish sentence, there are some problems with things like
relationships between phrases and expressions, and with tense, plurals,
and the position of the adverbs. There is some duplication of nouns in the
sentence.
4 - The content of the English sentence is not adequately conveyed in the
Spanish sentence. Some expressions are missing, and there are problems
with the relationships between clauses, between phrases and clauses, or
between sentence elements.
5 - The content of the English sentence is not conveyed in the Spanish
sentence.
6 - The content of the input sentence is not conveyed at all. The output is not
a proper sentence; subjects and predicates are missing.
-----------------------------------------------------------------------------
The sleek woman from South_America loved oxygen
la mujer lustrosa de surame1rica amo1 oxi1geno:
The water went beneath the sand
el agua camino1 debajo_de la arena:
The lizard head was well-oiled
la cabeza de lagarto fue lubricado:
A little moisture is awfully ordinary
una humedad peque~a es terriblemente ordinaria:
The well-oiled ship on the surface of the water was next_to the
Scotch_Corner_Hotel
el barco lubricado en la superficie de el agua estuvo junto_a el
scotch_corner_hotel:
The dancer marched in time to the crowd
el bailari1n marcho1 en el tiempo a la muchedumbre:
The well-oiled debutantes arrived from South_America
las debutantes lubricada llegaron de surame1rica:
Lord_Moynihan chased Mary across the North_of_England
lord_moynihan persiguio1 mari1a a_trave1s_de el norte_de_inglaterra:
John flew to Panama in South_America
juan volo1 a panama en surame1rica:
We are different across Europe
somos diferente a cada lado de europa:
I invited her to my home in Knightsbridge
la convide1 a mi hogar en knightsbridge:
Lord_Moynihan flew to Panama in a noisy bomber
lord_moynihan volo1 a panama en un bombardero ruidoso:
I slept in a box behind the Scotch_Corner_Hotel
dormi1 en un compartimiento detra1s_de el scotch_corner_hotel:
She was in Cambridge
estuvo en cambridge:
It was under John 1s jacket
estuvo debajo_de la chaqueta de juan:
They flew across the estuary
volaron a_trave1s_de el estuario:
Mary broke her arm near the clinic
mari1a rompio1 su brazo cerca_de la cli1nica:
The fierce goose chased the fat cat through the park
el ganso feroz persiguio1 el gato gordo a el otro lado de el parque:
A crowd of debutantes walked towards the Palace_of_Versailles
una muchedumbre de debutantes camino1 hacia el palacio_de_versalles:
She lives in a pub behind the Scotch_Corner_Hotel
vive en un bar detra1s_de el scotch_corner_hotel:
The thing moved rapidly beneath the water 1s surface
la cosa se movio1 rapidamente debajo_de la superficie de el agua:
He found a large yellow lizard inside the tube
encontro1 un gran lagarto amarillo dentro_de el tubo:
His bomber jacket was on_top_of the chair
su chaqueta de bombardero estuvo sobre la silla:
John was behind the door
juan estuvo detra1s_de la puerta:
They were beside the picnic area
estuvieron junto_a el a1rea de merienda:
John was among the crowd
juan estuvo entre la muchedumbre:
The dancer was in_front_of the church
el bailari1n estuvo frente_a la iglesia:
The big brown cat was under the chair
el gato marro1n grande estuvo debajo_de la silla:
The fierce dog was along the passage
el perro feroz estuvo a el otro lado de el pasaje:
Appendix D
Unanalysed Sentences (for Testing Scalability)
I feel different on_top_of the church
It takes oxygen to roast someone in time
History is the hinge of time on this world
Obviously the dancer flew behind the noisy tube of water
I was invited to Panama by Lord_Moynihan
Mary took the water from John to drink it
Go to the pub for a picnic
I drink on the plane only to sleep
Mary flew to Panama by accident
I quietly slipped in among the debutantes
South_America is just south of Panama
We took communion at the church in Cambridge
John and Mary went inside Cambridge station
Appendix E
Testing Scaled System: Input/Output
in: across the region the crowd march in time
out: NIL
in: the Palace_of_Versaille rose above the plane
out: NIL
in: I fly rapidly down towards the land 1s surface
out: NIL
in: each page invites someone to write on_top_of the paper
out: NIL
in: a woman lives in a man 1s world
out:
una mujer vive en el mundo de un hombre
in: too little too late is only the passing of time
out: NIL
in: the North_of_England is only an area of land
out:
so1lo el norte de Inglaterra es un a1rea de tierra
in: the water rapidly rose above the surface of the plane
out:
el agua aumento1 rapidamente encima de la superficie del avio1n
in: man exists only for a certain thing
out: NIL
in: oxygen and acetylene roast little dogs rapidly for the picnic
out:
oxi1geno y el acetileno asan perros peque~os rapidamente por la merienda
in: the businessman arrived in Cambridge from the North_of_England
out:
el hombre de negocios llego1 en Cambridge del norte de Inglaterra
in: the dog chased the cat in_front_of the church
out:
el perro persiguio1 el gato por delante de la iglesia
in: John 1s cat loves to sleep on_top_of the box
out: NIL
in: john found his jacket under the chair
out:
Juan encontro1 su chaqueta debajo de la silla
in: Mary walked towards the church
out:
Mari1a camino1 hacia la iglesia
in: the crowd marched towards the Palace_of_Versailles
out:
la muchedumbre marcho1 hacia el palacio de Versalles
in: the plane flew from Knightsbridge to Panama
out:
el avio1n volo1 de Knightsbridge hasta Panama1
in: the woman 1s shoe slipped under the water
out:
el zapato de la mujer se metio1 debajo del agua
in: the clinic is beside the church
out:
la cli1nica esta1 junto a la iglesia
in: the dancer moved among the crowd
out:
el bailari1n se movio1 entre la muchedumbre
Bibliography
Agnäs, M.-S., Alshawi, H., Bretan, I., Carter, D. M., Ceder, K., Collins, M., Crouch, R., Digalakis, V.,
Ekholm, B., Gambäck, B., Kaja, J., Karlgren, J., Lyberg, B., Price, P., Pulman, S., Samuelsson, M.
R. C., and Svensson, T. (1994). Spoken language translator: First year report. Technical report,
SRI/SICS, Menlo Park, CA, Cambridge, UK and Stockholm, Sweden.
Aho, A. V., Sethi, R., and Ullman, J. D. (1986). Compilers: Principles, Techniques, and Tools.
Addison-Wesley, Reading, MA.
Allegranza, V., Bennett, P., Durand, J., van Eynde, F., Humphreys, L., Schmidt, P., and Steiner, E.
(1991). Linguistics for machine translation: The Eurotra linguistic specifications. In Copeland
et al. (1991), pp. 15–125.
Alshawi, H., ed. (1992). The Core Language Engine. MIT Press, Cambridge, MA.
Alshawi, H. (1994). Qualitative and quantitative models of speech translation. In ACL Workshop `The
Balancing Act - Combining Symbolic and Statistical Approaches to Language', Las Cruces, NM. cmp-lg/9408014.
Alshawi, H., Carter, D., Gambäck, B., and Rayner, M. (1992). Swedish-English QLF translation. In
Alshawi (1992), ch. 14, pp. 277–309.
Aone, C. and McKee, D. (1993). A language-independent anaphora resolution system for understanding
multilingual texts. In Proceedings of the 31st Annual Meeting of the Association for Computational
Linguistics, pp. 156–63, Columbus, Ohio.
Appelo, L., Fellinger, C., and Landsbergen, J. (1987). Subgrammars, rule classes and control in the
Rosetta translation system. In Proceedings of the Third European Conference of the ACL, pp. 118–33,
Copenhagen, Denmark.
Apte, C., Damerau, F., and Weiss, S. M. (1994). Towards language independent automated learning of
text categorization models. In Proceedings of the 17th Annual International ACM-SIGIR Conference
on Research and Development in Information Retrieval, pp. 23–30, Dublin, Ireland. Springer.
Arnold, D., Balkan, L., Humphreys, R. L., Meijer, S., and Sadler, L. (1994). Machine Translation: An
Introductory Guide. NCC and Oxford Blackwell, Oxford.
Arnold, D. and des Tombe, L. (1987). Basic theory and methodology in Eurotra. In Nirenburg (1987),
ch. 7, pp. 114–35.
Arnold, D., Moffat, D., Sadler, L., and Way, A. (1993a). Automatic test suite generation. Machine
Translation, 8 pp. 29–38. Special Issue on Evaluation.
Arnold, D., Sadler, L., and Humphreys, L. (1993b). Evaluation: An assessment. Machine Translation, 8
pp. 1–24. Special Issue on Evaluation.
Arnold, D. J., Krauwer, S., Rosner, M., des Tombe, L., and Varile, G. B. (1986). The <C, A>, T
framework in Eurotra: A theoretically committed notation for MT. In Proceedings of COLING '86,
pp. 297–303, Bonn, Germany.
Arthur, C. and Ginever, I. (1909). Hungarian Grammar. Kegan Paul, Trench and Trubner, London.
Asher, N. (1993). Reference to Abstract Objects in Discourse. Kluwer Academic Publishers, Dordrecht,
The Netherlands.
Asher, N. and Sablayrolles, P. (forthcoming). A typology and discourse semantics for motion verbs and
spatial PPs in French. Journal of Semantics.
Aske, J. (1989). Path predicates in English and Spanish: A closer look. In Proceedings of the Annual
Meeting of the Berkeley Linguistics Society 15th, pp. 1–14. Berkeley Linguistics Society, Berkeley,
CA.
Badia, T., Durand, J., and Reuther, U. (1990). Modification and semantic relations - legislation. In
The Eurotra Reference Manual, pp. 1–28. Commission of the European Communities, Luxembourg.
Section B.I.3.7.a.
Balari, S. (1991). Information-based linguistics and head-driven phrase structure. In Filgueiras, M.,
Damas, L., Moreira, N., and Tomas, A. P., eds., Natural Language Processing, Lecture Notes in
Computer Science, pp. 55–102. Springer, Berlin. EAIA-90, 2nd Advanced School in Artificial Intelligence.
Balari, S. (1992). Sujetos nulos en HPSG. In Vide, C. M., ed., Proceedings of the VII Congreso de
Lenguajes Naturales y Lenguajes Formales, pp. 279–86, Barcelona, Spain.
Bar-Hillel, Y., Gaifman, C., and Shamir, E. (1960). On categorial and phrase structure grammars. Bulletin
of the Research Council of Israel, Section F: Mathematics and Physics, 9F(1) pp. 1–16. Continued
as Israel Journal of Mathematics.
Barwise, J. and Perry, J. (1983). Situations and Attitudes. A Bradford Book, The MIT Press, Cambridge,
MA.
Beaven, J. L. (1990). A unification based treatment of Spanish clitics. In Engdahl, E., Reape, M., Mellor,
M., and Cooper, R., eds., Parametric Variation in Germanic and Romance, vol. 6 of Edinburgh
Working Papers in Cognitive Science, pp. 43–64. University of Edinburgh, Centre for Cognitive
Science.
Beaven, J. L. (1992a). Lexicalist Unification Based Machine Translation. PhD thesis, Department of
Artificial Intelligence, University of Edinburgh, Edinburgh, UK.
Beaven, J. L. (1992b). Shake-and-Bake machine translation. In Proceedings of COLING '92, pp. 602–09,
Nantes, France.
Beaven, J. L. and Whitelock, P. (1988). Machine translation using isomorphic UCGs. In Proceedings of
COLING '88, pp. 32–35, Budapest, Hungary.
Bech, A. and Nygaard, A. (1988). The E-Framework: A formalism for natural language processing. In
Proceedings of COLING '88, pp. 36–39, Budapest, Hungary.
Bennett, D. C. (1975). Spatial and Temporal Uses of English Prepositions - An Essay in Stratificational
Semantics. Library of Linguistics. Longman, London.
Bennett, M. (1976). A variation and extension of a Montague fragment of English. In Partee (1976a), pp.
119–63.
Bennett, W. S. and Slocum, J. (1988). The LRC machine translation system. In Slocum (1988), pp.
111–40.
Black, E. (1993). Parsing English by computer: The state of the art. In Proceedings of the International
Symposium on Spoken Dialogue, pp. 77–81, Kyoto, Japan.
Bloom, P., Peterson, M., Nadel, L., and Garrett, M., eds. (forthcoming). Language and Space. MIT Press,
Cambridge, MA.
Borsley, R. D. (1987). Subjects and complements in HPSG. Technical Report CSLI-87-107, Center for
the Study of Language and Information, Stanford, CA.
Bowerman, M. (1989). Learning a semantic system - what role do cognitive predispositions play. In Rice
and Schiefelbusch (1989), ch. 4, pp. 133–69.
Bowerman, M. (forthcoming). Learning how to structure space for language – a crosslinguistic perspective.
In Bloom et al. (forthcoming).
Bresnan, J., ed. (1982). The Mental Representation of Grammatical Relations. MIT Press Series on
Cognitive Theory and Mental Representations. The MIT Press, Cambridge, MA.
Brew, C. (1992). Letting the cat out of the bag: Generation for Shake-and-Bake MT. In Proceedings of
COLING '92, pp. 610–16, Nantes, France.
Brill, E. (1993). Automatic grammar induction and parsing free text: A transformation based approach.
In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pp.
259–65, Columbus, Ohio.
Briscoe, E., Copestake, A., and Boguraev, B. (1990). Enjoy the paper: Lexical semantics via lexicology.
In Proceedings of COLING '90, pp. 42–47, Helsinki, Finland.
Briscoe, E., Copestake, A., and de Paiva, V., eds. (1993). Inheritance, Defaults and the Lexicon. Cambridge
University Press, Cambridge, UK.
Brown, P. F., Cocke, J., Pietra, S. A. D., Pietra, V. J. D., Jelinek, F., Lafferty, J. D., Mercer, R. L., and
Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics,
16(2) pp. 79–85.
Brown, P. F., Pietra, S. A. D., Pietra, V. J. D., Lafferty, J. D., and Mercer, R. L. (1992). Analysis,
statistical transfer, and synthesis in machine translation. In Proceedings of the Fourth TMI, pp.
83–100, Montreal, Canada.
Brown, P. F., Pietra, S. A. D., Pietra, V. J. D., and Mercer, R. L. (1993). The mathematics of statistical
machine translation. Computational Linguistics, 19(2) pp. 263–312.
Butt, J. and Benjamin, C. (1994). A New Reference Grammar of Modern Spanish. Edward Arnold,
London.
Calder, J., Reape, M., and Zeevat, H. (1989). An algorithm for generation in unification categorial
grammar. In Proceedings of the Fourth European Conference of the ACL, pp. 233–40, Manchester,
England.
Calvo-Pérez, J. (1991). El problema no resuelto de a + objeto directo en español. Español Actual, 56 pp.
5–21.
Carbonell, J. G., Mitamura, T., and Nyberg, 3rd, E. H. (1992). The KANT perspective: A critique of pure
transfer (and pure interlingua, pure statistics, ...). In Proceedings of the Fourth TMI, pp. 225–35,
Montreal, Canada.
Carpenter, B. (1991). The generative power of categorial grammars and head-driven phrase structure
grammars with lexical rules. Computational Linguistics, 17(3) pp. 301–13.
Carpenter, R. (1992). The Logic of Typed Feature Structures. Tracts in Theoretical Computer Science.
Cambridge University Press, Cambridge, UK.
Carter, D. (1987). Interpreting Anaphors in Natural Language Texts. Series in Artificial Intelligence. Ellis
Horwood, Chichester, UK.
Castañeda, H. (1967). Comments on D. Davidson's "The logical form of action sentences". In
Rescher (1967), ch. III.B, pp. 104–12.
Castel, V. M. (1990). La "inversión" sujeto-verbo en español y el teorema de [slash]. Lingüística Española
Actual, XII(1) pp. 45–60.
Chierchia, G., Partee, B. H., and Turner, R., eds. (1989). Properties, Types and Meaning, vol. II - Semantic
Issues of Studies in Linguistics and Philosophy. Kluwer, Dordrecht, The Netherlands.
Chomsky, N. (1981). Lectures on Government and Binding. Foris Publications, Dordrecht, The Netherlands.
Copeland, C., Durand, J., Krauwer, S., and Maegaard, B., eds. (1991). The Eurotra Linguistic Specification. Studies in Machine Translation and Natural Language Processing. Commission of the European
Community, Luxembourg.
Copestake, A. (1993a). Constraints, tlinks and MT. Technical Report 3.1, ESPRIT BRA-7315 ACQUILEX
II Working Paper, Publishing Division, Cambridge University Press, UK.
Copestake, A. (1993b). Defaults in lexical representation. In Briscoe et al. (1993), ch. 12, pp. 223–45.
Copestake, A. and Briscoe, E. J. (forthcoming). Semi-productive polysemy and sense extension. Journal
of Semantics.
Copestake, A., Flickinger, D., Malouf, R., Riehemann, S., and Sag, I. (1995). Translation using minimal
recursion semantics. CSLI ms.
Copestake, A., Jones, B., Sanfilippo, A., Rodriguez, H., Vossen, P., Montemagni, S., and Marinai, E.
(1992). Multilingual lexical representation. Technical Report 043, ESPRIT BRA-3030 ACQUILEX
Working Paper, Commission of the European Communities, Brussels.
Copestake, A. and Sanfilippo, A. (1993). Multilingual lexical representation. In Dorr, B., ed., Building
Lexicons for Machine Translation, Proceedings of the AAAI Spring Symposium, Stanford, CA.
Copestake, A., Sanfilippo, A., Briscoe, E., and de Paiva, V. (1993). The ACQUILEX LKB: An introduction. In Briscoe et al. (1993), ch. 9, pp. 148–63.
Cresswell, M. J. (1985). Adverbial Modification - Interval Semantics and Its Rivals, vol. 28 of Studies
in Linguistics and Philosophy, ch. IV: Prepositions and Points of View, pp. 97–141. D. Reidel,
Dordrecht, Holland. Reprinted from Linguistics and Philosophy Vol. 2 (1978) pp. 1-41.
Crookston, I. (1990). The E-Framework: Emerging problems. In Proceedings of COLING '90, vol. 2, pp.
66–71, Helsinki, Finland.
Crouch, R. (1995). Ellipsis and quantification: A substitutional approach. In Proceedings of 7th Conference
of the European Chapter of the Association for Computational Linguistics, pp. 229–36, Dublin,
Ireland.
Dahl, V. (1981). Translating Spanish into logic through logic. Computational Linguistics, 7(3) pp. 149–64.
Dalrymple, M., Shieber, S. M., and Pereira, F. C. N. (1991). Ellipsis and higher-order unification. Linguistics and Philosophy, 14(4) pp. 399–452.
Danlos, L. and Samvelian, P. (1992). Translation of the predicative element of a sentence: category
switching, aspect and diathesis. In Proceedings of the Fourth TMI, pp. 21–34, Montreal, Canada.
Davidson, D. (1967). The logical form of action sentences. In Rescher (1967), ch. III, pp. 81–95.
Davidson, D. (1984). Inquiries into Truth and Interpretation, ch. 7 - On Saying That, pp. 93–108.
Clarendon Press, Oxford. Republished from Synthese 19:130-46 (1968-9).
de Carlos, T. and Pountain, C. J. (1993). Locative postnominal phrase modifiers in English and Spanish.
Annual Conference of the Association of Hispanists of Great Britain and Ireland, Liverpool, UK.
de Kock, J. (1992). Corpus y norma académica: A con régimen directo. Lingüística Española Actual,
XIV(1) pp. 69–95.
de Paiva, V. (1993). Types and Constraints in the LKB. In Briscoe et al. (1993), ch. 10, pp. 164–89.
de Saussure, F. (1916). Cours de linguistique générale. Payot, Lausanne and Paris. Reprinted as Course in
General Linguistics, Glasgow, Fontana/Collins (1974).
Devlin, K. (1991). Logic and Information. Cambridge University Press, Cambridge, UK.
Dorr, B. J. (1992). The use of lexical semantics in interlingual machine translation. Machine Translation,
7(3) pp. 135–93.
Dorr, B. J. (1994). Machine translation divergences: A formal description and proposed solution. Computational Linguistics, 20(4) pp. 597–633.
Dowty, D. R. (1979). Word Meaning and Montague Grammar: The Semantics of Verbs and Times in
Generative Semantics and in Montague's PTQ, vol. 7 of Synthese Language Library. D. Reidel,
Dordrecht, The Netherlands.
Dowty, D. R. (1989). On the semantic content of the notion of “thematic role”. In Chierchia et al. (1989),
pp. 69–129.
Dowty, D. R., Wall, R., and Peters, P. S. (1981). Introduction to Montague Semantics. Reidel, Dordrecht,
The Netherlands.
Durand, J. (1992). On the translation of prepositions in multilingual MT. Working Papers in Language
and Linguistics 13, Department of Modern Languages, University of Salford, Salford, UK.
Durand, J., Bennett, P., Allegranza, V., van Eynde, F., Humphreys, L., Schmidt, P., and Steiner, E.
(1991). The Eurotra linguistic specifications: An overview. Machine Translation, 6(2) pp. 103–47.
Earley, J. (1970). An efficient context-free parsing algorithm. Communications of the ACM, 14 pp. 453–60.
Reprinted in Readings in Natural Language Processing (B. J. Grosz, K. Sparck Jones and B. L.
Webber, eds.), pp. 25-33, Morgan Kaufmann, Los Altos, CA, 1986.
Eisele, A. and Dörre, J. (1988). Unification of disjunctive feature descriptions. In Proceedings of the 26th
Annual Conference of the ACL, pp. 286–94, Buffalo, NY.
Emele, M., Heid, U., Momma, S., and Zajac, R. (1992). Interaction between linguistic constraints:
Procedural vs. declarative approaches. Machine Translation, 7(1-2) pp. 61–98.
Erdős, J., Kozma, E., Prileszky, C., and Uhrman, G. (1990). Hungarian in Words and Pictures.
Tankönyvkiadó, Budapest, Hungary.
Estival, D. (1990). ELU: an environment for machine translation. In Proceedings of COLING '90, vol. 3,
pp. 385–87, Helsinki, Finland.
Estival, D., Ballim, A., Russell, G., and Warwick, S. (1990). A syntax and semantics for feature-structure
transfer. In Proceedings of the 3rd TMI, Austin, Texas, US.
Farwell, D. (1992). In defense of rationalist approaches to MT research. In Proceedings of the Fourth
TMI, pp. 185–93, Montreal, Canada.
Fenstad, J. E., Halvorsen, P., Langholm, T., and van Benthem, J. (1987). Situations, Language and Logic.
D. Reidel, Dordrecht, Holland.
Fillmore, C. J. (1971). Some problems for case grammar. In O'Brien (1971), pp. 35–56.
Fontenelle, T., Adriaens, G., and de Braekeleer, G. (1994). The lexical unit in the Metal MT system.
Machine Translation, 9(1) pp. 1–20.
Frank, A. and Reyle, U. (1995). Principle based semantics for HPSG. In Proceedings of 7th Conference of
the European Chapter of the Association for Computational Linguistics, pp. 9–16, Dublin, Ireland.
Gambäck, B., Alshawi, H., Carter, D., and Rayner, M. (1991). Measuring compositionality in transfer-based machine translation systems. In Natural Language Processing Systems Evaluation Workshop,
Griffiss Air Force Base, NY 13441-5700. Rome Laboratory, Air Force Systems Command.
García-Pelayo, R. (1988). Larousse Gran Diccionario Español-Inglés English-Spanish. Larousse, Mexico
DF, Mexico.
Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York.
Gazdar, G., Klein, E., Pullum, G., and Sag, I. (1985). Generalised Phrase Structure Grammar. Blackwell,
Oxford, England.
Gazdar, G. and Mellish, C. (1989). Natural Language Processing in LISP: An Introduction to Computational Linguistics. Addison-Wesley, Wokingham, England.
Grimaud, M. (1988). Toponyms, prepositions and cognitive maps in English and French. Journal of the
American Society of Geolinguistics, 14 pp. 54–76.
Grishman, R. (1986). Computational Linguistics - An Introduction. Studies in Natural Language Processing. Cambridge University Press, Cambridge, UK.
Groenendijk, J., de Jongh, D., and Stokhof, M., eds. (1986). Studies in Discourse Representation Theory
and the Theory of Generalized Quantifiers. Foris, Dordrecht.
Grover, C., Carroll, J., and Briscoe, T. (1993). The Alvey natural language tools grammar. Technical
Report 284, Computer Laboratory, University of Cambridge, UK.
Halvorsen, P.-H. (1983). Semantics for lexical-functional grammar. Linguistic Inquiry, 14(4) pp. 567–615.
Hand, M. (1993). Parataxis and parentheticals. Linguistics and Philosophy, 16(5) pp. 495–507.
Harrison, S. P. and Ellison, T. M. (1992). Restriction and termination in parsing with feature-theoretic
grammars. Computational Linguistics, 18(4) pp. 519–30.
Hawkins, B. W. (1988). The natural category MEDIUM: An alternative to selection restrictions and
similar constructs. In Rudzka-Ostyn (1988), pp. 231–70.
Hays, D. G. (1964). Dependency theory: A formalism and some observations. Language - Journal of the
Linguistic Society of America, 40(4) pp. 511–26.
Herskovits, A. (1986). Language and Spatial Cognition: An Interdisciplinary Study of the Prepositions in
English. Studies in Natural Language Processing. Cambridge University Press, Cambridge, UK.
Hickey, L. (1993). Sustantivo-preposición-sustantivo: secuencia admisible o inadmisible en castellano.
Presented at the Annual Meeting of the Association of Contemporary Iberian Studies, Manchester,
UK.
Hirst, G. (1981). Anaphora in Natural Language Understanding. Number 119 in Lecture Notes in Computer Science. Springer, Berlin.
Hjelmslev, L. (1935). La Catégorie des Cas. Étude de Grammaire Générale. Acta Jutlandica, VII(1).
Hjelmslev, L. (1978). La categoría de los casos - Estudios de gramática general. Number 279 in Biblioteca
románica hispánica. Editorial Gredos, Madrid, Spain. Translated by F. Piñero Torre.
Hobbs, J. (1976). Pronoun resolution. Technical Report 76-1, City College, City University of New York.
Hutchins, W. J. (1986). Machine Translation - Past, Present and Future. Ellis Horwood, Chichester,
England.
Hutchins, W. J. and Somers, H. L. (1992). An Introduction to Machine Translation. Academic Press,
London.
Ikehara, S., Shirai, S., Yokoo, A., and Nakaiwa, H. (1991). Toward an MT system without pre-editing - effects of new methods in ALT-J/E. In Proceedings MT Summit III, pp. 101–06, Washington DC.
Jackendoff, R. (1973). The base rules for prepositional phrases. In Andersson, S. and Kiparsky, P., eds.,
A Festschrift for Morris Halle, pp. 345–56. Holt, Rinehart and Winston, New York.
Jackendoff, R. (1977). X̄ Syntax: A Study of Phrase Structure. Number 2 in Linguistic Inquiry Monographs. MIT Press, Cambridge, MA.
Jackendoff, R. (1983). Semantics and Cognition. The MIT Press, Cambridge, MA.
Jackendoff, R. S. (1990). Semantic Structures. Number 18 in Current Studies in Linguistics. MIT Press,
Cambridge, MA.
Jaeggli, O. A. (1986). Three issues in the theory of clitics: Case, doubled NPs, and extraction. In Borer,
H., ed., The Syntax of Pronominal Clitics - Syntax and Semantics, vol. 19. Academic Press, New
York.
Japkowicz, N. and Wiebe, J. M. (1991). A system for translating locative prepositions from English into
French. In Proceedings of the 29th Annual Conference of the ACL, pp. 153–60, Berkeley, CA.
Johnson, M. (1991). Features and formulae. Computational Linguistics, 17(2) pp. 131–52.
Jordan, P. W., Dorr, B. J., and Benoit, J. W. (1993). A first-pass approach for evaluating machine
translation systems. Machine Translation, 8(1) pp. 49–58. Special Issue on Evaluation.
Kameyama, M., Ochitani, R., and Peters, S. (1991). Resolving translation mismatches with information
flow. In Proceedings 29th Annual Conference of the ACL, pp. 193–200, Berkeley, CA.
Kamp, H. (1981). A theory of truth and semantic interpretation. In Groenendijk, J. A. G., Janssen, T.
M. V., and Stokhof, M. B. J., eds., Formal Methods in the Study of Language, vol. 135, pp. 277–322.
Mathematical Centre Tracts, Amsterdam.
Kaplan, R. M. (1973). A general syntactic processor. In Rustin (1973), pp. 193–241.
Kaplan, R. M. and Bresnan, J. (1982). Lexical-functional grammar: A formal system for grammatical
representation. In Bresnan (1982), ch. 4, pp. 173–281.
Kaplan, R. M., Netter, K., Wedekind, J., and Zaenen, A. (1989). Translation by structural correspondences. In Proceedings of the Fourth European Conference of the ACL, pp. 272–81, Manchester,
UK.
Kaplan, R. M. and Wedekind, J. (1993). Restriction and correspondence-based translation. In Proceedings of the Sixth European Conference of the ACL, pp. 193–202, The Netherlands. OTS, Utrecht
University.
Karp, R. M. (1972). Reducibility among combinatorial problems. In Miller, R. E. and Thatcher, J. W.,
eds., Complexity of Computer Computations, pp. 85–103. Plenum Press, New York.
Kasper, R. T. (1987). A unification method for disjunctive feature descriptions. In Proceedings of the
25th Annual Conference of the ACL, pp. 235–42, Stanford, CA.
Kasper, R. T. and Rounds, W. C. (1986). A logical semantics for feature structures. In Proceedings of
the 24th Annual Conference of the ACL, pp. 257–66, New York, NY.
Katz, J. J. and Fodor, J. A. (1963). The structure of semantic theory. Language, 39 pp. 170–210.
Kawasaki, Z., Yamano, F., and Yamasaki, N. (1992). Translator knowledge base for machine translation
systems. Machine Translation, 6(4) pp. 265–78.
Kay, M. (1973). The MIND system. In Rustin (1973), pp. 155–88.
Kay, M. (1979). Functional grammar. In Chiarello, C., et al., ed., Proceedings of the Fifth Annual Meeting
of the Berkeley Linguistics Society, pp. 142–58.
Kay, M., Gawron, J. M., and Norvig, P. (1994). Verbmobil: A Translation System for Face-to-Face Dialog.
Number 33 in Lecture Notes. Centre for the Study of Language and Information, Stanford, CA.
Keenan, E. L. (1985). Relative clauses. In Shopen (1985b), ch. 3, pp. 141–70.
King, M. (1991). Evaluation of MT systems – Panel discussion. In Proceedings of MT Summit III, pp.
141–46, Washington, DC.
King, M. and Falkedal, K. (1990). Using test suites in evaluation of machine translation systems. In
Proceedings of the 13th COLING, pp. 211–16, Helsinki, Finland.
Kinoshita, S., Phillips, J., and Tsujii, J. (1992). Interaction between structural changes in machine
translation. In Proceedings of COLING '92, vol. II, pp. 679–85, Nantes, France.
Knuth, D. E. (1965). On the translation of languages from left to right. Information and Control, 8(6)
pp. 607–39.
Knuth, D. E. (1981). The Art of Computer Programming, vol. 2 - Seminumerical Algorithms. Addison-Wesley, Reading, MA, 2nd edition.
Lakoff, G. (1970). Pronominalization, negation, and the analysis of adverbs. In Jacobs, R. A. and
Rosenbaum, P. S., eds., Readings in English Transformational Grammar, pp. 145–65. Ginn and
Company, Waltham, MA.
Landsbergen, J. (1987). Montague grammar and machine translation. In Whitelock et al. (1987), pp.
113–47.
Leech, G. N. (1969). Towards a Semantic Description of English. Linguistics Library. Longman, London.
Levinson, S. C. (1991). Relativity in spatial conception and description. Technical Report WP 1, Max
Planck Institute for Psycholinguistics, Nijmegen, The Netherlands. (To appear in J. J. Gumperz
and S. C. Levinson (Eds.), Rethinking Linguistic Relativity, Cambridge University Press, UK).
Magnúsdóttir, G. (1993). Review of An Introduction to Machine Translation by Hutchins, W. J. and
Somers, H. L. Computational Linguistics, 19(2) pp. 383–384.
Markantonatou, S. and Sadler, L., eds. (1994). Grammatical Formalisms: Issues in Migration, vol. 4 of
Studies in Machine Translation and Natural Language Processing. Office for Official Publications of
the European Community, Luxembourg.
Maxwell, D., Schubert, K., and Witkam, T., eds. (1988). New Directions in Machine Translation. Number 4
in Distributed Language Translation. Foris, Dordrecht, The Netherlands.
Melero, M., Nuebel, R., Ramm, W., and Rubies, A. (1990). Modification and semantic relations for
modifiers - pragmatics. In The Eurotra Reference Manual, vol. Version 7.0, ch. Section B.I.3.7.b, pp.
1–10. Commission of the European Communities, Luxembourg.
Mel'čuk, I. and Zholkovsky, A. (1988). The explanatory combinatorial dictionary. In Relational Models of
the Lexicon – Representing Knowledge in Semantic Networks, ch. 2, pp. 41–74. Cambridge University
Press, Cambridge, UK.
Mitamura, T., Nyberg, 3rd, E. H., and Carbonell, J. G. (1991). An efficient interlingua translation system
for multi-lingual document production. In Proceedings MT Summit III, Washington DC.
Nagao, M., Tsujii, J., and Nakamura, J. (1988). The Japanese government project for machine translation.
In Slocum (1988), pp. 141–86.
Neal, J. G., Feit, E. L., and Montgomery, C. A. (1993). Benchmark investigation/identification project.
Machine Translation, 8(1) pp. 77–84. Special Issue on Evaluation.
Nerbonne, J., Netter, K., Diagne, A. K., Klein, J., and Dickmann, L. (1993). A diagnostic tool for German
syntax. Machine Translation, 8(1) pp. 85–107. Special Issue on Evaluation.
Nirenburg, S., ed. (1987). Machine Translation - Theoretical and Methodological Issues. Studies in Natural
Language Processing. Cambridge University Press, Cambridge, UK.
Nirenburg, S., Carbonell, J., Tomita, M., and Goodman, K. (1992). Machine Translation: A Knowledge
Based Approach. Morgan Kaufmann, San Mateo, CA.
Noonan, M. (1985). Complementation. In Shopen (1985b), ch. 2, pp. 42–140.
Nyberg, E. H., Mitamura, T., and Carbonell, J. G. (1994). Evaluation metrics for knowledge-based
machine translation. In Proceedings of the 15th COLING, pp. 95–99, Kyoto, Japan.
O'Brien, R. J., ed. (1971). Report of the Twenty-second Annual Round Table Meeting in Linguistics and
Language Studies, vol. 24 of Monograph Series on Languages and Linguistics. Georgetown University
Press, Washington DC.
Odijk, J. (1989). The organization of the Rosetta grammars. In Proceedings of the Fourth European
Conference of the ACL, pp. 80–86, Manchester, England.
Olivier, P. and Tsujii, J. (1994). Quantitative perceptual representation of prepositional semantics. Artificial Intelligence Review, 8(2-3) pp. 147–158. Special Issue on Integration of Natural Language and
Vision Processing.
Parsons, T. (1990). Events in the Semantics of English: A Study in Subatomic Semantics. Number 19 in
Current Studies in Linguistics. MIT Press, Cambridge, MA.
Partee, B. H., ed. (1976a). Montague Grammar. Academic Press, New York.
Partee, B. H. (1976b). Some transformational extensions of Montague grammar. In Partee (1976a), pp.
51–76.
Payne, J. (1987). Colloquial Hungarian. Routledge & Kegan Paul, London.
Pereira, F. C. N. (1981). Extraposition grammars. Computational Linguistics, 7(4) pp. 243–256.
Pereira, F. C. N. and Shieber, S. M. (1987). Prolog and Natural-Language Analysis. Number 10 in CSLI
Lecture Notes. Center for the Study of Language and Information, Stanford, CA.
Phillips, J. D. (1993). Generation of text from logical formulae. Machine Translation, 8(4) pp. 209–35.
Pollard, C. and Sag, I. (1987). Information Based Syntax and Semantics: Vol. 1. Lecture Notes. CSLI,
Stanford, CA.
Pollard, C. and Sag, I. (1994). Head Driven Phrase Structure Grammar. Chicago University Press, IL.
Poznanski, V., Beaven, J. L., and Whitelock, P. (1995). An efficient generation algorithm for lexicalist
MT. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics,
Boston, MA.
Procter, P., ed. (1978). Longman Dictionary of Contemporary English. Longman, Harlow, Essex, England.
Pulman, S. G. (1991). ET6/1 Final Report. Technical report, CEC, Luxembourg.
Pulman, S. G. (1994). Expressivity of lean formalisms. In Markantonatou and Sadler (1994), pp. 35–60.
Pustejovsky, J. (1991a). The generative lexicon. Computational Linguistics, 17(4) pp. 409–441.
Pustejovsky, J. (1991b). The syntax of event structure. Cognition, 41 pp. 47–81.
Pustejovsky, J. (1993). Semantics and the Lexicon, vol. 49 of Studies in Linguistics and Philosophy.
Kluwer, Dordrecht, The Netherlands.
Pustejovsky, J. (forthcoming). Linguistic constraints on type coercion. In St. Dizier and Viegas (forthcoming).
Pustejovsky, J. and Boguraev, B. (1993). Lexical knowledge representation and natural language processing. Artificial Intelligence, 63(1-2) pp. 193–224. Special Volume: Natural Language Processing.
Quine, W. V. O. (1960). Word and Object. MIT Press, Cambridge, MA.
Rappaport, M., Laughren, M., and Levin, B. (1993). Levels of lexical representation. In Pustejovsky (1993).
Rayner, M., Alshawi, H., Bretan, I., Carter, D., Digalakis, V., Gambäck, B., Kaja, J., Karlgren, J.,
Lyberg, B., Price, P., Pulman, S., and Samuelsson, C. (1993). A speech to speech translation system
built from standard components. In Proceedings of the 1993 ARPA workshop on Human Language
Technology, Princeton, NJ.
Rayner, M., Carter, D., Price, P., and Lyberg, B. (1994). Estimating the performance of pipelined spoken
language translation systems. In Proceedings of ICSLP '94, Yokohama, Japan.
Rescher, N., ed. (1967). The Logic of Decision and Action. University of Pittsburgh, Pittsburgh, PA.
Reyle, U. and Rohrer, C., eds. (1988). Natural Language Processing and Linguistic Theories, vol. 35 of
Studies in Linguistics and Philosophy. D. Reidel, Dordrecht, Holland.
Rice, M. L. and Schiefelbusch, R. L., eds. (1989). The Teachability of Language. Paul H. Brookes Co.,
Inc., Baltimore, Maryland.
Rich, E. (1983). Artificial Intelligence. Artificial Intelligence Series. McGraw-Hill, Inc., New York and
London.
Rosner, M. and Johnson, R., eds. (1992). Computational Linguistics and Formal Semantics. Studies in
Natural Language Processing. Cambridge University Press, Cambridge, UK.
Rounds, W. C. (1988). Set values for unification-based grammar formalisms and logic programming.
Technical Report CSLI-88-129, Center for the Study of Language and Information, Stanford, CA.
Rudzka-Ostyn, B., ed. (1988). Topics in Cognitive Linguistics. Number 50 in Current Issues in Linguistic
Theory. John Benjamins, Amsterdam.
Rupp, C. J., Johnson, R., and Rosner, M. (1992). Situation schemata and linguistic representation. In
Rosner and Johnson (1992), ch. 7, pp. 191–221.
Russell, G., Ballim, A., Estival, D., and Warwick, S. (1991). A language for the statement of binary
relations over feature structures. In Proceedings of the Fifth European Conference of the ACL, Bonn,
Germany.
Rustin, R., ed. (1973). Natural Language Processing. Algorithmics Press, New York.
Sadler, L., Crookston, I., and Way, A. (1990). LFG and translation. In Proceedings of the Third TMI,
University of Texas at Austin. LRC.
Sadler, L. and Thompson, H. (1991). Structural non-correspondence in translation. In Proceedings of the
Fifth European Conference of the ACL, pp. 293–98, Bonn, Germany.
Sanfilippo, A. (1990). Grammatical Relations, Thematic Roles and Verb Semantics. PhD thesis, University
of Edinburgh, Edinburgh, UK.
Sanfilippo, A., Briscoe, E., Copestake, A., Marti, M., Taule, M., and Alonge, A. (1992). Translation
equivalence and lexicalization in the ACQUILEX LKB. In Proceedings of the Fourth TMI, pp. 1–11,
Montreal, Canada.
Sato, S. (1993). Example-based translation of technical terms. In Proceedings of the Fifth TMI, pp. 58–68,
Kyoto, Japan.
Schmidt, P. (1988). A syntactic description of German in a formalism designed for machine translation.
In Proceedings of COLING '88, pp. 589–94, Budapest, Hungary.
Schneider, T. (1991). The METAL system. Status 1991. In Proceedings MT Summit III, pp. 41–44,
Washington DC.
Schubert, K. (1988). The architecture of DLT - interlingual or double direct? In Maxwell et al. (1988),
pp. 131–44.
Shieber, S., van Noord, G., Pereira, F. C. N., and Moore, R. C. (1990). Semantic-head-driven generation.
Computational Linguistics, 16(1) pp. 30–42.
Shieber, S. M. (1985). Using restriction to extend parsing algorithms for complex-feature-based formalisms.
In Proceedings of the 23rd Annual Conference of the ACL, pp. 145–52, Chicago, IL.
Shieber, S. M. (1986). An Introduction to Unification-based Approaches to Grammar, vol. 4 of CSLI
Lecture Notes. CSLI, Stanford, CA.
Shieber, S. M. (1987). Separating linguistic analyses from linguistic theories. In Whitelock et al. (1987),
pp. 1–36.
Shieber, S. M. (1993). The problem of logical-form equivalence. Computational Linguistics, 19(1) pp.
179–90.
Shopen, T., ed. (1985a). Language Typology and Syntactic Description Vol. III: Grammatical Categories
and the Lexicon. Cambridge University Press, Cambridge, UK.
Shopen, T., ed. (1985b). Language Typology and Syntactic Description. Volume II: Complex Constructions. Cambridge University Press, Cambridge, UK.
Sjöström, S. (1990). Spatial Relations: Towards a Theory of Spatial Verbs, Prepositions and Pronominal
Adverbs in Swedish. PhD thesis, Department of Linguistics, University of Göteborg, Sweden.
Slocum, J., ed. (1988). Machine Translation Systems. Studies in Natural Language Processing. Cambridge
University Press, Cambridge, UK.
Somers, H. (1987). Valency and Case in Computational Linguistics. Edinburgh University Press, Edinburgh, Scotland.
Sondheimer, N. K. (1978). A semantic analysis of reference to spatial properties. Linguistics and Philosophy, 2(2) pp. 235–80.
Sparck-Jones, K. and Boguraev, B. (1987). A note on a study of cases. Computational Linguistics, 13(1-2)
pp. 65–68.
St. Dizier, P. and Viegas, E., eds. (forthcoming). Computational Lexical Semantics. Cambridge University
Press, Cambridge, UK.
Steiner, E., Eckert, U., Roth, B., and Winter-Thielen, J. (1988a). The development of the EUROTRA-D
system of semantic relations. In Steiner et al. (1988b), ch. 3, pp. 40–104.
Steiner, E. H., Schmidt, P., and Zelinsky-Wibbelt, C., eds. (1988b). From Syntax to Semantics - Insights
from Machine Translation. Pinter, London, UK.
Talmy, L. (1985). Lexicalization patterns: semantic structure in lexical forms. In Shopen (1985a), ch. 2,
pp. 57–149.
Trujillo, A. (1992). Spatial lexicalization in the translation of prepositional phrases. In Proceedings of the
30th Annual Conference of the ACL, Student Session, pp. 306–08, Newark, Delaware.
Trujillo, A. (1994). Computing FIRST and FOLLOW functions for Feature-Theoretic grammars. In
Proceedings of the 15th COLING, pp. 875–80, Kyoto, Japan. cmp-lg/9407030.
Trujillo, A. (forthcoming). Towards a cross-linguistically valid classification of spatial prepositions. Machine Translation.
Trujillo, I. A. and Plowman, D. (1991). Automation of bilingual lexicon compilation. In Proceedings of
MT Summit III, pp. 51–54, Washington DC.
van der Eijk, P. (1993). Automating the acquisition of bilingual terminology. In Proceedings of the Sixth
European Conference of the ACL, pp. 113–19, Utrecht, The Netherlands.
van Noord, G. (1991). An overview of head-driven bottom-up generation. In Dale, R., Mellish, C., and
Zock, M., eds., Natural Language Generation, pp. 141–65. Academic Press, London.
Varile, G. B. and Lau, P. (1988). Eurotra: Practical experience with a multilingual machine translation
system under development. In Proceedings of the 2nd Conference on Applied Natural Language
Processing, pp. 160–67, Austin, TX.
Vauquois, B. (1968). A survey of formal grammars and algorithms for recognition and transformation in
machine translation. In IFIP Congress-68, pp. 254–60, Edinburgh.
Vauquois, B. and Boitet, C. (1988). Automated translation at Grenoble University. In Slocum (1988),
pp. 85–110.
Vendler, Z. (1967). Linguistics in Philosophy. Cornell University Press, Ithaca, NY, USA.
Verkuyl, H. J. (1972). On the Compositional Nature of the Aspects, vol. 15 of Foundations of Language,
Supplementary Series. D. Reidel, Dordrecht, Holland.
Whitelock, P. (1988). A feature-based categorial morpho-syntax for Japanese. In Reyle and Rohrer (1988),
pp. 230–61.
Whitelock, P. (1992). Shake-and-Bake translation. In Proceedings of COLING '92, pp. 784–91, Nantes,
France.
Whitelock, P. J., Wood, M. M., Somers, H. L., Johnson, R., and Bennett, P., eds. (1987). Linguistic
Theory and Computer Applications. Academic Press, London.
Wilks, Y. (1975). A preferential, pattern-seeking, semantics for natural language inference. Artificial
Intelligence, 6 pp. 53–74.
Winograd, T. (1983). Language as a Cognitive Process - Volume I: Syntax. Addison Wesley, Reading,
MA.
Winston, P. H. (1984). Artificial Intelligence. Addison Wesley, Reading, MA, 2nd edition.
Zajac, R. (1989). A transfer model using a typed feature structure rewriting system with inheritance. In
Proceedings of the 27th Annual Conference of the ACL, pp. 1–6, Vancouver, Canada.
Zelinsky-Wibbelt, C. (1988). From cognitive grammar to the generation of semantic interpretation in
machine translation. In Steiner et al. (1988b), ch. 4, pp. 105–32.
Zelinsky-Wibbelt, C. (1990). The semantic representation of spatial configurations: a conceptual motivation for generation in machine translation. In Proceedings of COLING '90, pp. 299–303, Helsinki,
Finland.
Zsilka, J. (1967). The System of Hungarian Sentence Patterns, vol. 67 of Uralic and Altaic Series. Indiana
University Publications, Bloomington, IN.