REFTEX - A CONTEXT-BASED TRANSLATION AID

Poul Soren Kjersgaard
University of Odense
Campusvej 55
DK-5230 Odense M

ABSTRACT

The system presented in this paper produces bilingual passages of text from an original (source) text and one (or more) of its translated versions. The source text passage includes words or word compounds which a translator wants to retrieve for the current translation of another text. The target text passage is the equivalent version of the source text passage. On the basis of a comparison of the contexts of these words in the concorded passage and his own text, the translator has to decide on the utility of the translation proposed in the target text passage. The program might become a component of a translator's work bench.

Introduction

Computers can contribute to translation either automatically or as an aid to the human translator (machine-aided translation). The latter represents a large spectrum of different approaches as to the degree of human intervention in the translation process and to the method(s). Some systems are semi-automatic in the sense that they only ask for human intervention for the resolution of ambiguities (Melby, 1981). Other systems are designed to relieve the human translator of some tedious aspects (such as dictionary look-up) of the translation work, either interactively via a terminal or by batch processing overnight. As to method(s), most systems are based on dictionary look-ups - sometimes combined with automatic insertion of the retrieved equivalents (McNaught, Somers, 1979).

This paper will describe an alternative method, REFTEX. The system was first implemented on a CDC mainframe installation, but has now been converted to an IBM XT microcomputer. The primary scope of the program is to provide a supplemental aid for human translators.

The principles of REFTEX

The name of the system, REFTEX, is an acronym for reference text. Its main characteristics can be summarised as follows:

The system is meant to be used when the translator comes across some word or word compound that cannot be looked up in a dictionary or the translations of which do not seem relevant in the context of the actual translation. The translator can then have recourse to texts that have already been translated, in order to try to retrieve the wanted word(s) and its/their translation(s). Such texts exist in an original (source language) version and one or more translated (target language) versions. In REFTEX, such texts are designated reference texts. During execution, the program will access passages (concordances) of the original text that contain the word and the equivalent passages of (one of) the translated versions. The translator will then decide if the translation contained in the target language version is useful in the actual translation.

It is an interactive, screen-oriented system that can be used by a translator during the translation process. In the present version, the text to be translated and its translation are supposed to exist independently on paper, but nothing prevents the implementation of an integrated version using windows (cf. last section).

A major difference between REFTEX and most other machine-aided translation systems that I know of is that REFTEX emphasises the context, whereas other systems rely on bilingual dictionaries containing translations (sometimes uncommented) and possibly definitions or explanatory remarks. REFTEX can thus be conceived of as a computerised combination of bilingual concordances used in philology (usually on ancient texts) and the manual use of translated text as an aid for the translator. But in contrast to traditional concordance making, the project does not aim at producing a finished product of the works of an author, but at supplying the translator with an ad hoc tool.

The REFTEX system

REFTEX has been implemented as a program package of two independent programs: ARBORAL and REFTEX. The former uses one or more slightly pre-edited reference texts as input and transforms each into an equivalent data structure that contains both the original information (thus permitting a reconstruction of the original text) and some new information which facilitates the searching of words in the text and the concordance making.

The data structure is organised as two records. The first one contains a node or an index for each different word of the text together with some satellite information: absolute word frequencies and pointers to the first occurrence of the word. The second record is a list structure containing a reference for each individual word of the reference text to its position in the first record, and pointers to possibly following occurrences of the word and to the beginning of the paragraph (concordance) that contains the word. Once the finished data structure has been established, the program writes it on a file, from where it can be accessed by the main program REFTEX.

The pre-editing of the reference text that was mentioned above consists of the insertion in the source text of period markers (the number sign: #) together with a number that unequivocally identifies each passage. A passage normally consists of one period, possibly two. Then, parallel period markers and numbers are inserted into the target text(s) to ensure the retrieval of parallel extracts (concordances) of the source and target texts. If this pre-editing were not carried out, it would not be possible to extract parallel passages, if the source and target languages involved are structurally different in respect to modes of expression. And even for closely related languages such as the Scandinavian languages, this would probably be the case.

REFTEX is the part of the program package that will be used by the translator during the process of translation. Program execution starts by asking the translator to key in the names of the pair of reference texts he/she wants to use for solving the problems of the actual translation. The program then asks for the first key word to be searched in the reference text, whose equivalents the translator wants to know. If the reference source text contains that word, the program will print out the passage containing the first occurrence of the word together with the equivalent passage of the target language version. On the basis of his world knowledge (pragmatics) and knowledge of the two languages involved, the translator now has to decide whether the source language passage is sufficiently similar to the context of the actual translation to permit reusing the translation contained in the target language passage. The decision of course depends on the quality of the translated reference text and relies on the translator's ability to detect possible errors.

If the first bilingual concordance does not contain an acceptable translation, the translator can "scroll" to the following occurrence(s), until he finds an adequate translation or the reference text is exhausted. If either the word does not exist in the reference text or it does not have appropriate translations, it will be saved in a special array for non-retrieved words and can be searched in another reference text, after the translator has finished the list of words or expressions that he wants to look up. If words have been saved in this array, the program will ask for another pair of reference texts. Supposing that they are available, the program will try to retrieve passages containing the words that were saved.

An additional feature of REFTEX is a semi-automatic routine that enables the program to retrieve inflected forms of a word, for instance feminine and/or plural forms as in the Spanish word español: española, españoles, españolas. The routine solely relies on formal characteristics of words (such as word endings) and not on semantic or other markers that would imply some sort of "understanding" of the word (as is the case in many grammars). For the time being, the routine has been implemented for regular nouns, adjectives, verbs and participles in French and Spanish.

Computational concordance making

Given that the REFTEX-approach relies on a bilingual concordance, this section will briefly introduce two of the problems this causes: word-form diffusion and homoform-insensitivity. The former problem reflects the wish to group together different inflected forms of the same word. The solution proposed in REFTEX is to start from the primary form and generate the inflected forms from it, automatically when they are regular and manually when they are irregular. The latter problem reflects the homograph or polysemy problem.
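The generation of regular inflected forms from a primary form could be sketched as below. This is only an illustration under stated assumptions, not the routine used in REFTEX: the ending rules, the function names `inflected_forms` and `find_forms`, and the `irregular` table are invented for the example and cover only a few Spanish noun and adjective patterns. Like the routine described in the text, the sketch looks at word endings only.

```python
# Hypothetical sketch: the ending rules and all names are illustrative
# assumptions, not taken from the REFTEX implementation.

def inflected_forms(primary, irregular=None):
    """Return the primary form followed by its candidate inflected forms."""
    # Irregular words: the forms are supplied manually.
    if irregular and primary in irregular:
        return [primary] + list(irregular[primary])
    # Regular words: the forms are derived from the word ending alone.
    if primary.endswith("o"):            # bueno: buena, buenos, buenas
        stem = primary[:-1]
        return [primary, stem + "a", stem + "os", stem + "as"]
    if primary.endswith(("a", "e")):     # grande: grandes
        return [primary, primary + "s"]
    # Any other ending: español: española, españoles, españolas
    return [primary, primary + "a", primary + "es", primary + "as"]


def find_forms(primary, text_words, irregular=None):
    """Return the candidate forms that actually occur in a list of words."""
    present = {w.lower() for w in text_words}
    return [f for f in inflected_forms(primary, irregular) if f in present]


# Invented sample text, used only to exercise the functions.
words = "Las empresas españolas y el mercado español".split()
print(find_forms("español", words))   # ['español', 'españolas']
```

Rules of this kind over-generate (they will propose forms that do not exist for some words), but for retrieval purposes a non-existent form simply finds nothing in the reference text.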
To solve the homograph problem completely, one would need either a sort of tagging (requiring extensive pre-editing) or some semantic analyzer. Neither of these solutions has been chosen in the REFTEX-approach. A "pragmatic" solution, based on the immediate context, has been developed, thus reducing the amount of superfluous information or "noise". An example will illustrate its function: The French word "application" has multiple meanings, and may in some texts be quite frequent. If the key word to be looked up is the compound preposition "en application de", the word takes on yet another meaning. In order to narrow the search field, REFTEX permits the translator to look for the word "application" together with "en" and "de". In this way, a lot of, though not all, irrelevant information will be excluded.

Methodological considerations

The use of bilingual concordances implies that REFTEX can be characterised as a context-oriented translation aid in opposition to the dictionary-oriented approach that most machine-aided systems rely on. These two approaches both possess weaknesses. The problem of a context-oriented approach can be restated as the question of how reliable the translation of the reference source text is, whereas the problem of a dictionary-oriented approach may be the difficulties of defining precisely the words of a language (cf. Wittgenstein).
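The narrowing of a search by the immediate context, as in the "en application de" example from the section on computational concordance making, might be sketched as follows. The paper does not say how REFTEX tests the context, so the function name `occurs_in_context`, the `window` parameter and the sample passages are all assumptions introduced for illustration.

```python
import re

# Hypothetical sketch: names, the window size and the sample passages are
# illustrative assumptions, not part of REFTEX.

def occurs_in_context(passage, keyword, companions, window=1):
    """True if `keyword` occurs with every companion word within
    `window` words on either side of it."""
    words = re.findall(r"\w+", passage.lower())
    wanted = {c.lower() for c in companions}
    key = keyword.lower()
    for i, word in enumerate(words):
        if word != key:
            continue
        nearby = set(words[max(0, i - window):i]) | set(words[i + 1:i + 1 + window])
        if wanted <= nearby:
            return True
    return False


# Invented source-language passages.
passages = [
    "En application de la loi, le conseil décide.",
    "L'application de cette règle est difficile.",
    "La mise en application de la loi a pris du temps.",
]
hits = [p for p in passages if occurs_in_context(p, "application", ["en", "de"])]
```

Here `hits` keeps the first and third passages and drops the second. The third passage uses "application" in a different construction ("la mise en application de"), which is the kind of residual noise that a purely formal context test cannot exclude.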
In fact, the difference between the context-oriented and the dictionary-oriented approach comes down to the question of whether words possess an independent meaning, defined at the "langue"-level, or whether their meaning is influenced by the actual contextual use of the words, the "parole"-level. The difference between the two approaches may be illustrated by a well-known example from the MT-literature: the English verb "to know", which is rendered in many European languages by two different verbs. Does this verb have two distinct meanings which the lexicographer can account for, or would it be preferable to let the translator decide the relevant equivalent on the basis of a series of bilingually concorded examples? A similar example would be the German word "Schlagsahne" which is rendered into Danish by two different words: piskefløde (cream) and flødeskum (whipped cream).

The strength of a bilingual dictionary approach is of course its ability in many cases to convey to the user a fairly good idea of the meaning of a word in another language. The strength of a context-oriented approach is its ability to help decide (just) which among a number of different proposals should be retained for the current translation. And, needless to say, in some situations, it will certainly be possible to combine the two approaches in order to make the best out of each.

The belief that the linguistic context contributes to determining the meaning of words is of course implied in the use of a context-oriented approach. Supposing that this holds true, another aspect of the approach is to determine whether the impact of the context is equally strong for any sub-vocabulary. If it is not, this would mean that a context-related approach would be less relevant in some cases. No conclusive answer has been given to that question, but it seems fairly reasonable to suppose that the more specialised the vocabulary is, the less the meaning of the word is influenced by the context. In such cases, the utility of the REFTEX approach may be the possibility to retrieve newly coined compounds that have not yet been lexicalised, or "loose" collocations that never appear in dictionaries.

Alternative applications

The primary scope of the program - as was stated in the introduction - is to provide a supplemental aid for human translators. In that respect, it could probably become an integrated part of a translator's work bench or amanuensis (Kay, 1980), enabling the translator to carry out all parts (translation, dictionary and reference text look-ups, text processing) of the translation process. This part of the project has not been completed.

A context-oriented approach may also be an appropriate tool for lexicographers and other researchers because it can provide the "raw material" for syntactic investigations as well. The system might thus prove useful for making "translation rules", i.e. rules stating how to translate syntactic phenomena from one language into another.

Relevant literature

Arthern, Peter: Machine Translation and Computerized Terminology Systems; a Translator's Viewpoint. pp. 77-109 in Snell (ed.): Translating and the Computer. North Holland. Den Haag 1979.

Carestia-Greenfield, Carestia et Serain, Daniel: La traduction assistée par ordinateur: Des banques de terminologie aux systèmes interactifs de traduction. Paris 1976.

Kay, Martin: The Proper Place of Men and Machines in Language Translation. Xerox. Palo Alto/Cal. 1980.

McNaught, John and Somers, H.L.: The Translator as a Computer User. UMIST. Manchester 1979.

Melby, Alan K.: Translators and Machines - Can They Cooperate? in L'informatique au service de la traduction. Numéro spécial de META 26.1. Montreal 1981.