
Machine Translation, Language Divergence and Lexical
Resources
Pushpak Bhattacharyya
Department of Computer Science and Engineering,
Indian Institute of Technology,
Bombay.
Abstract
The key concern in machine translation, whose purpose is to convert documents from one language
to another, is the language divergence problem. This problem arises from the fact that languages make
different lexical and syntactic choices for expressing an idea. Language divergence needs to be tackled
not only for translating between language pairs from distant families (e.g., English and Japanese), but
also for pairs which are close siblings (e.g., Hindi and Marathi). In this paper, we study the language
divergence problem in the context of interlingua based Machine Translation (MT), a translation
approach that makes heavy demands on semantics. Cases of language divergence involving
English and Hindi and those involving English, Hindi and Marathi are discussed. The solution to the
language divergence problem lies in building and using knowledge networks like wordnets. We present
our work on the creation of the Hindi wordnet and discuss the use of this resource in solving the word
sense disambiguation (WSD) problem.
1 Introduction
Machine Translation (MT) [Hutchins, 2000; Somers, 2000; Hutchins and Somers, 1992] is
defined as the process of converting a document in one language (L1) to another (L2). L1 is
called the source language and L2, the target language. While the creation of general purpose,
fully automatic machine translation programs is still a distant goal, work on the
construction of Human Aided MT (HAMT) and Machine Aided Human Translation (MAHT)
systems continues with a fair amount of success.
In HAMT systems, the translator is a computer program. Pre-editing and post-editing are done
on the input and output texts respectively, to ease the analysis and generation processes. We
show below an example of pre-editing:
The inspection team appointed by the United
Nations visited Iraq early July, 2003. (1)
Pre edited
The <cnp> inspection team </cnp> {which was}
appointed by the <org> United Nations </org>
visited Iraq {in} early <date>July, 2003</date>. (2)
Here
a. inspection team has been marked as a compound noun phrase (<cnp>, </cnp>).
b. A relative clause beginner (which was) has been inserted.
c. United Nations has been named-entity (NE) tagged as an organization (<org>, </org>).
d. in has been inserted before early.
e. July, 2003 has been named entity tagged as a date (<date>, </date>).
Named Entity Tagging [McCallum et al., 2000] is an important pre-editing step used in most
NLP applications. Since the accuracy of current NE taggers is around 90%, which is low for
most serious applications, such tags are quite often inserted or corrected manually. The
insertion of relative clause beginners, too, is a non-trivial problem. Consider the sentence
The president and the prime minister of Mauritius (a small island country in the Indian
Ocean) due to arrive yesterday, were delayed because of the airline strike.
The appropriate relative clause beginner before a small island country is which is, while that before
due to arrive is who were. As the sketch after the following list suggests, this requires determining the correct
a. wh-word (who/which)
b. number (is/are) and
c. tense (is/were)
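As an illustration only (not part of any system described here), the choice among these three features can be written as a small rule in Python; the animacy, number and tense features are assumed to be supplied by earlier analysis steps.

def relative_clause_beginner(head_is_animate, head_is_plural, main_verb_is_past):
    # wh-word: "who" for animate heads, "which" for inanimate ones
    wh = "who" if head_is_animate else "which"
    # copula: agree in number and, for descriptions of events, in tense
    if main_verb_is_past:
        be = "were" if head_is_plural else "was"
    else:
        be = "are" if head_is_plural else "is"
    return wh + " " + be

# "Mauritius (a small island country ...)": inanimate, singular, present description
print(relative_clause_beginner(False, False, False))   # which is
# "the president and the prime minister ... due to arrive": animate, plural, past
print(relative_clause_beginner(True, True, True))      # who were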
Post editing modifies the translation program’s output to improve its readability and
idiomaticity. As an example, consider the situation below:
He wants good food everyday (3)
Translator output:
vah raoja AcCa Kanaa caahta hO. (4)
He everyday good food wants <copula> (5)
Post edited:
]sao raoja AcCa Kanaa caaihe. (6)
He (dat.) everyday good food is needed. (7)
This example shows that dative case marking on the subject makes the output more natural
in Hindi. However, the machine needs knowledge of the category of the verb (want, like,
etc.) to be able to apply this transformation. In the absence of this knowledge, post-editing is a
necessity.
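To make the rule concrete, the following Python sketch applies this dativization post-edit for a hypothetical category of experiencer verbs; the romanized Hindi forms, the verb list and the pronoun table are illustrative assumptions, not the resources of any actual HAMT system.

# verbs assumed to take a dative ("experiencer") subject in Hindi; illustrative only
DATIVE_SUBJECT_VERBS = {"caahie", "pasand honaa", "achchhaa lagnaa"}
NOMINATIVE_TO_DATIVE = {"vah": "use", "main": "mujhe", "tum": "tumhe"}

def post_edit_dative(tokens, verb_category):
    # apply the transformation only if the verb belongs to the experiencer category
    if verb_category not in DATIVE_SUBJECT_VERBS:
        return tokens
    head = NOMINATIVE_TO_DATIVE.get(tokens[0], tokens[0])
    return [head] + tokens[1:]

# vah roj achchhaa khaanaa caahie  ->  use roj achchhaa khaanaa caahie
print(post_edit_dative(["vah", "roj", "achchhaa", "khaanaa", "caahie"], "caahie"))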
In Machine Aided Human Translation (MAHT), the translation is ultimately done by
human beings. However, translators use various supports from the computer, like
a. Online dictionaries: it is known that the most time consuming and painstaking part
of a translator's job is turning the pages of a dictionary. Online access to electronic
dictionaries reduces this burden considerably.
b. Terminology Data Banks: Domains have special purpose terminology. Electronic
terminology data banks with translations enhance the efficiency of the translation job.
For example, while translating a document on energy policy, the availability of AaNaivak
}jaa- as the translation of the term nuclear energy removes the need to search for, or think
hard about, an appropriate translation of the term.
c. Translation Memories [Sumita and Iida, 1991; Brown, 1996]: These are fragments
of previously translated texts, made available to the human translator electronically.
Point c. merits some discussion. Let us suppose the following situation occurs:
Available Translations:
He bought a pen (8)
]snanao ek klama KrIda (9)
(He (ergative marker) a pen bought)
and
All kings have huge palaces (10)
saBaI rajaaAaoMko pasa bahut baDo p/asaad hOM (11)
(all kings-of in-possession very large palaces are)
Required Translation:
He bought a huge house (12)
The underlined portions of (9) and (11) can be used to obtain the translation of (12) as
]snanao ek bahut baDa Gar KrIda (13)
(He (erg.) a very large house bought)
There are pitfalls, however. Consider the following translation situation:
• German:
  Ein Messer ist im Schrank; er mißt Elektrizität. (14)
• TM1: Ein Messer ist im Schrank -> A meter is in the cabinet.
• TM2: er mißt Elektrizität -> It measures electricity.
• New situation:
  Ein Messer ist im Schrank; er ist sehr scharf. (15)
  A meter is in the cabinet; *it is very sharp.
(Messer in German: meter/knife in English)
TM1 and TM2 are the translation memories. Their use, however, produces an incorrect
translation for (15). This happens because Messer in German can mean both meter and knife
in English, and in the current situation the second translation should be chosen (which should
have been apparent from the presence of scharf, but that falls under the topic of word sense
disambiguation).
The above examples illustrate the following tricky issues in using the translation memory (a
minimal lookup sketch follows the list):
a. Text alignment [Kay and Roscheisen, 1993]: (8) and (9) show that contiguous
pieces of text in the input document may not remain contiguous in the target language
text.
b. Affix consideration: (11) and (13) show that words may not be used as such in the
new translation situation (baDo to baDa).
c. Sense Disambiguation: (14) and (15) show that correct sense determination is
essential.
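The following Python sketch shows how simple such a translation-memory lookup can be, and why issues (a)-(c) arise: it merely finds the stored sentence pair with the greatest word overlap and reports what remains uncovered, leaving alignment, affix changes (baDo to baDa) and sense choice untreated. The romanized Hindi and the toy memory are illustrative assumptions.

# toy translation memory mirroring examples (8)-(11), with romanized Hindi
TM = [
    ("he bought a pen", "usne ek kalam kharidaa"),
    ("all kings have huge palaces", "sabhi raajaaon ke paas bahut bade prasaad hain"),
]

def best_match(query):
    q = set(query.lower().split())
    # score each stored pair by the fraction of query words it covers
    scored = []
    for src, tgt in TM:
        overlap = q & set(src.split())
        scored.append((len(overlap) / len(q), src, tgt, q - set(src.split())))
    return max(scored)

print(best_match("he bought a huge house"))
# -> score 0.6 for 'he bought a pen'; 'huge' and 'house' still need translation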
We now give a roadmap of the paper. In section 2, we mention two more taxonomic divisions
of MT systems and describe the three basic approaches to MT. Section 3 discusses the
language divergence between English and Hindi and that between English, Hindi and
Marathi. Section 4 studies the use of lexical resources like the wordnet to solve the word
sense disambiguation problem in the context of language divergence. Section 5 concludes the
paper.
2 Taxonomies of MT Systems
MT systems are divided into categories depending on many criteria, two of which are
mentioned here:
a. Domain coverage: SYSTRAN [Chapter 10 of Hutchins and Somers, 1992] is an
example of a general purpose MT system, while TAUM-METEO [Chapter 12 of Hutchins
and Somers, 1992] is an example of a special purpose MT system in the weather
report domain.
b. Point of Entry from the Source Text to the Target Text: This is illustrated by
the Vauquois Triangle [Vauquois, 1975] (figure 1). The movements to and from the top
of the triangle need deeper levels of understanding of the input and output texts. The
three famous translation methodologies, viz., Direct, Transfer and Interlingua are
placed respectively at the bottom, in the middle and towards the top of the triangle.
Longer movements on either side of the triangle indicate more involved processing in
terms of analysis and generation.
[Figure 1 depicts the Vauquois Triangle: ascending analysis levels on the left (graphemic, morpho-syntactic, syntagmatic, syntactico-functional, logico-semantic, interlingual, deep understanding) and the corresponding descending generation transfers on the right; direct translation operates at the base over tagged text, syntactic transfer (surface and deep, over C-structures and F-structures) and semantic transfer (over semantic and predicate-argument structures) at intermediate levels, and conceptual transfer via semantico-linguistic or ontological interlinguas at the apex.]

Figure 1: Vauquois Triangle
We illustrate the three approaches to MT with a simple example. Consider the sentence I like
mangoes. The stages in translating from English to Hindi in the Direct Approach are shown
below:
1. Word replacement
   • I like mangoes
   • maOM AcCa laga Aama
   • I like (root) mangoes
2. Morphology
   • maOM AcCa lagata Aama
   • I like mangoes
3. Syntactic re-arrangement
   • maOM Aama AcCa lagata hO
   • I mangoes like
4. Idiomatization
   • mauJao Aama AcCa lagata hO
   • I (dative) mangoes like
Note how the entry from the source text into the target text takes place right from the start of
the processing.
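The direct pipeline can be sketched as a chain of word-level steps, as in the following illustrative Python fragment; the tiny lexicon and the rules (written here with romanized Hindi) are assumptions made for exposition, not the contents of any actual direct MT system.

# a three-word lexicon standing in for a full bilingual dictionary
LEXICON = {"I": "main", "like": "achchhaa lag", "mangoes": "aam"}

def word_replacement(words):          # stage 1: dictionary lookup
    return [LEXICON.get(w, w) for w in words]

def morphology(words):                # stage 2: inflect the verb root
    return [w.replace("lag", "lagataa") for w in words]

def reorder_svo_to_sov(words):        # stage 3: verb to the end, add copula
    subj, verb, obj = words
    return [subj, obj, verb, "hai"]

def idiomatize(words):                # stage 4: dative subject for 'like'-type verbs
    return ["mujhe" if w == "main" else w for w in words]

sent = "I like mangoes".split()
for stage in (word_replacement, morphology, reorder_svo_to_sov, idiomatize):
    sent = stage(sent)
    print(stage.__name__, ":", " ".join(sent))
# final output: mujhe aam achchhaa lagataa hai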
The transfer based approach does some initial processing on the input text (for example,
parsing) and then applies transfer rules from the source to the target language. This is
illustrated in figure 2. SVO-SOV transformation is applied to the parse tree to meet the
demand of English to Hindi translation. Thus some amount of processing is done on the
source text before ‘entering’ the target text.
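A minimal sketch of such a transfer rule, assuming a simple tuple encoding of parse trees, is given below; it performs only the SVO to SOV rearrangement and leaves lexicalization and dativization aside.

def svo_to_sov(tree):
    # recursively swap the children of every VP node: (V, NP) -> (NP, V)
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [svo_to_sov(c) for c in children]
    if label == "VP" and len(children) == 2:
        children = [children[1], children[0]]
    return (label, children)

def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [w for c in tree[1] for w in leaves(c)]

english = ("S", [("NP", ["I"]), ("VP", [("V", ["like"]), ("NP", ["mangoes"])])])
print(" ".join(leaves(svo_to_sov(english))))      # I mangoes like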
The final approach, viz., the interlingua based one, does a lot of processing on the source text
before ‘entering’ the target text. For example, in the Universal Networking Language (UNL)
framework [Uchida et al., 2000], the sentence under consideration is expressed as
aoj(like(icl>be).@entry, I) (16)
obj(like(icl>be).@entry, mango(icl>fruit).@pl) (17)
where
a. aoj is the attribute-of-object relation (denoting the possessor of a state or attribute),
b. (icl>be) is the restriction constraining the possibilities for like, showing that it is a
stative verb and not some other part of speech (e.g., a preposition),
c. .@entry is an attribute of the node of like, showing that the generation of the target
language text starts from this node,
d. obj denotes the patient of the activity (like),
e. (icl>fruit) makes the sense of mango unique, and finally
f. .@pl denotes plurality.
The Hindi sentence is generated by traversing the graph (figure 3) defined by the UNL
expressions (16) and (17).
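For illustration, the UNL expressions (16) and (17) can be held as a small graph and walked from the .@entry node, as in the following Python sketch; the dictionary encoding is an assumption made for brevity, and a real UNL generator does far more (morphology, function words, word order) than this traversal.

# the graph of expressions (16) and (17): node -> attributes and outgoing relations
unl = {
    "like(icl>be)": {"attributes": {"@entry"},
                     "aoj": "I",
                     "obj": "mango(icl>fruit)"},
    "I": {"attributes": set()},
    "mango(icl>fruit)": {"attributes": {"@pl"}},
}

entry = next(n for n, d in unl.items() if "@entry" in d["attributes"])

def walk(node, indent=0):
    # print the node with its attributes, then follow each relation arc
    print(" " * indent + node, sorted(unl[node]["attributes"]))
    for rel, tgt in unl[node].items():
        if rel != "attributes":
            print(" " * indent + "--" + rel + "-->")
            walk(tgt, indent + 2)

walk(entry)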
[Figure 2 shows the transfer: the English parse tree for "I like mangoes" (S over NP and VP, the VP dominating V and NP) undergoes an SVO -> SOV transfer, followed by Hindi lexicalization and dativization, yielding the Hindi tree for mauJao Aama psaMd hO.]

Figure 2: Transfer based approach to machine translation
[Figure 3 shows the UNL graph: the node like(icl>be).@entry is linked by an aoj arc to I and by an obj arc to mango(icl>fruit).@pl.]

Figure 3: UNL Graph of I like mangoes
3 Language Divergence
Language divergence [Dorr, 1993; Dave et al., 2002] refers to the differences in the lexical and
syntactic choices that languages make in expressing ideas. We discuss below many interesting
examples of language divergence in the context of trilingual MT among English, Hindi
and Marathi.
3.1 English-Hindi
Consider the sentence
The demands on sportsmen today can lead to burnout at an early age. (18)
The word burnout is of interest; its POS and gloss, from the Oxford English
Dictionary [www.oed.com], are
Noun; the state of being extremely tired or ill, either physically or mentally, because
you have worked too hard (19)
The Hindi translation of (18) is
iKlaaiDyaaoM sao jaao Aaja ApoxaaeM hOM, vao ]nho kma ]ma/ maoM hI Aik/yaaSaIla kr saktI hOM. (20)
whose equivalent literal English transcription is
Players-from which today demands are, those them early age-in certainly inactive
make can <copula>. (21)
We see that to produce a natural Hindi translation of the source sentence, we have to turn a
noun (burnout) into an adjective (अक्रियाशील).
There are examples from other parts of speech too:
a. Noun to Verb:
Every concert they gave us was a sell-out. (22)
(Gloss: an event for which all the tickets have been sold) (23)
]nako hr saMgaIt kaya-k/ma ko saBaI iTkT ibak gae qao. (24)
Their every concert-of all tickets sold completely were (25)
b. Adjective to Adverb (very common):
The children watched in wide-eyed amazement. (26)
(Gloss: with eyes fully open because of fear, great surprise, etc.) (27)
baccao AaScaya- sao AaMK faDo doK rho qao. (28)
Children amazement-with eyes opened-fully see-doing were. (29)
c. Adjective to Verb:
He was in a bad mood at breakfast and wasn't very communicative. (30)
(Gloss: able and willing to talk and give information to other people) (31)
naaSto ko samaya vah Kraba mauD maoM qaa AaOr jyaada baat caIt nahIM kr rha qaa. (32)
Breakfast-of time he bad mood-in was and much chit-chat not doing was. (33)
d. Preposition to Adverb:
It gets cooler toward evening. (34)
(Gloss: near a point in time) (35)
Saama haoto haoto zMDk baZ jaatI hO. (36)
Evening being-being (reduplication: while coming) cold increases <copula>. (37)
e. Other idiomatic usages:
Given her interest in children, teaching seems the right job for her. (38)
(Gloss: when you consider something) (39)
baccaaoM ko p/it ]sakI idlacaspI doKto hue AQyaapna ]sako ile ]icat kama lagata hO. (40)
Children-towards her interest seen-having, teaching her-for right job seems to be. (41)
3.2 Hindi-Marathi
Marathi is a closer sibling of Hindi than English is (both Marathi and Hindi are descendants
of Sanskrit, they share geographical proximity, and http://www.ethnologue.com/show_family.asp?subid=629
gives a "family tree" of the languages). Still we find widespread cases of
language divergence between Marathi and Hindi, even at fairly low levels of processing, viz.,
morphology, postpositions and case marking. We consider aakhyaatas, which roughly
correspond to verb morphology paradigms in Marathi [Damle, 1970]. Table 1 enumerates (i)
the various senses of the prathama taakhyaata (the first verb morphology paradigm) and (ii)
the corresponding sentences in three languages. Our goal is to study whether mere replacement of
suffixes can produce correct verb forms in the translated text. The Marathi suffix tao (to: 3rd
person, singular number, present tense marker) is retained as the pivot, and it is examined whether
the corresponding suffixes in Hindi and English are preserved in the translations.
ूथम ता यात (prathama taakhyaata)

Sense | Marathi | Hindi | English
1. vat-maana (present tense) | tao jaatao. | vah jaata hO. | He goes.
2. isqarsa%ya (universal truth) | pRqvaI saUya-aBaaovatI ifrto. | pRqvaI saUya- ko caarao Aaor GaumatI hO. | The earth revolves round the sun.
3. eoithaisak sa%ya (historical truth) | kRYNa Aja-unaasa saaMgatao. | kRYNa Aja-una sao khto hO. | Krushna says to Arjuna.
4. AvatrNa (quotation) | Damalao ma*Natat, ... | damalao khto hO, ... | Damle says, ...
5. ]_oSa (objective) | maI tulaa maaOja daKvatao. | maOM tumho majaodar caIja idKata hUM. | I will show you something interesting.
6. sainnaiht BaUt (immediate past) | kQaI Aalaasa? ha yaotao [tkaca. | kba Aae? basa AaBaI Aayaa. | When did you come? Just now (I came).
7. ina:saMSaya BaivaYya (certainty in future) | Aata tao maar Katao Kasa! | Aba vah maar Kaegaa hI! | He is in for a beating.
8. AaSvaasana (assurance) | maI tumhalaa ]Va BaoTtao. | maOM Aap sao kla imalata hUM. | I will see you tomorrow.
9. rIitvat-maana (habitual present) | tao raoja AByaasa krtao. | vah raoja pZta hOM. | He studies regularly.

Table 1: Marathi first paradigm for verb morphology with corresponding sentences in Marathi, Hindi and English
It is observed that
a. For senses 1-4, straightforward suffix replacement produces the correct translation.
b. For senses 5 and 8, suffix replacement works for Marathi and Hindi, but not for
English, which takes on the 3rd person, singular number, future tense marking.
c. For sense 6, replacement works neither for Hindi nor for English, as both take on past
tense marking.
d. For sense 7, Hindi takes on the future tense marking, while English has an idiomatic
construction.
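A rough sketch of the suffix-replacement check underlying these observations is given below; the per-sense requirements simply restate observations (a)-(d) in Python and are not derived automatically.

# the pivot: what a purely mechanical suffix replacement would produce
PIVOT = {"hindi": "-taa hai", "english": "3rd person present -s"}

# marking actually required in the translation, per sense of the taakhyaata
REQUIRED = {
    1: {"hindi": "-taa hai", "english": "3rd person present -s"},
    5: {"hindi": "-taa hai", "english": "future (will)"},
    6: {"hindi": "past", "english": "past"},
    7: {"hindi": "future (-egaa)", "english": "idiomatic"},
    8: {"hindi": "-taa hai", "english": "future (will)"},
}

for sense, req in sorted(REQUIRED.items()):
    for lang in ("hindi", "english"):
        ok = req[lang] == PIVOT[lang]
        print("sense", sense, lang + ":", "replacement works" if ok else "replacement fails")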
These examples of English-Hindi and Marathi-Hindi-English translation show that purely
mechanical approaches do not suffice for machine translation, even at low levels of processing
(a more formal treatment of English-Hindi language divergence can be found in [Dave et al., 2004]).
Many of the problems can be traced to incorrect choices of the senses of input and
output words. This motivates our discussion in the next part of the paper.
4 Word Sense Disambiguation (WSD)
Word Sense Disambiguation (WSD) [Yarowsky, 1992] is defined as the task of finding the
correct sense of a word in a context. The task needs large amounts of word and world
knowledge. Let us consider the word स ब ध (sambandha, roughly meaning: relation) in the
following Hindi sentence and its senses (English translations are avoided here to retain clarity;
this does not hamper understanding of the main issues):
ऋ वेद क एक ऋचा म दःयु के वशेषण से उनक संःकृ ित एवं वे दक समाज के साथ उनके स ब ध पर पूण
ूकाश पड़ता हे । उ ह अबतु, मृीवाच, अौ , प ण, अय
आ द कहा गया हे ।
The senses of the word, picked up from the Hindi wordnet (described in the next section) are:
1. संबंध, स ब ध, मतलब, नाता, ता लुक़, वाःता, रँता - कसी ूकार का लगाव या संपक : "इस काम से राम का कोई संबंध नह ं है"
2. संबंध कारक, ष ी, संबंध, स ब ध, स ब ध कारक - याकरण म वह कारक जससे एक श द का दसरे श द के साथ संबंध सूिचत होता है : "संबंध कारक क वभ का, के, क, रा, रे, र आ द ह जैसे यह कस क पुःतक है?"
3. लगाव, संबंध, स ब ध, संसग - दो वःतुओं म कसी ूकार का लगाव या संपक बतलाने वाला त व : "साथ रहते-रहते तो जानवर से भी लगाव हो जाता है"
4. संबंध, स ब ध, रँता - ववाह अथवा उसका िन य : "मंगला के िलए बलासपुर म संबंध प का हो गया है"
5. संबंध, स ब ध - एक साथ बँधने, जुड़ने या िमलने क बया : "ूेम-भाव से आपसी संबंध म ूगाढ़ता आती है"
6. नाता, रँता, संबंध, स ब ध - मनुंय का वह पारःप रक संबंध जो एक ह कुल म ज म लेने अथवा ववाह आ द करने से होता है : "मधु रमा से आपका या नाता है?"
Here sense 1 is the most appropriate one, though 5 and 6 are quite close.
The problem of sense disambiguation can now be formulated as the task of picking the
correct sense from a repository of sense enumerations. Our approach to this problem is to
make use of the Hindi Wordnet [Chakrabarti et al., 2002] (http://www.cfilt.iitb.ac.in/wordnet/webhwn/),
being developed in line with the English wordnet [Fellbaum, 1998]. The basics of the wordnet are described next.
4.1 The Wordnet Principle
Wordnet is an electronic lexical reference system in which each word meaning is represented
by a set of word-forms known as a synonym set or synset. Synsets are created for content
words, i.e., nouns, verbs, adjectives and adverbs. Table 2, called the Lexical Matrix, is an
abstract representation of the organization of lexical information in the wordnet. Word forms
are imagined to be listed as headings of columns and word meanings as headings of rows.
Rows express synonymy, while columns express polysemy.
To take an example, the synset {कलम, पेन, क़लम, लेखनी} ({kalama, pena, lekhanii}) gives
the meaning उपकरण जसक सहायता से कागज़ आ द पर िलखते ह (an instrument of writing).
कलम (kalama) belongs to a synset whose members form a row in the lexical matrix, and the
row number gives a unique id to the synset.
Word Meanings | Word-Forms
              | F1      F2      F3      ...     Fn
M1            | E1,1    E1,2
M2            |         E2,2
M3            |                 E3,3
...           |
Mm            |                                 Em,n

Table 2: Lexical Matrix: the principle of the wordnet
कलम (kalama) has another meaning, पेड़ क वह टहनी जो दसर जगह बैठाने या दसरे पेड़ म पैबंद लगाने
के िलए काट जाए (a cutting for planting or grafting), which comes in the column headed by this
word.
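The lexical matrix can be held as two simple indexes: one giving the members of each synset (a row, i.e., synonymy) and one giving the synsets of each word form (a column, i.e., polysemy). The following Python sketch uses illustrative synset ids and romanized forms for the कलम example; it is not the storage format of the Hindi wordnet itself.

# synset-id -> member word forms and gloss (rows of the lexical matrix)
SYNSETS = {
    101: {"members": ["kalama", "pena", "lekhanii"],
          "gloss": "instrument of writing"},
    102: {"members": ["kalama"],
          "gloss": "a cutting for planting or grafting"},
}

def polysemy_index(synsets):
    # word form -> list of synset ids (the columns of the lexical matrix)
    index = {}
    for sid, data in synsets.items():
        for form in data["members"]:
            index.setdefault(form, []).append(sid)
    return index

forms = polysemy_index(SYNSETS)
print(forms["kalama"])                 # [101, 102] -> kalama is polysemous
print(SYNSETS[101]["members"])         # the synonyms sharing sense 101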
4.1.1 Semantic Relations in Hindi Wordnet
The design of the Hindi Wordnet is inspired by the famous English WordNet
[Fellbaum, 1998]. The basic semantic relations are:
Relation             | Meaning
Hypernymy/Hyponymy   | Is-A (Kind-Of)
Entailment/Troponymy | Manner-Of (for verbs)
Meronymy/Holonymy    | Has-A (Part-Whole)

Table 3: Illustrating the nature of the relations in Wordnet
A part of the Hindi wordnet is shown in figure 4. Referring to the figure, the synset {घर, गृह}
({ghara, griha} meaning: house) has the hypernymy relation to {आवास, िनवास} ({aavaasa,
nivaasa} meaning: place of residence). Its meronymy relation (Has-A) links to {आँगन}
({aangana} meaning: courtyard) {बरामदा} ({baraamadaa} meaning: veranda) and {अ ययन
क } ({adhyana kaksha} meaning: the study) and the hyponymy relation links to {बाड़ }
({baadii} meaning: small house), {सराय} ({saraaya} meaning: motel) and {झोपड़ } ({jhopdii}
meaning: hutment).
The Hindi wordnet currently contains approximately 30,000 unique words and 13,000
synsets.
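The following Python sketch shows one simple way of holding the relations of figure 4 and collecting the hypernym chain of a synset; the romanized synset names stand in for the Devanagari synsets, and the graph is only the fragment described above, not the full Hindi wordnet.

# the fragment of figure 4: synset -> its relations (romanized names, illustrative)
RELATIONS = {
    "ghara":   {"hypernym": ["aavaasa"],
                "meronym":  ["aangana", "baraamadaa", "adhyayana-kaksha"],
                "hyponym":  ["baadii", "saraaya", "jhopdii"]},
    "aavaasa": {"hypernym": [], "meronym": [], "hyponym": ["ghara"]},
}

def hypernym_closure(synset, graph):
    # all ancestors of a synset, following hypernym links up to the roots
    out = []
    for parent in graph.get(synset, {}).get("hypernym", []):
        out.append(parent)
        out.extend(hypernym_closure(parent, graph))
    return out

print(hypernym_closure("ghara", RELATIONS))    # ['aavaasa']
print(RELATIONS["ghara"]["meronym"])           # the parts (Has-A) of ghara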
4.1.2 The Disambiguation Approach
The basic idea behind the WSD approach is illustrated in figure 5 (motivated by
[Ramakrishnan et al., 2004; Lin, 1998; Lesk, 1986]). The context of the word to be
disambiguated is collected from a window around it. In the present case the window is
a. the sentence in which the word occurs,
b. the previous sentence, and
c. the following sentence.
This context provides what we call the context bag. The wordnet is mined to find
the semantic associations of the given word. A set of words is collected by traversing the
wordnet graph. This set is called the semantic bag. For every sense of the word, the overlap of
the semantic bag with the context bag is computed. The sense with the maximum overlap
emerges as the winner sense. This algorithm is further detailed in figure 6.
Figure 4: A small part of the Hindi Wordnet
4.1.3 Evaluation
We use the Hindi corpora from the Central Institute of Indian Languages (CIIL), Mysore
(www.ciil.org) as the test bed for sense disambiguation. Currently only nouns are considered.
The documents are chosen from many domains and the accuracy results are presented in figure 7 [Sinha et
al., 2004]. As can be seen from this graph, the performance of the system varies from about
40% (children's stories) to about 70% (agriculture). The reason for the low performance on
children's stories is that the text in this domain tends to use short, everyday words
which are, however, highly polysemous.
[Figure 5 shows the basic idea of the approach: a context bag is extracted from the Hindi text document and a semantic bag from the Hindi Wordnet; the sense with the maximum overlap between the two bags wins.]

Figure 5: The basic idea of the WSD approach
1. For a polysemous word w needing disambiguation, a set of context words in its
   surrounding window is collected. Let this collection be C, the context bag.
2. The window is the current sentence and the preceding and the following sentences.
3. For each sense s of w, let B be the bag of words obtained from the
   i. synonyms in the synset
   ii. glosses of the synset
   iii. example sentences of the synset
   iv. hypernyms (recursively up to the roots)
   v. glosses of hypernyms
   vi. example sentences of hypernyms
   vii. hyponyms (recursively up to the leaves)
   viii. glosses of hyponyms
   ix. example sentences of hyponyms
   x. meronyms (recursively up to the beginner synset)
   xi. glosses of meronyms
   xii. example sentences of meronyms
4. Measure the overlap between C and B using intersection similarity.
5. Output as the winner sense the sense with the maximum overlap similarity value.

Figure 6: WSD Algorithm
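For concreteness, a much-reduced version of the algorithm of figure 6 is sketched below in Python, using only synset members and glosses to build the semantic bag (the full algorithm also pulls in hypernyms, hyponyms and meronyms with their glosses and examples). The toy English sense inventory is purely illustrative and is not taken from the Hindi wordnet.

# a toy sense inventory for the word "bank"; illustrative only
SENSES = {
    "bank_1": {"members": {"bank"}, "gloss": "sloping land beside a river"},
    "bank_2": {"members": {"bank"}, "gloss": "financial institution that accepts deposits"},
}

def semantic_bag(sense):
    # the semantic bag B: synset members plus gloss words
    return SENSES[sense]["members"] | set(SENSES[sense]["gloss"].split())

def disambiguate(context_sentences):
    # the context bag C: all words in the window of sentences
    context_bag = {w for s in context_sentences for w in s.lower().split()}
    scores = {s: len(semantic_bag(s) & context_bag) for s in SENSES}
    return max(scores, key=scores.get), scores

winner, scores = disambiguate(["He sat on the bank of the river",
                               "The water was cold"])
print(winner, scores)      # bank_1 wins: 'river' (and 'bank') overlap with its bag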
[Figure 7 is a histogram of WSD accuracy (percentage, on a 0-80 scale) for the domains Science, History, Children Literature, Mass Media, Short Story, Sociology, Science and Sociology, and Agriculture.]

Figure 7: Histogram showing the WSD accuracy across domains for Hindi words
5 Conclusions and Future Work
We discussed in this paper (a) Types of MT systems, (b) Language Divergence Problems and
(c) Word Sense Disambiguation using the lexical resource of wordnet. (b) motivates (c), and
to the best of our knowledge, ours is the first attempt to automate WSD for an Indian
language. We have not yet fully exploited morphology, which we believe will provide an additional
source of knowledge for correct sense determination.
6 References
• Brown, R.D.: 1996, Example Based Machine Translation in the Pangloss System, Proc. of the 16th Int'l Conf. on Computational Linguistics (COLING), Copenhagen, Denmark.
• Chakrabarti D., Narayan D., Pandey P. and Bhattacharyya P.: 2002, An Experience in Building the Indo-WordNet: A WordNet for Hindi, Proc. of the Global Wordnet Conference (GWC 2002), Mysore, India.
• Damle M.K.: 1970 edition, Shastriya Marathi Vyakarana, K.S. Arjunvadkar (ed.), Deshmukh and Co. (pub.), Pune.
• Dave S., Parikh J. and Bhattacharyya P.: 2002, Interlingua Based English Hindi Machine Translation and Language Divergence, Journal of Machine Translation (JMT), Volume 17.
• Dorr, Bonnie J.: 1993, Machine Translation: A View from the Lexicon, The MIT Press.
• Fellbaum, C. (ed.): 1998, WordNet: An Electronic Lexical Database, MIT Press.
• Hutchins, J.: 2000, Machine Translation: General Overview, chapter 27 in The Oxford Handbook of Computational Linguistics, Mitkov, R. (ed.), Oxford University Press, Oxford.
• Hutchins, W.J. and Somers, H.L.: 1992, An Introduction to Machine Translation, Academic Press.
• Kay, Martin and Roscheisen, Martin: 1993, Text-Translation Alignment, Computational Linguistics, Vol. 19, Issue 1.
• Lesk, M.E.: 1986, Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone, Proc. of the SIGDOC Conference, Toronto, Ontario.
• Lin, D.: 1998, An Information-Theoretic Definition of Similarity, Proc. of the Int'l Conf. on Machine Learning (ICML), Madison, Wisconsin.
• McCallum, A., Freitag, D. and Pereira, F.: 2000, Maximum Entropy Markov Models for Information Extraction and Segmentation, Proc. of the Int'l Conf. on Machine Learning (ICML), Stanford, CA.
• Ramakrishnan G., Prithviraj B. and Bhattacharyya P.: 2004, A Gloss Centered Algorithm for Word Sense Disambiguation, Proc. of the ACL SENSEVAL Conference, Barcelona, Spain.
• Sinha M., Kumar M., Pande P., Kashyap L. and Bhattacharyya P.: 2004, Hindi Word Sense Disambiguation, Proc. of the Int'l Symposium on Machine Translation, Natural Language Processing and Translation Support Systems, Delhi, India.
• Somers, H.: 2000, Machine Translation: Latest Developments, chapter 28 in The Oxford Handbook of Computational Linguistics, Mitkov, R. (ed.), Oxford University Press, Oxford.
• Sumita, E. and Iida, H.: 1991, Experiments and Prospects of Example Based Machine Translation, Proc. of the Conf. of the Association for Computational Linguistics (ACL 91), Berkeley, CA.
• Vauquois, Bernard: 1975, La Traduction Automatique à Grenoble, Dunod, Paris.
• Way, A. and Gough, N.: 2003, wEBMT: Developing and Validating an Example-Based Machine Translation System Using the World Wide Web, Computational Linguistics, Vol. 29, Issue 3.
• Yarowsky, D.: 1992, Word Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora, Proc. of the 14th Int'l Conf. on Computational Linguistics (COLING-92), Nantes, France.