Machine Translation-Indian Regional Languages

Recent Advances in Computer Engineering, Communications and Information Technology
NAKUL SHARMA
Information Technology
University of Pune (Affiliated)
Fl-02B, Radiant Hillview Housing Society, Opp. H. P. Petrol Station, Kondhwa, Pune-48
INDIA
[email protected]
Abstract: - Natural Language Processing (NLP) is an emerging field that draws heavily on Machine Learning. Machine Translation (MT) systems, a class of NLP systems, use machines to translate text or speech from one natural language into another. MT systems can be classified according to the approach they follow for translation. In this paper, existing MT systems are analyzed with respect to the regional languages of India.
Key-Words: - Machine Translation (MT), Natural Language Processing (NLP), Indo-Aryan Languages,
Dravidian Languages.
1 Introduction
MT is a branch of NLP. It strives to convert text in one natural language (such as Hindi or English) into another natural language by means of machines. MT systems can be trained on multilingual corpora.
Table I lists the major types of MT systems and the techniques on which they are based.

Fig. 1. Working of MT systems

2 Problem Formulation
Fig. 1 shows the general processing performed by an MT system. The source language text is fed into the MT system, and the system generates the target language output. MT systems may be text-to-text or text-to-speech: text-to-text systems convert source text into target text, whereas text-to-speech systems convert source text into the spoken form of the target language. The reverse conversion, speech-to-text, is also possible depending on various factors.

A great deal of work on Machine Translation systems is currently being undertaken in India. This work is an endeavor to address the following research questions:
Question-1: What are the various spoken and written languages in India?
Question-2: In which regions are the regional languages of India used?

TABLE I. MACHINE TRANSLATION SYSTEMS [1]
Type of MT System | Based On
Direct | Dictionary lookup
Statistical | Corpus and statistical models
Interlingua | Developing a universal natural language
Example | Corpus and previous translations
Transfer | Translation rules
Knowledge | Uses Artificial Intelligence
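To illustrate the Direct approach in Table I, which is based on dictionary lookup, the following minimal Python sketch translates word by word using a bilingual dictionary. The toy English-to-Hindi (romanized) lexicon and the function name `direct_translate` are hypothetical, chosen only for illustration; practical direct MT systems add morphological analysis and some local reordering.

```python
# Minimal sketch of Direct (dictionary-lookup) MT.
# LEXICON is a hypothetical toy English -> romanized Hindi dictionary.
LEXICON = {
    "red": "laal",
    "book": "kitaab",
    "water": "paani",
    "house": "ghar",
}


def direct_translate(sentence: str, lexicon: dict[str, str]) -> str:
    """Translate word by word; words missing from the lexicon are copied through."""
    target_words = []
    for word in sentence.lower().split():
        # Each source word is looked up independently: no reordering, no agreement.
        target_words.append(lexicon.get(word, word))
    return " ".join(target_words)


if __name__ == "__main__":
    print(direct_translate("red book", LEXICON))    # laal kitaab
    print(direct_translate("blue house", LEXICON))  # blue ghar
```

Because every word is looked up on its own, the output keeps the source word order and ignores agreement, which is one reason direct translation tends to work best between closely related languages such as Hindi and Punjabi.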
ISBN: 978-960-474-361-2
407
3 Problem Solution

3.1 Indian Regional Languages
The natural languages can be categorized as:
-Written
-Spoken
-Both written and spoken

Fig. 2. Division of Regional Languages of India

India hosts many regional languages. Reflecting their historical development, they are spoken and/or written in many scripts, such as Devanagari and Gurmukhi. Languages such as Hindi, Punjabi, and Marathi are both spoken and written, whereas languages such as Dogri are only spoken and not written.

Table II gives the major languages of North India along with the states in which they are official languages.

TABLE II. LANGUAGES OF NORTH INDIA
Language | Official Language of State | Written | Spoken
Hindi | Uttar Pradesh, Uttaranchal, Bihar, Rajasthan, Haryana, Delhi | Yes | Yes
Urdu | Jammu and Kashmir, Uttar Pradesh, Delhi | Yes | Yes
Dogri | Himachal Pradesh, Jammu and Kashmir | No | Yes
Haryanvi | Haryana | No | Yes
Rajasthani | Rajasthan | No | Yes
Bihari | Bihar | No | Yes

Table III gives the major languages of central India along with the states in which they are official languages.

TABLE III. LANGUAGES OF CENTRAL INDIA
Language | Official Language of State | Written | Spoken
Hindi | Madhya Pradesh, Jharkhand, Chhattisgarh, Delhi, Uttaranchal, Uttar Pradesh | Yes | Yes
Rajasthani | Rajasthan | No | Yes
Bihari | Bihar | No | Yes
Table IV gives the major languages of the southern states along with the states in which they are official languages.

TABLE IV. LANGUAGES OF SOUTH INDIA
Language | Official Language of State | Written | Spoken
Malayalam* | Kerala | Yes | Yes
Tamil* | Tamil Nadu, Andaman and Nicobar Islands, Puducherry | Yes | Yes
Assamese | Assam, Nagaland, Arunachal Pradesh | Yes | Yes
Telugu* | Andhra Pradesh | Yes | Yes
Mizo | Mizoram | Yes | Yes
Kannada* | Karnataka | Yes | Yes
Borok (Kokborok) | Tripura | Yes | Yes
Tulu* | - | Yes | Yes
Hindi and English | Arunachal Pradesh | Yes | Yes
Oriya | Odisha | Yes | Yes
*These languages are part of the Dravidian group of languages. They are spoken and written in the south Indian states.
Table V gives the major languages of the western and south-western states along with the states and union territories in which they are official languages.

TABLE V. LANGUAGES OF WESTERN AND SOUTH WESTERN STATES
Language | Official Language of States/Union Territories | Written | Spoken
Gujarati | Gujarat | Yes | Yes
Marathi | Maharashtra | Yes | Yes
Portuguese | Daman and Diu, Goa | Yes | Yes

Table VI gives the major languages of the eastern states along with the states in which they are official languages.

TABLE VI. LANGUAGES OF EASTERN STATES
Language | Official Language of States | Written | Spoken
Bengali | West Bengal, Tripura | Yes | Yes

I. On-line Machine Translation Tools
A. Web-Based Hindi to Punjabi MT System
This system uses the Direct Machine Translation technique. It can convert web pages and web documents from Hindi to Punjabi [2]. Punjabi University, Patiala, has developed this web-based system, available at:
http://h2p.learningpunjabi.org

B. Bing Translator
Bing Translator is a service offered by Microsoft. It can translate between languages and also provides various ways of viewing the translated content [2]. This tool can be accessed at:
http://bing.com/translator

C. Babylon
Babylon Translation is a free online version of the Babylon translation software [7]. This tool can be accessed online at:
http://translation.babylon.com/

D. PROMT Translation
This online tool performs translation by passing the text to be translated to the Google, Bing, and Babylon translation systems [8]. This tool can be accessed online at:
http://imtranslator.net/translator.asp
E. Google Translator
Google Translator is a service offered by Google Inc. It provides a side-by-side view of the source and translated content. Google also provides the feature of translating web links into English [2]. This tool can be accessed online at:
http://translate.google.com

II. Off-line Translation Tools
A. Systran
This system was developed by the company of the same name. It offers translation between 35 languages. It provided the technology for Yahoo! Babel Fish and was used by Google Inc. until 2007 [6].

B. METAL
METAL is an MT system developed at the University of Texas. Using the concept of controlled language, this system achieves high-quality translations. METAL is now known as LANT-MARK, and its marketing has been taken over by a Belgian company [6].

C. English to Bangla Phrase-Based Machine Translation
This system is built on log-linear translation models. The source language is English and the target language is Bengali [10].

D. Anglabharati
This tool uses a rule-based technique for translation. It uses a context-free grammar to generate a pseudo-target that is applicable to a group of Indian languages [11].

E. Anuvadaksh
This tool is being developed by CDAC-Pune. The system aims to translate English text into the regional languages of India. The target languages include Hindi, Marathi, Urdu, Tamil and Oriya.

F. UNL Based Enconverter-Deconverter
This technique is based on the Universal Networking Language (UNL), an interlingua. An enconverter converts source-language sentences into UNL expressions, and a deconverter generates target-language sentences from those expressions; in this way, English sentences can be converted to Punjabi and Punjabi sentences back to English [3].

G. Anusaaraka
This is an expert-system-based Machine Translation system. It aims to create a joint architectural model for Machine Translation from English to various Indian regional languages [9].

H. Sampark
Sampark uses a transfer-based approach for translation. The system consists of three main processes: source analysis, transfer, and target generation [10].
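The three stages named for Sampark (source analysis, transfer, and target generation) can be illustrated with a minimal Python sketch. It is not Sampark's implementation: the part-of-speech lexicon, the single SVO-to-SOV reordering rule, and the English-to-romanized-Hindi word list are hypothetical toy data, chosen only to show how each stage hands its output to the next.

```python
# Minimal sketch of a transfer-based MT pipeline:
# source analysis -> transfer -> target generation.
# POS_LEXICON, the reordering rule and BILINGUAL_LEXICON are hypothetical toy data.

POS_LEXICON = {"ram": "N", "mango": "N", "eats": "V"}
BILINGUAL_LEXICON = {"ram": "ram", "mango": "aam", "eats": "khaata hai"}


def analyse(sentence: str) -> list[tuple[str, str]]:
    """Source analysis: tokenize and attach a part-of-speech tag to each word."""
    return [(w, POS_LEXICON.get(w, "UNK")) for w in sentence.lower().split()]


def transfer(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Structural transfer: map English SVO order to Hindi SOV order."""
    if [tag for _, tag in tagged] == ["N", "V", "N"]:
        subject, verb, obj = tagged
        return [subject, obj, verb]
    return tagged  # no rule matched: keep the source order


def generate(tagged: list[tuple[str, str]]) -> str:
    """Target generation: look up target words and linearize the sentence."""
    return " ".join(BILINGUAL_LEXICON.get(w, w) for w, _ in tagged)


def translate(sentence: str) -> str:
    return generate(transfer(analyse(sentence)))


if __name__ == "__main__":
    print(translate("Ram eats mango"))  # ram aam khaata hai
```

Keeping the stages separate means that, in principle, a new target language mainly needs new transfer rules and a new generation lexicon while the source analysis is reused. A practical system would also need morphological generation for gender, number and tense agreement, which this sketch omits.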
4. English Language Machine Translation Systems
The following table lists some of the machine translation systems in which the source or target language is English.

Name of System | Source/Target Language | MT Approach | Developed By (Organization/Authors)
Portage [10] | English | Phrase-Based | National Research Council, Canada
PANGLOSS [11] | English-Spanish, English-Korean | Knowledge-Based | Kevin Knight, Steve K. Luk
Pan-Lite [12] | English | Example-Based, Transfer-Based, Knowledge-Based | Centre for Machine Translation, Carnegie Mellon University
ALT-J/E [13] | English-Japanese | Experimental-Based | NTT Communication Science Laboratories
Pan EBMT [14] | Spanish-English | Example-Based Machine Translation | Centre for Machine Translation, Carnegie Mellon University
English to American Sign Language [15] | English-American Sign Language | Visual & Spatial | University of Pennsylvania
Candide [16] | French-English | Statistical (Probability-Based) | IBM Thomas J. Watson Research Center
MaTra [17] | English-Hindi | Automatic, Parsing | Ananthakrishnan R., C-DAC Mumbai
Shiraz [18] | Persian-English | Ontology-Based, Direct MT | Computing Research Laboratory, New Mexico State University
ETAP [19] | Russian-English | Direct MT | Computational Linguistics Laboratory, Russian Academy of Sciences
English to Japanese MT System [20] | English-Japanese | Example-Based | ATR Interpreting Telephony Research Laboratories, Kyoto, Japan
MaTrEx [21] | English-Italian | Example-Based, Statistical | Dublin City University, Dublin
Transonics [22] | English-Persian | Speech-to-Speech Translator | University of Southern California; HRL Laboratories, LLC, CA
LOGON [23] | English-Norwegian | Semantic-Based, Transfer-Based | Various universities in Norway
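Phrase-based systems such as Portage, like the English to Bangla system described earlier, rank candidate translations with a log-linear model: the decoder chooses the target sentence e that maximizes the weighted sum of feature functions h_i(e, f) of the source sentence f, with weights lambda_i. The Python sketch below shows only this scoring step. The feature names, weights, feature values, and candidate labels are hypothetical; in a real system the candidates come from a phrase table and a decoder, and the weights are tuned on held-out data.

```python
import math

# Minimal sketch of log-linear scoring in phrase-based SMT:
#   e_best = argmax_e  sum_i lambda_i * h_i(e, f)
# All feature names, weights and values below are hypothetical.

WEIGHTS = {
    "translation_model": 1.0,  # weight for log P(f | e) from a phrase table
    "language_model": 0.5,     # weight for log P(e) from a target language model
    "word_penalty": -0.2,      # negative weight on output length
}


def log_linear_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Return sum_i lambda_i * h_i for one candidate translation."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates: dict[str, dict[str, float]],
                   weights: dict[str, float]) -> str:
    """Return the label of the highest-scoring candidate translation."""
    return max(candidates, key=lambda label: log_linear_score(candidates[label], weights))


# Hypothetical feature values h_i(e, f) for two candidate translations of one source sentence.
CANDIDATES = {
    "candidate_a": {"translation_model": math.log(0.4),
                    "language_model": math.log(0.1),
                    "word_penalty": 3.0},
    "candidate_b": {"translation_model": math.log(0.3),
                    "language_model": math.log(0.3),
                    "word_penalty": 3.0},
}

if __name__ == "__main__":
    print("best:", best_candidate(CANDIDATES, WEIGHTS))  # best: candidate_b
```

Here candidate_b wins because its stronger language-model score outweighs its weaker translation-model score. Setting the language-model weight to zero would make candidate_a win instead, which illustrates why weight tuning matters in such systems.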
5. Acknowledgement
I would like to thank my M.E. thesis guide, Dr. Parteek Bhatia (CSED, Thapar University), for being a constant source of inspiration. This work is also inspired by my mother and father, who have helped in all my efforts. It is difficult to put into words the efforts they have undertaken.

6. Conclusion
Machine Translation has come a long way towards its objective of translating text and speech. However, much work remains to be done to achieve satisfactory output in terms of both translation quality and software flexibility. There has been a surge in the number of Machine Translation professionals as well as organizations focused on Machine Translation. The Indian government is also making constant efforts to bridge the language barrier.

REFERENCES
[1] Nakul Sharma, Parteek Bhatia, "English to Hindi Statistical Machine Translation System", Master of Engineering Thesis submitted to Thapar University, July 2011, accessed at http://dspace.thapar.edu:8080/dspace/handle/10266/1449.
[2] Nakul Sharma, Parteek Bhatia, "Statistical Machine Translation for Indian Languages", IEEE's International Conference in Computer Engineering and Technology (ICCET-2010), ISBN: 978-81-920748-1-8.
[3] Parteek Kumar, R.K. Sharma, "UNL Based Machine Translation System for Punjabi Language", PhD thesis submitted to Thapar University, Feb 2012, accessible at http://dspace.thapar.edu:8080/dspace/handle/10266/1729.
[4] "Natural Language Processing activities at CDAC Kolkata", Annual Report-2011, CDAC Kolkata.
[5] Sitendar, Seema Bawa, "Survey of Indian Machine Translation Systems", International Journal of Computer Science and Technology (IJCST), Vol. 3, Issue 1, Jan-March 2012.
[6] Sanjay Kumar Dwivedi, Pramod Premdas Sukhadeve, "Machine Translation System in Indian Perspectives", Journal of Computer Science, Vol. 6, Issue 10, ISSN 1549-3636, 2010.
[7] "Free Online Translation", [Online] http://translation.babylon.com, accessed on 19 March 2013.
[8] "Online Translation Tools", [Online] http://imtranslator.net/translator.asp, accessed on 19 March 2013.
[9] Sriram Choudhury, Ankitha Rao, Dipti M. Sharma, "Anusaaraka: An Expert System Based Machine Translation System", IEEE.
[10] Gary Anthes, "Automated Translation of Indian Languages", Communications of the ACM, Technology News, Vol. 53, No. 1, January 2010.
[11] F. Sadat, H. Johnson, A. Agbago, G. Foster, R. Kuhn, J. Martin, A. Tikuisis, "Portage: A Phrase-based Machine Translation System", In Proc. Association for Computational Linguistics Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation, Ann Arbor, Michigan, USA, June 29-30, 2005, pp. 133-136. NRC 48525.
[12] Ralf D. Brown, "Example-Based Machine Translation in the Pangloss System", COLING 1996, pp. 169-174.
[13] R. E. Frederking, R. D. Brown, "The Pangloss-Lite Machine Translation System", MT Horizons, Proceedings of the Second Conference of the Association for Machine Translation in the Americas (AMTA-96).
[14] Ralf D. Brown, "Example-based machine translation in the Pangloss system", Proceedings of the 16th Conference on Computational Linguistics, Volume 1, Association for Computational Linguistics, pp. 169-174.
[15] Liwei Zhao, Karin Kipper, William Schuler, Christian Vogler, Martha Palmer, Norman I. Badler, "A Machine Translation System from English to American Sign Language", Lecture Notes in Computer Science, Volume 1934, Envisioning Machine Translation in the Information Future, 4th Conference of the Association for Machine Translation in the Americas, 2000, pp. 54-67.
[16] Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, Luboš Ureš, "The Candide System for Machine Translation", IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598.
[17] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah, Sawani Bade, Sasikumar M, "MaTra: A Practical Approach to Fully-Automatic Indicative English-Hindi Machine Translation".
[18] Jan W. Amtrup, Hamid Mansouri Rad, Karine Megerdoomian, Rémi Zajac, "Persian-English Machine Translation: An Overview of the Shiraz Project", NMSU, CRL, Memoranda in Computer and Cognitive Science (MCCS-00-319).
[19] Makoto Nagao, "A Framework of a Mechanical Translation between Japanese and English by Analogy Principle", Artificial and Human Intelligence, Elsevier Science Publishers B.V., NATO, 1984.
[20] Nicolas Stroppa, Andy Way, "MaTrEx: DCU Machine Translation System for IWSLT 2006".
[21] Emil Ettelaie, Sudeep Gandhe, Panayiotis Georgiou, Kevin Knight, Daniel Marcu, Shrikanth Narayanan, David Traum, Robert Belvin, "Transonics: A Practical Speech-to-Speech Translator for English-Farsi Medical Dialogues", Proceedings of the ACL Interactive Poster and Demonstration Sessions, pp. 89-92, Ann Arbor, June 2005.