Recent Advances in Computer Engineering, Communications and Information Technology
ISBN: 978-960-474-361-2

Machine Translation-Indian Regional Languages

NAKUL SHARMA
Information Technology
University of Pune (Affiliated)
Fl-02B, Radiant Hillview Housing Society, Opp. H. P. Petrol Station, Kondhwa, Pune-48
INDIA
[email protected]

Abstract: - Natural Language Processing (NLP) is an emerging field that draws heavily on Machine Learning. Machine Translation (MT), a branch of NLP, uses machines to translate text or speech. MT systems can be classified according to the approach followed for translation. In this paper, existing MT systems are analyzed with respect to the regional languages of India.

Key-Words: - Machine Translation (MT), Natural Language Processing (NLP), Indo-Aryan Languages, Dravidian Languages.

1 Introduction

MT is a branch of NLP. It strives to convert text or speech in one natural language (such as Hindi or English) into another natural language by making use of machines. MT systems can be trained on multilingual data. Table I lists the major MT types and the techniques on which they are based.

TABLE I.
MACHINE TRANSLATION SYSTEMS [1]

Type of MT System | Based On
Direct | Dictionary lookup
Statistical | Corpus and statistical models
Interlingua | Developing a Universal Natural Language
Example | Corpus and previous translations
Transfer | Translation rules
Knowledge | Uses Artificial Intelligence

Fig. 1. Working of MT systems

2 Problem Formulation

Fig. 1 shows the general processing of MT systems. Text in the source language is fed into the MT system, and the system generates output in the target language. MT systems may be text-to-text or text-to-speech. Text-to-text systems convert source text into target text. Text-to-speech systems convert source text into the spoken form of the target language. The reverse conversion, speech-to-text, is also possible depending upon various factors.

A lot of work on Machine Translation systems is currently being undertaken in India. This work is an endeavor to address the following research questions:

Question-1: What are the various spoken and written languages in India?
Question-2: What are the regions in which the regional languages of India are used?

3 Problem Solution

India hosts many regional languages. Based upon their historical significance, they are spoken and/or written in many scripts. Some of the scripts in which the languages are written are Devanagari, Gurmukhi etc.

3.1 Indian Regional Languages

The natural languages can be categorized as:
- Written
- Spoken
- Both written and spoken

Languages such as Hindi, Punjabi, and Marathi are spoken as well as written. Languages such as Dogri are only spoken but not written.

Fig. 2. Division of Regional Languages of India

Table II gives the major languages of North India along with the official languages of states.

TABLE II.
LANGUAGES OF NORTH INDIA

Language | Official Language of State | Written | Spoken
Hindi | Uttar Pradesh, Uttaranchal, Bihar, Rajasthan, Haryana, Delhi | Yes | Yes
Urdu | Jammu and Kashmir, Uttar Pradesh, Delhi | Yes | Yes
Dogri | Himachal Pradesh, Jammu and Kashmir | No | Yes
Haryanvi | Haryana | No | Yes
Rajasthani | Rajasthan | No | Yes
Bihari | Bihar | No | Yes

Table III gives the major languages of central India along with the official languages of states.

TABLE III.
LANGUAGES OF CENTRAL INDIA

Language | Official Language of State | Written | Spoken
Hindi | Madhya Pradesh, Jharkhand, Chhattisgarh, Delhi, Uttaranchal, Uttar Pradesh | Yes | Yes

Table IV gives the major languages of the southern states along with their official languages.

TABLE IV.
LANGUAGES OF SOUTH INDIA

Language | Official Language of State | Written | Spoken
Malayalam | Kerala | Yes | Yes
Tamil | Tamil Nadu, Andaman and Nicobar Islands, Puducherry | Yes | Yes
Telugu | Andhra Pradesh | Yes | Yes
Kannada | Karnataka | Yes | Yes
Tulu | - | Yes | Yes

* These languages are part of the Dravidian group of languages. They are spoken and written in the south Indian states.

Table V gives the major languages of the western and south western states along with their official languages.

TABLE V.
LANGUAGES OF WESTERN AND SOUTH WESTERN STATES

Language | Official Languages of States/Union Territories | Written | Spoken
Gujarati | Gujarat | Yes | Yes
Marathi | Maharashtra | Yes | Yes
Portuguese | Daman and Diu, Goa | Yes | Yes

Table VI gives the major languages of the eastern states along with their official languages.

TABLE VI.
LANGUAGES OF EASTERN STATES

Language | Official Language of States | Written | Spoken
Bengali | West Bengal, Tripura | Yes | Yes
Assamese | Assam, Nagaland, Arunachal Pradesh | Yes | Yes
Mizo | Mizoram | Yes | Yes
Borak (Kokborak) | Tripura | Yes | Yes
Hindi and English | Arunachal Pradesh | Yes | Yes
Oriya | Odisha | Yes | Yes

I. On-line Machine Translation Tools

A. Web-Based Hindi to Punjabi MT system
This system makes use of the Direct Machine Translation technique. It can convert web pages and web documents from Hindi to Punjabi [2]. Punjabi University, Patiala, has developed this web-based system, which is available at:
http://h2p.learningpunjabi.org

B. Bing Translator
Bing Translator is a service offered by Microsoft. It can translate between languages and also provides various ways of viewing the translated content [2]. This tool can be accessed at:
http://bing.com/translator

C. Babylon
Translation by Babylon is a free online version of the Babylon translation software [7]. This tool can be accessed online at:
http://translation.babylon.com/

D. PROMT Translation
This online tool performs translation by passing the text to be translated to the Google, Bing, and Babylon translation systems [8]. This tool can be accessed online at:
http://imtranslator.net/translator.asp

E. Google Translator
Google Translator is a service offered by Google Inc. It provides a side-by-side view while translating content. Google also provides the feature of translating web links into English [2]. This tool can be accessed online at:
http://translate.google.com

F. UNL Based EnConverter-DeConverter
This technique is based on the Universal Networking Language (UNL). An EnConverter converts English sentences to Punjabi sentences. A DeConverter converts Punjabi sentences back to English [3].

G. Anusaaraka
This is an expert-system-based Machine Translation system. It deals with creating a joint architectural model for Machine Translation from English to various regional languages [9].

H. Sampark
Sampark makes use of the transfer-based approach for translation. The system consists of source analysis, transfer, and target generation as its main processes [10].

II. Off-line Translation Tools

A. Systran
This system was developed by a company of the same name. It offers translation for 35 languages. It provides the technology for Yahoo! Babel Fish and was used by Google Inc. till 2007 [6].

B. METAL
METAL is an MT system developed at the University of Texas. Using the concept of controlled language, this system achieves high quality translations. METAL is now known as LANT-MARK, and its marketing has been taken over by a Belgian company [6].

C. English to Bangla Phrase-Based Machine Translation
The system uses log-linear translation models as its base. The source language is English and the target language is Bengali [10].

D. Anglabharati
This tool makes use of a rule-based technique for translation. It uses a context-free grammar to generate a pseudo-target applicable to a group of Indian languages [11].

E. Anuvadaksh
The tool is being developed by CDAC-Pune. The system aims to translate English text into regional languages of India. The target languages include Hindi, Marathi, Urdu, Tamil and Oriya.

4. English Language Machine Translation Systems

The following table lists some of the machine translation systems in which the source or target language is English.

Name of System | Source/Target Language | MT System | Developed By (Organization/authors)
Portage [10] | English | Phrase-Based | National Research Council, Canada
PANGLOSS [11] | English-Spanish, English-Korean | Knowledge-Based | Kevin Knight, Steve K. Luk
Pan-Lite [12] | English | Example-Based, Transfer-Based, Knowledge-Based | Centre of Machine Translation, Carnegie Mellon University
ALT-J/E [13] | English-Japanese | Experimental-Based | NTT Communications Science Laboratory
Pan EBMT [14] | Spanish-English | Example-Based Machine Translation | Centre of Machine Translation, Carnegie Mellon University
English to American Sign Language [15] | English-American Sign Language | Visual & spatial | University of Pennsylvania
Candide [16] | French-English | Statistical (Probability based) | IBM Thomas J. Watson Research Center
MaTra [17] | English-Hindi | Automatic, Parsing | Ananthakrishnan, C-DAC Mumbai
Shiraz [18] | Persian-English | Ontology Based, Direct MT | Computing Research Laboratory, New Mexico State University
ETAP [19] | Russian-English | Direct MT | Computational Linguistics Laboratory, Russian Academy of Sciences
English to Japanese MT system [20] | English-Japanese | Example-Based | ATR Interpreting Telephony Research Laboratories, Kyoto, Japan
MaTrEx [21] | English-Italian | Example Based, Statistical Based | Dublin City University, Dublin
Transonics [22] | English-Persian | Speech to speech translator | University of Southern California, HRL Laboratories, LLC, CA
LOGON [23] | English-Norwegian | Semantic Based, Transfer Based | Various universities in Norway

5. Acknowledgement

I would like to thank my M.E. thesis guide, Dr. Parteek Bhatia (CSED, Thapar University), for being a constant source of inspiration. This work was also inspired by my mother and father, who have helped in all my efforts. It is difficult to pen down the efforts they have all undertaken.

6. Conclusion

Machine Translation has come a long way toward its objective of translating text and speech. However, much work remains to be done to obtain satisfactory output in terms of translation quality and software flexibility. There has been a spurt in the number of Machine Translation professionals as well as organizations focused on Machine Translation. The Indian government is also making constant efforts to bridge the language barrier.

III. REFERENCES

[1] Nakul Sharma, Parteek Bhatia, "English to Hindi Statistical Machine Translation System", Master of Engineering Thesis submitted to Thapar University, July 2011, accessed at http://dspace.thapar.edu:8080/dspace/handle/10266/1449.
[2] Nakul Sharma, Parteek Bhatia, "Statistical Machine Translation for Indian Languages", IEEE's International Conference in Computer Engineering and Technology (ICCET-2010), ISBN: 978-81-920748-1-8.
[3] Parteek Kumar, R.K. Sharma, "UNL Based Machine Translation System for Punjabi Language", PhD thesis submitted to Thapar University, Feb 2012, accessible at http://dspace.thapar.edu:8080/dspace/handle/10266/1729
[4] "Natural Language Processing activities at CDAC Kolkata", Annual Report-2011, CDAC Kolkata.
[5] Sitendar, Seema Bawa, "Survey of Indian Machine Translation Systems", International Journal of Computer Science and Technology (IJCST), Vol-3 Issue-1, Jan-March 2012.
[6] Sanjay Kumar Dwivedi and Pramod Premdas Sukhadeve, "Machine Translation System in Indian Perspectives", Journal of Computer Science, Vol 6 Issue 10, ISSN-1549-3636, 2010.
[7] "Free Online Translation", [Online] http://translation.babylon.com accessed on 19 March 2013.
[8] "Online Translation Tools", [Online] http://imtranslator.net/translator.asp accessed on 19 March 2013.
[9] Sriram Choudhury, Ankitha Rao, Dipti M Sharma, "Anusaaraka: An Expert System Based Machine Translation System", IEEE
[10] Gary Anthes, "Automated Translation of Indian Languages", Communications of the ACM, Technology News, January 2010, Vol 53, No. 1.
[11] Sadat, F., Johnson, H., Agbago, A., Foster, G., Kuhn, R., Martin, J. and Tikuisis, A., "Portage: A Phrase-based Machine Translation System", In Proc. Association for Computational Linguistics, Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation, Ann Arbor, Michigan, USA, June 29-30 2005, pp. 133-136. NRC 48525.
[12] Ralf D. Brown, "Example-Based Machine Translation in the Pangloss System", COLING 1996, pp. 169-174.
[13] R E Frederking, R D Brown, "The Pangloss-Lite Machine Translation System", MT Horizons, Proceedings of the Second Conference of the Association for Machine Translation in the Americas (AMTA-96).
[14] Ralf D Brown, "Example-based machine translation in the Pangloss system", Proceedings of the 16th Conference on Computational Linguistics, Volume 1, Association for Computational Linguistics, pp. 169-174.
[15] Liwei Zhao, Karin Kipper, William Schuler, Christian Vogler, Martha Palmer, and Norman I. Badler, "A Machine Translation System from English to American Sign Language", Lecture Notes in Computer Science, Volume 1934, Envisioning Machine Translation in the Information Future, 4th Conference of the Association for Machine Translation in the Americas, 2000, pages 54-67.
[16] Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, Luboš Ureš, "The Candide System for Machine Translation", IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598.
[17] Ananthakrishnan R, Kavitha M, Jayprasad J Hegde, Chandra Shekhar, Ritesh Shah, Sawani Bade, Sasikumar M, "MaTra: A Practical Approach to Fully-Automatic Indicative English-Hindi Machine Translation".
[18] Jan W. Amtrup, Hamid Mansouri Rad, Karine Megerdoomian and Rémi Zajac, "Persian-English Machine Translation: An Overview of the Shiraz Project", NMSU, CRL, Memoranda in Computer and Cognitive Science (MCCS-00-319).
[19] Makoto Nagao, "A Framework of a Mechanical Translation between Japanese and English by Analogy Principle", Artificial and Human Intelligence, Elsevier Science Publishers B.V., NATO, 1984.
[20] Nicolas Stroppa, Andy Way, "MaTrEx: DCU Machine Translation System for IWSLT 2006",
[21] Emil Ettelaie, Sudeep Gandhe, Panayiotis Georgiou, Kevin Knight, Daniel Marcu, Shrikanth Narayanan, David Traum, Robert Belvin, "Transonics: A Practical Speech-to-Speech Translator for English-Farsi Medical Dialogues", Proceedings of the ACL Interactive Poster and Demonstration Sessions, pages 89-92, Ann Arbor, June 2005.
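
Illustrative note on the direct approach. Table I characterizes direct MT as dictionary lookup, and the web-based Hindi to Punjabi system is described as using this technique. The following Python sketch shows what word-by-word dictionary lookup could look like in its simplest form. It is only an illustration under stated assumptions: the function name translate_direct, the TOY_LEXICON entries (romanized Hindi glosses), and the choice to pass unknown words through unchanged are all hypothetical and are not taken from any of the surveyed systems.

```python
# Minimal sketch of direct (dictionary-lookup) translation.
# TOY_LEXICON is a hypothetical, hand-made English -> romanized Hindi
# word list used only for illustration; it is not from any surveyed system.
TOY_LEXICON = {
    "water": "paani",
    "house": "ghar",
    "big": "bada",
}


def translate_direct(sentence, lexicon):
    """Translate a sentence word by word using a bilingual lexicon.

    Words missing from the lexicon are passed through unchanged
    (an assumption made for this sketch).
    """
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


if __name__ == "__main__":
    print(translate_direct("Big house", TOY_LEXICON))
```

A sketch of this kind performs no reordering or morphological analysis, which is one reason the other approaches in Table I add rules, corpora, or statistical models on top of lexical lookup.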