Semantic-Based Orthographic and Prepositional Phrase Correction for English-Tamil Translation

CONTENTS
• Abstract
• Motivation
• Literature Survey
• Existing System
• Proposed System Framework
• Modules Description
• Comparative Analysis
• Experimental Results
• Conclusion
• References
ABSTRACT
• Machine translation is one of the major areas of NLP. When translating from English to Tamil, prepositions in English sentences must be translated into postpositions in Tamil to produce meaningful sentences.
• This project mainly focuses on eliminating prepositional phrase attachment errors and orthographic errors.
MOTIVATION
• Machine translation quality has improved substantially in recent years.
• Prepositions play an important role in meaningful translation for any language.
• Prepositional phrase errors are a major issue.
• This project aims to improve English-Tamil translation quality.
• Semantic rules are used to correct prepositional errors.
LITERATURE SURVEY
Word Alignment Problem
S.No | Author & Year | Approaches
1 | R. Harshawardhan et al., IJCSE, 2011 | Linear Programming
2 | S. Vetrivel and Diana Baby, ICN, 2010 | HMM-Viterbi Algorithm
Sentence Simplification Problem
S.No | Author & Year | Approaches
1 | R. Harshawardhan et al., IJCA, 2011 | Concept Labeling
2 | Thiruumeni P G et al., IJCA, 2011 | Idioms and Phrasal Verbs
3 | C. Poornima et al., IJCA, 2011 | Rule Based
[Charts: Word Alignment (Linear Programming, HMM-Viterbi); Sentence Simplification (Concept Labeling, Idioms & Phrasal Verbs, Rule Based)]
Contd…
Morphological Analyzer and Generator
S.No | Author & Year | Approaches
1 | M. Selvam and A. M. Natarajan, IJCSE, 2009 | Rule Based
2 | V. Dhanalakshmi and S. Rajendran, IJCA, 2010 | SVM Based
3 | Anand Kumar M et al., IJCSE, 2010 | Sequence Labeling
4 | Antony P. J. and K. P. Soman, IJCSET, 2012 | Suffix Stripping
[Chart: Morphological Analyzer and Generator (Rule Based, SVM Based, Sequence Labeling, Suffix Stripping)]
POS Tagging
S.No | Author & Year | Approaches
1 | D. Chandrakanth, IJCE, 2012 | SVM Based
2 | Selvam M et al., IJCPL, 2008 | Phrase Structure Tree Bank
3 | Adam R. Teichert et al., EMNLP, 2010 | HMM Based
[Chart: POS Tagging (SVM Based, Phrase Structure Tree Bank, HMM Based)]
EXISTING SYSTEM
PROPOSED SYSTEM FRAMEWORK
POS TAGGING
WORD BY WORD TRANSLATION
MORPHOLOGICAL ANALYSIS
RULES OF PREPOSITIONAL PHRASE ATTACHMENT
Rules for the preposition “of”
1. <NN><IN><DT> or <NN><IN><NN> = postposition “udaiya/in”
2. <NN><IN><JJ> = postposition “kkaana”
3. <RB><IN><NNP> = postposition “il”
4. <VBN><IN><NN> = postposition “aal”
Rules for the preposition “by”
<POSP1><IN><POSP2> = postposition “aal”
Rules for the preposition “on”
<POSP1><IN><POSP2> = postposition “mele/il”
Rules for the preposition “in”
<POSP1><IN><POSP2> = postposition “il”
Rules for the preposition “to”
<POSP1><IN><POSP2> = postposition “kku”
Rules for the preposition “from”
<POSP1><IN><POSP2> = postposition “irunthu”
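The rule table above can be read as a lookup keyed on the preposition and the POS tags on either side of it. The sketch below is a minimal illustration under stated assumptions, not the original system's code: the input is already tagged with Penn Treebank tags, `POSP1`/`POSP2` are treated as wildcards matching any tag, the specific “of” patterns are tried before any wildcard rule, and postpositions are returned in the transliterated form used on the slide. The names `PP_RULES`, `WILDCARD` and `select_postposition` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: POSP1/POSP2 on the slide stand for "any POS tag".
WILDCARD = "*"

# Hypothetical encoding of the slide's rules:
# (preposition, tag before it, tag after it) -> transliterated Tamil postposition.
PP_RULES = {
    ("of", "NN", "DT"): "udaiya/in",
    ("of", "NN", "NN"): "udaiya/in",
    ("of", "NN", "JJ"): "kkaana",
    ("of", "RB", "NNP"): "il",
    ("of", "VBN", "NN"): "aal",
    ("by", WILDCARD, WILDCARD): "aal",
    ("on", WILDCARD, WILDCARD): "mele/il",
    ("in", WILDCARD, WILDCARD): "il",
    ("to", WILDCARD, WILDCARD): "kku",
    ("from", WILDCARD, WILDCARD): "irunthu",
}


def select_postposition(tagged: List[Tuple[str, str]], i: int) -> Optional[str]:
    """Pick the postposition for the preposition at position i of a
    (word, Penn Treebank tag) sequence; return None if no rule applies."""
    word, tag = tagged[i]
    # Penn Treebank tags "to" as TO rather than IN, so accept both.
    if tag not in ("IN", "TO") or i == 0 or i == len(tagged) - 1:
        return None
    prep = word.lower()
    left, right = tagged[i - 1][1], tagged[i + 1][1]
    # Try the exact tag pattern first, then the preposition's wildcard rule.
    for key in ((prep, left, right), (prep, WILDCARD, WILDCARD)):
        if key in PP_RULES:
            return PP_RULES[key]
    return None


if __name__ == "__main__":
    page = [("A", "DT"), ("page", "NN"), ("of", "IN"), ("the", "DT"), ("book", "NN")]
    london = [("He", "PRP"), ("lives", "VBZ"), ("south", "RB"), ("of", "IN"), ("London", "NNP")]
    shop = [("He", "PRP"), ("went", "VBD"), ("to", "TO"), ("shop", "NN")]
    print(select_postposition(page, 2))    # udaiya/in
    print(select_postposition(london, 3))  # il
    print(select_postposition(shop, 2))    # kku
```

Neighbours are taken literally here, so “made of the wood” (VBN IN DT) matches no “of” rule and returns None. The slides label that example VBN IN NN, which suggests the original system may skip determiners before matching; the rule list does not say so, so the sketch leaves that choice open.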
PREPOSITIONAL PHRASE ATTACHMENT
<NN><IN><NN> = உடைய/இன்
A page of the book – புக்கினுடைய பக்கம்
PREPOSITIONAL PHRASE ATTACHMENT
<NN><IN><JJ> = க்கான
Cotton is a crop of subtropical climate – பருத்தி ஒரு மித வெப்ப மண்டல காலநிலைக்கான பயிராகும்
PREPOSITIONAL PHRASE ATTACHMENT
<RB><IN><NNP> = இல்
He lives south of London – அவர் தெற்கு லண்டனில் வசிக்கிறார்
PREPOSITIONAL PHRASE ATTACHMENT
<VBN><IN><NN> = ஆல்
Most tables are made of the wood – பெரும்பாலான மேசைகள் மரத்தால் செய்யப்பட்டு
ORTHOGRAPHICAL RULES
Rule 1:
Rule 2:
Rule 3:
WORDS REORDERING
English: He | went to | shop
Word-by-word: அவன் | சென்றான் | கடைக்கு
Reordered: அவன் | கடைக்கு | சென்றான்
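The reordering step moves the word-by-word output from English subject-verb-object order into Tamil subject-object-verb order. A minimal sketch, assuming each chunk has already been translated and labelled with a hypothetical role (S, V, O or PP), moves the verb chunks to the end and keeps every other chunk in its original order. The role labels and `reorder_svo_to_sov` are illustrative names, not part of the original system.

```python
from typing import List, Tuple

# (Tamil text, role); roles are hypothetical labels:
# "S" subject, "V" verb, "O" object, "PP" postpositional phrase.
Chunk = Tuple[str, str]


def reorder_svo_to_sov(chunks: List[Chunk]) -> List[str]:
    """Move verb chunks to the end, preserving the order of everything else."""
    non_verbs = [text for text, role in chunks if role != "V"]
    verbs = [text for text, role in chunks if role == "V"]
    return non_verbs + verbs


# Word-by-word output for the English chunks "He | went to | shop".
# The postposition -kku ("to") is already attached to கடைக்கு.
word_by_word = [("அவன்", "S"), ("சென்றான்", "V"), ("கடைக்கு", "PP")]
print(" ".join(reorder_svo_to_sov(word_by_word)))  # அவன் கடைக்கு சென்றான்
```

A stable partition like this is enough for a single simple clause; sentences with several clauses would need parse information to decide which verb closes which clause.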
ENGLISH-TAMIL TRANSLATION
COMPARATIVE STUDY
EXPERIMENTAL RESULTS
System | Total No. of Sentences | Total No. of Words | No. of Translated Words | No. of Correct Sentences | No. of Correct Words | P* | R* | F*
Proposed System | 200 | 1020 | 970 | 185 | 940 | 92% | 97% | 94%
TDIL | 200 | 1020 | 970 | 120 | 610 | 60% | 63% | 61%
Google Translate | 200 | 1020 | 970 | 160 | 820 | 80% | 85% | 82%
*P: Precision, R: Recall, F: F-Measure
[Chart: Accuracy (%) of Proposed System, TDIL and Google Translate by metric (Precision, Recall, F-Measure)]
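As a check on the results table, the F-measure column equals the harmonic mean of the reported precision and recall, F = 2PR / (P + R). The counts are also consistent with P being correct sentences over total sentences and R being correct words over translated words; that reading is inferred from the numbers, not stated on the slide. The dictionary names below are illustrative.

```python
from typing import Dict, Tuple

# Counts from the Experimental Results table:
# (total sentences, translated words, correct sentences, correct words)
COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "Proposed System": (200, 970, 185, 940),
    "TDIL": (200, 970, 120, 610),
    "Google Translate": (200, 970, 160, 820),
}

# Reported P, R, F in percent.
REPORTED: Dict[str, Tuple[int, int, int]] = {
    "Proposed System": (92, 97, 94),
    "TDIL": (60, 63, 61),
    "Google Translate": (80, 85, 82),
}


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


for system, (ts, tw, cs, cw) in COUNTS.items():
    p, r, f = REPORTED[system]
    sent_ratio = 100 * cs / ts  # inferred basis for P
    word_ratio = 100 * cw / tw  # inferred basis for R
    print(f"{system}: {sent_ratio:.1f}% vs P={p}%, "
          f"{word_ratio:.1f}% vs R={r}%, "
          f"F from reported P,R = {f_measure(p, r):.1f}% vs F={f}%")
```

The F values computed from the reported P and R round to the reported 94%, 61% and 82%.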
CONCLUSION
• The proposed system achieves a significant improvement in machine translation quality over the existing system.
• This work mainly focuses on identifying the exact meaning of each preposition with respect to its context and place in English-Tamil translation.
• The proposed translation system achieves a precision of 92%, a recall of 97% and an F-measure of 94%.
REFERENCES
1. R. Harshawardhan, Mridula Sara Augustine and K. P. Soman (2011), “A simplified approach to word alignment algorithm for English-Tamil translation”, IJCSE, Vol. 2, No. 1, Pages: 94-100.
2. S. Vetrivel and Diana Baby (2010), “English to Tamil statistical machine translation and alignment using HMM”, Proceedings of the 12th International Conference on Networking, VLSI and Signal Processing, Pages: 182-186.
3. R. Harshawardhan, Mridula Sara Augustine and K. P. Soman (2011), “Phrase based English-Tamil translation system by concept labeling using translation memory”, IJCA, Vol. 20, No. 3, Pages: 1-6.
4. Thiruumeni P G, Anand Kumar M, Dhanalakshmi V and Soman K P (2011), “An approach to handle idioms and phrasal verbs in English-Tamil machine translation system”, IJCA, Vol. 26, No. 10, Pages: 36-41.
5. Poornima C, Dhanalakshmi V, Anand Kumar M and Soman K P (2011), “Rule based sentence simplification for English to Tamil Machine Translation system”, IJCA, Vol. 25, No. 8, Pages: 38-42.
6. M. Selvam and A. M. Natarajan (2009), “Improvement of Rule Based Morphological Analysis and POS Tagging in Tamil Language via Projection and Induction Techniques”, IJCSE, Vol. 3, Pages: 357-367.
7. Dhanalakshmi and Rajendran (2010), “Natural Language processing tools for Tamil grammar learning and teaching”, IJCA, Vol. 8, No. 14, Pages: 26-30.
8. Lakshmana Pandian S and Kumanan Kadhirvelu (2012), “Machine translation from English to Tamil using Hybrid Technique”, IJCA, Vol. 46, No. 16, Pages: 36-42.
9. Anand Kumar M, Dhanalakshmi V, Soman K P and Rajendran (2010), “A Sequence labeling approach to morphological analyzer for Tamil language”, IJCSE, Vol. 2, No. 6, Pages: 1944-1951.
10. Anand Kumar M, Dhanalakshmi V, Rekha R U, Soman K P and Rajendran (2010), “A Novel data driven algorithm for Tamil morphological generator”, IJCA, Vol. 6, No. 12, Pages: 52-56.
11. Antony P J and K P Soman (2012), “Computational Morphology and Natural language parsing for Indian languages: A literature Survey”, IJCSET, Vol. 3, No. 4, Pages: 136-146.
12. D. Chandrakanth, M. Anand Kumar and S. Gunasekaran (2012), “Parts-of-Speech tagging for Tamil language”, IJCE, Vol. 6, No. 6, Pages: 88-93.
13. Dinesh Kumar and Gurpreet Singh Josan (2010), “Part of speech Taggers for morphologically rich Indian languages: A survey”, IJCA, Vol. 6, No. 5, Pages: 1-9.
14. Selvam M, Natarajan A M and Thangarajan R (2008), “Structural parsing of Natural Language text in Tamil using phrase structure Hybrid Language Model”, IJCPL, World Academy of Science, Engineering and Technology, Vol. 22, No. 3, Pages: 463-469.
15. Adam R. Teichert and Hal Daume III (2010), “Unsupervised Part of Speech Tagging without a Lexicon”, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Pages: 1-6.
16. Antony P J and Soman K P (2011), “Parts of Speech tagging for Indian languages: A Literature Survey”, IJCA, Vol. 34, No. 8, Pages: 22-29.
17. S. Saraswathi, P. Kanivadhana, M. Anusiya and S. Sathiya (2011), “Bilingual Translation System”, IJCSE, Vol. 3, No. 3, Pages: 1168-1174.
18. Matt Post, Chris Callison-Burch and Miles Osborne (2012), “Constructing parallel corpora for six Indian languages via crowdsourcing”, Proceedings of the 7th Workshop on Statistical Machine Translation, Pages: 401-409.
19. Meera Subhash, Wilscy M and S A Shanavas (2012), “A Rule based approach for Root word identification in Malayalam language”, IJCSIT, Vol. 4, No. 3, Pages: 159-166.
20. Pushpak Bhattacharyya (2012), “Natural Language processing: A perspective from computation in presence of Ambiguity, Resource constraint and Multilinguality”, CSI Journal of Computing, Vol. 1, No. 2, Pages: 1-13.
21. Kuang-hua Chen and Hsin-Hsi Chen (1996), “A Rule based and Corpus-Oriented approach to prepositional phrase attachment”, Proceedings of the 16th Conference on Computational Linguistics, Vol. 1, Pages: 216-221.
22. Vincent Van Asch and Walter Daelemans (2009), “Prepositional phrase attachment in shallow parsing”, Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing, Pages: 12-17.
23. Rajat Kumar Mohanty, Ashish Francis Almeida and Pushpak Bhattacharyya (2005), “Prepositional Phrase attachment and Interlingua”, Research on Computing Science, Pages: 241-253.
24. Sudip Kumar Naskar and Sivaji Bandyopadhyay (2006), “Handling of prepositions in English to Bengali Machine translation”, Proceedings of the Third ACL-SIGSEM Workshop on Prepositions, Association for Computational Linguistics, Pages: 89-94.
25. I. Dan Melamed, Ryan Green and Joseph P. Turian (2003), “Precision and Recall of Machine Translation”, Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Vol. 2, Pages: 61-63.
Thank You!