METIS II: Statistical Machine Translation using Monolingual Corpora (FP6-IST-003768)

Profile

METIS II, the continuation of the successful assessment project METIS I, is an IST Programme project with a 3-year duration (01/10/2004 – 30/09/2007). The METIS II consortium comprises the following partners:

- Institute for Language & Speech Processing [ILSP] (co-ordinator)
- Katholieke Universiteit Leuven [KUL]
- Gesellschaft zur Förderung der Angewandten Informationsforschung [GFAI]
- Universitat Pompeu Fabra [UPF]

The METIS Approach

METIS II is a hybrid system, combining various approaches to machine translation (rule-based, statistical, pattern-matching techniques).

It makes use of readily available resources, such as bilingual dictionaries or basic NLP tools, and it can be easily customised to handle different source language (SL) and target language (TL) tags.

Most importantly, however, METIS II is innovative because it does not need bilingual corpora for the translation process, but relies exclusively on monolingual TL corpora.

METIS II employs a series of weights, i.e. system parameters, in various phases of the translation process. Weights are associated with system resources and employed by the pattern-matching algorithm; they can be automatically adjusted to customise system performance.

Four language pairs have been developed so far: Greek, Dutch, German and Spanish, each into English.

METIS II handles sequences at both sentence and sub-sentential level, thereby exploiting the recursive property of natural language.

Evaluation Setup

For the system evaluation, an experimental corpus extracted from real texts, mainly from newspapers, was used. It consisted of 200 sentences, 50 per language pair. The test sentences were of moderate complexity, containing one to two clauses each, and covered various syntactic phenomena such as word-order variation, NP structure, negation, modification etc.

Three human-produced reference translations were used per sentence, and the BLEU and NIST metrics were used for the evaluation.

Evaluation Results

Greek Results

Fig. 1: Comparative analysis of the score ranges obtained for METIS II and SYSTRAN using the BLEU metric (x-axis: score range; y-axis: number of sentences)

Fig. 2: Comparative analysis of the score ranges obtained for METIS II and SYSTRAN using the NIST metric (x-axis: score range; y-axis: number of sentences)

Dutch Results

Fig. 3: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the BLEU metric (best and average BLEU; series: VERBATIM, BASE, TRANSFER, NO S LEVEL, SYSTRAN)

Fig. 4: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the NIST metric (best and average NIST; series: VERBATIM, BASE, TRANSFER, NO S LEVEL, SYSTRAN)

German Results

Fig. 5: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the BLEU metric

Fig. 6: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the NIST metric

Spanish Results

Fig. 7: Comparative analysis of the scores obtained for different settings of METIS II and SYSTRAN using the BLEU metric (series: Metis-No op., Metis-Insertion, Metis-Deletion, Metis-Permutation, Metis-All ops., Systran)

Fig. 8: Comparative analysis of the scores obtained for different settings of METIS II and SYSTRAN using the NIST metric (series: Metis-No op., Metis-Insertion, Metis-Deletion, Metis-Permutation, Metis-All ops., Systran)

Future Work

Future work involves further investigation of the METIS II system architecture. More specifically, work towards system optimisation includes the following:

- Further system testing with a larger number of test suites that have more elaborate structures and deal with a wider range of phenomena
- Algorithm optimisation in terms of accuracy
- Automatic fine-tuning of weights
- Implementation of a post-editor module

METIS II Architecture

Web Interface: The end user selects the preferred SL and enters the text to be translated.

NLP: NLP tools handle the SL input, yielding an SL sequence annotated with grammatical and syntactic information.

Lexicon Lookup: The SL sequence is enhanced with translation equivalents and PoS information, so that it resembles a TL pattern.

Core Engine: The core engine of the METIS II system is fed with a sequence of TL-like patterns, which are handled by the pattern-matching algorithm. It proceeds in 2 stages involving wider and narrower contexts, thus generating a TL sequence.

Token Generation: The token generation module receives as input a sequence of translated lemmas and their respective tags; it is responsible for producing tokens out of lemmas.

Final Translation: The output of the token generation module.

Database Server: Lexicon, Weights, BNC Clauses, BNC Chunks, Token Generation Rules.
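
The flow from lexicon lookup through the core engine to token generation can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the four-entry lexicon, the three-clause "corpus", the weight values, the tag set and all function names are hypothetical and are not METIS II resources. The matching step is a simple weighted lemma-overlap score with a brute-force search over word orders, standing in for the pattern-matching algorithm, and it collapses the two matching stages into one.

```python
from itertools import permutations, product

# Hypothetical SL -> TL lemma lexicon (toy Spanish-to-English entries).
LEXICON = {
    "el": ["the"],
    "gato": ["cat"],
    "negro": ["black"],
    "dormir": ["sleep"],
}

# Toy monolingual TL corpus of lemmatised clauses (stand-in for corpus clauses).
TL_CORPUS = [
    ("the", "black", "cat", "sleep"),
    ("the", "dog", "bark"),
    ("the", "cat", "eat"),
]

# Hypothetical weights; in a real system these would be tuned automatically.
WEIGHTS = {"position_match": 1.0, "lemma_present": 0.5}


def lexicon_lookup(sl_sequence):
    """Map each (SL lemma, tag) pair to a list of (TL lemma, tag) candidates.

    Unknown lemmas are passed through unchanged.
    """
    return [
        [(tl, tag) for tl in LEXICON.get(lemma, [lemma])]
        for lemma, tag in sl_sequence
    ]


def score(lemmas, clause):
    """Weighted similarity between a candidate lemma sequence and a clause."""
    positional = sum(1 for a, b in zip(lemmas, clause) if a == b)
    shared = len(set(lemmas) & set(clause))
    total = (WEIGHTS["position_match"] * positional
             + WEIGHTS["lemma_present"] * shared)
    # Normalise so that an exact match scores 1.0.
    return total / (sum(WEIGHTS.values()) * max(len(lemmas), len(clause)))


def best_tl_sequence(tl_pattern):
    """Pick the lexical choices and word order closest to any corpus clause.

    Brute force over all orders, so only feasible for short sequences.
    """
    best, best_score = None, -1.0
    for choice in product(*tl_pattern):
        for order in permutations(choice):
            lemmas = tuple(lemma for lemma, _ in order)
            s = max(score(lemmas, clause) for clause in TL_CORPUS)
            if s > best_score:
                best, best_score = list(order), s
    return best, best_score


def generate_token(lemma, tag):
    """Toy token generation: inflect a lemma according to its tag."""
    if tag in ("VBZ", "NNS"):
        return lemma + "s"
    return lemma


def translate(sl_sequence):
    """Run lookup, matching and token generation on tagged SL lemmas."""
    tl_pattern = lexicon_lookup(sl_sequence)
    best, _ = best_tl_sequence(tl_pattern)
    return " ".join(generate_token(lemma, tag) for lemma, tag in best)


if __name__ == "__main__":
    sentence = [("el", "DT"), ("gato", "NN"), ("negro", "JJ"), ("dormir", "VBZ")]
    print(translate(sentence))  # the black cat sleeps
```

Note that the word order of the output comes only from the monolingual TL clauses, not from any bilingual sentence pair: the Spanish order "gato negro" is rearranged to "black cat" because that order matches a corpus clause best under the chosen weights.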