Statistical Methods for Improving Spanish into Spanish Sign Language Translation

Verónica López Ludeña, Rubén San-Segundo
Grupo de Tecnología del Habla, Universidad Politécnica de Madrid

Abstract—This paper describes new methods for improving a spoken Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system. The first one consists of incorporating a categorization module as a preprocessing step before translation. This categorization module replaces Spanish words with associated tags. Secondly, this paper investigates the use of Factored Translation Models (FTMs) for improving the translation performance. Both methods aim to incorporate syntactic-semantic information into the translation process. In both cases, this new information significantly reduces the translation error rate. When implementing these two modules, several alternatives for dealing with non-relevant words have been studied; non-relevant words are Spanish words that are not relevant to the translation process. The categorization module has been incorporated into a Phrase-based system and into a Statistical Finite State Transducer (SFST), but the use of FTMs has only been considered in the Phrase-based system. The evaluation results reveal that the BLEU (BiLingual Evaluation Understudy) score has improved from 69.1% to 73.9% with FTMs and to 78.8% with the categorization method.

Index Terms—Spanish Sign Language (LSE), Statistical Language Translation, Syntactic-Semantic Information, Factored Translation Models, Source language categorization, Phrase-based translation model, Statistical Finite State Transducer.

I. INTRODUCTION

In the world, there are around 70 million people with hearing deficiencies (information from the World Federation of the Deaf).
Deafness brings about significant communication problems: deaf people cannot hear, and most of them have serious difficulties using written languages, both when expressing themselves in these languages and when understanding written texts. They have problems with verb tenses, gender and number agreement, etc., and they have difficulties when creating a mental image of abstract concepts. As a result, deaf people can have problems when accessing information, education, employment, social relationships, culture, etc. Deaf people use a sign language (their mother tongue) for communicating, and there are not enough sign-language interpreters and communication systems. In the USA, there are 650,000 Deaf people (who use a sign language), although there are more people with hearing deficiencies, but only 7,000 sign-language interpreters, i.e. a ratio of 93 deaf people to 1 interpreter. Finland has the best ratio, 6 to 1, and Slovakia the worst, with 3,000 users to 1 interpreter [1]. In Spain this ratio is 221 to 1. This information shows the need to develop automatic translation systems with new technologies for helping hearing and deaf people to communicate with each other. It is necessary to distinguish between “deaf” and “Deaf”: the first refers to non-hearing people, and the second refers to people who use a sign language as their first means of communication and are part of the “Deaf community”. Each country has a different sign language, and there may even be different sign languages in different regions of the same country. There is also an international sign language, but most deaf people do not know it. However, national sign languages are fully-fledged languages that have a grammar and lexicon just like any spoken language, contrary to what most people think. Traditionally, deafness has been associated with learning problems, but this is not true.
The use of sign languages defines the Deaf as a linguistic minority, with learning skills, cultural and group rights similar to other minority language communities. According to information from INE (the Spanish Statistics Institute), in Spain there are 1,064,000 deaf people, and 50% of them are more than 65 years old. They are a geographically dispersed population, which produces more social isolation. 47% of the deaf population have no basic education or are illiterate, and only between 1% and 3% have finished their studies (as opposed to 21% of Spanish hearing people). Also, 20% of the deaf population is unemployed (30% for women). According to the information presented above, deaf people are more vulnerable and do not have the same opportunities as hearing people. They cannot access information and communication in the same way as hearing people do: TV programs, multimedia content on the internet and personal public services. All these aspects support the need to develop new technologies for automatic translation systems that convert this information into sign language. This paper describes a new automatic categorization for improving a statistical translation system that helps deaf people to communicate with government employees in two specific domains: the renewal of Identity Documents and the renewal of the Driver’s License.

II. STATE OF THE ART

There are different sign languages depending on the country, or even in every region within a country. Professor William Stokoe [2] presented the first conclusions from several studies on ASL (American Sign Language). After these studies, sign language research began to increase in the USA ([3], [4]), Europe [5], Africa [6] and Japan [7]. In Spain, during the last twenty years, there have been several proposals for normalising Spanish Sign Language (LSE).
Mª Ángeles Rodríguez [8] carried out a detailed analysis of LSE, illustrating its main characteristics. She detailed the differences between the sign language used by Deaf people and the standardization proposals. In 2007, the Spanish Government accepted Spanish Sign Language (Lengua de Signos Española: LSE) as one of the official languages in Spain, defining a plan to invest in resources for this language, in an attempt to normalize it and to extend it over the entire Deaf community. In 2009, the first grammar description for Spanish Sign Language (LSE) was presented [9]. In recent years, there have been several research projects related to automatic language translation. In Europe: C-STAR, ATR, Verbmobil, Eutrans, LC-Star, PF-Star and, finally, TC-STAR. The TC-STAR project (http://www.tc-star.org/), financed by the European Commission within the Sixth Framework Programme, has envisaged an effort to advance research into all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT) and Text to Speech conversion (TTS). In the USA, DARPA (Defense Advanced Research Projects Agency) is supporting the GALE program (http://www.darpa.mil/ipto/programs/gale/gale.asp). The goal of the DARPA GALE program is to develop and apply computer software technologies to absorb, analyse and interpret huge volumes of speech and text in multiple languages. The best performing translation systems are based on various types of statistical approaches ([10], [11]), including example-based methods [12], finite-state transducers [13] and other data-driven approaches. The progress achieved over the last 10 years is due to several factors, such as efficient algorithms for training [14], context-dependent models [15], efficient algorithms for generation [16], more powerful computers and bigger parallel corpora, and automatic error measurements ([17], [18]).
Another significant effort in machine translation has been the organization of several Workshops on Statistical Machine Translation (SMT). On the webpage http://www.statmt.org/, it is possible to obtain all the information on these events. As a result of these workshops, there are two free machine translation systems called Moses (http://www.statmt.org/moses/) and Joshua (http://cs.jhu.edu/~ccb/joshua/). Moses is a phrase-based statistical machine translation system that allows machine translation models to be built for any language pair, using a collection of translated texts (parallel corpus). On the other hand, Joshua uses synchronous context-free grammars (SCFG) for statistical machine translation. In recent years, several groups have developed prototypes for translating spoken language into sign language, using example-based ([19], [23]), rule-based [20], full-sentence [21] or statistical ([22], [23], the SiSi system, [24]) approaches. The research into sign language has been possible thanks to corpora generated by several groups. Some examples are: a corpus made up of more than 300 hours from 100 speakers of Australian Sign Language [25]; the RWTH-BOSTON-400 database, which contains 843 sentences with about 400 different signs from 5 speakers of American Sign Language, with English annotations [26]; the British Sign Language Corpus Project, which aims to create a machine-readable digital corpus of spontaneous and elicited British Sign Language (BSL) collected from deaf native signers and early learners across the United Kingdom [27]; and a corpus developed at The Institute for Language and Speech Processing (ILSP), which contains parts of free signing narration, as well as a considerable amount of grouped signed phrases and sentence-level utterances [28]. There are other examples in BSL [29], DGS (German Sign Language) [30], and Italian Sign Language [31].
Not only the data but also new practices [32] and new uses of traditional annotation tools [33] have been developed. In Europe, the two main research projects involving sign languages are DICTA-SIGN ([29], [34]) and SIGN-SPEAK ([35], [36]), both financed by the European Commission within the Seventh Framework Programme. DICTA-SIGN (http://www.dictasign.eu/) aims to develop the technologies necessary to make Web 2.0 interactions in sign language possible: users sign to a webcam using a dictation style. The computer recognizes the signed phrases, converts them into an internal representation of sign language, and then an animated avatar signs them back to the users. In SIGN-SPEAK (http://www.signspeak.eu/), the overall goal is to develop a new vision-based technology for recognizing and translating continuous sign language into text. This paper proposes new methods for improving statistical translation by incorporating syntactic-semantic information into the translation process: including a pre-categorization module and using Factored Translation Models (FTMs). The proposed approaches show very good performance for translating spoken Spanish into LSE (Lengua de Signos Española), considerably reducing the translation error. These new techniques allow the system to adapt better to the differences between spoken and sign languages. In the next section, these differences will be presented. The translation module is the main part of a spoken Spanish into Spanish Sign Language system [20]. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs) (Fig. 1).
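To make the data flow of this three-module architecture concrete, the chain can be sketched in Python. All function bodies and the toy word-to-gloss lexicon below are hypothetical placeholders (the real modules are an HMM-based recognizer, a statistical translator and a 3D avatar); only the pipeline structure mirrors Fig. 1.

```python
# Hypothetical sketch of the pipeline in Fig. 1: speech recognizer ->
# language translator -> sign animation. Module internals are placeholders.

def recognize(audio):
    # Real module: HMM-based ASR decoding audio into a Spanish word sequence.
    # Here we simply pass through a pre-transcribed utterance.
    return audio["transcript"]

# Toy word-to-gloss lexicon (illustrative only); None marks a dropped word.
TOY_LEXICON = {"debes": "DEBER", "pagar": "PAGAR", "las": None, "tasas": "DINERO"}

def translate(words):
    # Real module: Moses or an SFST mapping a word sequence to a gloss sequence.
    glosses = [TOY_LEXICON.get(w) for w in words.split()]
    return " ".join(g for g in glosses if g)

def animate(glosses):
    # Real module: a 3D avatar playing back one sign description per gloss.
    return ["<sign:%s>" % g for g in glosses.split()]

audio = {"transcript": "debes pagar las tasas"}
commands = animate(translate(recognize(audio)))  # the three stages chained
```

The point of the sketch is only the composition: each module consumes the previous module's output, so any of the three stages can be replaced independently.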
This system has been designed to translate the government employee’s explanations in personal services into LSE (for deaf users). The services considered are the renewal of Identity Documents and the renewal of the Driver’s License. For the natural language translation module, two different statistical strategies have been analysed: a Phrase-based system (Moses) and a Statistical Finite State Transducer (SFST). The categorization module has been incorporated into both translation strategies, but the use of FTMs has only been considered in the Phrase-based system.

[Figure: natural speech is decoded by the speech recognizer (acoustic and language models) into a word sequence; the language translation module (translation information) produces a sign sequence; the sign animation module (sign descriptions) plays it back.] Fig. 1. Spanish into LSE translation system.

III. SPANISH SIGN LANGUAGE (LENGUA DE SIGNOS ESPAÑOLA: LSE)

Spanish Sign Language (LSE), just like other sign languages, uses a visual-gestural channel, but it also has grammatical characteristics similar to written languages. Sign languages have complex grammars, and professional linguists have found all of the linguistic characteristics necessary for classifying sign languages as “true languages”. In linguistic terms, sign languages are as complex as written languages, despite the common misconception that they are a “simplification” of written languages. For example, the United Kingdom and the USA share the same spoken language; however, British Sign Language is completely different from American Sign Language. W. Stokoe supports the idea that sign languages have four dimensions (three space dimensions plus time), whereas spoken languages have only one dimension, time, so it cannot be said that sign languages are a simplification of any other language. Sign languages are not mime: signs do not necessarily have a relationship to a word. Nevertheless, they show more iconicity than spoken languages.
Like spoken languages, sign languages combine meaningless units, initially called cheremes (like the phonemes of spoken languages), into units with semantic information. These phonemes of a sign are the handshape, the palm orientation, the place of articulation, the movement and the facial expressions (non-manual markers). One important difference between spoken/written languages and sign languages is sequentiality: phonemes in spoken languages are produced in a sequence, whereas sign languages have a large non-sequential component, because fingers, hands and face movements can be involved in a sign simultaneously, even with the two hands moving in different directions. These features give sign languages a complexity that traditional written languages do not have. This fact makes it very difficult to write sign languages. Traditionally, signs have been written using words (in capital letters) in Spanish (or English in the case of BSL, British Sign Language) with a meaning similar to the sign meaning. They are called glosses (i.e. “CAR” for the sign “car”). In the last 20 years, several alternatives based on specific characteristics of the signs have appeared in the international community: HamNoSys [37], SEA (Sistema de Escritura Alfabética) [38] and SignWriting (http://www.signwriting.org/). These notations allow every component of each sign to be represented (handshape, orientation, location, movement and non-manual markers). In order to write down LSE, each sign of an LSE sentence is represented by a gloss, so a gloss sequence represents a sequence of signs. An example of glosses representing the sentence “¿a qué hora se abre? (what time do you open?)” would be “ABRIR HORA? (OPEN HOUR?)”. Several signs can be represented together in one gloss with ‘+’, for example: “SABADO+DOMINGO (SATURDAY+SUNDAY)” to represent “fin de semana (weekend)”. Also, several Spanish words can form only one gloss in LSE; this fact is marked with ‘-’.
For example, “CAFE-CON-LECHE” represents “café con leche (coffee with milk)”. The main characteristics of LSE are as follows:
• Gender in LSE is not usually specified, but, if necessary, it can be indicated by adding an additional gloss like “MUJER (woman)” or “HOMBRE (man)”.
• For indicating plural, there are two ways: with a gloss in plural (adding an ‘S’) or adding “-pl” to the gloss. For example, “PERSONA-pl” (person in plural: people) or “AMIGOS” (friends).
• For specifying verb tenses, the verb tense can be added in parentheses next to the gloss, for example “USAR (FUT.) (to use in the future)”.
• For representing a negative sentence, the gloss “NO” is added after the verb in infinitive, for example “PODER NO” (cannot).
• For representing an interrogative sentence, an interrogative mark is placed at the end of the sentence (POR-QUÉ? (WHY?)), whereas in Spanish two interrogative marks are used (¿por qué?), at the beginning and at the end.
• In LSE, spelling is also used for representing names or unknown words; this is indicated with “dl” before the spelled word, for example, “dlJUAN” for spelling the name “Juan”.
• An important characteristic of LSE is the use of classifiers. Classifiers are signs that indicate actions, places, etc., and they are denoted with the prefix “CL” and a letter that indicates the classifier’s type (for example, place). Some classifiers are “CLL-ACERCARSE”, “CLD-GRANDE”, etc.
• Finally, one important characteristic of LSE is iconicity: signs resemble the concepts they represent.
If written LSE is analysed, glosses mainly convey semantic information. LSE has some characteristics that differ from Spanish. One important difference is the order of arguments in sentences: LSE has an SOV (subject-object-verb) order, in contrast to the SVO (subject-verb-object) Spanish order.
An example that illustrates this behaviour is shown below:
Spanish: Juan ha comprado las entradas (Juan has bought the tickets)
LSE: JUAN ENTRADAS COMPRAR (JUAN TICKETS TO-BUY)
Comparing these two different orders in predication, the following typological differences (Table I) can be extracted [9]:

TABLE I
TYPOLOGICAL DIFFERENCES RELATED TO PREDICATION ORDER BETWEEN SPANISH AND LSE
• Spanish: Prepositions, “cerca de casa” (close to home). LSE: Postpositions, “CASA CERCA” (HOME CLOSE).
• Spanish: Demonstrative + Name, “ese hombre” (this man). LSE: Name + Demonstrative, “HOMBRE ESE” (MAN THIS).
• Spanish: Name + Genitive, “madre de Juan” (Juan’s mother). LSE: Genitive + Name, “JUAN MADRE” (JUAN MOTHER).
• Spanish: Initial interrogative particle, “¿dónde está el libro?” (where is the book?). LSE: Final interrogative particle, “LIBRO DÓNDE?” (BOOK WHERE?).
• Spanish: Auxiliary verb + Principal verb, “debes comer” (you must eat). LSE: Principal verb + Auxiliary verb, “COMER DEBER” (EAT MUST).
• Spanish: Negative particle + Verb, “no trabajo” (I do not work). LSE: Verb + Negative particle, “TRABAJAR NO” (TO-WORK NO).

There are other typological differences that are not related to predication order:
• Spanish has an informative style (without topics), whereas LSE has a communicative style (with topics).
• As explained above, gender is not usually specified in LSE, in contrast to Spanish.
• In LSE, there can be agreement between the verb and the subject, receiver or object, and even between subject and receiver, whereas in Spanish there can only be agreement between verb and subject.
• The use of classifiers is common in LSE, but not in Spanish. For example:
Spanish: debe acercarse a la cámara (you must approach the camera)
LSE: FOTO CLD_GRANDE_NO CLL_ACERCARSE DEBER (PHOTO CLD_BIG_NO CLL_APPROACH MUST)
• Articles are used in Spanish, but not in LSE. For example:
Spanish: La televisión se apaga (The TV is switched off)
LSE: TELEVISION APAGAR (TV TO-SWITCH-OFF)
• Plural can be descriptive in LSE, but not in Spanish.
The way the flowers sign is represented provides information about how the flowers are situated. For example:
Spanish: flor (flower) LSE: FLOR (FLOWER)
Spanish: flores (flowers) LSE: CL-“flores” (CL-“flowers”)
• There is a difference between an absent and a present third person in LSE, whereas there is no absent third person in Spanish.
• In LSE, there is the possibility of using double reference, but not in Spanish.
• LSE is a language with ample flexibility, and homonymy between noun and adjective is usual, so most nouns can be adjectives and vice versa, whereas there are few such cases in Spanish.
• In Spanish, there is a copula in non-verbal predications (the verb ‘to be’: ser and estar in Spanish), but there is not in LSE (except in some locative predications). For example:
Spanish: Antonio está en la Universidad (Antonio is at the University)
LSE: ANTONIO/UNIVERSIDAD ALLÍ (ANTONIO/UNIVERSITY THERE)
• There is a difference between inclusive and exclusive quantifiers in LSE, but not in Spanish.
• There are Spanish impersonal sentences with the “se” pronoun, but not in LSE. For example:
Spanish: Se come bien (one eats well)
LSE: COMER BIEN (TO-EAT WELL)
• It is important to note that LSE is lexically more flexible than Spanish; its descriptive nature makes it well suited to generating periphrases, and because of this, LSE has fewer nouns than Spanish.
• Finally, LSE has fewer glosses per sentence (4.4 in our database) than Spanish has words (5.9 in our database).
Some examples of LSE sentences and their Spanish translation are shown in Table II:

TABLE II
EXAMPLES OF SPANISH SENTENCES AND THEIR LSE TRANSLATION
• Spanish: dame una foto (give me a photo). LSE: TU FOTO UNA DAR-A_MI (YOU PHOTO ONE GIVE-TO-ME).
• Spanish: debe recoger el deneí dentro de un mes (you must collect your ID card in a month). LSE: DNI RECOGER PRÓXIMO UN-MES (ID-CARD TO-COLLECT NEXT MONTH).

IV.
PARALLEL CORPUS

In order to develop a translation system focused on the domain of the renewal of the Identity Document (ID) and Driver’s License (DL), a database including a parallel corpus has been generated. This database has been obtained with the collaboration of Local Government Offices where the aforementioned services (ID and DL) are provided. For a period of three weeks, the most frequent explanations (from government employees) and the most frequent questions (from the users) were taken down, and more than 5,000 sentences were noted. These 5,000 sentences were analysed because not all of them refer to ID or DL, so sentences were selected manually in order to develop a system for a specific domain. Finally, 1,360 sentences were collected: 1,023 pronounced by government employees and 337 by users. These sentences were translated into LSE, both in text (sequences of glosses) and in video, and compiled in an Excel file. This corpus was increased to 4,080 sentences by incorporating different variants of the Spanish sentences while maintaining the LSE translation. The Excel file (Fig. 2) contains eight different information fields: “ÍNDICE” (sentence index), “DOMINIO” (domain: ID or DL renewal), “VENTANILLA” (window where the sentence was collected), “SERVICIO” (service provided when the sentence was collected), whether the sentence was pronounced by the government employee or the user (“funcionario” or “usuario”, respectively), the sentence in Spanish (CASTELLANO), the sentence in LSE (sequence of glosses), and a link to the video file with the LSE representation.

Fig. 2. Fragment of the system database.

The main features of the corpus are summarised in Table III. These features are divided depending on the domain (ID or DL renewal) and on whether the sentence was spoken by the government employee or the user. For the system development, two types of files were generated from the database: text files and sign files.
Text files are made up of the Spanish sentences of the parallel corpus, and sign files contain their LSE translations (LSE sentences are gloss sequences). The corpus was divided randomly into three sets: training (75%), development (12.5%) and test (12.5%).

TABLE III
MAIN STATISTICS OF THE CORPUS

Government employee        ID (Spanish / LSE)   DL (Spanish / LSE)
Sentence pairs             1,425                1,641
Different sentences        1,236 / 389          1,413 / 199
Running words              8,490 / 6,282        17,113 / 12,741
Vocabulary                 652 / 364            527 / 237

User                       ID (Spanish / LSE)   DL (Spanish / LSE)
Sentence pairs             531                  483
Different sentences        458 / 139            389 / 93
Running words              2,768 / 1,950        3,130 / 2,283
Vocabulary                 422 / 165            294 / 133

V. AUTOMATIC SPEECH RECOGNIZER

The Automatic Speech Recognizer (ASR) used is a state-of-the-art speech recognition system developed at GTH-UPM (http://lorien.die.upm.es). It is a speaker-independent continuous speech recognition system based on HMMs (Hidden Markov Models). The feature extraction includes CMN and CVN (Cepstral Mean and Variance Normalization) techniques. The ASR provides one confidence value for each word recognized in the word sequence. Regarding the performance of the ASR module, with vocabularies smaller than 1,000 words, the Word Error Rate (WER) is lower than 5%.

VI. STATISTICAL TRANSLATION STRATEGIES

In this paper, two different statistical strategies have been considered: a Phrase-based system and a Statistical Finite State Transducer. The proposed automatic categorization has been evaluated with both translation strategies. This section describes the architectures used for the experiments.

A. Phrase-based translation system

The Phrase-based translation system is based on the software released to support the shared task at the 2009 NAACL Workshop on Statistical Machine Translation (http://www.statmt.org/wmt09/). Fig. 3 shows the system architecture.
[Figure: the parallel corpus feeds word alignment (GIZA++) and phrase extraction/scoring (phrase-extract and phrase-score) to build the translation model; the target corpus feeds n-gram training (SRILM ngram-count) to build the language model; the Moses decoder translates the source corpus, which is evaluated against the reference target corpus.] Fig. 3. Phrase-based translation architecture.

The phrase model has been trained following these steps:
• Word alignment computation. At this step, the GIZA++ software [10] has been used to calculate the alignments between words and signs. The “alignment” parameter was fixed to “target-source” as the best option: only this target-source alignment is considered (LSE-Spanish). In this configuration, alignment is guided by the signs: each sign in LSE is aligned with a Spanish word, and some words may not be aligned to any sign.
• Phrase extraction [16]. All phrase pairs that are consistent with the word alignment are collected. The maximum size of a phrase has been fixed to 7, based on tuning experiments with the development set.
• Phrase scoring. In this step, the translation probabilities are computed for all phrase pairs. Both translation probabilities are calculated: forward and backward.
The Moses decoder is used for the translation process [39]. This program is a beam search decoder for phrase-based statistical machine translation models. In order to obtain a 3-gram language model, the SRI language modeling toolkit has been used [40].

B. Statistical Finite State Transducer

The translation based on an SFST is carried out as set out in Fig. 4. The translation model consists of an SFST made up of aggregations: subsequences of aligned source and target words. The SFST is inferred from the word alignment (obtained with GIZA++) using the GIATI (Grammatical Inference and Alignments for Transducer Inference) algorithm [13]. The SFST probabilities are also trained from the aligned corpora.
The software used in this paper has been downloaded from http://prhlt.iti.es/content.php?page=software.php.

[Figure: the parallel corpus feeds word alignment (GIZA++) and the GIATI finite state transducer inference to build the translation model; the target corpus feeds n-gram training (SRILM ngram-count) to build the language model; the source corpus is translated (REFX) and evaluated against the target corpus.] Fig. 4. Diagram of the FST-based translation module.

VII. CATEGORIZATION

For incorporating a categorization module, the system considers the categories used in the rule-based translation system previously developed for these two application domains [20]. That natural language translation module was implemented using a rule-based technique with a bottom-up strategy. In this case, the relationships between signs and words are defined by an expert. In a bottom-up strategy, the translation analysis is carried out starting from each word individually and extending the analysis to neighbouring context words or already-formed signs (generally named blocks). The rules implemented by the expert define these relationships. The translation process is carried out in two steps. In the first one, every word is mapped to one or several syntactic-pragmatic tags. After that, the translation module applies different rules that convert the tagged words into signs by grouping concepts or signs and defining new signs. These rules can define short- and long-scope relationships between the concepts or signs. The categories used in the first step were considered for these experiments. For including categories in the translation process, the main idea is to replace the source language words with their categories and to train the translation model on these categories instead of on words. When implementing the categorization module, several strategies for dealing with the “non-relevant” words have been proposed. In the first alternative, all the words are replaced by their tags, with the exception of those words that do not appear in the word-tag list (OOV words).
As mentioned before, these OOV words are kept as they are. In the word-tag list, there is a “non-relevant” tag (named “basura” (garbage)) mapped to words that are not relevant for the translation process. This alternative will be referred to in the experiments as “Base categorization”. For example:
• Source sentence: debes pagar las tasas en la caja
• Categorized source sentence: DEBER PAGAR basura DINERO basura basura DINERO=CAJA
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR
The second proposed alternative is not to tag any word in the source language, but to remove the non-relevant words (those associated with the “non-relevant” tag) from the source lexicon. This alternative will be referred to in the experiments as “Non-relevant word deletion”. For example:
• Source sentence: debes pagar las tasas en la caja
• Categorized source sentence: debes pagar tasas caja
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR
Finally, the third alternative proposes to replace words with tags (with the exception of OOVs) and to remove the “non-relevant” tags. This alternative will be referred to in the experiments as “Categorization and non-relevant word deletion”. For example:
• Source sentence: debes pagar las tasas en la caja
• Categorized source sentence: DEBER PAGAR DINERO DINERO=CAJA
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR
In section IX, all the alternatives will be evaluated and discussed.

VIII. FACTORED TRANSLATION MODELS

For the phrase-based translation strategy, there is the possibility of training factored models in order to include this information in the translation process [41]. This possibility is an extension of phrase-based statistical machine translation models that enables the straightforward integration of additional annotations at the word level (linguistic markup or automatically generated word classes).
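The three categorization alternatives of section VII can be sketched as one small preprocessing function. The word-tag list below is only a toy fragment rebuilt from the paper's own example sentence (the real list comes from the rule-based system of [20]); “basura” is the non-relevant tag.

```python
# Sketch of the three categorization alternatives for the source sentence
# "debes pagar las tasas en la caja". Toy word-tag list; OOV words are kept.

WORD_TAGS = {
    "debes": "DEBER", "pagar": "PAGAR", "las": "basura", "tasas": "DINERO",
    "en": "basura", "la": "basura", "caja": "DINERO=CAJA",
}
NON_RELEVANT = "basura"  # tag for words irrelevant to the translation

def categorize(sentence, use_tags=True, drop_non_relevant=False):
    out = []
    for word in sentence.split():
        tag = WORD_TAGS.get(word)          # None for OOV words
        if drop_non_relevant and tag == NON_RELEVANT:
            continue                        # delete non-relevant words/tags
        if use_tags and tag is not None:
            out.append(tag)                 # replace the word with its tag
        else:
            out.append(word)                # keep OOV (or untagged) word as-is
    return " ".join(out)

s = "debes pagar las tasas en la caja"
base = categorize(s)                                   # "Base categorization"
deleted = categorize(s, use_tags=False, drop_non_relevant=True)
combined = categorize(s, drop_non_relevant=True)       # both actions combined
```

With this toy list, `base`, `deleted` and `combined` reproduce exactly the three categorized source sentences shown in the examples above.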
The main idea is to add additional annotation at the word level. A word in this framework is not only a token, but a vector of factors that represents different levels of annotation. The translation of factored representations of input words into factored representations of output words is broken up into a sequence of mapping steps that either translate input factors into output factors, or generate additional output factors from existing output factors. In this case, only the source language has been factored (with one additional factor, its category), and, when using this factored model, three different strategies were considered for dealing with “non-relevant” words, i.e. words that are not relevant for the translation process. They are tagged with the non-relevant tag named “basura” (garbage). In the first alternative, all the words in the source language are factored and several translation models are trained (word-sign and tag-sign). Only two factors have been considered: word and tag. This alternative will be referred to in the experiments as “Using tags”. For example:
• Source sentence: debes pagar las tasas en la caja (you must pay the fees at the cash desk)
• Factored source sentence: debes|DEBER pagar|PAGAR las|basura tasas|DINERO en|basura la|basura caja|DINERO=CAJA
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)
The second proposed alternative is to keep the original words (without additional factors), but to remove the non-relevant words from the source lexicon. This alternative will be referred to in the experiments as “Removing non-relevant words from the source lexicon”. For example:
• Source sentence: debes pagar las tasas en la caja (you must pay the fees at the cash desk)
• Factored source sentence: debes pagar tasas caja
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)
Finally, in the third alternative, all the words are factored and the “non-relevant” ones are removed.
This alternative will be referred to in the experiments as "Using tags and removing non-relevant tags". For example:

• Source sentence: debes pagar las tasas en la caja (you must pay the taxes in the cash desk)
• Factorized source sentence: debes|DEBER pagar|PAGAR tasas|DINERO caja|DINERO=CAJA
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)

IX. EXPERIMENTS AND DISCUSSION

For the experiments, the corpus (described in section IV) was divided randomly into three sets: training (75%), development (12.5%) and test (12.5%). The training set was used to train the translation and language models, and the development set was used for tuning the weights and for the analysis of the probability threshold (in the case of automatic categorization). For evaluating the performance of the translation systems, the BLEU (BiLingual Evaluation Understudy) metric [17] has been computed using the NIST tool (mteval.pl). In order to analyze the significance of the differences between systems, the confidence interval (at 95%) is also presented for every BLEU result. This interval is calculated using the following formula:

±∆ = 1.96 √( BLEU (100 − BLEU) / n )

where n is the number of signs used in the evaluation, in this case n = 2,906. Table IV compares the baseline system and the system with the categorization module for translating the references (Reference) and the speech recognizer outputs (ASR output) using the phrase-based translation system.
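Before turning to the tables, the interval formula can be checked numerically. A minimal sketch, using the baseline ASR-output figure of 69.1% reported below:

```python
import math

def bleu_confidence_interval(bleu, n, z=1.96):
    """Half-width of the 95% confidence interval for a BLEU score
    given in percent, where n is the number of evaluation signs."""
    return z * math.sqrt(bleu * (100.0 - bleu) / n)

# Baseline phrase-based system, ASR output: BLEU = 69.1% over n = 2,906 signs
print(round(bleu_confidence_interval(69.1, 2906), 1))  # 1.7
```

The result (±1.7) matches the value reported alongside that BLEU score in the tables.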
TABLE IV
EVALUATION RESULTS FOR THE PHRASE-BASED TRANSLATION SYSTEM REPLACING SOURCE LANGUAGE WORDS BY THEIR CATEGORIES

System                                          Input        BLEU   ±∆
Baseline                                        Reference    73.6   1.6
                                                ASR output   69.1   1.7
Base categorization                             Reference    81.9   1.4
                                                ASR output   74.5   1.6
Non-relevant word deletion                      Reference    80.0   1.4
                                                ASR output   73.9   1.6
Categorization and non-relevant word deletion   Reference    84.3   1.3
                                                ASR output   78.8   1.5

Table V compares the baseline system and the system with the categorization module for translating the references (Reference) and the speech recognizer outputs (ASR output) using the SFST-based translation system.

TABLE V
EVALUATION RESULTS FOR THE SFST-BASED TRANSLATION SYSTEM REPLACING SOURCE LANGUAGE WORDS BY THEIR CATEGORIES

System                                          Input        BLEU   ±∆
Baseline                                        Reference    71.2   1.6
                                                ASR output   69.8   1.7
Base categorization                             Reference    71.9   1.6
                                                ASR output   68.7   1.7
Non-relevant word deletion                      Reference    76.7   1.5
                                                ASR output   72.8   1.6
Categorization and non-relevant word deletion   Reference    81.5   1.4
                                                ASR output   75.6   1.6

Table VI compares the baseline system and the system with the FTMs for translating the references (Reference) and the speech recognizer outputs (ASR output). When using the FTMs, the three different alternatives for dealing with non-relevant words are analyzed.
TABLE VI
EVALUATION RESULTS OF THE PHRASE-BASED SYSTEM COMPARING THE BASELINE SYSTEM AND THE SYSTEM WITH THE FTMS

System                                                Input        BLEU   ±∆
Baseline                                              Reference    73.7   1.6
                                                      ASR output   69.1   1.7
Using tags                                            Reference    75.5   1.6
                                                      ASR output   68.0   1.7
Removing non-relevant words from the source lexicon   Reference    80.0   1.4
                                                      ASR output   73.9   1.6
Using tags and removing "non-relevant" tags           Reference    81.8   1.4
                                                      ASR output   73.9   1.6

The first observation is that the improvement achieved with the factored models in the phrase-based architecture is smaller than the improvement obtained by replacing the original source-language words with their categories. In the factored models, two translation models are generated (word-sign and category-sign), and the process for tuning the weights of these models does not appear to find optimum values. However, in both cases (factorization and categorization), comparing the three alternatives for dealing with non-relevant words shows that adding tags to the words and removing "non-relevant" words are complementary actions that lead to better results.

One important difference between Spanish and LSE is that Spanish sentences contain more words than LSE sentences contain signs (on average, 7.7 versus 5.7 in this corpus). This difference can cause several phrases to be generated in the same output, producing a significant number of insertions. Additionally, with long sentences the system sometimes cannot deal properly with large distortions, and the output presents important order changes and even sentence truncations. Another important source of errors is ordering errors caused by the different predication order: LSE follows SOV (Subject-Object-Verb) order while Spanish follows SVO (Subject-Verb-Object).
Finally, when translating Spanish into LSE, a significant number of words in the test set do not appear in the training set, due to the higher variability of Spanish. These words are called Out Of Vocabulary (OOV) words. For example, in Spanish there are many verb conjugations that are translated into the same sign sequence, so when a new conjugation appears in the evaluation set, it is an OOV that provokes a translation error.

In conclusion, the main causes of the translation errors are the different vocabulary variability of Spanish and LSE (much higher in Spanish), the different number of words or signs per sentence (higher in Spanish) and the different predication order. The categorization module and the FTMs, by including syntactic-semantic information, reduce the variability in the source language (for example, several verb conjugations are tagged with the same tag) and also the number of tokens composing the input sentence (when removing non-relevant words). Reducing the source language variability and the number of tokens also produces an important reduction in the number of source-target alignments the system has to train. With a small corpus, as is the case for many sign languages, this reduction of alignment points makes it possible to obtain better models with less training data, increasing the system performance. Finally, the evaluation results reveal that, using the categorization module, the BLEU score has increased from 69.1% to 78.8% for the phrase-based system, and from 69.8% to 75.6% for the SFST, when translating ASR outputs. Including FTMs, the BLEU score has increased from 69.1% to 73.9% when translating ASR outputs.

X. CONCLUSION

This paper describes the use of a categorization module and Factored Translation Models (FTMs) for improving a Spanish into Spanish Sign Language translation system.
These methods incorporate syntactic-semantic information during the translation process, reducing the source language variability and the number of words composing the input sentence. These two aspects reduce the translation error rate in the two statistical translation systems considered: phrase-based and SFST-based. The system is used to translate government employees' explanations into LSE when providing a personal service for renewing the Identity Document and the Driver's License. The evaluation results reveal that the two methods proposed in this paper significantly increase the translation performance in both statistical translation strategies.

REFERENCES

[1] Wheatley M., Pabsch A., 2010. "Sign Language in Europe". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), LREC, Valletta, Malta, 2010.
[2] Stokoe W., 1960. "Sign Language Structure: An Outline of the Visual Communication Systems of the American Deaf", Studies in Linguistics, Buffalo University Paper 8, 1960.
[3] Anderson L.B., 1979. "Aspect in Sign Language Morphology: The Role of Universal Semantics and Pragmatics in Determining Grammatical Categories", Linguistics Research Laboratory, Gallaudet College (Symposium on Tense/Aspect: Between Semantics and Pragmatics, UCLA, 4–6 May), 1979.
[4] Christopoulos C., Bonvillian J., 1985. "Sign Language", Journal of Communication Disorders 18 (1985) 1–20.
[5] Hansen B., 1975. "Varieties in Danish Sign Language", Sign Language Studies 8 (1975) 249–256; Kyle J., "British Sign Language", Special Education 8 (1981) 19–23.
[6] Penn C., Lewis R., Greenstein A., 1984. "Sign Language in South Africa", South African Disorders of Communication 31 (1984) 6–11.
[7] Notoya M., Suzuki S., Furukawa M., Umeda R., 1986. "Method and Acquisition of Sign Language in Profoundly Deaf Infants", Japan Journal of Logopedics and Phoniatrics 27 (1986) 235–243.
[8] Rodríguez M.A., 1991. "Lenguaje de signos". PhD Dissertation, Confederación Nacional de Sordos Españoles (CNSE) and Fundación ONCE, Madrid, Spain, 1991.
[9] Herrero A., 2009. "Gramática didáctica de la Lengua de Signos Española (LSE)", 2009.
[10] Och F.J., Ney H., 2002. "Discriminative Training and Maximum Entropy Models for Statistical Machine Translation". Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, pp. 295-302, 2002.
[11] Mariño J.B., Banchs R., Crego J.M., Gispert A., Lambert P., Fonollosa J.A., Costa-jussà M., 2006. "N-gram-based Machine Translation", Computational Linguistics, Association for Computational Linguistics, Vol. 32, No. 4, pp. 527-549.
[12] Sumita E., Akiba Y., Doi T. et al., 2003. "A Corpus-Centered Approach to Spoken Language Translation". Conference of the European Chapter of the Association for Computational Linguistics (EACL), Budapest, Hungary, pp. 171-174, 2003.
[13] Casacuberta F., Vidal E., 2004. "Machine Translation with Inferred Stochastic Finite-State Transducers". Computational Linguistics, Vol. 30, No. 2, pp. 205-225, June 2004.
[14] Och F.J., Ney H., 2003. "A Systematic Comparison of Various Statistical Alignment Models". Computational Linguistics, Vol. 29, No. 1, pp. 19-51, 2003.
[15] Zens R., Och F.J., Ney H., 2002. "Phrase-Based Statistical Machine Translation". German Conference on Artificial Intelligence (KI 2002), Aachen, Germany, Springer, LNAI, pp. 18-32, Sep. 2002.
[16] Koehn P., Och F.J., Marcu D., 2003. "Statistical Phrase-Based Translation". Human Language Technology Conference 2003 (HLT-NAACL 2003), Edmonton, Canada, pp. 127-133, May 2003.
[17] Papineni K., Roukos S., Ward T., Zhu W.J.,
2002. "BLEU: a Method for Automatic Evaluation of Machine Translation". 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, pp. 311-318, 2002.
[18] Agarwal A., Lavie A., 2008. "Meteor, m-bleu and m-ter: Evaluation Metrics for High-Correlation with Human Rankings of Machine Translation Output". Proceedings of the Workshop on Statistical Machine Translation at the 46th Annual Meeting of the Association for Computational Linguistics (ACL-2008), Columbus, June 2008.
[19] Morrissey S., 2008. "Data-Driven Machine Translation for Sign Languages". PhD Thesis, Dublin City University, Dublin, Ireland.
[20] San-Segundo R., Barra R., Córdoba R., D'Haro L.F., Fernández F., Ferreiros J., Lucas J.M., Macías-Guarasa J., Montero J.M., Pardo J.M., 2008. "Speech to Sign Language Translation System for Spanish". Speech Communication, Vol. 50, pp. 1009-1020, 2008.
[21] Cox S.J., Lincoln M., Tryggvason J., Nakisa M., Wells M., Tutt M., Abbott S., 2002. "TESSA, a System to Aid Communication with Deaf People". In ASSETS 2002, pp. 205-212, Edinburgh, Scotland, 2002.
[22] Stein D., Bungeroth J., Ney H., 2006. "Morpho-Syntax Based Statistical Methods for Sign Language Translation". 11th Annual Conference of the European Association for Machine Translation, Oslo, Norway, June 2006.
[23] Morrissey S., Way A., Stein D., Bungeroth J., Ney H., 2007. "Towards a Hybrid Data-Driven MT System for Sign Languages". Machine Translation Summit (MT Summit), pp. 329-335, Copenhagen, Denmark, September 2007.
[24] Vendrame M., Tiotto G., 2010. "ATLAS Project: Forecast in Italian Sign Language and Annotation of Corpora". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[25] Johnston T., 2008. "Corpus Linguistics and Signed Languages: No Lemmata, No Corpus". 3rd Workshop on the Representation and Processing of Sign Languages, June 2008.
[26] Dreuw P., Neidle C., Athitsos V., Sclaroff S., Ney H., 2008. "Benchmark Databases for Video-Based Automatic Sign Language Recognition". In International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco, May 2008.
[27] Schembri A., 2008. "British Sign Language Corpus Project: Open Access Archives and the Observer's Paradox". Deafness Cognition and Language Research Centre, University College London, LREC 2008.
[28] Efthimiou E., Fotinea E., 2008. "GSLC: Creation and Annotation of a Greek Sign Language Corpus for HCI", LREC 2008.
[29] Morrissey S., Somers H., Smith R., Gilchrist S., Dandapat S., 2010. "Building Sign Language Corpora for Use in Machine Translation". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[30] Hanke T., König L., Wagner S., Matthes S., 2010. "DGS Corpus & Dicta-Sign: The Hamburg Studio Setup". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[31] Geraci C., Bayley R., Branchini C., Cardinaletti A., Cecchetto C., Donati C., Giudice S., Mereghetti E., Poletti F., Santoro M., Zucchi S., 2010. "Building a Corpus for Italian Sign Language: Methodological Issues and Some Preliminary Results". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[32] Forster J., Stein D., Ormel E., Crasborn O., Ney H., 2010. "Best Practice for Sign Language Data Collections Regarding the Needs of Data-Driven Recognition and Translation". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[33] Crasborn O., Sloetjes H., 2010. "Using ELAN for Annotating Sign Language Corpora in a Team Setting".
In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[34] Efthimiou E., Fotinea S., Hanke T., Glauert J., Bowden R., Braffort A., Collet C., Maragos P., Goudenove F., 2010. "DICTA-SIGN: Sign Language Recognition, Generation and Modelling with Application in Deaf Communication". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[35] Dreuw P., Ney H., Martinez G., Crasborn O., Piater J., Miguel Moya J., Wheatley M., 2010. "The SignSpeak Project - Bridging the Gap Between Signers and Speakers". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[36] Dreuw P., Forster J., Gweth Y., Stein D., Ney H., Martinez G., Verges Llahi J., Crasborn O., Ormel E., Du W., Hoyoux T., Piater J., Moya Lazaro J.M., Wheatley M., 2010. "SignSpeak - Understanding, Recognition, and Translation of Sign Languages". In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[37] Prillwitz S., Leven R., Zienert H., Hanke T., Henning J., et al., 1989. "Hamburg Notation System for Sign Languages - An Introductory Guide". International Studies on Sign Language and the Communication of the Deaf, Volume 5, Institute of German Sign Language and Communication of the Deaf, University of Hamburg, 1989.
[38] Herrero A., 2004. "Escritura alfabética de la Lengua de Signos Española". Universidad de Alicante, Servicio de Publicaciones.
[39] Koehn P., 2010. "Statistical Machine Translation". Cambridge University Press.
[40] Stolcke A., 2002. "SRILM - An Extensible Language Modelling Toolkit". Proc. Intl. Conf. on Spoken Language Processing, Vol. 2, pp. 901-904, Denver.
[41] Koehn P., Hoang H., 2007. "Factored Translation Models". Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 868-876, Prague, June 2007.