Statistical Methods for Improving Spanish into
Spanish Sign Language Translation
Verónica López Ludeña, Rubén San-Segundo
Grupo de Tecnología del Habla
Universidad Politécnica de Madrid
Abstract—This paper describes new methods for improving a spoken Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system. The first one consists of incorporating a categorization module as a preprocessing step before the translation. This categorization module replaces Spanish words with associated tags. Secondly, this paper investigates the use of Factored Translation Models (FTMs) for improving the translation performance. Both methods aim to incorporate syntactic-semantic information during the translation process. In both cases, this new information makes it possible to reduce the translation error rate significantly. When implementing these two modules, several alternatives for dealing with non-relevant words (Spanish words that are not relevant in the translation process) have been studied. The categorization module has been incorporated into a Phrase-based system and into a Statistical Finite State Transducer (SFST), while the use of FTMs has only been considered in the Phrase-based system. The evaluation results reveal that the BLEU (BiLingual Evaluation Understudy) score has improved from 69.1% to 73.9% with FTMs and to 78.8% with the categorization method.
Index Terms—Spanish Sign Language (LSE), Statistical Language Translation, Syntactic-Semantic Information, Factored
Translation Models, Source language categorization, Phrase-based translation model, Statistical Finite State Transducer.
I. INTRODUCTION
In the world, there are around 70 million people with hearing deficiencies (information from the World Federation of the Deaf). Deafness brings about significant communication problems: deaf people cannot hear, and most of them are unable to use written languages well, having serious problems both when expressing themselves in these languages and when understanding written texts. They have problems with verb tenses, gender and number agreement, etc., and they have difficulties when creating a mental image of abstract concepts. This can cause deaf people to have problems when accessing information, education, jobs, social relationships, culture, etc. Deaf people use a sign language (their mother tongue) for communicating, and there are not enough sign-language interpreters and communication systems. In the USA, there are 650,000 Deaf people (who use a sign language), although there are more people with hearing deficiencies, but only 7,000 sign-language interpreters, i.e. a ratio of 93 deaf people to 1 interpreter. Finland has the best ratio, 6 to 1, and Slovakia the worst, with 3,000 users to 1 interpreter [1]. In Spain this ratio is 221 to 1. This information shows the need to develop automatic translation systems with new technologies for helping hearing and deaf people to communicate with each other.
It is necessary to distinguish between “deaf” and “Deaf”: the first refers to non-hearing people, and the second refers to people who use a sign language as their first means of communication and are part of the “Deaf community”. Each country has a different sign language, and there may even be different sign languages in different regions of the same country. There is also an international sign language, but most deaf people do not know it. However, national sign languages are fully-fledged languages that have a grammar and lexicon just like any spoken language, contrary to what most people think. Traditionally, deafness has been associated with learning problems, but this is not true. The use of sign languages defines the Deaf as a linguistic minority, with learning skills, cultural and group rights similar to other minority language communities.
According to information from INE (the Spanish Statistics Institute), in Spain there are 1,064,000 deaf people, 50% of whom are more than 65 years old. They are a geographically dispersed population, which produces more social isolation. 47% of the deaf population have no basic education or are illiterate, and only between 1% and 3% have finished their studies (as opposed to 21% of Spanish hearing people). Also, 20% of the deaf population is unemployed (30% for women).
Given the information presented above, deaf people are more vulnerable and do not have the same opportunities as hearing people. They cannot access information and communication in the same way as hearing people do: TV programmes, multimedia content on the internet and personal public services. All these aspects support the need for new technologies to develop automatic translation systems for converting this information into sign language. This paper describes a new
automatic categorization for improving a statistical translation system that helps deaf people to communicate with government
employees in two specific domains: the renewal of Identity Documents and the renewal of the Driver’s License.
II. STATE OF THE ART
There are different sign languages depending on the country, or even on the region within a country. Professor William Stokoe [2] presented the first conclusions from several studies on ASL (American Sign Language). After these studies, sign language studies began to increase in the USA ([3], [4]), Europe [5], Africa [6] and Japan [7]. In Spain, during the last twenty years, there have been several proposals for normalising Spanish Sign Language (LSE). Mª Ángeles Rodríguez [8] carried out a detailed analysis of LSE illustrating its main characteristics. She detailed the differences between the sign language used by Deaf people and the standardization proposals. In 2007, the Spanish Government accepted Spanish Sign Language (Lengua de Signos Española: LSE) as one of the official languages in Spain, defining a plan to invest resources in this language, in an attempt to normalize it and to extend it over the entire Deaf community. In 2009, the first grammar description for Spanish Sign Language (LSE) was presented [9].
In recent years, there have been several research projects related to automatic language translation. In Europe: C-Star, ATR, Verbmobil, Eutrans, LC-Star, PF-Star and, finally, TC-STAR. The TC-STAR project (http://www.tc-star.org/), financed by the European Commission within the Sixth Framework Programme, has envisaged an effort to advance research into all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT) and Text to Speech conversion (TTS). In the USA, DARPA (Defense Advanced Research Projects Agency) is supporting the GALE program (http://www.darpa.mil/ipto/programs/gale/gale.asp). The goal of the DARPA GALE program is to develop and apply computer software technologies to absorb, analyse and interpret huge volumes of speech and text in multiple languages.
The best performing translation systems are based on various types of statistical approaches ([10], [11]), including example-based methods [12], finite-state transducers [13] and other data-driven approaches. The progress achieved over the last 10 years is due to several factors, such as efficient algorithms for training [14], context-dependent models [15], efficient algorithms for generation [16], more powerful computers and bigger parallel corpora, and automatic error measurements ([17], [18]). Another
significant effort in machine translation has been the organization of several Workshops on Statistical Machine Translation
(SMT). On the webpage http://www.statmt.org/, it is possible to obtain all the information on these events. As a result of these
workshops, there are two free machine translation systems called Moses (http://www.statmt.org/moses/) and Joshua
(http://cs.jhu.edu/~ccb/joshua/). Moses is a phrase-based statistical machine translation system that allows machine translation
system models to be built for any language pair, using a collection of translated texts (parallel corpus). On the other hand, Joshua
uses synchronous context free grammars (SCFG) for statistical machine translation.
In recent years, several groups have developed prototypes for translating spoken language into sign language: example-based [19][23], rule-based [20], full-sentence [21] and statistical ([22], [23], the SiSi system, [24]) approaches.
The research into sign language has been possible thanks to corpora generated by several groups. Some examples are: a corpus made up of more than 300 hours from 100 speakers of Australian Sign Language [25]; the RWTH-BOSTON-400 database, which contains 843 sentences with about 400 different signs from 5 speakers of American Sign Language with English annotations [26]; the British Sign Language Corpus Project, which aims to create a machine-readable digital corpus of spontaneous and elicited British Sign Language (BSL) collected from deaf native signers and early learners across the United Kingdom [27]; and a corpus developed at the Institute for Language and Speech Processing (ILSP), which contains parts of free signing narration, as well as a considerable amount of grouped signed phrases and sentence-level utterances [28]. There are other examples in BSL [29], DGS (German Sign Language) [30], and Italian Sign Language [31]. Not only the data but also new practices [32] and new uses of traditional annotation tools [33] have been developed.
In Europe, the two main research projects involving sign languages are DICTA-SIGN ([29], [34]) and SIGN-SPEAK ([35], [36]), both financed by the European Commission within the Seventh Framework Programme. DICTA-SIGN (http://www.dictasign.eu/) aims to develop the technologies necessary to make Web 2.0 interactions in sign language possible: users sign to a webcam using a dictation style. The computer recognizes the signed phrases, converts them into an internal representation of sign language, and then has an animated avatar sign them back to the users. In SIGN-SPEAK (http://www.signspeak.eu/), the overall goal is to develop a new vision-based technology for recognizing and translating continuous sign language into text.
This paper proposes new methods for improving statistical translation by incorporating syntactic-semantic information in the translation process: including a pre-categorization module and using Factored Translation Models (FTMs). The proposed approaches show a very good performance for translating spoken Spanish into LSE (Lengua de Signos Española), thus considerably reducing the translation error. These new techniques allow the system to adapt better to the differences between spoken and sign languages. In the next section, these differences will be presented. The translation module is the main part of a spoken Spanish into Spanish Sign Language system [20]. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs
belonging to the sign language), and a 3D avatar animation module (for playing back the signs) (Fig. 1). This system has been
designed to translate the government employee’s explanations in personal services into LSE (for deaf users). The services
considered are the renewal of Identity Documents and the renewal of the Driver’s License.
For the natural language translation module, two different statistical strategies have been analysed: a Phrase-based system (Moses) and a Statistical Finite State Transducer (SFST). The categorization module has been incorporated into both translation strategies, but the use of FTMs has only been considered in the Phrase-based system.
Fig. 1. Spanish into LSE translation system. (Pipeline: Natural Speech -> Speech Recognition, using Acoustic Models and a Language Model -> Word Sequence -> Language Translation, using Translation Information -> Sign Sequence -> Sign Animation, using Sign Descriptions.)
III. SPANISH SIGN LANGUAGE (LENGUA DE SIGNOS ESPAÑOLA: LSE)
Spanish Sign Language (LSE), just like other sign languages, uses a visual-gestural channel, but it also has grammatical characteristics similar to those of written languages. Sign languages have complex grammars, and professional linguists have found all of the necessary linguistic characteristics for classifying sign languages as “true languages”. In linguistic terms, sign languages are as complex as written languages, despite the common misconception that they are a “simplification” of written languages. For example, the United Kingdom and the USA share the same spoken language; however, British Sign Language is completely different from American Sign Language. W. Stokoe supports the idea that sign languages have four dimensions (three space dimensions plus time), while spoken languages have only one dimension, time, so it cannot be said that sign languages are a simplification of any other language.
Sign languages are not mime: signs do not necessarily have a relationship to a word, although they do have more iconicity than spoken languages. Like spoken languages, sign languages transform meaningless units, called cheremes (like the phonemes of spoken languages), into units with semantic information. These phonemes of a sign are the handshape, the palm orientation, the place of articulation, the movement and the facial expressions (non-manual marks).
One important difference between spoken/written languages and sign languages is sequentiality. Phonemes in spoken languages are produced in a sequence. Sign languages, on the other hand, have a large non-sequential component, because fingers, hands and face movements can be involved in a sign simultaneously, even with two hands moving in different directions. These features give sign languages a complexity that traditional written languages do not have. This fact makes it very difficult to write sign languages down. Traditionally, signs have been written using words (in capital letters) in Spanish (or English in the case of BSL, British Sign Language) with a meaning similar to the sign’s meaning. They are called glosses (i.e. “CAR” for the sign “car”). In the last 20 years, several alternatives based on specific characteristics of the signs have appeared in the international community: HamNoSys [37], SEA (Sistema de Escritura Alfabética) [38] and SignWriting (http://www.signwriting.org/). These notations allow every component of each sign to be represented (handshape, orientation, location, movement and non-manual markers).
In order to write LSE down, each sign of an LSE sentence is represented by a gloss, so a gloss sequence represents a sequence of signs. An example of glosses representing the sentence “¿a qué hora se abre? (what time do you open?)” would be “ABRIR HORA? (OPEN HOUR?)”. A single gloss can represent several signs joined with ‘+’, for example “SABADO+DOMINGO (SATURDAY+SUNDAY)” to represent “fin de semana (weekend)”. Also, several Spanish words can form only one gloss in LSE; this is marked with ‘-’, for example “CAFE-CON-LECHE” for representing “café con leche (coffee with milk)”.
The main characteristics of LSE are as follows:
• Gender in LSE is not usually specified but, if necessary, it can be indicated by adding an additional gloss like “MUJER (woman)” or “HOMBRE (man)”.
• There are two ways of indicating plural: using the gloss in plural (adding an ‘S’) or adding “-pl” to the gloss. For example, “PERSONA-pl” (people) or “AMIGOS” (friends).
• For specifying verb tenses, the tense can be added in parentheses next to the gloss, for example “USAR (FUT.) (to use, in future tense)”.
• For representing a negative sentence, the gloss “NO” is added after the verb in infinitive, for example “PODER NO”
(cannot).
• An interrogative sentence is marked with a question mark at the end of the sentence (POR-QUÉ? (WHY?)), whereas in Spanish two question marks are used (¿por qué?), at the beginning and at the end.
• In LSE, there is also fingerspelling for representing names or unknown words; this is indicated with “dl” before the spelled word, for example “dlJUAN” for spelling the name “Juan”.
• An important characteristic of LSE is the use of classifiers. Classifiers are signs that indicate actions, places, etc., and they are denoted with the prefix “CL” and a letter that indicates the classifier’s type (for example, place). Some classifiers are “CLL-ACERCARSE”, “CLD-GRANDE”, etc.
• Finally, one important characteristic of LSE is iconicity: signs resemble the concepts they represent. If written LSE is analysed, glosses mainly carry semantic information.
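The gloss-notation conventions listed above can be sketched as a small classifier; the function below is an illustrative sketch, not part of the system described in this paper:

```python
def describe_gloss(gloss):
    """Classify a gloss according to the LSE writing conventions above."""
    if gloss.startswith("dl"):        # fingerspelling, e.g. "dlJUAN"
        return ("fingerspelling", gloss[2:])
    if gloss.startswith("CL"):        # classifier, e.g. "CLL-ACERCARSE"
        return ("classifier", gloss)
    if "+" in gloss:                  # one gloss built from several signs
        return ("compound", gloss.split("+"))
    if gloss.endswith("-pl"):         # plural marker
        return ("plural", gloss[:-3])
    if "-" in gloss:                  # several Spanish words, one sign
        return ("multiword", gloss)
    return ("plain", gloss)

print(describe_gloss("SABADO+DOMINGO"))  # ('compound', ['SABADO', 'DOMINGO'])
print(describe_gloss("PERSONA-pl"))      # ('plural', 'PERSONA')
```

Note that the prefix checks (“dl”, “CL”) must come before the ‘-’ check, since classifiers such as “CLL-ACERCARSE” also contain hyphens.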
LSE has some characteristics that differ from Spanish. One important difference is the order of arguments in sentences: LSE has an SOV (subject-object-verb) order, in contrast to the SVO (subject-verb-object) Spanish order. An example that illustrates this behaviour is shown below:
Spanish: Juan ha comprado las entradas (Juan has bought the tickets)
LSE: JUAN ENTRADAS COMPRAR (JUAN TICKETS TO-BUY)
Comparing these two different orders in predication, the following typological differences (Table I) can be extracted [9]:
TABLE I
TYPOLOGICAL DIFFERENCES RELATED TO PREDICATION ORDER BETWEEN LSE AND SPANISH

Spanish: Prepositions: cerca de casa (close to home)
LSE: Postpositions: CASA CERCA (HOME CLOSE)

Spanish: Demonstrative + Name: ese hombre (this man)
LSE: Name + Demonstrative: HOMBRE ESE (MAN THIS)

Spanish: Name + Genitive: madre de Juan (Juan’s mother)
LSE: Genitive + Name: JUAN MADRE (JUAN MOTHER)

Spanish: Initial interrogative particle: ¿dónde está el libro? (where is the book?)
LSE: Final interrogative particle: LIBRO DÓNDE? (BOOK WHERE?)

Spanish: Auxiliary verb + Principal verb: debes comer (you must eat)
LSE: Principal verb + Auxiliary verb: COMER DEBER (EAT MUST)

Spanish: Negative particle + Verb: no trabajo (I do not work)
LSE: Verb + Negative particle: TRABAJAR NO (TO-WORK NO)
There are other typological differences that are not related to predication order:
• Spanish has an informative style (without topics), whereas LSE has a communicative style (with topics).
• As explained above, gender is not usually specified in LSE, in contrast to Spanish.
• In LSE, there can be agreement between the verb and the subject, receiver or object, and even between subject and receiver, whereas in Spanish there can only be agreement between verb and subject.
• The use of classifiers is common in LSE, but not in Spanish. For example:
Spanish: debe acercarse a la cámara (you must approach the camera)
LSE: FOTO CLD_GRANDE_NO CLL_ACERCARSE DEBER
(PHOTO CLD_BIG_NO CLL_APPROACH MUST)
• Articles are used in Spanish, but not in LSE. For example:
Spanish: La televisión se apaga (The TV is switched off)
LSE: TELEVISION APAGAR (TV TO-SWITCH-OFF)
• Plural can be descriptive in LSE, but not in Spanish: the way the flowers sign is performed provides information about how the flowers are situated. For example:
Spanish: flor (flower)
LSE: FLOR (FLOWER)
Spanish: flores (flowers)
LSE: CL-“flores” (CL-“flowers”)
• There is a difference between an absent and a present third person in LSE, but there is no absent third person in Spanish.
• In LSE, there is the possibility of using double reference, which does not exist in Spanish.
• LSE is a language with ample flexibility, and homonymy between noun and adjective is usual, so most nouns can be adjectives and vice versa, whereas there are few such cases in Spanish.
• In Spanish, there is a copula in non-verbal predications (the verb ‘to be’: ser and estar in Spanish), but there is none in LSE (except in some locative predications). For example:
Spanish: Antonio está en la Universidad (Antonio is in the University)
LSE: ANTONIO/UNIVERSIDAD ALLÍ (ANTONIO/UNIVERSITY THERE)
• There is a difference between inclusive and exclusive quantifiers in LSE, but not in Spanish.
• There are impersonal sentences with the “se” pronoun in Spanish, but not in LSE. For example:
Spanish: Se come bien (you eat well)
LSE: COMER BIEN (TO-EAT WELL)
• It is important to note that LSE is lexically more flexible than Spanish; its descriptive nature makes it well suited to generating periphrases, and because of this, LSE has fewer nouns than Spanish.
• Finally, LSE has fewer glosses per sentence (4.4 in our database) than Spanish has words (5.9 in our database). Some examples of LSE sentences and their Spanish translation are shown in Table II:
TABLE II
EXAMPLES OF SPANISH SENTENCES AND THEIR LSE TRANSLATION

Spanish: dame una foto (give me a photo)
LSE: TU FOTO UNA DAR-A_MI (YOU PHOTO ONE GIVE-TO-ME)

Spanish: debe recoger el deneí dentro de un mes (you must collect your ID card in a month)
LSE: DNI RECOGER PRÓXIMO UN-MES (ID-CARD TO-COLLECT NEXT MONTH)
IV. PARALLEL CORPUS
In order to develop a translation system focused on the domain of the renewal of Identity Document (ID) and Driver’s license
(DL), a database, including a parallel corpus, has been generated. This database has been obtained with the collaboration of
Local Government Offices where the aforementioned services (ID and DL) are provided. For a period of three weeks, the most
frequent explanations (from government employees) and the most frequent questions (from the user) were taken down and more
than 5,000 sentences were noted. These 5,000 sentences were analysed because not all of them refer to ID or DL, so sentences
were selected manually in order to develop a system in a specific domain. Finally, 1360 sentences were collected: 1,023
pronounced by government employees and 337 by users. These sentences were translated into LSE, both in text (sequence of
glosses) and in video, and compiled in an excel file. This corpus was increased to 4,080 by incorporating different variants for
Spanish sentences, maintaining the LSE translation.
The Excel file (Fig. 2) contains eight different information fields: “ÍNDICE” (sentence index), “DOMINIO” (domain: ID or DL renewal), “VENTANILLA” (window where the sentence was collected), “SERVICIO” (service provided when the sentence was collected), whether the sentence was pronounced by the government employee or the user (“funcionario” or “usuario” respectively), the sentence in Spanish (“CASTELLANO”), the sentence in LSE (sequence of glosses), and a link to the video file with the LSE representation.
Fig. 2. Fragment of system database
The main features of the corpus are summarised in Table III. These features are divided depending on the domain (ID or DL
renewal) and whether the sentence was spoken by the government employee or the user.
For the system development, two types of files were generated from the database: text files and sign files. Text files are made
up of Spanish sentences of the parallel corpus and sign files contain their LSE translations (LSE sentences are gloss sequences).
The corpus was divided randomly into three sets: training (75%), development (12.5%) and test (12.5%).
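The random 75/12.5/12.5 split described above can be sketched as follows; this is an illustrative Python sketch, not the actual scripts used by the authors:

```python
import random

def split_corpus(pairs, seed=0):
    """Randomly split parallel sentence pairs into
    training (75%), development (12.5%) and test (12.5%) sets."""
    rng = random.Random(seed)       # fixed seed for reproducibility
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.75 * n)
    n_dev = int(0.125 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])

# With the 4,080-sentence corpus described above:
train, dev, test = split_corpus([(f"es {i}", f"LSE {i}") for i in range(4080)])
print(len(train), len(dev), len(test))  # 3060 510 510
```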
TABLE III
MAIN STATISTICS OF THE CORPUS

Government employee      ID (Spanish / LSE)    DL (Spanish / LSE)
Sentence pairs           1,425                 1,641
Different sentences      1,236 / 389           1,413 / 199
Running words            8,490 / 6,282         17,113 / 12,741
Vocabulary               652 / 364             527 / 237

User                     ID (Spanish / LSE)    DL (Spanish / LSE)
Sentence pairs           531                   483
Different sentences      458 / 139             389 / 93
Running words            2,768 / 1,950         3,130 / 2,283
Vocabulary               422 / 165             294 / 133
V. AUTOMATIC SPEECH RECOGNIZER
The Automatic Speech Recognizer (ASR) used is a state-of-the-art speech recognition system developed at GTH-UPM
(http://lorien.die.upm.es). It is a speaker independent continuous speech recognition system based on HMMs (Hidden Markov
Models). The feature extraction includes CMN and CVN (Cepstrum Medium and Variance Normalization) techniques. The ASR
provides one confidence value for each word recognized in the word sequence. Regarding the performance of the ASR module,
with vocabularies smaller than 1,000 words, the Word Error Rate (WER) is lower than 5%.
VI. STATISTICAL TRANSLATION STRATEGIES
In this paper, two different statistical strategies have been considered: a Phrase-based system and a Statistical Finite State
Transducer. The proposed automatic categorization has been evaluated with both translation strategies. This section describes
the architectures used for the experiments.
A. Phrase-based translation system
The Phrase-based translation system is based on the software released to support the shared task at the 2009 NAACL
Workshop on Statistical Machine Translation (http://www.statmt.org/wmt09/). Fig. 3 shows the system architecture.
Fig. 3. Phrase-based translation architecture. (Pipeline: the parallel corpus is word-aligned with GIZA++; phrases are extracted and scored (Phrase-Extract and Phrase-Score) to build the translation model; SRI-LM (ngram-count) trains the n-gram language model on the target corpus; the MOSES decoder translates the source corpus, and the output is evaluated against the reference target corpus.)
The phrase model has been trained following these steps:
• Word alignment computation. At this step, the GIZA++ software [10] has been used to calculate the alignments between words and signs. The “alignment” parameter was fixed to “target-source” as the best option: only the target-source alignment is considered (LSE-Spanish). In this configuration, alignment is guided by signs: each sign in LSE is aligned with a Spanish word, and some words may not be aligned to any sign.
• Phrase extraction [16]. All phrase pairs that are consistent with the word alignment are collected. The maximum size of a phrase has been fixed to 7, based on tuning experiments with the development set.
• Phrase scoring. In this step, the translation probabilities are computed for all phrase pairs. Both translation probabilities are calculated: forward and backward.
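The phrase-scoring step amounts to relative-frequency estimation over the extracted phrase pairs. The sketch below illustrates the idea on a toy list of pairs; it is not the actual Phrase-Score implementation:

```python
from collections import Counter

def score_phrases(phrase_pairs):
    """Relative-frequency phrase scoring:
    forward p(target|source) and backward p(source|target)."""
    pair_counts = Counter(phrase_pairs)
    src_counts = Counter(s for s, _ in phrase_pairs)
    tgt_counts = Counter(t for _, t in phrase_pairs)
    table = {}
    for (s, t), c in pair_counts.items():
        table[(s, t)] = (c / src_counts[s],   # forward  p(t|s)
                         c / tgt_counts[t])   # backward p(s|t)
    return table

# Toy extracted phrase pairs (illustrative, not from the real phrase table):
pairs = [("las tasas", "DINERO"), ("las tasas", "DINERO"), ("las tasas", "PAGAR")]
table = score_phrases(pairs)
print(table[("las tasas", "DINERO")])  # forward 2/3, backward 1.0
```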
The Moses decoder is used for the translation process [39]. This program is a beam search decoder for phrase-based statistical
machine translation models. In order to obtain a 3-gram language model, the SRI language modeling toolkit has been used [40].
B. Statistical Finite State Transducer
The translation based on SFST is carried out as set out in Fig. 4.
The translation model consists of an SFST made up of aggregations: subsequences of aligned source and target words. The
SFST is inferred from the word alignment (obtained with GIZA++) using the GIATI (Grammatical Inference and Alignments
for Transducer Inference) algorithm [13]. The SFST probabilities are also trained from aligned corpora. The software used in
this paper has been downloaded from http://prhlt.iti.es/content.php?page=software.php.
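The core GIATI idea of pairing each source word with the target subsequence it emits (so that an n-gram model over the extended symbols defines the transducer) can be sketched as follows. This is a rough, assumption-heavy sketch: the joint-symbol format and the helper name are illustrative, not the actual GIATI implementation.

```python
def giati_symbols(src, tgt, alignment):
    """Build extended source/target symbols from a word alignment.
    alignment[i] lists the target indices emitted at source position i."""
    return ["%s/%s" % (src[i], "+".join(tgt[j] for j in alignment[i]))
            for i in range(len(src))]

# Toy aligned pair (illustrative): unaligned words emit nothing.
src = ["debes", "pagar", "las", "tasas"]
tgt = ["TU", "PAGAR", "DINERO"]
alignment = [[0], [1], [], [2]]
print(" ".join(giati_symbols(src, tgt, alignment)))
# debes/TU pagar/PAGAR las/ tasas/DINERO
```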
Fig. 4. Diagram of the FST-based translation module. (Pipeline: the parallel corpus is word-aligned with GIZA++; the GIATI algorithm builds the finite state transducer that acts as the translation model; SRI-LM (ngram-count) trains the language model on the target corpus; the translation tool (REFX) translates the source corpus, and the output is evaluated against the target corpus.)
VII. CATEGORIZATION
For incorporating a categorization module, the system considers the categories used in the rule-based translation system
previously developed for these two application domains [20]. The natural language translation module was implemented using a
rule-based technique considering a bottom-up strategy. In this case, the relationships between signs and words are defined by an
expert. In a bottom-up strategy, the translation analysis is carried out starting from each word individually and extending the
analysis to neighboring context words or already-formed signs (generally named blocks). The rules implemented by the expert
define these relationships.
The translation process is carried out in two steps. In the first one, every word is mapped to one or several syntactic-pragmatic
tags. After that, the translation module applies different rules that convert the tagged words into signs by means of grouping
concepts or signs and defining new signs. These rules can define short and large scope relationships between the concepts or
signs. The categories used in the first step were considered for these experiments.
To include categories in the translation process, the main idea is to replace the source language words with their categories and to train the translation model on these categories instead of words. When implementing the categorization module, several strategies for dealing with the “non-relevant” words have been proposed:
In the first alternative, all the words are replaced by their tags, with the exception of words that do not appear in the word-tag list (OOV words); as mentioned before, these are kept as they are. In the word-tag list, there is a “non-relevant” tag (named “basura”, garbage) mapped to words that are not relevant for the translation process. This alternative will be referred to in the experiments as “Base categorization”. For example:
• Source sentence: debes pagar las tasas en la caja (you must pay the taxes at the cash desk)
• Categorized source sentence: DEBER PAGAR basura DINERO basura basura DINERO=CAJA
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)
The second alternative is not to tag any word in the source language but to remove non-relevant words (those associated with the “non-relevant” tag) from the source lexicon. This alternative will be referred to in the experiments as “Non-relevant word deletion”. For example:
• Source sentence: debes pagar las tasas en la caja (you must pay the taxes at the cash desk)
• Categorized source sentence: debes pagar tasas caja
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)
Finally, the third alternative replaces words with tags (with the exception of OOVs) and removes the “non-relevant” tags. This alternative will be referred to in the experiments as “Categorization and non-relevant word deletion”. For example:
• Source sentence: debes pagar las tasas en la caja (you must pay the taxes at the cash desk)
• Categorized source sentence: DEBER PAGAR DINERO DINERO=CAJA
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)
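The three alternatives can be sketched with a toy word-tag list. The dictionary below is an illustrative assumption for this example sentence; the real word-tag list comes from the rule-based system [20]:

```python
# Illustrative word-tag list ("basura" marks non-relevant words).
WORD_TAGS = {"debes": "DEBER", "pagar": "PAGAR", "las": "basura",
             "tasas": "DINERO", "en": "basura", "la": "basura",
             "caja": "DINERO=CAJA"}

def base_categorization(words):
    # Replace every word by its tag; OOV words are kept as they are.
    return [WORD_TAGS.get(w, w) for w in words]

def non_relevant_deletion(words):
    # Keep the original words, but drop those mapped to "basura".
    return [w for w in words if WORD_TAGS.get(w) != "basura"]

def categorization_and_deletion(words):
    # Replace words by tags, then drop the "basura" tags.
    return [t for t in base_categorization(words) if t != "basura"]

src = "debes pagar las tasas en la caja".split()
print(" ".join(base_categorization(src)))
# DEBER PAGAR basura DINERO basura basura DINERO=CAJA
print(" ".join(non_relevant_deletion(src)))        # debes pagar tasas caja
print(" ".join(categorization_and_deletion(src)))  # DEBER PAGAR DINERO DINERO=CAJA
```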
In section IX all the alternatives will be evaluated and discussed.
VIII. FACTORED TRANSLATION MODELS
For the phrase-based translation strategy, it is possible to train factored models in order to include this information in the translation process [41]. Factored translation models are an extension of phrase-based statistical machine translation that enables the straightforward integration of additional annotations at the word level (linguistic markup or automatically generated word classes). A word in this framework is not only a token, but a vector of factors that represent different levels of annotation. The translation of factored representations of input words into factored representations of output words is broken up into a sequence of mapping steps that either translate input factors into output factors, or generate additional output factors from existing output factors. In this case, only the source language has been factored (with an additional factor, its category), and three different strategies were considered for dealing with “non-relevant” words, i.e. words that are not relevant for the translation process; they are tagged with the non-relevant tag named “basura” (garbage).
In the first alternative, all the words in the source language are factored and several translation models are trained (word-sign and tag-sign). Only two factors have been considered: word and tag. This alternative will be referred to in the experiments as “Using tags”. For example:
• Source sentence: debes pagar las tasas en la caja (you must pay the taxes at the cash desk)
• Factorized source sentence: debes|DEBER pagar|PAGAR las|basura tasas|DINERO en|basura la|basura caja|DINERO=CAJA
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)
The second alternative keeps the original words (without additional factors) but removes non-relevant words from the source lexicon. This alternative will be referred to in the experiments as “Removing non-relevant words from the source lexicon”. For example:
• Source sentence: debes pagar las tasas en la caja (you must pay the taxes in the cash desk)
• Factorized source sentence: debes pagar tasas caja
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)
Finally, in the third alternative, all the words are factored and “non-relevant” words are removed. This alternative will be referred to in the experiments as “Using tags and removing non-relevant tags”. For example:
• Source sentence: debes pagar las tasas en la caja (you must pay the taxes in the cash desk)
• Factorized source sentence: debes|DEBER pagar|PAGAR tasas|DINERO caja|DINERO=CAJA
• Target sentence: VENTANILLA ESPECÍFICO CAJA TU PAGAR (WINDOW SPECIFIC CASH YOU PAY)
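The three factored-input alternatives can be sketched with one small function that emits Moses-style `word|factor` tokens. The tag dictionary is again a hypothetical fragment covering only the example sentence:

```python
# Sketch of building the three factored-input alternatives described above.
# WORD_TAGS is a hypothetical fragment; the real mapping is corpus-derived.
NON_RELEVANT = "basura"  # tag for non-relevant words

WORD_TAGS = {
    "debes": "DEBER", "pagar": "PAGAR", "tasas": "DINERO",
    "caja": "DINERO=CAJA",
    "las": NON_RELEVANT, "en": NON_RELEVANT, "la": NON_RELEVANT,
}

def factorize(sentence, use_tags=True, remove_non_relevant=False):
    """Produce a Moses-style 'word|factor' input line for one sentence."""
    tokens = []
    for word in sentence.split():
        tag = WORD_TAGS.get(word, word.upper())
        if remove_non_relevant and tag == NON_RELEVANT:
            continue  # drop non-relevant words entirely
        tokens.append(f"{word}|{tag}" if use_tags else word)
    return " ".join(tokens)

src = "debes pagar las tasas en la caja"
alt1 = factorize(src)                                          # "Using tags"
alt2 = factorize(src, use_tags=False, remove_non_relevant=True)  # word-only, lexicon pruned
alt3 = factorize(src, remove_non_relevant=True)                # tags + removal
```

Here `alt2` gives `debes pagar tasas caja` and `alt3` gives `debes|DEBER pagar|PAGAR tasas|DINERO caja|DINERO=CAJA`, matching the examples above.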
IX. EXPERIMENTS AND DISCUSSION
For the experiments, the corpus (described in section IV) was randomly divided into three sets: training (75%), development (12.5%) and test (12.5%). The training set was used to train the translation and language models, and the development set was used for tuning the model weights and analyzing the probability threshold (in the case of automatic categorization).
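A random split with these proportions can be sketched as follows; the seed is purely illustrative (the paper does not specify one):

```python
import random

def split_corpus(pairs, seed=0):
    """Randomly split parallel sentence pairs into 75% / 12.5% / 12.5%
    (training / development / test), as described in the experiments.
    The seed value is illustrative, not from the paper."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_train = int(0.75 * n)
    n_dev = int(0.125 * n)
    return (pairs[:n_train],                      # training set
            pairs[n_train:n_train + n_dev],       # development set
            pairs[n_train + n_dev:])              # test set
```

For a corpus of 1,000 sentence pairs this produces 750/125/125 sentences, with every pair assigned to exactly one set.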
For evaluating the performance of the translation systems, the BLEU (BiLingual Evaluation Understudy) metric [17] has been computed using the NIST tool (mteval.pl). In order to analyze the significance of the differences between systems, for every BLEU result the confidence interval (at 95%) is also presented. This interval is calculated using the following formula:

±Δ = 1.96 · √( BLEU · (100 − BLEU) / n )

where n is the number of signs used in the evaluation, in this case n = 2,906.
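The half-interval formula above can be checked directly; for instance, the baseline ASR-output BLEU of 69.1% with n = 2,906 signs gives Δ ≈ 1.7, matching Table IV:

```python
import math

def bleu_confidence(bleu, n, z=1.96):
    """95% confidence half-interval for a BLEU score given in percent,
    treating it as a binomial proportion over n evaluation signs."""
    return z * math.sqrt(bleu * (100.0 - bleu) / n)

delta = bleu_confidence(69.1, 2906)  # ≈ 1.68, reported as 1.7
```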
Table IV compares the baseline system and the system with the categorization module for translating the references
(Reference) and the speech recognizer outputs (ASR output) using the phrase-based translation system.
TABLE IV
EVALUATION RESULTS FOR THE PHRASE-BASED TRANSLATION SYSTEM
REPLACING SOURCE LANGUAGE WORDS BY THEIR CATEGORIES

Phrase-based translation system                  Input         BLEU    ±Δ
Baseline                                         Reference     73.6    1.6
                                                 ASR output    69.1    1.7
Base categorization                              Reference     81.9    1.4
                                                 ASR output    74.5    1.6
Non-relevant word deletion                       Reference     80.0    1.4
                                                 ASR output    73.9    1.6
Categorization and non-relevant word deletion    Reference     84.3    1.3
                                                 ASR output    78.8    1.5
Table V compares the baseline system and the system with the categorization module for translating the references
(Reference) and the speech recognizer outputs (ASR output) using the SFST-based translation system.
TABLE V
EVALUATION RESULTS FOR THE SFST-BASED TRANSLATION SYSTEM
REPLACING SOURCE LANGUAGE WORDS BY THEIR CATEGORIES

SFST-based translation system                    Input         BLEU    ±Δ
Baseline                                         Reference     71.2    1.6
                                                 ASR output    69.8    1.7
Base categorization                              Reference     71.9    1.6
                                                 ASR output    68.7    1.7
Non-relevant word deletion                       Reference     76.7    1.5
                                                 ASR output    72.8    1.6
Categorization and non-relevant word deletion    Reference     81.5    1.4
                                                 ASR output    75.6    1.6
Table VI compares the baseline system and the system with the FTMs for translating the references (Reference) and the speech recognizer outputs (ASR output). When using the FTMs, the three different alternatives for dealing with non-relevant words are analyzed. Comparing these three alternatives shows that adding tags to the words and removing “non-relevant” words are complementary actions that together yield better results.
TABLE VI
EVALUATION RESULTS OF THE PHRASE-BASED SYSTEM COMPARING
THE BASELINE SYSTEM AND THE SYSTEM WITH THE FTMS

Phrase-based translation system                  Input         BLEU    ±Δ
Baseline                                         Reference     73.7    1.6
                                                 ASR output    69.1    1.7
Using tags                                       Reference     75.5    1.6
                                                 ASR output    68.0    1.7
Removing non-relevant words from the             Reference     80.0    1.4
source lexicon                                   ASR output    73.9    1.6
Using tags and removing “non-relevant”           Reference     81.8    1.4
tags                                             ASR output    73.9    1.6
The first observation is that the improvement reached with the factored models in the phrase-based architecture is lower than the improvement obtained by replacing the original source-language words with their categories. In factored models, two translation models are generated (word-sign and category-sign), and the process for tuning the weights of these models does not appear to find optimal values. However, in both cases (factorization and categorization), comparing the three alternatives for dealing with non-relevant words shows that adding tags to the words and removing “non-relevant” words are complementary actions that together yield better results.
One important difference between Spanish and LSE is that sentences contain more words in Spanish than signs in LSE (on average, 7.7 words versus 5.7 signs in this corpus). This difference can cause the generation of several phrases in the same output, producing a significant number of insertions. Additionally, when dealing with long sentences, the system sometimes cannot handle large distortions properly, and the output presents important order changes and even some sentence truncations.
Another important source of errors is ordering mistakes caused by the different predication order: LSE follows a SOV (Subject-Object-Verb) order while Spanish follows SVO (Subject-Verb-Object).
Finally, when translating Spanish into LSE, a relevant number of words in the test set do not appear in the training set, due to the higher variability of Spanish. These words are called Out-Of-Vocabulary (OOV) words. For example, Spanish has many verb conjugations that are translated into the same sign sequence, so when a new conjugation appears in the evaluation set, it is an OOV word that provokes a translation error.
In conclusion, the main causes of translation errors are the different vocabulary variability of Spanish and LSE (much higher in Spanish), the different number of words or signs per sentence (higher in Spanish), and the different predication order.
The categorization module and the FTMs, by including syntactic-semantic information, reduce the variability of the source language (for example, several verb conjugations are tagged with the same tag) and also the number of tokens composing the input sentence (when removing non-relevant words). Reducing the source-language variability and the number of tokens also causes an important reduction in the number of source-target alignments the system has to train. With a small corpus, as is the case for many sign languages, this reduction of alignment points permits better translation models to be trained with less data, improving the results.
Finally, the evaluation results reveal that, using the categorization module, the BLEU score has increased from 69.1% to 78.8% for the phrase-based system and from 69.8% to 75.6% for the SFST-based system when translating ASR outputs. Including FTMs, the BLEU score has increased from 69.1% to 73.9% when translating ASR outputs.
X. CONCLUSION
This paper has described the use of a categorization module and Factored Translation Models (FTMs) for improving a Spanish into Spanish Sign Language translation system. These methods incorporate syntactic-semantic information during the translation process, reducing the source-language variability and the number of words composing the input sentence. These two aspects reduce the translation error rate in the two statistical translation systems considered: phrase-based and SFST-based. The system is used to translate government employees’ explanations into LSE when providing a personal service for renewing the Identity Document and Driver’s License. The evaluation results reveal that the two proposed methods significantly increase the translation performance in both statistical translation strategies.
REFERENCES
[1] Mark Wheatley, Annika Pabsch, 2010. “Sign Language in Europe”. In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, LREC, Malta, 2010.
[2] W. Stokoe, Sign Language structure: an outline of the visual communication systems of the American deaf, Studies in Linguistics, Buffalo University Paper
8, 1960.
[3] Anderson LB., 1979 “Aspect in Sign Language Morphology: The Role of Universal Semantics and Pragmatics in Determining Grammatical categories”,
Linguistics Research Laboratory, Gallaudet College (for the Symposium on Tense/Aspect: between semantics and pragmatics, UCLA, 4–6 May), 1979.
[4] Christopoulos, C. Bonvillian, J. 1985. “Sign Language, Journal of Communication Disorders” 18 (1985) 1–20.
[5] Hansen B., 1975. “Varieties in Danish Sign Language”, Sign Language Studies 8 (1975) 249–256; J. Kyle, “British Sign Language”, Special Education 8 (1981) 19–23.
[6] Penn, C. Lewis, R. Greenstein A., 1984. ”Sign Language in South Africa”, South African Disorder of Communication 31 (1984) 6–11.
[7] Notoya M., Suzuki S., Furukawa M., Umeda R., 1986 “Method and acquisition of sign language in profoundly deaf infants”, Japan Journal of Logopedics
and Phoniatrics 27 (1986) 235–243.
[8] Rodríguez MA., 1991. “Lenguaje de signos, PhD. Dissertation”, Confederación Nacional de Sordos Españoles (CNSE) and Fundación ONCE, Madrid.
Spain, 1991.
[9] Herrero, A. 2009. “Gramática didáctica de la Lengua de Signos Española (LSE)”. 2009.
[10] Och J., Ney. H., 2002. “Discriminative Training and Maximum Entropy Models for Statistical Machine Translation”. Annual Meeting of the Ass. For
Computational Linguistics (ACL), Philadelphia, PA, pp. 295-302. 2002.
[11] Mariño J.B., Banchs R., Crego J.M., Gispert A., Lambert P., Fonollosa J.A., Costa-Jussà M., 2006. "N-gram-based Machine Translation", Computational
Linguistics, Association for Computacional Linguistics. Vol. 32, nº 4, pp. 527-549.
[12] Sumita E., Y. Akiba, T. Doi et al. 2003. “A Corpus-Centered Approach to Spoken Language Translation”. Conf. of the Europ. Chapter of the Ass. For
Computational Linguistics (EACL), Budapest, Hungary. pp 171-174. 2003.
[13] Casacuberta F., E. Vidal. 2004. “Machine Translation with Inferred Stochastic Finite-State Transducers”. Computational Linguistics, Vol. 30, No. 2, pp.
205-225, June 2004.
[14] Och J., Ney. H., 2003. “A systematic comparison of various alignment models”. Computational Linguistics, Vol. 29, No. 1 pp. 19-51, 2003.
[15] Zens R., F.J. Och, H. Ney. 2002. “Phrase-Based Statistical Machine Translation”. German Conference on Artificial Intelligence (KI 2002). Aachen,
Germany, Springer, LNAI, pp. 18-32, Sep. 2002.
[16] Koehn P., F.J. Och D. Marcu. 2003. “Statistical Phrase-based translation”. Human Language Technology Conference 2003 (HLT-NAACL 2003),
Edmonton, Canada, pp. 127-133, May 2003.
[17] Papineni K., S. Roukos, T. Ward, W.J. Zhu. 2002 “BLEU: a method for automatic evaluation of machine translation”. 40th Annual Meeting of the
Association for Computational Linguistics (ACL), Philadelphia, PA, pp. 311-318. 2002.
[18] Agarwal, Abhaya and Lavie, Alon, 2008. "Meteor, m-bleu and m-ter: Evaluation Metrics for High-Correlation with Human Rankings of Machine
Translation Output", Proceedings of Workshop on Statistical Machine Translation at the 46th Annual Meeting of the Association of Computational
Linguistics (ACL-2008), Columbus, June 2008.
[19] Morrissey, S. 2008. “Data-Driven Machine Translation for Sign Languages”. Thesis. Dublin City University, Dublin, Ireland.
[20] San-Segundo R., Barra R., Córdoba R., D’Haro L.F., Fernández F., Ferreiros J., Lucas J.M., Macías-Guarasa J., Montero J.M., Pardo J.M, 2008. “Speech to
Sign Language translation system for Spanish”. Speech Communication, Vol 50. 1009-1020. 2008.
[21] Cox, S.J., Lincoln M., Tryggvason J., Nakisa M., Wells M., Mand Tutt, and Abbott, S., 2002 “TESSA, a system to aid communication with deaf people”. In
ASSETS 2002, pages 205-212, Edinburgh, Scotland, 2002.
[22] Stein, D., Bungeroth, J. and Ney, H.: 2006 “Morpho-Syntax Based Statistical Methods for Sign Language Translation”. 11th Annual conference of the
European Association for Machine Translation, Oslo, Norway, June 2006.
[23] Morrissey S., Way A., Stein D., Bungeroth J., and Ney H., 2007 “Towards a Hybrid Data-Driven MT System for Sign Languages. Machine Translation
Summit (MT Summit)”, pages 329-335, Copenhagen, Denmark, September 2007.
[24] Vendrame M., Tiotto G., 2010. ATLAS Project: Forecast in Italian Sign Language and Annotation of Corpora. In 4th Workshop on the Representation and
Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010b
[25] Johnston T., 2008. “Corpus linguistics and signed languages: no lemmata, no corpus”. 3rd Workshop on the Representation and Processing of Sign
Languages, June 1. 2008.
[26] Dreuw P., Neidle C., Athitsos V., Sclaroff S., and Ney H. 2008. “Benchmark Databases for Video-Based Automatic Sign Language Recognition”. In
International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco, May 2008.
[27] Schembri. A.,2008 ”British Sign Language Corpus Project: Open Access Archives and the Observer’s Paradox”. Deafness Cognition and Language
Research Centre, University College London. LREC 2008.
[28] Efthimiou E., and Fotinea, E., 2008 “GSLC: Creation and Αnnotation of a Greek Sign Language Corpus for HCI” LREC 2008.
[29] Morrissey S., Somers H., Smith R., Gilchrist S., Dandapat S., 2010 “Building Sign Language Corpora for Use in Machine Translation”. In 4th Workshop on
the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010
[30] Hanke T., König L., Wagner S., Matthes S.. 2010. “DGS Corpus & Dicta-Sign: The Hamburg Studio Setup”. In 4th Workshop on the Representation and
Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[31] Geraci C., Bayley R., Branchini C., Cardinaletti A., Cecchetto C., Donati C., Giudice S., Mereghetti E., Poletti F., Santoro M., Zucchi S. 2010. “Building a
corpus for Italian Sign Language. Methodological issues and some preliminary results”. In 4th Workshop on the Representation and Processing of Sign
Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[32] Forster J., Stein D., Ormel E., Crasborn O., Ney H.. 2010. “Best Practice for Sign Language Data Collections Regarding the Needs of Data-Driven
Recognition and Translation”. In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT
2010), Valletta, Malta, May 2010.
[33] Crasborn O., Sloetjes H.. 2010. “Using ELAN for annotating sign language corpora in a team setting”. In 4th Workshop on the Representation and
Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[34] Efthimiou E., Fotinea S., Hanke T., Glauert J., Bowden R., Braffort A., Collet C., Maragos P., Goudenove F. 2010. “DICTA-SIGN: Sign Language
Recognition, Generation and Modelling with application in Deaf Communication”. In 4th Workshop on the Representation and Processing of Sign
Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010.
[35] Dreuw P., Ney H., Martinez G., Crasborn O., Piater J., Miguel Moya J., and Wheatley M., 2010 “The SignSpeak Project - Bridging the Gap Between
Signers and Speakers”. In 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010),
Valletta, Malta, May 2010a.
[36] Dreuw P., Forster J., Gweth Y., Stein D., Ney H., Martinez G., Verges Llahi J., Crasborn O., Ormel E., Du W., Hoyoux T., Piater J., Moya Lazaro JM, and
Wheatley M. 2010 “SignSpeak - Understanding, Recognition, and Translation of Sign Languages”. In 4th Workshop on the Representation and Processing
of Sign Languages: Corpora and Sign Language Technologies (CSLT 2010), Valletta, Malta, May 2010b.
[37] Prillwitz, S., R. Leven, H. Zienert, T. Hanke, J. Henning, et-al. 1989. “Hamburg Notation System for Sign Languages – An introductory Guide”.
International Studies on Sign Language and the Communication of the Deaf, Volume 5. Institute of German Sign Language and Communication of the
Deaf, University of Hamburg, 1989.
[38] Herrero, A., 2004 “Escritura alfabética de la Lengua de Signos Española” Universidad de Alicante. Servicio de Publicaciones.
[39] Koehn, Philipp. 2010. “Statistical Machine Translation”. Cambridge University Press.
[40] Stolcke A., 2002. “SRILM – An Extensible Language Modelling Toolkit”. Proc. Intl. Conf. on Spoken Language Processing, vol. 2, pp. 901-904, Denver.
[41] Koehn, P., Hoang, H., “Factored Translation Models”. 2007. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language
Processing and Computational Natural Language Learning, pp. 868–876, Prague, June 2007.