Using Thetos, a Text-into-Sign-Language Translator for Polish

Przemysław Szmal
Silesian University of Technology, Institute of Informatics
e-mail: [email protected]

1. Introduction

The Deaf community in Poland counts about 50,000 persons with a complete loss of hearing, among them people born deaf and those who lost their hearing later as a consequence of illness or an accident. For all of them sign language is the basic, and sometimes the only, means of communication. In society they form a linguistic minority: they experience difficulties in contacts based on the spoken language. To give them equal chances, interpreters are required to assist them when necessary. The number of human interpreters is still insufficient, and there are also situations in which the mediation of a human can be embarrassing (e.g. a doctor's visit). An alternative could be an automatic interpreter.

An attempt to solve the problem has been made in the Institute of Informatics at the Silesian University of Technology. The result is the Thetos system. Its main function is translating Polish texts into the Polish Sign Language (PSL). Thetos is also equipped with a multimedia PSL dictionary. The regular presentation, described in this paper, is concerned with the translator; the dictionary was intended to feature only in the demonstration and is not covered here. Thetos is the first and, its authors believe, the only system of this type for Polish. In many respects it is close to the eSIGN system developed at the University of Hamburg and used to give deaf people access to information distributed by the German Government.

The basic progress in Thetos development was made in the framework of two projects financed by the Polish Committee for Scientific Research:
- Project no. 8 T11C 007 17, "Polish texts translation into the Sign Language" ("Translacja tekstów polskich na język migowy"), 1999-2001. Its result was a simple prototype (originally called TGT-1).
- Project no. 4 T11C 024 24, "Assisting hearing-impaired persons with an electronic generation of the Sign Language" ("Wspomaganie osób niepełnosprawnych słuchowo za pomocą komputerowej generacji języka migowego"), 2003-2005. Its result was an advanced prototype.

We have now started preparations for the next official project without stopping system development.

A general view of the Thetos translator is shown in fig. 1.

Fig. 1. A general view of the translator of the Thetos system (input text file -> linguistic part -> output sequence of words -> multimedia part -> animated gesture sequence).

In the translator we can distinguish two main parts: the linguistic one and the multimedia one. They are responsible for the respective processing phases. The linguistic part transforms the input sequence of words into an equivalent word sequence arranged according to the grammar rules of the sign language. This sequence is taken over by the multimedia part and interpreted. The result is an animated sequence of gestures performed by a cartoon character (avatar) designed specially for this purpose.

For example, if we feed the linguistic part with the following text:

Słyszałem, że masz dostać nową pracę.
(I've heard that you are to get a new job; the English equivalent is intentionally a bit coarse)

on the output of the linguistic part we get:

ja słyszeć już. ty dostać mieć nowy praca.
(I hear already. you get have new job)

Snapshots of the output gesture sequence are shown in fig. 2.

Fig. 2. Translation example (frames: ja – I, słyszeć – hear, już – already, ty – you, dostać – get, mieć – have, nowy – new, praca – job).
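To make the two-part structure concrete, the sketch below shows the contract between the parts: the linguistic part turns a Polish sentence into a PSL-ordered word sequence, and the multimedia part consumes that sequence word by word. This is a minimal illustration only; the function names are hypothetical and the linguistic part is stubbed with the example quoted above, not the real Thetos interface.

```python
# Minimal sketch of the two-part translator contract. All names
# (linguistic_part, multimedia_part) are hypothetical; the linguistic
# part is stubbed with the example translation shown above.

def linguistic_part(polish_text: str) -> list[str]:
    """Return a word sequence arranged according to PSL grammar."""
    examples = {
        "Słyszałem, że masz dostać nową pracę.":
            "ja słyszeć już . ty dostać mieć nowy praca .",
    }
    # The real system performs full morphologic/syntactic/semantic analysis;
    # here we only look the sentence up to illustrate the data flow.
    return examples.get(polish_text, "").split()

def multimedia_part(psl_words: list[str]) -> None:
    """Interpret the word sequence as a sequence of animated gestures."""
    for word in psl_words:
        if word != ".":
            print(f"avatar signs: {word}")  # stand-in for gesture playback

multimedia_part(linguistic_part("Słyszałem, że masz dostać nową pracę."))
```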
2. Interpreting text to the Deaf – what for?

People often ask: "What is text interpretation for, since a Deaf person can read the text himself?". In fact, in many cases he cannot, and even if he can, he prefers sign language communication. It is simply his mother tongue, used originally in contacts with the world, without alternative. Any phonic language is then a foreign one. Learning to read is difficult, since a Deaf child cannot build associations between words and sounds; written words are simply images. The Deaf person has to successively learn the words (with their inflective variants) that correspond to objects he knows from seeing (abstract notions cause serious problems!), and learn the syntactic and semantic rules. Special difficulties occur with richly inflected languages such as Polish.

In communication with a speaking partner, the hearing-impaired person uses complementary communication channels: if the loss of hearing is not complete, he receives a deformed voice, and additional information is conveyed by the movements of the lips. All this helps him to reconstruct the utterance.

3. How to teach Sign Language to hearing people?

While teaching sign language we show the learner the rules according to which the language is built. To build utterances we need signs. Each sign has a number of so-called distinctive traits; a sign differs from other signs by a unique combination of "values" of those traits. The major elements used to build a sign include:
- the arrangement of fingers of one or both hands,
- the position of the hand or hands,
- the articulation place, meant as the position of the hands relative to the body and to each other,
- the articulation (movement) – its direction,
- the articulation (movement) – the way the movement is performed.

The first three elements express so-called static parameters, the remaining two the dynamic ones. A number of basic gestures is used to denote letters and some numerals – they form the so-called finger alphabet. The simple configurations used in alphabetic signs appear in other, more complex ones. The sign language has a relatively rich vocabulary, which is however less developed than that of phonic languages. To describe gestures, SiGN, the Szczepankowski Gestographic Notation, is used.

While building sign language sentences or longer utterances, a specific syntax is used which significantly differs from that of Polish: strict word order is important in signing, while Polish is flexible in this respect. An interesting element is the spatial organization of utterances. An utterance is a form of live show – the objects that feature in it are by convention situated at different angles relative to the signer. When making reference to a specific object, the signer simply turns or points to the place assigned to that object.

Building the Thetos translator can be viewed as "teaching" it all the essential elements of the sign language. The translator now complies, better or worse, with the rules of the language at its different levels.

4. How to translate utterances?

4.1 Text-into-text processing

As said above, input to and output from the linguistic part of the system are text files. A simplified translation scheme is shown in fig. 3.

Fig. 3. Translation scheme in Thetos (text analysis -> translation -> text generation).

The individual processing steps are done by chained processors, which perform the morphologic, syntactic, and semantic analysis of the input utterance, etc.; a minimal sketch of this chain is given below.
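As a rough illustration of the chained-processor idea (not the actual Thetos or LAS interface; the class names and the data they attach are invented simplifications), each stage can be modelled as an object that enriches a shared utterance structure and passes it on:

```python
# Sketch of chained processing stages; class names and attached data are
# simplified assumptions, not the real Thetos/LAS structures.

from dataclasses import dataclass, field

@dataclass
class Utterance:
    text: str
    tokens: list = field(default_factory=list)     # filled by morphology
    structure: tuple = ()                          # filled by syntax
    meaning: dict = field(default_factory=dict)    # filled by semantics
    psl_words: list = field(default_factory=list)  # filled by generation

class Morphology:
    def run(self, u):                              # recognise/classify word forms
        u.tokens = u.text.rstrip(".").split()
        return u

class Syntax:
    def run(self, u):                              # find the sentence structure
        u.structure = ("S", tuple(u.tokens))
        return u

class Semantics:
    def run(self, u):                              # assign roles / meanings
        u.meaning = {"predicate": u.tokens[1] if len(u.tokens) > 1 else None}
        return u

class Generation:
    def run(self, u):                              # transfer + output word choice
        u.psl_words = [t.lower() for t in u.tokens]
        return u

def translate(text: str) -> list[str]:
    u = Utterance(text)
    for stage in (Morphology(), Syntax(), Semantics(), Generation()):
        u = stage.run(u)
    return u.psl_words
```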
The processing is based on a formalised description of the input and output language grammars, as well as on a set of dictionaries which contain varied information about words and possible linguistic constructions. The processing steps involve:
- recognition, classification, and detection of specific traits of the words appearing in the input utterance (morphologic analysis),
- finding the syntactic structures (syntax analysis),
- finding the meaning of the structures (semantic analysis),
- the proper translation at the internal representation level (transfer),
- finding the syntactic structure of the output (sign language) utterance,
- selecting the words to be used in the output utterance.

Important difficulties are caused by the fact that in many cases the results of analysis are ambiguous. Individual processors may indicate a number of possible classifications, structures or meanings. The questions are:
- How should a word be classified? For example, the word form "para" is an inflective form of the noun "para", meaning either "pair" or "steam", of the noun "par" ("peer", a noble), and of the verb "parać się" ("to dabble"); such coinciding forms are called homonyms.
- What sentence structure should be proposed?
- What roles should be assigned to words and subphrases?

In order to perform the translation (and produce a single output utterance), all ambiguities have to be resolved. A look into the context is helpful here; unfortunately, it is highly time- and space-consuming work.

The translator can deal with different linguistic constructions. Let us consider a continuous text, meant as a sequence of sentences that refer to each other. For stylistic reasons, anaphora and ellipses are often used in such a text. (In the case of anaphora, different places in the text refer to the same element – a person, thing, activity, etc.; in order not to use the same wording each time, synonyms or other substitute elements are introduced. In the case of ellipsis, some element of the utterance is dropped.) The translator has to detect such constructions and recover from them, i.e. decide which element would occur in the text if it had not been stylistically improved. While polishing the output utterance, the stylistic rules specific to the sign language should be applied to this expanded text. The idea is shown in fig. 4; the example uses a fragment of the "Little Red Riding Hood" fairy tale.

It is worth mentioning that the linguistic processing is done by the linguistic analysis server LAS. It can work locally, but is also accessible via the Internet at http://thetos.zo.iinf.polsl.gliwice.pl/.

Fig. 4. Anaphora and ellipsis in a sample text (S1-S4 – simple statements; single and double underline – first and second anaphora family members, respectively; the parentheses mark the place of the ellipsis, missing word: ono = it):
Dziewczynka chodziła w czerwonej pelerynce z kapturkiem i dlatego wszyscy nazywali ją Czerwonym Kapturkiem. Jej mamusia także lubiła używać tego imienia, bo pasowało ( ) do dziewczynki.
(The girlie wore – lit.: walked in – a red pelerine with a hood and therefore everyone called her the Little Red Riding Hood. Her mother also liked to use that name, since it fitted the girlie.)

4.2 Gesture presentation

In this section we discuss selected problems connected with the working of the animation part of the Thetos system.

Transformation scheme

A general scheme of the transformations done in the multimedia part of the system is shown in fig. 5.

Fig. 5. General scheme of transformations in the multimedia part.

The multimedia part starts from a gesture specification. This specification is transformed with reference to the specification of the avatar's geometry, which yields the specification of the avatar's movements; this is immediately used to drive the animation in "real time". A minimal sketch of this data flow is given below.
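The following sketch illustrates the chain of fig. 5 under assumed names (none of these classes exist in Thetos under these names): a gesture specification is resolved against the avatar's geometry into a movement specification, which is then fed frame by frame to the animation loop.

```python
# Sketch of the fig. 5 transformation chain; every name here is an
# assumption introduced only to show the data flow.

from dataclasses import dataclass

@dataclass
class GestureSpec:            # e.g. one parsed SiGN record
    name: str
    description: str

@dataclass
class AvatarGeometry:         # the avatar's skeleton: joint names, bone data
    joints: list

@dataclass
class MovementSpec:           # target joint angles for successive key frames
    key_frames: list          # list of {joint_name: angle}

def plan_movements(gesture: GestureSpec, geometry: AvatarGeometry) -> MovementSpec:
    # Map the abstract gesture description onto this particular skeleton;
    # a real planner would derive the angles from the SiGN record.
    neutral = {joint: 0.0 for joint in geometry.joints}
    return MovementSpec(key_frames=[neutral, neutral])

def render(frame: dict) -> None:
    print(frame)              # placeholder for actual drawing on the screen

def animate(movement: MovementSpec) -> None:
    for frame in movement.key_frames:
        render(frame)         # hand each frame to the renderer in "real time"
```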
Specifying gestures

Gestures are specified using the Szczepankowski Gestographic Notation (SiGN). SiGN is a simple, concise, and easy-to-use textual notation. It was originally conceived as a support for teaching the Polish Sign Language, and it is widely known and used in the Polish Deaf community. SiGN has been evolving – the latest changes were introduced for the needs of automatic processing, with the Thetos system in mind. There are over 6000 entries in the older dictionary versions; over 1400 entries have been converted into the newest one.

SiGN is convenient for people, but troublesome in implementation, since it is:
- incomplete – there are no means to express some sophisticated gestures,
- imprecise,
- in some cases internally contradictory,
- highly intuitive – it is intuition which helps the human (but not the computer program) to properly interpret imprecise records and avoid contradictions.

An example of the SiGN description for the sign "write" is shown in fig. 6. The specification

PE:23k }/ LBk:13k # P:III\V<-"

says that initially the right hand (P – Pol. prawa) is shaped in the "E" sign, the palm oriented horizontally (2), its end directed obliquely to the front and to the inside (3). The right hand is situated over the left one and has direct contact with it, either point-wise or over a small area (}/). At the same time the left hand (L) is shaped in "B" with the thumb pulled back (k – Pol. kciuk = thumb); the palm is oriented horizontally with its inside up (1), its end directed obliquely to the front and to the inside (3). The sign is a dynamic one (#); the right hand, without changing its configuration, moves to the front and to the right (III\V), the motion is less than average (<) and is done along another body part (-), in this case the left hand. The motion is repeated twice (").

Fig. 6. Description of the sign "write" in SiGN: PE:23k }/ LBk:13k # P:III\V<-"

Geometry and transformations

The image of an avatar is built on the basis of a skeleton, which is covered with a "skin". The skeleton resembles the human one, and its dimensions are constant. As in the real world, the bones can move; while moving, they change the angles (at the joints) relative to their neighbours. There are 23 angles for each palm (see fig. 7); for the upper part of the body the total is 67 angles. Due to anatomical constraints, the number of parameters needed to specify the current configuration of the body can be reduced to 45 angles and positions.

The animation method applied in Thetos is based on so-called key frames, which represent the body position at specific moments or movement stages. To get smooth animation, the system produces additional images for the intermediate states by means of interpolation. A transformation, i.e. a change of the body configuration or state, is described in terms of translations (meant as bone positions in space) and rotations (turns relative to the superior bone; this compound value is represented by a quaternion or, equivalently, a 4x4 matrix). A sketch of such key-frame interpolation is given below.

Fig. 7. Human hand skeleton
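The following sketch shows one standard way to produce the intermediate images mentioned above: per bone, translations are interpolated linearly and rotations (stored as quaternions) spherically between two key frames. It illustrates the general technique, not the actual Thetos engine code; the function names are assumptions.

```python
# Minimal key-frame interpolation sketch: linear interpolation for bone
# translations, spherical linear interpolation (slerp) for rotations.
# Illustrative only; not the Thetos animation engine.

import math

def slerp(q0, q1, t):
    """Spherically interpolate between unit quaternions q0, q1 (4-tuples)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:                   # nearly parallel: fall back to lerp
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

def interpolate_pose(key_a, key_b, t):
    """key_a, key_b: dicts bone -> ((x, y, z) translation, rotation quaternion)."""
    pose = {}
    for bone in key_a:
        (ta, qa), (tb, qb) = key_a[bone], key_b[bone]
        trans = tuple(a + t * (b - a) for a, b in zip(ta, tb))
        pose[bone] = (trans, slerp(qa, qb, t))
    return pose
```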
At an early stage of the development of the animation mechanism there was a number of problems to be solved; some of them are illustrated in fig. 8. First, the avatar had a primitive look – its body was limited to a sketchy front part, without rear and sides, and the face, lacking details, was also criticised. Second, the animation mechanism was imprecise and could not cope with collisions between body parts.

Fig. 8. Problems with gesture presentation

Characters, "skins"

In order to prepare and drive animations in Thetos, a number of methods, mechanisms, and various specific elements are used. Support for this, at different levels, can be found in existing professional software packages. In the case of Thetos, the animation is done by an animation engine that is responsible for animation planning, rendering (i.e. drawing complete images on the screen), and animation control. The engine imports data that describe a three-dimensional avatar model (see fig. 9). This model, with dynamic changes of its state, can be prepared independently by Thetos mechanisms or can be borrowed from a 3D package. The basic support package for the system is 3D Studio. Thetos currently uses its own renderer module and animation engine. The engine incorporates a collision removal mechanism; the movements are generated using forward (simple) and inverse kinematics.

One of the things which can be adopted from 3D packages are avatar "skins". Figure 10 shows a family of candidates for avatars. The first three come from 3D Studio; the harlequin, slightly modified, has been featuring in the last official system version. The young lady is an original design and will replace the harlequin.

Fig. 9. Technological details of gesture presentation (3D modeling; data import and export; data to the engine; rendering; animation planning; animation control).

Fig. 10. Gallery of avatars

Face animation

Face animation has two main aspects. The first is the reproduction of face mimics, which is an essential component of some signs. The second is speech animation, which opens an additional communication channel for the Deaf. Our discussion focuses on the latter.

The essence of speech animation is presenting the configurations that the lips take while emitting phonemes. There are 41 different phonemes in Polish; they can be produced using 8 basic lip configurations, which are shown in fig. 11. From the technical point of view, lip movements are controlled by means of special "bones". To give only the impression of the avatar speaking, it would suffice to display a series of basic configurations according to the phonemes previously detected in the words being signed. To allow lip reading, intermediate lip positions should be introduced; possibly the technique of "morphing" would be useful. (Morphing consists in quasi-continuous deformations during the transition from one image to another; the basis for this process is the change in position of selected points of the image.) A sketch of this phoneme-to-configuration mapping is given below.

Fig. 11. Lip configurations used in speech animation: S (s, z, ź, ż, c), P (p, b, m), E (e, ę), U (u, ł), A (a, o, y, ą), T (t, d, l, n, r), I (i, k, g, h, j), W (f, w).
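A small sketch of how the 8 configurations of fig. 11 could drive lip animation: each phoneme is mapped to its configuration, and between successive configurations a few intermediate "morphing" steps are requested. The table reproduces fig. 11; the function names and the fixed number of steps are hypothetical.

```python
# Sketch of speech animation: phoneme -> lip configuration mapping (fig. 11)
# plus interpolation ("morphing") requests between successive configurations.
# Illustrative assumptions only, not the Thetos data or API.

LIP_CONFIG = {}
for config, phonemes in {
    "S": "s z ź ż c", "P": "p b m", "E": "e ę", "U": "u ł",
    "A": "a o y ą",   "T": "t d l n r", "I": "i k g h j", "W": "f w",
}.items():
    for ph in phonemes.split():
        LIP_CONFIG[ph] = config

def lip_track(phonemes, steps=4):
    """Yield (from_config, to_config, fraction) morphing requests."""
    configs = [LIP_CONFIG[ph] for ph in phonemes if ph in LIP_CONFIG]
    for a, b in zip(configs, configs[1:]):
        for i in range(steps):
            yield a, b, i / steps   # morph fraction from configuration a to b

# Example: list(lip_track(["p", "i", "s", "a"])) steps through P -> I -> S -> A.
```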
4.3 General effect

At first glance, the translations produced by Thetos look inviting. However, a serious application of the system requires some shortcomings to be removed; the necessary amount of work differs from case to case. From the user's point of view, the worst problem is still the poor quality of presentation, which makes the gestures incomprehensible in many cases. To improve the quality, fine-tuning of the animation mechanism is necessary; incidentally, this may entail some modifications in the SiGN notation.

The next cause of problems is insufficient vocabulary. The dictionaries of the linguistic part contain over 60,000 word descriptions. Our experiments with free Internet texts indicate that this number should be doubled (at present, 10-15% of words cannot be recognized). As mentioned above, the dictionary of gesture specifications contains about 1400 gesture specifications. It covers the set of signs obligatory for a basic-level human sign language interpreter; of course, this set should be extended. In any case, compared to the phonic vocabulary, the sign language one is less rich. This creates the need for a mechanism that would find substitutes for gestures not known to the system – these could simply be synonyms or characterisations given in a few words.

Some problems – fortunately the easiest to solve – stem from the oversimplified communication between the main system parts. Specifically, information about the different meanings of words is lost; in consequence, the animation part may use a sign which in the dictionary is connected with the wrong homonym. A further challenge is introducing inverse translation, i.e. from gestures into text.

5. How can Thetos help in education?

Possible application fields of the system involve, among others:
- health service (the first prototype of Thetos was oriented towards first medical aid),
- offices of various kinds,
- cinema and TV, where it could be used as an electronic narrator,
- interpreter training,
- education.

One element of children's education – apart from "standard" assistance in sign language learning – should be the interpretation of fairy tales. Students can profit from the system directly, in interactive contact. The system can also be useful where didactic material is prepared and recorded off-line, to be replayed anytime and anywhere it is needed. This solution is advantageous because of the limited quality of translation, which can then be easily verified and corrected by a human.

6. Final remarks

The practical objective of the Thetos project has been to elaborate a text-into-sign-language translator for Polish. The translation process involves linguistic and multimedia processing. The objective has been reached, although the system's capabilities are limited: it is still a prototype. However, the system development process has not stopped. The plans of the project team involve:
- improvement of the translation mechanisms, both in the linguistic and in the multimedia part; among other things, attention will be paid to the new avatar and its skills,
- essential extension of the dictionaries in the scope of both the general and the application-specific vocabulary.

The authors sometimes think about extending the scope of languages on both the phonic and the sign side. Work on the inverse (that is, sign-into-text) translation should start soon.

Acknowledgement

In this presentation the author used elements of a number of his and Dr. Jaroslaw Francik's presentations shown at other conferences. The demo version of the Thetos system and further information about the project can be found at: http://thetos.polsl.pl