JOURNAL OF SOFTWARE, VOL. 9, NO. 8, AUGUST 2014

Conversion of Sign Language to Spoken Sentences by Means of a Sensory Glove

Pietro Cavallo
Electronic Engineering Department, University of Tor Vergata, Roma, Italy
Email: [email protected]

Giovanni Saggio
Electronic Engineering Department, University of Tor Vergata, Roma, Italy
Email: [email protected]

Abstract—Normal communication for deaf people in ordinary life remains an unrealized goal, despite the great strides Sign Language Recognition (SLR) has made in recent years. We address this problem by proposing a portable, low-cost system that has proved effective in translating gestures into written or spoken sentences. The system relies on a home-made sensory glove, used to measure hand gestures, and on Wavelet Analysis (WA) and a Support Vector Machine (SVM) to classify hand movements. In particular, we devoted our efforts to translating the Italian Sign Language (LIS, Linguaggio Italiano dei Segni), applying WA for feature extraction and SVM for the classification of one hundred different dynamic gestures. The proposed system is light and unobtrusive, so it can be easily used by deaf people in everyday life, and it has demonstrated valid results in terms of sign/word conversion.

Index Terms—Machine intelligence, Pattern analysis, Human Computer Interaction, Support Vector Machines, Data glove, Sign Language Recognition, Italian Sign Language, LIS

I. INTRODUCTION

Similarly to spoken languages, Sign Languages (SLs) are complete and powerful forms of communication, adopted by millions of people who suffer from deafness all over the world. SLs differ among regions and states: American Sign Language (ASL), Japanese Sign Language (JSL), German Sign Language (GSL), Lingua Italiana dei Segni (LIS, the Italian Sign Language), etc.
However, each SL commonly relies on the interpretation of gestures and postures, which also play a fundamental role in non-verbal communication. SL comprehension is generally limited to a restricted part of the population, so deaf people remain cut off from social interactions with hearing persons, and body-language/non-verbal communication is mainly limited to "feelings" rather than conscious understanding. These are the main reasons why a system for "Automatic" Sign Language Recognition (A-SLR) is welcome, and a growing human effort is being devoted to realizing it [1].

Manuscript received August 27, 2013; revised January 21, 2014; accepted January 23, 2014.
© 2014 ACADEMY PUBLISHER
doi:10.4304/jsw.9.8.2002-2009

A-SLR could allow deaf people to communicate without limitations, could assign suitable interpretations to non-verbal communication without ambiguity, and could be the basis of a new form of human-computer interaction, since the most natural way of interacting with a computer would be through speech and gestures, rather than the currently adopted interfaces such as keyboard and mouse. In particular, the integration of A-SLR with automatic text writing or speech synthesis modules can furnish a "speaking aid" to deaf people.

Our purpose is to realize a system capable of measuring human static and dynamic postures and classifying them as "words" organized into "sentences", in order to "translate" SLs into written or spoken languages. To this aim, a great challenge comes from measuring human postures with acquisition devices that are both comfortable and easy to use. In particular, we focus our attention on hand postures and movements, since SL is mostly, even if not exclusively, based on them: SL is made of hand gestures together with body and facial expressions, but the latter are not strictly fundamental.

Currently, hand movements are commonly measured by motion tracking techniques based on digital cameras.
These systems offer interesting results in terms of accuracy, but suffer from a small active range and from disadvantages related to portability. In order to overcome these problems, new measuring systems have been developed, in particular those based on sensory gloves (i.e. gloves equipped with sensors capable of converting hand movements into electrical signals). The first sensory gloves on the market [2-3] were obtrusive, uncomfortable and capable of measuring only a very small number of the Degrees Of Freedom (DOF) of the human hand. Nowadays, commercial sensory gloves are quite light and comfortable and can measure up to 22 DOFs [4], covering flexion-extension and abduction-adduction movements of the fingers and the spatial arrangement of the wrist. However, their cost generally remains too high (tens of thousands of dollars) to be widely applied in everyday scenarios. Thus, new sensory gloves have been created by research groups all over the world [5], proposing different raw fabric materials (stretchable, washable, light, comfortable to don), different kinds of sensors to convert postures into electrical signals (based on optics, magnetic fields, inertial principles, etc.), different front-end electronic solutions, and so on.

Here, we designed and realized a sensory glove based on both bend and inertial sensors sewn onto a Lycra support. The overall system is explained in detail in Section IV.A. The electrical values coming from the sensors are processed by ad-hoc electronic circuitry connected to a personal computer, where the digital data are processed online to obtain a classification of the measured gestures. By means of the glove and the hardware, we measured the static and dynamic hand postures of two persons who performed signs belonging to LIS (Linguaggio Italiano dei Segni, i.e. the Italian sign language).
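The online processing chain just described starts from raw digitized sensor values. As a flavor of it, the sketch below converts 10-bit ADC counts to volts (using the 10-bit, 0-4.99 V quantization detailed in Section IV.B) and applies a trailing weighted moving average like the smoothing of Section IV.B. The window weights and sample values are illustrative assumptions, not values taken from the paper:

```python
def counts_to_volts(count, vref=4.99, bits=10):
    """Convert a raw ADC reading (0 .. 2**bits - 1) to volts."""
    return vref * count / (2**bits - 1)

def moving_average(x, weights=(0.25, 0.5, 0.25)):
    """Weighted moving average over a trailing window: each output sample
    is a weighted sum of the last len(weights) input samples."""
    n = len(weights)
    return [sum(w * x[t - i] for i, w in enumerate(weights))
            for t in range(n - 1, len(x))]

# Example: digitized bend-sensor samples -> volts -> smoothed stream.
volts = [counts_to_volts(c) for c in (512, 520, 900, 516)]  # 900: noise spike
smooth = moving_average(volts)
```

The smoothing attenuates isolated spikes while keeping the stream causal, which matters for real-time classification.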
Thanks to feature extraction and gesture classification, we "translated" in real time a stream of LIS signs into Italian words and sentences, which can be converted into speech via commercially available synthesizer software.

Apart from sign language translation, our data glove could have a great impact on Human-Computer Interaction applications, since it enables the user to translate their actions and gestures into computer commands. This is an interesting application, especially for new devices coming up in the near future (e.g. smart glasses) which need a natural way of interaction. It also has great potential for video games and for all those scenarios where non-verbal communication can improve the way people interact with machines. We are strongly convinced that a glove system is much more user-friendly and natural to use, especially when interfaced to a smartphone via a wireless/Bluetooth connection, whereas a camera-based system suffers from a restricted visual field and is not portable in everyday life.

A-SLR can be considered close to speech recognition, even though speech recognition deals with only one signal while A-SLR has to handle multiple signals, i.e. hand shape, orientation, position and movement. In addition, commercial inertial sensors (such as the ones we used for the glove) generally have noise problems that must be taken into account. For the "recognition" of the hand's kinematics we adopted wavelet analysis: the dynamic components of the gestures were described by wavelet coefficients and then fed to a Support Vector Machine (SVM) module for the classification process. Three different chunking methods were considered and implemented in order to segment the signs contained in a continuous sentence.

The remainder of this paper is organized as follows. Section II reviews the state of the art. Section III gives a brief introduction to the Italian Sign Language.
Section IV gives the details of our system: the sensory glove (subsection A); the data acquisition circuitry (subsection B); three different chunking methods (subsection C); feature extraction and gesture classification (subsections D and E). Section V presents the experimental results. The conclusions are drawn in the last section.

II. RELATED WORKS

The most commonly adopted method to measure human movements relies on "motion tracking" procedures, which usually involve optical techniques (e.g. Optotrak Certus @ www.ndigital.com, OptiTrack @ www.naturalpoint.com/optitrack). E. J. Muybridge was a pioneer of the optical analysis of movement, investigating human and animal locomotion through photograph sequences [6]. Even though the method was inevitably inaccurate at the beginning, optical systems have nowadays reached complete maturity, but with the drawbacks of high costs, the need to arrange a scenario with cameras, and a high computational workload. As time passed, other methods were introduced, such as magnetic tracking systems (see the 3D Space Fastrak as an example, www.polhemus.com), hybrid systems combining inertial sensors with an optical marking system (MoCap), or mechanical systems consisting of an exoskeleton made of lightweight aluminum rods that follow the motion of the user's bones (see the Gypsy 7 Torso Motion Capture System, www.metamotion.com). Finally, we must mention wearable sensor systems, i.e. sensors located on the human body, such as sensory gloves. These gloves measure finger joint angles quite precisely, with a resolution of the order of one degree, but with the drawback of imperfect space tracking of the hand, i.e. of the location of the hand in its 3D surroundings. In previous works [7-10] the most common way to carry out the tracking was to use magnetic field devices such as the well-known Polhemus Fastrak [11].
Even though it provides accurate information about the trajectory and orientation of the tracked object, its cost is fairly high. In addition, it has a limited action range and it is not portable at all. In [12] and [13] space tracking was carried out using accelerometers. These inertial devices require little energy, are comparatively cheap and are small enough to be integrated in a glove. Unfortunately, unlike the Fastrak discussed previously, the signals obtained from accelerometers are more difficult to process and have a lower spatial resolution. To our knowledge, only a few works relate inertial devices to A-SLR applications: in [12] they are used to track hand movements in order to recognize a set of 3 gestures; in [13] the authors classified 17 different trajectories (not related to any sign language), using fusion features based on Wavelet analysis to represent the trajectories. We here extend the work of [13], likewise using Wavelet coefficients to represent the dynamic components of the gestures, but analyzing a larger set of 100 gestures belonging to a real sign language (i.e. LIS), and using an innovative low-cost acquisition device.

Regarding classification, different techniques have been used in the literature to recognize static or dynamic gestures [14]. The first A-SLR attempts focused on static gestures (e.g. finger alphabets), and their recognition algorithms were not very similar to the ones used for isolated and/or continuous dynamic signs. Since we study the latter, we decided to focus on previous related works on dynamic gesture recognition and to leave out the works on static gestures. One of the first attempts in A-SLR is [7], which used 5 neural networks to recognize 200 different signs with a data glove equipped with a Fastrak.
In [15] the authors adopted a feed-forward Neural Network trained with the Backpropagation algorithm. In [16] a simple recurrent network was used to separate the signs in a continuous stream, and a Hidden Markov Model (HMM) was trained to classify the gestures. In [8] temporal clustering and a variant of the HMM, known as the transition movement model, were combined to classify a large set of Chinese signs, but still using a Fastrak device to trace the spatial path.

Very few works focus on Italian sign language (LIS) recognition. In [10] a Self-Organizing Map Neural Network was adopted and combined with lip-movement recognition via optical techniques to chunk the words in a continuous stream. On the other hand, in [17] the authors realized a virtual avatar capable of moving according to LIS signs.

III. ITALIAN SIGN LANGUAGE

Even though it is not legally recognized, LIS is a complex language taken as a standard way of communication among the Italian deaf community. As in the example of figure 1, each sign is identified by four parameters:
a. The location where the sign is made
b. The 'configuration', which is the shape assumed by the hand while performing the sign
c. The orientation of the hand's palm
d. The movement of the hand in space.

Fig. 1. Sign Language parameters

For the purpose of this work, we limited our study to a LIS subset of 100 signs taken randomly from the Italian Sign Language dictionary [18]. The number of signs chosen here is comparable to the average number found in the literature.

IV. SYSTEM DESCRIPTION

Figure 2 depicts the general diagram of the system architecture.

Fig. 2.
General system architecture

Each block of the diagram, representing a logical step of the overall method, has been realized by our research group, named HITEG (Health Involved Technical Engineering Group). Wearing the data glove, the user is asked to perform a single gesture or a sequence of continuous signs. These are measured, and the signals are sent to a PC in digital format. The data are then preprocessed to remove potential noise by means of a smoothing algorithm. If a sentence constituted by several gestures is recorded, the Chunker splits it into single signs. We associate to each gesture a feature vector, which is then passed to the classifier, which uses it for learning or classification purposes. A detailed explanation of all the logical steps can be found in the following paragraphs.

A. The HITEG Data Glove

In order to measure hand movements we used our home-made sensory glove, shown in figure 3. It is made of a Lycra support and equipped with 15 bend sensors, which measure the movements of each distal interphalangeal, proximal interphalangeal and metacarpophalangeal joint. The resistance of each bend sensor varies according to the amount of bending, so we can measure flexion and extension movements. No sensors were devoted to abduction and adduction movements, since their influence on the measurement of the postures is negligible for the aim of our work. Moreover, to provide information about the spatial position of the hand, two inertial measurement units (IMU), Analog Combo Board Razor from SparkFun, each equipped with a 3-axis accelerometer and a gyroscope, are integrated in the glove: one in correspondence of the back of the hand and the other right below the elbow. The IMU sensor on the elbow is attached with a removable armband, and it will soon become wireless in order to make the user experience more comfortable.

Fig. 3. The HITEG sensory glove

B. Data Acquisition

Figure 4 shows the hardware needed for data acquisition. Resistance signals from the bend sensors are acquired at a sampling rate of 720 Hz per sensor and converted into voltage units in 10-bit digital format by means of a voltage divider, a microcontroller and an ADC unit, resulting in 1024 logical levels (0 to 2^10 - 1) matching analog values in the interval 0 to 4.99 V.

Fig. 4. Hardware architecture for data acquisition.

Both IMUs provide three values associated with the acceleration along the three axes (in the range of ±3 g) and three values for the angular speed along each axis in 3D space (in the range of ±300°/s). Data are acquired with a PIC 18F4550 microcontroller, digitalized and provided to a computer via the ZigBee protocol. Because of possible noise, the acquired data are then filtered using a moving-average smoothing (eq. 1), so that unwanted high-frequency components (e.g. noise spikes) are discarded. Given a time series x_t, its moving average of order n is

  x̄_t = Σ_{i=0}^{n-1} w_i x_{t-i}    (1)

where w_i is the weight associated to the i-th observed value.

C. Chunking

To obtain correct A-SLR, the recognition of each sign in a continuous stream (which constitutes a sentence) plays a fundamental role. The solution is far from trivial, also because each sign differs in duration from the others and the "starting points" are not known in advance. The problem is in fact worsened by Movement Epenthesis (ME), i.e. the movement that links two consecutive signs, which is not easily predictable, being different each time. Three different ways of chunking have been investigated in this work:
a. Introducing artificial pauses;
b. Detecting static configurations;
c. Training a neural network to recognize transitions.

The first method is quite trivial. The user is asked to insert a pause before and after performing the sign, so the chunker detects the sign according to the unchanging measured values.

The second method is somewhat trickier, since no artificial pauses are requested. It relies on the fact that in LIS the configuration of the hand (see Section III) remains static, or the same gesture is replayed two or more times, during the whole sign. So a new gesture can be identified according to the static or periodically moving hand configuration. This is a simple and efficient way to figure out where to cut the continuous stream of signs, but problems come up when two consecutive signs with the same configuration are performed. In that case the user has to introduce an artificial transition movement (in which the fingers are moved disorderly) to split the signs correctly.

The third technique is similar to the one adopted in [7]. Each transition is manually labeled, marking its beginning and end. A Multilayer Perceptron Neural Network (MLPNN, [7] and [15]) is trained by associating the output of the net with a Gaussian centered in the middle of the transition. In this way, the network should learn how to detect a transition, associating with it an output value between 0 and 1; outside this range the signal corresponds to a sign (fig. 5). This method is the most comfortable and natural for the user, because it does not require them to insert unnatural actions during the "conversation". But unfortunately, the
But unfo ortunately, thee accu uracy is lower than the otheer two method ds (about 80% % of correct chunkiing while the former two split s correctlyy almo ost the totalitty of the signns) and the leearning phasee requ uires a lot of tiime due to thee labeling partt. 2006 JOURNAL OF SOFTWARE, VOL. 9, NO. 8, AUGUST 2014 velet. wav Equation (2) defines thhe Continuo ous nsform (CWT) of a time siggnal x(t). Tran , Fig. 5. Labeled transitions andd the correspondiing neural networrk’s Gausssian output. The accuraacy of the threee methods deescribed is repoorted in table I. A deeper investtigation of chu unking methoods is beyond the ppurpose of thiis work; the reader r is invitted to read papers [[8], [9], and [116] for furtherr information. √ ∗ dt Wavelett (2)) In n this equation n, ψ is the wavvelet function while a and b represent the translation and scale coefficientss pectively. The result W(a, b)) is the waveleet coefficientss resp that relate the sign nal to the wavvelet at differeent scales andd transslations [19]. To reduce thee computatioonal workload d, a discretee WT is often useed, called Discrete Wavelett verssion of the CW Tran nsform (DWT T) [20-22]. T The DWT is obtained byy samp pling the tim me-scale planee using digitaal filters withh diffeerent cutoff frrequencies at different scalles. High-passs filterrs will producce detailed innformation, d[n], while low w passs filters will give coarse aapproximation ns, a[n]. Thee scalee is determineed by upsamplling and down nsampling thee sign nal before filtering it. The depth of this t decompossition tree deepends on thee TABLE I CHUNK KERS ACCURACY Chunkking method Accuracy on 2000 gesturres a. Artificiial pauses 98.8% b1. Static configurations 92.3% b2. Static configurations 97.7% with arttificial transitionss 86.4% c. Neural N Network Accuracy off chunkers (perrcentage of corrrectly chunked sings) evaluated on 2000 streams of signss, with an averagee of 10 signs per sstream. 
Only method b11 and c were testeed on the same daata set due to the nature of the methodss. Method b wass tested with an nd without introdducing artificial transition movements. D. Feature Extraction Being inteerested in statiic postures off the hand we need only to consiider one value per each meaasured joint’s aangle (besides the sspace path) too represent the configurationn of a sign. Hence, we take into account a the meedian values oof the ugh representtative 15 joint anggles in order to have enou values. m different subjjects are calib brated by recorrding Data from m (see section V V) to a continuouss stream of movements collect the m min-max value range of each h bend sensor.. The outpuut of the IMU Us cannot bee directly useed as features sincce it varies acccording to th he duration oof the gesture, or thhe sign relatedd part of the ou utput can be shhifted along the tim me axis. In order to extract usseful featuress we investiggated Wavelet Trannsform (WT). WT, in conttrast to the cllassic Fourier Traansform, worrks on a multi-scale m bbasis, decomposingg a signal intoo different lev vels of detail uusing fixed blocks called waveleets. The waveelet functions used d by scaaling (dilation)) and to represent a signal are derived mother shifting (trannslation) a gennerating functtion called mo © 2014 ACADEMY PUBLISHER Fig. 6: Wavelet decoomposition tree. gth of the sign nal. The resullt of the DWT T of the timee leng sign nal is given by y the concatennation of all th he coefficientss a[n] and d[n], from m the last leveel of decompo osition. We W performed tests with diifferent typess of wavelets,, Fig. 7: 7 Daubechies waavelet of order 2. and discovered thaat the Daubechhies wavelet of o order 2 (fig.. mum efficienccy for our app plication. Thiss 7) gave the maxim is prrobably due to its smoothinng feature thaat made moree suitaable to detect changes of thee IMU’s output values. 
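To make this feature-extraction step concrete, the sketch below computes a 4-level DWT with the Daubechies order-2 filter pair and then the five per-sub-band statistics (energy, maximum, minimum, mean, standard deviation) used as features. It is a minimal sketch under stated assumptions, not the authors' code: it uses periodic boundary handling, so the sub-band lengths come out as 128, 64, 32 and 16 rather than the padded lengths (129, 66, 34, 18) produced by typical toolbox implementations.

```python
import math

# Daubechies order-2 (db2) low-pass filter; high-pass by quadrature mirror.
H = [(1 + math.sqrt(3)) / (4 * math.sqrt(2)), (3 + math.sqrt(3)) / (4 * math.sqrt(2)),
     (3 - math.sqrt(3)) / (4 * math.sqrt(2)), (1 - math.sqrt(3)) / (4 * math.sqrt(2))]
G = [H[3], -H[2], H[1], -H[0]]

def dwt_step(x):
    """One DWT level with periodic boundaries: filter and downsample by 2."""
    n = len(x)
    approx = [sum(H[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(G[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail

def wavelet_features(signal, levels=4):
    """Detail bands d1..d4 plus the level-4 approximation, each summarized
    by five statistics -> 25 features per signal."""
    approx, bands = list(signal), []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        bands.append(detail)
    bands.append(approx)  # approximation at the last level
    feats = []
    for band in bands:
        mean = sum(band) / len(band)
        std = math.sqrt(sum((v - mean) ** 2 for v in band) / len(band))
        feats += [sum(v * v for v in band), max(band), min(band), mean, std]
    return feats
```

Because the db2 filters are orthonormal, the five band energies sum to the energy of the input signal, which gives a handy sanity check on the decomposition.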
After downsampling the IMU signals to 256 samples, the detail wavelet coefficients at the first, second, third and fourth levels (129 + 66 + 34 + 18 coefficients) and the approximation wavelet coefficients at the fourth level (18 coefficients) were computed. In order to reduce the dimensionality of the feature vectors, the following statistics over the sets of wavelet coefficients were used to represent each of the six IMU signals, as done in [23]:

(1) Energy of each sub-band.
(2) Maximum coefficient of each sub-band.
(3) Minimum coefficient of each sub-band.
(4) Mean of the coefficients of each sub-band.
(5) Standard deviation of the coefficients of each sub-band.

From our experiments, these 25 features turned out to be sufficient to obtain a good representation of the signal without requiring too much computational power. Summing everything up, a sign is represented by a feature vector of 315 components (25 features for each of the 6 signals of the 2 IMUs, plus 15 angle values from the bend sensors).

E. Classification

The automatic classification of a sign was performed by a Support Vector Machine (SVM), which finds the hyper-plane that maximizes the separation between classes [24]. Let us assume we have two linearly separable classes. Each training example x^k ∈ R^N belongs to a class y_k ∈ {−1, +1}. A separating hyper-plane between the classes can be written as

  y_k (w^T x^k + b) > 0,  k = 1, …, m    (3)

The aim of an SVM is to maximize the minimum distance between the training examples and the separating hyper-plane, also called the margin of separation.
We can rewrite (3), rescaling the weights w and the bias b, as

  y_k (w^T x^k + b) ≥ 1,  k = 1, …, m    (4)

Therefore, the margin of separation is 1/||w||, and maximizing it is equivalent to minimizing the Euclidean norm of the weight vector w. The optimum hyper-plane is then found in terms of weights and bias (Fig. 8). All the points x^k that satisfy the constraints (4) with the equality sign are called support vectors. By means of Lagrange multipliers we are able to consider only these vectors to find the optimal w and b.

We used a Soft Margin SVM, which introduces a tolerance to classification errors. A constant C controls the trade-off between the maximization of the margin and the minimization of the error.

Since the SVM was originally designed for binary classification, it cannot deal directly with multi-class problems, which are usually solved by decomposing the problem into several two-class ones. In this context we utilized a One-vs-All strategy [25], in which a set of binary classifiers is constructed, comparing each time one class to the rest.

Fig. 8: Optimal separating hyper-plane corresponding to the SVM solution. The support vectors lie on the dashed lines in the figure.

We adopted a linear kernel, and we set the C parameter to a value of 25 through a validation test.

V. EXPERIMENT RESULTS AND DISCUSSION

The experiments were performed by two subjects, the first expert and the second inexpert in the LIS language. The adopted experimental procedure can be summarized as:
• wearing the glove;
• calibrating the system;
• recording 100 signs, 15 times per sign.

A "session" was the fulfillment of all the previous steps. Two sessions for each subject were recorded on two different days, in order to test the adaptability of the system to glove repositioning. Since the piezoresistive sensors are subject to a linear shift of their response function, due to temperature changes or to different repositioning of the glove, a simple calibration was performed each time the user started using the software.
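The One-vs-All construction described above can be sketched from scratch as follows. This is a toy sub-gradient solver for the primal soft-margin objective, shown only to make the decision rule concrete; the data, labels and hyper-parameters are illustrative assumptions, not the authors' implementation (which used a proper SVM solver with a linear kernel and C = 25):

```python
def train_linear_svm(X, y, C=25.0, lr=0.01, epochs=300):
    """Soft-margin linear SVM via sub-gradient descent on the primal
    objective 0.5*||w||^2 + C * sum_k max(0, 1 - y_k*(w.x_k + b))."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xk, yk in zip(X, y):
            margin = yk * (sum(wi * xi for wi, xi in zip(w, xk)) + b)
            if margin < 1:   # hinge active: push the point out of the margin
                w = [wi - lr * (wi - C * yk * xi) for wi, xi in zip(w, xk)]
                b += lr * C * yk
            else:            # only the regularizer contributes
                w = [wi - lr * wi for wi in w]
    return w, b

def one_vs_all_train(samples, labels):
    """One binary SVM per class: that class (+1) against all the others (-1)."""
    return {c: train_linear_svm(samples,
                                [1.0 if l == c else -1.0 for l in labels])
            for c in set(labels)}

def one_vs_all_predict(models, x):
    """Assign x to the class whose binary classifier scores highest."""
    def score(c):
        w, b = models[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)
```

With 100 sign classes, this strategy trains 100 binary classifiers and picks the one with the highest decision value at prediction time.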
The calibration consisted in continuously recording the gesture of opening and closing the hand for ten seconds, in order to find the minimum and maximum values that correctly identify the range of response of each sensor. To avoid noisy outliers, the median of several maximum and minimum values was considered instead of a single value.

We trained the SVM with 10 examples of each sign and tested it with the remaining five. The performance of the system was evaluated as the percentage accuracy

  accuracy = (correctly classified signs / total signs) × 100%    (5)

Table II shows the mean percentage accuracy for each subject, where Subject I is the expert LIS user. Tests were done on growing-size sets of gestures, consisting of 30, 50 and 100 different randomly taken signs, to understand the accuracy trend as the dataset grows.

TABLE II
MEAN PERCENTAGE ACCURACY FOR EACH SUBJECT

Subject          Set 30   Set 50   Set 100
I (expert)       96%      95.6%    94.8%
II (non expert)  97.6%    96.4%    93.2%

SVM mean accuracy obtained on growing-size sets of signs. Accuracy values are expressed as percentages for each subject. Both training and test sets are made from the same session.

An average accuracy of 96.8% is obtained on the reduced set of 30 gestures, while for the complete set (100 gestures) the accuracy decreases by only 2.8%, yielding 94% correctly classified signs.

In this first case, we trained and tested the SVM with examples recorded in the same session. This means that the user should train the system every time they wear the glove, which is obviously not convenient and excessively time demanding. Table III shows the accuracy achieved by training and testing the classifier on different sessions.

TABLE III
MEAN PERCENTAGE ACCURACY ON DIFFERENT SESSIONS

Subject          Session I over Session II   Session II over Session I   Average
I (expert)       88.1%                       89.4%                       88.75%
II (non expert)  81.2%                       80.4%                       80.8%

Accuracies obtained by training the SVM on one session and testing it on the remaining one. Accuracy values are expressed as percentages for each subject.

This is a more practical scenario, since the training phase is performed only once, at the very first time, so the user does not have to worry about repositioning the glove. As we could expect, the mean accuracy is lower than in the previous test, because repositioning the glove affects the overall performance: an average accuracy of 84.8% is obtained in this case. However, it has to be noticed that Subject II is not an expert LIS user, hence the tendency to forget how to perform a gesture in the second session causes a lower accuracy (80.8%) with respect to the expert user (88.75%); nevertheless, the accuracy remains acceptable.

So far, the system is "ad personam", in the sense that it has to be trained by the same person who is going to use it. Table IV reports the accuracies obtained when the SVM is trained on one subject and then tested on the other one.

TABLE IV
MEAN PERCENTAGE ACCURACY ON DIFFERENT SUBJECTS

Subject for training   Subject for testing   Mean accuracy
I (expert)             II (non expert)       78%
II (non expert)        I (expert)            73.4%

Accuracies obtained by training the SVM on one subject and testing it on the other one. Accuracy values are expressed as percentages.

In this case, the system could be trained by an expert user and then used by everyone else, skipping the learning part. Now the system turns out to be "weaker", because the two users did not perform the signs in the same way, especially considering that Subject II is a novel LIS user. An average accuracy of 75.7% is reached, meaning that the system is still usable but with a higher error rate, so that it makes sense to integrate an automatic correction system [26].

Although it is not possible to make a direct comparison with the other A-SLR works reported in the literature (because of the different acquisition systems and different sets of signs), the percentages reached in this study are encouraging. In addition, it has to be considered that our sensory glove is cheaper than state-of-the-art gloves, being equipped with IMUs and not with magnetic trackers. The graph in figure 9 represents a rough comparison of the production costs of six different types of gesture acquisition systems. As can be noticed, our system is rather low cost, comparable only to low-performance optical systems, which we believe do not offer the same portability.

Fig. 9: Approximate cost comparison among different gesture acquisition systems.

VI. CONCLUSION

A completely portable sensory glove was developed and used to recognize 100 gestures belonging to the Italian Sign Language. Based on bend and IMU sensors, the glove is completely portable, unobtrusive and comfortable to don. In the future, our system will become wireless, and the possibility of interfacing it to smartphones will be provided. In this way, it will be easy to carry in any situation, giving the user freedom to move and naturally interact with other people.

An A-SLR system based on wavelet features and SVM is also proposed. Experiments, performed training and testing on the same subject, demonstrate an average accuracy of 94%. This can be considered a very good result, even though it cannot be directly compared to the literature due to the lack of related works about LIS.
In further studies, new ways of chunking and classification will be investigated; the overall system will become wireless and integrated into mobile devices with a speech synthesizer, to realize a new aid for everyday communication between deaf and hearing people.

REFERENCES

[1] S. C. W. Ong, S. Ranganath, "Automatic sign language analysis: a survey and the future beyond lexical meaning", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 873-891, Jun. 2005.
[2] T. G. Zimmerman et al., "A hand gesture interface device", SIGCHI Bull. 17, 1987.
[3] T. Takahashi, F. Kishino, "Hand gesture coding based on experiments using a hand gesture interface device", SIGCHI Bull., 1991.
[4] D. J. Sturman, D. Zeltzer, "A survey of glove-based input", IEEE Computer Graphics and Applications, vol. 14, pp. 30-39, Jan. 1994.
[5] L. Dipietro et al., "A Survey of Glove-Based Systems and Their Applications", IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 38, pp. 461-482, Jul. 2008.
[6] T. P. Andriacchi, E. J. Alexander, "Studies of human locomotion: past, present and future", Journal of Biomechanics, vol. 33, pp. 1217-1224, Mar. 2000.
[7] S. S. Fels, G. E. Hinton, "Glove-Talk: A Neural Network Interface Between a Data-Glove and a Speech Synthesizer", IEEE Transactions on Neural Networks, vol. 3, 1992.
[8] G. Fang et al., "Large-Vocabulary Continuous Sign Language Recognition Based on Transition-Movement Models", IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, vol. 37, pp. 1-9, Jan. 2007.
[9] M. Maebatake et al., "Sign Language Recognition Based on Position and Movement Using Multi-Stream HMM", Second International Symposium on Universal Communication, ISUC '08, Osaka.
[10] I. Infantino et al., "A System for Sign Language Sentence Recognition Based on Common Sense Context", The International Conference on Computer as a Tool, EUROCON 2005.
[11] S. McMillan, "Upper Body Tracking Using the Polhemus Fastrak", Naval Postgraduate School, Monterey, Technical Report NPSCS-96-002, Jan. 1996.
[12] Ji-Hwan Kim et al., "3-D Hand Motion Tracking and Gesture Recognition using a data glove", IEEE International Symposium on Industrial Electronics, ISIE 2009.
[13] Z. He, "Accelerometer Based Gesture Recognition Using Fusion Features and SVM", Journal of Software, vol. 6, pp. 1042-1049, Jun. 2011.
[14] S. Mitra, T. Acharya, "Gesture Recognition: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, pp. 311-324, May 2007.
[15] D. Xu, "A neural network approach for hand gesture recognition in virtual reality driving training system of SPG", 18th International Conference on Pattern Recognition, ICPR 2006.
[16] G. Fang, W. Gao, "A SRN/HMM System for Signer-independent Continuous Sign Language Recognition", Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Washington DC, USA, 2002.
[17] N. Bertoldi et al., "On the creation and the annotation of a large-scale Italian-LIS parallel corpus", Proceedings of the 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, International Conference on Language Resources and Evaluation, 2010.
[18] O. Romeo, "Dizionario dei segni - La lingua dei segni in 1400 immagini", Zanichelli, 1991.
[19] A. Godfrey et al., "A Continuous Wavelet Transform and Classification Method for Delirium Motoric Subtyping", IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 17, no. 3, Jun. 2009.
[20] E. D. Übeyli, "ECG beats classification using multiclass support vector machines with error correcting output codes", Digital Signal Processing, vol. 17, pp. 675-684, 2007.
[21] S. Soltani, "On the use of the wavelet decomposition for time series prediction", Neurocomputing, vol. 48, pp. 267-277, 2002.
[22] M. Unser, A. Aldroubi, "A review of wavelets in biomedical applications", Proc. IEEE, vol. 84, no. 4, pp. 626-638, 1996.
[23] P. S. Addison, "Wavelet transforms and the ECG: a review", Physiological Measurement, vol. 26, pp. 155-199, Aug. 2005.
[24] V. N. Vapnik, "The Nature of Statistical Learning Theory", Springer-Verlag, New York, 1995.
[25] C. W. Hsu, C. J. Lin, "A comparison of methods for multiclass support vector machines", IEEE Transactions on Neural Networks, vol. 13, 2002.
[26] A. S. Dolgopolov, "Automatic spelling correction", Cybernetics and Systems Analysis, vol. 22, pp. 332-339, May 1986.

Pietro Cavallo received the B.S. and M.S. degrees in Computer Engineering from University "Tor Vergata", Rome, Italy, in 2008 and 2011, respectively. He has collaborated with the HITEG group (Health Involved Technical Engineering Group, www.hiteg.uniroma2.it) of Tor Vergata University since 2007. In 2010 he was an exchange student at the University of Bergen, Norway. His research interests include Machine Learning, Human Computer Interaction, Brain Computer Interface, Image Processing and Computer Vision.

Giovanni Saggio received the Dr. Eng. degree in Electronic Engineering from the University "Tor Vergata", Rome, Italy, in 1991, and the Ph.D. degree in Microelectronic and Telecommunication Engineering, in 1996, from the same University.
In 1991, he carried out his thesis work at the Nanoelectronics Research Centre, Department of Electronics and Electrical Engineering, University of Glasgow, Scotland, and at the Cavendish Laboratory, Department of Physics, University of Cambridge, England. His initial research activities covered the areas of nanodevices, surface acoustic wave devices, and noise in electronic devices. He is currently a Researcher at the University "Tor Vergata", Rome, Italy. He teaches courses on Electronics at the Faculty of Engineering and the Faculty of Medicine. His current research interests are related to the fields of biosensors, sensor characterization, human kinematics measurements and brain computer interfaces. He has published tens of papers in international journals and two books about BioMedical electronics. He is currently: member of the Italian Space Society; Principal Investigator of the W.P. DCMC Project (Italian Space Agency); Principal Investigator of a Project regarding Aeronautics; Promoter and coordinator of the HITEG group (Health Involved Technical Engineering Group, www.hiteg.uniroma2.it), University "Tor Vergata", Rome, Italy.