JOURNAL OF SOFTWARE, VOL. 9, NO. 8, AUGUST 2014
Conversion of Sign Language to Spoken
Sentences by Means of a Sensory Glove
Pietro Cavallo
Electronic Engineering Department, University of Tor Vergata, Roma, Italy
Email:[email protected]
Giovanni Saggio
Electronic Engineering Department, University of Tor Vergata, Roma, Italy
Email: [email protected]
Abstract—Normal communication for deaf people in ordinary life still remains an unrealized task, despite the fact that Sign Language Recognition (SLR) has improved greatly in recent years. Here we address this problem by proposing a portable and low-cost system, which has proven effective for translating gestures into written or spoken sentences. This system relies on a home-made sensory glove, used to measure the hand gestures, and on Wavelet Analysis (WA) and a Support Vector Machine (SVM) to classify the hand's movements. In particular, we devoted our efforts to translating the Italian Sign Language (LIS, Linguaggio Italiano dei Segni), applying WA for feature extraction and SVM for the classification of one hundred different dynamic gestures. The proposed system is light and neither intrusive nor obtrusive, so it can be easily utilized by deaf people in everyday life, and it has demonstrated valid results in terms of sign/word conversion.
Index Terms—Machine intelligence, Pattern analysis,
Human Computer Interaction, Support Vector Machines,
Data glove, Sign Language Recognition, Italian Sign
Language, LIS
I. INTRODUCTION

Similarly to spoken languages, Sign Languages (SLs)
are complete and powerful forms of communication,
and are adopted by millions of people, who suffer from
deafness, all over the world. SLs are different among
different regions and states: American Sign Language
(ASL), Japanese Sign Language (JSL), German Sign
Language (GSL), Lingua Italiana dei Segni (LIS, the
Italian Sign Language), etc. However, every single SL commonly relies on the interpretation of gestures and posture mimics, which play a fundamental role in non-verbal communication too. SL comprehension is
generally limited to a restricted part of the population; thus deaf people remain set apart from social interactions with hearing persons, and body-language/non-verbal communication is mainly limited to "feelings" rather than conscious
Manuscript received August 27, 2013; revised January 21, 2014;
accepted January 23, 2014.
© 2014 ACADEMY PUBLISHER
doi:10.4304/jsw.9.8.2002-2009
understanding. These are the main reasons why a system
for “Automatic” Sign Language Recognition (A-SLR) is
welcome and considerable effort is being devoted to realizing it [1]. A-SLR could allow deaf people to
communicate without limitations, could assign suitable
interpretations to non-verbal communication without
ambiguities, and could be the basis of a new form of
human-computer interaction, since the most natural way of interacting with a computer would be through speech and gestures, rather than the currently adopted interfaces such as keyboard and mouse. Particularly, the integration of
A-SLR with automatic text writing or speech synthesis
modules can furnish a “speaking aid” to deaf people.
Our purpose is to realize a system capable of measuring human static and dynamic postures and classifying them as "words" organized into "sentences", in order to "translate"
SLs into written or spoken languages. To this aim, a great
challenge comes from the measurement of human postures with
acquisition devices that are both comfortable and easy to
use. In particular, we have to focus our attention on hand
postures and movements since the SL is mostly, even if
not exclusively, based on them. In fact, SL is made of hand gestures and body and facial expressions, though the latter are not strictly "fundamental".
Currently, hand movements are commonly measured by
motion tracking techniques based on digital cameras.
These systems offer interesting results in terms of
accuracy, but suffer from a small active range and
disadvantages related to portability. In order to overcome
these problems, new measuring systems have been
developed, in particular the ones based on sensory gloves
(i.e. gloves equipped with sensors capable of converting hand
movements into electrical signals).
The first sensory gloves on the market [2-3] were obtrusive for movements, uncomfortable, and capable of measuring only a very low number of Degrees Of Freedom (DOF) of the human hand. Nowadays, commercial sensory gloves are quite light, comfortable, and capable of measuring up to 22 DOFs [4], covering flexion-extension and abduction-adduction movements of the fingers and the spatial
arrangement of the wrist. However, the cost remains
generally too high (tens of thousands of dollars) to be
widely applied in everyday scenarios. Thus, new sensory
gloves have been created by research groups all over the
world [5], proposing different raw fabric materials
(stretchable, washable, light, comfortable to don),
different kind of sensors to convert postures into electrical
signals (based on optics, magnetic fields, inertia principles,
etc.), different front-end electronic solutions, and so on.
Here, we designed and realized a sensory glove based
on both bend and inertial sensors sewn onto a Lycra
support. The overall system is explained in detail in
section IV.A.
The electric values coming from the sensors are
processed by an ad-hoc designed electronic circuitry that
is connected to a personal computer where digital data are
on-line processed to obtain a classification of the
measured gestures.
By means of the glove and the hardware, we measured
the static and dynamic postures of the hands of two
persons who performed the signs belonging to the LIS
(Linguaggio Italiano dei Segni, i.e. the Italian sign
language). Thanks to feature extraction and gesture
classification, we “translated” in real-time a stream of LIS
signs into Italian words and sentences, which can be
converted into speech via commercially available
synthesizer software.
Apart from sign language translation, our data glove could have a great impact on Human-Computer Interaction applications, since the user is able to translate their actions and gestures into computer commands.
This is an interesting application, especially with new devices coming up in the near future (e.g. smart glasses) which need a natural way of interaction.
It also has great potential for video games and for all
those scenarios where nonverbal communication can
improve the way people interact with machines.
We are strongly convinced that a glove system is much more user-friendly and natural to use, especially when interfaced to a smartphone via a wireless/Bluetooth connection, whereas a camera-based system suffers from a restricted visual field and is not portable in everyday life.
A-SLR can be considered close to speech recognition
algorithms, even though speech recognition only deals
with one signal while A-SLR has to handle multiple
signals, i.e. hand shape, orientation, position and
movement. In addition, commercial inertial sensors (such as the ones we used for the glove) generally produce noisy signals, which must be taken into account.
For the "recognition" of the hand's kinematics, we
adopted wavelet analysis: the dynamic components of the
gestures were described by wavelet coefficients and then
furnished to a Support Vector Machine (SVM) module for
the classification process. Three different chunking methods were considered and then implemented in order to split the signs contained in a continuous sentence.
The remainder of this paper is organized as follows.
Section II reviews the state of the art. In Section III there is
a brief introduction to the Italian Sign Language. In
Section IV, details of our system are given: the sensory
glove (subsection A); the data acquisition circuitry (subsection B); three different chunking
methods (subsection C); feature extraction and gesture
classification (subsections D and E). In section V the
experimental results are given. The conclusions are drawn
in the last section.
II. RELATED WORKS
The most commonly adopted method to measure human
movements relies on “motion tracking” procedures, which
usually involve optical techniques (e.g. Optotrak Certus @ www.ndigital.com, OptiTrack @ www.naturalpoint.com/optitrack). E. J. Muybridge was a
pioneer of the optical analysis of movements, investigating human and animal movements through photographic
sequences [6]. Even though the method at the beginning
was inevitably inaccurate, nowadays the optical system
has reached complete maturity, but with the drawbacks of
high costs, the need to arrange a scenario with cameras,
and the need for a high computational workload. As time
passed, other methods were invented such as magnetic
based tracking systems (see the 3D Space Fastrak as an example, www.polhemus.com), hybrid systems combining inertial sensors with an optical marking system (MoCap), or mechanical systems consisting of an exoskeleton made of lightweight aluminum rods that follow the motion of the user's bones (see the Gypsy 7 Torso Motion Capture System, www.metamotion.com).
Finally, we must mention wearable sensor systems, consisting of sensors located on the human body, such as the sensory gloves.
These gloves are quite precise in measuring finger joint angles, with a resolution of the order of one degree, but
with the drawback of the imperfect space tracking of the
hand, i.e. the location of the hand in the 3D surroundings.
In previous works [7-10] the most adopted way to carry
out the tracking was to use magnetic field devices such as
the well known Polhemus Fastrak [11]. Even though it
provides accurate information about trajectory and
orientation of the object to be tracked, the cost is fairly
high. In addition, it has a limited action range and it is not
portable at all.
In [12] and [13] space tracking was carried out by using
accelerometers. These kinds of inertial devices require
little energy to work, are comparatively cheap and small
enough to be integrated in a glove. Unfortunately, compared to the Fastrak previously discussed, the signals obtained from the accelerometers are more difficult to process and have lower spatial resolution.
To our knowledge, only a few works are related to inertial devices in A-SLR applications; in [12], they are
used to track hand movements to recognize a set of 3
gestures; in [13] the authors carried out the classification
of 17 different trajectories (not related to any sign
languages), using fusion features based on Wavelet
analysis to represent the trajectories. We here extend the
work of [13], using Wavelet coefficients as well to
represent the dynamic components of the gestures, but
analyzing a larger set of 100 gestures belonging to a real
sign language (i.e. LIS), and using an innovative low-cost
acquisition device.
Regarding the classification part, different classification techniques have been used in literature to recognize static or dynamic gestures [14]. The first A-SLR attempts were focused on static gestures (e.g. finger alphabets), and the recognition algorithms were not very similar to the ones used for isolated and/or continuous signs belonging to dynamic gestures. Since we are going to study the latter, we decided to focus on previous related works on dynamic gesture recognition and to leave out the works on static gestures.
[7] is one of the first attempts made in A-SLR; it used 5 neural networks to recognize 200 different signs with a data glove equipped with Fastrak. In [15] the authors adopted an algorithm based on a feed-forward Neural Network trained with a Backpropagation algorithm. In [16] a simple recurrent network was used to separate the signs in a continuous stream, and a Hidden Markov Model (HMM) was trained to classify the gestures. In [8] temporal clustering and a variant of HMM, known as the transition movement model, were combined together to classify a large set of Chinese signs, but still using a Fastrak device to trace the space path.
Very few works are focused on Italian sign language (LIS) recognition. In [10] a Self-Organizing Map Neural Network was adopted and combined with lip movement recognition via optical techniques to chunk the words in a continuous stream. On the other hand, in [17] the authors realized a virtual avatar capable of moving according to the signs of the Italian LIS.
III. ITALIAN SIGN LANGUAGE

Even though it is not legally recognized, LIS is a complex language taken as a standard way of communication among the Italian deaf community. As in the example of figure 1, each sign is identified by four parameters:
a. The location where the sign is made;
b. The 'configuration', which is the shape assumed by the hand while performing the sign;
c. The orientation of the hand's palm;
d. The movement of the hand in the space.

Fig. 1. Sign Language parameters

For the purpose of this work, we limited our study to a LIS subset of 100 signs taken randomly from the Italian Sign Language dictionary [18]. The number of signs chosen here is comparable to the average number found in literature.
IV. SYSTEM DESCRIPTION

Figure 2 depicts the general diagram of the system architecture.

Fig. 2. General system architecture

Each block of the diagram, which represents a logical step of the overall method, has been realized by our research group, named HITEG (Health Involved Technical Engineering Group). Wearing the data glove, the user is asked to perform a single gesture or a sequence of continuous signs. These are measured and the signals are sent to a PC in digital format. Data are then preprocessed to remove potential noise by means of a smoothing algorithm. If a sentence constituted by several gestures is recorded, the Chunker will split it into single signs. We associate to each gesture a feature vector, which is then passed to the classifier, which utilizes it for learning or classification purposes.

A detailed explanation of all the logical steps can be found in the following paragraphs.
A. The HITEG Data Glove

In order to measure hand movements we used our homemade sensory glove, shown in figure 3. It is made out of a Lycra support and equipped with 15 bend sensors to measure the movements of each distal interphalangeal joint, proximal interphalangeal joint and metacarpophalangeal joint. The resistance value of each bend sensor varies according to the amount of bending, so we can measure flexion and extension movements. No sensors were devoted to abduction and adduction movements, since their influence on the measurement of the postures is negligible for the aim of our work. Moreover, to provide information about the space position of the hand, two inertial measurement units (IMU), Analog Combo Board Razor from SparkFun,
equipped with a 3-axis accelerometer and a gyroscope, are integrated in the glove: one in correspondence of the back of the hand and the other right below the elbow. The IMU sensor on the elbow is attached with a removable armband, and it will soon become wireless in order to make the user experience more comfortable.

Fig. 3. The HITEG sensory glove

B. Data Acquisition

In figure 4, the hardware needed for data acquisition is shown. Resistance signals from the bend sensors are acquired at a sampling rate of 720 Hz per sensor and converted into voltage units in a 10-bit digital format, by means of a voltage divider, a microcontroller and an ADC unit, resulting in 1023 logical levels (0÷2^10−1) matching an interval of 0÷4.99 V analogical values.

Fig. 4. Hardware architecture for data acquisition.

Both IMUs provide three values (in the range of ±3 g) associated to the acceleration on the three axes, and three values for the angular speed (in the range of ±300°/s) along each axis in the 3D space. Data are acquired with a PIC 18F4550 microcontroller, digitalized and provided to a computer via the ZigBee protocol. Because of possible noise, the acquired data are then filtered using a moving average smoothing (eq. 1), so that unwanted high frequency components (e.g. noise spikes) are discarded. Given a time series x_t, its moving average of order m is

  s_t = Σ_{i=1}^{m} w_i · x_{t−i+1}   (1)

where w_i is the weight associated to the i-th observed value.

C. Chunking

To obtain a correct A-SLR, the recognition of each sign in a continuous stream (that constitutes a sentence) plays a fundamental role. The solution is far from trivial, also because each sign differs in duration from each other and it is not given to know each "starting point". This problem is, in fact, increased by Movement Epenthesis (ME), i.e. the movement that links two consecutive signs, which is not easily predictable, being different each time.

Three different ways of chunking have been investigated in this work:
a. Introducing artificial pauses;
b. Detecting static configurations;
c. Training a neural network to recognize transitions.
The first method is quite trivial. The user is asked to insert a pause before and after performing the sign, so the chunker detects the sign according to the unchanging measured values.
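As a rough illustration of eq. (1) and of this pause-based splitting, the idea could be sketched as follows. This is a minimal sketch, not the authors' implementation: the window length, threshold and smoothing weights are hypothetical choices.

```python
# Minimal sketch of method (a): split a smoothed sensor stream into signs
# separated by pauses (windows where the measured values stay unchanged).
# Window length and threshold are illustrative, not the paper's values.

def moving_average(x, weights):
    """Weighted moving average (eq. 1): each output sample is the
    weighted sum of the last len(weights) input samples."""
    m = len(weights)
    out = []
    for t in range(m - 1, len(x)):
        out.append(sum(w * x[t - i] for i, w in enumerate(weights)))
    return out

def chunk_by_pauses(stream, window=5, threshold=0.01):
    """Return (start, end) index pairs of segments whose values vary
    more than `threshold`; quiet stretches are treated as pauses."""
    segments, start = [], None
    for t in range(len(stream) - window):
        spread = max(stream[t:t + window]) - min(stream[t:t + window])
        moving = spread > threshold
        if moving and start is None:
            start = t
        elif not moving and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(stream)))
    return segments

# Example: two "signs" separated by flat pauses.
signal = ([0.0] * 20 + [0.5, 1.0, 0.4, 0.9] * 5
          + [0.0] * 20 + [0.8, 0.2] * 10 + [0.0] * 20)
smoothed = moving_average(signal, [1/3, 1/3, 1/3])
print(chunk_by_pauses(smoothed))
```

In a real stream the per-sensor thresholds would have to be tuned against the calibrated value ranges rather than fixed constants.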
The second method is somewhat trickier, since no artificial pauses are requested. It relies on the fact that in LIS the configuration of the hand (see section III) remains static, or replays the same gesture twice or more times, during the whole sign. So the new gesture can be identified according to the static or periodically moving hand configuration. This is a simple and efficient way to figure out where to cut the continuous stream of signs, but problems come up when two consecutive signs with the same configuration are performed. In that case the user has to introduce an artificial transition movement (in which the fingers are moved disorderly) to split the signs correctly.
The third technique is similar to the one adopted in [7]. Each transition is manually labeled, marking the beginning and the end. A Multilayer Perceptron Neural Network (MLPNN, [7] and [15]) is trained associating the output of the net to a Gaussian centered in the middle of the transition. In this way, the network should learn how to detect a transition, associating to it an output value between 0 and 1; outside this range the signal corresponds to a sign (fig. 5).
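The Gaussian labeling used to train such a transition detector might be generated along these lines. This is only a sketch: the width parameter is an assumption, not a value from the paper.

```python
import math

def gaussian_targets(n_samples, transitions, width=5.0):
    """Build per-sample training targets for a transition detector:
    a Gaussian bump centred in the middle of each labeled transition
    (start, end), and 0 elsewhere (i.e. inside signs).
    `width` (std. dev. in samples) is an illustrative choice."""
    targets = [0.0] * n_samples
    for start, end in transitions:
        centre = (start + end) / 2.0
        for t in range(start, end):
            targets[t] = max(targets[t],
                             math.exp(-((t - centre) ** 2) / (2 * width ** 2)))
    return targets

# One transition labeled between samples 40 and 60: the target peaks at 50.
y = gaussian_targets(100, [(40, 60)])
```

The network is then regressed against these targets, so that at inference time a sustained output near 1 marks a likely transition between two signs.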
This method is the most comfortable and natural for the user, because it does not require them to insert unnatural actions during the "conversation". Unfortunately, its accuracy is lower than that of the other two methods (about 80% of correct chunking, while the former two split correctly almost the totality of the signs) and the learning phase requires a lot of time due to the labeling part.
Fig. 5. Labeled transitions and the corresponding neural network's Gaussian output.

The accuracy of the three methods described is reported in table I. A deeper investigation of chunking methods is beyond the purpose of this work; the reader is invited to read papers [8], [9], and [16] for further information.

TABLE I
CHUNKERS ACCURACY

Chunking method                                        | Accuracy on 2000 gestures
a. Artificial pauses                                   | 98.8%
b1. Static configurations                              | 92.3%
b2. Static configurations with artificial transitions  | 97.7%
c. Neural Network                                      | 86.4%

Accuracy of chunkers (percentage of correctly chunked signs) evaluated on 2000 streams of signs, with an average of 10 signs per stream. Only methods b1 and c were tested on the same data set due to the nature of the methods. Method b was tested with and without introducing artificial transition movements.

D. Feature Extraction

Being interested in static postures of the hand, we only need to consider one value per measured joint angle (besides the space path) to represent the configuration of a sign. Hence, we take into account the median values of the 15 joint angles in order to have representative enough values.

Data from different subjects are calibrated by recording a continuous stream of movements (see section V) to collect the min-max value range of each bend sensor.

The output of the IMUs cannot be directly used as features, since it varies according to the duration of the gesture, or the sign-related part of the output can be shifted along the time axis.

In order to extract useful features we investigated the Wavelet Transform (WT). The WT, in contrast to the classic Fourier Transform, works on a multi-scale basis, decomposing a signal into different levels of detail using fixed blocks called wavelets. The wavelet functions used to represent a signal are derived by scaling (dilation) and shifting (translation) a generating function called the mother wavelet. Equation (2) defines the Continuous Wavelet Transform (CWT) of a time signal x(t):

  W(a, b) = (1/√a) ∫ x(t) ψ*((t − b)/a) dt   (2)

In this equation, ψ is the wavelet function, while a and b represent the scale and translation parameters respectively. The result W(a, b) is the wavelet coefficient that relates the signal to the wavelet at different scales and translations [19].

To reduce the computational workload, a discrete version of the CWT is often used, called the Discrete Wavelet Transform (DWT) [20-22]. The DWT is obtained by sampling the time-scale plane using digital filters with different cutoff frequencies at different scales. High-pass filters produce detailed information, d[n], while low-pass filters give coarse approximations, a[n]. The scale is determined by upsampling and downsampling the signal before filtering it.

Fig. 6: Wavelet decomposition tree.

The depth of this decomposition tree depends on the length of the signal. The result of the DWT of the time signal is given by the concatenation of all the coefficients a[n] and d[n], from the last level of decomposition.

We performed tests with different types of wavelets, and discovered that the Daubechies wavelet of order 2 (fig. 7) gave the maximum efficiency for our application. This is probably due to its smoothing feature, which made it more suitable to detect changes in the IMU's output values.

Fig. 7: Daubechies wavelet of order 2.

After sampling the IMU's signals down to 256 samples, the detail wavelet coefficients at the first, second, third and
fourth levels (129 + 66 + 34 + 18 coefficients) and the
approximation wavelet coefficients at the fourth level (18
coefficients) were computed. In order to reduce the
dimensionality of the feature vectors, the following
statistics over the set of the wavelet coefficients were used
to represent each of the six IMU signals, as done in [23]:
(1) Energy of each sub-band.
(2) Maximum coefficient for each sub-band.
(3) Minimum coefficient for each sub-band.
(4) Mean of the coefficients for each sub-band.
(5) Standard deviation of the coefficients for each
sub-band.
From our experiments, these 25 features turned out to be
sufficient to obtain a good representation of the signal
without requiring too much computational power.
Summing everything up, a sign was represented by a
feature vector of 315 components (25 features by 6
components for each of the 2 IMUs plus 15 angle values
for bending sensors).
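The 25-feature summary per IMU signal can be sketched as follows. Note the assumptions: for brevity this sketch uses the Haar wavelet instead of the paper's Daubechies-2, so the sub-band lengths are exact halves (128, 64, 32, 16, 16) rather than the slightly longer db2 bands quoted above; a library such as PyWavelets would be used for db2 in practice.

```python
import math

def haar_dwt_level(x):
    """One level of the Haar DWT: approximation a[n] and detail d[n]."""
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def wavelet_features(x, levels=4):
    """25 features per signal: energy, max, min, mean, std of each of
    the 4 detail sub-bands plus the level-4 approximation
    (5 statistics x 5 sub-bands)."""
    bands = []
    a = list(x)
    for _ in range(levels):
        a, d = haar_dwt_level(a)
        bands.append(d)
    bands.append(a)  # approximation at the deepest level
    feats = []
    for band in bands:
        n = len(band)
        mean = sum(band) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in band) / n)
        feats += [sum(v * v for v in band), max(band), min(band), mean, std]
    return feats

# A 256-sample IMU-like signal yields 25 features.
signal = [math.sin(2 * math.pi * t / 32) for t in range(256)]
features = wavelet_features(signal)
print(len(features))  # 25
```

Concatenating these 25 features for the 6 channels of each of the 2 IMUs, plus the 15 median joint angles, gives the 315-component vector described above.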
E. Classification
The automatic classification of a sign was made by a
Support Vector Machine (SVM), in order to find the
hyper-plane that maximizes the separation between
classes [24].
Let us assume we have two linearly separable classes, y^k ∈ {−1, +1}. Each training example x^k ∈ ℝ^N belongs to a class. A separating hyper-plane between the classes can be written as:

  y^k (w^T x^k + b) > 0,   k = 1, …, m   (3)
The aim of an SVM is to maximize the minimum
distance between the training examples and the separating
hyper-plane, also called margin of separation. We can
rewrite (3) rescaling the weights w and the bias b as:

  y^k (w^T x^k + b) ≥ 1,   k = 1, …, m   (4)
Therefore, the margin of separation is 1/||w||, and maximizing it is equivalent to minimizing the Euclidean norm of the weight vector w. The optimum hyper-plane will then be found in terms of weights and bias (Fig. 8). All the points x^k that satisfy the constraints (4) with the equality sign are called support vectors.
By means of Lagrange Multipliers we are able to
consider only these vectors to find the optimal w and b.
We used a Soft Margin SVM that introduces a tolerance to
classification errors. A constant C controls the trade-off
between the maximization of the margin and the
minimization of the error.
As the SVM was originally designed for binary classification, it cannot deal directly with multi-class classification problems; these are usually solved by decomposing the problem into several two-class ones. In this context we utilized a One vs All strategy [25], where a set of binary classifiers is constructed, comparing each time one class to the rest.

Fig. 8: Optimal separating hyper-plane corresponding to the SVM solution. The support vectors lie on the dashed lines.
We adopted a linear kernel, and we set the C parameter
to a value of 25 through a validation test.
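For illustration, a one-vs-all linear classifier in the spirit of the soft-margin SVM described above can be sketched in a few lines. This is only a sketch under stated assumptions: it uses plain per-sample subgradient descent on the hinge loss, with invented learning-rate and epoch values, whereas a real system would use an optimized SVM solver.

```python
import random

def train_linear_svm(xs, ys, c=25.0, epochs=300, lr=0.001):
    """Soft-margin linear SVM trained by subgradient descent on
    0.5*||w||^2 + C * sum of hinge losses; ys must be in {-1, +1}."""
    n = len(xs[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: hinge subgradient
                w = [wi - lr * (wi - c * y * xi) for wi, xi in zip(w, x)]
                b += lr * c * y
            else:           # correctly classified: only the regularizer
                w = [wi - lr * wi for wi in w]
    return w, b

def one_vs_all_fit(xs, labels):
    """Train one binary SVM per class (One vs All)."""
    models = {}
    for cls in set(labels):
        ys = [1 if l == cls else -1 for l in labels]
        models[cls] = train_linear_svm(xs, ys)
    return models

def one_vs_all_predict(models, x):
    """Pick the class whose hyper-plane gives the largest score."""
    def score(m):
        w, b = m
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=lambda cls: score(models[cls]))

# Toy example: three well-separated 2-D "gesture feature" clusters.
random.seed(0)
xs, labels = [], []
for cls, (cx, cy) in enumerate([(0, 0), (5, 0), (0, 5)]):
    for _ in range(20):
        xs.append([cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)])
        labels.append(cls)
models = one_vs_all_fit(xs, labels)
```

With 100 sign classes, this strategy trains 100 such binary separators over the 315-component feature vectors, and the predicted sign is the class with the highest score.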
V. EXPERIMENT RESULTS AND DISCUSSION
The experiments were performed with two subjects: the first expert and the second inexperienced in the LIS language. The adopted experimental procedure can be summarized as:
• To wear the glove
• To calibrate the system
• To record 100 signs, 15 times per sign.
A "session" was the fulfillment of all the previous steps. Two sessions for each subject were recorded on two different days, in order to test the adaptability of the system to glove repositioning.
Since the piezoresistive sensors are subject to a linear
shift of the response function due to temperature change or
different repositioning of the glove, a simple calibration
was performed each time the user started using the
software. The calibration consisted in continuously
recording the gesture of opening and closing the hand for
ten seconds, in order to find the minimum and maximum
values to correctly identify the range of response for each
sensor. To avoid noisy outliers the median of several
maximum and minimum values was considered instead of
one single value.
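The median-based min-max calibration could look roughly like this. It is a sketch only: splitting the recording into a fixed number of open/close cycles, and the sample values, are illustrative assumptions.

```python
def calibrate(stream, cycles=5):
    """Estimate a bend sensor's response range from an open/close
    recording: split the stream into `cycles` chunks, take each chunk's
    max and min, and use the medians so that a single noisy spike
    cannot stretch the range. Fixed-size chunking is an assumption."""
    n = len(stream) // cycles
    chunks = [stream[i * n:(i + 1) * n] for i in range(cycles)]
    highs = sorted(max(c) for c in chunks)
    lows = sorted(min(c) for c in chunks)
    return lows[cycles // 2], highs[cycles // 2]   # medians

def normalise(value, lo, hi):
    """Map a raw reading into [0, 1] using the calibrated range."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

# Repeated open/close readings with a single noisy outlier.
stream = [100, 800, 120, 790, 110, 805] * 5
stream[3] = 2000          # one spike; the median ignores it
lo, hi = calibrate(stream)
print(lo, hi)  # 100 805
```

The same per-sensor (lo, hi) pair then compensates the linear shift due to temperature or glove repositioning before feature extraction.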
We trained the SVM with 10 examples of a sign and
tested it with the remaining five. The evaluation of the
performances of the system was expressed as a percentage accuracy, according to the following formula:

  Accuracy = (correctly classified signs / total number of signs) × 100%   (5)
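Equation (5) is the usual classification accuracy; for completeness, a trivial sketch (the sign labels are invented):

```python
def accuracy(predicted, actual):
    """Percentage accuracy (eq. 5): correctly classified signs
    over the total number of tested signs."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# 2 of 3 hypothetical sign labels match.
print(accuracy(["ciao", "casa", "pane"], ["ciao", "casa", "vino"]))
```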
Table II shows the mean percentage accuracy for each subject, where Subject 1 is the expert LIS user. Tests were done on a growing-size set of gestures consisting of 30, 50 and 100 different randomly taken signs, to understand the accuracy trend according to the growth of the dataset.

TABLE II
MEAN PERCENTAGE ACCURACY FOR EACH SUBJECT

Subject          | Set 30 | Set 50 | Set 100
I (expert)       | 96%    | 95.6%  | 94.8%
II (non expert)  | 97.6%  | 96.4%  | 93.2%

SVM mean accuracy obtained on a growing-size set of signs. Accuracy values are expressed in percentage for each subject. Both training and test sets are made from the same session.

An average accuracy of 96.8% is obtained on a reduced set of 30 gestures, while for the complete set (100 gestures) the accuracy decreases by only 2.8%, obtaining 94% correctly classified signs.

In this first case, we trained and tested the SVM with examples recorded in the same session. This means that the user should train the system every time they wear the glove, which is obviously not convenient and excessively time demanding. In Table III, the accuracy achieved training and testing the classifier on different sessions is shown.
TABLE III
MEAN PERCENTAGE ACCURACY ON DIFFERENT SESSIONS

Subject          | Session I over Session II | Session II over Session I | Average
I (expert)       | 88.1%                     | 89.4%                     | 88.75%
II (non expert)  | 81.2%                     | 80.4%                     | 80.8%

Accuracies obtained training the SVM on a session and testing it on the remaining one. Accuracy values are expressed in percentage for each subject.

This is a more practical way, since the training phase is performed only once at the very first time, so the user does not have to worry about repositioning the glove. As we could expect, the mean accuracy is lower than in the previous test, because repositioning the glove affects the overall performance: 84.8% of average accuracy is obtained in this case. However, it has to be noticed that Subject 2 is not an expert LIS user, hence the tendency to forget how to perform a gesture in the second session causes a lower accuracy (80.8%) with respect to the expert user (88.75%), but the accuracy remains acceptable.

So far, the system is "ad-personam", in the sense that it has to be trained by the same person who is going to use it. Table IV reports accuracies when the SVM is trained on a subject and then tested on the other one. In this case, the system could be trained by an expert user and then used by everyone else, skipping the learning part. Now the system results "weaker", because the two users did not perform signs in the same way, especially if we consider that subject II is a novel LIS user. An average accuracy of 75.7% is reached, meaning that the system is still usable but with a higher error rate, so that it makes sense to integrate an automatic correction system [26].

TABLE IV
MEAN PERCENTAGE ACCURACY ON DIFFERENT SUBJECTS

Subject for training | Subject for testing | Mean accuracy
I (expert)           | II (non expert)     | 78%
II (non expert)      | I (expert)          | 73.4%

Accuracies obtained training the SVM on a subject and testing it on the other one. Accuracy values are expressed in percentage.

Although it is not possible to make a comparison with the other A-SLR works reported in literature (because of different acquisition systems and different sets of signs), the percentages reached in this study are encouraging. In addition, it has to be considered that our sensory glove is cheaper than state-of-the-art gloves, being equipped with IMUs and not magnetic trackers. The graph in figure 9 represents a rough comparison of production costs for six different types of gesture acquisition systems. As it can be noticed, our system is rather low cost, comparable only to low-performance optical systems, which we believe do not have the same portability.

Fig. 9: Approximate costs comparison among different gesture acquisition systems.
VI. CONCLUSION

A completely portable sensory glove was developed and used to recognize 100 gestures belonging to the Italian Sign Language. Based on bend and IMU sensors, the glove is unobtrusive and comfortable to don. In the future, our system will become wireless, and the possibility of being interfaced to smartphones will be provided. In this way, it will be easy to carry in any situation, giving the user freedom to move and naturally interact with other people.
An A-SLR system based on wavelet features and SVM is also proposed. Experiments, performed on the same subject, demonstrate results with an average accuracy of 94%. This can be considered a very good result, even though it cannot be directly compared to the literature due to the lack of related works about LIS.
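The wavelet-feature step can be illustrated with a single-level Haar decomposition applied to each sensor channel. This is only a sketch under assumptions: the paper's actual mother wavelet and decomposition depth are not restated here, and the input signal below is synthetic rather than a real glove reading.

```python
# Sketch of wavelet-based feature extraction for one sensor channel:
# a single-level Haar DWT splits the signal into approximation
# coefficients (slow gesture component) and detail coefficients
# (fast changes); their concatenation serves as an SVM feature vector.
import numpy as np

def haar_dwt(signal):
    """One level of the orthonormal Haar wavelet transform."""
    s = np.asarray(signal, dtype=float)
    if len(s) % 2:                              # pad to even length
        s = np.append(s, s[-1])
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return approx, detail

# Hypothetical 64-sample reading from one bend sensor during a gesture.
t = np.linspace(0.0, 1.0, 64)
channel = np.sin(2.0 * np.pi * 2.0 * t)
a, d = haar_dwt(channel)
features = np.concatenate([a, d])               # vector fed to the classifier
```

Because the Haar transform is orthonormal, the signal energy is preserved across the two coefficient sets, so the feature vector loses no information at this level.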
In further studies, new ways of chunking and classification will be investigated; the overall system will become wireless and integrated into mobile devices with a speech synthesizer, to realize a new aid for everyday communication between deaf and hearing people.
Pietro Cavallo received the B.S. and M.S. degrees in Computer Engineering from University "Tor Vergata", Rome, Italy, in 2008 and 2011, respectively. He has collaborated with the HITEG group (Health Involved Technical Engineering Group, www.hiteg.uniroma2.it) of Tor Vergata University since 2007. In 2010 he was an exchange student at the University of Bergen, Norway. His research interests include Machine Learning, Human Computer Interaction, Brain Computer Interface, Image Processing and Computer Vision.

Giovanni Saggio received the Dr. Eng. degree in Electronic Engineering from the University "Tor Vergata", Rome, Italy, in 1991, and the Ph.D. degree in Microelectronic and Telecommunication Engineering, in 1996, from the same University. In 1991, he did his thesis in the Nanoelectronics Research Centre, Department of Electronics and Electrical Engineering, University of Glasgow, Scotland, and in the Cavendish Laboratory, Department of Physics, University of Cambridge, England. His initial research activities covered the areas of nanodevices, surface acoustic wave devices, and noise in electronic devices. He is currently a Researcher at the University "Tor Vergata", Rome, Italy. He teaches courses about Electronics at the Faculty of Engineering and the Faculty of Medicine. His current research interests are related to the fields of biosensors, sensor characterization, human kinematics measurements and brain computer interface. He has published tens of papers in international journals and two books about electronics. He is currently: member of the Italian Space BioMedical Society; Principal Investigator of the W.P. DCMC Project (from the Italian Space Agency); Principal Investigator of a Project regarding Aeronautics; Promoter and coordinator of the HITEG group (Health Involved Technical Engineering Group, www.hiteg.uniroma2.it), University "Tor Vergata", Rome, Italy.