Important Regions in the Articulator Trajectory

G. Ananthakrishnan and Olov Engwall
ISSP08, Strasbourg 2008
The phenomenon of Co-articulation
• Postulates for an Articulation Model
– Articulation is a series of target positions (assumed by certain articulators at certain times).
– The positions of some articulators are more critical than others (depending on the phoneme).
– The non-critical articulators are in the process of moving towards their next critical position in the utterance (which may be several phonemes later).
– The timing of when a particular articulator reaches its target position is important.
The phenomenon of Co-articulation
Co-articulation while pronouncing /tu/
Why find what is Important/Critical?
– Intuitive understanding of articulation of speech
– Estimating data for co-articulation models
– Improving acoustic-to-articulatory inversion
– Evaluation of acoustic-to-articulatory inversion
– Studying the articulatory strategies of different speakers and different languages
What is Important or Critical?
• Different approaches to defining criticality:
– The place of articulation used to define the various phonemes in the IPA
– Statistical approach: the Kullback-Leibler distance between the articulator distributions for different phonemes (Jackson et al.)
– Phonetic invariance in the articulator configuration (Recasens et al.)
– Articulatory resistance of the phonemes (Blandon et al.)
– Critical points based on physical parameters of the trajectory (this study)
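For comparison with the statistical approach, a minimal Python sketch of the Kullback-Leibler measure, assuming each articulator coordinate is modelled as a univariate Gaussian, once for a single phoneme and once for all data pooled. This only illustrates the divergence itself; the exact distributions and reference used by Jackson et al. may differ, and `criticality_kl` is a hypothetical helper.

```python
import math
import statistics


def gaussian_kl(mu_p: float, var_p: float, mu_q: float, var_q: float) -> float:
    """KL(P || Q) in nats for univariate Gaussians P = N(mu_p, var_p) and Q = N(mu_q, var_q)."""
    if var_p <= 0 or var_q <= 0:
        raise ValueError("variances must be positive")
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def criticality_kl(phone_samples: list[float], all_samples: list[float]) -> float:
    """Hypothetical helper: divergence of one phoneme's coordinate distribution from the pooled one.

    A large value means that, for this phoneme, the articulator is held in a
    narrower and/or shifted region compared with its overall behaviour.
    """
    p_mu, p_var = statistics.fmean(phone_samples), statistics.pvariance(phone_samples)
    q_mu, q_var = statistics.fmean(all_samples), statistics.pvariance(all_samples)
    return gaussian_kl(p_mu, p_var, q_mu, q_var)
```

Note that the KL divergence is not symmetric, so the choice of which distribution plays the role of P matters.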
Data requirements for this study
– Continuous articulatory trajectories for a sentence
– Aligned acoustic and articulatory information
– Phonetic transcription, required to study the articulation of different phonemes
– Phonetically rich data with a sufficient number of contexts
– Information about important articulators such as the lips, jaw and tongue
Data
Figure: positions of the EMA coils: Upper Lip (UL), Lower Lip (LL), Lower Jaw (LJ), Tongue Tip (TT), Tongue Body (TB), Tongue Dorsum (TD), Velum (V)
• MOCHA-TIMIT database, with one female speaker (fsew0) uttering 460 British English sentences
• Trajectories of 7 EMA coils along the midsagittal plane
• The original sampling frequency of the articulatory channels is 500 Hz
Data Processing
• Articulatory data low-pass filtered and down-sampled to 100 Hz
• Mean subtracted over several sentences to avoid drift in the articulatory data
• 16 Mel Frequency Cepstral Coefficients (MFCCs) for the acoustic representation
• 30 ms windows with a 10 ms shift, 40 frequency channels
• The database split into 5 equal parts, 4 of them used for training and the remaining one for testing
• The test and training sets rotated using the jack-knife principle
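A minimal NumPy sketch of the articulatory preprocessing and the 5-fold jack-knife rotation listed above. The filter design (a 101-tap Hamming-windowed sinc with a 40 Hz cutoff), the grouping of 20 sentences for mean subtraction, and all function names are assumptions for illustration; the slides do not specify them, and the MFCC extraction is not shown.

```python
import numpy as np


def lowpass_fir(cutoff_hz: float, fs_hz: float, num_taps: int = 101) -> np.ndarray:
    """Hamming-windowed sinc low-pass FIR filter, normalised to unit gain at 0 Hz."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs_hz * n) * np.hamming(num_taps)
    return h / h.sum()


def downsample_ema(x: np.ndarray, fs_in: int = 500, fs_out: int = 100,
                   cutoff_hz: float = 40.0, num_taps: int = 101) -> np.ndarray:
    """Low-pass filter each channel of x (frames, channels), then keep every (fs_in // fs_out)-th frame."""
    if fs_in % fs_out != 0:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    if x.shape[0] < num_taps:
        raise ValueError("signal must be at least num_taps frames long")
    h = lowpass_fir(cutoff_hz, fs_in, num_taps)
    # mode="same" with an odd, symmetric filter keeps the output aligned with the input (no delay)
    filtered = np.stack([np.convolve(x[:, c], h, mode="same") for c in range(x.shape[1])], axis=1)
    return filtered[:: fs_in // fs_out]


def subtract_group_mean(sentences: list[np.ndarray], group_size: int = 20) -> list[np.ndarray]:
    """Remove the per-channel mean computed over each block of `group_size` consecutive sentences."""
    out = []
    for start in range(0, len(sentences), group_size):
        group = sentences[start:start + group_size]
        mean = np.concatenate(group, axis=0).mean(axis=0)
        out.extend(s - mean for s in group)
    return out


def jackknife_folds(n_items: int, n_parts: int = 5):
    """Yield (train_idx, test_idx) pairs: each of the n_parts equal parts is the test set once."""
    parts = np.array_split(np.arange(n_items), n_parts)
    for k in range(n_parts):
        train = np.concatenate([parts[j] for j in range(n_parts) if j != k])
        yield train, parts[k]
```

With the 460 MOCHA-TIMIT sentences, `jackknife_folds(460)` yields five folds of 368 training and 92 test sentences each.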
Proposition
• The important regions in the trajectory are those regions where there is
– a minimum in the velocity of motion, OR
– a drastic change in the direction (angle) of motion
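The two criteria can be sketched directly on a single coil's (x, y) trajectory. This is a minimal NumPy illustration, not the authors' implementation: velocity minima are taken as strict local minima of the frame-to-frame speed, and a "drastic" direction change is assumed to mean a turn larger than a hypothetical threshold `turn_threshold_rad` (here π/3).

```python
import numpy as np


def speed_and_heading(xy: np.ndarray, fs_hz: float = 100.0):
    """Speed (units/s) and heading angle (rad) of each step of an (N, 2) trajectory."""
    d = np.diff(xy, axis=0)                 # (N-1, 2): displacement from frame i to frame i+1
    speed = np.hypot(d[:, 0], d[:, 1]) * fs_hz
    heading = np.arctan2(d[:, 1], d[:, 0])
    return speed, heading


def candidate_important_frames(xy: np.ndarray, fs_hz: float = 100.0,
                               turn_threshold_rad: float = np.pi / 3) -> np.ndarray:
    """Indices satisfying either criterion; index i refers to the step from frame i to frame i+1."""
    speed, heading = speed_and_heading(xy, fs_hz)
    # Criterion 1: strict local minimum of the speed
    is_min = np.zeros_like(speed, dtype=bool)
    is_min[1:-1] = (speed[1:-1] < speed[:-2]) & (speed[1:-1] < speed[2:])
    # Criterion 2: change of heading between consecutive steps, wrapped to (-pi, pi]
    turn = np.angle(np.exp(1j * np.diff(heading)))
    is_turn = np.zeros_like(speed, dtype=bool)
    is_turn[1:] = np.abs(turn) > turn_threshold_rad
    # The proposition combines the two criteria with OR
    return np.flatnonzero(is_min | is_turn)
```

For a turn, the returned index i is the corner at frame i. On real EMA data the speed would probably need smoothing first, since measurement noise produces many spurious local minima.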
Formulation
• Let $\gamma_a(t) = [x_a(t),\, y_a(t)]^T$ be the pair of sampled x and y positions of EMA coil $a$ at time frame $t$.
• $\theta_a(t)$ is the angle (direction) of motion and $\nu_a(t)$ the velocity.
• The importance function is given by $I_a(t)$.
• The critical points are the local maxima of $I_a(t)$.
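The slides do not spell out how $\nu_a(t)$, $\theta_a(t)$ or $I_a(t)$ are computed. As a hedged illustration only, assuming first-order finite differences at the frame rate $f_s$ (100 Hz after down-sampling) and a hypothetical importance function that grows at sharp turns and at low velocity (not the authors' formula):

```latex
\begin{aligned}
\gamma_a(t) &= [x_a(t),\; y_a(t)]^T, \qquad
\Delta\gamma_a(t) = \gamma_a(t) - \gamma_a(t-1) = [\Delta x_a(t),\; \Delta y_a(t)]^T, \\
\nu_a(t) &= f_s \,\lVert \Delta\gamma_a(t) \rVert_2, \\
\theta_a(t) &= \operatorname{atan2}\!\big(\Delta y_a(t),\, \Delta x_a(t)\big), \\
I_a(t) &= \tfrac{1}{2}\big(1 - \cos(\theta_a(t) - \theta_a(t-1))\big)
        + \Big(1 - \frac{\nu_a(t)}{\max_{\tau} \nu_a(\tau)}\Big) \quad \text{(illustrative only)}, \\
t^\ast \text{ is a critical point} &\iff I_a(t^\ast - 1) < I_a(t^\ast) \ge I_a(t^\ast + 1).
\end{aligned}
```

Both terms of this illustrative $I_a(t)$ lie in $[0, 1]$: the first is 1 for a reversal of direction and 0 for straight-line motion, and the second is 1 when the coil is momentarily still. Summing them is one simple way to express the "minimum velocity OR drastic direction change" proposition as a single score.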
Importance Function
The x-coordinate trajectory of the tongue tip, along with its importance value $I_{TT}(t)$, for the sentence ‘He will allow a rare lie’
Critical Points in the Trajectory
The trajectory of the tongue tip during the utterance of the sentence ‘He will allow a rare lie’.
Criticality w.r.t. Acoustics
• Drops in the velocity of the MFCCs correspond to sustained acoustic regions.
Phonemes affected by the position of the lower jaw
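One way to check this observation is to measure how fast the acoustic representation changes. A minimal NumPy sketch, assuming a (frames × 16) MFCC matrix at a 10 ms frame shift: the MFCC velocity is the Euclidean norm of the frame-to-frame difference, and "sustained" regions are hypothetically taken as runs of at least `min_frames` steps whose velocity is at or below the lower quartile. The threshold and run length are illustrative choices, not values from the study.

```python
import numpy as np


def mfcc_velocity(mfcc: np.ndarray, frame_shift_s: float = 0.01) -> np.ndarray:
    """Norm of the frame-to-frame MFCC change per second; mfcc has shape (frames, coeffs)."""
    return np.linalg.norm(np.diff(mfcc, axis=0), axis=1) / frame_shift_s


def sustained_regions(mfcc: np.ndarray, frame_shift_s: float = 0.01,
                      quantile: float = 0.25, min_frames: int = 3) -> list[tuple[int, int]]:
    """(start, end) step indices (end exclusive) where MFCC velocity stays at or below a quantile."""
    v = mfcc_velocity(mfcc, frame_shift_s)
    low = v <= np.quantile(v, quantile)
    regions, start = [], None
    for i, flag in enumerate(low):
        if flag and start is None:
            start = i                          # a low-velocity run begins
        elif not flag and start is not None:
            if i - start >= min_frames:        # keep only runs that last long enough
                regions.append((start, i))
            start = None
    if start is not None and len(low) - start >= min_frames:
        regions.append((start, len(low)))      # run that continues to the last step
    return regions
```

Step index i corresponds to the change from frame i to frame i + 1, so a region (s, e) spans frames s to e.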
Criticality w.r.t. Acoustics
• For some transient phonemes, the positions of the articulators are important before the phoneme starts, for example for the glide /w/.
Phonemes affected by the position of the lower lip
Criticality w.r.t. Acoustics
• A sustained acoustic region is likely to be the result of the target position that the articulators are trying to achieve.
Phonemes affected by the position of the tongue tip
Criticality w.r.t. Acoustics
• Some articulators reach their target positions while others are still moving towards a target that they reach at another instant.
Phonemes affected by the position of the tongue tip
Features of the Critical Points
– The importance of the tongue positions is usually higher for the critical points connected to stop consonants and fricatives than for those connected to vowels.
– The number of critical points is usually higher for diphthongs and long vowels.
– The critical point connected with a transient phoneme may not fall within the boundary of that phoneme, especially for glides.
Features of the Critical Points
Figure: Ratio of the number of critical points per phoneme for each articulator (Velum, Tongue Dorsum, Tongue Body, Tongue Tip, Lower Lip, Upper Lip, Lower Jaw), shown for selected phonemes
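The ratio in this figure can be read as the average number of critical points per occurrence of a phoneme. A minimal Python sketch of that count, assuming critical-point times and a phonetic alignment are available; crediting a point to the segment that contains it is an assumption and, as noted for glides, it will miss critical points that fall just outside the phoneme boundary.

```python
from collections import Counter


def critical_points_per_phoneme(critical_times: list[float],
                                segments: list[tuple[str, float, float]]) -> dict[str, float]:
    """Average number of critical points per occurrence of each phoneme, for one articulator.

    critical_times: times (s) of the critical points of one articulator.
    segments: (phoneme, start_s, end_s) tuples from a phonetic alignment.
    A critical point is credited to the segment whose [start, end) interval contains it;
    points outside every segment are ignored.
    """
    occurrences = Counter(ph for ph, _, _ in segments)
    hits: Counter = Counter()
    for t in critical_times:
        for ph, start, end in segments:
            if start <= t < end:
                hits[ph] += 1
                break
    return {ph: hits[ph] / n for ph, n in occurrences.items()}
```

Running it once per articulator gives one bar per articulator and phoneme.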
Features of the Critical Points (/k/)
Critical Articulators
Features of the Critical Points (/t/)
Critical Articulator
Features of the Critical Points (/a/)
Critical Articulators
Features of the Critical Points (/o/)
Critical Articulators
Features of the Critical Points (/i/)
Critical Articulators
Features of the Critical Points (/n/)
Critical Articulators
Important Regions in the Trajectory
A comparison between different regression estimations using the whole data set or just the important samples identified by the critical point algorithm.

Method                                   Samples used for training (mean %)   Mean RMSE (mm)   Mean Pearson's correlation coefficient (%)
Linear Regression (Important samples)    80.00%                               2.28             43.7
                                         7.40%                                2.3              43.23
                                         5.27%                                2.29             43.48
ANN Regression (Important samples)       80.00%                               2.1              54.42
                                         7.40%                                2.12             53.32
                                         5.27%                                2.13             54.38
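The table compares regression models trained on differently sized sets of training samples. A minimal NumPy sketch of the linear-regression arm and the two reported metrics (per-channel RMSE and Pearson correlation, averaged over output channels); the ANN arm is not shown. The synthetic data, the `important` mask (every 8th frame) and the function names are hypothetical stand-ins for MFCC inputs, EMA targets and the critical-point selection.

```python
import numpy as np


def fit_linear(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Least-squares affine map from features X (N, d) to targets Y (N, k); returns (d + 1, k) weights."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W


def predict_linear(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ W


def rmse_and_pearson(Y_true: np.ndarray, Y_pred: np.ndarray) -> tuple[float, float]:
    """Per-channel RMSE and Pearson correlation, each averaged over the output channels."""
    rmse = np.sqrt(np.mean((Y_true - Y_pred) ** 2, axis=0))
    r = [np.corrcoef(Y_true[:, j], Y_pred[:, j])[0, 1] for j in range(Y_true.shape[1])]
    return float(np.mean(rmse)), float(np.mean(r))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 16))                 # stand-in for 16 MFCCs per frame
    A = rng.standard_normal((16, 14))                  # stand-in mapping to 7 coils x (x, y)
    Y = X @ A + 0.1 * rng.standard_normal((500, 14))
    X_tr, Y_tr, X_te, Y_te = X[:400], Y[:400], X[400:], Y[400:]
    important = np.zeros(400, dtype=bool)
    important[::8] = True                              # hypothetical "important" frames (12.5 %)
    for name, mask in [("all samples", np.ones(400, dtype=bool)), ("important samples", important)]:
        W = fit_linear(X_tr[mask], Y_tr[mask])
        rmse, r = rmse_and_pearson(Y_te, predict_linear(W, X_te))
        print(f"{name}: {100 * mask.mean():.1f}% of training frames, RMSE {rmse:.3f}, r {r:.3f}")
```

On real data the RMSE would be in millimetres and the correlation is usually reported as a percentage, as in the table.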
Conclusions
The proposed method
– Relies on the physical parameters of the trajectory
– Is simple and intuitive, and explains most of the observed phenomena associated with articulation
– Finds the critical articulators and their positions
– Finds the time instant at which the trajectory reaches the critical point
Application and Future Work
– Speed up the training stage of the inversion substantially,
while maintaining estimation performance
– A suitable method for evaluating the inversion process
– Suggest a data-driven co-articulation model based on the physical constraints of the articulators
– The method could also be generalized to obtain visual parameters for audio-visual data processing
THANK YOU
Any questions?