Important Regions in the Articulator Trajectory
G. Ananthakrishnan and Olov Engwall
ISSP08, Strasbourg 2008

The phenomenon of Co-articulation
• Postulation for an Articulation Model
– Articulation is a series of target positions (assumed by certain articulators at certain times).
– The position of some articulators is more critical than others (depending on the phoneme).
– The non-critical articulators are in the process of moving towards their next critical position in the utterance (which may be several phonemes later).
– The timing of when a particular articulator reaches the target position is important.

The phenomenon of Co-articulation
Figure: Co-articulation while pronouncing /tu/

Why find what is Important/Critical?
– Intuitive understanding of articulation of speech
– Estimating data for co-articulation models
– Improving acoustic-to-articulatory inversion
– Evaluation of acoustic-to-articulatory inversion
– Studying different articulatory strategies by different people and different languages

What is Important or Critical?
• Different approaches to define criticality.
– The place of articulation used to define various phonemes in IPA
– Statistical approach: distance (Kullback-Leibler) between the articulator distributions for different phonemes (Jackson et al.)
– Phonetic invariance in the articulator configuration (Recasens et al.)
– Articulatory resistance for the phonemes (Blandon et al.)
– Critical points based on physical parameters of the trajectory (this study)

Data requirements for this study
– Continuous articulatory trajectories for a sentence
– Aligned acoustic and articulatory information
– Phonetic transcription required to study articulation of different phonemes
– Phonetically rich data with sufficient number of contexts
– Information about important articulators like lips, jaws and tongue

Data
Figure: EMA coil positions: Upper Lip (UL), Lower Lip (LL), Lower Jaw (LJ), Tongue Tip (TT), Tongue Body (TB), Tongue Dorsum (TD), Velum (V)
• MOCHA-TIMIT database, with one female speaker (fsew0) uttering 460 British English sentences
• 7 EMA coil trajectories along the midsagittal plane
• The original sampling frequency of the articulatory channels is 500 Hz

Data Processing
• Articulatory data low-pass filtered and down-sampled to 100 Hz
• Mean-subtracted over several sentences to avoid drift in the articulatory data
• 16 Mel Frequency Cepstral Coefficients for acoustic representation
• 30 ms windows with 10 ms shift, 40 frequency channels
• The database split into 5 equal parts, 4 of them used for training and the remaining one used for testing
• The test and training data-sets rotated using the jack-knife principle

Proposition
• The important regions in the trajectory are those regions where there is a
– Minimum in the velocity of motion, OR
– Drastic change in the direction or angle of motion

Formulation
• Let γ_a(t)^T be the pair of x and y sampled positions of the EMA coil at time frame 't'.
• θ_a(t) is the angle and ν_a(t) the velocity.
• The importance function is given by I_a(t).
• The critical points are local maxima in I_a(t).

Importance Function
Figure: The X-axis trajectory of the tongue tip along with the importance value I_TT(t) for the sentence 'He will allow a rare lie'

Critical Points in the Trajectory
Figure: The trajectory of the tongue tip during the utterance of the sentence 'He will allow a rare lie'

Criticality w.r.t Acoustics
• The drop in velocity of the MFCC corresponds with the sustained acoustic regions.
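
A minimal Python sketch of the Proposition and Formulation slides. The formula for the importance function did not survive in this text, so the score used here, log((1 + turning angle) / (eps + speed)), is only an assumed stand-in, chosen because it grows when the coil slows down or turns sharply. The function names, the `eps` constant and the toy trajectory are likewise illustrative and not taken from the study.

```python
import math


def importance(points, frame_rate=100.0, eps=1e-3):
    """Hypothetical importance score for each interior frame of a 2-D coil trajectory.

    points: list of (x, y) positions in mm, one per frame.
    Returns a list aligned with frames 1 .. len(points) - 2.
    """
    scores = []
    for t in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[t - 1], points[t], points[t + 1]
        # Speed in mm/s from a central difference over two frame intervals.
        speed = math.hypot(x2 - x0, y2 - y0) * frame_rate / 2.0
        # Unsigned angle between the incoming and outgoing displacement vectors.
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        dot = (x1 - x0) * (x2 - x1) + (y1 - y0) * (y2 - y1)
        turn = abs(math.atan2(cross, dot))
        # Assumed stand-in score: large for slow motion OR a sharp change of direction.
        scores.append(math.log((1.0 + turn) / (eps + speed)))
    return scores


def critical_points(points, frame_rate=100.0, eps=1e-3):
    """Frame indices where the importance score has a strict local maximum."""
    scores = importance(points, frame_rate, eps)
    # scores[i] belongs to frame i + 1 because the first frame has no predecessor.
    return [i + 1 for i in range(1, len(scores) - 1)
            if scores[i - 1] < scores[i] > scores[i + 1]]


# Toy trajectory: the coil moves right, slows down, reverses, and moves back.
xs = [0, 1, 2, 3, 4, 4.5, 4.6, 4.5, 4, 3, 2, 1, 0]
trajectory = [(x, 0.0) for x in xs]
print(critical_points(trajectory))
```

On the toy trajectory the only strict local maximum falls on the turnaround frame, where the speed is lowest and the direction reverses. Plateaus of equal score are deliberately not reported, which is one of several possible design choices.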
Figure: Phonemes affected by the position of the lower jaw

Criticality w.r.t Acoustics
• For some transient phonemes, the position of the articulators is important before the start of the phoneme, for example in the phoneme 'w'.
Figure: Phonemes affected by the position of the lower lip

Criticality w.r.t Acoustics
• The sustained acoustic region is likely to be the result of the target position that the articulators are trying to achieve.
Figure: Phonemes affected by the position of the tongue tip

Criticality w.r.t Acoustics
• Some articulators reach their target positions, while others are in motion to reach their target at another instant.
Figure: Phonemes affected by the position of the tongue tip

Features of the Critical Points
– The importance of the positions of the tongue is usually higher for the critical points connected to stop consonants and fricatives than for those connected to vowels.
– The number of critical points is usually higher for diphthongs and long vowels.
– The critical point connected with a transient phoneme may not fall within the boundary of the phoneme, especially for glides.

Features of the Critical Points
Figure: Ratio of number of Critical Points per phoneme, by articulator (Velum, Tongue Dorsum, Tongue Body, Tongue Tip, Lower Lip, Upper Lip, Lower Jaw), for selected phonemes: ɔɩ p ʊ ɩəʳ æ aʊ ʝ e ɑ: b m n ɭ t v θ ɹ f z w s

Features of the Critical Points (/k/)
Figure: Critical Articulators

Features of the Critical Points (/t/)
Figure: Critical Articulator

Features of the Critical Points (/a/)
Figure: Critical Articulators

Features of the Critical Points (/o/)
Figure: Critical Articulators

Features of the Critical Points (/i/)
Figure: Critical Articulators

Features of the Critical Points (/n/)
Figure: Critical Articulators

Important Regions in the Trajectory
A comparison between different regression estimations using the whole data set or just important samples as identified by the critical point algorithm.
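
The comparison on this slide is reported as mean RMSE (mm) and mean Pearson's correlation. A minimal sketch of how these two scores could be computed for one articulator channel, assuming the measured and estimated positions are plain lists of equal length. The function names and the sample values are illustrative and not taken from the study.

```python
import math


def rmse(measured, estimated):
    """Root-mean-square error between two equally long position sequences (mm)."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)


def pearson(measured, estimated):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_m * var_e)


# Made-up example: the estimate follows the shape exactly but is offset by 0.5 mm.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.5, 3.5, 4.5]
print(rmse(measured, estimated), pearson(measured, estimated))
```

The example shows why the two scores are reported together: a constant offset leaves the correlation perfect while still producing a non-zero RMSE.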

Method                                   Mean % of samples   Mean RMSE   Mean Pearson's Correlation
                                         for training        (mm)        Coefficient (%)
Linear Regression (Important samples)    80.00%              2.28        43.7
                                         7.40%               2.3         43.23
                                         5.27%               2.29        43.48
ANN Regression (Important samples)       80.00%              2.1         54.42
                                         7.40%               2.12        53.32
                                         5.27%               2.13        54.38

Conclusions
The proposed method
– Relies on the physical parameters of the trajectory
– Is quite simple and intuitive, and explains most of the observed phenomena associated with articulation
– Finds the critical articulators and their positions
– Finds the time instant at which the trajectory reaches the critical point

Application and Future Work
– Speed up the training stage of the inversion substantially, while maintaining estimation performance
– Suitable method for evaluation of the inversion process
– Suggest a data-driven co-articulation model based on physical constraints of the articulator
– The method suggested could also be generalized for obtaining visual parameters for audio-visual data processing

THANK YOU
Any questions?