The Reliability of Formant Measurements in High Quality Audio Data: The Effect of Agreeing Measurement Procedures

Martin Duckworth, Kirsty McDougall, Gea de Jong, Linda Shockey

Introduction
• Formant measurement is implicitly a legal requirement in UK speaker comparison cases
• Measurements on analogue spectrograms had to be made by hand and eye
• Measurements on digital spectrograms can be assisted by formant trackers; LPC (linear predictive coding) tracking is common
• How replicable are measurements made by eye on digital spectrograms?
• If LPC tracking is used, what can lead to variability?
  − Software settings
  − The point at which data are extracted

Study Aims
• What is required to make measurements more replicable?
• If the software (but not the method) is held constant and the data are high quality, can different laboratories make the same F1-F3 measurements?
• If the method of analysis is the same, does this lead to statistically improved reliability between laboratories?
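The slides take LPC tracking as given. This is a minimal sketch of how an LPC analysis turns one frame of speech into formant estimates. It fits an all-pole model of order 2 × (number of formants), finds the roots of the prediction polynomial, and converts each complex root to a frequency and a bandwidth.

It uses the textbook autocorrelation method with the Levinson-Durbin recursion and a Durand-Kerner root finder, in plain Python with no dependencies. It is not Praat's implementation; Praat's usual formant analysis uses the Burg method. All function names are hypothetical, and windowing, pre-emphasis and frame selection are left out.

```python
import cmath
import math


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """LPC coefficients [1, a1, ..., ap] by the autocorrelation method and
    the Levinson-Durbin recursion: A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # remaining prediction-error energy
    return a


def poly_roots(coeffs, iterations=500):
    """All roots of a monic polynomial (highest power first), Durand-Kerner."""
    n = len(coeffs) - 1
    roots = [complex(0.4, 0.9) ** k for k in range(n)]  # distinct starting points

    def value(z):
        acc = 0j
        for c in coeffs:  # Horner's rule
            acc = acc * z + c
        return acc

    for _ in range(iterations):
        updated = []
        for i, zi in enumerate(roots):
            denom = 1 + 0j
            for j, zj in enumerate(roots):
                if i != j:
                    denom *= zi - zj
            updated.append(zi - value(zi) / denom)
        roots = updated
    return roots


def lpc_formants(frame, fs, n_formants):
    """(frequency, bandwidth) pairs in Hz, sorted by frequency."""
    a = lpc(frame, 2 * n_formants)  # two poles per formant
    found = []
    for z in poly_roots(a):
        if z.imag > 1e-6:  # keep one root from each complex-conjugate pair
            freq = cmath.phase(z) * fs / (2 * math.pi)
            bandwidth = -math.log(abs(z)) * fs / math.pi
            found.append((freq, bandwidth))
    return sorted(found)


def resonator_impulse_response(formants, fs, n):
    """Test signal: impulse response of an all-pole filter with the given
    (frequency, bandwidth) pairs in Hz."""
    a = [1.0]
    for f, b in formants:
        r = math.exp(-math.pi * b / fs)
        theta = 2 * math.pi * f / fs
        section = [1.0, -2 * r * math.cos(theta), r * r]
        a = [sum(a[j] * section[k - j] for j in range(len(a)) if 0 <= k - j < 3)
             for k in range(len(a) + 2)]  # polynomial product
    y = []
    for t in range(n):
        v = 1.0 if t == 0 else 0.0
        v -= sum(a[j] * y[t - j] for j in range(1, min(t, len(a) - 1) + 1))
        y.append(v)
    return y


if __name__ == "__main__":
    fs = 10000  # sampling rate in Hz (assumed for the demo)
    frame = resonator_impulse_response([(500, 80), (1500, 100), (2500, 120)], fs, 1000)
    for freq, bw in lpc_formants(frame, fs, 3):
        print(f"F = {freq:.0f} Hz, B = {bw:.0f} Hz")
```

The demo signal is the impulse response of a synthetic three-formant filter at 500, 1500 and 2500 Hz. These are made-up values, and the estimates should come back very close to them. On real speech the model order (number of poles) and the analysed frequency range are settings the analyst chooses, which is one way LPC output can differ between labs.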
Aims continued
• We are aiming to find a reliable means of obtaining formant values
• We are examining reliability, not validity

Data
• Read speech from the Cambridge DyViS database
• Male speakers
• Standard Southern British English
• Aged 18-25
• 40 speakers: Set 1 (20 speakers), Set 2 (20 speakers)
• 6 monophthongs: / iː, æ, ɑː, ɔː, ʊ, uː /
• 6 repetitions per vowel per speaker
• Elicited in hVd contexts in sentences:
  − It’s a warning we’d better HEED today.
  − It’s only one loaf, but it’s all Peter HAD today.
  − We worked rather HARD today.
  − We built up quite a HOARD today.
  − He insisted on wearing a HOOD today.
  − He hates contracting words, but he said a WHO’D today.

Measurements
• Analysts from 3 labs: Cambridge, Plymouth, Reading
• Task: to measure F1, F2 and F3 for each vowel token using Praat
• Set 1: using individual (but constrained) methods
• Set 2: after a meeting at which a single method was agreed

Set 1 Methods
• Measure the formants at a relatively early point in the vowel
• Measure formants over no more than 5 glottal pulses
• Use either:
  − LPC tracking checked against the spectrogram, or
  − hand/eye measures

Set 2 Method
• Measure towards the start of the vowel
• Measure in a relatively steady early part of the vowel
• Measure around the vowel's maximum intensity
• Use a single time slice
• Use the LPC formant tracker, adjusted for best visual fit
• When values generated by Praat are judged by visual inspection to be incorrect, replace them with correct values from a time slice immediately preceding or following the slice being measured

Statistical Analysis
• 3 formants × 6 vowels × 2 datasets = 36 tests
• Two-way ANOVA:
  − repeated measures on the factor Lab (3 levels)
  − between-groups factor Speaker (20 levels)
• If Lab is significant at p < 0.05: pairwise comparisons with Sidak correction

Results: HAD
[Figures: HAD, F1, F2 and F3 values by lab (Lab1, Lab2, Lab3), Set 1 and Set 2]
• F1: Lab significant in both sets. Set 1: all pairwise comparisons significant (p = 0.001, p < 0.001, p < 0.001). Set 2: pairwise comparisons NS
• F2: Lab not significant in either set
• F3: Set 1, Lab significant (Lab1 vs Lab2 NS; Lab1 vs Lab3 and Lab2 vs Lab3 p < 0.001). Set 2, Lab not significant

Summary - HAD (Lab = main effect of Lab; 1 vs 2, 1 vs 3, 2 vs 3 = pairwise comparisons between labs)

              Set 1              Set 2
              F1   F2   F3       F1   F2   F3
Lab           sig  NS   sig      sig  NS   NS
1 vs 2        sig  NS   NS       NS   NS   NS
1 vs 3        sig  NS   sig      NS   NS   NS
2 vs 3        sig  NS   sig      NS   NS   NS
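With three labs there are three pairwise comparisons (Lab1 vs Lab2, Lab1 vs Lab3, Lab2 vs Lab3), and these are what the Sidak correction adjusts for. This is a minimal sketch of the standard Sidak formula, not of any particular statistics package's implementation. Each raw p value becomes 1 - (1 - p)^m; equivalently, each comparison is tested at 1 - (1 - α)^(1/m). For α = 0.05 and m = 3 that per-comparison threshold is about 0.0170. The helper names are hypothetical, and the raw p values in the demo are made up for illustration; they are not the study's.

```python
from itertools import combinations

LABS = ["Lab1", "Lab2", "Lab3"]
PAIRS = list(combinations(LABS, 2))  # the 3 pairwise lab comparisons


def sidak_alpha(alpha, m):
    """Per-comparison level keeping the family-wise error rate at alpha
    over m independent comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p_values):
    """Sidak-adjusted p values: p_adj = 1 - (1 - p)^m, capped at 1."""
    m = len(p_values)
    return [min(1.0, 1.0 - (1.0 - p) ** m) for p in p_values]


if __name__ == "__main__":
    print(f"{len(PAIRS)} comparisons, per-comparison alpha = {sidak_alpha(0.05, len(PAIRS)):.4f}")
    raw = [0.010, 0.020, 0.300]  # hypothetical raw p values, not from the study
    for (a, b), p_adj in zip(PAIRS, sidak_adjust(raw)):
        print(f"{a} vs {b}: adjusted p = {p_adj:.3f}")
```

The correction grows with the number of comparisons. That is why a Lab main effect can be significant while none of the corrected pairwise comparisons is, as for HAD F1 in Set 2.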
Set 2: good news. After the method was agreed, the HAD results show an improvement: the Lab effect on F3 is no longer significant, and none of the pairwise comparisons between labs is significant for F1, F2 or F3.

Effect of Lab - 6 vowels (sig = significant at p < 0.05; NS = not significant)

          Set 1             Set 2
          F1   F2   F3      F1   F2   F3
heed      sig  NS   sig     sig  NS   sig
had       sig  NS   sig     sig  NS   NS
hard      sig  sig  sig     NS   sig  sig
hoard     sig  sig  sig     sig  sig  NS
who’d     sig  sig  NS      sig  sig  sig
hood      sig  sig  sig     NS   sig  NS

Influence of Speaker
• The Lab × Speaker interaction is significant (p < 0.05) for F1-F3 of all 6 vowels in both Set 1 and Set 2
• Certain speakers lead to measurement differences among labs; for example, F3 of HARD (Set 2):

[Figure: F3 of HARD (Set 2), means by speaker for each lab]

Agreement across labs in most cases, but certain individuals lead to measurement differences among labs
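Counting the cells of the 6-vowel table makes the change between the two sets explicit. This short sketch transcribes the "Effect of Lab - 6 vowels" table and counts significant Lab effects per set and per formant. The variable and function names are hypothetical.

It is a descriptive tally only, not a statistical test. Set 1 and Set 2 come from different groups of 20 speakers, so the counts are not a paired comparison. The transcribed table gives 15 of 18 significant Lab effects in Set 1 and 11 of 18 in Set 2. By formant, F1 goes from 6 to 4, F2 stays at 4 and F3 goes from 5 to 3.

```python
# "Effect of Lab - 6 vowels", transcribed from the slide table:
# each tuple is (F1, F2, F3); "sig" = Lab main effect significant.
EFFECT_OF_LAB = {
    "Set 1": {
        "heed": ("sig", "NS", "sig"),
        "had": ("sig", "NS", "sig"),
        "hard": ("sig", "sig", "sig"),
        "hoard": ("sig", "sig", "sig"),
        "who'd": ("sig", "sig", "NS"),
        "hood": ("sig", "sig", "sig"),
    },
    "Set 2": {
        "heed": ("sig", "NS", "sig"),
        "had": ("sig", "NS", "NS"),
        "hard": ("NS", "sig", "sig"),
        "hoard": ("sig", "sig", "NS"),
        "who'd": ("sig", "sig", "sig"),
        "hood": ("NS", "sig", "NS"),
    },
}


def count_significant(table):
    """Number of vowel x formant cells with a significant Lab effect."""
    return sum(cell == "sig" for row in table.values() for cell in row)


def per_formant(table):
    """Significant Lab effects for F1, F2 and F3 separately."""
    return [sum(row[i] == "sig" for row in table.values()) for i in range(3)]


if __name__ == "__main__":
    for name, table in EFFECT_OF_LAB.items():
        total = sum(len(row) for row in table.values())
        print(f"{name}: {count_significant(table)} of {total} significant; "
              f"F1/F2/F3 = {per_formant(table)}")
```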
Difficult cases: subject 42, F3
[Figures: spectrograms of three HARD tokens from subject 42]
• HARD4: F3 = 2219 Hz
• HARD2: F3 = 2579 Hz
• HARD6: F3 = 3325 Hz
(a range of 1106 Hz across tokens of the same word from one speaker)

Difficult cases: subject 43, F3 (visual inspection vs formant tracker)
[Figures: subject 43, HARD1 and HARD2; F3 is hard to locate by visual inspection ("F3?") and is compared with the tracker output]

The effect of intraspeaker variability, possibly voice quality
• This can affect:
  − the visibility of formants
  − the functioning of the LPC tracker
[Figures: subject 37, "...had today.": HAD1, F1 = ?? (undetermined); HAD6, F1 for comparison]

Discussion: Laboratory Effects
• Do different laboratories produce different formant values? YES
• Does agreeing on a common measurement method reduce these differences? YES
• Could these differences be reduced further? YES

Other sources of variability
• Settings (e.g. number of poles; number of formants in Praat)
• The exact point in the vowel at which the measurement is taken
• The ‘readability’ of the spectrogram, which can be affected by speaker characteristics

Conclusion
• Developing standard ways of collecting formant values could assist comparisons between experts in casework
• If records are kept of time points, software and settings, the measurement process can be replicated

Acknowledgements
• IAFPA Research Grant for travel expenses
• Economic and Social Research Council UK for funding the DyViS Project ‘Dynamic Variability in Speech: A Forensic Phonetic Study of British English’ [RES-000-23-1248]
• Other members of the DyViS project: Francis Nolan and Toby Hudson