Segmental foreign accent

Universität Stuttgart
Institut für Maschinelle Sprachverarbeitung
Azenbergstraße 12
70174 Stuttgart
Germany
Author: Daniel Duran
Matriculation number:
E-Mail: [email protected]
Examiner: Apl. Prof. Dr. phil. Bernd Möbius
Supervisors: Matthias Jilka, Bernd Möbius
Diplomarbeit (diploma thesis) No. 63
Start of work: 01. 09. 2007
End of work: 01. 03. 2008
I hereby declare that I have written this thesis independently and have used no literature other
than that cited.
All quotations and paraphrased borrowings are marked as such, with precise indication of the source.
Daniel Duran
Esslingen, 1 March 2008
Abstract
This thesis examines segmental foreign accent phenomena, i. e. individual sounds as spoken by
non-native speakers of a language and their characteristic deviations from the language norm.
The first chapter introduces basic terminology and identifies some difficulties in defining several
concepts essential to foreign accent research and research on second language acquisition in general.
Terms like accent, native and foreign language or bilingualism are introduced, and the corresponding
definitions proposed in the literature are discussed.
Chapter 2 gives an overview of the variables examined in experimental studies on L2 speech production
and foreign accent, and of the factors which have been found to influence the degree of foreign accent.
This chapter is primarily concerned with extralinguistic variables which correspond to characteristics
of the examined speakers like gender, age or their previous language experiences. However, the
influence of a speaker’s L1 on his or her L2 is also discussed in that chapter.
An overview of the research literature on various segmental acoustic, i. e. phonetic and phonological,
manifestations of foreign accent is given in chapter 3. The most frequently analysed acoustic phenomena
are identified, and fields which have received only marginal attention in foreign accent research are
addressed.
In chapter 4, theories and models which are used to explain the foreign accent phenomenon are
reviewed. Some of the theories and models presented there are concerned with (second) language
acquisition in general and not particularly with segmental phenomena of foreign accent. These
are nevertheless relevant to research on the topic on which this thesis is focused. Besides general
models of the human language capacity, models concerned with specific phenomena in the domain
of segmentals are presented.
Chapter 5 addresses methodological issues important to experimental studies on foreign accent. It
discusses the criteria by which the subjects for experiments on foreign accent should be
selected. Problems with various task designs are discussed, as well as the problems an experimenter
faces when evaluating and interpreting the data. Various methods for measuring the degree of
foreign accent are pointed out.
Finally, chapter 6 presents an experimental study on the realisation of a phonological vowel opposition
in German by non-native speakers in comparison to bilingual and native speakers of German.
For this study, speech samples of twenty speakers were recorded and acoustically analysed, ten of
whom are non-native speakers who did not learn German until school age. The study focuses on the
acoustic features of vowel quality and vowel duration and compares the realisations of these between
the respective speakers.
Contents
Abbreviations
1 Introduction
1.1 Definitions
1.1.1 Accent
1.1.2 Language acquisition vs. language learning
1.1.3 Native language – First language
1.1.4 Foreign language – Second language
1.1.5 Bilingualism
1.1.6 Interlanguage phonology
1.1.7 Foreign accent
1.2 Summary
2 Factors affecting degree of foreign accent
2.1 Affective and psychological factors
2.1.1 Motivation
2.1.2 Language learning aptitude
2.2 Gender
2.3 Formal L2 instruction
2.4 L1 background and L1-L2 combination
2.4.1 Language distance
2.4.2 L1 proficiency and influence of L2 on L1
2.5 Language use patterns
2.6 Exposure to L2 surrounding and amount of L2 experience
2.6.1 Length of residence
2.6.2 Age of arrival
2.7 Age of L2 learning (AOL)
2.8 Speaker-independent factors
2.9 Summary
3 Manifestations of foreign accent
3.1 Segmentals I: Consonants
3.1.1 VOT
3.2 Segmentals II: Vowels
3.3 Phonotactics
3.4 Suprasegmentals
3.5 Voice quality
3.6 Summary
4 Theories on foreign accent
4.1 Universal grammar and foreign accent
4.2 The critical period hypothesis
4.2.1 Nature not nurture
4.2.2 Problems with the critical period hypothesis
4.2.3 A sensitive period
4.2.4 Summary
4.3 Contrastive analysis, phonetic transfer and interference
4.4 Direct realism
4.5 The Speech Learning Model
4.5.1 Summary
4.6 The perceptual magnet effect
4.6.1 Summary
5 Methodological issues
5.1 Subject selection and control group
5.2 Obtaining data: The task
5.3 FA rating by native speaker judges
5.3.1 Scaling foreign accent
5.3.2 The judges
5.3.3 Native speakers’ judgments and acoustic features of foreign accent
5.4 Foreign accent detection by acoustic measurements
5.5 Criteria for native-likeness of speech
6 Experimental study
6.1 German vowels
6.1.1 Acoustic correlates of the German vowel opposition
6.1.2 German vowels: Summary
6.2 The participants
6.3 The speech material
6.4 Procedure
6.4.1 Part I: Interview
6.4.2 Part II: Production experiment
6.5 Acoustic analysis: method
6.6 Acoustic analysis: results
6.6.1 Vowel quantity
6.6.2 Vowel quality
6.6.3 Tenseness
6.6.4 Effects of L1
6.6.5 Age effects
6.7 Discussion
7 Summary and conclusions
A Tables and figures
B Wordlists
Bibliography
List of Figures
4.1 Critical periods
4.2 NLM Theory
6.1 The German monophthongs
6.2 Screenshot of labels in WaveSurfer
6.3 Vowel duration (per group)
6.4 [iː]∼[ɪ] and [yː]∼[ʏ] of speaker A03
6.5 Disturbing signal
6.6 Voice quality parameters RCG and SKG
A.1 The F1/F2 vowel spaces of speakers A01, A02 and A03
A.2 The F1/F2 vowel spaces of speakers A04, A05 and A06
A.3 The F1/F2 vowel spaces of speakers A07, A08 and A09
A.4 The F1/F2 vowel spaces of speakers A10, B01 and B02
A.5 The F1/F2 vowel spaces of speakers B03, B04 and B05
A.6 The F1/F2 vowel spaces of speakers B06, B07 and B08
A.7 The F1/F2 vowel spaces of speakers C01 and C02 and the reference points
B.1 Printed version of word list A
List of Tables
1 Plotting symbols
6.1 Denominations for the German vowel classes
6.2 The vowel contrast pairs
6.3 Reference F1/F2 values and standard deviations
6.4 P-values of t-tests of long and short vowels
A.1 Demographic speaker characteristics
A.4 Formant values of German monophthongs (from literature)
A.5 Vowel duration ratios
A.6 Formant values (A01)
A.7 Formant values (A02)
A.8 Formant values (A03)
A.9 Formant values (A04)
A.10 Formant values (A05)
A.11 Formant values (A06)
A.12 Formant values (A07)
A.13 Formant values (A08)
A.14 Formant values (A09)
A.15 Formant values (A10)
A.16 Formant values (B01)
A.17 Formant values (B02)
A.18 Formant values (B03)
A.19 Formant values (B04)
A.20 Formant values (B05)
A.21 Formant values (B06)
A.22 Formant values (B07)
A.23 Formant values (B08)
A.24 Formant values (C01)
A.25 Formant values (C02)
A.26 Within-speaker comparison of formant values
A.27 Formant differences (reference values)
A.28 Formant differences (group B)
A.29 Voice quality parameters: A01 and A02
A.30 Voice quality parameters: A03 and A04
A.31 Voice quality parameters: A05 and A06
A.32 Voice quality parameters: A07 and A08
A.33 Voice quality parameters: A09 and A10
A.34 Voice quality parameters: B01 and B02
A.35 Voice quality parameters: B03 and B04
A.36 Voice quality parameters: B05 and B06
A.37 Voice quality parameters: B07 and B08
A.38 Voice quality parameters: C01 and C02
A.39 Summary
Abbreviations and notational conventions
The following notational conventions and abbreviations are used in this thesis:
For phonetic and phonological transcriptions the International Phonetic Alphabet (IPA) is used.
For reasons of readability, SAMPA-like symbols are used in the vowel diagrams. The equivalences
between IPA, SAMPA and the diagram plotting symbols are shown in table 1.
IPA    SAMPA  plots
øː     2:     2
œ      9      9
a      a      A
aː     a:     a
ɛ      E      E
eː     e:     e
ɛː     E:     3
ɪ      I      I
iː     i:     i
ɔ      O      0
oː     o:     o
ʊ      U      U
uː     u:     u
ʏ      Y      Y
yː     y:     y
Table 1: Plotting symbols
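The correspondence in table 1 can also be written down as a simple lookup, which may be convenient when labelling vowel diagrams programmatically. The following is only a minimal sketch: the names SAMPA_TO_PLOT and plot_symbol are hypothetical and do not come from the thesis; only the symbol pairs themselves are taken from table 1.

```python
# Lookup from SAMPA vowel labels to the single-character plotting symbols
# of table 1. The names used here are illustrative assumptions.
SAMPA_TO_PLOT = {
    "2:": "2", "9": "9", "a": "A", "a:": "a", "E": "E",
    "e:": "e", "E:": "3", "I": "I", "i:": "i", "O": "0",
    "o:": "o", "U": "U", "u:": "u", "Y": "Y", "y:": "y",
}


def plot_symbol(sampa: str) -> str:
    """Return the diagram plotting symbol for a SAMPA vowel label."""
    return SAMPA_TO_PLOT[sampa]
```

Note that short "a" is plotted as "A" and long "a:" as "a", and that "O" is plotted as the digit zero, so every vowel receives a distinct single character.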
Where language names appear in pairs, e. g. “German-Croatian”, the first refers to a speaker’s first
language and the second to the same speaker’s second language. In reference to bilinguals the
first language refers to a speaker’s dominant language and the second to his or her non-dominant
language (in the case of non-balanced bilinguals).
The language codes listed below are used according to the ISO 639-1 standard¹.
AOL      age of learning (a second language)
AOA      age of arrival (in a second language’s speech community)
bg       language code: Bulgarian
C        consonant
CPH      critical period hypothesis (for language acquisition)
de       language code: German
DD       Daniel Duran, the author
F0       fundamental frequency
F1       first formant
F2       second formant
F3       third formant
FA       foreign accent
it       language code: Italian
IPA      International Phonetic Alphabet
hr       language code: Croatian
hu       language code: Hungarian
ka       language code: Georgian
L1       first language = native language
L2       second language
LOR      length of residence (in a second language’s speech community)
NLM      native language magnet
pl       language code: Polish
pl.      plural
ro       language code: Romanian
ru       language code: Russian
SAMPA    Speech Assessment Methods Phonetic Alphabet
sd       standard deviation
SLM      speech learning model
tr       language code: Turkish
uk       language code: Ukrainian
UG       universal grammar
V        vowel
VOT      voice onset time
/. . . /   phonological transcription
[. . . ]   phonetic transcription
<. . . >   orthographic representation
¹ http://www.infoterm.info/standardization/iso_639_1_2002.php
Chapter 1
Introduction
Individuals with different language backgrounds have not only come into contact since the world
began to seem smaller and smaller. Language contact has occurred at all times and at all levels,
from two single individuals to whole societies. The human language capacity enables people to
recognise, by the way their interlocutors speak, whether they belong to the same language
community or whether they are foreigners.
The main focus of this thesis lies on this foreign accent as a specific outcome of a language contact
situation at the level of the individual: the segmental deviations in the speech of non-native speakers
from the norm of a language.
Since foreign accent is the focus of this thesis, the following sections in this chapter are concerned
primarily with the domain of pronunciation, i. e. the production of speech sounds and utterances.
Other language domains like morphology, syntax, semantics or pragmatics are not considered here,
even though they might be (at least) as important as pronunciation to the topics following in this
introductory chapter or later parts of this thesis.
1.1 Definitions
In order to discuss the foreign accent phenomenon, some basic terminology has to be defined first.
There are often terminological inconsistencies or ambiguities in the literature on second language
acquisition or foreign accent research – even with respect to basic terminology. The following sections
provide an overview of problems with defining some of the terms and their usage in the literature.
As mentioned above, this overview is concerned only with the domain of pronunciation relevant to
foreign accent research. This means that topics like (second) language proficiency or attainment in
language learning will be discussed primarily from a phonetic and phonological point of view, with
pronunciation in mind.
1.1.1 Accent
The first term which needs to be discussed here is accent. This is a highly ambiguous term. There
are two meanings of accent which are of interest in this thesis:
1. The term accent is often used in the (general) meaning of a way of speaking. This can be either
a characteristic regional pronunciation or the pronunciation of a specific group of people or
non-native speakers – i. e. a foreign accent (which will be further discussed in this thesis).
2. On the other hand, accent is often used synonymously with stress or emphasis placed on a
particular syllable or word. The terms word accent and pitch accent are often used in these
latter cases.
Up to this point the term has already been mentioned several times in this thesis. In all of these
cases, “accent” was used in the first meaning mentioned above. In most cases there is no confusion
about the interpretation of the term, as it can be inferred from the context. In texts on second
language acquisition, sociolinguistics or dialectology, for example, “accent” is most likely
used in the first sense to refer to a specific way of speaking. Works on phonology
or prosody most likely use the term “accent” in the second sense to refer to stress or
emphasis. However, in works covering both areas, the term can be highly ambiguous. Jilka (2000)
for example uses the term to refer to “foreign accent”, “pitch accent” and “word accent”. In this case,
the meaning of the term cannot be understood without the context, as can be seen in sentences like
the following one: “The total of 1300 pitch movements they produced is measured and compared
instrumentally, further analysis being restricted to the tonal movements associated with accented
words” (Jilka, 2000, p. 46).
The terms “accent”, “accentedness” or “accented” will be used in this thesis only in the first mentioned
sense to refer to a specific (marked) pronunciation – especially the pronunciation of non-native
speakers (the following sections provide more in-depth discussions of the associated terms and
concepts).
The experiment presented in chapter 6 covers some aspects of German phonology. To avoid confusion
in that context, the word stress will be used exclusively to refer to the prominence or emphasis of
a certain syllable within a word, the so-called word or lexical stress – even though “(word) accent”
is often used in the respective literature¹.
¹ The considerations in this section also hold true for the German language and the respective usage of the term
“Akzent” in the literature. This is especially important in chapter 6, which cites several sources on German
phonology.
1.1.2 Language acquisition vs. language learning
Some researchers explicitly distinguish between language acquisition and language learning. The
former is used to refer to the unconscious, effortless, spontaneous process which young children go
through when they are exposed to one or more languages and begin to speak. The latter term refers
to the process of consciously learning a second language later in life. The term acquisition thus
might refer to a naturalistic setting while the term learning might refer to an instructional setting
like the language classroom.
Language acquisition and language learning are not distinguished in this thesis and the terms will
be used interchangeably according to the cited sources.
1.1.3 Native language – First language
The question of what exactly constitutes a person’s native language is not as easy to answer as
it might seem. Several different criteria have been proposed for the definition of what makes a
specific language a person’s mother tongue or native language. Often, the term native language is
not defined at all in the literature on second language acquisition or foreign accent. Further, the
terms native language and first language (abbreviated L1) are often used interchangeably. There
are also several other terms. Lenneberg (1967) for example uses the term primary language. This
section will provide an overview of various possible linguistic definitions of what constitutes an
individual’s native language.
As a theoretical approach to a definition of the term native language one might consider a person’s
proficiency, (grammatical) competence or performance in a language². The language one person
“knows best” could then be called the native language of this person. Although there are numerous
tests assessing language proficiency (e. g. the “Test of English as a Foreign Language – TOEFL”
or the “Test of Spoken English – TSE”, or the “Deutsche Sprachprüfung für den Hochschulzugang
– DSH” for German), there is, unfortunately, no generally accepted measure of proficiency. There
is also no general agreement upon the terms competence and performance. This thesis will not go
into the details of the discussion on these two concepts, despite its general relevance to the issue of
second language acquisition.
² The three terms proficiency, competence and performance will be used in an informal sense in this thesis without
further formal definition. In general, the terms proficiency and competence both refer to the linguistic knowledge
about a specific language or the ability to produce and perceive it, while performance refers to the actual usage
of a language.
Frequency of language use seems to be another good candidate for the definition of a person’s native
language. By this criterion, a person’s most frequently used language would be her or his native
language. This could also be tested by additional or more specific usage criteria such as a person’s
language of counting, of speaking to oneself, of swearing, dreaming and so on.
In bilingualism research the term dominant language is used to refer to cases with more or less
objective (testable) criteria like the ones mentioned so far (i. e. to a person’s most frequently used
language or the language a person knows best). This term is used in contrast to a person’s
non-dominant second languages. Still, definitions like these are usually not used explicitly in second
language acquisition literature.
In most cases, the criterion used to assign the label native to an individual’s language is the age at
which the respective language has been learned or acquired (compare section 1.1.2). The language
(or the languages) learned from infancy is said to be the native language. In this case, the terms
native language and first language are really synonymous. However, this criterion is not without
problems either, even though it may appear to be so at first sight (compare for example the case
of bilingual speakers in section 1.1.5).
As mentioned above, in the studies found in the literature it is often the case that all these different
criteria are met by one and the same language – or at least it seems to be so, as there are often
no further considerations regarding the validity of the label native language. Thus, a person’s first
learned language might also be his or her most frequently used language as well as his or her
language with the highest competence or performance. The criteria taken into account in the various
studies can often only be implicitly inferred from the descriptions of the examined subjects due to
the lack of explicit information on this matter.
As linguistically motivated as the above-mentioned criteria might be, they presuppose a precise
definition of a language. As Li (2005) points out, “there is no simple answer to the question ‘what is
a language?’ ” as there is no strictly linguistic definition of the term, or of the distinction between
language and dialect. Beddor and Gottfried (1995) mention some problems that may arise with
the designation of a person as a native speaker of a particular language. The language a person
learns at home during early childhood might be different from the one used in school – and this
is exactly the case in many societies worldwide. The home language can also be different
from the language(s) of the surrounding society a child grows up in. It is almost impossible to
determine a person’s language experience precisely and exhaustively. Many languages in the world
show considerable dialectal diversification. The term language will be used in this thesis to refer to
any linguistic variety or code in general, i. e. to both a language and to what some might refer to as
a dialect. Scovel (1969, p. 249) mentions an interesting operational definition of a dialect as being
the second language an adult can, given enough exposure, learn to speak without a foreign accent.
Depending on the listeners and the dialectal diversity of a language, dialectal differences might be
perceived as foreign and vice versa (Major, 2001). There are cases of experiments where utterances
of native speakers were (supposedly incorrectly) rated as having a foreign accent. One such case
is described by Scovel, further examples are cited for example by Long (2005). Considering single
instances (not the overall rating of a speaker), Flege et al. (1995) report in one experiment on a
possibility of 2.9 % “misclassifications” of the native speakers’ samples. In some cases the authors
attribute such confusion in judgment or other inconclusive results to different dialectal backgrounds
of the participants: Mack (1989) for example examined (among others) speakers of “Canadian-French”
and “French from France”. The study found inconclusive results on VOT discrimination,
and Mack theorized that a “careful study of cues to the /d-t/ contrast in various French dialects
would help clarify the source of the bilinguals’ apparent perceptual confusion”. A study reported
by Bongaerts (1999) examined L2 learners who reported having been trained in Received
Pronunciation – the supraregional standard variety of British English. The study included a control
group of native English speakers from the south of England and from the Midlands. The judges, on
the other hand, were native English speakers from the north of England. As a result, the average
ratings for the native speakers were rather low and half of the examined group of highly successful
L2 learners received higher ratings than the native speakers (see section 5.5 for a discussion of
such methodological issues). However, the general problem of distinguishing between language and
dialect will not be further discussed in this work.
Thus, examiners trying to assess the production or perception of a foreign or native language
are more or less forced to rely on speakers’ self-identification as native speakers as the only
practically applicable criterion in their selection of the subjects.
Another possible identification criterion would be the judgment of a person as a native speaker
by other speakers of the language. But this again comes with the above-mentioned problem of
not being objectively testable because of the various (more or less unknown) factors influencing
speech perception. However, from a statistical point of view, an identification of a given person as
a native speaker by a group of independent judges seems to be of higher validity than
the judgment of only one single person (namely the speaker him- or herself).
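The statistical intuition behind this remark can be illustrated with a small calculation. The sketch below rests on strong simplifying assumptions that are not taken from the thesis or its sources: every judge is assumed to classify a speaker correctly with the same probability, the judges are assumed to decide independently, and the group verdict is a simple majority vote. The function name and the value 0.8 are purely hypothetical.

```python
from math import comb


def majority_correct_probability(n_judges: int, p_correct: float) -> float:
    """Probability that more than half of n independent judges are correct.

    Assumes each judge is correct with the same probability p_correct and
    that the judges decide independently of one another.
    """
    threshold = n_judges // 2 + 1  # smallest number of votes forming a majority
    return sum(
        comb(n_judges, k) * p_correct**k * (1 - p_correct) ** (n_judges - k)
        for k in range(threshold, n_judges + 1)
    )
```

Under these assumptions, a single judge who is right 80 % of the time yields 0.8, whereas a majority among five such judges is right with a probability of about 0.942. Real listener panels will violate the independence assumption to some degree (for example through shared dialectal background), so the figure should be read as an upper-bound illustration rather than an estimate.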
An example of contradictory judgments of native-likeness is given by Mack (1989, p. 188). Two
of her English-French bilingual subjects rated themselves as “slightly more proficient” in French.
They were not judged as being native speakers by other French native speakers and received higher
ratings in English than French – despite their self-declared higher proficiency in French.
With respect to the problem of foreign accents the last given “definition” of native language could
be reformulated as follows: a speaker’s native language is the language he or she produces without
a (perceivable) foreign accent. This, however, is a circular definition as the term foreign accent
presupposes the concept of a native language (or a native language standard).
To conclude this section, for the scope of this thesis the terms native language and first language,
short L1, will be used synonymously. An explicit criterion for the designation of a speaker’s native
language will not be decided upon. As this section shows, this is not an easy task. The above cited
examples show that this is not just a theoretical or terminological problem. A lot of inconsistencies
and disagreement on this issue can be found in the literature. As a consequence, I will adopt
the terminology and the speaker classifications of the respective cited sources without detailed
examination of their underlying views on this issue. However, where needed, problems obviously
attributable to vague, inconsistent or varying definitions of native language or native speaker will
be addressed. There will also be a discussion of the ramifications for methodology in foreign accent
research (see chapter 5).
1.1.4 Foreign language – Second language
Difficulties similar to those with the definition of native language arise with the definition of foreign
language. As the term also presupposes the definition of language, the respective considerations
mentioned above apply here as well. Without going into too much detail, a foreign language is
a language an individual does “not know” (yet). The term second language, short L2, is used to
designate a particular foreign language which a speaker subsequently learns or acquires, i. e. at a
later stage in life after the native language has already been established (at least to some degree). A
distinction between second and third language will not be made in this thesis. It is, however, important
to note that such distinctions are sometimes made, while most authors use the term second language
exclusively. Second language is thus used as a cover term in this thesis for any language or languages
acquired after the first. It is also worth noting that the terms foreign language and second language
sometimes seem to be used synonymously.
1.1.5 Bilingualism
To complicate these considerations further, it has to be mentioned that many –
if not most – people worldwide do not grow up with just one language and can thus be considered
bilinguals (Major, 2001; Li, 2005). The term bilingual is far from being used consistently in the
literature. There is an important distinction between its usage within the literature on foreign accent
and second language acquisition in general and its usage in the context of bilingualism research.
In the former, the term is mainly used in its broadest sense to refer to people who have learned a
second language or are in the process of doing so – regardless of their level of proficiency or the
respective age at onset of learning. Such a usage can be found for example in works by Flege and
Fletcher (1992), Flege et al. (1999), Guion et al. (2000) or Levi et al. (2007). In such cases the labels
native language or L1 are easily assigned to one language and L2 to another.
Within the field of bilingualism research, however, the term is most often used in a much narrower
sense to refer to people who can use two languages in conversational interaction, often at such a
proficiency level where code-switching, the systematic change from one language to another in the
course of conversation, takes place³. Subjects of research within this field are often adult speakers
who have learned or acquired a second language long before the time of examination or children
growing up multilingual (Li, 2005). In these cases the designation of a single native language can
be difficult and so the term is often not used at all.
To avoid terminological confusion⁴, the term bilingual will be used in this thesis exclusively to refer
to so-called early bilinguals, namely people who have acquired two (or more) languages from early
childhood on – either simultaneously or successively. As it is an obvious prerequisite for foreign
³ Note that code-switching is a characteristic language mode of bilinguals and not a phenomenon caused by L2
learning deficiencies. In other words, code-switching is not an instance of a “syntactical foreign accent”.
⁴ Li (2005, p. 6) gives in table 0.1 an example of 37 different terms which have been used in the description of
bilingualism.
accent research to examine subjects who actually speak more than one language, the label bilingual
can be misleading and is not needed to refer to these examined speakers in general.
1.1.6 Interlanguage phonology
The term interlanguage is used to denote a “separate linguistic system” which is responsible for a
learner’s “attempted meaningful performance” in a target language norm (Selinker, 1972, p. 214).
In other words, interlanguage is used to denote a linguistic system of an L2 learner which is (a)
separate from that learner’s L1 system and (b) employed to express a communicative utterance
in an L2. This interlanguage system is usually different from the system of an idealized L2 native
speaker (i. e. the L2 norm). It is also important to mention that this system is not static but
constantly changing over time – as is the case with all language knowledge.
1.1.7 Foreign accent
It is not easy to give a precise definition of foreign accent. There is no generally accepted definition
among researchers. The case is comparable to the difficulties of defining such diversely interpreted
yet commonly used terms as native language or bilingualism.
Flege (1987a) defines foreign accent as “the perceived effect of many discrete and general differences
in sounds produced by native and non-native speakers”.
Another approach defines foreign accent as “. . . all those features of interlanguage speech which
differentiate learners according to their native language backgrounds” (Ioup, 1984, p. 2). This means
foreign accent can also be seen as a manifestation of incomplete mastery of an L2 phonetic system.
Long (1990, p. 255) defines incomplete mastery of an L2 as the “objectively identifiable differences
between the underlying linguistic knowledge systems of SL [= L2] speakers and monolingual native
speakers of a language”.
The expressions overall foreign accent or degree of foreign accent will be used in this thesis to refer
to foreign accent from the above cited Flegean point of view – i. e. foreign accent as the perception of
a speaker as non-native. This is important to note, since foreign accent – and especially its degree –
is usually measured by the judgments of listeners who are native speakers of the respective language.
Thus, most of the studies cited in this thesis did not explicitly examine all the individual segmental
or supra-segmental deviances in the speech of L2 learners. Jilka (2000, p. 9) emphasizes that “only
those deviations that are perceived as such can be considered instances of foreign accent”.
1.2 Summary
Not only the vague use of terminology, but more severely its practical consequences can lead to
unreliable or incomparable results in foreign language studies.
The term accent will be used exclusively in reference to the foreign accent phenomenon. It thus
refers to the (perceived) differences on the phonetic and phonological level in the speech of
non-native speakers. With respect to word stress or emphasis the term accent will not be used, even
though it is often used synonymously in the respective literature.
The terms native language and first language (L1) will be used synonymously in this thesis. They
will be adopted from the cited sources, but terminological problems or their practical consequences
will be addressed where necessary. Caution is needed whenever the issue of native speakers and
comparisons of different speakers is touched upon. The same holds true for issues related to the concept
of native language or language comparison in general.
The term second language (L2) is used as a cover term for any language (or languages) acquired
after the first.
The term bilingual or bilingual speaker will be used to refer only to those individuals who have
either acquired two or more languages simultaneously in early childhood, or who have acquired
a second “native” language at an early age. Diverging use found in the cited sources will not be
adopted.
A distinction between language learning and language acquisition will generally not be made in this
thesis. It is, however, addressed in section 2.3, where the influence of formal instruction on degree
of foreign accent is discussed.
Chapter 2
Factors affecting degree of foreign accent
Not every L2 learner reaches the same level of proficiency. Some attain native-like pronunciation;
some retain a strong foreign accent. Such everyday observations of differences in degree of foreign
accent lead to the question: what affects foreign accent?
Examining the speech of non-native speakers, researchers aim to find out more about the
phenomenon of foreign accent and possible factors affecting its degree. This chapter gives an overview
of variables in experimental studies on L2 speech production and foreign accent. Various factors
influencing degree of foreign accent have been identified in a wide range of studies. Most of these
factors correspond to characteristics of the examined speakers like gender, age or their previous
language experiences. From the point of view of a phonetician, who usually examines speech production,
perception or acoustic features, such variables could be called extralinguistic, since they
correspond primarily to characteristics of the examined speakers and not to acoustic features of the L2
speech utterances. Acoustic characteristics of the speech signal which contribute to the perception
of foreign accent are discussed in chapter 3.
One of the most important of these external factors seems to be the age of an individual at the
beginning of L2 learning or acquisition – abbreviated as age of learning (AOL). Other factors
assumed to be important for the discussion of degree of foreign accent are the age at which an
L2 learner moves to an L2-speaking community or the length of his or her residence within such an
environment. These and other factors are discussed in the following sections, beginning with
speaker-dependent factors. At the end of this chapter some speaker-independent factors affecting degree of
foreign accent are addressed.
2.1 Affective and psychological factors
Among the factors affecting degree of foreign accent, some affective or psychological factors like motivation, language learning aptitude, native language loyalty or the learner’s IQ have been proposed
by various researchers.
2.1.1 Motivation
Motivation has been suggested by some authors as one possible factor influencing the degree of
foreign accent. Some of the suggested motivational variables are for example “professional motivation”, “integrative motivation”, “instrumental motivation” or “concern for L2 pronunciation accuracy”
(Flege, 1987a; Piske et al., 2001). Piske et al. reviewed several earlier studies that examined the
influence of motivation on the degree of foreign accent. They conclude from the existing literature
that “most studies [. . . ] have reported at least some influence of motivation on the outcome
measures”, but they also emphasize that motivation has only a small general effect and that it has not
been quantified precisely. A factor like a person’s motivation (whether intrinsic or extrinsic) cannot
be measured directly (or easily) with the usual methods within the scope of experimental phonetics. A deeper knowledge of the psychological research and theories of motivation is needed for a
more comprehensive study of the influence of motivation on a speaker’s foreign accent. Although
motivation is often mentioned as one (possible) factor affecting degree of foreign accent, motivation
remains in most cases a vague concept that is not as extensively examined as other factors. For
example, Piske et al. state that professional motivation “may be a potent factor for groups of subjects
who are required by their profession to speak an L2 without a foreign accent, but not so much for
ordinary immigrants”. This assumption, however, is then not further examined (Piske et al., 2001,
p. 211). Motivation is often measured by using a questionnaire. In such studies, the subjects are
for example required to rate the importance of good or accent-free pronunciation of their L2 (e. g.
Flege et al., 1995, 1999).
In addition, the factors affecting motivation have to be considered as well. How (if at all) do
motivation and its influence on L2 pronunciation change with the age of learning? Are there differences
in motivation between early and late learners? Does a speaker’s motivation to speak without an
accent decrease or increase with respect to the length of residence or his or her L2 use patterns?
How does motivation change with increased proficiency? Piske et al. left these questions open for
future research.
Whatever the influence of motivation on degree of foreign accent might be, it is assumed – as already
mentioned – that it is only of relatively little importance.
2.1.2 Language learning aptitude
Language learning aptitude is another variable that influences the degree of foreign accent. It is also
a variable that is more psychological than linguistic in nature.
A case study by Novoa et al. (1988) describes a male native English speaker with an above-average
“talent” for foreign languages. He has acquired five languages with reported native-like performance,
all of them after the age of 15. After learning French, German and Spanish in high school, he
learned Moroccan Arabic “with unusual ease relative to his peers” and “picked up” Italian. Novoa
et al. explicitly state that native speakers attested the speaker’s lack of a foreign accent in all
these languages. One conclusion the authors draw is that “generally superior cognitive functioning
is not necessary for exceptional second-language acquisition”. The examined speaker had an average
IQ, average musical abilities and an average ability for manipulation of “abstract verbal concepts”.
The causes for such L2 learning abilities cannot be determined from this single case study alone.
The authors observed that the reported case is “consistent with some theories of second-language
acquisition and contradicts others”.
Cases of such highly successful L2 attainment are attributed to individual characteristics, which
distinguish these learners from “normal” people. It is argued that such cases are “superexceptional”
and, by themselves, provide no counterevidence to theories which predict that a native-like attainment in a second language is only possible below a certain age (Bongaerts, 1999; the relevance of
age of learning is discussed in section 2.7).
Piske et al. (2001, p. 202) emphasize that studies relying on controlled conditions in their
examinations of aptitude factors are not conclusive. Suggested factors affecting degree of foreign accent are
for example musical ability or the ability to mimic unfamiliar speech sounds. The latter has been
identified by various studies as a significant predictor of foreign accent. The emergence and nature
of such abilities are still not fully understood, however.
2.2 Gender
Gender has been found in some studies to affect degree of foreign accent. Flege et al. (1995) for
example observed that below the age of learning of 12 (see section 2.7) the examined female speakers
received higher ratings for their pronunciation (meaning their speech samples were perceived as less
accented) in comparison to male speakers. Above the age of 16, however, the ratings for the male
speakers were higher. Whether such results are generalizable is not clear, as earlier studies have
provided divergent results.
Piske et al. (2001) conclude from a review of the existing literature that the available results do not
“lead to any strong conclusions”. In contrast to the above cited findings, some studies suggest that
the effect of gender on degree of foreign accent “may vary as a function of AOL and amount of L2
experience”. In summary, there is conflicting evidence, and the influence of gender on L2 acquisition
and foreign accent remains to be determined.
2.3 Formal L2 instruction
The introductory section 1.1.2 addressed the difference between language acquisition and language
learning. Usage of these terms in the literature might sometimes be confusing. According to the
definition given in section 1.1.2, research on second language acquisition actually studies language
learning, namely the process of consciously learning a second language usually after early childhood.
This concept of second language acquisition can be further divided into naturalistic L2 acquisition
(“outside the classroom”) and foreign language teaching (“inside the classroom”) (see e. g. Wode,
1980). Although some authors explicitly differentiate these concepts, this distinction is not made in
most of the studies on foreign accent.
Piske et al. (2001, p. 200) summarize that the various studies which examined the influence of
formal instruction on degree of foreign accent “have not produced encouraging results for language
teachers”. According to their review, there is no clear evidence that formal instruction affects
degree of foreign accent. In other words, degree of foreign accent does not appear to depend on whether the
L2 learner receives formal instruction inside a classroom or whether he or she learns the language
without a language teacher. One explanation they provide for this finding is the little attention
that pronunciation receives in most foreign language classrooms. They do, however, also cite
studies that revealed an effect of instructional variables under certain controlled conditions¹. Such
findings lead to the conclusion that “formal instruction” is too general a term to be used without
¹ E. g. “intensive training in the perception and production of English sounds” or special “ ‘prosody-centered’ phonetic
training”.
in-depth examination of the methods used and the precise circumstances of the formal instruction
provided to the examined learners.
2.4 L1 background and L1-L2 combination
The term L1 background refers to a speaker’s native language(s), and in a more general sense, to the
sum of his or her previous language experience. As foreign accent research necessarily involves two
languages, the L1 background of a speaker will be discussed together with the L1-L2 combination
as a factor in the degree of foreign accent. As opposed to the majority of factors discussed in this
chapter (which are extralinguistic in nature), the current two are based on the linguistic structures
of the involved languages.
L1 background and L1-L2 combination are often discussed in the literature on a generally global
level (without studying acoustic characteristics of non-native speech). This section gives a short
overview of such global considerations on L1 background and L1-L2 combination.
The most trivial precondition for the emergence of a foreign accent is that the native language of
a person is different from a spoken second language; otherwise a foreign accent will not appear in
that person’s speech² (compare section 1.1.3).
Questions which need to be addressed with respect to a speaker’s L1 are (1) Does the L1 background
affect a speaker’s L2 foreign accent – and how? (2) Is an L1-L2 combination of closely related, i. e.
phonetically similar, languages more likely to result in a foreign accent or is the opposite the case,
that the more dissimilar two languages are the more likely foreign accents become? These are rather
non-trivial questions upon which there is still disagreement.
It is a popular observation that the origin of non-native speakers, i. e. their L1 background, can
be recognized from a characteristic way of speaking. Wode (1980, p. 127) mentions that “there are
specific error types that are simply characteristic of a given L2/L1 combination”. Similar views
can be found in a wide range of literature on foreign language teaching about typical errors and
difficulties for the learners of a particular language. One such “typical error” that has been frequently
examined is the confusion of English /ɹ/ and /l/ by native Japanese speakers; the two sounds represent a
phonemic contrast in English but not in Japanese (e. g. Iverson et al., 2003; Yamada, 1995).
That different native languages lead to language-specific patterns of foreign accents was
experimentally demonstrated, for example, by Ioup (1984). The study she describes aimed to show that
the native language of a speaker influences an L2 system mainly at a phonological level (and not at
a syntactic one). One of her findings was that native speakers are able to group non-native
speakers according to their respective native languages based on phonological cues, i. e. on their foreign
accent. Thus, different native languages lead to recognizably different patterns of foreign accents.
The effect that a person’s native language can have in shaping his or her foreign accent in an L2 is
attributed by some authors to a phenomenon called transfer (Ioup, 1984; see section 4.3 for further
discussion).
Descriptions of frequently observed mispronunciations of L2 learners come for example from foreign
language teachers (see e. g. Ortmann, 1976). However, it is not possible to draw general conclusions
from such effects of L1 background on L2 phonetics, i. e. its segmentals, syllable structure or prosody.
² Though extremely rare, there are cases where a person (after serious brain damage) appears to be speaking his
or her native language with a “foreign accent”. This frequently discussed, so-called foreign accent syndrome will not
be considered here, as it is a pathological phenomenon which goes beyond the scope of this thesis and is generally
not the focus of foreign accent research.
Wode (1980) points out the great pronunciation variability found not only among L2 learners.
The observed typical pronunciation errors have to be interpreted as “marking the range of the
phonological variation of learners” (Wode, 1980, p. 127) rather than actual error predictions.
So, the answer to the first question, whether a given L1 background affects a speaker’s L2 foreign
accent, seems to be yes. However, whether this takes place in a precisely predictable way cannot be
answered with certainty. It is not just a popular cliché that the native language of a speaker
can often be recognized from a characteristic foreign accent. The L1 background of a speaker affects
L2 foreign accent in a way characteristic of the speakers of that L1. It remains to be determined to what
degree this general observation can be used to predict particular phonetic deviances that a learner
of the L2 is likely to make. As Piske et al. (2001, p. 193) emphasize, the effect of L1 on degree of
foreign accent remains uncertain.
2.4.1 Language distance
Is there a correlation between the phonetic or phonological similarity of two languages and degree of foreign
accent? If there were predictable difficulties or limits on ultimate attainment for the learner of a
language, this would provide a means of predicting a characteristic foreign accent.
That difficulties in learning to pronounce an L2 are not independent from the native language can
be concluded from the above cited finding that speakers of the same native language realize an L2
with similar foreign accents – apparently specific to their L1. Wode compared “typical errors” made
by learners of English L2 with different L1 which showed that the mispronunciations made by the
L2 learners depend on their L1 (Wode, 1980, p. 133).
It is a widespread popular belief that the more similar two languages are, the more easily the foreign
language is learned. And indeed, the view that typological language distance affects (a) the rate
of acquisition and (b) the ultimate attainment is supported by second language research literature
(Long, 2005). Brière (1966) showed in an experiment that target language sounds which have close
equivalents in the L1 of the learner – either phonetically or on a phonemic level – are easier to
learn than sounds that do not have such equivalents. The opposite observation is reported by Flege
and Hillenbrand (1984), who conclude from a study with native English learners of French that
French /y/, a sound that has no counterpart in English, was produced with relatively great accuracy.
On the other hand, the French /u/ sound was produced incorrectly by a group of “inexperienced”
learners, and somewhat better, albeit still incorrectly, by “experienced” learners. Similar results were
obtained for French /t/: English learners failed to produce it with the short-lag VOT values typical of
monolingual native speakers of French. These findings contradict the aforementioned assumption
that the more similar the L2 sounds are to L1 sounds, the easier they may be to learn.
Flege et al. (1995) cite earlier findings where Chinese speakers of English, with an average age of
arrival in the United States of 7.6 years, were rated³ significantly lower for their spoken English
sentences compared to native English speakers, whereas native Spanish immigrants, with an average
age of arrival of 6, received ratings not significantly different from native speakers (compare section
2.6.2). Some authors attribute such differences between groups of learners according to their L1 to
the typological language distance between the two languages (e. g. Long, 2005).
Mildner and Horga (1999) point out that the typological differences between the sound systems of L1
and L2 may tell something about possible areas of difficulty in L2 acquisition, “but not necessarily
about the cues in L2 perception or production that non-native speakers may use differently than
the native ones”. In a study on the relations between L2 proficiency and the acoustic features of
³ Compare section 5 for methodological issues.
vowels, they examined two languages that are typologically distant with respect to their vowel spaces. The
reported study examined the production of English vowels by native Croatian speakers. English,
i. e. Received Pronunciation, which was the examined variety of English, has eleven monophthongal
vowels while Croatian has only five. One of their conclusions is that native speakers of Croatian
reorganize the English vowel space according to Croatian principles. Regardless of their proficiency
in English, the Croatian speakers relied heavily on duration in distinguishing between the English
vowels /i/, /u/ and /ɪ/, /ʊ/ (where Croatian has only two categories /i/ and /u/) and not only
on spectral features like native English speakers. The level of proficiency in English was found to
have “no significant effect on either the position of most vowels in the vowel space or the trade-off
between duration and quality cues”.
In reference to a possible hierarchy of learning difficulty, Brière (1966, p. 795) points out that such
hierarchies must be based on “exhaustive information at the phonetic level, rather than on descriptions solely in terms of distinctive features or allophonic memberships of the phoneme classes”. He
mentions the problems of comparing “convergent” and “divergent categories” of L1 and L2 based
on allophonic descriptions. At the phonetic level, sounds of the L2 “are never really equal” to L1
sounds. The above mentioned difference between English /t/ and French /t/ is one such example.
The study of foreign accent (and L2 acquisition in general) in conjunction with contrastive analyses
of particular L1-L2 combinations led, for example, to the contrastive analysis hypothesis and to
theoretical explanations which attribute foreign accent phenomena to phonetic transfer and interference
(see section 4.3 for further discussion). Flege uses his “speech learning model” to explain findings
indicating that similar sounds in L1 and L2 are more difficult for the learner than dissimilar
sounds (see section 4.5).
Whether an appropriate measure of language distance can be formulated is not clear. Evidence from
existing studies is not conclusive as to whether such constructed hierarchies of difficulty would reveal
that more similarity increases or decreases difficulty of learning and thus contributes to
degree of foreign accent. If language distance is a factor in degree of foreign accent, this influence
might be formulated in terms of a function of the variables L1 and L2, permitting predictions for
the shape or degree of foreign accent. Existing research on that issue suggests that a relationship
between L1, L2 and degree of foreign accent exists. However, whether findings from individual
studies are generalizable is debatable, since only a few language combinations have received
in-depth attention so far. Most studies examined English as the target L2. Other often examined
target languages are French, German, Spanish, Hebrew and Dutch. The list of the examined subjects’
L1s includes, apart from the already mentioned languages, the following ones: Arabic, Chinese
(Mandarin, Taiwanese), Italian, Japanese, Korean, Persian, Russian, Swedish, Thai and Turkish (see
e. g. Long, 2005; Piske et al., 2001). Compared to the large number of existing natural languages and
the range of possible L1-L2 combinations, this is only a small sample. The sample is even smaller,
considering the fact that not even all of the possible combinations from the above mentioned list of
languages have been thoroughly examined so far.
2.4.2 L1 proficiency and influence of L2 on L1
A phenomenon related to the above mentioned considerations is the influence of an L2 on a speaker’s
L1 which might result in a partial or complete loss of L1 proficiency. This phenomenon is called
individual first language attrition.
Piske et al. (2001) assume that a high level of proficiency in L1 is more likely to be maintained if a
speaker uses it frequently, even after living a long time in a predominantly L2 speaking surrounding.
They also state, related to this issue, that the self-estimated L1 proficiency is significantly correlated
with degree of foreign accent, such that higher self-estimated L1 proficiency correlates with higher
degrees of foreign accent. This variable, however, does not affect degree of foreign
accent independently of age of learning (see below).
On the other hand, Guion et al. (2000) found no differences in L1 proficiency, although their subjects
(Quichua-Spanish and Korean-English speakers⁴) varied in amount of L1/L2 use.
An L2, i. e. exposure to it in an L2 speaking surrounding or the use of it, can change the production
of a speaker’s L1. Such findings are cited for example by Piske et al. (2001).
Differences in a speaker’s L1 are by definition not the subject of interest in foreign accent research.
However, the effect of L2 on a speaker’s L1 might have general implications that have to be taken
into account in studies on foreign accent. This is related to the already mentioned problem of
defining criteria for native-likeness. Methodological implications are discussed in section 5.5.
The possible effects of L1 use will be discussed in the following section.
2.5 Language use patterns
Besides general L1/L2 language use, several contextual settings of language use can be distinguished:
language use at work, at home (i. e. with the family, the partner, spouse etc.), with friends or in
other social situations and so on.
Flege et al. (1995) report that language use factors account for 15 % of variance in the foreign
accent ratings, making them the second most important factor in their analysis (preceded
only by age of learning, see below). The most important language use factor for male subjects was
language use at work, followed by social use. For female subjects the most important factor was,
according to this report, social use, followed by home use (Flege et al., 1995, p. 3131). Piske et al.
(2001) found that continued frequent L1 use contributed to a significantly stronger foreign accent,
a result also reported by Flege et al. (1997). The same effect was found for late learners as well as
for early learners, i. e. bilinguals. Even temporal changes in language use patterns are reported to
affect speech production, which leads to some implications for methodological issues (see chapter
5). They found that self-estimated L1 use is another independent predictor of foreign accent. Note
that the earlier mentioned self-estimated L1 proficiency was not found to be an independent factor
in the same study.
Guion et al. (2000) confirmed the conclusion that amount of L1 use affects L2 production. The
examined Quichua-Spanish speakers who used their L1 more frequently had significantly stronger
accents than the examinees with lower L1 use. Flege et al. (1997, p. 184) even theorized that the
presence of another language may have been the most important difference between the examined
monolingual native English speakers and the Italian-English speakers (and not their age of learning
English). Other studies, on the other hand, found no significant effect of language use on degree of
foreign accent (cited in Piske et al., 2001).
It has to be mentioned that the hypothesized effects of language use concern not only degree of foreign
accent in an L2 (or general L2 proficiency), but also proficiency in L1. The phenomenon of first
language attrition has already been briefly addressed in section 2.4. In fact, language use is said to
be one major cause of L1 attrition. However, the domain of phonetics and phonology has received
only little in-depth attention in the research on first language attrition (Seliger and Vago, 1991).
4
Guion et al. (2000) refer to the examined speakers as “bilinguals”, but according to the definition adopted for this
thesis, this is not appropriate, as the speakers in the first-mentioned group all “learned Quichua as a first language
at home and later learned Spanish as an L2 when they began school or work”, thus, later than “early childhood”.
The Korean-English speakers do not meet the criteria either (compare chapter 1.1).
2.6
Exposure to L2 surrounding and amount of L2 experience
In considering the influence of the exposure to an L2 speaking surrounding on L2 learners’ pronunciation, two important factors can be observed: (1) the amount of exposure, and (2) the age at
which a learner is first exposed to an L2 surrounding. The amount of exposure is often measured
by the length of residence (LOR) of a learner in an L2 speaking surrounding. The LOR is also
supposed to be a testable indicator of amount of L2 experience (Piske et al., 2001). The age of first
exposure is usually measured by an immigrant’s age of arrival in an L2 speaking country.
2.6.1
Length of residence
The length of residence (LOR) of a non-native speaker in an L2 speaking surrounding is reported
by some authors to affect degree of foreign accent. It is the second most frequently studied variable
on degree of foreign accent (after AOL, see below). A significant influence of LOR on foreign accent
ratings by native listeners is reported for example by Flege et al. (1995). On the other hand, several
studies provide contradicting findings (reviewed e. g. by Long, 1990, or Piske et al., 2001). A possible
explanation for contradicting findings regarding the effect of LOR is provided by Flege (quoted in
Piske et al., 2001). He hypothesizes that LOR affects L2 learners only in an “initial phase of rapid
learning” (see 2.7 below). According to this view, the L2 learner proceeds faster through early stages
of learning and is in this phase affected by the language surrounding. Studies which did not find a
significant effect of LOR are hypothesized to have examined too narrow a range of LOR values.
In summary, the existing literature indicates that the effect of LOR seems to decrease with the
increase of a learner’s level of proficiency. For highly experienced learners, additional years of residence are unlikely to change degree of foreign accent significantly. However, this does not at all
mean that there are no changes in L2 (or L1) proficiency over time (Piske et al., 2001).
2.6.2
Age of arrival
The age of first exposure to an L2 is often equated with the age of an individual’s arrival in a
predominantly L2 speaking country. Many studies have examined the proficiency of immigrants in
the dominant language of their new country. The age of first exposure of immigrants to an L2 surrounding
is commonly called “age of arrival”, or AOA for short.
Sometimes it is assumed (or simplified) that the AOA also marks the onset of L2 acquisition
(see next section), so these two variables are equated and used interchangeably (e. g. Flege et al.,
1995).
2.7
Age of L2 learning (AOL)
The age of an individual at the beginning of second language learning, or age of learning (AOL) for short,
is by far the most frequently cited and examined factor in foreign accent research, and according to some
authors the most important one (Flege et al., 1995; Long, 1990; Piske et al., 2001).
A common view is that “earlier is better” and, it can be added, that “later is faster” (Long, 1990,
2005). In other words, AOL affects not only the ultimate attainment but also the rate of L2 learning.
In early stages of learning, older learners proceed faster than younger ones. On the other hand, the
earlier in life a child starts with L2 acquisition the higher the level of ultimate attainment will be,
including the ability to achieve accent-free pronunciation. However, even an early start before the
age of four is no guarantee for a native-like pronunciation of an acquired L2. Flege et al. (1995, p.
3128) examined native listeners’ judgments on the pronunciation of English sentences of 240 native
Italians (with a control group of 24 native speakers). AOL5 was found to be the most important
factor accounting for an average of 59 % of variance in the foreign accent ratings. With an AOL of
less than 4 years the percentage of non-native speakers who received native-like ratings was 78 %.
On average, the ratings of the speakers’ pronunciation were significantly lower only above an AOL
of 7.4 years and they decreased with increasing AOL. None of the speakers with an AOL above
16 years received native-like ratings.
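The notion that a factor such as AOL “accounts for” a share of the variance in accent ratings can be illustrated with the coefficient of determination (R²) of a simple linear regression. The following Python sketch is only an illustration: the function name variance_explained and all data values are invented for this example and are not taken from Flege et al. (1995), whose statistical procedure may well have differed.

```python
from statistics import mean


def variance_explained(x, y):
    """Return R^2 of the least-squares line y = a + b*x.

    For a single predictor, R^2 equals the squared Pearson correlation and
    gives the share of the variance in y that is explained by x.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return (sxy ** 2) / (sxx * syy)


# Hypothetical data (NOT from any cited study): age of learning in years and
# mean accent rating per speaker, where higher ratings are more native-like.
aol = [3, 5, 8, 11, 14, 17, 20]
rating = [8.5, 8.7, 7.0, 6.1, 4.9, 4.2, 3.0]

r2 = variance_explained(aol, rating)
print(f"AOL accounts for {100 * r2:.0f} % of the variance in these invented ratings")
```

With several predictors (e. g. AOL and language use together), a multiple regression would be needed to separate their contributions; the single-predictor case is shown here only because it makes the meaning of “variance accounted for” easy to see.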
Contradicting evidence of higher pronunciation abilities of late learners compared to early learners
is explained by the initial rate advantage of late learners over younger ones. It is argued that studies
suggesting better performance of late learners were actually examining learning rate, confusing it
with ultimate attainment (Long, 1990, 2005).
Long (1990, 2005) argues that only young learners can attain a native-like level of proficiency, but will not
necessarily do so. After an AOL of six the achievement of accent-free pronunciation begins to become
unlikely for most learners. With an AOL above twelve, native-like attainment in L2 pronunciation
is said to be generally impossible. Long concludes these age limits from reviewing the findings of
various studies. Other authors provide different AOL above which accent-free pronunciation, i. e.
native-like ultimate attainment, is said to be unlikely or impossible. The general effect of AOL on
degree of foreign accent (“earlier is better”) is nevertheless accepted by most researchers.
Older learners on the other hand have an initial rate advantage over younger ones. They acquire
early stages of an L2 faster than younger learners. According to Long (1990) this effect lasts only
for a shorter period in the acquisition of phonology compared to the acquisition of other linguistic
domains, and, to be measurable, a minimum amount of exposure to the L2 sounds is needed. This
initial advantage does not guarantee successful acquisition, however. Late learners usually do not
succeed in acquiring L2 phonology without a foreign accent. Neither does an early start guarantee
native-like performance compared to monolingual native-speakers (Flege et al., 1997).
AOL is a factor influencing L2 acquisition in general, but the effect is not the same in all linguistic
domains. The most often examined domains in L2 acquisition are morphology and syntax. According
to Long (1990) the upper limit of native-like attainment in the domains of morphology and syntax is
around age 15 (i. e. later than the proposed AOL limit for native-like attainment in pronunciation).
The effect of AOL on other domains, like semantics, pragmatics or lexis, still needs to be determined.
Although such findings are important for a theoretical explanation of the observed influence of AOL
on foreign accent and L2 acquisition in general, they go beyond the scope of this thesis. An overview
of proposed theories explaining the foreign accent phenomenon and its relation to AOL is given in
chapter 4, especially in the section on the critical period hypothesis.
2.8
Speaker-independent factors
Besides factors dependent on speaker characteristics, some speaker-independent factors have also
been suggested to affect degree of foreign accent. However, these factors deal with the perception
of the listener rather than the speech production of the speaker. As only the latter is within the
focus of this thesis, such factors cannot be discussed here in detail. Only a short overview will be
given.
5
Flege et al. (1995) use AOA and AOL interchangeably (compare section 2.6.2).
The previous sections focus on speaker-dependent factors, i. e. factors that depend on the language
experience of the L2 learner or other individual characteristics. Such factors, like a speaker’s L1
background or age of learning, have been identified to affect a non-native speaker’s speech production
and thus to contribute to that speaker’s L2 foreign accent. Besides such speaker-dependent factors,
experimental studies have revealed that the perception of foreign accent is not exclusively based on
the “many discrete and general differences in sounds produced by native and non-native speakers”
(Flege, 1987a). The degree of foreign accent does also depend on the listener’s perception.
Levi et al. (2007) for example examined the effects of listening context and lexical frequency on
the perception of foreign-accented speech. Speech samples were presented to the listeners in two
different contexts: one exclusively auditory and one combined context with additional orthographic
display of the spoken samples. They found that high frequency words, i. e. words occurring more
often in a language6, are consistently perceived as less accented than low frequency words. This effect
was attenuated by additional orthographic presentation of the spoken words to the listeners. The
“auditory+orthography context” had also the effect that native speakers were generally perceived
as less accented and non-native speakers as more accented. Additional acoustic analysis of the
speech samples confirmed the existence of differences between native and non-native speakers. These
acoustic differences, however, were not correlated to lexical frequency and thus did not account for
the observed perceptual effects.
Other speaker-independent factors that were found to affect the perception of foreign accent are (1)
the resolution of the used rating scales, (2) the elicitation techniques, (3) the proportion of native
speakers among the examinees, (4) the range of L2 pronunciation proficiency included in the rating
set, and, finally, (5) the linguistic experience of the listeners.
Southwood and Flege (1999) suggest that rating scales with fewer intervals may produce ceiling
effects. This means that scales which are not sufficiently sensitive, i. e. scales which have too few
points, cannot be used to reveal small differences between native and non-native speakers.
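The ceiling effect described above can be made concrete with a small Python sketch. It assumes, purely for illustration, that each speaker has an underlying pronunciation score between 0 and 1 which a listener maps onto a scale with equal-width intervals; the function rate and the two scores are hypothetical and do not stem from Southwood and Flege (1999).

```python
def rate(score, points):
    """Map an underlying score in [0, 1] onto an integer scale 1..points.

    The scale is assumed to have equal-width intervals; a score of exactly 1.0
    is assigned to the highest scale point.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(points, int(score * points) + 1)


# Hypothetical underlying scores of a near-native and a native speaker.
near_native, native = 0.80, 0.95

for points in (3, 9):
    print(points, "point scale:", rate(near_native, points), "vs.", rate(native, points))
```

On the coarse 3 point scale both speakers end up at the top of the scale, so the difference between them disappears, whereas the 9 point scale still separates them.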
Different elicitation techniques also affect the perception of foreign accent. Long (2005, p. 302)
questions the validity of data obtained from limited, controlled samples of performance as indicators
of the overall L2 abilities. With respect to the measurement of foreign accent, on the other hand, it is
argued that samples above the level of isolated words could be distorted by phenomena
from other linguistic domains like prosody, morphology, syntax or lexis. The general rule is that the
more natural and language-like a speech sample is, the less native-like it is likely to be perceived or
rated by native language listeners.
According to Flege and Fletcher (1992) the proportion of native speakers among the group of
speakers under investigation affects the degree of perceived foreign accent. The more native speakers
are included, the more accented the non-native speakers are perceived to be. A similar problem,
stressed by Long (2005), is the varying level of proficiency of the speakers included in the data which
are to be rated. He states that judges “may be fooled into accepting some of the near-native samples
as native” because of the presence of obviously non-native samples. Flege and Fletcher conclude
that ratings of foreign accent are not absolute, but influenced by the range of the talkers’ levels of
proficiency included in the data which are to be rated by the listeners.
It has been found that linguistically inexperienced, “naïve”, listeners tend to perceive a stronger
degree of foreign accent than linguistically trained listeners like linguists or foreign language teachers
(Piske et al., 2001).
6
The frequencies of words are usually derived from language corpora. Levi et al. (2007) used the CELEX database.
The above cited results suggest that there are factors influencing the perception (or the rating) of foreign
accent which are independent of a speaker’s speech production. Even though such
speaker-independent factors are not relevant for the acoustic analysis of foreign accented speech,
they have implications for the theoretical description and the understanding of the foreign accent
phenomenon. They also have implications for methodological issues regarding experiment designs
in foreign accent research (see section 5).
2.9
Summary
If one thing can be concluded from the various findings in L2 research, it is that multiple speaker-dependent factors affect degree of foreign accent.
Gender may be a factor, but its exact influence on foreign accent is a matter of disagreement among
L2 acquisition researchers. Similar conclusions can be drawn for factors like motivation, language
learning aptitude or formal instruction.
Further, it has been suggested that contrastive analyses of the L1 and L2 might reveal sources of
predictable difficulties for the L2 learner which contribute to degree of foreign accent. There is no
doubt that a speaker’s L1 affects foreign accent. The exact nature of this influence and the role of
language distance or phonetic similarity between L1 and L2 are still a matter of disagreement.
Early learners are more likely to speak an L2 accent-free, but do not necessarily do so. In general,
overall degree of foreign accent increases with increased AOL. The proportion of L1/L2 language
use does also affect degree of foreign accent. The more a speaker uses his or her L2, the less it is
likely to be foreign-accented. Another important factor seems to be the age at first exposure to a
predominantly L2-speaking surrounding as well as the length of residence of a learner within such a
surrounding. In general, the lower the AOA and the longer the LOR, the lower the degree of foreign
accent is expected to be.
In addition it has been briefly mentioned that there are speaker-independent factors affecting degree of
perceived foreign accent. Such factors are for example (in experimental settings): differences
in the used rating scale, the elicitation techniques, the context and composition of data which are
presented to the listeners, and the listeners’ linguistic experience. Such factors might not be relevant
in acoustic examinations of foreign accent but they have to be considered by the experimenter in
order to eliminate unwanted influences.
In summary, foreign accent has to be seen as a relative phenomenon that changes over time and is
dependent on various speaker-dependent variables and characteristics of the L1 and L2, as well as
on speaker-independent factors, i. e. on the listeners’ perception and the surrounding circumstances.
Chapter 3
Phonetic and phonological manifestations of foreign accent
This chapter provides a short overview of acoustic, i. e. phonetic and phonological, manifestations of
foreign accent and the research on this matter. In section 1.1.7 it has been mentioned that judgments
on foreign accent are usually based on the overall impression of a speaker’s pronunciation. Such
impressionistic judgments do not explicitly refer to the various deviances in the speech of the non-native speaker. Often, aspects like intelligibility or acceptability are involved in judgments about a
speaker’s degree of foreign accent.
Anderson-Hsieh et al. (1992) enumerate the main areas of pronunciation which need to be examined
in foreign accent research as follows: segmentals, prosody (suprasegmentals), syllable structure, and
voice quality. From these “areas of pronunciation” only the first is within the scope of this thesis, i. e.
the segmentals. The errors1 that are observed within this domain are usually categorised as either
substitutions or modifications of single sounds. There is, however, no strict distinction between
these two. What counts as a substitution and what as a modification depends on the examined
L1-L2 combination. If a non-native speaker realises an L2 sound in such a way that it resembles
another sound from his or her L1 or L2, then it can be called a substitution – for example the
pronunciation of English [D] as [d]. A modification, then, is a deviance from the L2 norm which has
no (obvious) corresponding L1 sound – for example the realization of plosives with a voice onset
time (VOT, i. e. the time between the release of a stop consonant and the onset of voicing) which
corresponds neither to the L1 of the speaker nor to the L2.
Flege (1987b) observed that “the aim of most instrumental studies has not been to establish which
dimension(s) contribute(s) most importantly to foreign accent, but to determine to what extent2
L2 learners [. . . ] differ from native speakers”. In other words, the focus of studies where acoustic
deviances are measured lies on the magnitude of the examined deviances, and not on the question of
which acoustic dimensions exactly contribute (most) to the perception of a foreign accent.
Brennan et al. (1975) report on a study which suggests that the degree of foreign accent is correlated
to the amount of segmental mispronunciations, i. e. the amount of segmental deviances from the
native speakers’ norm (see chapter 5 for a further discussion of this study by Brennan et al.).
1
The term “error” is widely used in the literature to refer to deviances from the norm of the examined L2. Usage
of this term in this thesis does not imply reference to an idealised, prescriptive pronunciation, however.
2
Emphasis in the original.
Flege (1987b) emphasizes that acoustic measurements can in some cases reveal information about
a non-native speaker’s language knowledge – about a categorical, i. e. phonological, contrast for
example – which might not be perceived by the listeners. Learners might produce a systematic,
measurable difference between sounds which belong to two different phonological L2 categories, but in a
different way than native speakers do: either along a different phonetic dimension or on a different
scale, both of which might be ignored by or imperceptible to native listeners.
3.1
Segmentals I: Consonants
One of the most often examined consonant features in foreign accent research is VOT (e. g. Flege and
Hillenbrand, 1984; Flege, 1987b; Mack, 1989). Other reported mispronunciations of L2 consonants
include all kinds of substitutions, e. g. the substitution of English [D] with [d] by native Italian
speakers (Flege et al., 1995, p. 3132), or the realisation of [T] as [t], [f] or [s] by native German
speakers (Wode, 1980, p. 132).
3.1.1
VOT
Several studies examined VOT values of plosives, e. g. the production of English [p, t, k] by native
Spanish speakers. Flege (1987b) states that these English sounds were realised with VOTs greater
than those of the corresponding Spanish sounds, but nevertheless shorter than the VOT values of
native English speakers. This confirms other observations that the VOT values in the speech of L2 learners
often take intermediate values between the L1 and the L2 norm. If the target language sounds have
longer VOT values than the L1, overshooting can sometimes be observed (i. e. the speakers
produce the respective L2 sounds with VOT values that are too long).
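The distinction between L1-like, intermediate, L2-like and overshooting VOT values can be sketched in a few lines of Python. The sketch assumes that the release and the voicing onset have already been annotated as time points (in seconds) and that the L2 norm has the longer VOT. The function names, the tolerance of 5 ms and the norm values used in the example calls are hypothetical placeholders and not measurements from the cited studies.

```python
def vot_ms(release_s, voicing_onset_s):
    """Return VOT in milliseconds: voicing onset minus stop release.

    Negative values indicate that voicing starts before the release (prevoicing).
    """
    return (voicing_onset_s - release_s) * 1000.0


def classify_vot(learner_ms, l1_norm_ms, l2_norm_ms, tolerance_ms=5.0):
    """Place a learner's VOT relative to the L1 and L2 norms.

    Assumes l2_norm_ms > l1_norm_ms, e.g. a long-lag L2 and a short-lag L1.
    """
    if l2_norm_ms <= l1_norm_ms:
        raise ValueError("this sketch assumes a longer VOT norm in the L2")
    if learner_ms > l2_norm_ms + tolerance_ms:
        return "overshoot"
    if learner_ms >= l2_norm_ms - tolerance_ms:
        return "L2-like"
    if learner_ms <= l1_norm_ms + tolerance_ms:
        return "L1-like"
    return "intermediate"


# Hypothetical norms: 15 ms for the L1 stop, 70 ms for the L2 stop.
token = vot_ms(release_s=0.100, voicing_onset_s=0.145)
print(round(token), classify_vot(token, l1_norm_ms=15.0, l2_norm_ms=70.0))
```

In practice the norms would be estimated from native control groups and the tolerance from their variability; fixed numbers are used here only to keep the example short.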
The VOT measurements reported by Flege and Hillenbrand (1984) showed that even experienced
French-English3 speakers produced VOT values which were higher than those typical for native
French speakers (and thus more English-like). The authors speculate that late learners “will never
succeed in producing L2 stops with complete accuracy when stops in their native language differ
substantially in VOT from those in L2” (p. 717). The French speakers who were classified as
proficient speakers of English produced French [t] with higher VOT values than those of monolingual
French speakers, revealing an influence of their L2 on their L1.
Mack (1989) examined the speech of a group of ten English-French bilinguals (mean AOL: 4.5)
who were all “judged native speakers of English” by native speakers (p. 188). In one experiment,
the examinees read English CVC words. The acoustic analysis focused on the VOT values of English [d] and [t]. No significant differences between the English-French speakers and a monolingual
English control group were found. A second experiment with the same examinees focused on the
English vowels [i] and [I]. The measured features were vowel duration and the first three formant
frequency values. The analysis revealed that there were almost no significant differences in pronunciation between the bilingual and the monolingual speakers. The bilinguals did, however, produce significantly
more vowels whose F2 value decreased (by at least 50 Hz) from the midpoint of [i] to its offset.
Mack concludes that the phonetic system of bilinguals “approximates, but does not match, that of
monolinguals” (see following section).
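A criterion such as “F2 decreases by at least 50 Hz from the vowel midpoint to its offset” can be expressed as a simple per-token check. The following Python sketch assumes that F2 has already been measured at both time points for every vowel token; the function names and the formant values are invented for illustration and do not reproduce the data or the procedure of Mack (1989).

```python
def has_f2_drop(f2_midpoint_hz, f2_offset_hz, threshold_hz=50.0):
    """Return True if F2 falls by at least threshold_hz from midpoint to offset."""
    return (f2_midpoint_hz - f2_offset_hz) >= threshold_hz


def proportion_with_drop(tokens, threshold_hz=50.0):
    """Return the share of (midpoint, offset) F2 pairs that show such a drop."""
    if not tokens:
        raise ValueError("no tokens given")
    hits = sum(has_f2_drop(mid, off, threshold_hz) for mid, off in tokens)
    return hits / len(tokens)


# Hypothetical F2 measurements in Hz as (midpoint, offset) pairs.
bilingual_tokens = [(2300, 2240), (2310, 2235), (2280, 2270), (2295, 2230)]
monolingual_tokens = [(2300, 2285), (2310, 2300), (2290, 2230), (2305, 2295)]

print(proportion_with_drop(bilingual_tokens), proportion_with_drop(monolingual_tokens))
```

Whether a difference between two such proportions is significant would have to be established with an appropriate statistical test, which is outside the scope of this sketch.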
The more important finding for the present thesis is that, although all of the bilingual
speakers in the study by Mack were judged as native speakers of English, there were measurable,
significant acoustic differences between the two groups4. The speech of the bilinguals was rated on
a 10 point scale with a mean rating of 9.3. However, the effect of amount or degree of the acoustic
deviances on the ratings (i. e. the degree of foreign accent) was not examined.
3
See notational conventions on page 7.
3.2
Segmentals II: Vowels
Studies on non-native vowel production usually examine vowel duration5 or formant frequency
values (e. g. Flege and Hillenbrand, 1984; Flege, 1993; Levi et al., 2007; Mack, 1989; Mildner and
Horga, 1999). A correlation between vowel quantity and vowel quality can be found in various
languages and these two features are usually examined in combination. This correlation is also
examined in the experiment presented in this thesis (see chapter 6).
Flege and Hillenbrand (1984) analysed samples of French [ty] and [tu] syllables spoken by American
English and French speakers. They measured VOT (see previous section) as well as formant frequencies for F1, F2 and F3. They found that the native American English speakers produced French
[y] better than [u] (compare section 4.5). They also found that the native French speakers (with
English L2) produced French [u] with higher F2 than monolingual French speakers do – resembling
the English [u].
Flege (1993) examined the production and perception of the English word-final /t/–/d/ contrast
by speakers from China and Taiwan and a native American English control group. The measured
acoustic dimension was vowel duration. A general observation is that all speakers produced longer
vowels in the examined /b d/ carrier context than in the corresponding /b t/ context. However, all
non-native speakers produced smaller differences in vowel duration than the native speakers. Although the study revealed a correlation between perception and production, the non-native speakers
were in general more similar to native speakers in the perception of vowel duration differences than
in the production of these differences.
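The size of the vowel duration contrast discussed above can be quantified as the difference between the mean vowel duration before /d/ and before /t/. The Python sketch below uses invented durations for one native speaker and one learner; the function name duration_contrast and all values are hypothetical and are not taken from Flege (1993).

```python
from statistics import mean


def duration_contrast(before_d_ms, before_t_ms):
    """Return mean vowel duration before /d/ minus mean duration before /t/ (ms)."""
    return mean(before_d_ms) - mean(before_t_ms)


# Hypothetical vowel durations in milliseconds.
native = duration_contrast([250, 262, 241], [160, 171, 155])
learner = duration_contrast([205, 214, 199], [178, 185, 170])

print(f"native: {native:.1f} ms, learner: {learner:.1f} ms")
```

In this invented example both speakers lengthen the vowel before /d/, but the learner's contrast is smaller than the native speaker's, which is the pattern described in the text.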
In their earlier mentioned study, Mildner and Horga (1999) examined the relations between proficiency in English of native speakers of Croatian and their acoustic vowel spaces of English vowels.
Recorded speech samples of a group of 20 native-speakers of Croatian were rated for proficiency by
“10 university professors of English” (with unspecified language backgrounds). The ratings were then
compared to the results of acoustic analyses of vowel formant values and durations. Statistically
significant differences between F1 or F2 values in the speech of the examinees and the English norm
(Received Pronunciation) were found in 8 out of 11 vowels. Vowel duration was significantly
different from the English norm, and the non-native speakers mainly used duration to distinguish
between English /i/, /u/ and /I/, /U/ – regardless of the respective speakers’ level of proficiency.
3.3
Phonotactics
Mispronunciations or deviances from the language norm in the domain of phonotactics (i. e. the possible sound sequences) and syllable structure involve insertion, deletion or metathesis (reordering)
of sounds. Flege et al. (1995) for example mention mispronunciations like the insertion of “schwa-like
sounds” at the end of “red”, or omission of word-final consonants in “good” or “carrots” by native
Italian speakers.
4
There were also significant differences in perception, which are not discussed here. However, the results were similar
for both perception and production, and the two groups were “nearly indistinguishable”.
5
With respect to vowels, the terms duration, length and quantity will not be distinguished in this thesis. It should
be noted however, that the terms duration and length are frequently used to refer to articulatory, auditory or
acoustic aspects, while quantity is used to refer to the respective phonological feature of vowels (Ramers, 1988).
A series of three experiments by Altenberg (2005) examined judgement, perception and production of English word-initial consonant clusters by native Spanish speakers. The non-native speakers
“behaved, overall, like the native English speakers” in the judgment task, which suggests comparable linguistic knowledge in both groups (within this restricted domain of word-initial consonant
clusters). As is the case in the other studies cited here, Altenberg (2005) did not examine the
contribution of the individual “types of modification” to the overall pronunciation rating. The mispronunciations were determined through phonetic transcription by two linguistically trained judges. Of
a total of 88 production errors, 79 were made in consonant clusters that are phonotactically permissible in English (the speakers’ L2) but not in Spanish (the speakers’ L1) – the majority of which
involved word-initial epenthesis (insertion) of [O], [E], [e] or [P]. Epenthesis of a vowel in English
word-initial consonant clusters by native Spanish speakers is a frequently reported phenomenon –
e. g. the pronunciation of “school” as [Eskul].
An interesting finding of Altenberg’s production experiment is that the production (but not the
perception) of word-initial consonant clusters correlates with overall pronunciation proficiency, i. e.
with the degree of foreign accent.
Another reported error in the domain of syllable structure is devoicing of word-final consonants.
This is a frequently observed phenomenon and is reported for non-native English speakers, for
example native German or Italian speakers (Flege et al., 1995).
3.4
Suprasegmentals
Piske et al. (2001) note that most studies on foreign accent have focused on segmental phenomena
and that only a few have examined suprasegmentals (e. g. Jilka, 2000).
In their above cited study, Anderson-Hsieh et al. (1992) examined the relation between the ratings which native speakers assigned to L2 speakers’ speech samples and the general deviance in
segmentals, prosody and syllable structure. They investigated previously recorded SPEAK Test6
data of 60 speakers from varying language backgrounds of varying levels of proficiency. From the
available test data, only a reading passage was used to ensure that only pronunciation skills were
evaluated. The samples were rated for pronunciation and phonetically analysed. Only overall ratings
for segmentals, prosody and syllable structure were determined. Single deviances from the native
speaker norm like wrong VOT or formant values were not explicitly examined. Their conclusion
from the statistical analysis can be summarised as follows: from the three examined domains of
pronunciation, prosody is most strongly associated with the rating of pronunciation. These results
are consistent with two of three earlier studies cited by Anderson-Hsieh et al. in the same report.
However, only a few studies so far examined the contribution of prosody to degree of foreign accent.
Jilka (2000) examined the contribution of intonation to the perception of foreign accent, analysing
German speech of native American English speakers and American English speech of native German
speakers and the perception of these productions. He concludes that “intonation is by far the most
important prosodic factor contributing to foreign accent in relation to other prosodic factors such
as rhythm or speaking rate”. In comparison to segmental foreign accent, intonational aspects were
found to be clearly “of lesser importance” (p. 175).
6
The Speaking Proficiency English Assessment Kit, or SPEAK Test for short, as referred to by Anderson-Hsieh
et al., is a test that uses forms from the Test of Spoken English (TSE) developed by the Educational Testing
Service (http://www.ets.org). It is used to assess a speaker’s English speaking and comprehension skills like
comprehensibility, pronunciation, grammar and fluency and it is not explicitly focused on the foreign accents of
the examinees.
Piske et al. (2001, p. 212) on the other hand conclude – after providing a detailed review on existing
literature on foreign accent research – that the examined evidence “does not allow one to quantify the
relative contribution of segmental parameters, prosodic parameters and fluency to degree of foreign
accent in an L2”. They emphasize the close relation between segmentals and suprasegmentals and
how it is “difficult to draw a clear distinction between the two”. Despite the fact that most studies
have examined segmentals, the exact contribution of individual phonetic deviances in the production
of segmentals to (perceived) foreign accent seems to be unclear.
Flege et al. (1995, p. 3132f) cite an experiment in which digitally processed recordings
of speech samples were presented to native English listeners for foreign accent ratings. The original recordings
contained sentences spoken by native Italian and native English speakers. The processed sentences
“preserved only amplitude and F0 variations”. The unprocessed recordings of the native speakers
received – as expected – better ratings than the recordings of the non-native speakers. Interestingly,
this was also the case for the processed recordings. Flege et al. conclude that “prosodic dimensions
in the NI [native Italian] subjects’ production of English sentences were sufficient to cue foreign
accent”.
3.5 Voice quality
Anderson-Hsieh et al. (1992) note that, in comparison to other domains of pronunciation, voice
quality has not been well examined in second language or foreign accent research. Aspects of voice
quality are (if at all) primarily addressed in the context of vowel distinctions (i. e. the articulatory
settings). This issue is further discussed in chapter 6, which describes an experiment on German
vowel production by non-native and bilingual speakers.
Unfortunately, no literature on voice quality in foreign accent research could be reviewed for this
thesis. Besides such general remarks as cited above, voice quality seems to have received only little
attention in foreign accent research.
3.6 Summary
Brennan et al. (1975, p. 35) suggest that a study of the correlations between degree of foreign accent
and the relative frequency of various deviances in L2 speech “would indicate which features serve as
the strongest cues of accentedness to listeners”. Findings like those cited here suggest that not all
measurable acoustic deviances from the L2 norm (in the production of segmentals) are necessarily
perceived as manifestations of a foreign accent. Which types of mispronunciations contribute to the
overall degree of foreign accent cannot be concluded from the reviewed literature. Studies which
focus on specific acoustic dimensions seem not to be as much concerned with overall degree of
foreign accent as are the numerous studies on “external” factors cited in the previous chapter.
Unfortunately, the literature provides no more than a vast collection of (too) specific case
studies, which is not enough to draw general conclusions of the kind suggested by Brennan et al. above.
Although the domains of segmentals and syllable structure received the most attention in foreign
accent research, studies on prosodic deviances in non-native speech indicate that prosody is an
important factor affecting the perception of foreign accent. According to some studies, it might
even contribute more to degree of foreign accent than the domain of segmentals does.
Chapter 4
Theories on foreign accent
In the previous chapters, various observations and findings of foreign accent research have been
discussed. This chapter provides a short overview of various theoretical explanations of how and why
foreign accents emerge, and of models and hypotheses associated with the foreign accent phenomenon.
As foreign accent can be seen as a specific phenomenon within the broader field of second language
acquisition research, there are several theoretical frameworks which do not explicitly focus on foreign
accent but are nevertheless important for the theoretical approach toward this phenomenon.
First, the most general theoretical frameworks (with respect to foreign accent), universal
grammar and the critical period hypothesis for language acquisition, are discussed.
4.1 Universal grammar and foreign accent
The interlanguage system of an L2 learner is, according to Major (2001), composed of parts from
the learner’s L1 system, parts from the target L2 and linguistic “universals”.
Such universals of second language acquisition include, for example, effects such as overgeneralisation,
simplification or overdifferentiation. In general, all those errors which an L2 learner makes that
cannot be attributed to L1 transfer are called universals (see section 4.3 for a discussion of language
transfer).
One theoretical framework which is concerned with (second) language acquisition claims that one
part of these linguistic universals can be described by the concept of a universal grammar (UG).
This framework addresses phenomena in early language development such as the apparent comprehension of words which the child cannot or does not yet imitate, or the imitation of words which the
child apparently does not understand. Children all acquire their first language at approximately the
same age and go through the same stages of development, regardless of the varying circumstances
they grow up in or the various languages they are exposed to (Lenneberg, 1967; Long, 1990; White,
1989). The competence which children acquire seems to go beyond the input they receive – which
is, according to White, underdetermined, often degenerate, and which does not contain negative
evidence (White, 1989, p. 4ff).
The proposed solution to such problems is that some fundamental language knowledge, i. e. some basic linguistic competence, does not have to be acquired by the child but is innate. These innate fundamental
linguistic structures supposedly underlying all natural languages are called universal grammar¹.
Two aspects of UG are of interest for foreign accent research: (1) the part of UG which is concerned
with the phonetic form, called universal phonetics (Chomsky, 1967), and (2) the predictions which
UG makes for L2 acquisition and the explanations it offers for its problems.
The general view in UG research today is that UG is involved in L1 as well as in L2 acquisition
but that it operates in different ways, i. e. UG (as an underlying so-called learning device) is only
partly accessible in L2 acquisition – either directly or via the L1. One explanation for the differences
between child and adult language acquisition is that UG is utilised in competition with other general
problem-solving abilities. These general problem-solving abilities are neither powerful enough nor
restrictive enough for the acquisition of linguistic structures of a natural language, it is argued.
Restrictiveness has been shown to be important in L1 acquisition. It limits the range of possible
language structures, aiding in the acquisition of native language competence from underdetermined
input. The reliance of adult learners on general problem-solving abilities and not on UG leads to
“inefficient, slower and incomplete learning” (Long, 1990).
Meisel points out that “. . . until the relationship between UG and other factors is spelled out more
clearly, it is virtually meaningless to state that UG is a learning device”. He concludes, referring, in
fact, more to syntax than to phonology, that UG is not directly accessible to the second language
learner in the same way as it is accessible in L1 development (Meisel, 1991).
Although UG might provide theoretical explanations of foreign accent, the majority of UG research
seems to be concerned primarily with syntax and morphology – and not with L2 pronunciation and
foreign accent. Long (1990) argues that cognitive explanations for the differences between child
and adult language acquisition, like the hypothesis of UG competing with general problem-solving
abilities, fail to account for the different constraints on different linguistic domains (compare sections
2.7 and 4.2).
4.2 The critical period hypothesis
Varying theories and explanations for the differences between child and adult language acquisition have been proposed under the notion of the critical period hypothesis for language acquisition
(CPH). As formulated by Lenneberg (1967), the CPH states that there is a critical period for the
successful acquisition of language. This period is limited by “cerebral immaturity” at its beginning
and “termination of a state of organizational plasticity linked with lateralization of function” at its
end (Lenneberg, 1967, p. 176). The CPH attributes changes in the ability of language acquisition
to the biological development of the human being (namely of the brain) and in that way it postulates
maturational constraints on successful language acquisition. Thus, the CPH predicts a fundamental,
biologically determined difference between first and second language acquisition. Lenneberg states
that a language is acquired automatically from mere exposure during the critical period, while after
the end of that period a language has to be learned consciously.
The proposed end of the critical period is subject to much debate in language acquisition research
and its literature. The end of the critical period is used to divide L2 learners into two distinct
groups: early learners with AOL below the end of the critical period and late learners with AOL
beyond that limit.
With respect to foreign accent, it is this end of the critical period that has been the focus of many
of the studies on ultimate attainment (Bongaerts, 2005). Applied to the area of L2 pronunciation,
the prediction of the CPH is that the ability for complete acquisition of a phonological system
is irreversibly lost after the critical period has passed, and so a foreign accent will inevitably be a
feature of the speech of late learners. Lenneberg points out that “foreign accents cannot be overcome
easily after puberty”. Long (1990) concludes that “the ability to attain native-like phonological
abilities in an SL [= L2] begins to decline by age 6 in many individuals and to be beyond anyone
beginning later than age 12, no matter how motivated they might be or how much opportunity they
might have”.
¹ Note that the term universal grammar is used to denote both the proposed linguistic system, i. e. the “grammar”,
and the corresponding theoretical framework.
Birdsong and Molis (2001) summarise the various versions of the CPH in three basic criteria which
experimentally obtained data has to meet in order for the CPH to be true:
• Prior to the end of the critical period the level of attainment in L2 learning should be negatively correlated with the AOL. After the critical period has passed there should be no
correlation between AOL and level of attainment, as this would suggest factors other than
maturation.
• There should be no late learners who attain a native or near-native level of performance in
an L2.
• Maturational constraints on ultimate attainment in L2 acquisition should be independent of
L1 and L2.
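The first of these criteria lends itself to a simple numerical check. The following Python sketch is only an illustration under stated assumptions: the AOL values and attainment scores are invented for demonstration and are taken from none of the cited studies, the cutoff of 12 years is an assumed end of the critical period, and the function names are hypothetical. It computes Pearson's correlation between AOL and attainment separately for learners below and above the cutoff.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Both sequences must show some variation, otherwise the denominator is zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_correlations(aol, scores, cutoff):
    """Correlate AOL with attainment separately below and above the cutoff."""
    early = [(a, s) for a, s in zip(aol, scores) if a <= cutoff]
    late = [(a, s) for a, s in zip(aol, scores) if a > cutoff]
    return pearson_r(*zip(*early)), pearson_r(*zip(*late))


# Invented illustration data (ASSUMPTION, not taken from any study):
# age of onset of learning in years and a made-up attainment score.
aol = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
scores = [9.0, 8.5, 8.0, 7.0, 6.5, 6.0, 4.0, 4.2, 3.9, 4.1]
CUTOFF = 12  # assumed end of the critical period, in years

r_early, r_late = split_correlations(aol, scores, CUTOFF)
print(f"early learners: r = {r_early:.2f}")
print(f"late learners:  r = {r_late:.2f}")
```

Under the CPH as summarised above, one would expect a clearly negative coefficient for the early group and a coefficient near zero for the late group. A clearly negative coefficient in the late group as well would point to factors other than maturation.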
Bongaerts (2005) summarises the testable predictions of the CPH almost identically:
• Related to the AOL, a discontinuity in the level of L2 proficiency should occur at the end of
the critical period.
• There should be no late learners who attain a native or near-native level of performance in
an L2.
In summary, these criteria state that the constraints on attainment in L2 acquisition and performance predicted by the CPH should be independent of variables not related to maturation.
However, as Long (1990) emphasizes, the CPH “does not explain the phenomena to which it is
applied, but is itself to be explained”. Several different explanations for the effects predicted by the
CPH can be found in the literature – only a brief overview of which can be given here.
Some authors suggest social, psychological or affective factors as an explanation for the observed
differences related to the age of a learner. Others attribute differences between child and adult
language acquisition to type or amount of input or various cognitive factors.
4.2.1 Nature not nurture
A neurological explanation for the observed ability of children to completely acquire an L2 phonological system and the inability of adults to do so without a foreign accent is provided by Scovel
(1969). His claims are based on the CPH as described by Lenneberg, but he modified it in a way
such that its predictions are limited to L2 pronunciation. He refers to the so-called Joseph Conrad
phenomenon² as one of “many instances of adults learning the syntax of a second language completely and yet not being able to lose a foreign accent when speaking”. The British writer Joseph Conrad
is referred to as one example of a late learner of English as an L2. He acquired the language to a
native-like level except for pronunciation, where he is said to have had a strong Polish accent. Scovel
claims that it is not nurture that enables children to acquire an L2 system completely and
prevents adults from doing so. Referring to Lenneberg (1967), he points out that “the simultaneous
occurrence of brain lateralization and the advent of foreign accents is too great a coincidence to be
left neglected”.
² Although Scovel does not use this terminology with respect to that particular example, it is commonly referred to
as the “Joseph Conrad phenomenon”, e.g. by Major (2001).
Thus, Scovel claims that cerebral lateralization and the emergence of foreign accents in L2 speech
are correlated. However, in contrast to Lenneberg he concludes that this affects only the area of
pronunciation and not other aspects of an L2, like syntax or lexis. Scovel attributes this difference
between pronunciation and lexical or syntactic proficiency to the involvement of neurophysiological
mechanisms in the production of sound patterns. The lexical and syntactic patterns, on the other
hand, lack such “neurophysiological reality”, as Scovel assumes.
4.2.2 Problems with the critical period hypothesis
Explanations for the decreasing attainment in L2 acquisition with increasing AOL, like the one
stated above, “all have problems” (Long, 1990). One problem is the homogeneity observed in L1
acquisition. Children go through the same stages in L1 development at around the same ages,
regardless of their motivation, attitude, social circumstances, cognitive abilities or the amount or
quality of input they receive. Theories attributing differences between child and adult language
acquisition to such factors have to explain why these factors are irrelevant in child acquisition but not in adult
acquisition. In addition, such explanations generally fail to account for different constraints on the
various linguistic domains. The theories would have to explain why L2 phonology is not affected in
the same way by the proposed maturational constraints as other linguistic domains are, syntax
and morphology for example (Long, 1990; compare section 2.7).
Birdsong and Molis (2001) found, in replicating an earlier study which tested grammaticality judgments of learners on L2 morphology and syntax, that the performance of the examinees was related
to their ages both before and after the assumed end of the critical period.
In his survey of second language research, Bongaerts (2005) cites several studies that found no
evidence for a discontinuity but a “quite linear” decline in L2 pronunciation proficiency related to
AOL before as well as after the proposed limit(s) of the critical period.
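The contrast between a discontinuity and a “quite linear” decline can be made concrete with a small calculation. The Python sketch below is a minimal illustration, not the method of any of the cited studies: the data are invented and perfectly linear, the cutoff of 12 years is an assumption, and the function names are hypothetical. It fits one least-squares line to the learners below the cutoff and another to those above it, and reports both slopes together with the jump between the two fitted lines at the cutoff.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def discontinuity_at(aol, scores, cutoff):
    """Fit separate lines below and above the cutoff and compare them.

    Returns (early_slope, late_slope, jump), where jump is the difference
    between the two fitted lines evaluated at the cutoff itself.
    """
    early = [(a, s) for a, s in zip(aol, scores) if a <= cutoff]
    late = [(a, s) for a, s in zip(aol, scores) if a > cutoff]
    early_slope, early_icpt = fit_line(*zip(*early))
    late_slope, late_icpt = fit_line(*zip(*late))
    jump = (late_slope * cutoff + late_icpt) - (early_slope * cutoff + early_icpt)
    return early_slope, late_slope, jump


# Invented, perfectly linear illustration data (ASSUMPTION, not study data).
aol = list(range(2, 21, 2))               # AOL of 2, 4, ..., 20 years
scores = [10.0 - 0.3 * a for a in aol]    # made-up rating falling steadily with AOL
CUTOFF = 12                               # assumed end of the critical period

early_slope, late_slope, jump = discontinuity_at(aol, scores, CUTOFF)
print(f"slope before cutoff: {early_slope:.3f}")
print(f"slope after cutoff:  {late_slope:.3f}")
print(f"jump at cutoff:      {jump:.3f}")
```

For data of this kind the two slopes coincide and the jump vanishes, which is the pattern described as a linear decline. A discontinuity of the kind predicted by the CPH would instead show up as a clear change in slope or a non-zero jump at the cutoff.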
Figure 4.1 illustrates the proposed critical period for language acquisition, overlaid with examples
of “quite linear” declines of native-likeness as a function of AOL. The horizontal axis represents
age. The grey area marks the critical period according to Lenneberg (1967). The three vertical
dashed lines mark the various limits of the “sensitive periods” according to Long (2005). The first
line marks the end of the ability to attain a native-like accent “for many individuals” at the age of
six years. The second line marks the end of the ability to attain accent-free pronunciation “for the
remainder” at the age of 12. The third line marks the end of the sensitive period for morphology and
syntax at the “mid-teens” (included for completeness). The overlaid coloured graphs show results
from measurements of accentedness. The curve labelled “FMM” is adapted from Flege et al. (1995,
p. 3128, figure 2) showing mean ratings of sentences spoken by native English (age 0) and native
Italian speakers (ages above 0). The curves “B&M” and “J&N” are regression lines adapted from
Birdsong and Molis (2001, p. 240, figure 3). The results (the number of “correct” sentences) are
partitioned into two groups: the first representing “early arrivals” and the second “late arrivals”
above the age of 16. “B&M” marks results obtained from Birdsong and Molis and “J&N” marks
results from a study by Johnson and Newport which was cited and replicated by Birdsong and
Molis.
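[Figure 4.1, plot not reproduced. Horizontal axis: age in years (0 to 20); vertical axis marked at 50% and 100%. Labelled elements: critical period after Lenneberg (1967); begin of decline and limits for phonology and for morphology and syntax after Long (2005); curves B&M, J&N and FMM.]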
Figure 4.1: Critical periods: The critical period according to Lenneberg (1967) and Long (2005) with
overlaid degree of foreign accent as a function of AOL according to Flege et al. (1995) (FMM) and Birdsong
and Molis (2001) (B&M and J&N).
According to Lenneberg, the ability to acquire a language (to a certain degree) after the end of the
critical period does not contradict the CPH. He attributes this ability to the mechanisms established
during L1 acquisition and the existence of similar (or “universal”) fundamental structures in all
natural languages (see section 4.1).
Possible counterevidence to the CPH for the acquisition of L2 pronunciation seems to come from
late learners who achieve a high, native or near-native level of pronunciation proficiency in an L2
despite a late onset of learning. Long (1990) explicates that “a single learner who began learning
after the period(s) have closed and yet whose underlying linguistic knowledge [. . . ] was shown to be
indistinguishable from that of a monolingual native speaker” could serve as such counterevidence.
The fact that there are not just a few “superexceptional” cases of (almost) native-like mastery
of an L2 phonological system poses a difficult problem to the CPH. Birdsong and Molis (2001,
p. 244) conclude from a review of earlier studies: “In most studies where nativelike attainment is
found, subjects who perform at nativelike levels comprise about 5-20% of the sample”. This is a
proportion that cannot be dismissed. They refer to studies on various aspects of L2 acquisition and
not just those examining L2 pronunciation. However, they point out that native-like performance
in L2 “phonetics and phonology [. . . ] has been demonstrated in several studies by Bongaerts and
his colleagues”.
Bongaerts (1999) reports on a series of three studies examining the L2 pronunciation of very advanced late learners. Each of the studies examined a carefully selected group of “highly successful”
learners and compared them to a control group of native speakers and another group of L2 learners
with the same L1 background but varying levels of proficiency. The studies show that a native-like
performance in an L2 is not impossible to achieve for late learners. While the differences between the
groups of native speakers and the highly successful late learners were still significant, some of the
non-native speakers received ratings that consistently matched the criterion for native-likeness as
used by Flege (1995) (see section 5.5). Findings like these can be interpreted as counterevidence to
the CPH for pronunciation – or at least as counterevidence to its strongest version which predicts
that there is an absolute age-limit for the acquisition of a native-like accent in an L2.
Bongaerts also pointed out that native-like attainment in L2 pronunciation is only an exceptional
phenomenon. The reported very successful late learners with native-like pronunciation clearly represent
only a minority. For example, the third study discussed by Bongaerts (1999, p. 143ff)
found only three subjects with native-like performance out of a group of nine pre-selected very
successful late learners. In the comparison group of 18 learners at different levels of proficiency,
on the other hand, no one achieved native-like ratings.
Bongaerts (2005) concludes his review of several studies on native-like attainment of late learners
with two findings. First, attainment of native-like levels of proficiency of late learners, with AOL
“sometimes well beyond” the end of the critical period, is possible. Second, such attainment of
native-likeness is possible even for learners of typologically distant languages.
Findings like this second one refer to the above quoted third criterion by Birdsong and Molis.
It is argued that maturational (biological) constraints on ultimate attainment in L2 acquisition
should be language-independent. Long (2005), supporting the CPH, criticises studies restricted to
typologically related languages. Bongaerts (2005) points out that there are studies which found
native-like speech by learners from (supposedly) typologically distant languages. However, Long
(2005) argues that native-like attainment by late learners has typically been found in studies which
examined limited samples only and not “more natural language use”.
4.2.3 A sensitive period
Instead of a critical period, the modified concept of a sensitive period has been introduced. It is
used to explain the gradual increase of foreign accented speech with respect to AOL that shows no
sharp discontinuity as predicted by the CPH. However, the terms are often used interchangeably in
the literature (Long, 1990; Piske et al., 2001).
4.2.4 Summary
Thus, while results like those reported by Bongaerts (1999) seem to suggest that there is no critical
period for the acquisition of native-like L2 pronunciation which ends at around the time of puberty,
the differences between early and late learners remain obvious.
Wode (1980) criticises theories and approaches to L2 acquisition which are restricted to specific
structural domains as “fairly unenlightening for determining the nature of man’s language learning
system” (in general). He points out questions such as whether (supposedly innate) mechanisms in
L1 acquisition function in L2 acquisition as well or whether they can be manipulated by formal
instruction, and he argues that these problems were not investigated apart from the framework of
the CPH. Referring to this issue he states that “Lenneberg did not consider appropriate L2 data”.
Similar considerations led Bongaerts (2005) to point out the “unfortunate consequences” of this concentration on the CPH in L2 research: the almost exclusive focus on AOL and the basic assumption
that ultimate attainment is primarily a function of age.
Referring to the various formulations of the CPH and the problems associated with it, Bongaerts
quotes Singleton who concludes: “the CPH cannot plausibly be regarded as a scientific hypothesis
[. . . ] it is like the mythical hydra, whose multiplicity of heads and capacity to produce new heads
rendered it impossible to deal with it”.
Long (2005), on the other hand, attributes much of the supposed counterevidence for the CPH
to the overgeneralised usage of the term critical period hypothesis and to serious methodological
shortcomings, some of which will be discussed in chapter 5 (even though he is concerned with the
CPH, some general conclusions for methodological issues in L2 research can be drawn from his
remarks).
4.3 Contrastive analysis, phonetic transfer and interference
Transfer is a major learning strategy, which is not restricted to language acquisition but can be
found in a wide range of learning situations. The question of how a given L1 background affects a
speaker’s production of a second language, as discussed in section 2.4, addresses the phenomenon
of (phonetic) linguistic transfer. Phonetic or phonological transfer is the process of carrying over
certain features or principles from the L1 system to another language L2.
The study of L2 acquisition (and foreign accent), in conjunction with contrastive analyses of specific
language combinations of L1 and L2 led to theoretical explanations which attribute foreign accent
phenomena primarily to phonetic transfer and interference. The basic assumption (or claim) behind
this approach is that all foreign accent phenomena as well as other L2 errors can be attributed to
linguistic transfer – and that learning difficulties and errors can be predicted based on contrastive
analyses of the respective languages. The descriptions of transfer and the theories on this issue are
not restricted to foreign accent, but cover all kinds of linguistic domains, like syntax, morphology
etc.
Two kinds of transfer can be distinguished according to their outcome: negative and positive transfer.
Positive transfer is the process of carrying over L1 features to another language which results in
correct L2 expressions. Negative transfer takes place when carrying over L1 features to the L2
results in incorrect L2 expressions. The latter case is also called interference. However, the two
terms transfer and interference are sometimes used interchangeably.
In the domain of second or interlanguage phonology the term transfer is usually used to account
for cases of negative transfer of L1 sounds or features into a target language L2, or the transfer of
phonological rules from L1 to L2.
According to such considerations, it should be theoretically possible to formulate a hierarchy of
difficulty for any given pair of L1 and L2. Such a hierarchy could be constructed by a contrastive
analysis, systematically comparing the sound systems of two languages. As an example, from a
structural analysis, native German speakers might be expected to mispronounce word-final voiced
consonants in English, because their native language has no voicing contrast in that position.
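The core step of such a contrastive analysis, comparing which sounds occur in a given position in the two languages, can be sketched in a few lines of Python. This is a deliberately simplified illustration: the two inventories are toy sets that cover only a handful of obstruents and are assumptions made for this example, not complete descriptions of either language, and the function name is hypothetical.

```python
# Toy inventories of obstruents permitted in word-final position.
# ASSUMPTION: heavily simplified for illustration; neither set is complete.
WORD_FINAL_OBSTRUENTS = {
    "English": {"p", "b", "t", "d", "k", "g", "f", "v", "s", "z"},
    "German": {"p", "t", "k", "f", "s"},
}


def predicted_difficulties(l1, l2, inventories):
    """Return the L2 sounds in this position that have no counterpart in the L1."""
    return sorted(inventories[l2] - inventories[l1])


# Sounds a native German speaker is predicted to find difficult
# in word-final position when speaking English.
print(predicted_difficulties("German", "English", WORD_FINAL_OBSTRUENTS))
```

With these toy sets, the voiced obstruents are singled out for German learners of English, in line with the example above. A real contrastive analysis would need complete inventories, further positions and phonological rules, and, as discussed below, such predictions do not account for all observed errors.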
This approach of contrastive analysis and the attribution of all L2 errors to linguistic transfer is (or
maybe was) one of the central concepts in research and theories on second language acquisition and
bilingualism. There are problems with this approach, however. One question, for example, is why
transfer affects L2 phonetics and phonology more than other domains like syntax. Another problem
with this approach towards foreign accent is that not all errors can be attributed to transfer effects
(Major, 2001).
Ioup (1984), for example, concludes that “transfer is the³ major influence on interlanguage phonology”. She adds that the question is not whether phonetic transfer occurs or not, but how it affects the
process of second language acquisition and why it “is so much more a predominant force in shaping
the interlanguage phonology than in shaping the interlanguage syntax” (p. 14).
³ Emphasis in the original.
Mack (1989) reports that early bilinguals can show monolingual-like performance in their dominant
language without transfer effects from their weaker language. This can be interpreted as an indication that transfer is not an inevitable consequence of bilingualism. It is worth noting that the
dominant language is not always the first language of a speaker. This was, for example, the case with
the English-French bilinguals examined in the study by Mack. Although the subjects showed some
differences in comparison to monolinguals, they were all rated as sounding like native speakers.
Today, the idea of linguistic transfer being the one major source for all problems in L2 acquisition
seems to have been discarded. Transfer is seen as one important factor in L2 phonology, besides others.
As Major (2001, p. 35) put it, “even though universals are important, transfer exerts a very strong
influence in SLA [second language acquisition] and perhaps is a permanent component of IL [interlanguage]”. He points out that in order for transfer to take place, there has to be “a corresponding
existing structure”. Transfer is more likely with similar structures than with dissimilar ones (compare
sections 4.5 and 4.6).
4.4 Direct realism
Best (1995, p. 173) presents a working model of a direct realist view on cross-language speech perception. She points out that the central premise of direct realism is “that in all cases of perception,
the perceiver directly apprehends the perceptual object and does not⁴ merely apprehend a representative or ‘deputy’ from which the object must be inferred”. This means that listeners perceive
the relevant information directly from speech input without the involvement of innate linguistic
knowledge or acquired abstract mental representations.
Direct realism does not assume two separate informational domains for phonology and phonetics.
Phonetics and phonology are both assumed to be based on articulatory gestures but “tap different
levels of invariant structure” (p. 182). One central point of this assumption is that “phonetic implementations” are language-specific rather than universal in nature. This accounts for observations
of language-specific phonetic characteristics of individual sounds that are hard to explain phonologically (e. g.
differing VOT values for English and French [t]).
The prediction of this direct realist model of cross-language speech perception is that listeners are
attuned to language-specific information. As they become increasingly efficient in detecting crucial
acoustic cues they pick up only this reduced, more compact information from the input. In that
way the listeners become perceptually attuned to their L1.
Here again, the concept of similarity plays a central role. Generally speaking, listeners are expected
to be able to detect gestural similarities in non-native sounds to native ones. If the similarity between
an L2 and an L1 sound is great, the L2 sound is expected to be assimilated into the respective L1 category.
Although this model of speech perception is primarily based on articulatory gestures, the effects
on speech production, especially in a second language, are not discussed by Best (1995). This
direct realist approach offers interesting explanations relevant to interlanguage speech perception.
However, the implications for second language speech need to be examined. Unfortunately, no
sources regarding this issue were reviewed for this thesis.
⁴ Emphasis in the original.
4.5 The Speech Learning Model
The Speech Learning Model (SLM) as defined by Flege posits that the phonetic vowel and consonant
system of a speaker’s L1 influences the L2 system and vice versa (Flege, 1995). This interaction
imposes constraints on the accuracy in both languages. The SLM links speech production to speech
perception and incorporates the concept of similarity and dissimilarity of speech sounds. Its primary
focus is ultimate attainment, and not the beginning L2 learner.
Deviances in non-native speakers’ L2 pronunciation from the native speaker norm are attributed
to an age-related decline of the learners’ ability to recognise certain audible acoustic differences
between L1 and L2 sounds as phonetically relevant (Flege et al., 1995). As a consequence, new
phonetic categories for the respective L2 sounds are not established when the perceived dissimilarity
is too small. Late learners do not establish new phonetic categories because of this equivalence
classification. In other words, they do not establish new categories for L2 sounds which they perceive
as instances of similar L1 sounds. The underlying assumption behind such considerations is that
L1 and L2 sounds are stored mentally within one common phonological space. One of the SLM’s
hypotheses is that L1 and L2 sounds “are related to one another at a position-sensitive allophonic
level, rather than at a more abstract phonemic level” (Flege, 1995, p. 239). This also predicts that
L1 sounds may be “deflected away” from neighbouring L2 sounds, which represent new categories,
in order to maintain sufficient phonetic contrast in the common space.
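The logic of equivalence classification described above can be illustrated with a toy decision rule. The Python sketch below is an assumption-laden simplification and not part of the SLM itself: the model specifies neither a one-dimensional scale nor a numeric threshold, and the category names, positions, threshold value and function name are all hypothetical. An L2 sound is merged with the nearest L1 category if the distance between them falls below the threshold; otherwise a new category is predicted.

```python
def classify_l2_sound(l2_value, l1_categories, threshold):
    """Toy equivalence classification on a single (hypothetical) perceptual scale.

    Returns the name of the L1 category that the L2 sound is merged with,
    or None if the sound is dissimilar enough for a new category.
    """
    nearest = min(l1_categories, key=lambda name: abs(l1_categories[name] - l2_value))
    distance = abs(l1_categories[nearest] - l2_value)
    return nearest if distance < threshold else None


# Hypothetical L1 category positions in arbitrary units (ASSUMPTION).
L1_CATEGORIES = {"A": 10.0, "B": 50.0}
THRESHOLD = 8.0  # assumed minimum perceived dissimilarity for a new category

similar_sound = classify_l2_sound(14.0, L1_CATEGORIES, THRESHOLD)
new_sound = classify_l2_sound(30.0, L1_CATEGORIES, THRESHOLD)
print(similar_sound)  # merged with an existing L1 category
print(new_sound)      # None: a new category is predicted
```

In terms of the discussion above, the first sound corresponds to a “similar” L2 sound that is treated as an instance of an L1 category, while the second corresponds to a “new” L2 sound. The age-related decline posited by the SLM could be mimicked in this sketch by letting the threshold grow with AOL, but that too would be an assumption of the illustration.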
The ability to discriminate L2 sounds from similar L1 sounds decreases with increasing age. This
is caused by reduced attention that is paid to subtle phonetic cues once the phonetic categories
are well established. The perception of L1 sounds and L2 sounds as instances of the same phonetic
category takes place even in such cases where acoustic differences between those sounds are auditorily detectable. It is a “fundamental aspect of human speech perception” to identify acoustically
different sounds as members of the same linguistic category (Flege and Hillenbrand, 1984).
Those L2 sounds that are not recognised as belonging to a category different from that of the closest L1
sounds are predicted to be pronounced incorrectly. In addition, the production of the L1 sounds
to which such L2 sounds are perceptually linked is predicted to gradually come to resemble that of the L2
sound. This is attributed to the need to maintain sufficient phonetic contrast within the supposed
common phonological space.
The SLM predicts that for some L2 learners the perception of an L2 sound may be more accurate
than its production (Flege, 1999, p. 109). Flege (1993) cites cases where this effect was observed
and states that generally “productive abilities lag behind the development of perceptual abilities”.
Another prediction of the SLM is that even when a new category for an L2 sound is established,
this sound “might not be produced exactly as it is produced by native speakers” due to the presence
of L1 sounds in the supposed common phonological space (Flege, 1995, p. 243).
In their earlier cited study, Flege and Hillenbrand found that American English speakers were able
to pronounce French [y] better than [u]. Findings like this one support the hypothesis that “new”
L2 sounds are produced more accurately than “similar” ones.
Another example supporting the predictions of the SLM is the earlier mentioned observation of VOT
values in L1 sounds (e. g. in chapter 3) which lie between the values typical of the respective
speakers’ L2 and those typical of the L1 sounds of monolingual speakers. This can be interpreted as
a result of mutual influence of L1 and L2 sounds in cases where those two sounds are identified
as instances of one and the same phonetic category (as in the cited experiments on the VOT
distinctions between English and French [t]).
Additional support for such predictions can be seen in observations reported by Flege et al. (1995),
for example: the self-reported pronunciation proficiency of the examinees in their L2 was inversely
related to their self-reported proficiency in their L1.
Supporting evidence for the existence of motoric output constraints on L2 speech, as suggested by
Flege, is reported by Altenberg (2005). She observed in the earlier cited series of three experiments
examining judgement, perception and production of English word-initial consonant clusters by
native Spanish speakers, that L1 transfer affects L2 production but not necessarily perception.
The results showed no correlation between the examinees’ scores on production and perception.
The examinees performed native-like in the judgment task. Altenberg states that these correct
judgments of permissible English consonant clusters “must be based on input”. She thus concludes
in contrast to the SLM that “it seems unlikely that difficulties with the production task are due
to problems in perception” (p. 75). She admits, however, that the observed differences may have
resulted from different task effects in the respective experiments. It should be added that the
study by Altenberg examined the phonotactic system, while Flege explicitly states that the SLM is
concerned with vowels and consonants, as mentioned above.
4.5.1
Summary
The SLM predicts that “new” L2 sounds, i. e. those L2 sounds which are perceived as dissimilar
with respect to L1 sounds, will be mentally represented by new phonetic categories and that they
will be produced more accurately. Those sounds which are perceived as similar to L1 sounds will be
produced less accurately.
As it is assumed that L1 and L2 sounds are represented in a common phonological space, those
L1 and L2 sounds which are linked to the same category will eventually be produced alike. Thus,
a major prediction of the SLM is that a speaker’s L1 influences his or her L2 and vice versa. This
mutual influence has been widely observed and described by numerous researchers. One implication
for experimenters is that the speech perception, and especially the speech production, of monolingual individuals cannot be compared to that of people who have acquired another language (see
chapter 5).
Flege points out that the SLM is a “working model” which can serve “as a useful heuristic for
planning research” on L2 pronunciation and foreign accent (Flege, 1995, p. 238).
4.6
The perceptual magnet effect
Sections 2.4 and 2.4.1 address the questions of whether a learner’s language experience and whether
language distance affect the degree of foreign accent.
This section describes the so-called perceptual magnet effect and the corresponding model explaining
it – the native language magnet model (NLM). Kuhl and Iverson (1995) argue that “language
experience alters the mechanisms underlying speech perception, and thus, the mind of the listener”.
The NLM accounts for early language development up to the age of around one year. It states that
language exposure in that period plays a critical role in the development of language-specific speech
perception.
The theory predicts that the “distances” between sounds and a phonetic “prototype” are perceptually
decreased within the surrounding of that prototype. This term is used to refer to an ideal exemplar
or an abstract mental representation of a sound which is judged by listeners to be the best exemplar
representing a specific phonetic category. In other words, the NLM claims that such prototypes serve
as “perceptual magnets” for other sounds around them.
The ability to partition the continuous sound signals into categories is claimed to be innate. This
categorical perception is part of general auditory processing mechanisms. It was demonstrated even
for monkeys. However, they do not show a perceptual magnet effect. The perceptual magnet effect
seems to be specific to humans. It is an effect of language experience on phonetic perception which
is measurable as early as at the age of six months. It has been demonstrated for both
six-month-old infants and adults (Kuhl and Iverson, 1995). In six-month-olds, this effect in
perception is affected by exposure to their L1 and results from the infant’s analysis of the received
language input. According to the NLM theory, speech representations are initially auditory in
nature. Speech perception then changes from the initial “language-general” mode to a “language-specific” one with increasing age (Iverson et al., 2003).
Infants or young children are able to perceive phonetic contrasts in foreign language sounds, which
adults from the same language background can or cannot discriminate. This ability to partition
spoken language into categories is assumed to be innate. At around six months of age they begin
to reanalyse the acoustic space according to their respective ambient language input. Sounds from
an L2 which are similar to an already existing L1 prototype are difficult to perceive as being
different from the respective L1 sound. The general principle of the change from “language-general”
to “language-specific” speech perception is illustrated on the basis of a two-formant vowel space in
figure 4.2. The initial auditory boundaries within this abstract vowel space are changed according
to the infants’ exposure to a specific language. By the age of six months, the perceptual magnet
effect emerges and unneeded phonetic boundaries disappear.
Iverson et al. (2003) used 18 synthesized English [ra] and [la] tokens which varied in their F2 and
F3 frequencies. The study tested the perception of these sounds by native English, Japanese and
German adults with “identification and goodness”, “similarity scaling” and “discrimination” tasks.
One of their findings is that American English listeners were more sensitive near the categorical boundary, i. e.
the phoneme boundary between [r] and [l], than within each category. The perceptual patterns of the German
listeners were similar to those of the American listeners. The Japanese listeners on the other hand
had no such higher sensitivity at the categorical boundary but showed higher sensitivity within the
English [r] category. Iverson et al. conclude that the Japanese listeners assimilated the sounds into
their [r] category. Their perceptual spaces seem to be “mis-tuned” for the perception of the contrast
between English [r] and [l]. Acoustic variations which are critical for the discrimination of sound
categories of American English listeners are irrelevant to Japanese listeners, or no more salient
than other acoustic cues. This distortion of the perceptual space does not represent a total lack of
perceptual sensitivity. Decreased perceptual sensitivity around L1 prototypes can lead to reduced
sensitivity to critical acoustic cues in the acquisition of non-native sounds. Increased sensitivity
according to a learner’s L1 on the other hand can lead to increased attention to irrelevant acoustic
cues in L2 sounds.
This subsequently changes not only perception but also speech production. The NLM holds that
the speech representations, which are developed in the first year of life under the influence of the
surrounding language of an infant, play “a crucial role in guiding their initial attempts at speech
production” (Kuhl and Iverson, 1995, p. 139). The same arguments about the influence of an
infant’s exposure to a specific language are applied to adults. Difficulties in the acquisition of an
L2 phonology are attributed (in part) to the perceptual magnet effect.
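[Figure: three panels (A, B, C) showing a vowel space with axes F1 (Hz) and F2 (Hz); panel title “Infants’ Natural Auditory Boundaries”; language labels Swedish, English and Japanese.]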
Figure 4.2: NLM Theory: (A) Infants partition the acoustic space in a language-general way. (B) By
the age of 6 months infants exhibit language-specific magnet effects induced by ambient language input.
(C) Unneeded phonetic boundaries disappear and magnet effects alter the perceived distance between
stimuli. (Adapted from Kuhl and Iverson, 1995, fig. 10)
4.6.1
Summary
The NLM addresses the problem of decreasing phonetic abilities in second language acquisition
with increasing age, which is of great relevance to the foreign accent phenomenon. Exposure to
a language in early life changes speech perception in such a way that the perceived contrasts
between acoustic cues are altered according to one’s L1 sound system. The perceived distances
between “good” exemplars of a sound are shrunk, and near phonetic boundaries the distances are
stretched. This has consequences for second language acquisition. A learner can have difficulties in
perceiving differences in distinct L2 sounds in the vicinity of an L1 prototype. On the other hand,
the learner might rely on acoustic cues which are irrelevant or only secondary in an L2.
The NLM describes how language experience affects and shapes speech perception and it is argued
that this language specific mode of perception has consequences for speech production as well.
The concept of similarity, by means of phonetic prototypes, is employed in the explanation of the
perceptual magnet effect. With respect to second language acquisition, it is predicted that L2
sounds which are more similar to L1 sounds will be perceived as less distinct than those L2 sounds
which are farther away from L1 phonetic prototypes. Thus, phonetic language distance affects
second language acquisition.
Chapter 5
Methodological issues
So far, a wide range of experimental studies on foreign accent has been discussed. Methodological
issues mentioned in previous chapters are summarised and supplemented in this chapter.
Research on foreign accent faces the same problems as every empirical or experimental study in
linguistics. This chapter addresses primarily those issues that are special to subject selection in
studies on foreign accent.
5.1
Subject selection and control group
Some experimental variables can change over time in longitudinal examinations, for example
LOR, the amount of exposure to an L2 or the L1 vs. L2 usage patterns of a speaker. Most speaker-dependent factors, however, cannot be changed or manipulated directly by an experimenter without
replacing the examined speakers (Levi et al., 2007).
Speakers with varying levels of proficiency can influence the ratings given by listeners (Long, 2005
– this issue is already addressed in section 2.8). The more speakers are included in the group of
examined subjects with lower levels of proficiency, the more speakers with near-native proficiency
will be rated as native. Flege et al. (1995) point out the importance of examining non-native
subjects, who have reached their individual level of ultimate attainment in an L2. In any case, the
experimenter should be aware of the differences between measuring learning rate as opposed to
measuring ultimate attainment (as discussed in section 2.7).
Some authors require that speakers of a single native language should be examined (Flege et al.,
1995). This obviously depends on the examined variable(s), and with respect to the problems
of distinguishing between language and dialect it is too general a statement. The more precise
requirement is that speakers from different dialectal backgrounds should not be compared to a
single norm or variety. Dialectal differences or any other varieties of a language should never be
underestimated in order to minimize misclassifications of native speakers. In the previous chapters
examples of studies have been cited where dialectal differences led to “confusing” results (e. g.
Bongaerts, 1999; Mack, 1989, compare section 1.1.3). As a consequence, the experimenter has to
examine possible regional or social variations of the object of investigation in the target language.
At least, the possibility of differing “standards” or linguistic systems within the subjects should be
considered.
Piske et al. (2001) specify two problems associated with the omission of a control group of native
speakers: the performance of native speakers under the same conditions of the experiment remains
uncertain, as well as the ability of the recruited judges (if included in the study) to distinguish
between native and non-native speakers.
Flege (1987b) stipulates the following requirements on subject selection:
• Subject groups should be as homogeneous as possible (especially with respect to their language
backgrounds).
• Subjects with hearing problems should be excluded.
• Subjects who do not speak their native language “normally” should be excluded.
• Both male and female subjects should be included.
• At least 6 to 12 subjects should be examined.
Flege (1987b, p. 289) proposes a “repeated measures design” which reduces the number of subjects
that need to be recruited. The speech productions of each subject in L1 and L2 can be directly
compared, so each subject might “serve as his/her own control”. This approach is supposed to
minimize the influence of subject selection biases.
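The within-subject comparison underlying such a repeated measures design can be sketched as follows. This is only a minimal illustration and not the procedure used by Flege: the function name `paired_differences`, the dictionary layout and all numerical values are hypothetical placeholders chosen for this sketch (they could stand, for instance, for a mean VOT in ms per subject).

```python
from statistics import mean, stdev


def paired_differences(l1_values, l2_values):
    """Return the per-subject difference (L2 minus L1).

    Only subjects with a measurement in both languages are kept,
    so that each subject serves as his or her own control.
    """
    shared = sorted(set(l1_values) & set(l2_values))
    return {subject: l2_values[subject] - l1_values[subject] for subject in shared}


# Hypothetical placeholder measurements, not data from any cited study.
l1 = {"s1": 20.0, "s2": 25.0, "s3": 18.0}
l2 = {"s1": 50.0, "s2": 45.0, "s3": 48.0}

diffs = paired_differences(l1, l2)
mean_diff = mean(diffs.values())  # average within-subject shift
sd_diff = stdev(diffs.values())   # spread of the shift across subjects
```

Because every difference is computed within one speaker, variation between speakers (for example in speaking rate or vocal tract size) does not enter the comparison.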
5.2
Obtaining data: The task
The subjects should be given enough time for acclimatisation to the situation before speech samples
are recorded – especially, if the recordings take place in an anechoic chamber. It is noted in section
2.5 that even temporal changes in language use patterns might affect speech production. Piske
et al. (2001) stress the possibility that individuals’ language behaviour in an experiment could be
influenced not only by “the conditions under which they had been exposed to their L1 or L2 in the
months preceding the experiment but also by the conditions under which they had been exposed
to the L1 and the L2 in the hours or even minutes preceding the experiment”.
For the recording procedure, Piske et al. (2001) suggest a delayed repetition technique as a way to
obtain “more reliable measures of degree of L2 foreign accent” – a technique also used by Flege et al.
(1995). In such a procedure a list of sentences is presented to the subject in both written form and
aurally via a recording. The sentences of interest are preceded and followed by a context sentence,
creating a mini-dialogue like in the following example:
Voice 1: What did Paul eat?
Voice 2: Paul ate carrots and peas.
Voice 1: What did Paul eat?
Subject: [repeats Voice 2]
(example taken from Piske et al., 2001, p. 205)
The delay between the presentation of the sentence (or word) which is to be spoken by the subjects and its repetition
is assumed to prevent direct imitation from sensory memory. Aural presentation of the material
should prevent influence from differences in reading abilities.
On the other hand, presentation of the written form might prevent interference from perceptual
difficulties. Learners might be well aware of features of an L2 phonetic system and nevertheless be
unable to perceive (or produce) them correctly (Altenberg, 2005; Flege, 1993). In order to avoid such
problems, the studies reported by Flege et al. (1995) and Piske et al. (2001) presented the test
material both aurally and in written form to the subjects.
Long’s proposal of “natural language use” or spontaneous speech as the ideal indicator of an
individual’s L2 abilities does not seem to be widely followed in studies on foreign accent.
The justification for use of limited, language-like samples (in phonology studies, at least)
is that judgments of pronunciation ability based on anything above isolated words, and
especially on natural speech, are vulnerable to bias from cues from other linguistic
domains than the one supposedly in focus [. . . ]
(Long, 2005, p. 302)
He questions the validity of results obtained from studies examining “controlled, elicited, often
rehearsed” speech samples as a measure for a speaker’s general L2 abilities. This constitutes a
difficult methodological problem in data collection on foreign accents.
An opposing view is put forward by Major (2001). He argues that isolated words are not as unnatural
as is often claimed. There are more than just a few every-day examples of single word utterances
or word lists. Two slightly modified versions of the above stated mini-dialogue can illustrate this:
Example 1: word-list utterances
A: What did Paul eat?
B: Carrots, peas, beans and broccoli.
A: What?
B: Carrots, peas, beans and broccoli. [repeating more carefully]
Example 2: single-word utterances
A: Did Paul eat carrots or peas?
B: Carrots.
A: What?
B: Carrots. [repeating more carefully]
These two examples illustrate how “isolated words” or utterances consisting of simple word lists can
be encountered in every-day situations and can thus safely be regarded as “natural language
use” as well.
Some examiners use picture naming tasks, where the subjects are asked to name a given picture
without written or aural presentation of the respective target word (e. g. Altenberg, 2005). Such an
approach can be fruitful when phenomena like spelling pronunciations or hyperarticulation have to
be avoided and spontaneous speech is the focus of interest. However, a drawback of such an
approach is the possible interference of problems in other linguistic domains, such as a lack of syntactic or
lexical knowledge. Another disadvantage of spontaneous speech is rooted in general communication
strategies such as the avoidance of words with difficult sounds.
5.3
FA rating by native speaker judges
Most studies reviewed for this thesis use judgments of listeners who are native speakers of the
examined target L2 to assess degree of foreign accent. This approach is justified on the basis of the
definition of foreign accent as a phenomenon of L2 speech which is perceived by native speaker listeners (compare section 1.1.7).
5.3.1
Scaling foreign accent
Usually listeners’ judgments are used to indicate degree of foreign accent in speech samples on a
rating scale. Despite numerous studies on foreign accent, there still is no standard scale for measuring
degree of foreign accent (Piske et al., 2001; Southwood and Flege, 1999).
The scale used should be sufficiently sensitive to reveal even small differences between individual
speakers and between these speakers and the level of native speakers. The end points of the rating scales are usually
labelled with “no accent”, “native-like pronunciation” or “native speaker” at one end and “definite”
or “heavy foreign accent” at the other (Piske et al., 2001). Points between these two extremes are
used to mark varying degrees of foreign accent. It is not known how many distinctive degrees of
foreign accent listeners are actually able to distinguish. Flege and Fletcher (1992) and Flege et al.
(1995) for example used a “continuous scale” (which in fact was a 256-point scale); others used a
nine-point scale (e. g. Guion et al., 2000; Flege et al., 1999), a seven-point scale (e. g. Levi et al.,
2007), a five-point scale (e. g. Altenberg, 2005) or a four-point scale (e. g. Asher and García, 1969;
Flege et al., 1997). Piske et al. (2001) point out the lack of a standardised scale for measuring
foreign accent and raise the question of whether the various utilised rating scales “ensure equally
valid and reliable measures of degree of L2 foreign accent”. Usually equal-appearing interval scales
are used. Southwood and Flege (1999) compared direct magnitude estimation and interval scaling
methods for measuring degree of foreign accent. They found that both methods can “provide valid
indices of accentedness” and that degree of foreign accent represents a “metathetic continuum” –
i. e. a continuum which can be partitioned into equal intervals. This suggests that equal-interval
rating scales are appropriate in foreign accent studies.
Another study which compared two different methods of scaling foreign accent is described by Brennan et al. (1975): (a) magnitude estimation and (b) sensory modality matching. The first method
required the judges to rate the presented speech samples with a number “that seemed appropriate
for the amount of accentedness”. The second method employed a “Lafayette hand dynamometer”
to measure the force of hand grip. The judges had to squeeze this hand dynamometer “with a force
matching the accentedness of each speaker”. As a reference, the speech samples were auditorily analysed by two judges who assessed “the frequency of occurrence of specific accented pronunciations” (of
segments). Neither of the judges had any formal training in phonetic transcription. The results revealed
a strong agreement among the judges about the degree of foreign accent. They also showed that
both methods can be used to scale degree of foreign accent, as the ratings obtained from both methods were correlated with one another. Additionally, the ratings were found to be highly correlated
with the amount of accented pronunciations, i. e. the number of segmental mispronunciations.
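The kind of agreement between two scaling methods described here is commonly quantified with a correlation coefficient. The following sketch computes Pearson's r for two sets of ratings of the same speakers. It is an illustration only: whether Brennan et al. used exactly this coefficient is not stated above, and the function name `pearson_r` as well as all values are hypothetical placeholders.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder ratings of four speakers, not data from the study:
magnitude_estimates = [10.0, 20.0, 30.0, 40.0]  # numbers assigned by judges
grip_force = [1.0, 2.5, 2.9, 4.2]               # dynamometer readings

r = pearson_r(magnitude_estimates, grip_force)
```

A value of r close to 1 would indicate that the two methods order the speakers in nearly the same way. The same function could be applied to ratings and counts of segmental mispronunciations.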
The rating of degree of foreign accent is a non-trivial problem. Southwood and Flege (1999) speculate
that “response biases” in ratings of foreign accent are likely to occur “because perceptual dimension,
such as foreign accent, have no known physical units” (p. 344).
5.3.2
The judges
The recruited judges are usually native speakers of the examined L2. Some studies relied on linguistically inexperienced, “naïve”, judges (e. g. Flege et al., 1995), others on linguistically trained
ones, such as linguists or foreign language teachers (e. g. Altenberg, 2005; Mack, 1989). Diverging
results can be found in the literature regarding the effects of the judges’ linguistic experience. As
a consequence, Piske et al. (2001) suggest recruiting a representative group and not only linguistically naïve judges or only experts. Although linguistically naïve judges might not be able to identify
which mispronunciations contribute (to what degree) to their perception of foreign accent, they are,
according to Brennan et al. (1975), nevertheless able to accurately scale the overall degree of foreign
accent. They have demonstrated that linguistically naïve judges are able to “give reliable judgments
of the accentedness of speech samples, and that they are in agreement as to what constitutes various
levels of accentedness”.
Another issue regarding the language experiences of judges is addressed by Long (1990) and Southwood and Flege (1999). Individuals from linguistically heterogeneous areas like cosmopolitan cities
may have a higher tolerance for language variations and deviances from their L1 norm. In general,
individuals who are familiar with foreign accent may give less rigorous ratings than individuals
who have little or no experience with non-native speakers of their native language. More research
is needed to determine the effect of familiarity with foreign accent on listeners’ perceptions and
ratings.
Long (1990, 2005) also points out the importance of the instructions given to the judges. Misleading
instructions to the judges must be avoided. If the judges are to rate a set of speech samples of foreign
speakers which includes control samples of native speakers, they have to be told to expect an
unspecified number of samples from native and non-native speakers. It must also be clear that
the samples will include (mainly) recordings of non-native speakers. Although these statements
might seem trivial, Long cites examples where misleading instructions might have led the judges
to false assumptions about what they would have to rate, which resulted in inappropriate ratings.
In one case, he assumes the judges might have expected recordings of native speakers with more
or less accurate pronunciations. Such misleading instructions then lead the judges to rate more
of the samples as native-like than they might have done otherwise. The instructions have to be
unambiguous without influencing the judges by any misleading assumptions.
As mentioned earlier, the composition of the data which the judges have to rate poses another
problem. The listeners might be influenced by obviously non-native samples in a way that they give
higher ratings to speech samples from speakers with higher, near-native proficiency. The experimenter should be aware of the existence of such speaker-independent factors affecting the perceived
degree of foreign accent (as discussed in section 2.8).
An additional unanswered question is the total number of judges that is needed for reliable ratings
of degree of foreign accent. Piske et al. (2001) hypothesise that a larger number of judges might be
needed for reliable detection of smaller differences between speakers.
5.3.3
Native speakers’ judgments and acoustic features of foreign accent
Which phonetic cues contribute to the perception of a foreign accent? This question was discussed
in chapter 3. It is still largely unknown which acoustic cues contribute, and to what degree, to the perception
of a foreign accent.
Southwood and Flege (1999) emphasise the importance of examining which acoustic variables affect
listeners’ judgments of degree of foreign accent. They suggest that “identification of the potential
acoustic cues used by listeners may help provide a physical referent to assist in interpreting judgments of degree of perceived foreign accent” (p. 347).
5.4
Foreign accent detection by acoustic measurements
Typical segmental acoustic phenomena that are examined in foreign accent experiments are voice
onset time values (VOT) of stop consonants and formant values of vowels.
Since the precise relation between specific acoustic cues and the perception of foreign accent is still
more or less unknown, acoustic measurements on their own cannot provide a measure for degree
of foreign accent. However, several studies indicate that there is, for example, a correlation between
the number of segmental errors and the degree of perceived foreign accent.
Acoustic measurements can provide insight into the underlying linguistic knowledge of a learner
about the examined L2. As discussed in chapter 3, speakers might for example produce acoustic
contrasts in an L2 which are, according to the “norm”, irrelevant or even wrong. The mere fact
that they produce a contrast, however, indicates an underlying awareness (or assumption) of an L2
phonological contrast.
In combination with listener judgments of degree of foreign accent, acoustic measurements can
help identify the acoustic cues responsible for the perception of foreign accents. This, of course,
requires studies where both acoustic measurements and listener judgments are carried out. In the
majority of studies reviewed for this thesis, this was not the case. Either acoustic measurements or
listener judgments are usually used to determine foreign accents – but rarely both. Some studies
rely on the number of segmental errors without measuring specific acoustic parameters.
To conclude this section, measurements of specific acoustic parameters should be carried out in
addition to the evaluation of speech samples by listener judgments. Only the combination of both
the instrumental and the impressionistic approach toward foreign accent can provide a complete
insight into the phenomenon.
5.5
Criteria for native-likeness of speech
Both studies relying on listener judgments and instrumental studies are based on the idea of
an existing phonetic norm.
It is important to document the norms, or more generally, the systems of both the L1 and the L2
and to compare them. Usually this is accomplished by comparing the speech of native speakers of
both languages in question (Flege, 1987b). Sometimes the examiners refer to previously published
descriptions or documented “standards” of the respective languages.
Establishing the norms of languages which are not as deeply examined as English might sometimes
be difficult. One possibility is to examine not only the subjects’ L2 speech but also their L1.
Comparisons of sounds from two (or more) languages should be made within the same or similar
phonetic contexts.
However, examining the L1 speech of the same speakers as the ones whose L2 speech is examined
has a disadvantage which needs to be taken into account. As mentioned earlier, a speaker’s L2
influences his or her L1. Thus, the L1 norm of such speakers may not be comparable to the norm
of monolingual speakers of that language.
Flege et al. (1995, p. 3129) proposed a statistical criterion of native-likeness which is also used in
other studies (e. g. Guion et al., 2000; Piske et al., 2001). The criterion is defined as a mean rating
(of speech samples judged by native language listeners) that falls within 2.0 standard deviations
of the mean rating assigned to a control group of native speakers. However, Flege et al. emphasize
that such a criterion does “not provide direct evidence for foreign accent detection”, the most direct
criterion being “a paired comparison task”.
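The statistical criterion described above can be sketched as a simple check. The sketch rests on assumptions that are not specified in the description given here: the function name `is_nativelike` is hypothetical, the sample standard deviation is used, the interval is treated as two-sided, and all rating values are made-up placeholders.

```python
from statistics import mean, stdev


def is_nativelike(speaker_mean_rating, native_mean_ratings, n_sd=2.0):
    """Check whether a speaker's mean rating lies within n_sd standard
    deviations of the mean rating of the native control group.

    Assumptions of this sketch: sample standard deviation, two-sided interval.
    """
    control_mean = mean(native_mean_ratings)
    control_sd = stdev(native_mean_ratings)
    return abs(speaker_mean_rating - control_mean) <= n_sd * control_sd


# Hypothetical placeholder mean ratings of four native control speakers:
native_controls = [8.0, 8.5, 9.0, 8.5]

near_native = is_nativelike(8.0, native_controls)     # inside the 2 SD band
clearly_accented = is_nativelike(6.0, native_controls)  # outside the band
```

Note that the width of the band depends on the variability of the control group, so a heterogeneous group of native speakers makes the criterion easier to meet.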
After discussing several studies which found native-like performance of L2 learners, Birdsong and
Molis (2001, p. 245) explicitly “point out that many factors could invalidate demonstrations of
nativelike attainment by artificially elevating subjects’ performance”. Among the possible causes
for such artificially elevated performance they mention tasks that do not cover the full grammar of
the L2. Tasks could be challenging only to inexperienced learners and result in no difference above
a certain level of proficiency (but still below that of native speakers). As another cause for elevated
performance they identify careful screening of subjects.
A fundamental problem for the application of native-likeness criteria is the question whether performance of monolingual native speakers is appropriate data in bilingualism research at all. In section
1.1.3 the problems associated with the definition of what constitutes a native language were briefly
discussed. Whatever definition is applied (provided it is appropriate), the L2 learner may
never be able to become a native speaker of the L2 in that sense. Considering the differences in the L1 between
monolingual and bilingual speakers it seems inappropriate to insist on the idealised monolingual
native speaker as the only reference for L2 learners. As Bongaerts (2005) points out, no researcher
would attribute L2 induced deviances in L1 usage to limited language learning abilities.
According to Long (2005, p. 305f) the tested areas of language should be those where “clear native
norms can be reliably documented”. This is important with respect to possible variations within
the control group of native speakers and also within the samples of individual native speakers.
Long questions the validity of measures in cases where even native speakers disagree. He cites
studies where native speakers received ratings below the proposed level of native-likeness or where
non-native speakers received better ratings than native speakers. Thus, the same problem as already
discussed above with respect to subject selection or native language judges arises again: Dialectal
(or other) within-language variations have to be considered when criteria of native-likeness are to
be applied.
As a consequence, it might not be sufficient to refer to an (idealised) standard which in practice is
not spoken by the examined speakers or the recruited listeners.
Chapter 6
Experimental study
This chapter provides an overview of an experimental study on the production of German monophthongal vowels and the realisation of a phonological contrast by native and non-native speakers.
The first section describes the linguistic background relevant to the present study: A brief introduction to the German vowel system is followed by a more detailed description of the German
phonological vowel opposition.
6.1 Linguistic background: German vowels
Following the conclusions from section 5.5, the first step in the experimental study described in
this chapter is an examination of the phonetic and phonological system of the target L2, in this
case: German. It is concluded in chapter 5 that dialectal or regional differences should not be
underestimated. This means that the relevance of regional or dialectal diversification of the target
language has to be considered as well as the regional background of the native speakers from the
control group. In order to achieve this, not only the norm of standard German has to be analysed but
also the possible regional variations¹. Additionally, a contrastive analysis of the phonetic systems
of the target L2 and the L1 of the respective subjects is needed.
Since the beginning of the standardisation of German orthography there have been attempts to
standardise German pronunciation along with orthography. Although there are prescriptive norms
for standard German pronunciation, they are usually not fully realised by speakers and are
not even generally accepted. The vowel system presented in this section follows descriptive presentations and acoustic studies by Becker (1998), Claßen et al. (1998), Kohler (1977), Ramers
(1988), Sendlmeier and Seebode (2007) and Wängler (1968).
The German vowel space comprises 15 monophthongal vowels with an additional “reduction vowel”
(schwa). There are also three diphthongal phonemes, and an additional vowel [ɐ] along with several [ɐ]-diphthongs which correspond to various /r/-vocalisations. Figure 6.1 shows a phonological
representation of the German monophthongal vowel phonemes.
The set of monophthongs can be divided into two subsets (a) {/ɪ/, /ʏ/, /ɛ/, /œ/, /a/, /ɔ/, /ʊ/}
and (b) {/i:/, /y:/, /e:/, /ɛ:/, /ø:/, /a:/, /o:/, /u:/}. Each vowel from one subset is contrasted with
¹ This does not imply that various “dialects” of the target language are examined in the present study, since there can naturally be regional variations within the “standard language”.
[Figure: vowel chart showing the monophthongs i:, ɪ, y:, ʏ, e:, ɛ, ɛ:, ø:, œ, a, a:, o:, ɔ, u:, ʊ]
Figure 6.1: The German monophthongs (according to Kohler, 1977)
one vowel from the other subset. This binary relation constitutes the so-called vowel opposition,
which is a central feature of German vowel phonology (Becker, 1998). The two vowels within such
a relation will be referred to as a contrast pair in the following text. According to Becker these two
sets or classes of vowels are defined by the following phonological (phonotactic) constraints: Vowels
from set (a) cannot appear in open stressed syllables, and vowels from set (b) cannot appear before
ambisyllabic consonants.
The denomination of these two sets of vowels is not uniform and often depends on the individual
author’s underlying theory about the primary acoustic correlate of the opposition. Usual denominations which can be found in the literature are shown in table 6.1 (not all of which can be discussed
in this thesis). Becker (1998) points out:
“Die Bezeichnung Kurzvokale bzw. Langvokale ist dabei am neutralsten, weil sich diese
Bezeichnungen auf den unumstrittenen phonetischen Dauerunterschied beziehen können,
ohne daß dabei präjudiziert wird, daß es sich um einen phonologischen Quantitätsunterschied handelt ...” (Becker, 1998, p. 31)
“The denomination short or long vowels is the most neutral, as these denominations can refer
to the undisputed phonetic duration difference, without prejudging any phonological quantity
difference ...” (translation DD)
The terminology of short vs. long vowels is also used by Kohler (1977) or Ramers (1988). I too will
use the term short vowels to refer to the above stated set (a) and the term long vowels to refer to
set (b) – without implying any phonological function of vowel duration.
(a)                                   (b)
ɪ, ʏ, ɛ, œ, a, ɔ, ʊ                   i:, y:, e:, ɛ:, ø:, a:, o:, u:
short vowels                     ∼    long vowels
open vowels                      ∼    close vowels
lax vowels                       ∼    tense vowels
centralised vowels               ∼    decentralised vowels
abruptly cut vowels/syllables    ∼    smoothly cut vowels/syllables
Table 6.1: Denominations for the German vowel classes
The focus of the experiment presented in this chapter lies only on this vowel opposition between
short and long vowels and thus on those monophthongs which are part of a phonological contrast pair. This excludes schwa, which is a result of vowel reduction in unstressed syllables and does not
belong to a contrast-pair. The above-mentioned vowel [ɐ] likewise does not belong to a contrast-pair
and is not examined further.
The seven commonly described contrast-pairs are exemplified by the minimal pairs² shown in table 6.2.
The list of contrast-pairs as presented in table 6.2 suggests that the only monophthongal vowel
phoneme which does not belong to a contrast-pair is /ɛ:/. The positions of /e:/, /ɛ/ and /ɛ:/
within the phonological system, as in stehlen∼stellen∼stählen³, are a matter of disagreement among
Germanists.
/ɪ/  ∼  /i:/    e. g. Mitte   ∼  Miete
/ʏ/  ∼  /y:/    e. g. Hütte   ∼  Hüte
/ɛ/  ∼  /e:/    e. g. Bett    ∼  Beet
/œ/  ∼  /ø:/    e. g. Hölle   ∼  Höhle
/a/  ∼  /a:/    e. g. Stadt   ∼  Staat
/ɔ/  ∼  /o:/    e. g. Pollen  ∼  Polen
/ʊ/  ∼  /u:/    e. g. Busse   ∼  Buße
Table 6.2: The vowel contrast pairs.
According to some authors, the realisation of /ɛ:/ is a matter of (regional) variation. Wängler
(1968) notes that usually /e:/ is used instead of /ɛ:/⁴. Kohler (1977, p. 175) states that the distinction between /e:/ and /ɛ:/ is just a spelling pronunciation (“Schriftaussprache”) which reflects
the difference between orthographic <e> and <ä>, and that these two vowels are often merged
to /e:/, especially in the speech of northern Germany. The above stated contrast /e:/∼/ɛ/ thus in
some cases might as well be replaced or supplemented by a /ɛ:/∼/ɛ/ pair, depending on the norm
and realisations of the examined native speakers.
The common explanation is that if /ɛ:/ is used at all by a speaker, it is usually the realisation
of orthographic <ä>. Becker (1998) argues that this view is too simple. He points out that the
realisation of orthographic <ä> as /ɛ:/ is not just a spelling pronunciation, but that it is the result
of a process of mutual adjustment of both pronunciation and orthography along with the realisation
of orthographic <e> as /e:/. He claims that this process is (almost) completed and that today /e:/
corresponds to <e> and /ɛ:/ corresponds to <ä>.
“Der Ausgleichsprozeß, der sich am Anfang dieses Jahrhunderts abzeichnete, ist jetzt
in der überregionalen Standardaussprache vollständig durchgeführt. Lediglich in einem
Gebiet im Südwesten (Stuttgart, Ulm, Tuttlingen) werden die historischen Distinktionen
auch von den gebildetsten Sprechern gemacht, sie sind jedoch inzwischen sehr auffällige
Kennzeichen einer Regionalsprache.” (Becker, 1998, p. 18)
“The process of adjustment, which began to emerge at the beginning of this century,
is now completely accomplished in the supraregional standard pronunciation. Only in an area in the southwest
(Stuttgart, Ulm, Tuttlingen) are the historic distinctions made even by the most educated speakers;
by now, however, they are very noticeable features of a regional pronunciation.” (translation DD)
Note that this study was carried out in Stuttgart, and that 18 of the examined speakers were
living in Stuttgart at the time of recording or are natives of the region (compare section 6.2). Such
² [ˈmɪtə]–[ˈmi:tə] (“middle” vs. “rent”), [ˈhʏtə]–[ˈhy:tə] (“hut” vs. “hat (pl.)”), [bɛt]–[be:t] (“bed” (furniture) vs. “bed” (in the garden)), [ˈhœlə]–[ˈhø:lə] (“hell” vs. “cave”), [ʃtat]–[ʃta:t] (“town” vs. “state”), [ˈpɔlən]–[ˈpo:lən] (“pollen” vs. “Poland”), [ˈbʊsə]–[ˈbu:sə] (“bus (pl.)” vs. “repentance”).
³ [ˈʃte:lən]–[ˈʃtɛlən]–[ˈʃtɛ:lən] (“to steal” – “to put” – “to steel”)
⁴ “Anstelle des /ɛ:/ hört man heute im Deutschen überwiegend /e:/.” (Wängler, 1968, p. 36) (“Instead of /ɛ:/, one nowadays predominantly hears /e:/ in German.”)
marginal remarks about regional (or other) variations in descriptions of the linguistic system of the
examined target language may prove crucial in an analysis of pronunciation or perception, as
pointed out in chapter 5.
6.1.1 Acoustic correlates of the German vowel opposition
The various denominations of the two distinct vowel classes reflect various underlying theories
about the (primary) acoustic correlate of the observed phonological vowel opposition. Similarly,
the graphic notation of the contrast-pairs as shown above can be seen as a visualisation of the
underlying acoustic correlates of this contrast relation. The symbols /i:/ and /ɪ/ for example imply
an opposition in quantity (indicated by the length mark “:”) as well as an opposition in quality
(indicated by the different vowel symbols “i” and “ɪ”).
This study does not provide an exhaustive examination of all the possible acoustic correlates of the
vowel opposition, but concentrates on those most often described and referred to in the literature.
Vowel quantity
The denomination of the two vowel classes as long vs. short vowels can be interpreted as a reference
to a phonological function associated with vowel quantity. There is no agreement upon the question
whether vowel quantity constitutes the primary distinctive feature of the vowel opposition. However,
it is undisputed that there is a contrast in vowel duration in stressed syllables. The acoustic
correlate of phonological vowel quantity is duration, an acoustic parameter which is rather easy to
measure.
Ramers (1988, p. 73) cites several sources that give ratios of 1:2 or 2:5 for the duration of short vs.
long vowels⁵. Measured in milliseconds, long vowels have a duration well above 100 ms, and short
vowels well below 100 ms. Ramers for example found mean ratios of 1:2.1, 1:2.0, 1:2.58 and 1:1.65
for four male speakers and Claßen et al. (1998, p. 226) observed a mean duration of approximately
140 ms for long vowels and a mean duration of approximately 80 ms for short vowels in stressed
positions.
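To relate the reported mean durations to the ratio notation used above, the following minimal sketch computes the short-to-long ratio. The function name is hypothetical; the two values are the approximate means reported by Claßen et al. (1998).

```python
def duration_ratio(short_ms, long_ms):
    """Return x such that the short:long duration ratio is 1:x."""
    if short_ms <= 0 or long_ms <= 0:
        raise ValueError("durations must be positive")
    return long_ms / short_ms

# Approximate mean durations in stressed positions (Claßen et al., 1998).
ratio = duration_ratio(80.0, 140.0)
print(f"1:{ratio:.2f}")  # prints 1:1.75
```

A ratio of about 1:1.75 lies somewhat below the 1:2 figure cited by Ramers, but within the range of his four individual speakers (1:1.65 to 1:2.58).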
According to Kohler (1977) the only contrast in the pair /a:/∼/a/ is vowel quantity, and Wängler
(1968) states that the only difference between /ɛ:/ and /ɛ/ is vowel quantity.
Ramers concludes that the obvious contrast of vowel duration under primary stress means that
quantity cannot be excluded as a possible correlate of the phonological vowel opposition. Despite
the disagreements among phonologists a contrast in vowel quantity between the two classes of short
vs. long vowels can be expected to be a feature of a native German pronunciation.
Vowel quality
The term vowel quality is sometimes used to refer to all features of vowels which are not attributable
to quantity. The term is used in this thesis in its narrower sense, referring only to supralaryngeal
articulatory settings and the corresponding acoustic characteristics. In the acoustic analysis of
this experimental study, vowel quality will refer primarily to the vowels’ formant structure. Vowel
⁵ The possibility of three distinct levels of vowel quantity and the existence of overlong vowels in the German language will not be considered in this thesis, as it is treated only marginally in the literature and seems not to be generally accepted in descriptions of German phonology (see e. g. Ramers, 1988, p. 76).
tenseness, which is sometimes subsumed under the term vowel quality, will be treated separately
(see below).
The values for the first two formants F1 and F2 taken from various sources (as cited above) are
shown in table A.4 on page 85. Values for the third formant F3 were available only from Ramers
(1988), who lists formant values for four male speakers.
In general, short vowels have a higher F1 than long vowels. The value of the second formant shows a
split relation: long front vowels have higher F2 values, and long back vowels have lower F2 values than
their short counterparts. These differences were found to be systematic for stressed and unstressed
vowels; however, they were significant only for stressed vowels, the only exception being
the opposition of the two a-vowels (see e. g. Claßen et al., 1998, p. 224, or Ramers, 1988).
Tenseness
The phonological vowel opposition is often associated with the distinction between tense and lax
vowels. However, the nature of the acoustic correlate of tenseness is still a matter of debate.
The proposed acoustic correlates of vowel tenseness which will be of interest here are those associated
with voice quality in its narrower sense, namely those acoustic characteristics caused by different
laryngeal settings. Some researchers see the articulatory correlate of vowel tenseness in higher
tension of the tongue, the walls of the vocal tract or the vocal folds (Ramers, 1988, p. 123f).
According to Claßen et al. (1998, p. 223), the most prominent acoustic correlates of vowel tenseness
are the spectral tilt parameters skewness and rate of closure. Skewness (SK) refers to the slope of
the glottal closure and indicates how abruptly the glottis closes. The rate of closure (RC) refers to
the speed at which the air stream is cut off.
Tense (long) vowels show somewhat higher SK values and considerably higher RC values than lax
(short) vowels. The speed of glottal closure, which correlates most strongly with the acoustic parameter RC,
is the major voice quality correlate of vowel tenseness.
In addition to the parameters SK and RC, Claßen et al. observed that tense vowels show higher
OQ values and lax vowels higher values of GO, although these differences were not significant. The
open quotient (OQ) represents the time during which the glottis is open in relation to the entire
duration of a glottal period. Glottal opening (GO) refers to the degree of the glottal opening during
the entire glottal period.
With respect to word stress, tenseness is a more stable feature of the vowel opposition than vowel
length. This means that this feature is not neutralised in unstressed positions. The only exceptions
are (again) /a:/ and /a/, which show no significant differences in unstressed positions, neither in
quality (including tenseness) nor in quantity. Claßen et al. (1998, p. 226) summarise their findings as
follows: tense vowels are long (in duration) only if they are stressed, and stressed vowels are
long (in duration) only if they are tense.
Another exception is the vowel /ɛ:/, which on the one hand is assigned the feature lax, but
on the other hand is nevertheless categorised as a long vowel.
6.1.2 German vowels: Summary
Since the aim of the present study is not the examination of the nature of the German phonological
vowel system, but its realisation by non-native speakers (in comparison to native speakers), most of
the above stated questions can be left open. It is, however, important to point out that the acoustic
features proposed as primary correlates of the vowel opposition have to be taken into account in
the phonetic analysis of the speech samples taken in this examination.
To summarise these considerations: the (possible) acoustic correlates of the German phonological vowel opposition which
seem most promising for examination are vowel quantity (duration), vowel
quality (formant structure) and tenseness (spectral tilt). These are the acoustic features which will
be examined in the experimental study described in this chapter.
6.2 The participants
The speakers for the production experiment were recruited primarily through a circular email at the
Institute for Natural Language Processing and some by personal contact. There was no pre-selection
of the participants according to any linguistic or non-linguistic criteria – every speaker who wanted
to participate was included in the study. The speakers were not paid for their participation; the
only possible reward being the results of the later analysis of their own speech.
The speakers were not informed about the precise nature of the study prior to the recordings.
However, the general focus on foreign accent was known to the majority of the speakers and some
of them were also aware of the experiment’s focus on the German vowel opposition. No further
attempts were made to mislead the participants into believing that something other than their
pronunciation was being examined.
18 speakers were recorded at the Institute for Natural Language Processing at the University of
Stuttgart, Germany, and in addition two speakers were recorded with help from Mateusz Wiącek
in Kielce, Poland. All except two (speakers A02 and B07) were living in the region of Stuttgart at
the time of recording or had lived there in the past.
The speakers were grouped afterwards into three groups according to their ages of learning German
(AOL): group A comprising the speakers with AOL beyond early childhood (ranging from 7 to
a maximum of 22 years with a mean value of 16.8 and a standard deviation of 4.73), group B
comprising bilingual speakers⁶ with AOL not above early childhood (AOL from 0 to 3, with a
mean of 1.13 and a standard deviation of 1.55), and group C comprising the non-bilingual speakers of
German (AOL 0). Informally speaking, group A represents the non-native speakers, group C the
native German speakers⁷, and group B represents bilingual speakers, who are in general not easy to
classify. There are some problems with this categorisation. First, some of the speakers in group B
would not consider themselves bilinguals. For example, speaker B07 was put in group B because
of an Italian-speaking nursemaid. C02 was put in group C despite “German and Swabian” speaking
parents.
An overview of the demographic characteristics of the speakers is shown in table A.1 on page 84. The
column “List” shows which of the four word lists was given to the speaker. The column “Age” shows
the age of the speaker at the time of recording. Speakers who reported having learned German from
their parents are given an AOL of 0. Likewise, speakers who were born in a German speaking surrounding are
given an AOA of 0. LOR is shown in months, as one speaker reported only one month
of residence in a German speaking surrounding. Speakers who did not report any longer residence
⁶ Compare the definition of bilingualism in section 1.1.5.
⁷ Compare the introductory sections, esp. the problems with the definition of the concept of a speaker’s native language in section 1.1.3. The term native language is not only avoided in the description of the three groups of speakers as given above, but it was also avoided throughout the entire experiment. The word Muttersprache (native language) was neither used in the circular email nor during the recording procedure (see section 6.4) and the preceding informal conversations with the participants.
outside a German speaking surrounding are given an LOR according to the formula: Age × 12. The
first languages are shown in column L1. Speakers who reported to have acquired two languages
simultaneously are given a compound L1 value (the language codes are listed on page 7). The lower
part of table A.1 shows a summary of the respective variables.
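The rule for the LOR column can be written as a small helper. This is a sketch under assumptions: the function and parameter names are hypothetical, and the example ages are illustrative rather than values from table A.1.

```python
def lor_months(age_years, reported_lor_months=None):
    """Length of residence (LOR) in months.

    A reported LOR is used as given. Speakers who reported no longer
    residence outside a German speaking surrounding get Age x 12.
    """
    if reported_lor_months is not None:
        return reported_lor_months
    return age_years * 12

print(lor_months(25))     # no residence elsewhere reported: 25 * 12 = 300
print(lor_months(30, 1))  # a reported LOR of one month is kept: 1
```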
6.3 The speech material
The carrier words for this experiment containing the examined long and short vowels were selected
according to the following restrictions:
• no “doubtful cases”
• no vowels in a V+/r/ context
• at least one minimal pair for each vowel contrast
• only vowels in stressed syllables
• orthographically marked and unmarked vowels
Tröster-Mutz (2004) cites several “doubtful cases” (“Zweifelsfälle”) of words with varying designations of vowel quantity or quality in different pronunciation dictionaries or phonetic descriptions
of German vowels. Some of these words are for example: Distel, Gas, Geruch, Krebs, Magd, Obst,
Ost or schon⁸ (the vowels in question are underlined). As stressed in section 5.5, such cases without “clear native norms” should be excluded from examinations comparing native and non-native
speakers.
As sequences of vowel + /r/ are usually realised as [ɐ]-diphthongs, such words were excluded,
as were the other German diphthongs described above.
At least one minimal pair was included in the list of carrier words for each contrast pair. This
provides (at least one) instance of a vowel opposition within an otherwise identical context. To
obtain vowels in comparable contexts the additional carrier words were selected such that the vowels
appear in stressed, non-final syllables. In addition to a controlled phonetic context, the carrier
words were selected to cover not only “orthographically marked” but also “unmarked” instances
of each vowel⁹. Short vowels may be marked orthographically for example by doubling of the
following consonant letter as in Spott¹⁰. Long vowels can be marked orthographically for example
by a following <h> letter or by doubling the corresponding vowel letter as in Stahl or Staat¹¹.
Orthographically unmarked, i. e. not marked explicitly as being short or long, are vowels which
are represented by single vowel letters, as for example in Koch or hoch¹². The purpose of this
consideration of orthography was to minimise a possible influence of the written form, i. e. to avoid
spelling pronunciations. This was especially important considering the way the word lists were
presented to the speakers (see following section).
⁸ [ˈd{ɪ/i:}stəl] (“thistle”), [ˈg{a/a:}s] (“gas”), [ˈgəʁ{ʊ/u:}x] (“smell”), [ˈkʁ{ɛ/e:}ps] (“crab, cancer”), [ˈm{a/a:}kt] (“maidservant”), [ˈ{ɔ/o:}pst] (“fruit”), [ˈ{ɔ/o:}st] (“east”), [ˈʃ{ɔ/o:}n] (“already”).
⁹ Note that this terminology is used informally, as the German orthography is of course far more complex than the simple examples given in this section.
¹⁰ [ʃpɔt] (“mockery”).
¹¹ [ʃta:l], [ʃta:t] (“steel”, “state”)
¹² [kɔx], [ho:x] (“cook”, “high”).
Four randomised versions of the compiled word list were created and manually edited to ensure the
presence of at least two other vowels between any repetitions of instances of the same vowel group.
The four final word lists are shown in appendix B. Each comprises the same 88 carrier words,
with each vowel included at least five times.
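The spacing condition on the word lists can be checked mechanically. The sketch below is not the procedure used for this thesis, where the lists were edited by hand; the function name and the one-label-per-word representation of the “vowel group” are assumptions.

```python
def satisfies_spacing(vowel_groups, min_gap=2):
    """True if at least `min_gap` other items lie between any two
    successive occurrences of the same vowel group label."""
    last_seen = {}
    for index, group in enumerate(vowel_groups):
        if group in last_seen and index - last_seen[group] - 1 < min_gap:
            return False
        last_seen[group] = index
    return True

# Hypothetical group labels, one per carrier word in list order.
print(satisfies_spacing(["i", "a", "u", "i", "a", "u"]))  # True
print(satisfies_spacing(["i", "a", "i", "u"]))            # False
```

With 88 words spread over a small number of vowel groups, a purely random order will rarely satisfy this condition, which is consistent with the manual editing described above.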
6.4 Procedure
The recordings at the Institute for Natural Language Processing were carried out by myself, while a
friend assisted as experimenter for two speakers who were recorded in Kielce, Poland. The recording
sessions were all carried out entirely in German. The experimenter and the two speakers in Poland
were explicitly instructed to speak exclusively German during the experiment. In addition, the
experiments in Poland were preceded by about 10 to 15 minutes of informal German conversation.
These measures were taken to ensure comparable conditions for all participants and to allow the
speakers to “switch to a German mode” to avoid short term influences by the speakers’ L1 (compare
the considerations by Piske et al., 2001, discussed in section 5.2).
6.4.1 Part I: Interview
At the beginning of each recording session a short interview was conducted with the participant
already sitting inside the anechoic recording chamber. This served two purposes: first, the collection
of demographic data and second, the acclimatisation of the speakers to the unusual situation within
such a chamber.
The interview was carried out verbally. The experimenter asked the questions via headphones from
outside the anechoic chamber and wrote down the given answers. The speakers were asked about
their age, place of birth and the language or languages they had learned from their parents and
the respective ages of learning (if the languages were acquired successively). They were also asked
about their parents’ first languages. These first questions refer to the speakers’ L1. Note that the
word Muttersprache (“native language”) was not used during the experiment.
A second block of questions aimed at the additional language experiences, especially with respect
to the non-native speakers and their German language experience. The speakers were asked about
any additionally learned languages and the respective ages of learning. They were also asked about
their current place of living and previous places of residence within a German speaking surrounding
and the respective lengths of residence as well as the age of first arrival in such a surrounding. The
corresponding experimental variables to these questions are AOL, LOR and AOA.
6.4.2 Part II: Production experiment
In the main part of the experiment the speech samples were recorded using a dialogue technique
as suggested in section 5.2. The carrier words were embedded within a short dialogue as in the
following example:
Examiner: Hütte bitte als nächstes! (“Hütte – hut – next please!”)
Subject: Er hat Hütte gesagt. (“He said Hütte.”)
The exact instructional sentence spoken by the experimenter is not important and was not always
the same as stated above. However, the target word was in all cases followed by other words spoken
by the examiner (e. g. Hütte ist das nächste Wort – “Hütte is the next word” – or something similar).
The sentences spoken by the subjects, on the other hand, were always the same and varied only in
one word. The word list was printed out on a single sheet of paper (DIN A4 size), containing the
carrier sentences which the speakers had to produce. A sample of such a printed version is shown
in figure B.1 on page 118.
This technique was used to avoid direct imitation of the verbally presented target word. The additional presentation of the words in written form was chosen to avoid mistakes caused by difficulties
in perception. It is stated in section 5.2 that a learner might be well aware of a phonological distinction in an L2 and able to produce it, but nevertheless unable to perceive it consistently. Since
the focus of the present study is on production and not perception, it seemed reasonable to avoid
perception mistakes by additional orthographic display of the target words.
On the other hand, the verbal presentation by the examiner within an interactive dialogue was
employed to avoid unnatural spelling pronunciations or hyperarticulation. A prerecorded version of
the instructional sentences was presented to the two speakers in Poland to assure the same recording
conditions for all speakers, i. e. the same spoken examples. The only difference worth mentioning
might be that the pauses between sentences were somewhat longer for the speakers in Poland
because of the fixed nature of the recorded “dialogue”.
The dialogue was repeated in six instances due to non-linguistically caused mispronunciations (“slips
of the tongue”), hesitations or the like. Nevertheless, one utterance had to be excluded afterwards
from further analysis because the speaker did not say “zerstößt” as required, but “zerstört”. In
another case eight target words could not be analysed and had to be removed from the recordings
due to technical error.
Only the speech of the subjects was recorded. The subjects were all recorded continuously and directly
to hard disc on a Linux machine in wave format at a sampling rate of 48,000 Hz with a resolution
of 16 bits (mono). The recording procedure resulted in 20 audio files – one for each speaker. The
longest recording had a duration of 10:05 min (speaker A02), and the shortest 4:03 min (speaker
B07).
6.5 Acoustic analysis: method
The recordings were analysed in several steps to measure the proposed acoustic correlates of the
phonological vowel opposition.
The first step in the acoustic analysis of the speech recordings was labelling of the audio files. The
vowels were labelled manually using the WaveSurfer software version 1.8.5 (see figure 6.2). Two
label files were produced for each recording. One file contained labels for the beginning and end
points of the vowels (with the labels placed at zero crossings at the beginning and the end of a
period), and the other file contained four labels near the temporal midpoint of each vowel on four
subsequent periods (only two instances had less than four complete periods – these were labelled
accordingly at three points). The first label file was used for formant and duration measurements
and the second for the voice quality parameter measurements.
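A vowel's duration and its temporal midpoint follow directly from the two boundary labels in the first label file. The sketch below illustrates the arithmetic only; the function name and the label times are hypothetical and are not taken from the recordings or from the Praat scripts used.

```python
def duration_and_midpoint(start_s, end_s):
    """Return (duration in ms, temporal midpoint in s) for one labelled vowel."""
    if end_s <= start_s:
        raise ValueError("end label must lie after start label")
    duration_s = end_s - start_s
    return duration_s * 1000.0, start_s + duration_s / 2.0

# Hypothetical boundary labels in seconds.
duration_ms, midpoint_s = duration_and_midpoint(1.250, 1.390)
print(round(duration_ms, 1), round(midpoint_s, 3))
```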
Vowel duration was measured automatically according to the labels at the beginning and at the end of each
vowel, and the values of the first three formants F1, F2 and F3 were measured automatically in Hertz
at the temporal midpoint between these labels, using Praat version 4.6.34¹³.
Figure 6.2: Screenshot of labels in WaveSurfer. The picture shows the [te:] part of Steg [ʃte:k], in the
recording of speaker A03. The topmost pane shows the waveform, below that are the spectrogram, the time
axis and the two panes used for the labels – the pane with the labels marking beginning and end of a vowel
(the highlighted area), and at the bottom the pane marking four points for voice quality analysis.
Voice quality parameters were measured with the harmonics-tools (HI) developed by Wolfgang
Wokurek at the Institute for Natural Language Processing at the University of Stuttgart. Despite
its broader functionality, HI was employed only in the measurements of voice quality parameters.
The two voice quality parameters of special interest in this experiment are skewness and rate of
closure. Skewness can be measured acoustically by the difference between the amplitude of the first
harmonic H1 and the amplitude of the second formant A2. The skewness gradient (SKG) is defined
internally in HI as follows:
$$\mathrm{SKG} = \frac{\tilde{H}_1 - \tilde{A}_2}{1^{-10} + (\mathrm{Bark}(F_{2p}) - \mathrm{Bark}(F_0))} \tag{6.1}$$
An acoustic measure for the rate of closure is the difference of the amplitude of the first harmonic
and the amplitude of the third formant (A3). The rate of closure gradient (RCG) is defined in HI
as:
$$\mathrm{RCG} = \frac{\tilde{H}_1 - \tilde{A}_3}{1^{-10} + (\mathrm{Bark}(F_{3p}) - \mathrm{Bark}(F_0))} \tag{6.2}$$
Besides these two parameters, which are expected to show a correlation with vowel class, some
additional parameters were measured with HI: the open quotient, glottal opening, incompleteness
of closure (IC) and T4G. The latter two are not mentioned in section 6.1.1 where the acoustic
correlates of the vowel opposition are discussed. They were nevertheless included in the analysis
of voice quality since they both refer to phonatory settings and might provide interesting results.
The incompleteness of closure is defined as the bandwidth of the first formant B1 divided by F1.
T4G is another spectral tilt parameter and is defined as the difference between the amplitude of the
first harmonic H1 and the amplitude of the fourth formant A4. The four corresponding formulae as
defined in HI are as follows:
13
http://www.praat.org
62
CHAPTER 6. EXPERIMENTAL STUDY
OQG
=
GOG =
IC
=
T 4G
=
H̃1 − H̃1
1−10 + Bark(2F0 ) − Bark(F0 )
H̃1 − Ã1
1−10 + Bark(F1 ) − Bark(F0 )
B1
F1
H̃1 − Ã4
1−10 + Bark(F4 ) − Bark(F0 )
(6.3)
(6.4)
(6.5)
(6.6)
As can be seen from equations 6.1 to 6.6, the frequency values of the formants given in Hertz are
converted to the Bark scale by HI.
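
The structure of these gradient measures can be illustrated with a minimal sketch. It is not the HI implementation: the Hertz-to-Bark conversion used by HI is not specified here, so the approximation by Traunmüller (1990) is assumed as a stand-in, and the leading constant in the denominators is treated as a small guard against division by zero (the value `1e-10` is likewise an assumption). All function names and input values are hypothetical.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert Hertz to Bark (Traunmueller 1990 approximation, assumed here).

    The corrections for very low and very high frequencies are omitted.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def gradient(h1_db: float, peak_db: float, peak_hz: float, f0_hz: float,
             eps: float = 1e-10) -> float:
    """Amplitude difference (dB) divided by the frequency distance in Bark.

    Covers the common shape of SKG, RCG, GOG and T4G: the first-harmonic
    amplitude minus a formant amplitude, over the Bark distance between
    that formant and F0. `eps` is an assumed guard constant.
    """
    return (h1_db - peak_db) / (eps + (hz_to_bark(peak_hz) - hz_to_bark(f0_hz)))


def incompleteness_of_closure(b1_hz: float, f1_hz: float) -> float:
    """IC: bandwidth of the first formant divided by its frequency."""
    return b1_hz / f1_hz


# Illustrative values only (not measured data).
skg = gradient(h1_db=0.0, peak_db=-20.0, peak_hz=1500.0, f0_hz=120.0)
ic = incompleteness_of_closure(b1_hz=80.0, f1_hz=400.0)
print(f"SKG-like gradient: {skg:.2f} dB/Bark, IC: {ic:.2f}")
```

Dividing by the Bark distance rather than the Hertz distance expresses the spectral slope on a perceptually motivated frequency scale, which is presumably why HI converts the formant frequencies first.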
The measurements were carried out according to the second label file with four labels for each
vowel near its temporal midpoint. This resulted in a maximum of 352 values for each speaker (four
measurements for each of the 88 spoken vowels).
Statistical analysis of the measured data was carried out with R version 2.3.0¹⁴. The output produced by Praat and HI was loaded into R.
Although the amount of collected data is considerable for a thesis like this one, the number of
instances of each individual vowel is rather low. Measurement errors or simple slips of the tongue¹⁵
in just one instance can lead to considerable differences. As a consequence, in cases with a restricted
number of samples, as in this experiment, the automatically measured values should be inspected
and, where necessary, corrected manually. Unfortunately, this could not be done within the scope of
this thesis. The statistical analysis of this experiment must therefore be treated with caution. In any
case, the limited amount of data does not allow generalisations.
6.6 Acoustic analysis: results
Because the examined speakers are both few in number and rather heterogeneous with
respect to the various demographic variables (as shown in table A.1), a strictly statistical analysis
of the results cannot reasonably be performed.
6.6.1 Vowel quantity
The realisation of a contrast between short-class and long-class vowels was measured as the ratio
of long-class vowel duration to short-class vowel duration. The ratios cited in section 6.1.1 could not be
reproduced by the measurements of the two supposedly native speakers in group C.
Speaker C01 has means of 95.24 ms and 58.83 ms for long-class and short-class vowels respectively
(standard deviations: 16.54 ms and 11.49 ms), and speaker C02 has means of 99.46 ms and 53.73 ms
respectively (standard deviations: 14.84 ms and 10.84 ms). This corresponds to a mean ratio of 1:1.71 for
short-class versus long-class vowels. Figure 6.3 shows a box–whisker plot of the mean vowel durations
for each group in milliseconds. As can be seen from this figure, the speakers in group A tend to
realise long-class vowels with considerably longer durations (mean: 110.93 ms, sd: 15.27 ms) than
do the speakers in group B (mean: 100.87 ms, sd: 6.06 ms) or group C.

¹⁴ http://www.r-project.org/
¹⁵ As mentioned earlier, though great care has been taken to reduce such mispronunciations, some might have
remained undetected.

Figure 6.3 also shows the higher variability within group A as compared to the other two. The
speaker who realises on average the greatest difference between long-class and short-class vowels
is A09. The mean duration for long-class vowels is 122.69 ms (sd: 18.8 ms), and for short-class
vowels 60.01 ms (sd: 15.4 ms). The two speakers with the smallest difference are A10 and A05. The
mean durations for speaker A10 are 84.82 ms (sd: 14.63 ms) for long-class vowels and 68.21 ms
(sd: 15.49 ms) for short-class vowels, and for speaker A05 114.95 ms (sd: 19 ms) and 90.52 ms (sd:
19.32 ms) respectively.

Figure 6.3: Vowel duration (per group). [Box–whisker plot; y-axis: vowel duration (in ms); x-axis:
groups A, B and C, each for long-class (l) and short-class (s) vowels.]

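
As a small worked example, the per-speaker duration ratios can be recomputed from the mean durations quoted in the text above. This sketch divides the mean long-class duration by the mean short-class duration; whether the ratios in table A.5 were computed this way or as means of per-pair ratios is not stated, so the results need not match the table exactly. The variable and function names are hypothetical.

```python
# (mean long-class duration, mean short-class duration) in ms, as quoted in the text.
MEAN_DURATIONS_MS = {
    "C01": (95.24, 58.83),
    "C02": (99.46, 53.73),
    "A09": (122.69, 60.01),
    "A10": (84.82, 68.21),
    "A05": (114.95, 90.52),
}


def quantity_ratio(long_ms: float, short_ms: float) -> float:
    """Ratio of long-class to short-class mean duration."""
    return long_ms / short_ms


ratios = {spk: quantity_ratio(*ms) for spk, ms in MEAN_DURATIONS_MS.items()}

# List the speakers from the smallest to the largest durational contrast.
for speaker, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{speaker}: 1:{ratio:.2f}")
```

Under this assumption, A10 and A05 come out with the smallest ratios and A09 with the largest, in line with the description above.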
The ratios for each vowel pair, each speaker individually and each group are shown in table A.5 on
page 86. Note that the table includes ratios for the pair [E:]∼[E] but not [e:]∼[E:].
With respect to the e-vowels [e:], [E] and [E:], the results show that all speakers realise [e:] longer
than [E], and [E:] longer than [E]. The speaker with the lowest [e:]/[E] ratio is A05 with a value of
1.20. He realises [e:] with a mean duration of 112.51 ms (sd: 38.02 ms) and [E] with a mean duration
of 93.87 ms (sd: 18.56 ms). The mean ratio of [E:]/[e:] for all speakers is 1.03. The speaker with
the lowest ratio is A02 with a value of 0.67 (mean duration of [E:]: 97.80 ms, sd: 20.86 ms; mean
duration of [e:]: 145.35 ms, sd: 19.51 ms), and the speaker with the highest ratio, 1.29, is B05 (mean
[E:]: 114.43 ms, sd: 17.73 ms; mean [e:]: 88.48 ms, sd: 9.99 ms).
Another interesting case is the pair [a:]∼[a], which according to the literature should be distinguishable
primarily in duration. The results show that all speakers realise [a:] longer than [a]. The
speaker with the lowest ratio is speaker A05, with a ratio of 1.30 (mean duration of [a:]: 139.08 ms,
sd: 18.44 ms; and for [a]: 106.81 ms, sd: 31.86 ms). The mean ratio for group A is 1.6 (sd: 0.25) –
exactly the same mean value as can be observed for group C.
In table A.39 on page 116 the results of the vowel duration measurements are summarised together
with the results for vowel quality and voice quality measurements (see below). The symbols in the
rows labelled “quan. opp” mark the significance level of t-tests performed on the durations of the
respective vowels of each vowel pair. A • marks a p-value at the 0.001 level. This means that the
difference between the durations of the two vowels within a pair is highly significant. A ◦ marks a
p-value at the 0.01 level and a ∗ a value at the 0.05 level. P-values above that level are indicated
by a dash. This marks cases where a speaker does not realise the two vowels with significantly
differing durations.
6.6.2 Vowel quality
A summary of the results of the formant measurements for each speaker is shown in tables A.6
to A.25 on pages 87-93. Because the numerical data is not easy to interpret, even when presented
in tabular form, further evaluation of the data was necessary.
Some formant values for [u:] and [U], e. g. for speakers A01 or B04, include obvious errors of measurement
(see figures A.1(a) and A.5(c) on pages 94 and 98). A visual inspection of the spectrogram
of some of these instances in the recordings of speaker A01 confirms this assumption. As a manual
inspection and correction of all measurements could not be done, the respective [u:] and [U] values
of speakers A01 and B04 were not generally excluded and removed from further analysis. Nor was the
data manually corrected in individual instances merely to obtain more convenient results. The
tables on pages 87-93 thus show a summary of all the measured values – including some of these
extreme outliers.
The F1/F2 vowel diagrams for each speaker are shown in figures A.1(a) to A.7(c) on pages 94
to 100. Note that the axes in these diagrams are reversed and rotated to resemble the traditional
vowel diagram. The equivalent IPA signs for the symbols used in the diagrams are shown in table 1
on page 7.
Two questions are of interest in the present study: first, the realisation of the vowel opposition and
second, the native-likeness of this realisation. The second point includes the native-likeness of the
individual vowels as well.
In order to make the formant values comparable across speakers, the values measured in Hertz were
normalised using Lobanov’s z-score transformation shown in equation 6.7 (Lobanov, 1971; Adank
et al., 2004).
\[ F^{N}_{ti} = \frac{F_{ti} - M_{ti}}{\delta_{ti}}, \tag{6.7} \]
where M_{ti} is the average value of formant i across all vowels and \delta_{ti} is the corresponding
standard deviation, each computed for speaker t. This normalisation was chosen because it supposedly
provides the best means to remove anatomical influences from the acoustic data while at the same
time preserving linguistic and social information.
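
A minimal sketch of this z-score normalisation for one formant of one speaker is given below. It is an illustration under stated assumptions, not the script used in the study: the function name and the input values are hypothetical, and the sample standard deviation is used although the text does not say whether the sample or the population formula was applied.

```python
from statistics import mean, stdev


def lobanov(values_hz):
    """Z-score normalise all tokens of one formant for a single speaker.

    Each value is centred on the speaker's mean for that formant and scaled
    by the speaker's standard deviation (sample formula, an assumption).
    """
    m = mean(values_hz)
    s = stdev(values_hz)
    return [(f - m) / s for f in values_hz]


# Illustrative F1 values in Hz for one speaker across several vowel tokens.
f1_tokens = [288.0, 347.0, 287.0, 319.0, 405.0, 712.0]
f1_norm = lobanov(f1_tokens)
print([round(z, 2) for z in f1_norm])
```

After normalisation every speaker's values for a given formant have mean 0 and standard deviation 1, which is what makes the vowel spaces of anatomically different speakers comparable.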
To determine the native speaker norm, reference points within the vowel space were defined for
each of the examined vowels by computing the mean values for the two speakers from group C,
the “monolingual native speakers”, and the respective values taken from literature¹⁶. This was done
because of the relatively small number of speakers in group C – a group which normally serves as
the control group in a larger study under more professional conditions. Missing values for a specific
vowel and/or formant in the data from the literature were ignored in computing the normalised
values for the reference points. The resulting F1/F2 vowel space for these reference points is shown
in figure A.7(e) (in Hertz) and figure A.7(f) (normalised) on page 100.

¹⁶ It has to be pointed out that the term monolingual is used here informally to refer to those speakers who are
not bilinguals in the sense as defined in section 1.1.5. Strictly speaking, speakers C01 and C02 cannot be called
monolinguals. Both speakers reported having learned English and French. In addition, speaker C01 reported a
9 month stay in Great Britain and speaker C02 even stated that he had learned Swabian from his parents. In addition,
the language backgrounds of the speakers who served as sources for the values taken from the literature are
unknown.

First, to test the validity of the reference points, their F1 and F2 values are compared between long
and short vowels with respect to the discussion in section 6.1.1. Table 6.2(a) shows
a summary of the mean F1 and F2 values of the reference points (group C and the values from
literature).
Short vowels should thus have a higher F1 than their long counterparts. The only vowel pair where
this is not the case is [a:]∼[a] – but this was expected, as it was explicitly pointed out in the
literature as an exception. With respect to F2, the long back vowels all have lower values. The long
front vowels should have higher F2 values than their short counterparts according to the literature.
The only vowel pair where this is not the case is [ø:]∼[œ]. This is due to the values of speaker
C01, since the values for speaker C02 and the values taken from literature agree with the rule. The
mean values from literature are 1485 Hz for [ø:] and 1476 Hz for [œ]. The exception in the values of
speaker C01 can most likely be attributed to a measurement error, as the vowel diagram suggests
(see figure A.7(a) on page 100). Thus, the constructed reference points generally agree with the
literature.

(a) group C + literature (values in Hz)
V     F1 mean   F1 sd   F2 mean   F2 sd
ø:    405       57      1526      175
œ     518       68      1548      130
a     712       88      1398      161
a:    752       91      1303      168
E     557       103     1833      255
e:    396       60      2224      299
E:    549       112     1932      309
I     395       73      1962      170
i:    279       23      2370      295
O     583       74      1104      169
o:    403       41      727       130
U     394       53      1038      189
u:    305       49      786       222
Y     385       61      1557      139
y:    289       19      1655      152

(b) group B (values in Hz)
V     F1 mean   F1 sd   F2 mean   F2 sd
ø:    394       60      1700      170
œ     502       82      1679      238
a     668       95      1496      269
a:    732       167     1303      245
E     536       77      1853      223
e:    381       47      2270      225
E:    476       116     2038      289
I     397       58      1993      237
i:    300       63      2333      279
O     561       68      1068      196
o:    396       49      735       193
U     399       56      1112      367
u:    339       73      845       420
Y     379       56      1793      185
y:    294       46      1912      217

Table 6.3: Reference F1/F2 values and standard deviations

The mean values for the speakers from group B served as a secondary set of reference points for
the speech of group A. A comparison of the F1 and F2 values according to the above stated rules
reveals that the mean values of group B show no unexpected exceptions (see table 6.2(b)).
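
The comparison against the stated rules can be retraced with a short sketch that uses the mean values from the two reference tables above. The data structures and names are hypothetical, and the pair [a:]∼[a] is left out of the F2 check because the text does not classify it as front or back (an assumption of this sketch).

```python
# Mean (F1, F2) in Hz per vowel, copied from the two reference tables.
REFERENCE = {
    "ø:": (405, 1526), "œ": (518, 1548), "a:": (752, 1303), "a": (712, 1398),
    "e:": (396, 2224), "E": (557, 1833), "i:": (279, 2370), "I": (395, 1962),
    "o:": (403, 727), "O": (583, 1104), "u:": (305, 786), "U": (394, 1038),
    "y:": (289, 1655), "Y": (385, 1557),
}
GROUP_B = {
    "ø:": (394, 1700), "œ": (502, 1679), "a:": (732, 1303), "a": (668, 1496),
    "e:": (381, 2270), "E": (536, 1853), "i:": (300, 2333), "I": (397, 1993),
    "o:": (396, 735), "O": (561, 1068), "u:": (339, 845), "U": (399, 1112),
    "y:": (294, 1912), "Y": (379, 1793),
}
# (long vowel, short vowel, place); "central" pairs are skipped in the F2 check.
PAIRS = [("i:", "I", "front"), ("y:", "Y", "front"), ("e:", "E", "front"),
         ("ø:", "œ", "front"), ("a:", "a", "central"),
         ("o:", "O", "back"), ("u:", "U", "back")]


def rule_exceptions(table):
    """Return (long, short, formant) for every pair that breaks a rule."""
    found = []
    for long_v, short_v, place in PAIRS:
        f1_long, f2_long = table[long_v]
        f1_short, f2_short = table[short_v]
        if not f1_short > f1_long:          # short vowels: higher F1
            found.append((long_v, short_v, "F1"))
        if place == "front" and not f2_long > f2_short:   # long front: higher F2
            found.append((long_v, short_v, "F2"))
        if place == "back" and not f2_long < f2_short:    # long back: lower F2
            found.append((long_v, short_v, "F2"))
    return found


print("reference:", rule_exceptions(REFERENCE))
print("group B:  ", rule_exceptions(GROUP_B))
```

With these inputs the reference points deviate only in F1 of [a:]∼[a] and F2 of [ø:]∼[œ], and group B only in F1 of [a:]∼[a], which matches the exceptions discussed in the text.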
Realisation of the vowel opposition
The differences in F1 and F2 between short and long vowels are discussed in section 6.1.1. The
realisation of a contrast in quality between two vowels of a given contrast pair will be assumed to
be measurable by a pairwise comparison of the two vowels’ formant values. If there is at least one
formant (F1, F2 or F3) which is “sufficiently” distinguishable from the corresponding formant value
of the other vowel, it will be assumed that this is an acoustic indication of an intended realisation
of a contrast between these two. The problem is finding a measure for this distinction.
Although the formant frequencies alone might not in all cases be sufficient to describe vowel quality,
they will be used here as the only examined acoustic correlate (voice quality parameters possibly
correlating with the vowel opposition are examined separately – see following section).
As a working hypothesis, it will be assumed that such a comparison can be computed using t-tests
to compare the vowel formants pairwise. Once again, it has to be mentioned that the following
analyses will be carried out and interpreted despite the very limited amount of data. The following
example should illustrate this proposed ad hoc method:
T-tests performed on the three formants (see table on the upper left in figure 6.4) of [i:] and [I] as
realised by speaker A03 yield the following results: for F1, t = −4.30, df = 9.85 and p = 0.0016, for
F2, t = 5.31, df = 6.61 and p = 0.0013, and finally for F3, t = 9.00, df = 10 and p = 4.15 × 10⁻⁶. The
p-values for F1 and F2 below the 0.01 level, and the p-value for F3 below the 0.001 level indicate
that the two vowels might have formants with pairwise different mean values. The vowels [y:] and
[Y] from the same speaker on the other hand have formant values which are much closer. The t-test
yields t = −3.18, df = 8.35 and p = 0.01 for F1, t = 0.17, df = 6.17 and p = 0.87 for F2, and
t = −2.04, df = 5.53 and p = 0.09 for F3. The relatively high p-values, all above the 0.01 level,
suggest that the mean values are probably not different. This numerical analysis suggests that the
speaker probably distinguishes [i:] from [I] by vowel quality. The evidence for a similar distinction
of [y:] from [Y] by the same speaker is less conclusive. Provided the small p-value in F1 is not just
a consequence of the very limited amount of data, it is possibly an indication that the speaker does
not distinguish the two vowels by quality (as measured by formant frequencies). A comparison with
the vowel diagram shown on the right in figure 6.4 confirms these conclusions.
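
The test statistic of such a comparison can be sketched from summary values alone. The text does not name the t-test variant; the fractional degrees of freedom reported above are consistent with Welch's unequal-variance test, which is the default of `t.test` in R, so that variant is assumed here. The means and standard deviations are the F1 values of [i:] and [I] for speaker A03 as given in figure 6.4; the sample size of six tokens per vowel is an assumption made for illustration, and the function name is hypothetical.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Works on summary statistics (mean, standard deviation, sample size)
    of two independent samples with possibly unequal variances.
    """
    v1 = sd1 ** 2 / n1          # squared standard error of sample 1
    v2 = sd2 ** 2 / n2          # squared standard error of sample 2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# F1 of [i:] vs. [I], speaker A03; n = 6 per vowel is assumed, not reported.
t, df = welch_t(288, 22, 6, 347, 25, 6)
print(f"t = {t:.2f}, df = {df:.2f}")
```

Under these assumptions the statistic comes out at roughly t ≈ −4.3 with df ≈ 9.8, close to the values reported for F1; small differences are to be expected because the summary values are rounded. The p-value itself requires the t distribution, which the Python standard library does not provide.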
Table A.26 on page 101 shows the results of such a pairwise application of t-tests to all vowel
pairs for all examined speakers. The symbol • indicates that the p-value is below the 0.001 level. ◦
indicates a p-value below the 0.01 level, and the symbol ∗ a p-value below the 0.05 level. A dash in
the table indicates a p-value above the 0.1 level. This construction can be interpreted as a kind of
“proximity score” for the formant values of each vowel pair. If interpreted that way, a designation
of “–” for all three formants represents the maximum score, meaning that the vowels have the most
similar formant values (according to this method). A designation of • for all three formants on the
other hand represents the minimum score, meaning that the formant values are the most dissimilar.
The corresponding entries for the above stated example for speaker A03 are shown in the table on
the lower left in figure 6.4.
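
The notation just described amounts to a simple mapping from p-values to marks, sketched below. The function name is hypothetical, and the treatment of p-values between 0.05 and 0.1 (printed as a bare number without a mark) is inferred from the example entries for speaker A03 rather than stated explicitly.

```python
def significance_mark(p: float) -> str:
    """Map a p-value to the mark used in the pairwise comparison tables."""
    if p < 0.001:
        return "•"
    if p < 0.01:
        return "◦"
    if p < 0.05:
        return "∗"
    if p > 0.1:
        return "–"
    # Between 0.05 and 0.1: show the value itself, without a mark (inferred).
    return f"{p:.3f}".lstrip("0")


# p-values reported for speaker A03 (F1, F2, F3).
for pair, p_values in {"[i:]~[I]": (0.0016, 0.0013, 4.15e-6),
                       "[y:]~[Y]": (0.012, 0.87, 0.092)}.items():
    print(pair, [significance_mark(p) for p in p_values])
```

Read as a proximity score, three dashes mean maximal similarity of the two vowels and three • marks maximal dissimilarity.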
It seems safe to assume that two vowels are distinct if their mean values differ in such a way as
described above. This does not, however, imply that vowels which are not considered distinct
according to this method cannot be classified as two (phonetically) distinct vowels by other criteria.
Thus, this method is used only as an indicator of dissimilarity of the pairwise compared formant values
of two vowels. Interestingly, the results obtained by such an evaluation agree with the interpretations
of the visual appearance of the vowel diagrams. Note that the characteristic F1 and F2 patterns
associated with the vowel opposition are not of interest here but are discussed in the following
section on the native-likeness of the realisation of the vowel opposition.
The i-vowels [i:] and [I] are realised with formant values which show a distinction for at least
one formant with a p-value below the 0.001 level by speakers A01, A02, A03, A04, A08 and A09
from group A. Of the remaining speakers, all but one (speaker B08) realise that distinction as
well, with p-values below the 0.001 level or only slightly above it. Speakers A05, A07 and A10 realise
[i:] and [I] with very close formant values in all three formants.

Figure 6.4: [i:]∼[I] and [y:]∼[Y] of speaker A03. [Upper left: formant means and standard deviations;
lower left: p-values of the pairwise t-tests; right: normalised vowel diagram of speaker A03 with the
axes F1n and F2n.]

Formant values (in Hz), speaker A03:
V      F1 mean   F1 sd   F2 mean   F2 sd   F3 mean   F3 sd
[i:]   288       22      2165      52      3277      137
[I]    347       25      1860      130     2561      139
[y:]   287       16      1641      63      2251      56
[Y]    319       17      1637      20      2368      117

p-values, speaker A03:
Fi     [i:]∼[I]   [y:]∼[Y]
F1     .002 ◦     .012 ∗
F2     .001 ◦     –
F3     <.001 •    .092

The ü-vowels [y:] and [Y] are realised with rather high p-values by all the speakers in
group A. Only speaker B01 realises the two vowels with all three formants’ p-values well above
0.001. All the other speakers in group B and speaker C01 realise a distinction in vowel quality.
Speaker C02 displays some proximity in the formant frequencies. However, a comparison of the
statistical analysis with the graphic display in figure A.7(c) on page 100 indicates that the vowels
are, despite their proximity, very likely distinguished by quality by that speaker.
The e-vowels [e:], [E] and [E:] were compared in the three combinations [e:]∼[E], [E:]∼[E] and
[E:]∼[e:]. The first mentioned corresponds to the traditional description of the German vowel system
as described in section 6.1. The second pair – [E:]∼[E] – should display no difference in vowel quality,
as these two vowels are described as differing only in duration (if [E:] is realised at all by a speaker).
The last pair – [E:]∼[e:] – should show no difference in vowel quality for those speakers who identify
orthographic <ä> (as a long vowel) with [e:]. This means that only the distinction [e:]∼[E] can
be tested unambiguously, as the other two pairs are expected to display variation even for native
speakers.
The pair [e:]∼[E] is realised with clearly differing formants by speakers A01, A02, A03, A06, A08 and
A09. The other speakers in group A show closer formant values, i. e. higher p-values for the t-tests.
However, no one realises these two vowels with almost identical mean values for all three formants.
All the speakers of groups B and C realise these two vowels with significantly differing formants.
Notably, speakers B03 and B05 realise [e:] and [E] with highly significant differences in all
three formants. Except for speaker A07 (and possibly A10), a tendency to distinguish [e:]∼[E] can
be seen in all vowel diagrams.
[E:] and [E] are realised differently by speakers A01, A02, A03, A08 and A09, as well as by B01,
B03, B04, B05 and speaker C02. In contrast, [E:] is realised differently from [e:] by speakers A01,
A02, B02 and B07.
This comparison shows that speaker A01 realises the vowel pairs [e:]∼[E], [E:]∼[E] and [E:]∼[e:] with
formant values which show a distinction for at least one formant at a highly significant level. Thus,
the three respective vowels are mutually distinguished from one another and take three different
places within the vowel space of that speaker. The same effect can be observed with speaker A02,
and to a lesser degree with speakers A03 and C02. The highest overlap in all three pairs can be
observed with speaker A07, who seems to realise all three German e-vowels alike.
The ö-vowels [ø:] and [œ] are distinguished only by speakers A03 and A09 from group A. They
are not distinguished considerably by all other speakers in group A as well as by speakers B01, B06
and B07 (although these latter two could be measurement errors or slips of the tongue). Despite
the statistical similarity of the formant values, the visual display in the vowel diagrams shows a
possible tendency for a realisation as two different vowels by speaker A01.
The a-vowels [a:] and [a] are realised with two significantly different vowel qualities only by
speaker B01, and to a lesser degree by speakers A03, A07 and A09. All the other speakers do not
realise these two as different vowels. A mutual overlap in all three formants can be observed in the
realisations of speakers A01, A02, A05, A06, A10 and C01.
The o-vowels [o:] and [O] are distinguished considerably by three of the ten speakers in group
A: A04, A06 and A09 – and to a lesser degree as well by speakers A02, A03, A07 and A08. All
speakers in groups B and C produce these two vowels with significantly different qualities. For
speaker A02 the minimum F1 of [O] (492 Hz) is lower than the maximum F1 of [o:] (514 Hz), and
the formants show a high dispersion of the individual instances of these vowels; nevertheless, the
t-test yielded a p-value of 0.003 (compare figure A.1(c) on page 94).
The u-vowels [u:] and [U] could not be analysed for all speakers due to some instances of extreme
outliers in linguistically unlikely positions within the F1/F2 vowel space. The u-vowels could
therefore not be analysed reliably for the speakers A01, A02, A05, B02, B04, B05, B07 and B08.
The data for speaker A03 includes one such outlier which can be interpreted as a measurement
error. The remaining instances show a distribution which indicates a distinction between the two
vowels. The same is true for speaker B02 – provided the two instances of [u:] with the highest F2
are interpreted as measurement errors (see figure A.4(e) on page 97). Of the remaining speakers
in group A, no one realises the two u-vowels as significantly distinct vowels, and in group B only
speaker B06 distinguishes [u:] and [U]. Despite a considerable computed overlap for the two vowels
by speaker C02, the distributional pattern indicates the existence of a distinction. Disregarding the
one extreme [u:] outlier at 1516 Hz, the remaining values show a distribution as can be expected
for a native German speaker.
Native-likeness of the vowel opposition
Native-likeness was computed in two different ways. The patterns described in section 6.1.1 are
examined first. The general rule that short vowels have a higher F1 than their long counterparts
can be observed in almost all vowels of all examined speakers.
This means that there is either too little data, or that a comparison like this is not an adequate
measure of native-likeness of the realisation of the vowel opposition.
Assuming the latter, the next step in this analysis was a comparison of the individual vowels
with the target values. A method corresponding to the ad hoc measure using pairwise t-tests
as described above was employed. This time, each of the vowels of the individual speakers was
compared individually to its corresponding reference point.
For this between-speakers comparison the normalised formant values were used and not the values
as measured in Hertz. The native reference points are also shown in the example of speaker A03
in figure 6.4. The mean native speaker values of [i:] and [I] are shown in the background of the
vowel diagram. The respective vowel symbol marks the mean value and the surrounding boxes
show one (solid lines) and two (dotted lines) standard deviations of the normalised reference value.
The complete diagrams with the normalised vowels for each speaker are shown in figures A.1(b)
to A.7(d) on pages 94 to 100. The reference points for each vowel are shown in the background in
each graph (for the sake of readability without standard deviations).
This corresponds to the traditional approach of comparing the speech of non-native speakers to the
speech of “monolingual” native speakers. In addition, the vowels of each speaker in group A were
also compared to the mean values of the corresponding vowels from the speakers in group B (see
below).
The interpretation of the computed proximities is reversed in comparison to the above discussed
evaluation: the higher the score is, i. e. the closer two vowels are, the more native-like they are. Just
as a single sufficiently distinct formant was enough in the considerations above, a single
significantly distinct formant value is now interpreted
as an indication of considerable deviation from the native speaker norm. Table A.27 sums up the
results of this numerical comparison (see page 102).
The symbol • indicates that the mean of one formant of a given vowel falls within one standard
deviation of the native speakers’ mean value. ◦ marks instances where the mean of a speaker’s vowel
production falls within two standard deviations and the dash “–” marks instances where the mean
of a vowel is further away from the native mean value than two standard deviations.
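
The distance-based reading of these marks can be sketched as follows. This is only an illustration of the one- and two-standard-deviation bands described in the paragraph above; the function name and the example numbers are hypothetical, and the inputs are assumed to be normalised values.

```python
def deviation_mark(speaker_mean: float, ref_mean: float, ref_sd: float) -> str:
    """Classify one formant of one vowel by its distance from the reference.

    "•": within one standard deviation of the native mean,
    "◦": within two standard deviations,
    "–": further away than two standard deviations.
    """
    distance_in_sd = abs(speaker_mean - ref_mean) / ref_sd
    if distance_in_sd <= 1.0:
        return "•"
    if distance_in_sd <= 2.0:
        return "◦"
    return "–"


# Illustrative normalised values (not measured data): reference mean -1.0, sd 0.2.
for value in (-0.9, -0.65, -0.3):
    print(value, deviation_mark(value, ref_mean=-1.0, ref_sd=0.2))
```

The two bands correspond to the solid and dotted boxes drawn around the reference points in the vowel diagram of speaker A03.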
The symbol • indicates that the t-test yielded a p-value smaller than 0.001. ◦ marks instances
where the p-value is below the 0.01 level, and an asterisk ∗ indicates a p-value smaller than 0.05.
Values above the level of 0.1 are not shown and the respective cells in the table are marked with a
dash –.
This comparison of each individual vowel together with the characteristic native speaker patterns
of F1 and F2 in correlation with vowel class now gives a more differentiated picture of the
native-likeness of the realisation of the vowel opposition.
The i-vowels are realised native-like by most of the examined speakers. All speakers realise [I]
with a higher F1 than [i:] – although the difference is quite small for speakers A05 (339 Hz vs.
354 Hz) or A08 (367 Hz vs. 373 Hz). The difference for speaker A07 is negligible (382.11 Hz vs.
382.43 Hz). The differences for the remaining speakers in group A range from 22 Hz (speaker A10)
to 105 Hz (speaker A01). The speakers in groups B and C realise [I] and [i:] with differences in F1 in
the range from 68 Hz (speaker B08) to 194 Hz (speaker C01). All speakers without exception realise
[i:] with a higher F2 than [I], with differences in the range from 63 Hz (A05) to 736 Hz (C01).
With respect to the individual vowels, [I] is realised with mean formant values (F1, F2 and F3)
which show no deviance from the native speakers’ norm by speakers A03, A06, B02, B03, B04, C01
and C02. A greater deviation in only one of the three compared formants is realised by four speakers
in group A and five in group B. There are only two speakers (A09 and A10) who realise [I] with a
highly significant deviance in one formant. Such an indicator of non-native-likeness for [i:] is found
in speakers A01, A05, A08, A09 and B05. An overlap in all three formants is observed only
for speakers A02, A04, B03, B07 and B08.
The ü-vowels are realised with higher F1 for [Y] by all speakers, except A10, and higher F2 for
[y:] by all but three speakers. A01 realises [y:] and [Y] with mean F2 values of 1710 Hz and 1799 Hz
respectively. B08 realises these two vowels with mean F2 values of 2003 Hz and 2043 Hz respectively.
Speaker A10 realises the ü-vowels non-native-like in both F1 and F2 .
A strong indication of non-native-likeness for the individual vowels is found in the productions of
[y:] by speakers A04 and A08, and in the productions of [Y] by speaker A10. Native-like values for
all three formants are realised by speakers A09, B01, B04 and C02 for [y:] and by speakers A04,
A07, A09, C01 and C02 for [Y].
The e-vowels [e:] and [E] are realised by all speakers with a higher F1 for [E] and a higher F2 for
[e:]. The F2 difference is quite small for A10, but this might be due to measurement errors, as two
unusual outliers in the vowel diagram imply (figure A.4(a) on page 97). Examined individually, [e:]
is realised with native-like values for all three formants by speakers A03, A09, B02, B03, B06 and
C02. Accordingly, [E] is realised native-like by speakers A01, A05, A06, A08, A09, A10, B03, B04,
B05, B06, B07, B08, C01 and C02, and [E:] is realised native-like by speakers A06, A10, B06,
B07, B08, C01 and C02. Strikingly, out of the three e-vowels, [E] is realised most native-like with
only small deviations from the formants’ reference mean values even by the non-native speakers.
The ö-vowels have a native-like relation of F1 in the samples of all speakers except A05, who
realises [ø:]∼[œ] with mean F1 values of 369 Hz and 365 Hz respectively. The difference is also quite
small for speakers A02 (414 Hz vs. 431 Hz) and A10 (550 Hz vs. 556 Hz). The differences in F2
are more noticeable. Four speakers in group A and three speakers in group B have a lower F2
for [ø:]. Definitely native-like are seven instances of [ø:] (speakers A03, A04, A09, B01, B04, B08
and C02) and five instances of [œ] (speakers B01, B06, B08, C01 and C02). Strong evidence for
non-native-likeness is found in five instances of [œ] (speakers A02, A04, A05, A06 and A07).
Note that the values of the reference point for [ø:] might be distorted due to unusual values (or
measurement errors) in the recordings of C01, which were nevertheless included in the computation
of the reference values.
The a-vowels [a:] and [a] are realised native-like in all three formants by six speakers in the case
of [a:], and by nine speakers in the case of [a]. Only speakers A01, A07, A09, B01 and B05 realise
[a:] with a highly significant deviation from the reference values, and for [a] only speaker A01 has a
highly significant deviation in the formants’ mean values.
The difference F1([a:]) − F1([a]) has a mean value of 17.69 Hz for group A, with a minimum of
−41 Hz, a maximum of 149 Hz and a standard deviation of 52.48 Hz. This means that six out of
ten speakers in group A realise [a] with a lower F1. For group B the mean value is 50.54 Hz (min.
−46 Hz, max. 270 Hz and standard deviation 91.44 Hz). Seven out of eight speakers realise [a] with a
lower F1. The speakers C01 and C02 both realise [a] with a lower F1 (with differences of 39 Hz and
43 Hz respectively). Thus, a higher F1 for [a:] can be considered a native-like realisation. Hence, the
difference between [a:] and [a] is not realised native-like by only five speakers: A01, A02,
A09, A10 and B06. In addition, the difference seems negligible for speakers A06 (6 Hz) and
A08 (2 Hz).
The difference F2([a:]) − F2([a]) has a mean value of −42.39 Hz for group A (min. −215 Hz, max.
138 Hz, s.d. 101.95 Hz) and a mean value of −193.3 Hz for group B (min. −369 Hz, max. −97 Hz,
s.d. 92.37 Hz). Speakers C01 and C02 realise [a:] with a lower F2 (with differences of −108 Hz and
−116 Hz). Only four speakers realise [a:] with a higher F2: A03, A05, A07 and A10.
The o-vowels [o:] and [O] are realised with higher F1 and F2 for [O] by all speakers. Looking
at the native-likeness of the individual formant values reveals, however, that four speakers realise
[O] (A01, A07, B01 and B04) and at least two speakers realise [o:] (A01 and A02) definitely not
according to the acoustic norm.
The u-vowels [u:] and [U] are realised with higher F1 for [U] by all speakers (excluding A01 and
B04), and only one speaker (A07) realises [U] with a lower F2. There are, however, some instances
with quite small differences between the two vowels. Speakers A05, A07, A08 and A10 realise
the difference F1([u:]) − F1([U]) with values of −10 Hz, −12 Hz, −7 Hz and −12 Hz respectively. The
difference F2([u:]) − F2([U]) is realised with values of only −11 Hz and −13 Hz by speakers A05 and
A10.
The comparison of the individual vowels with their reference points is less conclusive. All speakers
(excluding A01 and B04) realise [u:] and [U] with formant values rather close to the native speaker
means. Only speaker A10 has a highly significant deviance in F3 of [u:].
Group B as control
Following the suggestion discussed in section 5.5 that second language learners cannot be compared
to monolingual speakers of the respective target language, group B served as the control group
in a second comparison of the individual vowel values. A summary is shown in table A.28 on
page 103, where an improved value (in comparison to the values in table A.27) is marked in blue and
a deterioration in red. The differences are only marked if the new p-value lies at another significance
level (according to the marks as indicated above, with the levels 0.1, 0.05, 0.01 and 0.001).
In general, there were only slightly more p-values which improved, i. e. were higher, in comparison
to the tests with the reference values. Overall, 453 p-values were higher and 447 lower. However,
regarding only those changes which resulted in a higher or a lower level of significance as indicated
in table A.28, the majority of differences is marked red in all three groups, which in general
indicates more differences between the individual values and the reference (in this case group B).
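The marking rule for the second comparison can be sketched in a few lines of Python. This is a minimal illustration and not the procedure actually used to prepare table A.28: the function names, the labels and the strict comparison p < level are assumptions introduced here.

```python
# Significance thresholds named in the text, from strictest to most lenient.
LEVELS = [0.001, 0.01, 0.05, 0.1]

def significance_level(p):
    """Return the strictest threshold that p falls below, or None if p >= 0.1."""
    for level in LEVELS:
        if p < level:
            return level
    return None

def classify_change(p_reference, p_group_b):
    """Label a change only if the new p-value lies at another significance level."""
    if significance_level(p_reference) == significance_level(p_group_b):
        return "unmarked"
    # A higher p-value counts as an improvement, i.e. less evidence of a
    # difference between the speaker's values and the control values.
    return "improved" if p_group_b > p_reference else "deteriorated"
```

Under these assumptions, a change from p = 0.03 to p = 0.04 stays unmarked because both values lie at the 0.05 level, whereas a change from p = 0.03 to p = 0.2 would be marked as an improvement.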
These results have to be treated with caution. Due to the different sizes of the three groups and
the overall small sample sizes, no definite conclusions or generalisations can be drawn. Group B
comprises eight speakers as opposed to two speakers in group C plus the values taken from literature (which
were rather incomplete, especially with respect to F3).
The validity of such a comparison of non-native speakers with bilingual speakers of the L2 cannot
be determined by an acoustic analysis anyway. Since the whole idea of using bilinguals as a control
group is based on the observation that a speaker’s L1 is affected by any of his or her L2s, a
comparison between bilingual and monolingual native speakers will obviously result in differing
values. Thus, the native-likeness of group B cannot be determined by any acoustic measure, but
has to be tested in a perception experiment with native-speaker listeners.
CHAPTER 6. EXPERIMENTAL STUDY
6.6.3 Tenseness
Unfortunately, the voice quality measurements yielded no results which would indicate a relation
between vowel class and any of the examined voice quality parameters. The results from Claßen
et al. (1998) for German speakers could not, in general, be reproduced for speakers C01 and C02.
Table 6.4 shows the p-values of t-tests performed for each parameter on an overall comparison of
long and short vowels for each speaker and the respective groups. The p-values suggest that there
are no highly significant differences for the parameters with respect to vowel class in general.
Figure 6.5: Screenshot of record with disturbing signal. The section shown in this figure is the [a:k^t] part
of gesagt ([g@za:kt] – “said”) with adjacent “silence” after the cursor position. As the spectrogram shows,
the distortion ranges from 0 up to about 500 Hz. This part of the signal covers (at least) the first harmonic,
which is used for calculating voice quality parameters.
Although every vowel was labelled four times for the voice quality measurements, there were
many obvious measurement errors, which led to a further reduction of the already small amount
of available data – this was especially important for group C with only two speakers as opposed
to ten in group A. A lack of data is marked in tables A.28(a) to A.37(b) (on pages 104 to 113)
by “na”. Of the remaining data, a part of the measurements might be distorted as well due to a
disturbing signal in some of the recordings, which remained undetected during the recording procedure.
A repetition of the respective recordings was not possible, so the analysis was carried out with
reservations. Unfortunately, there are several cases where the respective analysis could not be
performed. The results are nevertheless presented in this section, and included in the summary as
well (table A.39).
The two speakers C01 and C02 as a group show no general effect of vowel class on any of the
six measured voice quality parameters. The voice quality measurements for the two speakers are
summarised in tables A.37(a) and A.37(b) on page 113. For each vowel pair and each parameter,
the corresponding mean values with standard deviations (in parentheses) are given. Additionally,
the p-values of a t-test with the two vowels for the respective parameters are given (if the p-value
is above 0.1, only a dash is drawn – see tables A.28(a) to A.37(b) for all speakers).
The parameter SK was found to have significantly higher values for [e:] than for [E] in stressed
syllables (Claßen et al., 1998, table 4 (a)). The same effect was observed for the i- and u-vowels.
With respect to RC, the same effect was observed for e-, o- and u-vowels.
Figure 6.6 shows box-whisker plots for the two parameters RCG and SKG. The ticks on the left and
on the right of the plots indicate the distributions of long and short vowels respectively. However, if
compared for each speaker and vowel pair separately, some parameters show significant differences
between the long-class and short-class vowels.
ID      OQG     GOG     SKG     RCG     T4G     IC
A01     0.7928  0.7183  0.1494  0.9579  0.962   0.5732
A02     0.6104  0.9072  0.522   0.1638  0.4428  0.9517
A03     0.6229  0.0089  0.0439  0.4381  0.3009  0.0147
A04     0.6731  0.6145  0.7052  0.7862  0.8103  0.6715
A05     0.7096  0.4819  0.6357  0.3068  0.2419  0.7212
A06     0.281   0.7725  0.5747  0.1988  0.8458  0.3133
A07     0.6849  0.9785  0.902   0.4035  0.692   0.9821
A08     0.5561  0.6058  0.9373  0.8972  0.849   0.9209
A09     0.4834  0.9813  0.2585  0.1049  0.1071  0.7605
A10     0.2429  0.5705  0.3126  0.3653  0.8329  0.6659
gr. A   0.0899  0.3822  0.7399  0.3798  0.9092  0.1801
B01     0.3955  0.4605  0.933   0.2051  0.1915  0.2138
B02     0.4539  0.5701  0.6227  0.3409  0.5246  0.2632
B03     0.9994  0.2667  0.5582  0.4294  0.4613  0.285
B04     0.8692  0.0915  0.9688  0.4823  0.9424  0.286
B05     0.8853  0.4719  0.2479  0.7766  0.091   0.769
B06     0.9977  0.0768  0.3151  0.6371  0.6248  0.7918
B07     0.5195  0.0308  0.1389  0.1882  0.253   0.4739
B08     0.593   0.6025  0.6462  0.8672  0.4208  0.2707
gr. B   0.3828  0.0471  0.1761  0.3281  0.0884  0.1724
C01     0.9798  0.3281  0.2784  0.402   0.2732  0.0052
C02     0.7692  0.1955  0.1428  0.0588  0.2264  0.5532
gr. C   0.9717  0.8053  0.1568  0.1476  0.1626  0.0068
Table 6.4: p-values of t-tests of long and short vowels
As a method to detect a contrast in the various voice quality parameters with respect to vowel class,
pairwise t-tests were performed on the measured values for each speaker. This method was also employed
by Claßen et al. (1998).
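To make the kind of test concrete, the following Python sketch computes a two-sample t statistic for the parameter values of a long and a short vowel. It uses Welch's unequal-variance form, which is an assumption: the exact variant of the t-test used here is not specified. The function name is hypothetical, and the conversion of the statistic into a p-value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

With only a handful of tokens per vowel and speaker, the degrees of freedom are small, which is one reason why individual p-values from such tests should be read with caution.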
For the pair [e:]∼[E], a highly significant difference in SKG can be observed with speakers A01,
A09, B02, B06 and C01 (determined by t-tests with a p-value at the 0.001 level). Less significant
values are measured for B01 (p < 0.01), and A04, A05, A06, A08 and A10 (p < 0.05). There is also
a difference between [e:] and [E:] in the samples of speakers B06 (p < 0.001), A08, B01 and C02
(p < 0.01), and A05, A09 and C01 (p < 0.05). Interestingly, a highly significant difference in SKG
between [E:] and [E] is realised by speakers A03, A09, B02 and C02, at the 0.01 level also by speaker
A01, and at the 0.05 level additionally by speakers A06, A10 and B07. For the pair [i:]∼[I], a highly
significant difference in SKG is realised only by speaker A08. Additionally, the t-test for speakers
B03 and C02 yielded a p-value at the 0.05 level. Finally, the pair [u:]∼[U] is not realised with a
highly significant difference in SKG by any of the examined speakers. The only significant differences
at the 0.05 level can be observed in the samples of speakers A08, A09 and B01.
The pair [e:]∼[E] is realised with highly significant differences in RCG by speakers A09, B01, B07
and C01, at the 0.01 level by speakers A06, B04 and B08 and at the 0.05 level by speakers A02, A07,
A08, B02 and B05. The pair [e:] and [E:] is significantly distinguished by speakers C02 (p < 0.001),
A05, A08, C01 (p < 0.01), and B01 and B07 (p < 0.05). [E:] and [E] have distinct mean values in
the samples of A09 and C02 (p < 0.001), and A05, B02, B07 and B08 (p < 0.01). A significant
difference in RCG for the pair [o:]∼[O] can be seen in the results of speakers A03, A08, B01, B05,
B06, B07 (p < 0.001), A06, B08, C02, A01 (p < 0.01), and B03, B04 and C01 (p < 0.05).
Figure 6.6: Voice quality parameters RCG and SKG by speaker group and vowel class. Panel (a): RCG ~ Speaker group : Vowel class; panel (b): SKG ~ Speaker group : Vowel class. Each panel shows gr. A (l), gr. A (s), gr. B (l), gr. B (s), gr. C (l) and gr. C (s).
Looking at the remaining vowel pairs, it can be observed that the parameters SKG and RCG do
not show consistently significant p-values. For speaker C01, for example, there are no significant
differences between RCG values for the pairs [ø:]∼[œ], [a:]∼[a] and [i:]∼[I], and for speaker C02
there are no significant differences for [ø:]∼[œ], [a:]∼[a] and [y:]∼[Y].
Taking into account these observations for SKG and RCG, as well as the remaining voice quality
parameters, it seems obvious that with the present data no reasonable statistical analysis can be
performed. The distributional patterns of individual significant findings over the various speakers,
vowel pairs and parameters do not show obvious regularities. Especially for the groups B and C,
there are no general relations between vowel class (i. e. tenseness) and the respective parameters.
In order to compare non-native speakers to the native-speaker norm, this norm has first to be
determined precisely. A tendency for a higher mean number of highly significant values for speakers
in groups B and C can be observed. The mean number of p-values at the 0.001 level over all vowel
pairs and parameters for the speakers in group A is 10.5. The means for group B and C are 15.63
and 18 respectively. These values are, however, not significant and, due to the unbalanced size of
the groups (10 versus 8 versus 2 speakers), not conclusive.
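The group means reported above are averages of per-speaker counts of highly significant p-values. A minimal Python sketch of this tally follows; the function name, the speaker labels and the p-values are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean

def mean_highly_significant(p_values_by_speaker, threshold=0.001):
    """Average, over speakers, the number of p-values below the threshold."""
    counts = [sum(p < threshold for p in p_values)
              for p_values in p_values_by_speaker.values()]
    return mean(counts)

# Hypothetical p-values over all vowel pairs and parameters for two speakers.
example = {"X1": [0.0001, 0.2, 0.0005], "X2": [0.5, 0.0002]}
```

In this illustration, speaker X1 contributes two and speaker X2 one highly significant value, giving a group mean of 1.5.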
6.6.4 Effects of L1
The factors affecting degree of foreign accent discussed in chapter 2 cannot be examined in this
study because a “score” for each speaker is missing. As discussed in chapter 5, degree of foreign
accent can be determined only by the perception of native speakers and their judgments about the
native-likeness of the non-native speakers’ pronunciation. The total number of speakers is already
low, so the number of each individual L1 represented in this study is definitely too low for a
statistical analysis to be reasonable. Nevertheless, a brief theoretical overview of the speakers’ L1
is given in this section, and speculations about possible effects of the speakers’ L1 on their German
pronunciation are made without an extensive statistical analysis.
An in-depth analysis of the phonetic and phonological systems of the respective languages is required – including regional or other variation within the language – in any examination of possible
effects of the speakers’ L1 on their L2. This could be done in order to determine sources of interference, for example. In the following sections, some languages are discussed based on phonological
descriptions or brief remarks found in the literature, without the required phonetic analyses. Such
in-depth analyses – though necessary for any thorough examination – would go far beyond the scope
of this thesis.
An analysis of a survey among teachers of German as a foreign language reported by Ortmann
(1976) is used as one source to illustrate which vowels are considered to be “difficult” and are
reported to be mispronounced by non-native speakers of German (at least at the beginning of the
learning process).
Polish
Polish was specified by three speakers as their single L1 (A01, A02 and A05)17 . The Polish vowel
system comprises six monophthongal vowel phonemes.
A phonetic comparison of Polish18 and German vowel formants (the reference values prepared for
this thesis) shows that five of the six Polish vowels have F1 and F2 values close to those of German vowels.
Polish [i] is close to German [i:], Polish [a] lies between German [a] and [a:], Polish [e] between
German [E:] and [E], and Polish [o] is slightly higher (i. e. has a slightly lower F1) than German [O].
Polish [u] is somewhat lower than German [u:]. According to the predictions of the speech learning
model or the native language magnet model, these vowels might be identified with the corresponding
German vowels. According to Ortmann, the “difficult” German vowels for native Polish speakers are
all long-class vowels. Of the short-class vowels, only [Y] and [œ] are reported by the majority of
the informants to be difficult for native Polish speakers.
Speakers A01 and A05 show in comparison to the normalised reference points a more retracted
position of [i:], which might be affected by the corresponding position of Polish [i]. The realisation
of German [o:] and [O] appears to be affected by interference from Polish [o] in the speech of all
three Polish subjects. All three realise neither a significant contrast in the two vowels’ formants nor
a native-like vowel quality. Only speaker A02 shows a distinction between the two German o-vowels
by vowel quantity (and probably in F1). The “central” vowel pairs corresponding to <ü> and <ö>
are not distinguished significantly by any of the Polish subjects. However, speakers A01 and A02
realise [ø:] and [œ] with more native-like formant values than A05. Figure A.2(c) (page 95) shows
an empty central area within the vowel space that resembles the Polish distribution of vowels.
Polish has no phonological vowel quantity. Speakers A01 and A02 both realise significant differences
in vowel duration for most vowels. Speaker A05 has a significant contrast only in the pair [E:]∼[E]
(the only pair that this speaker distinguishes in quantity and quality), and to a lesser degree as well in
[u:]∼[U] and [y:]∼[Y].
17 The subjects were not asked explicitly to specify their L1, but were asked about the languages they have learned
from their parents and other language experience together with the respective ages of learning (compare section 6.4.1).
18 Polish vowel formants are taken from Majewski and Hollien (1967).
Croatian
Three speakers specified Croatian as their L1 (A04, B03 and B04), and two speakers specified it as
one of their two L1s besides German (B01 and B06).
There are only five monophthongal vowels – [i], [e], [a], [o], and [u] – in the Croatian standard
language, and no vowel opposition as in German. Vowel quantity is morphologically conditioned
and vowels with differing quantities are not considered different phonemes (as they are in German). The
short vowels are also described as having the same quality (in terms of their formants) as their long
equivalents.
A phonetic comparison of the vowels’ formant values shows that the Croatian vowels are all close to
German vowels, except for Croatian [o], which lies between German [O] and [o:], though somewhat
closer to the latter19. Croatian [e] is much closer to German [E] and [E:] than to German [e:]. The
central area of the F1/F2 vowel space is not occupied by any of the Croatian vowels. This implies
that – according to SLM or NLM – the German vowels in this central area of the vowel space might
be pronounced more accurately, since there are no Croatian vowels in the vicinity that could be
identified with the German ones.
There are no front rounded vowels in Croatian. According to the German language teachers cited by
Ortmann, the “difficult” German vowels for native Croatian speakers are [y:], [Y], [e:], [E:], [ø:] and
[œ]. This is an interesting overlap with those vowels which are predicted to be pronounced more
accurately due to their dissimilarity (namely, the ü- and ö-vowels). This seeming contradiction
can be explained by considering the different sources of these assertions. The SLM for example
is focused on ultimate attainment of second language learners, while language teachers obviously
observe learners at earlier stages of learning. Another difference is of course, that theoretical models
like the SLM or the NLM are concerned with perception and/or production of speech sounds, while
the focus of a teacher might be something completely different (like intelligibility for example).
Speaker A04 for example realises [ø:] and [œ] with a significant difference in duration but not in
vowel quality. With respect to [y:] and [Y] there is – besides a less significant difference in F3 – not
even a distinction in duration. The other Croatian speakers (all categorised as bilinguals) mostly
realise a distinction for both pairs in quality and in quantity. German [o:] and [O] are both realised
almost native-like by speaker A04, but are not distinguished in duration. The remaining Croatian
speakers realise a significant distinction in duration. The vowels [e:] and [E] are distinguished as
well by speaker A04 and the other Croatian speakers. All bilingual Croatian speakers have mean
ratios of long versus short vowels minimally above the mean ratio of 1.71 of group C, but speaker
A04 in contrast has a mean ratio of only 1.34. Although this is not a generalisable conclusion with only
one “monolingual” Croatian speaker, it can be said that there are no obvious effects of interference
from Croatian on the German vowels in the examined speech material.
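The mean ratio of long versus short vowels referred to above can be illustrated as follows. The sketch assumes that the ratio is computed per vowel pair and then averaged for each speaker; whether the values reported here were obtained in exactly this way is an assumption. The function name and the durations are hypothetical.

```python
from statistics import mean

def mean_long_short_ratio(durations_ms):
    """Average the long/short duration ratio over all vowel pairs of one speaker.

    durations_ms maps a pair label to (mean long duration, mean short duration).
    """
    ratios = [long_ms / short_ms for long_ms, short_ms in durations_ms.values()]
    return mean(ratios)

# Hypothetical mean durations in milliseconds, not measurements from this study.
example = {"i": (120.0, 80.0), "a": (150.0, 100.0), "u": (90.0, 60.0)}
```

A ratio close to 1 would indicate that a speaker hardly uses duration to separate the two vowel classes, while larger values indicate a clearer quantity contrast.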
Bulgarian
Only a few remarks about Bulgarian, as well as the remaining languages, will be made here. No
detailed acoustic, phonetic or phonological literature on these various languages could be reviewed
for this thesis.
Two speakers (A06 and A08) specified Bulgarian as their single L1.
Bulgarian has two front vowels, one high [i] and one open mid [E], one central open vowel [a], and
two back vowels [u] and [O] (IPA, 1999, p. 55). The latter two can be neutralised to [o] in unstressed
positions.
19 Formant values for Croatian are taken from Bakran (1996).
According to the predictions of the SLM or the NLM, these are the vowels which might be identified
with their German counterparts. This would be an expected error, if the Bulgarian vowels are
acoustically (perceptually) close to the German vowels. According to Ortmann20, all German vowels
are “difficult” for native Bulgarian speakers.
At first sight, it seems likely that some speakers might have difficulties in producing a stressed
[o:]. This could explain the observed formant values for speaker A06. The speaker clearly
distinguishes [o:] (mean F1: 441 Hz, sd: 21; F2: 739 Hz, sd: 86) neither from [U] (mean F1: 419 Hz, sd:
41; F2: 1043 Hz, sd: 446) nor from [u:] (mean F1: 369 Hz, sd: 41; F2: 892 Hz, sd: 155). The three vowels
are realised alike, but notably, [o:] and [O] are clearly distinguished both in quality and quantity. In
comparison, the other Bulgarian speaker (A08) mutually distinguishes the two o-vowels. Both pairs
of rounded front vowels are not distinguished by speakers A06 and A08, neither in quality nor in
quantity. While speaker A06 seems to distinguish at least the ü-vowels from the ö-vowels, speaker
A08 realises the four vowels with very close F1 and F2 values (compare figures A.3(d) and A.3(d)
on pages 96 and 96).
Bulgarian also has two additional central vowels [5] and [7]. The latter has no direct counterpart in
German, but might interfere with German [@], and [5] might be identified with German [5]. Both
these vowels are unlikely to interfere with the examined German vowels, as both schwa and [5] are
not included in the examined data.
Hungarian
Two speakers (A09 and B02) specified Hungarian as their L1.
Hungarian is described as having seven basic vowel qualities, which appear as both short and long
vowels. Except for the pairs [a:]∼[A] and [e:]∼[E], the short vowels are only a little lower and
more centralised than their long counterparts (IPA, 1999, p. 104). There are additional vowels, but
these are described as being of only marginal/regional relevance. The “difficult” German vowels for
native Hungarian speakers are according to Ortmann [E:], and [I] or [Y] which supposedly have no
Hungarian equivalents.
Due to the large number of vowels and a similar opposition of long and short vowels, interference
effects can be expected for all examined German vowels. Provided the primary distinction in the
Hungarian vowels is quantity (a fact that was not further examined), speakers might rely more on
this feature and pay less attention to vowel quality with the German vowel pairs. This describes
exactly the results for speaker A09. On the one hand, all vowel pairs have highly significant differences
in duration for the two respective vowels; on the other hand, the formants show no significant
distinction for the pair [u:]∼[U], and only to a small degree for the pair [y:]∼[Y]. The two i-vowels
are much closer than their two native reference values.
Turkish
Two speakers (A03 and B08) specified Turkish as their single L1.
Turkish has eight vowels [i], [y], [e], [œ], [a], [o], [u], and [W] (IPA, 1999, p. 154). As far as phonology
is concerned, only the last one has no counterpart in German. The other vowels might be identified
with the respective German counterparts and therefore the German vowels might be affected by
the speakers’ L1 system. Only [i:], [e:], [a:] and [u:] are reported as long vowels. However, according
to the remarks given by Ortmann, there are no long vowels in Turkish – except for loanwords. The
precise phonetic nature and phonological function of vowel quantity in Turkish should be considered
in an experiment with Turkish speakers. According to Ortmann, the “difficult” German vowels for
native Turkish speakers are [e:] and [E:]. These two are nevertheless realised native-like by speakers A03
and B08. Although the statistical comparison resulted in non-native-like formant values for [E:] of
speaker A03, a closer look at the data reveals that the speaker realises [E:] very close to [e:] (see e. g.
table A.8 on page 87 or figure A.1(e) on page 94). According to the considerations in section 6.1
this can be seen as native-like pronunciation (since the respective values for [e:] are native-like
in all three formants). The opposition in vowel quantity is realised consistently by both speakers
A03 and B08. With respect to vowel quality, speaker B08 realises a distinction in all
vowel pairs but [a:]∼[a] and [E:]∼[E] – both cases of lacking distinctions in vowel quality can be
considered native-like. Speaker A03, on the other hand, shows no clear distinction only for the pairs
[u:]∼[U] and [y:]∼[Y]. The former case of the u-vowels might be attributed to measurement errors.
As mentioned earlier, there were generally problems with the measurements of these vowels. A look
at the vowel diagram indicates that the speaker might nevertheless realise a distinction between
[u:] and [U] in the domain of vowel formants F1 and F2. The very close mean values for [y:] and
[Y] can probably be explained by an interference effect of that speaker’s L1. The speaker relies on
vowel duration to distinguish the two vowels, but might identify them in quality with the Turkish
equivalent.
20 The data for Bulgarian is based on the answers of only one informant, however.
6.6.5 Age effects
The effects of age of learning (AOL), length of residence in a German speaking environment (LOR)
or the respective age of first exposure to such a surrounding (AOA – age of arrival) can only be
hypothesised about.
The speakers were grouped according to their respective AOL: groups B and C have a maximum age
of learning of 3, and group A has a minimum AOL of 7. Looking at the mean values of the three
speaker groups can therefore reveal some tendencies possibly related to this variable.
As an example, both the absolute mean values of vowel duration in milliseconds and the long/short
vowel ratio are similar for groups B and C and stand in opposition to the values of group A. A
more detailed effect of AOL cannot be seen on the individual results of the measurements of the
realised oppositions in vowel quantity. There are in total more speakers in group A who do not
consistently realise a significant distinction in vowel quantity. For example, A03 and A09, the two
speakers with the most extreme AOL values (22 versus 7), have almost identical results for the
statistical examination of vowel duration. In comparison, speakers A05 and A07 have the least
native-like patterns of vowel duration – the former with an AOL of 16 and the latter with an AOL
of 22.
The mean formant values for group B are similar to those of group C, but the values for group
A show a greater deviation of the mean values. The realisation of the vowel opposition by quality
shows a tendency for better results with lower AOL – but there are again speakers who contradict
this observation. Speaker A03 (with an AOL of 22) realises a consistent distinction in vowel quality
between long- and short-class vowels. Speakers A05 and A08 (both with an AOL of 16) produce
many more vowel pairs with equal formant values.
Such observations seem to confirm the view that there is such a thing as a critical period for language
(or just pronunciation) learning, and that therefore there should be no correlation between AOL and
foreign accent. However, the observation of “better” results up to an AOL of three and no obvious
further relation between AOL and the results above that age is neither based on a sufficiently large
group and a large amount of data, nor is it the result of native-speaker judgments about the degree
of foreign accent. Therefore it is on one hand an interesting observation, that a speaker with one of
the highest AOL can have such “good” results in comparison to speakers with AOL of 16, but on
the other hand these results are not generalisable and need further examination.
6.7 Discussion
This chapter provided a presentation of an experimental study on the realisation of the German
vowel opposition by non-native speakers.
First, the German vowel system and especially the German vowel opposition between the long- and
short-class vowels was discussed. Various differing descriptions of the vowel system can be found in
the literature. There is substantial disagreement among linguists about the status or existence of a
long [E:] in the German language. Another relevant aspect of German phonology and phonetics is
the question about the acoustic correlates to the vowel opposition. Diverging terminologies denoting
the two vowel classes as either long and short, tense and lax, or centralised and decentralised and
so on all relate to different theories about the (primary) acoustic correlate to the vowel opposition.
The German vowel opposition is generally described as having three phonetic/acoustic correlates:
duration, vowel quality as represented by the formant structure and vowel tenseness. The latter has
its acoustic correlate in voice quality parameters and was examined in addition to the traditional
approach of determining vowel quality primarily by formant measurements. Therefore, the two most
prominent acoustic parameters (vowel duration and vowel quality in its narrower sense, referring
to the vowels’ first three formant values), were chosen for analysis, and in addition, the acoustic
correlate to the vowel opposition – which is usually not examined in foreign accent research – was
added: voice quality (measured by acoustic spectral tilt parameters).
The reasons for examining voice quality parameters were twofold: first, voice quality seems to have
received only marginal attention in foreign accent research. Besides some remarks in the literature,
no closer examination of voice quality features in non-native speech could be reviewed for this
thesis. The second reason was that vowel “tenseness” – one of the most often referred to correlates
of the German vowel oppositions – is seen to have its primary acoustic correlate in voice quality
parameters. If a phonological distinction in a language has its correlate in certain voice quality
parameters, it is obviously of interest to foreign accent research to examine the realisation of these
parameters by non-native speakers.
Besides the review of the linguistic background of the examined phenomenon, the experiment comprised the recruitment of subjects, the recording of speech material, acoustic measurements and a
statistical analysis of the results – as far as possible with the limited amount of data.
Twenty speakers were recruited for this experiment. Half of them are non-native speakers with
ages of learning German above early childhood. The other half comprised eight speakers who were
categorised as bilinguals – due to an early acquisition of German and another language – and two
speakers who were categorised as non-bilinguals, i. e. native speakers (in the traditional sense).
Unfortunately the results of the voice quality measurements cannot be reliably interpreted due to
technical problems with the recordings. The experiment should be repeated to determine if and how
the realisation of the vowel opposition by non-native speakers deviates from the native speakers’
norm in the domain of voice quality.
The results for the vowel duration and vowel quality measurements revealed that the speakers use
vowel duration and vowel quality in various ways to realise a phonetic contrast between the two
German vowel classes. Some speakers, e. g. A05, A07 or A10, seem to make no distinction whatsoever
between the two vowel classes (with respect to pronunciation at least, since perception was not
examined). Other speakers show relatively good results, like for example speakers A02 or A03 who
despite their late onset of learning German not only distinguish most vowel pairs in both quality
and quantity but also realise the individual vowels mainly native-like. This examination revealed
further that bilingual speakers can be compared to monolingually raised speakers – those traditionally
referred to as native speakers. The results for group B did not show any major deviances from the two
speakers in group C or the “standard” values taken from literature. In order to reveal systematic
phonetic differences between speakers with one native language or L1 and those with more than
one L1, more detailed examinations are needed. In addition, an evaluation of the pronunciation of
such speakers by judgments of native speakers is needed.
Finally, the results were also discussed with regard to theoretical issues and commonly studied
experimental variables in foreign accent research. For this reason, possible effects of the speakers’
L1 or their ages of learning German were briefly discussed, although a formal (statistical) analysis
was not performed.
A summary of the results is given in table A.39 by means of a symbolic representation of the data
for easier readability. Whether these results are generalisable or not is questionable with respect
to the small group of examined speakers and the limited amount of data. Then again, the
generalisability of every experimental and empirical study is questionable. In any case, a statistical
evaluation is not more than a summary of the analysed data. As Southwood and Flege (1999)
explicitly pointed out in their paper: “From this study, generalizations about the nature of the
accentedness continua of other non-native speakers of English or speakers with stronger foreign
accents are inappropriate”.
How far these measured deviances affect the degree of perceived foreign accent is an open question
which needs to be further examined. It should be further examined how foreign a non-native-like
realisation of the vowel opposition is perceived to be and what acoustic cues affect this perception. This,
however, is beyond the scope of this thesis.
Chapter 7
Summary and conclusions
This thesis examined foreign accent with a special focus on segmental phenomena. An overview
of current and earlier research on this broad topic was presented and an experimental study was
performed to examine foreign accent from a phonetic perspective.
First, it was pointed out that the term accent is commonly used in various senses – two of which
are relevant to the topics discussed in this thesis. In this thesis accent refers to a characteristic way
of speaking and not to emphasis placed on a certain word or syllable (in which case the term stress
was used). Then, concepts like native and foreign language and bilingualism were discussed. It was
shown, that these concepts and the associated terms cannot be defined precisely without great
difficulty. The fact, that most of the human population does not speak just one language collides
severely with the traditional concept of the idealised monolingual native speaker. This is of great
importance in research on foreign accent, where the speech of a native speaker is usually regarded
as the ultimate goal of language learning and non-native speakers are usually judged against this
ideal norm.
In the second chapter, titled “factors affecting degree of foreign accent”, experimental variables like
the age of learning a language, the speakers’ gender and their linguistic background and experiences
were discussed.
Affective and psychological factors like the speakers’ motivation or language learning aptitude are
often discussed in the literature. There is no general agreement among researchers on whether such
factors affect foreign accent or not. The same disagreement among researchers was found in the
literature regarding the speakers’ gender. A possibly surprising finding is that formal language
instruction seems to have no effect on the degree of foreign accent.
The fact that a speaker’s L1 affects his or her L2 is undisputed. The precise nature and degree
of this influence is, however, still largely unknown and a matter of disagreement among researchers.
One important point is that most conclusions about general L1 effects on a speaker’s L2 are based
on specific language pairs. Despite the large number of natural languages worldwide, researchers
have so far examined only a small number of language pairs.
The amount of exposure to the L2 and the age of learning were identified as strong factors affecting
the degree of foreign accent. The length of residence within an L2 speaking environment affects a
speaker’s foreign accent such that pronunciation in general becomes less accented with time. More
important, however, are the age of arrival within such an environment and the age of learning an
L2. The everyday observation that young children of immigrants usually learn a language faster
and, more importantly, “better” than their parents has been confirmed by researchers. The
general rule is that the earlier an individual starts to learn a language, the higher his or her level
of proficiency can ultimately be. There is considerable dispute in the literature about the reasons for
this effect of age on language acquisition (these were discussed in chapter four).
The review of factors affecting degree of foreign accent was completed with a brief discussion of
speaker-independent factors affecting the perceived degree of foreign accent. Although factors completely
independent of the respective speakers are not the focus of researchers examining the speech of
non-native speakers, the existence of such factors has important consequences which an experimenter has
to be aware of when designing and interpreting experiments on foreign accent. This also highlights
the fact that foreign accent is a phenomenon of both speech production and speech perception.
Chapter three focused on segmental characteristics and acoustic manifestations of foreign accent.
The two most often examined acoustic parameters are the VOT of stop consonants and the formants of
vowels. Most studies examining segmental manifestations of foreign accent concentrated on one or
both of these features. Other segmental features are treated only marginally. Often, segmental
deviances in foreign-accented speech are not acoustically measured, but determined auditorily by
phoneticians or linguistically naïve listeners. The literature linking certain acoustic deviances to
the perceived degree of foreign accent is rather scarce.
In chapter four, theories and models which are used to explain the foreign accent phenomenon
were reviewed. The concept of an innate universal grammar and the critical period hypothesis
were briefly discussed. Neither theory is primarily concerned with foreign accent, but both have
implications for foreign accent research and are often referred to in the literature. The critical
period hypothesis of language acquisition is of particular interest as it identifies age as the most
important factor in language acquisition. A great part of foreign accent research is concerned with
age as a factor affecting degree of foreign accent. With respect to the various versions of the critical
period hypothesis, researchers often compare groups of speakers who learned the examined L2
before or after the supposed end of this critical period.
The concept of interference and linguistic transfer was discussed, as it addresses the relation between
a speaker’s L1 and L2. For a long time in linguistics it was assumed that transfer could explain all
effects observable in foreign-accented speech (or the language of non-natives in general). Today this
view is generally regarded as too simplistic, but the effects of linguistic transfer are still seen as an
adequate model explaining certain foreign accent phenomena.
Other approaches often referred to in the literature (direct realism, the speech learning model
and the native language magnet model) were reviewed as well. These models all incorporate the
concept of perceptual similarity of speech sounds. The basic idea central to all these models is
that a speaker gets attuned to his or her L1 and with time concentrates only on the most salient
acoustic cues characteristic of the respective L1. The perception of non-native speech sounds is
severely affected by this attunement.
Chapter five discussed methodological issues. The problems with selecting an appropriate group of
speakers or listeners for an experimental study were addressed, as well as possible problems caused
by various elicitation techniques.
Chapter six described an experimental study examining the realisation of the phonological opposition between German long- and short-class vowels by non-native speakers in comparison to native
speakers. One important finding was that speakers who learned German in early childhood (classified as bilinguals) could not be distinguished substantially from speakers who learned only German
until school age. Another interesting finding was that speakers with a relatively high age of learning
are able to acquire the vowel opposition and produce it consistently.
Foreign accent has received considerable attention in recent years. This thesis has summarised the
findings from various studies and pointed out that there are still several open questions which need
further examination.
Appendix A
Tables and figures
ID   Group  List  Age  Sex  AOL  AOA  LOR  L1
A01  A      B      29  m     15   22   89  pl
A02  A      A      29  m     12   18    1  pl
A03  A      A      32  m     22   22  120  tr
A04  A      C      24  f     19   18   72  hr
A05  A      B      24  m     16   19   54  pl
A06  A      A      26  f     18   18   96  bg
A07  A      C      28  f     22   24   48  ka
A08  A      D      26  f     16   19   84  bg
A09  A      A      52  f      7    0  624  hu
A10  A      A      28  f     21   21   84  ru+uk
B01  B      B      24  f      0   18   72  de+hr
B02  B      B      29  f      0    0  348  de+hu
B03  B      C      30  m      3    0  360  hr
B04  B      B      27  m      3    0  324  hr
B05  B      C      22  f      0   10  144  de+ro
B06  B      D      28  m      0    0  336  de+hr
B07  B      C      23  m      0    0  276  de+it
B08  B      B      28  f      3    0  336  tr
C01  C      D      27  f      0    0  315  de
C02  C      A      29  m      0    0  348  de
Summary:
Group: A: 10, B: 8, C: 2
List:  A: 6, B: 6, C: 5, D: 3
Age:   min: 22.00, max: 52.00, mean: 28.25
Sex:   f: 11, m: 9
AOL:   min: 0.00, max: 22.00, mean: 8.85
AOA:   min: 0.00, max: 24.00, mean: 10.45
LOR:   min: 1.00, max: 624.00, mean: 206.60
L1:    hr: 3, pl: 3, bg: 2, de: 2, de+hr: 2, tr: 2, other: 6
Table A.1: Demographic speaker characteristics
680
737
896
a
750
a
850
a
762
a
622
a
749
831
676
F1
F2
2450
2171
2533
2400
1930
1986
2148
2181
2382
2199
F2
1150
1275
1517
a
1150
a
1221
a
1172
a
1139
a
1176
1302
1577
a:
i:
680
694
836
800
768
706
593
618
795
676
F1
375
369
433
325
358
319
352
270
388
345
F1
a
I
1280
1372
1586
1400
1192
1237
1048
1224
1425
1572
F2
2200
1902
2095
2200
1940
1888
1868
1914
2028
2065
F2
517
572
500
391
358
316
343
-
F1
260
302
320
275
306
300
286
261
-
F1
@
F2
1447
1763
1200
1409
1550
1530
1400
-
F2
1550
1722
1810
2000
1689
1452
1615
1491
-
y:
540
482
534
514
-
F1
450
373
426
325
339
358
387
306
-
F1
F2
1009
1022
1081
1126
-
F2
1400
1543
1670
1800
1510
1530
1393
1478
-
5
Y
400
383
440
375
354
358
381
365
353
372
F1
400
348
434
375
365
306
330
308
359
370
F1
F2
680
841
889
850
589
599
677
592
707
1150
F2
2250
2126
2461
2100
2103
2044
2090
2109
2293
2004
o:
e:
550
537
605
500
502
531
521
501
536
440
F1
482
584
e
500
495
482
514
515
-
F1
F2
O
980
1074
1200
900
921
931
928
931
1100
1246
F2
1902
2166
e
1900
1992
1835
1892
1868
-
E:
250
310
345
275
293
228
310
290
271
336
F1
490
489
608
500
501
482
417
495
569
403
F1
F2
650
854
956
750
606
567
612
573
653
1009
F2
1990
1817
2040
1900
1702
1801
1697
1764
1892
1886
u:
E
400
391
442
325
383
361
358
375
377
348
F1
420
371
440
o
375
345
349
339
319
-
F1
F2
U
850
1010
1081
850
885
983
1042
863
989
1228
F2
1500
1501
1605
o
1700
1387
1426
1400
1361
-
ø:
550
474
564
500
514
436
490
462
-
F1
œ
F2
1500
1477
1654
1550
1400
1400
1452
1374
-
Table A.4: Formant values of German monophthongs. Sources and notes: [K] = Kohler (1977, p. 54), approximations taken from figure 7;
[S] = Sendlmeier and Seebode (2007), m: male speakers, f: female speakers; [W] = Wängler (1968, p. 23); [R] = Ramers (1988), male speakers B, M, H
and P, context 1: [b t@n]; [C]a = Claßen et al. (1998, p. 219), accented. . . ; [C]u = Claßen et al. (1998, p. 219), unaccented. . . ; a : [R] and [W] use the
symbol A:, e : [W] does not give a numerical value for E: but notes that it is distinguished from E only by length (“Hinsichtlich der Bildung wird auf
[E] verwiesen, von dem dieser Laut nur durch Dauer verschieden ist.”, p. 46); o : [W] does not use the length mark for ø.
[K]
[S]m
[S]f
[W]
[R]B1
[R]M1
[R]H1
[R]P1
[C]a
[C]u
F1
250
263
302
275
290
228
260
248
280
333
-front vowels
[K]
[S]m
[S]f
[W]
[R]B1
[R]M1
[R]H1
[R]P1
[C]a
[C]u
+front vowels
ID     [ø:]∼[œ]     [a:]∼[a]     [e:]∼[E]     [E:]∼[E]     [i:]∼[I]     [o:]∼[O]     [u:]∼[U]     [y:]∼[Y]     mean (sd)
A01    1.07         1.55         1.91         1.65         2.04         1.35         1.37         1.22         1.52 (0.33)
A02    1.31         1.96         2.52         1.69         2.42         1.55         1.66         1.67         1.85 (0.42)
A03    1.62         1.97         1.63         1.70         2.42         1.99         1.97         2.48         1.97 (0.33)
A04    1.38         1.46         1.31         1.35         1.51         1.28         1.26         1.21         1.34 (0.10)
A05    0.88         1.30         1.20         1.50         1.29         1.18         1.58         1.37         1.29 (0.22)
A06    1.22         1.46         1.79         1.58         2.14         1.55         1.83         1.34         1.61 (0.30)
A07    1.25         1.47         1.28         1.47         1.40         1.21         1.14         0.96         1.27 (0.18)
A08    1.23         1.66         1.64         1.57         2.66         1.24         1.29         1.36         1.58 (0.47)
A09    1.78         1.85         2.21         2.22         2.80         1.79         2.11         1.99         2.10 (0.34)
A10    0.91         1.36         1.67         1.31         1.51         1.05         1.21         1.25         1.28 (0.24)
gr. A  1.27 (0.28)  1.60 (0.25)  1.72 (0.42)  1.60 (0.25)  2.02 (0.56)  1.42 (0.30)  1.54 (0.34)  1.48 (0.45)  1.58 (0.41)
B01    1.42         1.93         1.72         1.64         2.04         1.96         1.79         1.69         1.77 (0.20)
B02    1.54         2.03         1.42         1.70         1.97         1.67         1.93         1.76         1.75 (0.21)
B03    1.67         2.15         1.50         1.68         1.98         1.70         2.06         1.82         1.82 (0.22)
B04    1.68         2.11         2.00         1.85         2.42         1.79         2.33         2.05         2.03 (0.26)
B05    1.57         2.09         1.39         1.80         1.86         1.46         1.65         2.09         1.74 (0.27)
B06    1.45         1.54         1.64         1.59         2.47         1.52         1.89         2.03         1.77 (0.35)
B07    1.45         2.09         1.42         1.72         1.27         1.78         1.49         1.36         1.57 (0.27)
B08    1.33         1.70         1.74         1.73         1.97         1.70         2.28         1.90         1.79 (0.27)
gr. B  1.51 (0.12)  1.96 (0.22)  1.60 (0.21)  1.72 (0.09)  2.00 (0.37)  1.70 (0.16)  1.93 (0.29)  1.84 (0.24)  1.78 (0.27)
C01    1.53         1.69         1.33         1.66         1.65         1.48         1.78         1.59         1.59 (0.14)
C02    1.59         2.02         1.55         1.65         2.01         1.83         2.12         1.85         1.83 (0.22)
gr. C  1.56 (0.04)  1.86 (0.23)  1.44 (0.16)  1.65 (0.01)  1.83 (0.26)  1.66 (0.25)  1.95 (0.24)  1.72 (0.18)  1.71 (0.21)
all    1.39 (0.25)  1.77 (0.28)  1.64 (0.33)  1.65 (0.19)  1.99 (0.45)  1.55 (0.27)  1.74 (0.36)  1.65 (0.38)  1.67 (0.36)
Table A.5: Vowel duration ratios of long-class vowels divided by short-class vowels. The values in parentheses show the standard deviations of the
respective mean values.
A01         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     340   455   823   798   552   350   458   316   228   483   442   376   296   287   236
F1 max     459   529   883   861   643   436   558   406   272   636   574   528   461   437   309
F1 mean    406   486   853   833   601   396   512   358   253   576   507   440   380   368   275
F1 sd       52    30    30    25    35    35    43    34    16    56    53    59    73    70    32
F2 min    1462  1425  1238  1261  1676  2120  1916  1836  2200   773   668   706   797  1685  1526
F2 max    1674  1906  1371  1340  1975  2234  2131  2223  2265  1011   909  2466  2627  1863  1883
F2 mean   1554  1599  1305  1304  1871  2169  2046  2001  2227   888   796  2079  1762  1799  1710
F2 sd       83   174    58    35   116    50    72   132    25    84    86   769   755    83   156
F3 min    2152  2424  2594  2592  2464  2535  2325  2431  3000  2784  2793  2867  2340  1940  2013
F3 max    2420  2514  2768  2816  2826  2756  2712  3072  3477  3148  3038  3163  3207  2607  2357
F3 mean   2268  2477  2698  2712  2612  2644  2535  2623  3293  2979  2957  3085  2992  2270  2166
F3 sd      115    36    79    91   153    87   128   244   174   156    93   123   327   302   142
Table A.6: Formant values speaker A01 (in Hertz)

A02         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     386   406   641   704   514   341   450   333   272   492   391   243   338   306   251
F1 max     445   463   848   768   647   413   529   392   295   612   514   486   428   337   309
F1 mean    414   431   734   729   568   381   470   362   283   556   447   384   377   325   292
F1 sd       20    23    68    26    48    28    29    25     9    49    47    81    37    13    22
F2 min    1383  1284  1092  1215  1531  2090  1668  1736  2299   684   557   553   462  1547  1672
F2 max    1573  1738  1541  1298  1758  2359  1968  2467  2469   982   866  1521  2386  1818  1956
F2 mean   1481  1458  1283  1255  1699  2244  1879  1954  2359   885   715  1021   945  1678  1799
F2 sd       76   186   164    35    84    93   106   267    61   113   130   359   724    97    97
F3 min    2277  2344  2286  2274  2287  2673  2448  2468  2872  2631  2967  2495  2491  2307  2116
F3 max    2641  2545  3035  2872  3293  3009  2935  3534  3810  3238  3180  3403  3498  2549  2876
F3 mean   2416  2433  2712  2634  2787  2856  2651  2791  3403  2980  3067  2932  2867  2393  2375
F3 sd      121    86   300   220   334   135   191   374   336   202    85   342   418    99   274
Table A.7: Formant values speaker A02 (in Hertz)

A03         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     351   419   609   653   442   332   354   299   254   468   229   354   295   299   268
F1 max     403   481   648   703   515   360   399   371   316   586   475   431   441   339   306
F1 mean    375   456   630   673   482   354   384   347   288   527   374   381   336   319   287
F1 sd       20    24    15    18    24    11    16    25    22    38    82    32    53    17    16
F2 min    1465  1448  1257  1328  1731  2054  1926  1686  2109   840   515   780   734  1611  1568
F2 max    1595  1617  1535  1515  1847  2124  2084  2079  2255  1130   881  1966   892  1662  1719
F2 mean   1516  1541  1431  1431  1771  2085  2007  1860  2165   970   745  1192   811  1637  1642
F2 sd       47    77    93    68    42    27    69   130    53    93   129   407    59    20    63
F3 min    2114  2315  2382  2356  2385  2498  2558  2389  3127  2276  2393  2379  2371  2236  2147
F3 max    2415  2406  2603  2697  2556  2944  2697  2772  3460  2795  2587  2652  2628  2522  2294
F3 mean   2290  2360  2481  2512  2494  2756  2618  2561  3277  2506  2493  2526  2423  2368  2251
F3 sd      101    33    71   125    66   182    58   139   137   180    71   114   101   117    56
Table A.8: Formant values speaker A03 (in Hertz)
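As a rough illustration of how the tabulated means could be used to quantify the spectral separation of a long/short vowel pair, the sketch below computes the Euclidean distance between the mean F1/F2 values of [i:] and [I] for speaker A03 (table A.8). This distance measure is an assumption made for illustration only and is not necessarily the analysis applied in this thesis; the function name is hypothetical.

```python
import math


def f1_f2_distance(v1: tuple[float, float], v2: tuple[float, float]) -> float:
    """Euclidean distance in Hertz between two (F1, F2) points."""
    return math.hypot(v1[0] - v2[0], v1[1] - v2[1])


# Mean (F1, F2) values in Hertz for speaker A03, taken from table A.8.
i_long = (288.0, 2165.0)   # [i:]
i_short = (347.0, 1860.0)  # [I]

print(round(f1_f2_distance(i_long, i_short), 1))  # prints 310.7
```

Note that a distance in Hertz weights F2 differences more heavily than an auditory scale would, so the result is only a crude indicator of how distinct the two vowel qualities are.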
A04         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     465   462   732   786   567   483   445   445   298   558   450   463   437   434   416
F1 max     509   536   809   811   679   613   621   487   430   675   537   502   470   503   448
F1 mean    490   512   765   800   637   526   540   470   369   632   500   479   456   472   433
F1 sd       20    29    31    12    41    51    68    15    46    41    31    15    15    29    13
F2 min    1514  1550  1522  1471  2042  2234  1787  2241  2047   942   761   742   709  1619  1616
F2 max    1834  1920  1906  1804  2232  2476  2485  2475  2744  1268  1098  1282   905  1928  2067
F2 mean   1668  1716  1696  1641  2131  2323  2223  2331  2564  1121   927   993   840  1760  1816
F2 sd      110   133   148   136    69    86   233    99   260   125   134   183    76   127   176
F3 min    2380  2288  1880  1718  2845  2989  2974  2849  2655  2225  2550  2532  2533  2605  2236
F3 max    2770  2747  2925  2831  3067  3157  3135  3286  3411  2848  2704  2830  2975  2743  2750
F3 mean   2581  2578  2469  2472  2968  3094  3069  3046  3217  2697  2632  2690  2707  2701  2497
F3 sd      125   163   393   509   102    68    63   178   284   241    67   102   167    65   205
Table A.9: Formant values speaker A04 (in Hertz)

A05         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     358   331   598   645   440   401   414   299   306   442   442   355   369   321   287
F1 max     389   399   729   686   566   438   466   414   363   543   550   514   481   357   365
F1 mean    369   365   647   665   491   421   436   354   339   521   496   411   401   338   326
F1 sd       13    27    48    18    55    16    22    42    23    39    45    65    41    17    28
F2 min    1345  1232  1123  1222  1730  1914  1842  1525  1833   755   585   625   446  1247  1435
F2 max    1793  1922  1612  1532  1941  2167  2023  2177  2105  1123  1005  1832  1755  1772  2033
F2 mean   1566  1518  1351  1357  1828  2013  1935  1939  2002   935   823  1008   997  1552  1684
F2 sd      153   273   180   116    91    97    61   224    95   150   146   454   539   207   232
F3 min    2092  2120  2399  2327  2335  2409  2409  2217  2331  2296  2414  2294  2083  2061  2088
F3 max    2360  2768  2636  2600  2542  2668  2623  2722  2743  2824  2841  2741  3029  2313  2191
F3 mean   2232  2292  2465  2466  2456  2534  2513  2436  2496  2592  2644  2527  2621  2167  2155
F3 sd       95   242    87   117    83   103    86   194   185   193   171   154   325    98    39
Table A.10: Formant values speaker A05 (in Hertz)

A06         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     443   439   809   799   502   420   434   368   315   588   407   347   326   361   300
F1 max     479   511   933   889   649   451   645   482   365   684   464   454   413   434   437
F1 mean    460   472   853   859   607   441   546   424   341   629   441   419   369   387   367
F1 sd       16    30    48    35    56    13    93    43    20    33    21    41    41    29    46
F2 min    1892  1941  1254  1174  1069  2536  1482  2216  2689   907   659   663   629  2045  1977
F2 max    2184  2070  1773  1529  2292  2754  2766  2584  2785  1148   884  1713  1038  2253  2292
F2 mean   2040  2008  1483  1356  2008  2668  2188  2479  2738   994   739  1043   892  2132  2148
F2 sd      109    45   194   164   467    87   580   143    35   105    86   446   155    81   126
F3 min    2664  2608  2197  1517  2658  3075  2115  2766  3084  2554  2618  2357  2368  2592  2583
F3 max    2846  2757  2832  2664  3096  3197  3209  3272  3379  2828  2880  2858  2840  2777  2789
F3 mean   2757  2711  2561  2140  2945  3136  2840  3022  3280  2691  2723  2683  2523  2691  2697
F3 sd       72    57   217   493   153    47   432   168   103    96    96   157   198    83    74
Table A.11: Formant values speaker A06 (in Hertz)
A07         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     391   422   744   951   485   426   417   312   304   569   413   393   375   320   305
F1 max     461   479   958  1065   638   586   615   420   425   651   605   454   442   436   431
F1 mean    431   450   860  1009   557   491   539   382   382   606   508   418   406   388   384
F1 sd       34    24    80    48    63    59    79    42    47    27    76    20    26    53    49
F2 min    1776  1544  1019  1620  1943  2097  1965   762  2112  1021   745   620   747  1754  1872
F2 max    1970  1997  1909  1705  2197  2243  2252  2746  2717  1494  1212  1301  1527  2148  2184
F2 mean   1858  1767  1514  1653  2115  2176  2108  2270  2462  1295   942  1123  1127  1914  2032
F2 sd       81   194   297    33   110    64   120   744   235   187   165   258   265   159   108
F3 min    2538  2052  1809  2095  2938  3057  3081  2655  3047  2166  2900  2696  2497  2513  2204
F3 max    2817  2751  3015  2952  3329  3227  3187  3509  3510  3173  3327  3070  3150  2928  2760
F3 mean   2712  2466  2359  2603  3124  3171  3125  3079  3344  2881  3075  2818  2850  2662  2610
F3 sd      119   242   509   324   132    62    39   365   171   374   152   137   241   168   210
Table A.12: Formant values speaker A07 (in Hertz)

A08         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     365   378   641   678   429   372   381   352   355   488   436   370   351   382   352
F1 max     422   507   752   761   532   391   464   419   374   571   572   408   404   405   395
F1 mean    395   431   705   707   498   382   411   373   367   537   486   392   385   394   381
F1 sd       22    54    39    33    41     7    32    24     7    36    47    16    18     9    17
F2 min    1592  1352  1244  1258  1803  1991  1892  1939  2190   936   647   794   736  1636  1429
F2 max    1703  1712  1731  1474  1924  2203  2107  2239  2513  1209   995  1480  1151  1769  1814
F2 mean   1640  1602  1517  1357  1881  2080  2004  2099  2335  1078   809  1096   921  1705  1712
F2 sd       49   129   168   104    47    95    84   116   137   103   129   220   155    58   141
F3 min    2224  2166  2343  2170  2354  2461  2448  2552  3111  2171  2431  2318  2290  2211  2201
F3 max    2412  2498  2844  2776  2869  2887  2818  3021  3427  2648  2707  2640  2760  2504  2398
F3 mean   2292  2288  2540  2413  2611  2743  2682  2793  3249  2379  2558  2490  2482  2314  2276
F3 sd       68   124   178   270   198   155   145   172   117   167    96   139   170   112    85
Table A.13: Formant values speaker A08 (in Hertz)

A09         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     382   446   630   651   418   366   378   364   353   475   386   369   356   372   299
F1 max     427   536   825   770   634   452   429   420   393   550   414   449   412   424   379
F1 mean    407   489   726   717   541   397   392   396   373   506   395   414   386   402   344
F1 sd       15    36    78    53    76    29    19    19    15    30    10    29    20    22    38
F2 min    1514  1451  1256  1134  1754  2450  2337  1952  2481   904   746   596   513  1600  1714
F2 max    1753  1837  1561  1305  2124  2649  2539  2547  2686  1216   861  1263   912  1863  1965
F2 mean   1668  1618  1427  1212  1903  2528  2442  2306  2609  1042   804   947   757  1768  1849
F2 sd       95   128   118    76   138    76    80   202    87   119    51   225   172   100   112
F3 min    2516  2578  2552  2416  2690  2596  2781  2632  3498  2530  2744  2673  2494  2550  2492
F3 max    2792  2799  2780  2828  2971  3268  3111  2856  3751  3462  3037  3009  3135  2794  2707
F3 mean   2613  2679  2651  2662  2820  3059  2913  2730  3629  2900  2877  2820  2810  2667  2648
F3 sd       95    84    76   153   106   252   115    78   103   321   100   108   263   106    81
Table A.14: Formant values speaker A09 (in Hertz)
A10         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     505   528   703   696   508   482   535   357   334   547   497   342   376   359   319
F1 max     592   633  1047   906   653   565   756   471   473   656   593   485   488   407   509
F1 mean    550   556   845   803   602   521   606   423   401   601   537   431   419   377   398
F1 sd       33    39   124    86    53    28    82    41    61    45    40    55    43    21    64
F2 min    1500  1565  1031  1434  1886  1573  1343  2277  2547   927   792   740   646  1453  1362
F2 max    1837  2097  1765  1796  2226  2621  2306  2773  2809  1342  1534  1382  1440  1801  1971
F2 mean   1732  1778  1538  1557  2043  2048  1982  2525  2697  1213  1106  1066  1052  1701  1689
F2 sd      125   179   269   150   116   407   355   188    92   151   246   212   304   144   223
F3 min    2450  2474  1810  1898  2255  2282  2142  3002  2965  2298  2520  2482  2899  2697  2273
F3 max    2784  2755  2912  2546  3057  3378  3325  3341  3615  3230  2863  3383  3206  2892  3027
F3 mean   2645  2617  2351  2226  2675  2722  2835  3191  3276  2581  2686  2895  2998  2783  2766
F3 sd      125   114   379   274   271   418   403   122   275   373   143   313   111    83   262
Table A.15: Formant values speaker A10 (in Hertz)
B01         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     399   436   741  1059   548   370   369   393   226   484   351   307   158   337   242
F1 max     478   714   959  1187   618   433   467   461   382   627   456   490   432   450   418
F1 mean    435   538   839  1109   574   413   417   435   294   560   413   409   353   399   303
F1 sd       27   108    75    55    30    24    33    24    58    49    42    60   108    43    70
F2 min    1544  1766  1495  1553  2056  2070  1102  2069  2096  1056   418   927   367  1872  1749
F2 max    1986  2264  2301  1712  2193  2814  2619  2617  3140  1448  1415  1358  1753  2095  2585
F2 mean   1859  1963  1825  1659  2100  2592  2201  2297  2784  1282   813  1214   980  1980  2100
F2 sd      165   180   268    64    52   269   553   193   370   167   352   148   492    84   352
F3 min    2460  2892  2493  2401  2781  3253  2382  2929  3478  2625  2979  2221  2233  2802  2708
F3 max    3204  3086  3092  2647  3177  3569  3310  3428  3943  2976  3444  3143  3086  3481  3839
F3 mean   2841  2976  2720  2544  3004  3412  3034  3188  3635  2809  3231  2722  2812  3061  3061
F3 sd      269    74   215    93   130   121   341   216   175   165   170   344   342   308   481
Table A.16: Formant values speaker B01 (in Hertz)
B02         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     414   518   652   706   608   436   629   451   313   565   416   405   294   438   311
F1 max     460   657   764   827   675   455   672   474   404   701   450   452   471   474   356
F1 mean    443   587   719   774   646   442   656   461   348   638   431   423   352   454   328
F1 sd       18    53    41    48    30     7    17     9    38    46    13    20    64    17    17
F2 min    1555  1636  1562  1038  1534  2322  1270  1909  2445  1084   647   709   473  1728  1821
F2 max    1730  1887  1825  1569  2105  2600  2222  2404  2589  1549  1148  1378  1883  1877  2155
F2 mean   1663  1742  1686  1388  1959  2454  2003  2091  2535  1256   770  1123   965  1818  1970
F2 sd       78    86   104   204   220   103   363   170    63   186   190   286   572    71   149
F3 min    2208  2718  2661  1510  1967  3022  2023  2728  3405  2552  2223  2309  2281  2145  2117
F3 max    2320  3072  3175  2915  3112  3324  3212  3249  3731  3299  2947  2862  2805  2674  2653
F3 mean   2272  2856  2911  2355  2865  3192  2977  3061  3570  2985  2609  2592  2522  2454  2431
F3 sd       52   132   216   627   445   104   469   189   121   250   281   229   231   227   174
Table A.17: Formant values speaker B02 (in Hertz)
B03         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     352   409   532   604   457   346   366   350   257   483   353   353   281   325   266
F1 max     378   483   646   707   507   379   402   389   310   553   390   403   456   375   299
F1 mean    368   453   585   654   478   361   382   369   281   510   366   381   345   344   280
F1 sd       10    25    44    45    18    13    12    16    22    24    13    19    59    23    11
F2 min    1483  1353  1312  1273  1606  1985  1854  1659  2137   848   651   752   615  1569  1646
F2 max    1629  1546  1588  1378  1760  2151  2083  2051  2265  1196   810  1347   845  1711  1952
F2 mean   1574  1494  1415  1318  1697  2077  1968  1806  2198  1066   698  1136   717  1619  1764
F2 sd       57    72   109    39    66    54    89   152    56   120    59   219    86    58   121
F3 min    2060  2198  1843  2023  2241  2463  2414  2297  2811  2128  2364  2252  2082  2160  2069
F3 max    2244  2313  2345  2402  2483  2652  2568  2717  3241  2318  2584  2967  2668  2354  2317
F3 mean   2176  2238  2155  2293  2378  2589  2489  2521  3016  2255  2489  2488  2469  2247  2193
F3 sd       69    43   180   154    82    66    62   152   167    67    91   286   227    75    86
Table A.18: Formant values speaker B03 (in Hertz)

B04         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     303   358   505   540   362   287   314   286   189   432   300   235   128   255   201
F1 max     343   439   600   598   503   339   362   322   320   490   351   325   369   301   263
F1 mean    322   405   542   569   447   318   332   303   224   457   317   301   283   283   232
F1 sd       16    31    33    26    54    19    18    16    49    21    18    31    96    17    22
F2 min    1380  1073   766  1025  1741  2016  2045  1722  2119   647   448   503   533  1549  1657
F2 max    1883  2001  1546  1269  2023  2328  2142  2213  2371   817   691  2181  2383  2039  1951
F2 mean   1655  1518  1301  1141  1812  2157  2089  1895  2258   751   559  1144  1028  1751  1765
F2 sd      170   313   283    91   108   122    49   172   119    74    94   704   780   184   114
F3 min    2107  2091  2171  2308  2392  2655  2593  2449  2914  2298  2593  2235  2463  2131  1956
F3 max    2692  2455  2406  2581  2716  2929  2791  2998  3643  2612  2973  2857  3958  2394  2930
F3 mean   2335  2276  2315  2473  2523  2758  2713  2610  3257  2471  2703  2476  2818  2314  2383
F3 sd      195   158    83   104   116    91    73   197   279   128   138   201   642   105   402
Table A.19: Formant values speaker B04 (in Hertz)

B05         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     327   386   539   650   438   340   358   346   236   476   343   367   241   355   237
F1 max     372   519   734   731   517   378   601   388   357   559   402   396   336   376   348
F1 mean    345   454   640   687   495   359   408   370   292   517   362   381   302   363   286
F1 sd       17    53    64    37    30    12    95    14    45    28    23    12    36     8    36
F2 min    1774  1631   988   926  1882  2215  2116  1848  2209   819   690   784   435  1658  2018
F2 max    2040  2219  1739  1157  2170  2393  2411  2238  2402  1237   940  1877  1735  1961  2257
F2 mean   1891  1943  1405  1035  2000  2326  2271  2029  2285   994   777  1293   809  1845  2157
F2 sd      118   209   295    96   109    64   109   140    67   162    99   384   473   121    85
F3 min    2188  2163  1860  2188  2605  2942  2437  2529  3520  2020  2229  2323  2132  2233  2087
F3 max    2460  2494  2653  2425  2670  3208  3167  3024  3919  2414  2456  2467  2539  2535  2890
F3 mean   2309  2364  2303  2282  2643  3119  2922  2703  3781  2198  2359  2383  2347  2435  2464
F3 sd      113   112   321   112    24   100   277   180   151   152    92    56   143   134   300
Table A.20: Formant values speaker B05 (in Hertz)
B06         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     361   468   634   450   410   355   389   344   250   541   387   356   279   368   285
F1 max     596   545   682   698   568   415   592   404   311   629   432   449   352   394   313
F1 mean    411   501   656   610   507   377   499   382   274   588   409   411   313   378   293
F1 sd       83    27    22    97    65    22    81    23    20    40    19    37    25    10    11
F2 min    1441  1412  1174  1163  1572  1904  1627  1612  1966  1016   630   846   652  1566  1648
F2 max    1636  1532  1506  1286  1856  2031  1992  1964  2082  1139   851  1285   922  1814  1871
F2 mean   1550  1487  1341  1213  1715  1972  1788  1806  2016  1075   734   990   753  1633  1766
F2 sd       66    50   127    51   106    41   122   117    41    46    75   168   110   103    94
F3 min    1971  2149  2436  2497  2381  2543  2405  2241  2838  2425  2460  2287  2121  2125  1957
F3 max    2495  2637  2691  2789  2732  2822  2822  2679  3227  2706  2699  2839  2574  2491  2225
F3 mean   2161  2301  2528  2617  2549  2726  2578  2470  3039  2573  2605  2530  2398  2249  2084
F3 sd      185   178   101   122   169   103   152   157   139    91    79   230   175   142   109
Table A.21: Formant values speaker B06 (in Hertz)

B07         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     343   373   616   637   447   338   443   355   244   547   357   389   288   359   267
F1 max     383   555   680   707   547   356   554   411   327   654   449   468   517   394   287
F1 mean    361   486   651   669   509   345   499   382   280   592   401   430   357   372   274
F1 sd       13    66    21    28    34     7    37    23    36    35    33    31    81    14     8
F2 min    1502  1488  1034   912  1662  2133  1863  1642  2130   896   451   654   526  1586  1599
F2 max    1874  1597  1414  1273  1917  2286  1951  2019  2447  1167  1311  1380  1664  1841  1903
F2 mean   1637  1553  1308  1126  1807  2195  1896  1795  2233   994   709   984   796  1658  1771
F2 sd      136    45   142   131    96    62    34   133   112    94   306   240   440   105   111
F3 min    2066  1986  1887  1953  2058  2573  2420  2246  2763  1958  2163  2177  2244  2143  2108
F3 max    2234  2213  2190  2160  2593  2845  2513  2578  3414  2222  2925  2402  3172  2313  2322
F3 mean   2167  2135  2054  2059  2374  2718  2470  2371  3003  2060  2401  2266  2486  2245  2202
F3 sd       66    97   128    81   182   107    34   123   218    98   268    91   341    67    83
Table A.22: Formant values speaker B07 (in Hertz)

B08         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     432   524   696   697   587   424   467   362   335   587   446   443   321   390   337
F1 max     521   630   755   846   682   463   694   512   459   692   489   479   453   474   400
F1 mean    469   592   711   783   627   447   588   471   403   630   469   469   397   442   359
F1 sd       34    40    23    63    35    15    96    55    45    40    16    14    60    35    22
F2 min    1491  1556  1505  1017   949  2317  1632  1962  1749   993   650   591   592  1912  1835
F2 max    1986  1859  1965  1724  1994  2520  2443  2416  2585  1207   945  1802   859  2315  2089
F2 mean   1798  1735  1688  1542  1710  2403  2093  2224  2356  1127   821  1007   743  2044  2003
F2 sd      168   129   169   301   413    75   286   169   308    74   104   453    94   165    93
F3 min    2409  2332  1798  1982  1944  2498  2348  2655  2690  2053  2496  2267  2517  2346  2402
F3 max    2564  2789  2778  2793  2907  3015  3106  3205  3386  2585  2643  2699  2719  2674  2625
F3 mean   2491  2542  2428  2310  2556  2800  2669  2897  3131  2352  2544  2523  2624  2496  2516
F3 sd       64   165   364   321   392   246   323   195   246   230    53   148    85   123    79
Table A.23: Formant values speaker B08 (in Hertz)
C01         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     469   516   729   781   609   459   463   359   268   645   427   387   286   374   268
F1 max     502   654   842   901   797   503   777   532   310   704   473   517   310   503   307
F1 mean    477   599   790   828   695   479   674   486   291   667   452   456   296   451   280
F1 sd       12    46    45    45    62    15   113    64    15    22    20    50    10    50    14
F2 min    1032  1500  1219   946   962  2491   900  1668  2631  1091   619   819   724  1360  1552
F2 max    1835  1796  1689  1585  2230  2769  2555  2234  2854  1505   802  1579  1169  1872  1746
F2 mean   1588  1679  1466  1357  1910  2622  1972  2026  2763  1235   711  1079   840  1623  1631
F2 sd      295   116   176   266   472   105   563   198    89   157    71   280   166   192    76
F3 min    2324  2501  1720  1488  2310  3017  2126  2590  3330  1557  2721  2191  2693  1962  1947
F3 max    3481  3229  2622  2764  3030  3442  3139  3319  3490  3116  3135  3277  3051  2907  2500
F3 mean   2770  2821  2160  2086  2801  3208  2884  3065  3420  2604  2891  2743  2841  2504  2303
F3 sd      396   251   346   618   271   154   382   274    60   549   153   378   138   445   189
Table A.24: Formant values speaker C01 (in Hertz)
C02         ø:     œ     a    a:     E    e:    E:     I    i:     O    o:     U    u:     Y    y:
F1 min     342   433   535   538   469   337   436   313   264   527   339   320   274   299   274
F1 max     400   509   663   721   585   409   558   401   291   672   430   389   480   386   325
F1 mean    378   464   626   669   521   374   468   373   277   601   397   363   337   343   298
F1 sd       23    31    48    75    43    26    47    32    11    53    34    28    76    32    17
F2 min    1438  1377  1280  1282  1592  1648  1768  1646  2045   812   539   777   574  1417  1283
F2 max    1592  1606  1540  1341  1826  2095  1977  1993  2441  1365   746  1224  1516  1607  1767
F2 mean   1517  1514  1436  1320  1730  1934  1888  1818  2199  1111   691  1096   838  1516  1664
F2 sd       51    93   103    24    81   174    91   136   178   179    76   177   342    72   189
F3 min    1925  2166  2167  2250  2265  2095  2229  2251  3010  2095  2002  1995  2062  2065  2108
F3 max    2270  2446  2722  2659  2621  2922  2739  2552  3340  2922  2329  2289  2393  2391  2546
F3 mean   2094  2249  2358  2393  2434  2435  2438  2440  3167  2373  2149  2162  2224  2240  2217
F3 sd      117   102   188   164   121   301   184   109   160   343   130   109   108   121   174
Table A.25: Formant values speaker C02 (in Hertz)
[Figure A.1, six panels: (a), (c) and (e) plot F1 (Hz) against F2 (Hz) for speakers A01, A02 and A03; (b), (d) and (f) plot the normalised values F1n against F2n for the same speakers.]
Figure A.1: The F1/F2 vowel spaces of speakers A01, A02 and A03. The graph on the left shows the
formant values in Hertz with the mean values for the individual sounds of the respective speaker in the
background. The graph on the right shows the normalised formant values with the reference points in the
background. See table 1 on page 7 for an explanation of the symbols.
[Figure A.2, panels (a) to (f): panels (a), (c) and (e) plot F1 (Hz) against F2 (Hz) for speakers A04, A05 and A06; panels (b), (d) and (f) plot the normalised values F1n against F2n for the same speakers.]
Figure A.2: The F1/F2 vowel spaces of speakers A04, A05 and A06.
[Figure A.3, panels (a) to (f): panels (a), (c) and (e) plot F1 (Hz) against F2 (Hz) for speakers A07, A08 and A09; panels (b), (d) and (f) plot the normalised values F1n against F2n for the same speakers.]
Figure A.3: The F1/F2 vowel spaces of speakers A07, A08 and A09.
[Figure A.4, panels (a) to (f): panels (a), (c) and (e) plot F1 (Hz) against F2 (Hz) for speakers A10, B01 and B02; panels (b), (d) and (f) plot the normalised values F1n against F2n for the same speakers.]
Figure A.4: The F1/F2 vowel spaces of speakers A10, B01 and B02.
[Figure A.5, panels (a) to (f): panels (a), (c) and (e) plot F1 (Hz) against F2 (Hz) for speakers B03, B04 and B05; panels (b), (d) and (f) plot the normalised values F1n against F2n for the same speakers.]
Figure A.5: The F1/F2 vowel spaces of speakers B03, B04 and B05.
[Figure A.6, panels (a) to (f): panels (a), (c) and (e) plot F1 (Hz) against F2 (Hz) for speakers B06, B07 and B08; panels (b), (d) and (f) plot the normalised values F1n against F2n for the same speakers.]
Figure A.6: The F1/F2 vowel spaces of speakers B06, B07 and B08.
[Figure A.7, panels (a) to (f): panels (a) and (c) plot F1 (Hz) against F2 (Hz) for speakers C01 and C02; panels (b) and (d) plot the normalised values F1n against F2n. Panel (e) shows the mean F1/F2 values and panel (f) the mean normalised F1/F2 values (speaker group C + literature), with labelled points for [i:], [y:], [u:], [I], [Y], [U], [e:], [2:], [o:], [E:], [E], [9], [O], [a] and [a:].]
Figure A.7: The F1/F2 vowel spaces of speakers C01 and C02 and the reference points (mean values from
speakers C01 and C02 and values taken from literature).
ID
Fi
A01
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
.022
–
.013
–
–
–
<.001
–
–
–
–
–
–
–
–
–
–
–
–
–
.061
–
–
–
.001
–
–
–
–
–
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
F1
F2
F3
.067
–
–
.001
–
<.001
<.001
.06
.099
<.001
–
–
.003
–
–
.028
.077
–
.005
–
–
<.001
–
–
F1
F2
F3
F1
F2
F3
.001 •
–
–
<.001 •
–
.035 ∗
A02
A03
A04
A05
A06
A07
A08
A09
A10
B01
B02
B03
B04
B05
B06
B07
B08
C01
C02
[ø:]∼[œ]
∗
∗
•
◦
•
•
•
•
◦
∗
◦
•
[a:]∼[a]
–
–
–
–
–
–
.003
–
–
.041
–
–
–
–
–
–
–
–
.005
–
–
–
.088
–
–
.006
–
–
–
–
<.001
–
–
.077
.027
–
.032
.085
–
–
–
.026
–
.027
–
–
.06
–
–
.056
–
.062
–
–
◦
∗
◦
◦
•
∗
∗
∗
∗
–
–
–
–
.038 ∗
–
[e:]∼[E]
<.001
.003
–
<.001
<.001
–
<.001
<.001
.015
.002
.002
.034
.026
.007
–
.001
.017
.027
.094
–
–
.001
.002
–
.004
<.001
.071
.011
–
–
•
◦
•
•
•
•
∗
◦
◦
∗
∗
◦
•
∗
∗
•
◦
◦
•
∗
<.001
.006
<.001
<.001
.002
–
<.001
<.001
.001
.001
<.001
.003
<.001
<.001
<.001
.009
.004
.085
<.001
<.001
.004
<.001
.009
–
•
◦
•
•
◦
<.001
.013
.013
<.001
.035
–
•
∗
∗
•
∗
•
•
•
◦
•
◦
•
•
•
◦
◦
•
•
◦
•
◦
[E:]∼[E]
.001
.009
–
<.001
<.001
.06
.005
.037
–
–
–
–
–
–
–
.04
.099
–
–
–
–
.076
–
–
–
.086
–
.05
–
–
–
–
.041
<.001
.027
–
.016
.032
.022
–
–
–
–
–
–
.013
.012
.08
<.001
<.001
.002
.015
.044
–
•
◦
•
•
◦
∗
∗
∗
•
∗
∗
∗
∗
∗
∗
•
•
◦
∗
∗
.008 ◦
.036 ∗
.098
.003 ◦
–
–
[E:]∼[e:]
.004
.024
–
.003
.009
–
<.001
<.001
.006
.017
–
.073
.061
.042
–
–
–
–
–
–
–
.002
.014
–
.004
<.001
–
–
–
–
<.001
–
–
–
–
–
<.001
<.001
.026
.002
.001
.01
.077
.002
.056
–
–
–
–
.075
–
–
.095
–
◦
∗
◦
◦
•
•
◦
∗
∗
◦
∗
◦
•
•
•
•
∗
◦
•
◦
◦
–
–
–
.069
.01 ∗
–
[i:]∼[I]
[o:]∼[O]
<.001
.008
<.001
<.001
.013
.014
.002
.001
<.001
.002
.083
–
–
–
–
.004
.006
.012
–
–
–
–
.009
<.001
.046
.012
<.001
–
.082
–
•
◦
•
•
∗
∗
◦
◦
•
◦
.001
.023
.003
.001
.001
<.001
<.001
.001
<.001
.009
.002
.001
.006
.005
<.001
<.001
.006
<.001
<.001
<.001
<.001
.042
–
.1
◦
∗
◦
•
•
•
•
•
•
◦
◦
◦
◦
◦
•
•
◦
•
•
•
•
∗
<.001
.021
.001
<.001
.001
.035
<.001
<.001
.001
<.001
.003
.013
<.001
.023
.057
<.001
<.001
–
<.001
.072
.025
<.001
<.001
.098
•
∗
◦
•
◦
∗
•
•
•
•
◦
∗
•
∗
<.001
<.001
.024
<.001
.002
<.001
•
•
∗
•
◦
•
<.001
<.001
–
<.001
.001
–
•
•
◦
◦
∗
◦
•
∗
∗
•
.054
.088
–
.003
.036
–
.004
.007
–
<.001
.027
–
–
–
–
<.001
.001
–
.024
.006
–
.063
.003
.052
<.001
.003
–
.028
–
–
◦
∗
◦
◦
•
∗
[u:]∼[U]
na
na
na
na
na
na
–
.07
–
.031 ∗
.079
–
na
na
na
•
◦
∗
◦
◦
•
◦
∗
•
•
•
∗
•
•
•
◦
.067
–
–
–
–
–
–
–
–
.079
–
–
–
–
–
–
–
–
na
na
na
–
.004 ◦
–
na
na
na
na
na
na
<.001 •
.019 ∗
–
na
na
na
na
na
na
<.001 •
–
–
–
–
–
[y:]∼[Y]
.07
–
–
.014
.07
–
.012
–
.092
.068
–
.042
–
–
–
–
–
–
–
–
–
–
–
–
.014
–
–
–
–
–
.023
–
–
<.001
.059
–
.002
.034
–
.002
–
–
.003
.002
–
<.001
.055
.069
<.001
–
–
.003
–
–
∗
∗
∗
∗
∗
•
◦
∗
◦
◦
◦
•
•
◦
.001 ◦
–
–
.029 ∗
–
–
Table A.26: P-values of t-tests of within-speaker comparisons of formant values. Only p-values below the
0.1 level are shown.
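The entry conventions of Tables A.26 to A.28 can be made explicit with a short sketch. The function `format_p` below is hypothetical and not part of the thesis. It assumes that the markers stand for p < .001 (•), p < .01 (◦) and p < .05 (∗), which is inferred from the table entries rather than stated in this appendix, and that values are rounded to three decimals with the leading zero and trailing zeros dropped.

```python
def format_p(p, shown_below=0.1):
    """Format a p-value the way entries in Tables A.26 to A.28 appear.

    Assumed conventions (inferred from the entries, not stated in the source):
    values at or above `shown_below` are replaced by a dash, values are
    rounded to three decimals without a leading zero, and the markers are
    • for p < .001, ◦ for p < .01 and ∗ for p < .05.
    """
    if p >= shown_below:
        return "–"
    rounded = f"{p:.3f}"
    if rounded == "0.000":
        text = "<.001"
    else:
        # "0.060" becomes ".06", "0.100" becomes ".1"
        text = rounded.lstrip("0").rstrip("0")
    if p < 0.001:
        marker = "•"
    elif p < 0.01:
        marker = "◦"
    elif p < 0.05:
        marker = "∗"
    else:
        marker = ""
    return f"{text} {marker}" if marker else text
```

Under these assumptions an entry such as ".001" can carry either • or ◦, depending on whether the unrounded value lies below or above .001, which matches the mixed entries in the tables.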
ID Fi
[ø:]
[œ]
[a]
[a:]
A01 F1
F2
F3
A02 F1
F2
F3
A03 F1
F2
F3
A04 F1
F2
F3
A05 F1
F2
F3
A06 F1
F2
F3
A07 F1
F2
F3
A08 F1
F2
F3
A09 F1
F2
F3
A10 F1
F2
F3
B01 F1
F2
F3
B02 F1
F2
F3
B03 F1
F2
F3
B04 F1
F2
F3
B05 F1
F2
F3
B06 F1
F2
F3
B07 F1
F2
F3
B08 F1
F2
F3
–
.053
.006
.074
–
.073
–
–
–
–
–
–
.003
–
.078
.078
.001
.027
–
.054
–
–
–
.082
–
–
–
.002
–
–
.002
–
.001
<.001
–
<.001
–
–
.001
.001
–
–
<.001
–
–
<.001
<.001
–
<.001
–
.017
.029
–
.007
–
–
.044
.08
–
–
◦ <.001
<.001
◦
.06
• .087
–
•
–
.006
–
◦
–
• .016
–
–
•
–
–
.04
• .005
•
–
–
•
–
–
∗
–
∗ .005
–
◦
–
.06
–
∗
–
–
–
–
•
–
• <.001
–
–
–
–
◦ .014
.045
–
∗ .003
–
–
–
–
∗
–
◦ .069
.085
–
<.001
.005
–
◦ .066
–
–
–
.001
–
–
–
.059
–
–
–
–
–
.002
–
.067
.081
–
–
–
.02
.004
–
–
.016
.044
.001
.057
–
–
–
–
C01 F1
F2
F3
C02 F1
F2
F3
.034 ∗
–
–
–
–
–
◦
◦
◦
∗
◦
◦
∗
◦
∗
∗
◦
–
–
–
–
–
.063
–
–
.002
–
–
.015
–
.037
.065
–
–
–
–
–
.014
–
–
–
–
–
–
–
–
–
◦
∗
∗
∗
[E]
•
∗
∗
◦
•
◦
•
[e:]
[E:]
–
<.001 • .029 ∗
–
<.001 •
–
–
–
–
–
–
.028 ∗
–
.064
–
–
–
.025 ∗
–
.002 ◦
.042 ∗
–
–
–
–
.017 ∗
–
–
–
–
–
.093
–
–
<.001 •
–
–
–
–
–
–
–
–
–
–
–
–
–
–
<.001 •
–
.015 ∗
–
.044 ∗
–
–
–
–
–
–
–
.096
–
–
–
–
<.001 •
–
.028 ∗
–
–
–
–
.014 ∗
–
.079
–
–
–
–
–
.04 ∗
–
–
–
.027 ∗ .016 ∗
–
–
–
–
–
–
<.001 •
–
–
.037 ∗
–
–
–
–
–
–
.017 ∗
–
–
–
–
.023 ∗
–
–
–
–
–
–
–
–
–
–
–
–
–
.063
–
–
–
–
–
–
–
–
–
[I]
–
–
.023 ∗ .022 ∗
–
<.001 •
–
–
–
.001 • .003 ◦ .069
–
–
.011 ∗ .082
.051
–
–
–
–
.1
.061
–
–
–
<.001 •
–
–
–
.086
–
.081
–
–
–
–
.094
.092
–
–
.023 ∗
–
.015 ∗
.012 ∗ .074
<.001 •
–
–
.003 ◦ .001 • .091
–
.087
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
.02 ∗
–
–
.032 ∗ .087
.062
.021 ∗
–
<.001 •
–
–
.01 ∗
–
.009 ◦
–
–
<.001 • <.001 •
.02 ∗
–
.028 ∗
–
.055
–
–
–
–
–
–
<.001 •
–
–
–
.008 ◦
–
–
–
–
<.001 •
–
.007 ◦
–
.03 ∗
–
.074
–
.003 ◦
–
–
–
.001 •
.087
–
–
–
–
–
.008 ◦
.014 ∗
–
–
–
–
.065
–
–
–
–
–
–
–
–
–
–
–
–
.099
–
–
–
.093
–
.092
–
–
–
.085
–
–
–
–
–
–
–
–
–
–
–
–
[i:]
[O]
[o:]
.087
<.001
.075
–
–
–
–
.039
.034
–
–
–
.018
<.001
.001
.014
.002
.017
.005
–
.004
<.001
–
–
<.001
–
.026
–
.04
.059
.078
<.001
.01
–
.006
.051
–
.002
–
–
.033
–
–
.022
–
–
.001
–
.004
–
–
–
.022
–
.005
.072
–
.019
–
–
.007
–
<.001
.048
–
<.001
–
–
–
–
–
–
.003
–
.016
–
–
–
–
–
.005
.011
–
–
.01
.006
–
.018
.074
–
.075
–
–
–
–
.082
–
–
–
–
.006
–
–
<.001
.039
.096
.001
–
–
–
–
–
–
–
•
∗
∗
∗
•
◦
∗
◦
∗
◦
◦
•
•
∗
∗
◦
•
∗
◦
.028 ∗
.028 ∗
.001 •
–
–
.097
.001
–
–
–
–
–
–
–
.09
–
<.001
–
–
.021
.015
.055
–
–
.038
.063
.008
–
.052
.098
•
◦
◦
◦
∗
∗
•
◦
∗
◦
∗
•
•
∗
∗
∗
◦
–
–
–
.037 ∗
–
–
–
–
.006
.002
–
–
–
.021
.022
–
–
.017
–
–
.092
–
–
.005
–
–
–
–
–
–
[U] [u:]
◦
•
∗
•
na
na
na
na
na
na
na
na
na
na
na
na
–
–
–
–
–
–
–
.004 ◦
.021 ∗
–
–
–
◦
∗
◦
∗
∗
◦
∗
◦
◦
na
na
na
na
na
na
–
–
–
–
–
–
–
–
–
–
–
–
–
–
.096
–
–
.081
.001
–
–
.002
–
–
.003
–
–
–
–
.001
–
–
–
–
–
–
na
na
na
na
na
na
.075
∗
–
∗
–
∗
◦
–
–
.025 ∗
–
.016 ∗
.047 ∗
◦
◦
•
–
.033 ∗
–
na
na
na
na
na
na
na
na
na
na
na
na
–
–
–
–
–
–
na
na
na
na
na
na
na
na
na
na
na
na
–
–
–
–
–
–
◦
.004 ◦
–
.092
–
–
–
[Y]
[y:]
–
.074
–
<.001
.026
.036
.008
.001
–
–
–
–
<.001
–
.009
.052
<.001
–
–
–
–
–
.024
–
–
–
–
<.001
–
.007
–
–
.02
–
.01
–
–
–
–
<.001
–
–
–
–
.014
.047
.018
.002
.009
.019
–
<.001
–
–
–
–
–
–
–
.025
–
.056
–
–
.049
–
–
.023
–
–
.073
–
–
.058
–
–
.021
–
–
.043
–
–
.007
–
–
–
–
–
–
–
•
∗
∗
◦
◦
•
◦
•
∗
•
◦
∗
∗
∗
∗
◦
–
–
–
.088
.073
–
–
.027
–
–
–
–
–
<.001
.083
–
.001
.015
.014
.021
.038
–
.004
.004
∗
∗
•
∗
∗
∗
◦
◦
∗
•
∗
∗
•
•
∗
∗
∗
∗
◦
◦
.001 •
.016 ∗
–
–
–
–
Table A.27: Between-speaker comparison of formant differences (against the reference values). Only p-values
of the corresponding t-tests below the 0.1 level are shown.
ID Fi
[ø:]
[œ]
A01 F1
F2
F3
A02 F1
F2
F3
A03 F1
F2
F3
A04 F1
F2
F3
A05 F1
F2
F3
A06 F1
F2
F3
A07 F1
F2
F3
A08 F1
F2
F3
A09 F1
F2
F3
A10 F1
F2
F3
B01 F1
F2
F3
B02 F1
F2
F3
B03 F1
F2
F3
B04 F1
F2
F3
B05 F1
F2
F3
B06 F1
F2
F3
B07 F1
F2
F3
B08 F1
F2
F3
–
.002
.023
.048
.007
–
–
.005
–
–
.018
–
.005
–
–
.045
.042
<.001
–
–
.082
–
–
–
–
.042
–
.001
–
.02
.001
–
.014
<.001
–
.003
–
–
.021
.001
–
–
<.001
–
–
<.001
.001
.027
<.001
–
.059
.022
–
.049
–
.071
–
.042
–
–
C01 F1
F2
F3
C02 F1
F2
F3
◦
∗
∗
◦
◦
∗
◦
∗
∗
•
∗
◦
∗
[a]
•
∗
•
◦
∗
•
•
•
•
∗
•
∗
∗
∗
.001
<.001
.003
–
–
–
.015
–
.044
.034
–
–
–
–
.01
.01
.099
–
–
–
–
.009
–
–
.08
–
–
–
–
–
•
•
◦
∗
∗
∗
◦
◦
◦
[a:]
[E]
[e:]
[E:]
[I]
–
.052
.002
–
–
–
–
.001
.057
.078
.029
–
–
.079
.033
–
–
–
.002
<.001
–
–
–
–
–
.073
–
–
.096
.071
–
–
–
–
–
–
–
–
–
–
.001
.003
–
.021
–
–
–
.047
.019
.04
.003
–
.015
–
–
–
–
–
–
–
–
.001
<.001
–
–
.005
–
–
–
.06
.061
–
.001
–
.054
.043
–
.016
.047
<.001
–
.001
.076
–
–
.031
–
.003
–
.082
–
–
.001
–
–
.021
<.001
.006
–
–
–
<.001
.081
–
–
–
–
–
–
–
.005
.001
–
–
<.001
<.001
–
–
–
–
.01
–
.082
.027
–
–
–
–
–
.04
.001
.081
.064
–
–
–
.02
–
.013
–
–
.01
.012
–
–
.064
<.001
.019
.002
.001
◦
◦
∗
∗
◦
•
•
◦
∗
∗
∗
∗
◦
∗
•
•
◦
◦
∗
∗
∗
•
•
∗
◦
•
∗
•
◦
•
◦
◦
•
•
◦
–
–
.007 ◦
–
–
.002 ◦
–
.093
.047 ∗
–
–
.084
–
.07
–
–
–
–
–
–
–
–
–
–
–
<.001 • .001 ◦ .073
<.001 •
–
.001 ◦
–
–
–
–
.002 ◦
–
–
–
–
–
.03 ∗
–
<.001 •
.061
–
–
–
–
.007 ◦
–
–
–
–
–
–
–
–
<.001 •
–
.02 ∗
–
–
.056
–
–
–
–
–
–
–
–
.071
.002 ◦
–
–
–
–
.034 ∗
–
.076
–
.018 ∗
–
–
–
–
–
–
–
<.001 •
–
.057
.045 ∗
–
–
–
.024 ∗
–
–
–
–
–
–
–
–
–
–
–
.004 ◦ .002 ◦
–
–
–
–
–
–
<.001 •
–
–
–
–
.038 ∗
–
.067
–
–
.059
–
.074
–
–
.096
–
–
–
–
–
–
–
–
–
–
–
.017 ∗
–
.068
–
–
–
–
–
.042 ∗
–
–
–
–
–
.061
–
–
–
.08
–
–
–
.006
<.001
–
–
–
–
–
–
.035
.07
–
–
–
–
.002
–
–
–
–
.004
◦
•
∗
◦
.033 ∗
–
–
–
.003 ◦
.033 ∗
–
–
–
–
–
–
<.001 •
.002 ◦
–
–
–
–
.063
–
–
–
–
–
[i:]
[O]
.008
<.001
.095
–
.003
–
–
–
.013
–
–
–
.053
.024
.003
–
–
.011
.011
–
.006
<.001
–
–
<.001
.051
.003
–
<.001
.08
◦
.04
• <.001
<.001
–
◦
–
.004
–
.051
∗
–
–
–
–
–
∗
–
◦ .029
–
.052
∗
–
∗ <.001
–
◦
–
•
–
–
–
• .002
–
◦
–
.01
•
–
–
∗
•
•
.058
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
–
.066
–
–
–
–
.02
.065
–
.029
–
–
–
–
–
<.001
.004
.023
–
–
–
–
–
–
–
–
<.001
–
–
–
–
.012
–
–
–
–
<.001
–
–
–
.023
.064
.095
.001
.041
–
.008
–
–
–
•
–
–
–
–
–
–
.002
<.001
<.001
.086
–
.038
∗
∗
∗
◦
∗
∗
∗
∗
•
∗
◦
•
∗
∗
•
◦
∗
◦
•
•
∗
[o:]
◦
∗
•
◦
∗
∗
•
∗
◦
∗
◦
–
.018 ∗
–
.047 ∗
–
–
.007
–
.001
.04
–
<.001
–
–
.091
–
.087
<.001
.003
–
.052
–
–
–
–
–
.044
.01
–
–
.009
<.001
–
.016
.047
–
–
–
.048
.002
–
–
–
.094
–
–
–
–
–
–
<.001
.072
–
.063
–
–
–
–
–
.016
◦
•
∗
•
[U]
[u:]
[Y]
[y:]
na
na
na
na
na
na
na
na
na
na
na
na
–
–
–
<.001
–
.007
.01
.061
–
–
.093
.077
.001
–
.005
.061
.053
–
–
–
–
–
–
–
–
.069
–
<.001
.034
.002
–
.019
.002
–
–
–
–
.001
.002
<.001
.015
–
–
–
<.001
.035
–
.036
.008
–
–
<.001
.054
.045
–
.011
–
–
.014
–
–
–
–
.085
.03 ∗
•
–
◦ na
na
na
∗
∗
◦
•
∗
∗
∗
◦
•
∗
–
–
–
.005 ◦
–
–
.014 ∗
–
–
–
–
–
.058
–
–
–
–
–
.076
–
–
na
na
na
–
–
.061
.03
–
–
.041
–
–
.039
–
–
–
–
.001
–
–
–
–
–
–
na
na
na
na
na
na
–
–
–
–
.051
–
na
na
na
na
na
na
na
na
na
na
na
na
–
–
–
.098
–
–
na
na
na
na
na
na
na
na
na
na
na
na
∗
∗
∗
•
–
–
<.001 •
.062
–
–
–
–
–
–
.084
–
.053
–
–
.004 ◦ .043 ∗
.08
•
◦
∗
•
◦
•
∗
◦
–
–
–
.072
–
–
–
–
–
–
–
–
.028 ∗
–
–
–
–
–
–
–
–
–
.09
–
∗
◦
◦
◦
•
∗
•
∗
∗
◦
•
∗
∗
∗
–
–
–
.093
–
–
–
–
–
–
–
–
–
.005 ◦
–
–
–
.001 •
.003 ◦
–
–
–
–
.081
–
<.001 •
.057
<.001 •
–
.082
–
–
.015 ∗
–
–
–
Table A.28: Between-speaker comparison of formant differences (against group B). Blue marks “improved”
values and red marks “worse” values.
(a) speaker A01
V     SKG            RCG            OQG            GOG            IC             T4G
[ø:]  -0.62 (0.42)   -0.94 (0.34)   4.22 (0.64)    -2.77 (0.83)   -1.91 (0.37)   0.17 (0.09)
[œ]   -0.45 (0.4)    -1.03 (0.24)   0.98 (1.56)    -1.38 (1.12)   -1.7 (0.38)    0.24 (0.15)
p     –              –              <.001          <.001          .0602          .0689
[a:]  -0.59 (0.45)   -0.64 (0.49)   2.21 (2.24)    -0.04 (0.38)   -1.34 (0.55)   0.24 (0.06)
[a]   -1.08 (0.54)   -1.19 (0.19)   2.81 (1.05)    -0.58 (0.51)   -1.33 (0.28)   0.2 (0.07)
p     .0034          <.001          –              <.001          –              .0701
[e:]  -1.21 (0.32)   -0.83 (0.53)   1.7 (1.38)     -2.63 (0.99)   -1.63 (0.71)   0.19 (0.06)
[E]   -0.73 (0.38)   -0.83 (0.39)   1.67 (1.81)    -0.82 (0.8)    -1.54 (0.4)    0.37 (0.15)
p     <.001          –              –              <.001          –              <.001
[E:]  -1.06 (0.29)   -0.94 (0.18)   2.55 (2.12)    -1.51 (1.23)   -1.4 (0.31)    0.23 (0.11)
[e:]  -1.21 (0.32)   -0.83 (0.53)   1.7 (1.38)     -2.63 (0.99)   -1.63 (0.71)   0.19 (0.06)
p     –              –              –              .0016          –              .0992
[E:]  -1.06 (0.29)   -0.94 (0.18)   2.55 (2.12)    -1.51 (1.23)   -1.4 (0.31)    0.23 (0.11)
[E]   -0.73 (0.38)   -0.83 (0.39)   1.67 (1.81)    -0.82 (0.8)    -1.54 (0.4)    0.37 (0.15)
p     .0026          –              –              .031           –              .0013
[i:]  -1.07 (0.57)   -1.17 (0.44)   3.13 (1.41)    1.24 (1.93)    -1.53 (0.35)   0.49 (0.12)
[I]   -0.91 (0.55)   -0.9 (0.43)    3.15 (1.35)    -1.72 (1.37)   -1.71 (0.37)   0.33 (0.1)
p     –              .0366          –              .0027          .0852          <.001
[o:]  -1.4 (0.7)     -1.46 (0.23)   1.35 (1.72)    -1.26 (0.67)   -1.97 (0.44)   0.27 (0.08)
[O]   -0.44 (1.32)   -0.86 (0.98)   2.65 (1.74)    -0.57 (0.67)   -1.73 (1.31)   0.32 (0.18)
p     .0033          .0067          .0296          <.001          –              –
[u:]  -0.58 (1.26)   -1.31 (0.99)   2.49 (2.27)    -2.95 (1.18)   -2.56 (0.93)   0.32 (0.1)
[U]   -0.18 (0.97)   -1.49 (0.62)   2.56 (1.57)    -1.95 (0.93)   -2.63 (0.28)   0.39 (0.14)
p     –              –              –              .0178          –              .0824
[y:]  -0.42 (0.37)   -1.07 (0.42)   2.89 (1.78)    -3.93 (1.34)   -2.21 (0.43)   0.24 (0.07)
[Y]   -0.41 (0.31)   -0.98 (0.25)   3.31 (1.12)    -3.28 (2.23)   -2.18 (0.21)   0.22 (0.12)
p     –              –              –              –              –              –
l     -0.87 (-0.87)  -1.05 (-1.05)  2.57 (2.57)    -1.73 (-1.73)  -1.82 (-1.82)  0.27 (0.27)
s     -0.6 (-0.6)    -1.04 (-1.04)  2.45 (2.45)    -1.47 (-1.47)  -1.83 (-1.83)  0.29 (0.29)
mean  -0.74 (0.36)   -1.04 (0.24)   2.51 (0.83)    -1.61 (1.36)   -1.82 (0.42)   0.28 (0.09)
(b) speaker A02
V     SKG            RCG            OQG            GOG            IC             T4G
[ø:]  -0.35 (0.85)   -1.03 (0.65)   2.58 (1.21)    -3.11 (0.87)   -2.02 (0.55)   0.14 (0.05)
[œ]   -1.07 (0.49)   -1.11 (0.32)   1.92 (0.86)    -2.86 (0.47)   -2.31 (0.41)   0.18 (0.06)
p     <.001          –              .0387          –              .0477          .0221
[a:]  -1.51 (0.49)   -1.2 (0.43)    0.66 (2.42)    -1.07 (0.48)   -2.25 (0.48)   0.28 (0.14)
[a]   -1.52 (0.34)   -1.16 (0.82)   1.36 (1.07)    -1.04 (0.45)   -2.3 (0.63)    0.31 (0.1)
p     –              –              –              –              –              –
[e:]  -1.13 (0.52)   -0.73 (0.38)   1.26 (0.98)    -2.89 (1.17)   -1.97 (0.32)   0.2 (0.05)
[E]   -1.08 (0.4)    -1.02 (0.49)   2.43 (2.14)    -1.18 (1.12)   -1.91 (0.43)   0.3 (0.08)
p     –              .0298          .0292          <.001          –              <.001
[E:]  -1.18 (0.43)   -0.89 (0.39)   2.92 (0.77)    -1.76 (0.87)   -2.31 (0.47)   0.28 (0.08)
[e:]  -1.13 (0.52)   -0.73 (0.38)   1.26 (0.98)    -2.89 (1.17)   -1.97 (0.32)   0.2 (0.05)
p     –              –              <.001          .0011          .0056          <.001
[E:]  -1.18 (0.43)   -0.89 (0.39)   2.92 (0.77)    -1.76 (0.87)   -2.31 (0.47)   0.28 (0.08)
[E]   -1.08 (0.4)    -1.02 (0.49)   2.43 (2.14)    -1.18 (1.12)   -1.91 (0.43)   0.3 (0.08)
p     –              –              –              .0499          .0034          –
[i:]  -0.79 (0.52)   -0.67 (0.31)   4.01 (1.13)    -1.5 (1.88)    -1.98 (0.54)   0.27 (0.09)
[I]   -1 (0.41)      -1.35 (0.57)   3.88 (0.94)    -2.98 (1.24)   -2.45 (0.69)   0.23 (0.07)
p     –              <.001          –              .0071          .0116          .0939
[o:]  -1.09 (1.01)   -1.67 (0.62)   3.09 (1.5)     -1.7 (1.2)     -2.19 (0.45)   0.34 (0.16)
[O]   -1.3 (0.88)    -1.33 (0.64)   2.27 (1.28)    -1.07 (1.51)   -2.19 (0.33)   0.32 (0.11)
p     –              .0646          .0522          –              –              –
[u:]  0.31 (0.96)    -0.51 (0.74)   3.19 (1.14)    -1.92 (0.81)   -1.18 (0.95)   0.37 (0.12)
[U]   -0.25 (1.32)   -1.26 (1.01)   3.84 (1.17)    -1.99 (1.5)    -1.84 (1.12)   0.29 (0.07)
p     –              .0054          .0706          –              .0333          .0083
[y:]  -0.4 (0.32)    -1.25 (0.39)   2.29 (1.68)    -2.3 (0.82)    -2.25 (0.61)   0.23 (0.05)
[Y]   -0.41 (0.35)   -1.25 (0.22)   3.81 (1)       -2.74 (0.95)   -2.01 (0.82)   0.23 (0.07)
p     –              –              .0011          –              –              –
l     -0.77 (-0.77)  -1 (-1)        2.5 (2.5)      -2.03 (-2.03)  -2.02 (-2.02)  0.26 (0.26)
s     -0.95 (-0.95)  -1.21 (-1.21)  2.79 (2.79)    -1.98 (-1.98)  -2.14 (-2.14)  0.27 (0.27)
mean  -0.85 (0.52)   -1.1 (0.3)     2.63 (1.04)    -2.01 (0.76)   -2.08 (0.31)   0.26 (0.06)
Table A.29: Voice quality parameters (means and p-values of t-tests). The rows l and s show means for
long-class and short-class vowels.
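The thesis does not state here how the l and s rows were computed. A minimal sketch, assuming they are unweighted averages of the per-vowel means, reproduces the printed SKG values for speaker A01 (l = -0.87, s = -0.6). The helper `class_mean` and the dictionary names are hypothetical; the numbers are copied from the SKG column of Table A.29 (a).

```python
# Per-vowel SKG means of speaker A01, copied from Table A.29 (a).
skg_long = {"ø:": -0.62, "a:": -0.59, "e:": -1.21, "E:": -1.06,
            "i:": -1.07, "o:": -1.40, "u:": -0.58, "y:": -0.42}
skg_short = {"œ": -0.45, "a": -1.08, "E": -0.73, "I": -0.91,
             "O": -0.44, "U": -0.18, "Y": -0.41}


def class_mean(values):
    """Unweighted mean of per-vowel means, skipping missing (None) cells.

    Assumption: the l and s rows average the per-vowel means; this is
    inferred from the numbers, not confirmed by the source.
    """
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


l_mean = class_mean(skg_long.values())
s_mean = class_mean(skg_short.values())
print(f"l = {l_mean:.2f}, s = {s_mean:.2f}")
```

Cells marked na in the tables would be passed as `None` and left out of the average, which is again an assumption about how missing values were handled.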
(a) speaker A03
V     SKG            RCG            OQG            GOG            IC             T4G
[ø:]  -0.79 (0.34)   -1.15 (0.17)   0.24 (1.49)    -2.66 (0.87)   -1.78 (0.31)   0.22 (0.04)
[œ]   -0.65 (0.46)   -0.93 (0.38)   0.69 (1.2)     -0.35 (1.74)   -1.68 (0.4)    0.36 (0.15)
p     –              .0154          –              <.001          –              <.001
[a:]  -0.1 (0.21)    -0.4 (0.25)    -0.09 (1.03)   -1.04 (0.56)   -0.93 (0.29)   0.16 (0.04)
[a]   0.18 (0.52)    -0.39 (0.33)   -0.02 (1.54)   -0.68 (0.7)    -0.67 (0.56)   0.22 (0.08)
p     .0206          –              –              .0635          .0527          .0022
[e:]  -0.94 (0.85)   -0.7 (0.62)    -0.22 (0.65)   -3.11 (0.96)   -1.44 (0.49)   0.26 (0.37)
[E]   -0.63 (0.38)   -0.62 (0.45)   0.04 (0.97)    -0.91 (1.34)   -1.11 (0.55)   0.3 (0.13)
p     –              –              –              <.001          .0318          –
[E:]  -1.09 (0.22)   -0.8 (0.37)    -0.23 (0.84)   -2.98 (0.74)   -1.27 (0.41)   0.19 (0.04)
[e:]  -0.94 (0.85)   -0.7 (0.62)    -0.22 (0.65)   -3.11 (0.96)   -1.44 (0.49)   0.26 (0.37)
p     –              –              –              –              –              –
[E:]  -1.09 (0.22)   -0.8 (0.37)    -0.23 (0.84)   -2.98 (0.74)   -1.27 (0.41)   0.19 (0.04)
[E]   -0.63 (0.38)   -0.62 (0.45)   0.04 (0.97)    -0.91 (1.34)   -1.11 (0.55)   0.3 (0.13)
p     <.001          –              –              <.001          –              <.001
[i:]  -0.79 (0.36)   -0.19 (0.42)   4.25 (0.72)    -3.27 (0.93)   -1.39 (0.65)   0.15 (0.09)
[I]   -0.77 (0.35)   -0.75 (0.29)   1.66 (2.44)    -1.95 (1.62)   -1.5 (0.28)    0.31 (0.14)
p     –              <.001          <.001          .0296          –              <.001
[o:]  -1.43 (0.5)    -1.51 (0.53)   1.72 (1.17)    -3.04 (1.45)   -1.96 (0.39)   0.24 (0.03)
[O]   -0.61 (0.89)   -0.94 (0.56)   0.62 (1.52)    -0.1 (1.1)     -1.51 (0.56)   0.51 (0.27)
p     <.001          <.001          .0079          <.001          .0026          <.001
[u:]  -1.06 (1.3)    -1.39 (0.54)   4.27 (0.99)    -4.32 (0.81)   -2.6 (0.57)    0.15 (0.05)
[U]   -0.14 (1.01)   -0.8 (0.77)    2.2 (2.76)     -2.95 (1.53)   -1.61 (0.85)   0.32 (0.12)
p     .0083          .0037          .0037          .006           <.001          <.001
[y:]  -0.97 (0.66)   -1.06 (0.44)   3.77 (2.74)    -4.27 (0.72)   -1.56 (0.41)   0.11 (0.03)
[Y]   -0.7 (0.6)     -0.86 (0.54)   3.66 (0.98)    -2.55 (1.98)   -1.54 (0.47)   0.19 (0.08)
p     –              –              –              .0394          –              <.001
l     -0.9 (-0.9)    -0.9 (-0.9)    1.71 (1.71)    -3.09 (-3.09)  -1.62 (-1.62)  0.18 (0.18)
s     -0.47 (-0.47)  -0.75 (-0.75)  1.27 (1.27)    -1.36 (-1.36)  -1.37 (-1.37)  0.31 (0.31)
mean  -0.7 (0.42)    -0.83 (0.36)   1.5 (1.72)     -2.28 (1.36)   -1.5 (0.45)    0.24 (0.1)
(b) speaker A04
V     SKG            RCG            OQG            GOG            IC             T4G
[ø:]  -0.28 (1.24)   -0.66 (1.09)   na             -2.5 (0.71)    -1.97 (0.85)   0.1 (0.05)
[œ]   -0.69 (0.99)   -0.9 (0.73)    na             -2.69 (0.99)   -2.21 (0.5)    0.16 (0.05)
p     –              –              –              –              –              <.001
[a:]  -0.03 (1.03)   -0.17 (0.52)   4.45 (0.29)    -1.56 (1.07)   -1.13 (0.72)   0.29 (0.17)
[a]   -0.17 (0.78)   -0.09 (0.55)   3.52 (0.36)    -1.93 (1.26)   -1.17 (0.45)   0.24 (0.07)
p     –              –              <.001          –              –              –
[e:]  -0.36 (0.36)   -0.29 (0.18)   2.8 (1.21)     -2.7 (1.5)     -1.19 (0.19)   0.24 (0.11)
[E]   -0.81 (0.9)    -0.52 (0.83)   1.74 (1.1)     -1.54 (1.93)   -1.64 (0.72)   0.41 (0.25)
p     .0282          –              .0069          .0343          .0073          .0054
[E:]  -0.39 (0.8)    -0.34 (0.8)    1.83 (1.23)    -2.01 (1.76)   -1.33 (0.59)   0.27 (0.15)
[e:]  -0.36 (0.36)   -0.29 (0.18)   2.8 (1.21)     -2.7 (1.5)     -1.19 (0.19)   0.24 (0.11)
p     –              –              .0266          –              –              –
[E:]  -0.39 (0.8)    -0.34 (0.8)    1.83 (1.23)    -2.01 (1.76)   -1.33 (0.59)   0.27 (0.15)
[E]   -0.81 (0.9)    -0.52 (0.83)   1.74 (1.1)     -1.54 (1.93)   -1.64 (0.72)   0.41 (0.25)
p     .0889          –              –              –              –              .0218
[i:]  -1.53 (1.86)   -1.76 (1.45)   -1.46 (2.14)   na             -2.82 (1.29)   0.3 (0.13)
[I]   -1.28 (1.42)   -1.15 (1.16)   2.64 (2.18)    na             -2.38 (1)      0.39 (0.31)
p     –              –              <.001          –              –              –
[o:]  -0.63 (0.88)   -0.9 (0.45)    3.27 (0.76)    -2.01 (1.28)   -1.95 (0.37)   0.29 (0.08)
[O]   0.11 (1.34)    -0.79 (0.51)   1.46 (0.76)    -1.4 (1.72)    -1.6 (0.84)    0.41 (0.35)
p     .0303          –              <.001          –              .0692          .0944
[u:]  0.18 (1.1)     -0.81 (0.43)   2.45 (1.37)    -0.68 (1.08)   -1.67 (0.2)    0.3 (0.16)
[U]   -0.44 (2.22)   -1.03 (0.79)   4.42 (0.54)    -1.6 (0.76)    -1.99 (0.86)   0.19 (0.08)
p     –              –              <.001          .0064          .0654          .016
[y:]  -0.38 (1.08)   -0.71 (0.95)   3.37 (0.5)     0.6 (0.29)     -1.99 (0.77)   0.29 (0.08)
[Y]   -0.38 (0.66)   -0.89 (0.63)   2.75 (1.71)    -1.65 (1.84)   -1.74 (0.53)   0.17 (0.12)
p     –              –              –              .0014          –              <.001
l     -0.43 (-0.43)  -0.71 (-0.71)  2.39 (2.39)    -1.55 (-1.55)  -1.76 (-1.76)  0.26 (0.26)
s     -0.52 (-0.52)  -0.77 (-0.77)  2.75 (2.75)    -1.8 (-1.8)    -1.82 (-1.82)  0.28 (0.28)
mean  -0.47 (0.47)   -0.73 (0.43)   2.56 (1.52)    -1.67 (0.88)   -1.79 (0.48)   0.27 (0.09)
Table A.30: Voice quality parameters (means and p-values of t-tests)
(a) speaker A05
V     SKG            RCG            OQG            GOG            IC             T4G
[ø:]  0.48 (0.38)    -0.66 (0.43)   4.19 (0.75)    -1.95 (1.15)   -1.91 (0.41)   0.2 (0.08)
[œ]   0.28 (0.43)    -0.15 (0.57)   4.25 (0.53)    -1.32 (2.08)   -1.09 (0.63)   0.14 (0.06)
p     –              .001           –              –              <.001          .0022
[a:]  0.09 (0.62)    -0.4 (0.43)    na             -0.27 (0.83)   -0.82 (0.9)    0.12 (0.02)
[a]   0.02 (0.37)    -0.43 (0.61)   na             -0.74 (0.47)   -1.08 (0.68)   0.11 (0.04)
p     –              –              –              .0343          –              .0886
[e:]  -0.74 (0.42)   -0.87 (0.36)   3.67 (1.14)    -3.1 (1.04)    -1.4 (0.32)    0.09 (0.05)
[E]   -0.46 (0.4)    -0.86 (0.36)   3.78 (1.67)    -2.57 (0.75)   -1.56 (0.35)   0.08 (0.03)
p     .0261          –              –              .0495          .0926          –
[E:]  -0.52 (0.24)   -0.6 (0.25)    3.83 (0.84)    -1.59 (1.43)   -1.56 (0.23)   0.12 (0.03)
[e:]  -0.74 (0.42)   -0.87 (0.36)   3.67 (1.14)    -3.1 (1.04)    -1.4 (0.32)    0.09 (0.05)
p     .0388          .004           –              <.001          .048           .0058
[E:]  -0.52 (0.24)   -0.6 (0.25)    3.83 (0.84)    -1.59 (1.43)   -1.56 (0.23)   0.12 (0.03)
[E]   -0.46 (0.4)    -0.86 (0.36)   3.78 (1.67)    -2.57 (0.75)   -1.56 (0.35)   0.08 (0.03)
p     –              .0048          –              .0054          –              <.001
[i:]  0.13 (0.45)    -0.28 (0.32)   3.9 (0.68)     -0.39 (1.57)   -1.2 (0.41)    0.17 (0.04)
[I]   0.01 (0.62)    -0.47 (0.48)   3.82 (0.83)    -1.28 (2.2)    -1.09 (0.54)   0.18 (0.06)
p     –              –              –              –              –              –
[o:]  -0.25 (0.49)   -0.94 (0.29)   4.02 (0.96)    -1.81 (0.82)   -2.17 (0.38)   0.16 (0.08)
[O]   -0.38 (0.45)   -0.81 (0.37)   4.11 (0.41)    -1.63 (0.81)   -1.66 (0.52)   0.13 (0.03)
p     –              –              –              –              <.001          –
[u:]  -0.57 (1.2)    -1.07 (0.52)   3.99 (0.64)    -1.21 (0.84)   -1.72 (0.43)   0.29 (0.15)
[U]   -0.12 (1.4)    -0.45 (1.17)   4.37 (0.74)    -1.96 (1.18)   -1.13 (1.07)   0.3 (0.11)
p     –              .0245          .08            .0154          .0172          –
[y:]  0.42 (0.33)    -0.64 (0.42)   4.69 (0.59)    -0.73 (1.53)   -1.65 (0.53)   0.27 (0.09)
[Y]   0.51 (0.5)     -0.61 (0.21)   4.3 (1.12)     -2.26 (1.55)   -1.68 (0.53)   0.22 (0.06)
p     –              –              –              .0021          –              .0391
l     -0.12 (-0.12)  -0.68 (-0.68)  4.04 (4.04)    -1.38 (-1.38)  -1.55 (-1.55)  0.18 (0.18)
s     -0.02 (-0.02)  -0.54 (-0.54)  4.1 (4.1)      -1.68 (-1.68)  -1.33 (-1.33)  0.16 (0.16)
mean  -0.07 (0.4)    -0.62 (0.26)   4.07 (0.29)    -1.52 (0.8)    -1.45 (0.37)   0.17 (0.07)
V
SKG
RCG
OQG
GOG
IC
T4G
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
-0.28 (0.94)
-0.5 (0.43)
–
-0.04 (1.02)
-0.76 (1.79)
–
-0.27 (0.4)
-0.93 (1.19)
.0153
-0.34 (0.46)
-0.27 (0.4)
–
-0.34 (0.46)
-0.93 (1.19)
.0303
-0.87 (0.54)
-0.42 (0.45)
.0027
-0.76 (1.19)
-0.79 (0.51)
–
-0.2 (0.97)
0.29 (1.46)
–
-0.2 (0.48)
-0.25 (0.38)
–
-0.66 (0.72)
-0.86 (0.39)
–
-0.3 (0.64)
-0.99 (1.11)
.0132
-0.47 (0.28)
-1.18 (1.22)
.0098
-0.74 (0.77)
-0.47 (0.28)
–
-0.74 (0.77)
-1.18 (1.22)
–
-0.75 (0.4)
-0.57 (0.57)
–
-0.82 (0.49)
-1.18 (0.27)
.0033
-0.88 (0.69)
-0.62 (0.87)
–
-0.77 (0.5)
-0.52 (0.38)
.0648
4.24 (0.97)
1.68 (1.66)
<.001
1.93 (2.28)
1.5 (2.42)
–
–
0.64 (1.61)
0.15 (1.1)
–
-2.05 (3)
-0.42 (4.22)
–
5.05 (0.3)
1.05 (0.76)
<.001
1.07 (0.92)
-0.69 (3.05)
.0416
0.82 (2.68)
0.08 (3.49)
–
-4.66 (0.52)
-4.38 (0.94)
–
-0.54 (0.85)
-1.42 (2)
.0628
-4.57 (0.8)
-1.64 (1.45)
<.001
-2.79 (0.64)
-4.57 (0.8)
<.001
-2.79 (0.64)
-1.64 (1.45)
.0015
1.65 (0.43)
-0.73 (1.64)
.0013
-3.66 (1.01)
-1.45 (1.56)
<.001
-2.16 (0.08)
-3.04 (1.8)
–
-0.03 (0.84)
0.03 (0.13)
–
-1.6 (0.58)
-1.53 (0.47)
–
-1.23 (1.17)
-1.46 (1.08)
–
-1.06 (0.29)
-1.77 (1.16)
.0073
-1.49 (0.88)
-1.06 (0.29)
.0297
-1.49 (0.88)
-1.77 (1.16)
–
-1.94 (0.51)
-1.52 (0.37)
.0024
-1.75 (0.43)
-1.72 (0.42)
–
-1.91 (0.62)
-1.34 (0.57)
.0021
-1.69 (0.4)
-1.57 (0.59)
–
0.08 (0.06)
0.17 (0.08)
<.001
0.33 (0.3)
0.41 (0.32)
–
0.05 (0.01)
0.3 (0.16)
<.001
0.19 (0.12)
0.05 (0.01)
<.001
0.19 (0.12)
0.3 (0.16)
.0143
0.35 (0.19)
0.23 (0.11)
.0119
0.09 (0.01)
0.28 (0.1)
<.001
0.3 (0.13)
0.27 (0.15)
–
0.38 (0.16)
0.3 (0.11)
.0712
l
s
mean
-0.37 (-0.37)
-0.48 (-0.48)
-0.42 (0.34)
-0.67 (-0.67)
-0.85 (-0.85)
-0.75 (0.25)
1.67 (1.67)
0.53 (0.53)
1.15 (1.9)
-2.09 (-2.09)
-1.8 (-1.8)
-1.96 (1.88)
-1.58 (-1.58)
-1.56 (-1.56)
-1.57 (0.24)
0.22 (0.22)
0.28 (0.28)
0.25 (0.11)
na
na
(b) speaker A06
na
na
–
na
na
Table A.31: Voice quality parameters (means and p-values of t-tests)
(a) speaker A07
V     SKG             RCG             OQG             GOG             IC              T4G
[ø:]  -0.9 (0.32)     -0.3 (0.39)     4.57 (0.51)     -5.15 (0.13)    -1.16 (0.34)    0.05 (0.02)
[œ]   -0.32 (0.35)    -0.09 (0.43)    2.12 (1.58)     -4.42 (0.66)    -1.07 (0.43)    0.13 (0.07)
p     <.001           .0988           <.001           <.001           –               <.001
[a:]  -1.12 (0.52)    -1.4 (0.55)     0.88 (0.5)      0.2 (0.95)      -1.73 (0.65)    0.88 (0.21)
[a]   -0.66 (0.73)    -0.39 (1.03)    0.52 (0.98)     0.13 (0.83)     -1.08 (1.16)    0.62 (0.25)
p     .0172           <.001           –               –               .0233           <.001
[e:]  -0.06 (0.58)    -0.58 (0.26)    1.06 (2.25)     -2.96 (1.19)    -0.76 (0.5)     0.17 (0.08)
[E]   -0.17 (0.54)    -0.74 (0.23)    2.04 (1.05)     -2.12 (1.33)    -0.82 (0.34)    0.31 (0.29)
p     –               .0336           .0827           .0273           –               .0301
[E:]  -0.34 (0.43)    -0.7 (0.43)     1.92 (1.72)     -2.47 (0.65)    -0.88 (0.45)    0.14 (0.07)
[e:]  -0.06 (0.58)    -0.58 (0.26)    1.06 (2.25)     -2.96 (1.19)    -0.76 (0.5)     0.17 (0.08)
p     .0566           –               –               .0898           –               –
[E:]  -0.34 (0.43)    -0.7 (0.43)     1.92 (1.72)     -2.47 (0.65)    -0.88 (0.45)    0.14 (0.07)
[E]   -0.17 (0.54)    -0.74 (0.23)    2.04 (1.05)     -2.12 (1.33)    -0.82 (0.34)    0.31 (0.29)
p     –               –               –               –               –               .0135
[i:]  0.02 (0.37)     -0.14 (0.66)    2.45 (2.65)     -0.13 (0.76)    -1.06 (0.79)    0.25 (0.15)
[I]   0.06 (0.87)     -0.15 (0.86)    3.79 (1.04)     -0.89 (1.62)    -0.83 (0.68)    0.16 (0.09)
p     –               –               .0796           –               –               .0107
[o:]  -0.94 (0.82)    -1.05 (0.52)    1.57 (1.05)     -3.31 (1.46)    -1.21 (0.49)    0.14 (0.09)
[O]   -0.77 (0.7)     -0.79 (0.6)     1.3 (2.24)      -1.86 (1.2)     -1.21 (0.51)    0.29 (0.17)
p     –               –               –               <.001           –               <.001
[u:]  0.83 (1.83)     -0.34 (1.19)    1.85 (1.71)     -0.55 (1.12)    -1.16 (0.85)    0.45 (0.52)
[U]   0.24 (1.36)     -0.36 (0.63)    3.13 (2.43)     -1.99 (2.57)    -1.2 (0.59)     0.33 (0.35)
p     –               –               –               .0406           –               –
[y:]  0.29 (0.64)     -0.1 (0.44)     0.54 (2.73)     -1.71 (1.56)    -1.17 (0.59)    0.18 (0.11)
[Y]   -0.58 (0.94)    -0.35 (0.42)    1.84 (2.09)     -2.75 (1.88)    -1.42 (0.3)     0.16 (0.12)
p     .0015           .0612           –               –               .0763           –
l     -0.28 (-0.28)   -0.58 (-0.58)   1.86 (1.86)     -2.01 (-2.01)   -1.14 (-1.14)   0.28 (0.28)
s     -0.31 (-0.31)   -0.41 (-0.41)   2.11 (2.11)     -1.99 (-1.99)   -1.09 (-1.09)   0.28 (0.28)
mean  -0.29 (0.54)    -0.5 (0.38)     1.97 (1.15)     -2 (1.59)       -1.12 (0.25)    0.28 (0.22)
V
SKG
RCG
OQG
GOG
IC
T4G
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
-0.7 (0.34)
-0.98 (0.5)
.0293
0.29 (0.25)
0.03 (0.32)
.0036
-1.16 (0.2)
-0.88 (0.61)
.0405
-0.87 (0.46)
-1.16 (0.2)
.0089
-0.87 (0.46)
-0.88 (0.61)
–
0.52 (0.4)
0 (0.39)
<.001
-1.11 (0.72)
-0.11 (0.48)
<.001
1 (1.62)
0.01 (1.09)
.0174
-0.18 (0.4)
-0.2 (0.29)
–
-0.7 (0.35)
-0.85 (0.5)
–
0.04 (0.26)
0.02 (0.33)
–
-1.05 (0.41)
-0.69 (0.71)
.0384
-0.74 (0.38)
-1.05 (0.41)
.0098
-0.74 (0.38)
-0.69 (0.71)
–
-0.1 (0.31)
-0.2 (0.32)
–
-1.03 (0.31)
-0.49 (0.23)
<.001
-0.06 (0.71)
-0.59 (0.69)
.0129
0 (0.15)
-0.21 (0.4)
.0393
4.05 (1.12)
3.25 (2.3)
–
1.22 (0.9)
2.1 (1.19)
.0086
4.65 (0.76)
1.04 (1.49)
<.001
2.86 (1.01)
4.65 (0.76)
<.001
2.86 (1.01)
1.04 (1.49)
<.001
–
-3.61 (1.1)
-3.99 (1.33)
–
-0.2 (0.53)
-0.13 (0.51)
–
-4.79 (0.43)
-1.95 (0.85)
<.001
-2.79 (1.34)
-4.79 (0.43)
<.001
-2.79 (1.34)
-1.95 (0.85)
.0409
-1.5 (0.86)
-1.32 (1.31)
–
-2.62 (1.67)
-0.89 (1.1)
<.001
-1.64 (1.34)
-1.77 (1.1)
–
-2.7 (1.53)
-4.42 (0.9)
<.001
-2.09 (0.4)
-2.33 (0.45)
.0499
-0.98 (0.75)
-0.9 (0.34)
–
-1.74 (0.26)
-1.62 (0.49)
–
-1.59 (0.54)
-1.74 (0.26)
–
-1.59 (0.54)
-1.62 (0.49)
–
-0.95 (0.3)
-1.28 (0.4)
.0026
-2.4 (0.37)
-1.78 (0.29)
<.001
-1.71 (0.92)
-1.69 (0.36)
–
-1.47 (0.3)
-1.38 (0.21)
–
0.18 (0.1)
0.16 (0.07)
–
0.28 (0.06)
0.3 (0.13)
–
0.11 (0.05)
0.24 (0.08)
<.001
0.2 (0.12)
0.11 (0.05)
.0025
0.2 (0.12)
0.24 (0.08)
–
0.13 (0.05)
0.15 (0.09)
–
0.37 (0.16)
0.33 (0.11)
–
0.19 (0.08)
0.11 (0.06)
<.001
0.1 (0.08)
0.05 (0.02)
.005
l
s
mean
-0.28 (-0.28)
-0.3 (-0.3)
-0.29 (0.64)
-0.46 (-0.46)
-0.43 (-0.43)
-0.44 (0.39)
3.22 (3.22)
2.68 (2.68)
2.98 (1.39)
-2.48 (-2.48)
-2.07 (-2.07)
-2.29 (1.45)
-1.62 (-1.62)
-1.57 (-1.57)
-1.59 (0.46)
0.2 (0.2)
0.19 (0.19)
0.19 (0.09)
(b) speaker A08
na
na
–
2.16 (1.08)
2.03 (1.17)
–
4.37 (0.75)
5 (0.58)
.0342
na
na
Table A.32: Voice quality parameters (means and p-values of t-tests)
APPENDIX A. TABLES AND FIGURES
(a) speaker A09
V
SKG
RCG
OQG
GOG
IC
T4G
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
-0.16 (0.94)
-1.2 (0.99)
<.001
-0.87 (0.73)
0.19 (0.73)
<.001
-0.44 (0.34)
-1.12 (0.79)
<.001
-0.21 (0.44)
-0.44 (0.34)
.0443
-0.21 (0.44)
-1.12 (0.79)
<.001
0.1 (0.72)
-0.17 (0.94)
–
-1.37 (0.59)
-0.92 (0.45)
.0047
0.73 (2.37)
-0.93 (2.09)
.017
0 (0.56)
-0.3 (0.47)
.0609
-0.73 (0.71)
-1.25 (0.7)
.0136
-0.78 (0.4)
-0.07 (0.62)
<.001
-0.07 (0.37)
-0.8 (0.48)
<.001
-0.01 (0.45)
-0.07 (0.37)
–
-0.01 (0.45)
-0.8 (0.48)
<.001
-0.11 (0.5)
-0.27 (0.73)
–
-0.86 (0.38)
-0.93 (0.58)
–
0.15 (1.06)
-1.13 (0.47)
<.001
-0.59 (0.46)
-0.85 (0.58)
–
3.05 (0.94)
1.82 (0.95)
.0043
-0.06 (1.46)
1.14 (2.63)
.0921
3.89 (0.89)
2.05 (0.59)
<.001
3.65 (1.34)
3.89 (0.89)
–
3.65 (1.34)
2.05 (0.59)
<.001
–
4.03 (0.77)
1.9 (1.49)
<.001
2.62 (1.32)
0.91 (2.03)
–
1.77 (1.9)
4.92 (0.48)
.0017
-3.43 (1.76)
-2.04 (1.22)
.0175
1.05 (1.67)
0 (1.05)
.0402
-3.42 (1.15)
-2.36 (1.88)
.0361
-4.05 (0.58)
-3.42 (1.15)
.0364
-4.05 (0.58)
-2.36 (1.88)
<.001
-1.3 (2.31)
-2.74 (0.7)
.0205
-4.09 (0.97)
-2.05 (1.19)
<.001
-1.91 (2.39)
-4.1 (0.74)
.0013
-2.7 (1.67)
-3.95 (1.54)
.0778
-2.07 (0.71)
-2.33 (0.55)
–
-1.34 (0.4)
-1 (0.54)
.0229
-1.37 (0.38)
-1.99 (0.38)
<.001
-1.45 (0.3)
-1.37 (0.38)
–
-1.45 (0.3)
-1.99 (0.38)
<.001
-0.9 (0.49)
-1.91 (0.5)
<.001
-2.31 (0.4)
-2.16 (0.42)
–
-1.45 (0.94)
-2.47 (0.47)
<.001
-1.8 (0.41)
-2.18 (0.42)
.004
0.15 (0.11)
0.25 (0.1)
.0026
0.89 (0.28)
0.62 (0.54)
.0376
0.11 (0.07)
0.2 (0.08)
<.001
0.18 (0.06)
0.11 (0.07)
<.001
0.18 (0.06)
0.2 (0.08)
–
0.28 (0.27)
0.11 (0.08)
.0046
0.16 (0.05)
0.29 (0.12)
<.001
0.23 (0.18)
0.64 (1.22)
.0938
0.17 (0.1)
0.07 (0.05)
<.001
l
s
mean
-0.28 (-0.28)
-0.64 (-0.64)
-0.44 (0.6)
-0.37 (-0.37)
-0.76 (-0.76)
-0.55 (0.45)
2.71 (2.71)
2.12 (2.12)
2.44 (1.42)
-2.48 (-2.48)
-2.46 (-2.46)
-2.47 (1.52)
-1.59 (-1.59)
-2.01 (-2.01)
-1.78 (0.5)
0.27 (0.27)
0.31 (0.31)
0.29 (0.23)
V
SKG
RCG
OQG
GOG
IC
T4G
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
-1.62 (1.09)
-0.5 (0.28)
<.001
-0.85 (1.1)
-0.9 (2.01)
–
-0.1 (0.99)
-0.76 (1.12)
.0354
-0.11 (0.75)
-0.1 (0.99)
–
-0.11 (0.75)
-0.76 (1.12)
.0242
-0.26 (1.67)
-0.95 (1.05)
–
-1.65 (1.24)
-1.45 (1.54)
–
-0.86 (2.18)
-1.44 (2.12)
–
-1 (0.75)
-1.62 (1.73)
–
-1.31 (0.92)
-0.44 (0.36)
<.001
0 (1.16)
-0.32 (1.58)
–
-0.54 (0.73)
-0.87 (0.76)
–
-0.45 (0.97)
-0.54 (0.73)
–
-0.45 (0.97)
-0.87 (0.76)
–
-0.72 (1.45)
-0.98 (1.01)
–
-0.96 (0.46)
-1.05 (1.09)
–
-1.12 (1.47)
-1.42 (0.73)
–
-0.98 (0.72)
-1.93 (1.27)
.0059
4.48 (0.71)
3.76 (1.2)
.0729
1.89 (0.79)
1.62 (1.08)
–
4.8 (0.36)
0.57 (1.74)
<.001
3.28 (1.18)
4.8 (0.36)
<.001
3.28 (1.18)
0.57 (1.74)
<.001
-0.59 (3.12)
0.08 (2.14)
–
3.13 (0.82)
0.94 (0.99)
<.001
1.32 (2.13)
1.28 (1.54)
–
-0.83 (2.23)
-1.01 (1.3)
–
-3.75 (0.16)
-3.32 (0.88)
.0341
-0.48 (2)
-0.06 (0.56)
–
-3.43 (0.95)
-2.61 (1.9)
–
-2.09 (1.43)
-3.43 (0.95)
.0014
-2.09 (1.43)
-2.61 (1.9)
–
1.77 (0.13)
-1.31 (2.11)
.0305
-2.96 (0.52)
-1.85 (0.65)
<.001
2.05 (0.38)
-2.08 (1.16)
<.001
–
-2.15 (0.94)
-0.94 (0.55)
<.001
-1.72 (1.02)
-1.42 (1.31)
–
-1.1 (0.71)
-1.66 (0.74)
.0108
-1.62 (0.64)
-1.1 (0.71)
.0113
-1.62 (0.64)
-1.66 (0.74)
–
-1.6 (1.34)
-2.41 (1.58)
.0748
-2.32 (0.76)
-2.13 (1.14)
–
-2.87 (1.3)
-2.63 (1.07)
–
-1.95 (0.83)
-2.7 (1.33)
.038
0.13 (0.11)
0.12 (0.07)
–
0.58 (0.24)
0.28 (0.14)
<.001
0.06 (0.03)
0.37 (0.21)
<.001
0.21 (0.12)
0.06 (0.03)
<.001
0.21 (0.12)
0.37 (0.21)
.0027
0.31 (0.2)
0.3 (0.12)
–
0.16 (0.04)
0.4 (0.17)
<.001
0.36 (0.14)
0.28 (0.14)
.0724
0.26 (0.13)
0.26 (0.06)
–
l
s
mean
-0.8 (-0.8)
-1.09 (-1.09)
-0.94 (0.54)
-0.76 (-0.76)
-1 (-1)
-0.87 (0.48)
2.19 (2.19)
1.04 (1.04)
1.65 (1.89)
-1.27 (-1.27)
-1.87 (-1.87)
-1.55 (1.89)
-1.92 (-1.92)
-1.98 (-1.98)
-1.95 (0.58)
0.26 (0.26)
0.29 (0.29)
0.27 (0.13)
na
na
(b) speaker A10
na
na
Table A.33: Voice quality parameters (means and p-values of t-tests)
(a) speaker B01
V
SKG
RCG
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
0.49 (0.59)
-0.19 (0.43)
<.001
-1.37 (0.3)
-0.76 (0.55)
<.001
0.63 (0.59)
0.1 (0.61)
.0034
0.1 (0.6)
0.63 (0.59)
.003
0.1 (0.6)
0.1 (0.61)
–
-0.7 (0.37)
0.11 (0.7)
<.001
-0.61 (2.9)
0.1 (1.11)
–
1.07 (1.78)
-0.17 (1.1)
.0358
-0.75 (0.71)
0.02 (1.34)
.0312
0.04 (0.62)
-0.36 (0.46)
.0135
-1.68 (0.48)
-0.75 (0.77)
<.001
0.37 (0.39)
-0.16 (0.52)
<.001
-0.03 (0.68)
0.37 (0.39)
.0178
-0.03 (0.68)
-0.16 (0.52)
–
-1.83 (1.33)
0.13 (0.53)
<.001
-0.48 (0.58)
0.08 (0.47)
<.001
-0.73 (1.25)
-0.7 (0.76)
–
-1.14 (0.9)
-0.01 (0.93)
<.001
OQG
–
1.01 (0.76)
1.78 (1.28)
.0454
5.03 (0.14)
2.14 (2.43)
<.001
2.06 (1.63)
5.03 (0.14)
<.001
2.06 (1.63)
2.14 (2.43)
–
-3.68 (1.33)
0.99 (3.79)
<.001
3.64 (0.93)
2.61 (1.63)
.02
-1.58 (2.6)
0.87 (3.6)
–
-3.09 (1.45)
1.6 (2.16)
<.001
l
s
mean
-0.14 (-0.14)
-0.11 (-0.11)
-0.13 (0.63)
-0.68 (-0.68)
-0.25 (-0.25)
-0.48 (0.66)
0.48 (0.48)
1.67 (1.67)
1.03 (2.49)
V
SKG
RCG
OQG
GOG
IC
T4G
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
0.04 (0.34)
-0.21 (1.02)
–
0.44 (0.9)
0.53 (0.74)
–
0.53 (0.55)
-0.33 (0.71)
<.001
0.43 (0.7)
0.53 (0.55)
–
0.43 (0.7)
-0.33 (0.71)
<.001
-0.69 (0.41)
-0.35 (0.96)
–
0.15 (2.17)
0.26 (0.54)
–
-2.01 (2.52)
0.75 (0.7)
.0013
-0.54 (0.75)
-0.8 (0.88)
–
0.06 (0.33)
0.02 (0.84)
–
0.5 (0.43)
0.6 (0.5)
–
0.08 (0.3)
-0.2 (0.51)
.027
0.31 (0.56)
0.08 (0.3)
.078
0.31 (0.56)
-0.2 (0.51)
.0017
-1.2 (0.41)
-0.38 (0.66)
<.001
-0.71 (0.73)
-0.44 (0.28)
–
-1.45 (1.07)
0.18 (0.3)
<.001
-0.51 (0.56)
-0.31 (0.95)
–
4.89 (0.35)
3.51 (0.99)
<.001
4.22 (0.65)
4.12 (0.56)
–
–
4.25 (0.84)
1.6 (2.33)
<.001
-2.1 (1.1)
0.09 (3)
.0311
2.58 (1.82)
1.59 (1.72)
–
-2.69 (1.27)
1.83 (2.8)
<.001
0.13 (1.21)
5.17 (0.11)
<.001
-1.97 (0.45)
0.63 (1.11)
<.001
1.73 (1.31)
0.06 (1.14)
<.001
-2.19 (0.32)
-1.18 (2.04)
.0665
0.03 (1.6)
-2.19 (0.32)
<.001
0.03 (1.6)
-1.18 (2.04)
.0564
-3.33 (1.47)
-1.86 (1.4)
–
-1.91 (1.4)
-1.42 (1.51)
–
–
-1.63 (0.25)
-1.63 (0.6)
–
-0.73 (0.39)
-0.74 (0.35)
–
-1.16 (0.38)
-1.32 (0.56)
–
-0.65 (0.51)
-1.16 (0.38)
<.001
-0.65 (0.51)
-1.32 (0.56)
<.001
-1.99 (0.43)
-1.7 (0.66)
.077
-2.4 (0.8)
-1.45 (0.39)
<.001
-2.84 (0.71)
-1.37 (0.7)
<.001
-1.7 (0.46)
-1.85 (0.67)
–
0.12 (0.04)
0.66 (0.49)
<.001
0.67 (0.36)
0.33 (0.14)
<.001
0.08 (0.01)
0.43 (0.33)
<.001
0.21 (0.07)
0.08 (0.01)
<.001
0.21 (0.07)
0.43 (0.33)
.0042
0.3 (0.03)
0.31 (0.22)
–
0.35 (0.13)
0.56 (0.4)
.021
0.22 (0.07)
0.36 (0.09)
<.001
0.37 (0.06)
0.14 (0.03)
<.001
l
s
mean
-0.21 (-0.21)
-0.02 (-0.02)
-0.12 (0.71)
-0.37 (-0.37)
-0.08 (-0.08)
-0.23 (0.58)
1.61 (1.61)
2.72 (2.72)
2.12 (2.6)
-1.27 (-1.27)
-0.75 (-0.75)
-1.04 (1.48)
-1.64 (-1.64)
-1.44 (-1.44)
-1.54 (0.6)
0.29 (0.29)
0.4 (0.4)
0.34 (0.18)
na
na
GOG
IC
T4G
-2.42 (1.03)
-1.02 (1.86)
.0593
–
-1.28 (0.67)
-1.55 (0.49)
–
-2.21 (0.35)
-1.76 (0.57)
.0025
-0.41 (0.49)
-0.93 (0.47)
<.001
-0.91 (0.6)
-0.41 (0.49)
.0032
-0.91 (0.6)
-0.93 (0.47)
–
-2.46 (1.2)
-1.06 (0.56)
<.001
-1.68 (0.73)
-1.15 (0.51)
.0064
-2.06 (1.52)
-1.44 (0.91)
.092
-2.32 (0.76)
-0.95 (0.83)
<.001
0.08 (0.03)
0.38 (0.2)
<.001
0.76 (0.22)
1.31 (0.88)
.0067
0.13 (0.08)
0.54 (0.35)
<.001
0.33 (0.23)
0.13 (0.08)
<.001
0.33 (0.23)
0.54 (0.35)
.017
0.11 (0.06)
0.28 (0.13)
<.001
0.57 (0.46)
0.67 (0.28)
–
0.27 (0.13)
0.16 (0.06)
<.001
0.14 (0.06)
0.34 (0.19)
<.001
-1.3 (-1.3)
-0.6 (-0.6)
-0.99 (1.28)
-1.67 (-1.67)
-1.26 (-1.26)
-1.48 (0.6)
0.3 (0.3)
0.52 (0.52)
0.4 (0.33)
na
na
–
-1.57 (1.33)
0.36 (1.19)
<.001
-1.89 (1.44)
-1.57 (1.33)
–
-1.89 (1.44)
0.36 (1.19)
<.001
na
na
–
0.83 (2.63)
0.64 (1.57)
–
-1.45 (0.65)
-2.38 (2.03)
–
na
na
(b) speaker B02
na
na
–
na
na
na
na
–
na
na
Table A.34: Voice quality parameters (means and p-values of t-tests)
(a) speaker B03
V     SKG             RCG             OQG             GOG             IC              T4G
[ø:]  -0.36 (0.36)    -0.64 (0.43)    2.29 (0.59)     -2.43 (1.17)    -1.3 (0.43)     0.12 (0.05)
[œ]   -0.17 (0.61)    -0.47 (0.39)    3.52 (0.98)     -1.15 (0.8)     -1.24 (0.57)    0.18 (0.04)
p     –               –               <.001           <.001           –               <.001
[a:]  0.63 (0.5)      0.34 (0.55)     4.08 (0.66)     0.38 (0.64)     -0.67 (0.45)    0.15 (0.05)
[a]   0.42 (0.55)     0.45 (0.4)      4.43 (0.74)     0.2 (0.69)      -1.08 (0.41)    0.18 (0.06)
p     –               –               –               –               .0038           –
[e:]  -0.34 (0.64)    -0.12 (0.55)    2.41 (0.81)     -2.8 (1.04)     -1.18 (0.3)     0.11 (0.08)
[E]   -0.24 (0.39)    -0.17 (0.35)    3.57 (1.06)     -1.08 (1.14)    -1.32 (0.21)    0.2 (0.09)
p     –               –               <.001           <.001           .059            <.001
[E:]  -0.39 (0.43)    -0.09 (0.48)    3.21 (1.19)     -2.3 (1.15)     -1.32 (0.33)    0.14 (0.05)
[e:]  -0.34 (0.64)    -0.12 (0.55)    2.41 (0.81)     -2.8 (1.04)     -1.18 (0.3)     0.11 (0.08)
p     –               –               .0095           –               –               –
[E:]  -0.39 (0.43)    -0.09 (0.48)    3.21 (1.19)     -2.3 (1.15)     -1.32 (0.33)    0.14 (0.05)
[E]   -0.24 (0.39)    -0.17 (0.35)    3.57 (1.06)     -1.08 (1.14)    -1.32 (0.21)    0.2 (0.09)
p     –               –               –               <.001           –               .0081
[i:]  -0.3 (0.47)     0.26 (0.26)     3.45 (0.63)     -1.78 (1.05)    -1.34 (0.44)    0.22 (0.1)
[I]   -0.61 (0.39)    -0.48 (0.59)    2.7 (1.06)      -2.53 (1.18)    -1.57 (0.54)    0.2 (0.06)
p     .0179           <.001           .0576           .045            –               –
[o:]  -0.73 (0.44)    -0.61 (0.54)    2.59 (1.93)     -2.66 (0.88)    -1.33 (0.51)    0.19 (0.06)
[O]   0.83 (0.64)     -0.2 (0.53)     3.18 (0.88)     -0.62 (1.11)    -1.28 (0.58)    0.28 (0.09)
p     <.001           .0102           –               <.001           –               <.001
[u:]  0.13 (1.08)     -0.09 (0.6)     4.22 (0.74)     -1.62 (1.32)    -1.46 (0.72)    0.31 (0.07)
[U]   0.4 (1.11)      -0.5 (0.48)     2.55 (1.53)     -1.76 (1.67)    -1.57 (0.36)    0.29 (0.13)
p     –               .0129           <.001           –               –               –
[y:]  0.21 (0.38)     -0.28 (0.34)    4.54 (0.52)     -2.93 (1.2)     -1.66 (0.39)    0.18 (0.04)
[Y]   -0.54 (0.52)    -0.85 (0.36)    3.5 (1.44)      -2.74 (1.54)    -1.59 (0.34)    0.17 (0.04)
p     <.001           <.001           .03             –               –               –
l     -0.14 (-0.14)   -0.15 (-0.15)   3.35 (3.35)     -2.02 (-2.02)   -1.28 (-1.28)   0.18 (0.18)
s     0.01 (0.01)     -0.32 (-0.32)   3.35 (3.35)     -1.38 (-1.38)   -1.38 (-1.38)   0.21 (0.21)
mean  -0.07 (0.48)    -0.23 (0.38)    3.35 (0.74)     -1.72 (1.07)    -1.33 (0.24)    0.19 (0.06)
(b) speaker B04
V
SKG
RCG
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
1.35 (0.59)
1.51 (0.5)
–
2.01 (0.34)
1.66 (0.7)
.0357
0.85 (0.63)
0.73 (0.53)
–
0.74 (0.27)
0.85 (0.63)
–
0.74 (0.27)
0.73 (0.53)
–
0.77 (0.57)
0.76 (0.41)
–
1.08 (1.79)
1.51 (0.52)
–
1.89 (0.49)
1.31 (1.61)
.0834
1.03 (0.66)
0.98 (0.44)
–
0.8 (0.5)
0.67 (0.34)
–
0.45 (0.35)
0.54 (0.36)
–
0.57 (0.42)
0.26 (0.4)
.009
0.45 (0.29)
0.57 (0.42)
–
0.45 (0.29)
0.26 (0.4)
.0741
0.53 (0.71)
0.2 (0.47)
.0646
-0.1 (1.63)
0.81 (1.21)
.0349
1.47 (0.57)
0.66 (1.04)
.0014
0.9 (0.42)
0.37 (0.3)
<.001
l
s
mean
1.22 (1.22)
1.21 (1.21)
1.21 (0.43)
0.63 (0.63)
0.5 (0.5)
0.57 (0.36)
OQG
GOG
IC
T4G
–
0.7 (1.16)
0.32 (0.91)
–
1.57 (0.67)
0.89 (0.97)
.0097
-0.2 (2)
0.21 (1.44)
–
-0.17 (1.25)
-0.2 (2)
–
-0.17 (1.25)
0.21 (1.44)
–
1.96 (2.02)
-0.3 (2.35)
.0017
1.3 (2.01)
0.19 (1.01)
.0709
2.26 (2.03)
0.89 (1.49)
.0538
1.87 (1.46)
0.96 (1.7)
.0799
-0.34 (0.55)
0.03 (0.31)
.008
0.26 (0.21)
0.25 (0.55)
–
0.1 (0.51)
0 (0.46)
–
-0.27 (0.21)
0.1 (0.51)
.0015
-0.27 (0.21)
0 (0.46)
.0162
-0.43 (0.6)
-0.81 (0.34)
.0092
-0.58 (1.29)
0.29 (0.81)
.0083
0.44 (0.59)
-0.31 (1.02)
.0031
-0.55 (0.51)
-0.75 (0.4)
–
0.1 (0.03)
0.12 (0.04)
.0413
0.21 (0.06)
0.16 (0.06)
.0081
0.15 (0.09)
0.12 (0.05)
–
0.18 (0.1)
0.15 (0.09)
–
0.18 (0.1)
0.12 (0.05)
.0175
0.42 (0.23)
0.17 (0.09)
<.001
0.49 (0.25)
0.2 (0.06)
<.001
0.64 (0.35)
0.56 (0.48)
–
0.28 (0.14)
0.13 (0.04)
<.001
3.93 (3.93)
3.73 (3.73)
3.84 (1.45)
1.16 (1.16)
0.45 (0.45)
0.83 (0.83)
-0.17 (-0.17)
-0.19 (-0.19)
-0.18 (0.4)
0.31 (0.31)
0.21 (0.21)
0.26 (0.18)
na
na
–
na
na
–
3.34 (1.35)
4.65 (1.03)
.0532
4.73 (0.39)
3.34 (1.35)
.0503
4.73 (0.39)
4.65 (1.03)
–
5.21 (0.34)
4.92 (0.46)
–
3.99 (0.22)
4.57 (0.51)
.0131
2.35 (2.83)
0.78 (1.53)
–
na
na
Table A.35: Voice quality parameters (means and p-values of t-tests)
(a) speaker B05
V
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
SKG
0.25 (0.25)
0.59 (0.67)
.0273
1.35 (0.58)
0.92 (0.54)
.0157
-0.53 (0.47)
-0.5 (0.6)
–
-0.41 (1.12)
-0.53 (0.47)
–
-0.41 (1.12)
-0.5 (0.6)
–
-0.62 (0.69)
-0.12 (0.64)
.0123
-0.21 (1.08)
0.5 (0.32)
.0046
-0.47 (1.83)
0.51 (0.89)
.0243
-0.12 (0.63)
-0.05 (0.6)
–
RCG
-0.05 (0.33)
0.06 (0.39)
–
0.12 (0.69)
0.09 (0.55)
–
-0.08 (0.3)
-0.41 (0.66)
.0318
-0.11 (1.06)
-0.08 (0.3)
–
-0.11 (1.06)
-0.41 (0.66)
–
-0.4 (0.78)
-0.01 (0.81)
.0934
-1.11 (0.27)
-0.64 (0.22)
<.001
-0.64 (1.25)
-0.43 (0.75)
–
-0.01 (0.45)
-0.29 (0.34)
.0225
l
s
mean
-0.09 (-0.09)
0.26 (0.26)
0.07 (0.59)
-0.28 (-0.28)
-0.23 (-0.23)
-0.26 (0.35)
OQG
4.51 (0.5)
2.29 (1.37)
<.001
1.32 (1.08)
1.33 (1.58)
–
–
–
-2.86 (1.74)
-1.26 (1.45)
.0033
-4.48 (1.31)
-1.61 (1.32)
.0459
-0.35 (2.88)
-1.58 (0.63)
–
IC
-1.11 (0.34)
-0.57 (0.55)
<.001
-0.25 (0.4)
-0.24 (0.83)
–
-0.85 (0.38)
-1.21 (0.51)
.009
-1.39 (0.75)
-0.85 (0.38)
.0031
-1.39 (0.75)
-1.21 (0.51)
–
-1.15 (0.31)
-0.9 (0.39)
.0168
-2.04 (0.69)
-0.83 (0.31)
<.001
-1.77 (1.46)
-0.96 (0.88)
.0264
-1.18 (0.62)
-0.89 (0.43)
.0776
T4G
0.13 (0.06)
0.25 (0.09)
<.001
0.19 (0.07)
0.23 (0.07)
–
0.07 (0.03)
0.23 (0.15)
<.001
0.18 (0.26)
0.07 (0.03)
.0521
0.18 (0.26)
0.23 (0.15)
–
0.29 (0.08)
0.15 (0.13)
<.001
0.17 (0.1)
0.26 (0.16)
.0207
0.32 (0.16)
0.22 (0.1)
.0093
0.41 (0.14)
0.1 (0.04)
<.001
2.03 (2.03)
2.2 (2.2)
2.11 (1.89)
-2.18 (-2.18)
-1.61 (-1.61)
-1.92 (1.43)
-1.22 (-1.22)
-0.8 (-0.8)
-1.02 (0.49)
0.22 (0.22)
0.2 (0.2)
0.21 (0.09)
na
na
–
na
na
–
4.12 (0.66)
0.2 (1.47)
<.001
-1.68 (1.69)
2.45 (1.2)
<.001
3.95 (0.92)
1.45 (1.75)
<.001
-0.02 (2.86)
3.48 (1.1)
<.001
na
na
GOG
-0.58 (2.2)
-2.02 (1.64)
.0258
-0.17 (0.91)
-0.3 (1.04)
–
-4.15 (0.7)
-2.87 (1.26)
<.001
-2.69 (3.49)
-4.15 (0.7)
–
-2.69 (3.49)
-2.87 (1.26)
–
na
na
(b) speaker B06
V     SKG             RCG             OQG             GOG             IC              T4G
[ø:]  0.04 (0.5)      -0.37 (0.32)    2.3 (0.6)       -1.91 (0.89)    -1.28 (0.31)    0.12 (0.05)
[œ]   0.26 (0.71)     0.04 (0.75)     3.47 (1.01)     -0.16 (0.83)    -1.15 (0.4)     0.18 (0.05)
p     –               .0199           <.001           <.001           –               <.001
[a:]  1.17 (0.72)     0.28 (0.59)     3.88 (1.02)     1.31 (1.13)     -0.62 (0.41)    0.36 (0.21)
[a]   0.21 (0.71)     -0.42 (0.48)    3.69 (0.94)     0.16 (0.54)     -0.89 (0.33)    0.18 (0.05)
p     <.001           <.001           –               <.001           .0235           .0019
[e:]  -0.5 (0.4)      0.06 (0.31)     2.59 (0.75)     -1.74 (0.77)    -1.04 (0.28)    0.11 (0.05)
[E]   0.15 (0.52)     0.15 (0.23)     3.02 (0.86)     0.39 (1.39)     -0.97 (0.39)    0.29 (0.25)
p     <.001           –               –               <.001           –               .0064
[E:]  0.14 (0.41)     0.23 (0.31)     2.77 (1.01)     -0.32 (0.69)    -0.79 (0.31)    0.14 (0.05)
[e:]  -0.5 (0.4)      0.06 (0.31)     2.59 (0.75)     -1.74 (0.77)    -1.04 (0.28)    0.11 (0.05)
p     <.001           .0614           –               <.001           .0053           .0669
[E:]  0.14 (0.41)     0.23 (0.31)     2.77 (1.01)     -0.32 (0.69)    -0.79 (0.31)    0.14 (0.05)
[E]   0.15 (0.52)     0.15 (0.23)     3.02 (0.86)     0.39 (1.39)     -0.97 (0.39)    0.29 (0.25)
p     –               –               –               .0467           .0993           .0191
[i:]  0.06 (0.56)     -0.31 (0.35)    4.6 (0.37)      -2.67 (0.79)    -0.76 (0.24)    0.19 (0.1)
[I]   0.04 (0.24)     0 (0.29)        4.25 (0.69)     -0.93 (1.23)    -0.81 (0.38)    0.14 (0.05)
p     –               .0019           –               <.001           –               .0656
[o:]  -0.68 (0.72)    -1.36 (0.34)    3.67 (0.62)     -1 (0.74)       -1.44 (0.44)    0.33 (0.08)
[O]   -0.04 (0.5)     -0.78 (0.44)    2.41 (1.24)     -0.67 (0.65)    -1.29 (0.28)    0.19 (0.08)
p     <.001           <.001           <.001           –               –               <.001
[u:]  0.07 (1.37)     -0.57 (0.57)    4.23 (1.04)     -1.2 (1.56)     -1.6 (0.4)      0.32 (0.15)
[U]   0.63 (0.78)     -0.27 (0.63)    4 (0.82)        -0.1 (1.43)     -1.19 (0.8)     0.31 (0.21)
p     .0937           .0916           –               .0161           .029            –
[y:]  -0.1 (0.53)     -0.41 (0.43)    4.71 (0.22)     -2.81 (1.49)    -1.45 (0.58)    0.18 (0.06)
[Y]   0.53 (0.35)     -0.08 (0.34)    4.33 (1.09)     -0.64 (1.9)     -1.06 (0.38)    0.15 (0.07)
p     <.001           .0097           –               <.001           .0142           –
l     0.03 (0.03)     -0.3 (-0.3)     3.6 (3.6)       -1.29 (-1.29)   -1.12 (-1.12)   0.22 (0.22)
s     0.25 (0.25)     -0.2 (-0.2)     3.6 (3.6)       -0.28 (-0.28)   -1.05 (-1.05)   0.21 (0.21)
mean  0.13 (0.44)     -0.25 (0.43)    3.6 (0.8)       -0.82 (1.13)    -1.09 (0.29)    0.21 (0.08)
Table A.36: Voice quality parameters (means and p-values of t-tests)
(a) speaker B07
V
SKG
RCG
OQG
GOG
IC
T4G
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
-0.45 (0.48)
0.05 (0.5)
<.001
0.84 (0.76)
0.91 (0.53)
–
-0.78 (0.37)
-0.5 (0.59)
.0522
-0.78 (0.22)
-0.78 (0.37)
–
-0.78 (0.22)
-0.5 (0.59)
.0329
-0.07 (1.1)
-0.17 (0.29)
–
-0.58 (0.86)
0.13 (0.9)
.0073
-0.86 (0.81)
-0.05 (0.39)
<.001
-0.22 (0.38)
-0.09 (0.43)
–
-0.86 (0.32)
-0.61 (0.59)
.0727
-0.15 (0.58)
-0.06 (0.48)
–
-0.86 (0.17)
-0.3 (0.59)
<.001
-0.69 (0.27)
-0.86 (0.17)
.0144
-0.69 (0.27)
-0.3 (0.59)
.005
0.01 (0.79)
-0.42 (0.37)
.0216
-1.04 (0.3)
-0.17 (0.85)
<.001
-1.45 (0.36)
-0.98 (0.32)
<.001
-0.72 (0.29)
-0.53 (0.32)
.0752
2.99 (0.73)
3.38 (1.59)
–
–
2.09 (1.65)
3.5 (0.88)
<.001
4.09 (0.92)
2.09 (1.65)
<.001
4.09 (0.92)
3.5 (0.88)
.0314
3.85 (0.86)
4.46 (0.61)
–
2.84 (1.19)
3.67 (0.96)
.0201
4.54 (0.62)
4.53 (0.53)
–
4.51 (0.04)
3.46 (0.82)
.0049
-3.24 (0.96)
-1.34 (1.11)
<.001
0.37 (0.39)
-0.34 (0.72)
<.001
-3.75 (0.92)
-1.1 (1.1)
<.001
-1.59 (0.79)
-3.75 (0.92)
<.001
-1.59 (0.79)
-1.1 (1.1)
.0816
-2.24 (2.75)
-1.85 (0.61)
–
-1.94 (0.88)
-0.41 (0.88)
<.001
-2.64 (1.62)
-1.21 (0.61)
<.001
-3.56 (1.55)
-0.94 (0.73)
<.001
-1.82 (0.63)
-1.37 (0.51)
.0093
-0.84 (0.49)
-0.79 (0.43)
–
-2.21 (0.27)
-1.49 (0.56)
<.001
-1.42 (0.32)
-2.21 (0.27)
<.001
-1.42 (0.32)
-1.49 (0.56)
–
-1.11 (0.8)
-1.55 (0.45)
.023
-2.48 (0.44)
-0.88 (1.08)
<.001
-2.47 (0.38)
-2.19 (0.52)
.0417
-1.89 (0.42)
-1.79 (0.34)
–
0.1 (0.04)
0.19 (0.08)
<.001
0.26 (0.13)
0.25 (0.12)
–
0.1 (0.02)
0.19 (0.06)
<.001
0.15 (0.05)
0.1 (0.02)
<.001
0.15 (0.05)
0.19 (0.06)
.026
0.22 (0.11)
0.15 (0.06)
.0071
0.25 (0.06)
0.31 (0.11)
.0304
0.32 (0.09)
0.27 (0.06)
.0225
0.13 (0.03)
0.18 (0.07)
.0292
l
s
mean
-0.36 (-0.36)
0.04 (0.04)
-0.18 (0.53)
-0.72 (-0.72)
-0.44 (-0.44)
-0.59 (0.41)
3.56 (3.56)
3.83 (3.83)
3.68 (0.75)
-2.32 (-2.32)
-1.02 (-1.02)
-1.72 (1.21)
-1.78 (-1.78)
-1.44 (-1.44)
-1.62 (0.57)
0.19 (0.19)
0.22 (0.22)
0.2 (0.07)
V
SKG
RCG
OQG
GOG
IC
T4G
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
0.23 (1.02)
-0.86 (0.91)
<.001
-0.18 (1.5)
-0.53 (1.69)
–
-0.86 (0.57)
-1.51 (1.85)
–
-0.59 (1.39)
-0.86 (0.57)
–
-0.59 (1.39)
-1.51 (1.85)
.057
-3.53 (1.59)
-0.45 (1.17)
<.001
-0.36 (2.27)
-0.45 (0.57)
–
-2.65 (2.28)
-1.13 (0.84)
.0055
-0.19 (0.68)
-0.52 (0.95)
–
-0.73 (0.75)
-1.07 (0.97)
–
-0.46 (0.83)
-0.59 (1.47)
–
-0.44 (0.65)
-1.46 (1.29)
.0014
-0.26 (1.62)
-0.44 (0.65)
–
-0.26 (1.62)
-1.46 (1.29)
.0068
-3.01 (0.75)
-0.42 (1.29)
<.001
-1.13 (0.77)
-0.42 (0.77)
.0025
-1.81 (0.66)
-2.56 (1.08)
.0065
-1.08 (0.85)
-0.78 (0.99)
–
5.36 (0.09)
1.34 (1.77)
<.001
2.01 (0.8)
2.73 (0.5)
.0058
4.03 (0.36)
0.62 (0.8)
<.001
1.95 (1.08)
4.03 (0.36)
<.001
1.95 (1.08)
0.62 (0.8)
<.001
0.14 (1.33)
3.48 (3)
.0069
4.86 (0.49)
1.28 (2.1)
<.001
-2.66 (0.92)
-1.94 (1.71)
–
-0.71 (1.19)
-2.14 (2.06)
.0072
-2.53 (1.59)
-1.79 (2.23)
–
-1.73 (1.42)
-2.53 (1.59)
–
-1.73 (1.42)
-1.79 (2.23)
–
–
-0.98 (0.88)
-1.28 (1.18)
–
-1.15 (1.31)
-1.34 (1.52)
–
-0.96 (0.59)
-1.16 (1.86)
–
-1.19 (1.32)
-0.96 (0.59)
–
-1.19 (1.32)
-1.16 (1.86)
–
-3.12 (0.78)
-1.07 (1.23)
<.001
-1.83 (0.55)
-0.74 (0.55)
<.001
-2.8 (0.49)
-2.92 (0.88)
–
-1.65 (0.88)
-1.15 (0.8)
.0577
0.07 (0.04)
0.25 (0.14)
<.001
0.35 (0.24)
0.35 (0.38)
–
0.09 (0.05)
0.58 (0.46)
<.001
0.15 (0.09)
0.09 (0.05)
.0026
0.15 (0.09)
0.58 (0.46)
<.001
0.43 (0.13)
0.26 (0.21)
.0016
0.1 (0.05)
0.23 (0.09)
<.001
0.38 (0.13)
0.46 (0.2)
.0998
0.35 (0.09)
0.17 (0.05)
<.001
l
s
mean
-1.02 (-1.02)
-0.78 (-0.78)
-0.91 (0.99)
-1.11 (-1.11)
-1.04 (-1.04)
-1.08 (0.82)
2.79 (2.79)
2.27 (2.27)
2.55 (1.69)
-1.76 (-1.76)
-2.23 (-2.23)
-1.97 (1.34)
-1.71 (-1.71)
-1.38 (-1.38)
-1.56 (0.77)
0.24 (0.24)
0.33 (0.33)
0.28 (0.15)
na
na
(b) speaker B08
na
na
–
1.15 (1.1)
4.17 (0.55)
<.001
na
na
–
-3 (1.22)
-0.47 (1.28)
<.001
0.08 (0.87)
-4.8 (0.16)
.0091
na
na
Table A.37: Voice quality parameters (means and p-values of t-tests)
(a) speaker C01
V
SKG
RCG
OQG
GOG
IC
T4G
[ø:]
[œ]
p
[a:]
[a]
p
[e:]
[E]
p
[E:]
[e:]
p
[E:]
[E]
p
[i:]
[I]
p
[o:]
[O]
p
[u:]
[U]
p
[y:]
[Y]
p
0.53 (0.87)
-0.22 (0.51)
<.001
0.06 (0.77)
-0.01 (1.15)
–
0.47 (0.42)
-0.41 (0.97)
<.001
-0.14 (1.09)
0.47 (0.42)
.0164
-0.14 (1.09)
-0.41 (0.97)
–
-1.59 (0.35)
-0.98 (1.14)
.0304
-0.56 (1.33)
-0.05 (1.13)
–
-3.99 (0.64)
-0.16 (1.57)
<.001
-1.5 (0.65)
0.48 (0.68)
<.001
0.05 (0.96)
-0.11 (0.71)
–
-0.07 (1.24)
0.11 (0.64)
–
0.25 (0.52)
-0.62 (0.93)
<.001
-0.6 (1.14)
0.25 (0.52)
.0022
-0.6 (1.14)
-0.62 (0.93)
–
-1.37 (0.41)
-1.51 (1.54)
–
-0.86 (0.52)
-0.36 (0.95)
.0289
-2.53 (0.76)
-0.74 (1.01)
<.001
-1.35 (0.36)
-0.08 (0.31)
<.001
3.16 (1.43)
0.92 (1.41)
.0012
2.1 (0.62)
1.87 (1.34)
–
4.98 (0.26)
1.06 (0.85)
<.001
2.07 (0.62)
4.98 (0.26)
<.001
2.07 (0.62)
1.06 (0.85)
<.001
-4.77 (0.56)
-3.96 (0.72)
.0067
3.36 (0.64)
1.4 (1.43)
<.001
-3.93 (0.67)
0.45 (1.69)
<.001
-3.47 (2.67)
1.04 (2.84)
<.001
-1.5 (1.09)
-1.88 (1.61)
–
-0.02 (0.56)
-0.73 (1.6)
.0498
-1.6 (0.61)
-2.45 (1.38)
.0169
-1.02 (1.37)
-1.6 (0.61)
.0685
-1.02 (1.37)
-2.45 (1.38)
.0014
–
-0.98 (0.78)
-1.02 (0.36)
–
-0.88 (1.05)
-0.65 (0.57)
–
-0.56 (0.53)
-1.01 (0.98)
.0561
-0.94 (0.93)
-0.56 (0.53)
.0914
-0.94 (0.93)
-1.01 (0.98)
–
-2.16 (0.36)
-2.2 (1.33)
–
-1.87 (0.24)
-1.07 (0.87)
<.001
-3.5 (0.56)
-1.63 (1.09)
<.001
-2.62 (0.44)
-0.89 (0.5)
<.001
0.22 (0.18)
0.46 (0.29)
.0018
0.37 (0.22)
0.28 (0.27)
–
0.17 (0.05)
0.32 (0.2)
.001
0.32 (0.32)
0.17 (0.05)
.0255
0.32 (0.32)
0.32 (0.2)
–
0.06 (0.03)
0.58 (0.25)
<.001
0.33 (0.18)
0.31 (0.24)
–
0.08 (0.04)
0.51 (0.22)
<.001
0.05 (0.03)
0.41 (0.28)
<.001
l
s
mean
-0.84 (-0.84)
-0.19 (-0.19)
-0.54 (1.15)
-0.81 (-0.81)
-0.47 (-0.47)
-0.66 (0.77)
0.44 (0.44)
0.4 (0.4)
0.42 (3.01)
-0.9 (-0.9)
-1.46 (-1.46)
-1.15 (0.77)
-1.69 (-1.69)
-1.21 (-1.21)
-1.46 (0.84)
0.2 (0.2)
0.41 (0.41)
0.3 (0.16)
V     SKG             RCG             OQG             GOG             IC              T4G
[ø:]  0.04 (0.29)     -0.09 (0.29)    4.22 (0.87)     -0.81 (1.18)    -1.15 (0.36)    0.22 (0.05)
[œ]   -0.15 (0.3)     -0.23 (0.32)    3.47 (0.45)     -0.29 (0.93)    -1.23 (0.29)    0.34 (0.12)
p     .0338           –               .001            .0964           –               <.001
[a:]  0.28 (0.36)     0.01 (0.39)     3.94 (0.59)     0.21 (0.65)     -0.82 (0.44)    0.18 (0.04)
[a]   0.36 (0.69)     -0.26 (0.62)    3.96 (0.84)     0.31 (1.33)     -0.61 (0.34)    0.22 (0.09)
p     –               .0859           –               –               .0898           .0577
[e:]  0 (0.32)        0.04 (0.41)     4.41 (0.6)      -0.44 (0.87)    -0.88 (0.3)     0.22 (0.05)
[E]   0.19 (0.37)     0.07 (0.36)     3.94 (0.48)     0.61 (0.87)     -0.85 (0.23)    0.34 (0.16)
p     .071            –               .0043           <.001           –               .0033
[E:]  -0.35 (0.4)     -0.41 (0.36)    3.26 (0.42)     -0.42 (0.4)     -0.76 (0.28)    0.3 (0.1)
[e:]  0 (0.32)        0.04 (0.41)     4.41 (0.6)      -0.44 (0.87)    -0.88 (0.3)     0.22 (0.05)
p     .0014           <.001           <.001           –               –               .0014
[E:]  -0.35 (0.4)     -0.41 (0.36)    3.26 (0.42)     -0.42 (0.4)     -0.76 (0.28)    0.3 (0.1)
[E]   0.19 (0.37)     0.07 (0.36)     3.94 (0.48)     0.61 (0.87)     -0.85 (0.23)    0.34 (0.16)
p     <.001           <.001           <.001           <.001           –               –
[i:]  0.32 (0.49)     -0.66 (0.4)     2.44 (2.36)     -0.84 (1.17)    -1.03 (0.34)    0.29 (0.2)
[I]   -0.02 (0.47)    -0.02 (0.31)    4.32 (0.44)     -0.86 (1.78)    -0.84 (0.42)    0.22 (0.04)
p     .0164           <.001           .0794           –               .0913           –
[o:]  -0.22 (0.85)    -0.51 (1.1)     4.43 (0.53)     -0.07 (0.67)    -1.3 (1.22)     0.39 (0.08)
[O]   0.87 (0.51)     0.27 (0.54)     3.68 (0.91)     0.38 (0.83)     -0.7 (0.65)     0.29 (0.1)
p     <.001           .0039           .0018           .0455           .0398           <.001
[u:]  -0.35 (0.93)    -0.97 (0.68)    2.66 (1.83)     -1.57 (0.91)    -1.79 (0.55)    0.33 (0.16)
[U]   0.46 (0.47)     0.2 (0.43)      3.96 (2.18)     -0.6 (1.73)     -1.01 (0.63)    0.3 (0.13)
p     <.001           <.001           .0716           .029            <.001           –
[y:]  0.28 (0.9)      -0.17 (0.53)    5.4 (0.01)      -1.95 (0.63)    -0.94 (0.36)    0.16 (0.05)
[Y]   0.09 (0.19)     -0.24 (0.45)    4.36 (1.01)     -1.21 (1.37)    -1.05 (0.39)    0.27 (0.09)
p     –               –               .0101           .0335           –               <.001
l     0 (0)           -0.34 (-0.34)   3.84 (3.84)     -0.74 (-0.74)   -1.08 (-1.08)   0.26 (0.26)
s     0.26 (0.26)     -0.03 (-0.03)   3.96 (3.96)     -0.24 (-0.24)   -0.9 (-0.9)     0.28 (0.28)
mean  0.12 (0.33)     -0.2 (0.33)     3.9 (0.74)      -0.5 (0.73)     -1 (0.29)       0.27 (0.07)
na
na
–
-0.36 (1.19)
-0.78 (1.18)
–
na
na
–
na
na
(b) speaker C02
Table A.38: Voice quality parameters (means and p-values of t-tests)
[ø:]∼[œ]
[a:]∼[a]
[e:]∼[E]
[e:]∼[E:]
[E:]∼[E]
[i:]∼[I]
[o:]∼[O]
[u:]∼[U]
[y:]∼[Y]
A01
dur.
F opp.
F nl.
SKG
RCG
OQG
GOG
T4G
IC
–
∗/–/∗
•◦–/–•–
–
–
•
•
–
–
◦
–/–/–
•–•/––◦
◦
•
–
•
–
–
•
•/◦/–
•––/•••
•
–
–
•
–
•
–
•/◦/–
•––/∗•–
–
–
–
◦
–
–
◦
◦/∗/–
∗•–/•••
◦
–
–
∗
–
◦
•
•/◦/•
◦–◦/∗•◦
–
∗
–
◦
–
•
–
–/–/–
–•–/◦––
◦
◦
∗
•
–
–
∗
na
na
–
–
–
∗
–
–
–
–/–/–
••∗/•◦•
–
–
–
–
–
–
A02
dur.
F opp.
F nl.
SKG
RCG
OQG
GOG
T4G
IC
–
–/–/–
◦•◦/–•–
•
–
∗
–
∗
∗
◦
–/–/–
•••/◦••
–
–
–
–
–
–
•
•/•/–
••◦/•◦•
–
∗
∗
•
–
•
◦
•/•/–
••◦/∗•◦
–
–
•
◦
◦
•
◦
◦/◦/–
∗•◦/•◦•
–
–
–
∗
◦
–
◦
•/∗/∗
•••/◦••
–
•
–
◦
∗
–
◦
◦/∗/–
∗•–/•–◦
–
–
–
–
–
–
◦
na
na
–
◦
–
–
∗
◦
∗
∗/–/–
•∗•/–∗∗
–
–
◦
–
–
–
A03
dur.
F opp.
F nl.
SKG
RCG
OQG
GOG
T4G
IC
∗
•/–/–
•••/••–
–
∗
–
•
–
•
•
◦/–/–
∗∗•/–••
∗
–
–
–
–
◦
◦
•/•/∗
•••/••◦
–
–
–
•
∗
–
–
◦/∗/–
•••/–◦•
–
–
–
–
–
–
◦
•/•/◦
–◦•/••◦
•
–
–
•
–
•
•
◦/◦/•
•∗∗/•••
–
•
•
∗
–
•
•
◦/◦/–
•••/•–•
•
•
◦
•
◦
•
◦
–/–/–
•••/•••
◦
◦
◦
◦
•
•
•
∗/–/–
•••/––•
–
–
–
∗
–
•
A04
dur.
F opp.
F nl.
SKG
RCG
OQG
GOG
T4G
IC
◦
–/–/–
•••/–••
–
–
na
–
–
•
◦
∗/–/–
–••/∗••
–
–
•
–
–
–
∗
◦/◦/∗
◦∗◦/••∗
∗
–
◦
∗
◦
◦
–
–/–/–
◦∗◦/◦•–
–
–
∗
–
–
–
∗
∗/–/–
◦•–/••∗
–
–
–
–
–
∗
◦
◦/–/–
•••/•∗•
–
–
•
na
–
–
–
•/∗/–
•••/•∗•
∗
–
•
–
–
–
∗
∗/–/–
–••/•∗•
–
–
•
◦
–
∗
–
–/–/∗
–••/•••
–
–
–
◦
–
•
A05
dur.
F opp.
F nl.
SKG
RCG
OQG
GOG
T4G
IC
–
–/–/–
–•◦/–••
–
◦
–
–
•
◦
–
–/–/–
•••/••∗
–
–
na
∗
–
–
–
∗/◦/–
–◦•/•••
∗
–
–
∗
–
–
–
–/–/–
–◦•/–••
∗
◦
–
•
∗
◦
◦
–/∗/–
–••/•••
–
◦
–
◦
–
•
–
–/–/–
∗––/◦••
–
–
–
–
–
–
–
–/–/–
–•∗/•∗•
–
–
–
–
•
–
∗
na
na
–
∗
–
∗
∗
–
∗
–/–/–
••∗/–•–
–
–
–
◦
–
∗
A06
dur.
F opp.
F nl.
SKG
RCG
OQG
GOG
T4G
IC
–
–/–/–
◦–∗/––•
–
–
•
–
–
•
∗
–/–/–
◦◦•/–••
–
∗
–
–
–
–
•
•/∗/∗
••∗/•••
∗
◦
na
•
◦
•
–
∗/–/–
••∗/•••
–
–
na
•
∗
•
∗
–/–/–
•••/•••
∗
–
–
◦
–
∗
•
◦/◦/∗
∗–∗/•••
◦
–
–
◦
◦
∗
•
•/◦/–
•••/•–•
–
◦
•
•
–
•
◦
–/–/–
••◦/•••
–
–
∗
–
◦
–
∗
–/–/–
∗∗–/◦–•
–
–
–
–
–
–
A07
dur.
F opp.
F nl.
SKG
RCG
OQG
GOG
T4G
IC
–
–/–/–
•◦•/–•∗
•
–
•
•
–
•
◦
◦/–/–
––•/•••
∗
•
–
–
∗
•
–
–/–/–
◦–•/∗•∗
–
∗
–
∗
–
∗
–
–/–/–
◦–•/◦•–
–
–
–
–
–
–
◦
–/–/–
◦•–/∗•∗
–
–
–
–
–
∗
∗
–/–/–
–•–/∗••
–
–
–
–
–
∗
–
–
–
∗/◦/–
–/–/–
–/–/–
••–/–••
–••/•••
–∗•/•••
–
–
◦
–
–
–
–
–
–
•
∗
–
–
–
–
•
–
–
continued on following page. . .
Table A.39:
Summary (see page 116 for description).
ID
115
. . . continued from previous page
ID           [ø:]∼[œ]   [a:]∼[a]   [e:]∼[ɛ]   [e:]∼[ɛ:]  [ɛ:]∼[ɛ]   [i:]∼[ɪ]   [o:]∼[ɔ]   [u:]∼[ʊ]   [y:]∼[ʏ]
A08  dur.    –          •          •          –          •          •          ∗          –          –
     F opp.  –/–/–      –/–/–      •/◦/–      –/–/–      ◦/∗/–      –/◦/•      –/◦/–      –/–/–      –/–/–
     F nl.   ••◦/∗•–    ◦••/–••    –∗•/•••    –∗•/–••    –••/•••    –••/∗◦•    ∗••/•∗•    –••/•••    –••/•∗•
     SKG     ∗          ◦          ∗          ◦          –          •          •          ∗          –
     RCG     –          –          ∗          ◦          –          –          •          ∗          ∗
     OQG     –          ◦          •          •          •          na         –          ∗          na
     GOG     –          –          •          •          ∗          –          •          –          •
     T4G     ∗          –          –          –          –          ◦          •          –          –
     IC      –          –          •          ◦          –          –          –          •          ◦

A09  dur.    ◦          •          •          –          •          ◦          •          ◦          ∗
     F opp.  ◦/–/–      –/◦/–      ◦/•/–      –/–/–      ◦/•/–      ∗/∗/•      •/◦/–      –/–/–      ∗/–/–
     F nl.   •••/••∗    •–•/◦••    •••/•••    •••/––•    ––•/•••    –•∗/••–    ∗–•/–◦•    –••/•••    •••/•••
     SKG     •          •          •          ∗          •          –          ◦          ∗          –
     RCG     ∗          •          •          –          •          –          –          •          –
     OQG     ◦          –          •          –          •          na         •          –          ◦
     GOG     ∗          ∗          ∗          ∗          •          ∗          •          ◦          –
     T4G     –          ∗          •          –          •          •          –          •          ◦
     IC      ◦          ∗          •          •          –          ◦          •          –          •

A10  dur.    –          –          ◦          –          ∗          ◦          –          –          –
     F opp.  –/–/–      –/–/–      ∗/–/–      –/–/–      –/–/–      –/–/–      ∗/–/–      –/–/–      –/–/–
     F nl.   –••/◦••    ••◦/•••    –◦•/•••    –◦•/•••    •••/•••    •∗◦/∗––    ∗◦•/∗••    ••–/••◦    ••∗/–•–
     SKG     •          –          ∗          –          ∗          –          –          –          –
     RCG     •          –          –          –          –          –          –          –          ◦
     OQG     –          –          •          •          •          –          •          –          –
     GOG     ∗          –          –          ◦          –          ∗          •          •          na
     T4G     •          –          ∗          ∗          –          –          –          –          ∗
     IC      –          •          •          •          ◦          –          •          –          –

B01  dur.    –          ◦          ◦          –          ◦          ◦          ◦          ∗          ∗
     F opp.  –/–/–      •/–/–      •/◦/•      –/–/∗      •/–/–      ◦/∗/◦      •/∗/◦      –/–/–      ∗/–/–
     F nl.   •••/•••    –•∗/•••    ••◦/∗••    ••◦/–••    –••/∗••    ◦••/◦••    ••–/–••    •••/•••    •••/•◦•
     SKG     •          •          ◦          ◦          –          •          –          ∗          ∗
     RCG     ∗          •          •          ∗          –          •          •          –          •
     OQG     na         ∗          •          •          –          •          ∗          –          •
     GOG     –          na         •          –          •          na         –          –          na
     T4G     –          ◦          •          ◦          –          •          ◦          –          •
     IC      •          ◦          •          •          ∗          •          –          •          •

B02  dur.    ◦          •          ∗          ∗          ◦          •          ◦          ∗          •
     F opp.  •/–/•      –/∗/–      •/◦/–      •/∗/–      –/–/–      •/•/•      •/◦/∗      na         •/–/–
     F nl.   ••–/••◦    •••/•∗∗    •••/∗••    •••/–••    –••/∗••    ••◦/•••    –••/•••    na         ◦◦•/•∗•
     SKG     –          –          •          –          •          –          –          ◦          –
     RCG     –          –          ∗          –          ◦          •          –          •          –
     OQG     •          –          na         na         •          ∗          –          •          •
     GOG     •          •          –          •          –          –          –          na         na
     T4G     –          –          –          •          •          –          •          •          –
     IC      •          •          •          •          ◦          –          ∗          •          •

B03  dur.    ◦          •          ◦          –          ◦          ◦          •          •          •
     F opp.  •/–/–      ∗/–/–      •/•/•      ∗/∗/∗      •/•/∗      •/•/•      •/•/•      –/◦/–      ◦/∗/–
     F nl.   •◦◦/••–    ◦••/•••    •••/•••    •••/–••    –••/•••    •••/•••    •∗∗/••◦    •∗•/◦••    •∗•/•∗•
     SKG     –          –          –          –          –          ∗          •          –          •
     RCG     –          –          –          –          –          •          ∗          ∗          •
     OQG     •          –          •          ◦          –          –          –          •          ∗
     GOG     •          –          •          –          •          ∗          •          –          –
     T4G     –          ◦          –          –          –          –          –          –          –
     IC      •          –          •          –          ◦          –          •          –          –

B04  dur.    ∗          •          •          –          ◦          •          ◦          •          ∗
     F opp.  •/–/–      –/–/∗      ◦/•/◦      –/–/–      ◦/•/◦      ◦/◦/◦      •/◦/∗      na         ◦/–/–
     F nl.   •••/••∗    •∗•/•••    •∗•/•••    •∗•/–••    –••/•••    •–•/•••    ••∗/•–•    na         •••/•◦•
     SKG     –          ∗          –          –          –          –          –          –          –
     RCG     –          –          ◦          –          –          –          ∗          ◦          •
     OQG     na         na         –          –          –          –          ∗          –          na
     GOG     –          ◦          –          –          –          ◦          –          –          –
     T4G     ◦          –          –          ◦          ∗          ◦          ◦          ◦          –
     IC      ∗          ◦          –          –          ∗          •          •          –          •
ID           [ø:]∼[œ]   [a:]∼[a]   [e:]∼[ɛ]   [e:]∼[ɛ:]  [ɛ:]∼[ɛ]   [i:]∼[ɪ]   [o:]∼[ɔ]   [u:]∼[ʊ]   [y:]∼[ʏ]
B05  dur.    ∗          •          ◦          ∗          •          •          ◦          ∗          ◦
     F opp.  ◦/–/–      –/∗/–      •/•/•      –/–/–      –/◦/–      ◦/◦/•      •/∗/–      na         ◦/◦/–
     F nl.   ∗–•/•∗◦    ◦–•/•••    •∗∗/•••    •∗∗/•••    •••/•••    •–∗/•◦•    ••◦/•∗∗    na         •–◦/•◦•
     SKG     ∗          ∗          –          –          –          ∗          ◦          ∗          –
     RCG     –          –          ∗          –          –          –          •          –          ∗
     OQG     •          –          na         na         •          •          •          •          na
     GOG     ∗          –          •          –          –          na         ◦          ∗          –
     T4G     •          –          ◦          ◦          –          ∗          •          ∗          –
     IC      •          –          •          –          –          •          ∗          ◦          •

B06  dur.    ∗          ◦          ◦          –          ◦          •          ◦          ∗          ◦
     F opp.  ∗/–/–      –/–/–      ◦/◦/–      ∗/∗/–      –/–/–      •/◦/•      •/•/–      •/∗/–      •/–/–
     F nl.   •∗∗/•••    •∗∗/◦•∗    •••/•••    •••/•••    •••/•••    ◦–•/••◦    ••–/◦••    •••/•••    •–∗/•∗•
     SKG     –          •          •          •          –          –          •          –          •
     RCG     ∗          •          –          –          –          ◦          •          –          ◦
     OQG     •          –          –          –          –          –          •          –          –
     GOG     •          •          •          •          ∗          •          –          ∗          •
     T4G     –          ∗          –          ◦          –          –          –          ∗          ∗
     IC      •          ◦          ◦          –          ∗          –          •          –          –

B07  dur.    ◦          •          ∗          –          ◦          ∗          ◦          ◦          –
     F opp.  ◦/–/–      –/–/–      •/•/◦      •/•/◦      –/–/–      •/•/•      •/–/∗      na         •/–/–
     F nl.   –◦•/••∗    •∗•/•••    –•∗/•••    –•∗/•••    •••/•••    •••/•◦•    •••/∗◦–    na         ∗∗∗/•∗•
     SKG     •          –          –          –          ∗          –          ◦          •          –
     RCG     –          –          •          ∗          ◦          ∗          •          •          –
     OQG     –          na         •          •          ∗          –          ∗          –          ◦
     GOG     •          •          •          •          –          –          •          •          •
     T4G     ◦          –          •          •          –          ∗          •          ∗          –
     IC      •          –          •          •          ∗          ◦          ∗          ∗          ∗

B08  dur.    ◦          ◦          •          –          •          ◦          ◦          •          •
     F opp.  •/–/–      –/–/–      •/◦/–      ∗/∗/–      –/–/–      ∗/–/–      •/•/–      na         ◦/–/–
     F nl.   •••/•••    •••/•••    ∗••/•••    ∗••/•••    •••/•••    •••/••◦    •••/•◦◦    na         •––/•–•
     SKG     •          –          –          –          –          •          –          ◦          –
     RCG     –          –          ◦          –          ◦          •          ◦          ◦          –
     OQG     •          ◦          •          •          •          ◦          •          na         •
     GOG     –          ◦          –          –          –          na         •          ◦          na
     T4G     –          –          –          –          –          •          •          –          –
     IC      •          –          •          ◦          •          ◦          •          –          •

C01  dur.    ∗          ◦          ◦          –          ◦          •          ∗          ◦          ◦
     F opp.  •/–/–      –/–/–      •/∗/∗      ◦/∗/–      –/–/–      •/•/∗      •/•/–      •/–/–      ◦/–/–
     F nl.   ∗••/•••    •••/••◦    –∗•/◦••    –∗•/•••    •••/◦••    ∗∗–/•••    ••∗/•••    –•◦/•••    –∗•/•••
     SKG     •          –          •          ∗          –          ∗          –          •          •
     RCG     –          –          •          ◦          –          –          ∗          •          •
     OQG     ◦          –          •          •          •          ◦          •          •          •
     GOG     –          ∗          ∗          –          ◦          na         –          na         na
     T4G     –          –          –          –          –          –          •          •          •
     IC      ◦          –          •          ∗          –          •          –          •          •

C02  dur.    ∗          ◦          –          –          ∗          •          ◦          ◦          ∗
     F opp.  •/–/∗      –/∗/–      •/∗/–      ◦/–/–      –/∗/–      •/◦/•      •/◦/–      –/–/–      ∗/–/–
     F nl.   •••/•••    •••/•••    •••/•••    •••/•••    •••/•••    ••◦/•••    •∗∗/∗••    •••/•••    •••/•••
     SKG     ∗          –          –          ◦          •          ∗          •          •          –
     RCG     –          –          –          •          •          •          ◦          •          –
     OQG     •          –          ◦          •          •          –          ◦          –          ∗
     GOG     –          –          •          –          •          –          ∗          ∗          ∗
     T4G     –          –          –          –          –          –          ∗          •          –
     IC      •          –          ◦          ◦          –          –          •          –          •
Table A.39: Summary of the acoustic measurements for each speaker. The rows “dur.” show the
realisation of a distinction in vowel quantity (duration). A • marks a highly significant difference
at the 0.001 level, a ◦ a difference at the 0.01 level and a ∗ a difference at the 0.05 level. The
row “F opp.” shows the realisation of a distinction in the three formant values and “F nl.” shows
the native-likeness of the two vowels of the respective column. The remaining rows show the
realisation of a distinction in the various measured voice quality parameters (note that the voice
quality results might be distorted).
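The symbol coding described in the caption can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis: the function names `significance_mark` and `formant_cell` are hypothetical, the strict "less than" comparison is an assumed convention, and reading "–" as "no significant difference at the 0.05 level" is an assumption, since the caption does not define that symbol.

```python
def significance_mark(p_value: float) -> str:
    """Map a p-value to the table symbol for the smallest level it reaches."""
    if p_value < 0.001:
        return "•"   # highly significant, 0.001 level
    if p_value < 0.01:
        return "◦"   # 0.01 level
    if p_value < 0.05:
        return "∗"   # 0.05 level
    return "–"       # assumption: not significant at the 0.05 level


def formant_cell(p_values) -> str:
    """Join the marks of three formant tests in the a/b/c layout of row 'F opp.'."""
    return "/".join(significance_mark(p) for p in p_values)


print(significance_mark(0.0004))          # •
print(formant_cell([0.0004, 0.03, 0.2]))  # •/∗/–
```

The p-values in the usage lines are made up for illustration and do not come from the measurements.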
Appendix B
Wordlists
Word list A:
Schiff, stellen, Bühne, Stadt, muss, offen, steht, spuken, Gewalt, Hölle, dehnen, fühle, gewöhne, Geld, gezackt, Köter,
dämlich, kippen, zwölf, Spott, Lawine, Töchter, vertuschen, Beet, Gespött, kaputt, Stall, Verstoß, Schutt, Stahl,
dünsten, Tuch, egal, Pollen, dumm, Staat, tief, Häftling, Laden, Blume, Wohl, Täler, Danke, zerstößt, Blümchen,
erspäht, böse, Bogen, Bude, Gepäck, höflich, dünn, Ofen, beginnen, Gestüt, Draht, Diebe, dem, Stück, Höhle, Tochter,
stehlen, gönnen, Miete, Frosch, Bett, Pappe, Donner, Mitte, Teller, gewönne, Polen, Dänen, Tisch, gebuhlt, Hütte,
biete, Steg, Hüte, Hof, stählen, Bitte, spucken, Düse, schief, gewählt, Mus, Fülle
Word list B:
Bett, dünsten, gezackt, steht, Schutt, zerstößt, Düse, vertuschen, dehnen, fühle, gewönne, Draht, Bogen, kaputt,
Mitte, Steg, Blümchen, egal, erspäht, Hölle, Tochter, Diebe, böse, Gestüt, Verstoß, stehlen, Lawine, dünn, Häftling,
Frosch, gebuhlt, gönnen, Gewalt, Bude, dem, Fülle, Stall, Dänen, Polen, Schiff, Laden, Gespött, stählen, Hüte, schief,
zwölf, dämlich, Tuch, Danke, Gepäck, kippen, muss, stellen, Köter, Staat, Stück, Geld, Ofen, Miete, Stadt, Blume,
Wohl, Hütte, Höhle, dumm, Bitte, höflich, offen, Stahl, spuken, gewählt, Pollen, Tisch, Teller, tief, Spott, Mus, Täler,
beginnen, Pappe, Töchter, Donner, Bühne, spucken, gewöhne, Beet, Hof, biete
Word list C:
zerstößt, gewählt, Diebe, Höhle, kaputt, Verstoß, Fülle, gewöhne, Staat, spucken, Geld, beginnen, dumm, Häftling,
Köter, Mus, Dänen, Draht, Hölle, dämlich, Danke, Wohl, Bett, Stück, Gespött, Steg, spuken, tief, höflich, Teller,
kippen, gebuhlt, Pollen, Hüte, dem, Donner, Hütte, Täler, Lawine, stellen, Hof, Schutt, Bitte, fühle, Frosch, Tisch,
Blümchen, zwölf, Gepäck, Polen, Stahl, Bühne, Spott, vertuschen, gönnen, Ofen, dünsten, muss, steht, schief, erspäht,
Pappe, Tochter, Miete, Tuch, Bogen, Schiff, dünn, offen, Gewalt, Gestüt, Blume, Stall, egal, biete, Bude, dehnen,
Mitte, gewönne, stehlen, Laden, böse, stählen, Stadt, Düse, Töchter, gezackt, Beet
Word list D:
kippen, Gestüt, Blume, Gewalt, höflich, Bett, Frosch, Tuch, gewählt, Spott, tief, gönnen, Stahl, Steg, schief, gebuhlt,
dämlich, Miete, böse, dünsten, Lawine, Teller, Wohl, Draht, dem, dumm, biete, gewönne, Hüte, Staat, spucken,
Gespött, fühle, muss, stellen, Hof, Düse, Stall, Tochter, Köter, stehlen, Mitte, Hütte, erspäht, Pollen, Bude, Diebe,
egal, Bogen, Töchter, Stück, Danke, Donner, Täler, Blümchen, zwölf, Schiff, Fülle, Schutt, Bitte, Hölle, Laden,
stählen, Ofen, spuken, Stadt, Verstoß, dehnen, gezackt, beginnen, Häftling, Höhle, Pappe, Beet, Bühne, kaputt,
Dänen, gewöhne, Mus, Polen, steht, dünn, Tisch, Gepäck, offen, zerstößt, Geld, vertuschen
Er hat Schiff gesagt.
Er hat dünsten gesagt.
Er hat Tochter gesagt.
Er hat stellen gesagt.
Er hat Tuch gesagt.
Er hat stehlen gesagt.
Er hat Bühne gesagt.
Er hat egal gesagt.
Er hat gönnen gesagt.
Er hat Stadt gesagt.
Er hat Pollen gesagt.
Er hat muss gesagt.
Er hat dumm gesagt.
Er hat offen gesagt.
Er hat Staat gesagt.
Er hat steht gesagt.
Er hat tief gesagt.
Er hat spuken gesagt.
Er hat Häftling gesagt.
Er hat Gewalt gesagt.
Er hat Laden gesagt.
Er hat Hölle gesagt.
Er hat Blume gesagt.
Er hat dehnen gesagt.
Er hat Wohl gesagt.
Er hat fühle gesagt.
Er hat Täler gesagt.
Er hat gewöhne gesagt.
Er hat Danke gesagt.
Er hat Polen gesagt.
Er hat Geld gesagt.
Er hat zerstößt gesagt.
Er hat Dänen gesagt.
Er hat gezackt gesagt.
Er hat Blümchen gesagt.
Er hat Tisch gesagt.
Er hat Köter gesagt.
Er hat erspäht gesagt.
Er hat gebuhlt gesagt.
Er hat dämlich gesagt.
Er hat böse gesagt.
Er hat Hütte gesagt.
Er hat kippen gesagt.
Er hat Bogen gesagt.
Er hat biete gesagt.
Er hat zwölf gesagt.
Er hat Bude gesagt.
Er hat Spott gesagt.
Er hat Gepäck gesagt.
Er hat Lawine gesagt.
Er hat höflich gesagt.
Er hat Töchter gesagt.
Er hat dünn gesagt.
Er hat vertuschen gesagt.
Er hat Ofen gesagt.
Er hat Beet gesagt.
Er hat beginnen gesagt.
Er hat Gespött gesagt.
Er hat Gestüt gesagt.
Er hat kaputt gesagt.
Er hat Draht gesagt.
Er hat Stall gesagt.
Er hat Diebe gesagt.
Er hat Verstoß gesagt.
Er hat dem gesagt.
Er hat gewählt gesagt.
Er hat Schutt gesagt.
Er hat Stück gesagt.
Er hat Mus gesagt.
Er hat Stahl gesagt.
Er hat Höhle gesagt.
Er hat Fülle gesagt.
Er hat Miete gesagt.
Er hat Frosch gesagt.
Er hat Bett gesagt.
Er hat Pappe gesagt.
Er hat Donner gesagt.
Er hat Mitte gesagt.
Er hat Teller gesagt.
Er hat gewönne gesagt.
Er hat Steg gesagt.
Er hat Hüte gesagt.
Er hat Hof gesagt.
Er hat stählen gesagt.
Er hat Bitte gesagt.
Er hat spucken gesagt.
Er hat Düse gesagt.
Er hat schief gesagt.
Figure B.1: Printed version of word list A, as handed out to the speakers
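The printed version embeds every item of a word list in the carrier sentence "Er hat … gesagt." As a minimal sketch of that construction (the function name `carrier_sentences` is hypothetical, and how the printed sheets were actually produced is not described here), the sentences can be generated from a list in Python:

```python
CARRIER = "Er hat {} gesagt."


def carrier_sentences(words):
    """Embed each word in the carrier sentence, preserving the list order."""
    return [CARRIER.format(word) for word in words]


# First five items of word list A, used here only as sample input.
word_list_a_head = ["Schiff", "stellen", "Bühne", "Stadt", "muss"]

for sentence in carrier_sentences(word_list_a_head):
    print(sentence)
```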
Bibliography
Handbook of the International Phonetic Association. Cambridge University Press, 1999.
Adank, Patti, Roel Smits, and Roeland van Hout. A comparison of vowel normalization procedures
for language variation research. Journal of the Acoustical Society of America, 116(5):3099–3107,
2004.
Altenberg, Evelyn P. The judgment, perception, and production of consonant clusters in a second
language. IRAL – International Review of Applied Linguistics in Language Teaching, 43:53–80,
2005.
Anderson-Hsieh, Janet, Ruth Johnson, and Kenneth Koehler. The relationship between native
speaker judgments of nonnative pronunciation and deviance in segmentals, prosody and syllable
structure. Language Learning, 42(4):529–555, 1992.
Asher, James J. and Ramiro García. The optimal age to learn a foreign language. The Modern
Language Journal, 53(5):334–341, 1969.
Bakran, Juraj. Zvučna slika hrvatskog govora. Ibis grafika, Zagreb, 1996.
Becker, Thomas. Das Vokalsystem der deutschen Standardsprache. Peter Lang, 1998.
Beddor, Patrice Speeter and Terry L. Gottfried. Methodological issues in cross-language speech
perception research with adults. In: Strange (1995).
Best, Catherine T. A direct realist view of cross-language speech perception. In: Strange (1995).
Birdsong, David (editor). Second Language Acquisition and the Critical Period Hypothesis. Mahwah:
Lawrence Erlbaum Assoc., 1999.
Birdsong, David and Michelle Molis. On the evidence of maturational constraints in second-language
acquisition. Journal of Memory and Language, 44:235–249, 2001.
Bongaerts, Theo. Ultimate attainment in L2 pronunciation: The case of very advanced late L2
learners. In: Birdsong (1999).
Bongaerts, Theo. Introduction: Ultimate attainment and the critical period hypothesis for second
language acquisition. IRAL – International Review of Applied Linguistics in Language Teaching,
43:259–267, 2005.
Brennan, Eileen M., Ellen B. Ryan, and William E. Dawson. Scaling of apparent accentedness
by magnitude estimation and sensory modality matching. Journal of Psycholinguistic Research,
4(1):27–36, 1975.
Brière, Eugène J. An investigation of phonological interference. Language, 42(4):768–796, 1966.
Chomsky, Noam. The formal nature of language. In: Lenneberg (1967), chapter Appendix A.
Claßen, Kathrin, Grzegorz Dogil, Michael Jessen, Krzysztof Marasek, and Wolfgang Wokurek.
Stimmqualität und Wortbetonung im Deutschen. Linguistische Berichte, 174:202–245, 1998.
Clark, John and Colin Yallop. An introduction to phonetics and phonology. Blackwell, second
edition, 1995.
Flege, James Emil. A critical period for learning to pronounce foreign languages? Applied Linguistics, 8:162–177, 1987a.
Flege, James Emil. The instrumental study of L2 speech production: Some methodological considerations. Language Learning, 37(2):285–296, 1987b.
Flege, James Emil. Production and perception of a novel, second-language phonetic contrast. Journal
of the Acoustical Society of America, 93(3):1589–1608, 1993.
Flege, James Emil. Second language speech learning: Theory, findings and problems. In: Strange
(1995).
Flege, James Emil. Age of learning and second language speech. In: Birdsong (1999).
Flege, James Emil and Kathryn L. Fletcher. Talker and listener effects on degree of perceived foreign
accent. Journal of the Acoustical Society of America, 91(1):370–389, 1992.
Flege, James Emil, Elaina M. Frieda, and Takeshi Nozawa. Amount of native-language (L1) use
affects the pronunciation of an L2. Journal of Phonetics, 25:169–186, 1997.
Flege, James Emil and James Hillenbrand. Limits on phonetic accuracy in foreign language speech
production. Journal of the Acoustical Society of America, 76(3):706–721, 1984.
Flege, James Emil, M.J. Munro, and I. MacKay. Factors affecting strength of perceived foreign
accent in a second language. Journal of the Acoustical Society of America, 97(5):3125–3134, 1995.
Flege, James Emil, Grace H. Yeni-Komshian, and Serena Liu. Age constraints on second-language
acquisition. Journal of Memory and Language, 41:78–104, 1999.
Guion, Susan G., James Emil Flege, and Jonathan D. Loftin. The effect of L1 use on pronunciation
in Quichua-Spanish bilinguals. Journal of Phonetics, 28:27–42, 2000.
Ioup, Georgette. Is there a structural foreign accent? A comparison of syntactic and phonological
errors in second language acquisition. Language Learning, 34(2):1–17, 1984.
Iverson, Paul, Patricia K. Kuhl, Reiko Akahane-Yamada, Eugen Diesch, Yoh’ichi Tohkura, Andreas
Kettermann, and Claudia Siebert. A perceptual interference account of acquisition difficulties for
non-native phonemes. Cognition, 87:B47–B57, 2003.
Jilka, Matthias. The Contribution of Intonation to the Perception of Foreign Accent. Fakultät für
Philosophie der Universität Stuttgart, 2000. Doctoral Dissertation.
Kohler, Klaus J. Einführung in die Phonetik des Deutschen. Grundlagen der Germanistik 20. Erich
Schmidt Verlag, Berlin, 1977.
Kuhl, Patricia K. and Paul Iverson. Linguistic experience and the “Perceptual Magnet Effect”. In:
Strange (1995).
Lenneberg, Eric H. Biological foundations of language. John Wiley & Sons, Inc., 1967.
Levi, Susannah V., Stephen J. Winters, and David B. Pisoni. Speaker-independent factors affecting
the perception of foreign accent in a second language. Journal of the Acoustical Society of America,
121(4):2327–2338, 2007.
Li, Wei. Dimensions of bilingualism. In: Wei Li (editor), The bilingualism reader. Routledge, 2005.
Lobanov, B. M. Classification of Russian vowels spoken by different speakers. Journal of the Acoustical
Society of America, 49(2):606–608, 1971.
Long, Michael H. Maturational constraints on language development. Studies in Second Language
Acquisition, 12(3):251–281, 1990.
Long, Mike. Problems with supposed counter-evidence to the critical period hypothesis. IRAL –
International Review of Applied Linguistics in Language Teaching, 43:287–317, 2005.
Mack, Molly. Consonant and vowel perception and production: Early English-French bilinguals and
English monolinguals. Perception & Psychophysics, 46(2):187–200, 1989.
Majewski, Wojciech and Harry Hollien. Formant frequency regions of Polish vowels. Journal of the
Acoustical Society of America, 42(5):1031–1037, 1967.
Major, Roy Coleman. Foreign Accent: The ontogeny and phylogeny of second language phonology.
Lawrence Erlbaum Associates, 2001.
Meisel, Jürgen M. Principles of universal grammar and strategies of language use: On some similarities and differences between first and second language acquisition. In: Lynn Eubank (editor),
Point counterpoint: universal grammar in the second language. John Benjamins Publishing Company, 1991.
Mildner, Vesna and Damir Horga. Relations between second language proficiency and formant-defined
vowel space. In: Proceedings of the XIVth International Congress of Phonetic Sciences
(ICPhS99), pages 1455–1458. San Francisco, 1999.
Novoa, Loriana, Deborah Fein, and Loraine K. Obler. Talent in foreign languages: A case study.
In: Loraine K. Obler and Deborah Fein (editors), The Exceptional Brain: The Neuropsychology
of Talent and Special Abilities. The Guilford Press, 1988.
Ortmann, Wolf Dieter (editor). Lernschwierigkeiten in der deutschen Aussprache. Goethe-Institut,
München, 1976. Parts 1-3.
Piske, Thorsten, Ian R. A. MacKay, and James E. Flege. Factors affecting degree of foreign accent
in an L2: a review. Journal of Phonetics, 29:191–215, 2001.
Ramers, Karl Heinz. Vokalquantität und -qualität im Deutschen. Linguistische Arbeiten 213.
Niemeyer, Tübingen, 1988.
Scovel, Tom. Foreign accents, language acquisition and cerebral dominance. Language Learning,
19:245–253, 1969.
Seliger, Herbert W. and Robert M. Vago. The study of first language attrition: an overview. In:
Herbert W. Seliger and Robert M. Vago (editors), First language attrition. Cambridge University
Press, 1991.
Selinker, Larry. Interlanguage. IRAL – International Review of Applied Linguistics in Language
Teaching, 10(3):209–231, 1972.
Sendlmeier, Walter F. and Julia Seebode. Formantkarten des deutschen Vokalsystems. 2007. URL
http://www.kgw.tu-berlin.de/forschung/Formantkarten. [accessed 11th October 2007].
Southwood, M. Helen and James Emil Flege. Scaling foreign accent: direct magnitude estimation
versus interval scaling. Clinical Linguistics & Phonetics, 13(5):335–349, 1999.
Strange, Winifred (editor). Speech Perception and Linguistic Experience: Theoretical and Methodological Issues. MD. York Press, 1995.
Tröster-Mutz, Stefan. Die Realisierung von Vokallängen: erlaubt ist, was Sp[a(:)]ß macht? SKY
Journal of Linguistics, 17:249–265, 2004. URL http://www.ling.helsinki.fi/sky/julkaisut/
SKY2004/Tr%F6ster-Mutz.pdf.
White, Lydia. Universal Grammar and second language acquisition. John Benjamins Publishing
Company, 1989.
Wängler, Hans-Heinrich. Atlas deutscher Sprachlaute. Akademie-Verlag, Berlin, 4th edition, 1968.
Wode, Henning. Phonology in L2 acquisition. In: Sascha W. Felix (editor), Second language development: Trends and Issues, pages 123–136. Narr, Tübingen, 1980.
Woods, Anthony, Paul Fletcher, and Arthur Hughes. Statistics in language studies. Cambridge
University Press, 1986.
Yamada, Reiko A. Age and acquisition of second language speech sounds: Perception of American
English /ɹ/ and /l/ by native speakers of Japanese. In: Strange (1995).