Gemination in Hungarian loanword adaptation

Lilla Magyar
Generals Paper
MIT
Abstract
An interesting puzzle of Hungarian phonology is gemination in recent loanwords borrowed from English,
German (and occasionally, from French). A consonant following a short stressed vowel is often geminated
in the loanword, even if the consonant doubling does not have an orthographic reflex in the source word,
i.e. the consonant in question is not spelt as a double letter in the source word. Unlike similar phenomena
in other languages (such as Japanese, Finnish or Telugu), this is a gradient phenomenon involving considerable inter- and intra-speaker variation, and seemingly there is nothing in native Hungarian phonotactics
that would require such a process to happen, as both singleton and geminate consonants are equally acceptable in all the contexts where loanword gemination occurs. Gemination depends on context: it
never applies when the consonant is preceded by a long vowel and is most productive in monosyllables
word-finally. Apart from the influence of context, the propensity of consonants to undergo gemination
also varies by consonant class: voiceless consonants are more likely to be geminated in loanwords than
voiced ones, which reflects universal hierarchies of geminate markedness. In this paper, we will show
that loanword gemination may not be as mysterious as it seems at first sight: it is a process regulated
by faithfulness (to vowel length in source words) and universal markedness (of gemination - which is an
important factor in native Hungarian phonotactics as well). Finally, we will present a Maximum Entropy
model to account for this phenomenon.
1 Introduction
Gemination in loanwords is a cross-linguistically widespread phenomenon: a singleton consonant in the
source word is geminated in the loanword, even if the doubling does not have an orthographic reflex (that is,
the geminated consonant is spelt with a single consonant letter in the source word). The source languages from
which these words are borrowed do not allow phonetic geminates. Some examples of gemination in loanwords
are listed below:
• Japanese: [kat:o] ‘cut’ (Kubozono et al. (2008))
• Italian: [fan:] ‘fan’ (Passino (2004))
• Kannada: [kap:u] ‘cup’ (Sridhar (1990))
• Telugu: [ro:d:u] ‘road’ (Krishnamurti and Gwynn (1985))
• Finnish: [pop:i] ‘pop’ (Karvonen (2009))
• Hungarian: [sok:] ‘shock’ (Nádasdy (1989), Kertész (2006))
In Japanese, Kannada, Telugu and Finnish, the final consonant of the source word is geminated, and a vowel
is inserted after the final consonant. Loanword gemination in Italian and Hungarian is slightly different
from what we see in other languages. In Italian, the final consonant of monosyllabic words is geminated.
In Hungarian, however, there are different contexts for gemination in loanwords, but it is most common in
word-final position in monosyllabic words.
2 Loanword gemination in Hungarian
In ‘recent’ (from 1750 onwards) Hungarian loanwords borrowed from English, German and French, short
consonants following a short (usually stressed) vowel in the source word are regularly geminated in the
loanword, even if the source word is not spelt with a double consonant letter (Nádasdy (1989), Kertész
(2006)).
I am grateful to Adam Albright, Donca Steriade, Edward Flemming, Michael Kenstowicz, Martin Hackl, Miklós Törkenczy,
Szilárd Szentgyörgyi, Sudheer Kolachina, Csaba Oravecz, Katalin Mády, Tekla Etelka Gráczi, audiences at Phonology Circle
at MIT and the 22nd Manchester Phonology Meeting, as well as all the participants in my experiments. All remaining errors
and inaccuracies are attributed to me alone.
2.1 Gemination by orthographic influence
Gemination in Hungarian loanwords can occur regardless of the influence of orthography. Some of the source
words contain a double consonant letter in the spelling, while others are spelt with a single letter. Some of
the following examples were also cited by Nádasdy (1989).1
2.1.1 Orthographic geminate in the source word
In the words shown in Table 1, the consonants geminated in the loanwords are spelt with a double consonant
letter in the source words.2
Source word          Loanword      IPA             Gloss
Hall (G), hall (E)   hall          [hOl:]          ‘hall’
Koffer (G)           koffer        [kof:ER]        ‘suitcase’
hobby (E)            hobbi         [hob:i]         ‘hobby’
hippy (E)            hippi         [hip:i]         ‘hippy’
bluff (E)            blöff         [bløf:]         ‘bluff’
Stopper (G)          stopper       [Stop:Er]       ‘stopwatch’
nett (G)             nett          [nEt:]          ‘neat and tidy’
babysitter (E)       bébiszitter   [be:bisit:Er]   ‘babysitter’
dollar (E)           dollár        [dol:a:r]       ‘dollar’
shopping (E)         shopping      [Sop:iNg]       ‘shopping’
masseur (F)          masszőr       [mOs:ør]        ‘masseur’
roller (E)           roller        [rol:Er]        ‘roller’
rapper (E)           rapper        [rEp:Er]        ‘rapper’
Presser (G)          Presszer      [prEs:Er]       <family name>
Ritter (G)           Ritter        [rit:Er]        <family name>
Popper (G)           Popper        [pop:Er]        <family name>
Gösser (G)           Gösser        [gös:Er]        <beer brand>
Table 1: Double consonant letters in the source word
1 A list of loanwords and foreign names borrowed into Hungarian potentially or actually undergoing loan gemination is found
in Appendix IV. It is my collection, which also contains words mentioned in Nádasdy (1989), including examples ranging from
older, more established loanwords to very recent loans, foreign names or brands.
2 In many cases, the meaning of the source word and that of the loanword are not exactly the same. Here, ‘gloss’ represents the meaning
of the loanword as it is used in Hungarian.
2.1.2 No orthographic geminate in the source word
In the words shown in Table 2, the source word is spelt with either a single consonant letter or a digraph.
The % sign means that there is inter-speaker variation: both forms (ending in a singleton and a geminate)
are acceptable.
Source word           Loanword   IPA          Gloss
fit (E)               fitt       [fit:]       ‘fit’
set (E)               szett      [sEt:]       ‘outfit’
cheque (F)            csekk      [tSEk:]      ‘cheque’
tip (E)               tipp       [tip:]       ‘tip, idea’
Wecker (G)            vekker     [vEk:Er]     ‘alarm clock’
clip (E)              klip       [klip:]      ‘video clip’
frisch (G)            friss      [friS:]      ‘fresh’
fesch (G)             fess       [fES:]       ‘handsome’
Kitsch (G)            giccs      [git:S]      ‘kitsch’
club (E) / Klub (G)   klub       % [klub:]    ‘club’
Jam (E / G)           dzsem      % [dZEm:]    ‘marmalade’
chip (E)              chip       [tSip:]      ‘chip’
step (E)              sztep      [stEp:]      ‘step dance’
Witz (G)              vicc       [vit:s]      ‘joke’
choc (F)              sokk       [sok:]       ‘shock’
Putsch (G)            puccs      [put:S]      ‘coup’
chic (F)              sikk       [sik:]       ‘stylishness’
Table 2: Single consonant letter or digraph spelling in the source word
2.2 Gemination by position in the word
Gemination in loanwords also depends on position within the word. It is very frequent in word-final position
in monosyllables, regardless of spelling, and fairly common in intervocalic position in polysyllables as long as
the consonant in question is spelt with a double letter or a digraph.3 Word-final consonants in polysyllabic
words are geminated quite frequently when the consonant in question is spelt with a double letter or a
digraph, but the gemination of consonants which are spelt with a single letter in the same context is
extremely rare. Gemination in loanwords never occurs after long vowels (in the source word or the loanword).
These contextual restrictions are shown in Table 3.
3 Interestingly, in the case of digraphs, gemination happens almost exclusively in words ending in -er. This might
be due to a paradigm uniformity effect (stem + -er), as native speakers of Hungarian seem to treat -er as a suffix. Alternatively, it
may reflect something other than paradigm uniformity: -er suffixation is a fairly common recent process among
younger speakers of Hungarian, and the consonant preceding the suffix is regularly geminated (e.g. from the word nyugdíjas
[ñugdi:jOS] ‘retired person’ it is possible to coin nyugger [ñug:Er] ‘(an obnoxious or opinionated) retired person’).
Position \ Spelling   double letter   digraph         single letter
Monosyll., VC#        frequent        frequent        frequent
Polysyll., VCV        frequent        less frequent   rare
Polysyll., VC#        frequent        less frequent   rare
after long V          never           never           never
Table 3: Gemination by position
2.3 Gemination by consonant class
There are further restrictions on gemination in loanwords which interact with consonant classes and individual consonants. Some consonant classes and individual consonants are more likely to undergo gemination
than others.
2.3.1 Voiceless stops
Voiceless stops undergo gemination in loanwords regularly. In word-final position in monosyllables, all
voiceless stops are geminated very regularly, even if the geminated consonant is spelt with a single letter
or a digraph in the source word. The consonants [p] and [t] are geminated in intervocalic position following
a stressed vowel primarily when there is a double letter in the source word spelling, whereas [k] is
geminated regularly when there is a digraph in the spelling (but only when the word ends in -er). In
word-final position in polysyllables, [p] is very rarely geminated, whereas gemination is more frequent in
the case of [t] and [k]. This is summarised in Table 4.
Context          Consonant   Pattern and examples
Monosyll., VC#   [p]         more frequent following some vowels than others: e.g. tip [tip:] vs. top [top]
Monosyll., VC#   [t]         e.g. set [sEt:], fit [fit:], suite [svit:], Brit [brit:]
Monosyll., VC#   [k]         e.g. choc [Sok:], chic [Sik:], rock [rok:], Jacques [ZOk:]
Polysyll., VCV   [p]         mostly double letter spelling: e.g. hippy [hip:i], rapper [rEp:Er]; one counterexample: doping [dop:i:Ng] (compensatory lengthening)
Polysyll., VCV   [t]         mostly double letter spelling: e.g. setter [sEt:Er], Betty [bEt:i]; BUT: sweater [svEt:Er]
Polysyll., VCV   [k]         e.g. Wecker [vEkkEr], rocker [rok:Er], Black and Decker [blEk:EndEk:Er]
Polysyll., VC#   [p]         very rare: Galopp [gOlop:]
Polysyll., VC#   [t]         e.g. cricket [krikEt:], ballet [bOlEt:], toilet [toOlEt:]
Polysyll., VC#   [k]         e.g. baroque [bOrok:]
Table 4: Gemination of voiceless stops in loanwords
2.3.2 Voiceless affricates
Just like voiceless stops, voiceless affricates are very often geminated in loanwords, both in word-final position
in monosyllabic words and in intervocalic position. There are no recent loanwords which contain
affricates in word-final position in polysyllabic words. This is summarised in Table 5.
Context          [ts]                                                    [tS]
Monosyll., VC#   e.g. Witz [vit:s], spritz(en) [Sprit:s], Hetz [hEt:s]   e.g. match [mEt:S], touch [tOt:S], Putsch [put:S]
Polysyll., VCV   e.g. Sitzer [zit:sEr], Kratzer [krOt:zEr]               not many data, e.g. Gletscher
Polysyll., VC#   no data                                                 no data
Table 5: Gemination of voiceless affricates in loanwords
2.3.3 Voiceless fricatives
Geminated voiceless fricatives are also common in loanwords. In word-final position in monosyllables,
[f] is only geminated when it is spelt with a double consonant letter, while [S], [s] and [x] are frequently
geminated even if they are spelt with a single consonant letter or a digraph. In intervocalic position, [f]
is only geminated when the source word is spelt with a double consonant letter or was preceded by a
stressed vowel in a language which has post-tonic lengthening, and even in these cases, it can be pronounced
short. Intervocalic [S] does not seem to undergo gemination in loanwords, but we do not have much data.
Intervocalic [s] only seems to geminate when the source word contains a double consonant letter in the
spelling, while [x] is always spelt with a ch in source words and is very frequently geminated. There are no
data about word-final voiceless fricatives in polysyllabic words. This is summarised in Table 6 below.
Context          Consonant   Pattern and examples
Monosyll., VC#   [f]         double letter spelling: e.g. Treff [trEf:], bluff [bløf:]
Monosyll., VC#   [S]         e.g. couche [kuS:], plush [plyS:], Bush [buS:]
Monosyll., VC#   [s]         e.g. plus [plus:], stressz [StrEs:]
Monosyll., VC#   [x]         Krach [krOx:], Pech [pEx:], Bach [bOx:]
Polysyll., VCV   [f]         double letter or post-tonic lengthening in SL: e.g. Koffer % [kof:Er], mafia % [mOf:iO]
Polysyll., VCV   [S]         not many data, no gemination: Fischer [fiSEr]
Polysyll., VCV   [s]         double letter in spelling: dessert [dEs:Ert], Presser [prEs:Er]
Polysyll., VCV   [x]         only names ending in -er: e.g. Pacher [pOx:Er]
Polysyll., VC#   [f]         no data
Polysyll., VC#   [S]         no data
Polysyll., VC#   [s]         no data
Polysyll., VC#   [x]         no data
Table 6: Gemination of voiceless fricatives in loanwords
2.3.4 Voiced stops
Voiced stops hardly ever undergo gemination in loanwords. There is one example of a monosyllable
ending in [b] spelt with a single consonant letter in the source word (as well as in the loanword)
which is pronounced as a geminate by most speakers. We can find a few examples for gemination
in intervocalic position, but in all of the cases, the consonant is spelt with a double consonant letter in
the source word. There are no data on word-final voiced stops in polysyllabic words. This is shown in Table 7.
Context          [b]                                            [d]                                          [g]
Monosyll., VC#   rare, e.g. club [klub:]                        no evidence                                  no evidence
Polysyll., VCV   rare, only double letter, e.g. hobby [hob:i]   rare, only double letter, Yiddish [jid:iS]   no data
Polysyll., VC#   no data                                        no data                                      no data
Table 7: Gemination of voiced stops in loanwords
2.3.5 Voiced fricatives
Table 8 shows that there are hardly any recent loans from English, German, and French which have a
short vowel followed by a voiced fricative. There is only one example, in which the word-final consonant
is geminated and devoiced by most (especially older or more conservative) speakers, although several younger
speakers pronounce it with a long [z].
Context          [v]       [z]                                                                  [Z]
Monosyll., VC#   no data   one example: jazz [dZEs:], but for many younger speakers: [dZEz:]   no data
Polysyll., VCV   no data   no data                                                              no data
Polysyll., VC#   no data   no data                                                              no data
Table 8: Gemination of voiced fricatives in loanwords
2.3.6 Nasals
Nasals undergo gemination in loanwords fairly rarely. Occasionally, in word-final position, both [m] and [n]
can be pronounced long even if the consonant is spelt with a single letter in the source word. There are
not many examples for intervocalic [m] and it seems to be geminated only when there is an orthographic
geminate in the source word. There are no source words in which [n] is spelt with a single consonant letter
and is pronounced long in the loanword, and even when there is an orthographic geminate in the source
word, [n] can be pronounced short in the loanword. There are no examples which contain word-final [m]
and [n] in polysyllabic words. This is summarised in Table 9.
Context          [m]                                      [n]
Monosyll., VC#   e.g. jam [dZEm:], slam [slEm:]           e.g. gin [dZin:]
Polysyll., VCV   not many examples: e.g. shimmy [Sim:i]   no evidence (kennel [kEn:El] or [kEnEl])
Polysyll., VC#   no data                                  no data
Table 9: Gemination of nasals in loanwords
2.3.7 Liquids
The consonant [l] undergoes gemination quite often, but all the examples contain orthographic geminates.
The consonant [r] does not lengthen in loanwords unless there is a double consonant letter in the source
word spelling. This is shown in Table 10.
Context          [l]                                                [r]
Monosyll., VC#   double letter in spelling: Hall / hall [hOl:]      no evidence
Polysyll., VCV   double letter in spelling: e.g. dollar [dol:a:r]   possible in the case of double letter: Harry [hEr:i]
Polysyll., VC#   e.g. model [modEl:], cartel [kOrtEl:]              no data
Table 10: Gemination of liquids in loanwords
2.4 Gemination in monosyllabic words without orthographic reflex in the source word
In more recent loanwords, gemination is not productive in intervocalic position unless the source word is
spelt with a double consonant letter. However, it is still fairly productive in monosyllabic loanwords, even
without orthographic reflex. Since gemination in monosyllables spelt with single consonant letters is the
most widespread and the most intriguing issue in connection with the whole phenomenon, discussion in this
paper will be restricted to these cases.
The following table shows the hierarchy of gemination in monosyllabic loans without orthographic reflex:
Consonant class        Gemination in loans
Voiceless stops        widespread
Voiceless affricates   widespread
Voiceless fricatives   frequent
Nasals                 some
Voiced stops           one
Liquids                none
Voiced fricatives      none
Table 11: The extent of gemination in loanwords by consonant class (without orthographic reflex in the
source word)
The hierarchy of gemination in loanwords reflects a universal markedness hierarchy, that is, voiceless consonants are more likely to be geminated than voiced ones, obstruents are less marked geminates than sonorants
etc. (see Podesva (2002), Kawahara (2007), and Steriade (2004)).
3 The puzzle and possible hypotheses
3.1 The puzzle
3.1.1 The influence of orthography
At first blush, given the abundance of loans with a double consonant letter in the source word, one might
assume that gemination is merely an application of an orthographic rule to pronunciation. Since the spelling
of geminates and singletons is almost entirely phonetic in Hungarian, source language spelling rules are often
carried over into the borrowing language as rules of pronunciation. This approach, however, would be too
simplistic and would not be able to account for many issues associated with this phenomenon.
To be able to claim that orthography has a primary role in loanword gemination, we have to assume
that speakers first encountered loanwords in written form. This may be true in the
case of technical terms or words related to highbrow culture, but many other words - especially those which
have become part of low-colloquial vocabulary - have been presented to native speakers mostly in spoken
form.
Furthermore, as was pointed out by Nádasdy (1989), it would be rather strange to assume that native
speakers of Hungarian are particularly faithful to the spelling pronunciation of geminates when there is
evidence that they are, in fact, fairly conversant with reading rules of foreign languages.
In addition to the facts discussed above, there is another good reason to claim that gemination in
loanwords is not solely a reflection of spelling: a considerable number of borrowings which contain geminates
have a singleton consonant letter in the source word spelling, for example, dzsem [dZEm:] ‘jam’, szett [sEt:]
‘set, outfit’, szlemm [slEm:] ‘slam’, etc. (Nádasdy (1989)).
3.1.2 Gemination as a rule of Hungarian phonotactics
Hungarian distinguishes between singleton consonants and geminates. In principle, all consonants can be
geminated in the native Hungarian phonology. However, most geminates are created through morphological
(or morphophonological) processes. Both singleton consonants and geminates are equally possible in intervocalic and word-final position following short stressed vowels, which indicates that gemination in loanwords
in the same positions is not required by native Hungarian phonotactics. The following examples are taken
from Nádasdy (1989):
Word            Gloss           Word             Gloss
beteg [bEtEg]   ‘ill’       -   retteg           ‘he is scared’
kosz [kos]      ‘dirt’      -   rossz [ros:]     ‘bad’
vice [vitsE]    ‘janitor’   -   vicce [vit:sE]   ‘his / her joke’
Table 12: Singletons and geminates in the same context
However, there are phenomena similar to loanword gemination in the native Hungarian phonology as well.
In West-Hungarian dialects and in colloquial speech, consonants can be geminated in the same contexts as
in loanwords (for example, köpeny [køp:Eñ] ‘gown’, szalag [sOl:Og] ‘ribbon’, csat [tSOt:] ‘buckle’; Nádasdy
(1989)). Furthermore, there are sporadic cases of consonant lengthening which are not reflected by spelling
in old Finno-Ugric words (for example, lesz [lEs:] ‘will be’, kisebb [kiS:Eb:] ‘smaller’, egy [EJ:] ‘one’; Nádasdy
(1989)). Another interesting fact is that a consonant very often lengthened by speakers of such dialects is
[l] in intervocalic position (e.g. elem [ElEm] % [El:Em] ‘battery’ - Tekla Etelka Gráczi, p.c.), which is hardly
ever geminated in a loanword when the source word is spelt with a single letter.
3.1.3 Speakers’ awareness of universal markedness hierarchies
Since gemination in loanwords appears to reflect some sort of a universal markedness hierarchy, the question
arises whether native speakers of Hungarian are aware of such markedness constraints on gemination and
whether geminate markedness hierarchies are present even in languages which allow all kinds of geminates.
For example, are consonants which are cross-linguistically more marked less likely to be geminated in Hungarian as well than those which are less marked geminates in most languages? If so, do native speakers
have intuitions about markedness? Can they, for example, decide which nonce word is a more well-formed
native Hungarian word or Hungarianised loanword by relying on their intuitions about geminate markedness? Are there any other phenomena in the native language - such as a subphonemic process like post-tonic
lengthening - which reflect universal geminate markedness hierarchies?
3.1.4 Gemination as a strategy to preserve source vowel length?
In many languages, vowels are shorter when they are followed by a geminate consonant. However, we do not
know much about native speakers’ perception of foreign vowels compared to their Hungarianised counterparts
in loanwords. Do speakers hear vowels in source words as shorter than their substitute vowels in loanwords?
If the vowels in the source word preceding the consonant (which is geminated in the loanword) are perceived
as shorter than their loan counterparts by native speakers of Hungarian, gemination can be seen as a way
of preserving source vowel length in the loanword.
3.2 Possible hypotheses
Based on the discussion above, we can formulate the following hypotheses, which will be tested and discussed
in this paper later on.
3.2.1 Hypothesis 1
Geminate-to-singleton ratios in the lexicon (that is, type frequency distribution of geminate and singleton forms of each consonant) line up with the universal markedness hierarchy: cross-linguistically marked
geminates are less common even in a language which allows all kinds of geminates, and native speakers’
judgements also reflect those patterns.
3.2.2 Hypothesis 2
There is a native subphonemic process - for example, post-tonic lengthening - which is very similar to
gemination in loanwords in that it involves consonant lengthening and occurs following short stressed vowels.
This lengthening hierarchy also lines up with universal geminate markedness.
3.2.3 Hypothesis 3
Patterns of universal geminate markedness are learnable from the native Hungarian lexicon.
3.2.4 Hypothesis 4
Native speakers perceive foreign short vowels as shorter in closed syllables than Hungarian vowels in the
same context. Gemination is a way of preserving source (foreign) vowel length (shortness).
4 Testing Hypothesis 1: Do patterns of universal geminate markedness show up in the Hungarian lexicon and in native speakers’ judgements?
In order to test whether the distribution of singletons and geminates in the lexicon by consonant class
reflects universal hierarchies of geminate markedness and, if so, whether these patterns show up in native speakers’
judgements, we carried out a corpus study and a wug well-formedness judgement experiment.
4.1 Corpus study
All monosyllabic words in which a short stressed vowel is followed by a singleton or a geminate consonant
were extracted from the Hungarian Webcorpus on the Szószablya Project Website (Halácsy et al.
(2004)). Table 13 contains data in numbers and percentages about the distribution of geminate and singleton
consonants following short vowels in monosyllables. For example, there are 66 monosyllabic words ending in
a short vowel + short voiceless stop sequence and 63 ending in a short vowel + geminate voiceless stop sequence,
which means that, at the end of monosyllabic words, voiceless stops occur as singletons in 51% and as geminates
in 49% of the cases.
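The extraction step described above can be illustrated with a minimal sketch. The paper does not describe how the corpus was actually filtered, so everything here is an assumption: the function name `classify`, the letter inventories, and the decision to work purely from Hungarian spelling are hypothetical. Because it relies on orthography alone, it would miss lengthening that is not reflected in spelling, and it does not handle the trigraph dzs.

```python
# Hypothetical sketch: classify an orthographic Hungarian word as a
# monosyllable ending in short vowel + singleton or + geminate consonant.
SHORT_VOWELS = set("aeiouöü")
LONG_VOWELS = set("áéíóúőű")
VOWELS = SHORT_VOWELS | LONG_VOWELS

# Hungarian digraphs; a geminate digraph doubles only the first letter.
DIGRAPHS = ("cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")
GEMINATE_DIGRAPHS = tuple(d[0] + d for d in DIGRAPHS)  # "ccs", "ssz", ...


def classify(word):
    """Return 'singleton', 'geminate', or None if the word is out of scope."""
    w = word.lower()
    if sum(ch in VOWELS for ch in w) != 1:
        return None  # not a monosyllable (one vowel letter = one syllable)
    if w.endswith(GEMINATE_DIGRAPHS):
        stem, label = w[:-3], "geminate"
    elif len(w) >= 2 and w[-1] == w[-2] and w[-1] not in VOWELS:
        stem, label = w[:-2], "geminate"
    elif w.endswith(DIGRAPHS):
        stem, label = w[:-2], "singleton"
    elif w[-1] not in VOWELS:
        stem, label = w[:-1], "singleton"
    else:
        return None  # vowel-final word
    # The final consonant must directly follow a short vowel.
    if not stem or stem[-1] not in SHORT_VOWELS:
        return None
    return label


for example in ("kosz", "rossz", "sokk", "beteg", "kád"):
    print(example, classify(example))
```

Words with a long vowel, more than one syllable, or a final consonant cluster fall outside the target context and return None.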
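Geminate-to-singleton ratios are shown in Table 14 in descending order. The ratio reported for each consonant class
is the proportion of geminate-final forms among all forms ending in that class, that is, geminates / (geminates + singletons),
which is the geminate percentage in Table 13 expressed as a decimal. The higher the percentage
of geminates occurring after short vowels word-finally in monosyllabic words compared to the percentage of
singleton consonants in the same context, the higher the geminate
to singleton ratio is.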
Consonant class        Singleton   %     Geminate   %
Voiceless stops        66          51%   63         49%
Voiced stops           51          65%   28         35%
Voiceless fricatives   35          54%   30         46%
Voiced fricatives      11          92%   1          8%
Voiceless affricates   14          35%   26         65%
Nasals                 35          76%   11         24%
Liquids                51          72%   20         28%
Table 13: Distribution of singletons and geminates by consonant class in Hungarian words (suffixed forms
and widely used loanwords included)
Consonant class        Geminate to singleton ratio
Voiceless affricates   0.65
Voiceless stops        0.49
Voiceless fricatives   0.46
Voiced stops           0.35
Liquids                0.28
Nasals                 0.24
Voiced fricatives      0.08
Table 14: Geminate to singleton ratio by consonant class (descending order)
As is clearly shown by the above tables, voiceless consonants tend to have higher geminate to singleton ratios
than their voiced counterparts. Thus, voiceless consonants are geminated more frequently than voiced ones,
which seems to reflect universal hierarchies of geminate markedness.
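As a check on how the two corpus tables relate, the following sketch recomputes the values in Table 14 from the type counts in Table 13. It assumes that the reported ratio is the share of geminate forms among all forms in a class (geminates divided by geminates plus singletons), which is what the published figures are consistent with. The names `CORPUS_COUNTS`, `geminate_proportion` and `ranked` are illustrative only.

```python
# Type counts copied from Table 13: class -> (singleton, geminate).
CORPUS_COUNTS = {
    "Voiceless stops": (66, 63),
    "Voiced stops": (51, 28),
    "Voiceless fricatives": (35, 30),
    "Voiced fricatives": (11, 1),
    "Voiceless affricates": (14, 26),
    "Nasals": (35, 11),
    "Liquids": (51, 20),
}


def geminate_proportion(singleton, geminate):
    """Share of geminate forms among all forms of a consonant class."""
    return geminate / (singleton + geminate)


def ranked(counts):
    """Classes sorted by geminate proportion (2 decimals), highest first."""
    props = {
        cls: round(geminate_proportion(s, g), 2)
        for cls, (s, g) in counts.items()
    }
    return sorted(props.items(), key=lambda item: item[1], reverse=True)


for cls, value in ranked(CORPUS_COUNTS):
    print(f"{cls:<22}{value:.2f}")
```

Sorting by this proportion reproduces the ordering of Table 14, with voiceless affricates at the top and voiced fricatives at the bottom.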
4.2 Wug test
4.2.1 Testing material
The test contains 236 words (118 word pairs). All words are nonce monosyllables ending in a short vowel +
short consonant or geminate sequence. Word pairs are minimal pairs: a word ending in a particular short
vowel + short consonant and another word ending in the same vowel + the geminated form of the same
consonant (e.g. mok [mok] - mokk [mok:]). Filler items include nonce word pairs with a short and a long
vowel. Test items are listed in Appendix I.
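The construction of such minimal pairs can be sketched as follows. This is only an illustration of the design, not the procedure used to build Appendix I: the helper names are hypothetical, and apart from mok - mokk the generated forms are invented for the example. The sketch assumes the standard Hungarian spelling convention, visible in forms such as rossz, giccs and puccs cited above, that a geminate digraph doubles only its first letter.

```python
# Hungarian consonant digraphs (the trigraph dzs is left out of this sketch).
DIGRAPHS = ("cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def geminate_spelling(consonant):
    """Spell a geminate: double a single letter, or the first letter of a digraph."""
    if consonant in DIGRAPHS:
        return consonant[0] + consonant
    return consonant * 2


def minimal_pair(onset_and_vowel, consonant):
    """Return (singleton form, geminate form) for one nonce stem."""
    return (
        onset_and_vowel + consonant,
        onset_and_vowel + geminate_spelling(consonant),
    )


# 'mo' + k gives the pair cited in the text; the other two are invented.
pairs = [minimal_pair("mo", c) for c in ("k", "sz", "cs")]
print(pairs)

# 118 pairs of two words each give the 236 test words reported above.
N_PAIRS = 118
N_WORDS = 2 * N_PAIRS
```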
4.2.2 Participants
115 native speakers of Hungarian participated in the experiment. They grew up in Hungary, are currently
living there and were recruited through Facebook and various mailing lists. All the participants volunteered
to take the test for free. The test was administered online and participants remained anonymous.
4.2.3 Task
Participants were presented with a list of target and filler items (all in word pairs) and were asked to
decide which of the two nonce words they considered more well-formed as a native Hungarian word or a
Hungarianised loanword.
4.2.4 Results
The summary of results averaged for consonant class is presented in Table 15 and Table 16.
Consonant class        Singleton   %     Geminate   %
Voiceless stops        1037        47%   1148       53%
Voiced stops           1426        65%   761        35%
Voiceless fricatives   1126        47%   1289       53%
Voiced fricatives      2035        85%   371        15%
Voiceless affricates   703         44%   909        56%
Nasals                 636         61%   399        39%
Liquids                767         48%   845        52%
Table 15: Well-formedness judgements of forms ending in singletons and geminates
Consonant class        Geminate to singleton ratio
Voiceless affricates   0.56
Voiceless stops        0.53
Voiceless fricatives   0.53
Liquids                0.52
Nasals                 0.39
Voiced stops           0.35
Voiced fricatives      0.15
Table 16: Geminate to singleton ratio by consonant class (descending order)
The first column of Table 15 shows which consonant class the nonce words end in. The second and the third
columns contain the number and the percentage of responses (respectively) in which the form ending in a
singleton consonant was judged more well-formed than the one ending in a geminate. The number and the percentage
of responses in which the form ending in a geminate consonant was preferred are shown in columns four and five.
The geminate to singleton ratios in Table 16 show to what extent geminates were preferred over singletons. In cases where speakers generally preferred geminated forms to the ones ending in singleton consonants,
the geminate to singleton ratio is higher than 0.50.
In general, what the table shows is that monosyllabic nonce words ending in voiceless obstruent geminates are more acceptable to native speakers than stems ending in sonorant geminates, except for liquids.
Results by each vowel and consonant sequence will be compared with results of the phonotactic learning
experiment in the next subsection.
4.3 Correlation between geminate-to-singleton ratios in the corpus and based on native speakers’ judgements
In the previous subsection, we have seen that geminate-to-singleton ratios - calculated based on type frequency distributions found in a corpus and based on native speakers’ well-formedness judgements on nonce
words - line up with hierarchies of universal geminate markedness. Although there are some differences
between patterns in the corpus and in people’s judgements, the cross-linguistically most frequent geminates
(voiceless obstruents) are equally highly ranked and the least frequent ones (voiced fricatives) are the lowest
ranked in both hierarchies. There is a strong positive correlation (r = 0.8271) between geminate-to-singleton
ratios based on type frequency distributions in the corpus and based on native speakers’ well-formedness
judgements, which is illustrated in Figure 1.
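The correlation can be recomputed from the published tables. The sketch below assumes that the coefficient is a Pearson correlation over the seven consonant classes, using the rounded values in Tables 14 and 16; the paper does not say which coefficient was used, and the dictionary and function names are illustrative. Under these assumptions the result is approximately 0.83.

```python
from math import sqrt

# Geminate-to-singleton ratios per consonant class (Tables 14 and 16).
CORPUS = {
    "Voiceless affricates": 0.65,
    "Voiceless stops": 0.49,
    "Voiceless fricatives": 0.46,
    "Voiced stops": 0.35,
    "Liquids": 0.28,
    "Nasals": 0.24,
    "Voiced fricatives": 0.08,
}
WUG = {
    "Voiceless affricates": 0.56,
    "Voiceless stops": 0.53,
    "Voiceless fricatives": 0.53,
    "Liquids": 0.52,
    "Nasals": 0.39,
    "Voiced stops": 0.35,
    "Voiced fricatives": 0.15,
}


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Pair the two measures by class name so the orderings cannot be mixed up.
classes = sorted(CORPUS)
r = pearson([CORPUS[c] for c in classes], [WUG[c] for c in classes])
print(f"r = {r:.4f}")
```

With only seven data points the coefficient is sensitive to individual classes, so it is best read as a summary of the agreement between the two hierarchies rather than as a precise estimate.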
Figure 1: Correlation between geminate-to-singleton ratios in the corpus and in wug well-formedness judgements
5 Testing Hypothesis 2: Are there any other phenomena which show patterns of universal geminate markedness?
The goal of the studies described in the previous sections was to find out whether universal geminate markedness
hierarchies are prevalent in a language which allows all consonants to be geminated and, if that is the case,
whether native speakers are aware of such hierarchies and whether they - consciously or unknowingly - apply them when
they are asked to give well-formedness judgements on nonce monosyllables ending in short vowel + short
consonant / geminate sequences. The tentative answer to these questions is yes, since cross-linguistically less
marked geminates are more frequent in the native Hungarian phonology as well. Furthermore, speakers seem
to be able to replicate these hierarchies fairly well in wug well-formedness judgement tasks.
In this section, we intend to explore the possibility of other phenomena which might reflect universal
markedness hierarchies of consonant lengthening. The particular phenomenon (and the (non-)existence
thereof) to be investigated here is a subphonemic process referred to as post-tonic lengthening. The reason
why this particular phenomenon was chosen is that it is found in languages in which loanword gemination
works similarly to the phenomena observed in Hungarian loanword phonology (cf. Farnetani and Kori (1986),
Passino (2004)).
5.1 Speech material
The speech material consists of 76 native Hungarian words, which can be divided into 19 groups (which
is the number of consonants possibly undergoing loanword gemination). Each group contains an existing
Hungarian word with a specific consonant in:
• intervocalic position, following a stressed vowel
• intervocalic position, following an unstressed vowel
• word-final position, following a stressed vowel
• word-final position, following an unstressed vowel
Examples are shown in Table 17:
Position   V(str)CV             V(unstr)CV                 V(str)C#         V(unstr)C#
[p]        kapar ‘to scratch’   alapos ‘meticulous’        pap ‘priest’     alap ‘foundation’
[t]        retek ‘radish’       szeretek ‘love-1SG’        vet ‘to sow’     szeret ‘love-3SG’
[f]        röfög ‘grunt-3SG’    leböfög ‘burp at s.-3SG’   döf ‘stab-3SG’   ledöf ‘stab to death-3SG’
Table 17: Examples of target items
Each word was placed in a sentence frame like ‘X is a rather Y Z.’ (for example, ‘The sloth is a rather
interesting animal.’) Two lists of sentences were created: one contains sentences in a pseudo-random order
and the other one in a reverse order. Two filler sentences were included at the beginning and at the end of
the list, in order to exclude effects of being the first and last sentence on the list (since speakers tend to read
out the first and the last sentence on the list with different prosody).
5.2 Subjects
Six (four female and two male) native speakers of Educated Colloquial Hungarian participated in the experiment. They are between the ages of 20 and 48, currently live in Hungary and have spent most of their
lives in their home country (none of them have lived abroad for more than two years). They volunteered to
participate in the experiment for free and signed a consent form.
5.3 Recording procedure
Participants were asked to read out the two lists of sentences fluently, not too slowly or quickly, with natural
intonation. Before the experiment, they had a short training session so that they could familiarise themselves
with the reading material. The speech material was recorded in a soundproof room in the building of the University
of Pannonia Broadcasting Company in Veszprém, Hungary.
5.4
Measurement procedure and criteria
The duration of each consonant was measured in Praat (Boersma and Weenink (2012)), an open-source
speech analysis software.
5.5
Analysis
76 x 2 x 6 = 912 data points were subject to analysis. Durations of various consonants in various positions
were compared and analysed. Linear mixed effects models were fitted to the duration data using the
lme4 package (Bates et al. (2011)) in R (R Core Team (2013)) and some paired t-tests were used to make
additional comparisons. Duration was the dependent variable; the fixed effects were place of articulation,
position (VCV vs. VC#) and stress (stressed vs. unstressed). By-speaker random effects for place of
articulation, position and stress were also included in the model.
5.6
Results for consonants in word-final position
Since ‘genuine’ loanword gemination with no orthographic reflex is found almost exclusively in monosyllables
word-finally, following a short vowel, presentation of the results is restricted to only those cases in this
paper. Results are plotted for each consonant and grouped by consonant class. Duration data are given in
seconds.
5.6.1
Voiceless stops
Voiceless stops show significant effects of post-tonic lengthening (t = -6.88, baseline: stressed), which is
shown in Figure 2.
Figure 2: Voiceless stops in word-final position
5.6.2
Voiced stops
Voiced stops do not lengthen significantly following a short stressed vowel (t = -1.611, baseline: stressed).
This is illustrated by Figure 3.
Figure 3: Voiced stops in word-final position
5.6.3
Voiceless fricatives
Voiceless fricatives are significantly longer in final position following short stressed vowels (t = -4.154, baseline:
stressed), which is shown in Figure 4.
Figure 4: Voiceless fricatives in word-final position
5.6.4
Voiced fricatives
As we can see in Figure 5, voiced fricatives show significant effects of post-tonic lengthening (t = -7.204,
baseline: stressed).
Figure 5: Voiced fricatives in word-final position
5.6.5
Voiceless affricates
Voiceless affricates are considerably longer following short stressed vowels (t = -5.354, baseline: stressed).
This is shown by Figure 6.
Figure 6: Voiceless affricates in word-final position
5.6.6
Nasals
As Figure 7 demonstrates, nasals undergo significant post-tonic lengthening in word-final position (t = -3.801,
baseline: stressed).
Figure 7: Nasals in word-final position
5.6.7
Liquids
Liquids do not show significant effects of post-tonic lengthening (t = 1.686, baseline: stressed), which we
can see in Figure 8 below.
Figure 8: Liquids in word-final position
5.6.8
The hierarchy of post-tonic lengthening compared to hierarchies of gemination
Table 18 shows which consonants undergo significant post-tonic lengthening and which do not. It also shows
the estimated amount of lengthening (in seconds, like the duration data above) and the corresponding t-values.
Significant post-tonic lengthening
Voiceless affricates (est.: 0.042427, t=5.354)
Voiced fricatives (est.: 0.035669, t=7.204)
Voiceless fricatives (est.: 0.028971, t=4.154)
Voiceless stops (est.: 0.022527, t=6.88)
Nasals (est.: 0.22288, t=3.801)
No significant post-tonic lengthening
Voiced stops (est.: 0.003102, t=1.611)
Liquids (est.: 0.0019, t=1.686)
Table 18: Hierarchy of post-tonic lengthening by consonant class
The table showing the extent of gemination in loanwords by consonant class is repeated below as Table
19.
Consonant class         Gemination in loans
Voiceless stops         widespread
Voiceless affricates    widespread
Voiceless fricatives    frequent
Nasals                  some
Voiced stops            one
Liquids                 none
Voiced fricatives       none
Table 19: The extent of gemination in loanwords by consonant class (without orthographic reflex in the
source word)
Geminate-to-singleton ratios established based on type frequency distributions found in a corpus are repeated
in Table 20.
Consonant class         Geminate-to-singleton ratio
Voiceless affricates    0.65
Voiceless stops         0.49
Voiceless fricatives    0.46
Voiced stops            0.35
Liquids                 0.28
Nasals                  0.24
Voiced fricatives       0.08
Table 20: Geminate-to-singleton ratios based on type frequencies in the corpus
Geminate-to-singleton ratios based on nonce well-formedness judgements by native speakers are repeated in
Table 21.
Consonant class         Geminate-to-singleton ratio
Voiceless affricates    0.56
Voiceless stops         0.53
Voiceless fricatives    0.53
Liquids                 0.52
Nasals                  0.39
Voiced stops            0.35
Voiced fricatives       0.15
Table 21: Geminate-to-singleton ratios based on nonce well-formedness judgements
The hierarchies are very similar, but there is one crucial difference: voiced fricatives undergo
post-tonic lengthening, but they are not affected by gemination in loanwords, are very rare as geminates in the
native Hungarian lexicon and are judged as very marked as geminates by native speakers. Furthermore, they
are cross-linguistically marked as geminates. Their post-tonic lengthening therefore does not fit into the
otherwise consistent hierarchy, for which there could be various reasons, e.g. cross-linguistic geminate
markedness (Steriade (2004)); partial devoicing of word-final fricatives (Barkanyi and Graczi (2012)) or difficulty
in perception of length contrasts (Kawahara (2007)). With this exception, we can conclude that the phenomena in native Hungarian
phonetics and phonology which are related to gemination or consonant lengthening line up with patterns of
universal markedness.
6
Testing Hypothesis 3: Are patterns of universal markedness learnable from
the native Hungarian lexicon?
We have seen in the previous sections that all phenomena (in native as well as in loanword phonology)
related to gemination or consonant lengthening are regulated by universal markedness. Moreover, native
speakers’ judgements also reflect those patterns. The goal of this section is to see whether the universally
observed tendencies - which are observed even in native Hungarian phonology - are learnable without
positing any handcrafted universal markedness constraints, that is, to find out whether the learner is able
to make phonotactic generalisations based on data found in the corpus and make correct predictions based
on them.
Before proceeding with the analysis, we will provide a short description of the learning model we used.
6.1
Some background on accounting for gradience and free variation
An often cited weakness of classic categorical Optimality Theory is its inability to account for grammars
with gradience and free variation. Unfortunately, even most OT learning algorithms (for example, Tesar
and Smolensky (1993), Pulleyblank and Turkel (1996), Prince and Tesar (1999)) are not able to account
for those phenomena, either. First, they are not trained to learn from noisy training data or they have
convergence problems when presented with such data. Second, they learn a single constraint ranking,
therefore they cannot model grammars with gradience or free variation (more than one winner, different
probabilities).
There have been several attempts to account for gradience and free variation within the narrow confines
of categorical OT. These include floating constraints (Nagy and Reynolds (1997)), free ranking of constraints
within strata (Anttila (1997)), strictness bands (Hayes (2000)) and factorial typology applied to a set of
unranked constraints (Ringen and Heinemäki (1999), and Magyar (2009)). The probabilistic model proposed
by Boersma (1997) (which moves away from standard OT) uses the Gradual Learning Algorithm. Although
it is successful at learning from noisy data, it is still unable to account for cumulativity effects. This weakness
of the GLA was pointed out by Keller and Asudeh (2002). Keller had earlier proposed a model called
Linear Optimality Theory (Keller (2000)). This model is able to account for cumulativity effects, but is only
able to learn from acceptability judgement data instead of actual linguistic forms.
If we want an algorithm to learn gradient grammars, the ideal algorithm for our purposes should be able
to learn from a corpus consisting of real (not idealised and thus potentially noisy) data, handle effects caused
by cumulative constraint violations and generalise to novel data (examples not seen during the training).
Maximum entropy grammars generally meet all of these expectations.
The principle of maximum entropy is a concept first introduced to information theory by Shannon
(1948). Maxent models are log-linear models widely used in different fields including natural language
processing. The application of maximum entropy models to grammars has been discussed in Berger et al.
(1996), Rosenfeld (1996), Della Pietra et al. (1997), Jelinek (1999), Manning and Schütze (1999), Eisner
(2000), Eisner (2001), Keller (2002), Klein and Manning (2003), Goldwater and Johnson (2003), Jäger
(2004) and Hayes and Wilson (2008), amongst many others. The advantage of maxent models over other
frameworks is that they are able to account for both gradience / free variation and categorical phenomena, can
handle cumulativity effects and are mathematically simpler than the GLA.
6.2
The model
In this learning experiment, we used the UCLA Phonotactic Learner developed by Bruce Hayes and Colin
Wilson (Hayes and Wilson (2008)). The grammar created by the model consists of constraints that are
weighted according to the principle of maximum entropy. Test words are assessed by the grammar
based on the weighted sum of their constraint violations. The learning algorithm produces grammars that
can capture both categorical and gradient phonotactic patterns. The algorithm is not provided with any
constraints in advance, but uses its own resources to form constraints and weight them.
The basic idea in the application of maxent models to phonotactics is that well-formedness can be
interpreted as probability. We suppose that there is an infinite set Ω consisting of all universally possible
phonological surface forms. The maxent grammar assigns a probability to every member x of this set: this
probability P(x) expresses its phonotactic well-formedness. If this is an infinite (or just a very large) set,
the probability of any form will be extremely low. What is important here is the difference between the
probabilities, which can be significant and meaningful.
A maxent grammar model assigns probabilities with a set of constraints. Constraints in the Hayes
and Wilson model are all markedness constraints by the definition of standard Optimality Theory. Each
constraint has a weight which is a nonnegative real number. Constraints are situated on a scale: the ones
with higher weights are more powerful and can significantly lower the probability of forms that violate them.
The score is defined in the following way:
Definition 1: Score
The score of phonological representation x, denoted as h(x), is
$$h(x) = \sum_{i=1}^{N} w_i C_i(x)$$
where
$w_i$ is the weight of the ith constraint,
$C_i(x)$ is the number of times that x violates the ith constraint, and
$\sum_{i=1}^{N}$ denotes summation over all constraints ($C_1, C_2, ..., C_N$).
The maxent value of x is calculated as follows:
Definition 2: Maxent value
Given a phonological representation x and its score h(x) under a grammar, the
maxent value of x, denoted P*(x), is
P*(x) = exp(-h(x)).
The probability of x is calculated by determining its share in the total maxent values of all possible
forms in Ω, which is a quantity designated as Z. Probability is defined below:
Definition 3: Probability
Given a phonological representation x and its maxent value P*(x), the probability of x,
denoted P(x), is
P(x) = P*(x) / Z
where $Z = \sum_{y \in \Omega} P^*(y)$.
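Definitions 1 to 3 can be illustrated with a minimal sketch. It assumes a tiny, explicitly listed set of forms standing in for Ω (which is infinite in the actual model); the weights, form labels and violation counts below are hypothetical and serve only to show how the three quantities relate.

```python
import math

def score(weights, violations):
    """h(x): weighted sum of constraint violations (Definition 1)."""
    return sum(w * c for w, c in zip(weights, violations))

def maxent_value(weights, violations):
    """P*(x) = exp(-h(x)) (Definition 2)."""
    return math.exp(-score(weights, violations))

def probabilities(weights, candidates):
    """P(x) = P*(x) / Z, with Z summed over a finite stand-in for Omega (Definition 3)."""
    values = {x: maxent_value(weights, v) for x, v in candidates.items()}
    z = sum(values.values())
    return {x: p / z for x, p in values.items()}

# Hypothetical toy grammar: two constraints with illustrative weights.
WEIGHTS = [1.0, 2.0]
# Hypothetical forms mapped to their violation counts (C_1(x), C_2(x)).
CANDIDATES = {"a": [0, 0], "b": [1, 0], "c": [0, 1], "d": [1, 1]}
PROBS = probabilities(WEIGHTS, CANDIDATES)
```

Forms with more heavily weighted violations receive lower probabilities, and only the ratios between probabilities are meaningful, which is in line with the remark above about very large sets.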
We have shown above how well-formedness (probability) is calculated from constraint violations and different
weights. Now we are turning to learning. The core assumption of the Hayes and Wilson model is that the
learner has access to a vast amount of data from the target language, which is a representative set of observed
forms. However, it is not exposed to negative evidence, in other words, it is not told what forms are illegal.
Hopefully, this is a plausible simulation of real language learning.
The goal is to find a set of constraint weights which maximises the probability of the observed forms. Since
total probability is fixed at 1, maximising the probability of the observed forms automatically minimises the
probability of the unattested forms. This probabilistic concept of well-formedness relates
well to the principle of maximum entropy. Entropy is a measure of the amount of surprise /
randomness in the system, given by the following formula:
$$-\sum_{x \in \Omega} P(x) \log P(x)$$
According to a theorem proven by Della Pietra et al. (1997), if probability is defined as above (Definition
3), maximising entropy is in fact equivalent to maximising the probability of observed forms, given the
constraints. If we assume that all the observed data are independent and identically
distributed, then the probability of the observed data, P(D), is simply the product of the probabilities of the individual observed forms.
This is shown below:
Definition 4: Probability of the observed data under a given set of weights and constraints
Given a maxent grammar and a set D of observed data, the probability of D under the
grammar is
$$P(D) = \prod_{x \in D} P(x)$$
where P(x) is as defined in Definition 3.
Now the learner has to find a set of weights that maximises P(D), which is a search problem. The search always
begins by giving every constraint the same initial weight (which is 1 in the Hayes and Wilson model). The
weights are then adjusted iteratively until P(D) is maximised. In fact, the search does not maximise
P(D) itself, but its natural logarithm. However, since the log function is monotonic, the
weights that maximise log(P(D)) are the same weights that maximise P(D).
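As a worked step, the logarithm of P(D) can be expanded using Definitions 1 to 4. This is a sketch in the notation used above, where |D| (a symbol introduced here for illustration) stands for the number of observed forms:

```latex
\log P(D) = \sum_{x \in D} \log P(x)
          = \sum_{x \in D} \bigl( \log P^*(x) - \log Z \bigr)
          = -\sum_{x \in D} h(x) \; - \; |D| \log Z
```

The product in Definition 4 thus becomes a sum, which is easier to handle numerically and leaves the location of the maximum unchanged.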
The observed violation count of each constraint is also calculated: it is done by summing the violations
of the given constraint over all examples in the learning data. The calculation of the expected violation count,
however, is much less straightforward. To compute it exactly, we would have to sum over the set
of all possible phonological representations x ∈ Ω, which is an infinite set. So, first, we only give the formal
definition of the expected violation count:
Definition 5: Expected violation count
Given a grammar that determines maxent values, the expected number of violations of
constraint Ci is
$$E[C_i] = \sum_{x \in \Omega} P(x) C_i(x)$$
where
P(x) is the probability of the representation x,
$C_i(x)$ is the number of times that x violates $C_i$, and
$\sum_{x \in \Omega}$ represents summation over all x in Ω.
Naturally, it is impossible to calculate the exact expected values. As a more feasible solution, we can
approximate the values by only looking at strings in Ω which are no longer than the longest string in the learning data
D. This is still an extremely large set, but at least, it is finite. Hayes and Wilson used methods previously
described in Ellison (1994), Eisner (1997), Albro (1998), Albro (2005) and Riggle (2004). This work has
shown that properties of a very large set of strings can be computed if the set is represented as a finite state
machine. The machine is constructed in the following way: first, each constraint is represented as a weighted
finite state acceptor, then constraints are combined into a single machine which is a full grammar. Every
path in this machine corresponds to a phonological representation also containing a vector of constraint
violations. Then we obtain expected violation counts by summing over all paths through the machine with
the help of a method also used by Eisner (2001) and Eisner (2002). This sum is rescaled according to the
frequency of forms of the given length in the training data.
To sum up the procedure of constraint weighting, we can say that the process is basically an iterated
hill-climbing search, which is designed to maximise the probability of the learning data (or rather, the logarithm
of that probability - but it amounts to the same thing). At each stage, a local gradient is computed from the difference
between the observed and expected violation counts for each constraint. Observed violation counts are calculated by examining the learning data, whereas the number of expected violations is approximated by the
aforementioned finite state method.
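The weighting procedure summarised above can be sketched as plain gradient ascent on log P(D). This is only an illustration under simplifying assumptions: a small, explicitly enumerated set of forms replaces Ω (the actual learner approximates expected counts with finite state machines), and the learning rate, step count and toy data are hypothetical.

```python
import math

def probabilities(weights, space):
    """P(x) for every violation vector in a finite stand-in for Omega."""
    values = [math.exp(-sum(w * c for w, c in zip(weights, v))) for v in space]
    z = sum(values)
    return [p / z for p in values]

def log_likelihood(weights, space, counts):
    """log P(D), where counts[k] is how often form k occurs in the data."""
    probs = probabilities(weights, space)
    return sum(n * math.log(p) for n, p in zip(counts, probs))

def train(space, counts, n_constraints, rate=0.01, steps=5000):
    """Hill climbing: move each weight along (expected - observed) violations."""
    weights = [1.0] * n_constraints  # every constraint starts with the same weight
    total = sum(counts)
    for _ in range(steps):
        probs = probabilities(weights, space)
        for i in range(n_constraints):
            observed = sum(n * v[i] for n, v in zip(counts, space))
            expected = total * sum(p * v[i] for p, v in zip(probs, space))
            # Weights are kept nonnegative, as in the model described above.
            weights[i] = max(0.0, weights[i] + rate * (expected - observed))
    return weights

# Hypothetical toy data: three forms, two constraints.
SPACE = [[0, 0], [1, 0], [0, 1]]   # violation vectors of the three forms
COUNTS = [6, 3, 1]                 # how often each form is observed
TRAINED = train(SPACE, COUNTS, 2)
```

When a constraint is violated less often in the data than the current grammar expects, its weight goes up, which pushes the forms that violate it further down in probability.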
As mentioned earlier, constraints found and used by this learner are exclusively markedness constraints
based on co-occurrence restrictions of different distinctive features (forming natural classes). The entire
process of learning alternates between constraint selection and constraint weighting, as follows: a new
constraint is selected, and then all the constraints are reweighted, and this process is iterated several times
until the learning has been completed. The overall algorithm (procedure of learning) is summarised below:
Definition 6: Phonotactic learning algorithm
Input: a set Σ of segments classified by a set F of features, a set D of surface forms
drawn from Σ*, an ascending set A of accuracy levels, and a maximum constraint size N
1 begin with an empty grammar G
2 for each accuracy level a in A
3   do
4     select the most general constraint with accuracy less than a (if one exists) and add it to G
5     train the weights of the constraints in G
6   while a constraint is selected in Step 4
The algorithm terminates when the search in Step 4 fails to generate a new constraint at the least stringent
accuracy level.
Now, after discussion of the basic mechanisms of the UCLA Phonotactic Learner (based on Hayes and Wilson
(2008)), we will continue with the description of specific details about the learning simulations.
6.2.1
Learner input
The learner must be provided with input files containing learning (or training) data, a feature chart, and
testing data.
Learning data
Since earlier learning experiments showed that there are no consistent co-occurrence restrictions on short
vowel + geminate vs. singleton combinations in the native Hungarian phonology (Magyar (2014)), we used
only consonants (geminates and singletons) in this simulation. All words (only monosyllables) ending in
a short vowel + short consonant or geminate sequence have been extracted from the Hungarian National
Corpus on the Szószablya Project Website (Halácsy et al. (2004)). Other parts of words than the final
consonant were cut and word-final consonants (both singletons and geminates) were organised into a list.
The data were collapsed across consonant classes: for example, [p], [t] and [k] were transcribed as t or t:,
a general segment representing singleton and geminate voiceless stops, respectively. We used the following
shorthands to represent consonant classes:
t voiceless stop (singleton)
t: voiceless stop (geminate)
d voiced stop (singleton)
d: voiced stop (geminate)
ts voiceless affricate (singleton)
t:s voiceless affricate (geminate)
s voiceless fricative (singleton)
s: voiceless fricative (geminate)
z voiced fricative (singleton)
z: voiced fricative (geminate)
n nasal (singleton)
n: nasal (geminate)
l liquid (singleton)
l: liquid (geminate)
This simplification was necessary because gemination in loanwords does not seem to depend on individual
consonants (place of articulation) and universal hierarchies of geminate markedness are usually described
in terms of consonant classes rather than referring to more fine-grained differences between individual
consonants.4 Native speakers’ judgements suggest the same: there is consistency across consonant classes,
but not individual consonants. Besides, there was no need to specify the context as ”word-final position in
monosyllables”, as the corpus sample contained only monosyllables.
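The collapsing step described above can be sketched as a simple lookup. Only the mapping of [p], [t] and [k] to the shorthand t is stated in the text; the remaining class memberships, the IPA keys and the function name `collapse` are assumptions made for illustration.

```python
# Assumed class membership; only [p], [t], [k] -> "t" is given in the text.
CLASS_OF = {
    "p": "t", "t": "t", "k": "t",      # voiceless stops
    "b": "d", "d": "d", "g": "d",      # voiced stops
    "ts": "ts", "tʃ": "ts",            # voiceless affricates
    "f": "s", "s": "s", "ʃ": "s",      # voiceless fricatives
    "v": "z", "z": "z", "ʒ": "z",      # voiced fricatives
    "m": "n", "n": "n", "ɲ": "n",      # nasals
    "l": "l", "r": "l",                # liquids
}

def collapse(consonant, geminate):
    """Return the class shorthand for a word-final consonant."""
    base = CLASS_OF[consonant]
    if not geminate:
        return base
    # The length mark follows the first symbol, as in "t:s" for a geminate affricate.
    return base[0] + ":" + base[1:]

# Hypothetical word-final consonants as (segment, is_geminate) pairs.
FINALS = [("p", False), ("k", True), ("tʃ", True), ("r", False)]
LEARNING_DATA = [collapse(c, g) for c, g in FINALS]
```

Each entry of the resulting list is one training item for the learner, so type frequencies per class are preserved while place distinctions are removed.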
Feature chart
The learner also needs a feature chart, which defines the symbols according to their phonetic properties
in the following format: the top row contains the feature names, the leftmost column contains the speech sounds,
and the values can be +, - or 0 (unspecified). The chart contains all Hungarian consonant classes that may
participate in loanword gemination, together with the features that are shared by all members of a given consonant class
and suffice to distinguish the consonant classes from each other. The feature chart is shown in
Table 22.
4 There is work on universal markedness hierarchies reflected in native patterns that refers to more fine-grained distinctions,
such as place of articulation (Sano (2014)). Therefore, it would be interesting to explore more carefully the possibility of place
of articulation influencing gemination processes.
        long   cons   son   syll   voice   cont   del rel   nasal
t        -      +      -     -      -       -       -        -
d        -      +      -     -      +       -       -        -
ts       -      +      -     -      -       -       +        -
s        -      +      -     -      -       +       -        -
z        -      +      -     -      +       +       -        -
n        -      +      +     -      +       -       -        +
l        -      +      +     -      +       +       -        -
t:       +      +      -     -      -       -       -        -
d:       +      +      -     -      +       -       -        -
t:s      +      +      -     -      -       -       +        -
s:       +      +      -     -      -       +       -        -
z:       +      +      -     -      +       +       -        -
n:       +      +      +     -      +       -       -        +
l:       +      +      +     -      +       +       -        -
Table 22: Feature chart
Testing data
Testing data include a singleton and a geminated version of each consonant class.
6.2.2
Learner settings
Specifying gram size
Gram size is the number of feature matrices that appear in the constraints. For example, the gram size of
the constraint *[+liquid][-voice] is two. The higher the gram size, the longer it takes the learner to finish finding the
constraints. In this case, since the learner is trained on a manageable number of symbols and features,
we anticipated that it would be able to terminate the process without any restrictions being imposed.
Therefore, we did not specify the gram size.
Specifying maximum O/E
O/E means the value of ‘observed over expected’, which is a measure of constraint effectiveness, the ratio
of the number of times a constraint is violated in the learning data, to the number of times it would be
expected to be violated based on the grammar learnt so far. (For a more detailed discussion, see Hayes and
Wilson (2008)). If we want to find the most important and the most powerful constraints, the O/E value
has to be low. However, for the reasons mentioned above (in the discussion of specifying gram size), we did
not specify the maximum O/E in this experiment.
Specifying the maximum number of constraints to discover
Specifying the maximum number of constraints to discover is another way of ensuring that the process will
terminate within a reasonable time span. As the learning procedure was not expected to take very long, we
did not specify the maximum number of constraints to discover.
6.2.3
Results
6.2.4
Constraints and weights
The constraints found and weighted by the learner are shown in Table 23 in descending order based on
weight (and not the order in which they were discovered).
Constraints                      Weights
*[+del.rel,-long]                1.44
*[-son,+voice,+cont,+long]       1.375
*[+nasal,+long]                  1.14
*[+voice,+cont,+long]            1.1
*[-son,+voice,+cont,-long]       0.979
*[+del.rel,+long]                0.831
*[-son,+voice,+long]             0.764
*[-son,+cont,+long]              0.695
*[+nasal]                        0.541
*[-son,+cont,-long]              0.54
*[+voice,+cont,-long]            0.164
Table 23: Constraints and weights
Since only a limited number of features and segments were being tested, the learner has found only 11
constraints. The constraints listed above are defined as follows:
*[+del.rel,-long]: Short affricates are not allowed in word-final position.
*[-son,+voice,+cont,+long]: Long voiced fricatives are not allowed in word-final position.
*[+nasal,+long]: Long nasals are not allowed in word-final position.
*[+voice,+cont,+long]: Long voiced continuants (voiced fricatives and liquids) are not allowed in word-final position.
*[-son,+voice,+cont,-long]: Short voiced fricatives are not allowed in word-final position.
*[+del.rel,+long]: Long affricates are not allowed in word-final position.
*[-son,+voice,+long]: Long voiced obstruents are not allowed in word-final position.
*[-son,+cont,+long]: Long fricatives are not allowed in word-final position.
*[+nasal]: Nasals are not allowed in word-final position.
*[-son,+cont,-long]: Short fricatives are not allowed in word-final position.
*[+voice,+cont,-long]: Short voiced continuants (voiced fricatives and liquids) are not allowed in word-final position.
As we can see, the learner was able to create constraints based on phonotactic generalisations, and the
constraints are basically the same as those of universal markedness. The more a constraint lines up with the
restrictions of universal markedness, the higher the weight the learner has assigned to it.
6.2.5
Predicted and observed probabilities
The learner assigns a sum of constraint violations to each form in the testing data: the higher the score, the
lower the probability of a given form is. We converted the sums of violations into probabilities based on the
following formula:
predicted-rating(x) = P*(x)^(1/T), where
P*(x) = exp(-h(x)) and
h(x) is the score output by the model.
Table 24 shows both sums of violations and probabilities assigned to each consonant class. Singleton
and geminated forms by each consonant class are listed in descending order with respect to probability
(well-formedness predicted based on the learning data).
Consonant class                      Sum of violations    Probability
voiceless stops (singleton)          0                    1
voiceless stops (geminate)           0                    1
voiced stops (singleton)             0                    1
liquids (singleton)                  0.164                0.84874202188
voiceless fricatives (singleton)     0.54                 0.58274825237
nasals (singleton)                   0.541                0.58216579539
voiceless fricatives (geminate)      0.695                0.49907444798
voiced stops (geminate)              0.764                0.46579949765
voiceless affricates (geminate)      0.831                0.43561345498
liquids (geminate)                   1.1                  0.33287108369
voiceless affricates (singleton)     1.44                 0.23692775868
nasals (geminate)                    1.681                0.18618769521
voiced fricatives (singleton)        1.683                0.18581569195
voiced fricatives (geminate)         3.916                0.01992061806
Table 24: Violation sums and probabilities of candidates
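The conversion from violation sums to probabilities can be checked with a short sketch. It assumes that T in the formula above equals 1, a value that is not stated in the text but appears to reproduce the figures in Table 24; the function name and the selection of rows are illustrative.

```python
import math

def predicted_rating(h, temperature=1.0):
    """P*(x)^(1/T) with P*(x) = exp(-h(x)); T = 1 is an assumption."""
    return math.exp(-h) ** (1.0 / temperature)

# Violation sums copied from three rows of Table 24.
SUMS = {
    "liquids (singleton)": 0.164,
    "voiceless fricatives (singleton)": 0.54,
    "voiced fricatives (geminate)": 3.916,
}
RATINGS = {name: predicted_rating(h) for name, h in SUMS.items()}
```

A form with no violations has h(x) = 0 and therefore a rating of 1, which matches the first three rows of the table.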
6.2.6
Correlation between probabilities predicted by the model and type frequencies in the
corpus
In Table 25, predicted probabilities are shown in comparison with observed probabilities (relative type
frequencies in a corpus).
Consonant class                      Probability       Type frequency (%)
voiceless stops (singleton)          1                 14.93212669683258
voiceless stops (geminate)           1                 14.25339366515837
voiced stops (singleton)             1                 11.538461538461538
liquids (singleton)                  0.84874202188     11.538461538461538
voiceless fricatives (singleton)     0.58274825237     7.918552036199094
nasals (singleton)                   0.58216579539     7.918552036199094
voiceless fricatives (geminate)      0.49907444798     6.787330316742081
voiced stops (geminate)              0.46579949765     6.334841628959276
voiceless affricates (geminate)      0.43561345498     5.88235294117647
liquids (geminate)                   0.33287108369     4.524886877828054
voiceless affricates (singleton)     0.23692775868     3.167420814479638
nasals (geminate)                    0.18618769521     2.48868778280543
voiced fricatives (singleton)        0.18581569195     2.48868778280543
voiced fricatives (geminate)         0.01992061806     0.22624434389140274
Table 25: Predicted probabilities and type frequencies (%) of candidates
It is clearly shown by Table 25 that the learner was able to predict the probability of both singleton
and geminate consonants correctly - the descending order of both singletons and geminates based on type
frequency in the lexicon matches up with the probabilities predicted by the learner.
The correlation between the probabilities predicted by the model and the relative frequency distributions
in the corpus is very strong (r = 0.988). It is plotted in Figure 9. The probabilities assigned by the
learner are shown on the x axis, while type frequency distributions are on the y axis.
Figure 9: Correlation between probabilities predicted by the model and type frequencies (%) in the corpus
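The reported correlation can be recomputed from the two columns of Table 25. The following sketch assumes that r is the Pearson coefficient, which the text does not state explicitly; type frequencies are rounded to three decimals for readability, and the result should come out close to the reported value.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Predicted probabilities from Table 25, in table order.
PREDICTED = [1, 1, 1, 0.84874202188, 0.58274825237, 0.58216579539,
             0.49907444798, 0.46579949765, 0.43561345498, 0.33287108369,
             0.23692775868, 0.18618769521, 0.18581569195, 0.01992061806]
# Type frequencies (%) from Table 25, rounded to three decimals.
FREQUENCY = [14.932, 14.253, 11.538, 11.538, 7.919, 7.919, 6.787,
             6.335, 5.882, 4.525, 3.167, 2.489, 2.489, 0.226]
R = pearson(PREDICTED, FREQUENCY)
```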
6.3
Comparison with a model using hand-crafted markedness constraints
We have seen previously that the model is able to learn the frequency distribution of geminate and singleton
consonants based on phonotactic generalisations drawn from a corpus, and the constraints it learns roughly
correspond to those of universal markedness. Therefore, we can conclude that there is no specific need for
equipping the learner with particular information about geminate and singleton markedness. However, it
is important to see how well a model provided with such information would perform compared to the one
discussed in the previous subsection.
This model was implemented using the Maxent Grammar Tool (Hayes and Wilson (2009)). Just like
the learning model, this one is also based on the principle of maximum entropy. However, it uses handcrafted
constraints instead of discovering new constraints based on phonotactic generalisations. It has to be provided
with OT tableaus containing the relevant (unranked) constraints and the observed probability of each possible
candidate. Based on this information, the model will weight the constraints and assign predicted probabilities
to each form, which may or may not match up with the actual observed probabilities.
6.3.1
Input data
OT tableaus and constraints were used as input data. Each tableau contains a singleton and a geminated
version of a segment representing a consonant class, violation marks for constraints and observed probabilities for each possible output candidate. We used the following markedness constraints (for both singletons
and geminates):
*zz: Geminated voiced fricatives are forbidden.
*ss: Geminated voiceless fricatives are forbidden.
*tt: Geminated voiceless stops are forbidden.
*dd: Geminated voiced stops are forbidden.
*t:s: Geminated voiceless affricates are forbidden.
*nn: Geminated nasals are forbidden.
*ll: Geminated liquids are forbidden.
*z: Singleton voiced fricatives are forbidden.
*s: Singleton voiceless fricatives are forbidden.
*t: Singleton voiceless stops are forbidden.
*d: Singleton voiced stops are forbidden.
*ts: Singleton voiceless affricates are forbidden.
*n: Singleton nasals are forbidden.
*l: Singleton liquids are forbidden.
These constraints are given unranked to the learner, which will weight them based on the observed probabilities and constraint violations of possible output candidates.
6.3.2
Results
The results of the simulation are shown in Table 26.
Consonant class                      Observed probability    Predicted probability
voiced fricatives (singleton)        0.92                    0.9199511662223586
voiced fricatives (geminate)         0.08                    0.08004883377764128
voiceless fricatives (singleton)     0.54                    0.5399967938614894
voiceless fricatives (geminate)      0.46                    0.4600032061385106
voiceless stops (singleton)          0.51                    0.5099991991080961
voiceless stops (geminate)           0.49                    0.490000800891904
voiced stops (singleton)             0.65                    0.6499876212125152
voiced stops (geminate)              0.35                    0.3500123787874848
voiceless affricates (singleton)     0.35                    0.3500123787874848
voiceless affricates (geminate)      0.65                    0.6499876212125152
nasals (singleton)                   0.76                    0.7599769430688887
nasals (geminate)                    0.24                    0.24002305693111126
liquids (singleton)                  0.72                    0.7199811163658887
liquids (geminate)                   0.28                    0.28001888363411126
Table 26: Observed and predicted probabilities of geminates and singletons by consonant class
The table shows clearly that predicted probabilities line up almost exactly with observed probabilities. There is
a perfect positive correlation between predicted and observed probabilities (r=1), which is plotted below in Figure
10.
Figure 10: Correlation between probabilities observed in the corpus and predicted by the model
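How a two-candidate tableau turns weights into probabilities can be sketched as follows. This is not the fitting procedure of the Maxent Grammar Tool, which estimates weights numerically; the closed-form weights and both function names are assumptions used to illustrate why a grammar with one constraint per candidate can match the observed shares so closely.

```python
import math

def tableau_probabilities(w_singleton, w_geminate):
    """Each candidate violates only its own constraint, exactly once."""
    p_sing = math.exp(-w_singleton)
    p_gem = math.exp(-w_geminate)
    z = p_sing + p_gem  # normalisation within the tableau
    return p_sing / z, p_gem / z

def fit_weights(obs_singleton, obs_geminate):
    """Closed-form nonnegative weights: the rarer candidate carries the weight."""
    diff = math.log(obs_singleton / obs_geminate)
    return (0.0, diff) if diff >= 0 else (-diff, 0.0)

# Observed shares for voiced fricatives and voiceless affricates (Table 26).
FRICATIVES = tableau_probabilities(*fit_weights(0.92, 0.08))
AFFRICATES = tableau_probabilities(*fit_weights(0.35, 0.65))
```

Because only the difference between the two weights matters within a tableau, the singleton and geminate forms of a class are compared directly with each other.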
We can conclude that both models predict the distribution of singletons and geminates almost equally well.
However, the model which uses handcrafted constraints and OT tableaus has one advantage over the one
which discovers constraints based on phonotactic generalisations drawn from the corpus: it makes it possible to
compare the singleton and the geminate form of each consonant class directly.
7
Testing Hypothesis 4: Perception of vowel length
There is no literature on the length of short vowels before geminates in Hungarian, but in many (including Finno-Ugric) languages, vowels shorten before geminates5 (Kawahara (to appear), Doty et al. (2007)).
Based on evidence from other languages, we could hypothesise that gemination in Hungarian loanwords is a
strategy to preserve source vowel length (shortness), if native speakers of Hungarian perceive the vowel in
the source word as shorter than its ‘substitute’ vowel in the loanword. The goal of this section is to test this
hypothesis.
7.1
Adaptation of English vowels
Native speakers substitute foreign vowels with the perceptually closest one in the native vowel inventory,
that is, the one which they consider to be the most similar to the source vowel. Some of the substitute vowels
may not seem phonetically ‘close’ to the original; however, Hungarian speakers regard them as the best
available replacement for the source vowel. Since English, German and Hungarian all
have short and long vowels, faithfulness to source-vowel length can easily be satisfied in the process of
borrowing words from English and German into Hungarian. However, as Hungarian does not have exactly
the same vowel inventory as English and German, faithfulness to vowel quality cannot be achieved as
easily as the preservation of vowel length. At first sight, it looks as if faithfulness to vowel length is
ranked higher than faithfulness to vowel quality, but this claim cannot be proven. Faithfulness to vowel
quality will practically always be violated given the different vowel inventories of the source language and
the borrowing language, and we cannot find examples in which both constraints could be
satisfied but one of them clearly takes priority. Probably the only example that shows this kind of pattern
is the way the word spray [sprej] was borrowed into Hungarian. It is pronounced as [spre:] or [Spre:] as a
loanword, which involves a source-word diphthong changing into a monophthong, even though words like
éj [e(:)j] or kéj [ke(:)j] are perfectly well-formed in Hungarian. The reason why spray was borrowed with
a monophthong is that diphthongs are not found in standard Hungarian: what looks like a diphthong
similar to [ej] is a vowel + liquid sequence, which is never described as a diphthong in Hungarian phonology. Therefore, even this example cannot prove that there is a clear preference for faithfulness to length
over faithfulness to vowel quality without coercion (e.g. the lack of the same vowel in the native Hungarian vowel inventory).
⁵ There are some languages (for example, Japanese) in which it works the opposite way: vowels lengthen before geminates.
Table 27 shows English vowels and their Hungarian counterparts which native speakers use in loanwords.
SW   [I]   [E] / [e]   [æ]   [O]   [A]   [U]
LW   [i]   [E]         [E]   [o]   [O]   [u]
Table 27: English vowels and their Hungarian loan counterparts
German vowels and their Hungarian substitutes in loanwords are shown in Table 28 below.
SW   [I]   [E / e]   [y]   [œ]   [O]   [U]   [a]
LW   [i]   [E]       [y]   [ø]   [o]   [u]   [O]
Table 28: German vowels and their Hungarian loan counterparts
7.2
Perception experiment 1
We ran a pilot experiment in order to see how native speakers of Hungarian perceive vowel length in English
and German words compared to Hungarian words and whether their perception of vowel length plays a role
in loanword gemination.
7.2.1
Participants
Four native Hungarian speakers participated in the experiment. All of them currently reside in Hungary and
have not lived in any other country for an extended period. They speak some English or German, but they are
not conversant with phonetics and do not have constant exposure to English or German spoken by native
speakers.
7.2.2
Speech material
The speech material consists of English, German and Hungarian words recorded by native speakers of
English, German and Hungarian. Word pairs comprise one Hungarian and one English or German word
which are minimal pairs or quasi minimal pairs. In order to avoid considerable differences in duration, we
paired up words which were produced by speakers of similar voice quality. All words in the experiment are
monosyllabic, ending in a short vowel + short voiceless or voiced consonant sequence. None of the words in
any word pair is a loanword. In the case of minimal pairs, the only difference is in the vowel: Hungarian
words contain vowels which are considered to be the most faithful substitute for the vowel in the English
or German member of the word pair. Word pairs used as speech material are listed in Appendix III/A and
III/B.
7.2.3
Task
Word pairs were presented orally to participants. Each word pair consisted of a Hungarian word and an
English word or a Hungarian word and a German word (for example, Hungarian hit [hit] ‘faith’ and English
hit [hIt]). Each word pair was played twice, once in the order Hungarian - English / German and once the
other way around, and the order in which the word pairs followed one another was randomised. Participants
were asked to decide which one of the two words has a shorter vowel, that is, the vowel in which word they
perceive as shorter.
7.2.4
Results
English-Hungarian word pairs
Table 29 shows how participants in the experiment perceived English vowels in comparison to their
Hungarian counterparts. In addition, actual durations of the vowels are included (in seconds).
Word pair     Perceived as shorter   Perceived as longer   Duration (s)
mit [mit]     by 0 subjects          by 4 subjects         0.127382
hit [hIt]     by 4 subjects          by 0 subjects         0.115009
fut [fut]     by 0 subjects          by 0 subjects         0.101144
foot [fUt]    by 0 subjects          by 0 subjects         0.098444
hat [hOt]     by 0 subjects          by 4 subjects         0.117373
but [h2t]     by 4 subjects          by 0 subjects         0.078111
vet [vEt]     by 0 subjects          by 4 subjects         0.130203
set [set]     by 4 subjects          by 0 subjects         0.088622
szid [sid]    by 0 subjects          by 4 subjects         0.116010
hid [hId]     by 4 subjects          by 0 subjects         0.083068
tud [tud]     by 0 subjects          by 0 subjects         0.107769
good [gUd]    by 0 subjects          by 0 subjects         0.105599
had [hOd]     by 0 subjects          by 4 subjects         0.141419
bud [b2d]     by 4 subjects          by 0 subjects         0.087658
szed [sEd]    by 0 subjects          by 4 subjects         0.138829
said [sed]    by 4 subjects          by 0 subjects         0.102780
szed [sEd]    by 4 subjects          by 0 subjects         0.138829
sad [sæd]     by 0 subjects          by 4 subjects         0.149757
Table 29: Vowel length in Hungarian-English word pairs
As is shown above, the four Hungarian native speakers perceived most of the English vowels as shorter than
their Hungarian counterparts. There are two English vowels, however, that they did not perceive as shorter
than their Hungarian counterparts: [U] and [æ]. English [U] and Hungarian [u] were perceived as having
the same length, whereas English [æ] was rated as longer than Hungarian [E]. The judgements given by
participants seem to line up with the actual duration data.
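The claim that judgements line up with durations can be checked mechanically. The following is a minimal sketch, assuming the durations and unanimous judgements of Table 29 as transcribed below; the 5 ms threshold for "same length" and all names (`PAIRS`, `THRESHOLD_S`, `expected_judgement`) are hypothetical choices made for illustration, not part of the experiment.

```python
# Vowel durations (s) and unanimous judgements transcribed from Table 29.
# Each entry: (Hungarian word, English word, Hungarian duration,
# English duration, judged), where judged names the language of the word
# whose vowel all four listeners heard as shorter ("en" or "hu"), or None
# when no listener picked either word.
PAIRS = [
    ("mit", "hit", 0.127382, 0.115009, "en"),
    ("fut", "foot", 0.101144, 0.098444, None),
    ("hat", "but", 0.117373, 0.078111, "en"),
    ("vet", "set", 0.130203, 0.088622, "en"),
    ("szid", "hid", 0.116010, 0.083068, "en"),
    ("tud", "good", 0.107769, 0.105599, None),
    ("had", "bud", 0.141419, 0.087658, "en"),
    ("szed", "said", 0.138829, 0.102780, "en"),
    ("szed", "sad", 0.138829, 0.149757, "hu"),
]

# Hypothetical cut-off: differences below 5 ms count as "same length".
# This value is an assumption of the sketch, not taken from the paper.
THRESHOLD_S = 0.005


def expected_judgement(hu_dur, en_dur, threshold=THRESHOLD_S):
    """Return which vowel should sound shorter given the raw durations."""
    diff = hu_dur - en_dur
    if abs(diff) < threshold:
        return None
    return "en" if diff > 0 else "hu"


# One boolean per word pair: does the duration-based expectation
# agree with what the listeners reported?
results = [
    expected_judgement(hu_dur, en_dur) == judged
    for _, _, hu_dur, en_dur, judged in PAIRS
]
```

Under this assumed threshold, every entry of `results` is true: the two pairs that nobody judged as different are also the two pairs whose durations differ by less than 3 ms.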
German-Hungarian word pairs
Table 30 shows how Hungarian-speaking subjects perceived German vowels in comparison to their Hungarian
counterparts. Vowel duration data are also included in the table (in seconds).
Word pair      Perceived as shorter   Perceived as longer   Duration (s)
kis [kiS]      by 0 subjects          by 4 subjects         0.123737
Tisch [tIS]    by 4 subjects          by 0 subjects         0.096124
mos [moS]      by 0 subjects          by 0 subjects         0.100956
Bosch [boS]    by 0 subjects          by 0 subjects         0.101296
vas [vOS]      by 0 subjects          by 4 subjects         0.137081
wasch [vaS]    by 4 subjects          by 0 subjects         0.092235
tus [tuS]      by 0 subjects          by 4 subjects         0.119037
Fusch [fuS]    by 4 subjects          by 0 subjects         0.085063
köt [køt]      by 0 subjects          by 4 subjects         0.138472
Schött [Sœt]   by 4 subjects          by 0 subjects         0.080
les [lES]      by 0 subjects          by 4 subjects         0.134030
fesch [feS]    by 4 subjects          by 0 subjects         0.087056
Table 30: Vowel length in Hungarian-German word pairs
Participants in the experiment perceived most German vowels as shorter than their Hungarian counterparts. There is only one German vowel which they did not rate as shorter: it is [o]. They perceived German
[o] as having the same length as Hungarian [o].
7.3
Perception experiment 2
7.3.1
Participants
The participants were the same as the ones in the previous perception experiment.
7.3.2
Speech material
The speech material consists of Hungarian minimal pairs or near minimal pairs. All words used
in this experiment are monosyllabic. The two members of each minimal pair contain the same short vowel and
differ only in the length of the final consonant, which is always a voiceless stop: one word ends in
a singleton and the other one in a geminate (e.g. hit [hit] ‘faith’ and hitt [hit:] ‘believe-3rd.p.-past’).
Items are listed in Appendix III/C.
7.3.3
Task
The minimal pairs were played to participants, and participants were asked to decide which word in each
minimal pair has a shorter vowel. All word pairs were played twice, but in a random order: that is, no word
pair was directly followed by its repetition. Word pairs were presented in two different orders: (a) geminate
- singleton and (b) singleton - geminate.
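The presentation scheme described above (every pair played twice, once in each internal order, with no pair directly followed by its repetition) can be sketched as follows. This is only an illustration under stated assumptions: the function name `presentation_order`, the fixed seed and the rejection-sampling approach are hypothetical and are not claimed to be how the stimuli were actually ordered.

```python
import random


def presentation_order(pairs, seed=0):
    """Build a trial list in which each pair occurs twice, once per
    internal order, and never in two adjacent trials."""
    rng = random.Random(seed)
    trials = []
    for singleton, geminate in pairs:
        trials.append((geminate, singleton))  # order (a): geminate - singleton
        trials.append((singleton, geminate))  # order (b): singleton - geminate
    while True:
        rng.shuffle(trials)
        # Reject any shuffle in which the same pair is repeated back to back.
        if all(set(a) != set(b) for a, b in zip(trials, trials[1:])):
            return trials


# Three of the pairs used in the experiment (orthographic forms only).
pairs = [("hit", "hitt"), ("lap", "lapp"), ("luk", "pukk")]
order = presentation_order(pairs)
```

Rejection sampling is adequate here because the item set is small; for long stimulus lists a constructive algorithm would be preferable.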
7.3.4
Results
Table 31 shows how participants in the experiment perceived short vowels followed by singletons and
geminates; the actual durations of the vowels (in seconds) in both contexts are also included.
Word pair      Perceived as shorter   Perceived as longer   Duration (s)
hit [hit]      by 0 subjects          by 4 subjects         0.115988
hitt [hit:]    by 4 subjects          by 0 subjects         0.057
vet [vEt]      by 0 subjects          by 4 subjects         0.142483
vett [vEt:]    by 4 subjects          by 0 subjects         0.090191
lap [lOp]      by 0 subjects          by 4 subjects         0.146527
lapp [lOp:]    by 4 subjects          by 0 subjects         0.107326
lop [lop]      by 0 subjects          by 4 subjects         0.138545
hopp [hop:]    by 4 subjects          by 0 subjects         0.076842
luk [luk]      by 0 subjects          by 4 subjects         0.115937
pukk [puk:]    by 4 subjects          by 0 subjects         0.063932
Table 31: Vowel length before singletons and geminates
It is clearly shown above that all four subjects perceived vowels followed by geminates as shorter than those
followed by singletons. Their judgements line up with durations of vowels in both contexts.
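To give a rough sense of the size of the effect, the durations in Table 31 can be turned into shortening ratios (vowel duration before the geminate divided by vowel duration before the singleton). The sketch below assumes the durations as transcribed from the table; the names `DURATIONS` and `shortening_ratios` are introduced here for illustration only, and with five pairs the mean is no more than a descriptive summary.

```python
# Vowel durations (s) from Table 31, keyed by the singleton member of each
# pair: (duration before singleton, duration before geminate).
DURATIONS = {
    "hit": (0.115988, 0.057),
    "vet": (0.142483, 0.090191),
    "lap": (0.146527, 0.107326),
    "lop": (0.138545, 0.076842),
    "luk": (0.115937, 0.063932),
}


def shortening_ratios(durations):
    """Ratio below 1 means the vowel is shorter before the geminate."""
    return {
        word: before_geminate / before_singleton
        for word, (before_singleton, before_geminate) in durations.items()
    }


ratios = shortening_ratios(DURATIONS)
mean_ratio = sum(ratios.values()) / len(ratios)
```

In this small sample every ratio is below 1, and on average the vowel before a geminate lasts roughly 60% as long as the vowel before the corresponding singleton.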
7.4
Implications for loanword gemination
Although we have not tested all possible short vowel and geminate / singleton combinations, the previous
experiment suggests that native speakers of Hungarian perceive vowels before geminates in closed syllables
as shorter than before singleton consonants. Moreover, among loanwords which undergo gemination,
we cannot find any examples in which the English or German source vowel is
perceived as longer than or of the same length as its Hungarian counterpart and is followed by a singleton consonant
in the source word. (For example, speakers will pronounce English fit as [fit:] but not the name Pat as [pEt:],
since English [I] is perceived as shorter than Hungarian [i], while English [æ] is heard as longer than Hungarian
[E].) Almost all German short vowels were perceived as shorter than their Hungarian counterparts, which
may explain why gemination in German loanwords is more widespread than gemination in English loanwords,
extending to consonants preceded by almost all kinds of short vowels. This implies that there is a connection
between the preservation of source vowel length (shortness) and gemination.
8
Interim summary
The goal of this paper is to explore a cross-linguistically widespread phonological process - gemination in
loanwords - which is also found in Hungarian. In connection with this phenomenon, the following hypotheses
were put to the test in the previous sections: (1) geminate-to-singleton ratios based on type frequencies in the
lexicon line up with patterns of universal geminate markedness, and native speakers have a knowledge of these
patterns; (2) there may be other processes in the native Hungarian phonetics or phonology which are similar
to gemination and universal markedness patterns are reflected in them, too; (3) patterns which correspond
to universal markedness are learnable from the native Hungarian lexicon; (4) gemination in loanwords is a
strategy to preserve source vowel shortness in closed syllables.
In Section 4, Hypothesis 1 was tested in the form of two studies: a corpus study and a wug test. We
found that there is a strong positive correlation between geminate-to-singleton ratios calculated based on
type frequencies in the corpus and based on native speakers’ well-formedness judgements on nonce words,
and the distribution of geminates and singletons by consonant class based on both type frequencies and
human judgements line up with patterns of universal geminate markedness.
The goal of Section 5 was to test Hypothesis 2, that is, to find out whether there are other processes
in the native phonetics/phonology which are similar to loanword gemination and also line up with patterns
of universal markedness. We managed to find one such process, which is post-tonic lengthening in closed
syllables: singleton consonants lengthen following stressed vowels. We have found that in general, consonants
which were more likely to undergo gemination in loanwords were more likely to lengthen post-tonically,
too.
In Section 6, we ran learning simulations to test Hypothesis 3. Results of the experiments suggest that
patterns of loanword gemination - which line up with universal geminate markedness - are learnable from
the native Hungarian lexicon based on phonotactic generalisations. We have also shown that a learner which
is not equipped with special information on segment markedness performs almost as well as a model using
hand-crafted markedness constraints.
Hypothesis 4 was tested in Section 7. We ran two perception experiments, which yielded two results: (1)
native speakers perceived certain English and German vowels as shorter than their Hungarian counterparts
(which are widely used as their substitute vowels in loanwords), and loanword gemination is more likely to
happen in the presence of these vowels; (2) the same native speakers perceived vowels followed by geminates
as shorter than those followed by singletons. All this suggests that gemination in loanwords can be a strategy
to preserve the length (shortness) of the vowel in the source word.
We can conclude that all of our hypotheses have been confirmed, and that gemination in loanwords
is motivated by the need to preserve source-vowel length (shortness), but this need is not always satisfied:
whether gemination occurs or not is regulated by universal geminate markedness. With all this in mind, we
have all ingredients ready for an analysis.
9
Analysis: A Maximum Entropy model
As mentioned earlier, Hungarian loanword gemination is a gradient phenomenon with considerable inter- and intra-speaker variation. Therefore, we are going to develop a maximum entropy model instead of providing
a categorical OT account.
9.1
The method
The method is the same as in Section 6.3, that is, a maxent model using handcrafted constraints and
observed probabilities to weight constraints and predict probabilities for the possible output forms. It was
implemented with the help of the Maxent Grammar Tool (Hayes and Wilson (2008)).
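For reference, the way a maxent grammar turns weighted violations into probabilities can be written as follows. This is the standard textbook formulation, given here as an aid to the reader rather than as a description of the internals of the Maxent Grammar Tool; the symbols w_i, C_i, H and \Omega(x) are notation introduced for this illustration.

```latex
H(y) = \sum_{i} w_i \, C_i(y),
\qquad
P(y \mid x) = \frac{\exp\bigl(-H(y)\bigr)}{\sum_{z \in \Omega(x)} \exp\bigl(-H(z)\bigr)}
```

Here C_i(y) is the number of times candidate y violates constraint i, w_i \geq 0 is the weight of that constraint, H(y) is the resulting penalty (harmony) score, and \Omega(x) is the set of competing candidates for input x. A candidate with no violations has H(y) = 0, and the more heavily weighted violations a candidate incurs, the smaller its share of the probability mass.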
9.2
Constraints
In Section 6.3, we tested geminate markedness, therefore only markedness constraints were used. In this
case, we need both markedness and faithfulness constraints.⁶
9.2.1
Faithfulness constraints
IdentSV-LV(length) (abbreviated IdSV-LV(length) in the tableaus and IdSL(l) in the discussion): Vowel length of the source word must be preserved in the loanword.
MaxOrthGem (abbreviated MaxOG): If the consonant in question is spelt with a double letter in the source word, it must be
geminated in the loanword.
9.2.2
Markedness constraints
The markedness constraints below are not yet listed in the hierarchical order of universal geminate
markedness. When ranked (as in a categorical OT analysis) or weighted (as in a maxent model), they reflect
the hierarchy of universal geminate markedness.
*zz: Geminated voiced fricatives are forbidden.
*ss: Geminated voiceless fricatives are forbidden.
*tt: Geminated voiceless stops are forbidden.
*dd: Geminated voiced stops are forbidden.
*t:s (written *tts in the tableaus): Geminated voiceless affricates are forbidden.
*nn: Geminated nasals are forbidden.
*ll: Geminated liquids are forbidden.
⁶ Apart from those listed in this section, there are two additional constraints which could be used. One is a more general
Ident constraint, IdSL, which requires faithfulness to the vowel in the source word, that is, the same vowel must be used in
the loanword as in the source word. However, this constraint is violated by all winning candidates, and the more specific
constraint IdSL(length) suffices. The other constraint would be *nnv, which bans all vowels which are not
native to the Hungarian vowel inventory from surfacing. However, for the sake of simplicity, this constraint will also be ignored in the present
analysis.
9.3
Input data
9.3.1
Words
We are going to use the following words to train and test the grammar and find constraint weights.
szett [sEt:] ‘set or outfit’
Pat [pEt] <diminutive of Patrick or Patricia>
Ted [tEd] <diminutive of Edward or Theodore>
giccs [git:S] ‘kitsch’
hall [hOl:] ‘hall’
Hal [hEl] <diminutive of Harry>
dzsem [dZEm] or [dZEm:] ‘marmalade’
friss [friS:] ‘fresh’
dzsessz [dZEs:] ‘jazz’
9.3.2
OT tableaus
The input is not an abstract underlying representation in this case, but both the written and the spoken
form of the source word. The following tableaus are not classic OT tableaus in the sense that constraints are
not ranked. Therefore, fatal violations are not marked, and instead of winners, probabilities are indicated.
Since gemination in loanwords exhibits much inter- and intra-speaker variation, oftentimes there is no
categorical winner among the candidates. Probabilities are assigned to possible surface forms, ranging
from 0 (not attested) to 1 (most likely). Probabilities are based on Nádasdy’s (1989) description, my own
judgements, personal conversations with native linguists (Miklós Törkenczy, Péter Rebrus, Péter Siptár,
Etelka Tekla Gráczi) and several native speakers.
Constraints will be weighted by the model, therefore, they are not ranked in the tableaus.
Tableau 1 shows a word which does not contain a source-word orthographic geminate and ends in a voiceless
stop. It was borrowed from English through German. Most native speakers prefer the surface form containing
a low mid vowel [E] followed by a geminated [t].
set /set/       Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. sEt          0.1    *
b. sEt:         0.9                                  *
Tableau 1: szett ‘set/outfit’
Candidate (a) violates IdSL(l), since [E] was perceived as longer than English [e] by native speakers of
Hungarian, while candidate (b) violates *tt, since it contains a geminated voiceless stop.
Tableau 2 contains an example of a word ending in a singleton voiceless stop preceded by [æ] in the source word and by [E] in
the loanword. Candidate (a), that is, a surface form ending with a singleton [t], is preferred by most native
speakers of Hungarian over candidate (b).
Pat /pæt/       Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. pEt          0.9
b. pEt:         0.1    *                             *
Tableau 2: Pat <name>
Candidate (a) does not violate any constraints. Candidate (b) violates two constraints, IdSL(l) and *tt. It
violates IdSL(l) because English [æ] was not perceived as shorter than Hungarian [E] by native Hungarian speakers.
If the Hungarian vowel (which is not longer than the English source vowel) is followed by a geminate, it
will shorten further and therefore become even more different in length from the source vowel. Candidate
(b) also violates *tt because it ends with a geminated voiceless stop.
A foreign name well-known to Hungarian speakers and ending in a voiced stop is shown in Tableau 3. Most
people prefer candidate (a), a surface form ending in a singleton [d].
Ted /ted/       Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. tEd          0.9    *
b. tEd:         0.1                                        *
Tableau 3: Ted <diminutive of Edward or Theodore>
Candidate (a) violates IdSL(l) because native speakers perceived English [e] as shorter than Hungarian
[E]. Candidate (b) violates *dd, since it ends with a geminated voiced stop.
Tableau 4 shows a word which was borrowed into Hungarian with a geminated voiceless affricate.⁷ The only
acceptable form of this loanword is candidate (b), which ends in [t:S].
Kitsch /kItS/   Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. gitS         0.0    *
b. git:S        1.0                                              *
Tableau 4: giccs ‘kitsch, gaudy stuff’
Candidate (a) violates IdSL(l) because native speakers perceived German [I] as shorter than Hungarian [i].
Candidate (b) violates *tts, as it ends with a geminated voiceless affricate.
The following word (shown in Tableau 5) is an example of a borrowing ending in a geminated liquid. The
source word is German and contains a double consonant letter in the spelling. The most widespread form
of this loanword is candidate (b), that is, the one ending with a geminate [l].
Hall /hal/      Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. hOl          0.1    *                *
b. hOl:         0.9                                                          *
Tableau 5: Hall ‘hall’
Candidate (a) violates two constraints: IdSL(l) because native speakers perceived German [a] as shorter
than Hungarian [O], and MaxOG, since the source word is spelt with a double consonant letter. Candidate
(b) violates only one constraint, *ll, because it ends with a liquid geminate.
Similarly to Pat, the example in the following tableau (Tableau 6) is not a loanword per se, but it is a
common English name that many Hungarians know from films or books. People pronounce it with a short
[l].
⁷ In this paper, we are not going into details concerning why the [k] has changed into a [g] in the loanword.
Hal /hæl/       Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. hEl          1.0
b. hEl:         0.0    *                                                     *
Tableau 6: Hal <name>
Candidate (a) does not violate any constraints. Candidate (b) violates two constraints: IdSL(l) because
native speakers did not perceive English [æ] as shorter than Hungarian [E], and *ll for containing a
geminated liquid.
An example of a loanword optionally ending in a short or a long nasal is shown in Tableau 7. The source
word which the loanword is based on contains a singleton consonant spelt with a single consonant letter.
The original source is an English word, but it was borrowed into Hungarian through German.⁸
Jam /dZem/      Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. dZEm         0.5    *
b. dZEm:        0.5                                                    *
Tableau 7: dzsem ‘marmalade’
Candidate (a) violates IdSL(l) as native speakers perceived German [e] as shorter than Hungarian [E],
while candidate (b) violates *nn, because it ends with a geminated nasal.
The example in Tableau 8 is a loanword borrowed from German, which ends in a voiceless fricative. The
source word does not contain an orthographic geminate, only a trigraph which represents [S] in German
spelling. The winner is a form which contains an [i] instead of the source vowel [I], and always ends in a
geminate [S].
frisch /frIS/   Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. friS         0.0    *
b. friS:        1.0                            *
Tableau 8: friss ‘fresh’
Candidate (a) violates IdSL(l) as native speakers perceived German [I] as shorter than Hungarian [i]. As
candidate (b) contains a long voiceless fricative, it violates *ss.
The following word (shown in Tableau 9) comes from English, has a double consonant letter in the spelling
of the source word and a devoiced geminate in the loanword for most native speakers of Hungarian.
jazz /dZæz/     Prob.  IdSV-LV(length)  MaxOG  *ss   *tt   *dd   *tts  *nn   *ll   *zz
a. dZEz         0.05                    *
b. dZEz:        0.05   *                                                           *
c. dZEs:        0.9    *                       *
Tableau 9: dzsessz ‘jazz’
Candidate (a) violates MaxOG because the source word is spelt with a double consonant letter. Both
candidates (b) and (c) violate IdSL(l). Apart from that constraint, candidate (b) violates *zz for containing
a long voiced fricative, while candidate (c) violates *ss, since it ends with a geminated voiceless fricative.
As mentioned earlier in this subsection, the above tableaus do not present a full OT analysis of the data:
instead, they show input forms (both spelling and pronunciation of the source word), possible surface forms
and their violation marks for each constraint, and the probability score of each candidate. What they do not
show is fatal violations, since the constraints are not ranked. In the following subsection, we will find out
the weights for the constraints and see if the model is able to make predictions that line up with the actual
observed probabilities.
⁸ That is the reason why we are using [e] instead of [æ].
9.4
Results
9.4.1
Constraint weights
Based on the input data, the learner has assigned the following weights to the constraints listed in 9.2.
Constraints with higher weights are more powerful than the ones with lower weights. Constraints and
weights are shown in Table 32.
Constraint   Weight
*zz          8.660745165047155
*ll          6.573670003683399
MaxOG        5.8290854593356105
*dd          5.139753175751042
IdSL(l)      2.9424513636090874
*nn          2.942214625961538
*tt          1.3375742260864934
*t:s         0.01589221942517886
*ss          0.0
Table 32: Constraints and their weights
The weights assigned to markedness constraints correspond quite closely to hierarchies of universal geminate
markedness. Constraints banning geminated voiced fricatives, liquids and voiced stops have much higher
weights than those not allowing voiceless stops, fricatives and affricates. However, the ranking between
constraints banning geminated voiceless consonants could be made more precise by providing the learner
with more input data.
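To make the role of the weights concrete, the sketch below converts the weights of Table 32 into candidate probabilities for three inputs (Hall, Hal and Jam), using the violation profiles stated in the discussion of Tableaus 5, 6 and 7. It is a minimal re-implementation of the standard maxent formula, not the Maxent Grammar Tool itself; the names `WEIGHTS`, `TABLEAUS` and `maxent_probabilities` are hypothetical, and the sketch has been checked only against these three inputs.

```python
import math

# Constraint weights as reported in Table 32.
WEIGHTS = {
    "*zz": 8.660745165047155,
    "*ll": 6.573670003683399,
    "MaxOG": 5.8290854593356105,
    "*dd": 5.139753175751042,
    "IdSL(l)": 2.9424513636090874,
    "*nn": 2.942214625961538,
    "*tt": 1.3375742260864934,
    "*t:s": 0.01589221942517886,
    "*ss": 0.0,
}

# Violation profiles as described for Tableaus 5, 6 and 7:
# candidate -> list of violated constraints.
TABLEAUS = {
    "Hall": {"hOl": ["IdSL(l)", "MaxOG"], "hOl:": ["*ll"]},
    "Hal": {"hEl": [], "hEl:": ["IdSL(l)", "*ll"]},
    "Jam": {"dZEm": ["IdSL(l)"], "dZEm:": ["*nn"]},
}


def maxent_probabilities(candidates, weights):
    """Probability of each candidate: exp(-penalty), normalised per input."""
    penalties = {
        cand: sum(weights[c] for c in violated)
        for cand, violated in candidates.items()
    }
    z = sum(math.exp(-p) for p in penalties.values())
    return {cand: math.exp(-p) / z for cand, p in penalties.items()}


probs = {
    word: maxent_probabilities(cands, WEIGHTS)
    for word, cands in TABLEAUS.items()
}
```

For these three inputs the resulting probabilities agree with the predicted values in Table 33; for instance, the near-identical weights of IdSL(l) and *nn are what yield the roughly even split between [dZEm] and [dZEm:].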
9.4.2
Predicted and observed probabilities of competing candidates
Table 33 shows the observed probability of each possible output form and the predicted scores assigned to
the same forms by the model.
Input           Candidate   Observed   Predicted
set /set/       sEt         0.1        0.05009449622784214
set /set/       sEt:        0.9        0.9499055037721578
Pat /pæt/       pEt         0.9        0.9499055037721581
Pat /pæt/       pEt:        0.1        0.05009449622784202
Ted /ted/       tEd         0.9        0.8998973370338739
Ted /ted/       tEd:        0.1        0.10010266296612606
Kitsch /kItS/   gitS        0.0        0.050094496227842074
Kitsch /kItS/   git:S       1.0        0.9499055037721579
Hall /hal/      hOl         0.1        0.09994221307400378
Hall /hal/      hOl:        0.9        0.9000577869259961
Hal /hæl/       hEl         1.0        0.9999263506337052
Hal /hæl/       hEl:        0.0        0.00007364936629
Jam /dZem/      dZEm        0.5        0.4999408155883891
Jam /dZem/      dZEm:       0.5        0.5000591844116109
frisch /frIS/   friS        0.0        0.050094496227842074
frisch /frIS/   friS:       1.0        0.9499055037721579
jazz /dZæz/     dZEz        0.05       0.0501743010704173
jazz /dZæz/     dZEz:       0.05       0.05005760161197584
jazz /dZæz/     dZEs:       0.9        0.8997680973176069
Table 33: Predicted and observed probabilities of candidates
Even without calculating the correlation between predicted and observed probability scores, it is easy
to see that the model’s predictions closely correspond to native speakers’ preferences (that is, observed
probabilities).
The correlation between observed and predicted scores is close to 1 (r = 0.9969941), which is plotted as
Figure 11.
Figure 11: Correlation between observed and predicted probabilities
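The correlation coefficient can be recomputed from the candidate-level values in Table 33. The sketch below is a plain Pearson correlation written out by hand; the names `ROWS` and `pearson_r` are introduced for illustration. One assumption should be flagged: the observed probabilities for Ted are taken from Tableau 3 (0.9 / 0.1).

```python
import math

# (candidate, observed, predicted) for every candidate in Table 33.
# Assumption: observed values for tEd / tEd: follow Tableau 3 (0.9 / 0.1).
ROWS = [
    ("sEt", 0.1, 0.05009449622784214),
    ("sEt:", 0.9, 0.9499055037721578),
    ("pEt", 0.9, 0.9499055037721581),
    ("pEt:", 0.1, 0.05009449622784202),
    ("tEd", 0.9, 0.8998973370338739),
    ("tEd:", 0.1, 0.10010266296612606),
    ("gitS", 0.0, 0.050094496227842074),
    ("git:S", 1.0, 0.9499055037721579),
    ("hOl", 0.1, 0.09994221307400378),
    ("hOl:", 0.9, 0.9000577869259961),
    ("hEl", 1.0, 0.9999263506337052),
    ("hEl:", 0.0, 0.00007364936629),
    ("dZEm", 0.5, 0.4999408155883891),
    ("dZEm:", 0.5, 0.5000591844116109),
    ("friS", 0.0, 0.050094496227842074),
    ("friS:", 1.0, 0.9499055037721579),
    ("dZEz", 0.05, 0.0501743010704173),
    ("dZEz:", 0.05, 0.05005760161197584),
    ("dZEs:", 0.9, 0.8997680973176069),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


observed = [obs for _, obs, _ in ROWS]
predicted = [pred for _, _, pred in ROWS]
r = pearson_r(observed, predicted)
```

Under this assumption, `r` comes out very close to the reported coefficient.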
10
Conclusions
This paper discussed a cross-linguistically widespread phenomenon of loanword phonology, namely gemination in loanwords, which in Hungarian is rather different from the same process in other languages. It has been claimed
not to be motivated by native Hungarian phonology and phonotactics, but the patterns of loanword gemination seem to line up with universal hierarchies of geminate markedness. Our goal was to find out what
the motivation is for this process and whether it is regulated by universal markedness or native Hungarian
phonotactics.
We found that type frequency distributions of singleton and geminate consonants in the native Hungarian lexicon line up with universal hierarchies of geminate markedness. Speakers are also aware of these
patterns: their judgements of nonce words also line up with universal geminate markedness. Apart from
loanword gemination, frequency distributions and native speakers’ judgements, we discovered another phenomenon which reflects universal patterns of geminate markedness: this is a subphonemic process, which is
referred to as post-tonic lengthening. Furthermore, we have shown by a learning simulation that geminate
markedness hierarchies can be learned from the native Hungarian lexicon based on phonotactic generalisations, which disproves the claim that the propensity of some consonants to geminate in loanwords or the
lack of gemination in the case of others has nothing to do with native Hungarian phonotactics.
We also showed that native speakers of Hungarian perceive vowels followed by geminates to be shorter
than those followed by singleton consonants. They also perceived certain English and German vowels shorter
than the Hungarian vowels which are generally used as their substitute vowels in Hungarianised loanwords,
and gemination occurs only in words which contain those vowels. This seems to indicate that gemination in
loanwords is a strategy to preserve the shortness of the source word vowel in the loanword.
Finally, we provided a maximum entropy model to account for the non-categorical nature of the
phenomenon. The model was able to predict the probability of each possible output form closely and
assigned weights to the markedness constraints in a way that lines up with universal hierarchies.
However, there are additional issues that we will have to address in the future and that are outside of the
scope of this paper.
In the nonce well-formedness judgement task, subjects were asked to choose from two monosyllabic words
and decide which one would be a more well-formed Hungarian word or Hungarianised loanword. Therefore,
when they were trying to make a decision, they were looking at full word forms and were influenced by
several additional factors, including similarity of a given string to existing word endings (for example, if a
nonce word looks like boz, it is likely to be preferred over bozz not only because of geminate markedness, but
as a result of its similarity to the ending of the existing word doboz ‘box’). At the same time, in the learning
experiment, we were training and testing word endings or rhymes. It would be interesting to see how the
learner performs if we try to find phonotactic restrictions for whole monosyllabic word forms.
There is another factor that we did not consider in our analysis, but it might play a role in the gemination
of consonants in loan monosyllables. In a data collection carried out using a reverse-alphabetised dictionary
(Papp (1969)), we found that when a source word ends in a vowel + consonant
sequence which is unattested in Hungarian, that is, a rhyme which is missing from the language (for example, there are no Hungarian words
ending in -up or -ip, which may or may not be an accidental gap), gemination is more likely to occur in the
loanword. However, this generalisation is very hard to express in a formal analysis. If a short vowel + short
consonant sequence is unattested, but at the same time, the geminated form is equally unattested, why would
the word be borrowed with a geminate consonant? Is it some sort of a minimal word requirement?
In the perception experiment, we used recordings by native speakers of Hungarian, English and German.
The speakers reading out word pairs to be compared had similar voice qualities, in order to exclude drastic
individual differences of vowel length. However, ‘similar voice quality’ is a rather nebulous concept and is
hard to verify, therefore it would be more advantageous to ask bilingual people (Hungarian-English and
Hungarian-German) to record words in both languages. In the same experiments, we used recordings by
speakers of British English (that is why we are using the symbol [e] instead of [E]). It would be interesting
to see which vowels the subjects would perceive as shorter, if we used recordings by American speakers who
pronounce [E], which is very similar to what Hungarian speakers use.
An interesting observation which may be related to the vowel length issue was made by Törkenczy
(1989). The observation is that in loanwords which have complex onsets, gemination is almost completely
predictable. It would also be worth exploring why this is the case.
Apart from the perception of source-loan vowel length, consonant length is also worth examining. It
would be interesting to see, as in the perception experiment discussed above, how native speakers
of Hungarian perceive consonant length in contexts where loanword gemination applies. If they perceive
English and German consonants as shorter in monosyllables when preceded by short vowels, this could also be a
primary motivation for gemination in loanwords.
References
D. Albro. Evaluation, implementation, and extension of primitive Optimality Theory. Master’s thesis, UCLA, Los
Angeles, CA, 1998.
D. Albro. Computational Optimality Theory and the phonological system of Malagasy. PhD dissertation, UCLA,
Los Angeles, CA, 2005.
A. Anttila. Variation in Finnish phonology and morphology. PhD thesis, Stanford University, 1997.
D. Bates, M. Maechler, and B. Bolker. lme4: Linear mixed-effects models using S4 classes. 2011. URL
http://CRAN.R-project.org/package=lme4.
A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. A Maximum Entropy approach to natural language processing.
Computational Linguistics 22:39-71, 1996.
P. Boersma. How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences
of the University of Amsterdam 21, 43-58, 1997.
P. Boersma and D. Weenink. Praat, Version 5.3.26. 2012. URL www.praat.org.
S. A. Della Pietra, V. J. Della Pietra, and J. D. Lafferty. Inducing features of random fields. IEEE Transactions on
Pattern Analysis and Machine Intelligence 19:380-393, 1997.
J. Eisner. Efficient generation in primitive Optimality Theory. Proceedings of the 35th Annual Meeting of the
Association for Computational Linguistics, 313-320, 1997.
J. Eisner. Review of Kager: “Optimality Theory”. Computational Linguistics, 26(2):286-290, 2000.
J. Eisner. Expectational semirings: Flexible EM for finite-state transducers. In: G. van Noord (ed.), Proceedings of
the ESSLLI Workshop on Finite-State Methods in NLP (FSMNLP), 2001.
J. Eisner. Parameter estimation for probabilistic finite state transducers. Proceedings of the 40th Annual Meeting of
the Association for Computational Linguistics, 1-8, 2002.
T. M. Ellison. Phonological derivation in Optimality Theory. Proceedings of the Fifteenth International Conference
on Computational Linguistics, 1007-1013, 1994.
E. Farnetani and S. Kori. Effects of syllable and word structure on segmental durations in spoken Italian. Speech
Communication, 5:17–34, 1986.
S. Goldwater and M. Johnson. Learning OT Constraint Rankings Using a Maximum Entropy Grammar. In: J.
Spenader, A. Eriksson and Ö. Dahl (eds.), Proceedings of the Stockholm Workshop on Variation within Optimality
Theory, 111-120. Stockholm: Stockholm University, Department of Linguistics, 2003.
P. Halácsy, A. Kornai, L. Németh, A. Rung, I. Szakadát, and V. Trón. Creating open language resources for
Hungarian. Proceedings of the 4th International Conference on Language Resources and Evaluation, 2004.
B. Hayes. Gradient well-formedness in Optimality Theory. In: J. Dekkers, F. van der Leeuw, and J. van de Weijer
(eds.), Optimality Theory: Phonology, Syntax and Acquisition. Oxford University Press, Oxford, 2000.
B. Hayes and C. Wilson. A Maximum Entropy Model of Phonotactics and Phonotactic Learning. Linguistic Inquiry,
2008.
G. C. Jäger. Maximum entropy models and stochastic Optimality Theory. Rutgers Optimality Archive ROA-625,
2004.
F. Jelinek. Statistical methods for speech recognition. Cambridge, MA: MIT Press, 1999.
D. Karvonen. The Emergence of the Unmarked in Finnish loanword phonology. Paper presented at the 17th Manchester Phonology Meeting, 2009.
S. Kawahara. Sonorancy and geminacy. University of Massachusetts Occasional Papers in Linguistics 32: Papers in
Optimality III, 2007.
F. Keller. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. PhD thesis,
University of Edinburgh, 2000.
F. Keller. Optimality-Theoretic Lexical Functional Grammar. In: P. Merlo and S. Stevenson (eds.), The Lexical Basis
of Sentence Processing: Formal, Computational and Experimental Issues, 59-74. John Benjamins, Amsterdam, The
Netherlands, 2002.
F. Keller and A. Asudeh. Probabilistic learning algorithms and Optimality Theory. Linguistic Inquiry, 33(2):225-244,
2002.
Zs. Kertész. Approaches to the phonological analysis of loanword adaptation. The Even Yearbook 7, Department of
English Linguistics, Eötvös Loránd University, Budapest, 2006.
D. Klein and C. Manning. Maxent models, conditional estimation, and optimization, without the magic. Tutorial
presented at NAACL-03 and ACL-03, 2003.
B. Krishnamurti and J. P. L. Gwynn. A Grammar of Modern Telugu. Oxford University Press, 1985.
H. Kubozono, J. Ito, and A. Mester. Consonant gemination in Japanese loanword phonology: A phonological account.
Proceedings of the 18th International Congress of Linguists, 2008.
L. Magyar. Hungarian Vacillating Stems: A Statistical and Optimality Theoretic Account. MA thesis, University of
Pannonia, 2009.
L. Magyar. Learnability of word-final gemination in loan monosyllables. Squib for 24.981, 2014.
C. Manning and H. Schütze. Foundations of statistical natural language processing. Cambridge, MA: MIT Press,
1999.
Á. Nádasdy. Consonant length in recent borrowings into Hungarian. Acta Linguistica Hungarica, 39, 1989.
N. Nagy and B. Reynolds. Optimality theory and variable word-final deletion in Faetar. Language Variation and
Change, 9:37-55, 1997.
F. Papp. Reverse-Alphabetized Dictionary of the Hungarian Language. Akadémiai Kiadó, Budapest, 1969.
D. Passino. Adaptation of loanwords and licensing strategies in Italian. Paper presented at the 12th Manchester
Phonology Meeting, 2004.
R. Podesva. Segmental constraints on geminates and their implications for typology. LSA Annual Meeting, 2002.
A. Prince and B. Tesar. Learning phonotactic distributions. Technical Report TR-54, Rutgers Center for Cognitive
Science, Rutgers, ROA-353, 1999.
D. Pulleyblank and W. J. Turkel. Optimality Theory and learning algorithms: The representation of recurrent
featural asymmetries. In: J. Durand and B. Laks (eds.), Current trends in phonology: Models and methods, pages
653-684, University of Salford, 1996.
R Core Team. R: A language and environment for statistical computing. 2013. URL http://www.R-project.org.
J. Riggle. Generation, recognition, and learning in finite state Optimality Theory. PhD dissertation, UCLA, Los
Angeles, CA, 2004.
C. O. Ringen and O. Heinämäki. Variation in Finnish Vowel Harmony. Natural Language and Linguistic Theory
17:303-337, 1999.
R. Rosenfeld. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and
Language 10:187-228, 1996.
C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal 27(3):379-423, 1948.
S. N. Sridhar. Kannada. New York: Routledge, 1990.
D. Steriade. Sources of markedness and why they matter. GLOW, Markedness Workshop, 2004.
B. Tesar and P. Smolensky. The learnability of Optimality Theory: An algorithm and some complexity results. Ms.,
Department of Computer Science and Institute of Cognitive Science, University of Colorado, Boulder. Rutgers
Optimality Archive ROA-2, 1993.
M. Törkenczy. Does the onset branch in Hungarian? Acta Linguistica Hungarica, 39, 1989.
Appendix I: Wug test items
The following nonce words were used as items in a forced-choice well-formedness judgement task in which
participants were asked to choose the word that they found more acceptable as a Hungarian word or a
Hungarianised loanword. The items were presented in a randomised order (with the singleton consonant as
the first option or the other way round). Filler items were nonce word pairs consisting of a monosyllabic word
ending in a long vowel + short consonant sequence and a word ending in a short vowel + short
consonant sequence (such as nab [nOb] - náb [na:b]). Target items are listed here.
nab [nOb]
nabb [nOb:]
zuc [zuts]
zucc [zut:s]
cen [tsEn]
cenn [tsEn:]
lür [lyr]
lürr [lyr:]
toz [toz]
tozz [toz:]
dem [dEm]
demm [dEm:]
par [pOr]
parr [pOr:]
zot [zot]
zott [zot:]
fub [fub]
fubb [fub:]
zucs [zutS]
zuccs [zut:S]
nüd [nyd]
nüdd [nyd:]
tög [tøg]
tögg [tøg:]
nal [nOl]
nall [nOl:]
pöl [pøl]
pöll [pøl:]
nus [nuS]
nuss [nuS:]
füsz [fys]
füssz [fys:]
zik [zik]
zikk [zik:]
zosz [zos]
zossz [zos:]
küt [kyt]
kütt [kyt:]
nis [niS]
niss [niS:]
tid [tid]
tidd [tid:]
lüf [lyf]
lüff [lyf:]
dic [dits]
dicc [dit:s]
deg [dEg]
degg [dEg:]
zuk [zuk]
zukk [zuk:]
bif [bif]
biff [bif:]
cil [tsil]
cill [tsil:]
zor [zor]
zorr [zor:]
zib [zib]
zibb [zib:]
dum [dum]
dumm [dum:]
det [dEt]
dett [dEt:]
nuz [nuz]
nuzz [nuz:]
gak [gOk]
gakk [gOk:]
szöf [søf]
szöff [søf:]
decs [dEtS]
deccs [dEt:S]
tac [tOts]
tacc [tOt:s]
zocs [zotS]
zoccs [zot:S]
zom [zom]
zomm [zom:]
zop [zop]
zopp [zop:]
pesz [pEs]
pessz [pEs:]
cez [tsEz]
cezz [tsEz:]
död [død]
dödd [død:]
nad [nOd]
nadd [nOd:]
nül [nyl]
nüll [nyl:]
peb [pEb]
pebb [pEb:]
mec [mEts]
mecc [mEt:s]
nacs [nOtS]
naccs [nOt:S]
naf [nOf]
naff [nOf:]
lig [lig]
ligg [lig:]
dök [døk]
dökk [døk:]
nüm [nym]
nümm [nym:]
masz [mOs]
massz [mOs:]
kob [kob]
kobb [kob:]
zöb [zøb]
zöbb [zøb:]
lüb [lyb]
lübb [lyb:]
küz [kyz]
küzz [kyz:]
döz [døz]
dözz [døz:]
döt [døt]
dött [døt:]
noc [nots]
nocc [not:s]
niz [niz]
nizz [niz:]
föc [føts]
föcc [føt:s]
püc [pyts]
pücc [pyt:s]
lics [litS]
liccs [lit:S]
döcs [døtS]
döccs [døt:S]
saz [SOz]
sazz [SOz:]
nut [nut]
nutt [nut:]
dösz [døs]
dössz [døs:]
nücs [nytS]
nüccs [nyt:S]
zed [zEd]
zedd [zEd:]
zod [zod]
zodd [zod:]
lef [lEf]
leff [lEf:]
zud [zud]
zudd [zud:]
zof [zof]
zoff [zof:]
nag [nOg]
nagg [nOg:]
zuf [zuf]
zuff [zuf:]
zog [zog]
zogg [zog:]
fek [fEk]
fekk [fEk:]
nüg [nyg]
nügg [nyg:]
mok [mok]
mokk [mok:]
tug [tug]
tugg [tug:]
nük [nyk]
nükk [nyk:]
pel [pEl]
pell [pEl:]
nam [nOm]
namm [nOm:]
zol [zol]
zoll [zol:]
lim [lim]
limm [lim:]
zul [zul]
zull [zul:]
döm [døm]
dömm [døm:]
san [SOn]
sann [SOn:]
cit [tsit]
citt [tsit:]
cip [tsip]
cipp [tsip:]
cer [tsEr]
cerr [tsEr:]
döp [døp]
döpp [døp:]
ces [tsES]
cess [tsES:]
nup [nup]
nupp [nup:]
nör [nør]
nörr [nør:]
küp [kyp]
küpp [kyp:]
sat [SOt]
satt [SOt:]
cir [tsir]
cirr [tsir:]
nusz [nus]
nussz [nus:]
pur [pur]
purr [pur:]
tisz [tis]
tissz [tis:]
pas [pOS]
pass [pOS:]
zos [zoS]
zoss [zoS:]
müs [myS]
müss [myS:]
dös [døS]
döss [døS:]
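The presentation scheme described at the start of this appendix can be sketched as follows. This is a minimal illustration in Python, not the script used in the experiment: it assumes that both the order of the two options within a trial and the order of the trials were randomised, and the names TARGET_PAIRS and build_trials are hypothetical.

```python
import random

# A few target pairs from the list above, as (singleton, geminate) spellings.
TARGET_PAIRS = [("nab", "nabb"), ("zuc", "zucc"), ("cen", "cenn")]


def build_trials(pairs, seed=None):
    """Return forced-choice trials with randomised option and trial order.

    Assumption: each trial shows the two members of a pair, with the
    singleton form first or second at random, and the trials themselves
    are shuffled.
    """
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    trials = []
    for singleton, geminate in pairs:
        options = [singleton, geminate]
        rng.shuffle(options)  # singleton first, or the other way round
        trials.append(tuple(options))
    rng.shuffle(trials)  # randomise the order of the items
    return trials


if __name__ == "__main__":
    for first, second in build_trials(TARGET_PAIRS, seed=1):
        print(first, "/", second)
```

Filler pairs could be passed in together with the target pairs so that they are interleaved by the same shuffle.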
Appendix II: Production experiment items
List 1
Szerintem a csiga elég érdekes állat.
‘In my opinion, the snail is a rather interesting animal.’
Szerintem a ‘halad’ elég érdekes szó.
‘In my opinion, ‘to make headway’ is a rather interesting expression.’
Szerintem a rab elég veszélyes.
‘In my opinion, the inmate is rather dangerous.’
Szerintem az ‘ánizsos’ elég érdekesen hangzik.
‘In my opinion, ‘anise-flavoured’ sounds rather interesting.’
Szerintem a mez elég nagy rád.
‘In my opinion, this jersey is rather oversized for you.’
Szerintem a ‘szeretek’ elég érdekes szóalak.
‘In my opinion, ‘I love’ is a rather interesting expression.’
Szerintem a konyakok elég drágák.
‘In my opinion, brandies are rather expensive.’
Szerintem a hal elég drága.
‘In my opinion, fish is rather expensive.’
Szerintem a vogul elég érdekes nyelv.
‘In my opinion, Mansi is a rather interesting language.’
Szerintem a ‘döf’ elég furcsa szó.
‘In my opinion, ‘to stab’ is a rather weird word.’
Szerintem a fababa elég jó ajándék.
‘In my opinion, a wooden doll is a rather nice present.’
Szerintem a malacok elég sokat esznek.
‘In my opinion, pigs eat quite a lot.’
Szerintem Alap elég nagy község.
‘In my opinion, Alap is a rather big village.’
Szerintem az adoma elég hihetetlen.
‘In my opinion, that urban legend is rather hard to believe.’
Szerintem a kas elég nagy.
‘In my opinion, the beehive is rather large.’
Szerintem a rizs elég finom.
‘In my opinion, rice is quite tasty.’
Szerintem az ‘alapos’ elég érdekes szó.
‘In my opinion, ‘meticulous’ is a rather interesting word.’
Szerintem a doh elég elviselhetetlen.
‘In my opinion, musty smell is rather hard to tolerate.’
Szerintem az arab elég nehéz nyelv.
‘In my opinion, Arabic is a rather difficult language.’
Szerintem a ‘szavadat’ elég érdekes szóalak.
‘In my opinion, ‘your word-ACC’ is a rather interesting phrase.’
Szerintem a pirogok elég finomak lettek.
‘In my opinion, the pirogs have turned out quite well.’
Szerintem a kanalak elég kicsik.
‘In my opinion, the spoons are quite small.’
Szerintem a ‘komoly’ elég furcsa szó.
‘In my opinion, ‘serious’ is a rather strange word.’
Szerintem a ‘Kemenes’ elég érdekes név.
‘In my opinion, Kemenes is a rather interesting name.’
Szerintem a kakasok elég vadak.
‘In my opinion, roosters are rather wild.’
Szerintem a ‘bekever’ elég érdekes szó.
‘In my opinion, ‘to mix’ is a rather interesting word.’
Szerintem az orosz elég nehéz nyelv.
‘In my opinion, Russian is a rather difficult language.’
Szerintem a ‘ledöf’ elég kifejező szó.
‘In my opinion, ‘to stab someone to death’ is a rather expressive phrase.’
Szerintem a zsizsik elég kicsi bogár.
‘In my opinion, the wheat weevil is a rather small bug.’
Szerintem a rom elég rossz látvány.
‘In my opinion, the ruin is a bad sight.’
Szerintem a ‘hadar’ elég vicces szó.
‘In my opinion, ‘to sputter’ is a rather funny word.’
Szerintem a kukac elég jó csali.
‘In my opinion, worm is a good fishing bait.’
Szerintem a potroh elég jellegzetes testrész.
‘In my opinion, the abdomen of insects is a rather prominent body part.’
Szerintem a kacs elég érdekesen néz ki.
‘In my opinion, the tendril looks rather interesting.’
Szerintem a ‘röfög’ elég vicces szó.
‘In my opinion, ‘grunt’ is a rather funny word.’
Szerintem a kabar elég ismert nép.
‘In my opinion, Kabar is quite a famous nation.’
Szerintem a ‘vakar’ elég furcsa szó.
‘In my opinion, ‘to scratch’ is a rather strange word.’
Szerintem a kar elég hosszú.
‘In my opinion, the handle is rather long.’
Szerintem a ‘kapar’ elég furcsa szó.
‘In my opinion, ‘to scratch’ is a rather strange word.’
Szerintem az oroszok elég sokan vannak.
‘In my opinion, there are quite a lot of Russians.’
Szerintem a ‘hamar’ elég érdekes szó.
‘In my opinion, ‘soon’ is a rather interesting word.’
Szerintem az ezer elég sok.
‘In my opinion, a thousand is quite a huge amount.’
Szerintem Lev elég híres író.
‘In my opinion, Lev is quite a famous writer.’
Szerintem a retek elég jó vacsorára.
‘In my opinion, radish is quite good for dinner.’
Szerintem a lemezek elég régiek.
‘In my opinion, the records are quite old.’
Szerintem a ‘facsar’ elég érdekes szó.
‘In my opinion, ‘to extract juice’ is a rather interesting phrase.’
Szerintem a dac elég furcsa reakció.
‘In my opinion, ‘defiance’ is a rather strange reaction.’
Szerintem a ‘makacs’ elég vicces szó.
‘In my opinion, ‘stubborn’ is a rather funny word.’
Szerintem a fog elég hamar kihullott.
‘In my opinion, the tooth fell out quite soon.’
Szerintem a ‘hasal’ elég érdekes szó.
‘In my opinion, ‘to lie on your stomach’ is a rather interesting phrase.’
Szerintem a nyak elég kényes testrész.
‘In my opinion, the neck is a rather sensitive part of the body.’
Szerintem a ‘ken’ elég rövid szó.
‘In my opinion, ‘to smear’ is a rather short word.’
Szerintem Jemen elég érdekes hely.
‘In my opinion, Yemen is a rather interesting place.’
Szerintem a pap elég komoly ember.
‘In my opinion, the priest is a rather serious man.’
Szerintem a ‘leböfög’ elég vicces szó.
‘In my opinion, ‘to burp at someone’ is a rather funny word.’
Szerintem a kakas elég nagy.
‘In my opinion, the rooster is quite big.’
Szerintem a ‘marad’ elég érdekes szó.
‘In my opinion, ‘to stay’ is a rather interesting word.’
Szerintem a kosz elég kiábrándító.
‘In my opinion, the filth is rather disappointing.’
Szerintem a lemez elég drága.
‘In my opinion, the record is rather expensive.’
Szerintem Kijev elég nagy város.
‘In my opinion, Kiev is quite a big city.’
Szerintem a ‘vet’ elég rövid szó.
‘In my opinion, ‘to sow’ is a rather short word.’
Szerintem a haszon elég fontos.
‘In my opinion, profit is rather important.’
Szerintem a ‘szeret’ elég szép szó.
‘In my opinion, ‘to love’ is a rather beautiful word.’
Szerintem a fadarab elég nagy.
‘In my opinion, that piece of wood is rather big.’
Szerintem a Szenes elég udvariatlan.
‘In my opinion, Szenes is rather impolite.’
Szerintem a ‘kacag’ elég vicces szó.
‘In my opinion, ‘to laugh’ is a rather funny word.’
Szerintem a konyak elég drága.
‘In my opinion, brandy is rather expensive.’
Szerintem a ‘nyafog’ elég vicces szó.
‘In my opinion, ‘to whine’ is a rather funny word.’
Szerintem Párizs elég nagy város.
‘In my opinion, Paris is quite a big city.’
Szerintem a karom elég vastag.
‘In my opinion, my arm is rather thick.’
Szerintem a ‘dohog’ elég furcsa szó.
‘In my opinion, ‘to mumble in a grumpy way’ is a strange phrase.’
Szerintem a vad elég nagy.
‘In my opinion, that wild animal is quite big.’
Szerintem a szavad elég biztosíték.
‘In my opinion, your word is enough guarantee.’
Szerintem az ‘evez’ elég gyakori szó.
‘In my opinion, ‘to row’ is a rather frequent word.’
Szerintem a potrohok elég kicsik.
‘In my opinion, the abdomens of insects are rather small.’
Szerintem a ‘kifacsar’ elég furcsa szó.
‘In my opinion, ‘to extract juice fully’ is a rather strange word.’
Szerintem a sör elég népszerű ital.
‘In my opinion, beer is a rather popular drink.’
List 2
Szerintem a sör elég népszerű ital.
‘In my opinion, beer is a rather popular drink.’
Szerintem a ‘kifacsar’ elég furcsa szó.
‘In my opinion, ‘to extract juice fully’ is a rather strange word.’
Szerintem a potrohok elég kicsik.
‘In my opinion, the abdomens of insects are rather small.’
Szerintem az ‘evez’ elég gyakori szó.
‘In my opinion, ‘to row’ is a rather frequent word.’
Szerintem a szavad elég biztosíték.
‘In my opinion, your word is enough guarantee.’
Szerintem a vad elég nagy.
‘In my opinion, that wild animal is quite big.’
Szerintem a ‘dohog’ elég furcsa szó.
‘In my opinion, ‘to mumble in a grumpy way’ is a strange phrase.’
Szerintem a karom elég vastag.
‘In my opinion, my arm is rather thick.’
Szerintem Párizs elég nagy város.
‘In my opinion, Paris is quite a big city.’
Szerintem a ‘nyafog’ elég vicces szó.
‘In my opinion, ‘to whine’ is a rather funny word.’
Szerintem a konyak elég drága.
‘In my opinion, brandy is rather expensive.’
Szerintem a ‘kacag’ elég vicces szó.
‘In my opinion, ‘to laugh’ is a rather funny word.’
Szerintem a Szenes elég udvariatlan.
‘In my opinion, Szenes is rather impolite.’
Szerintem a fadarab elég nagy.
‘In my opinion, that piece of wood is rather big.’
Szerintem a ‘szeret’ elég szép szó.
‘In my opinion, ‘to love’ is a rather beautiful word.’
Szerintem a haszon elég fontos.
‘In my opinion, profit is rather important.’
Szerintem a ‘vet’ elég rövid szó.
‘In my opinion, ‘to sow’ is a rather short word.’
Szerintem Kijev elég nagy város.
‘In my opinion, Kiev is quite a big city.’
Szerintem a lemez elég drága.
‘In my opinion, the record is rather expensive.’
Szerintem a kosz elég kiábrándító.
‘In my opinion, the filth is rather disappointing.’
Szerintem a ‘marad’ elég érdekes szó.
‘In my opinion, ‘to stay’ is a rather interesting word.’
Szerintem a kakas elég nagy.
‘In my opinion, the rooster is quite big.’
Szerintem a ‘leböfög’ elég vicces szó.
‘In my opinion, ‘to burp at someone’ is a rather funny word.’
Szerintem a pap elég komoly ember.
‘In my opinion, the priest is a rather serious man.’
Szerintem Jemen elég érdekes hely.
‘In my opinion, Yemen is a rather interesting place.’
Szerintem a ‘ken’ elég rövid szó.
‘In my opinion, ‘to smear’ is a rather short word.’
Szerintem a nyak elég kényes testrész.
‘In my opinion, the neck is a rather sensitive part of the body.’
Szerintem a ‘hasal’ elég érdekes szó.
‘In my opinion, ‘to lie on your stomach’ is a rather interesting phrase.’
Szerintem a fog elég hamar kihullott.
‘In my opinion, the tooth fell out quite soon.’
Szerintem a ‘makacs’ elég vicces szó.
‘In my opinion, ‘stubborn’ is a rather funny word.’
Szerintem a dac elég furcsa reakció.
‘In my opinion, ‘defiance’ is a rather strange reaction.’
Szerintem a ‘facsar’ elég érdekes szó.
‘In my opinion, ‘to extract juice’ is a rather interesting phrase.’
Szerintem a lemezek elég régiek.
‘In my opinion, the records are quite old.’
Szerintem a retek elég jó vacsorára.
‘In my opinion, radish is quite good for dinner.’
Szerintem Lev elég híres író.
‘In my opinion, Lev is quite a famous writer.’
Szerintem az ezer elég sok.
‘In my opinion, a thousand is quite a huge amount.’
Szerintem a ‘hamar’ elég érdekes szó.
‘In my opinion, ‘soon’ is a rather interesting word.’
Szerintem az oroszok elég sokan vannak.
‘In my opinion, there are quite a lot of Russians.’
Szerintem a ‘kapar’ elég furcsa szó.
‘In my opinion, ‘to scratch’ is a rather strange word.’
Szerintem a kar elég hosszú.
‘In my opinion, the handle is rather long.’
Szerintem a ‘vakar’ elég furcsa szó.
‘In my opinion, ‘to scratch’ is a rather strange word.’
Szerintem a kabar elég ismert nép.
‘In my opinion, Kabar is quite a famous nation.’
Szerintem a ‘röfög’ elég vicces szó.
‘In my opinion, ‘grunt’ is a rather funny word.’
Szerintem a kacs elég érdekesen néz ki.
‘In my opinion, the tendril looks rather interesting.’
Szerintem a potroh elég jellegzetes testrész.
‘In my opinion, the abdomen of insects is a rather prominent body part.’
Szerintem a kukac elég jó csali.
‘In my opinion, worm is a good fishing bait.’
Szerintem a ‘hadar’ elég vicces szó.
‘In my opinion, ‘to sputter’ is a rather funny word.’
Szerintem a rom elég rossz látvány.
‘In my opinion, the ruin is a bad sight.’
Szerintem a zsizsik elég kicsi bogár.
‘In my opinion, the wheat weevil is a rather small bug.’
Szerintem a ‘ledöf’ elég kifejező szó.
‘In my opinion, ‘to stab someone to death’ is a rather expressive phrase.’
Szerintem az orosz elég nehéz nyelv.
‘In my opinion, Russian is a rather difficult language.’
Szerintem a ‘bekever’ elég érdekes szó.
‘In my opinion, ‘to mix’ is a rather interesting word.’
Szerintem a kakasok elég vadak.
‘In my opinion, roosters are rather wild.’
Szerintem a ‘Kemenes’ elég érdekes név.
‘In my opinion, Kemenes is a rather interesting name.’
Szerintem a ‘komoly’ elég furcsa szó.
‘In my opinion, ‘serious’ is a rather strange word.’
Szerintem a kanalak elég kicsik.
‘In my opinion, the spoons are quite small.’
Szerintem a pirogok elég finomak lettek.
‘In my opinion, the pirogs have turned out quite well.’
Szerintem a ‘szavadat’ elég érdekes szóalak.
‘In my opinion, ‘your word-ACC’ is a rather interesting phrase.’
Szerintem az arab elég nehéz nyelv.
‘In my opinion, Arabic is a rather difficult language.’
Szerintem a doh elég elviselhetetlen.
‘In my opinion, musty smell is rather hard to tolerate.’
Szerintem az ‘alapos’ elég érdekes szó.
‘In my opinion, ‘meticulous’ is a rather interesting word.’
Szerintem a rizs elég finom.
‘In my opinion, rice is quite tasty.’
Szerintem a kas elég nagy.
‘In my opinion, the beehive is rather large.’
Szerintem az adoma elég hihetetlen.
‘In my opinion, that urban legend is rather hard to believe.’
Szerintem Alap elég nagy község.
‘In my opinion, Alap is a rather big village.’
Szerintem a malacok elég sokat esznek.
‘In my opinion, pigs eat quite a lot.’
Szerintem a fababa elég jó ajándék.
‘In my opinion, a wooden doll is a rather nice present.’
Szerintem a ‘döf’ elég furcsa szó.
‘In my opinion, ‘to stab’ is a rather weird word.’
Szerintem a vogul elég érdekes nyelv.
‘In my opinion, Mansi is a rather interesting language.’
Szerintem a hal elég drága.
‘In my opinion, fish is rather expensive.’
Szerintem a konyakok elég drágák.
‘In my opinion, brandies are rather expensive.’
Szerintem a ‘szeretek’ elég érdekes szóalak.
‘In my opinion, ‘I love’ is a rather interesting expression.’
Szerintem a mez elég nagy rád.
‘In my opinion, this jersey is rather oversized for you.’
Szerintem az ‘ánizsos’ elég érdekesen hangzik.
‘In my opinion, ‘anise-flavoured’ sounds rather interesting.’
Szerintem a rab elég veszélyes.
‘In my opinion, the inmate is rather dangerous.’
Szerintem a ‘halad’ elég érdekes szó.
‘In my opinion, ‘to make headway’ is a rather interesting expression.’
Szerintem a csiga elég érdekes állat.
‘In my opinion, the snail is a rather interesting animal.’
Appendix III: Perception experiment items
III/A: Hungarian-English word pairs
The following Hungarian and English minimal and quasi-minimal pairs were used in the perception
experiment.
Hungarian: mit [mit] ‘what-ACC’
English: hit [hIt]
Hungarian: fut [fut] ‘he/she runs’
English: foot [fUt]
Hungarian: hat [hOt] ‘six’
English: but [b2t]
Hungarian: vet [vEt] ‘sow-3rd. p. sing.’
English: set [sEt] or [set]
Hungarian: szid [sid] ‘he/she scolds (someone)’
English: hid [hId]
Hungarian: tud [tud] ‘he/she knows’
English: good [gUd]
Hungarian: had [hOd] ‘warfare’
English: bud [b2d]
Hungarian: szed [sEd] ‘he/she collects’
English: said [sEd] or [sed]
III/B: Hungarian-German word pairs
The following Hungarian and German minimal and quasi-minimal pairs were used in the experiment.
Hungarian: kis [kiS] ‘small’
German: Tisch [tIS] ‘table’
Hungarian: mos [moS] ‘he/she washes’
German: Bosch [boS] <brand name>
Hungarian: vas [vOS] ‘iron’
German: wasch [waS] ‘wash-imperative’
Hungarian: tus [tuS] ‘douche or ink’
German: Fusch [fuS] <a village in Austria>
Hungarian: köt [køt] ‘he/she knits’
German: Schött [Sœt] <a name>
Hungarian: les [lES] ‘he/she looks furtively’
German: fesch [fES] ‘handsome’
III/C: Hungarian word pairs
The following Hungarian word pairs were used in the experiment.
Singleton: hit [hit] ‘faith’
Geminate: hitt [hit:] ‘believe-3rd.p.-past’
Singleton: vet [vEt] ‘sow’
Geminate: vett [vEt:] ‘buy-3rd.p.-past’
Singleton: lap [lOp] ‘piece of paper’
Geminate: lapp [lOp:] ‘Sami’
Singleton: lop [lop] ‘steal’
Geminate: hopp [hop:] ‘oops’
Singleton: luk [luk] ‘hole’
Geminate: pukk [puk:] <bursting sound>
Appendix IV: List of monosyllabic loanwords with consonants subject to gemination
Loanword | Gloss | Orth. gem. in SW | Geminate in LW | Singleton in LW
hall [hOl:] | ‘hall’ | yes | yes | no
blöff [bløf:] | ‘bluff’ | yes | yes | no
nett [nEt:] | ‘neat and tidy’ | yes | yes | no
fitt [fit:] | ‘fit’ | no | yes | no
szett [sEt:] | ‘set or outfit’ | no | yes | no
csekk [tSEk:] | ‘cheque’ | no | yes | no
klip [klip:] | ‘videoclip’ | no | yes | no
friss [friS:] | ‘fresh’ | no | yes | no
fess [fES:] | ‘handsome’ | no | yes | no
giccs [git:S] | ‘kitsch’ | no | yes | no
klub % [klub:] | ‘club’ | no | yes (more common) | yes (less common)
dzsem % [dZEm:] | ‘marmalade’ | no | yes | yes
chip [tSip:] | ‘electronic chip’ | no | yes | no
vicc [vit:s] | ‘joke’ | no | yes | no
sokk [sok:] | ‘shock’ | no | yes | no
puccs [put:S] | ‘coup’ | no | yes | no
sikk [Sik:] | ‘stylishness’ | no | yes | no
drukk [druk:] | ‘anxiety’ | no | yes | no
chat [tSEt] | ‘online chat’ | no | no | yes
net % [nEt:] | ‘internet’ | no | yes (less common) | yes (more common)
bit % [bit:] | ‘bit’ (IT) | no | yes (less common) | yes (more common)
flott [flot:] | ‘fast and easy’ | yes | yes | no
spicc [Spit:s] | ‘Spitz’ | no | yes | no
trükk [tryk:] | ‘trick’ | no | yes | no
rock [rok:] | ‘rock music’ | no | yes | no
pop % [pop:] | ‘pop music’ | no | yes (less common) | yes (more common)
smukk [Smuk:] | ‘jewellery’ | no | yes | no
dekk [dEk:] | ‘cigarette’ | no | yes | no
dokk [dok:] | ‘dock’ | no | yes | no
pucc [put:s] | ‘poshness’ | no | yes | no
sacc [SOt:s] | ‘guess’ | no | yes | no
hecc [hEt:s] | ‘hoax’ | no | yes | no
Schütz [Syt:s] | <name> | no | yes | no
fuccs [fut:S] | ‘waste’ | no | yes | no
Bach [bOx:] | <name> | no | yes | no
pech [pEx:] | ‘bad luck’ | no | yes | no
sah [sOx:] | ‘shah’ | no | yes | no
trapp [trOp:] | ‘trap’ | no | yes | no
nipp [nip:] | ‘figurine’ | no | yes | no
nopp [nop:] | ‘knot’ | yes | yes | no
kepp [kEp:] | ‘hooded gown’ | yes | yes | no
shop % [Sop:] | ‘shop’ | no | yes (less common) | yes (more common)
blikk [blik:] | ‘wink’ | no | yes | no
blokk [blok:] | ‘block’ | no | yes | no
stop % [stop:] | ‘stop sign’ | no | yes | yes
meccs [mEt:S] | ‘match’ | no | yes | no
taccs [tOt:S] | ‘touch’ | no | yes | no
dog [dog] | ‘dog’ | no | no | yes
szmog [smog] | ‘smog’ | no | no | yes
blog [blog] | ‘blog’ | no | no | yes
HIV [hiv] | ‘HIV’ | no | no | yes
Liv [liv] | <name> | no | no | yes
bob [bob] | ‘bobsleigh’ | no | no | yes
sznob [snob] | ‘snob’ | no | no | yes
dub [dOb] | ‘dub(step)’ | no | no | yes
pub [pOb] | ‘pub’ | no | no | yes
Ted [tEd] | <name> | no | no | yes
Bud [bOd] | <name> | no | no | yes
Hal [hEl] | <name> | no | no | yes
tipp [tip:] | ‘tip’ | no | yes | no
stramm [StrOm:] | ‘tough’ | yes | yes | no
sitt [sit:] | ‘debris’ | yes | yes | no
pack [pOk:] | ‘package’ | no | yes | no
kuss [kuS:] | ‘shut up’ | no | yes | no
sztepp [stEp:] | ‘step dance’ | no | yes | no
kit % [kit:] | ‘kit’ | no | yes (less common) | yes (more common)
Pat [pEt] | <name> | no | no | yes
brit % [brit:] | ‘Brit’ | no | yes | yes
szvit % [svit:] | ‘suite’ | no | yes | yes
asz % [as:] | <musical tone> | no | yes | yes
pikk [pik:] | ‘spades’ | no | yes | no
skicc [Skit:s] | ‘sketch’ | no | yes | no
treff [trEf:] | ‘clubs’ | yes | yes | no
procc [prot:s] | ‘snobbish’ | no | yes | no
snassz [SnOs:] | ‘mediocre’ | yes | yes | no
klassz [klOs:] | ‘great’ | yes | yes | no
bessz [bEs:] | ‘baisse’ | yes | yes | no
priccs [prit:S] | ‘iron bed’ | no | yes | no
slussz [Slus:] | ‘end’ | yes | yes | no
plussz [plus:] | ‘plus’ | no | yes | no
tus % [tus:] | ‘ink’ | no | yes | yes
plüss [plyS:] | ‘plush’ | no | yes | no
krach [krOx:] | ‘financial crash’ | no | yes | no
stich [Stix:] | ‘something fishy’ | no | yes | no
Mann [mOn:] | <name> | yes | yes | no
finn [fin:] | ‘Finnish’ | yes | yes | no
gramm [grOm:] | ‘gram’ | no | yes | no
dzsinn [dZin:] | ‘genie’ | no | yes | no
tüll [tyl:] | ‘tulle’ | yes | yes | no
brill [bril:] | ‘diamond’ | yes | yes | no
Fred [frEd] | <name> | no | no | yes
LED [lEd] | ‘LED’ | no | no | yes
Elle [El:] | <a magazine> | yes | yes | no
gif % [gif] | ‘gif’ | no | yes | yes
klikk [klik:] | ‘click or clique’ | no | yes | no
ceh [tsex:] | ‘bill’ | no | yes | no
skeccs [skEt:S] | ‘sketch’ | no | yes | no
top % [top:] | ‘top’ | no | yes | no
web % [vEb] | ‘web’ | no | no | yes
Webb [vEb:] | <name> | yes | yes | no
krossz [kros:] | ‘cross’ | yes | yes | no
dressz [drEs:] | ‘dress’ | yes | yes | no
Liz [liz] | <name> | no | no | yes
Dell [dEl:] | <computer> | yes | yes | no
Mac [mEk] | <computer> | no | no | yes
Scholl [Sol:] | <shoes> | yes | yes | no