PDF file - Linguistics at Cambridge

Li11 Historical Linguistics
The comparative method
Marius Jøhndal
23 January 2016
The comparative method, which was developed by the Neogrammarians (Hermann Paul, Karl
Brugmann, Hermann Osthoff and others) in the late nineteenth century, is a set of principles
for reconstructing the phonology of the ancestor of two or more genetically related languages
(external reconstruction).
The comparative method is not an algorithm; there is no self-contained step-by-step sequence of
operations that you can follow. It is a set of heuristics for making hypotheses, and you have to
weigh different options along the way.
The assumption here is that languages are genetically related and form families. Daughter
languages descend from a single ancestor, the proto-language, and it is the proto-language that
we are trying to reconstruct.
How do we establish that two or more languages are genetically related? Can we tell from the
list of words below that English is genetically related to Dutch and German?
English   Dutch    German
snow      sneeuw   Schnee
water     water    Wasser
bird      vogel    Vogel
dog       hond     Hund
hole      gat      Loch
onion     ui       Zwiebel
1 Identifying correspondence sets
The first step is to identify correspondence sets, which are sets of cognates, i.e. shared elements
inherited from a common ancestor.
Some cognates can be identified by surface resemblance, e.g. English ‘name’, Nepali nām, French
nom, Irish ainm, Greek ónoma, etc. are cognates (< PIE *h₃nómn̥-).
Cognates are not always that easily identified. We have to eliminate other possible sources of
similarity:
1. Chance resemblances, e.g. Mbabaram (Pama-Nyungan) dog ‘dog’, which resembles English ‘dog’ only
by chance and is in fact related to Yidiny gudaga, Dyirbal guda (Dixon 1983: 52).
2. Borrowing, e.g. Welsh pysg ‘fish’, borrowed from Latin piscis (the original Celtic cognate
is seen in Irish iasc).
3. Onomatopoeia, e.g. the similarity between Chinese māo ‘cat’ and English ‘meow’.
4. Typological tendencies, e.g. nursery words like mama, baba etc.
Note that language contact can create false, apparently regular correspondences through borrowing:
Welsh    Latin
pont     pōns      ‘bridge’
pysg     piscis    ‘fish’
pabell   pāpiliō   ‘tent’
padell   patella   ‘pan’
pobl     populus   ‘people’
We know these are not true cognates because PIE *p was lost in Celtic. Compare
(1) a. Welsh llawn, Latin plēna ‘full’
    b. Welsh nai, Latin nepōs ‘nephew’
But if there are enough false cognates, it can be difficult to work out which words are inherited
and which are borrowed.
Occasionally even chance resemblances can look almost systematic:
Hawaiian   Classical Greek
aeto       aetos     ‘eagle’
areto      artos     ‘bread’
noonoo     nous      ‘mind, thought’
lahui      laos      ‘people’
mahina     me:n      ‘month’ (H.)/‘moon’ (AGr.)
How do we eliminate these possibilities? Firstly, by remembering that they are possibilities.
Onomatopoeia and nursery words are relatively straightforward to eliminate. Secondly, by
accepting cognate sets only if regular phonological correspondences between items can be
demonstrated. This should eliminate most chance resemblances and a proportion of borrowings.
Finally, by restricting ourselves to basic vocabulary, at least in initial stages, that is, to items
which are unlikely to be borrowed.
The presence of idiosyncratic or arbitrary morphological correspondences is another feature to
look at when establishing genetic relatedness:
Language        3SG      3PL       1SG
Sanskrit        ásti     sánti     asmi
Ancient Greek   estí     eisí      eimí
Latin           est      sunt      sum
Gothic          ist      sind      am
PIE             *es-ti   *s-enti   *es-mi
Also note that cognates do not always have the same meaning. Take, for example, PIE *méh₂-tr- ‘mother’:
(2) a. Latin māter
    b. Greek māter, mēter
    c. Avestan mātar-
    d. Armenian mayr
    e. Albanian motër ‘sister’, nënë ‘mother’
1.1 Establishing sound correspondences
Once we have a set of (presumed) cognates, we can take the next step in comparative reconstruction and identify sound correspondences between different varieties.
     Russian     Polish      Bulgarian
 1   vʲera       vʲara       vjara       ‘belief’
 2   rʲedk-      ʒadk-       rjadk-      ‘rare’
 3   trʲi        tʒɪ         tri         ‘three’
 4   trʲezv-     tʒeʑv-      trezv-      ‘sober’
 5   zvʲerʲ      zvʲeʒ       zvjar       ‘animal’
 6   bʲel-       bʲawɪ       bjal        ‘white’
 7   bʲelʲetʲ    bʲeletɕ     bele(-ja)   ‘whiten’
 8   dʲivno      dʑivnʲe     divno       ‘smoke’ (verb)
 9   krʲepk-     kʒepk-      krepk-      ‘strong’
10   dʲevʲatʲ    dʑevʲẽtɕ    devet       ‘nine’
11   dʲesʲatʲ    dʑeɕẽtɕ     deset       ‘ten’
Some of the correspondences we can identify are:
Russian   Polish   Bulgarian   Environment   Examples
t         t        t           _             3, 4
tʲ        tɕ       t           _             7, 10, 11
dʲ        dʑ       d           _             8, 10, 11
r         r        r           _             1
rʲ        ʒ        rj          _             2
rʲ        ʒ        r           C _, _ #      3, 4, 5, 9
e         a        a           Cʲ _ ?        1, 2, 6
e         e        a           Cʲ _ Cʲ ?     5
e         e        e           _             4, 7, 9, 10, 11
a         ẽ        e           _             10, 11
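Tallying correspondences by hand is error-prone, so it can help to count them mechanically once the cognates have been segmented and aligned. The following is a minimal sketch for items 3, 10 and 11 of the table above. The segmentation and alignment are supplied by hand and are an assumption of this example, as are the names ALIGNED and tally_correspondences; nothing here finds cognates or aligns them for you.

```python
from collections import Counter

LANGUAGES = ("Russian", "Polish", "Bulgarian")

# Hand-aligned segments for 'three', 'nine' and 'ten' (items 3, 10, 11).
# The alignment is an assumption of this sketch, not computed.
ALIGNED = {
    "three": {
        "Russian":   ["t", "rʲ", "i"],
        "Polish":    ["t", "ʒ", "ɪ"],
        "Bulgarian": ["t", "r", "i"],
    },
    "nine": {
        "Russian":   ["dʲ", "e", "vʲ", "a", "tʲ"],
        "Polish":    ["dʑ", "e", "vʲ", "ẽ", "tɕ"],
        "Bulgarian": ["d", "e", "v", "e", "t"],
    },
    "ten": {
        "Russian":   ["dʲ", "e", "sʲ", "a", "tʲ"],
        "Polish":    ["dʑ", "e", "ɕ", "ẽ", "tɕ"],
        "Bulgarian": ["d", "e", "s", "e", "t"],
    },
}


def tally_correspondences(aligned):
    """Count how often each (Russian, Polish, Bulgarian) segment triple recurs."""
    counts = Counter()
    for forms in aligned.values():
        # zip pairs up the first segments, the second segments, and so on.
        counts.update(zip(*(forms[lang] for lang in LANGUAGES)))
    return counts


counts = tally_correspondences(ALIGNED)
for triple, n in counts.most_common():
    print(" ~ ".join(triple), n)
```

Triples that recur across several words (here dʲ ~ dʑ ~ d and tʲ ~ tɕ ~ t, each twice) are candidates for regular correspondences; a triple seen only once still has to be checked against more data.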
2 Reconstructing sound changes
Once we have identified correspondences, we want to reconstruct changes that produce those
correspondences. If all the languages have the same sound, it is easy – we reconstruct that
sound.
If we see a single sound in one language corresponding to multiple sounds in another, we can
reconstruct either a split or a merger. A split will generally be conditioned by some environment.
For instance, above we see e ∼ a in Polish corresponding to a in Bulgarian, and we have a
conditioning environment for this split: Cʲ _ Cʲ. (Total) mergers can only be reconstructed in
the absence of a possible conditioning environment for a split.
Consider the following data from Romance:
Italian   Spanish   French
correre   correr    courir   ‘run’
costare   costar    coûter   ‘cost’
caro      caro      cher     ‘dear’
capo      cabo      chef     ‘head, top’
We see that we have both [k, k, k] and [k, k, ʃ]. Is this a split *k > Fr. k, ʃ or a merger *k, *ʃ >
It./Sp. k? There are several reasons to favour the former; one is that we have an environment
for a split, viz. Fr. *k > ʃ / _ e (and other front vowels).
Now consider the following cognate sets from Mayan (Campbell 2013):
K’iche’   Tzeltal   Yucatec   Huastec
rah       ja        jah       jah-      ‘hot, spicy’
rix       jix       jiih      jeh-      ‘old, old man’
r-        j-        j-        –         ‘his, her, its’
raʃ       jaʃ       jaaʃ      jaʃ-      ‘green’
war       waj       waj       waj       ‘sleep’
jax       jah       jah       ja        ‘sick’
jaʃ       jaʃ       –         –         ‘crab’
k’aj-     k’aj-     k’aj-     tʃ’aj-    ‘sell’
We have two sound correspondences:
(3) a. K’iche’ r, Tzeltal j, Yucatec j, Huastec j
    b. K’iche’ j, Tzeltal j, Yucatec j, Huastec j
The reconstruction for (3-b) must be *j. In (3-a) we can say that *j changed to r in K’iche’
only if there is a specific phonetic environment for the change. But the environments of the two
correspondence sets are identical, as far as we can tell, so we have nothing that could condition
this split. We must therefore instead posit a merger between *r and *j in all varieties except
K’iche’:
              Proto-Mayan
             /           \
        K’iche’       *j, *r > j
                           |
               Tzeltal  Yucatec  Huastec
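The reasoning behind the Romance and Mayan cases can be put as a simple test: if two correspondence sets never occur in the same environment, a conditioned split is possible; if their environments overlap, we need two proto-sounds and a merger elsewhere. Here is a rough sketch of that test. The function names are invented for this example, "environment" is reduced to the following vowel, and the French vowel transcriptions are assumptions; a real analysis would have to consider richer environments.

```python
def environments_overlap(env_a, env_b):
    """True if the two correspondence sets share at least one environment."""
    return bool(set(env_a) & set(env_b))


def split_or_merger(env_a, env_b):
    """Heuristic only: complementary environments permit a conditioned split,
    overlapping environments point to two proto-sounds that merged elsewhere."""
    return "merger" if environments_overlap(env_a, env_b) else "split"


# Environment = following vowel in the language that shows two reflexes.
# Romance: French [k] in courir, coûter versus [ʃ] in cher, chef.
french_k = ["u", "u"]    # assumed transcription of ou, oû
french_sh = ["ɛ", "ɛ"]   # assumed transcription of e in cher, chef

# Mayan: K'iche' word-initial r in rah, rix, raʃ versus j in jax, jaʃ.
kiche_r = ["a", "i", "a"]
kiche_j = ["a", "a"]

romance_verdict = split_or_merger(french_k, french_sh)
mayan_verdict = split_or_merger(kiche_r, kiche_j)
print("Romance:", romance_verdict)  # Romance: split
print("Mayan:", mayan_verdict)      # Mayan: merger
```

The verdict is only as good as the environments fed in: with so few words, apparent complementarity may be accidental, which is one reason the text speaks of a "possible" conditioning environment.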
We are not always able to reconstruct sound changes from the presence/absence of a conditioning
environment alone. We may find unconditioned sound changes that do not involve a split or a
merger (e.g. PIE *d > PGerm. *t). In such cases there is no conditioning environment to tell us
which daughter language is innovative.
Alternatively the details of splits or mergers may be obscured by later changes. In such cases we
need other principles to tell us what the most likely change could be. The general principles are:
1. Economy (Occam’s Razor): assume as few changes as possible.
2. Majority: retain common features, e.g. if all descendant forms are labials, avoid reconstructing a velar.
3. Naturalness and directionality: posit changes that go in the ‘natural’ direction, e.g. lenition
rather than fortition intervocalically.
2.1 Economy and majority
Suppose that we want to reconstruct the changes that lead to languages a, b, c, d and e. Now
suppose that a, b and c form a subgroup but that d, e do not:
        abcde
       /  |  \
    abc   d   e
   / | \
  a  b  c
If a, b, c have a sound /X/ corresponding to /Y/ in d, e, it is (all things being equal) preferable
to reconstruct Y. This requires only one change Y > X (between proto-abcde and proto-abc).
Otherwise d and e would need two independent changes X > Y.
The effect of this principle will depend on the subgrouping of the language family (and of course
also on what you define as a ‘language’).
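The economy argument for this tree can be checked by brute force: for each candidate proto-sound, try both values for proto-abc and count the branches on which a change is needed. The sketch below does exactly that; the names TREE, LEAVES and min_changes are chosen for this example, and the approach only scales to toy trees like this one.

```python
# Tree from the text: proto-abcde has daughters proto-abc, d and e;
# proto-abc has daughters a, b and c.
TREE = {"abcde": ["abc", "d", "e"], "abc": ["a", "b", "c"]}

# Attested sounds: a, b, c have X; d, e have Y.
LEAVES = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}


def count_changes(assignment):
    """Number of branches on which mother and daughter have different sounds."""
    return sum(
        assignment[parent] != assignment[child]
        for parent, children in TREE.items()
        for child in children
    )


def min_changes(root_sound):
    """Fewest changes needed if proto-abcde had root_sound."""
    costs = []
    for abc_sound in ("X", "Y"):  # try both reconstructions of proto-abc
        assignment = dict(LEAVES, abcde=root_sound, abc=abc_sound)
        costs.append(count_changes(assignment))
    return min(costs)


cost_y = min_changes("Y")
cost_x = min_changes("X")
print("reconstruct *Y:", cost_y, "change(s)")  # reconstruct *Y: 1 change(s)
print("reconstruct *X:", cost_x, "change(s)")  # reconstruct *X: 2 change(s)
```

If d and e were instead grouped under their own node, both reconstructions would cost one change, which is the sense in which the result depends on the subgrouping.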
If we do not have a well-defined subgrouping, the principle reduces to a majority principle. Let
us look at the effect of this in the context of the following data from Germanic:
Old English   Old Norse   Gothic   Old High German
þū            þú          þu       dû                ‘you’ (sg.)
þrīe          þrír        þreis    drî               ‘three’
The majority principle gets us the right result here: Most varieties have þ, which is the historical
value. But majority often gives the wrong result and must be applied with caution. If we blindly
applied it to modern Germanic languages, we would get precisely the wrong result. We have, for
example,
(4) a. English (arch/dial.) thou
    b. Icelandic þú
    c. German/Swedish/Norwegian/Danish/Dutch (arch/dial.) du
Using majority we would reconstruct *du instead of the correct *þū.
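A naive majority vote is easy to write down, which is part of why it is tempting. The sketch below applies one to the initial consonant of ‘you (sg.)’ in the old and in the modern languages. The function name is invented here, and coding English thou as continuing þ is a simplifying assumption.

```python
from collections import Counter


def majority_reflex(reflexes):
    """Return the sound found in the largest number of daughter languages."""
    return Counter(reflexes.values()).most_common(1)[0][0]


# Initial consonant of 'you (sg.)' in each variety.
old_germanic = {
    "Old English": "þ",
    "Old Norse": "þ",
    "Gothic": "þ",
    "Old High German": "d",
}
modern_germanic = {
    "English": "þ",  # thou; treated as continuing þ (assumption)
    "Icelandic": "þ",
    "German": "d",
    "Swedish": "d",
    "Norwegian": "d",
    "Danish": "d",
    "Dutch": "d",
}

print(majority_reflex(old_germanic))     # þ
print(majority_reflex(modern_germanic))  # d
```

The vote counts languages, not independent changes: if several of the d-languages inherited d from a shared intermediate ancestor or acquired it through contact, they should not count as separate witnesses.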
2.2 Naturalness and directionality
Sound change is often considered a directional process. Lenition, for example, is more common
than fortition, and assimilation more common than dissimilation. So it is better to reconstruct
Polish *rʲ > ʒ and *tʲ > tɕ (assimilation/coalescence) than the reverse (dissimilation). But
dissimilation and fortition do occur, so this principle must be considered in combination with
other factors.
Naturalness is a property not only of reconstructed sound changes, but also of reconstructed
phonological systems. So avoid reconstructing an unattested vowel inventory, for example. But
languages sometimes do have typologically unusual patterns, so again this principle cannot be
used alone.
3 Internal reconstruction
We are not always able to reconstruct sound changes by comparing different languages. We
may be working with a language isolate, we may be interested in the intermediate stages of
a language’s development, or the correspondences between languages may be too complex to
reliably reconstruct sound changes.
In such cases we can try to reconstruct the history of a single language based on data from
that language alone. Instead of comparing words among languages we can compare forms from
different morphophonological environments. This is known as internal reconstruction.
(Languages reconstructed through comparative reconstruction are called proto-languages. Languages reconstructed through internal reconstruction are sometimes called pre-languages.)
Evidence for an earlier stage can be deduced from internal patterns of the language. The method
is applied to linguistic items that are related but appear in different contexts synchronically
(allophones and allomorphs), and internal reconstruction exploits these differences in a systematic
way.
Changes that give rise to alternations of this kind very often encroach on morphology. The
following shows how rhotacism (s > r / V _ V) produced two stem allomorphs /honos-/ ∼
/honor-/ for Latin honos ‘honour’:
Form
NOM/VOC SG   honos-∅
ACC SG       honos-em > honor-em
GEN SG       honos-is > honor-is
DAT SG       honos-ī > honor-ī
ABL SG       honos-e > honor-e
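To see that a single rule is enough to produce the two stem allomorphs, we can apply rhotacism to the pre-forms with a regular expression. This is a small sketch: the vowel inventory is simplified and the names are chosen for this example.

```python
import re

# Simplified Latin vowel inventory (short and long); an assumption of the sketch.
VOWELS = "aeiouāēīōū"

# s > r / V _ V, written with lookbehind and lookahead so only s is replaced.
RHOTACISM = re.compile(f"(?<=[{VOWELS}])s(?=[{VOWELS}])")


def rhotacise(form):
    """Apply rhotacism to every intervocalic s in form."""
    return RHOTACISM.sub("r", form)


# Pre-forms from the table, written without morpheme boundaries.
PRE_FORMS = {
    "NOM/VOC SG": "honos",
    "ACC SG": "honosem",
    "GEN SG": "honosis",
    "DAT SG": "honosī",
    "ABL SG": "honose",
}

paradigm = {case: rhotacise(form) for case, form in PRE_FORMS.items()}
for case, form in paradigm.items():
    print(f"{case:<12}{form}")
```

The nominative keeps its s because the s is word-final there, not intervocalic; the final s of honosis survives for the same reason.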
Now consider the following Nahuatl (Uto-Aztecan) data (Campbell 2013):
Base form         1SG POSS form
ihti ‘stomach’    n-ihti ‘my stomach’
ikʃi ‘foot’       no-kʃi ‘my foot’
We have an alternation i ∼ ∅ in ikʃi ∼ no-kʃi. This pattern is regular in the language. The
behaviour of the prefix is also a general rule, no- > n- / _ V, so we can put that aside.
We can either reconstruct *ikʃi and a rule that deletes initial i-, or we can reconstruct *kʃi and
a rule that inserts i in appropriate contexts. In other words, the question is if i and ∅ have
merged in the base form or have split in the possessive form.
If we choose to reconstruct *ikʃi, the rule that deletes i- would also have to apply to ihti, which is
wrong, unless we can restrict it to a more specific environment. There is no obvious conditioning
environment for a split (ikʃi and ihti are similar base forms), so we have to choose the other
option and reconstruct a merger, ∅ > i / # _, in the base form, giving us the pre-forms *kʃi
and *ihti.
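The choice between the two analyses can be made explicit by letting each one predict the attested forms. In the sketch below, hypothesis_deletion takes the base form as the pre-form and deletes its initial i after the prefix; hypothesis_insertion starts from *kʃi and *ihti and inserts i word-initially. Restricting the insertion to words beginning with a consonant cluster is an assumption made here to have something concrete to test; the function names are likewise invented for the example.

```python
VOWELS = set("aeiou")

# Base form -> 1SG possessed form, from the table above (hyphens omitted).
ATTESTED = {"ihti": "nihti", "ikʃi": "nokʃi"}


def prefix_no(stem):
    """Attach no- 'my'; the prefix is n- before a vowel (no- > n- / _ V)."""
    return ("n" if stem[0] in VOWELS else "no") + stem


def hypothesis_deletion(base):
    """Pre-form = base form; initial i is deleted in the possessed form."""
    stem = base[1:] if base.startswith("i") else base
    return base, prefix_no(stem)


def hypothesis_insertion(pre_form):
    """Pre-form may lack the i; i is inserted before an initial cluster."""
    starts_with_cluster = pre_form[0] not in VOWELS and pre_form[1] not in VOWELS
    base = "i" + pre_form if starts_with_cluster else pre_form
    return base, prefix_no(pre_form)


# Deletion analysis: which base forms get the wrong possessed form?
deletion_errors = [b for b in ATTESTED if hypothesis_deletion(b)[1] != ATTESTED[b]]

# Insertion analysis, starting from the pre-forms *kʃi and *ihti.
insertion_ok = all(
    ATTESTED[base] == possessed
    for base, possessed in map(hypothesis_insertion, ["kʃi", "ihti"])
)

print("deletion analysis fails for:", deletion_errors)
print("insertion analysis fits all forms:", insertion_ok)
```

The deletion analysis wrongly predicts a possessed form without i for ihti, while the insertion analysis derives both pairs.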
3.1 Rule ordering
When multiple changes occur, they must generally be temporally ordered relative to one another.
(This goes for comparative reconstruction too.)
The Spanish first person singular present marker alternates between -o and -oy:
(5) a. s-oy ‘I am’, d-oy ‘I give’, v-oy ‘I go’, est-oy ‘I am’
    b. and-o ‘I walk’, pes-o ‘I weigh’, lav-o ‘I wash’
It does not look like there is a conditioning environment for this alternation. But consider
another alternation:
(6) escrib-o ‘I write’, tran-scrib-o ‘I transcribe’, in-scrib-o ‘I inscribe’
It looks like there is a good environment for the insertion of a prothetic e – namely word-initially.
So we can reconstruct a pre-form for est-oy, namely *st-oy. Now we have a good environment
for the o ∼ oy alternation: monosyllables versus polysyllables.
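The ordering implied here can be tested by applying the two rules to the pre-forms in both orders. The following sketch counts syllables by counting vowel letters, which is adequate only for these toy forms; the rule and variable names are invented for the example.

```python
import re

VOWELS = "aeiou"


def count_syllables(word):
    """Crude syllable count: one syllable per vowel letter (assumption)."""
    return sum(ch in VOWELS for ch in word)


def add_y(word):
    """-o > -oy in monosyllables."""
    if word.endswith("o") and count_syllables(word) == 1:
        return word + "y"
    return word


def prothesis(word):
    """Insert e before word-initial s + consonant."""
    return re.sub(rf"^s(?=[^{VOWELS}])", "es", word)


def derive(stem, rules):
    """Attach -o to the stem, then apply the rules in the given order."""
    word = stem + "o"
    for rule in rules:
        word = rule(word)
    return word


RIGHT_ORDER = [add_y, prothesis]
WRONG_ORDER = [prothesis, add_y]

for stem in ["s", "d", "v", "st", "and", "pes", "lav", "scrib"]:
    print(stem, derive(stem, RIGHT_ORDER), derive(stem, WRONG_ORDER))
```

Only *st- distinguishes the two orders: with prothesis first, the form is already disyllabic when the -oy rule applies, giving the unattested esto instead of estoy.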
Now consider Ancient Greek:
(7) a. genes-si ‘family’ (dat. pl.)
    b. gene-os ‘family’ (gen. sg.)
The most natural sound change linking these forms is intervocalic lenition of s (> ∅), while
epenthesis of s is less likely. We also see an alternation s ∼ t, e.g. in
(8) a. ambrot-os ‘immortal’
    b. ambros-i-a ‘ambrosia’
t > s is the most likely change intervocalically (especially before high vowels – a kind of
palatalisation).
So which change came first? The answer is that s > ∅ must have come first, or it would be fed
by t > s, which means we would have expected forms like **ambroia.
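The feeding relation is easiest to see by running both orders on the pre-forms. In this sketch the pre-forms genesos and ambrotia are assembled from the morphemes in (7) and (8) and are assumptions of the example, as are the simplified vowel set and the restriction of t > s to the position before i.

```python
import re

VOWELS = "aeiou"


def s_loss(word):
    """s > ∅ / V _ V"""
    return re.sub(rf"(?<=[{VOWELS}])s(?=[{VOWELS}])", "", word)


def assibilation(word):
    """t > s / V _ i"""
    return re.sub(rf"(?<=[{VOWELS}])t(?=i)", "s", word)


def derive(pre_form, rules):
    """Apply the sound changes to pre_form in chronological order."""
    for rule in rules:
        pre_form = rule(pre_form)
    return pre_form


PRE_FORMS = ["genesos", "genessi", "ambrotos", "ambrotia"]

for form in PRE_FORMS:
    attested_order = derive(form, [s_loss, assibilation])
    feeding_order = derive(form, [assibilation, s_loss])
    print(f"{form:<10}{attested_order:<10}{feeding_order}")
```

With s-loss first, ambrotia surfaces as ambrosia, because the new s arises after s-loss has ceased to operate. With the opposite order the new s is deleted as well, giving the unattested ambroia.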
3.2 Limits of internal reconstruction
Internal reconstruction is not foolproof. It is impossible to recover unconditioned changes (like
total mergers) since they leave no alternations, e.g. Sanskrit *e, *o, *a > a:
(9) a. *h₁ed- > ád- ‘eat’, cf. Lat. ed-ō ‘I eat’
    b. *dwóh₁ > dvá- ‘two’, cf. Lat. duo ‘two’
    c. *h₂éǵros- > ájra- ‘field’, cf. Lat. ager ‘field’
It can also be rather difficult to reconstruct changes where the original environment has been
lost. Consider the [θ] ∼ [ð] alternation in breath and breathe, bath and bathe. The original
environment for the alternation (intervocalic) has been lost (except orthographically).
To reconstruct that environment reliably, we need external evidence. (But note that intervocalic
voicing is a very frequent process cross-linguistically, so we might be able to make a fair guess!)
4 Further reading
There are good chapters on this in Campbell 2013, Trask 1996 and Lehmann 1993. See the
other references for more specific coverage.
References
Campbell, Lyle. 2013. Historical Linguistics: An introduction. 3rd ed. Edinburgh: Edinburgh University Press.
Dixon, R. M. W. 1983. A Grammar of Yidin. Cambridge: Cambridge University Press.
Durie, Mark and Malcolm Ross. 1996. The comparative method reviewed: Regularity and irregularity in language change. Oxford: Oxford University Press.
Fox, Anthony. 1995. Linguistic Reconstruction: An Introduction to Theory and Method. Oxford: Oxford University Press.
Fox, Anthony. 2000. ‘On simplicity in linguistic reconstruction’. In Smith, John Charles and Delia Bentley (eds.). Historical linguistics 1995: Selected papers from the 12th international conference on historical linguistics, Manchester, August 1995. Amsterdam: John Benjamins, 99–110.
Hoenigswald, Henry M. 1960. Language change and linguistic reconstruction. Chicago: University of Chicago Press.
Jeffers, Robert J. and Ilse Lehiste. 1979. Principles and methods for historical linguistics. Cambridge, Massachusetts: MIT Press.
Lass, Roger. 1993. ‘How real(ist) are reconstructions’. In Jones, Charles (ed.). Historical linguistics: Problems and perspectives. London: Longman, 156–189.
Lehmann, Winfred P. 1993. Historical Linguistics: An Introduction. 3rd ed. London: Routledge.
Trask, R. L. 1996. Historical linguistics. London: Arnold.
This handout is based on handouts and slides by David Willis, Joe Perry and Petros Karatsareas