Li11 Historical Linguistics
The comparative method
Marius Jøhndal
23 January 2016

The comparative method, which was developed by the Neogrammarians (Hermann Paul, Karl Brugmann, Hermann Osthoff and others) in the late nineteenth century, is a set of principles for reconstructing the phonology of the ancestor of two or more genetically related languages (external reconstruction). The comparative method is not an algorithm; there is no self-contained step-by-step sequence of operations that you can follow. It is a set of heuristics for making hypotheses, and you have to weigh different options along the way.

The assumption here is that languages are genetically related and form families. Daughter languages descend from a single ancestor, the proto-language, and it is the proto-language that we are trying to reconstruct.

How do we establish that two or more languages are genetically related? Can we tell from the list of words below that English is genetically related to Dutch and German?

English   Dutch    German
snow      sneeuw   Schnee
water     water    Wasser
bird      vogel    Vogel
dog       hond     Hund
hole      gat      Loch
onion     ui       Zwiebel

1 Identifying correspondence sets

The first step is to identify correspondence sets, which are sets of cognates, i.e. shared elements inherited from a common ancestor. Some cognates can be identified by surface resemblance, e.g. English ‘name’, Nepali nām, French nom, Irish ainm, Greek ónoma, etc. are cognates (< PIE *h₃nómn̥-).

Cognates are not always that easily identified. We have to eliminate other possible sources of similarity:

1. Chance resemblances, e.g. Mbabaram (Pama-Nyungan) dog ‘dog’, which resembles English ‘dog’ purely by chance; it is in fact related to Yidiny gudaga, Dyirbal guda (Dixon 1983: 52).
2. Borrowing, e.g. Welsh pysg ‘fish’, borrowed from Latin piscis (the original Celtic cognate is seen in Irish iasc).
3. Onomatopoeia, e.g. the similarity between Chinese māo ‘cat’ and English ‘meow’.
4. Typological tendencies, e.g. nursery words like mama, baba etc.
Note that language contact can create false, apparently regular correspondences through borrowing:

Welsh    Latin
pont     pōns      ‘bridge’
pysg     piscis    ‘fish’
pabell   pāpiliō   ‘tent’
padell   patella   ‘pan’
pobl     populus   ‘people’

We know these are not true cognates because PIE *p was lost in Celtic. Compare

(1) a. Welsh llawn, Latin plēna ‘full’
    b. Welsh nai, Latin nepōs ‘nephew’

But if there are enough false cognates, it can be difficult to work out which words are inherited and which are borrowed.

Occasionally even chance resemblances can look almost systematic:

Hawaiian   Classical Greek
aeto       aetos    ‘eagle’
areto      artos    ‘bread’
noonoo     nous     ‘mind, thought’
lahui      laos     ‘people’
mahina     me:n     ‘month’ (H.)/‘moon’ (AGr.)

How do we eliminate these possibilities? Firstly, by remembering that they are possibilities. Onomatopoeia and nursery words are relatively straightforward to eliminate. Secondly, by accepting cognate sets only if regular phonological correspondences between items can be demonstrated. This should eliminate most chance resemblances and a proportion of borrowings. Finally, by restricting ourselves to basic vocabulary, at least in the initial stages, that is, to items which are unlikely to be borrowed.

The presence of idiosyncratic or arbitrary morphological correspondences is another feature to look at when establishing genetic relatedness:

Language        3SG      3PL       1SG
Sanskrit        ásti     sánti     asmi
Ancient Greek   estí     eisí      eimí
Latin           est      sunt      sum
Gothic          ist      sind      im
PIE             *es-ti   *s-enti   *es-mi

Also note that cognates do not always have the same meaning. Take, for example, PIE *méh₂-tr- ‘mother’:

(2) a. Latin māter
    b. Greek māter, mēter
    c. Avestan mātar-
    d. Armenian mayr
    e. Albanian motër ‘sister’, nënë ‘mother’

1.1 Establishing sound correspondences

Once we have a set of (presumed) cognates, we can take the next step in comparative reconstruction and identify sound correspondences between different varieties.
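As a rough illustration of what identifying sound correspondences involves, the sketch below tallies how often each pair of word-initial segments lines up in the Welsh and Latin pairs given earlier in this section. The function names, the restriction to initial segments and the treatment of Welsh ll as a single segment are assumptions made for the example only; real work aligns whole words.

```python
from collections import Counter

# Welsh/Latin pairs from the handout. The first five are loans from Latin,
# the last two are inherited cognates.
PAIRS = [
    ("pont", "pōns"), ("pysg", "piscis"), ("pabell", "pāpiliō"),
    ("padell", "patella"), ("pobl", "populus"),
    ("llawn", "plēna"), ("nai", "nepōs"),
]

# Assumption of this sketch: Welsh <ll> is counted as one segment.
DIGRAPHS = ("ll",)


def initial_segment(word):
    """Return the first segment of a word, treating listed digraphs as units."""
    for digraph in DIGRAPHS:
        if word.startswith(digraph):
            return digraph
    return word[0]


def tally_initial_correspondences(pairs):
    """Count how often each (Welsh, Latin) pair of initial segments occurs."""
    return Counter(
        (initial_segment(welsh), initial_segment(latin)) for welsh, latin in pairs
    )


counts = tally_initial_correspondences(PAIRS)
for (welsh, latin), n in counts.most_common():
    print(f"Welsh {welsh} : Latin {latin}  (x{n})")
```

The tally shows p : p recurring five times, which is exactly the kind of apparently regular correspondence that borrowing can produce. A count on its own cannot separate loans from inherited words; that still takes independent knowledge such as the loss of PIE *p in Celtic.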
Consider the following data from Russian, Polish and Bulgarian:

     Russian     Polish      Bulgarian
 1   vʲera       vʲara       vjara       ‘belief’
 2   rʲedk-      ʒadk-       rjadk-      ‘rare’
 3   trʲi        tʒɪ         tri         ‘three’
 4   trʲezv-     tʒeʑv-      trezv-      ‘sober’
 5   zvʲerʲ      zvʲeʒ-      zvjar       ‘animal’
 6   bʲel-       bʲawɪ       bjal        ‘white’
 7   bʲelʲetʲ    bʲeletɕ     bele(-ja)   ‘whiten’
 8   dʲivno      dʑivnʲe     divno       ‘smoke’ (verb)
 9   krʲepk-     kʒepk-      krepk-      ‘strong’
10   dʲevʲatʲ    dʑevʲẽtɕ    devet       ‘nine’
11   dʲesʲatʲ    dʑeɕẽtɕ     deset       ‘ten’

Some of the correspondences we can identify are:

Russian   Polish   Bulgarian   Environment   Examples
t         t        t           _             3, 4
tʲ        tɕ       t           _             7, 10, 11
dʲ        dʑ       d           _             8, 10, 11
r         r        r           _             1
rʲ        ʒ        rj          _             2
rʲ        ʒ        r           C _, _ #      3, 4, 5, 9
e         a        a           Cʲ _ ?        1, 2, 6
e         e        a           Cʲ _ Cʲ       5
e         e        e           ? _           4, 7, 9, 10, 11
a         ẽ        e           _             10, 11

2 Reconstructing sound changes

Once we have identified correspondences, we want to reconstruct the changes that produced them. If all the languages have the same sound, it is easy: we reconstruct that sound.

If we see a single sound in one language corresponding to multiple sounds in another, we can reconstruct either a split or a merger. A split will generally be conditioned by some environment. For instance, above we see e ∼ a in Polish corresponding to a in Bulgarian, and we have a conditioning environment for this split: Cʲ _ Cʲ.

(Total) mergers can only be reconstructed in the absence of a possible conditioning environment for a split. Consider the following data from Romance:

Italian   Spanish   French
correre   correr    courir   ‘run’
costare   costar    coûter   ‘cost’
caro      caro      cher     ‘dear’
capo      cabo      chef     ‘head, top’

We see that we have both [k, k, k] and [k, k, ʃ]. Is this a split *k > Fr. k, ʃ or a merger *k, *ʃ > It./Sp. k? There are several reasons to favour the former; one is that we have an environment for a split, viz. Fr. *k > ʃ / _ e (and other front vowels).
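The reasoning about a conditioning environment can be sketched as a check for complementary distribution. In the example below, the initial consonant and the first vowel letter of each French word are entered by hand from the Romance table, and the two-way front/non-front classification and the function names are assumptions of the sketch. Four words are far too few to establish anything; the point is only the shape of the argument.

```python
from collections import defaultdict

# (French word, initial consonant, first vowel letter), entered by hand.
FRENCH = [
    ("courir", "k", "o"),
    ("coûter", "k", "o"),
    ("cher", "ʃ", "e"),
    ("chef", "ʃ", "e"),
]

# Assumption of this sketch: a crude two-way vowel classification.
FRONT_VOWELS = {"i", "e"}


def reflexes_by_environment(rows):
    """Group the attested initial consonants by the class of the next vowel."""
    environments = defaultdict(set)
    for _word, reflex, vowel in rows:
        key = "front" if vowel in FRONT_VOWELS else "non-front"
        environments[key].add(reflex)
    return dict(environments)


def in_complementary_distribution(environments):
    """True if every environment has exactly one reflex and no reflex repeats."""
    seen = set()
    for reflexes in environments.values():
        if len(reflexes) != 1 or reflexes & seen:
            return False
        seen |= reflexes
    return True


environments = reflexes_by_environment(FRENCH)
print(environments)
print("split is plausible:", in_complementary_distribution(environments))
```

If the two reflexes turned up in the same environment, the check would fail and a conditioned split would no longer be available as an explanation, which is the situation that forces a merger analysis.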
Now consider the following cognate sets from Mayan (Campbell 2013):

K’iche’   Tzeltal   Yucatec   Huastec
rah       ja        jah       jah-      ‘hot, spicy’
rix       jix       jiih      jeh-      ‘old, old man’
r-        j-        j-        –         ‘his, her, its’
raʃ       jaʃ       jaaʃ      jaʃ-      ‘green’
war       waj       waj       waj       ‘sleep’
jax       jah       jah       ja        ‘sick’
jaʃ       jaʃ       –         –         ‘crab’
k’aj-     k’aj-     k’aj-     tʃ’aj-    ‘sell’

We have two sound correspondences:

(3) a. K’iche’ r, Tzeltal j, Yucatec j, Huastec j
    b. K’iche’ j, Tzeltal j, Yucatec j, Huastec j

The reconstruction for (3-b) must be *j. In (3-a) we can say that *j changed to r in K’iche’ only if there is a specific phonetic environment for the change. But the environments of both correspondence sets are identical, as far as we can tell, so we have nothing that could condition this split. We must therefore instead posit a merger between *r and *j in all varieties except K’iche’:

Proto-Mayan *j, *r   kept apart as j, r   K’iche’
Proto-Mayan *j, *r   > j                  Tzeltal, Yucatec, Huastec

We are not always able to reconstruct sound changes from the presence/absence of a conditioning environment alone. We may find unconditioned sound changes that do not involve a split or a merger (e.g. PIE *d > PGerm. *t). In such cases there is no conditioning environment to tell us which daughter language is innovative. Alternatively the details of splits or mergers may be obscured by later changes. In such cases we need other principles to tell us what the most likely change could be. The general principles are:

1. Economy (Occam’s Razor): assume as few changes as possible.
2. Majority: retain common features, e.g. if all descendant forms are labials, avoid reconstructing a velar.
3. Naturalness and directionality: posit changes that go in the ‘natural’ direction, e.g. lenition rather than fortition intervocalically.

2.1 Economy and majority

Suppose that we want to reconstruct the changes that lead to languages a, b, c, d and e.
Now suppose that a, b and c form a subgroup but that d, e do not:

        abcde
       /  |  \
     abc  d   e
    / | \
   a  b  c

If a, b, c have a sound /X/ corresponding to /Y/ in d, e, it is (all other things being equal) preferable to reconstruct Y. This requires only one change Y > X (between proto-abcde and proto-abc). Otherwise d and e would need two independent changes X > Y.

The effect of this principle will depend on the subgrouping of the language family (and of course also on what you define as a ‘language’). If we do not have a well-defined subgrouping, the principle reduces to a majority principle. Let us look at the effect of this in the context of the following data from Germanic:

Old English   Old Norse   Gothic   Old High German
þū            þú          þu       dû                ‘you’ (sg.)
þrīe          þrír        þreis    drî               ‘three’

The majority principle gets us the right result here: most varieties have þ, which is the historical value.

But majority often gives the wrong result and must be applied with caution. If we blindly applied it to modern Germanic languages, we would get precisely the wrong result. We have, for example,

(4) a. English (arch/dial.) thou
    b. Icelandic þú
    c. German/Swedish/Norwegian/Danish/Dutch (arch/dial.) du

Using majority we would reconstruct *du instead of the correct *þū.

2.2 Naturalness and directionality

Sound change is often considered a directional process. Lenition, for example, is more common than fortition, and assimilation more common than dissimilation. So it is better to reconstruct Polish *r > ʒ and *t > tɕ (assimilation/coalescence) than the reverse (dissimilation). But dissimilation and fortition do occur, so this principle must be considered in combination with other factors.

Naturalness is a property not only of reconstructed sound changes, but also of reconstructed phonological systems. So avoid reconstructing an unattested vowel inventory, for example. But languages sometimes do have typologically unusual patterns, so again this principle cannot be used alone.
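The economy argument from section 2.1 can be made concrete by counting changes on a tree. The sketch below is a simple unit-cost parsimony count, not a procedure taken from the handout: every change between a node and its daughter costs 1, and the function returns the fewest changes needed once a value is fixed for the proto-language. The tree encoding and all names are assumptions of the example.

```python
# A node is (name, children); a leaf has an empty list of children.
SUBGROUPED = (
    "abcde",
    [("abc", [("a", []), ("b", []), ("c", [])]), ("d", []), ("e", [])],
)
# The same five languages with no subgrouping at all.
FLAT = ("abcde", [(name, []) for name in "abcde"])

# Attested sounds: a, b, c have X; d, e have Y.
ATTESTED = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}


def min_changes(node, state, attested, states=("X", "Y")):
    """Fewest changes below `node`, given that `node` itself has `state`."""
    name, children = node
    if not children:
        # A leaf must carry its attested sound; anything else is ruled out.
        return 0 if attested[name] == state else float("inf")
    total = 0
    for child in children:
        # Choose the child's state that is cheapest, paying 1 for a change.
        total += min(
            min_changes(child, s, attested, states) + (s != state)
            for s in states
        )
    return total


for label, tree in (("subgrouped", SUBGROUPED), ("flat", FLAT)):
    for proto in ("X", "Y"):
        cost = min_changes(tree, proto, ATTESTED)
        print(f"{label:10} proto-sound {proto}: {cost} change(s)")
```

With the subgrouping, reconstructing Y costs one change against two for X. Without it, X costs two against three for Y, so economy collapses into the majority principle, as described in section 2.1.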
3 Internal reconstruction

We are not always able to reconstruct sound changes by comparing different languages. We may be working with a language isolate, we may be interested in the intermediate stages of a language’s development, or the correspondences between languages may be too complex to reliably reconstruct sound changes. In such cases we can try to reconstruct the history of a single language based on data from that language alone. Instead of comparing words among languages we can compare forms from different morphophonological environments. This is known as internal reconstruction. (Languages reconstructed through comparative reconstruction are called proto-languages. Languages reconstructed through internal reconstruction are sometimes called pre-languages.)

Evidence for an earlier stage can be deduced from internal patterns of the language. The method is applied to linguistic items that are related but appear in different contexts synchronically (allophones and allomorphs), and internal reconstruction exploits these differences in a systematic way.

Changes that give rise to alternations of this kind very often encroach on morphology. The following shows how rhotacism (s > r / V _ V) produced two stem allomorphs /honos-/ ∼ /honor-/ for Latin honos ‘honour’:

Form
NOM/VOC SG   honos-∅
ACC SG       honos-em > honor-em
GEN SG       honos-is > honor-is
DAT SG       honos-ī > honor-ī
ABL SG       honos-e > honor-e

Now consider the following Nahuatl (Uto-Aztecan) data (Campbell 2013):

Base form        1SG POSS form
ihti ‘stomach’   n-ihti ‘my stomach’
ikʃi ‘foot’      no-kʃi ‘my foot’

We have an alternation i ∼ ∅ in ikʃi ∼ no-kʃi. This pattern is regular in the language. The behaviour of the prefix is also a general rule, no- > n- / _ V, so we can put that aside. We can either reconstruct *ikʃi and a rule that deletes initial i-, or we can reconstruct *kʃi and a rule that inserts i in appropriate contexts.
In other words, the question is whether i and ∅ have merged in the base form or have split in the possessive form. If we choose to reconstruct *ikʃi, the rule that deletes i- would also have to apply to ihti, which is wrong, unless we can restrict it to a more specific environment. There is no obvious conditioning environment for a split (ikʃi and ihti are similar base forms), so we have to choose the other option and reconstruct a merger, ∅ > i / # _, in the base form, giving us the pre-forms *kʃi and *ihti.

3.1 Rule ordering

When multiple changes occur, they must generally be temporally ordered relative to one another. (This goes for comparative reconstruction too.)

The Spanish first person singular present marker alternates between -o and -oy:

(5) a. s-oy ‘I am’, d-oy ‘I give’, v-oy ‘I go’, est-oy ‘I am’
    b. and-o ‘I walk’, pes-o ‘I weigh’, lav-o ‘I wash’

It does not look like there is a conditioning environment for this alternation. But consider another alternation:

(6) escrib-o ‘I write’, tran-scrib-o ‘I transcribe’, in-scrib-o ‘I inscribe’

It looks like there is a good environment for the insertion of a prothetic e, namely word-initially. So we can reconstruct a pre-form for est-oy, namely *st-oy. Now we have a good environment for the o ∼ oy alternation: monosyllables versus polysyllables.

Now consider Ancient Greek:

(7) a. genes-si ‘family’ (dat. pl.)
    b. gene-os ‘family’ (gen. sg.)

The most natural sound change linking these forms is intervocalic lenition of s (> ∅), while epenthesis of s is less likely. We also see an alternation s ∼ t, e.g. in

(8) a. ambrot-os ‘immortal’
    b. ambros-i-a ‘ambrosia’

t > s is the most likely change intervocalically (especially before high vowels, a kind of palatalisation). So which change came first? The answer is that s > ∅ must have come first, or it would be fed by t > s, which means we would have expected forms like **ambroia.

3.2 Limits of internal reconstruction

Internal reconstruction is not foolproof.
It is impossible to recover unconditioned changes (like total mergers) since they leave no alternations, e.g. Sanskrit *e, *o, *a > a:

(9) a. *h₁ed- > ád- ‘eat’, cf. Lat. ed-ō ‘I eat’
    b. *dwóh₁ > dvá- ‘two’, cf. Lat. duo ‘two’
    c. *h₂éǵros- > ájra- ‘field’, cf. Lat. ager ‘field’

It can also be rather difficult to reconstruct changes where the original environment has been lost. Consider the [θ] ∼ [ð] alternation in breath and breathe, bath and bathe. The original environment for the alternation (intervocalic) has been lost (except orthographically). To reconstruct that environment reliably, we need external evidence. (But note that intervocalic voicing is a very frequent process cross-linguistically, so we might be able to make a fair guess!)

4 Further reading

There are good chapters on this in Campbell 2013, Trask 1996 and Lehmann 1993. See the other references for more specific coverage.

References

Campbell, Lyle. 2013. Historical Linguistics: An introduction. 3rd ed. Edinburgh: Edinburgh University Press.
Dixon, R. M. W. 1983. A Grammar of Yidin. Cambridge: Cambridge University Press.
Durie, Mark and Malcolm Ross. 1996. The comparative method reviewed: Regularity and irregularity in language change. Oxford: Oxford University Press.
Fox, Anthony. 1995. Linguistic Reconstruction: An Introduction to Theory and Method. Oxford: Oxford University Press.
Fox, Anthony. 2000. ‘On simplicity in linguistic reconstruction’. In Smith, John Charles and Delia Bentley (eds.). Historical linguistics 1995: Selected papers from the 12th international conference on historical linguistics, Manchester, August 1995. Amsterdam: John Benjamins, 99–110.
Hoenigswald, Henry M. 1960. Language change and linguistic reconstruction. Chicago: University of Chicago Press.
Jeffers, Robert J. and Ilse Lehiste. 1979. Principles and methods for historical linguistics. Cambridge, Massachusetts: MIT Press.
Lass, Roger. 1993. ‘How real(ist) are reconstructions’. In Jones, Charles (ed.). Historical linguistics: Problems and perspectives. London: Longman, 156–189.
Lehmann, Winfred P. 1993. Historical Linguistics: An Introduction. 3rd ed. London: Routledge.
Trask, R. L. 1996. Historical linguistics. London: Arnold.

This handout is based on handouts and slides by David Willis, Joe Perry and Petros Karatsareas.