Classification and the tempo of language evolution

Lexical homologies (cognates)
Classification and the tempo of
language evolution
English
here
sea
water
when
German
hier
See, Meer
Wasser
wann
French
ici
mer
eau
quand
Italian
qui, qua
mare
acqua
quando
Greek
edo
thalasa
nero
pote
Hittite
ka
aruna-
watar
kuwapi
Quentin D. Atkinson
Meaning
Institute of Cognitive and Evolutionary Anthropology, University of Oxford
English
1
0
0
0
1
0
0
0
1
0
0
1
German
1
0
0
0
1
1
0
0
1
0
0
1
French
0
1
0
0
0
1
0
0
0
1
0
1
Italian
0
1
0
0
0
1
0
0
0
1
0
1
Greek
0
0
1
0
0
0
1
0
0
0
1
1
Hittite
0
0
0
1
0
0
0
1
1
0
0
1
here
sea
water
when
Image from cover of Nature 449, 11 October, 2008
The IndoEuropean
Language
Family Tree
Gray and Atkinson,
Nature, 2003
Swadesh 200 word list
2449 cognate sets
Likelihood model of
cognate birth/death
Branch-lengths =
change
Phylogenetic
uncertainty
I-E tree showing variation in rates of lexical replacement
“One”
“Ear”
“Sand”
Some examples of meanings with small and large
numbers of cognate sets
ROMANCE
CELTIC
GERMANIC
SLAVIC
Cognate sets
Examples
1
two, three, five, I, who
2
one, four, we
3
how
4
name, tongue
6
ear, night, thou
10
day, to live, mother, salt, when
27
bark (of a tree), to count, to dig, to float, to
flow, if, rub, sand, straight, woods
INDOIRANIAN
46
dirty (the most variable word)
GREEK
1
Estimating rates of word evolution on a phylogeny
Coding the cognate data
Languages
meanings
English
here
sea
water
German
hier
see, meer
wasser 0
wan
French
ici
mer
eau
1
quand
quand
Italian
qui, qua
mare
acqua
1
quando
acqua
quando
Greek
edo
thalasa
nero
2
pote
thalasa
nero
pote
Hittite
ka
aruna-
watar
0
kuwapi
aruna-
watar
kuwapi
English
here
sea
water
when
German
hier
See, Meer
Wasser
wann
French
ici
mer
eau
Italian
qui, qua
mare
Greek
edo
Hittite
ka
English
0
0
0 0
German
0
0, 1
0 0
French
1
1
1 0
Italian
1
1
1 0
Greek
2
2
2 0
Hittite
3
3
0 0
transition model (e.g., water)
0
when
phylogeny
q01
1
0
q10
q21
q20
+
q12
q02
2
numerical
estimates of
transition
rates, q
(scaled as
expected
changes per
ten thousand
years)
Distribution of word replacement rates
(rates of lexical evolution)
What predicts variation in rates of evolution?
genes
directional versus purifying selection (conserved and
non-conserved elements), expression levels,
population size
words
word frequency
Paul (1880) and Zipf (1947), but not tested.
100-fold rate variation
Correlated rates in Bantu
(Pagel & Meade, 2006)
Correlations between frequencies of word use
Spoken word frequency in the British National Corpus
350
300
N=4840 words
mean = 194
geometric mean = 35.94
median = 25
Count
250
200
150
100
50
0
1
1.5
2
2.5
3
3.5
4
4.5
log(10) of spoken word frequency per million
Figure from Pagel et al., Nature, 2007.
2
Frequency vs rate of lexical evolution
Parts of speech
conjunctions
---prepositions
---adjectives
---verbs
---nouns
---special adverbs---pronouns
---numbers
----
r=-0.37
r=-0.35
r=-0.41
r=-0.32
R2 =0.48
R2 =0.48
R2 =0.50
R2 =0.48
Figure from Pagel,
Atkinson & Meade,
Nature, 2007
Two models of how frequency influences the rate of lexical evolution
i) reduced mutation
ii) matching-purifying model
Word frequency distribution for “Thunderstorm”
thunderstorm
n=192 different words for ‘thunderstorm’ in a population of Midwest
American speakers.
adoption of variants
bugs
meaning or concept
+
arbitrary sound
e.g., “rabbit”
lagomorph
thundershower
mutation
word = “rabbit”
innovation
Storm, thundercloud
Electrical storm, thunder gust
bunny
Cat squall, thundering in the molly hole, yawl
Peter
hare
Tempo and classification
Word frequency distribution for “Thunderstorm”
Frequency of word use together with part of speech accounts for 50% of variation in
rates of evolution across 87 languages representing ~130,000 language-years of
evolution
Frequency may act as a linguistic form of ‘purifying selection’ affecting the choice of
words
Power law curve
- bias against infrequent words
The mechanism is expected to operate similarly across all languages and time
scales, and makes predictions about specific meanings. (e.g. Indo-European and
Bantu correlation).
Some insights for classification
languages evolve initially in less frequently used parts of vocabulary, retaining
mutual intelligibility for longer
high frequency words may be less likely to be borrowed
cultural replicators can evolve more slowly than some human genes (e.g., compare
“five” with lactase gene) -- some words persisting for tens of thousands of years
slow evolution raises possibility of deep linguistic reconstructions
3
The IndoEuropean
Language
Family Tree

Gray and Atkinson,
Nature, 2003
 long periods of stability or stasis followed by short punctuational
bursts associated with speciation
Punctuated Equilibrium and the fossil record
Eldredge and Gould 1972
Swadesh 200 word list
2449 cognate sets
Likelihood model of
cognate birth/death
Branch-lengths =
change
Phylogenetic
uncertainty
Species Formation through Punctuated Gradualism in Planktonic Foraminifera Bjorn A. M almgren; W. A.
Berggren; G. P. L ohmann. S cience, 225 (4659): 317-319.
Path lengths may contain components derived from
punctuational and gradual processes
Do the data sets show evidence of a
punctuational effect (β > 0)?
•
•
! >0
each language splitting
event makes some
contribution to path length
! =0
path length
accumulates as a
function of time
•
Austronesian
• punctuational effect in
~100% of trees
Bantu
• punctuational effect in
~98% of trees
Indo-European
• punctuational effect in
~67% of trees
path length = n! + g
Figure from Atkinson et al., Science, 2008
Rate variations through time…
• language splitting events have a punctuational effect on
lexical evolution
• This effect is substantial, and potentially a ubiquitous
property of language evolution
• There may be more than one process causing
punctuational language change…
• Founder effect
• small founder populations evolve more quickly.
• Fits with computer simulations of (Nettle, 1999)
• “The underlying cause of sociolinguistic differences…is
the human instinct to establish and maintain social
identity” Chambers (1995, p 250)
• Martha’s Vineyard (Labov, 1963)
• Noah Webster - “as an independent nation, our honor
requires us to have a system of our own, in language as well
as government”
- Noah Webster, Dissertations on the English Language
(1789, p. 20).
Conclusions
• Models of language evolution are useful for
classification, but can also be used to test
hypotheses about the processes underlying
language change
• We can feed what we learn back into the models
to develop better classification and a better
understanding of the human language
4
Thanks to:
Mark Pagel
Chris Venditti
Andrew Meade
Russell Gray
Simon Greenhill
The Leverhulme Trust
5