Language Families and Linguistic Diversity

M Ross, The Australian National University, Canberra,
Australia
© 2006 Elsevier Ltd. All rights reserved.
Linguistic Diversity
‘Linguistic diversity’ refers to several interrelated phenomena:
1. phylogenetic (genetic, genealogical) diversity: the
number of language families in a geographic area;
2. intrafamilial and intralinguistic variation: differences among the languages of a family and among
the variants of a language;
3. language density: the number of languages in a
geographic area;
4. typological diversity: differences among the structures which make up languages.
The focus of this article is on (1), (2) and (3) (see
also Variation and Language: Overview). More conventional approaches to these topics are discussed
first, then a recent alternative.
Although (4) will not be discussed further here,
the relationship between phylogenetic and typological diversity should be mentioned. It is quite common
for the members of a language family to be typologically similar, but there is no necessary reason why
they should be so. There is, for example, considerable typological variety among the grammars of the
Austronesian family, some of it the result of contact
with non-Austronesian languages, but much of it the
outcome of language-internal changes. Conversely,
languages belonging to different language families
may be typologically quite similar, either as the result
of contact or as the result of independent parallel
innovation.
Language Families
Definitions and Problems
A language family is conventionally defined as a set of
languages that share a common ancestor. The family
metaphor captures two insights: that languages are
systems that are transmitted from one generation to
the next (generational continuity) and that these systems change over time so that when a community of
speakers divides into, say, three separate communities,
the speech of each of the three will change in different
ways from the others, leading eventually to mutual
unintelligibility, i.e., three languages descended from a
shared ancestor. The basis of the metaphor is biological
evolution, where one species divides into two or more
new species (McMahon, 1994: Chap. 12), even
though linguists often use the term ‘family tree,’
which otherwise denotes trees drawn to represent
human family relationships.
Identifying a language family in the first place
depends on finding individual-identifying evidence
(Nichols, 1996, 1997), patterned similarities between
a set of languages that could not have arisen by chance
and must be the outcome of shared inheritance. The
Indo-European family, for example, was first definitively identified by Sir William Jones in 1786 on the
basis of similarities between the verb paradigms of
Sanskrit, Greek, and Latin. Recently, the Trans-New Guinea family has been identified on the basis of
formally similar pronouns (Ross, 2005).
Figure 1 A conventional family tree of the Germanic languages.
Figure 1 shows a conventional family tree of
the Germanic languages (see Germanic Languages)
and serves as a starting point for a discussion of the
family-tree model. It shows that the family concept is
recursive. Thus English, Friesian, etc. form a West
Germanic family. This, together with the
North Germanic family and the single-member East
Germanic family, forms the Germanic family. The
latter, along with the Romance, Balto-Slavic, Indo-Iranian, and other, some single-member, families,
is part of the Indo-European family (not shown in
Figure 1).
The languages at all nodes other than the terminals in Figure 1 are reconstructed proto-languages.
The changes that languages undergo are often regular enough to allow quite a safe reconstruction of
their parent based on the correspondences among
them, provided that the linguistic comparative method is carefully applied. Thus the reconstruction
of Proto-Indo-European is reasonably secure. The
reconstruction (and indeed the existence) of its
hypothesized parent, Proto-Nostratic, however, is
considered dubious by many linguists, as it is not
based on a rigorous application of the comparative
method (Nichols, 1997), the maximum reach of
which is usually reckoned to be around 6000–8000
years (see Long-Range Comparison: Methodological
Disputes).
The comparative method provides a means for
inferring the tree structure of a large family by reconstructing the chronology of innovations relative to the
protolanguage. For example, all Germanic languages
reflect the so-called Germanic sound shift, a set of
changes in consonants relative to the reconstructed
system of Proto-Indo-European. It is inferred that this
set of changes only occurred once, in Proto-Germanic,
and this allows us to insert the Proto-Germanic
node into the Indo-European tree (see Subgrouping
Methodology).
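The subgrouping logic just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature label `germanic_sound_shift` and the data layout are hypothetical, and real subgrouping weighs many innovations and must exclude borrowing and parallel change.

```python
from collections import defaultdict

# Toy data with a hypothetical feature label: which lects reflect which
# innovation relative to the reconstructed protolanguage.
INNOVATIONS = {
    "English": {"germanic_sound_shift"},
    "German": {"germanic_sound_shift"},
    "Gothic": {"germanic_sound_shift"},
    "Latin": set(),
    "Sanskrit": set(),
}


def candidate_subgroups(innovations):
    """Map each innovation to the set of lects that reflect it.

    An innovation reflected by two or more lects is a candidate for a single
    change in a shared intermediate ancestor, i.e. a node in the tree.
    """
    reflected_by = defaultdict(set)
    for lect, features in innovations.items():
        for feature in features:
            reflected_by[feature].add(lect)
    return {
        feature: frozenset(lects)
        for feature, lects in reflected_by.items()
        if len(lects) >= 2
    }


subgroups = candidate_subgroups(INNOVATIONS)
print(subgroups)
```

Here the one shared innovation picks out English, German, and Gothic as a candidate subgroup, corresponding to the Proto-Germanic node.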
Scholars investigating worldwide linguistic diversity need standardized phylogenetic units to work
with. After Morris Swadesh devised the techniques
known as lexicostatistics and glottochronology in the
1950s (Swadesh, 1972), there was a wave of optimism about the possibility of quantifying linguistic
diversity. Researchers drew trees based on the percentage of putative cognates (words related through
shared inheritance, e.g., English house and German
Haus) in each pair of languages on Swadesh’s 100- or 200-meaning list, setting percentage ranges for
different levels of grouping.
For example, in work on Papuan languages, two
speech varieties were deemed to be dialects of the
same language if at least 70% of their list items
were cognate; at the
other extreme, lists with 5–12% cognates were attributed to the same phylum. The basic classification was
into dialects, languages, subfamilies, families, stocks,
and phyla. Today many scholars think such findings
have little utility for historical reconstruction, as cognate pairs were often identified on the basis of similarity in form, with no attention to whether the words
displayed regular correspondences determined by the
comparative method, and cognacy was often not distinguished from borrowing. Wordlists could easily
contain gaps and elicitation errors, skewing results.
This meant that only lower-level groupings – those
in any case obvious by inspection – were reliable.
Swadesh assumed that basic vocabulary is replaced
at a constant rate, but this assumption no longer
has wide acceptance. A language that has undergone more rapid change than its sisters will appear
more distantly related to them than it really is.
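A small Python sketch may make the two problems concrete. It assumes the usual textbook form of the glottochronological formula, t = ln(c) / (2 ln(r)), with an assumed retention rate r of 0.86 per millennium (a figure often quoted for the 100-item list); the faster rate of 0.70 and the function names are hypothetical. Only the two cut-offs quoted above are encoded, since the intermediate ranges are not given here.

```python
import math


def cognate_percentage(judgements):
    """Percentage of list meanings judged cognate between two lects.

    `judgements` holds one boolean per meaning; meanings with a gap in
    either wordlist are assumed to have been left out by the caller.
    """
    if not judgements:
        raise ValueError("no comparable meanings")
    return 100.0 * sum(judgements) / len(judgements)


def quoted_level(percent):
    """Apply only the two cut-offs quoted in the text for Papuan languages."""
    if percent >= 70:
        return "dialects of one language"
    if 5 <= percent <= 12:
        return "same phylum"
    return "no cut-off quoted in the text"


def apparent_depth(shared_fraction, retention=0.86):
    """Apparent time depth in millennia under a constant retention rate."""
    return math.log(shared_fraction) / (2 * math.log(retention))


TRUE_DEPTH = 2.0  # millennia since the split (hypothetical)

# Both sisters change at the assumed standard rate.
shared_normal = 0.86 ** TRUE_DEPTH * 0.86 ** TRUE_DEPTH
# One sister replaces vocabulary faster (hypothetical retention of 0.70).
shared_fast = 0.86 ** TRUE_DEPTH * 0.70 ** TRUE_DEPTH

depth_normal = apparent_depth(shared_normal)
depth_fast = apparent_depth(shared_fast)
print(depth_normal, depth_fast)
```

With equal rates the formula recovers the true depth; when one sister changes faster, the same split appears older, which is the distortion described above.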
In her study of worldwide typological diversity,
Nichols (1992: 24–25) adopted the units ‘family’
and ‘stock.’ The family she defined as a group
with about the time depth of one of the older
branches of Indo-European (2500–4000 years, e.g.,
Iranian, Balto-Slavic), recognizable by inspection
when regular correspondences between word forms
and morpheme paradigms are displayed. The stock is
the deepest phylogenetic node at which a protolanguage is reconstructable by the comparative method
(5000–8000 years, e.g., Indo-European, Austronesian). Nichols (1997) adds the ‘quasi-stock,’ a
grouping of stocks with promising phylogenetic markers but with no regular sound correspondences and
few clear cognates. By this rubric, Afro-Asiatic, for
example, is a quasi-stock. This approach lacks quantitative support, but it has the advantage that groups
of languages under comparison meet the same
methodological requirement.
The failure of quantitatively defined language
groupings has a further consequence. The quantitative approach assumed languages to be composed of
dialects, and families to be composed of languages.
Since there is no quantitative distinction between a
dialect and a language, linguists sometimes fall back
on the assertion that dialects are mutually comprehensible, languages not. But as mutual comprehensibility is also a matter of degree, the distinction has
limited objective validity, and ‘lect’ will be used here
as a convenient cover term for both (see Language
and Dialect: Linguistic Varieties). It follows that
if there is no absolute distinction between the dialects
of a language and the languages of a family, there is
similarly no absolute distinction between a language
with dialects and a family.
Complex Internal Structures
The tree structure of some large families, e.g., Indo-European and Austronesian, has been worked out in
some detail. The structures of others have not. There
are at least three reasons for this. Some putative
families such as Afro-Asiatic are generally accepted,
even though their time depth is apparently greater than 8000 years. The data reflect too much
change and too much divergence to allow a thorough
reconstruction of the protolanguage. Other families,
such as Sino-Tibetan, have a history of frequent
migrations back and forth, in the course of which
lects have diversified but remained in contact, influencing each other in ways that make it very difficult
to sort out borrowing from shared inheritance
(LaPolla, 2001). A third reason is that the relationships among languages within many families are very
complicated. In many – and perhaps most – parts of
the world, the idealized model of a language family as
the outcome of a community of speakers dividing into
discrete daughter communities does not fit the data.
Instead, a family may arise through the differentiation of an expanding community’s speech into a network of lects, so that speakers of most lects can
understand those of communities within a certain
radius, but comprehension diminishes the further
the speaker moves away from home.
Language families that arise in this way are the
subject of Johannes Schmidt’s (q.v.) wave theory,
whereby innovations spread out from the center of a
network like the ripples when a stone is thrown into
a pond. The tree and wave models resist integration
into a single model, and this has often been taken as a
sign that they are in conflict with each other. This is
unfortunate, as they model different phenomena. In
the tree model, a community splits, and the lect of one
or more of the new communities undergoes innovations that are subsequently inherited into its daughter
lects. In the wave model, a community spreads, and
innovations spread at different rates through the
resulting network. The tree model fits large amounts
of Austronesian and Uralic data rather well, while the
wave model works better in other areas. However, we
would like to be able to examine the history of a
language family within a single framework, and the
discussion below examines how lectal differentiation
can be interpreted within a tree model and how the
latter needs to be modified to accommodate it.
Dutch and German taken together provide a
testbed. They form a network of lects covering the
Netherlands and Germany (except the Friesian
Islands), nearly half of Belgium, Luxembourg, two-thirds of Switzerland, Liechtenstein, and most of
Austria. Until recently, this continuum extended into
Alsace and into areas in northern Italy. If we for a
moment ignore speakers’ present-day (and often
fairly recent) ability to communicate with each
other in varieties of either standard Dutch or standard
German, then we have a situation in which everyone
readily understands nearby lects and in which there
are few major boundaries between groups of lects, yet
lects at greater distances from each other, especially
on a north–south axis, are mutually incomprehensible. How does one split such a network up into
languages in order to represent it in a family tree?
Not surprisingly, attempts to do so disagree. Whatever higher-order groups are posited, each includes lects
that are geographically contiguous with resemblant,
mutually comprehensible lects in other higher-order
groups, reflecting historical relationships that the
grouping belies. A tree cannot do justice to the historical relationships. Instead, the best we can do is to
redraw Figure 2, which shows the West Germanic
part of Figure 1, as Figure 3. However, this avoids
too much distortion by representing dialect networks
very grossly. Norwegian, Danish, and Swedish similarly form a dialect network, and so does English in
Britain and Ireland.
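The difficulty of cutting such a network into languages can be illustrated with a deliberately simplified Python sketch. It assumes a one-dimensional chain of lects in which two lects are mutually comprehensible when they lie within a fixed radius of each other; the chain length, the radius, and the function names are hypothetical, and real networks are neither linear nor so sharply bounded.

```python
def intelligible(i, j, radius):
    """Lects at chain positions i and j understand each other within `radius`."""
    return abs(i - j) <= radius


def chained_groups(n_lects, radius):
    """Group lects by chaining pairwise comprehensibility (connected components)."""
    unvisited = set(range(n_lects))
    groups = []
    while unvisited:
        stack = [min(unvisited)]
        group = set()
        while stack:
            lect = stack.pop()
            if lect in group:
                continue
            group.add(lect)
            stack.extend(
                other for other in unvisited
                if other not in group and intelligible(lect, other, radius)
            )
        unvisited -= group
        groups.append(sorted(group))
    return groups


N_LECTS, RADIUS = 10, 2
groups = chained_groups(N_LECTS, RADIUS)
ends_understand = intelligible(0, N_LECTS - 1, RADIUS)
print(groups, ends_understand)
```

Chaining neighbor-to-neighbor comprehensibility yields a single group covering the whole chain, even though the lects at its two ends do not understand each other. Comprehensibility is not transitive, so it cannot by itself supply the discrete units that a tree requires.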
The presenting problem with the Dutch–German
network is that its speakers (and most linguists)
would call its lects dialects, yet the degree of diversity
across the network as a whole is intuitively greater
than we expect in a single language. In other words,
the network is like a (small) family made up of dialects, with no intervening level of ‘language.’ This
problem disappears if one recognizes that the distinction between ‘dialects’ and ‘languages’ is artificial
and a matter of degree of divergence. But it leaves
unanswered the question of whether the Dutch–
German network is more appropriately called a
language or a family.
This question can be answered from two perspectives. One emanates from the line of thought above:
the difference between language and family is also
one of degree of diversity, and the question is an
artifact of an unsubstantiated terminological distinction. But this is a little too simple if we are using
‘family’ recursively, because, ascending the tree,
there comes a point at which the diversity within a
collection of lects requires us to call it a family.
It might at a pinch be reasonable to call the Dutch–
German network a language, but it would be unreasonable to call West Germanic, the next node up the
tree, anything but a family, since English, Friesian,
Afrikaans, Dutch–German, and Yiddish are all quite
distinct from each other. Where does this distinctness
reside? Partly in their degree of difference from each
other and partly in the fact that they are not linked by
transitional dialects. It is tempting to attribute these
facts to geography, as the languages are mostly not
contiguous, but the crucial factor is the strength of
the social boundaries that separate speech traditions.
For example, Afrikaans is spoken in South Africa, far
from the Dutch lects of which it is an offshoot. Although they remain largely mutually comprehensible,
their differences are striking. But if geography were
the major factor, we should expect the Englishes
of Britain/Ireland and North America to have
diverged to a similar degree, yet they have not. After
the establishment of British rule in 1795, speakers of
the divergent southern African dialect of Dutch
became socially isolated from their fellow-speakers
in the Netherlands and a separate standard emerged.
Figure 2 A conventional family tree of West Germanic.
Figure 3 A revised family tree of West Germanic.
Looking at the Dutch–German network from
the perspective of social boundaries, we find that
German speakers recognize dialect groupings such
as Low German, Swabian, and Swiss German.
Despite fuzzy boundaries, these groupings also have
a reality for dialectologists, but no one would normally call them languages. One reason for this is that
their social boundaries are also fuzzy, even across
national borders. In this context the application of
the term ‘language’ is as much a product of political
as of linguistic history. The local lects of speakers who
also speak a variety of standard German (in Germany,
Switzerland, Liechtenstein, and Austria) are considered to be German dialects. The local lects of
those who also speak a variety of standard Dutch
(in the Netherlands and Belgium) are considered to
be Dutch dialects. The linguistic repertoire of today’s
dialect speakers also includes their variety of the standard. In Switzerland, speakers are often diglossic in a
Swiss German dialect and Swiss-flavored standard
German, i.e., there is a measure of separateness
between the two systems. Elsewhere, a speaker’s repertoire is a continuum from a local dialect to a version
of the standard, and s/he moves back and forth along
the continuum according to whom s/he is speaking
with. In these speakers, dialect and standard usually
influence each other, with the result that the mutual
comprehensibility of local lects across the Dutch–
German border is being reduced as speakers accommodate to standard Dutch or standard German, and
the standard languages are thereby reinforcing the
social boundaries between the political entities with
which they are associated.
The non-absoluteness of dialect/language and language/family distinctions is also manifest in Sinitic
studies. The Sinitic network has conventionally been
described as the Chinese language, comprising the
Chinese dialects, but the Chinese language is comparable in diversity to the West Germanic family.
This terminological situation has arisen because the
Chinese language has long been coterminous with
the political and social entity of China. The Chinese
dialects and Chinese language are now sometimes
called the Sinitic languages and the Sinitic family,
leaving the term ‘Chinese language’ to denote the
standard language.
Relationships between dialect and standard and
relationships among dialects are not easily rendered
in a family tree diagram. Nor are situations where
the historically closest sisters of a language, e.g.,
Afrikaans, are themselves certain dialects of another
language, Dutch.
Extensive lectal networks occur in many parts of
the world. New Guinea and Island Melanesia host
a number, although none with the geographic extent
of Dutch–German. Because their speakers’ repertoire
usually does not include a related standard language,
the dialect/language question is not raised, but the
absence of standard languages over millennia has
permitted quite complex linguistic events when social
boundaries have shifted. The complex history reconstructed in Figure 4 shows one area where networks
have broken and later rejoined in a different configuration as the result of shifting boundaries. The
basis for this reconstruction is the distribution of
innovations. The Austronesian lects of New Ireland,
a large island to the east of New Guinea, share certain
innovations and appear at first sight to reflect the
differentiation of a single lect, Proto-New-Ireland,
into a lectal chain running the length of the island,
then into a series of small families. But the lects of
the southernmost family share other innovations with
lects spoken on islands to the southeast. First, it seems,
a single speech community rapidly settled New
Ireland, and their speech differentiated into a chain
of lects. Initially, population density was low, and
social boundaries emerged between lects. The southernmost lect underwent certain innovations, and then
some of its speakers sailed east and settled Nissan
Island. From Nissan they established settlements
on Buka Island to the south. The result was the
South New Ireland/Northwest Solomonic network.
Initially, frequent contact was maintained across
the new network, more frequent than with other
communities in New Ireland. Later on, populations
increased, and the south New Ireland community
resumed contact with communities to its north and
was reintegrated into a New Ireland lectal network,
through which innovations now passed (i.e., limited
koineization occurred). Links with Nissan diminished
to an annual voyage. The speech of the northwest
Solomons community underwent certain innovations and became Proto-Northwest-Solomonic, then
its speakers scattered through parts of Bougainville
and the New Georgia group and across Choiseul and
Santa Isabel to form the communities where the member lects of today’s Northwest Solomonic family are
spoken (Ross, 1988: 216–218, 258–259, 293–313).
Although the lects of New Ireland look as if they
form a single node in the tree, the analysis of shared
innovations reveals a much more complex history.
A similar but more complex series of events has occurred in the history of the Fijian lects and the genesis
of Proto-Polynesian (Geraghty, 1983, summarized by
Ross, 1997: 227–229).
Cases such as New Ireland and Fiji raise a question
of method. If, as in southern New Ireland, two sets
of innovations overlap in a language or a group of
languages, then they cannot both reflect a shared
ancestor. One or both sets must reflect diffusion
across lectal boundaries. How does one determine
whether one set (or neither) reflects a shared ancestor? The answer lies in the nature of the innovations.
The innovations common to the languages of southern New Ireland and the northwest Solomons entail
bound morphology, which is only very rarely subject
to diffusion. The innovations common to New
Ireland as a whole do not, and are candidates for
diffusion.
Figure 4 Network breaking and joining in New Ireland (after Ross, 1997: 230).
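The reasoning about overlapping innovation sets can be sketched in Python. The sketch assumes that each set of innovations is summarized by the lects that share it and by a flag for whether it involves bound morphology; the group labels, field names, and the all-or-nothing decision rule are hypothetical simplifications of the argument, not a published procedure.

```python
from itertools import combinations


def tree_compatible(group_a, group_b):
    """Two subgroups fit a single tree only if they are nested or disjoint."""
    return group_a <= group_b or group_b <= group_a or group_a.isdisjoint(group_b)


def inherited_candidates(innovation_sets):
    """Return the sets that may reflect a shared ancestor.

    If no two sets conflict, all are kept. If some overlap without nesting,
    only those involving bound morphology (rarely diffused) are kept.
    """
    conflict = any(
        not tree_compatible(x["lects"], y["lects"])
        for x, y in combinations(innovation_sets, 2)
    )
    if not conflict:
        return list(innovation_sets)
    return [s for s in innovation_sets if s["bound_morphology"]]


# Hypothetical labels for the lect groups discussed in the text.
INNOVATION_SETS = [
    {
        "name": "New Ireland as a whole",
        "lects": {"north New Ireland", "central New Ireland", "south New Ireland"},
        "bound_morphology": False,
    },
    {
        "name": "south New Ireland + northwest Solomons",
        "lects": {"south New Ireland", "northwest Solomons"},
        "bound_morphology": True,
    },
]

kept = [s["name"] for s in inherited_candidates(INNOVATION_SETS)]
print(kept)
```

The two sets overlap in southern New Ireland without either containing the other, so they cannot both define nodes of one tree; the rule then keeps the set resting on bound morphology.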
Mechanisms of Diversification
A family tree diagram can lull one into the false
assumption that diversity occurs between lects, not
within them, and that lects at terminal nodes are
homogeneous. However, we know that an individual’s speech varies – for example, along a continuum
from local lect to standard – and that the degree to
which variables are manifested differs among speakers. A speaker’s choice of variables on particular
occasions is in a complex relationship to her personal
social network. We can distinguish between primary
and secondary networks (Nettle, 1999: 67). At least
in traditional societies, a person’s primary network is
likely to be dense, multiplex, kin-based, and enduring,
whereas links in the secondary network have single
functions such as trade. This distinction is a simplification, but a helpful one. Grossly, the primary network determines the features of a person’s speech,
while the secondary network mediates changes in
those features.
Social network research shows that speakers are
likely to use variables – pronunciations, words and
phrases, grammatical constructions – which are used
by and identify them with others in their primary
network, although age and gender are also determinants (Milroy, 1987, 2001). The choice of variables
on a particular occasion is also biased by the nature of
that occasion and by the speech of the interlocutor.
There is thus some diversity in even the smallest unit
of a network, namely a speaker’s primary links.
How do innovations enter a tight-knit primary
network? The answer seems to be that its members
typically have weak ties with outsiders through
their secondary network, who in turn, because
of their social roles, have weak ties with members
of other primary networks, and it is these multiple-weak-tie individuals who are less subject to the norm
enforcement of a primary network and act as carriers
of innovation across the larger network (Milroy and
Milroy, 1985).
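As a rough illustration of this account, the following Python sketch picks out the individuals whose weak ties reach into more than one primary network. The village names, the individuals, and the representation of secondary links as unordered pairs are all hypothetical; the sketch ignores tie strength, direction, and frequency of contact.

```python
# Hypothetical primary networks: dense, kin-based groups.
PRIMARY = {
    "village_a": {"a1", "a2", "a3"},
    "village_b": {"b1", "b2", "b3"},
}

# Hypothetical secondary (weak, single-function) links, e.g. trade.
SECONDARY = [("a3", "trader"), ("trader", "b1")]


def carriers(primary, secondary):
    """Individuals whose weak ties reach two or more primary networks."""
    membership = {
        person: network
        for network, people in primary.items()
        for person in people
    }
    reached = {}
    for left, right in secondary:
        for person, contact in ((left, right), (right, left)):
            if contact in membership:
                reached.setdefault(person, set()).add(membership[contact])
    return {person for person, networks in reached.items() if len(networks) >= 2}


print(carriers(PRIMARY, SECONDARY))
```

Only the outsider with weak ties into both villages qualifies, which matches the claim that innovations travel between tight-knit groups through such multiple-weak-tie individuals.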
How an innovation begins is difficult to investigate,
but it seems that a speaker repeats a variant (random
or deliberate) which is acquired by the speaker’s children and/or copied by other speakers and selected as
the marker of a social group (Weinreich et al., 1968;
Milroy, 1992: 170; McMahon, 1994: 251; Croft,
2000: 44–78). Among the Takia of Karkar Island
(Papua New Guinea), there is a division between
coastals and inlanders. Coastals distinguish the phonemes /l/ and /r/. Inlanders merge them as /l/. Every
Takia adult knows about this difference. Comparative
evidence shows that the innovators are the inlanders.
The merger of /l/ and /r/ as inland /l/ must have occurred randomly in the speech of a single speaker, been
acquired by others, selected as an inlander marker,
and carried from one primary network to another by
inlanders with multiple secondary links. Nettle (1999:
Chaps. 2–3) finds that an innovation is unlikely to
catch on without the amplification afforded by social
selection.
When a speech community becomes two or three
communities and contact among the new communities is reduced, their speech may diverge and new
lects appear. The divergent changes are driven by
social selection. The deciding factor behind divergence is the strength of social boundaries – the weakening of links between social networks so that new
social identities emerge.
Some languages are far more resistant to change
than others. This difference has been attributed to social network structure. Milroy and Milroy (1985) propose that when the speakers of a language form a set
of overlapping tight primary networks, i.e., a tight-knit
speech community, with only weak social links to
other groups, the language typically changes very slowly. The paradigm case is Icelandic, little changed since
early medieval Old Norse, unlike its radically transformed sisters Norwegian, Danish, and Swedish. At
the opposite extreme are papuanized Austronesian
languages in mainland New Guinea. Their speakers
entered into symbiotic relationships with speakers of
Papuan languages so that their secondary links were
with Papuan speakers whose alien speech was the
source of extensive innovations (Ross, 1996). However, Labov (2001: Chap. 10) offers an occupation-based interpretation of data sets like the Milroys’,
and there are relatively isolated communities (e.g., in
Polynesia) where change has been quite fast, and
apparently even tight-knit communities where speakers
have exaggerated differences from their neighbors, i.e.,
accelerating change to maintain isolation (Anderson,
1988; Thurston, 1989; Ross, 2003).
Nettle (1999: 66–78) argues that economic (inter)
dependence is the main reason for language spread.
He contrasts the many small speech communities
of inland New Guinea, each of them economically
largely independent, with the widespread communities of the Hausa, the Fulani, and the Tuareg
on the southern edge of the Sahara, where survival
requires distant economically based relationships.
Hunter-gatherer pygmy groups in central Africa depend on their farmer neighbors to supplement their
diet, and each group of pygmies has shifted to the
language of the farming group with which it is paired.
Hunter-gatherer negrito groups in the Philippines
have adopted Austronesian languages from farmer
neighbors (Reid, 1994). The difference between
these situations and the one which led to the papuanization of Austronesian languages in New Guinea
was probably a difference in the strength of primary network links: these links were stronger among
Austronesian-speaking horticulturalists than among
hunter-gatherers.
With explicable exceptions, language density is
greater nearer the Equator than further away from
it, perhaps because it is correlated with ecological
risk: the more difficult it was to sustain human
life, the larger the economic and therefore linguistic
networks that came into being (Nettle, 1999: 60–66,
79–93). This economy–language pairing continues
in more recent events. The expansion of European
colonizers across the planet since the 15th century
caused a devastating reduction in language diversity,
now being intensified by the push toward economic
globalization.
The Origins of Phylogenetic Diversity
The origins of the phylogenetic diversity of languages
lie far back beyond knowability. Are all today’s
language families descended from a single ‘Proto-World’? Many people have assumed so, not on linguistic evidence but on the basis of a single human
origin in east Africa. The comparative method
allows us to reconstruct linguistic history only as far
back as about 8000 years, yet structurally modern
language has probably been around for 100 000
years.
Nichols (1992: 221–230) finds a geographic patterning of morphosyntactic features in the world’s
languages, which, she suggests, are fossil reflexes of
the original spread of languages. She traces a path
from the Old World into the Pacific and rapidly
south to Australia, followed by circum-Pacific and
New World colonization. Those who entered the
Americas across the Bering Strait were related to a
circum-Pacific group, not an Old World population.
Using Nichols’ stocks, Nettle (1999: 113–129)
divides the world into nine regions and plots stock
and language densities for each of them. Setting aside
the New Guinea figures, which are significantly
higher than anywhere else, he finds that stock (phylogenetic) density in Africa, Europe, and Asia is far
lower than in the rest of the world. In the rest of
the world there is a simple correlation: the more
languages, the more stocks. This situation is claimed
to reflect time depth. Over time, interaction between human populations has led through language
shift to stock extinctions. On archaeological estimates, the time depth of human language in Africa
is more than 100 000 years; it ranges from 60 000
to 40 000 years in Europe, Asia, Australia, and
New Guinea; and is as little as 12 000 years in the
Americas (although this dating is controversially recent). Nettle’s stock densities are in an approximate
inverse relationship to these time depths.
Bellwood (1997) argues that the Neolithic transition, i.e., the transition from foraging to agriculture,
caused the expansion of some of the world’s largest
language families, presumably leading to increased
extinctions of languages and stocks. He notes four
significant cultivation events that apparently caused
the expansion of languages into often large families,
probably replacing their own closest relatives as well
as other families in the process:
1. Wheat, barley and legumes in the Fertile Crescent
by 8000 BC: Afro-Asiatic, (controversially) Indo-European, Elamo-Dravidian, Kartvelian.
2. Taro and bananas in the highlands of New Guinea
by 8000 BC (Denham et al., 2003): Trans-New
Guinea family, limited to the New Guinea region.
3. Rice, foxtail and broomcorn millet in the Yangtze
and Yellow River Basins by 6000 BC: Austronesian,
Tai-Kadai, Austro-Asiatic, Sino-Tibetan, Hmong-Mien.
4. Sorghum and pearl millet in sub-Saharan Africa by
2000 BC (?): Nilo-Saharan and Niger-Congo.
These events may have sharply reduced the world’s
phylogenetic diversity. Relics of what was there before perhaps survive in Basque in the Pyrenees, in the
diversity of the language families of the Caucasus,
and of parts of New Guinea and parts of the Americas
(see Nichols, 1997).
In Diamond and Bellwood (2003), the Neolithic-transition hypothesis is extended across the world,
but somewhat tentatively, and wisely so. Although it
is probably correct for Austronesian, and no doubt
elsewhere, it would be wrong to overgeneralize it.
Nilo-Saharan and Niger-Congo are not necessarily
phylogenetic units (Nichols, 1997). Pama-Nyungan
in Australia, Uralic, and Chukchi-Kamchatkan in
northern and central Eurasia, Khoisan in southern
Africa, and Athabaskan and Eskimo-Aleut in northwest North America have all expanded without a
Neolithic transition, although there may have been
other economic reasons for their spreads. Various
agricultural families have not expanded significantly: Ramu–Lower Sepik in New Guinea, Nakh-Daghestanian and Kartvelian in the Caucasus, and
various families in the Americas (Campbell, 2002).
The putative Indo-European agricultural expansion
(Renfrew, 1987) has been called into question by
Nichols (1998), who brings a cohort of arguments,
linguistic, geographic and historical, for a homeland
in western Central Asia around 3500 BC. This is part
of a larger argument that stock densities are also
heavily influenced by geographic factors (Nichols,
1997), to which Campbell (2002) would add differences in social behaviors among speech communities.
A New Evolutionary Model
Recently, an evolutionary model has been proposed
in which the replicator is what Croft (2000) calls the
lingueme. Its basis is a general theory of evolution
applicable to change in social phenomena as well as in
biology. There are three versions of the model (space
compels us to ignore their differences): Nettle (1999),
Croft (2000), and Enfield (2003). Croft’s is the most
articulated version. A lingueme is any unit of linguistic structure, be it a phoneme, a morpheme, a word or
phrase, or a construction, i.e., the units which are
the parameters of intra- and interspeaker variation.
The linguistic counterpart of the biologist’s DNA
string is the utterance, a structured set of linguemes.
The speaker, as the repository of grammar, corresponds to the biological organism (the interactor).
As in network theory, speakers form a networked
population, and language change occurs when a lingueme is propagated across the network in altered
form and becomes manifest in speakers’ utterances.
Croft (2000: Chaps. 5–6) builds in a paradigm of
possible innovation types as well.
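The propagation step described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not a formalization taken from Nettle, Croft, or Enfield: the function name `propagate`, the ring-shaped network, and the fixed adoption probability are all hypothetical choices made for illustration.

```python
import random


def propagate(neighbours, innovators, adopt_prob, rounds, seed=0):
    """Return the set of speakers using an altered lingueme after `rounds`.

    neighbours : dict mapping each speaker to the speakers they talk to
    innovators : speakers who already produce the altered lingueme
    adopt_prob : assumed chance that a hearer adopts it from one contact
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    users = set(innovators)
    for _ in range(rounds):
        new_users = set()
        # Each current user produces utterances heard by their neighbours.
        for speaker in sorted(users):
            for hearer in neighbours[speaker]:
                if hearer not in users and rng.random() < adopt_prob:
                    new_users.add(hearer)
        users |= new_users
    return users


# Hypothetical network: 10 speakers in a ring, each linked to two neighbours.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}

# With certain adoption the altered form spreads one step per round
# in each direction around the ring.
after_two_rounds = propagate(ring, {0}, adopt_prob=1.0, rounds=2)
```

In this sketch "language change" corresponds to the altered lingueme reaching most or all of the network. Lowering `adopt_prob` or thinning the links slows or halts the spread, which is one simple way to picture why network structure matters in the model.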
What is new and what gives the model its integrative power is the view of the lingueme as the
replicator. The language is deliberately left out of
focus, recognizing that a speaker’s repertoire is a
structured collection of variables that do not necessarily have common geographical boundaries. The
dialect vs. language vs. family issue does not arise.
What is somewhat uncertain, however, is where
this leaves the grammar. Enfield stresses that language
learners infer ‘‘grammatical patterns’’ from others’
behavior. The idea that a single grammar is shared
by speakers or is transmitted from one generation to
the next is for him a cultural illusion. Linguistic signs
‘‘are best understood as theories, constructed by individual speakers over time by a process of trial and
error’’ (Enfield, 2003: 2–3). Sets of signs come to
cohere as systems in individuals’ minds, and speakers’
grammars have a lot in common because of the need
for coordination. Croft (2001: 29) grants that the
evolutionary model implies looser grammatical organization than either the structuralist or the generative
model, but attributes ‘‘a high degree of structural
organisation’’ to the lingueme pool. He asserts that
through the utterances she hears, a child inductively
acquires syntactic constructions, and then from these
infers a taxonomic network of constructions (Croft,
2001: 57–58).
Nettle, Croft, and Enfield all give pride of place to
contact phenomena as a justification for the new
model. They point out that if the replicator is the
lingueme, then it does not matter whether an
altered lingueme has its origin in what speakers recognize as ‘‘their own language’’ or ‘‘another language.’’ The process of change is the same. Under
the family tree model, a language is only allowed to
have one parent, and contact phenomena, they imply,
tend to be pushed under the carpet. This is true, but
the emphasis on contact risks overlooking what is
modeled in a family tree, namely the generational
continuity of a language. Although grammar is not
itself a replicator and is reconstructed by each new
speaker, parents and children usually have a clear
sense that they are speaking the same language. Furthermore, contact in most cases entails bilingualism,
which causes the linguemes of one language to be
modified under the influence of the other. The modified language is usually still recognized as the same
language by the next generation, whether altered
linguemes have a language-external or language-internal origin. This is so even for the papuanized
Austronesian languages of New Guinea. The generational continuity depicted by a family tree is only
broken when a social catastrophe occurs, as when
the transportation of Melanesians to far-off plantations from the 1860s led to the rapid stabilization of
Pacific Pidgin, a language with no previous existence
as a system (Ross, 1997: 251–253). Language contact
may have been marginalized by the family tree model,
but it would be a pity if generational continuity were
marginalized under the new model.
See also: Cladistics; Contact-Induced Convergence: Typology and Areality; Cultural Evolution of Language; Early
Historical and Comparative Studies; Evolutionary
Theories of Language: Current Theories; Evolutionary
Theories of Language: Previous Theories; Fijian; Genetics and Language; Germanic Languages; Labov, William
(b. 1927); Language and Dialect: Linguistic Varieties; Language Change and Language Contact; Language/Dialect
Contact; Long-Range Comparison: Methodological Disputes; Microparametric Variation; Origin and Evolution
of Language; Papua New Guinea: Language Situation;
Phonemics, Taxonomic; Solomon Islands: Language Situation; Subgrouping Methodology; Variation and Language: Overview; Variation in German.
Bibliography
Andersen H (1988). ‘Centre and periphery: adoption, diffusion and spread.’ In Fisiak J (ed.) Historical
dialectology. Berlin: Mouton de Gruyter. 39–85.
Bellwood P (1997). ‘Prehistoric cultural explanations
for the existence of widespread language families.’ In
McConvell P & Evans N (eds.) Archaeology and
linguistics: Aboriginal Australia in global perspective.
Melbourne: Oxford University Press. 123–134.
Campbell L (2002). ‘What drives linguistic diversification
and language spread?’ In Bellwood P & Renfrew C (eds.)
Examining the farming/language dispersal hypothesis.
Cambridge: McDonald Institute of Archaeological
Research. 49–63.
Croft W (2000). Explaining language change: an evolutionary approach. Harlow: Pearson Education.
Croft W (2001). Radical construction grammar: syntactic
theory in typological perspective. Oxford: Oxford
University Press.
Denham T P, Haberle S G, Lentfer C, Fullagar R, Field J,
Therin M, Porch N & Winsborough B (2003). ‘Origins
of agriculture at Kuk Swamp in the Highlands of
New Guinea.’ Science 301, 189–193.
Diamond J & Bellwood P (2003). ‘Farmers and their
languages: the first expansions.’ Science 300, 597–603.
Enfield N J (2003). Linguistic epidemiology: semantics
and grammar of language in mainland Southeast Asia.
London: RoutledgeCurzon.
Geraghty P (1983). The history of the Fijian languages.
Oceanic Linguistics special publication No. 19.
Honolulu: University of Hawaii Press.
Labov W (2001). Principles of linguistic change. 2: Social
factors. Oxford: Blackwell.
LaPolla R J (2001). ‘The role of migration and language in
the development of the Sino-Tibetan languages.’ In
Aikhenvald A & Dixon R M W (eds.) Areal diffusion
and genetic inheritance: problems in comparative linguistics. Oxford: Oxford University Press. 225–254.
McMahon A (1994). Understanding language change.
Cambridge: Cambridge University Press.
Milroy J (1992). Linguistic variation and change: on the
historical sociolinguistics of English. Oxford: Blackwell.
Milroy J & Milroy L (1985). ‘Linguistic change, social
network and speaker innovation.’ Journal of Linguistics
21, 339–384.
Milroy L (1987). Language and social networks (2nd edn.).
Oxford: Blackwell.
Milroy L (2001). ‘Social networks.’ In Chambers J K (ed.)
The handbook of language variation and change.
Oxford: Blackwell. 549–572.
Nettle D (1999). Linguistic diversity. New York: Oxford
University Press.
Nichols J (1992). Linguistic diversity in space and time.
Chicago: University of Chicago Press.
Nichols J (1996). ‘The comparative method as heuristic.’ In
Durie M & Ross M D (eds.) The comparative method
reviewed: regularity and irregularity in language change.
New York: Oxford University Press. 39–71.
Nichols J (1997). ‘Modeling ancient population structures
and movement in linguistics.’ Annual Review of Anthropology 26, 359–384.
Nichols J (1998). ‘The Eurasian spread zone and the Indo-European dispersal.’ In Blench R M & Spriggs M (eds.)
Archaeology and language, 2: Correlating archaeological
and linguistic hypotheses. London: Routledge. 220–266.
Reid L A (1994). ‘Unravelling the linguistic histories of
Philippine negritos.’ In Dutton T E & Tryon D T (eds.)
Language contact and change in the Austronesian world.
Berlin: Mouton de Gruyter. 443–475.
Ross M D (1988). Proto Oceanic and the Austronesian
languages of western Melanesia. Canberra: Pacific
Linguistics.
Ross M D (1996). ‘Contact-induced change and the comparative method: cases from Papua New Guinea.’ In
Durie M & Ross M D (eds.) The comparative method
reviewed: regularity and irregularity in language change.
New York: Oxford University Press. 180–217.
Ross M D (1997). ‘Social networks and kinds of speech-community event.’ In Blench R M & Spriggs M (eds.)
Archaeology and language, 1: Theoretical and methodological orientations. London: Routledge. 209–261.
Ross M D (2003). ‘Diagnosing prehistoric language contact.’ In Hickey R (ed.) Motives for language change.
Cambridge: Cambridge University Press. 174–198.
Ross M D (2005). ‘Pronouns as a preliminary diagnostic for
grouping Papuan languages.’ In Pawley A, Attenborough
R, Hide R & Golson J (eds.) Papuan pasts: Investigations
into the cultural, linguistic and biological history of the
Papuan-speaking peoples. Canberra: Pacific Linguistics.
Swadesh M (1972). The origin and diversification of
language. London: Routledge and Kegan Paul.
Thurston W R (1989). ‘How exoteric languages build a
lexicon: esoterogeny in West New Britain.’ In Harlow R
& Hooper R (eds.) VICAL 1, Oceanic languages:
papers from the Fifth International Conference on
Austronesian Linguistics. Auckland: Linguistic Society
of New Zealand. 555–579.
Weinreich U, Labov W & Herzog M (1968). ‘Empirical
foundations for a theory of language change.’ In
Lehmann W P & Malkiel Y (eds.) Directions for historical linguistics. Austin: University of Texas Press. 95–195.