The Oxford Handbook of Language Evolution

From The Oxford Handbook of Language Evolution, M. Tallerman & K. Gibson, eds (Oxford: Oxford UP, 2012)
Chapter 51
PROTOLANGUAGE
Maggie Tallerman
51.1 WHAT IS PROTOLANGUAGE AND WHEN DID IT EMERGE?
Most researchers suggest that early hominin communication involved some form
of pre-language, or protolanguage. Protolanguage is seen as simpler than full
language, with a proto-lexicon, i.e. storage for learned, meaningful signals, but
no syntax (Bickerton 1990). Protolanguage may have utilized vocal, gestural, and
mimed components (see Arbib; Corballis; Donald; Harnad, this volume). However,
I assume for brevity that the primary—and dominant—modality was vocal.
A gestural protolanguage predicts sign as the dominant modern language modality,
obviously incorrectly. Scenarios for a switch from manual to vocal can easily be
invoked (for instance, speech frees up the hands for increased tool use), but all lack
evidence.
We have no idea when protolanguage evolved, though many proposals link the
Homo genus with the first protolanguage, perhaps 2 million years ago (mya).
Suggested evidence comes from the archaeological and palaeoanthropological
records. Stone tools appear around 2.6 mya (Wynn, this volume), including
sharp-edged cutting tools, a kind of tool that is apparently beyond the capability
of modern great apes. The fossil record indicates hominin brain growth and a
developing vocal tract (MacLarnon; Mann; Wilkins; Wood and Bauernfeind, this
volume) but provides only indirect evidence. Selective pressures for vocal tract
changes doubtless come from the emerging speech capacity, but even dating speech
would not date language: language could occur without speech (i.e. sign), and the
speech capacity could merely indicate a limited protolanguage.
On selective pressures specific to hominins, Bickerton (2009a) suggests a scenario involving construction of new niches: early Homo is postulated to employ
cooperative scavenging, vocally recruiting aid to protect large carcasses. This builds
in the evolution of cooperation, changes in diet leading to morphological changes
in brain and body, and the beginnings of protolanguage. It appears consistent with
the archaeological and fossil record: early hominins split animal bones to extract
marrow using stone tools. Scavenging constitutes novel behaviour in the hominin
lineage, thus might be the novel selection pressure needed for the development of
new communicative behaviour.
Lacking any definitive way to date protolanguage, I assume it emerged within the
last 2 million years or so, roughly coincident with early stone tools and with brain
growth beyond the size of other primate brains. Full language must already be in
place by the time of the human diaspora, around 85 kya (thousand years ago), since
all modern populations have the same language faculty (i.e. can learn any natural
language); at the latest, the full language faculty must be fixed before Australia was
settled, around 50–60 kya. Possibly, full language is linked to the speciation of
Homo sapiens, around 195 kya (McDougall et al. 2005). We cannot tell whether
other species had (any form of) language; for instance, there is debate over the
linguistic capacity of Neanderthals (e.g. Mithen 2005; P. Lieberman 2006), which
became extinct around 30 kya.
51.2 C O M P O S I T I O NA L
P ROTO L A N G UAG E
................................................................................................................
A compositional or lexical protolanguage consists of single proto-words, initially
uttered separately and slowly, and subsequently joined in short, fairly random
sequences (Bickerton, this volume). It has no hierarchical structure, no syntactic
combinatorial principles, and only a loose pragmatic relationship between proto-words. What meanings might proto-words express? Bickerton (1990, 2009a),
Hurford (2003a), and Tallerman (2007) suggest noun-like and verb-like words,
while Heine and Kuteva (2007, this volume) suggest that noun-like words appear
first, with all other categories deriving from these, including verbs.
The term ‘proto-word’ reflects significant differences between proto-vocabulary
and true words. Modern vocabulary items (almost) all fit into a structured semantic network, and contract obligatory relationships with other words. For instance,
transitive verbs and prepositions require syntactic objects with specific semantic
properties. No subcategorization relationships (Tallerman, Chapter 48) would
occur in protolanguage—proto-words did not obligatorily select other words.
Modern ‘defective’ vocabulary items are similar, having phonology and meaning
but no syntax; examples include yes/no, hello, ouch, oops, wow, hey, shh, psst, and
words for animal calls, like oink and woof. Jackendoff (1999, 2002) regards these as
living linguistic fossils. Unlike ordinary words, defective words are used alone as
meaningful utterances, and cannot be combined (except in quotatives: She said
‘ouch’). Some are largely involuntary, apparently under right hemisphere control,
and some even survive aphasia; such features are suggestive of primate calls.
However, unlike primate calls, they are crucially culture-specific, and thus learned.
A crucial development occurs when words are first combined: stringing proto-words together, still without hierarchical structure, brings an advance on a single-word stage (Bickerton, this volume). Jackendoff (2002) suggests compounding as a
pre-syntactic principle, involving mere concatenation. Consider some English
compounds: penknife, breadknife, table knife, butter knife. The relationships between the semantic head (knife) and the modifier are idiosyncratic: penknives
sharpen pens, breadknives cut bread, table knives are used at the table, and butter
knives spread butter. Compounding forms around twenty distinct semantic relationships, potentially producing quite an expressive protolanguage. Spontaneous
compound-like (signed) utterances occur in captive apes, such as the chimpanzee
Washoe’s ‘water bird’ for duck (Fouts 1975), but we cannot know if chimpanzees
intend one sign to modify another, or understand that for the observer, concatenations generate meaning.
What is the evidence for an unstructured word + word (+ word . . . ) protolanguage stage? Bickerton (1990, 1995) argues that the protolanguage capacity is
retained by modern humans, emerging in child language and in pidgins (Roberge,
this volume), in adults learning a second language naturalistically, in homesign
(used by the deaf children of hearing adults; Goldin-Meadow, this volume), and in
linguistically-deprived children, who lack adequate input within the critical acquisition period (see Curtiss 1977 on ‘Genie’). Moreover, protolanguage may not be
species-specific: trained apes can produce sign or symbol ‘utterances’ with short
sequences of proto-words, lacking syntax (e.g. Gardner and Gardner 1969; Patterson 1978; see Gibson, Chapter 3). Though non-human primates never acquire
protolanguage in the wild, their abilities in captivity suggest some phylogenetically
ancient pre-linguistic capacities.
Examples of various putative forms of modern protolanguage are given in (1)–(3);
Roberge, this volume, provides samples of pidgins:
(1) Ape protolanguage (Kanzi (bonobo), using a combination of lexigrams on a
keyboard and gestures; data from Savage-Rumbaugh et al. 1998)
water chase
water balloon
food childside childside orange
Matata bite
chase water
bad water
chase you
juice raisins
Austin carry
childside carry hide peanut
(2) Child language (Tom, 23 months)
doggie fall
I get that
want hat
there birdie
put sock off more milk
where gone Tom cup
(3) Genie (Curtiss 1977)
Paint. Paint picture. Take home. Ask teacher yellow material. Blue paint.
Yellow green paint. Genie have blue material. Teacher said no. Genie use
material paint. I want use material at school.
Protolanguage exhibits the following properties (e.g. Bickerton 1990). First, the
ordering of elements is relatively random. No hierarchical syntactic structure
constrains surface order, and different word orders have no link to information
structure (e.g. given vs. new information). The bonobo Kanzi illustrates this,
producing Austin carry when asking to be carried to see Austin (Savage-Rumbaugh
et al. 1998: 62), but Matata bite when Matata (his mother) bites him. Savage-Rumbaugh reports definite ordering regularities in Kanzi’s output (Greenfield and
Savage-Rumbaugh 1990, 1991), but these differ significantly from the spoken
English input that Kanzi receives. Pre-grammatical utterances in young children
typically reflect closely the word orders of the ambient language, specifically by
being consistently head-initial or head-final; Genie’s utterances in (3) also share
this characteristic. Simple word order regularities do not, though, necessarily
indicate syntax. Ancestral protolanguage putatively contained various ‘purely
semantically based principle[s] that map into linear adjacency without using
anything syntactic’ (Jackendoff 2002: 248). ‘Fossils’ of these principles, such as
Agent First and Focus Last, still occur: Agent First produces the subject-initial
constituent order found in 90% of languages today. Another principle, Grouping,
ensures that in dog brown eat mouse, the dog is brown, while in dog eat brown
mouse, the mouse is brown; Agent First ensures that the dog is eating. Yet no
constituent structure is assumed here. Pre-syntactic principles of this nature in
protolanguage could start to tie information structure to ordering.
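The way such linear principles yield an interpretation without any constituent structure can be sketched in a few lines of Python. This is an illustration only, not Jackendoff's formalization: the `KIND` table of semantic types, the function name `interpret`, and the decision to check the preceding neighbour before the following one are all assumptions made here.

```python
# Hypothetical semantic types for four proto-words (an assumption for this sketch).
KIND = {"dog": "entity", "mouse": "entity", "brown": "property", "eat": "action"}

def interpret(utterance):
    """Read a flat word string using only linear order, with no phrase structure."""
    words = utterance.split()
    entities = [w for w in words if KIND[w] == "entity"]
    # Agent First: the first entity mentioned is taken to be the agent.
    reading = {"agent": entities[0] if entities else None, "properties": {}}
    for i, word in enumerate(words):
        if KIND[word] != "property":
            continue
        # Grouping: a property is read with a linearly adjacent entity
        # (preceding neighbour checked first, then the following one).
        for j in (i - 1, i + 1):
            if 0 <= j < len(words) and KIND[words[j]] == "entity":
                reading["properties"][word] = words[j]
                break
    return reading

first = interpret("dog brown eat mouse")
second = interpret("dog eat brown mouse")
```

In both strings the agent comes out as dog, while brown is grouped with dog in the first and with mouse in the second. Nothing in the sketch builds a phrase: adjacency and position do all the work.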
Next, consider the subcategorized arguments of verbs and other syntactic heads. In
full languages, these are often phonetically null, but are systematically related to overtly
present categories. Below, e stands for ‘empty’, an element understood, not pronounced:
(4) Kim is too mean [e] to make supper. (Here, [e] = Kim)
(5) Kim is too mean [e1] to make supper for [e2]. (Here, [e1] is not Kim, but
[e2] = Kim)
The meanings of null elements are not random, but are syntactically regulated.
Contrast the protolanguage utterances from Genie: Paint picture (where the agent
is missing) or Take home (where the agent and patient—the item to be taken—are
both missing); null elements in Kanzi’s utterances appear similarly unconstrained.
Bickerton notes that ‘in protolanguage . . . any item may be absent from any
position’ (1990: 124); null elements are randomly distributed, so subjects or objects
of verbs often get omitted. In ancestral protolanguage, a major rubicon is crossed
when words start requiring co-occurring words with specific syntactico-semantic
properties.
However, some full languages also exhibit few syntactic restrictions on null elements. In Chinese, ‘He saw him’ translates as in (6), with an overt subject and object,
but is also expressed (in an appropriate pragmatic context) as (7a), (b) or (c):
(6) ta kanjian ta le
    he see     he aspect
    ‘He saw him.’
(7) a. [e] kanjian ta le
b. ta kanjian [e] le
c. [e] kanjian [e] le
Of course, Chinese is not protolanguage, but a full language with regular subcategorization requirements: kanjian ‘see’ is a transitive verb with an animate agent and
a visible patient, just as in English. But the principles regulating null elements in
English are not universal, so cannot form a model for how protolanguage became
language.
Ancestral protolanguage putatively lacked a mechanism for assembling words
into structural units (Bickerton 2009a, 2010, this volume): initially, there were no
syntactic relationships between proto-words. For Bickerton, the crucial development is the appearance of ‘Merge’ (more precisely, the novel linguistic use of an
existing cognitive ability), which combines two words to form a phrase, then
combines that phrase with another word, forming a larger phrase, and so on.
Under this view, clausal subordination is not special; it is simply due to repeated
applications of Merge. (Modern) protolanguage sometimes exhibits apparent
subordination, such as Genie’s I want use material at school, but these are merely
prefabricated routines, consisting of I want + state of affairs; they are not productive. Bickerton suggests (2009b) that protolanguage-speaking children may not yet
know the kinds of verbs that require subordinate clauses.
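The contrast between a flat proto-word string and the output of repeated Merge can be sketched as follows. This is a minimal illustration, not Bickerton's own formulation: the function names `merge` and `depth`, the use of Python tuples for phrases, and the example words are assumptions made here.

```python
def merge(a, b):
    """Combine two items (words or phrases) into one binary phrase."""
    return (a, b)

def depth(item):
    """Depth of embedding: 0 for a bare word, 1 + the deepest part for a phrase."""
    if isinstance(item, str):
        return 0
    return 1 + max(depth(part) for part in item)

# A flat protolanguage-style string: linear order only, no constituents.
flat = ["Genie", "use", "material"]

# Repeated Merge: word + word first, then word + phrase.
phrase = merge("use", "material")
clause = merge("Genie", phrase)

# On this view subordination is simply one more application of Merge.
embedded = merge("want", clause)
```

Here `clause` is `("Genie", ("use", "material"))` and `embedded` nests one level deeper. No special mechanism is needed for the subordinate clause, which is the point made above.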
Finally, protolanguage lacks a distinction between lexical elements (primarily
verbs, nouns, adjectives) and functional elements (grammatical items, including
determiners, auxiliaries, and sub-words such as affixes). Modern protolanguages
lack grammatical markers (for instance, Genie’s ask teacher yellow material, or
Kanzi’s ball slap), while in full languages, functional and lexical elements occur in
roughly equal proportions in utterances. There is widespread agreement that
ancestral protolanguage would contain at most two categories, the precursors to
nouns and verbs. The transition to language involved the gradual accretion of other
word classes via the same processes of grammaticalization that occur in all recorded
languages: see Bybee; Heine and Kuteva, this volume; Hurford 2003a; Tomasello
2003b. Some grammatical elements mark phrase and clause boundaries, so presumably by the time these appear, speakers have passed the item-by-item stage of
protolanguage production, and instead pre-form phrases before they are uttered.
Of course, modern peoples all possess a full language faculty; thus, Bickerton’s
proposals to model ancestral protolanguage on modern child language or on
restricted linguistic systems are controversial. Modelling protolanguage on ape
‘language’ capacities is also controversial: the last common ancestor of chimpanzees and humans, around 6 or 7 mya, need not possess any specifically ‘linguistic’
characteristics, since there is plenty of time since the split for the full suite to evolve.
In sum, using modern reflexes of ‘protolanguage’ as evidence for ancestral protolanguage is contentious. Nonetheless, the models of protolanguage presented by
Bickerton and by Jackendoff—together with pathways of grammaticalization outlined by Heine and Kuteva—go a long way towards elucidating likely processes in
language evolution.
51.3 P R I M AT E
VO C A L I Z AT I O N S A N D
P R I M AT E C O G N I T I O N
................................................................................................................
Under the compositional view of protolanguage, the earliest development is the
creation of arbitrary signals connecting sounds to simple meanings: a proto-vocabulary of symbols, lacking word classes. Proponents of this view reject the
idea that protolanguage emerged from a primate communication system, though it
may have utilized essentially the same mechanical means of sound production.
Proto-vocabulary thus represents a major discontinuity with primate communication. The evidence draws on known features of vocabulary, which show little
overlap with features of natural vocal or gestural communication in other primates. In this scenario (e.g. Burling 2005) primate gesture-calls are not precursors
to words; instead, the critical continuities are cognitive.
The primate literature certainly provides increasingly sophisticated evidence
concerning primate vocalizations and gestures (in the wild and in captivity),
indicating a network of similarities and differences between human and non-human primates (see the contributions in Part I). But there is no straightforward
pathway via which primate vocalizations and/or gestures could ‘turn into’
linguistic utterances. Below, I focus on comparison of words (or proto-words) and
primate calls.
Human (proto-)words differ from primate calls in major ways. The first concerns symbolic reference (see Deacon, this volume). Some primate vocalizations
have functional reference: they refer to events or entities external to the caller, and
not merely the animal’s own emotional state. This applies both to monkey and ape
vocalizations (see Slocombe and Zuberbühler 2006; Slocombe, this volume), but
prime examples are monkey alarm calls. Vervets produce different calls in response
to each main predator, eagles, leopards or snakes (Cheney and Seyfarth 1990, this
volume; Zuberbühler, this volume). Conspecifics clearly associate each alarm call
with specific dangers, taking appropriate avoidance action for each predator type.
But alarm calls are not word-like. They are more like propositions, rather than
‘referring’ to specific predators. As Bickerton (2009a: 200) points out, alarm calls
may simply mean ‘threat from the ground’, ‘threat from above’, and so on. Unlike
vocabulary items, each alarm call is tied to one specific context, with no flexibility
or nuances of meaning (monkeys can’t indicate a particularly mean leopard).
Primate calls never form part of an interconnected network of related symbols,
as words do. And critically, alarm calls cannot be used merely to mention the
concept of a leopard.
Thus, a second distinction between human (proto-)vocabulary and primate calls
is displacement. Using words in the absence of their referent is a novel feature in
primate communication, though limited tactical deception (uttering an alarm call
when no predator is visible) occurs in some monkeys (Wheeler 2009), and perhaps
in chimpanzees too. The earliest hominin vocabulary likely had no displacement.
For instance, a word indicating an animal is initially uttered when the animal is
visible to both speaker and addressee. Later, the speaker sees the animal while his
companion is distracted, and uses the animal’s name, since he can see it. If the
companion is smart, he understands. The ability to comprehend displacement thus
plausibly emerges before it is used deliberately in production, to refer to entities not
present; see also Burling (2002). Displacement must be a highly adaptive feature of
protolanguage (Bickerton 2009a).
A scenario like this reveals a third major difference between human use of
vocabulary and primate communication: only the former exhibits shared intentionality (Tomasello et al. 2005; Tomasello and Carpenter 2007), the mutual
commitment to collaboration found in human interactions. Before the age of one,
human infants can triangulate between themselves, an adult, and external objects,
by pointing, gaze-following, or offering objects for inspection: they establish joint
attention. Thus, unlike primate calls, words can be used merely to mention, or to
point something out. Other primates use vocalizations and gestures to draw
attention to themselves (e.g. to initiate play, or beg for food). But only humans
engage spontaneously in triadic reference: two people attend to some external
object, and agree on a convention for referring to it. Such collaboration is an
absolute prerequisite for protolanguage (Tomasello 2003b).
Shared intentionality gives rise to a fourth distinction: human vocabulary
involves cultural transmission and learning, unlike non-human primate vocalizations. Moreover, though language has a critical period, vocabulary learning does
not atrophy in adults. Learning also brings the prospect of innovation. Other
primates are reported to produce some novel vocalizations (in Part I see Gibson;
Slocombe), but in the wild their call repertoire is basically fixed, whereas vocabulary is productive and open-ended.
Fifth, vocal learning requires both vocal control and vocal imitation. Researchers
originally assumed that non-human primates only produce ‘affective’ (i.e. emotionally-driven) vocalizations, and couldn’t vocalize at will, suppress, or modify
vocalizations. This is now known to be inaccurate. Audience effects and call
modifications occur both in monkeys (Cheney and Seyfarth 1990) and great apes
(Slocombe and Zuberbühler 2007; see Slocombe; Zuberbühler, this volume). This,
along with tactical deception, suggests elements of vocal control. On the other
hand, primate vocalizations are essentially driven by an internal state, rather than
being volitional. And vocal imitation is a novelty in the human lineage.
Such distinctions between words and primate calls might all appear in the
earliest proto-words. Full language displays yet more differences. Duality of patterning is pivotal: a small, discrete set of sounds combines in different ways
(phonology), giving rise to open-ended sets of morphemes and words, which
themselves combine productively to form phrases and clauses (syntax). We assume
protolanguage to lack this duality, and primate calls have nothing analogous (and
nothing homologous). Ancestral protolanguage would not yet have a generative
phonological system: for instance, Lindblom (1998) demonstrates that pressure for
phonological complexity only comes from increases in vocabulary (see MacNeilage;
Studdert-Kennedy, this volume).
In sum, the earliest words had little in common with primate calls, apart from
probably using the vocal/auditory modality (inconsequentially, since nearly all
mammals vocalize). Protolanguage most likely did not develop from primate calls.
An alternative is that primate cognition played a crucial role; see Seyfarth and
Cheney (this volume); Hurford (2007, this volume). Hurford argues that human
conceptual structure derives from primate perceptual structure. The fundamentals
of modern cognition (concepts such as object permanence) probably evolved
before the split from the Pan genus; see Tallerman (2009), also Coolidge and
Wynn, this volume. Bickerton (1990: 91, 101) notes:
In all probability, language served in the first instance merely to label protoconcepts derived
from prelinguistic experience. [ . . . ] Protoconcepts which could serve as referents for nouns
and even verbs – nouns and verbs being the basic units from which other linguistic
categories are derived – were in place by the time the higher primates had developed.
Strikingly, however, Bickerton (2009a) denies that any species other than H. sapiens
had genuine concepts. Unlike mere categories, which other animals do have,
concepts involve offline thinking (thinking about an activity or entity you are
not currently engaged with) and displacement (e.g. imagining a leopard that’s not
present). Concepts have permanent storage in the brain which can be accessed
voluntarily (a lexicon). True concepts are triggered, in Bickerton’s view, by the
emergence of the earliest words (see also Boeckx, this volume).
Discussion of these two extremes—the idea that conceptual structure is in place
well before protolanguage emerged versus the view that concepts are impossible
without language—cannot be pursued here. However, researchers constantly invent subtle ways of examining animal cognition, producing new data to further the
debate.
51.4 H O L I S T I C
P ROTO L A N G UAG E
................................................................................................................
I emphasize critical distinctions between primate vocalizations and human vocabulary, because some recent speculation links primate calls directly to protolanguage. Mithen (2005, 2009) and Wray (1998, 2000, 2002a) assume that primate calls
are ancestral to human vocabulary:
Protolanguage would . . . be a phonetically sophisticated set of formulaic utterances, with
agreed function-specific meanings, that were a direct development from the earlier noises
and gestures, and which had, like them, no internal structure (Wray 1998: 51).
Primate calls are not compositional, but holistic: the entire call is the entire
message. Wray and Mithen, also Arbib (2005) and Fitch (2010a), propose that
protolanguage was also holistic. They reject the compositional account of protolanguage starting with discrete proto-words representing concepts, and then forming short, unstructured proto-word strings. In holistic protolanguage (HPL), each
utterance represents an entire proposition, with arbitrary form and a complex
meaning agreed by the community. Wray’s toy examples are tebima ‘give-that-to-her’ and kumapi ‘share-this-with-her’ (2000: 294). The idea is that form/meaning
correspondences occasionally occur fortuitously; here, ma occurs in each string,
and the meaning ‘her’ also occurs in each. A speaker might assume that ma means
‘her’, and a ‘word’ for ‘her’ then ‘fractionates’ out of the non-compositional
sequence. Utterances are thus broken apart to form proto-words.
The HPL idea is highly problematic. First, for fractionation to succeed, holistic
calls must contain phonetic break-points (Studdert-Kennedy and Goldstein 2003),
and early hominins must notice them. But just like our own innate vocalizations
(laughter, crying, shrieks of fear, etc.), holistic primate calls contain no discrete
phonetic units. One element of proposed continuity thus disappears. HPL must be
physically very unlike primate calls, and proponents of HPL don’t explain how a
presumed complex phonetic system originates; how do holistic primate calls turn
into a discrete segmental system?
Tallerman (2007) also questions the assumption that calls would be long enough
to fractionate: instead of tebima, for instance, a signal in HPL might be simply
ma—far more likely, given the protosyllable account of MacNeilage (1998, 2008,
this volume). If each signal is short, there’s no material to break down, and the
account fails. Moreover, the HPL account requires early hominins to possess a
sophisticated compositional semantics (Johansson 2008), which, on the basis of
comparative biology and the archaeological record, seems improbable. On the
compositional account, semantics simply evolves in tandem with words. On
physical properties alone, then, HPL is unsupported.
Second, the processes turning a putative HPL into compositional language differ
completely from observed processes of language change; see Bybee; Carstairs-McCarthy; Heine and Kuteva, this volume. The bundle of effects termed ‘grammaticalization’ creates syntactic constructions (such as the passive), produces new word
classes (such as adjectives), and forms grammatical elements from content words
(for instance, creating auxiliaries and complementizers out of verbs). Heine and
Kuteva (2007, this volume) argue that grammaticalization is the only process that
could produce words of distinct classes from a protolanguage consisting initially of
noun-like items. As Bybee (this volume) notes, grammatical constructions are
overwhelmingly formed by composition when adjacent elements fuse, not by
breaking complex elements apart (cases like back-formation of edit from earlier
editor are much rarer). The large-scale deconstruction presumed in accounts of HPL
is unsupported by historical linguistics. Nor do language deficits support HPL:
grammatical breakdown in agrammatic aphasia has entirely different properties.
Third, consider the problem of counterexamples. Discussing Wray’s HPL
examples above, Tallerman (2007) surmises that many utterances might contain
ma but mean nothing to do with ‘her’, or might pertain to a female recipient but
not contain ma; the number of counterexamples would overwhelm positive examples. This intuition is confirmed via computational modelling (Johansson 2008;
K. Smith 2008). Johansson’s models vary the parameters of a toy HPL by various
factors, including total inventory of utterances, number of sound segments, number of meaning elements etc., and:
For all parameter combinations, the number of counterexamples were found to outweigh
the number of positive examples by a considerable margin. For no parameter combination
did the fraction of all predicates with more positive examples than counterexamples exceed
2% (Johansson 2008: 175).
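The counting logic behind this argument can be illustrated with a hand-made inventory. The sketch below is not Johansson's model: apart from Wray's tebima and kumapi, every utterance is hypothetical, and the syllable segmentation, the function name `tally`, and the definition of a counterexample (segment without meaning, or meaning without segment) are assumptions made here.

```python
# Toy holistic inventory: each form (a tuple of syllables) maps to meaning elements.
# Only the first two entries come from Wray's examples; the rest are invented.
INVENTORY = {
    ("te", "bi", "ma"): {"give", "that", "her"},   # tebima
    ("ku", "ma", "pi"): {"share", "this", "her"},  # kumapi
    ("ma", "lo", "ti"): {"take", "this", "him"},   # hypothetical
    ("po", "ma", "ku"): {"come", "here", "you"},   # hypothetical
    ("ni", "ka", "so"): {"give", "this", "her"},   # hypothetical
}

def tally(inventory, segment, meaning):
    """Count utterances supporting or contradicting the pairing segment = meaning."""
    positives = counterexamples = 0
    for form, meanings in inventory.items():
        has_segment = segment in form
        has_meaning = meaning in meanings
        if has_segment and has_meaning:
            positives += 1          # segment and meaning co-occur
        elif has_segment != has_meaning:
            counterexamples += 1    # one occurs without the other
    return positives, counterexamples

result = tally(INVENTORY, "ma", "her")
```

With this inventory `result` is `(2, 3)`: two utterances support the hypothesis that ma means ‘her’ and three contradict it. Even in a five-item system built around Wray's own examples, a learner has more evidence against the fractionation than for it.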
Fourth, some accounts (Mithen 2005, Arbib 2005) suggest that highly complex
meanings could be inferred from holistic utterances. A. Smith (2008) observes that
meanings in HPL must be reconstructed purely from context, and while humans
can conceptualize simple, cognitively salient meanings, associated with basic-level
categories such as ‘dog’ and ‘chair’, it is hard to learn general categories such as
‘animal’ or ‘furniture’ contextually. Moreover, if we show a child a picture of a ‘dax’,
a mythical creature dancing on a table, she doesn’t assume that ‘dax’ means This-is-a-dax-dancing-on-a-table, but rather, she associates the novel creature with the
label dax (Tallerman 2008a). The intricate, very specific, multi-propositional
meanings suggested by some proponents of HPL (see Tallerman 2007 for discussion) thus cannot be reconstructed from context in the first place, let alone
transmitted successfully between further individuals. A. Smith concludes that
‘Unitary, unstructured meanings can only reliably be associated with highly salient,
relatively simple meanings, as they must be reconstructable without any linguistic
cues’ (2008: 109). Complex propositions are thus unlikely to be associated successfully with holistic utterances.
Fifth, consider the use made of HPL. For Wray, its function was not informative,
but social and manipulative, like animal communication systems (adding further
continuity with primate calls). HPL delivers ‘subtle and complex social messages’
(Wray 2002a: 117), covering threats, greetings, and commands; a primitive compositional protolanguage, lacking grammar, would, Wray suggests, be too ambiguous
to function properly. (Though potential ambiguities in complex holistic message
strings are not seen as problematic.) However, Bowie (2008), in experiments using
restricted language systems, shows that a small compositional system (containing
just a few words) significantly enhances communication in novel situations; conversely, a semantically-fixed set of holistic signals is highly inflexible.
Moreover, early hominins don’t need a new system to deliver the kind of social
messages Wray suggests (Bickerton 2003, 2009a; Tallerman 2007). Our ancestors
had, as we still have, all the primate vocal and gestural features necessary: tears,
laughter, sighs, snarls, shouts of joy, cringing, plus biochemical signals; see Burling
(1993, 2005). HPL adds nothing to this innate repertoire (nor does language).
Language doesn’t replace primate signals, but adds an entirely different system
alongside them (sometimes literally: cries of pain are involuntary, yet often
expressed using language-specific vocalizations; cf. English ow! but French aïe!).
Finally, proposals for HPL entirely disregard typical communicative attempts
made by apes in the lab. Far from being holistic, whole propositions, ape utterances
are typically short, perhaps two proto-words, crucially revolving around content
‘words’. Apes trying to communicate with humans produce the elements with most
meaning (noun-like and verb-like items), and ignore the rest. Their ancestors are
our ancestors too, and this strategy—concentrate on producing maximum meaning with minimum effort—likely utilizes the type of cognition early hominins
brought to protolanguage. HPL is thus too complex for our ancestors; instead,
protolanguage comprised short proto-word sequences of one or two items bearing
high meaning.
I conclude, then, that holistic protolanguage is linguistically untenable, and does
not achieve continuity with primate communication systems anyway. Primate cognition is more relevant in the continuity debate than primate vocal communication.
51.5 M U S I C A L
P ROTO L A N G UAG E
................................................................................................................
Some recent publications suggest a musical protolanguage as a precursor of HPL
(Mithen 2005, 2009, this volume; Fitch 2010a); see Botha (2009a) for critique. Fitch
focuses on ‘complex vocal imitation’ (vocal imitation, control, and learning; 2010a:
340). In other primates, few homologues to these vital features of spoken language
occur. However, complex vocal imitation occurs widely elsewhere. If the selection
pressures giving rise to learned ‘song’ in songbirds, seals, whales etc. also applied to
hominins, then novel vocal capacities in speech are a case of convergent evolution.
Problematically, though, sexual selection drives the evolution of animal song:
learned song is mostly produced by males, in courtship and defence of territory.
But both males and females possess speech. So how would ‘musical protolanguage’
spread from males to females? Fitch emphasizes social bonding or group cohesion,
as in other species where both sexes display complex vocal imitation. But there is
no evidence that learned vocalization evolved first in males and later spread to
females. Secondly, as is expected with sexually-selected characteristics, song arises
at puberty, thus highly unlike learned vocalization in humans. Moreover, learned
song shows seasonal peaks and is hormonally driven, again unlike speech (see
Gibson, Chapter 11). The biological perspective offers little evidence that sexual
selection drove human vocal learning.
From a linguistic perspective too, musical protolanguage is dubious. Fitch
suggests that a musical protolanguage contained ‘meaningless sung phrases of
complex phonological structure’ (2010a: 496), and that ‘the generative aspect of
phonology might have emerged before it was put to any meaningful use’ (2010a:
471). Fitch’s musical protolanguage is ‘bare phonology’, like non-lyrical song, and
initially lacked meaning entirely. But as noted above, selective pressures for contrastive phonology come from an expanding vocabulary; complex phonology
cannot evolve before vocabulary exists. Even allowing that Fitch really means
phonetic structure, this scenario still entails a massive evolutionary leap, as other
great apes have nothing comparable. More plausible scenarios are outlined by
MacNeilage; Studdert-Kennedy, this volume. Both stress the importance of vocabulary (i.e. of meaning) as a selection pressure in learned vocalization.
Fitch suggests that meaningless melodies subsequently become associated with
whole events, hence meanings are paired arbitrarily with musical ‘phrases’ (forming a holistic protolanguage). These utterances have complex, hierarchical structure (Fitch 2010a: ch. 14), subsequently exapted for syntax. But phrases in music or
animal song differ radically from syntactic phrases, which start with a semantic/
syntactic head that gains dependents—in evolution (Jackendoff 2002) as in child
language acquisition; see Tallerman, Chapter 48.
Musical protolanguage is thus an evolutionary cul-de-sac (since the musical
aspects must ultimately be abandoned for a word-based protolanguage), and
moreover, does not provide any observed features of full language.