Cognitive Linguistics Investigations

Human Cognitive Processing is a forum for interdisciplinary research on the
nature and organization of the cognitive systems and processes involved in
speaking and understanding natural language (including sign language), and
their relationship to other domains of human cognition, including general
conceptual or knowledge systems and processes (the language and thought
issue), and other perceptual or behavioral systems such as vision and nonverbal behavior (e.g. gesture). ‘Cognition’ should be taken broadly, not only
including the domain of rationality, but also dimensions such as emotion and
the unconscious. The series is open to any type of approach to the above
questions (methodologically and theoretically) and to research from any
discipline, including (but not restricted to) different branches of psychology,
artificial intelligence and computer science, cognitive anthropology, linguistics,
philosophy and neuroscience. It takes a special interest in research crossing the
boundaries of these disciplines.
Editors
Marcelo Dascal, Tel Aviv University
Raymond W. Gibbs, University of California at Santa Cruz
Jan Nuyts, University of Antwerp
Editorial address
Jan Nuyts, University of Antwerp, Dept. of Linguistics (GER),
Universiteitsplein 1, B 2610 Wilrijk, Belgium.
E-mail: [email protected]
Editorial Advisory Board
Melissa Bowerman, Nijmegen; Wallace Chafe, Santa Barbara, CA;
Philip R. Cohen, Portland, OR; Antonio Damasio, Iowa City, IA;
Morton Ann Gernsbacher, Madison, WI; David McNeill, Chicago, IL;
Eric Pederson, Eugene, OR; François Recanati, Paris;
Sally Rice, Edmonton, Alberta; Benny Shanon, Jerusalem;
Lokendra Shastri, Berkeley, CA; Dan Slobin, Berkeley, CA;
Paul Thagard, Waterloo, Ontario
Volume 15
Cognitive Linguistics Investigations:
Across languages, fields and philosophical boundaries
Edited by June Luchjenbroers
John Benjamins Publishing Company
Amsterdam/Philadelphia
The paper used in this publication meets the minimum requirements
of American National Standard for Information Sciences – Permanence
of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Australian Linguistics Institute (4th : 1998 : University of Queensland)
Cognitive Linguistics Investigations : Across languages, fields and philosophical
boundaries / edited by June Luchjenbroers.
p. cm. (Human Cognitive Processing, ISSN 1387–6724 ; v. 15)
Chiefly revisions of papers presented at a 4th Australian Linguistics
Institute workshop, held in July, 1998, at the University of Queensland.
Includes bibliographical references and indexes.
1. Cognitive grammar--Congresses. I. Luchjenbroers, June. II. Title.
P165.A96 1998
415--dc22 2005058866
ISBN 90 272 2368 8 (Hb; alk. paper)
© 2006 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or
any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
Table of contents
Preface
Biographical information
Chapter 1
Introduction: Research issues in cognitive linguistics
June Luchjenbroers
Part I. Cultural models and conceptual mappings
Chapter 2
When does cognitive linguistics become cultural? Case studies
in Tagalog voice and Shona noun classifiers
Gary Palmer
Chapter 3
Purple persuasion: Deliberative rhetoric and conceptual blending
Seana Coulson and Todd Oakley
Chapter 4
Depicting fictive motion in drawings
Teenie Matlock
Chapter 5
Discourse, gesture, and mental spaces manoeuvers: Inside versus
outside F-space
June Luchjenbroers
Part II. Computational models and conceptual mappings
Chapter 6
In search of meaning: The acquisition of semantic structures
and morphological systems
Ping Li
Chapter 7
Grammar and language production: Where do function words come from?
Joost Schilperoord and Arie Verhagen
Chapter 8
Word recognition and sound merger
Paul Warren
Part III. Linguistic components and conceptual mappings
Chapter 9
Verbal explication and the place of NSM semantics in cognitive linguistics
Cliff Goddard
Chapter 10
“How do you know she’s a woman?”: Features, prototypes and category
stress in Turkish kadin and kiz
Robin Turner
Chapter 11
Cross-linguistic polysemy in tactile verbs
Iraide Ibarretxe-Antuñano
Chapter 12
How experience structures the conceptualization of causality
Maarten Lemmens
Chapter 13
Internal state predicates in Japanese: A cognitive approach
Satoshi Uehara
Chapter 14
Figure, ground and connexity: Evidence from Xhosa narrative
David Gough
Chapter 15
Discourse organization and coherence
Ming-Ming Pu
Name index
Subject index
Preface
The origin of this book was a workshop held at the University of Queensland, during the 4th Australian Linguistics Institute, in July 1998. Researchers from around
the world offered papers on a range of research topics of specific interest to the cognitive linguistics paradigm, and a number of those papers have been revised and
modified for this volume. Since that workshop several additional papers were also
sought from exciting researchers in the field, so that this monograph would capture the diversity of research activity from various parts of the world and across
a range of languages, relevant to the Cognitive Linguistics orientation toward
language and cognition.
My thanks to the many colleagues who volunteered their time to give peer
reviews of the papers included in this volume (listed below). Without their help
this monograph would not have been possible. Also many thanks are due to the
contributors themselves, many of whom have tolerated countless delays and innumerable requests; their patience and good humour have made the task of collating
this monograph a satisfying experience. Thanks also to the editors of this series
and their reviewers; and a final thanks to the Centre for Language & Cognition
Groningen (CLCG), Rijksuniversiteit Groningen, where this manuscript was finally completed, as well as the Linguistics Department at the University of Wales,
Bangor for supporting my visit there.
List of guest reviewers
Michel Achard, French/Linguistics, Rice University, USA
Keith Allan, Linguistics, Monash University, Australia
Edith Bavin, Psychology, La Trobe University, Australia
Frank Brisard, Germanic Languages, University of Antwerp, Belgium
Wallace Chafe, Linguistics, University California at Santa Barbara, USA
Alan Cienki, Russian Studies, Emory University, USA
Hubert Cuyckens, English Linguistics, Katoliek University Leuven, Belgium
Dirk Geeraerts, Linguistics, Katoliek University Leuven, Belgium
Ray Gibbs, Psychology, University California at Santa Barbara
Adam Glaz, Linguistics, University Marie-Curie Sklodowskiej, Poland
Andrej A. Kibrik, Applied Linguistics, Lomonosov University, Russia
Ronald Langacker, Linguistics, University California at San Diego, USA
David Lee, English Linguistics, University Queensland, Australia
Eric Pederson, Linguistics, University Oregon, USA
Bill Raymond, Linguistics, University Columbus Ohio, USA
Giesela Redeker, Communication, Rijks University Groningen, Netherlands
Wilbert Spooren, Dutch/Communication, Vrije University, Netherlands
Mark Turner, Arts & Sciences, Case Western Reserve University, USA
Biographical information
Seana Coulson – is an associate professor in the Cognitive Science Department
at the University of California, San Diego where she heads the Brain & Cognition
Laboratory. The author of Semantic Leaps: Frame-Shifting And Conceptual Blending In Meaning Construction, her research involves an interdisciplinary approach
to the study of communication and conceptual structure.
Cliff Goddard – works primarily in the natural semantic metalanguage (NSM)
theory originated by Anna Wierzbicka. He has published widely on cross-linguistic
semantics, ethnopragmatics, descriptive linguistics, and language typology. His
books include Semantic Analysis (OUP, 1998), Meaning and Universal Grammar
(co-edited with Anna Wierzbicka, Benjamins, 2002) and The Languages of East and
Southeast Asia (OUP, 2005). He is a full Professor in Linguistics at the University
of New England, Australia.
David Gough – is currently Head of the School of English Language at Christchurch
Polytechnic Institute of Technology, New Zealand where he has been for the past
5 years. Prior to this, David, a South African, was professor of Linguistics at the
University of the Western Cape, Cape Town. He has research interests in, and has
published on, African linguistics, pragmatics and language and literacy education.
Iraide Ibarretxe-Antuñano (PhD Edinburgh, 1999) – is currently a lecturer in
Linguistics at the University of Zaragoza, Spain. She was a research fellow at UC
Berkeley (1999–2001), the International Computer Science Institute (2000–2001),
and the University of Deusto, Spain (2001–2003). She is especially interested in
issues related to cross-linguistic polysemy, constructions, semantic change, semantic typology, sound symbolism, metaphor and metonymy, perception, space
and motion.
Maarten Lemmens – is senior lecturer of English linguistics at the University of
Lille, France, where he teaches cognitive and English linguistics and English phonetics. His research centers around three main areas: (i) English lexical causatives
and their constructional alternations, (ii) a lexical semantic analysis of posture
verbs in Dutch, English and Swedish, and (iii) a typological study of the expression
of static location, as a complement to existing research on movement verbs.
Ping Li – is Professor of Psychology and Cognitive Science at the University of
Richmond, USA. His main research interests are in the areas of psycholinguistics
and cognitive science. He specializes in crosslinguistic studies of language acquisition, bilingual language processing, and neural network modeling of monolingual
and bilingual lexical development.
June Luchjenbroers – received her PhD from La Trobe University in 1994, and
joined the Linguistics Department at University of Wales, Bangor in 1999 after
appointments with the Hong Kong Polytechnic University and the University of
Queensland. Her research involves Discourse Analysis from a cognitive linguistics
perspective, including gender and gestural analyses of video discourse data.
Teenie Matlock – is founding faculty in Social and Cognitive Sciences at University
of California, Merced, and a visiting scholar in Psychology at Stanford University. An experimental psychologist and cognitive linguist, Matlock has published
numerous articles on conceptual structure and imagery in language, especially
non-literal spatial language.
Todd Oakley – is associate professor of English and Cognitive Science at Case
Western Reserve University in Cleveland, Ohio. His principal areas of scholarship
are in rhetoric, linguistics, and cognitive science. His interest in Cognitive Linguistics dates from the early 1990s, when he began investigating the conceptual basis
of rhetorical effect, a project that drew heavily on Langacker’s Cognitive Grammar
and Fauconnier’s Mental Spaces Theory. This project has since expanded to focus
on the relationship between attention and meaning construction in general, hence
its title, Elements of Attention: Explorations in Mind, Language, and Culture.
Gary B. Palmer – is Professor Emeritus at the University of Nevada, Las Vegas. He is the author of
Toward a Theory of Cultural Linguistics (1996), translated as Hacia una Teoría de
la Lingüística Cultural (2000) by Enrique Bernárdez. He co-edited Talking about
Thinking across Languages. Cognitive Linguistics 14/2,3 (2003) with Cliff Goddard
and Penny Lee, Cognitive Linguistics and Non-Indo-European Languages (2003)
with Eugene Casad, and Languages of Sentiment (1999) with Debra Occhi.
Ming-Ming Pu – is an Associate Professor of Linguistics at the University of
Maine, Farmington. She obtained her PhD in psycholinguistics from University
of Alberta, Canada. Her current research interests include cognitive linguistics,
comparative discourse analysis and Chinese linguistics.
Joost Schilperoord – is a psycholinguist with a special interest in cognitive and
rhetorical aspects of text production and communication processes. His research
focuses on regularities in language use derived from text analysis and experimentally elicited usage data. He is assistant professor at the Communication Department of Tilburg University, where he teaches statistics and text linguistics.
Robin Turner – teaches English at Bilkent University in Ankara, Turkey. His interests include cognitive, cultural and corpus linguistics, Turkish language and
culture, constructed languages, and computer programming.
Satoshi Uehara – has a PhD in linguistics from the University of Michigan (1995).
He is professor of Japanese language and linguistics at the Center for International
Exchange and the Graduate School of International Cultural Studies, Tohoku University, Japan. He has also taught at the University of Michigan and Wellesley College.
His areas of specialization are cognitive linguistics, linguistic typology, discourse
analysis, pragmatics, and Japanese and East and Southeast Asian linguistics.
Arie Verhagen – received his PhD in 1986 at the Free University in Amsterdam. He
has taught at the Free University, Utrecht University, and the University of
Leiden. He was editor-in-chief of Cognitive Linguistics from 1996 until 2004.
Since 1998, he has held the chair of Dutch Linguistics at the University of Leiden. Recent publications include Usage-Based Approaches to Dutch (co-edited with Jeroen
van de Weijer, LOT, 2003) and Constructions of Intersubjectivity (Oxford University
Press, 2005).
Paul Warren – is Associate Professor in the School of Linguistics and Applied Language Studies at Victoria University of Wellington, New Zealand. Paul’s primary
research interests are in psycholinguistics, in particular spoken word recognition
and the use of intonation in sentence processing. Since moving to New Zealand in
1994, he has combined these interests with a growing fascination with the development of New Zealand English.
Chapter 1
Introduction
Research issues in cognitive linguistics
June Luchjenbroers
University of Wales, Bangor
.
The cognitive linguistics agenda
Linguistics as a discipline aspires to capture the essence of communication, and
how language is processed in the human brain. The exact path to achieving this
aspiration, however, has in past decades split into two major and substantially different approaches: the now more traditional approach to language processing,
referred to as the ‘Formal’ or ‘Orthodox’ approach (cf. Langacker 1988), and the
Cognitive Linguistics approach. A significant point of contrast between these two
theoretical approaches lies in whether linguistic processes are deemed essentially
different from other cognitive processes, or not; and thus whether linguistic phenomena should be investigated separately (cf. Chomsky 1980;
Fodor 1983), or not.
Although the goal of the Formal, generativist paradigm has been to provide
cognitively oriented explanations rather than structural taxonomies, researchers from within the Cognitive Linguistics community have
challenged a range of fundamental elements of the Formalist approach to language and cognition. In particular, cognitive linguistics challenges whether the
brain is modular, as well as the role of logic and deduction as cognitive strategies
for information processing (e.g., Langacker 1987, 1990); whether language in the
brain is hardwired, as well as the validity of ‘mentalese’ (the supposed language of
the mind, thought to be propositional in structure and to possess logical attributes –
cf. Fodor 1975; Pylyshyn 1984).
The Formalist paradigm has consistently reinforced the view that the representation of language is best seen as involving basic, symbolic building blocks and
rules; and further that those building blocks are also autonomously processed –
i.e., grammar is distinct from both the lexicon and semantics (cf. Newmeyer 1986;
Kempson 1991), and that semantics is distinct from pragmatics. However, researchers from within the cognitive linguistics community have repeatedly shown
how a full appreciation of individual linguistic units requires the researcher to
consider all parts of language analysis (cf. Fauconnier 1994; Lakoff 1987; Lakoff
& Johnson 1990; Talmy 1996). The papers of this volume have been collected to
illustrate how otherwise separate areas of linguistic concern can present a better
clarification of the linguistic distributions in which units are produced in talk; as
well as provide a deeper appreciation of the semantic richness of those linguistic
units, not captured by Formalist approaches.
The cognitive linguistics agenda is to work toward a cognitively real approach
to language processing; and for researchers from within the cognitive linguistics
community that means making ourselves amenable to research from disciplines
outside the linguistics domain, such as psychology, AI, anthropology and philosophy, in addition to language-related studies done within the linguistics spectrum.
The papers in this volume are also drawn from a number of areas from within
the cognitive sciences, to provide a more comprehensive appreciation of the multiplicity of the language units under investigation, as predicted and advocated by
the cognitive linguistics approach to language and cognition.
However, the full breadth of the cognitive linguistics agenda involves more
than identifying the nature of language processing, which in itself includes both
language production and comprehension processes; it also presupposes the more
primary concern of language categorization and representation in the mind. In
this volume a number of papers illustrate how our understanding of grammatical
units is essentially semantic, and other papers are devoted specifically to clarifying the nature of conceptual structures.
Janda (2000) has also described the cognitive linguistics community as a group
of researchers who embrace a concatenation of core concepts and goals, and who
are immersed in the empirical observation of language behaviours across languages
and disciplines. This does not subsume a single philosophical perspective toward
the exact relation between language and mind; instead these core concepts capture the unifying principle that language, as representations in the mind and as
the product of cognitive events, reflects the interaction of cultural, psychological,
communicative and functional considerations.
2. Outline of this volume
As promised in the title of this collection, the total body of papers presents research
across a variety of languages and language groups, and shows how particular elements of linguistic description draw upon otherwise separate aspects (or fields)
of linguistic investigation. The languages include European languages – Basque,
Dutch, Spanish and Turkish, as well as different varieties of English (American, Australian, New Zealand, and Old English); Asian languages – Chinese and
Japanese; Austronesian Languages – Malay and Tagalog; Bantu languages – Shona
and Xhosa; as well as a number of examples drawn from Australian Aboriginal languages and cultures, such as Dyirbal and Western Australian communities. Despite
possible differences in philosophical approach to the role of language in cognitive
tasks, and differences in the methodology used as an avenue for linguistic investigation, these papers are similar in a fundamental way: they all share a commitment
to the view that human categorization involves mental concepts that have fuzzy
boundaries and are culturally and situation-based.
The papers selected for this volume all concern how language comprehension and production involve conceptual mappings between varying domains
of cognitive function. The collection is organized into three thematic subsections. The first (a) concerns conceptual mappings involving cultural models: specific types of knowledge that shape the language produced
in talk. The second (b) deals with computational models that emulate and hypothesize different features of the cognitive programming dealing with
morphology, grammar, and sociolinguistic variation. The third (c) focuses
on specific components of linguistic description: semantics,
grammar and discourse.
A very appropriate start to the first subsection, and to this volume, is the
paper by Gary Palmer, “When does cognitive linguistics become cultural? Case
studies in Tagalog voice and Shona noun classifiers” (Chapter 2). In this paper,
Palmer outlines fieldwork in which important theoretical concerns
about grammatical representation and processing are dealt with. He argues for the
cognitive and semantic underpinnings of grammatical phenomena in the form
of ‘cultural schemas’. Evidence for his argument is provided by cross-linguistic
data (from Dyirbal, Tagalog, and Shona), to illustrate how many lexical domains and grammatical constructions link either directly or indirectly to significant cultural models. Well known concepts from the cognitive sciences, such
as ‘scenarios’ from Artificial Intelligence and psychology, and ‘Idealized Cognitive Models’ from linguistics, are incorporated in his treatment of grammatical
voice and noun classifiers, which are presented as extraordinary polycentric categories that provide the key to understanding the discourse of these language
communities.
After Palmer’s consideration of the role of culture (and thus experience)
in explaining linguistic structure, the first thematic subsection continues with
three other papers dealing with how different linguistic choices are manifest by
each speaker’s conceptual representations of the world – Coulson & Oakley;
Matlock; and Luchjenbroers. These papers, each drawing on different methodologies (discourse, experiment, and gesture), deal with different aspects of conceptual
representation: Coulson & Oakley’s paper deals with conceptual blending; Matlock’s paper with how information in memory is manifest in lexical
retrieval; and Luchjenbroers deals with how cognitive strategies are evident in
conversational gesture.
In the chapter by Seana Coulson and Todd Oakley, “Purple persuasion: Deliberative rhetoric and conceptual blending” (Chapter 3), the authors consider semantic structure in the form of ‘Conceptual Integration Theory’ (‘Blending Theory’).
In their paper, the authors illustrate how blending is recruited in persuasive discourse. The data used include an email message encouraging people to vote in a US
congressional election, and a church letter sent to encourage monetary donations
to that church. With excerpts from these data, the authors show how simplified input models are blended to form integrated event scenarios, and how the strategic
choice of input frames can provide a writer (or speaker) with the means to encourage a particular construal of events that will likely result in the target action(s).
Coulson and Oakley argue that persuasion depends on ‘objects of agreement’,
and the strategic choice of inputs to create a convincing blend will promote the
perception of such agreement.
The following chapter (4), “Depicting fictive motion in drawings”, by Teenie
Matlock, puts Len Talmy’s proposed ‘fictive motion’ (1996) to the test, and
thereby also cognitive theory dealing with conceptual representation and language
processing. In this paper Matlock deals with motion verbs, and asks whether
fictive motion plays a role in their comprehension. With a number of drawing
experiments, she uncovers reliable evidence of a link between motion verbs and
the mental simulation of the action conveyed by the verb: a link that involves
a mentally simulated traversal or scanning of a trajectory. For example, manner
information (such as slow, fast, or neutral) is depicted with longer, thinner or
straighter lines for fast verbs than for slow verbs. The results given from three
experiments challenge many traditional approaches to lexical representation, and
provide strong evidence that comprehension taps into knowledge acquired from
embodied experience.
The final paper of this subsection (Chapter 5), “Discourse, gesture, and mental
spaces manoeuvers: Inside versus outside F-space”, by June Luchjenbroers, investigates
the dynamics of conversational gestures in terms of the physical space in which
they occur during discourse. That space, also called the ‘comfort zone’ or ‘F-space’,
is where speakers produce most of their gestures during discourse, and Luchjenbroers argues that speakers convey added meaning, relevant to mental spaces
navigations (i.e., movements around conceptual structure), when they choose to
locate their gestures inside the boundaries of that space, or when they physically
stretch to place a gesture outside it. The examples offered in this paper also illustrate how a speaker’s choice of gesture can amplify, and sometimes supplement
information provided by the lexical component; they also show how the location
of a gesture in relation to a speaker’s F-space conveys role relations relevant
to the subject-matter being discussed. As such, a speaker’s gestural F-space can
be an important source of information for all discourse participants to establish,
navigate and disambiguate the many mental spaces that may be required during
discourse.
These chapters are then followed by a new thematic subsection that brings together research dealing with different computational models of the human cognitive system. These papers discuss different computational models for describing cognitive processes associated with the mental lexicon, in relation to morphology (Li);
grammar (Schilperoord & Verhagen); and the phonological system (Warren).
The paper by Ping Li (Chapter 6), “In search of meaning: The acquisition of
semantic structures and morphological systems”, presents a very different approach
to cognitive processing, in that he utilizes computational models in the form of
a connectionist network. In this paper Li challenges the Formalist assumption
embraced by many areas in the cognitive sciences that language is best seen as
involving basic, symbolic building blocks and rules. Using child language acquisition data, and in particular parental speech from the CHILDES database, Li begins
with the observation that young children learn word meanings by exploiting contextual information in the input; thus, lexical categories can be acquired by the
computation of statistical regularities involving multiple constraining factors, and
meaning is the emergent property of that process. The major part of this paper,
however, is his consideration of a puzzle involving a ‘cryptotype’, in the form of
the reversive prefix ‘un-’. The un- problem is described as essentially semantic for
which there seems to be no regular rule to govern its use – e.g., we can ‘untie’ a
bow but not ‘unmove’ a desk. Li’s study illustrates how the semantic features that
unite different members of a cryptotype are represented in a complex distributed
fashion (where feature overlaps occur across categories); a process that is accessible
to native intuition but appears to defy traditional symbolic analysis.
In Chapter 7, by Joost Schilperoord and Arie Verhagen, “Grammar and
language production: Where do function words come from?”, the authors deal with
the characterization of linguistic knowledge, in particular, organizational features
of the mental lexicon and mental grammar. The practical application of this bigger-picture issue is to ask the question, “how are function words selected during
language production?”. In this quest, the authors first offer a theoretical consideration of language production models and the predictions that result from them.
This is then followed by a usage-based consideration of function words (prepositions and articles) and pauses, as they appear in the production of Dutch oral
dictations of routine business letters. The authors use cognitive linguistic views
on the nature of linguistic knowledge to explain the evidence they have obtained
regarding function words and how they are cognitively processed. In particular,
they call into question assumptions in the literature that function words are stored
independently of their lexical heads, and whether there is a principled difference
between functional and lexical words in the mental lexicon.
In the final paper of this subsection (Chapter 8), “Word recognition and sound
merger”, by Paul Warren, language processing models are again considered, although in this case, the field of research deals with comprehension in the form
of psycholinguistic models of spoken word recognition. Warren questions how
the human recognition system copes with phonetic variability across inputs: a
matter of key interest for cognitive and computational theories dealing with how
linguistic units (words and phones) are represented and processed for talk. The
primary focus of this paper is a phenomenon Warren refers to as (word) ‘sound
merger’, as in New Zealand ear/air neutralization. In NZ English, merger occurs
when two originally phonologically distinct words progressively lose phonetic
contrast to become homophones; a progression that can be partial or complete.
He then considers strong sociolinguistic literature addressing this phenomenon in
New Zealand English to give evidence that merger is definitely in progress. These
studies provide the corpus data to consider frequency and context effects, as well
as social variables such as age difference, as predictors of when sounds merge and
when not. Warren suggests that aspects of the sentential and extralinguistic context will resolve homophone ambiguity in the case of merged ear and air forms
just as they do for other homophones.
The final subsection of papers in this volume deals specifically with different
and sometimes overlapping aspects of linguistic description: semantics, grammar
and discourse. The first paper in this subsection, by Goddard, has many features
in common with the first paper in this volume (Palmer), in that it also deals
with cultural models, computational arguments, and semantic structure. However, Goddard presents a slightly different orientation from the earlier papers, in that
he not only focuses on conceptual representations of lexical entries and the semantic relations they involve, but is also concerned with key aspects of
cognitive linguistics theory itself, in terms of the intellectual contribution made to the field
by Anna Wierzbicka.
In his paper, “Verbal explication and the place of NSM semantics in cognitive linguistics” (Chapter 9), Cliff Goddard considers areas of cognitive linguistics endeavour compatible with or anticipated by Wierzbicka’s approach
to conceptual structure. However, the main core of Goddard’s paper is to argue, with examples from Aboriginal cultures, Malay, English and Japanese, that
the verbal explication of conceptual categories and lexical entries is indispensable to the field of cognitive linguistics, and to illustrate that diagrams cannot stand alone without verbal support. In fact, Goddard argues that diagrams
often rely on complex culture-specific iconographic conventions (to be interpreted), and that only with a fine-grained approach to verbal explication can the subtle
nuances of abstract, culture-rich vocabulary be dealt with. Any theorist who researches how linguistic and language-relevant information is cognitively stored,
retrieved and illustrated, as well as how analysts can illustrate their representations, must also make theoretical decisions concerning the issues raised in
this paper.
This argument is very relevant and important to bear in mind with the
papers collected in the final subsection of this volume that deal with specific components of linguistic description: semantic analyses (Turner; Ibarretxe-Antuñano;
and to some extent Lemmens); grammatical choices (Lemmens; and Uehara); and
finally discourse in the form of narrative (Gough; and Pu).
Many of the component arguments raised and dealt with in these papers also
have resonance with earlier papers placed in other subsections. For example, in
Chapter 10, “‘How do you know she’s a woman?’: Features, prototypes and category
stress in Turkish ‘kadın’ and ‘kız’” by Robin Turner, a number of concepts raised by
Palmer (this volume) are considered with Turkish data, including noun classification, story schemas and scenarios as well as prototype effects. In this paper, Turner
asks the question relevant to the Turkish choice of ‘kız’ (‘girl’) or ‘kadın’ (‘woman’)
as a descriptor of an adult woman, “When is a girl a woman?” Using descriptive elements from componential semantics (i.e., + or – some semantic feature)
Turner nevertheless illustrates the ‘fluid’ nature of meaning, and that category
membership is not absolute; descriptive components like [+virgin] are merely
convenient for naive descriptions because they fit the minimum criteria for the prototype of a lexical entry, such as ‘kız’. A number of different approaches to lexical
semantics are considered, including Palmer’s (1996) view that categorization is influenced by scenarios that define sequences of (expected) states and actions. An
important contribution made by Turner’s paper is the concept of ‘category stress’,
which occurs when there is a disparity between the results of feature-based and
prototype-based categorizations. This stress has a direct impact on how users deal
with category membership in production as well as comprehension.
Complementing Turner’s research, the following paper “Cross-linguistic polysemy in tactile verbs” (Chapter 11) by Iraide Ibarretxe-Antuñano, looks at how
the semantic content of the tactile verb ‘touch’ in three genetically unrelated languages (Basque, Spanish and English) interacts and contributes to the creation of
semantic extensions, while taking into account the different lexicalization patterns
needed to express the different senses this tactile verb can convey. The resulting
polysemy is explained in terms of different experiential domains, triggered by the
different senses of this verb, such as the mapping onto emotions, as well as other
semantic fields.
Even though the following chapter (12) by Maarten Lemmens, “How experience structures the conceptualization of causality”, is in principle about syntactic
choices, it also deals with lexical semantics. In this paper he focuses specifically on
verbs of ‘killing’, such as ‘suffocate’, ‘choke’ and ‘kill’. Variations in the conceptualization of the different causative events are considered, with regard to which verb
of ‘killing’ is chosen and the consequences that choice has for the selection of syntactic pattern in which it is to appear. His consideration includes case categories,
such as Agent, Affected, Goal and Instigator, and their significance for transitive
vs. ergative syntactic choices. For example, he argues that a more volitional participant who is engaged in some causative process is more likely to be represented
as a volitional Actor in a transitive construction. Lemmens’ research uses Old English, corpus data, and goes beyond description to focus on the experiential bases
for a speaker’s choice of verb within a specific semantic class.
Lemmens’ paper on syntactic choices is then followed by Satoshi Uehara’s
cognitive grammar paper “Subjective predicates in Japanese: A cognitive approach”
(Chapter 13). Here again, like several earlier chapters, the discussion of grammatical elements involves semantic concepts, in this case feelings and emotional
reactions. Uehara’s main interest in this paper is subjectification, and the construal
of the speaker (i.e., the conceptualizer), to explain the use of particular grammatical elements in discourse, e.g., to account for the use of the nominative particle -ga
with grammatical objects. Uehara’s many examples illustrate his claim that subjective predicates in Japanese can best be characterized as ‘deictic’ as they profile
the object of conception from the vantage point of the speaker.
The final two papers of this collection both deal with narrative. The first is
by Dave Gough, “Figure, ground and connexity: Evidence from Xhosa narrative”
(Chapter 14). This is a usage-based study of folk narrative discourse, which is
the stimulus to show how discourse factors, pragmatic and cognitive processing
should be described in terms outside language itself. Like Palmer, in Chapter 2,
he argues that grammatical terms like ‘mood’ and ‘tense’ refer to quite diverse
verbal categories; and similarly, like Pu, in the following chapter, he uses a functionally based account of narrative discourse, with categories such as ‘foregrounding’ and ‘backgrounding’, in addition to the more general process of ‘grounding’,
and ‘connexity’ (or ‘dependence’), to reveal systematic (conceptual) organization.
His ultimate claim is that the concepts of ‘grounding’ and ‘connexity’ are fundamental to the organisation of the Xhosa verbal system and further that verbal
forms, referred to as the participial, consecutive and indicative moods as well as
the so-called ‘continuous tense’ are structured around those concepts.
In the final chapter in this volume, “Coding events in oral and written discourse”
(Chapter 15), Ming-Ming Pu also investigates discourse, although her focus is on
discourse organization in terms of thematic structure and information units. In
particular, Pu examines episodic structure and how speakers relate events within
and between them. This is followed by a consideration of the relation between
spoken and written narratives, as well as universality in narrative production. Pu
uses narrative data that was produced by English and Mandarin Chinese speakers,
drawn from a children’s picture book. Her research is part of a larger tradition that
sees conversations and written texts as more than unordered strings of utterances;
instead she argues for structures with levels of organization that require conceptual management. Pu’s study provides further evidence of the cognitive constraints
upon speakers to accommodate their addressee’s processing needs by signaling
discourse units and prompting the retrieval of information.
A wide range of language issues are relevant to cognitive linguistics research and are
reflected in the collection of papers included in this volume. The now traditional
cognitive linguistics areas include: lexical semantics, cognitive grammar, metaphor
and prototypes, pragmatics, narrative and discourse, and computational models.
In this volume however, these general concerns have been considered in harmony
with other important fields including: language acquisition, language and culture,
video data analysis and gesture, Blending Theory, fictive motion and others. Devising an order for these papers, or my summation of them for this chapter, was
made all the more difficult because they all illustrate how a full appreciation of particular elements of linguistic description, and the cognitive processing involved in
their use, requires a synthesis of different (and traditionally separate) areas of linguistic investigation; and that aspects of situated meaning and cultural semantics
are relevant to the cognitive processing of language phenomena, and should not
be divorced from them.
References
Chomsky, Noam (1980). Rules and Representations. Oxford: Basil Blackwell.
Fauconnier, Gilles (1994/1985). Mental Spaces: Aspects of Meaning construction in Natural
Language. Cambridge, UK: CUP.
Fodor, Jerry A. (1975). The Language of Thought. New York: Crowell.
Fodor, Jerry A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Janda, Laura (2000). Cognitive Linguistics. Paper presented at SLING2K Workshop.
Kempson, Ruth (1991). The Language Faculty and Communication. Reading materials, 1991
Linguistics Institute, Univ. of California at Santa Cruz.
Lakoff, George (1987). Women, Fire, and Dangerous Things: What categories reveal about the
mind. Univ. Chicago Press.
Lakoff, George & Mark Johnson (1990). Philosophy in the flesh. New York: Basic Books.
Langacker, Ron (1987). The Cognitive Perspective. CRL Newsletter. Vol. 1(3). UC, San Diego.
Langacker, Ron (1988). An Overview of Cognitive Grammar. In B. Rudzka-Ostyn (Ed.), Topics
in Cognitive Linguistics. Amsterdam: Benjamins.
Langacker, Ron (1990). The Rule Controversy: a Cognitive Grammar Perspective. CRL
Newsletter, 4(3). University of California, San Diego.
Newmeyer, F. J. (1986). Linguistic Theory in America: The first Quarter-Century of Transformational Generative Grammar. New York: Academic Press.
Palmer, Gary (1996). Towards a Theory of Cultural Linguistics. Austin: University of Texas Press.
Pylyshyn, Zenon (1984). Computation and Cognition: Towards a Foundation for Cognitive
Science. Cambridge, MA: MIT Press.
Talmy, Len (1996). Fictive motion in language and “ception”. In P. Bloom, M. A. Peterson, L.
Nadel, & M. F. Garrett (Eds.), Language and space (pp. 211–276). Cambridge, MA: MIT
Press.
Cultural models and conceptual mappings
chapter 
When does cognitive linguistics
become cultural?
Case studies in Tagalog voice
and Shona noun classifiers
Gary Palmer
University of Nevada at Las Vegas
In cultural linguistics, grammar is seen as governed by cultural schemata rather
than universal innate or emergent cognitive schemata. Sources of linguistically
determinant schemata include mythology, social structure, repetitive domestic
and subsistence activities, and salient rituals. Two noteworthy types of cultural
schemata are scenarios, which model social action and discourse, and polycentric
categories, which elaborate the complex and radial category types of Langacker
(1987) and Lakoff (1987). These concepts will be demonstrated in two case
studies: In Tagalog, an Austronesian language, grammatical voice used in
emotional expression expresses elementary scenarios of control and non-control.
In Shona, a Bantu language, noun classifiers are governed by polycentric
categories pertaining to salient domestic and ritual scenarios.
Keywords: categories, Bantu, Austronesian, scenarios, cultural linguistics
.
Introduction1
Ronald Langacker (1999: 13) has noted that “language is an essential instrument
and component of culture, whose reflection in linguistic structure is pervasive and
quite significant” (1999: 16). This observation provides an excellent starting point
for cultural linguistics, an approach which foregrounds cultural schemata in explanations of grammar and semantic patterns (Palmer 1996). In this respect, it contrasts with the typical practice of cognitive linguistics, which foregrounds universal
cognitive processes such as figure-ground relations, force dynamics, emergent categories, and Idealized Cognitive Models, leaving cultural dimensions of language
somewhere in the background, or at least unlabeled as such. Cultural linguistics is
not so much a new theory as a shift in emphasis. It draws on the theory of cognitive linguistics for many essential analytical concepts, but it takes a point of view
from the margin of cognitive linguistics as it is typically practiced. It is an extension of cognitive linguistics into cultural domains, as foreshadowed in the writings
of Langacker (1987, 1991a, b), Lakoff (1987), and others.
Specifically, I am claiming that many grammatical phenomena are best understood as governed by cultural schemata rather than universal innate or emergent
cognitive schemata. The sources of such cultural schemata include mythology,
such as the Australian Dyirbal myth of the sun and moon, which George Lakoff
used to explain membership in Dyirbal noun classes (Lakoff 1987). They also include social structure, repetitive domestic and subsistence activities, salient rituals,
and a host of other cultural phenomena. This cultural emphasis makes it essential
that the linguist either produce or survey ethnography pertaining to the linguistic
topic under study. As Mylne (1995) argued in a critique of Lakoff’s interpretation
of Dyirbal classifiers, linguists cannot rely solely upon their own intuitions about
the semantics of complex domains, but should instead attempt to discover which
concepts have particular relevance for speakers.
Unlike postmodernist cultural theory, which posits no fixed points of reference
or stable meanings, cultural linguistics depicts grammar as an entrenched system
of meaning and form. Following Langacker’s (1987, 1991a, b, 1999) theory of cognitive linguistics, the minimal units of grammar are verbal symbols, each of which
represents a linkage of two kinds of units, one phonological, the other semantic.
Semantic units are characterized relative to semantic domains (1987: 63). Since
these may include any concept or knowledge system, linguistic semantics is encyclopedic and very much a cultural entity. When a class of linguistic expressions is
seen as relative to one or more semantic domains of relatively extensive scope with
complex category structures and rich details, then cognitive linguistics becomes
decidedly cultural. It is this difference in emphasis and elaboration of the cultural
dimension, not an underlying difference in theory, which justifies the new label of
cultural linguistics. The label also differentiates the approach from that of contemporary linguistic anthropology, which is typically discourse-oriented and heavily
invested in pragmatism and political economic or feminist theory, often displaying
scant interest in cultural categories or cognitive processes. In my view, culture and
cognition are not separate entities, just two views on the process whereby people
with minds, which are embedded in physical bodies situated in social and physical environments, communicate, learn, think, and pursue social goals. Similarly,
Edwin Hutchins (1996: 354) proposed an integrated view of human cognition, “in
which a major component of culture is a cognitive process . . . and cognition is a
cultural process.”
Certain types of cultural models merit special attention from linguistic anthropologists and culturally oriented linguists. These are scenarios (including
discourse scenarios) and polycentric categories. The use of these concepts will be
demonstrated in two case studies: (1) voice and emotional expression in Tagalog,
an Austronesian language; and (2) noun classifiers in Shona, a Bantu language.
The first case will deal with elemental scenarios underlying grammatical voice in
the emotion language that appears in a Tagalog video melodrama dealing with
a couple living in transnational circumstances. I will show how the highly abstract scenarios underlying voice are instantiated in the emotional discourse of
melodrama and provide the key to understanding that discourse. In the case
study of Shona, I demonstrate that a better understanding of noun classifiers can
be achieved by analyzing each classifier as a polycentric category. The latter is a
synthesis of Langacker’s (1987) concept of complex category with Lakoff’s (1987)
concept of radial category. Unlike the radial category, which has a single central
prototype category, a polycentric category has multiple central categories connected by conceptual metonymies. In the next section I will elaborate on these
concepts. Then, in the following sections, I will apply them to the case studies.
Operational concepts
Scenarios
Scenarios are schematic cultural models of action. Cultural linguistics is based on
the premise that grammar is relative to cultural models and culturally defined
imagery. Cultural models are cognitive entities, but they are often more richly
elaborated and further removed from basic physical and cognitive experience than
the spatial-mechanical schemas and figure-ground relations typically investigated
within cognitive linguistics. Examples of cultural models include the conventional
knowledge systems governing kinship, ways of preparing food, navigation, rituals,
myths, ceremonies, games, and speech events such as conversations. Imagery arises
from construing models at different levels of abstraction, from different points of
view, or at different stages in a process,2 and from admitting various features of
models within the scope of attention (Langacker 1987; Lakoff 1987; Palmer 1996).
Cultural models include some, but perhaps not all, of what Lakoff (1987: 113–
114) termed Idealized Cognitive Models, in which he included propositional,
image-schematic, metaphoric, and metonymic models. Universal image-schemas
derived solely from the common experience of inhabiting a human body would
not in themselves be cultural models. However, universal image-schemas may be
incorporated into cultural models, and in fact most physical experience reflects
not only universal constraints, but also cultural modifications or culturally specific uses of tools, dwellings, and habitats. Embodied universal categories may
simultaneously belong to cultural domains.
With respect to metaphoric and metonymic models, it seems more accurate
to speak of metaphoric relations between models or parts of models, or to say
that models comprise functional relations, which provide the material for verbal
metonymy. But again, these distinctions are not theoretically crucial so long as cognitive linguistics provides a role for cultural constraints on grammar, as Langacker
and Lakoff have done. It is useful to explicitly recognize the elements of convention and social construction by referring to some kinds of linguistically significant
models as cultural, while conceding that all cultural models are also cognitive.
Most ICMs are cultural products, and the same may be said for domains of experience (Lakoff 1987). Thus, it seems appropriate to refer to an approach which
examines such cultural constraints on language as cultural linguistics. By using the
term, we make it obvious that existing ethnographic studies contain a wealth of
information of potential immediate use to linguistic theory.
Relatively abstract or decontextualized images are called schemas or image-schemas. Those involving actions and sequences of actions are scenarios. The
scenario concept is particularly important in cultural linguistics because the term
directs attention to the imagery of social action and discourse, which has largely
been overlooked by cognitive linguistics, particularly in the study of non-Indo-European languages. The reason for this neglect may lie in the fact that scenarios
are strongly influenced by history and socio-cultural context and therefore relatively independent of more basic cognitive processes of attention, accessibility or
saliency of information, and basic concept formation which many linguists regard
as the strongest determinants of grammar. It is true that Langacker (1987: 63) included as possible semantic domains “the conception of a social relationship” and
“the speech situation”, but at the very least, one can say that social scenarios have
not been clearly delineated as a type of imagery having linguistic significance to
the same extent as, for example, spatial imagery. And yet, humans probably direct
as much verbal attention to orienting in society as they do in space, if not more.
Not all of this social orientation can be reduced to metaphors of force and space.
The approach pursued here resembles that of Anna Wierzbicka in that her cultural scripts are something like scenarios (Wierzbicka 1996, 1997; Palmer 2000).
However, unlike Wierzbicka, I do not reduce scenarios to statements composed of
a small set of semantic primes arranged according to the rules of a semantic metalanguage. I take scenarios to be gestalts or constructions built up from lower-level
scenarios and event-schemas.
Discourse scenarios and discursives
The discourse-relevant content of forms and constructions is not always obvious. Much attention has been devoted to discourse particles, but verbs or verbal
morphology may also predicate information pertaining to discourse and human
interaction, notably information on the agency of actors or interlocutors, as I will
show in the Tagalog case study.
Cultural linguistics approaches discourse by following two principles: (1) part
of the meaning of every lexeme or construction is its habitually situated use in
discourse; (2) discourse is governed by scenarios of verbal and social interaction.
The first principle follows from Langacker’s premise that “any facet of the context
[of a usage event] that consistently recurs across a set of usage events can be retained as a specification of the schema that emerges from them” (Palmer 1996: 40;
see also Langacker 2001). The usage principle may seem obvious, but the implications for cognitive linguistics have not been clearly drawn. Of course it means that
discourse follows culturally specific patterns and sequences, but it also means that
most discourses consist partly of verbal particles, lexemes, and longer utterances
whose predicational content is the discourse itself, meaning its participants, verbal
events, and prosodic qualities.
Since verbal discourse is so pervasive in human life, much of the lexicon
and grammar of any language must be about discourse scenarios. Thus, we have
metadiscursive terms and expressions like lie, gossip, shut up, be attentive, and be
on the stump (give speeches in a political campaign). The domain of terms and
expressions that predicate discourse scenarios includes that of speech act terms,
but it is more comprehensive. For example, the construction be attentive is not,
strictly speaking, a speech act, but it does predicate a construal of one aspect of a
discourse scenario.
Terms whose main function is to predicate some aspect of ongoing discourse
in which the speaker is engaged may be termed discourse indexicals, or just discursives (Palmer 1996: 207).3 These would include discourse particles such as English
um, oh, and uh huh, Japanese yo, some tag questions, and English like when used as
a presentative or quotative (e.g. She was like [quote, pseudoquote or experiential
state]). The so-called discourse particles are seen not as mere non-propositional
forms (Stubbs 1983), non-referential indexicals (Silverstein 1976), conversational
reflexes, pointers, meaningless elements, or strategic moves (Clark 1996) that are
qualitatively different from other terms, but as terms that predicate much as other
terms do. They are verbal symbols whose semantic domain happens to be the
ongoing and ambient discourse itself as performed by both speaker and listener.
Thus, discursives may even be evaluative, as when English So? is used to question
the significance of a preceding statement and is riposted with a Sooo?! that sarcastically questions the validity of the original question. Discursives may pertain
to situation, interactional structure, pragmatic intentions, ideological content, or
phonological shape of discourse. Since each culture develops its own unique discourse imagery, this is a potentially important topic in cultural linguistics.4 Many
other terms and expressions may be said to have discourse indexicality or discursiveness as a peripheral part of their meaning (compare Langacker 1987: 63).
Figure 1. Complex category as envisioned by Langacker (1987) [diagram: SCHEMA, PROTOTYPE and VARIANT, linked by two elaboration relations and one extension relation]
Langacker (1991b: 318) defined the term ground as “the speech event, its
participants, and its immediate circumstances . . . .” Since discursives predicate
about ongoing discourse, which is necessarily part of the grounding situation,
one might theorize that they will sometimes predicate speakers’ perspectives. One
could investigate their distribution across the dimension of subjectivity-objectivity
(Langacker 1990). A participant may take a subjective perspective on the speech
event, in which case she herself lies outside the perceptual field; or she may construe the event and her own role in it objectively, in which case she herself lies
within the perceptual field. Japanese yo, for example, has a sense something like
I am telling you or pay attention to what I just said, but the participants are tacit,
suggesting a subjective perspective and a focus on the discourse events rather than
the participants, whereas an English tag question, such as “Am I right?” with an
explicit pronoun for speaker, is a discursive suggesting an objective perspective
on speaker in Langacker’s sense. The topic of discursives will not be discussed
further in this paper, but I mention it as meriting further cross-linguistic and
cross-cultural study.
Categories: Complex, radial, and polycentric
Cognitive linguistics presents us with at least two types of complex categories. The
first is Langacker’s, which he characterizes simply as a complex category (Langacker
1987: 373; see also Palmer 1996: 96–97). It begins with a prototype and a variant.
Since these necessarily have something in common, there is also a schema, which is
elaborated by both the prototype and the variant (Figure 1). Langacker’s complex
category appears to have no place for conceptual metonymy.
Another kind of complex category is the radial category as described by Lakoff
(1987). A radial category has a central subcategory and non-central extensions or
variants. This is very much like Langacker’s model, except that Lakoff does not include the schemas which can be abstracted from each extension of the prototype to
a variant. In his discussion of Dyirbal noun classes, Lakoff also states that “complex
categories are structured by chaining; central members are linked to other members, which are linked to other members, and so on” (1987: 95). Some of the links
which he describes are conceptual metonymies (the sun is linked to sunburn);
others are similarities (sunburn is linked to the sting of the hairy mary grub),
or prototype to variant relations (women to the sun, who is a mythical woman).
Figure 2. Radial category balan as envisioned by Lakoff (1987: 103)
Rather vaguely, he asserted that Experiential Domains and Idealized Cognitive Models can “characterize links in category chains” (1987: 95). A bit of cultural theory
seeps in as well: “Experiential Domains . . . are basic domains of experience, which
may be culture-specific” [bold face added].
I hold that such linguistically significant experiential domains are in most instances actually cultural scenarios that have been given high salience by virtue of
occurring in myth, ritual, crisis, social structure, or even the daily drudgery of domestic life. The functional links within domains are what we regard as conceptual
metonymies. In a further suggestion of the importance of conceptual metonymy
over schematization, Lakoff asserted that “specific knowledge (for example knowledge of mythology) overrides general knowledge” (1987: 96). We are left with a
picture of a category that has a central prototype from which radiate a number of
chains based on similarity and conceptual metonymy (Figure 2).
Lakoff used this concept to develop a theory of Dyirbal noun classifiers. Three
of the four classifiers were characterized as radial categories (bayi, balan, balam).
The fourth (bala) was characterized as an ‘everything else’ category. Noun classifiers represent a common and important kind of grammatical category, which
was once thought to be arbitrarily organized. Lakoff (1987) demonstrated that a
class may have hundreds of members that share no common features of meaning. In my opinion, this important advance in the theory of linguistic categories
depended crucially on understanding the governing role of cultural scenarios.
Tom Mylne (1995) took issue with Lakoff’s (1987) analysis of Dyirbal noun
classifiers, accusing him of imposing a Western world view on the Dyirbal system because it proposed human males and females as prototypes for the classes
bayi and balan. Mylne proposed instead that the linguist should seek to discover
which concepts have particular relevance for the Dyirbal and use these as the basis
for the analysis. He proposed that the four classes of bala, balam, bayi, and balan
could each be defined by combinations of values on the dimensions of potency
and harmony, which have special relevance in Dyirbal culture and society. Thus,
Mylne’s critique appears to be an argument for an explanation that is more cultural than cognitive, but based on parameters or features, rather than on scenarios
or cultural models.
My analysis of classifiers is like Mylne’s in two respects: First, I am arguing
that the important criteria for classification are concepts that are culturally salient.
Second, I am arguing that one finds no single prototype at the center of a typical
noun class. But unlike Mylne, I do not try to explain the category by replacing
the prototype with one or two abstracted dimensions. Similar approaches have
been attempted in Bantu studies (Contini-Morava 1994; Spitulnik 1987, 1989)
with unsatisfactory results, as discussed by Palmer and Arin (1999) and Palmer
and Woodman (1999).
A third type of complex category is the polycentric category as proposed by
Palmer and Woodman (1999). A polycentric category has multiple central categories, each of which may be a scenario or a prototype derived from the scenario. I
show only scenarios in the central region of Figure 3. I treat the central categories
as a functional complex, rather than as parameters which must have contrasting
values across categories, though I would not rule out the possibility of a level
of contrast that would apply across classes to subsets of category members. The
central categories are related to one another and to more peripheral categories
and instances either by function (contiguity, conceptual metonymy), by similarity
(prototype to variant, metaphor), or by schematization (schema to instantiation).
I call these complexes polycentric categories. They consist in part of complex categories as defined by Langacker (1987: 373) and of radial categories as defined by
Lakoff (1987). Since the cognitive links of polycentric categories are all embedded
in cultural scenarios and other sorts of cultural models, the polycentric category is at once both
cognitive and cultural.
Case studies
Case 1: Grammatical voice and emotion language in Tagalog5
The notion of agency itself represents a very abstract schema of social interaction
in which the subject or focal participant initiates or performs an action. In many
languages it is uncommon to explicitly mention agents of transitive constructions,
so that sentence subjects are often experiencers or objects of transitive actions.
Mention of a transitive agent may require explicit ergative marking on the noun.
In Western Samoa, Alessandro Duranti (1994: 114–143) found that participants in
village council meetings were reluctant to define agents in the beginning part of the
[Figure 3. Schematic of polycentric category as proposed by Palmer and Woodman (1999). Node labels: SCHEMA, PROTOTYPE, VARIANT, SCENARIO A, SCENARIO B, SCENARIO C. Key: elaboration, extension, metonymy; links labeled f.]
When does cognitive linguistics become cultural?

meetings. In transcriptions of the meetings, transitive clauses with ergative agents
were not very frequent (1994: 125). They appeared only where participants were
receiving credit or blame, or where “the power of certain individuals or groups to
affect others through their actions or to cause or initiate events is at least acknowledged” (1994: 126). The person with the highest incidence of ergative agents in his
speech was the senior orator who chaired the meeting and also acted as prosecutor or instigator. References to actions of the Almighty also place the Lord in the
ergative case, as in example (1) (1994: 126).
(1) e   fa’alava     e    le   Akua  mea    ‘uma.
    ta  caus+enough  erg  art  Lord  thing  all
    “The Lord makes all things sufficient.”
Speakers avoid focusing the agency of participants by placing actors in prepositional or genitive phrases. While some are fixing responsibility and laying blame
with ergative constructions, others are dodging responsibility and denying blame
with genitive or prepositional constructions, or with vague language. Duranti
pointed out that speaking with ergative agents constructs relations of power as
much as it reflects them. The powerful may use ergative constructions to frame
the situation, but the less powerful use them at their own risk.
By demonstrating the usage of the ergative construction in political scenarios, Duranti has shown that the grammar of agency participates in the culture of
power. Making the connection is not as straightforward as relating a deictic term
or a spatial preposition to a physical scene, because the construal of social events is
much more problematic than the construal of basic spatial conformations. Further
complicating the analysis is the fact that the language of agency is not independent
of the social process. The discourse and its grammar participate in the scenario,
co-constituting it along with other symbolic acts, such as seating arrangements,
turn-taking, and presentations of gifts or titles. There are ways to evaluate dimensions of social scenarios independently from their discourse, but for the moment
I am resorting to an interpretive approach. This is not an unusual limitation, because linguists are seldom able to provide rigorous proofs of the semantic basis for
their grammatical categories.
My case study of voice in Tagalog emotion language is parallel to Duranti’s
study of ergative constructions in Samoan council discourse. Agency schemas
underlie grammatical voice at its semantic pole. Tagalog lacks an ergative case construction, but there are other grammatical similarities, no doubt based on the fact
that Samoan and Tagalog are distantly related Austronesian languages. For example, both languages commonly place non-focused agents in genitive phrases.6 In
Tagalog, several verbal affixes predicate the agency or the lack of agency of the
focal participant in a clause. Interpreting their meaning with respect to agency
of participants in a discourse is not always straightforward, because the speaker may
be referring to the agency of self, of interlocutor, or of some third party. Nevertheless, the attribution or denial of agency can be shown to make sense in the
discourse context. I will not be arguing that emotion language in Tagalog differs
greatly from language in other domains of culture and discourse, only that making
the governing scenarios explicit helps us to understand agency and voice in Tagalog. When comparable studies become available, the scenarios governing voice can
be compared cross-linguistically. However, it does seem likely that emotional language is particularly sensitive to the nuances of semantic agency evocable by voice
constructions. It therefore provides a good domain for the study of voice.
This case study examines grammatical voice in the emotion language that appears in a Tagalog video melodrama dealing with a couple living in transnational
circumstances. I will demonstrate that the protagonists in this melodrama most
often present themselves and one another either as grammatical experiencers or
patients. Similarly, others also represent them as patients or as needing to acquire
agency. In those instances when they are assigned actor roles, they are seldom
placed in grammatical focus.7 It is only in moments of crisis that they assume the
language of strong personal agency by using forms in which they, as grammatical
participants, take on active focus.
Nominal participants in Tagalog are said to be focused if they are preceded
by the referential (ref) determiner ang, which contrasts with the genitive (gn)
marker ng [nang] and the directional (drc) preposition sa. There are also pronouns and personal name markers that correspond to ang, ng, and sa phrases.
Each of the voice affixes places certain kinds of nominal participants in focus. The
most common transitive construction occurs with a null voicing affix (though often with the ni- (-in-) realis prefix or -in irrealis suffix which is sometimes regarded
as a voicing affix). The construction, regarded as a kind of passive in less technical
grammars, requires that a profiled actor – if one is profiled – be genitive and a
profiled undergoer be focused, as in (2). Some linguists regard this construction
as evidence that Tagalog is an ergative language (Cooreman, Fox, & Givón 1984).
Focused undergoers have low topicality.
(2) Gagaw-in   ko     ang  lahat       upang        ma-kamt-an         ito.
    do-irr:uf  1p:gn  ref  everything  in.order.to  irr:nc-obtain-loc  prox:ref
    “I will do everything in order to obtain this.”
The non-control affix ma- also places a patient or experiencer in focus, as in (3),
which uses the prefix in its realis form na- and the referential focus pronoun ako
rather than an ang-phrase.

(3) na-tatawa         ako,    hi, hi, hi, hi,  sa   ‘yo8
    nc:rl-incm-laugh  1s:ref  hee, hee,. . .   drc  2s:drc
    “I was amused, hee, hee, hee, hee, at you.”
Other affixes (mag-, -um-) place actors in focus. Two examples appear in (4).
(4) Ngayon  ako’y       nag-sisisi         kung  bakit  ako     nag-‘I love you’!!!9
    now     1s:ref-inv  af:rl-incm-regret  cond  cond   1s:ref  af:rl-‘I love you’
    “Now I am regretting ever saying ‘I love you’!!!”
Tagalog has many ways of verbalizing or predicating emotional experience. Example (3) illustrates the Use of Emotion Terms (na-tatawa) and the Mimesis of
Psycho-ostensives (hi, hi, hi, hi). Example (4) illustrates the Use of English Emotion Terms in Tagalog or Mixed Text. Other ways of verbalizing emotions are listed
in (5) to (11).
(5) Obscenity
    the lady just kept swearing  banal  na  aso,  santong   kabayo10
    the lady just kept swearing  holy   lg  dog,  pious-lg  horse
    “the lady just kept swearing ‘holy dog, horse saint’”
(6) Description of Psycho-ostensives
    katakot-takot  na  kamot    si  kaka’y  napadaing11
    st-fear-r2     lg  scratch  pn  prnm    rl:ncf-ger-cry.out
    “horrific scratches, Kaka cried out”
(7) Repetition
    ako,    mahal  kita,  mahal  na  mahal12
    1s:ref  love   2s:1s  love   lg  love
    “I love you, love of love”
(8) Use of Verb with Process that Results in Emotion or Feeling
    Hindi  mo     alam  kung  gaano  mo     ako     sasaktan.13
    neg    2s:gn  know  cond  how    2s:gn  1s:ref  incm:injure-loc
    “You don’t know how much you hurt me.”
(9) Description of Facial Expressions (Conceptual Metonymy)
    gumulong  at   nagkaduling-duling14
    af:roll   and  af:rl-st-cross.eyed-r2
    “he rolled on the floor and got cross-eyed”
(10) Use of Metaphor
    nababato           ako     gusto  kong   umuwi15
    ncf:rl-incm-stone  1s:ref  like   1s:gn  af:irr-go.home
    “I am turned to stone, bored, my desire is to go home.”
(11) Denial of Emotion
    Matuto         kang       maging          manhid.16
    ncf:irr-learn  2s:ref-lg  ncf:irr-become  numb
    “Learn to become insensitive.”
What one quickly notices in these expressions is that many emotion terms have
verbal affixes, each of which conveys mood as well as voice. Examples include
(3) na-ta-tawa ‘I was amused’, (4) nag-si-sisi ‘I am regretting’ and the interesting nag- ‘I love you’, which uses an English phrase as a verb stem, (6) na-pa-daing
‘he cried out’, (8) sa-sakt-an ‘(someone) hurt (someone)’, (9) nag-ka-duling-duling
‘got cross-eyed’, (10) na-ba-bato ‘was turned to stone’. These forms all happen to
be realis mood. Irrealis forms would be matatawa, mapadaing, magkadulingduling,
etc. My arguments regarding voice in Tagalog emotion language hinge mainly on
the distribution of non-control (ncf), undergoer-focus (uf), and agent-focus (af)
forms. The distinction between realis and irrealis is not without interest, but it is
not crucial to the argument. Aspect is most often either completive, which is unmarked, or incompletive, signified by reduplication operating on the first syllable
of the root as in (3) na-ta-tawa, (4) nag-si-sisi, and (10) na-ba-bato.
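The incompletive pattern just described can be approximated in a short Python sketch. It is offered only as an illustration: the function names are hypothetical, and the sketch assumes that the "first syllable" is everything up to and including the first vowel of the root, which holds for the roots cited here (tawa, sisi, bato) but is not a general account of Tagalog reduplication.

```python
VOWELS = "aeiou"


def reduplicate_first_syllable(root: str) -> str:
    """Prefix a copy of the root's first syllable (assumed: up to the first vowel)."""
    for i, ch in enumerate(root.lower()):
        if ch in VOWELS:
            # Copy everything through the first vowel and attach it to the root.
            return root[: i + 1] + "-" + root
    # No vowel found: leave the root unchanged rather than guess.
    return root


def incompletive(prefix: str, root: str) -> str:
    """Combine a voice/mood prefix (e.g. 'na', 'nag') with the reduplicated root."""
    return f"{prefix}-{reduplicate_first_syllable(root)}"


if __name__ == "__main__":
    for prefix, root in [("na", "tawa"), ("nag", "sisi"), ("na", "bato")]:
        print(incompletive(prefix, root))
```

Run on the three roots above, the sketch reproduces the segmented forms na-ta-tawa, nag-si-sisi, and na-ba-bato cited in the text.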
Voice affixes ma-, -i- and -an put non-agentive participants in focus. The focal participant of ma- may be merely an experiencer, but i- and -an require focal
participants to be undergoers. All three may be said to have undergoer focus (ug),
but they are usually designated as stative focus (sf), undergoer-focus (uf), and
locative focus (lf). Rather than stative, I use the term non-control (ncf), because
it more accurately subsumes the variety of meanings. The undergoer-focus affixes
i- and -an contrast with affixes mag- and -um-, which have active agent-focus (af).
The forms with initial m- (ma- and mag-) are irrealis. They have realis counterparts na- and nag- and gerund forms pa- and pag-. Related to mag- is maN-, a
form that has more idiosyncratic semantics. Suffix -in is irrealis, but it occurs frequently in undergoer focus constructions. The semantics of the voicing affixes is
summarized in Table 1.17
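As a rough aid to the reader, the affix inventory just described can be written as a small lookup in Python. This is a sketch under stated assumptions, not part of the analysis: the dictionary and function names are hypothetical, the labels NCF, UF, LF, and AF follow the abbreviations used in this section, and maN- is grouped with the agent-focus affixes only because it is described as related to mag-.

```python
from typing import Optional

# Irrealis (base) forms of the voicing affixes and the focus each assigns.
FOCUS_BY_AFFIX = {
    "ma-": "NCF",   # non-control focus: experiencer or patient
    "i-": "UF",     # undergoer focus
    "-an": "LF",    # locative focus
    "-um-": "AF",   # agent focus
    "mag-": "AF",   # agent focus
    "maN-": "AF",   # assumption: grouped with mag-, to which it is related
}

# Realis and gerund counterparts mapped back to their irrealis base forms.
BASE_FORM = {
    "na-": "ma-",
    "pa-": "ma-",
    "nag-": "mag-",
    "pag-": "mag-",
}


def focus_of(affix: str) -> Optional[str]:
    """Return the focus label for an affix, or None if it is not listed."""
    base = BASE_FORM.get(affix, affix)
    return FOCUS_BY_AFFIX.get(base)


def puts_agent_in_focus(affix: str) -> bool:
    """True only for affixes that place an actor in grammatical focus."""
    return focus_of(affix) == "AF"


if __name__ == "__main__":
    for affix in ["na-", "nag-", "-an", "i-"]:
        print(affix, focus_of(affix), puts_agent_in_focus(affix))
```

The lookup deliberately ignores the modal prefix ni-/-in- and the suffix -in, whose status as voicing affixes is left open in the discussion above.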
The concern in this paper is how these forms are used in actual discourse to
communicate agency or lack of agency on the part of the central participants.
Since I am positing that the affixes of voice predicate elemental scenarios of action and agency, it seems useful to represent their semantics with a few heuristic
diagrams, as in Figures 4 to 9 in which the stick figures represent the protagonists
Agnes and Jerry, when they are speaking or when they are being spoken to by others. As speakers, they may speak to each other, or more often to a third person.
I have no examples in which Agnes or Jerry are spoken about as third persons.
The stick figures may seem gratuitous, but I use them to emphasize that the voice
affixes in the verbs of emotion language predicate scenarios with human agents
and patients.

Table 1. Focus and semantics of Agency in the voicing affixes
Focus              Morphology  Semantics of figure                                  Example
Non-control (NCF)  ma-         Experiencer or patient                               ma-rinig ‘(x) be able to hear’
Undergoer (UF)     -i-         Reason for doing; Conveyance of patient; Instrument  i-kukwento ‘tell story-(x)’
Locative (LF)      -an         Goal or location                                     sasakt-an ‘injuring (x)’
Agent (AF)         -um-        Performs or initiates action                         um-uwi ‘(x) went home’
                   mag-        “                                                    nag-sisisi ‘(x) is regretting’
                   maN-        “                                                    nang-galing ‘(x) came from’
* In full clauses the arguments corresponding to “(x)” in the examples would appear as focused
nominals. The prefix ni-/-in- is treated as modal rather than voicing, though it commonly occurs
with otherwise unmarked undergoer focus.
I am regarding Tagalog grammatical focus as a means of profiling participants
and processes. Profiling means that an expression specifically designates a particular substructure within a conceptual base or scope of predication (Langacker
1999: 27). “The entity designated by a predication – what I will . . . call its profile –
is maximally prominent and can be thought of as a kind of focal point” (Langacker
1987: 118). Thus, I take grammatical focus in Tagalog to be a marker of salience. If
an actor has grammatical focus, I take it as a marker of the salience of agency. If an
experiencer or undergoer has grammatical focus, I take it to mark lack of agency.
In Figures 4–8, profiled elements are drawn with bold lines. Figure 4 represents
the situation in which an actor or agent is in focus, and it follows that the action
must also be salient, so the arrow also appears in bold. Actually, the grammatical
actor in Figure 4 is ang kalooban, ‘inner feelings’, which I have represented with the
gray circle in the chest region of the stick figure. Figure 5 reverses the focus, placing it on an undergoer. Participants in this scenario lack personal agency – they
are acted upon. Figure 6 represents the participant as experiencer, another situation in which personal agency is lacking. Figure 7 represents the conceptualization
underlying a clause in which the agency of a central participant is denied. Predication of denial is accomplished by the construction hindi hindi . . . kayang ‘neg
neg . . . be able’. Figure 8 shows a scenario in which an actor surrenders personal
agency by a metaphorical act on an object in the body. The bold box represents
an abstract entity, here instantiated by pride, which was metaphorically swallowed
(ni-Ø-lunok), or the heart (puso), which was allowed to prevail (< ni-pa-iiral).
Both constructions require undergoers, as indicated by the prefix pa- and the null
prefix. The ni-/-in- prefix in both verbs is realis mode rather than voice. The dotted line indicates that the two figures represent the same person. The metaphorical
actor is the actual experiencer. The figure in the target concept (box) is drawn in
light lines to show that the resulting status is not explicitly verbalized. The scenarios in these figures are sufficient to characterize most of the emotion language in
Sana’y Maulit Muli.
[Figure 4. Agent Focus (AF). Affixes: -um-, mag-. Nag-su-s-um-igaw ang kalooban ko* ‘I am shouting out my inner feelings’ (*‘My inner feelings are shouting out.’)]
[Figure 5. Non-control, Undergoer & Locative Focus (NCF, UF, LF). Affixes: ma-, i-, -an, Ø-. Na-niwal. Bakit niya ako ni-Ø-loko? ‘I believed him. Why did he fool me?’]
[Figure 6. Experiencer (Non-Control) Focus (NCF). Affix: ma-. Na-ba-bato ako. ‘I am turned to stone (bored).’]
[Figure 7. Denial of Agency. Alam mong hindi hindi ko kayang mag-mahal... ‘You know I can’t love...’]
[Figure 8. Metaphorical surrender of Agency. Labels: metaphor, source, target, Experiencer, Undergoer Focus (UF). Affixes: pa-, Ø-. Ni-Ø-lunok ko ang pride ko. ‘I have swallowed my pride.’ Hindi puweding ang puro puso p-in-a-iiral. ‘You cannot let the heart prevail.’]
The video Sana’y Maulit Muli ‘I Hope It Will Be Repeated Again’ dramatizes
several facets of the predicament of the Tagalog transnational community, at least
as experienced by two young middle-class lovers, Agnes and Jerry.18 The two experience anguished separation from home, family, and friends, as well as from each
other. They encounter dehumanizing ideologies and onerous social demands of
the market economy. They are exploited by callous employers and immigration
officials. Agnes discovers the freedom, danger, and loneliness of feminine self-reliance. Both succumb to the temptations and comforts of consumerism. The
emotional conversations of Jerry and Agnes, and of each with others, appear to be
largely about the loss and recapture of personal agency.
Of special interest in the film is the frequent use of emotional expressions suggesting lack of control. Expressions revealing active control with protagonists in
grammatical focus (af) appear only as directives received by them and as uncharacteristically assertive outbursts occurring in moments of crisis. Jerry is ambitious
and spends a lot of time with his attractive boss, Cynthia, often leaving Agnes
alone. Agnes’s mother wants her to come to the United States, where the mother is
living. Jerry’s cousin Nick arrives from America, looking rich and important. After
a difficult interview with an immigration officer, Agnes is moping about the house,
dreading the thought of leaving Jerry. Her aunt tells her to take control of her life.
She perceives Agnes as allowing the heart to rule. Here Agnes is the tacit actor for
p-in-a-i-iral ‘let prevail’ (< ni- + pa- + i- + iral), but she shows a lack of agency
by letting the heart prevail (12). Rather than Agnes being focused as actor, it is the
heart, itself a metaphor for lack of control, that is focused as grammatical patient.
Pa- is the gerund form of non-control focus ma-. Here it has the sense of ‘let’.
(12) hindi puweding puro puso ang p-in-a-i-iral
neg can.be-lg pure heart ref rl-ger-incm-prevail
“You can’t allow the heart to rule” ∼ “you cannot let the heart prevail.”
When departure seems imminent, Agnes says, “Don’t let me go; I don’t want to
go.” Jerry says “Remember, you are loved, loved, loved (by me).” (13). Mahal kita
is usually translated as ‘I love you’, but it is non-control focus, meaning that the
person loved is given grammatical focus.19 Because focus is on the patient, this
construction does not highlight the agency of either participant. Kita is a portmanteau form that conflates second person singular experiencer and first person
singular actor. At the denouement of this story, we will hear Jerry use the active
form mag-mahal, highlighting the role of human agency.
(13) mahal          na  mahal          na  mahal          kita.
     irr:ncf-loved  lg  irr:ncf-loved  lg  irr:ncf-loved  2s:ref
     “You are loved, loved, loved (by me).”
Agnes goes to San Francisco, where she becomes ‘stoned’ with boredom, using the
non-control prefix na- (14). Her non-agency is salient.
(14) na-ba-bato        ako
     rl:sf-incm-stone  1s:ref
     “I am stoned [turned to stone].”
Agnes’s brother and sister mistreat her. Her mother, urging a more active role on
her, tells her she has to use her brain: gamitin mo ang utak mo. Agnes says,
(15) Ayoko          dito.
     dislike:1s:gn  prox:loc
     “I don’t like it here.”
(16) Wala  akong      ka-kampi,       ma-ma-matay       ako     sa   lungkot.
     lack  1s:ref-lg  incm-take.side  ncf:irr-incm-die  1s:ref  drc  melancholy
     “I’m not taking sides, I’m dying of home sickness.”
In (15), ayoko is a contraction of ayaw ko, so this is an instance of Agnes taking
the role of agent, as indicated by genitive ko, but ko is not a focus-pronoun, so her
agency is non-salient. In (16), Agnes is again the agent of taking sides, but she denies
her agency. In the next clause, the metaphor mamamatay ‘dying’, is non-control
and Agnes is the focal participant, so her non-agency is salient.
Back in Manila, Jerry’s mother interferes with their phone calls. Due to lack of
communication and miscommunication, their relationship is starting to get blurry
(nag-ka-halabu-an < labo ‘blur’). The prefix is active, but neither Agnes nor Jerry
is the agent. Jerry is torn over his relationship with his boss. As Jerry and the boss
sit in his car, he speaks of his feelings, using the term pa-ki-ramdam ‘feelings’, a
conventional form based on the gerund form of non-control ma- (17). The term
is used for transient feelings caused by outside events that affect the body or the
emotions. Jerry speaks of hearing the crying of Agnes, using the non-control na-ri-rinig (18), and realizing that she has resentment towards him (19). His non-agency is salient. The word tampo predicates a feeling of anger and hurt, a sense of
sulkiness, often felt between two people who are close or love each other. Tampo is
presented as a bare root, suggesting a nominal interpretation, which is reinforced
by the referential preposition ang. Agnes has some agency here, as her feeling is
directed towards Jerry, but the pronoun referring to Agnes is the genitive niya. She
is not in grammatical focus, so her agency has low salience. If there is a focus at
all in this clause, it is the resentment itself, which is preceded by ang, the referential
preposition used with focal participants.
(17) Ang  pa-ki-ramdam  ko     tama   ang  g-in-a-gawa    ko     pag
     ref  ger-soc-feel  1s:gn  right  ref  rl:ug-incm-do  1s:gn  when
     ikaw    ka-usap  ko.
     2s:ref  st-talk  1s:gn
     “My feeling is that what I am doing is right when you talk to me.”
(18) Pero  pag   na-ri-rinig      ko      ang  iyak  ni     Agnes,
     but   when  rl:sf-incm-hear  1sg:gn  ref  cry   pr:gn  prnm
     “But when I hear Agnes’s crying,”
(19) Malaki  na   nga   ang  tampo       niya   sa   akin.
     great   now  emph  spc  hurt∼anger  3s:gn  drc  1s:drc
     “She is really feeling hurt and resentful towards me.”
Jerry won’t let Agnes come home. She is then attacked in an alley. She escapes and
tries to call him, but he is at Cynthia’s place. Agnes goes crazy (na-ba-baliw, another non-control form in which non-agency is salient). She characterizes herself
as stupid, using the nominal root tanga ‘stupidity’ plus the genitive first person
pronoun (20). She uses a non-control form of believe, suggesting that she was
caused to believe and she places herself in undergoer-focus to talk about being
fooled.20
(20) Ang  tanga      ko,    ang  tanga      ko,    ang  tanga      tanga      ko.
     ref  stupidity  1s:gn  ref  stupidity  1s:gn  ref  stupidity  stupidity  1s:gn
     “My stupidity, my stupidity, my great stupidity.”
     Bakit,  bakit  ganoon?
     why     why    like.this
     “Why, why like this?”
     Na-niwal.      Bakit  niya   ako     ni-loko?
     rl:sf-believe  why    3s:gn  1s:ref  rl:ug-fool
     “I believed him. Why did he fool me?”
The film continues in this vein, until it reaches a crisis. Jerry now realizes that
he has been a passive participant. He uses a flurry of realis forms with default
grammatical undergoers to speak of swallowing pride (21), sacrificing principles
(22, 23), and enduring (24). These all give Jerry agency, but Jerry as agent is not in
focus, so his agency has low salience. Rather, his rantings place pride, principles,
and hardship in focus.
(21) ni-lunok    ko     ang  pride  ko
     rl-swallow  1s:gn  ref  pride  1s:gn
     “I have swallowed my pride.”
(22) S-in-akripisyo  ko     ang  magandang  kinabukasan  ko     sa   Pilipinas.
     rl-sacrifice    1s:gn  ref  beautiful  future       1s:gn  drc  prnm
     “I sacrificed a beautiful future in the Philippines.”
(23) Ni-lamon          ko     ang  prinsipyo  ko.
     rl-eat.big.piece  1s:gn  ref  principle  1s:gn
     “I ate a big piece of my principles.”
(24) T-in-iis   ko     ang  hirap      ng  buhay  dito.
     rl-endure  1s:gn  ref  difficult  gn  life   prox:drc
     “I endured a hard life here.”
This passage reaches a climax with Jerry’s use of two active forms describing
his attempts to overcome the oppression of his circumstances: nag-su-s-um-igaw
‘shouting out’ and nag-babakasakali ‘hoping to repeat the past’ (25, 26). The former is doubly active, in that it uses two active affixes, nag- and -um-. But even
here, Jerry is apparently not the active grammatical agent, or he would be represented with the first person pronoun ako. The sense is that his internal feelings,
presented in the ang-phrase, are actively impelling (nag-) active (-um-) shouting
out. A consultant said:
Nagsusumigaw ang kalooban ko does not necessarily mean that the person is ‘literally’ shouting or letting out his feelings to a person(s). It just means that the
person has this (intense) feeling, clamoring/bursting inside of him, wanting to be
let go. Now, the person has a choice whether to let it (the feeling) out or not but
he doesn’t have to “shout” it out. The shouting was inside of him.
The explanation suggests that the underlying scenario involves force dynamics in
which the will is striving against the inner feelings (Talmy 1988). Agency involves
both motivation and choice, which may act in opposition or synergistically. In
this instance, Jerry is choosing to suppress the motivation. It will be interesting to
search for further instances of the affix combination mag-__-um___ to discover
whether it always predicates a force-dynamic scenario.
(25) Kahit        nag-su-s-um-igaw     ang  kalooban       ko,    dahil    mahal
     in.spite.of  rl:af-af-incm-shout  ref  inner.feeling  1s:gn  because  love
     kita,
     2s:ref
     “In spite of this I am shouting out my inner feelings, because I love you,”
(26) dahil    nag-ba-baka-sakali          ako-ng     maulit
     because  rl:af-incm-perhaps-in.case  1s:ref-lg  ncf:irr-repeat
     yung        dati.
     rem:ref-lg  former
     “because I am perhaps hoping to repeat the past.”

Near the end of this saga, Jerry realizes that his capacity to love is something over
which he should have control, even though he feels himself losing it. He uses the
agent focus form mag-mahal (27).
(27) Alam  mong      hindi  ko     kayang  mag-mahal    nang  hindi  buo    ang
     know  2s:gn-lg  neg    1s:gn  ex-lg   irr:af-love  when  neg    whole  ref
     pagkatao  ko.
     humanity  1s:gn
     “You know I can’t love when my humanity is not whole.”
Jerry doesn’t want to lose Agnes and her respect for him, so he returns to Manila,
where he takes up his old role as an assertive advertising man. One day, Agnes
shows up in Makati, the upscale business district of Manila. The film ends on
their encounter, leading the viewer to conclude that the couple resumes their
relationship. Perhaps their language also takes a more active turn.
I hope my analysis has demonstrated that grammatical voice in Tagalog emotion language is sensitive to very abstract social scenarios. In the scenarios played
out in the melodrama Sana’y Maulit Muli, personal agency of the participants is a
major element. The repertoire of Tagalog verbal affixes provides ample resources
for predicating nuances of personal agency. Voicing affixes may focus agents, experiencers, goals, or patients. By focusing an agent with -um- or mag-, an affix may
also focus the agency involved, so long as the actor is a human participant. Lack of
personal agency may be expressed directly with a focused experiencer (ma-) or patient (-in, i-, -an). Lack of personal agency may be expressed indirectly by denial of
the agency implied by an active form. Occasionally, some component of identity,
such as inner feelings, takes the grammatical role of focal participant in an active
construction. I do not feel sufficiently conversant with Tagalog theory of agency
to offer a judgement as to whether or not this construction in fact highlights personal agency. Controlled surrender of personal agency is expressed metaphorically
with an unfocused actor in a genitive phrase, as in ni-lunok ko ang pride ko ‘I have
swallowed my pride’ (21) (ko is the genitive form of the first person pronoun).
Case 2: Shona noun classifiers as polycentric categories
Many languages have gender classifiers that segregate nouns. There are, for example, the genders of German and Latin, the numeral classifiers of Chinese, Japanese,
Maya, Ojibway and many languages of southeast Asia, the verbal classifiers of
Navajo, and the 20 or more classes of the Bantu languages. Other languages have
substantive affixes that can function as classifiers. These would include, for example, the anatomical suffixes of Tarascan and Coeur d’Alene (Friedrich 1979: 394–
395; Palmer 1996: 60, 145–146).21 For decades linguists have struggled to make
semantic sense of classifiers. Most commonly they have concluded that the assignment of lexemes to classes is arbitrary or that the classes center on such basic
physical qualities as shape, texture, number, and animacy. While there is some explanatory value in the physical prototype approach, it has ultimately proven to
be limited, leaving unexplained such interesting phenomena as the occurrence in
some Bantu languages of the human term chief in the same class as wild animals
(Guthrie’s 9/10; Guthrie 1967). Another approach was needed.
As early as 1959, the famous paleontologist Louis S. B. Leakey proposed in his
Kikuyu lesson book that the noun classes are ranked on a hierarchy of spiritual
value. For example, humans appear in Leakey’s class I (Guthrie’s 1/2), the highest
in spiritual value; class II (Guthrie’s 3/4) is for “second class spirits;” and class
III (Guthrie’s 9/10) is for all other living creatures. Regarding Guthrie’s class 5/6,
Leakey (1955: 13) asserted that “every single word in this class is an object which is
used, or has been used until recently, in connection with religion, magic or ritual
or some other form of ceremonial.” To my knowledge, Leakey’s proposal was never
consciously followed up by linguists.
The year 1987 saw a breakthrough in the understanding of classifiers. The key
to their explanation was most widely publicized by George Lakoff in the book that
drew its title Women, Fire, and Dangerous Things from a noun class of the Dyirbal language of Queensland. Lakoff was actually reshaping a middle-level theory
proposed by Dixon (1982). Lakoff held that each noun class had a central member
and that other members were linked to the central member by category chaining.
The basis of the chaining was a common domain of experience, which was culture-specific. The Dyirbal classifier balan (one of four) marks a category whose central
member is human females. In Dyirbal mythology, the sun was a woman. Other
members of the class were birds (mythical females) and plants and animals who
either appeared in the myth or were seen as somehow similar to fire (they were hot
or they had stingers). Fire belongs to the class because it belongs to the same domain
of experience as the sun. Thus, with some exceptions, category membership seems
neatly explained by this approach. Problems with the approach have been raised
by Mylne (1995), whose critique was previously discussed.
In the same year, Debra Spitulnik (1987) published a study of Chewa (Bantu)
classifiers.22 Her approach leaned heavily on highly abstract schemas, which she
called “central notional values”, but she also proposed that some nouns belong in
their classes by virtue of cultural associations. “The [ChiBemba] noun ímfumu
‘chief’ occurs in the class dominated by nouns for wild animals (Cl. 9/10) because of the cultural association of the chief with the animal world” (Spitulnik
1987: 110) [italics added]. She did not lean heavily on the cultural approach, because in her view, grammatical factors compete for control over the classifiers. At
about the same time, Ellen Contini-Morava proposed in a paper made available
on the internet that the Swahili (Bantu) noun classes were dominated by “super-schemas” that were linked by schematicity and extension to spatial, supernatural,
and psychological features and schemas.23
To sum up these approaches to understanding classifiers, Leakey described
classification by spiritual hierarchy, Dixon and Lakoff showed clear mythical motivations for Dyirbal classifiers, Spitulnik presented a plausible cultural explanation
for the apparently anomalous classification of ChiBemba chiefs, and Contini-Morava
saw supernatural schemas underlying Swahili classes. These observations suggest
that it might be worthwhile to apply a cultural approach to the Bantu classifiers
with special attention to the supernatural and to apply the approach more systematically than had been previously attempted. That is what I and students Dorothea
Neal Arin, Claudia Woodman, and Russell Rader have begun to do for the Shona
language of Zimbabwe. But before discussing those findings, I will present a brief
description of the classifier system involved:
Bantu noun classifiers are defined by characteristic prefixes on the nouns and concordial affixes on adjectives, verbs, and deictics. The classes are usually designated
by numbers from 1 to 22. In classes 1 to 13, odd numbers are singulars, even
numbers are plurals. Thus, for Shona singular class 1, mu-, the plural is class 2,
va-, and for singular class 3, mu-, the plural is class 4, mi-. Of the first 15 classes
identified by Guthrie (1967), the only ones to which he attributed clear semantic
correlates are 1/2 (persons) and 9/10 (animals). He observed that parts of the body
appeared more frequently in 3/4 and 5/6, but otherwise found no definite correlations of meanings to classes. Fortune (1955) observed that “class 3 contains nouns
indicating trees, parts of the body, atmospheric phenomena, things characterized
by length, and miscellanea” [emphasis added]. The only atmospheric phenomena
that he listed are m]ando ‘breeze, wet weather’ and possibly m]ea ‘air, soul’ and
cando ‘cold.’
(Palmer & Woodman 1999)
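The numbering convention just quoted can be restated as a minimal sketch. It encodes only what the passage gives: the odd/even rule for classes 1 to 13 and the Shona prefixes for classes 1 to 4. The function names are hypothetical, and the assumption that each class from 1 to 12 pairs with its odd/even neighbour is an illustration, not a claim about every Bantu language.

```python
# Shona prefixes for classes 1-4, as given in the quoted passage.
SHONA_PREFIXES = {1: "mu-", 2: "va-", 3: "mu-", 4: "mi-"}


def is_singular(class_number: int) -> bool:
    """Odd-numbered classes are singulars (stated for classes 1 to 13 only)."""
    if not 1 <= class_number <= 13:
        raise ValueError("the odd/even rule is stated only for classes 1 to 13")
    return class_number % 2 == 1


def pair_label(class_number: int) -> str:
    """Return the singular/plural pair label, e.g. '3/4' for class 3 or 4.

    Assumption: each class from 1 to 12 pairs with its odd/even neighbour.
    """
    if not 1 <= class_number <= 12:
        raise ValueError("pairing is only sketched for classes 1 to 12")
    singular = class_number if is_singular(class_number) else class_number - 1
    return f"{singular}/{singular + 1}"


def describe(class_number: int) -> str:
    """Summarize number, pair, and prefix (where the passage lists one)."""
    number = "singular" if is_singular(class_number) else "plural"
    prefix = SHONA_PREFIXES.get(class_number, "(prefix not listed here)")
    return f"class {class_number} ({number}, pair {pair_label(class_number)}): {prefix}"
```

Calling `describe(4)` reports class 4 as the plural member of pair 3/4 with the prefix mi-, which is how labels such as "class 3/4" are used in the rest of this section.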
Specifically, Palmer (1996) and Palmer and Arin (1999) proposed that the semantics of classifiers in Shona and other Bantu systems are governed by salient ritual
scenarios that are more culturally specific and richer than the stereotypes and
features proposed by Spitulnik (1987, 1989) and Contini-Morava (1994). After
reading all available ethnographies of Shona culture and society, Palmer and Arin
identified nine specific and two general scenarios that might govern the distribution of Shona noun classes. Close reading of Shona ethnography was the only
systematic method used to identify these scenarios. Therefore, we cannot guarantee that Shona speakers would agree with us on their salience or structure them in
the same way. It would be preferable to conduct interviews and make correlated
observations in the field.24 Scenarios 1, 2, 10, and 11 are listed below. The numbers
of these scenarios do not correspond to the numbers used by Bantuists to identify
the noun classes.
1. The spirits of ancestral chiefs live in the bodies of lions (mhondoro).
2. The chiefly ancestral spirits (mhondoro) reign over both the things of the wild
and human affairs. They are the protectors of the land and the wild animals.
10. There is a scenario of protection in which the central participants are dominating protectors, protected ones, and the victims of domination.
11. There is ritual danger, stemming mainly from foreign ancestors with grievances
or from contact with the paraphernalia of mediums.
Palmer and Arin (1999) proposed that Guthrie’s class 9/10 is governed by scenario
10 (which also subsumes 1 and 2), and that Guthrie’s 5/6 might be governed by
scenario 11. Subsequent research by Rader (1998) suggests that class 5/6 is more directly governed by the imagery and mythology of fertility.25 Palmer and Woodman
(1999) examined Guthrie’s class 3/4, finding that its central members involve an
important domestic scenario and an ethno-ecological model as well as mythical
and ritual scenarios. Central physical items in this class are those used in ritual and
domestic activities. There is a network of salient categories and chains of extension,
which justify using the term “central” for the salient categories. We concluded that
a noun class is more than a radial category centering on a prototypical member or
a single domain of experience. It is more like a network of radial categories based
on a cross-section of the cosmos, including physical experience, domestic scenarios, ritual scenarios, and world view. We proposed that a classifier organized like
this be termed a polycentric category.
Shona noun class 3/4 grammaticizes and lexicalizes four scenarios and one
ethno-ecological model which are salient themes of Shona culture. Scenario 3 was
among the 11 previously defined. Three new ones include two new ritual scenarios
(12, 14) and a domestic scenario (13). Item 15 is an ethno-ecological model.
3. The spirits of ancestral chiefs bring rain, thunder, and lightning.
12. People pray to the ancestors.
13. Grain is pounded daily with a mortar and pestle.
14. Doctors cure with herbal medicines that are ground in a mortar and pestle.
15. Trees, shrubs, and herbs are associated with coolness, moisture, and medicine.
The conceptual elements provided by these models find lexical expression in many
of the members of Shona class 3/4 – see Figure 9. Those lexemes in the class that do
not predicate any of the major elements in the five models are semantically linked
in various ways as described in Table 2 (in Appendix). The more inclusive cognitive
model of a noun class that emerges from inspection of the semantics of the lexical
members and their associative links to the ethnographic models is what I refer to
as a polycentric category. The general structure of such a category is summarized
with example terms in Table 2 (in Appendix).
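One way to make the structure of a polycentric category concrete is to treat it as a graph with several central nodes and typed links. The sketch below is only an illustration under stated assumptions: the class name `PolycentricCategory`, its methods, and the helper `shona_class_3_4` are hypothetical; the node labels and the three link types (elaboration, extension, metonymy) are taken from Figure 9; and the particular link type assigned to each connection is a guess based on the wording of Table 2, not a transcription of the figure.

```python
from collections import deque

# Link types follow the key of Figure 9.
LINK_TYPES = {"elaboration", "extension", "metonymy"}


class PolycentricCategory:
    """A noun class modeled as several central models plus typed links."""

    def __init__(self, name, centers):
        self.name = name
        self.centers = list(centers)  # more than one center is allowed
        self.links = {}               # source concept -> [(target, link_type)]

    def link(self, source, target, link_type):
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type}")
        self.links.setdefault(source, []).append((target, link_type))

    def chain_to(self, concept):
        """Breadth-first search starting from all centers at once.

        Returns the shortest chain as (source, link_type, target) triples,
        an empty list if `concept` is itself central, or None if unreachable.
        """
        parents = {center: None for center in self.centers}
        queue = deque(self.centers)
        while queue:
            node = queue.popleft()
            if node == concept:
                chain = []
                while parents[node] is not None:
                    source, link_type = parents[node]
                    chain.append((source, link_type, node))
                    node = source
                return chain[::-1]
            for target, link_type in self.links.get(node, []):
                if target not in parents:
                    parents[target] = (node, link_type)
                    queue.append(target)
        return None


def shona_class_3_4():
    """Toy fragment of class 3/4; the link types are illustrative guesses."""
    category = PolycentricCategory(
        "Shona class 3/4",
        centers=[
            "ANCESTORS GIVE RAIN",
            "PEOPLE PRAY TO ANCESTORS",
            "DAILY POUNDING OF GRAIN",
            "CURING PRACTICE",
            "TREES, SHRUBS, HERBS",
        ],
    )
    category.link("TREES, SHRUBS, HERBS", "POLES", "metonymy")
    category.link("POLES", "MORTAR AND PESTLE", "elaboration")
    category.link("POLES", "LENGTH", "extension")
    category.link("LENGTH", "END-POINT TRANSFORMATION", "extension")
    category.link("DAILY POUNDING OF GRAIN", "REPETITION", "extension")
    category.link("REPETITION", "DURATION", "extension")
    category.link("DAILY POUNDING OF GRAIN", "WITCHCRAFT", "extension")
    return category
```

In this toy fragment, `shona_class_3_4().chain_to("DURATION")` traces a two-step chain from the pounding scenario through REPETITION. A radial category would be the special case with a single entry in `centers`.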

[Diagram: the central scenarios and models (PEOPLE PRAY TO ANCESTORS; ANCESTORS GIVE RAIN; DAILY POUNDING OF GRAIN; CURING PRACTICE; TREES, SHRUBS, HERBS) are linked to the categories FOOD OFFERINGS, WAYS OF SPEAKING, LANGUAGE, RAIN, MOISTURE, LIQUIDS, MEDICINES, MORTAR AND PESTLE, POLES, CRUSHING, GRINDING OR POUNDING, GROUND MEAL, SCATTERED MEAL, SCATTER, MOULT, NOISE, REPETITION, DURATION, BAD HABITS, LENGTH, EXTENSION, END-POINT TRANSFORMATION, and WITCHCRAFT. Key: elaboration, extension, metonymy.]
Figure 9. Shona class 3/4 as a polycentric category
A polycentric category has more complexity than a radial category, but it does
not seem to display unnatural or excessive complexity for the semantic system
of a natural spoken language. It is natural for people to have salient ideas based
on rituals and daily domestic tasks, and it is natural for them to model their
environmental surroundings. It is natural to identify clusters of models that are
functionally related and to regard them as a cultural unit. It is natural to abstract schemas from the elements of those models and to discover similarities and
metaphors across conceptual domains. And it is natural to recursively apply such
thought processes to the derived categories. Finally, it is natural for a lexeme to be
polysemous within the domains of a polycentric category. When such a complex
is grammaticized, the result is culture-specific and based on models that can be
discovered by the methods of ethnography, but dependent upon mental processes
that have been best described in the literature of cognitive linguistics.
This approach explains the numerous instances of nouns which appear to satisfy the criteria for more than one class but characteristically appear in only one
class. The archetypal example in Bantu studies is the classification of chiefs with
wild animals, rather than with humans (Creider 1975). Many terms do in fact satisfy the criteria for multiple classes, but they are judged by their speakers to fit one
better than another. Each class has multiple criteria, and these may be activated by
the context of a discourse. The selection and classification of a term is the product
of multiple competing and synergistic activations. In Bantu, some nominal roots
have more than one common classification. It is likely that some classifications are
well-entrenched, while others are more subject to reassignment.
This approach raises a question of boundaries. Where are the boundaries
between classes, if any? If every class has multiple criteria and nominal participants are sufficiently complex in their semantics to satisfy multiple criteria, then
classes will necessarily compete for members in an ecology of classification. In
fact, there are no fixed boundaries between classes. The overriding criterion is
cultural salience, which varies with situations, but how can cultural salience be
evaluated by the linguist? How can one predict which classifiers will be used with
Bantu nominal roots? Currently, conclusions regarding the motivations for particular classifications are largely a matter of interpretation based on familiarity with
the culture gained through participant observation or reading of ethnographies.
One could devise tests that would manipulate the salience of criteria and observe
the assignments of nominal participants to categories, but such tests may not reproduce the motivations presented by naturally occurring discourse. Nevertheless,
in the event that such tests are undertaken, two hypotheses are suggested:
1. Reassignments will be more likely to occur where a domain which is inherent
in both the semantics of the nominal root and in an alternative classifier is
saliently evoked by the discourse situation.
2. It will be more difficult to elicit reassignments to more entrenched category
members, where entrenchment is independently measured by frequency of
usage or infrequent reassignment in natural discourse.
We must ask also how one can evaluate this analysis in comparison to other possibilities. Are there other analyses that would be just as convincing? Can our analysis
predict which nouns will be classified together? There are a number of possible criteria that could be used to evaluate competing analyses. They do not entirely solve
the problem of arriving at an analysis that is both replicable by others and true to
native-speaker thinking, because they remain subject to judgement and interpretation, but if taken seriously, I think they are better than having no criteria. The
criteria are as follows:
1. An analysis should be based upon thorough and comprehensive ethnography
with attention to salient cultural scenarios.
2. Given an adequate description of the cultural scenarios, an analysis should
be plausible, that is, it should consist of obvious connections. Non-obvious
connections may be adduced only where they are supported by native speaker
attestations.
3. A plausible analysis that is supported by native speaker attestation and reasoning is to be preferred over one that is not supported.
4. A plausible analysis which explains the largest number of terms in a class is to
be preferred.
5. A plausible analysis of a classifier which excludes terms normally found in
other classes is to be preferred, though even in a correct analysis many terms
will not be excluded, only preferred more strongly by their canonical classifier.
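Criteria 4 and 5 are the two that lend themselves to counting, and a rough sketch may show what such a comparison could look like. Everything here is an assumption made for illustration: the function names, the dictionary layout of an analysis, and the toy data. The six class 3 nouns are taken from Table 2, but which of them a given analysis "explains" is invented, and the outside terms are placeholders rather than real Shona nouns. The criteria above do not say how the two measures should be weighed against each other, so the sketch reports them separately.

```python
def coverage(explained_terms, class_members):
    """Criterion 4: share of the class's terms that an analysis explains."""
    members = set(class_members)
    if not members:
        raise ValueError("class_members must not be empty")
    return len(set(explained_terms) & members) / len(members)


def exclusion(admitted_terms, outside_terms):
    """Criterion 5: share of other-class terms that an analysis excludes."""
    outsiders = set(outside_terms)
    if not outsiders:
        raise ValueError("outside_terms must not be empty")
    return 1 - len(set(admitted_terms) & outsiders) / len(outsiders)


def score(analysis, class_members, outside_terms):
    """Report both measures side by side, without combining them."""
    return {
        "coverage": coverage(analysis["explains"], class_members),
        "exclusion": exclusion(analysis["admits"], outside_terms),
    }


# Class 3 nouns cited in Table 2.
CLASS_3_MEMBERS = ["muti", "muhwi", "mukwerera", "munamato", "mugumo", "mupfuku"]

# Placeholders standing in for nouns normally found in other classes.
OUTSIDE_TERMS = ["outside_noun_1", "outside_noun_2", "outside_noun_3", "outside_noun_4"]

# A hypothetical analysis: the terms it explains and the outsiders it would admit.
HYPOTHETICAL_ANALYSIS = {
    "explains": ["muti", "muhwi", "mukwerera"],
    "admits": ["outside_noun_1"],
}
```

With these toy inputs, `score(HYPOTHETICAL_ANALYSIS, CLASS_3_MEMBERS, OUTSIDE_TERMS)` gives a coverage of 0.5 and an exclusion of 0.75. Criteria 1 to 3 depend on ethnographic judgement and native-speaker attestation and are not captured by counts of this kind.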
Finally, we must ask whether the cultural approach with polycentric categories can
predict the emergence and structure of classifier systems cross-linguistically. The
theory predicts that some kind of classifier system can emerge wherever there are
salient and stable cultural practices and institutions. These are the necessary conditions. Certainly, many of the languages around the world have classifier systems,
though some are hardly recognized as such. For example, the anatomical suffixes
of the Salish languages are usually not regarded as constituting classifier systems,
yet they function in much the same way as they take on abstract values of shape
(Palmer 1996). Also marginal to our notion of noun classifiers are the click classifiers of the Khoisan and the verbal classifiers of Apache, but they have similar
functions (Bernárdez n.d.; Basso 1990). One might even regard a finite paradigm
of honorifics, as in Japanese or Korean, as a classifier system in the social domain.
The approach does not currently specify the conditions that are sufficient to motivate the emergence of classifiers. Further cross-linguistic studies along these lines
are needed.
Conclusions
Many lexical domains and grammatical constructions link directly or indirectly
to significant cultural models, notably including scenarios. Understanding the
grammar and lexicon of a language requires grasp of cultural models and culturally defined imagery. The most appropriate term for this approach is cultural
linguistics.
Application of this approach to voice in Tagalog emotion-verbs shows that the
semantics of voice affixes can be described in terms of elemental scenarios that
variously profile agents, experiencers, or objects. Analysis of the grammar in the
emotional language of a Tagalog melodrama reveals that choice of voicing affix
causes the agency of emotional participants to be profiled (given grammatical focus, either as agents or experiencers) or relegated to the base of predication with
reduced prominence (actors appear in genitive or oblique phrases, or not at all).
Thus, cultural linguistics helps to elucidate the emotional semantics of very dynamic discourse situations as portrayed in a popular medium. The grammar is
seen operating in its socio-cultural context. The scenarios of voice presented here
in diagrams 4–8 provide a basis for graphic comparisons across languages and domains, so comparable cross-linguistic studies are needed. Video melodramas are
particularly useful for the study of grammatical voice, because emotional speakers are attuned to the nuances of semantic agency and because melodrama reveals
and highlights the agency of participants in other ways, such as the presentation
of facial expressions and the portrayal of scenarios of fortune and misfortune.
The perspective of cultural linguistics shows obvious utility compared to a
more narrowly cognitive approach in its application to the problem of Bantu
noun classifiers, where the use of ethnographic methods to identify salient cultural models and scenarios is a necessary step in the research. In this application, it
was possible to show how cognitive processes of complex category formation and
category chaining operate within culturally specific models to create the polycentric categories that we know as Bantu noun classifiers. The polycentric category
introduced by Palmer and Woodman (1999) has multiple central scenarios and
prototypes, from which radiate category chains and complex categories as defined,
respectively, by Lakoff (1987) and Langacker (1987). The approach of cultural linguistics and its theory of polycentric categories improves on previous accounts
of classifier systems in a number of ways. It makes extensive use of ethnography,
which enables the content of categories to be related to a variety of salient scenarios of domestic and ritual life. The attention to ethnography reduces the risk
of forcing native terms into non-native categories. The approach avoids reducing each classifier category to a few features, or even to a single radial category
based on a single domain of experience. Instead, it posits a number of functionally related scenarios, each of which provides a rich semantic field of linkage for
dozens of nouns. It is more complex than previous approaches, but appropriately
so, because classifiers are motivated by metonymies and metaphors that are often
explicit in ethnographic descriptions, in the construction of terms, or in multiple
definitions of a single term. Finally, this approach highlights a number of interesting scientific questions pertaining to how one may establish the cultural validity
and psychological reality of polycentric categories.
Notes
1. The research on Tagalog was supported by a Site grant and a sabbatical leave from the University of Nevada, Las Vegas for a study of “Popular Discourse in Manila and Las Vegas”, by the
Department of Anthropology, and by grants from the Faculty Travel Committee. My understanding of Tagalog linguistics has benefited from discussions and correspondence with Ricardo
Nolasco, Videa P. de Guzman, Lawrence Reid, and Stanley Starosta, though none would necessarily subscribe to this analysis of Tagalog voice. I am indebted to Nikolaus Himmelmann for
generously sending me his papers in progress and to Eric Pederson for many constructive comments. All correspondence concerning this article should be sent to Gary Palmer at University
of Nevada at Las Vegas, USA.
2. The construal of schematic processes at different stages has been termed image-schema transformation (Lakoff 1987: 440–444, 1988: 144–149).
3. For an example case study of a discursive term, see my discussion of Japanese yo in Palmer
(1996: 206–212).
4. For examples of conceptions of discourse in various cultures, see Kuipers (1998), Scollon and
Scollon (1995: 94–121), Feld (1982), Kochman (1981), and Basso (1979).
5. This case study of Tagalog emotion language draws heavily upon my paper “Sana’y Maulit
Muli: The Grammar of Agency and Emotion in a Tagalog Transnational Video Melodrama,”
which is a revision of a paper presented to the Linguistics Colloquium, University of the Philippines, Diliman Campus, February 11, 1999 and the Annual Meeting of the American Anthropological Association, Philadelphia, December 2–8, 1998. That paper contains additional examples
and more discussion of cultural and historical dimensions.
The glosses of abbreviations are as follows: 1, 2, person; af, agent focus; drc, directional; emph,
emphasis; ex, existential; gn, genitive; ger, gerund; incm, incompletive reduplication; irr, irrealis; lf, locative focus; lg, ligature; loc, locative; ncf, non-control focus; neg, negative; pr,
pronoun; prnm, proper name; prox, proximate; r2, augmentative reduplication; rl, realis; ref,
referential; rem, remote; s, singular; sf, stative focus; st, stative (not involved in focus); soc,
social; uf, undergoer focus.
6. “In Austronesian languages generally, agency and possession are marked in the same way.
In other words, the agent of non-actor focus verbs co-occurs with the genitive marker, usually a
reflex of PAn *ni ‘genitive of human nouns; agent of non-actor focus verbs’” (Blust 2002: 67).
7. Henceforth, the term focus will refer only to grammatical focus as defined for Tagalog.
8. Banal na Aso, Santong Kabayo ‘Holy Dog, Horse Saint’ by YANO. YANO. 1994. Yano. Produced by Yano & Poch Concepcion. Alpha Records Corporation (audiotape).
9. Maniwala Ka Sana ‘Your Belief Is Hope’ by Parokya Ni Edgar. KHANGKHUNGKHERRNITZ
THE ALBUM. Parokya Ni Edgar: Backbeat. Pasig, Metro Manila (audiotape).
10. Banal na Aso, Santong Kabayo ‘Holy Dog, Horse Saint’ by YANO. YANO. 1994. Yano. Produced by Yano & Poch Concepcion. Alpha Records Corporation (audiotape).
11. Kaka, ‘Joe,’ by YANO.
12. Senti ‘Sentimental’ (from YANO).
13. Sana’y Maulit Muli. Regal Films. Star Cinema Productions, Inc. (videotape).
14. Kaka, ‘Joe,’ by YANO.
15. Sana’y Maulit Muli. Regal Films. Star Cinema Productions, Inc. (videotape).
16. Sana’y Maulit Muli. Regal Films. Star Cinema Productions, Inc. (videotape).
17. For a more detailed analysis of the semantics of the mag- forms, see Palmer (2003).
18. Sana’y Maulit Muli. Regal Films. Star Cinema Productions, Inc. (videotape). In the film,
Lea Salonga plays Agnes, a young middle-class woman who has a boyfriend Jerry, who is in
advertising. Jerry is played by Aga Muhlach.
19. One could also say minamahal kita. If we think of mahal as the base, then the only senses
explicitly added by morphology are the incompletive, by means of reduplication of ma, and
realis, by means of the infix -in. Thus, it is probably best to think of this form as predicating the
default lack of control on the part of the one who is loved, i.e. the referential focal participant,
kita. It is usually translated with the more active English expression I love you.
20. My consultants translated the expression as ‘Why did he fool me?’, but since niya is genitive,
a more structure-preserving translation would be ‘Why was I fooled by him?’
21. The figure of 20 for the Bantu classes includes singular and plural forms. If these are not
counted separately, the figure would be ten. Classes 1 and 2 (or 1/2), for example, label the
singular and plural of the class that includes most terms for humans.
22. See also Spitulnik (1989).
23. The paper was eventually published in Contini-Morava (1994).
24. Palmer was also able to draw on memories of eight months of field experience with several
Bantu ethnic groups in a rural community in Kenya in 1969 and extensive reading in Bantu
ethnographies in preparation for that work.
25. In spite of the earlier date of publication, Rader’s paper was published after the Palmer and
Arin paper.
References
Basso, Keith (1990). Western Apache Language and Culture: Essays in Linguistic Anthropology.
Tucson: University of Arizona Press.
Bernárdez, Enrique (n.d.). Categorization through phonetic symbolism: Radial categories based
on the clicks in the San languages. Unpublished Ms. in possession of the author.
Blust, Robert (2002). Notes on the history of ‘focus’ in Austronesian languages. In Fay Wouk &
Malcolm Ross (Eds.), The History and Typology of Western Austronesian Voice Systems (pp.
63–78). Canberra: Pacific Linguistics, Research School of Pacific and Asian Studies, The
Australian National University.

Clark, Herbert (1996). Using Language. Cambridge: Cambridge University Press.
Contini-Morava, Ellen (1994). Noun Classification in Swahili. Publications of the Institute for
Advanced Technology in the Humanities, University of Virginia. Research Reports, Second
Series.
Cooreman, Ann, Barbara Fox, & Talmy Givón (1984). The discourse definition of ergativity.
Studies in Language, 8, 1–34.
Creider, Chet (1975). The semantic system of noun classes in Proto-Bantu. Anthropological
Linguistics, 17, 127–138.
Dixon, R. M. W. (1982). Where Have All the Adjectives Gone? Berlin: Walter de Gruyter.
Duranti, Alessandro (1994). From Grammar to Politics. Linguistic Anthropology in a Western
Samoan Village. Berkeley: University of California Press.
Feld, Steven (1982). Sound and Sentiment: Birds, Weeping, Poetics, and Song in Kaluli Expression
(2nd ed.). Philadelphia: University of Pennsylvania Press.
Fortune, G. (1955). An Analytical Grammar of SHONA. London: Longmans, Green and
Company.
Friedrich, Paul (1979). Language, Context, and the Imagination: Essays by Paul Friedrich.
Stanford: Stanford University Press.
Guthrie, Malcolm (1967). Comparative Bantu: An Introduction to the Comparative Linguistics
and Prehistory of the Bantu Languages. Amersham, England: Gregg Press, LTD.
Hannan, M. (1984). Standard Shona Dictionary. Revised. Harare, Zimbabwe: The College Press.
Hutchins, Edwin (1996). Cognition in the Wild. MIT Press.
Kochman, Thomas (1981). Black and White Styles in Conflict. Chicago: University of Chicago
Press.
Kuipers, Joel C. (1998). Language, Identity, and Marginality in Indonesia: The Changing Nature
of Ritual Speech on the Island of Sumba. Cambridge: Cambridge University Press.
Lakoff, George (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the
Mind. Chicago: University of Chicago Press.
Lakoff, George (1988). Cognitive semantics. In Umberto Eco, Marco Santambrogio, & Patrizia
Violi (Eds.), Meaning and Mental Representations (pp. 119–154). Bloomington and
Indianapolis: Indiana University Press.
Langacker, Ronald (1987). Foundations of Cognitive Grammar, Vol. 1: Theoretical Prerequisites.
Stanford: Stanford University Press.
Langacker, Ronald (1990). Subjectification. Cognitive Linguistics, 1, 5–38.
Langacker, Ronald (1991a). Foundations of Cognitive Grammar, Vol. 2: Descriptive Application.
Stanford: Stanford University Press.
Langacker, Ronald (1991b). Concept, Image, and Symbol. Berlin/New York: Mouton de Gruyter.
Langacker, Ronald (1999). Assessing the cognitive linguistic enterprise. In Theo Janssen & Gisela
Redeker (Eds.), Cognitive Linguistics: Foundations, Scope, and Methodology (pp. 13–59).
Berlin and New York: Mouton de Gruyter.
Langacker, Ronald (2001). Discourse in cognitive grammar. Cognitive Linguistics, 12(2), 143–
188.
Leakey, Louis S. B. (1955). First Lessons in Kikuyu. Nairobi: The Eagle Press.
Mylne, Tom (1995). Grammatical category and world view: Western colonization of the Dyirbal
language. Cognitive Linguistics, 6(4), 379–404.
Palmer, Gary (1996). Toward a Theory of Cultural Linguistics. Austin: University of Texas Press.
Palmer, Gary (2000). Review of Anna Wierzbicka, Understanding Cultures Through Their Key
Words: English, Russian, Polish, German, and Japanese (New York: Oxford University Press,
1997) and Semantics: Primes and Universals. (New York: Oxford University Press, 1996).
Journal of Linguistic Anthropology, 10, 279–284.
Palmer, Gary (2003). Metonymy and polysemy in the Tagalog voicing prefix PAG-. In Gene
Casad & Gary B. Palmer (Eds.), Cognitive Linguistics and Non-Indo-European languages (pp.
193–222). Berlin: Mouton de Gruyter.
Palmer, Gary & Dorothea Neal Arin (1999). The domain of ancestral spirits in Bantu
Noun Classification. In Masako Hiraga, Chris Sinha, & Sherman Wilcox (Eds.), Cultural
Typological and Psycholinguistic Issues: Selected Papers of the Bi-annual ICLA Meeting in
Albuquerque, July 1995 (pp. 25–45). Amsterdam: John Benjamins.
Palmer, Gary & Claudia Woodman (1999). Ontological Classifiers as Polycentric Categories, as
Seen in Shona Class 3 Nouns. In Martin Puetz & Marjolijn Verspoor (Eds.), Explorations in
Linguistic Relativity (pp. 225–249). Amsterdam and Philadelphia: John Benjamins.
Rader, Russell (1998). Life and land-ownership: the autochthonous nature of Shona noun class
5 and 6. California Anthropologist, 25, 8–17.
Scollon, Ron & Suzanne Wong Scollon (1995). Intercultural Communication: A Discourse
Approach. Cambridge/Oxford: Blackwell Publishers, Inc.
Silverstein, Michael (1976). Shifters, linguistic categories, and cultural description. In Keith
Basso & Henry Selby (Eds.), Meaning in Anthropology (pp. 11–55). Albuquerque: University
of New Mexico Press.
Spitulnik, Debra A. (1987). Semantic Superstructuring and Infrastructuring: Nominal Class
Struggle in ChiBemba. Bloomington, Indiana: Indiana University Linguistics Club.
Spitulnik, Debra A. (1989). Levels of Semantic Structuring in Bantu Noun Classification. In
R. Botne & P. Newman (Eds.), Current Approaches to African Linguistics, Volume 5 (pp.
207–220). Dordrecht, The Netherlands: Foris.
Stubbs, Michael (1983). Discourse Analysis: The Sociolinguistic Analysis of Natural Language.
Chicago: University of Chicago Press.
Talmy, Leonard (1988). Force dynamics in language and cognition. Cognitive Science, 12, 49–100.
Wierzbicka, Anna (1996). Semantics: Primes and Universals. New York: Oxford University Press.
Wierzbicka, Anna (1997). Understanding Cultures Through Their Key Words: English, Russian,
Polish, German, and Japanese. New York: Oxford University Press.

Appendix
Table 2. The structure of a polycentric category: Shona class 3/4a
(1) Multiple Central Models: A class may be governed by one, two, or more salient cultural
models and/or scenarios that are different from those governing other classes. The central
models of Shona class 3/4 are:
The spirits of ancestral chiefs bring rain, thunder, and lightning.
People pray to the ancestors.
Grain is pounded daily with a mortar and pestle.
Doctors cure with herbal medicines that are ground in a mortar and pestle.
Trees, shrubs, and herbs are associated with coolness, moisture, and medicine.
(2) Multiple Prototypes: A central model may be sufficiently complex to offer more than one
prototype concept. For example, trees provide large poles and sticks, shrubs provide small
poles and sticks. All provide medicinal leaves and fruits. The term for tree, muti, also means
‘medicine.’ Any of these items may serve as prototypes.
The scenario of pounding grain with the pestle and mortar presents pounding, grinding,
crushing, and grain as salient elements from which abstractions and extensions can be
derived. The grain itself assumes the form of piles of grain, piles of finely ground meal,
and scattered grains. These provide additional prototypes for spatial distribution of dry
granular or powdery solids.
The ancestral scenarios of curing and rain-making offer component scenarios of propitiation of ancestors and grinding and giving of medicines. They also offer physical models
of cool liquids. Lexemes for all these elements appear in Shona class 3/4. Examples: muhwi
‘pestle’, musi ‘pestle’, mutsi ‘pestle’, muti ‘tree, medicine’, mudzukwa ‘tall, straight object
(e.g. tree; skyscraper)’, mudzvurwa, mutwiwa ‘meal ground in duri (mortar)’, muchaka
‘meal from green mealies’, muchinjwa ‘mealie meal ground by engine-driven grinding mill’,
mubvau ‘young, green mealie’, mudede ‘green mealies’, muguri ‘mealie cob (with the grains
on it)’, munyuchu ‘mealie-rice’, mubukirwa ‘green maize cob’, mudakunanzva ‘sweet-tasting
liquid’, mudzamba ‘porridge made with milk as the liquid’, mujururu ‘any liquid thinner
than it should be’, muchenga muchenga ‘abundance of grain’, muchenganherera ‘general
rain <-chenga’, munakamwe ‘springtime (beginning of rainy season)’, mutsatsatire ‘gusty
rain’, muzhandwa ‘crops, animals or people struck down in large numbers. <-zhanda;
act of crushing (e.g. as heavy object does when it falls)’, muchito ‘sound of footsteps,
hoofbeats, etc.’.
(3) Chaining of central models by metonymy: The themes that provide the backbone of a class
are closely related, not by similarity, but by function or metonymy. For example, the pestle,
a kind of stick or pole, provides the conceptual link from the originating model of trees,
shrubs, and herbs to the scenario of pounding grain with a pestle. Medicines for curing
are made from plant leaves and bark. One cures with herbal medicines, but also by appeal
to ancestors who bring the rain associated with cool, moist forests and good plant cover.
Examples: mukwerera ‘ceremony to pray for rain’, munamato ‘prayer (act of praying; words
of prayer)’, musumo ‘small pot of beer offered to husband to notify him that beer has been
prepared and is now ready; amount of any prepared food or drink brought to head of
family so that he may say the polite words of welcome to a guest; opening words of prayer
to mudzimu [ancestor]’.
When does cognitive linguistics become cultural?
Table 2. (continued)
(4) Radial Categories: Non-central terms are linked and chained to central members by
metonymy and metaphor. For example, witchcraft, which appears in this class, is a kind
of pounding and crushing. Examples: muzhandwa ‘crops, animals or people struck down
in large numbers [as by sickness]; act of crushing (e.g. as heavy object does when it falls)’,
mupfuku ‘trampled grain or grass, peaceful place, case of witchcraft, fee for such a case’,
muchapo ‘paddle, medicine for killing witches’, mushinhiriro ‘spell; act of bewitching’.
(5) Primary Schematization: Spatial and temporal schemas may be abstracted from any substantive concept. The pole or stick provides the abstraction of a solid cylinder or extended
solid object. From pounding of the pestle it is an easy step to repetition, and to duration of
time. Examples: mudhadhadha ‘long object (e.g. low building, letter to someone); cursive
writing’, mugavhanyu-gavhanyu ‘repetition of an action without interruption’, muchimbo
‘index finger. <-chimba’, mudhidhi ‘penis (polite expr)’, mutambwi ‘time since’, musanya
‘period of time (gen the present)’, mukore ‘era, period of history’.
(6) Secondary Schematization and Extension: Spatial schemas are subject to various abstractions and extensions. The end-point transformation of an extended spatial object or time
is a common extension, yielding ends of paths, beginnings, last times, and worn-out objects. Examples: muvambo ‘commencement, action of beginning’, mutangiro ‘beginning,
way of beginning’, mugumo ‘end (of action, extent, etc.)’, mufika ‘tapered end of axe or hoe
blade’, mugumegume ‘last time, occasion, etc.’, mudemo ‘useless, worn-out axe’.
(7) Extension of concepts to human behavior. The schema of repetition is extended to repetitive behaviors, mostly bad habits and propensities. Spatial and physical schemas are likewise extended. For
example, in Shona, theft is a narrow passage between two objects. Language is a metaphorical scattering, the feathers of a moulting bird. Examples: mubo ‘way of stealing’, mukoto
‘narrow passage between two objects, pass, act of stealing something in order to sell it,
object stolen in order to be sold, act of stealing’, mutauro ‘language, discussion of a misdemeanour gen leading to legal case’ < tau ‘speak, molt’, mubwereketero ‘way of speaking’,
mukafamwera ‘foolish, thoughtless way of speaking’, mukanya ‘peremptory, emphatic way
of speaking’, muririro ‘call; characteristic cry or way of speaking’.
a. This table is based on the framework presented in Palmer and Woodman (1999). Principle (7)
from that listing has been subsumed into principle (6). All examples are from Hannan (1984).

chapter 
Purple persuasion
Deliberative rhetoric and conceptual blending
Seana Coulson and Todd Oakley
University of California San Diego / Case Western Reserve University
Conceptual blending, or conceptual integration, is a set of general cognitive
processes used to combine conceptual structure in mental spaces. We analyze
how speakers exploit these blending processes in two examples of persuasive
discourse: one a widely distributed email message urging recipients to vote for
Democratic candidates in the 1998 U.S. congressional election; the other, a
solicitation for monetary donations from the St. Matthew’s Church Ministry.
Both examples use discourse to prompt very specific actions in the world. We
show here how blending theory accounts for the mental operations necessary for
readers to metamorphose into activists.
Keywords: conceptual blending, conceptual integration, mental spaces,
discourse, usage-based data
.
Introduction
Flipping through a magazine, you come across a photograph of a martini glass
against a blue satin background. The glass contains a clear liquid, an olive, and a
car key in place of the swizzle stick. The caption reads, “Killer Cocktail”, and the
message is clear. Though there is no explicit mention of either drinking or driving,
this bizarre picture functions as a powerful argument against the combination of
the two activities. Apparently, the picture of the martini is enough to activate the
concept of drinking, the car key is sufficient to activate the concept of driving, and
the array of image and caption serves to activate background knowledge about the
dangers of drinking and driving.
Comprehension of this simple public service message results largely from the
processes of conceptual blending: a set of general cognitive processes used to combine conceptual structure in mental spaces (Fauconnier & Turner 1998). Mental
spaces are very partial representations of the entities and relations of a particular scenario as perceived, imagined, remembered, or otherwise understood by a
speaker (Fauconnier 1994). Blending takes place in a conceptual integration network, an array of mental spaces that typically includes at least two input spaces
and a blended space. Input spaces represent information from discrete cognitive
domains, and the blended space contains structure from both inputs, as well as its
own emergent structure. For example, in the killer cocktail blend, one input includes conceptual structure related to drinking alcoholic beverages, and the other
input includes conceptual structure related to driving automobiles. The blended
space gets partial projections from both inputs and can develop emergent structure of its own. The human agent behaves in such a way that the act of drinking
alcoholic beverages impinges on the act of driving a car.
Emergent structure arises out of the imaginative processes of blending. The
first process is called composition, and involves the juxtaposition of information
from different spaces, as in conjunction and role-filling. For example, in the killer
cocktail blend, an element from the driving domain (the car key) has been composed with structure from the cocktail domain, such that it fills the swizzle stick
role. Completion, as in pattern completion, occurs when part of a cognitive model
is activated and results in the activation of the rest of the frame. In the killer cocktail blend, the martini frame activated by the picture is completed with a frame
for drinking alcoholic beverages. Similarly, the car key leads to the activation of a
frame for driving. Finally, elaboration is an extended version of completion that
results from mental simulation, or various sorts of physical and social interaction
with the world as construed with blended concepts. In this example, simulating
the possible unfortunate effects of drunk driving constitutes the elaboration of the
blend. We shall argue that acts of deliberation depend on this elaboration process.
Below we analyze how blending is recruited in two examples of persuasive
discourse: one a widely distributed email message urging recipients to vote for
Democratic candidates in the 1998 U.S. congressional election; the other, a solicitation for monetary donations from the St. Matthew’s Church Ministry. Both
examples use discourse to prompt very specific actions in the world. We show here
how blending theory accounts for the mental operations necessary for readers to
metamorphose into activists.
2. Voting
This section addresses blending in an email message sent by documentary filmmaker and political activist Michael Moore to left-wing, third-party American
voters like Greens, Communists, and Socialists. The letter, dated October 8, 1998,
urges its recipients to vote the Democratic ticket in the November 1998 midterm
elections. Because the intended audience is unlikely to vote for Democratic candidates (and, indeed, in many cases, unlikely to vote at all), Moore’s letter is aimed
at reconstruing the act of voting so that it is more consistent with the values and
goals of political progressives. He does so by framing the act of voting as a “legal
act of civil disobedience”, and, relatedly, as “sending Congress a message” to cease
impeachment proceedings against U.S. President Bill Clinton.
Moore begins his letter with the following proposal:
Dear Friends. . . Ok, I’ve had it. The right wing is trying to overturn a national
election because. . . they didn’t like the results!
This must be stopped. I would like to propose a legal act of civil disobedience
that could send the Right into near oblivion.
With this Moore introduces the oxymoronic concept of a legal act of civil disobedience, prompting the reader to wonder both what a legal act of civil
disobedience might be and what particular action Moore has in mind. Only
later do we learn:
The act of civil disobedience I am calling for is for each and every American
to go to the polls on November 3 and vote for the Democratic candidate for
Congress on your ballot.
However, Moore does not advocate voting for Democrats because he supports
their policies. Rather, he opposes the policies of their chief political adversaries,
the Republicans. Consequently, Moore’s first rhetorical goal is to counter the default interpretation of the act he advocates. Because voting Democrat usually
signals support for Democratic policies, Moore makes several remarks that serve
to distance himself from the Democrats. For example, Moore writes: “I am not
a member of the Democratic party”; “To me they are a barely tolerable version
of the Republicans”; “I did not vote for Clinton in 1996”; and even, “Yes, most
Democrats suck”.
Here, as in many places in the letter, Moore’s rhetoric is meant to appeal to
the values and goals of his target audience. In particular, he is forced to contend with the implicit tension in being a participant in third-party politics while
advocating a particular political action that inherently acknowledges the impotence of third parties in current American politics. By recruiting conceptual blending processes,
Moore invites readers to construct models which allow them to maintain these incompatible goals. Below we analyze five distinct instances of blending that shape
Moore’s argument.
Palatable candidates
For example, Moore begins his discussion of the 1996 Presidential election by bemoaning the absence of viable progressive candidates on the ballot. Recounting
how he himself voted for Clinton in 1992, but not in 1996, Moore cites a list of
Clinton’s policies that signalled an abandonment of liberal ideals. Nonetheless,
Moore argues, Clinton was elected in a fair and democratic election and should be
permitted to serve as President of the United States for the remainder of his second term. With the following excerpt, Moore presents his readers with a blend that
acknowledges both the limited choice in American politics, and Clinton’s status
as the legitimate winner of the election. Capitalizing on the entrenched mapping
between ideas and food (see also Lakoff & Johnson 1980) Moore writes:
. . . the majority who could stomach that pathetic choice on the ballot went
and voted for Bill Clinton.
One input, perhaps structured by a model of ordering food in a restaurant, involves a scenario in which the agent imagines the palatability of menu items and
makes her decision on this basis. The other input contains a model of voting in
which citizens evaluate the political platforms of candidates on the ballot. In the
blend, we are invited to imagine citizens evaluating the ballot in the way one might
evaluate a menu, such that candidates are chosen based on how tasty their ideas
are. On this construal, people who don’t vote correspond to people who will not
eat in a particular restaurant because they don’t like the menu.
However, note that in the restaurant case, the diner doesn’t typically know the
details of the menu until after he has been seated. But, because the contents of the
ballot are widely publicized ahead of time, people like Moore can actually avoid
the polling booth if they don’t like the list of candidates. So, rather than relying on
prototypical domain knowledge, the stomach blend recruits a slightly less prototypical model, which better matches the topic input. The restaurant space is thus
structured by a model in which both the contents of the menu and the taste of
the food are so well-known that people might well use this knowledge to choose
whether or not to dine there. In America, the menu at a place like Denny’s or
McDonald’s might serve as a potential counterpart for the ballot in Moore’s blend.
As noted above, this blend capitalizes on entrenched mappings between ideas
and food, exemplified in sentences such as “I devour books”, and “She won’t swallow your proposal”. Indeed, the use of the verb “stomach” to refer to tolerance
for unpleasant things is entrenched enough to be listed in many dictionaries. As
argued in Coulson and Oakley (2005), conceptual blending is often involved in
conventional metaphoric expressions, although the mappings are not elaborated
in the same way they are in more spectacular blends (such as the “Killer Cocktail”
blend discussed in the introduction). For both novel and entrenched metaphors,
conceptual structure from both input domains is activated as well as the structure
in the blended space. But, because entrenchment often leads to automaticity, the
mappings in conventional metaphors are established via an automatic process of
retrieval rather than via analogical reasoning, and the “emergent” inferences can
simply be retrieved rather than being actively computed.
Depending on their linguistic experience, readers differ in the extent to which
they utilize retrieval over more effortful interpretive strategies, and differ in their
awareness of the different domains activated by a blend such as the palatable candidates one discussed here. The domain of food consumption implicitly evoked by
the verb “stomach” is available for integration with concepts from the domain of
political choice evoked by “ballot” and “voted”. While the food domain is likely to
be more salient for some speakers than others, the visceral sense of being nauseated
by the candidates is what makes this text potentially compelling. The rhetorical efficacy of the text, then, depends in part on the reader’s willingness to construct
the blend.
Stinky candidates
In suggesting that readers “hold their nose” while voting, Moore again evokes the
unpalatable candidate blend while simultaneously signalling his sympathy with
third party politics. He writes:
If you want Congress to stop this witch hunt, if you want Congress to start
focussing on the real problems facing this country and the world . . . get out
and vote November 3. Hold your nose if you have to.
Since the writer and his audience dislike the policies of Democrats as well as Republicans, Moore must frame the act of voting with the proper “attitude”. Thus
Moore’s ‘hold your nose while voting’ blend is aimed at describing the manner of
the prescribed action.
The inputs to this blend include voting, and holding one’s nose while acting.
The act of voting entails going to a designated space and making a choice among
several candidates. Holding one’s nose while acting calls up a different frame, that
of completing an unpleasant task. Consistent with the unpalatable candidate blend
discussed above, one might hold one’s nose while eating something that tastes bad.
Similarly, one might hold one’s nose while doing a task that involves a foul stench,
such as changing a diaper, cleaning a toilet, or taking out the trash. Composing
voting and holding one’s nose results in framing the act of voting as an unpleasant
but necessary chore, much like some of the tasks mentioned above. Moreover, the entrenched meaning of the ‘stinks’ metaphor allows speakers to understand the text
as acknowledging the limited political options available to progressive voters.
The distinct nature of these acts emerges when one considers that the ‘holding
your nose while voting’ blend produces inferences not usually attributable to either
voting proper or to unpleasant stench-ridden tasks. In voting, one makes a choice
among several possibilities, some more desirable than others. By contrast, if one’s
task is to change a baby’s diaper, one does not normally go into a room and make
a choice about whose diaper to change. Nor does one choose between the lesser of
two stinky diapers. In the blend, however, the voter is performing an unpleasant
task in a stench-ridden environment, and that task is to choose the thing that stinks
the least. Thus the voter should choose Democratic candidates because they stink
less than the Republican candidates.
Public conversation
After noting that Bill Clinton won the 1996 Presidential election, Moore continues:
That was the will of the people. And that is the will the Republicans are trying
to subvert.
In the passage above (which precedes the actual proposal), Moore frames his as
yet undefined act of civil disobedience as preventing the Republicans (construed
as a unified entity) from subverting the will of the people (also construed as unified). Thus Moore advocates neither Democratic congressional candidates, nor
their party leader President Clinton. Rather, he advocates the “will of the people”.
Though he hasn’t yet revealed how the Republicans are trying to subvert the will
of the people, we know that it has to do with Clinton being elected President in a
fair and democratic election, and that the Republicans did not like the results.
Immediately after his discussion of Clinton’s (re)election in 1996, Moore
moves to the related, but non-identical, issue of impeachment proceedings:
All the public opinion polls – New York Times, Wall Street Journal, CNN –
have said the same thing over and over: The American public does not want
impeachment. Yet, Congress has decided to tell the public to take a flying
%$#@& and has moved ahead with the impeachment process anyway.
Although it is easy to construe impeachment as tantamount to overturning an
election, each is a distinct concept. Strictly speaking, impeachment involves accusing a public official of high crimes; and while this may result in removing the
accused official from office, it need not. Overturning an election, on the other
hand, usually occurs when there is evidence that the voting process was unfair.
But, because both can result in removal of an official from office, it is easy to set
up cross-space mappings between the two concepts. Moore’s task is also supported
by models set up earlier in the letter: because Clinton’s 1996 election has been
construed as the will of the people, impeachment (and removal from office) is
subverting that will.
Thus Moore relies on conceptual integration to construct a simplified model
of the relationship between electoral politics, political ideology, and the impeachment proceedings against Bill Clinton. First, public opinion polls are personified in
a metonymic way so that the American public can speak with one voice. For example, the reader is invited to blend the results of various opinion polls (NYT, WSJ,
CNN) with statements uttered by individual citizens. In the larger picture, the
story of a conversation between individual people, or representatives of different
groups, is being blended with the more abstract communication (or miscommunication) between politicians and citizens.
Moore’s blend exemplifies a key phenomenon in conceptual integration theory: compression to achieve human scale. Compression is a tendency for objects
from multiple related spaces to be represented in a single blended space (Fauconnier & Turner 2002). For example, the same person can be viewed in different
stages of his life, as in a cartoon where the former basketball star Michael Jordan
plays a game against himself at an earlier stage in his career (see Coulson 2003).
Fauconnier and Turner discuss many different sorts of compression, and note that
this phenomenon often allows us to represent abstract concepts with more familiar
frames. In Moore’s example, the opinions of many different people in the opinion
poll are mapped onto a single person in the blend so as to facilitate the application
of the “human scale” conversation frame in the blended space.
For the most part, Moore’s blends are quite standard: the construal of polls
as the voice of the people, election results as the will of the people, and Clinton’s impeachment as the subversion of the will of the people were all publicly
available at the time he composed the letter. However, his description of Congress
members telling their constituents to “take a flying %$#@&” represents a novel extension. There is, of course, no actual town meeting in which Congress members
hurl expletives at their constituents. Rather, Moore prompts the reader to construe
two independent sets of occurrences – one involving the release of opinion polls
which reveal public opposition to impeachment; and the other, the decision by
the House Judiciary Committee to proceed with impeachment – as an integrated
event scenario. The compression here is used to construct a conversational frame
with potential motivational properties.
Moore’s blend has desirable rhetorical characteristics from both a cognitive
and an affective standpoint. Cognitively, the event integration simplifies reasoning about a complex series of events. Moreover, the integration of the construal
of the political process with that of an interpersonal argument invites the reader
to complete the blend with knowledge from her own argumentative experiences.
Because Congress has already proceeded against the will of the public, Congress
maps onto the winner of the argument, and the reader (who also corresponds to
the public) maps onto the loser. If the reader truly integrates knowledge about the
political process with her own personal experience with losing arguments, it can
evoke the sorts of emotions that accompany the latter. This, in turn, helps motivate
the revenge frames that support Moore’s ultimate call to action.
Sending a message
Having framed the political act of impeachment as a defiant act of disobedience on
the part of Congress, Moore invokes a salient counterfactual in which the House
Judiciary Committee behaves in a manner more consistent with the ‘message’ in
the polls. In fact, Moore later draws on this scenario in his attempts to convince
people to vote. Voting is framed as a poll that Congress will listen to. He writes:
The act of civil disobedience I am calling for is for each and every American
to go to the polls on November 3 and vote for the Democratic candidate for
Congress on your ballot. That’s right, my fellow cynics and progressives – the
only way to send a true message to the right wing is to throw every Republican
out of office.
Here he capitalizes on a mapping between polling and voting. In both models, individual members of the public express their opinions and the results are
tabulated in order to express collective opinion. And, while both influence the political sphere of events, only voting has explicit political consequences. Winning
an election is constitutive of assuming a political role in a way that favourable poll
results are not.
Moore elaborates on the public conversation blend by scripting what the citizenry should “say” in reply to Congress’ recent actions, thus framing voting for
Democrats as the citizenry’s turn in conversation:
Imagine if the Democrats are voted in by overwhelming numbers (when all
the pundits are predicting a Republican landslide). The message would be
loud and clear to all these new Democrats – the american public wants
the agenda of the (so-called) christian right removed from the halls
of our united states congress!
Here Moore describes the message as being “loud and clear”, adjectives appropriate
for verbal communication, but not for the abstract information presumably conveyed by the results of an election. Their use here is licensed by a conceptual blend
between voting and speaking. Pascual (2002) suggests that due to the centrality
of talk in human social life, many situations that involve information exchange –
from perception to abstract instances of communication – are metaphorically construed as verbal communication (see also Turner 2002). Pascual calls this phenomenon
fictive interaction and shows how this blend is common in rhetorical situations
that occur in the courtroom. As noted in our discussion of the Public Conversation
blend, fictive interaction can be seen as an attempt to construe abstract situations
with more motivating “human scale” frames.
Moore’s blend between voting and speaking is facilitated by their shared frame
structure as communicative acts. In the conceptual integration network, these
commonalities are represented in a generic space that contains a communicating agent, a communicative action, a message, and a recipient. In the blend, voting
sends a message, which (unlike the vote in the politics space) is audible. Interestingly, the number of votes maps onto the loudness of the reply in an adversarial
conversation. Moreover, as in a conversation, the louder the message the more conviction we attribute to the speaker. Moore suggests that if enough readers follow his
advice, the message will be so forceful as to end the public debate. Framed this way,
Moore can assert another consequence of speaking: the end of the right wing’s
political agenda.
Interpretation is supported by the configuration of mental spaces needed to
represent the complex conditional in this excerpt. Besides embedding the counterfactual Democratic landslide in a scenario that includes the prediction of a
Republican landslide, the excerpt above sets up two sorts of contingencies dubbed
content-level and epistemic-level by Sweetser (1990, 1996). At the content level,
the antecedent is the Democrats being elected (in the case where pundits predict
a Republican landslide), and is (in some sense) causally related to the consequent
space where the message is clear. At the epistemic level, the antecedent remains
the same, and the epistemic consequent is that people oppose the Republicans.
Thus the election of Democrats licenses the inference that voters oppose the Republicans.
Interestingly, given the structure Moore has set up, a Democratic victory will
be interpreted quite differently from a Republican victory. Because votes are generally interpreted as an endorsement of the elected candidates’ policies, a Republican
victory would presumably be interpreted as support for the right-wing agenda.
However, by this point, Moore has clearly framed voting for a Democrat as voting against the right-wing policies embodied by Republican candidates. Indeed,
Moore goes even so far as to propose that the act of voting for a Democrat is an
act of civil disobedience.
Legal act of civil disobedience
In many ways, Moore’s portrayal of voting as an act of civil disobedience is the
most striking aspect of the piece. Civil disobedience, by its very definition, involves the violation of the law. In contrast, voting is not only legal, but strongly
encouraged by law. However, by recruiting peripheral aspects of structure from the
concept of civil disobedience, and blending it with structure in his own ‘sending a
message’ blend, Moore directs his readers to integrate two concepts that appear to
be contradictory.
First, Moore relies on the fact that the concept of civil disobedience is itself a
blend between spaces which detail two different components of law: the moral justification for law; and the workings of the law. In the former space, which we might
call the Spirit of the Law, is a construal of the law as being enacted to promote the
common good. In the latter space, which we might dub the Letter of the Law, an
act of disobedience is defined as an act that violates the law. The blended space
composes the act of disobedience with the justification for law. Civil disobedience
is thus an act that violates the law to promote the common good. Elaborating this
blend produces the inference that the law in question is unjust, and that acts of
civil disobedience are meant to bring public attention to the unjustness of the law.
Further, just as acts of civil disobedience are aimed at sending a message that
the law is unjust and should be repealed, Moore suggests that his prescribed action is aimed at sending the message that the impeachment proceedings (and,
indeed, right-wing policies more generally construed) are unjust and should be
stopped. Thus Moore’s legal act of civil disobedience represents a keying of emergent structure in the more standard concept of civil disobedience. In short, what
is a violation of the law in the civil disobedience space corresponds to a violation of a general principle not to vote for either Democrats or Republicans in the
progressive politics space.
In this way, the legal act of voting has been construed as an act of civil disobedience in the blend. Rather than doing something illegal for the greater good, Moore
suggests his readers do something politically distasteful. Further, by capitalizing
on the parallels he has set up between disobeying an unjust law and signalling disagreement with unjust Republican policies, Moore is able to appeal to an ethic –
that of civil disobedience – that is likely to arouse a sympathetic response in his
target audience of disgruntled progressives.
Summary
This section has shown how blending can be used to compress and combine a
number of simplified models in order to form integrated event scenarios. Among
other things, Moore’s blends frame voting as speaking in a larger political argument, voting as an unpleasant but necessary task, and voting as a form of protest.
As discussed above, the correspondences between domains are animated in the
blend to produce emergent structure. Although these correspondences are analyzable, it is their emergence
as blends that makes them potentially persuasive. Thus the success or failure of
Moore’s letter does not simply depend on being able to establish the appropriate
mappings – for example, understanding the intended correspondences between
personal dialogue and the political process. The mappings are necessary, but not
sufficient for persuasion. The rhetorical efficacy of the text depends on the reader’s
willingness to integrate and elaborate the models in a way that yields the desired
emergent structure and affective responses.
The result of blending in these cases is to encourage readers to construe events
with cognitive models that are both easily understood and appropriately motivating.
Moore’s letter is a call for a particular action from readers which has been
successively framed and reframed so as to make it palatable to its intended audience. The persuasive element of the letter is not aimed at changing the reader’s
goals, but changing her construal of one particular action – that of voting for a
Democrat – so that it is consistent with presumably extant goals. These observations are consistent with other research on argumentative discourse that suggests
people attempt to exploit conceptual blending to reframe a particular scenario, but
not to restructure their opponents’ value systems (Coulson 2001).
Purple point of contact
This section concerns an elaborate invitation to support a church group which one
of the authors actually received via the U.S. postal service. It is a very complicated
message that includes a letter, a ‘prayer page’ to send with donations, a return
envelope for the prayer page, and a purple sealed envelope bearing a message from
Jesus Christ. The letter urges its recipient to perform a number of concrete actions
in order to show her faith, and be blessed by Jesus. In particular, the reader is
instructed to:
1. Place the purple sealed envelope under his or her pillow
2. Sleep on this “purple point of contact just like the children of Israel did when
God instructed them to do so (Numbers 15: 38, 39)”
3. Mail back the prayer page with a donation to the Ministry
4. Open the purple sealed envelope to receive the “purple point of contact blessing”.
This package is a rich piece of persuasion, the success of which depends on the
reader’s willingness to construct a number of blends outlined below. In particular,
we focus on blending involved in the metaphoric construal of making a donation
as sowing a seed, and on how the reader is invited to construe her own actions
as fulfilling the purple point of contact. Analysis points to an important role for
blending in understanding commonalities between performative aspects of language and the social construction of reality. In performative language (as when
a justice of the peace pronounces a couple “man and wife”), and ritual (as when
parents in a particular Italian village carry their child up a set of stairs to ensure his
success in life), actions in one space, or domain, serve to effect changes in another
(Sweetser 1998, 2000).
However, performativity only occurs when the scenario fulfils particular
sociocultural conditions that license conceptual integration. For example, the
metaphoric significance of the act of carrying the child up the stairs is confined
to the execution of the ritual. Though entrenched connections between vertical
ascent and success are always available, the everyday act of taking the child upstairs to bed is not construed as contributing to the child’s success in life. The
import of the action in the course of the ritual thus stems from its status as an
entrenched blend in which the action and the metaphor have been integrated such
that the physical actions are construed as causing metaphoric effects. In the case
of, “I now pronounce you man and wife”, the utterance is fully integrated with the
marriage frame only when it is uttered by an individual with the proper social authority (a judge, a minister, a priest, etc.), preceded by the appropriate sequence of
utterances, and, perhaps, followed by a kiss.
Similarly, in the purple point of contact letter, the solicitation succeeds only if
the reader believes that her actions of putting the purple envelope under her pillow
and mailing in the donation will result in a blessing. In the case of marriage, the
integration is licensed (largely) by the social authority of the utterer. In the present
case, the integration is licensed by the extent to which the St. Matthew’s Church
Ministry is construed as acting with the authority of God. Consequently, much of
the text of the letter is aimed at establishing the religious legitimacy of the Ministry,
framing the act of donation as an act of faith, and constructing a blend in which
the act of donation (and fulfilling the other instructions contained in the letter)
can be conceived of as causally connected to the receipt of the blessing.
Let’s have church here in your home
A number of aspects of the letter seem to be aimed at promoting the religious
authority of St. Matthew’s Church Ministry, and the construal of reading the
letter and following its instructions as religious acts. For example, the
organization’s name (“St. Matthew’s Church Ministry”) contains the words “church” and
“ministry” and the name of a New Testament saint, all of which suggest a legitimate connection to Christianity. The letter is peppered with quotations from the Bible and
accompanying citations of chapter and verse. Moreover, on the first page of the
letter we find the following invocation:
Our dearly beloved in Christ, turn to page two and let’s have church here in
your home.
The reader is thus invited to integrate her activity of reading the letter with her
conception of attending church. Normally, reading a letter (particularly a solicitation from an unknown organization) is construed as a secular, and, often private,
activity. Moreover, attending church involves leaving one’s home to go to a place of
worship with others in a public space. Aspects of each input domain are selectively
projected into the blend, so that reading the letter is construed as a religious activity, and the church service is construed as occurring in the home. The letter-church
blend is helped along by strategic modes of address (e.g., “our dearly beloved in
Christ”) that one might expect to hear at a religious ceremony.
In this blend, the minister does not speak to the congregation from the pulpit. Rather, the Ministry communicates with the reader via the letter. Constructing
the blend thus involves establishing cross-space mappings between the Minister
in a church and the writers of the letter (viz. the St. Matthew’s Church Ministry),
and between the members of a congregation and the reader of the letter. In turn,
completion from background knowledge about church yields inferences about the
relationship between the reader and the writers of the letter. In particular, the letter writers in the blend are construed as possessing a Minister’s knowledge and
wisdom, as well as his moral authority over his Congregation.
Testimony
One of the interesting facets of this communication is the extent to which it functions generically as a blend between an epistle and a chain letter, where the reader
is entreated to send some small amount of money to various people on a list, with
the expectation that it will lead to exponential returns when subsequent recipients
send money to the reader. In the purple point of contact letter, we learn almost
from the outset that the blessing God will give us for fulfilling the instructions in
the letter has a distinct financial component. The letter starts with the following
testimony from a woman named Priscilla:
I was a sinner and drank real heavy and had a lot on my mind. I remember
some of the scriptures that you had written to me and . . . I felt God speaking
to my heart saying, “My daughter, your sins are forgiven.” I felt so good inside, for I knew God had saved my [soul]. Rev., I haven’t drank another drop
from that day. I wrote you a letter and joined the Gold Book [Seed Harvest
Prosperity] Plan, and it seemed like heaven just opened up my life. I didn’t
have transportation, but now since I have been a member of the . . . Plan God
has really been blessing [me]. I have a new Ford and Cadillac. Not only that,
but I have never been broke.
Note that the persuasive character of this testimony depends crucially on the congruity of the reader’s worldview and that advocated by the St. Matthew’s Church
Ministry. For example, the writer presumes that the biblical faith is a part, or,
at least, a potential part of the reader’s construal of reality. In other words, the
writer presumes that the reader believes in God, as well as in the divinity of Jesus.
Rhetoricians have argued that all arguments ultimately rest on shared facts, beliefs, presumptions, and values, which they call ‘objects of agreement’ (Perelman &
Olbrechts-Tyteca 1969). If the reader does not share the presumption of religious
faith, and appreciate the value of the proposed blessing, persuasion will simply
not occur.
Given these objects of agreement, Priscilla’s testimony is aimed at promoting a
conception of God as an entity willing to grant monetary favours. Moreover, readers are invited to map sister Priscilla’s speedy transformation from a poor sinner
to a prosperous disciple onto our own case – provided, of course, that we are willing to see ourselves as downtrodden sinners. In Perelmanian terms, this is also an
object of agreement, as we will not do what the letter bids unless we see ourselves
as sinners who might potentially benefit from the blessing.
The inputs to the blend involve two sets of spaces: one set to represent the scenario
described by Priscilla (first, a troubled past; second, joining the plan; and finally,
the resolution of her problems), and another set to represent the reader’s
own troubled present and desired future. The blend inherits its causal structure
from the Priscilla domain, and its elements from the reader’s domain. Thus the
reader imagines herself joining the plan, and construes this act as causally mediating a transformation from her own troubled present to her own desired future.
Persuasion, then, depends on both sharing the objects of agreement that enable
the reader to believe Priscilla’s story, and the reader’s willingness to blend her own
situation with aspects of Priscilla’s.
Sowing the seed of $5, $10, or $20
The letter repeatedly appeals to a metaphoric construal of making a monetary
donation as sowing a seed. For example, towards the end of the letter proper, that
is, the part of the letter addressed to the reader (rather than the part of the letter
addressed directly to the Lord), we read:
We believe you are going to sow a seed so God can bless you with a harvest.
God said, “Give and it shall be given unto you . . .” Luke 6:38. We pray that you
will sow $5.00, $10.00, $20.00, or more. Let God lead you. Our prayer is that,
by faith, what you sow will start being returned to you before the seventh day
of next month, as God sees fit. He knows best how and when to let it begin.
Let us pray over this last page and purple sealed word. Let us bow our heads
in prayer – shall we?
[all emphasis in the original]
Broadly, sowing a seed maps onto sending a donation, and the harvest maps onto
the money that the sender receives in return. Mappings in the network are set up by
a conventional metaphoric connection between agriculture and investment, which
maps the metamorphosis of a seed into crops for harvest onto the difference between the initial investment and its return. The inputs to the seed-sowing blend
thus include one space we might call the Agriculture space, and another we might
call the Material space. The mapping between the seed and the money is cued explicitly by the statement, “We pray that you will sow $5.00, $10.00 . . .” in which
the object of “sow” is not a type of seed (as in the agriculture input), but a unit
of currency (that originates in the material input). Linguistic prompts also help
the reader identify the mapping between the harvest and the monetary returns, in
“Our prayer is that, by faith, what you sow will start being returned to you . . . [emphasis ours]” Since the letter reader will presumably sow money, she can expect
money to be returned to her.
The structure in the blend differs from conventional conceptions of agriculture in several ways, especially in its recruitment of structure from a third input
which we might dub the Spiritual space. For example, on the prayer page, which
the reader sends in with her donation, is written, “I am sowing [followed by a
list of potential dollar amounts] as my seed unto the Lord, in faith”. Thus unlike
real seeds, the seed of $5 is not planted in the earth; and, unlike a conventional
investment, it has not been used for its purchasing power.
The example here involves a prototypical case of conceptual integration in
which the blended concept involves partial structure from each of its inputs as
well as novel structure of its own. In the context of the blend, the $5 has some of
the properties of conventional money (it can be used to buy things) and some of
the properties of a seed (it will undergo a transformation). Further, unlike most
agricultural endeavours, the relationship between the initial sowing of the seed
and the final harvest is not mediated by farming activity. In contrast to default
knowledge about managing investments, the transformation from seed to harvest
here occurs “by faith”. Because it is a seed of faith, the coming harvest depends
on receiving a blessing from the Lord. Moreover, receiving the blessing depends in
turn on following the instructions to achieve the purple point of contact: mailing
in the donation, sleeping on the purple envelope, and opening the purple envelope
after sunset on the following day.
The purple envelope please
Inside the envelope is an image of Jesus from religious art, His hand raised in a
generic blessing gesture. At the top of the picture is a quote from the New Testament, “. . . If two of you shall agree . . . it shall be done . . .” Matthew 18:19. At the
bottom of the picture, the caption reads “Jesus, my letter is in the mail on its way
to the people of God who will pray over it for me.” But perhaps most striking is
that this text is divided by a line drawing of a woman’s hand, holding a letter up
towards Jesus – as if for Him to bless it.
The image prompts the reader to unpack the blend (i.e., reconstitute the roles,
relations, and inferences of each input space), mapping the picture of Jesus onto
the saviour, the unidentified hand onto the reader, and the envelope
onto the one the reader presumably mailed to the St. Matthew’s Church Ministry. Importantly, in the picture, although the reader holds the envelope in her
hand, the stamp on the envelope has already been cancelled. This suggests it is
no longer in the reader’s actual possession, but is being processed by the postal
system. Thus in one input (derived from the original piece of religious art), Jesus
issues a generic blessing with no specific target. In the other input, metonymically
evoked by the envelope with the cancelled stamp, the reader sends in her prayer
page with donation.
The information represented in the two input spaces constitutes two separate
events, which need not be construed as integrated. People mail letters every day
and rarely consider the spiritual implications of such an act. Similarly, Jesus can
be construed as blessing any number of objects and actions in the world, with no
preference given to the transactions of the U.S. postal system. However, given the
background knowledge set up by excerpts such as “Lord, keep Your eyes upon this
very envelope until. . . it is returned back to this little 47 year old church ministry.
Lord, bless this dear one as they open this purple Sealed Word after sunset and
after they have mailed their prayer page back to us”, the visual image prompts
the reader to construe the disparate input spaces as a unified event structure. Jesus
blesses the prayer page as it passes through the postal system, and blesses its sender
as she opens the sealed purple envelope.
The picture epitomizes the set of actions, reinforcing the spiritual import
of her donation. It is, in fact, a rhetorical technique Aristotle termed energia
or bringing-before-the-eyes (Aristotle 1994), in which the reader witnesses in the
present all that is supposed to have occurred up to this point. Energia is an example of compression in which structure from a number of spaces, each
representing events occurring at different points in time, is integrated into a single
scene in the blended space. Moore, in his description of the argument between the
American people (as expressed in the polls) and Congress (as expressed by their
impeachment of Clinton), exploits compression in a similar way to construe a
complex scenario with a single frame that evokes emotions and other associations
consistent with his rhetorical goals.
Summary
The desired rhetorical effect of this letter depends on the existence of systematic
correspondences between the three input spaces displayed in Table 1.
Besides conventional agricultural metaphors for investment (e.g. investments
that grow), the letter authors are exploiting conventional agricultural metaphors
for spirituality (e.g. spiritual growth). The former play into the readers’ greed,
while the latter are reminiscent of the Bible and bolster the legitimacy of the St.
Matthew’s Church Ministry. The integration of these three domains results in a
scenario where the reader can satisfy her greed in a virtuous way. Thus the inputs
in this blend are being exploited not only for their inferential possibilities, but also
for their sociocultural significance.
Table 1. 3-input blend
Material                          Spiritual             Agriculture
Reading Letter                    Attending Church
Mail Prayer Page w/ $5 donation   Make Offering         Sow Seed
Sleep on Purple Envelope          Commit Act of Faith   Cultivate Seed
Receive Money                     Receive Blessing      Reap Harvest
Further, while the letter clearly establishes the mappings between sending the
money, sowing a seed, and making an offering to God, establishing the blend goes
further. Without the blend (or, at least, without some sort of a blend), there is no
way that anyone would believe that sending off $5, $10, or $20 could ever result in
a new car. Similarly, the reader will not carry around the purple envelope or sleep
on it unless she or he believes the action will have the spiritual and/or the monetary results implied in the blend. So, to reiterate, anyone who performs the actions
described in the letter will do so because they have adopted the blend in which mailing $5 is sowing a faith seed, sleeping on the envelope is an act of faith, and the
ultimate result of these actions will be a monetary blessing from God. Moreover,
the difference between someone who does and someone who does not carry out
the instructions has little to do with the mappings (presumably anyone can figure
out what one is supposed to do and why), and everything to do with integrating
and elaborating the structure in the blend until it becomes a motivating frame.
Conclusions
Deliberative rhetoric is the primary means of getting human beings to think and
act according to the expectations of others without recourse to violent coercion.
We have suggested that, as an interpretive model capable of describing the strategic and tactical ways human beings frame situations, conceptual integration theory
provides a means of addressing this fundamental area of human cognition. Moreover, in the analyses above we have attempted to demonstrate the importance of
blending for understanding specific, attested instances of human deliberation. In
sum, deliberation recruits elaboration as blends animate mappings in a way that
makes them compelling.
Because persuasion depends crucially on objects of agreement, rhetorical
blends are aimed at promoting the perception of this agreement. Thus, Moore
does not recruit the stomach blend because of a preponderance of shared relational structure in our understandings of the choice of political candidates and the
choice of what to eat for lunch. Nor does he make reference to holding one’s nose
while voting purely because of its analogical potential. These blends were recruited
because of the way they frame the topic space of American politics for a disenchanted third-party citizen. Such a citizen may discard a political letter couched in
language designed to appeal to a mainstream voter, but be willing to consider a plea
which establishes initial agreement between writer and reader that both consider
Democratic candidates to be too conservative.
Conceptual blending is used to integrate concepts with different affective valences, often so that the desired course of action is seen as consistent with the
audience’s value system. Further, compression is used to simplify complex causal
relationships, both so they can be more readily understood, and so that they can
be construed with motivational “human scale” frames. This suggests our concepts have abstract, inferential, as well as affective and motivational properties.
Moreover, neither is set in stone as speakers frequently employ conceptual blending processes to reconstrue a particular action to alter its inferential, affective,
sociocultural, and even spiritual significance.
We have also seen that the binding force of blends-we-act-on depends as much
on the ontology supported by our cultural values and practices as on the structural
correspondences between the representations in the different domains. For example, we have argued that the possibility of interpreting polling data as the voice
of the people depends on our cognitive capacity for conceptual integration. But
so, too, does the possibility of construing the beliefs of the 270-odd million American citizens as the will of a unified American people depend on the existence of
polling practices, voting practices, and standard procedures for interpreting the results. Relatedly, the success of rhetorical efforts to reify a blend like sowing a faith
seed will depend in a complex way on the character of their appeal to social roles
and previously established cultural practices. While conceptual integration does
indeed account for the mental operations necessary to incite action, these examples suggest that the roots of action extend beyond the individual’s nervous system
as conceptual blends are intimately intertwined with human doings.
Acknowledgments
Seana Coulson was supported by National Research Service Award DC00355.
Thanks also to Mark Turner and Cyma Van Petten for comments on an earlier
draft of this chapter.
References
Aristotle (1994). On Rhetoric: A Theory of Civil Discourse. Book III. (Translated by George
Kennedy). Oxford and New York: Oxford University Press.
Coulson, Seana (2001). Semantic Leaps: Frame-shifting and Conceptual Blending in Meaning
Construction. Cambridge and New York: Cambridge University Press.
Coulson, Seana (2003). Reasoning and rhetoric: Conceptual blending in political and religious
rhetoric. In Elzbieta Oleksy & Barbara Lewandowska-Tomaszczyk (Eds.), Research and
Scholarship in Integration Processes (pp. 59–88). Lodz, Poland: Lodz University Press.
Coulson, Seana & Todd Oakley (2005). Blending and Coded Meaning: Literal and Figurative
Meaning in Cognitive Semantics. Journal of Pragmatics, 37, 1510–1511.
Fauconnier, Gilles (1994). Mental Spaces. Cambridge and New York: Cambridge University
Press.
Fauconnier, G. & Mark Turner (1998). Conceptual Integration Networks. Cognitive Science, 22,
133–187.
Fauconnier, G. & Mark Turner (2002). The Way We Think. New York: Basic Books.
Lakoff, G. & Mark Johnson (1980). Metaphors We Live By. Chicago: U. Chicago Press.
Pascual, Esther (2002). Imaginary Trialogues: Conceptual Blending and Fictive Interaction in
Criminal Courts. Utrecht, Netherlands: LOT.
Perelman, C. & L. Olbrechts-Tyteca (1969). The New Rhetoric: A Treatise on Argumentation.
Notre Dame & London: University of Notre Dame Press.
Sweetser, Eve (1990). From Etymology to Pragmatics. Cambridge: Cambridge U. Press.
Sweetser, Eve (1996). Spaces, Worlds, and Grammar. In Gilles Fauconnier & Eve Sweetser (Eds.,
pp. 318–333). Cambridge and New York: Cambridge University Press.
Sweetser, Eve (1998). Performativity and blended spaces. Paper presented at the 4th conference
on Conceptual Structure, Discourse, and Language, Atlanta, GA.
Sweetser, Eve (2000). Blended Spaces and Performativity. Cognitive Linguistics, 3(4), 305–334.
Turner, Mark (2002). The Cognitive Study of Art, Language and Literature. Poetics Today, 23,
9–20.

chapter 
Depicting fictive motion in drawings*
Teenie Matlock
Stanford University
This chapter examines fictive motion sentences such as The road goes along the
coast. These constructions, which contain a motion verb but describe no motion,
have been argued to involve dynamic construal, whereby motion or scanning
occurs along a path or linear object (Langacker 1986; Talmy 1983, 1996). Three
experimental studies tested this idea with novel drawing tasks aimed at the
underlying conceptual structure of these constructions, especially the trajector
(e.g., road). Participants drew pictures to demonstrate their understanding of
fictive motion sentences. They drew longer trajectors when conceptualizing
fictive motion sentences versus comparable non-fictive motion sentences (e.g.,
The road is next to the coast), and longer trajectors when conceptualizing fictive
motion sentences with fast verbs (e.g., race) versus slow verbs (e.g., creep).
Together, the results suggest that fictive motion sentences include dynamic
construal as mentally simulated motion or linear extension.
Keywords: fictive motion, spatial language, motion verbs, psycholinguistics,
mental imagery
.
Introduction
Motion verbs are pervasive. Found in all languages and all levels of discourse
(Miller 1972; Miller & Johnson-Laird 1976), they are highly polysemous, affording
a range of interpretations and occurring in a wide variety of grammatical constructions. When interpreted literally, motion verbs express movement along a
trajectory, as in Bob goes down the walkway and The stray cat runs across the alley.
In such cases, the subject noun phrase referent (e.g., Bob) is animate and capable
of traveling through space. When interpreted figuratively, motion verbs often express no physically perceivable movement, as in Weekends go by fast and The tone
went from morose to ecstatic. In these cases, motion information metaphorically
maps on to relatively abstract conceptual domains, such as change and time, and
spatial information is transformed or backgrounded (see Boroditsky 2000; Lakoff
1987; Lakoff & Johnson 1980, 1999; Radden 1997, for discussion of change is
motion, time is space, and related conceptual metaphors).
Another pervasive figurative motion verb use is shown in (1a) and (1b). It
too describes a static scene, but in this case, spatial information is highlighted,
especially spatial information relating to the trajector (here, subject noun phrase).
(1) a. The road goes along the coast
b. A lake runs between the golf course and the train tracks
In (1a), the trajector (road) is close to and parallel with a landmark (coastline). In
(1b), it extends between two landmarks (golf course and train tracks). In both, the
trajector is linear, occupying a relatively long space.
Though the construction shown in (1a) and (1b) is ubiquitous in everyday
language and has received considerable attention in cognitive linguistics, its conceptual structure is not yet well understood. The goal of this chapter is to gain
a better understanding of the representation underlying these figurative uses of
motion verbs. First, I provide an overview of relevant cognitive linguistic research,
including discussion of fictive motion. Then, I discuss the results of three novel
drawing tasks designed to investigate the way these constructions are conceptualized and in turn externally represented.
A commonly held assumption among cognitive linguists is that some linguistic
forms and constructions tacitly include fictive motion, mentally simulated motion
that transpires from one part of a scene to another (see Talmy 1996, 2000).1 On
this view, upon hearing a spatial description such as The road goes along the coast
the listener “moves” along some portion of a road, and upon hearing a sentence
such as The lake runs between the golf course and the train tracks the listener “scans”
a lake. Fictive motion is thought to be analogous in some respects to real motion in
that it takes time to “go” from one imagined point in space and time to another. It
is also believed to provide language users a way to compute information about the
layout of the scene, especially the configuration of the trajector and its position
relative to other entities (Matsumoto 1996). For instance, A table runs along the
wall immediately signals that a table is adjacent to the wall and not simply in the
proximity of the wall. Fictive motion is also thought to be subjectively experienced
in that the language user enacts “motion” in the absence of an explicitly coded
animate agent (see Langacker 1986).
Fictive motion is not limited to constructions with motion verbs. It is present
in a broad range of spatial expressions, including sentences such as There’s a cottage
every now and then in the woods, evoking “movement” along a line of cottages (see
Talmy 2000), or Ed is across the room from John, which involves “scanning” from Ed
to John. Fictive motion is subsumed under virtual motion, which covers a broad
range of dynamic construal, including temporal scanning, such as the “replay” of
events in the historical present (see Langacker 1999).
Linguistic observations provide some insights into the conceptual structure of
fictive motion sentences (also referred to as FM sentences) such as (1a) and (1b).
One observation concerns tense and aspect. FM sentences often appear in the simple present tense, as shown in (1a), but not in the progressive, as exemplified in
??The road is running along the coast. Because FM sentences “already” express an
on-going situation with an implicit state change (scanning from one point on the
road to another), there is no need to make them more “on-going” by imposing
progressive aspect (see Langacker 1987, 2000). (Note that this utterance would be
fine with sufficient context. For instance, Person A asks Person B about the status
of a new road, and Person B, who works on the road crew, responds with The road
is running along the coast, highlighting the evolving, changing state of the road.) A
second observation is that temporal modifiers often occur with FM sentences, as
in The road goes along the coast for two hours. The same phrase could also indicate
how long it took to actually move along the coast, as in Bob drove along the coast
for two hours. A third observation is that directional phrases often occur with FM
sentences, as in The road goes north or The road goes left. The same phrases describe
direction of actual movement, as in The train goes north or The taxi turned left.
Such linguistic observations are informative and useful, but conducting experiments can lead to deeper insights into language representation, comprehension,
and use (see Gibbs 1991). Doing on-line experiments is one way to investigate
conceptual structure, including that of FM sentences. In one project I did a series
of decision-time experiments that tested how long it took participants to read and
make decisions about FM sentences in a variety of contexts (Matlock 2004). The
rationale was that if people simulate motion or visual scanning while attempting to
understand fictive motion language, it should be possible to manipulate that simulation by varying contextual information about motion, for instance, placing an
FM sentence in the context of a story about fast motion versus slow motion. Overall, participants were quicker to process FM sentences after reading stories about
fast versus slow travel, short-distance versus long-distance travel, and easy
versus difficult terrain. Together, the results suggested that understanding an FM sentence required participants to tap into information about the actual
motion they had read about and imagined while reading the story (for supporting
arguments, see Barsalou 1999 and Glenberg 1999). Critically, control experiments
showed that participants were no faster or slower when reading comparable spatial
descriptions that did not include fictive motion, for instance, The road is next to
the coast.
Doing experiments with drawings is another way to investigate conceptual
structure. Drawings are external representations of people’s conceptions of the
world, and they provide insights into how they conceptualize objects, states, and
actions (Tversky 1999, 2001). They can also reveal aspects of conceptual understanding that may otherwise be impossible to express in words alone. This is
evident in advertisements that use pictorial metaphor (see Forceville 1997). It is
also seen in the way illustrators draw lines trailing behind a figure or an elongated
figure to depict motion (McCloud 1993; Tversky 1999), and in the way people
use lines and arrows to specify direction and other motion information in maps
(Tversky & Lee 1998, 1999). Inferring motion from lines is so natural that even
blind individuals “see” motion in raised curved lines and draw lines to indicate
motion, for instance, lines emanating from a person (Kennedy 1997).
In what follows, I discuss three drawing studies designed to get at the conceptual structure of fictive motion sentences. If mental simulation of movement
or scanning is part of the conceptual structure of sentences with fictive motion,
then that information may be observable in the way people externally represent
salient spatial elements described by FM sentences. In particular, they may spatially extend or elongate trajectors in spatial depictions. If so, we might expect a
long narrow rectangle to represent a carpet (trajector) in the FM spatial description The carpet runs between the wall and the counter, but not necessarily in the
comparable non-FM (non-fictive motion) spatial description The carpet is between
the wall and the counter. In all three studies, participants read a sentence that described a spatial scene, and drew an image to represent their understanding of that
sentence. In Study 1, they generated depictions of FM sentences, such as The pond
runs between the barn and the corral, and non-FM sentences, such as The pond is between the barn and the corral – sentences judged as having similar meanings and as
having trajectors that may or may not be long in the world (e.g., pond). In Study
2, participants drew pictures of sentences such as The trail goes along the road and
The trail is next to the road – sentences with inherently long trajectors. In Study 3,
participants drew arrows to represent traversable trajectors in FM sentences that
featured fast, neutral, or slow manner verbs (e.g., race, go, crawl), for instance, The
frontage races through the countryside and The road crawls from one vista point to
another.
Study 1
The goal of Study 1 was to examine how people would depict sentences that did
and did not include fictive motion. Of interest was how trajectors that may or may
not be construed as long would be drawn. Would they be longer in depictions of
FM sentences than in depictions of non-FM sentences? If the trajector (hereafter,
TR) is generally longer in depictions of FM sentences than in depictions of non-FM sentences, it could suggest differences in conceptual structure due to motion
simulation or some kind of elongation or linear extension.
Method
Participants
Fourteen UCSC undergraduates participated for credit in a psychology course. All
were native speakers of English or learned the language before the age of 7.
Stimuli and design
Stimuli included 128 English sentences. Each sentence described a spatial scene
that was (a) outdoors (e.g., farm), (b) indoors (e.g., classroom), or (c) on the human body (e.g., leg). Primary stimuli included 32 sentence-pairs. Sentences in each
pair were nearly identical. The FM sentence featured the motion verb run, and the
non-FM sentence featured the copula verb be. In addition, half the pairs featured
the prepositional phrase between X and Y (both FM and non-FM) (e.g., A birthmark runs between her ankle and knee, A birthmark is between her ankle and knee),
and the other half featured along X (FM) and next to X (non-FM) (e.g., The tattoo runs along his spine, The tattoo is next to his spine). The sentences in each pair
varied only minimally to lessen the influence of other factors.2 Sample stimuli are
shown in Appendix 1.
All sentences had subject noun phrases that referred to objects of variable
length in the real world. For instance, an object such as a table may or may not
be long (e.g., small round coffee table or long rectangular dining room table). A
norming study before the experiment ensured that experimental sentences would
include only trajectors that were conceptually “flexible” in length. Twelve UCSC
undergraduates rated 195 concrete (tangible, visible) nouns on how long they
were. To make their judgments, participants used a scale of 1 to 7, in which “1”
was “never long”, and “7” was “always long”. The list included a wide range of
items, including lake, tattoo, parking lot, and blackboard. In the end, only the items
with mean ratings in the middle range (3 to 5) were selected as TR’s for sentential stimuli
in the experiment. This was important for determining whether TR’s that
are neutral to length would be linearly extended when they appeared in depictions
of FM sentences.
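The selection step in this norming study can be sketched in Python as follows. This is only an illustration under stated assumptions: the ratings are invented, and the names select_flexible_nouns and example_ratings are hypothetical. Only the 1-to-7 scale and the 3-to-5 retention window come from the procedure described above.

```python
from statistics import mean

def select_flexible_nouns(ratings, low=3.0, high=5.0):
    """Return the nouns whose mean length rating falls within [low, high]."""
    return sorted(
        noun for noun, scores in ratings.items()
        if low <= mean(scores) <= high
    )

# Invented ratings (1 = "never long", 7 = "always long"); not the study's data.
example_ratings = {
    "lake": [3, 4, 5, 4],
    "tattoo": [3, 3, 4, 4],
    "highway": [7, 7, 6, 7],
    "coin": [1, 1, 2, 1],
}

flexible = select_flexible_nouns(example_ratings)
print(flexible)
```

Nouns rated as nearly always long or nearly never long fall outside the window and are dropped, leaving only items whose length is open to construal.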
Prior to the experiment it was also important to establish that the two types of
stimuli – FM sentences and non-FM sentences – would be as semantically similar
as possible. In a separate norming study, 10 UCSC undergraduates rated sentences
in every pair on semantic equivalence. Participants were told to think about the
meaning of each sentence in a pair and provide a similarity rating. Using a scale
where “1” indicated “not at all the same meaning” and “7”, “the same meaning”,
participants rated 50 pairs, including items such as A birthmark runs between her
ankle and knee and A birthmark is between her ankle and knee. Only the pairs with
mean ratings of 5 or higher were retained as stimuli for the experiment. Finally,
it was important to ensure that all sentences in the experiment were semantically sensible. Using a scale of 1 to 7, in which “1” was “makes no sense” and
“7” was “makes perfect sense,” 15 UCSC undergraduates judged all sentences
on semantic sensibility. All sentences in this study had mean ratings of 5 or higher.
The stimuli also included 32 filler pairs of spatial sentences, such as The rocking
chair sits on the back porch and The rocking chair is on the back porch. All sentence
pairs, including fillers, were put into two lists so no participant would see both
sentences in a pair. One contained 16 FM sentences, 16 non-FM sentences, and 32
filler sentences, and the other, the remaining 16 FM sentences, 16 non-FM sentences, and 32 fillers. Sentences in each list were randomly ordered and put in a
booklet. In both booklets, each sentence appeared at the top of an otherwise blank,
vertically oriented 8.5 by 11 inch page.
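The two-list design described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to assemble the booklets: the function name build_lists, the alternating assignment rule, the fixed random seed, and the placeholder sentence labels are all assumptions introduced here.

```python
import random

def build_lists(critical_pairs, filler_pairs, seed=0):
    """Split pairs across two lists so no list holds both members of a pair."""
    list_a, list_b = [], []
    for i, (first, second) in enumerate(critical_pairs + filler_pairs):
        # Alternate which member goes to which list, so that each list
        # receives half of the FM and half of the non-FM sentences.
        if i % 2 == 0:
            list_a.append(first)
            list_b.append(second)
        else:
            list_a.append(second)
            list_b.append(first)
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    rng.shuffle(list_a)
    rng.shuffle(list_b)
    return list_a, list_b

# Placeholder labels standing in for the 32 critical and 32 filler pairs.
critical = [(f"FM-{i}", f"nonFM-{i}") for i in range(32)]
fillers = [(f"fillerA-{i}", f"fillerB-{i}") for i in range(32)]

list_a, list_b = build_lists(critical, fillers)
print(len(list_a), len(list_b))
```

Under these assumptions each list holds 16 FM sentences, 16 non-FM sentences, and 32 fillers, and a participant who receives one list never sees both versions of the same scene.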
Procedure
After filling out a survey about language background and visual impairments, each
participant was given a booklet and instructed to (1) read each sentence carefully,
(2) imagine what it meant, and (3) quickly sketch the image below the sentence.
The participant was told not to be overly concerned with detail because no sketch
would be analyzed on artistic merit.
Results and discussion
Only the drawings for the non-filler sentences were analyzed. Length scores were
calculated by first measuring the length and width of every TR (e.g., birthmark) in
centimeters, and then dividing length by width. (Two coders, who were blind to
the study, measured the scores here and in Study 2 and agreed 92 percent of the
time.) The length scores were averaged across all drawings for FM sentences and
non-FM sentences. Overall, TR’s were longer in depictions of FM sentences (M =
2.73) than in depictions of non-FM sentences (M = 1.84), t (12) = 4.91, p < .001.
See Appendix 2 for examples of drawings.
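The scoring and comparison can be sketched as follows. The length score (length divided by width) follows the description above; the paired t statistic over per-participant means is an assumption about how the comparison was set up, since the computational details are not given here. The function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def length_score(length_cm, width_cm):
    """Length-to-width ratio of a drawn trajector."""
    return length_cm / width_cm

def paired_t(xs, ys):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Invented per-participant mean length scores; not the study's data.
fm_means = [2.0, 3.0, 2.5, 3.5]
non_fm_means = [1.5, 2.0, 2.0, 2.5]

t_value, df = paired_t(fm_means, non_fm_means)
print(length_score(6.0, 2.0), round(t_value, 3), df)
```

Because the score is a ratio, a TR drawn 6 cm long and 2 cm wide receives the same score as one drawn 3 cm long and 1 cm wide, so the measure reflects elongation rather than overall drawing size.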
To see whether the overall difference in TR length was primarily driven by any
one sentence type, two additional t-tests were run. One compared only the FM
and non-FM sentences with the preposition between, yielding a reliable difference,
t (12) = 3.05, p < .01 (FM = 2.44, non-FM = 1.94). The other compared only the
FM and non-FM sentences with the preposition along/next to, showing a reliable
difference, t (12) = 5.10, p < .001 (FM = 2.99, non-FM = 1.75). Thus, the difference
in TR length was not driven by differences in prepositions.
The results suggest the TR is conceptualized differently for FM sentences than
it is for non-FM sentences, even though the two types of sentences are judged to be
highly similar in semantic content. One possibility for greater TR length in depictions of FM sentences is that people naturally simulate motion or tap into motion
information when processing fictive motion language. If so, this could encourage
them to conceptually elongate the TR – either through scanning along it and later
representing it in static form, or through spatially extending it and building it up
over time. Another possibility, however, is that the mere presence of the motion
verb in the FM sentence led to differences in TR length.
Study 2
The second study further investigated the conceptual structure of fictive motion
using the drawing task from Study 1. Here participants were given only FM
and non-FM sentences that contained inherently long TR’s (e.g., road in A road
goes along a mountain range and A road is next to a mountain range). Of interest again was how the two types of sentences would be depicted in drawings.
Specifically, would inherently long TR’s be even longer in depictions of sentences
with fictive motion?
Method
Participants
Nineteen UCSC undergraduates participated for credit in a psychology course. All
were native speakers of English or learned the language before 7 years of age.
Stimuli and design
Primary stimuli included 16 pairs of sentences that described outdoor settings.
Each pair contained an FM sentence and a non-FM sentence. The FM sentence
featured a motion verb (go, run), and the non-FM sentence featured a copula verb
(be). The FM sentence also included the preposition along, as in A road goes along
a mountain range, and the non-FM sentence included the prepositional phrase
next to, as in A road is next to a mountain range.3 Sample stimuli are shown in
Appendix 1.
A norming study ensured that all FM and non-FM sentences were highly semantically similar. Using a scale where “1” indicated “not at all the same” and “7” indicated “the
same”, 10 UCSC undergraduates rated sentence pairs such as A sidewalk goes along
a canal and A sidewalk is next to a canal. In the end, only highly similar pairs (mean
rating of 5 or higher) were used in the study. Those same sentences had also been
rated as semantically sensible (mean rating of 5 or higher) by 21 UCSC undergraduates. They also included TR’s judged as relatively long (mean rating of 5 or
higher) in the norming study mentioned in Study 1.
All pairs of sentences, including the 16 filler pairs, were put into two booklets.
One set contained 16 FM sentences, 16 non-FM sentences, and 32 filler sentences,
and the other, the remaining 16 FM sentences, 16 non-FM sentences, and 32 filler
sentences. Sentences in both booklets were randomly ordered and each sentence
appeared at the top of an otherwise blank horizontal 8.5 by 11 inch page.
Procedure
Each participant followed the same procedure used in Study 1.
Results and discussion
Only the depictions of non-filler items were coded and analyzed. Length scores
were measured using the method in Study 1. Overall, people drew longer TR’s
when drawing FM sentences (M = 10.13) than when drawing non-FM sentences
(M = 6.79), t (18) = 3.51, p < .01. See Appendix 2 for examples.
The results, consistent with those of Study 1, show differences in the way
people conceptualized the TR in understanding and drawing FM and non-FM
sentences. One explanation for longer TR’s in depictions of FM sentences is that
people simulated motion or tapped into conceptual structure about actual motion
in making sense of the sentence and forming a mental image. If so, this may have
led them to conceptually elongate the TR and draw a longer object in the picture.
Another possibility is that the motion verb alone led to longer TR’s.
Study 3
The third study further investigated the conceptual structure of fictive motion. In
this case, a slightly different task was used, one with more attention on the trajector
and one that used only FM sentences. Participants were given FM sentences with
manner verbs that expressed varying rates of speed in their literal uses, such as race
(fast), creep (slow), and go (neutral). For each sentence, participants drew an arrow
to represent the TR (e.g., road in The road jets from one vista point to another). Of
interest was whether manner of movement alone would lead to differences in how
arrows were drawn, especially length, thickness, and crookedness. If FM sentences
include motion as part of their conceptual structure, and if this is reflected in a
spatial depiction, we would expect TR’s to be longer, thinner, and less crooked for
FM sentences with fast motion verbs, even though nothing is actually moving in
the description.
Method
Participants
Sixteen UCSC undergraduates participated for credit in a psychology course. All
were native speakers of English or had learned the language before 7 years of age.
Stimuli and design
The stimuli included 24 FM sentences and 48 fillers that described spatial scenes.
Every FM sentence featured an underlined TR that represented a travel route (e.g.,
road, highway) and a motion verb that expressed (in its literal interpretation) a fast,
slow, or neutral travel rate. The 6 slow verbs were jog, crawl, creep, plod, meander,
and ramble. The 4 fast verbs (some used twice) were jet, fly, race, and speed, and
the neutral verb was go. Verbs were categorized on rate of speed determined by a
survey in which 18 UCSC undergraduates rated 45 action words (e.g., slide, run,
race, creep, jump) on how fast they imagined doing the action and how long the
actions took (see Matlock 2001). See Appendix 1 for stimuli.
All sentences were randomly ordered and put into a booklet. Under every sentence there was a space for drawing the arrow. The space was 2 inches high and 8.5
inches wide.
Procedure
Every participant was instructed to (1) read each sentence, (2) focus on the underlined word in the sentence, (3) quickly draw an arrow to represent it, and (4)
not erase.
Results and discussion
Three research assistants who were blind to the experimental manipulation rated
all arrows on length, crookedness, and thickness. A high degree of inter-rater reliability was obtained (95 to 98 percent). (All p-values are < .05 unless specified
otherwise.)
Length
To calibrate themselves, the coders first examined all arrows produced by a single
individual. Then they rated every arrow on how it compared to all others drawn
by that individual. A length rating of “1” specified “very short”, and a rating of
“7” specified “very long”. All scores were then averaged according to the rate of
speed expressed by the verb (fast, neutral, slow). The mean length rating for fast
verbs (FV) was 4.95, for slow verbs (SV), 4.07, and for the neutral verb (NV),
3.99. A within-subjects analysis of variance showed a main effect for verb, F(2,45) =
11.1, p < .001, suggesting that manner influenced arrow length. Closer inspection
showed a reliable difference between FV and SV, t (30) = 4.26, and between FV and
NV, t (30) = 4.04, but not between NV and SV.
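For readers unfamiliar with within-subjects designs, the logic of such an analysis can be sketched with a textbook one-way repeated-measures decomposition. This is an illustration only: the chapter does not give computational details, so the function below is not a reconstruction of the analysis actually run, and its degrees of freedom need not match those reported. The function name rm_anova and the ratings are hypothetical.

```python
def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: one row per participant, one column per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(scores)       # participants
    k = len(scores[0])    # conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Variability left after removing condition and participant effects.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error

# Invented mean length ratings per participant (columns: fast, neutral, slow).
example_scores = [
    [5, 4, 4],
    [6, 4, 5],
    [5, 3, 4],
    [6, 5, 4],
]

f_value, df_cond, df_error = rm_anova(example_scores)
print(round(f_value, 2), df_cond, df_error)
```

Removing each participant's overall tendency to draw long or short arrows before testing the verb effect is what makes the within-subjects comparison sensitive to differences between verb types.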
Crookedness
Coders surveyed all arrows for a single participant, and later rated every arrow on
how crooked it was compared to all other arrows drawn by that individual. A rating of “1” meant “not at all crooked”, and a rating of “7” meant “very crooked”.
Average crookedness scores were 1.59 for FV, 2.37 for SV, and 2.63 for NV. A within-subjects ANOVA then revealed a main effect of verb, F(2,45) = 9.51,
p < .001, indicating that arrow crookedness was affected by the information expressed by the verb.4 Closer inspection yielded a reliable difference between FV
and SV, t (30) = 2.88, and between FV and NV, t (30) = 4.8, but not between NV and SV.
Thickness
Coders first examined all arrows per individual. Then they obtained a thickness
score for every arrow by comparing it to all other arrows drawn by that individual.
A rating of “1” meant “not at all thick”, and “7” meant “very thick”. The average
ratings were 1.04 for FV, 1.2 for NV, and 1.41 for SV. A within-subjects ANOVA
showed a main effect for verb, F(2,45) = 5.65, indicating that manner information
in the verb influenced arrow thickness. Closer analysis showed a reliable difference
between FV and SV, t (30) = 2.67, and between NV and SV, t (30) = 2.19, but no
difference was observed between FV and NV.
Together, the results show that arrows that depict TR’s in FM sentences with
fast motion verbs (e.g., race) are longer, thinner, and less crooked than arrows that
depict TR’s in FM sentences with slow motion verbs (e.g., creep). One
possibility is that people mentally simulated motion or tapped into motion information when thinking about and forming an image of fictive motion sentences.
This would mean that fast verbs caused people to simulate movement quickly and
slow verbs caused people to simulate movement slowly. If so, these conceptual differences could have led to differences in how drawings were executed, for instance,
slower pen stroke and shorter arrow for slow manner verbs. Another possibility
is that nothing more than the type of manner specified by the motion verb
drove the results.
General discussion
Three studies investigated the comprehension of sentences such as The road runs
along the coast, believed by cognitive linguists to evoke mentally simulated traversal
or scanning. Study 1 and Study 2 used free-style drawing tasks to investigate how
trajectors would be drawn in depictions of FM sentences and depictions of non-FM sentences. The results revealed that depictions of trajectors were longer for FM
sentences than for non-FM sentences even though the sentences were judged as being similar in meaning. Study 3 used a drawing task to investigate how trajectors
would be depicted by arrows in FM sentences. Of interest was whether manner information (slow, fast, or neutral verb) would influence the way arrows were drawn.
The results showed that arrows were longer, thinner, and straighter with fast verbs
than with slow verbs.
The results of the studies reported here lend support to cognitive linguists’ claims
about fictive motion and its role in the understanding of FM sentences. As shown
in Study 1, objects that are not necessarily long, such as birthmarks, are longer in
depictions of FM sentences, such as The birthmark runs between her knee and her
ankle, than in depictions of non-FM sentences, such as The birthmark is between
her knee and her ankle. Because drawings reflect people’s conceptions about space
(Tversky 1999), it is not unreasonable to assume that longer trajectors in depictions of FM sentences are the end result of (a greater degree of) simulated motion
or scanning. The thinking is that conceptually elongating or scanning along a linear entity takes time and that in a static depiction, time maps onto space. The same
explanation applies to the results of Study 2. In that case, trajectors that were already long (e.g., road) became longer in depictions of FM sentences than they were
in depictions of non-FM sentences.
Study 3 offers further support, as depictions of trajectors were longest with
fast verbs and shortest with slow or neutral verbs, suggesting that the speed of the
verb interacts with and structures the construal of the noun phrase. One explanation is that the semantic velocity expressed by the verb mapped onto the velocity
of the hand during drawing. Support for this comes from recent work on haptic
perception and visual memory. Kerzel (2001), for instance, found a connection
between hand speed and perceived velocity of moving objects. Participants in his
study first watched a fast- or slow-moving visual stimulus. After that, they moved
their hands either slowly or quickly (as per verbal or non-verbal instruction). Next
they were asked to specify how quickly or slowly the visual stimulus moved. The
results, that participants’ velocity of hand movement influenced their retention of
visual velocity, suggested that visual perception and somatosensory perception are
tightly coupled. Thus, based on Kerzel’s findings, it is reasonable to entertain the
idea that in drawing a sentence such as The road jets from one vista point to another,
participants in the studies presented here mapped verb velocity onto hand velocity, that is, they moved their hands faster when drawing trajectors associated with fast verbs.
Future research that measures velocity of hand movements could be informative.
The idea that figurative uses of motion verbs include mental simulation may
seem odd to language theorists who do not appeal to dynamic representations.
However, scores of psychological studies have shown that mental imagery figures
into all sorts of reasoning and problem solving. For instance, people are able to
generate and mentally rotate three-dimensional images (e.g., Cooper & Shepard
1984; Shepard & Metzler 1971). What’s more, people are able to imagine moving
through an imagined environment and to shift position in the environment with
non-visual input (see Denis 1996; Denis & Cocude 1989; Kosslyn 1994). People
are so good at imagining motion that the time taken to mentally “move” across
an imaginary region mirrors the times one would expect from actual movement
through an actual region in space (see Kosslyn, Ball, & Reiser 1978). Thus, it is
plausible that people mentally simulate motion or scanning along a trajector when
understanding FM sentences. For instance, it is not unreasonable to assume that
in Study 1, participants elongated items such as lake in drawings because they
mentally scanned the lake during the processing of sentences such as A lake runs
between the golf course and the train tracks.
What is most intriguing about the results reported in this chapter is that none
of the stimuli conveyed actual motion through physical space. In all three studies,
only figurative interpretations of motion verbs were available. If the sentences had
expressed explicit motion through physical space, the results would be less interesting. For instance, if Study 3 had used literal uses of motion verbs, we would
expect long arrows for a sentence such as John races through the park and short
arrows for John crawls through the park. That differences arise even though there is
no physical motion conveyed in the figurative uses of motion verbs provides compelling evidence to support cognitive linguists’ claims that FM sentences involve
simulation or scanning of movement along a trajectory.
These results challenge standard psycholinguistic accounts of how words are
represented and processed. Regardless of how motion or scanning was simulated
while people did the task, there was a strong interdependence of verb and subject noun phrase. In every study, the depiction of the subject noun phrase varied
according to a difference in the verb: a motion verb or copula verb in Studies 1
and 2, and a slow verb or fast verb in Study 3. That the same noun phrase was
depicted differently lends support to the idea that lexical meaning is emergent and
interactive (e.g., Elman, Bates, Johnson, Karmiloff-Smith, Parisi, & Plunkett 1996;
MacWhinney 1999; Tomasello 1998). The results are also problematic for the view
that comprehending polysemous verbs involves a dictionary look-up (for discussion, see Gibbs & Matlock 1999). That would not explain how a verb such as go
would influence the way another constituent in the sentence was depicted in the
end. The results also call into question approaches that assume a hard and fast
distinction between figurative and literal language (for discussion, see Coulson &
Matlock 2001; Gibbs 1994). In some respects, the meaning evoked with fictive motion language is not unlike that of actual motion, even though nothing is described
as moving. This is especially clear in Study 3 (e.g., long arrow with fast verb).
The possibility that simulated motion figures into the use and understanding of language, including sentences such as The road runs along the coast,
is not all that mysterious. Thinking about motion and space during language
comprehension is natural, and involves tapping into and assimilating knowledge
acquired from direct embodied experience and interaction with the world (Clark
1973; Lakoff 1987; Glenberg 1999). Understanding FM sentences involves knowing
things like how long movement generally takes and knowing that it occurs along a
trajector “contained” by a spatial region (Matlock 2004). Much of this knowledge
is probably tacit and structured by basic image schemata, such as source-path-goal and container (see Gibbs & Colston 1995; Johnson 1987; Mandler 1992,
1996), although some of it may be conscious, for instance, remembering the lake
you used to swim in or your local golf course upon hearing The lake runs between
the golf course and the train tracks.
The precise mechanisms underlying fictive motion need to be mapped out before we can fully understand how people process figurative uses of motion verbs
in sentences such as The road runs along the coast. But for now, we can say that figurative uses of motion verbs appear to evoke conceptual structure that is dynamic
and reflective of the way we perceive and enact motion in the world.
Acknowledgments
Thanks to Frank Brisard, Ravid Aisenmann, Raymond Gibbs, Jr., and an anonymous reviewer for comments on an early draft. Thanks also to Herbert Clark,
Rachel Giora, Art Glenberg, Paul Lee, Leonard Talmy, and Barbara Tversky for
sharing insights related to this work, and to Nicole Albert, Jeremy Elman, Kat
Firme, Sydney Gould, Krysta Hays, and John Nolte for collecting and coding data.
Notes
* Some of the work in this paper was presented at RAAM-4 (Researching and Applying Metaphor),
Tunis, Tunisia, April, 2001. All correspondence concerning this article should be sent to Teenie
Matlock, Social & Cognitive Sciences, University of California, Merced, CA 95344. Email:
[email protected]
1. Talmy (1983) originally used the term virtual motion to refer to this phenomenon. Fictive
motion is akin to Langacker’s (1986) abstract motion and Matsumoto’s (1996) subjective motion.
Here I address only one type of fictive motion, Talmy’s (2000) co-extension path fictive motion.
2. Along could not be used for both FM and non-FM sentences because it could have resulted
in a few semantically odd non-FM sentences, for instance, ?The city park is along the financial
district.
3. See Note 2.
4. This result is statistically reliable even when the verbs meander and ramble (inherently
crooked or curved) are excluded from the analysis.
References
Barsalou, Lawrence W. (1999). Language comprehension: Archival memory or preparation for
situated action? Discourse Processes, 28, 61–80.
Boroditsky, Lera (2000). Metaphoric structuring: Understanding time through spatial
metaphors. Cognition, 75, 1–28.
Clark, Herbert H. (1973). Space, time, semantics, and the child. In T. E. Moore (Ed.), Cognitive
development and the acquisition of language. San Diego: Academic Press.
Cooper, Lynn A. & Roger N. Shepard (1984). Turning something over in the mind. Scientific
American, 251, 106–114.
Coulson, Seana & Teenie Matlock (2001). Metaphor and the space structuring model. Metaphor
and Symbol, 16, 295–316.
Denis, Michel (1996). Imagery and the description of spatial configurations. In M. de Vega,
M. J. Intons-Peterson, P. N. Johnson-Laird, M. Denis, & M. Marschark (Eds.), Models of
visuospatial cognition (pp. 128–197). New York, NY: Oxford University Press.
Denis, Michel & Marguerite Cocude (1989). Scanning visual images generated from verbal
descriptions. European Journal of Cognitive Psychology, 1, 293–307.
Elman, Jeffrey L., Elizabeth A. Bates, Mark H. Johnson, Annette Karmiloff-Smith, Domenico
Parisi, & Kim Plunkett (1996). Rethinking innateness. A cognitive perspective on development.
Cambridge, MA: MIT Press.
Forceville, Charles (1997). Pictorial metaphor in advertising. London: Routledge.
Gibbs, Raymond W. Jr. (1991). What’s cognitive about cognitive linguistics? In Eugene Casad
(Ed.), Cognitive linguistics in the redwoods: The expansion of a new paradigm in linguistics
(pp. 27–53). The Hague: Mouton.
Gibbs, Raymond W. Jr. (1994). Figurative thought and figurative language. In Morton A.
Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 411–446). San Diego, CA: Academic
Press.
Gibbs, Raymond W. Jr. & Herbert Colston (1995). The cognitive psychological reality of image
schemas and their transformations. Cognitive Linguistics, 6, 347–378.
Gibbs, Raymond W. & Teenie Matlock (1999). Psycholinguistics and mental representations.
Cognitive Linguistics, 10, 263–269.
Glenberg, Arthur M. (1997). What memory is for. Behavioral and Brain Sciences, 20, 1–55.
Glenberg, Arthur M. (1999). Why mental models must be embodied. In Gert Rickheit &
Christopher Habel (Eds.), Mental models in discourse processing and reasoning. New York,
NY: North-Holland.
Johnson, Mark (1987). The body in the mind: The bodily basis of meaning. Chicago, IL:
University of Chicago Press.
Kennedy, John M. (1997). How the blind draw. Scientific American, 276, 60–65.
Kerzel, Dirk (2001). Visual short-term memory is influenced by haptic perception. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 27, 1101–1109.
Kosslyn, Stephen M. (1994). Image and brain. The resolution of the imagery debate. Cambridge,
MA: MIT Press.
Kosslyn, Stephen M., T. M. Ball, & B. J. Reiser (1978). Visual images preserve metric spatial
information: Evidence from studies of image scanning. Journal of Experimental Psychology:
Human Perception and Performance, 4, 47–60.
Lakoff, George (1987). Women, fire, and dangerous things: What categories reveal about the mind.
Chicago, IL: University of Chicago Press.
Lakoff, George & Mark Johnson (1980). Metaphors we live by. Chicago, IL: University of Chicago
Press.
Lakoff, George & Mark Johnson (1999). Philosophy in the flesh: The embodied mind and its
challenge to Western thought. New York, NY: Basic Books.
Langacker, Ronald W. (1986). Abstract motion. Proceedings of the Twelfth Annual Meeting of the
Berkeley Linguistics Society, 455–471.
Langacker, Ronald W. (1987). Foundations of cognitive grammar, Vol. 1: Theoretical Prerequisites.
Stanford, CA: Stanford University Press.
Langacker, Ronald W. (1999). Virtual reality. Studies in the Linguistic Sciences, 29, 77–103.
Langacker, Ronald W. (2000). Grammar and conceptualization. Berlin: Mouton de Gruyter.
MacWhinney, Brian (1999). The emergence of language. Mahwah, NJ: Lawrence Erlbaum.
Mandler, Jean M. (1992). How to build a baby: II. Conceptual primitives. Psychological Review,
99, 587–604.
Mandler, Jean M. (1996). Preverbal representation and language. In P. Bloom, M. A. Peterson,
L. Nadel, & M. F. Garrett (Eds.), Language and space (pp. 365–384). Cambridge, MA: MIT
Press.
Matlock, T. (2001). How real is fictive motion? Doctoral dissertation. University of California,
Santa Cruz.
Matlock, T. (2004). Fictive motion as cognitive simulation. Memory & Cognition, 32, 1389–1400.
Matsumoto, Yo (1996). Subjective motion and English and Japanese verbs. Cognitive Linguistics,
7, 183–226.
McCloud, Scott (1993). Understanding comics: The invisible art. HarperPerennial.
Miller, George A. (1972). English verbs of motion: A case study in semantics and lexical memory.
In A. W. Melton & E. Martin (Eds.), Coding processes in human memory (pp. 335–372). New
York, NY: John Wiley & Sons.
Miller, George A. & Philip N. Johnson-Laird (1976). Language and perception. Cambridge, MA:
Harvard University Press.
Nuyts, Jan & Eric Pederson (1997). Language and conceptualization. New York: Cambridge
University Press.
Radden, Gunter (1997). Time is space. In Birgit Smieja & Meike Tasch (Eds.), Human contact
through language and linguistics (pp. 147–166). Frankfurt/Main: Peter Lang.
Shepard, Roger N. & J. Metzler (1971). Mental rotation of three-dimensional objects. Science,
171, 701–703.
Talmy, Leonard (1983). How language structures space. In H. Pick & L. P. Acredolo (Eds.),
Spatial orientation: Theory, research, and application (pp. 225–282). New York: Plenum
Press.
Talmy, Leonard (1996). Fictive motion in language and “ception”. In Paul Bloom, Mary
A. Peterson, Lynn Nadel, & M. F. Garrett (Eds.), Language and space (pp. 211–276).
Cambridge, MA: MIT Press.
Talmy, Leonard (2000). Toward a Cognitive Semantics, Volume I: Conceptual Structuring Systems.
Cambridge: MIT Press.
Tomasello, Michael (1998). The new psychology of language: Cognitive and functional approaches
to language structure. Mahwah, NJ: Lawrence Erlbaum.
Tversky, Barbara (2001). Spatial schemas in depictions. In M. Gattis (Ed.), Spatial schemas and
abstract thought (pp. 79–112). Cambridge, MA: MIT Press.
Tversky, Barbara (1999). What does drawing reveal about thinking? In John S. Gero & Barbara
Tversky (Eds.), Visual and spatial reasoning in design (pp. 93–101). Sydney, Australia: Key
Centre of Design Computing and Cognition.

Tversky, Barbara & Paul U. Lee (1998). How space structures language. In Christian Freksa,
Christopher Habel, & Karl Friedrich Wender (Eds.), Spatial cognition: An interdisciplinary
approach to representation and processing of spatial knowledge (pp. 157–175). Berlin:
Springer-Verlag.
Tversky, Barbara & Paul U. Lee (1999). Pictorial and verbal tools for conveying routes.
Conference on Spatial Information Theory (COSIT ‘99). Hamburg, Germany.
Appendix 1
Experiment 1 Sample Stimuli
The military base runs between the two mountain ranges
The military base is between the two mountain ranges
A lake runs between the golf course and the train tracks
A lake is between the golf course and the train tracks
The pond runs between the barn and the corral
The pond is between the barn and the corral
The swimming pool runs between the patio and the garage
The swimming pool is between the patio and the garage
The blackboard runs between the water fountain and the door
The blackboard is between the water fountain and the door
The birthmark runs between her knee and ankle
The birthmark is between her knee and ankle
The university parking lot runs along the edge of the lagoon
The university parking lot is next to the edge of the lagoon
The city park runs along the financial district
The city park is next to the financial district
The pig pen runs along the side of the barn
The pig pen is next to the side of the barn
The lake runs along the golf course
The lake is next to the golf course
The tattoo runs along his spine
The tattoo is next to his spine
Experiment 2 Sample Stimuli
The highway runs along the coast
The highway is next to the coast
A toll road runs along the coastline
A toll road is next to the coastline
The bike path runs along the railroad tracks
The bike path is next to the railroad tracks
The trail runs along a road
The trail is next to the road
A road runs along a mountain range
A road is next to a mountain range
The trail goes along the road
The trail is next to the road
A freeway goes along the mountain range
A freeway is next to the mountain range
A frontage road goes along the freeway
A frontage road is next to the freeway
The footpath goes along the creek
The footpath is next to the creek
The sidewalk goes along the canal
The sidewalk is next to the canal
Some huts run along the edge of the lake
Some huts are next to the edge of the lake
Some trees run along the river
Some trees are near the river
Experiment 3 Sample Stimuli
Fast-manner verbs
The frontage road speeds alongside the freeway
The road jets from one vista point to another
The toll road races through the countryside
The highway races through the grasslands
The road flies through the countryside
Neutral-manner verbs
The road goes through the desert
The footpath goes through the hills
The trail goes through the valley
The street goes through farmland
The freeway goes through the forest
Slow-manner verbs
The toll road meanders through the countryside
The road crawls from one vista point to another
The highway crawls through the grasslands
The sidewalk jogs from one house to another
The road plods through the countryside

Appendix 2
Examples of drawings from Experiment 1
Figure 1. The birthmark is between her knee and her ankle (non-FM)
Figure 2. The birthmark runs between her knee and her ankle (FM)
Examples of drawings from Experiment 2
Figure 3. A road is next to a mountain range (non-FM)
Figure 4. A road runs along a mountain range (FM)

chapter 
Discourse, gesture, and mental
spaces manoeuvers
Inside versus outside F-space*
June Luchjenbroers
University of Wales, Bangor
During discourse, conversational gestures tend to occur in the physical area in
front of the speaker. That space, referred to as the ‘comfort zone’, is where
the bulk of a speaker’s gestures occur. This paper is an investigation into
the relationship between the parameters of that physical space and the mental
spaces required for discourse processing. It is argued that the boundaries of this
space (called the ‘F-space’) provide additional aspects of speaker meaning in the
form of clues about speaker cognition. The examples provided in this paper give
evidence of the conceptual mappings needed in discourse processing.
Keywords: ‘F-space’, comfort zone, iconicity, mental spaces
.
Introduction
This paper is an exploration into the dynamics of the physical, gestural space used
by speakers during discourse. This exploration furthers earlier research into how
lexical, prosodic, and gestural information may combine to provide discourse participants with the appropriate cues needed to set up and structure mental spaces
(cf. Luchjenbroers 2001, 2002, 2004). It is the aim of this paper to expand on what
has been referred to in earlier work as ‘F-space’ and how the dimensions of this
physical space may associate with the necessary navigations around and between
any number of mental spaces required during discourse.
Particular points of theoretical and observational importance are needed for
this exploration, including an overview of the relevant features of Mental Spaces
Theory, and a review of the main types of gesture. These gesture types are then
considered in terms of how they manifest inside or outside a speaker’s ‘comfort
zone’ (or ‘F-space’). The range of examples used in this exploration illustrates how
gesture, together with the physical properties of ‘F-space’, can provide discourse
participants with a potentially rich strategy for navigating conceptual space, as
well as enhancing information conveyed in the lexical component of talk.
Discourse processing theory
The philosophical approach generally embraced in the field sees discourse as a process of ‘mutual ground’ construction. This label is meant to convey that discourse
participants aim to achieve a mutual understanding of what they are talking about,
and appear to work toward that goal (cf. Grice 1975, 1978). This is the basis of the
‘cooperative discourse’ approach inherent in the work of many theorists working
in this field (e.g., Clark 1993, 1996, 1997; Chafe 1994; Lambrecht 1994; Tomlin
1987, 1997; Tomlin et al. 1997; Luchjenbroers 1993, 2000). This cooperation involves speakers giving addressees adequate cues to derive their speaker-intended
meaning, and addressees making a determined search for that meaning. How participants manage to do this is thought to be the product of ‘shared’ knowledge
(called ‘mutual’ or ‘common’ ground) – i.e., a speaker can produce the appropriate bite-sized pieces for their particular addressee(s) because they know, or believe
they know, the conceptual context in which that information will be integrated; and
similarly their addressee(s) can properly derive the speaker-intended meanings
because they too know, or believe they know, the conceptual context in which each
speaker’s contribution is being made.
All versions of ‘mutual’ or ‘common’ ground have since had to deal with logical
objections to the notion of ‘shared’ information, stemming from the cognitive fact
that each person only has access to their own conceptual processes. Consistent
with these logical objections, this research embraces the view that during discourse
each speaker actively creates and manipulates a model of discourse that is unique
to that participant’s understanding of the discourse content and their expectations
of everyone else’s (cf. Luchjenbroers ms.).
Hence, during discourse each speaker actively creates and manipulates a singular model of discourse (i.e., a representation of discourse in the speaker’s own
mind) thought to capture the information s/he thinks is ‘mutual’. Thus speakers
will construct their own model according to the expected discourse needs of their
addressee(s), and they project this model into the discourse space between interlocutors, as though it can be mutually observed and manipulated. In this sense,
the speaker’s representation of discourse is their version of a ‘public’ model of
discourse, even though it is no more public than it is mutual.
The result of this version of mutual ground (that isn’t mutual) is that each
speaker’s judgments about how to distribute semantic and functional information
in talk are not based on shared conceptual representations with their addressee(s),
as suggested by the terms ‘mutual’ or ‘common’ ground, but on the perceived
similarities and discrepancies between the content of their own ‘public’ model of
discourse and all or any information they can glean from their addressee’s linguistic and gestural outputs as well as any other opportunistic sources of information
that may present themselves. To this end the role of skill is paramount.
The structuring of each discourse contribution into comprehensible chunks
for a hearer’s benefit is thus of key importance, which directs the analyst to a more
basic level of complexity: creating and maintaining a coherent discourse structure. This involves not only locating each proposition in the speaker-intended,
pin-point context (or ‘mental space’) to be processed and appropriately understood; it also requires recognising the relationship between that mental space and
any others that may be relevant to the subject-matter being discussed. In effect,
all participants will need to create, manage, and navigate any number of mental
spaces required during discourse.
Mental Spaces Theory [‘MST’]
Mental Spaces Theory is fundamentally based on the view that linguistic form
under-specifies speaker meaning (cf. Fauconnier 1985; Fauconnier & Sweetser
1996), and that meaning construction takes place at a conceptual level. This conceptual level involves the construction of appropriate mental spaces in which to
process the propositions attributed to them. Mental spaces are like mini contexts
in which propositions are processed and can be measured as True or False. For
example, if a speaker were to say the utterance given in (1), I was in here with a girl
from Sarawak, the hearer would need to discern the proposition being conveyed
[I + a girl, together], and the mental spaces in which to process it – in this case an
undefined temporal space in the past, triggered by the use of the past tense, which
is further defined by a physical location: in here (see Figure 1).
The act of separating the spatial definition from the proposition means that
there are a number of ways in which that proposition can be measured as true
or false. For example, different aspects of the triggered mental spaces can be rejected as false, such as “it wasn’t in this room”, or “it hasn’t happened yet”. This is
quite distinct from the more traditional ways in which a speaker’s utterances are
measured as true or false, such as when aspects of the proposition are rejected –
e.g., reference failure (‘it wasn’t him’, or, ‘he didn’t meet a girl’, or, ‘she wasn’t from
Sarawak’).1 Particularly relevant to this discussion however, is that if a speaker
were to have used a different spatial definition (e.g., in the future, or in another
building), both speaker and hearer would process the same proposition, but use a
different mental space.
(1) Dennis: I was in here with a girl from Sarawak
Figure 1. Embedded spaces (space labels: PAST TIME; IN HERE; I + a girl (together); Sarawak)
Sentence meaning therefore depends on constructing the appropriate mental spaces, and within any stretch of discourse there can be several mental spaces
simultaneously active, which both speakers and hearers need to navigate during
discourse. These spaces are often interrelated and discourse participants need to
make these interconnections in cognitive space for discourse to be coherent. For
example, Figure 2 is an attempt to illustrate many of the interconnections required
in discourse, such as those required in the following sample data extract, given in
data extract (2).
Talk is about plagiarism and more specifically what actions the participants
deem appropriate. However, in the discussion of that general topic, talk involves
references to: a university handbook; a past reference to the supposed reading of
that handbook (links Past time and an expected action, to the hearer and the handbook); a hypothetical scenario, would you mark yourself down (links a hypothetical
time frame to appropriate actions and the hearer); and then two other hypothetical
scenarios: one located in the future, if I was really unsure I would (links a hypothetical time frame to the Here-&-Now, the supposed reading of the handbook,
the hearer and the handbook – not included in Figure 2); and the other located in
the past, if I were a first-year I would, which again links a hypothetical (counterfactual) time frame to the supposed reading of the handbook, the Here-&-Now,
the hearer and the handbook.
In each case, new mental spaces are created, as needed, for the purposes of discourse, and the interconnections between them and others active in talk must be
recognised for discourse to be coherent. There is also evidence that once activated,
a mental space can be reaccessed at any time during discourse, and potentially
also between discourses. For example, the two references to a boy from Hong Kong,
given in (3a) and (3b), were produced roughly 15 minutes apart. Reference to this
boy from Hong Kong (line 256) is specific and requires retrieving the referent from
an earlier mention, and yet the only earlier reference to him was in lines 22–23.2
(2) Gwen: it’s clearly defined in the hand book
Dana: yeah? .. so wha- what, do you know what it says?
Gwen: nup (laugh) I don’t .. but I know it’s there
Dana: you you’ve not read the handbook?
you would mark mark yourself down (laugh)
Gwen: not for many years (both laugh) . . .
um.. before doing an essay if I really, [was] unsure then I’d go
but I’ve been writing a lot of essays so I
I’m pretty clear on what it should be and isn’t by now..
but if I were a first year I certainly would
Topic = Plagiarism; Task = Decide action
HERE-&-NOW, IN HANDBOOK: Rules about plagiarism
Line 163-6: You know what it says?
Line 171: You’ve not read it?
PAST TIME, HANDBOOK: Is read by students
Line 172: You would mark yourself down?
YOU ARE TUTOR & YOU ARE STUDENT: Plagiarism is punished
Line 173: Not for many years
Line 184: but if I were a 1st year, I would.
HYPOTHETICAL, First Year STUDENT (unclear about rules): I am First year student
Figure 2. Interconnected spaces

(3) a. Ellen: there’s a fellow there from Hong Kong you know.. [22]
       he was re-eally ba-ad .. [23]
    b. Ellen: one of the assignments this boy from Hong Kong had to do was . . . [256]
There was no evidence of reference failure by the hearer at the second mention,
and therefore even though both mentions were very brief, it was still sufficient for
discourse purposes.
Research published elsewhere (Luchjenbroers 2001, 2003; Carroll et al. 2003)
has argued that the discourse building process is facilitated by prosodic and/or gestural information that provides additional cues to derive speaker-intended meanings. These works have also illustrated how the informational load of gestures may
substantially enrich the semantic content of the lexical component. In the following I will consider the pertinent gesture types, taking into account how the
dynamics of a speaker’s F-space are utilized to convey aspects of meaning not always
conveyed verbally.
Data
The body of examples used in this discussion has been drawn from a larger,
video-taped study into negotiated talk involving 36 Australian and non-Australian,
Male and Female university students. These subjects were given the task of devising
guidelines (to be given to faculty) about how new students should avoid the pitfalls
associated with either cheating or plagiarism. They were recorded in a sound-proof
room, positioned diagonally across from each other to enhance the view of the
interaction for both the analyst (sitting in the next room, behind a large tinted window)
and the video-recorder, which was placed back from the dyad in a triangulated position
to the interaction. The total body of video data includes 36 conversational dyads
(approximately 18 hours of data).
Gestures in discourse
What counts as gesture?
Much work on gesture has been devoted to outlining the different types of bodily movements a speaker can make during discourse, as well as which of these
count as meaningful and thus worthy of linguistic analysis (e.g., McNeill 1992,
2000; Kendon 1981; Krauss 1998). The spectrum is often divided into three categories: (i) movements that have no apparent discourse meaning – i.e., what Krauss
(1998) calls ‘Motor gestures’, or ‘beats’. These may coordinate with speech but have
no apparent meaning; (ii) ‘symbolic’ gestures, or ‘emblems’, that carry conventionalized meaning, such as a thumbs up meaning ‘good’;3 and (iii) conversational
gestures which are spontaneous and may have some conventionality but are totally
optional – i.e., some speakers use them but discourse is not dependent on their
presence. It is this last class of gestures that is the object of this research.
The ‘comfort zone’
One of the first observations made of the data is that speakers vary in how much
they gesture during discourse, as well as the proportion of physical space they use
to gesture in. Although a number of subjects gestured very little (mainly Australian
males and Asian females, but also some Australian females), most subjects made
a variety of gestures during talk, and utilized a variable quantity of the physical
space between their bodies and half-way to their interlocutor (where the task instructions were also taped to the desk). Social dynamics also made an impact on
the amount and size of subjects’ gestures. For example, Australian Females, in Female+Female dyads, were typically high users of gesture; while Australian Males,
in Male+Male dyads, showed noticeably fewer gestures. In fact, Australian Male
participants often appeared completely inert.4 The interesting result of mixed gender
dyads (Australian Female+Australian Male) is that it was typically the Australian
women who gestured less instead of Australian Males gesturing noticeably more.
The only exception to this Australian male conduct was when the participant
female was clearly foreign (such as an Asian woman).
In previous papers I have described the physical area in which most gestures
occur as the ‘comfort zone’: the area in which a speaker produces most gestures and which is in easy reach of the posture s/he has taken during discourse
(Luchjenbroers 2001, 2004). In these data the comfort zone is roughly the shape
of a cube that runs from shoulder to waist in height, from
the elbow (at the waist or, in these data, the table) to the hand in depth, and has
body width.5 As discussed above, the actual size of a speaker’s comfort zone, and
the proportion of gesture to speech, vary from speaker to speaker, and culture
to culture. Therefore for some, the gesture space is a much smaller cube, sometimes involving maybe no more than the speaker’s hands, and in some cases, just
movement of the thumbs from a clasped hands position. In general, speakers who
are less animated in gesture use a smaller gestural cube, and those who are more
animated use a larger cube that is more consistent with the dimensions mentioned
above.6
In addition to the comfort zone, speakers also make use of the physical space
that either borders or is clearly outside these general boundaries, often involving a full, physical stretch. I suggest that these general vs. extreme boundaries are
consistent with ‘inside’ and ‘outside’ a speaker’s gestural ‘F-space’, and that these

boundaries add an extra dimension to the meaning conveyed lexically by speakers during discourse. Hence, where speakers produce most gestures is where they
are most comfortable (cf. comfort zone), which defines their F-space, and gestures
that cost more physical energy are ‘outside’ that space.
Although it is tempting to refer to this gestural comfort zone as the ‘Focus’-space, this would be misleading as a speaker can refer to more than one mental
space with gestures inside their F-space, each enjoying a certain degree of focus.
Similarly, if the comfort zone were intimately linked to discourse focus, then one
would expect the focus space to always be located inside F-space; however the
data also provide examples where the mental space in focus is gesturally located
outside the speaker’s F-space (e.g., talk about foreign practices).
As will become evident in the following section, gestures within F-space are
primarily relevant to ‘Me’ (i.e., the speaker) and gestures outside F-space to ‘Not
Me’. In this sense, gestures can function like contrastive stress, in that pointing to
a physical location in front of the speaker amplifies not only ‘Here’ (where ‘I’ am)
but also ‘Not there’, or ‘This’ (what ‘I’ have) and ‘Not That’; while deictic gestures
to physical locations outside F-space amplify the opposite. There is also some evidence of an association between what is topical in discourse and inside F-space,
although it is often hard to separate these features from aspects of the speaker (i.e.,
it’s not so much what is topical as the speaker’s argument/view of that topic, which
again is relevant to the speaker). In the following examples I will consider the relationship between F-space and relevance to the speaker’s location (i.e., here); the
speaker (i.e., important to ‘me’); and the subject-matter being discussed.
Gesture types
Conversational gestures have been identified as the object of this research; however within these spontaneous gesticulations, a number of different types are also
discernable: (i) Deictic gestures, which relate to ‘here’ vs. ‘there’ [also called Indexical (cf. index finger) gestures];7 (ii) simple gestures, which iconically (and often
metonymically) illustrate features of talk; and (iii) complex gestures, which add to
the information conveyed in the lexical component of talk.
Deictic gestures
Indexicals are the most basic form of gesture and are presumably the first to be
used by children learning language. This strategy involves an instruction to the
hearer to direct their view (from the speaker’s finger) to a specific item that both
speaker and hearer can see. Consider example (4) below.8
(1)↓
(4) Graham: I’ve had a quick look at that. . . particular list
(2)↓
and I’ve resummarised it into fewer words
1. L hand/index finger points to the Task sheet taped to the table (= real referent)
2. L hand/index finger points to the writing pad in front of S (contains a written
list of points, or his summary).
In this example the first gesture, to that, involves a long extension of the left arm
to outside the speaker’s comfort zone and hence, his F-space, to the typed task
that both participants can see. Similarly the second gesture, to the summary, also
points to the true referent which is directly in front of the speaker and thus within
his F-space. However, because both point to the fixed location of the true referent,
the dimensions of F-space are not relevant to the interpretation of the deictic relations involved. The second gesture is however richer than the deictic relation alone
because it goes beyond the information conveyed by the lexical component: it informs the addressee that the written text in front of him is a summary. The most
common way in which gestures enrich the lexical component is through iconicity.
For example, the indexical here can mean, this room, this building, this university,
this city, or this country. Each location can be serviced by the lexical and gestural
indexicals here, and for each extension of the original here, the indexical bears an
iconic (albeit metonymic) relationship to the full dimensions actually referred to.
In each case the chosen gesture will point to the physical location in front of the
speaker (inside their F-space), and references to a place not relevant to here would
be paired with a gesture clearly outside F-space. Cases such as these show the most
basic way in which indexicals can serve an iconic function. Later examples will
show how indexical gestures may serve a more overt iconic function that enriches
the lexical component of talk.
Simple (iconic) gestures
Earlier work has put forward the view that some gestures may be described as
‘simple’ in that they convey a straightforward semantic relationship between the
essential message carried by the gesture and the lexical component it accompanies;
whereas others are described as ‘complex’ in that they do substantially more than
just clarify lexical meaning (Luchjenbroers 2001, 2004). Those described as complex complement the lexical component by providing meaning not articulated.
A frequent simple gesture example in these data is the take gesture (= one
hand scoops an unseen substance or object and draws it to the body), which co-occurred with talk about taking, stealing, plagiarizing, and cheating throughout the
data. This gesture is simple because it is consistent with the verbal component.9
Similarly, indexicals may be used to convey simple iconic relations that go beyond

the basic deictic relationship. In such cases the relationship between gesture location and F-space can be additionally informative. For example, in (5) the allocation
of yours and mine relates to a hypothetical event: both speaker and hearer are surrogates for the roles needed in an event that was invented for discourse purposes
only (cf. Liddle 2002).
(1)↓
(5) Jake: it’s say handing in some work ..
(2)↓ (3)↓
saying that this is yours . . . this is mine
(4)↓-------------------------------------------
in actual fact it should really come from somebody else
1. R hand clumped, palm down, fingers touching table in front of Speaker’s left
chest (= inside F-space)
2. R hand (same shape) extends toward Hearer in centre field (= border F-space)
3. R hand (same shape) moves back and collides with centre of Speaker’s chest
(= inside F-space)
4. R hand flattens and moves away from S, palm down (dismissive) to Speaker’s R,
past the desk boundary (= outside F-space)
In this example, the first gesture relates to the general topic, some work, which is
focal and clearly located inside the speaker’s F-space. The location of this gesture
inside F-space may also suggest that the speaker puts himself in the protagonist
role. The second and third gestures illustrate more overtly the surrogate roles
played by the hearer, your work, and the speaker, my work, in this hypothetical
discourse event. These gestures are indexical in that they point to the different
characters in this event, but are more meaningful because they simultaneously allocate roles to the discourse participants. Contrastive stress is also relevant here as
the two locations occur at opposite boundaries of the speaker’s F-space. The yours
gesture is reflected from the speaker (= not me), while the mine gesture is not just
inside the speaker’s F-space, but is attached to the speaker (his hand is clutching
his chest). Hence both gestures outline the boundaries of the speaker’s F-space;
while the next gesture, to somebody else, is distinctly thrust outside that square. It
is reflected away from both surrogates and the full dimensions of the speaker’s F-space. In fact, the fourth gesture exceeds the table surface area, which emphasizes
its complete removal from the scene.
Similarly in (6) below, the references to this author and that author are, as in example
(5), again placed in diametrically opposite locations to each other, although the discourse participants are not the surrogates for these fictional roles. The ‘author’
roles are removed from both speaker and hearer, and again it is no coincidence
that these gestures involve physical thrusts as far away from the real participants as
possible (i.e., outside F-space).
(1)↓ ----------------
(6) Lenard: and they just write their ideas
(2)↓
(3)↓
even if they’re using them from this author and that author
and they do think it’s quite ah bizarre
(4)↓
when they look at an essay that I’ve written
1. both hands, flat, fingers meet at Speaker’s chest (= inside F-space)
2. R hand + arm extends to Speaker’s far right (= outside F-space)
3. R hand + arm extends across F-space to Speaker’s left (= outside F-space)
4. R hand flicks back and hits Speaker’s right shoulder (= border F-space)
Notably (6) also shows that the relationship between here and F-space is more
complex than initial discussions have suggested, because references to here or this
do not necessarily correlate with inside F-space. In (6) the speaker also refers to
a third person, they, while touching his own shoulder (inside F-space), when one
might expect an outside F-space gesture location, like the someone else gesture in
(5). Here the speaker is talking about practices in Germany, and they refers to German students; his use of a gesture that puts himself in centre stage suggests that he
identifies with that protagonist role, despite the use of the third person pronoun.
In (7), the basic deictic process is further abstracted to illustrate a transference of fictional matter from one fictional location to another. The first gesture
location, outside F-space, is dictated by the second clause reference to your own
piece [work]. These gestures involve two components: (i) deictic references to an
object in two different locations (even if fictional); and (ii) a pantomime consistent with the verb, take. The series of actions for take involves one hand clasping
an unseen substance or object, picking it up, transporting it over an unseen obstacle and depositing it into the speaker’s zone (= ‘make mine’). Consequently, from
whence it came must be ‘not mine’ (= outside F-space). It is no coincidence that
this would be gesticulated as the migration of something from outside F-space to
a point squarely inside F-space; and similarly it is no coincidence that reference to
a chunk would be realized as the gesticulation of clutching and moving an object.
However, in this example, as in (6), the lexical strategy is meaningfully different
from the chosen gestures, in that a distancing (‘not me’) lexical pronoun is used
(your), but a ‘make mine’ gesture complements it.

(1)↓
(7) Iris: like taking a big . . . chunk out of a book
(2)↓
and putting it into your own piece and not referencing
1. R hand (palm down; fingers touching desk) to the right of speaker’s F-space –
clutching an imaginary mass (= Outside F-space)
2. clutched mass is picked up & put down inside F-space (S centre front)
The examples given in (4–7) illustrate how the basic deictic process of directing
the hearer’s attention to specific, visual entities has been extended to a role-play
between visual participants in a hypothetical scenario, and then to a role-play of
non-visual entities in a hypothetical scenario. For this second extension of the deictic function, a speaker designates points in physical space to refer to referents in
talk, as is also grammatically correct in sign languages. However, the location of
those designated points is not arbitrary: those located closer to the speaker tend to be
those with which s/he associates (or makes ‘mine’), while those with which
the speaker does not identify or wish to be associated are typically located
further away from the speaker – often as far as physically possible. Of particular
interest therefore are cases in which the lexical strategies used by a speaker (often diverting
responsibility away from themselves) are paired with gestural strategies that place
the speaker in the protagonist’s role.
The example given in (8) similarly illustrates how speakers can distance themselves from the practices they describe by the location of the associated gesture; in
this case, what ‘we’ don’t do is located outside F-space. Notably the next gesture,
to them and what ‘they are required to do’ requires further movement away from
F-space, to amplify the contrast between ‘our practices’ and ‘their practices’.
(1)↓
(8) Jake: we’ve been given strict instruct strict instructions that we
(2)↓
we’re not allowed to correct it for them
(3)↓
they have to fix it up themselves
1. R index finger on desk-top, centre field, outlines the top boundary of an
imagined source document (= inside F-space)
2. R hand (palm + fingers down, touching the desk) moves to the far R of
desk-top (= outside F-space)
3. both hands carry over to beyond S’s R: L hand touches S’s R side and R hand
extends in the same direction (= outside F-space)
The above examples illustrate how the semantics of the chosen gestures resonate
with the lexical component they accompany.10 However even in these simple cases,
more complex features of added meaning may come into play, such as speaker attitude toward the content of discussion, inferable by the location of the associated
mental space. In such cases, gesture complements speaker meaning in ways not
captured by the lexical component alone. In the following section I will expand on
some of the types of meaning ‘complementation’ that have occurred in these data.
Complex gestures
The data have revealed a range of gestures that expand on the information provided by the lexical component. This type of ‘complementation’ may include instructions regarding mental spaces and discourse structure. The gesture examples
given above have component features that go beyond, and in some cases contradict the semantics of what is said lexically (such as using a gesture to indicate ‘mine’
while using the pronoun ‘your’). However, the type of complexity this category is
meant to capture is that these gestures are both iconic and go beyond the meaning
of the uttered sentence to which the gesture is paired. For example, if a speaker
were to utter, Brilliant observation! but tap their forehead at the time of speaking
(= you’re nuts!) then the total meaning of the uttered sentence would not only be
enhanced, it would be very different.
Another notable example of this was observed in a television program where
an investigator, with obvious interest in a particular woman about to leave the
country, suggests a hypothetical scenario using an unknown character, although
his gesture makes clear that he chooses to fill that role himself.
(9) Investigator: What if you were to get involved. . . . ?
↓
with some American. . . ?
1. both hands, fingers splayed dramatically grasp the S’s chest (inside F-space)
Similarly in (10), the gesture associated with undergraduate conveys a depth of
meaning not necessarily conveyed lexically. Even though Iris’s interpretation of
Fran’s reference to some little snotty kid is undergraduate, the fact that her gesture hand extends beyond the desk-top away from her F-space and both discourse
participants conveys ‘not us’; and the fact that her hand is lower than the desk-top
itself (pointing downwards) conveys her contempt for that category of person (i.e.,
a small person = lesser than ‘us’).

(10) Fran: if I had spent you know at least two years of my life
devoted to writing this book and some little snotty kid
(1)↓
(2)↓
Iris: undergraduate . . . comes in an does . . .
1. R arm fully extended to R, over desk edge & pointing down (diagonal to
exchange), slightly lower than the desk = lower status (outside F-space)
2. finger flicks back toward Speaker & across F-space (inside F-space)
Examples such as this give evidence that there are more dimensions at play than
just the horizontal plane, making other metaphors also of interest to a full interpretation of gesture (cf. Lakoff & Johnson 1980: up is good, down is bad). In this
case, the location of the speaker’s gestures specifies characteristics of the mental
spaces in which the speaker attributes and processes the information about undergraduates. A similar case is given in (11), where again the location of the gesture,
so far away from the speaker’s F-space that he must turn in his seat to make it,
maximally contrasts the speaker with the group he is talking about.
↓-----------------------------------
(11) Jake: I feel a great deal of um empathy for them
1. both hands move to the S’s extreme R: L hand touches S’s R side and R hand
moves from centre chest to far R (= outside F-space)
Example (12) is also particularly interesting because it captures dimensions that
are not conveyed lexically and have not been previously noted in the literature.
Each gesture point refers to an illegal act (such as plagiarism) and each gesture
point illustrates the fictional location of such acts in a hypothetical student paper.
The fact that these gestures are produced on the diagonal captures the frequency
of those acts in this hypothetical paper – i.e., plagiarisms (etc.) would be unlikely
to be confined to a single location, but would presumably be distributed throughout a piece of work; and because these gestures also move from closer to farther
from the speaker, they also capture that these illegal acts occur throughout the referent piece of work. Thus the diagonal iconically captures both the shape of the
printed page (from top to bottom) and the width of the piece of work (from
beginning to end).
↓
↓
↓
(12) Hariette: so we ’say if you do.. this.. and this and this
then.. you-’re ah.. breaking the rules (laugh)
1–3. R hand, pinched (all 4 fingers on top of thumb), pointing to equidistant points
in space, forming an oblique row, from just above F-space (height of L eye),
into F-space (below R shoulder); these points also perceptibly move from
closer to the speaker to less close to the speaker.
In sum, gestural complexity involves additions to the lexical component of discourse that may have a direct bearing on the interpretation of speaker meaning.
Unlike the lexical component, which can generally be unambiguously assigned
one or other mental space role (i.e., space builder or proposition), gestures often
contain components with multiple roles: some relating to content and others to
the speaker’s F-space and its relation to mental spaces functions.
Mental spaces manoeuvers
The examples above have illustrated how basic strategies are used iconically to
amplify (= simple gestures) if not complement (= complex gestures) information
conveyed in the lexical component. Similarly, and quite distinct from the mental
spaces these gestures involve, the above examples also illustrated the relevance of
F-space in deriving additional aspects of speaker-meaning. Now I will focus on
how simple and complex gestures, together with F-space, are employed to service
navigations around mental spaces in talk.
For example, in example (7) above, like taking a big chunk out of a book and
putting it into your own piece. . . , the first gesture locates the source of the theft,
the book (outside F-space), and the second gesture locates the target of the stolen
material, your own piece (inside F-space). These locations are relevant to mental
spaces navigation as lexically, a book is neither a positive nor a negative reference:
the phrase says nothing of ownership or location in a physical sense. It is only
the location of the gesture relative to the speaker, outside F-space, that serves to
disambiguate what book (or what nature of book) is being referred to – i.e., the one
being plagiarised (= ‘not mine’). Similarly, the second reference lexically attributes
the piece to another (possibly abstract) person, but the ‘make mine’ path of the
second gesture in relation to the first (landing inside F-space) associates the deed
with the speaker. Gesturally, the speaker has played out a role to which lexically she
is only a hypothetical spectator. However, the contrast in the two physical locations
of these gestures also amplifies the two mental spaces required here: one for the
source (and its associated attributes) and one for the target.
Examples such as (7) reveal the importance of F-space in helping to construct,
navigate and disambiguate mental spaces, because sentence meaning depends on
the appropriate space(s) being accessed in which to properly associate incoming
information with its content (referent). Similarly, example (13) reveals how complex iconic gestures can help clarify the appropriate mental space(s) for comprehension. Reference to inside is paired with a flipping pages gesture (above F-space),
which conveys that the speaker is talking about a book; the gesture is indicative of
the size of the referent, and helps to clarify the mental space that is needed here
(i.e., a thesis, and not just a page or a short paper). In the next clause, she includes
the proposition they cited, which is complemented by a writing gesture (in the air,
also above F-space). The height and directionality of the writing gesture convey
that it is not a short citation, but a full page length declaration. The full import
of these two gestures is both to clarify the mental space needed and to give
qualitative detail about the proposition to be processed within it.
↓(1)
↓(2)
(13) Fran: um.. and then inside they they’ve they ‘cited..
1. Right hand in the air, flipping pages, temple height (outside F-space)
2. Right hand, writing in the air, from centre forehead to shoulder height (outside
F-space).
Example (14) also shows mental spaces management in that gestures to different
physical spaces are attributed to (i) the plagiarized material and (ii) the source
from which the plagiarized material was taken (both referents are focal and within
F-space). In cases such as this, once a speaker has attributed a referent to a particular location in (physical) gesture space, s/he will continue to point to the same
locations upon further references to those referents. This gestural strategy also
helps disambiguate when multiple referents are simultaneously ‘on stage’. In this
way, gestures serve as a reference tracking device that is available to all participants
in discourse: a strategy also used in sign language.
↓(1)
↓(2)
(14) Gwen: like, if you know they’ve sort of taken this out of this book. . .
↓(3)
↓(4)
because they’ve referenced this and you’ve read this book ..
what do you do?
1. R hand, across L hand but centre field (inside F-space = plagiarised material)
2. R hand, across L hand & further to Left (inside F-space = source text)
3. R hand points again to ‘source text’ space
4. R hand points again to ‘source text’ space
In sum, these examples provide clear illustration of how those speakers who
choose to gesture may also reveal strong clues about their attitudes toward the
subject-matter being discussed through where, in the gestural space available to them (in
terms of F-space), they choose to designate a particular referent. The location of
the referent thus indicates the speaker’s mental spaces in which those referents are
conceptually located. This level of speaker meaning involves mappings between a
physical location in the speaker’s gestural space, and specific mental spaces activated in speech. Considering again the assumed projected conceptual models of
discourse information that each speaker produces, and whose comprehension by the hearer is the speaker’s responsibility, it seems entirely plausible (if not in some
cases crucial) that speakers make use of any ploys available to them to keep track
of their own argument, together with those put forward by other speakers. Speakers clearly make use of these mappings between conceptual space and physical
space; however, it remains to be seen if hearers also make full use of this information. Given the potentially huge cognitive load each participant deals with in
discourse, it is plausible that many aspects of speaker-meaning discussed above
may be missed by hearers. This does not diminish the information they may be
able to derive (if they know what to look for and are paying adequate attention);
nor does it diminish the value of these strategies that conceivably assist speakers in
making their contributions as comprehensible as possible for their audience.
Conclusions
In this paper most energy was devoted to illustrating how a speaker’s choice of
gesture as well as where to locate those gestures, not only serves to amplify the
lexicalized information presented to hearers, but also serves to enrich that information by adding dimensions of meaning that might not otherwise be conveyed.
This extra dimension in some cases illustrates the mental spaces in which propositions are to be processed, such as the flipping pages gesture that denotes a book (in
which a declaration was made), while in other cases it is revealed by the strategic use
of F-space that conveys the relevance of the subject-matter or the referent in talk
to the speaker (or the arguments that they put forward). The examples included in
this paper reveal that those speakers who make full use of conversational gesture,
also make productive use of Inside versus Outside F-space and thus amplify the
relevance of these referents to (primarily) themselves, in that a speaker’s F-space
has as its referential centre, the ego. The dynamics of F-space have been shown
to be an important source for discourse participants to navigate the many mental
spaces that may be required during discourse. The conceptual integration of these
sources of discourse information is important if a hearer is to fully comprehend
all the information speakers convey that is before them in talk.
Notes
* The research drawn upon in this paper was supported by a postdoctoral fellowship and a
New Staff grant to the author from the University of Queensland (Australia). Many thanks to
Roland Sussex and Shannon Dougherty. I’d also like to thank Adam Glanz for his very helpful
comments on an earlier draft of this paper. Thanks also to Pat Carroll and Simon Parker, with
whom many of these issues have been discussed and developed, and the helpful comments from
an anonymous reviewer. All oversights are of course my own.
All correspondence concerning this article should be sent to: Dr J. Luchjenbroers, c/- Dept of
Linguistics, University of Wales, Bangor, GWYNEDD, LL57 2DG, Wales, U.K. Fax: 44+ 1248–38
2928; Email: <[email protected]>.
1. Cf. the negation test (Luchjenbroers 1993). If this sentence were an interrogative form (instead of declarative), a simple ‘no’ answer most specifically rejects the proposition; a rejection of
the spatial definition(s) requires more linguistic effort.
2. The complete mental spaces dynamics for this example would be: Focus Mental Space = In
foundation year (‘there’) + [proposition = a boy (from Hong Kong) was really bad (at plagiarism)]. The modifier ‘from Hong Kong’ points to an external, not focal Mental Space.
3. McNeill (2000) separates these into (a) emblems and (b) pantomimes, which play out a scene
in more detail.
4. During a recent presentation (2002) I showed a silent clip, lasting several minutes, from an Australian male-male
dyad to illustrate this observation, but before I could admit to the
audience that gestural analysis on such data is somewhat challenging, members of the audience
accused me of showing a ‘still’, and when they did finally notice movement it was welcomed with
applause.
5. I have noticed in other less formal conversations that speakers may reveal a very different
comfort zone. For example, a speaker whose arm is flung over a chair will display a very different
F-space than those discussed in this paper: seemingly disjointed spaces instead of a single space.
Notably the distinction between inside and outside F-space is still defined by the amount of
overt effort a gesture costs the speaker.
6. In some cases, where a speaker may be described as ‘voluminous’, the cube may also rise from
the table, as though speaking to a person positioned higher than (or further away from) the
speaker.
7. Deixis refers to lexical and gestural items that depend on context for meaning (such items are sometimes called ‘shifters’ because their specific reference shifts from speaker to
speaker) – e.g., sitting
here at my desk, my here is simultaneously everyone else’s there. Hence the words here and there
have no objective meaning apart from indicating the speaker’s orientation toward phenomena
around him/her.
8. Arrows above the utterance example indicate the onset of a gesture (not the target), although
in some cases a line from that arrow is an attempt to indicate how long it took the speaker to get
from gesture onset to target.
9. ‘Simple’ does not refer to the length or detail of a gesture, only to whether that gesture
correlates with the meaning conveyed lexically.
10. The metalinguistic term ‘semantics’ is used in reference to all forms in which a ‘word’ may
present in discourse. Thus, whether words, such as TAKE or THIS or HERE, are verbalized, or
conveyed gesturally or in print is irrelevant.
References
Carroll, Pat, June Luchjenbroers, & Simon Parker (2003). Sounds, Signs and Rapport: On the
methodological importance of including audio-visual data in an analysis of discourse. In
Grant Malcom (Ed.), Multidisciplinary Studies of Visual Representations and Interpretations.
Elsevier Science.
Chafe, Wallace (1994). Discourse Consciousness and Time: The flow and displacement of conscious
experience in speaking and writing. Chicago & London: Univ. Chicago Press.
Clark, Herbert H. (1993). Arenas of Language Use. Chicago: Chicago University Press.
Clark, Herbert H. (1996). Using Language. Cambridge U. Press.
Clark, Herbert H. (1997). Dogmas of Understanding. Discourse Processes, 23, 567–598.
Fauconnier, Gilles (1985). Mental Spaces: Aspects of Meaning Construction in Natural Language.
Cambridge, MA: MIT Press. [Rev. Ed. New York: Cambridge U. Press, 1994].
Fauconnier, Gilles & Eve Sweetser (Eds.). (1996). Spaces, Worlds, and Grammar. Chicago:
University of Chicago Press.
Grice, Paul (1978). Further Notes on Logic and Conversation. In Peter Cole (Ed.), Syntax and
Semantics, 9: Pragmatics (pp. 113–127). New York: Academic Press.
Grice, Paul (1975). Logic and Conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and
Semantics, 3: Speech Acts. New York: Academic Press.
Kendon, Adam (1981). Nonverbal communication, Interaction and Gesture. The Hague: Mouton.
Krauss, R. M. (1998). Why do we gesture when we speak? Current directions in Psychological
Science, 7, 54–59.
Lakoff, George & Mark Johnson (1980). Metaphors we live by. Chicago: Chicago University Press.
Lambrecht, Knud (1994). Information structure and sentence form. Cambridge Univ. Press.
Liddell, Scott (2002). Blended spaces and deixis in sign language discourse. In D. McNeill (Ed.),
Language and Gesture (pp. 331–357). Cambridge Univ. Press.
Luchjenbroers, June (1993). Pragmatic inference in language processing. Unpublished doctoral
dissertation, La Trobe University, Melbourne, Australia.
Luchjenbroers, June (2000). Cognitive strategies for mutual ground construction. Paper
presented at the Language & Cognition Conference, Leiden University, Netherlands.
Luchjenbroers, June (2001). Prosodic and Gestural cues for Navigations around Mental Space.
BLS 27: Language and Gesture. Univ. of California Press (to appear).
Luchjenbroers, June (2002). Flick ’o the wrist or deliberate action: how gestural information
makes face-to-face conversation information-rich. Fourth Annual Meeting of Child
Language Group. University of Wales, Gregynog, UK.
Luchjenbroers, June (2004). Verbal & Visual Cues For Navigating Mental Space. In Grant
Malcom (Ed.), Multidisciplinary Studies of Visual Representations and Interpretations.
Elsevier Science.
Luchjenbroers, June ms. Cognitive Discourse: Theory meets Practice. (in progress).
McNeill, David (1992). Hand and Mind: What gestures reveal about thought. U. Chicago Press.
McNeill, David (Ed). (2000). Language and Gesture. Cambridge Univ. Press.
Tomlin, Russell (Ed.). (1987). Coherence and grounding in discourse. Amsterdam: Benjamins.
Tomlin, Russell (Ed.). (2001). Mapping conceptual representations into linguistic representations: the role of attention in grammar. In J. Nuyts & E. Pederson (Eds.), Language and
Conceptualization (pp. 162–189). Cambridge: C.U.P.
Tomlin, Russell, L. Forest, M.-M. Pu, & M. H. Kim (1997). Discourse Semantics. In Teun van
Dijk (Ed.), Discourse: A multidisciplinary introduction. London: Sage.
Computational models
and conceptual mappings
chapter 
In search of meaning
The acquisition of semantic structures
and morphological systems*
Ping Li
University of Richmond
The acquisition of meaning has been an intensely debated issue in the field of
child language in the last thirty years. Recently, computational approaches that
rely on connectionist networks and statistical learning models provide new
insights into this issue. These models advocate that semantic representations are
best viewed as emerging out of a continuously developing and adapting
dynamical system. In this chapter, I show that connectionist networks can
capture the emergence and representation of semantic structures. Moreover, such
representations can serve to trigger productive morphological uses such as
overgeneralizations in language acquisition. Our modeling results suggest that
structured semantic representations emerge from statistical computations of the
various form-form and form-meaning constraints, and the evolution and
development of semantic representations as acquired by children are due to
simple probabilistic procedures as embodied in connectionist networks or
similar statistical learning mechanisms.
Keywords: connectionist networks, computational approaches, language
acquisition, corpus analysis, statistical learning
.
Introduction
The representation of language has been traditionally considered as a construction out of basic structural building blocks in the form of symbols and rules.
This approach in general looks at linguistic representations statically. A contrasting approach, in the spirit of recent developments in connectionist networks and
statistical learning, attempts to capture linguistic representations dynamically. It
considers linguistic representations as emergent properties that evolve out of a
continuously developing and adapting system. A shortcut to the understanding of
this approach might come from the following example. Structured, rule-like representations
in a connectionist network can emerge in much the same way as a
hexagonal structure emerges from the honeycomb: every honeybee packs a given
amount of honey to the honeycomb from multiple directions, but no honeybee has
a grand plan for the hexagonal structure (Bates 1984). In this paper, I provide
such an account of the emergence of semantic representations, in connection with
morphological learning in language acquisition.
Lexical semantics and its acquisition by children has been a hotly debated issue in the last thirty years. Until recently, most researchers in this domain have
thought that there is a fixed set of conceptual and semantic properties associated
with each lexical item, and that the child’s task is to acquire the necessary conceptual frameworks and the semantic properties. Recent computational models of
language processing suggest that lexical semantics may be emergent properties, in
particular, that lexical categories can be acquired by the computation of statistical
regularities inherent in the input data. These models are in many ways consistent
with the empirical approach of distributional analysis (dating back to structural
linguistics; Saussure 1916) that emphasizes the child’s ability to analyze the linguistic input (e.g., Maratsos & Chalkley 1980). They can be classified roughly into
two categories. First, proposals from statistical analyses of large-scale text corpora
indicate that lexical-semantic representations may emerge from multiple contextual and lexical co-occurrence constraints in a high-dimensional space. Second,
connectionist (or neural network) models indicate that lexical-semantic structures
can emerge from statistical learning of form-form and form-meaning mappings.
In what follows, I will briefly consider both types of models, but the focus of this
chapter will be on the second.1
High-dimensional semantic space and lexical representation
There have been a number of proposals that high-dimensional semantic space can
provide accurate and faithful representations of lexical semantics through multiple
contextual or lexical co-occurrence constraints in large text corpora. Two models
have emerged most prominently in the last few years: the HAL model (Hyperspace Analogue to Language), advocated by Burgess and Lund (1997), and Lund
and Burgess (1996); and the LSA model (Latent Semantic Analysis), developed by
Landauer and Dumais (1997), and Landauer, Foltz, and Laham (1998). These two
models are highly compatible with each other, although the specific methods used
are different. In the following, I will focus on the HAL model as our research has
linked this model specifically to children’s acquisition of lexical semantics.
According to HAL, the meaning and function of a given word are determined
by lexical co-occurrence constraints in a high-dimensional input space, that is, by
what items may precede a word and what may follow it, and how often they do
so. HAL focuses on global rather than local lexical co-occurrences: A word is anchored
with reference not only to other words immediately preceding or following
it, but also to words that are further away from it in a variable co-occurrence window, with each slot (occurrence of a word) in the window acting as a constraint
dimension to define the meaning and function of the target word.
Table 1. Global Co-occurrence Matrix for the Sentence, The horse raced past the barn. The
values in the matrix rows represent co-occurrence values for words that preceded the word
(row label). Columns represent co-occurrence values for words following the word (column label). Cells containing zeroes were left empty in this table. See Burgess and Lund
(1997). Reproduced with authors’ permission.
         barn    horse   past    raced   the
barn             2       4       3       6
horse                                    5
past             4               5       3
raced            5                       4
the              3       5       4       2
The example in Table 1 illustrates the notion of global lexical co-occurrence
more clearly. It shows a matrix using a 5-word moving window for just one sentence (the horse raced past the barn). Within this five-word window, co-occurrence
values are inversely proportional to the number of words separating a specific pair
of words. A word pair separated by a four-word gap, for instance, would gain a cooccurrence strength of 1, while the same pair appearing adjacently would receive
an increment of 5. The product of this procedure is an N-by-N matrix, where N is
the number of words in the vocabulary being considered.
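
The weighting scheme just described can be made concrete with a short sketch. The Python code below is an illustration only, not the original HAL implementation: the function name hal_matrix and the nested-dictionary layout are assumptions introduced here. It slides a five-word window over the example sentence and, for each target word, adds a weight of 5 for an immediately preceding word, down to 1 for a word five positions back.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """Return counts[target][preceding] = summed distance-weighted co-occurrence.

    Hypothetical helper for illustration; not the original HAL code.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        # Look back at most `window` positions from the target word.
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break
            # Adjacent words get weight `window`; the farthest slot gets 1.
            counts[target][tokens[j]] += window - distance + 1
    return counts


tokens = "the horse raced past the barn".split()
matrix = hal_matrix(tokens)
vocab = sorted(set(tokens))

# Print one row per word; each row lists the weights of preceding words.
for row in vocab:
    print(row.ljust(6), [matrix[row][col] for col in vocab])
```

Each printed row should agree with the corresponding row of Table 1, with zeroes where the table leaves a cell empty. For a real corpus the same loop would simply run over a much longer token sequence and a larger vocabulary.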
This table illustrates how the matrix acquires information about meaning.
Consider, for example, the word barn. The word barn is the last word of the sentence and is preceded by the word the twice. The row for barn encodes preceding
information that co-occurs with barn. The occurrence of the word the just prior
to the word barn gets a co-occurrence weight of 5 since there are no intervening
items. The first occurrence of the in the sentence gets a co-occurrence weight of 1
since there are four intervening words. Adding the 5 and the 1 results in a value
of 6 recorded in that cell. A word meaning vector is formed by concatenating the
row and column values for the lexical item. Of course, not all vector values or
elements contribute equally to the meaning representation. The most appropriate elements are those that contribute most to the contextual meaning and this is
determined by identifying which vector elements have the greatest contextual diversity (see Lund & Burgess 1996, for details). It is this more complex pattern of
co-occurrence, which is referred to as global lexical co-occurrence, that contributes
to the richness of meaning. In short, global lexical co-occurrence is a measure of a
word’s total experience in the context of other words. The meanings of a word, in
this perspective, emerge from multiple constraints in a high-dimensional space of
language use.
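
The construction of a word meaning vector from the matrix can be sketched in a few lines. The Python code below is illustrative only: it hard-codes the values of Table 1, the function names are hypothetical, and the use of variance across words as the measure of "contextual diversity" is an assumption made here for the sake of the example (see Lund & Burgess 1996 for the actual procedure).

```python
vocab = ["barn", "horse", "past", "raced", "the"]

# Rows of Table 1 (weights of preceding words); empty cells written as 0.
rows = {
    "barn":  [0, 2, 4, 3, 6],
    "horse": [0, 0, 0, 0, 5],
    "past":  [0, 4, 0, 5, 3],
    "raced": [0, 5, 0, 0, 4],
    "the":   [0, 3, 5, 4, 2],
}


def meaning_vector(word):
    """Concatenate the word's row (what preceded it) and column (what followed it)."""
    k = vocab.index(word)
    column = [rows[other][k] for other in vocab]
    return rows[word] + column


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


vectors = {w: meaning_vector(w) for w in vocab}

# Assumption: approximate "contextual diversity" by the variance of each
# vector element across the vocabulary, and keep the most variable elements.
n_dims = 2 * len(vocab)
dim_variance = [variance([vectors[w][d] for w in vocab]) for d in range(n_dims)]
top = sorted(range(n_dims), key=lambda d: dim_variance[d], reverse=True)[:4]
reduced = {w: [vectors[w][d] for d in sorted(top)] for w in vocab}

print(vectors["barn"])
print(reduced["barn"])
```

With a one-sentence corpus the reduction step is of course only a toy; the point is that every word ends up with a vector of the same length, so that distances between words can be compared.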
Although models like HAL were not originally designed for language acquisition,
they have significant implications for the acquisition of word meanings. Redington,
Chater, and Finch (1998) used a method similar to HAL to capture lexical
syntactic categories in child language. In another study, Li, Burgess, and Lund
(2000) applied the HAL method to the analysis of parental speech in the CHILDES
database (Child Language Data Exchange System; see MacWhinney 2000, for
a description of the database). We analyzed 3.8 million words from the speech of
parents and caregivers addressed to children, and found that a speech corpus of
reasonable size (e.g., 3.8 million words) with a reasonable number of co-occurrence
constraints (e.g., 50 co-occurrence elements) can yield accurate and faithful semantic
representations of English words.2 Our results suggest that young children
can learn word meanings by exploiting the considerable amount of contextual information in the input to compute multiple higher-order lexical constraints. This
approach relies on a few simple assumptions about what the learner does. One important assumption is that the learner has the ability to track continuous speech
with some limitation on working memory, which can be modeled with a weighted
moving window of a variable size; another assumption is that the learner is sensitive to lexical co-occurrences during language processing. Such statistical abilities
seem to be readily available to the child at a very early age, as studies of statistical
learning in infants have revealed (Saffran, Aslin, & Newport 1996). In short, global
lexical co-occurrences can provide useful and powerful cues to the young child in
the acquisition of word meanings.
Emergent semantic structures in connectionist networks
A second set of models, consistent with and complementary to the computational approach
discussed above, are the connectionist models of language processing and
language learning. Recent years have seen rapidly growing interest in the application
of connectionist models to the study of language acquisition (see Elman,
Bates, Johnson, Karmiloff-Smith, Parisi, & Plunkett 1996; Klahr & MacWhinney
1998 for overviews). This interest dates back to Rumelhart and McClelland’s (1986)
connectionist model of the learning of the English past tense and the debates thereafter (MacWhinney & Leinbach 1991; Pinker 1991, 1999; Pinker & Prince 1988;
Plunkett & Marchman 1991, 1993; Seidenberg 1997). Connectionist models rely
on the use of a large number of connected micro-processing units (called ‘nodes’
or ‘neurons’) that activate in parallel and adjust weights of connections between
one another through learning and processing.3 Two key assumptions of these networks have to do with (a) representation – knowledge is represented as patterns of
activation distributed across the processing units, and (b) learning – new knowledge
is formed through the adaptation of the strengths or weights of connections
that hold among the processing units. These assumptions differ from traditional
cognitive assumptions about knowledge representation that involve discrete symbolic
representations of concepts, categories, and grammatical rules. With regard
to language acquisition, advocates of connectionism argue that linguistic representations (of the lexicon, morphology, and grammar) are “emergent properties”
due to the interaction of the processing units with the linguistic environment in
the form-meaning mapping process. This view contrasts with the traditional psycholinguistic approaches that emphasize the mental representation of rules and
the innateness of grammatical and semantic categories.
Connectionist principles of distributed representation, weight adjustment,
and nonlinear learning provide a mechanistic account of how syntactic and semantic structures can emerge out of learning. For example, Elman (1990, 1995)
showed that a simple recurrent network is able to derive internal representations
of semantic as well as syntactic categories in a task of predicting the next word in
the sentence. Lexical categories such as nouns and verbs, animate and inanimate,
and humans and animals emerge clearly in the network’s hidden-unit representations
after the network has been trained to map the current word in the input
stream to the next word. What the network does is similar to the process of detecting
lexical co-occurrence constraints in the input (as does the HAL model). Note
that both Elman’s network and the HAL method can be likened to the “distributional
analysis” technique used by structural linguistics (Bensch 1991), although
structural linguistics did not have today’s powerful statistical machinery and computational tools.
Li (1993) and Li and MacWhinney (1996) discussed more explicitly how a
connectionist network can develop internal representations of semantic structures. Using the acquisition of the English reversive prefix un- as an example,
they examined the role of cryptotypes in determining overgeneralization patterns, competition principles, and plasticity of learning. In three simulations, they
showed that structured semantic representations can emerge from connectionist
learning: the network formed internal representations of semantic categories that
correspond to Whorf’s cryptotypes, on the basis of learning limited semantic features
of verbs and morphological classes. More importantly, the network produced
overgeneralization errors similar to those reported by Bowerman (1982), Clark,
Carpenter, and Deutsch (1995), and those observed in the CHILDES database, indicating
that emergent semantic structures underlie patterns of productivity in
child language.
In this paper, I take a more in-depth look at the issue of the acquisition of
semantic structure along with the acquisition of morphological systems. I will focus on the second set of models discussed above, the connectionist approach to
language acquisition, summarizing results from our studies. Our results indicate
how semantic structures can emerge from the learning of probabilistic associations that hold between lexical items and morphological markers. Moreover,
understanding gained from connectionist semantic acquisition directly helps us
to identify psycholinguistic and computational mechanisms of generalization and
overgeneralization in language acquisition.
Cryptotype as an emergent category and as a trigger for overgeneralization
Whorf’s cryptotype
In one of the classic papers of early cognitive linguistics, Whorf (1956) presented
the following puzzle. In English, the reversive prefix un- can be used productively
with many verbs to indicate the reversal of an action, for example, as in uncoil, uncover, undress, unfasten, unfold, unlock, untie, or untangle (the meaning of reversal
can also be expressed by other prefixes such as dis- or de-). However, many seemingly parallel forms are not allowed, such as *unbury, *unfill, *ungrip, *unhang,
*unpress, *unspill, *unsqueeze, or *untighten. Why is un- prefixation allowed with
some verbs but not others? None of the standard categories of Latin grammar can
be used as a basis for a rule to tell us when we can use un- and when we cannot.
Whorf’s puzzle was deeper than this simple discrepancy. He reminded us that
un- is a productive device in English morphology, and that despite the difficulties
that linguists have in characterizing its use, native speakers do have an intuitive
feel for which verbs can be prefixed with un- and which cannot. He presented the
following thought experiment: if a new verb, flimmick, is coined to mean “to tie a
tin can to something”, then native speakers are willing to accept the sentence, “He
unflimmicked the dog” as expressing the reversal of the “flimmicking” action; if
flimmick means “to take apart”, then they will not accept “He unflimmicked the
puzzle” as describing the act of putting a puzzle back together. The constrained
productivity of un- led Whorf to propose that there was some underlying or covert semantic category, a cryptotype, that governs the productive use of un-. According
to Whorf, cryptotypes only make their presence known by the restrictions they
place on the possible combinations of overt forms. When the overt prefix un- is
combined with the overt verb tie, there is a covert cryptotype that licenses the combination untie. This same cryptotype also blocks a combination such as *unmove.
To Whorf, the deep puzzle was that while the use of the prefix un- is productive,
the cryptotype that governs its productivity is unclear: “we have no single word
in the language which can give us a proper clue to its meaning or into which we
can compress this meaning; hence the meaning is subtle, intangible, as is typical
of cryptotypic meanings.”
Although the cryptotype seemed puzzling, Whorf did propose that there was “a
covering, enclosing, and surface-attaching meaning” (Whorf 1956: 71) that could
be the basis of the cryptotype for un-. Whorf was correct in noting that verbs that
take un- usually have one or more of the covering, enclosing, or surface-attaching
meanings. But it is not clear whether we should view this cryptotype as a single unit,
three separate meanings, or a cluster of related meanings. Nor is it clear whether
these notions of attachment and covering fully exhaust the subcomponents of the
cryptotype. Subsequent analyses have suggested certain additional components
not initially considered by Whorf. For example, Marchand (1969) and Clark et
al. (1995) argue that verbs that license un- all involve a change of state, usually expressing a transitive action. This transitive action typically reaches a terminal point
in time (encoded by a telic verb; Comrie 1976), or some end state or result (an accomplishment verb; Vendler 1967). When the meaning of a verb does not involve
a change of state or does not indicate telicity or accomplishment, the verb cannot
take un-, thus the ill-formedness of verbs like *unswim, *unplay, and *unsnore.
Cryptotype and morphological productivity in child language
Whorf’s discussion shows clearly how the cryptotype is important to the use of un- in the adult language. Bowerman was the first to point out that the notion of
cryptotype might also play an important role in children’s acquisition of un-.
According to Bowerman (1982, 1983, 1988), children’s acquisition of un- tends
to follow a U-shaped pattern, a pattern that children display in other areas of morphological acquisition as well, such as the acquisition of the English past tense
(Brown 1973; Kuczaj 1977). Children initially produce un- verbs in appropriate
contexts, treating un- and its base verb as an unanalyzed whole. This initial stage
of rote control is analogous to the child’s saying went without realizing that it is the
past-tense form of go. Productivity of un- comes at the next stage, when children
realize that un- is independent of the verb to indicate the reversal of an action.
The next stage in the acquisition of un- begins at around age 3. At this
stage, children start to produce overgeneralizations in spontaneous speech such
as *unarrange, *unbreak, *unblow, *unbury, *unget, *unhang, *unhate, *unopen,
*unpress, *unspill, *unsqueeze, or *untake (Bowerman 1982). These overgeneralizations have also been observed in Clark et al. (1995) in both experimental and
naturalistic data with children from ages 3 to 5, for example, *unbend, *unbury,
*uncrush, *ungrow, *unstick, and *unsqueeze. Similar examples can also be found
in the CHILDES database, such as *unblow, *unbuild, *uncatch, *uncuff, *unhand,
*unlight, *unpull, *unstick, and *unzipper (see Li & MacWhinney 1996, for a more
complete list of examples of overgeneralization errors). During this period, children
also make certain ‘overmarking’ errors. For example, the child might say
*unopen when meaning only open, or unloosen to mean loosen. In such
cases, the base forms open and loosen have a reversive meaning that triggers the
attachment of the prefix, even when the action of the base meaning is not actually
being reversed. These errors are analogous to redundant past-tense marking as in
*camed and redundant plural marking as in *feets (Brown 1973). As children grow
older, overgeneralization or overmarking errors gradually disappear.
A traditional explanation of the U-shaped pattern in children’s morphological acquisition goes like this: initially they rely on rote learning, then they develop
a general rule and apply it productively (and overgeneralize it), and finally they
recover from productive errors (this is much like what has been argued for the
acquisition of the English past tense). Bowerman correctly pointed out that the
cryptotype plays an important role in enabling productivity at the second stage.
But how could the child extract the cryptotype and use it as a basis for morphological
generalization or recovery, when the cryptotype is intangible even to
linguists like Whorf (see Whorf’s comments on the subtle and intangible nature
of the cryptotype, discussed earlier)?
A connectionist account of cryptotype and its acquisition
A connectionist perspective provides us with a natural way of capturing Whorf’s
insights into the cryptotype, as well as its acquisition, in a formal mechanism. In our view,
there can be several ‘mini-cryptotypes’ that work together as interactive ‘gangs’
(McClelland & Rumelhart 1981). For example, “enclosing” verbs, such as coil,
curl, fold, reel, roll, screw, twist, and wind, all seem to share a meaning of circular movement. Similarly, “attaching” verbs, such as clasp, fasten, hook, link, plug,
and tie, all involve hand movement. Other verbs such as bind, buckle, fasten, latch,
leash, lock, strap, tie, and zip form a mini-cryptotype that shares a “binding” or
“locking” meaning. Still another cluster of verbs such as cover, dress, mask, pack,
veil, and wrap forms the “covering” mini-cryptotype. These mini-cryptotypes or
mini-gangs interact collaboratively to support the formation of the larger cryptotype that licenses the use of un-, in terms of summed activation, as illustrated in
Figure 1.
The mini-gangs collaborate rather than compete because their members are
closely related by the overlap of semantic features. For example, the verb screw
in unscrew may be viewed as having both a meaning of circular movement and
a meaning of binding or locking; zip in unzip may be viewed as sharing both the
“binding/locking” meaning and the “covering” meaning, and both screw and zip
involve hand movements. Moreover, a feature may also vary in the strength with
which it is represented in different verbs. For example, circular movement is an
essential part of the meaning of the verb screw, but less so for wrap (one can
wrap a small ball with a soft tissue paper without turning around either the object
or the wrapping paper). These properties of feature overlap and degraded featural
composition map naturally onto properties of connectionist models.
Distributed patterns, weighted connections, and nonlinear learning, as embodied in
connectionist networks, seem to be ideal for handling the elusiveness and gradience
of these semantic structures.

[Figure 1 diagram: seven feature nodes, each with example verbs, feed a central "un- cryptotype" node:
covering (dress, mask, wrap); binding (buckle, fasten, strap); enclosing (fold, reel, wind);
change of location (load, pack, plug); circular movement (coil, curl, roll);
change of state (scramble, tangle, twist); attaching (hook, link, tie).]
Figure 1. Multiple features support the formation of the un- cryptotype. Arrows represent
the feature-to-category connections; the weights or strengths of connections are omitted.
Dots in the center of the circle represent words that fit the core of the category, while dots
near the border of the circle represent borderline cases.

In the last few years, our laboratory has carried out connectionist simulations
to study the issue of semantic structure and overgeneralization, using the acquisition of un- as an example. In the following sections, I will discuss two major
models in this endeavor. The first model uses a standard feed-forward network
to simulate the acquisition of cryptotypes and prefixes. The second model uses
a self-organizing neural network, which has also been recently applied to the acquisition of semantic and grammatical structures in children and in bilingualism.
Readers who are interested in the technical details of these models should consult
Li and MacWhinney (1996), Li and Farkas (2002), Li (2003), and Li, Farkas, and
MacWhinney (2004).
A feed-forward network that learns to map semantic features of verbs to prefixes
Method
Connectionist networks that use the back-propagation algorithm (henceforth
‘backpropagation networks’) are perhaps the most popular class of networks
and are most widely applied in studies dealing with language. A standard backpropagation network consists of three layers of processing units (Rumelhart,
Hinton, & Williams 1986). In this type of network, information is first encoded
at the input layer, then it funnels through the hidden layer, where internal representation is formed, and finally results are produced at the output layer (hence
the nickname of ‘feed-forward networks’). Each layer consists of different units,
representing different states/processes of information processing (from input to
output). Learning in this case is a function of adjusting the weights of the connections between units across the layers. The adjustment is done through the
back-propagation algorithm, according to which the network discovers a discrepancy between its actual output and the desired output, and then an error signal
is propagated back through the system, so that weights are adjusted in a way
such that the next time the same input will lead to an output that more closely
matches the desired output (for technical details of the algorithm, see Rumelhart,
Hinton, & Williams 1986).
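
The forward pass and the backward flow of the error signal can be illustrated with a minimal sketch. The Python code below is not the simulation code used in the studies reported here: the toy patterns, layer sizes, learning rate, number of epochs, and all function names are hypothetical, chosen only to show one input layer, one hidden layer, and one output layer being trained on squared error with sigmoid units.

```python
import math
import random

random.seed(0)  # fixed seed so that repeated runs give the same numbers


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out):
    # One weight row per receiving unit; the last weight in each row is the bias.
    return [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    extended = inputs + [1.0]  # constant 1.0 feeds the bias weight
    return [sigmoid(sum(w * x for w, x in zip(unit, extended))) for unit in layer]


def train_step(hidden, output, x, target, rate=0.5):
    h = forward(hidden, x)
    y = forward(output, h)
    # Error signal at the output layer: discrepancy times sigmoid derivative.
    delta_out = [(t - o) * o * (1 - o) for t, o in zip(target, y)]
    # Propagate the error signal back to the hidden layer (before any update).
    delta_hid = [
        hj * (1 - hj) * sum(delta_out[k] * output[k][j] for k in range(len(output)))
        for j, hj in enumerate(h)
    ]
    # Adjust hidden-to-output weights, then input-to-hidden weights.
    for k, unit in enumerate(output):
        for j, value in enumerate(h + [1.0]):
            unit[j] += rate * delta_out[k] * value
    for j, unit in enumerate(hidden):
        for i, value in enumerate(x + [1.0]):
            unit[i] += rate * delta_hid[j] * value
    return sum((t - o) ** 2 for t, o in zip(target, y)) / 2


# Hypothetical toy data: three 4-feature patterns, each mapped to one of three outputs.
patterns = [
    ([0.9, 0.8, 0.1, 0.0], [1.0, 0.0, 0.0]),
    ([0.1, 0.9, 0.8, 0.1], [0.0, 1.0, 0.0]),
    ([0.0, 0.1, 0.2, 0.9], [0.0, 0.0, 1.0]),
]
hidden = make_layer(4, 3)
output = make_layer(3, 3)

errors = []
for epoch in range(2000):
    errors.append(sum(train_step(hidden, output, x, t) for x, t in patterns))

print("summed error, first epoch:", round(errors[0], 3))
print("summed error, last epoch: ", round(errors[-1], 3))
```

The summed error should fall over training, which is the sense in which the same input comes to produce an output closer to the desired one.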
In our simulation, we used 160 verbs as input to our network. They consisted
of 49 verbs that can take the prefix un-, 19 verbs that can take the competing prefix
dis- (see Li & MacWhinney 1996, for the rationale of including dis- verbs, and
the competition between un- and dis- in both child and adult languages), and 92
randomly selected verbs that can take neither prefix (henceforth ‘zero verbs’). Each
verb was represented by a semantic pattern (a vector) that consists of 20 semantic
features. These features were selected in an attempt to capture basic linguistic and
functional properties inherent in the semantic range of these verbs. In order to
objectively determine the values of each semantic feature, we presented 15 native
English speakers with the 160 verbs along with the 20 semantic features, and asked
them to judge the semantic relevance of each feature to each verb. A feature-by-verb relevance matrix was derived for each subject, and the final input vectors
were derived by averaging the matrices from all subjects. A hierarchical clustering
analysis on these vectors attests to the validity of our method, as distance metrics
in this analysis reflected the similarities and differences between words.
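
The averaging step can be sketched as follows. The Python code is a hedged illustration: the function name average_ratings is hypothetical, the ratings are invented, only two raters and three features are shown instead of 15 and 20, and the 0-to-1 rating scale is an assumption (the averaged values shown in Figure 2 are at least consistent with it).

```python
def average_ratings(rater_matrices):
    """Average several verb-by-feature rating matrices cell by cell.

    Each matrix maps a verb to a list of feature ratings (hypothetical layout).
    """
    n = len(rater_matrices)
    first = rater_matrices[0]
    return {
        verb: [
            sum(m[verb][f] for m in rater_matrices) / n
            for f in range(len(first[verb]))
        ]
        for verb in first
    }


# Invented ratings from two hypothetical raters (1.0 = feature relevant to verb).
rater_a = {"tie": [1.0, 0.0, 1.0], "walk": [0.0, 0.0, 0.0]}
rater_b = {"tie": [1.0, 1.0, 0.0], "walk": [0.0, 1.0, 0.0]}

vectors = average_ratings([rater_a, rater_b])
print(vectors["tie"])
print(vectors["walk"])
```

Averaging turns each rater's categorical judgments into graded values, so a feature that only some raters consider relevant to a verb is represented with intermediate strength.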
The task of the network was to take the semantic vectors of English verbs as
input, and map them onto different prefixation patterns in the output: un-, dis-,
and zero. Figure 2 shows the network architecture and examples.
[Figure 2 diagram, layers listed from output to input:
Output units: UN-, DIS-, Ø
Hidden units: Internal Representation
Input units (20 semantic features per verb; 160 verbs), for example:
connect  .7 .6 .5 .6 .1 .2 .1 .9 .3 .5 .2 .0 .3 .3 .6 .7 .0 .1 .0 .0
link     .9 .5 .6 .7 .0 .3 .1 .9 .4 .7 .3 .0 .5 .3 .4 .8 .1 .2 .0 .1
turn     .6 .0 .0 .0 .3 .5 .5 .1 .3 .1 .1 .6 .0 .1 .1 .0 .1 .1 .0 .0
......]
Figure 2. The feed-forward network that learns to map semantic features of verbs to
prefixation patterns (un-, dis-, Ø).
Results and discussion
Connectionist networks are dynamic systems that explore the regularities in the
input-output mapping processes through the activation of the hidden units and
the adjustment of connection weights (to and from the hidden units). To analyze
how our network developed internal representations, we used hierarchical
cluster analysis to probe the activation of the hidden units at various points
in time during the network’s learning (see Elman 1990, for an application of this
method). Figure 3 (in Appendix) presents such an analysis at three time points: the
early (3a), intermediate (3b), and late (3c) stages of learning. Focusing
here on the verbs that share the enclosing-rotating meaning (most of which can
be prefixed with un-), we can see how the network developed structured semantic
representations. These cluster trees indicate that early on with little learning, there
was not much meaningful structure in the data, and thus, the enclosing-rotating
verbs were scattered all over the cluster tree. Gradually as learning progressed,
these verbs started to form smaller groups at several levels. Finally when learning
reached a stable situation, they were all grouped under one cluster.
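
The kind of analysis described above can be illustrated with a small agglomerative clustering sketch. The Python code is an assumption-laden toy rather than the analysis reported here: the hidden-unit activations are invented, the function names are hypothetical, and average linkage with Euclidean distance is chosen for illustration because the linkage method is not specified in the text.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(vectors):
    """Average-linkage agglomerative clustering; returns the merge history."""
    clusters = {name: [name] for name in vectors}
    merges = []
    while len(clusters) > 1:
        best = None
        names = sorted(clusters)
        # Find the pair of current clusters with the smallest average distance.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
                dist = sum(euclidean(vectors[p], vectors[q]) for p, q in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        # Merge the closest pair and record the step.
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, round(dist, 3)))
    return merges


# Invented hidden-unit activation patterns for four verbs (illustration only).
hidden = {
    "coil": [0.9, 0.8, 0.1],
    "roll": [0.8, 0.9, 0.2],
    "walk": [0.1, 0.2, 0.9],
    "talk": [0.3, 0.1, 0.8],
}

for step in cluster(hidden):
    print(step)
```

Running the same procedure on activations saved at different points in training is what allows one to see whether verbs such as the enclosing-rotating ones are scattered across the tree or have been drawn into a single cluster.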
These snapshots provide a picture of the developmental trajectories in the
network’s integration of semantic structures during the meaning-form mapping
process. They illustrate how a mini-cryptotype, such as the enclosing-rotating category, which supports the use of un-, can emerge from learning the mapping of
verb semantics to prefixation.
In the studies reported by Li and MacWhinney (1996), we used an incremental learning
procedure, in which the network took in the input gradually, verb by
verb. Learning with this procedure also gave us insights into the formation of the cryptotype
in the network. Figure 4 shows a cluster tree of the network’s hidden-unit
representation after the network had learned 50 verbs. In this graph, we can observe
two general clusters: one for the un- verbs, and the other for the zero verbs – verbs
that cannot be prefixed with un- or dis-. Our interpretation of these clusters is that
the network acquired a distinct representation for the un- verbs by identifying the
mini-cryptotypes inherent in these verbs. For example, most of the verbs in the
un- cluster share the cryptotypic meaning of binding or locking: bind, chain, fasten,
hitch, hook, latch, etc. However, not all mini-cryptotypes were identified at this
time, and they emerged at different stages as discussed above. Figure 4 also shows,
for example, that the network had not yet developed a clear representation for the
enclosing verbs: the verbs ravel and coil were correctly categorized into the
un- cluster, but the verb roll was incorrectly treated as a zero verb.

[Figure 4 cluster tree; leaves in the order shown in the figure, each with its prefix label:
arrange DIS, connect DIS, put ZERO, ravel UN, hold ZERO, wind UN, hook UN, mount DIS,
lace UN, coil UN, plug UN, cork UN, hitch UN, bind UN, fasten UN, latch UN, braid UN,
chain UN, make ZERO, learn ZERO, turn ZERO, stop ZERO, roll UN, keep ZERO, call ZERO,
believe ZERO, wait ZERO, help ZERO, come ZERO, get ZERO, take ZERO, walk ZERO, run ZERO,
give ZERO, ask ZERO, tell ZERO, say ZERO, see ZERO, talk ZERO, hear ZERO, like ZERO,
start ZERO, go ZERO, work ZERO, look ZERO, show ZERO, use ZERO, charge DIS, allow ZERO,
reach ZERO.]
Figure 4. A hierarchical cluster analysis of the network’s hidden-unit representations after
the network has learned 50 verbs. The labels after the verbs were not provided to the
network during training.

Note that our network received no discrete label of the semantic category associated with un-, nor was there a single categorical feature that tells which verb
should take which prefix (hence Whorf’s problem). All that the network received
was semantic featural information distributed over different input patterns. Over
time, however, the network was able to identify the regularities that hold between
distributed semantic patterns and patterns of prefixation, and developed a structured representation in the mapping process. The structured representations in
the network thus emerged as a function of its learning of the association between
form and meaning, not as a property that was given ad hoc to the network by
the modeler.
The emerging representations also clearly capture Whorf’s notion of cryptotype.
The meaning of a cryptotype constitutes a complex semantic network, in
which verbs differ from one another with respect to (a) how many features each
verb contains, (b) how strongly each feature is represented in the verb, and (c) how
strongly features overlap with one another within a verb (all true of the input
to our network). It is these complex relationships that give rise to the notion of
cryptotype.
The emergence of cryptotype representations in our network can be viewed
as a replacement for the traditional analytic frameworks of categories and rules
(Lakoff 1987; MacWhinney 1989). In this perspective, children’s learning of un- is not simply the learning of a symbolic rule for the use of the prefix with a class
of verbs (given that it is not even clear what the rule is), but the accumulation of
the connection strengths that hold between a particular prefix and a set of semantic features distributed across verbs. The learner groups together those verbs that
share the largest number of features and take the same prefix. Over time, the verbs
gradually form clustered patterns, with respect to both meaning and prefixation
pattern. This learning process can best be described as a statistical procedure in
which the child implicitly tallies and registers the frequencies of co-occurrence of
semantic features, lexical items, and morphological devices.
Bowerman (1982, 1983) suggested that there are two possible roles for cryptotypes to influence the learning of un-. (a) “Recovery via cryptotype”: cryptotypes
help the child to overcome overgeneralizations made at an earlier stage, if these
overgeneralizations involve verbs that fall outside the cryptotype, such as *uncome,
*unhate, and *untake (Bowerman 1982); (b) “Generalization via cryptotype”:
cryptotypes trigger productivity and lead to overgeneralizations. This occurs because, once children have identified the cryptotype, they will overgeneralize un- to
all verbs that fit the cryptotype, irrespective of whether the adult language actually
allows un- with these verbs. Our simulation results provide support for the second role of cryptotype in inducing overgeneralizations that fall within the realm
of the cryptotype. Figure 4 showed how the network included hold and mount in
the un- category. These verbs were included apparently because of their semantic
similarity with members of the cryptotype, most of which can take un- (e.g., bind,
chain, fasten, hitch, hook, latch). Examining the output patterns of hold and mount
in the network, we found that un- was overgeneralized on these verbs. Similar
overgeneralization errors produced by the network included *unbury, *uncapture,
*unfill, *unfreeze, *ungrip, *unhold, *unloosen, *unmelt, *unpeel, *unplant, *unpress, *unsplit, *unsqueeze, *unstrip, *untack, and *untighten, most of which fit the
cryptotype meaning. Our network produced few simulated errors that were flagrant violations of the cryptotype meaning, such as forms like *uncome reported
by Bowerman (1982), thus our results provide no direct evidence for the first role
of cryptotype as hypothesized by Bowerman. In our simulations, overgeneralizations occurred typically after the network had developed structured cryptotype
representation, indicating that cryptotype served as a trigger for morphological
overgeneralization.
These results match up well with available empirical data. For example, one
child in Bowerman’s study produced errors such as *uncapture, *unpeel, *unpress,
*unsplit, *unsqueeze, and *untighten, similar to those in our network. The overgeneralizations that the child produced all fell within the cryptotype, and her
acquisition of un- as a reversive prefix went hand in hand with her discovery of
the cryptotype meanings of the verbs. In Clark et al.’s (1995) naturalistic data, the
child’s innovative uses of un- also respected the cryptotype from the beginning.
Clark et al. noted that the child’s use of un- matched the semantic characteristics of
the cryptotype even when the conventional meanings of the verb in the adult language did not: *unbuild was used to describe the action of detaching lego-blocks,
*undisappear was used to describe the releasing of the child’s thumbs from inside
his fists.4 Thus, once the learner (child and network alike) formed a structured representation that corresponds to the cryptotype for un-, the representation guides
the learner’s behavior in productive morphological use.
In subsequent simulations, our network also displayed a limited amount of
recovery from overgeneralization errors. Typically, recovery was best when the network had developed only partial or unstable semantic structures at relatively early
stages of learning, and it became increasingly difficult when a fixed structure had
emerged at later stages of learning (Li & MacWhinney 1996). This is because the
back-propagation learning algorithm proceeds in such a way that early on, the
network’s weight configurations are not fully committed and are more open to change,
but later on, as the network learns more and more words, it settles on a more
stable weight space that makes adjustment difficult if not impossible (see Elman
1993: 91–93 for a detailed discussion of how the learning algorithm determines
weight adjustment over time). This situation does not seem to match what
we know about child language: most children eventually recover from all overgeneralization
errors, no matter how late. Even though plasticity might be particularly
characteristic of early learning (Spitzer 1999), older children and adults are still
able to change, adapt, and recover from errors, unlike the network studied here
(Bownds 1999). This mismatch, along with other considerations discussed below,
prompted us to study another type of connectionist model, the self-organizing
neural network, to account for lexical acquisition.
4. A self-organizing network that learns to map semantic features to prefixes
Although most previous connectionist models of language acquisition have relied on the use of feed-forward networks with back-propagation, researchers have
started to see their limitations. In addition to its limited ability to recover from
overgeneralizations, there were two other major limitations to the network that
we used. First, our network, like most previous models, received semantic input
features selected on the basis of linguistic analyses on the part of the modeler. Input representation in this way is subject to the criticism that the network worked
(e.g., displayed cryptotype representation) precisely because of the use of certain
semantic features (cf. Lachter & Bever 1988). To overcome potential limitations
associated with this problem, in the new simulations we used semantic representations that are based on analyses of global lexical co-occurrences from a large
text corpus (see previous discussion of HAL, and Method below). Second, back-propagation relies on a gradient-descent weight adjustment process to reduce
the error between desired and actual outputs, but this type of adjustment seems
unrealistic for child language learning. According to the well-known “no negative evidence” argument (Baker 1979; Bowerman 1988; Pinker 1989), children do
not receive constant feedback about what is incorrect in their speech, or receive
the kind of error corrections on a word-by-word basis as provided to a back-propagation network. Thus, back-propagation networks would seem to be poor
candidates as models of language acquisition on grounds of psychological or
biological plausibility. Considerations of these problems led us to self-organizing
neural networks. Self-organizing networks are biologically more plausible because
one could conceive of the human cerebral cortex as essentially a self-organizing
map (or multiple maps) that compresses information on a two-dimensional space
(Spitzer 1999). They are computationally more relevant because one could argue
that child language acquisition in the natural setting (especially organization and
reorganization of the lexicon) is largely a self-organizing process that proceeds
without explicit teaching (MacWhinney 1998, 2001).
Method
In contrast to standard feed-forward networks, self-organizing networks use unsupervised learning that requires no presence of a supervisor or an explicit teacher;
learning is achieved entirely by the system’s self-organization in response to the
input (Kohonen 1982, 1989, 2001). Self-organization in these networks typically
occurs in a two-dimensional map (self-organizing map), where each unit is a location on the map that can uniquely represent one or several input patterns. At the
beginning of learning, an input pattern randomly activates one of the many units
on the map. Once a unit becomes active in response to a given input, the weights
to the unit and its neighboring units are adjusted so that they become more similar to the input and will therefore respond to the same or similar inputs more
strongly the next time. In this way, the network gradually develops concentrated
areas of units on the map (like the activity “bubbles”) that respond to particular inputs. This process continues until all the inputs can elicit specific response
patterns in the network. As a result of this self-organizing process, the statistical
structures implicit in the multi-dimensional space of the input are represented in
the two-dimensional space of the map.
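The update cycle just described can be illustrated with a minimal sketch. It assumes a Euclidean best-matching unit, a Gaussian neighborhood, and linearly decaying learning rate and radius; these choices, the function names, and all parameter values are illustrative assumptions and are not taken from the simulations reported here.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def som_update(weights, x, lr, radius):
    """Move the winning unit and its map neighbors toward the input x."""
    br, bc = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist2 = (r - br) ** 2 + (c - bc) ** 2
            # Gaussian neighborhood: 1.0 at the winner, smaller further away.
            h = math.exp(-dist2 / (2.0 * radius ** 2))
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
    return br, bc


def train_som(inputs, rows=6, cols=6, epochs=200, lr0=0.5, radius0=3.0, seed=0):
    """Train a rows x cols map; learning rate and radius shrink over epochs."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        for x in inputs:
            som_update(weights, x, lr, radius)
    return weights
```

Because the neighborhood is wide early in training and narrow later, activity in such a sketch moves from diffuse to more focused patterns, which is the qualitative behavior described above.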
Here we used the hierarchical feature map model of Miikkulainen (1993,
1997) in our simulations, because it combines multiple self-organizing maps in
a single network. In this model, there is a semantic map that processes semantic
information of the words, and there is a phonological map that processes phonological information of words (for more details of the application of the model, see
Li 2003). The two maps are connected via associative links trained by Hebbian
learning, a well-established biologically plausible learning principle, according to
which the associative strength between two units (semantic and phonological) is
increased if the units are both active at the same time (Hebb 1949).
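The Hebbian principle for the associative links can be written down in a few lines. The sketch below assumes a plain product rule with a fixed learning rate; the names `make_links` and `hebbian_update` and the rate value are hypothetical and serve only as illustration.

```python
def make_links(n_semantic, n_phonological):
    """Associative link matrix, one row per semantic unit, initially zero."""
    return [[0.0] * n_phonological for _ in range(n_semantic)]


def hebbian_update(links, sem_act, phon_act, rate=0.1):
    """Strengthen each link in proportion to the co-activation of its two units."""
    for i, s in enumerate(sem_act):
        for j, p in enumerate(phon_act):
            # No change unless both the semantic and the phonological unit are active.
            links[i][j] += rate * s * p
    return links
```

A link grows only when both of its units are active for the same input, so no external error signal is needed.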
The same set of verbs described in §3 was used as the input, but they were
represented differently from the way they were represented in the previous simulations. The semantics of these words were encoded as patterns of global lexical
co-occurrence constraints (Burgess & Lund 1997; see §1), rather than patterns of
semantic features selected on the basis of our own linguistic analyses. Each verb
was represented as a pattern of 100 units, and the values of these units reflected
the degree of a lexical co-occurrence constraint (on a continuous scale from 0 to
1). We also derived a phonological representation for each verb and the prefixes
un- and dis-, according to MacWhinney and Leinbach (1991). In this representation scheme, each verb was encoded by 168 units in a syllabic template to represent
the combinatorial constraints of phonology (see also Li & MacWhinney 2002, for
details).
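To give a rough idea of what a co-occurrence-based representation with values between 0 and 1 looks like, the following toy sketch counts neighbors within a moving window, weights nearer neighbors more heavily, and scales each word's vector by its largest entry. The window size, the weighting scheme, the scaling, and the function name are assumptions made for illustration. The vectors here have one dimension per vocabulary item, not the 100 units used in our simulations.

```python
def cooccurrence_vectors(tokens, window=3):
    """Build toy co-occurrence vectors scaled to the range 0..1."""
    vocab = sorted(set(tokens))
    index = {w: k for k, w in enumerate(vocab)}
    counts = {w: [0.0] * len(vocab) for w in vocab}
    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            neighbor = tokens[pos - dist]
            # Ramped window: adjacent words count most, distant words least.
            weight = window - dist + 1
            counts[word][index[neighbor]] += weight
            counts[neighbor][index[word]] += weight
    vectors = {}
    for w, row in counts.items():
        top = max(row)
        vectors[w] = [v / top if top > 0 else 0.0 for v in row]
    return vocab, vectors
```

Words that occur in similar contexts receive similar vectors under such a scheme, which is what allows a self-organizing map to place them in nearby regions.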
During training, a phonological representation of the verb was
presented to the network, and simultaneously, the semantic representation of the
same verb was also presented. By way of self-organization, the network formed a pattern of activity on the phonological map in response to the phonological
input, and a pattern of activity on the semantic map in response to the semantic input.
Depending on whether the verb is prefixable with un- or dis-, the phonological representation of un- or dis- may also be co-activated with the phonological
and the semantic representations of the verb stem. At the same time, through
Hebbian learning the network formed associations between the two maps for all
the active units that responded to the input. The network’s task was to create
new representations in the corresponding maps for all the input words and to
be able to map the semantic properties of a verb to its phonological shape and its
morphological pattern.
Results and discussion
In our network, the self-organizing process extracted and compressed the high-dimensional information from the HAL semantic vectors and expressed the semantic similarities on the two-dimensional space as localized patterns of activity.
Figure 5 presents a snapshot of the network’s self-organization of 120 verbs after
the network was trained for 600 epochs.
An examination of the semantic map shows that the network has clearly developed forms of representation that correspond to cryptotype categories. Earlier
we suggested that a connectionist model provides a formal mechanism to capture
Whorf’s notion of cryptotype, in that there can be several mini-cryptotypes that
work collaboratively as interactive gangs to support the formation of the larger
cryptotype. The idea of ‘mini-cryptotype’ is reflected most clearly in the emerging structure of the self-organizing map. Our network, without the use of ad hoc
semantic features, formed clear mini-cryptotypes by mapping similar words onto
nearby regions of the map. For example, towards the lower right-hand corner,
verbs like lock, clasp, latch, lease, and button are mapped to the same region of
the map, and these verbs all share the “binding/locking” meaning. A similar mini-cryptotype also occurs towards the lower left-hand corner, including verbs like
snap, mantle, tangle, ravel, twist, tie, and bolt. Still a third mini-cryptotype can
be found in the upper left-hand corner, including hear, say, speak, see, and tell,
verbs of perception and audition. Finally, one can observe that embark, engage,
integrate, assemble, and unite are mapped toward the upper right-hand corner of the map; these verbs all seem to share the “connecting” or “putting-together”
meaning (interestingly, these are the verbs that can take the prefix dis-). Of course,
the network’s representation at this point is still incomplete, as self-organization
is moving from diffuse to more focused patterns of activity; for example, the
verb show, which shares similarity with none of the above mini-cryptotypes, is
grouped with the binding/locking verbs. What is crucial, however, is that these
mini-cryptotypes form the semantic basis for the larger cryptotype of un- verbs.
As shown in Figure 5, the network has mapped most verbs in the cryptotype to the
bottom layer of the semantic map, and these are the verbs that can take the prefix
un-.
Figure 5. A self-organizing map model that shows the organization of 120 verbs after the
network was trained on these verbs for 600 epochs. The upper panel is the lexical phonological map (indicated by capital letters), and the lower panel the semantic map (indicated
by lower-case letters). Words longer than four letters are truncated.
Moreover, our network was not only able to capture the elusive cryptotype
by way of self-organization, but also able to generalize on the basis of its representation of the cryptotype. During testing of the network’s productive ability,
overgeneralization occurred with 50% of the testing words. For example, the network produced overgeneralization errors that match up with empirical data and
our previous simulation results (see §3), including *unbreak, *uncapture, *unconnect, *unfreeze, *ungrip, *unpeel, *unplant, *unpress, *unspill, *unstick, *untighten,
etc. These overgeneralizations were based both on the network’s representation of
the meaning of verbs and on the associative connections that the network formed
through Hebbian learning in the semantics-phonology mapping process. Again,
as in our previous simulations, most of these overgeneralizations involve verbs
that fall within the un- cryptotype. Thus, the results here are again consistent with
the “generalization via cryptotype” hypothesis, that is, the representation of cryptotype leads to overly general uses of un- (see also discussion of the clench example
below) rather than the narrowing down of its uses (as predicted by the “recovery
via cryptotype” hypothesis).
One of the advantages of the self-organizing model is its ability to simulate
comprehension and production through associative connections. The associative
connections formed via Hebbian learning provide the basis for the production
of overgeneralization errors. For example, the semantic properties of tighten and
clench are similar and they were mapped onto nearby regions of the semantic
map. During learning, the semantics of clench and unclench were co-activated,
and the phonology of clench, unclench, and un- were also co-activated. When
the semantics and the phonology of these items were associated through Hebbian
learning, the network linked the semantics of tighten with the prefix un- because
of clench, even though the network learned only the association for un-clench and
not un-tighten (when tighten was withheld from training at an earlier stage). This
associative process of correlating semantic features, lexical forms, and morphological devices simulates the process of learning and generalization in children’s
productive speech, and shows that overgeneralizations can naturally result from
the semantic structure in the lexical representations (which in turn is a result of
self-organization), and from the associative learning of semantics and phonology.
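The tighten/clench example can be made concrete with a toy calculation. The sketch assumes that each verb evokes a Gaussian “bubble” of activity around its location on the semantic map, that the prefix unit is fully active whenever a prefixed form is trained, and that links grow by a simple product rule; the map coordinates, bubble width, learning rate, and function names are all hypothetical.

```python
import math


def map_activation(center, rows, cols, width=1.0):
    """Gaussian activity bubble around `center`, flattened row by row."""
    cr, cc = center
    return [math.exp(-((r - cr) ** 2 + (c - cc) ** 2) / (2.0 * width ** 2))
            for r in range(rows) for c in range(cols)]


def train_prefix_links(trained_centers, rows, cols, rate=0.1, presentations=10):
    """Hebbian links from semantic map units to a single prefix unit (e.g. un-)."""
    links = [0.0] * (rows * cols)
    for _ in range(presentations):
        for center in trained_centers:
            act = map_activation(center, rows, cols)
            for u, a in enumerate(act):
                links[u] += rate * a  # prefix unit activity assumed to be 1.0
    return links


def prefix_drive(links, center, rows, cols):
    """Input that the prefix unit receives when a verb at `center` is activated."""
    act = map_activation(center, rows, cols)
    return sum(l * a for l, a in zip(links, act))


# Hypothetical map positions: clench and tighten are neighbors, hear is far away.
ROWS, COLS = 6, 6
CLENCH, TIGHTEN, HEAR = (5, 0), (5, 1), (0, 5)
links = train_prefix_links([CLENCH], ROWS, COLS)  # only un-clench is trained
```

In this sketch `prefix_drive(links, TIGHTEN, ROWS, COLS)` is almost as large as the value for the trained verb, while the value for the distant verb is negligible: the prefix spreads to a verb that was never paired with it, purely because of where that verb sits on the map.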
In §3 we discussed the failure of a feed-forward network in recovering from
overgeneralization errors. We attributed that failure to the gradient-descent error-adjustment process used in the back-propagation algorithm. In self-organizing
networks, recovery is a function of the adjustment of associative connections via
Hebbian learning, proportional to how strongly the units in the associated maps
(phonological and semantic maps in this case) are co-activated. When a given
phonological unit and a given semantic unit have fewer chances to become co-activated, the strengths of their associative links are correspondingly decreased.
We could compare this to a situation in which the learner receives no auditory
support about the specific meaning-form co-occurrences that he or she expects
in the production (MacWhinney 1997). Given that the learning system is input-sensitive, over time, the meaning-to-form connections will weaken and the corresponding forms will therefore be
less likely to occur in its production.
Indeed, our network displayed significant ability to recover from generalization errors. When tested for recovery with additional new learning (500 epochs),
the network recovered from the majority of the overgeneralizations (75% recovery). Recovery in this case is a process of restructuring of the mapping between
phonological, semantic, and morphological patterns, and the restructuring is
based on the network’s ability to reconfigure the associative links through Hebbian learning, in particular, the ability to form new associations between prefixes
and verbs and the ability to eliminate old associations that were the basis of erroneous generalizations. For example, un- was overgeneralized to tighten because of
clench earlier on; when tested for recovery, only un- and clench continue to be co-activated. Hebbian learning determines that the associative connection between
un- and clench remains strong, but that between un- and tighten weakens and gradually decreases to zero. This simulates the situation in which the child receives no
support in the input about the relationship between un- and tighten. Of course, in
the real learning situation, the strength of the connection between un- and tighten
may also be reduced by a competing form such as loosen that functions to express
the meaning of *untighten, whereby principles of contrast or competition help to
eliminate the erroneous combination (e.g., Clark 1987; MacWhinney 1987).
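One way in which a link can decay merely through lack of co-activation is sketched below. It assumes that each Hebbian step is followed by normalization of the link vector to unit length, so that strengthening one link necessarily weakens the others. The normalization step, the starting strengths, the learning rate, and the function names are assumptions for illustration; the text above states only that links weaken when co-activation becomes rare.

```python
import math


def hebbian_normalized(links, coactivation, rate=0.2):
    """One Hebbian step followed by normalization of the links to unit length."""
    updated = {k: links[k] + rate * coactivation.get(k, 0.0) for k in links}
    norm = math.sqrt(sum(v * v for v in updated.values()))
    if norm == 0.0:
        return updated
    return {k: v / norm for k, v in updated.items()}


def simulate_recovery(epochs=500):
    """Track the un-/tighten link while only un-/clench keeps being co-activated."""
    # Hypothetical link strengths from un- at the point of overgeneralization.
    links = {"clench": 0.8, "tighten": 0.6}
    history = []
    for _ in range(epochs):
        links = hebbian_normalized(links, {"clench": 1.0})
        history.append(links["tighten"])
    return links, history
```

Under these assumptions the un-/tighten link shrinks on every epoch and approaches zero, while the un-/clench link approaches its maximum, without any explicit negative feedback.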
Note that the restructuring of associative connections often goes hand-in-hand with the reorganization of the corresponding maps. For example, as the
associative strengths of clench and tighten to un- varied, the verbs’ representations
also became more distinct. This result is consistent with Pinker’s (1989) criteria
proposal that children recover from generalizations by recognizing fine and subtle
semantic and phonological properties of verbs. In the few cases in which our network did not recover from overgeneralizations, the network was unable to make
the fine semantic distinctions between verbs.
5. General discussion and conclusions
In this chapter I attempt to provide a computational perspective on a developmental issue. I started with two types of approaches to the problem of the acquisition of
word meanings. I then gave a connectionist account of the acquisition of semantic structures and morphological systems, presenting modeling results from both
a feed-forward network and a self-organizing network. I have chosen to examine
a classical puzzle that Whorf presented some 70 years ago, the issue of cryptotype in connection with the use and acquisition of the English reversive prefix
un-. This problem differs from many of the currently debated topics, for example, the acquisition of the English past tense where the patterns of use largely
depend on phonological constraints and where the focus of debate has been on
the competition between regular rules and exceptions. The un- problem examined
here is essentially semantic, and there seems to be no regular rule that governs
the use of this prefix (hence “intangible”, as Whorf named it). Our connectionist models provide some insights into the understanding of Whorf’s puzzle, in
particular, the understanding of the emergence of complex semantic structures
in language acquisition and the role of a structured semantic representation in
morphological productivity (e.g., overgeneralization). The simulation results suggest a dynamic learning picture in which the network extracts shared semantic
information, develops representations of the cryptotype, and overgeneralizes morphological devices. Such results allow us to understand the processes underlying
important phenomena such as the U-shaped behavior in language acquisition.
Current debates in cognitive science and psycholinguistics revolve around the
issue of the nature of linguistic representation. Symbolic theories construe linguistic representations in terms of rules in physical symbol systems. A child is
said to have a general rule in her mental representation, “adding -ed to make the
past tense”, at some stage of language acquisition. This kind of description seems
intuitively clear, and the rule offers a powerful mechanism for productivity. Connectionist models provide alternative explanations to this perspective, explanations that place emphasis on the statistical learning processes that lead to rule-like
behaviors. In this chapter I have demonstrated that the acquisition of linguistic
patterns, such as the prefixation of un-, can be construed as emerging out of basic
processing capacities, that is, the processing of the intricate relationships among
phonological and semantic features, lexical items, and morphological devices in a
natural language. This perspective seems to be especially suited for the problem
that we have at hand, the cryptotype problem that was once thought “subtle” and
“intangible” in a symbolic framework. In our view, the reason for the intangibility of the cryptotype is probably that the semantic features that unite different
members of a cryptotype are represented in a complex distributed fashion (e.g.,
feature overlaps across categories; see discussion on page 121), such that they are
not easily subject to traditional symbolic analysis, but are accessible to native intuition (according to Whorf). Native intuitions are clearly implicit representations of
the complex semantic relationships among verbs and morphological markers, and
connectionist networks provide mechanisms to capture these intuitions through
weighted connections, distributed representations, and nonlinear dynamics.
Virtually the same story could be told about many other linguistic domains
in which the problem is primarily semantically motivated. For example, the use of
classifiers is one of the hardest problems for second language learners of Chinese,
as well as a major challenge to linguistic theories (cf. Chao 1968; Lakoff 1987; Li
& Thompson 1981). Each noun in Chinese has to be preceded by a classifier that
categorizes the object of the noun in terms of its shape, orientation, dimension,
texture, countability, and animacy. The appropriate uses of most classifiers by native speakers are mostly automatic, yet it is difficult for linguists to come up with a
clear description of symbolic rules that govern their uses. We can probably assume
that native speakers have acquired a representation by a connectionist cryptotype-like mechanism in which multiple weighted semantic features in a network jointly
support the use of classifiers. We have recently successfully applied this type of
mechanism and explanation to the study of the acquisition of inherent verb aspect and tense-aspect morphology in Chinese, English, and Japanese (see Li &
Bowerman 1998; Li 2000, 2003; Li & Shirai 2000). Following this line of research,
we have further developed the DevLex model, a self-organizing neural network
model for the development of the lexicon. We have applied DevLex to the modeling of monolingual and bilingual lexicon acquisition, simulating the formation
of categorical representations, the confusion of competing lexical items in early
speech, and the spurt of vocabulary in early word production (see details in Farkas
& Li 2002; Hernandez, Li, & MacWhinney 2005; Li & Farkas 2002; Li, Farkas, &
MacWhinney 2004).
In sum, we can start to understand some of the most difficult problems in
language acquisition, for example, the acquisition of semantic structures such as
cryptotypes, when we take a computational approach of the type discussed here.
Structured semantic representations can emerge from statistical computations of
the various constraints among lexical items, semantic features, and morphological markers in a high-dimensional space of language use, as they dynamically
evolve and develop. The evolution and development of semantic representations
as acquired by children may be due to simple probabilistic procedures of the
sort embodied in connectionist networks or statistical learning mechanisms for
form-to-form and form-to-meaning mappings.
Acknowledgments
Preparation of this article was supported by grants from the National Science
Foundation (#BCS-9975249; #BCS-0131829), and a Faculty Research Grant from
the University of Richmond. I would like to thank Elizabeth Bates, Melissa Bowerman, and Jeffrey Elman for their discussions on the feed-forward network, Brian
MacWhinney and Risto Miikkulainen for their comments and discussions on the
self-organizing network, and Curt Burgess and Kevin Lund for making available
the HAL semantic vectors for our modeling.
Notes
* Correspondence concerning this article should be addressed to Ping Li, Department of Psychology, University of Richmond, Virginia, VA 23173, U.S.A. E-mail: [email protected]
1. For some readers, these two sets of models may simply be viewed as the same kind of models,
given that they both rely on statistical patterns and are in many ways closely related.
2. Note that the 3.8 million words represent only a small portion of what the child is exposed
to in the learning environment. According to one estimate, an average three-year-old has been
exposed to 10–30 million words (Hart & Risley 1995).
3. Readers who are interested in details of connectionist theory and methods should read
Rumelhart, McClelland, and the PDP Research Group (1986). For a non-technical introduction
to connectionism, read Bechtel and Abrahamsen (1991) and Spitzer (1999). For technical discussions, read (progressively more technical) Dayhoff (1990), Fausett (1994), Anderson (1995),
and Hertz, Krogh, and Palmer (1991). For its relevance to developmental theories, read Elman
et al. (1996) and Klahr and MacWhinney (1998). For a comprehensive review of all major fields
in neural networks, consult Arbib (1995).
4. Diary notes of my daughter’s speech also include similar uses: “unbuild the snowman” was
used to refer to the detachment of decorative pieces from the snowman, and untape to refer to
the removal of tape from a piece of paper that has been taped (child was 6 years and 9 months).
References
Anderson, James (1995). An introduction to neural networks. Cambridge, MA: MIT Press.
Arbib, Michael (1995). Handbook of brain theory and neural networks. Cambridge, MA: MIT
Press.
Baker, Carl (1979). Syntactic theory and the projection problem. Linguistic Inquiry, 10, 533–581.
Bates, Elizabeth (1984). Bioprograms and the innateness hypothesis: Commentary on Bickerton.
Behavioral and Brain Sciences, 7, 188–190.
Bechtel, William & Adele Abrahamsen (1991). Connectionism and the mind. Cambridge, MA:
Blackwell.
Bensch, Peter A. (1991). Neo-structuralism: A commentary on the correlations between the
work of Zelig Harris and Jeffrey Elman. Center for Research in Language Newsletter, 5(2).
Bowerman, Melissa (1982). Reorganizational processes in lexical and syntactic development.
In E. Wanner & L. Gleitman (Eds.), Language acquisition: The state of the art. Cambridge:
Cambridge University Press.
Bowerman, Melissa (1983). Hidden meanings: the role of covert conceptual structures in
children’s development of language. In D. Rogers & J. Sloboda (Eds.), The acquisition of
symbolic skills. New York: Plenum.
Bowerman, Melissa (1988). The “no negative evidence” problem: How do children avoid
constructing an overly general grammar? In J. Hawkins (Ed.), Explaining language
universals. New York: Basil Blackwell.
Bownds, M. Deric (1999). The biology of mind: Origins and structures of mind, brain, and
consciousness. Bethesda, MD: Fitzgerald Science Press.
Brown, Roger (1973). A first language. Cambridge, MA: Harvard University Press.

Burgess, Curt & Kevin Lund (1997). Modelling parsing constraints with high-dimensional
context space. Language and Cognitive Processes, 12, 1–34.
Chao, Yen-Ren (1968). A grammar of spoken Chinese. Berkeley: University of California Press.
Clark, Eve V. (1987). The principle of contrast: A constraint on language acquisition. In B.
MacWhinney (Ed.), Mechanisms of language acquisition. Hillsdale, NJ: Erlbaum.
Clark, Eve, K. Carpenter, & W. Deutsch (1995). Reference states and reversals: Undoing actions
with verbs. Journal of Child Language, 22, 633–662.
Comrie, Bernard (1976). Aspect: An introduction to the study of verbal aspect and related problems.
Cambridge, England: Cambridge University Press.
Dayhoff, Judith (1990). Neural network architecture: An introduction. New York: Van Nostrand
Reinhold.
Elman, Jeffrey L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, Jeffrey L. (1993). Learning and development in neural networks: The importance of
starting small. Cognition, 48, 71–99.
Elman, Jeffrey L. (1995). Language as a dynamic system. In R. Port & T. van Gelder (Eds.), Mind
as motion. Cambridge, MA: MIT Press.
Elman, Jeffrey L., E. Bates, M. Johnson, A. Karmiloff-Smith, D. Parisi, & K. Plunkett (1996).
Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT
Press.
Farkas, I. & Ping Li (2002). Modeling the development of lexicon with a growing self-organizing
map. In H. J. Caulfield et al. (Eds.), Proceedings of the Sixth Joint Conference on Information
Science (pp. 553–556). Durham, NC: Association for Intelligent Machinery, Inc.
Fausett, Laurene (1994). Fundamentals of neural networks. Englewood Cliffs, NJ: Prentice Hall.
Hart, B. & T. Risley (1995). Meaningful differences in the everyday experiences of young American
children. Baltimore, MD: Paul H. Brookes Publishing Co.
Hebb, Donald (1949). The organization of behavior: A neuropsychological theory. New York, NY:
Wiley.
Hernandez, Arturo, Ping Li, & Brian MacWhinney (2005). The emergence of competing
modules in bilingualism. Trends in Cognitive Sciences, 9, 220–225.
Hertz, John, Anders Krogh, & Richard G. Palmer (1991). Introduction to the theory of neural
computation. Redwood City, CA: Addison-Wesley.
Klahr, David & Brian MacWhinney (1998). Information processing. In W. Damon, D. Kuhn, &
R. Siegler (Eds.), Manual of Child Psychology (Vol. 2). New York: Wiley.
Kohonen, Teuvo (1982). Self-organized formation of topologically correct feature maps.
Biological Cybernetics, 43, 59–69.
Kohonen, Teuvo (1989). Self-organization and associative memory. Heidelberg: Springer-Verlag.
Kohonen, Teuvo (1997). Self-organizing maps. Heidelberg: Springer-Verlag.
Kohonen, Teuvo (2001). The self-organizing maps (3rd ed.). Berlin: Springer.
Kuczaj, Stanley (1977). The acquisition of regular and irregular past tense forms. Journal of
Verbal Learning and Verbal Behavior, 16, 589–600.
Lachter, Joel & Thomas Bever (1988). The relation between linguistic structure and associative
theories of language learning: A constructive critique of some connectionist learning
models. Cognition, 28, 195–247.
Lakoff, George (1987). Women, fire, and dangerous things. Chicago: The University of Chicago
Press.
Landauer, Thomas & Susan Dumais (1997). A solution to Plato’s problem: The latent semantic
analysis theory of acquisition, induction and representation of knowledge. Psychological
Review, 104, 211–240.
Landauer, Thomas, Peter Foltz, & Darrell Laham (1998). Introduction to Latent Semantic
Analysis. Discourse Processes, 25, 309–336.
Li, Charles & Sandra Thompson (1981). Mandarin Chinese: A functional reference grammar.
Berkeley: University of California Press.
Li, Ping (1993). Cryptotypes, form-meaning mappings, and overgeneralizations. In E. V. Clark
(Ed.), The Proceedings of the 24th Child Language Research Forum, Center for the Study of
Language and Information Publications, Stanford University.
Li, Ping (2003). Language acquisition in a self-organising neural network model. In P. Quinlan
(Ed.), Connectionist models of development: Developmental processes in real and artificial
neural networks. Philadelphia & Brighton: Psychology Press.
Li, Ping & Melissa Bowerman (1998). The acquisition of lexical and grammatical aspect in
Chinese. First Language, 18, 311–350.
Li, Ping, Curt Burgess, & Kevin Lund (2000). The acquisition of word meaning through global
lexical co-occurrences. In E. V. Clark (Ed.), Proceedings of the 30th Child Language Research
Forum. Cambridge, MA: Cambridge University Press.
Li, Ping & Igor Farkas (2002). A self-organizing connectionist model of bilingual processing.
In R. Heredia & J. Altarriba (Eds.), Bilingual sentence processing. North Holland: Elsevier
Science Publisher.
Li, Ping, Igor Farkas, & Brian MacWhinney (2004). Early lexical development in a self-organizing neural network. Neural Networks, 17, 1345–1367.
Li, Ping & Brian MacWhinney (1996). Cryptotype, overgeneralization, and competition: A
connectionist model of the learning of English reversive prefixes. Connection Science, 8,
1–28.
Li, Ping & Brian MacWhinney (2002). PatPho: A phonological pattern generator for neural
networks. Behavior Research Methods, Instruments, and Computers, 34, 408–415.
Li, Ping & Yasuhiro Shirai (2000). The acquisition of lexical and grammatical aspect. Berlin and
New York: Mouton de Gruyter.
Lund, Kevin & Curt Burgess (1996). Producing high-dimensional semantic space from lexical
co-occurrence. Behavior Research Methods, Instruments, & Computers, 28, 203–208.
MacWhinney, Brian (1987). The competition model. In B. MacWhinney (Ed.), Mechanisms of
language acquisition. Hillsdale, NJ: Erlbaum.
MacWhinney, Brian (1989). Competition and lexical categorization. In R. Corrigan, F. Eckman,
& M. Noonan (Eds.), Linguistic categorization. New York: Benjamins.
MacWhinney, Brian (1998). Models of the emergence of language. Annual Review of Psychology,
49, 199–227.
MacWhinney, Brian (2000). The CHILDES project: Tools for analyzing talk. Hillsdale, NJ: Lawrence
Erlbaum.
MacWhinney, Brian (2001). Lexicalist connectionism. In P. Broeder & J. M. Murre (Eds.), Models
of language acquisition: Inductive and deductive approaches. Oxford, UK: Oxford University
Press.
MacWhinney, Brian & Jared Leinbach (1991). Implementations are not conceptualizations:
Revising the verb learning model. Cognition, 40, 121–157.
Maratsos, Michael & Mary Chalkley (1980). The internal language of children’s syntax: The
ontogenesis and representation of syntactic categories. In K. Nelson (Ed.), Children’s
language (Vol. 2). New York: Gardner Press.
Marchand, Hans (1969). The categories and types of present-day English word-formation: a
synchronic-diachronic approach. München: C.H. Beck’sche Verlagsbuchhandlung.

McClelland, James & David Rumelhart (1981). An interactive activation model of context effects
in letter perception: Part 1. An account of the basic findings. Psychological Review, 88, 375–
402.
Miikkulainen, Risto (1993). Subsymbolic natural language processing: An integrated model of
scripts, lexicon, and memory. Cambridge, MA: MIT Press.
Miikkulainen, Risto (1997). Dyslexic and category-specific aphasic impairments in a self-organizing feature map model of the lexicon. Brain and Language, 59, 334–366.
Pinker, Steven (1989). Learnability and cognition: The acquisition of argument structure.
Cambridge, MA: MIT Press.
Pinker, Steven (1991). Rules of language. Science, 253, 530–535.
Pinker, Steven (1999). Out of the minds of babes. Science, 283, 40–41.
Pinker, Steven & Alan Prince (1988). On language and connectionism: Analysis of a parallel
distributed processing model of language acquisition. Cognition, 28, 73–193.
Plunkett, Kim & Virginia Marchman (1991). U-shaped learning and frequency effects in a multilayer perceptron: Implications for child language acquisition. Cognition, 38, 43–102.
Plunkett, Kim & Virginia Marchman (1993). From rote learning to system building: Acquiring
verb morphology in children and connectionist nets. Cognition, 48, 21–69.
Redington, Martin, Nick Chater, & Steven Finch (1998). Distributional information: A powerful
cue for acquiring syntactic categories. Cognitive Science, 22, 425–470.
Rumelhart, David & James McClelland (1986). On learning the past tenses of English verbs.
In James L. McClelland, David E. Rumelhart, & the PDP Research Group (Eds.), Parallel
distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge,
MA: MIT Press.
Rumelhart, David, Geoffrey Hinton, & Ronald Williams (1986). Learning internal
representations by error propagation. In James McClelland, David Rumelhart, & the PDP
Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of
cognition (Vol. 1). The MIT Press.
Saffran, Jenny, Richard Aslin, & Elissa Newport (1996). Statistical learning by 8-month-old
infants. Science, 274, 1926–1928.
Saussure, Ferdinand de (1916). Cours de linguistique générale. Paris: Payot. (English translation:
A course in general linguistics. New York: Philosophical Library; Chinese translation: Putong
Yuyanxue Gangyao. Beijing: Commercial Press).
Seidenberg, Mark (1997). Language acquisition and use: Learning and applying probabilistic
constraints. Science, 275, 1599–1603.
Spitzer, Manfred (1999). The mind within the net: Models of learning, thinking, and acting. The
MIT Press.
Vendler, Zeno (1967). Linguistics in philosophy. Ithaca: Cornell University Press.
Whorf, Benjamin L. (1956). Thinking in primitive communities. In J. B. Carroll (Ed.), Language,
thought, and reality. The MIT Press.
Appendix
arrange DIS
integrate DIS
make ZERO
write ZERO
roll UN
load UN
place DIS
grow ZERO
wind UN
affiliate DIS
work ZERO
believe DIS
move ZERO
possess DIS
like DIS
settle UN
start ZERO
sit ZERO
turn ZERO
array DIS
stop ZERO
real UN
put ZERO
arm UN
aggregate DIS
engage DIS
Figure 3. Part A.

screw UN
bind UN
entangle DIS
lock UN
braid UN
buckle UN
fasten UN
clasp UN
latch UN
tie UN
clog UN
fold UN
bolt UN
strap UN
bandage UN
wrap UN
chain UN
hitch UN
close DIS
lace UN
tangle UN
dress UN
hinge UN
zip UN
curl UN
wind UN
veil UN
hook UN
cork UN
mask UN
sheathe UN
coil UN
twist UN
crumple UN
ravel UN
scramble UN
cover UN
plug UN
snap UN
button UN
leash UN
Figure 3. Part B.
settle UN
do UN
make ZERO
write ZERO
load UN
roll UN
crumple UN
ravel UN
screw UN
reel UN
braid UN
wind UN
twist UN
fold UN
tie UN
coil UN
curl UN
mask UN
scramble UN
Figure 3. Part C.
chapter 
Grammar and language production
Where do function words come from?*
Joost Schilperoord and Arie Verhagen
Tilburg University / Leiden University
Most psycholinguistic models of language production start from a strict division
between computation and memorization. Individual content words are retrieved
from the lexicon, and assembled into larger structures by means of grammatical
computation. Because function words are considered grammatical elements,
their insertion into these structures results from computation, rather than
retrieval.
We argue that this view may be incorrect, or at least incomplete. Our case
rests on an analysis of the distribution of production pauses relative to function
words in a corpus of production data. We demonstrate that the data are better
accounted for when we assume that the cognitive status of many of the linguistic
structures people produce is that of schemata, with function words serving to
retrieve them from memory.
Keywords: language production, storage vs. computation, function words,
grammatical schemata
.
Introduction
In this paper, we want to bring evidence from linguistic processing, in particular
from language production, to bear on the issue of the proper characterization of
linguistic knowledge – i.e., on views about the organization of the mental lexicon
and mental grammar. The specific question we focus on is exactly how
grammatical words, or ‘function words’, are selected in the process of
spontaneous language production, and what this implies for theories of linguistic knowledge. Both the theoretical issue and the evidence we present are actually
quite straightforward, which in our opinion makes the conclusions all the more inevitable, but to our knowledge this particular connection between theory and data
has so far escaped the attention of linguists and psycholinguists alike. Cognitive
linguistics has so far not really developed any serious attempt to relate theoretical
ideas to processes of production, but it should – given the cognitive commitment –
and we show that it actually has considerable insights to offer, especially concerning the question whether grammar and lexicon function as distinct ‘modules’ in
production.
The roles of lexicon and grammar in a theory of language production
Any approach to what a process of language production looks like naturally assumes that such a process starts with a communicative intention – i.e., the intention to convey the content of a message rather than the intention to produce some
sounds, marks on paper, or whatever (cf. Levelt 1989: 108–110). There is also a tradition, especially among linguists but also embraced by many psycholinguists, of
distinguishing between so-called content words and function words. Typical
examples of the former are nouns and verbs, while typical examples of the latter
comprise articles, conjunctions, prepositions, and the like. As the labels ‘content’
and ‘function words’ suggest, the former are supposed to carry the (conceptual)
content of what is said, while the latter are indicators of some (grammatical) function of the elements that they are attached to – i.e., (at least in their most pure
form) markers of structure rather than content. To give an example, in a phrase
such as ‘the hunt for the escaped prisoners’, the element ‘for’ does not in itself contribute a particular meaning, but serves to mark the phrase ‘the escaped prisoners’
as the object of the predicate ‘hunt’; similarly, the definite articles serve to mark
the status of the syntactic category (‘noun phrase’) of the phrases they belong to,
rather than to convey some independent aspect of content.
Now this combination of ideas immediately gives rise to a question. If the language production process starts from conceptual content, and if function words do
not carry semantic content themselves – as are indeed the main assumptions underlying many current theories of language production – then it cannot be content
that triggers the production of function words; so what is it that gives rise to the
production of function words? The natural answer that immediately suggests itself
is, of course: the structural position for a specific function word becomes available
at some point in the production process, and this is what triggers its production.
This in turn leads to a new question: how does this structural position become
available? Again, an answer seems to be readily available: the relevant structure
can be produced through the application of certain grammatical rules invoked by
the elements that do have an immediate connection to conceptual content: the
content words.
It is precisely this view that has been implemented in an influential model
of language production: one that may safely be said to represent the received
view of the role of grammatical rules in language production (cf. Carroll
1999: 208/209) – i.e., the model of Incremental Procedural Grammar (IPG), proposed by Kempen and Hoenkamp (1987), and adopted by Levelt (1989, 1999).
Informally, the model assumes the following major subsystems in the overall language production process:
(1) Conceptualization → Formulation → Articulation.
In principle, the later subsystems (Formulation, Articulation) are dependent on
the ones preceding them. However, the model allows each of these processes, and
possibly ‘smaller’ subprocesses, to operate in parallel to a large extent. That is,
while the routines controlling the articulatory organs are doing their work for one
piece of an utterance, the routines for formulation may be working on the following piece, and the conceptualizer is in fact already planning what to say next.
The part that we are interested in here (as most researchers of language production are) is the Formulator. This subsystem converts conceptual structures into
linguistic structures. The input to the Formulator is formed by a thought from the
Conceptualizer; we do not have to be concerned here with the precise format of
this input, and we will simply assume some system for representing propositions.
This thought contains concepts, and it is these that set the formulation process in
IPG in motion. This consists of the steps listed in Table 1 below.
First of all, we want to stress the importance of step 4 in the model: the inherent limitations of working memory (Baddeley 1990). It is an essential factor in the
explanation of a very general feature of normal language production, viz. the fact
that it is incremental (hence the name of the model) in that it proceeds ‘in spurts’,
with pauses reflecting the workings of the production system in between. If it were
not for the limitations of working memory, language production would not proceed in spurts at all – i.e., it would not be incremental, as it actually is. After all,
if speakers had unlimited processing capacity at their disposal, then utterances – or even entire texts for that matter – could be prepared in advance, and the
language production process would be continuous, guided by an all-encompassing
production plan (Kempen & Hoenkamp 1987: 203). However, since both the empirical phenomenon of pausing and the theoretical assumption of limited space in
working memory are quite robust, we will also adopt this assumption; in fact, the
consequent tendency of releasing working memory as soon as possible will play an
important role in our argument for an alternative analysis.
Secondly, the model is maximally ‘structure building’ and ‘lexically driven’,
and these two features are strongly related. The model is structure building in the
sense that it assumes that all of the structure of grammatical strings is computed,
built ‘on the fly’, and none of it is directly retrieved from (long term) memory.
This is also directly related to the next point, the ‘lexical hypothesis’: there is never
a direct link between the conceptual structure and a rule of grammar (and hence
a piece of grammatical structure), since a call to a grammatical rule (be it a syntactic procedure or a functorization procedure) is mediated by at least one lexical
item; only the latter are directly linked to the conceptual structure in the process
of language production. As Levelt put it:
The lexical hypothesis entails, in particular, that nothing in the speaker’s message will by itself trigger a particular syntactic form, such as a passive or a dative
construction. There will always be mediating lexical items, triggered by the message, which by their grammatical properties and their order of activation cause
the Grammatical Encoder to generate a particular syntactic structure.
(Levelt 1989: 181)
Table 1. Overview of formulation in IPG
1. The mental lexicon is accessed, with the concept as address, to retrieve the linguistic
element expressing it. The routines performing this task (mapping non-linguistic concepts
to linguistic units) are called lexicalization procedures. Retrieval of the lexical element
normally activates the entire entry, not just the phonological shape of the word but also
information about its syntactic category, and its sub-categorization frame, and perhaps
other information.
2. Given the lexical element, especially the information about its syntactic category, the
appropriate phrase structures are built by means of ‘syntactic procedures’ (if the element
retrieved from the lexicon is a noun, a noun phrase is built according to the grammatical
rules for noun phrases in the language, etc.).
3. The output of these syntactic procedures (i.e., syntactic phrase markers / ‘tree structures’)
contains functional positions; these are filled in with the appropriate bound morphemes,
inflections, auxiliaries, determiners, etcetera, by means of ‘functorization procedures’.
4. Results of step (3) are put out to the Articulation routines as soon as possible in order to
release working memory – i.e., the limited space is made available for another formulation
process as quickly as possible.
These features of the model may be said to express a purely “formal” view of grammar; it specifies structural properties of linguistic utterances without considering
them meaningful.
Let us illustrate these characteristics of IPG by means of some simple examples. How does the production of a simple noun phrase such as ‘the circumstance’
proceed? By assumption, the conceptual structure contains a specification of the
concept CIRCUMSTANCE and the first step in the formulation process consists of
matching this non-linguistic concept with an element in the mental lexicon. The
specification of the information found there (meaning, phonological shape, syntactic category, possibly other relevant information) is given partly in:
(2) [CIRCUMSTANCE, circumstance, N, ...]
Subsequently, the information that the element expressing the concept is a noun,
triggers the syntactic procedure for building a noun phrase:1
(3) a. N2 → det, N1
The rule produces a structure consisting of two elements, one of which (N1) is
itself a trigger for a syntactic subprocedure, (3b):
(3) b. N1 → ... N ...
The output of this rule does not contain triggers for calling further syntactic procedures, and the lexical node (N) provides a point to attach the lexical item to. At
this point – i.e., after lexicalization and syntactic specification but before functorization (the end of stage 2, in Table 1), working memory contains the following
partially specified structure:
(4) [N2 det [N1 [N circumstance]]]
This structure contains a node for a functional element, in this case a determiner
position, which functions as a trigger for a functorization procedure (stage 3, in
Table 1). This procedure inspects the conceptual structure for the specification of
the ‘accessibility’ (cf. Ariel 1988) or some equivalent notion of the concept involved, in order to decide between inserting either ‘the’, ‘a’ or ø; supposing that
the value found is +accessible, the element ‘the’ will be inserted. As Kempen and
Hoenkamp (1987: 218) argue, the insertion of function words is “chiefly motivated on syntactic grounds, so they cannot be supposed to originate simply from
lexicalization”. In this case, for example, it may be supposed that the realization
of a determiner, such as ‘the’, is dependent on the presence of a Noun Phrase node
in the structure being produced, and not only on the feature +accessible in the
conceptual structure. An accessible concept expressed by an adjective or a verb
should not be marked by ‘the’, so the determiner cannot be seen as arising directly
from the conceptual structure by lexicalization of +accessible, in the same way as
‘circumstance’ originates from lexicalization of CIRCUMSTANCE. The consequence of
the strict separation of functorization from lexicalization and syntactic procedures
(in two distinct production stages) is thus that structures of the type (4), with all
of the content words and none of the function words specified, have to be taken
as representing a particular and necessary stage in the production of a linguistic
utterance.
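The staged derivation just described can be made concrete with a minimal sketch. It is not the IPG implementation of Kempen and Hoenkamp; the toy lexicon, the `Node` class and all function names are hypothetical, and the choice between determiners is reduced to ‘the’ versus ‘a’ for simplicity. What the sketch is meant to show is only the ordering assumption: after stage 2 the head noun is present while the determiner slot is still empty, as in (4).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical toy lexicon: concept -> (word form, syntactic category).
LEXICON = {"CIRCUMSTANCE": ("circumstance", "N")}


@dataclass
class Node:
    label: str                              # e.g. "N2", "det", "N1", "N"
    word: Optional[str] = None              # filled only for terminal nodes
    children: List["Node"] = field(default_factory=list)


def lexicalize(concept):
    """Stage 1: retrieve the lexical entry addressed by the concept."""
    return LEXICON[concept]


def build_phrase(word, category):
    """Stage 2: call the syntactic procedure triggered by the category."""
    if category != "N":
        raise ValueError("this sketch only covers noun phrases")
    # Rules (3a) and (3b): N2 -> det, N1 and N1 -> ... N ...
    return Node("N2", children=[Node("det"),
                                Node("N1", children=[Node("N", word)])])


def functorize(tree, accessible):
    """Stage 3: fill the open det position from the conceptual structure."""
    for child in tree.children:
        if child.label == "det" and child.word is None:
            # Simplification (assumption): only 'the' versus 'a'.
            child.word = "the" if accessible else "a"
    return tree


def terminals(node):
    """Left-to-right terminal words; None marks a slot not yet filled."""
    if not node.children:
        return [node.word]
    return [w for child in node.children for w in terminals(child)]


word, category = lexicalize("CIRCUMSTANCE")
tree = build_phrase(word, category)
after_stage_2 = terminals(tree)                       # head present, det empty
after_stage_3 = terminals(functorize(tree, accessible=True))
```

On this ordering the function word can never be available before its lexical head, which is the property the pause predictions discussed in this chapter rest on.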
To take a slightly more complicated example, consider the production of the
phrase ‘the start of the program’ according to this model. The relevant portion of
the underlying conceptual structure will look like (5):
(5) [START (PROGRAM)]
After lexicalization and syntactic specification, the intermediate representation of
the expression being produced, will look something like:2
(6) [N2 det [N1 [N start] [Subject: N2 det [N1 [N program]]]]]
It is on the basis of this representation, containing all content words and a complete specification of the phrase structure, that the production process enters stage
3, in which functorization results in the addition of ‘the’, ‘of’, and again ‘the’ to
the representation, which can then be passed on to the articulation procedures
(stage 4).
The reason why we presented the workings of IPG in this respect in some detail, is that this view on the different status of content words and function words in
production gives rise to a very specific prediction about the temporal structure of
the production of utterances in languages like Dutch and English (for which IPG
was designed), in which most function words precede the lexical heads of phrases.
As we explained earlier, an empirical argument for the incremental nature of production consists in the occurrence of pauses. However, the model not only predicts
that pauses occur at all, but also where they should normally occur. In Dutch, English, and similar languages, pauses are not to be expected between a function word
and the related content word, but only at the phrase boundaries. The reason is that
because of the assumed order of stages 2 and 3, whenever a function word (output
of stage 3) is present, the associated lexical head (output of stage 2) is necessarily present as well. If it were not, the relevant functorization procedures could not
have been called, so when the output of stage 3 is ready to be articulated, all related
material that was produced in stage 2 is equally available.
IPG does not seem to be committed to a particular prediction in this respect
for functional elements that follow content words or for languages in which most
function words occur to the right of a lexical head, since in such cases the linear
order to be produced is parallel to the assumed order of formulation processes
(stage 2 for heads, stage 3 for function words). The claims in this paper concern
only (languages with) function words preceding lexical heads. In this situation the
assumption about the limited capacity of working memory comes into play: since
information in working memory is released as soon as possible (cf. stage 4), and
since lexical heads are available in working memory when function words are, it
follows that when function words are produced and thus released from working
memory the related lexical head should be uttered as well. Hence normally no
pauses are to be expected after a function word, whereas they are expected to be
quite normal before a function word.
This is a clear and straightforward empirical prediction, related directly to a
central assumption of the IPG model as incorporating a specific view on the relation between grammatical structure and the lexicon, viz. one that is maximally
structure building, with no direct link whatsoever between the grammatical structure and the conceptual content of utterances (cf. above). It is also a prediction
that we believe to be highly problematic. In ordinary language production, as we
will see, pauses immediately following function words are so frequent that they
must be taken as a quite normal phenomenon, not an exception. The next section
is devoted to a demonstration of this claim. Following this demonstration, we will
try to sketch an alternative view, incorporating the idea that function words mark
grammatical constructions, or schemas, as structured symbolic units that may be
retrieved from long term memory, just as so-called content words are.
Pause patterns relative to function words
Some quantitative data
The previous section discussed what may be considered the ‘received’ view in
psycholinguistics on the interaction between processing and grammar. Function
words are essentially markers of structure. They enter the production process by
means of functorization procedures which are activated as soon as the lexical head
of the phrase marker is activated. Producing determiners, for example, depends
on features of the activated lemma (its syntactic category, for instance), while the
functorization procedure checks the conceptual structure for the presence of features in order to decide whether a definite or an indefinite article is to be produced.
Therefore, function words have no independently represented correlate at the level
of conceptual structure.
The empirical phenomenon to be analysed in this section consists of pause
patterns relative to function words. We will show that during the (oral) production of routine business letters, text producers tend to pause predominantly after
function words.
Data were collected by audio taping six Dutch lawyers in their offices while
they were dictating routine daily correspondence, using a dictation machine. The
data were naturalistic, i.e. all letters were actually sent to business associates or
clients. The statistical data to be reported in this section are based on 120
such letters. Together, these letters contained about 23,000 words, and over 7,800
pauses. Dictating can be taken to be a way of producing written texts (Schilperoord
1996: 19–23; see also Schilperoord 2001). All tapes were transcribed verbatim, including all pauses, errors, restarts and the like. Dictation was chosen because it
makes the job of detecting, locating and measuring pauses fairly easy. Moreover,
because of the monologic situation we may assume that pauses will not occur for
interactional reasons, so that they may in general be considered to reflect cognitive processes. We only considered ‘silent’ pauses, not so-called filled ones (e.g.,
uh. . . ) to increase the validity of this assumption further, as there are suggestions
in the literature that different sorts of filled pauses may have specific functions (cf.
Clark 1996). It should be noted, though, that our conclusions do not depend in
any way on how specific these pause patterns are for dictation.3 We tested general
predictions about the relationship between language production and grammatical structure, using dictation as material, and using pauses between increments of
production as evidence for the status of the segments involved.
There are two possible causes for a pause or hesitation to pop up in the normal
stream of speech:4 it may occur, firstly, because the language producer has some
difficulty, or at least needs some time, in working out the conceptual specifications
of his message, or secondly, because matching a concept with a lexical item leads
to some delay. In both cases, however, we may expect pauses – allegedly reflecting these cognitive activities – to occur before a function word, and not after it. In
other words, lexically driven models predict pauses to respect the phrasal structure
of the message. Another way of putting this would be that by their very nature lexically driven models of language production deny that there might be any cognitive
reason for a pause to occur after a function word.
With this in mind, let us now have a look at the following transcript, taken
from a dictation session of a Dutch lawyer, producing a routine judicial letter – see
example (7).
(7) (. . . )
→ 1. deel ik u mede dat de /
inform I you that the
→ 2. door /
by
→ 3. mij op de /
me at the
→ 4. zitting van /
session of
→ 5. DATUM bij /
DATE at
6. de NAAM overhandigde /
the NAME delivered
7. pleitnotities /
oral petitions
(. . . )
“. . . I inform you that the oral petitions delivered at NAME at the session
of DATE. . . ”
All numbered lines represent the increments by which this stretch of discourse
came about. That is, slashes after each line indicate a pause of at least .3 seconds.5 As can be seen in lines 1 to 5, pauses occur right after a function word –
determiners in 1 and 3, and prepositions in 2, 4 and 5.
Obviously, the pattern shown in (7) is not what lexically driven models of speech production lead us to expect. Phrasal boundaries are often violated,
indicating that at these locations there is ‘structure’ with no apparent content (a
situation that is ruled out by lexically driven models). If indeed these pauses reflect
conceptualization or lexicalization processes, then where do the function words
originate from? For example, if the presence of the determiner in line 3 depends
on the presence of the lexical head zitting (“session”), as lexically driven models
have it, then how can de (“the”) have been produced already whereas zitting is still
underway, or may not even have been retrieved from memory? Phrased differently,
how can we account for the fact that an NP is already ‘there’, so to speak, whereas
its lexical head is not?
To anticipate the conclusion, it is our conviction that data such as these force
us to seriously consider the possibility, first, that functional elements such as articles might have an independent correlate at the level of conceptual structure,
and second, that structured phrases, such as noun phrases, may be activated during language production as relatively underspecified templates, or ‘constructions’.
That is, ‘bare’ phrasal units may very well result from retrieval processes, with
a complete structural unit being accessed holistically, in a ‘Gestalt’-like manner,
rather than from computational processes that build them out of elementary parts.
However, in order to substantiate such a (far reaching) claim, we have to show
that we are in fact dealing with a regular pattern in language production. That
is, we have to show that what we see in (7) is not exceptional. To this end, we
will provide information concerning the proportions of pauses relative to function
words, such as articles and conjunctions. In brief, the question is: are we dealing
with a phenomenon that occurs frequently enough to be theoretically interesting?
For the proportional analysis, we used the data-base described above. Each
transition between every pair of words in the 120 texts in the corpus was scored for
the syntactic category of the word preceding the transition and the syntactic category of the word following the transition. A gross distinction was made between
function words, such as articles, prepositions, and conjunctions on the one hand,
and content words such as nouns, adjectives, adverbs and verbs, on the other. In
addition, transitions between words were scored for the presence or absence of a
pause. This information allows us to analyse pause occurrences in strings such as
those in (8). Each slash marks a potential pause location:
(8) /a/garden/
/an/English/garden/
/in/an/English/garden/
/in/the/garden/of/Monet/
/that/I/visited/an/English/garden/
The following set of patterns was selected for statistical analysis:
(9) 1. det – (adjective) – N
2. prep – NP
3. conj – subordinate clause
The category “det” in (9) included the definite (de and het) and indefinite (een) articles, not demonstratives occupying a pre-nominal position. The category “prep”
includes all prepositions, and pronouns were included as members of “NP”. Finally, “conj” consisted of the words dat and om (i.e., the elements that can introduce complement clauses, such as finite and infinitival clauses, respectively), and that
are therefore often considered purely grammatical elements, devoid of meaning.
These strings allow for the following set of possible locations for pauses:
(10) 1. a. pause – det and/or: b. det – pause – (adj.) – N
2. a. pause – prep and/or: b. prep – pause – NP
3. a. pause – conj and/or: b. conj – pause – clause
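The scoring of pause locations in (10) can be sketched as follows. This is an illustration only, not the procedure actually used on the corpus: the function name, the token format (a flat list of words with "/" standing for a pause) and the determiner set are assumptions made for the sketch, and the exclusion of sentence-initial phrases is not implemented. The input is the fragment given in (7).

```python
# Hypothetical category set for the sketch: the Dutch articles named in the text.
DETERMINERS = {"de", "het", "een"}
PAUSE = "/"


def pause_proportions(tokens, function_words):
    """Return (proportion followed by a pause, proportion preceded by a pause)."""
    total = followed = preceded = 0
    for i, tok in enumerate(tokens):
        # Skip pause markers and every word outside the category of interest.
        if tok == PAUSE or tok.lower() not in function_words:
            continue
        total += 1
        if i + 1 < len(tokens) and tokens[i + 1] == PAUSE:
            followed += 1           # pattern (10b): function word, then pause
        if i > 0 and tokens[i - 1] == PAUSE:
            preceded += 1           # pattern (10a): pause, then function word
    if total == 0:
        return 0.0, 0.0
    return followed / total, preceded / total


# The fragment in (7); "/" marks each pause of at least 0.3 seconds.
fragment = ("deel ik u mede dat de / door / mij op de / zitting van / "
            "DATUM bij / de NAAM overhandigde / pleitnotities /").split()
after, before = pause_proportions(fragment, DETERMINERS)
```

In this fragment two of the three determiners are followed by a pause and one is preceded by a pause. A single fragment of course says nothing about the corpus as a whole.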
We first estimated pause proportions for each possible location with regard to
these three kinds of function words. Then, in order to put these proportions
into perspective, comparisons were made between the proportions of pauses preceding and those following function words (the a- and b-columns in (10)). In
order to produce interpretable comparisons for the first two categories (det – N,
prep – NP), all sentence-initial occurrences of these phrasal types were omitted as
other analyses had revealed that pauses occur at almost every sentence (or paragraph)
transition (Schilperoord 1996). As such pauses presumably serve widely different
cognitive purposes, including them in the data set would lead to an overestimation of pause proportions before function words. Both proportionate data and
comparisons are summarized in Table 2.
Table 2. Proportions of pauses and comparisons for three opposite pairs of pause locations
(P = pause occurrence)
string type    proportion    χ2
det – P        .53           93.91*
P – det        .39
prep – P       .25           14.24*
P – prep       .30
conj – P       .59           25.89*
P – conj       .47
* = significant at p ≤ .05
The data show that 53% of all determiners produced were followed by a pause,
whereas 39% were preceded by a pause; similarly, 25% of the prepositions and
59% of conjunctions were followed by pauses. What is particularly noteworthy is
that the proportion of pauses after function words is quite high, given the prediction that no pauses should occur at these locations. In the case of determiners
and conjunctions, these proportions even exceed those of pauses preceding
these functional categories (but the situation is the reverse for prepositions).6 A
chi-square analysis proved these differences to be significant. So, to conclude this
section, pause occurrences after function words are a highly regular phenomenon;
in fact, the post-function-word location even seems to be the favourite one in the
case of determiners and conjunctions.
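For readers who want to see how a comparison of this kind can be computed, the following is a minimal sketch of a Pearson chi-square statistic for a 2 × 2 table. It rests on several assumptions: the chapter does not report the raw counts or the exact layout of the test, so the counts below are hypothetical (1,000 determiners, with the two determiner proportions of Table 2), and the two pause locations are treated as independent samples. The resulting value is therefore not expected to reproduce the statistics in Table 2.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# HYPOTHETICAL counts, for illustration only: 1000 determiners, of which
# 53% are followed by a pause and 39% are preceded by one.
n_det = 1000
after_pause, after_none = 530, n_det - 530
before_pause, before_none = 390, n_det - 390

statistic = chi_square_2x2(after_pause, after_none, before_pause, before_none)
significant = statistic > 3.841     # critical value for df = 1 at p = .05
```

Because both proportions are measured on the same set of determiners, a test for paired observations would arguably be the more careful choice; the independent-samples form is used here only because it is the simplest to state.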
Constructions: The case of determiners
The empirical evidence presented in the previous section indicates that pauses predominantly occur after ‘meaningless’ function words such as determiners and conjunctions. Given the processing assumptions discussed in the first section, these
data are difficult to account for by lexically driven models of speech production.
This section will (briefly) introduce an alternative view on production, based on
the notion of constructions (cf. Langacker 1990; Goldberg 1995; Jackendoff 1995,
1997, 2002; Kay & Fillmore 1999). The basic tenet of our proposal is that phrasal
categories are involved in the process of production as underspecified constructions or schemas, which, being stored in long term memory, are on a par with
words – i.e., they are all contained in the mental lexicon. Indeed, the relevant distinction between lexically driven models of speech production and a construction
based view primarily concerns the relation between ‘lexicon’ and ‘grammar’, and
the interplay between what is stored knowledge and what is computed ‘on the fly’.
In order to avoid redundancy, lexically driven models tend to identify the grammatical component of the production system as computational, and to reduce it
to the smallest possible set of rules required to account for the facts of language.
Consequently, if a certain grammatical structure (say, that of a noun phrase) can
be computed by some set of rules, then noun phrase templates cannot be part of
the declarative mental lexicon.
Construction based models, on the other hand, allow for redundancy: the
‘rules’ themselves are viewed as ‘constructional idioms’ (Jackendoff 1995: 155;
Jackendoff 2002: Chapter 6) that may vary as to their degree of phonological specifications. This means that the outcomes of a certain set of rules coexist freely together with the rules. Redundancy is built in, so to speak, rather than an exception.
With regard to noun phrases, the maximally underspecified or basic construction
for languages such as English or Dutch is (11):
(11) NP: [det + . . . + N]
This construction has a number of elaborations, inheriting the features of the basic
construction, as shown in (12).
(12) NP: [de/het/een + . . . + N]
A construction such as (12) thus consists of a fixed element (the determiner), a
‘slot’ for the obligatory element (usually the lexical head) and (in some cases) some
optional slots (indicated by dots). In addition, some expressions that are licensed
by (11) may be fully specified, constituting a ‘fixed’ or ‘prefabricated’ construction
(cf. Erman & Warren 2000), as in (13).
(13) een kop koffie (“a cup of coffee”)
het toilet (“the bathroom”)7
The essential property of basic schemas/constructions and their elaborations is
that they are lexical items, stored in long term memory, despite the fact that they
can be computed by phrase structure rules.
Now, how do such constructions allow us to account for the kind of pause patterns observed? Our discussion of this issue will first be confined to noun phrase
constructions – later on, we will discuss prepositions and conjunctions. First, look
at the transcript example in (14).
(14) (. . . )
1. de /
the
2. omstandigheid /
circumstance
(. . . )
Let us suppose that the pause after the determiner de indeed signals some cognitive
activity, aimed either at specifying the concept to be expressed, or at retrieving a lexical item that serves to express an already activated conceptual structure
[circumstance]. Since according to lexically driven models, the production of the
determiner is ruled out in both situations, we have to look for ways in which the
determiner nevertheless can be produced independently from its ultimate lexical
head (the noun omstandigheid). How can this be accomplished?
Our proposal is that in the course of producing the noun phrase de omstandigheid, in fact two independent structures are activated: one ‘schematic’ construction [de + . . . + N], and one lexical element (omstandigheid, “circumstance”),
and for some reason a more or less brief delay may occur between the activation
of the two elements. Possible reasons for a delay of activation can be taken to be
of the standard type (see also the discussion in Note 4); they may involve conceptualization (deciding on exactly what concept is to be expressed) or lexicalization
(retrieving a lemma from the mental lexicon).
In a lexically driven model, there is just one possible alternative cause for a
pause to occur in such a location. This has to do with the fact that a lexical entry
is assumed to be split up into two parts: lemma and form information, where the
former is used in the grammatical encoding stage of the Formulator, and the latter
(specifying the word’s morphology and phonology, and ‘pointed’ to by the lemma)
is used in the phonological encoding phase. This makes it possible in principle that the
following situation arises: the lemma (e.g., [circumstance, circumstance, N,. . . ])
is retrieved, followed by grammatical processing and functorization leading to the
utterance of an article (e.g. definite the), and then something goes wrong with
retrieving the word’s phonological shape pointed to by the lemma. The resulting
situation is one in which the speaker knows exactly what the word is he wants to
say, with all kinds of relevant properties, except its full phonological shape; this is
usually referred to as the tip-of-the-tongue phenomenon.
Thus, theoretically there is a way, in an IPG-type model, to account for pauses
following a function word, while maintaining that the model, including the production of function words, is lexically driven. The question is, however, to what
extent this can be considered a serious alternative to the hypothesis that such
pauses reflect genuine cognitive processes (conceptualization or lexical retrieval).
First of all, as Levelt points out, little is known about whether lexical retrieval is a
one stage or two stage process – i.e., whether in general an entry’s lemma and form
properties are retrieved simultaneously or successively: “The distinction should
not be overstated. In particular, we should not conclude that a lexical entry cannot
be retrieved as whole [. . . ]” (Levelt 1989: 188). Secondly, as we all know from experience, the tip-of-the-tongue phenomenon is quite rare. If it were to account for
the amount of observed pauses after determiners, we would be forced to assume
this phenomenon to have occurred in over 50% of all noun phrases produced. This
seems highly implausible. The safest thing to assume is therefore that the proportions presented in Table 2 are in fact marginally over-estimated. The large majority
of cases, however, must have been produced by ordinary cognitive processes. We
therefore feel justified in taking these data as strong support for our construction
based proposal.

This account of (most of) the observed pause patterns immediately raises the
question: What conceptual specification is required in order to activate the construction [de + . . . + N] (cf. Section 1)? In other words, if indeed [de + . . . + N] is a
lexical item, what does it ‘mean’? Actually, we think an answer is readily available.
From a conceptual point of view, a determiner such as de (“the”) indicates that an
instance of the category named by the noun with which it combines is part of the
body of knowledge that is shared in the communicative situation. The communicative situation is called the ‘ground’ of a linguistic usage event, and determiners
(among other elements) in English, Dutch, and other languages are said to have
the function of specifying if and how concepts are instantiated in the ground – i.e.,
of ‘grounding’ the concepts that they are applied to. In Langacker’s words:
In the case of (. . . ) nominals, grounding is effected by articles, demonstratives and
certain quantifiers. Whereas a simple noun (. . . ) merely names a ‘type’ of thing, a
full nominal (. . . ) designates an ‘instance’ of that type (. . . ). (Langacker 1990: 321)
Langacker goes on by stating that “only ‘grammaticalized’ (as opposed to ‘lexical’)
elements can serve as true grounding predications” (1990: 322). Since speakers
usually talk about ‘instances’ of things, rather than ‘types’, grounding is a necessary element of any speech act. So, to answer the question “What does a determiner
mean?” we may say that it “means” [grounded entity], a conceptual structure
that, as such, is associated with the construction [de + . . . + N] in Dutch. Therefore, the notion of grounding constitutes the necessary conceptual motivation for
determiners to pop up in the stream of language being produced. In addition, however, it accounts for their appearance independently from the conceptual ‘type’
designated by the noun, and it is this property that we need in order to account for
pauses occurring after function words. If the two lexical elements can be activated
independently, rather than the activation of one being dependent on the activation
of another, then nothing prohibits a ‘cognitive’ pause intervening between them.
If Langacker’s grounding theory is essentially adequate, the meaning of this
schema (or construction) and its activation can be usefully phrased in terms of
Jackendoff’s triple-theory of lexical items.8 The determiner represents a grounding
function, taking an entity type as its argument, together constituting a [grounded
entity]. This conceptual function can be represented as in (15):
(15) [entity ground [entity type ( ) ]]
Let us further assume that the Conceptual Structure is associated with Syntactic
and Phonological Structures (CS, SS and PS, respectively) in the full lexical entry,
as represented in (16). The associations are indicated by subscripts a, b, and c:
(16) CS: [entity ground_a [entity type ( ) ]_b ]_c
     SS: [NP [det de ]_a N_b ]_c
     PS: [[CL {de} ]_a [WORD { } ]_b ]_c
According to this conception, a noun phrase, such as “the circumstance”, comes
about as a result of a process of ‘unerging’9 of independently retrieved lexical items:
(16), and the one shown in (17).
(17) CS: [entity circumstance]_x
     SS: [N ]_x
     PS: [WORD {circumstance}]_x
The information in (17) tells us what type of entity is grounded; where it is to be
inserted within the noun construction; and how it is to be pronounced.
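
The relation between the entries in (16) and (17) can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class `LexicalEntry` and the functions `merge` and `pronounce` are hypothetical names introduced here, the conceptual structures are reduced to plain labels, and the Dutch noun from transcript (14) is used as the phonological form of the entry in (17). The point it illustrates is that the construction can already be spelled out as far as the determiner before the noun slot has been filled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LexicalEntry:
    """A stored item linking conceptual (CS), syntactic (SS) and phonological (PS) structure."""
    cs: str
    ss: str
    ps: Optional[str]  # None marks a phonologically open slot


# Simplified counterpart of (16): the determiner is fixed, the noun slot is open.
DE_CONSTRUCTION: Dict[str, LexicalEntry] = {
    "det": LexicalEntry(cs="GROUND", ss="det", ps="de"),
    "N": LexicalEntry(cs="TYPE( )", ss="N", ps=None),
}

# Simplified counterpart of (17), with the noun from transcript (14) as its form.
OMSTANDIGHEID = LexicalEntry(cs="CIRCUMSTANCE", ss="N", ps="omstandigheid")


def merge(construction: Dict[str, LexicalEntry], filler: LexicalEntry) -> Dict[str, LexicalEntry]:
    """Fill the first open slot whose syntactic category matches the filler."""
    merged = dict(construction)
    for name, slot in construction.items():
        if slot.ps is None and slot.ss == filler.ss:
            merged[name] = filler
            return merged
    raise ValueError("no open slot for category " + filler.ss)


def pronounce(construction: Dict[str, LexicalEntry]) -> str:
    """Spell out whatever phonological material is available so far."""
    return " ".join(e.ps for e in construction.values() if e.ps is not None)


print(pronounce(DE_CONSTRUCTION))                        # de
print(pronounce(merge(DE_CONSTRUCTION, OMSTANDIGHEID)))  # de omstandigheid
```

In this sketch the first call corresponds to the state of affairs during the pause in (14): the schema has been retrieved and its fixed element can be produced, while the second entry is still to come.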
To summarize our proposal, the production process underlying noun phrases
such as the one in (14) consists of retrieving two independent structures: (16)
and (17), respectively. As the retrieval of these structures may well be separated in
time, this allows a pause to occur after a determiner as a result of either a process
of working out the conceptual specifications of the entity, or of a lexical search.
This assumption of two independently retrieved structures, as an assumption
about language processing, is directly tied up with assumptions about the structure of a person’s linguistic knowledge. First, the cognitive status of determiners
is not inherently different from that of lexical nouns, whereas IPG considers the
former as output of a computational process and only the second as retrieved from
memory. Second, there can be immediate connections between aspects of conceptual structure and determiners; the latter are essentially meaningful. In brief: as far
as determiners and nouns are concerned, there is no essential difference between
grammar and lexicon, and structure may be retrieved from memory on the basis
of conceptual content.10 This is not to say that this is the only possible route for
the production of noun phrases; we would rather see this as an entirely empirical
issue, not precluding the possibility that similar products (linguistic utterances)
may in actuality result from multiple and variably used cognitive resources. In the
present context, however, the crucial point is that the idea of grammatical schemas
(partly specified by determiners) finds strong support in processing phenomena,
viz. pauses in language production.
Infinitival conjunctions and prepositions
We will now turn to two other types of function words, in order to see whether
the ideas put forth in the previous section may be generalized. This section falls
into two parts: the first deals with the production of a special type of conjunction in Dutch, the
infinitival conjunction om; the second turns to the functional
category of prepositions.

Om-clauses
The sentences in (18) both contain a non-finite clause introduced by the (infinitival) conjunction om (“(for) to”, “in order to”).
(18) a. Ik moge u   dan ook   thans verzoeken om  deze  notas    met  de
        I  may  you therefore now   request   for these invoices with the
        grootste spoed aan de  dienst  over te leggen.
        utmost   speed to  the service over to put
        “I therefore want to ask you now to hand these invoices over to the department with the utmost speed.”
     b. Misschien is het goed wanneer u   een dezer     dagen
        Maybe     is it  good when    you one these-gen days
        telefonisch   contact met  mij opneemt  om  hiervover  nader   te
        telephone-adj contact with me  takes-up for here-about further to
        overleggen
        consult
        “It may be a good idea that you call me one of these days in order to discuss the matter further.”
The examples given here illustrate the most common uses of om-clauses in Dutch.
(a) contains a complement clause om . . . over te leggen, while (b) contains an adjunct om . . . te overleggen. The ‘canonical’ grammatical construction of om-clauses
can be captured as follows:
(19) [om + . . . + te + Vinf ]
Conceptually speaking, however, there are some important differences between
the two types of clauses. Although, as we will show later in this section, there is a generalization to be made concerning the function of om itself in these two types (om is
not homophonous), the way in which they relate to their matrix clauses
is quite different, and relevant to processing. In the case of a complement om-clause, the contents of the clause specify some aspects of the matrix
phrase it is attached to, usually a mental space predicate (noun or verb of cognition or communication, e.g. believe to, request to, promise to), sometimes a causal
predicate (e.g. cause to, attempt to). Adjuncts, on the other hand, are connected to
the main clause by means of an adverbial relationship which is not itself predicated
in the main clause, e.g. means-ends. Thus in (a), the om-clause specifies the object of the verb verzoeken (i.e., it gives the content of the request), whereas in (b),
the relation of the om-clause to the main clause is interpreted such that its contents
(“discussing the matter further”) constitutes the goal of “getting in touch with me”,
which is expressed by the matrix clause. Thus the relationship between a non-finite
complement clause and its matrix is that of part-to-whole (conceptually, as well as
syntactically) – i.e., a matter of constituency; the relationship between a matrix
and an adjunct is that of two parts constituting a whole: it is a coherence relation
creating a discourse unit.
Structural differences between both types of om-clauses testify to this conceptual difference. First, in the case of complements, om may be omitted (under
certain conditions, cf. Van Haaften 1991). It is, in other words, optional for many
of these clauses; but in adjuncts, such as in (b), om may never be omitted. Another
difference concerns the order of clauses. In the case of adjuncts, the om-clause may
be put in front position, a possibility that is ruled out for om-complements.
With this in mind, we may say that the construction is associated with two
conceptualizations, as indicated in (20) (braces {. . . } mark the optional nature of
the enclosed element, either om or the entire clause; ‘→’ indicates a constituency
relation and ‘↔’ indicates a coherence relation).11
(20) a. CS: [WHOLE_a →_x [PART]_b ]_c
        SS: [matrix phrase_a + [{om}_x + . . . + te + Vinf ]_b ]_c
     b. CS: [MEANS]_a ↔_x {[END]}_b
        SS: [[matrix phrase_a ] + {[om_x + . . . + te + Vinf ]_b }]
As can be gleaned from (20), there is yet another difference in characterizing
these two kinds of om-clauses. This feature is treated in detail in Schilperoord
and Verhagen (1998) under the heading of conceptual dependency. Put briefly, om-complements represent some obligatory element of the matrix phrase. We can only
conceptualize the event referred to by the verb verzoeken (“request”) if we can in
one way or another construe the contents of what is being requested. On the other
hand, the optionality of om-adjuncts reflects the fact that a sentence describing a
certain action is in itself not necessarily interpreted as an instrument for reaching a
goal in an event or state described in another clause. Put simply: one cannot make
requests without some content, whereas one can get in touch with someone without this having to be thought of as an instrument for reaching some goal. This
distinction leads us to the idea that the relation between a matrix phrase and an
om-complement is to be located on the level of clause structure, whereas the relation between the matrix clause and an om-adjunct is to be located at the level
of discourse structure (cf. Verhagen 2001 for a discussion of finite complementation as opposed to adjunction in these terms). In other words, om in om-adjuncts
signals a coherence relation holding between two discourse segments (cf. Sanders,
Spooren, & Noordman 1992).
Having discussed the two constructions om participates in, the question now
is: What does the X in both CSs mean? In other words: What concept motivates
the occurrence of om in both constructions? In principle one could assume that,
since there are two constructions, there are two oms as well. However, that would
miss an interesting generalization. As we said, in the case of adjuncts om marks

the relation between the matrix and the adjunct as one of means to ends; the fact
that om specifically introduces the purpose clause is no coincidence: this clause
represents the proposition that is not (yet) realized. It turns out that in this way,
a generalization can be made to the role of om in complement clauses. Although
the issue has been, and still is, much debated (cf. Pardoen 1998: 419ff. and the references cited there), most analysts agree that om in complements also indicates a
notion of ‘potentiality’. The role of om, as marking a purpose in the case of adjuncts, provides a specific instance of this concept; after all, a goal is a potential
state of affairs that is yet to be realized. In complements too, the notions of ‘goal’
and ‘potentiality’ can be quite close. In (21), for example, the complement (“to
never read anything by Voskuil again”) may be said to just express something potential that is not necessarily someone’s purpose, but in (22) the realization of the
potential state of affairs (“to come home early”) is probably also the purpose of
the person asking the question:
(21) Dit deed mij besluiten (om) nooit meer iets van Voskuil te lezen.
“This made me decide to never read anything by Voskuil again.”
(22) Hij vroeg mij (om) vroeg thuis te komen.
“He asked me to come home early.”
The possibility of om in these examples contrasts with (23):
(23) Hij beweert (*om) ziek te zijn.
“He claims to be ill.”
In such cases, om is prohibited. The explanation is precisely that om marks its
complement as a potential, non-realized state of affairs, which conflicts in this case
with the meaning of claim, imposing an interpretation as ‘real’ on its complement.
To conclude this point, the meaning of om can be captured as construing the
potentiality of the state of affairs represented in the complement clause. Thus as
far as its conceptual import is concerned, there is only one om. However, it is also
part of the Dutch speaker’s linguistic knowledge that this element can conventionally participate in (at least) two different types of conceptual relations: one a
part-whole relationship (complementation); and the other a relationship between
two parts (coherence).
This provides us with a basis for believing that the presence of om is tightly
related to the conceptual structure underlying its production, and not the result of
the presence of the lexical head of the non-finite clause, as lexically driven models
would have it. With regard to the distribution of pauses with respect to om-clauses
that inherit the properties of the schemas in (20a) and (20b) respectively, IPG
would predict no differences: pauses would occur mainly before om, but no differences as to pause frequencies are to be expected. Our construction based approach,
however, predicts a substantial number of pauses, possibly even the majority, to
occur after the production of om.
However, the differences between the schemas in (20a) and (20b) even allow for
a further refinement of this prediction, especially with regard to pauses occurring before om. To see why, consider again the notion of conceptual dependency. In Schilperoord and Verhagen (1998), Langacker’s definition of conceptual
dependency was used:
D is conceptually dependent on A to the extent that A elaborates a salient substructure of D.
(Langacker 1991: 436)
Note that this definition only characterizes schema (20a), but not (20b). In (20a),
the ‘whole’-concept is conceptually dependent upon the ‘part’-concept; that is,
upon the non-finite complement clause, because the latter elaborates a salient, in
fact an essential substructure of the ‘whole’-concept. The main clause in (20b)
and its corresponding conceptual import is, however, not conceptually dependent upon the adjunct clause. Its contents may be conceptualized independently
from the contents of the non-finite clause. And since pausing between discourse
segments is a fairly regular phenomenon (Schilperoord 1996), our specific expectation is that the proportion of pauses before om-adjuncts will surpass the
proportion of pauses before om-complements. Hence, the predictions are:
I. Pause proportions after om ≥ Pause proportions before om
II. Pause proportions before om-adjuncts ≥ Pause proportions before om-complements
In order to test this hypothesis, all cases of om-clauses within the corpus were
selected, and labelled for their conceptual import (that is, whether it represented
an instance of either (20a) (whole-part) or (20b) (means-end)). All cases of om-adjuncts in sentence initial position were excluded from the data base, for reasons
mentioned earlier (see the discussion preceding Table 2). This resulted in 89 om-complements and 32 om-adjuncts. In addition, pauses occurring either before or
after om were counted, and proportions were calculated, the results of which are
presented in Table 3.
Table 3. Numbers and proportions (between brackets) of pauses before and after om in complements and adjuncts

              complements    adjuncts    Totals
              (N = 89)       (N = 32)    (N = 121)
before om     28 (.29)       21 (.46)    49 (.35)
after om      68 (.71)       25 (.54)    93 (.65)
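
As a reading aid, the proportions in Table 3 and the direction of predictions I and II can be recomputed from the raw counts. The sketch below rests on one assumption that the table does not state explicitly but that the figures bear out: each proportion is taken over the pauses adjacent to om within a clause type (e.g. 28 out of 28 + 68), not over the number of clauses N. The function names are hypothetical, and the check is purely descriptive; it says nothing about significance.

```python
# Pause counts from Table 3, keyed by clause type: (before om, after om).
PAUSES = {"complements": (28, 68), "adjuncts": (21, 25)}


def proportions(before, after):
    """Return the shares of pauses before and after om (assumed basis: all pauses around om)."""
    total = before + after
    return before / total, after / total


def prediction_one(pauses):
    """Prediction I: overall, at least as many pauses follow om as precede it."""
    before = sum(b for b, _ in pauses.values())
    after = sum(a for _, a in pauses.values())
    return after >= before


def prediction_two(pauses):
    """Prediction II: pauses before om are relatively more frequent in adjuncts than in complements."""
    adjunct_before, _ = proportions(*pauses["adjuncts"])
    complement_before, _ = proportions(*pauses["complements"])
    return adjunct_before >= complement_before


for clause_type, counts in PAUSES.items():
    before_share, after_share = proportions(*counts)
    print(f"{clause_type}: before {before_share:.2f}, after {after_share:.2f}")
print("Prediction I holds:", prediction_one(PAUSES))
print("Prediction II holds:", prediction_two(PAUSES))
```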

In accordance with our first prediction, the total number of pauses after om
by far exceeds the number of pauses before om (χ2 (1) = 13.63, p < .001). However, this is true only for complements (χ2 (1) = 16.67, p < .001), but not for
adjuncts (χ2 (1) < 1). The second prediction concerned the (relative) number of
pauses before om in case of om-complements and om-adjuncts. A chi-square test
revealed that the proportion of pauses before om-adjuncts exceeds the one before
om-complements (χ2 (1) = 3.85, p = .05).12 These data seem to indicate that in the
production of om-complements the usual pause pattern is om - {pause} - non-finite
clause, while the pattern characterizing the production of om-adjuncts is {pause} - om - {pause} - non-finite clause. Note that this marked difference between both
instances of om-clauses could in no way have been predicted on account of lexically
driven models, since according to such models, the different conceptual structures
in which om-clauses occur are not allowed to play any role as far as producing
the ‘functional’ category om is concerned; om would enter the picture as the result
of a functorization procedure, triggered by the clausal head alone (the V, being
non-finite). In other words, lexically driven models would have predicted no proportionate differences with respect to the two types of om-clauses. However, as the
schemas in (20) clearly show, it is not the verb of the non-finite clause that makes
the difference between the two kinds of om-clauses.
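
The statistics reported for the first prediction can be retraced from the counts in Table 3. The sketch below assumes that they are one-degree-of-freedom goodness-of-fit tests comparing the number of pauses before and after om with an equal split; the text does not say so explicitly, but under this assumption the reported values of 13.63 (all om-clauses) and 16.67 (complements) are recovered, and the value for adjuncts stays below 1. The function names are hypothetical. The comparison behind the second prediction is not covered here, since the text does not specify how that test was set up.

```python
import math

# Pause counts taken from Table 3: (before om, after om).
COUNTS = {
    "complements": (28, 68),
    "adjuncts": (21, 25),
    "total": (49, 93),
}


def chi_square_equal_split(before, after):
    """Chi-square (df = 1) for two counts against an expected 50/50 split."""
    expected = (before + after) / 2
    return ((before - expected) ** 2 + (after - expected) ** 2) / expected


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


for label, (before, after) in COUNTS.items():
    chi2 = chi_square_equal_split(before, after)
    print(f"{label}: chi2(1) = {chi2:.2f}, p = {p_value_df1(chi2):.4f}")
```

Only the standard library is used: for one degree of freedom the chi-square tail probability reduces to the complementary error function, so no statistics package is needed.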
We have now shown that, just as for determiners, a conceptual motivation for
the presence of the conjunction om can be provided. Om marks the potentiality of
the proposition expressed by its complement. We also showed that om participates
in different constructions, in such a way that processing differences can be deduced
depending on the kind of construction, and that these differences actually show up
in systematically different patterns of pauses for these constructions.
Prepositions
We have now discussed two types of function words with different kinds of functions. In the sub-section Constructions, we analysed determiners as providing
‘grounding’ information for (roughly) ‘things’ under discussion in a discourse and
as activating a noun phrase schema; and in the sub-section Om-clauses, we characterized the element om as activating a non-finite clause schema and marking
the proposition as potential, either as a part of a complementation schema or as a
marker of a coherence relation – a difference that was clearly reflected in the pause
data. In the course of the discussion, it also became evident that pause patterns
around function words may actually differ significantly depending on ‘details’
of the precise conceptual and linguistic relationship between a specific function
word and its environment. To conclude our discussion of the relationship between
linguistic knowledge and linguistic processing, we will now turn to prepositions.
The implicit claim of a language production model such as IPG, making a
categorical distinction between content words (independent entries in the mental lexicon) and function words, is that elements of each of the two classes share
some crucial properties that are not shared with elements from the other class (cf.
Slobin 2001 for a critical discussion from the point of view of acquisition). We
have already put forward several arguments against such a claim, but prepositions
provide a particularly strong case against it.
That prepositions pose a challenge for such a view could actually have been
clear from the very beginning of IPG. Prepositions are markers of some kind of relation. Sometimes those relations seem to be purely ‘grammatical’; in a construct
like the transfer of the documents to the judge by the lawyer, the prepositions of,
to, and by apparently just mark the grammatical relations in the nominal phrase
(direct and indirect object, and subject, respectively); whereas in something like
staying under water during a whole day the prepositions under and during express
conceptual content. On the basis of this observation, Kempen and Hoenkamp
(1987) divided the class of prepositions into two types: ‘short’ and ‘frequent’
prepositions on the one hand; ‘long’, ‘infrequent’ ones on the other. Short prepositions, such as of, to, in, by, are believed to serve grammatical functions,13 and
therefore belong to the class of function words, which are supposed to be produced
through the application of functorization procedures, as we have seen. Longer and
less frequent prepositions (beneath, during, despite, etc.) are assigned to the class
of content words expressing conceptual content, and thus are produced by means
of lexicalization, in the IPG-model.
In view of the preceding discussion we may conclude that this version of IPG
predicts systematic differences in the distribution of pauses around prepositions.
No pauses are to be expected after the short, grammatical prepositions, precisely
because they result from functorization which follows lexicalization; but pauses
might very well occur after the longer, lexical prepositions. However, in transcript
(7), pauses can occur right after the short prepositions door (“by”), van (“of ”), and
bij (“at”), and we have little reason to believe that this would be unnatural or uncommon. So as far as we can see, the proposed division of the class of prepositions
into grammatical and lexical subclasses lacks empirical support.
However, prepositions as a class might still be said to occupy a kind of intermediate position, but in a different sense. Our view on the cognitive status of function
words as developed in the previous sections implies that we attribute two distinct
characteristics to them: one is their conceptual import (e.g. marking grounding, or
potentiality); the other the fact that they activate a particular linguistic schema, a
grammatical construction of some kind. Especially in the class of prepositions, the
precise ‘balance’ between these features can differ greatly: whereas some elements
serve more as schema activators than as indications of some specific conceptual
content, others may specify the conceptual content of part of a message in a highly
particular way. In the former kind of cases, the ‘meaning’ of the element in question may be felt to be so vague as to be virtually absent, which may lead people to
conclude it serves ‘only’ a grammatical function. In our view, however, this represents just one extreme end of a scale of differences between the relative weights of
conceptual content and schema activation, the other extreme end being the case
of names – i.e., elements evoking a certain conceptual constellation but not activating any particular linguistic schema. On this scale, prepositions can occupy a
wide range of positions, but there is no sharp dividing line between one class of
(purely) grammatical elements, and another of (purely) lexical ones.
This approach also provides a basis for understanding the difference in pause
patterns between prepositions and other function words in Table 2: there are more
pauses before prepositions than after them, whereas it is the other way around in
the case of determiners and conjunctions. This may very well be a statistical result
of the fact that many prepositions have at least some specific conceptual content,
so that the production of a preposition more often reflects a conceptual choice
which may require some time than the choice of, for instance, a determiner, where
the function of schema activation is relatively more important.
On the other hand, prepositions, especially if their meaning is highly schematic
as in cases such as of, can participate in different grammatical schemas, and thus
give rise to different pause patterns in production. An illustration of this phenomenon was provided in the previous section. There are fewer pauses before
om when it is part of a complementation construction than when it introduces
an adjunct. Thus we actually should not expect any direct relationship between a
particular word and the distribution of pauses during the production of this word;
rather what we should look at is the construction of which it is a part on a specific
occasion of use. In the case of prepositions, a phenomenon that is especially relevant is that of the so-called ‘fixed prepositions’ (as in prepositional objects, but also
in other kinds of expressions). Consider expressions of the type “reply to X”, “think
of X”, “talk about X”, and the like. In a view of linguistic knowledge as consisting
largely of schemas that may occur in any degree of abstractness, these expressions
are no more than simple illustrations of the point; the schemas may be retrieved
from memory in their entirety. But in a view that distinguishes sharply between
lexicalization and functorization, these expressions are much more problematic.
Kempen and Hoenkamp (1987) implicitly treat listen to as a single lexical item,
but they do not elaborate the point generally. One important point is, in our view,
the fact that such units are still analysable.14 That is, think in the combination think
of still means think, and of functions as an introduction of a PP-complement, in
the same way as it does in the start of the program. It is not clear at all how an
approach with a strict separation of lexicon and grammar would allow for this.
Another point is that there are many cases where the choice of a head noun or
verb (such as think or reply) may strongly constrain the choice of preposition, but
does not determine it fully (consider think of and think about, for example); again,
in a ‘maximalistic’ schema conception, this does not pose a problem, but it is not
at all clear how this could be accounted for with a strict separation of lexicalization
and functorization.
Thus, the strict division of prepositions into two subclasses with completely
different processing properties, as proposed by Kempen and Hoenkamp (1987),
does not seem viable. In retrospect, this should perhaps not come as a surprise.
After all, the fact that they are all called prepositions is based on similarities in
linguistic behaviour, which becomes something of a riddle when some prepositions are assigned a fundamentally different linguistic and cognitive status than
others. Furthermore, the whole idea that superficial properties such as length and
frequency of prepositions would correlate directly with a specific kind of cognitive status seems highly implausible, both from a language-internal and from a
comparative perspective. For example, would in and into in English have to be
produced by two crucially different components of the Formulator – i.e., as a result of functorization and lexicalization, respectively? Or, would the same be true
for na (“after”) in Dutch and after in English? Positive answers to both types of
questions seem unlikely a priori, so that they would require substantive empirical
and theoretical support. But they are precisely what IPG suggests, though without
much independent support.
All in all, it seems to us that when considered carefully, the treatment of prepositions in a lexically driven model gives rise to exactly the kind of problems that
show that the distinction between lexicalization and functorization as processes
that are supposed to be temporally separated in a systematic way is untenable.
Conclusion
What we have presented in this paper represents, as usual, to a large extent work
in progress. We nevertheless believe we have established some points of general
interest. Our explicit aim was to bring together cognitive linguistic views on the
nature of linguistic knowledge on the one hand, and evidence from actual language
processing on the other. In this way, we have been able to propose some reasonable theoretical accounts for empirical observations of language-in-use (viz. pause
patterns relative to function words). Admittedly, some of the ideas presented are
still somewhat vague, and as such they may seem to lack the formal elegance and
rigour that constitute much of the attractiveness of models such as IPG, positing
a strict division of labour between declarative and procedural components of linguistic knowledge (‘lexicon’ as opposed to ‘grammar’; ‘content words’ as opposed
to ‘function words’). But elegance and rigour are not all that matters, of course,
and especially not if such models leave data obtained from actual language use unexplained. Despite some vagueness, we therefore claim to have demonstrated the
following points.
1. The production of linguistic elements marking grammatical constructions
(so-called function words) does not have to depend on specification of other
linguistic elements, assumed to express the conceptual content of a message
(so-called content words, or lexical entries); conceptual motivation can be
provided for the production of alleged function words independently from
their ‘lexical heads’ or neighbours.
2. Therefore, linguistic knowledge, as put to use in spontaneous processes of
language production, does not involve a principled distinction between ‘functional’ and ‘lexical’ words.
3. The view of language production that emerges from this is that of a person
assembling an utterance by putting together a number of symbolic units retrieved from long term memory, some of which are more schematic than
others, and each of which is relevant to at least some aspect of the message
to be conveyed; constraints on the way the units are put together derive from
information in the units themselves, at least to a large extent.
By themselves, these ideas are not new, as even a brief glance at the history of cognitive linguistics shows. However, showing that one can use data from spontaneous
language use to support these ideas is relatively new. Although it may sometimes
be convenient, for expository purposes, to make a distinction between the linguistic system and language use, we would like to stress the importance of combining
these points of view in linguistic research if we want to avoid either developing
empirically inadequate theories or collecting theoretically empty data.
As a final theoretical point, we would like to explicate one general consequence
of these ideas. We think our results actually call for a serious reconsideration
of the role of abstract notions such as ‘function word’, and abstract categories
such as ‘Noun’, ‘Verb’, or ‘Preposition’ in theories of linguistic processing, and
consequently in actual linguistic knowledge. Models such as IPG are obviously
strongly inspired by formal theories of grammar, and therefore take great pains to
model the role of abstract grammatical categories independently from concrete semantic and phonetic considerations. Levelt’s (1989) Formulator thus models the
lexico-grammatical stage of the production process as a computational process
of manipulating abstract, formal categories. Meaning is strictly separated from
grammar, with a lexicon as mediator, and at the other side phonetic properties
of an utterance are also separated from the grammar. Grammatical operations are
not conceived as operations on units of meaning and form – i.e., symbolic elements. But does an abstract, formally defined notion of ‘function word’ ever play
a separate role in processing, independently from the conceptual characterization
of the specific element involved? IPG, formally inspired as it is, in fact embodies
the claim that this abstract notion has direct relevance in processing; similarly, it
implies that equally abstract notions such as NP and PP, defining functorization
procedures that essentially mirror very general phrase structure rules, have direct
processing relevance. Our results, however, suggest a rather different picture.
As we have seen, there are statistical patterns in the distribution of pauses
around function words that do indeed tell us something about their cognitive status. But it has been clear from the start, first of all, that these patterns do not set
function words apart, and second, that they are not the same for all subtypes of
function words: prepositions differ significantly from determiners and conjunctions. Thus the notion “function word” does not really seem to have a unitary status in processing. Subsequently, we found that the more specific notion “infinitival
conjunction” does not have some unitary processing relevance either: the way om
is produced in complementation constructions differs from its production process
in adjuncts. We also argued that we should in fact not expect specific prepositions
to have exactly the same kind of processing properties as other ones. In IPG, prepositions are divided into two subclasses with different processing properties – i.e., a
lexical and a grammatical one, in an apparent attempt to retain the idea of immediate processing relevance of such abstract notions. But the more details of actual
language processing are taken into account, the more it becomes evident that ultimately each element has its own set of processing properties (which may vary with
the constructions in which it participates). Some elements will be more similar in
their processing properties than others; these relations of higher and lower degrees
of similarity may provide a partial organization (in a kind of network) of the elements, and some of the nodes in this network may correspond to categories such as
“Noun” or “Preposition”, which are essentially no more than sets of elements of, to
some degree, similar linguistic behaviour, but without such an abstract notion in
itself ever being directly relevant in processing.15 In fact, as we have seen, the best
way to conceive of the activation of a grammatical schema, e.g. the “NP-schema”,
is as the result of the activation of a function word – e.g., a determiner, the selection of which is itself directly motivated by some aspect of conceptual structure.
What we process in linguistic communication are conceptual categories and relations which are conventionally associated with particular patterns of form; many
of these categories and the relations between them are ‘frozen’ to varying degrees,
into what may be analysed as ‘constructions’. These specific constructions are what
we use when we produce language.
Notes
* We thank the audiences at different occasions where we had the opportunity of presenting
previous versions of this material for their feedback. We would especially like to thank Gerard
Kempen, Ray Jackendoff, Sieb Nooteboom, and other participants in the Utrecht Congress on
Storage and Computation of October 1998, as well as June Luchjenbroers and two reviewers
of the present volume. Their comments have led to several changes and refinements. Naturally,
the responsibility for all claims and speculations in this paper remains entirely our own. All correspondence concerning this chapter should be sent to: Dr. J. Schilperoord, c/- Linguistics, University of Tilburg, P.O. Box 91053, 5000 LE Tilburg, The Netherlands. Email: [email protected],
or Prof. A. Verhagen, Research Institute Linguistics Leiden, P.O. Box 9515, 2300 RA Leiden, The
Netherlands. Email: [email protected]
1. The superscript indicates the number of levels of a category in the sense of the X-bar notation
(“N²” = “N-double bar”).
2. Kempen and Hoenkamp’s notation of syntactic structures differs from standard generative
tree structures in that they explicitly specify at least some of the grammatical functions. This
functional information, even though it may strictly speaking be redundant, must be represented
in the structure at some point anyhow in order to function as a trigger for the relevant functorization procedures, in this case for example the ones that ultimately result in the insertion of the
preposition of.
3. This question was raised by one of the reviewers of this paper. Although we are not aware of
research into the distribution of pauses with respect to function words in spontaneous conversation, incidental observations, including some reported in the literature (e.g., Clark 1996: 268)
do suggest that similar patterns at least occur in conversation as well. See Schilperoord (2001)
for various methodological and empirical aspects of dictation research.
4. Of course, pauses may have various other sources than cognitive ones, and this may endanger
the validity of both our data and the conclusions drawn from them. In our research, we consider
a pause ‘cognitive’ if it reflects conceptualization processes or lexical retrieval (see Boomer 1965;
Schilperoord 1996). But what about other sources of pausing, how can we be sure to have kept
pauses from other sources out of the corpus? We should first distinguish between pauses that
are involuntary, and pauses that language producers willingly insert into the stream of speech.
These latter pauses occur by intent and often serve rhetorical or communicative purposes, i.e.
they are oriented towards an addressee. Clearly, such pauses could not be considered cognitive
in the above sense. However, the possibility of such pauses being present in our corpus can safely
be ruled out because of the strictly monologic nature of the production circumstances. All letters
in our corpus were dictated to a machine, not to secretaries taking notes. Hence, pauses cannot
even have resulted from a friendly employer pausing for the typist’s convenience.
But even if pauses can be considered involuntary, they still can be caused by various factors. In
terms of the IPG-model, pauses may be caused by all main components of the model, and therefore they may reflect conceptualization processes (preparing what to say), lexical-grammatical
processes (retrieving lexical items), morpho-phonological processes (accessing word forms),
monitoring processes (monitoring one’s own production), or they may originate from the workings of the articulator. Let us briefly consider these factors in turn. Obviously, the first two factors
do not pose any problem since these are the factors that we are interested in in the first place.
Pauses caused by the articulator were excluded from the corpus on grounds of pause duration.
Dechert and Raupauch (1980) have calculated that ‘breathing’ pauses last .3 seconds at most,
so we simply excluded pauses up to that length from the corpus. One should keep in mind that
pauses lasting longer than .3 seconds may reflect articulatory activity, but in those cases one can be
sure that this is not the only factor causing these pauses. In other words, pauses lasting over .3
seconds at least also originate from cognitive processing (see also Note 5).
Then there may be morpho-phonological factors causing pauses manifesting the so-called tip-of-the-tongue phenomenon. Clearly, such pauses are not cognitive. However, as we devote a lengthy discussion to
this possibility in Section 3 (p. 14), we leave this issue aside here.
In addition, pauses may originate from the workings of the monitor. While producing texts,
text producers constantly monitor their own production. They attend to various aspects of their
actions, such as content, choices of phrasing, and so on. Monitoring becomes apparent from
various types of self-repairs that are produced ‘on the fly’, that is, while producing speech, but
the monitoring process may also cause pauses itself. Once again, monitoring pauses are not the
type of pauses that we are interested in here, so how can we be sure that monitoring does not
interfere with conceptualization and lexical retrieval? To be honest, we cannot in any strict sense.
However, there is some circumstantial evidence that in dictation, monitoring predominantly occurs at pre-established locations: major text structural locations such as prior to paragraphs and
sentences. While it is clear that in spontaneous speech, the orientation of monitoring is mainly
backwards, under the far more controlled production circumstances that we are dealing with
here, its orientation is mainly forwards. That is, while dictating letters, text producers devote
quite some attention to conceptually planning pieces of text in advance. One factor suggesting
this is the fact that in dictation self-repairs are almost totally absent. Another point is the fact
that pauses between paragraphs or sentences last considerably longer than pauses within sentences and clauses (see Schilperoord 1996, 2001), suggesting that at these locations preplanning
the content of text parts takes place. Since the pauses that we are interested in are all located
around function words, there seem to be good reasons for assuming that such pauses reflect the
processes of refining conceptualization or retrieving lexical items.
To conclude, our considerations thus far suggest the pauses in our corpus to be mainly caused
by cognitive factors (conceptualization, lexical retrieval). Admittedly, other factors can never be
ruled out completely, but in the absence of any compelling evidence that such factors correlate structurally with the relevant location types that we consider in this chapter, we may safely
assume that these other factors are randomly distributed, and hence do not jeopardize the validity of the data. Finally, we would like to stress the fact that pausing in language production
is an empirical phenomenon, and that pausing parameters, such as pause locations, can be analyzed independently from any pre-established theoretical point of view, be it computational
psycholinguistics, or cognitive linguistics. What matters, in our view, is how to arrive at a proper
account of this issue.
5. In psycholinguistics, .3 seconds is the generally accepted ‘cut-off’ value for a pause to be taken
as reflecting some cognitive activity, rather than as resulting from muscular activities of the vocal
tract. See for example Dechert and Raupauch (1980).
6. This may have something to do with the somewhat ambivalent status of prepositions with
regard to their category status: lexical or functional. See Section 4.2 for further discussion, and
also Schilperoord (1996), Schilperoord and Verhagen (1997).
7. In case a reader wonders what is ‘fixed’ about these expressions, compare them with the
phrases this cup of coffee and a bathroom (e.g. in Would you like this cup of coffee? or Where can I
find a bathroom, please?).
8. In Jackendoff’s (2002) theory, lexical items are viewed as correspondence rules between semantic, syntactic and phonological information. Moreover, a lexical entry may be both larger
and smaller than an individual word. Idioms are a case in point, but a plural suffix, as an item
licensing the formation of plural forms, is also a lexical entry. These assumptions are shared
by all present construction based approaches to grammar. Croft (2001) may be seen as arguing
against a separate level SS for syntactic information, essentially because there is no way to define
the necessary global syntactic notions (“noun”, etc.) in a non-circular fashion, independently of
(language) specific constructions. In his view, the information usually considered syntactic reduces to schematic aspects of form and to the symbolic relation between form and meaning. On
the other hand, Croft’s view seems to allow for language specific distributional classes to be included in the specification of the form of a construction. As this issue is not directly relevant for
the present discussion, we use the more conservative notation here. To avoid misunderstanding:
we use Jackendoff’s formalism only for reasons of convenience. As he has repeatedly and rightly
pointed out himself, the formalism does not assume any particular theoretical point of view.
9. This is the term used in Jackendoff (2002); another term used for essentially the same concept
is ‘unification’. See Goldberg (1995), among others, for discussion of the way this notion fits into
the theory of Construction Grammar.
10. While the possibility of a direct relationship between a function word and conceptual structure is a necessary condition for language production as we see it, a reviewer suggested that it
might be a sufficient condition. For example, the production of a determiner such as the could
be motivated by the presence of the feature +accessible in the conceptual structure, but it need
not activate the structure “det–N”, which might still come from the head noun. Being lexically
driven or not and being structure building or not are in principle separate characteristics of a
production model. On logical grounds, such a possibility cannot be foreclosed, obviously. However, it is first of all not a part of IPG, and second, we have explicitly based our proposal on the
constructional approach. The analyses of function words that we are aware of, all share the view
that precisely what makes these elements “grammatical”, is the fact that they do not function
independently (they are “bound forms”), and are necessarily associated with other, variable linguistic material. We thus continue to assume that activation of a function word by a feature of
the conceptual structure also activates the associated schema.
11. For ease of exposition, we conflated the two formal representational levels S[yntactic]
S[tructure] and P[honetic] S[tructure]. But see also Note 8.
12. The difference between the proportions of pauses after om failed to reach significance (χ²(1) = 3.31, p > .10).
13. Confusingly labelled ‘clitics’; they are not pronominal and they are also phonologically
independent.
14. Recall that analyzability does not imply compositionality (in the sense of ‘having been composed’). If elements can be distinguished within a linguistic unit (analyzability), it does not
follow that the unit has been constructed out of these elements. Even obvious idioms, necessarily
stored as units, may exhibit analyzability: in spill the beans, the element spill corresponds to the
semantic component divulge and the beans corresponds to information. For a recent discussion,
moving in a somewhat different direction, cf. Croft (2001: 180–184).
15. This position resembles the one defended for linguistic theory in general on the basis of
methodological, typological and analytic considerations in Croft (2001), and from the perspective of acquisition in Slobin (2001). In a sense, our analysis provides an additional argument
from processing for the hypothesis that global structural notions do not really have explanatory
power, and are not primitive but rather based on similarities between specific constructions (cf.
Verhagen 2002: 420/421).
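The duration-based exclusion described in Notes 4 and 5 can be sketched in a few lines. This is a minimal illustration, assuming that pauses are available as durations in seconds; the function name, the constant name, and the sample durations are hypothetical and are not taken from the corpus used in the study.

```python
# Cut-off reported in the notes (attributed there to Dechert and Raupauch 1980):
# pauses of up to .3 seconds are treated as possible 'breathing' pauses.
BREATHING_PAUSE_MAX_S = 0.3


def keep_candidate_cognitive_pauses(durations_s, cutoff_s=BREATHING_PAUSE_MAX_S):
    """Drop pauses up to the cut-off and keep only the longer ones.

    Pauses longer than the cut-off are retained as candidates for
    reflecting cognitive processing; shorter ones are excluded.
    """
    return [d for d in durations_s if d > cutoff_s]


# Invented sample durations, for illustration only.
sample = [0.12, 0.30, 0.31, 0.85, 2.40]
kept = keep_candidate_cognitive_pauses(sample)
```

Note that the filter is deliberately one-sided: as the notes point out, a pause above the cut-off may still involve articulatory activity, so passing the filter only means that articulation is unlikely to be its sole cause.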
References
Ariel, Mira (1988). Referring and accessibility. Journal of Linguistics, 24, 65–87.
Baddeley, Alan (1990). Human memory. Theory and practice. Hove: Lawrence Erlbaum
Associates.
Boomer, David S. (1965). Hesitation and grammatical encoding. Language and Speech, 8, 148–158.
Carroll, David W. (1999). Psychology of language (3rd ed.). Pacific Grove, CA: Brooks/Cole
Publishers.
Clark, Herbert H. (1996). Using language. Cambridge: Cambridge University Press.
Croft, William (2001). Radical construction grammar. Syntactic theory in typological perspective.
Oxford: Oxford University Press.
Dechert, Herbert W. & Marius Raupauch (Eds.). (1980). Temporal variables in speech. Studies in
honour of Frieda Goldman-Eisler. The Hague: Mouton.
Erman, Britt & Beatrice Warren (2000). The idiom principle and the open choice principle. Text,
20, 29–62.
Goldberg, Adele (1995). Constructions: A construction grammar approach to argument structure.
Chicago/London: University of Chicago Press.
Haaften, Ton van (1991). De interpretatie van verzwegen subjecten [The interpretation of
understood subjects]. Diss. VU Amsterdam. Dordrecht: ICG Printing.
Jackendoff, Ray (1990). Semantic structures. Cambridge, MA: MIT Press.
Jackendoff, Ray (1995). The boundaries of the lexicon. In M. Everaert et al. (Eds.), Idioms:
Structural and psychological perspectives (pp. 133–167). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Jackendoff, Ray (1997). The architecture of the language faculty. Cambridge, MA: MIT Press.
Jackendoff, Ray (2002). Foundations of language; Brain, meaning, grammar, evolution. Oxford:
Oxford University Press.
Kay, Paul & Charles J. Fillmore (1999). Grammatical constructions and linguistic
generalizations: the What’s X doing Y? construction. Language, 75, 1–33.
Kempen, Gerard & Edward Hoenkamp (1987). An incremental procedural grammar for
sentence formulation. Cognitive Science, 11, 201–257.
Langacker, Ronald W. (1990). Concept, image and symbol. The cognitive basis of grammar.
Berlin/New York: Mouton de Gruyter.
Langacker, Ronald W. (1991). Foundations of cognitive grammar, Volume II, Descriptive
application. Stanford: Stanford University Press.
Levelt, Willem J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Levelt, Willem J. M. (1999). Producing Spoken Language: a blueprint of the speaker. In P.
Hagoort & C. W. Brown (Eds.), The Neuro-cognition of Language (pp. 94–122). Oxford:
Oxford University Press.
Pardoen, Justine A. (1998). Interpretatiestructuur. Een onderzoek naar de relatie tussen
woordvolgorde en zinsbetekenis in het Nederlands. [Interpretation Structure. A study of
the relation between word order and sentence meaning in Dutch.] Amsterdam/Münster:
Stichting Neerlandistiek VU/Nodus Publikationen.
Sanders, Ted, Wilbert Spooren, & Leo Noordman (1992). Towards a taxonomy of coherence
relations. Discourse Processes, 15, 1–35.
Schilperoord, Joost (1996). It’s about time. Temporal aspects of cognitive processes in text
production. Amsterdam/Atlanta: Rodopi.
Schilperoord, Joost (2001). On the cognitive status of pauses in discourse production. In T. Olive
& M. C. Levy (Eds.), Contemporary tools and techniques for studying writing (pp. 60–89).
Dordrecht, Boston, London: Kluwer Academic Publishers.
Schilperoord, Joost & Arie Verhagen (1997). Functionele elementen in een cognitief perspectief.
Evidentie uit taalproductie [Functional elements in a cognitive perspective. Evidence from
language production]. Nederlandse Taalkunde, 3, 223–248.
Schilperoord, Joost & Arie Verhagen (1998). Conceptual dependency and the clausal structure
of discourse. In J. Koenig (Ed.), Discourse and cognition; Bridging the gap (pp. 141–165).
Stanford, CA: CSLI Publications.
Slobin, Dan I. (2001). Form-function relations: How do children find out what they are? In
Melissa Bowerman & Stephen C. Levinson (Eds.), Language acquisition and conceptual
development (pp. 406–449). Cambridge: Cambridge University Press.
Verhagen, Arie (2001). Subordination and discourse segmentation revisited, or: Why matrix
clauses may be more dependent than complements. In Ted Sanders, Joost Schilperoord,
& Wilbert Spooren (Eds.), Text representation. Linguistic and psycholinguistic aspects (pp.
337–357). Amsterdam: John Benjamins.
Verhagen, Arie (2002). From parts to wholes and back again. Cognitive Linguistics, 13, 403–439.
chapter 
Word recognition and sound merger
Paul Warren
Victoria University of Wellington
Theories of spoken word recognition largely assume stability in the lexical
representations onto which the input signal is mapped. Speaker-, style- or
situation-dependent variability in the input is accounted for by an appropriately
sensitive (or desensitized) pre-lexical analysis. In contrast to the primarily
perceptual accounts provided for such variability, this chapter considers the need
for a more cognitive account for variation arising from sound change. The
particular case under consideration is the merger-in-progress of the front
centering diphthongs in New Zealand English. This chapter reviews key research
on the realization and comprehension of words containing these diphthongs, and
discusses the theoretical implications for theories of lexical access and
representation that derive from the current fluid state of these vowels.
Keywords: word recognition, sound change, vowel merger, New Zealand English
.
Introduction
Most psycholinguistic models of spoken word recognition assume that the process
of recognizing a word normally involves the extraction of acoustic phonetic information from the speech signal and its utilization in some lexical search procedure,
together with the exploitation of contextual information to constrain this search.
This procedure has been the object of a variety of research questions, concerned
with the processes and representations involved in analyzing the input (e.g. Klatt
1989; Marslen-Wilson & Warren 1994; Cutler 1990), as well as with the relationships between form-based and content-based access, or between ‘bottom-up’ and
‘top-down’ processing (Tyler 1990). The central questions for this paper concern
how the recognition system copes with variability in the form of the input, long acknowledged as a potential problem for the successful recognition of a word. While
the relative lack of invariance has long plagued speech researchers and engineers
(Stevens & Blumstein 1981), it is important to recognize that variation can be informative rather than a hindrance to recognition; differences in the articulation of
sounds at different word positions can for instance be exploited by the recognition system as indicators of important phenomena such as word boundaries (cf.
Church 1987), and of course differences between speakers are informative about
aspects of their identity and status. For the process of spoken word recognition,
however, it is important that variation in the input signal does not disrupt the
correct identification of a word.
The source of variability that we will be considering in this paper is the merger
of the /iə/ and /eə/ diphthongs in New Zealand English (NZE), sometimes referred
to as the ear/air merger. In the remainder of this introduction, I will outline possible consequences of sound mergers in terms of the neutralization of distinctions
between words. I will then review relevant recent findings from the study of the
ear/air merger, adding some new analyses that highlight issues for word recognition. Finally, I will consider some of the implications of the merger for the process
of spoken word recognition.
Complete neutralization
If the distinction between two sounds disappears completely, in all environments,
then a range of words, which differed previously only in that one of them contained one of the sounds where the other had the second, will become homophones. Such homophones may either contain one of the two phonemes previously distinguished in the language, or they may be collapsed onto some intermediate form. In either case, once the relevant adjustments have been made
to the phonemic system, the recognition mechanism will find itself in a familiar state, since homophony is already widespread throughout human language.
Research on the recognition of homophones suggests that all meanings of a lexically ambiguous word such as bank are initially accessed when the word is heard,
even in a strongly biasing context. However, selection of the intended meaning
is then rapidly achieved as the various meanings are assessed against the context
(e.g. Swinney 1979; Tanenhaus & Lucas 1987). Our experience of homophony is
certainly such that it rarely causes difficulties in processing.
Predictable partial neutralization
A neutralization is partial if it occurs in some linguistic contexts but not in others.
In many cases the neutralization is predictable, and is conditioned by the immediate phonetic context. Thus, in reasonably fast speech, the sequence bad girl may
not be distinguishable from bag girl because of assimilation of the final /d/ of the
first word to the initial /ɡ/ of the second word (though see Nolan 1992, who finds
some residual articulatory and perceptual evidence for an alveolar gesture in such
sequences). Gaskell and Marslen-Wilson (1998) provide evidence that the perceptual and word recognition system can tolerate such surface variations as long as
they occur in phonologically viable contexts, so that processes of phonological
inferencing can help ensure the successful recognition of the intended word. Similarly, though they were not considering the issue of phonemic merger, Mann and
Repp (1981) found that listeners will compensate for the effects of phonetic context, so that a stimulus ambiguous between /t/ and /k/ will be interpreted as the
former after /ʃ/ but as the latter after /s/, taking into account the fact that /s/ but
not /ʃ/ makes a following /k/ more /t/-like. In both of these examples, the conditioning context can be exploited by the listener in their interpretation of the
acoustic input, as it were ‘unravelling’ the effects of phonetic context.
In other cases, the partial neutralization of a distinction may result in greater
dependency on higher-level contextual information for ambiguity resolution, just
as with homophones. Thus in many dialects of English, /t/ and /d/ are both produced as an alveolar flap in intervocalic positions, so that latter and ladder may
become auditorily indistinguishable (Wells 1982: 249). In situations like these, presumably, both latter and ladder will be activated on the basis of partial match
with the input (or via a phonological rule that allows either to match the flapped
form), and selection between them will be based on the additional contextual
information, such as the overall meaning of the utterance.
Unpredictable partial neutralization
Sound mergers resulting in the types of neutralization discussed above will present
no new problems for the word recognition system, which already needs strategies
for dealing with homophony and predictable partial neutralizations. Our interest
in this paper is in a rather different situation from these, in which a phonemic
contrast appears currently to be undergoing merger so that the distinction is in a
state of flux. Of course, a merger that is in the process of taking place could nevertheless result in some predictable distributions at a particular point in its progress.
For instance, an ongoing process of phonological merger might be reflected in a
previous distinction being lost for certain word pairs, while still maintained for
others. Through a process of lexical diffusion, the merger may subsequently affect
all instances of the sounds in question. Before such a time, however, there are likely
to be word pairs that are still distinct, and others that have become homophones,
and may be treated as such by the word recognition process. In other situations,
if two phonemes are merged in predictable phonetic contexts, these contexts will
provide information that can be used in retrieving the underlying form. An investigation of the implications of sound merger for word recognition thus includes an
examination of whether the merger results in new homophones, and of whether
phonetic contexts can be seen to motivate the merger.
But simply looking for homophones or for phonetic motivation of merger involves taking a somewhat static snap-shot view of language change, which assumes
that there are discrete stages at which merger has taken place completely for some
forms and not at all for others. The reality is generally quite different, with changes
taking place more gradually and with some communities or groups of speakers
showing a greater tendency to merge than others. This being so, the word recognition system clearly needs to be sensitive to a wide range of variables, potentially
including extra-linguistic factors such as speaker identity. The next section isolates
some of the variables that are influencing the ear/air merger.
Production data on the ear/air merger
Regional differences
Various studies of this diphthong merger have been carried out in recent years
in New Zealand, the most extensive being a long term study of adolescents in
Christchurch (e.g. Gordon & Maclagan 2001, henceforward G&M) and Holmes
and Bell’s social dialect study of Porirua near Wellington (Holmes & Bell 1992,
henceforward H&B). Both studies agree that the distinction between ear and air
diphthongs is becoming less clear. However, they differ in the claims they make
about the pattern of the merger. H&B argue that the two diphthongs are now
sharing the vowel space in which distinct forms used to be articulated, so that
there is an EAIR vowel that ranges from /iə/-like to /eə/-like forms. The precise
form of the merger appears to be in part dependent on speaker age. Thus H&B
studied, amongst other groups in their Porirua survey, old (70–79 years), mid-aged (40–49) and young (20–29) Pakeha speakers (New Zealanders of European
descent). Compared with the oldest group, they found that the mid-aged speakers show a shift towards /iə/, but the younger speakers show a movement in the
opposite direction, towards /eə/. This is in apparent contradiction to G&M, who
have surveyed Christchurch adolescents every five years since 1983, and find increasing evidence of a merger on /iə/, a conclusion that is supported by acoustic
analyses by Watson et al. (1998). However, Maclagan and Gordon (1996) pointed
out that their 1983 13–14 year-olds, who are near contemporaries of the 20–29 year-olds sampled by H&B in 1989–1990, showed the same preference for /eə/. They
attribute this to a perceived stigmatisation of the /iə/ form for this cohort of speakers. In addition, it is clear that there are a number of methodological differences
between these two studies, not least of which are the speech styles sampled. Thus
G&M’s study focused on read materials, while H&B included a larger sample of
conversational speech.
Starks, Allan and Kitto (1998) present data from a large number of subjects
taped in the Auckland area. They also find a difference across age groups, apparently supporting H&B’s claims for increasing movement towards /eə/ amongst
younger speakers. While /eə/ forms in their sample show little merger, /iə/ forms
show movements towards /eə/ at rates that increase with decreasing age. However,
Starks et al. only sampled the two words air and ear, and the first of these was actually given in the spoken question used to elicit the word. Thus their data may not
be representative of what is happening to these diphthongs in the Auckland area.
Age differences
Given the sampling and methodological differences between the studies summarized above, the remainder of this section will focus on just one, H&B’s survey, for
which the raw data were made available to the current author. It has already been
mentioned that H&B found shifts to /iə/ for mid-aged and to /eə/ for young speakers. This finding was based on an analysis in which they collapsed an auditory scale
covering [i], [i̞], [e̝] and [e] (i.e. /i/, lowered /i/, raised /e/ and /e/ respectively) into
a binary ear ([i], [i̞]) / air ([e̝], [e]) distinction. Their analysis shows that mid-aged speakers pronounced 22% of air words with /iə/, but only 8% of ear words
with /eə/, while younger speakers pronounced 15% of air words with /iə/, and
35% of ear words with /eə/. These figures also show an increasing tendency for
instability.
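As an illustration only, the collapsing of the four-point scale into a binary distinction, and the cross-over percentages that follow from it, can be sketched in Python. The code labels (`i_lowered`, `e_raised`), the function name and the token list are hypothetical and are not drawn from H&B's data.

```python
# Hypothetical labels for the four auditory codes: [i], lowered [i], raised [e], [e].
EAR_LIKE = {"i", "i_lowered"}   # collapsed into the binary 'ear' category
AIR_LIKE = {"e_raised", "e"}    # collapsed into the binary 'air' category


def crossover_rate(tokens, word_class):
    """Proportion of tokens of one word class realised in the other category.

    tokens: iterable of (word_class, auditory_code) pairs,
            where word_class is "ear" or "air".
    """
    other_category = AIR_LIKE if word_class == "ear" else EAR_LIKE
    codes = [code for cls, code in tokens if cls == word_class]
    if not codes:
        raise ValueError(f"no tokens for word class {word_class!r}")
    return sum(code in other_category for code in codes) / len(codes)


# Invented tokens, for illustration only.
sample = [
    ("air", "e"), ("air", "e_raised"), ("air", "i_lowered"), ("air", "e"),
    ("ear", "i"), ("ear", "e_raised"),
]
air_as_ear = crossover_rate(sample, "air")  # share of air words heard as ear-like
ear_as_air = crossover_rate(sample, "ear")  # share of ear words heard as air-like
```

Computed per age group, rates of this kind correspond to the percentages quoted above (for example, the proportion of air words pronounced with an ear-like starting point).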
Using the auditory values from H&B’s raw data (i.e. using their initial four-point scale), I have derived median starting points for the diphthongs in ear and
air words from a range of tasks from word list reading to conversation. The results,
by age group for ear and air words, are shown in Figure 1. Each value plotted here
represents the median starting point for the ear or air diphthong for a particular age group, and is based on between 904 and 1191 data points (the variation in
sample size being largely due to unequal frequencies of occurrence in the conversational data). If age group differences reflect change over time, then the figure shows
a small early shift of /eə/ (air) tokens to a closer (higher) starting point (compare
old and middle-aged groups), followed by a significant lowering of the /iə/ vowel
(i.e. ear, from mid to young groups). In terms of the scale of the changes, there
seems to be support for Starks et al.’s (1998) finding of a greater change over age
groups for /iə/ than for /eə/.
Although speaker age is the dominant factor in their analysis, H&B highlight
other speaker-group differences in the distribution of the diphthongs, pointing
out that Pakeha women appear to be in the vanguard of change. Thus, the Pakeha
women in the middle-aged group show the greatest shift towards /iə/ for air
words, and the Pakeha women in the young group show the greatest movement
in the opposite direction for ear words. Watson et al. (1998) also report a more
complete merger for women than for men in their acoustic analysis.
[Figure 1: median starting point (scale: i, lowered i, raised e, e) plotted against age group (old, mid, young), for EAR and AIR words]
Figure 1. Median starting points for ear and air diphthongs, by speaker age, based on data from Holmes and Bell (1992)
Linguistic constraints on the merger
Using median starting points computed for ear and air in H&B’s Porirua data,
and the distance between these starting points, I consider now other factors that
may condition or constrain the merger, and which might thereby influence the
word recognition process. The main points of interest are first, are there any word
pairs that are so consistently merged that they are effectively homophones, and so
are most likely to be processed in the same way as lexically ambiguous words like
bank? Second, are there any further linguistic factors – such as phonetic context –
that might be used by listeners as indicators that some process of change has taken
place, thus helping them to recover the underlying form of a merged or merging
diphthong?
Evidence for homophony
To look for evidence for homophony, consider H&B’s minimal pair words: ear/air,
beer/bare, kea/care (kea is a native New Zealand parrot), cheer/chair, fear/fair,
hear/hair, peer/pair, really/rarely, sheer/share and spear/spare. The /iə/-/eə/ distance
data, for all age groups together, are shown in Figure 2.
In this figure, one unit represents the distance between neighbouring steps on
H&B’s auditory scale (e.g. between starting points for these diphthongs that were
labeled as [i̞] and [e̝]). Each plotted value corresponds to the distance between
medians based on at least 73 and up to 76 ear and air pairs, the variation in
sample size being due to the exclusion of some tokens as being monophthongs.
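The distance measure can be sketched as follows, on the assumption that the four auditory codes are mapped to equally spaced integers so that one unit is one step on the scale. The numeric coding, the code labels and the tokens are all hypothetical; this is not the computation actually used for H&B's data.

```python
from statistics import median

# Assumed numeric coding of the four-point auditory scale (higher = closer vowel).
SCALE = {"e": 0, "e_raised": 1, "i_lowered": 2, "i": 3}


def median_start(codes):
    """Median starting point of a set of tokens, in scale steps."""
    return median(SCALE[code] for code in codes)


def ear_air_distance(ear_codes, air_codes):
    """Distance between the ear and air medians, in scale steps."""
    return median_start(ear_codes) - median_start(air_codes)


# Invented tokens for one word pair, for illustration only.
ear_tokens = ["i", "i", "i_lowered", "i", "i_lowered"]
air_tokens = ["e", "e_raised", "e_raised", "e", "e_raised"]
distance = ear_air_distance(ear_tokens, air_tokens)
```

Under this coding, a fully distinct pair (all ear tokens at [i], all air tokens at [e]) would have a distance of 3, and a fully merged pair a distance of 0.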
[Figure 2: /iə/-/eə/ distance (vertical axis, 1 to 3) for the word pairs ear-air, beer-bear, kea-care, cheer-chair, fear-fair, here-hair, peer-pair, really-rarely, sheer-share, spear-spare]
Figure 2. Distance between median starting points for ear and air words, based on data from Holmes and Bell (1992)
Two word pairs in particular, cheer/chair and sheer/share, show closer values
than the others – inspection of the medians for the members of these contrasts
shows that this is largely due to a closer (higher) articulation for the air form (to
which we will return in the discussion of phonetic influences below). At the other
end of the scale, ear/air and really/rarely have greater ear/air distances than the
other sets. For each of these pairs, the greater distance between the two median
values arises because more extreme pronunciations are kept for both forms. Interestingly, of the pairs of words examined, really/rarely are probably the most likely
to appear in identical contexts, and with opposite meanings, as in I really/rarely like
the ice-cream from that dairy. There may therefore be greater pressure on speakers
to keep these words distinct.
What these data clearly show is that the word pairs in question are not homophones in current New Zealand English. In fact median tests show that all
pairs have reliably distinct starting points for ear and air (χ² at p < 0.01). Some
pairs, however, are clearly less distinct than others. These word differences will be
discussed further in later sections.
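For readers unfamiliar with the median test, the following Python sketch shows one common form of it. It assumes the two-sample version (Mood's median test) without continuity correction, with the p-value taken from the χ² distribution on one degree of freedom; the text does not state which variant was used, and the data here are invented.

```python
from math import erfc, sqrt
from statistics import median


def median_test(sample_a, sample_b):
    """Two-sample median test; returns (chi2, p) on one degree of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")
    grand_median = median(list(sample_a) + list(sample_b))
    # 2 x 2 table: tokens above vs. not above the pooled median, per sample.
    a_above = sum(x > grand_median for x in sample_a)
    b_above = sum(x > grand_median for x in sample_b)
    a_rest, b_rest = n_a - a_above, n_b - b_above
    above, rest = a_above + b_above, a_rest + b_rest
    if above == 0 or rest == 0:
        raise ValueError("all values fall on one side of the pooled median")
    n = n_a + n_b
    chi2 = n * (a_above * b_rest - b_above * a_rest) ** 2 / (above * rest * n_a * n_b)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Invented starting points on a 0-3 scale (3 = [i], 0 = [e]), illustration only.
ear_values = [3] * 8 + [2] * 2
air_values = [0] * 8 + [1] * 2
chi2, p = median_test(ear_values, air_values)
```

With heavily tied ordinal data of this kind, the treatment of tokens that fall exactly on the pooled median can affect the result; here they are counted with the lower group.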
Homophony across age groups
Does the change over time reflected in the age group comparison affect some
words more than others? Figure 3 below shows the distances between median
values for the three groups of subjects, with each plotted point representing the
difference between medians based on between 21 and 26 tokens.
[Figure 3: /iə/-/eə/ distance (vertical axis, 0 to 4) for the word pairs ear-air, beer-bear, kea-care, cheer-chair, fear-fair, here-hair, peer-pair, really-rarely, sheer-share, spear-spare, plotted separately for old, mid and young speakers]
Figure 3. Distance between median starting points for ear and air words, by age group, based on data from Holmes and Bell (1992)
The effect of speaker age is quite clear. Older speakers distinguish all word
pairs, younger speakers hardly distinguish them at all, and mid-aged speakers are
between these two. The two older groups are quite similar overall, but differ more
obviously for word pairs in /ʃ-/, /tʃ-/ and /k-/, where the mid-aged group show a
smaller, though still significant, ear/air distance. As with the overall picture, this
difference is almost entirely the result of a higher starting point for the air tokens.
For the younger speakers, it seems that many of the word pairs may effectively be
homophones; the distance between the starting points of most pairs, at less than
one point on the scale, is smaller than that between, e.g. [e] and a raised [e̝].
In addition to the really/rarely pair discussed above, which is as distinct for
these younger speakers as it is for the older groups, there is also a larger difference
for the /k-/ pair, which is due to a higher starting point for the ear token, i.e. for
the bird-name kea. It is possible that the fact that this word is of Māori origin
results in a clearer distinction being maintained by this younger group, though
G&M note that none of their surveys distinguished kea and care, which were both
consistently pronounced with /iə/. These two pairs, really/rarely and kea/care, are
the only pairs for which the younger group in H&B’s sample shows a significant
ear/air difference in the median tests (χ² at p < 0.01).
These data suggest quite strongly that although the word pairs are distinct for
most older speakers, they will become homophonous for the New Zealand population as the younger generation gets older, as long as following generations exhibit
the same absence of a consistent distinction between the two diphthongs. And the
data in Figure 2 suggest that cheer/chair and sheer/share are leading the way.
Linguistic context
There are a number of aspects to the question of whether the realisation of the
diphthongs is in any way dependent on the linguistic context, involving sentence,
lexical and phonetic levels.
H&B’s survey included conversational data, word lists, prose reading and minimal pair lists. This range of tasks allows us to address the question of whether
speakers are more likely to merge the vowels when there is a sentence context that
could also serve to disambiguate, i.e. where the information load of the vowel contrast is reduced. In fact, there is very little difference in the /iə/-/eə/ distances in
the four tasks; the prose reading task, where there is a sentential context, produces
a slightly lower median difference (2.03) than the minimal pair task (2.26), but
does not differ from the word list task (2.00). The conversational data, involving
a different speech style as well as including sentence contexts, have a somewhat
larger difference (2.49). There would appear to be no clear evidence for sentential
contexts resulting in a greater degree of merger.
Lexical frequency
A factor that may influence the incidence of merged forms is the frequency with
which individual words are used. In particular, if the change is proceeding through
the language by lexical diffusion, then high frequency words may be affected earlier than low frequency words. Conversely, the opposite prediction results from an
assumption that words that are used more often have more stable pronunciations
and so are less likely to change. The frequency values for words in the minimal
pair set were compared with their median starting point values, for ear and air
words separately. Frequency counts from the Wellington Corpus of Spoken New
Zealand English were used in preference to published frequency norms, since these
have been collected for other English varieties. Since the different age groups show
different tendencies as far as the merger is concerned, correlation coefficients were
computed separately for each group. These showed that neither of the two older
groups shows any clear correlation of lexical frequency and sound change, while the
younger group has higher starting points for higher frequency ear words (with a
correlation of 0.518). However, this is due largely to the fact that really is a high
frequency word in the corpus, and also has a high starting point for all speaker
groups. Since really, as we have seen, may be kept distinct because of the extent of
potential confusion with rarely, there is little evidence of a causal or constraining
relationship between lexical frequency and extent of merger.
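The point that a single high-frequency item can carry a correlation can be illustrated with a short Python sketch. It assumes Pearson's r (the text does not say which coefficient was computed), and the frequencies and starting points below are invented for illustration; they are not values from the Wellington Corpus or from H&B's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def r_without(items, excluded_word):
    """Correlation of frequency and starting point with one word left out."""
    kept = [pair for word, pair in items.items() if word != excluded_word]
    freqs, starts = zip(*kept)
    return pearson_r(freqs, starts)


# Invented (frequency, median starting point) values, for illustration only.
items = {
    "really": (900, 3.0), "here": (300, 2.0), "beer": (40, 2.0),
    "fear": (30, 1.5), "cheer": (5, 2.0),
}
r_all = pearson_r(*zip(*items.values()))
r_rest = r_without(items, "really")
```

If the coefficient drops sharply once the one item is removed, as it does with these invented values, the correlation says little about frequency as such.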
Minimal contrast
A further lexical factor that may influence the merger is whether a word stands
in minimal opposition to another word, distinguished only by the ear/air diphthong. Gordon and Maclagan conjecture (1990: 140) that sound change may be
able to proceed more quickly when a word is not perceived as part of a minimal
pair. However, the median starting points in H&B’s data show no evidence of this,
the overall distance between ear and air being very similar for minimal pair words
(2.25) and words not in minimal pairs (2.42).
Phonetic context
The analysis of the minimal pair set in Figures 2 and 3 showed that there may
be an effect of phonetic context, in the form of the consonant immediately preceding the diphthong. The data presented there suggest on closer inspection that
the relevant phonetic context might be the place of articulation of the preceding consonant. It was noted that for the mid-age group in particular the distance
between the ear and air vowels was smallest for words beginning in postalveolar /ʃ/ and /tʃ/ and velar /k/, mainly because of a higher starting point for air
words. In other words, for these speakers the air vowel is higher after a consonant
with a high front(ish) tongue articulation. Since /k/ is likely to be fronted before
these front vowels, these three consonants can all be characterized as having the
phonetic place feature [+coronal]. To examine the influence of coronal place of articulation on the ear/air vowels, the words for the minimal pair set were grouped
according to whether they involved a coronal consonant (which also included /s/
in spear/spare). This grouping was carried out for each age group of speakers, but
excluded really/rarely, which were discussed in the context of other factors above.
Figure 4 shows the effect of consonant place of articulation on the height of the
air vowel. Each data point is based on a median starting point for between 75
and 84 tokens (for the coronal contexts) or between 145 and 168 tokens (for the
non-coronals; the variation in sample size is again due to the exclusion of some
tokens for some speakers on the basis of their being monophthongs). A median
test including all age groups showed the effect of coronality to be significant at
p < 0.001. In separate median tests for each age group, coronality had a significant
effect on air height for old speakers (p < 0.01), a greater one for mid-age speakers
(p < 0.001), but no effect at all for young speakers. Data for the ear vowel are not
presented, since this was not influenced by the value of the [coronal] feature of the
preceding consonant.
[Figure 4: median starting point of the air diphthong (scale: i, lowered i, raised e, e) plotted against age group (old, mid, young), for coronal and non-coronal preceding consonants]
Figure 4. Median starting points for air diphthongs, by age group and place of articulation of preceding consonant, based on data from Holmes and Bell (1992)
Summary of production data
The data reviewed in the preceding sections show quite clearly that the ear/air
merger is a change in progress, as reflected in the different patterns of realisation
across the age groups in H&B’s survey (Figure 1). Speakers in the older group still
distinguish the two diphthongs with reasonable consistency. Those in the mid-aged group also distinguish them, but exhibit a raising of air, particularly after
coronal consonants (see also Figure 4). In marked contrast, the youngest speakers
show a lowering of ear, in addition to a raising of air compared with the old
group. However, data from G&M and from Watson et al. suggest that this group
may be exceptional, and that the overall trend is to /iə/ (see also Warren & Hay
2005; Hay et al. 2006).
For H&B’s mid-aged group, then, there is the suggestion of phonetic conditioning on air raising. The raising of the air vowel may be part of the general chain-shift raising of the New Zealand front vowels (Bauer 1992; Gordon &
Deverson 1985), as pointed out by G&M. The re-analysis of H&B’s mid-aged data
presented here suggests that the shift was initiated earlier in those environments
in which it is phonetically conditioned.
The youngest speaker group in H&B’s survey shows significant ear/air differences only for two of the minimal pair sets, really/rarely and kea/care. However, the
data show that for all pairs (except hear/hair) there is a residual difference, since
the median value for the ear vowel is in each case higher than that for the air
vowel, although their distributions clearly overlap considerably.
The review of other linguistic factors leads to the further conclusion that ear
and air forms are distinct words for the older two groups, but that most are effectively homophonous for the youngest group. One constraint on homophony
appeared to come from considerations of ambiguity – really and rarely, more than
any other pair, have the potential to occur in identical environments with opposite
meanings, and without being disambiguated by context. However, with the exception of this pairing, there was little evidence that merger is occurring less rapidly
in minimal pairs than in other cases. In addition, lexical frequency and the general
availability of a sentential context appeared to have no constraining or motivating
effect on the extent of the merger.
Word recognition and sound merger
The preceding sections have highlighted the variability of ear/air in NZE, and
isolated a few factors, mainly phonetic context and speaker age, that appear to
have the strongest influence on the merger. How do these observations relate to
the process of word recognition?
Word recognition in the merger process
To what extent might the process of sound merger be affected by the requirements and mechanisms of word recognition? Clearly, an overriding objective of
the process of word recognition is the correct identification of the intended word,
and access to its further lexical properties. The merger of a phonemic distinction
potentially inhibits this process, particularly for minimally distinct words. This
danger is clearly reduced when the words concerned are unlikely to be found in
comparable contexts. It is also reduced when phonetic context conditions a change
and can be used in interpreting the result of that change, through processes of
compensation for coarticulation (Mann & Repp 1981; Elman & McClelland 1988).
In Table 1 I set out a series of hypothesised ‘states’ in the progress of the
ear/air merger, forming an approximate temporal sequence for the process of
the change as reflected in the production studies. The phonetic values given are
approximate.
Table 1. Hypothesised states in the process of merger (see text for details)
state   ear set                     air set
(1)     [tʃiə]   cheer              [tʃeə]   chair
        [biə]    beer               [beə]    bear
(2)     [tʃiə]   cheer              [tʃi̞ə]   chair
        [biə]    beer               [beə]    bear
(3)     [tʃi̞ə]   cheer/chair        [tʃe̝ə]   chair/cheer
        [bi̞ə]    beer               [beə]    bear
(4)     [tʃiə]   cheer/?chair?      [tʃi̞ə]   chair/cheer
        [biə]    beer/?bear?        [bi̞ə]    bear/beer
In the following description, I conjecture how the constraints of word recognition might influence the progress of the merger. At state (1) in Table 1, ear and
air words are distinct, representing the situation for H&B’s older group, as shown
in Figure 1. At (2), phonetic conditioning raises the air vowel after coronal consonants (as represented in the table by /tʃ/). Since non-coronals do not show this
raising, the overall effect is of a slight rise in the starting point of the air vowel (cf.
the mid-aged data in Figures 1 and 4). The two /tʃ-/ words are still perceptually
distinct, despite the closeness of their phonetic realisation, because listeners compensate for coarticulation, and hear [i̞] as lower than it actually is. State (3) shows
the young speakers’ reaction against a stigmatised /iə/. All /iə/ forms are lowered,
including the ear set. If the mechanisms of compensation for coarticulation are
still operative, the system shown at (3) potentially runs a greater risk of ear/air
confusion than that at (2), since both forms following coronals are now lowered,
to positions that could be explained perceptually as air forms resulting from coarticulation, while their intermediate pronunciations mean they could also be heard
as ear forms. Once the ear form is no longer perceived as stigmatised, we get
spreading of the raised air from coronal to non-coronal contexts (4), potentially
assisted by perceptual confusions that might have arisen at state (3), but possibly also by a more general principle by which subsequent generations of speakers
“forget” the reasons for coarticulation (Ohala 1992). The eventual merger on /iə/
is but a short step from here.
Speaker age and phonological status
As noted above, even the youngest speaker group in H&B’s survey shows a residual difference between ear and air forms. It is possible that whatever variation
remains respects Nolan’s hypothesis that “differences in lexical phonological form
will always result in distinct articulatory gestures” (1992: 272–274). If this is the
case, then the difference, though slight, suggests that even the younger speakers
have distinct phonological forms for /iə/ and /eə/. This, however, conflicts with
self reports from speakers in this age group. Similarly, my own observations of
phonetics students agree with those reported by G&M, namely that young New
Zealanders find it very difficult to distinguish /iə/ and /eə/, in other words that
they appear to be losing the phonemic distinction between these diphthongs.
Such difficulty for younger speakers contrasts with the heightened awareness
of the merger reflected in opinions voiced in New Zealand newspapers, which presumably originate from speakers in the older groups. One writer claims that the
merger “could lead to a great deal of frustration, trouble and strife”, and continues:
To use the [. . . ] examples of “beer” and “bare” and “here” and “hair”, I go into this
bar and say, “Beer, please” and the barmaid, being an obliging girl, takes off her
top and bra. Because I am devoutly decent, I say indignantly, “Here! Here!” [sic]
and the barmaid who knows when enough’s enough whacks me with a jug of Old
Dark which starts a bloody brawl.
You see, the potential for misunderstanding is substantial and the consequences
may be horrendous.
Alex Veysey – Opinion column in Evening Post, Wellington, 29/10/94
Other writers complain about hearing on radio that cars are to be fitted with
earbags, or on television that stuffed beers are available. Such comments suggest
that the phonemic distinction is still very much alive for these speakers. If these
comments are typical, it is interesting that they also reflect the direction of the
change towards /iə/, in that these more ‘conservative’ listeners criticise speakers
for making air words sound like they contain /iə/. Similarly, the opinion column
gives a constructed example where the speaker (who maintains the phonemic distinction) produces /iə/ for ear words, but his productions are misinterpreted (by
the presumably younger barmaid) as air words.
A conflict of two systems?
It is significant that our (mainly young) phonetics students claim not to be able
to distinguish ear and air words, while the (presumably) older correspondents
deplore the merger and its consequences. If these were two static and separate
populations, then we could assume that the former would treat any vowel in the
[iə]-[eə] range as a token of one vowel – i.e., there would be for these speakers complete neutralization, so that forms of fear and fair with vowels in this range would
initially map onto both of these words, and contextual information would then
be used to select the appropriate lexical form. The older speaker group, however,
would hear [fiə] as fear and [feə] as fair (or fare). However, these populations are
neither static nor separate, and this raises further issues for processes and models
of word recognition.
The question of what a New Zealander does when hearing a form like [fiə]
clearly depends on who that New Zealander is. But does it also depend on what the
New Zealander knows about the speaker who uttered the form (Hay et al. 2006)?
It is possible that listeners may be sensitive to the perceived age of the speaker.
However, given the testimonials from young speakers attesting to their inability
to distinguish reliably between the two vowels, this relationship between speaker
and hearer may not be symmetrical. That is, as suggested above, young speakers
may no longer have a reliable phonemic distinction between /iə/ and /eə/, and this
will affect them both as speakers and as hearers. Let us assume, in line with most
models of word recognition (though cf. Marslen-Wilson & Warren 1994), that listeners need to recognize phonemes in the speech stream and use these to make
contact with lexical representations. In this case, the younger listeners, for whom
[iə] and [eə] are allophones of a single EAIR phoneme, will generally be unable to
distinguish /iə/ and /eə/, for all ages of speakers. (The issue of whether their single
phoneme is more like /iə/ or /eə/ is irrelevant. What is important is that a phonemic distinction has been lost.) Older listeners, on the other hand, may recognize
that a merger is in progress for the younger generations, and consequently map a
young speaker’s [iə]-[eə] forms indiscriminately onto both /iə/ and /eə/, while still
expecting closer correspondences of [iə] to /iə/ and [eə] to /eə/ from other older
speakers. So [fiə] may be interpreted by older listeners as fear if produced by a
speaker from the same age group, but as ambiguous between fear and fair if from
a young speaker. Since such adjustments involve extra-linguistic knowledge, they
are clearly different from the kind of normalization usually envisaged for other
types of (idiosyncratic or allophonic) variation in the speech signal (e.g. template
matching, distance metrics, etc. – cf. Klatt 1989). They also invoke interactions of
knowledge types that are different in scope from even the lexical or sentential influences that are argued to have an effect on the outcomes of phonetic processing
(Ganong 1980; Tyler 1990).
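The listener-dependent mapping described in this paragraph can be made concrete with a small sketch. The Python function below is illustrative only and is not part of the original analysis: the function name, the token labels "ear" and "air" (standing for the closer and the more open diphthong realisations), and the two-way split into "young" and "old" groups are all assumptions introduced here, and the all-or-nothing behaviour simplifies what the text presents as tendencies.

```python
def candidate_words(token, listener, speaker):
    """Return the lexical candidates activated by a fear/fair token.

    token    -- "ear" or "air" (hypothetical labels for the two realisations)
    listener -- "young" or "old" (hypothetical age groups)
    speaker  -- "young" or "old" (the listener's perception of the speaker)
    """
    if token not in ("ear", "air"):
        raise ValueError("token must be 'ear' or 'air'")
    if listener not in ("young", "old") or speaker not in ("young", "old"):
        raise ValueError("age groups must be 'young' or 'old'")

    both = frozenset({"fear", "fair"})

    # Younger listeners: a single merged phoneme, so every token is
    # ambiguous regardless of who is speaking.
    if listener == "young":
        return both

    # Older listeners hearing a young speaker: the merger is taken into
    # account and the token is mapped onto both words.
    if speaker == "young":
        return both

    # Older listeners hearing an older speaker: the distinction is expected.
    return frozenset({"fear"}) if token == "ear" else frozenset({"fair"})


# An older listener resolves the token only for an older speaker.
print(sorted(candidate_words("ear", "old", "old")))
print(sorted(candidate_words("ear", "old", "young")))
```

On this sketch, selection between the two candidates in the ambiguous cases would fall to contextual information, as the text suggests.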
A further empirical issue concerns differences between coronal and noncoronal contexts in interaction with this age group difference. For instance, if compensatory strategies are operative in the interpretation of [twiœ6] as chair amongst
the mid-aged listeners in (2) in the table above, then maybe these listeners are
more tolerant of [iœ6] for air after coronals for all speaker groups.
Ambiguity in context
Some of the newspaper correspondence that deplores the ear/air merger argues
that it is important to know whether the speaker is saying that something is fair or
fear. What is interesting about such comments, as well as the opinion text and
other letters to the editor cited above, is that the suggested confusion between
members of word pairs is actually unlikely to arise in most contexts, since very
few of the minimal pair words investigated in the production studies, and few of
the examples cited in newspaper opinion and letter columns, are likely to be found
in otherwise identical contexts. When they are, the confusion will probably be as
short-lived and remain as unnoticed as ambiguities involving words like bank,
thanks to the multiple access of word forms and rapid integration with context
(Tyler 1990, but see also Schvaneveldt et al. 1976). An empirical question remains
as to whether the recognition system is in any way disadvantaged by the merger, i.e. is
there any processing delay or any greater likelihood of error, relative to processing
in a non-merged system?
Closing comments
Whatever the answers to questions raised above may turn out to be, it is clear that
the recognition system is able to cope with the type of variability that arises during
an on-going process of sound merger, since the confusions reported in opinion
and letter columns are rarely experienced. What remains to be seen, though, is just
how it does cope with this variability. In addition to linguistic variables such as
the phonetic context in which the diphthong is produced, the review and additional analysis of H&B’s data has shown that extralinguistic factors such as the age
group of the speaker and potentially also of the listener are influential in determining the attested forms. Including these factors in an account of word recognition
will amount to an extension of such accounts beyond perceptual and linguistic considerations, invoking a consideration of possible interactions between the
acoustic-phonetic analysis of the input and the listener’s awareness of speaker
identity. Further empirical studies of the merger will also address issues such as
the relative importance of sentential and phonetic contextual information in coping with such variability, as well as the role of the word recognition process itself
in constraining or directing the process of sound change.
References
Bauer, Laurie (1992). The second great vowel shift revisited. English World-Wide, 13, 253–268.
Church, Kenneth W. (1987). Phonological parsing and lexical retrieval. Cognition, 25, 53–69.
Cutler, Anne (1990). Exploiting prosodic probabilities in speech segmentation. In Gerry T. M.
Altmann (Ed.), Cognitive models of speech processing (pp. 105–121). Cambridge, MA: MIT
Press.
Elman, Jeffrey L. & James L. McClelland (1988). Cognitive penetration of the mechanisms
of perception: compensation for coarticulation of lexically restored phonemes. Journal of
Memory and Language, 27, 143–165.
Ganong, William F. (1980). Phonetic categorization in auditory word perception. Journal of
Experimental Psychology: Human Perception and Performance, 6, 110–125.
Gaskell, M. Gareth & William D. Marslen-Wilson (1998). Mechanisms of phonological
inference in speech perception. Journal of Experimental Psychology: Human Perception and
Performance.
Gordon, Elizabeth & Tony Deverson (1985). New Zealand English: An introduction to New
Zealand speech and usage. Auckland: Heinemann.
Gordon, Elizabeth & Margaret A. Maclagan (1990). A longitudinal study of the ear/air contrast
in New Zealand speech. In Allan Bell & Janet Holmes (Eds.), New Zealand ways of speaking
English (pp. 129–148). Clevedon, Avon: Multilingual Matters.
Gordon, Elizabeth & Margaret A. Maclagan (2001). Capturing a sound change: A real time study
over 15 years of the NEAR/SQUARE diphthong merger in New Zealand English. Australian
Journal of Linguistics, 21(2), 215–238.
Hay, Jennifer, Paul Warren, & Katie Drager (2006). Factors influencing speech perception in the
context of a merger-in-progress. Submitted to special issue of Journal of Phonetics, Vol. 34,
issue 1, to appear in 2006.
Holmes, Janet & Allan Bell (1992). On shear markets and sharing sheep: The merger of EAR and
AIR diphthongs in New Zealand English. Language Variation and Change, 4, 251–273.
Klatt, Dennis H. (1989). Review of selected models of speech perception. In William D. Marslen-Wilson
(Ed.), Lexical representation and process (pp. 169–226). Cambridge, MA: MIT Press.
Maclagan, Margaret A. & Elizabeth Gordon (1996). Out of the AIR and into the EAR: Another
view of the New Zealand diphthong merger. Language Variation and Change, 8, 125–147.
Mann, V. A. & Bruno H. Repp (1981). Influence of preceding fricative on stop consonant
perception. Journal of the Acoustical Society of America, 69, 548–558.
Marslen-Wilson, William D. (1987). Functional parallelism in spoken word recognition.
Cognition, 25, 71–102.
Marslen-Wilson, William D. & Paul Warren (1994). Levels of perceptual representation and
process in lexical access: words, phonemes, and features. Psychological Review, 101, 653–675.
Nolan, Francis J. (1992). The descriptive role of segments: evidence from assimilation. In Gerard
J. Docherty & D. Robert Ladd (Eds.), Papers in laboratory phonology II: Gesture, segment,
prosody (pp. 261–279). Cambridge, England: Cambridge University Press.
Ohala, John J. (1992). What’s cognitive, what’s not, in sound change. In G. Kellermann & M.
D. Morrissey (Eds.), Diachrony within synchrony: Language history and cognition (pp. 309–
355). Frankfurt am Main: Peter Lang Verlag.
Schvaneveldt, Roger W., David Meyer, & Curtis A. Becker (1976). Lexical ambiguity, semantic
context and visual word recognition. Journal of Experimental Psychology: Human Perception
and Performance, 2, 243–246.
Starks, Donna, Scott Allan, & Catherine Kitto (1998). Why vernacular speech? Speech samples
from the taped Auckland rapid and anonymous survey. Paper presented at the Sixth New
Zealand Language and Society conference, Wellington, 28–30 June 1998.
Stevens, Kenneth N. & Sheila E. Blumstein (1981). The search for invariant acoustic correlates
of phonetic features. In Peter D. Eimas & Joanne L. Miller (Eds.), Perspectives on the study
of speech (pp. 1–38). Hillsdale, NJ: Erlbaum.
Swinney, David A. (1979). Lexical access during sentence comprehension: (Re)consideration of
context effects. Journal of Verbal Learning and Verbal Behavior, 18, 645–659.
Tanenhaus, Michael K. & Margery M. Lucas (1987). Context effects in lexical processing.
Cognition, 25, 213–234.
Tyler, Lorraine K. (1990). The relationship between sentential context and sensory input. In
Gerry T. M. Altmann (Ed.), Cognitive models of speech processing (pp. 315–323). Cambridge,
MA: MIT Press.
Warren, Paul & Jen Hay (2005). Using sound change to explore the mental lexicon. To appear
in Claire Fletcher-Flinn & Gus Haberman (Eds.), Cognition and language: Perspectives from
New Zealand. Bowen Hills: Australian Academic Press.

Watson, Catherine I., Jonathan Harrington, & Zoe Evans (1998). An acoustic comparison
between New Zealand and Australian English vowels. Australian Journal of Linguistics, 18,
185–208.
Wells, John (1982). Accents of English, 3 vols. Cambridge: Cambridge University Press.
Linguistic components
and conceptual mappings
chapter 
Verbal explication and the place of NSM
semantics in cognitive linguistics*
Cliff Goddard
University of New England, Australia
This paper argues that verbal explication has an indispensable role to play in
semantic/conceptual representation. Cognitive linguistic diagrams are not
semiotically self-contained and cannot be interpreted without overt or covert
verbal support. Many also depend on culture-specific iconography. When verbal
representation is employed in mainstream cognitive linguistics, as in work on
prototypes, cultural models and conceptual metaphor, this is typically done in an
under-theorised fashion without adequate attention to the complexity and
culture-specificity of the representation. Abstract culture-laden vocabulary also
demands a rich propositional style of representation, as shown with contrastive
examples from Malay, Japanese and English. As the only stream of cognitive
linguistics with a well-theorised and empirically grounded approach to verbal
explication, the NSM (natural semantic metalanguage) framework has much to
offer cognitive linguistics at large.
Keywords: Wierzbicka, semantic primes, diagrams, Malay, Japanese
In natural language, meaning consists in human interpretation of the world. It
is subjective, it is anthropocentric, it reflects predominant cultural concerns and
culture-specific modes of social interaction as much as any objective features of
the world ‘as such’.
(Wierzbicka 1988: 2)
.
Friend, foe, or fellow traveller?
Cognitive linguists seem somewhat divided in their attitude towards Anna
Wierzbicka and the distinctive semantic theory (the natural semantic metalanguage or NSM theory) originated by her (Wierzbicka 1972, 1988, 1992, 1996,
1999, and other works; cf. Goddard 1998a). As Paul Werth (in Niemeier 1997) has
pointed out, Wierzbicka anticipated important themes in cognitive linguistics by
many years. As early as 1972, in her book Semantic Primitives, she was upholding a
view of meaning as conceptualisation, as opposed to the structure-based or logic-based views that dominated (and still dominate) the linguistic mainstream. In her
empirical descriptive work she was researching topics such as emotion, time, and
the interaction between language and culture well before cognitive linguistics, and
other late-breaking trends in twentieth century linguistics, brought these topics
out of the shadow of Chomskyan generativism. Already in the 1970s, Wierzbicka
was employing a version of prototype analysis, some years before Fillmore, Lakoff,
and others (cf. McCawley 1983).
As noted by Peeters (1997a), Wierzbicka was at Duisburg in the spring of
1989 for the symposium organised by René Dirven which “marked the birth of
cognitive linguistics as a broadly based, self-conscious intellectual movement”
(Langacker 1990: 1), and she published in the first issue of the journal Cognitive
Linguistics. Even if, as Peeters comments, she is best considered “a co-opted member rather than a founding member”, there is no doubt in the minds of many
cognitive linguists that her work holds an honourable place within the broad
movement of cognitive linguistics. For example, Athanasiadou and Tabakowska
(1998: xxi) introduce a collective volume on emotions with the remark that it “represents a wide spectrum of cognitive trends, thereby testifying to the pluralism
within the cognitive linguistic paradigm: the metaphorical-metonymical Lakovian approach (Kövecses), the semantic-primitives approach (Wierzbicka), and
the semasiological-structure approach (Geeraerts/Grondelaers)”.
On the other hand, Wierzbicka’s approach has also been deemed incompatible with, or even inimical to, the core tenets and proper principles of cognitive
linguistics. For example, Lakoff (1990), in the same issue of Cognitive Linguistics just referred to, spends some time differentiating his own approach from
Wierzbicka’s “Leibnizian commitment”, which includes her semantic universalism
and her use of reductive paraphrase in natural language as a vehicle for semantic explication. Geeraerts (1999) characterises present-day cognitive linguistics
as having two methodological extremes: the ‘empiricist’ tendency (corpus analysis, psycholinguistic research, neurophysiological modelling) and the ‘idealistic’
tendency represented by Wierzbicka and her colleagues, with their dubious appeals to intuition and platonist views about universal conceptual primes.1 Even if
NSM semantics is not mentioned by name, one often finds cognitive linguistics
characterised in terms which would seem to marginalise the role of propositional meaning and verbal explication – for example, when it is claimed to be a
defining assumption of cognitive linguistics that meaning originates in experiential schemas, in visual-spatial templates, or in other pre-conceptual or embodied
modes of understanding.
The main thesis of this chapter is that verbal explication is in fact indispensable in cognitive linguistics, and that the NSM approach, as the only
well-developed and empirically grounded theory of verbal explication, is well equipped
to meet this need. Section 2 argues that familiar devices of cognitive linguistics,
such as diagrams representing image schemas, and conceptual metaphors, cannot
work effectively without a much improved theory of verbal explication. Section
3 then takes up a domain in which the need for verbal explication is particularly
clear, namely the domain of abstract, culture-laden vocabulary, illustrating with
examples from English, Malay and Japanese. Section 4 contains concluding remarks touching on the relative merits of NSM and alternative approaches to verbal
explication.
The natural semantic metalanguage approach
The general outline of the ‘natural semantic metalanguage’ approach is well
known, so I will give an abbreviated version here (for a review of common misunderstandings, see Goddard 1998b). The initial assumption is that the meanings encoded in the linguistic forms of any language (at least, the propositional
or symbolic meanings) can be adequately described within the resources of that
language – i.e., that any natural language is adequate as its own semantic metalanguage. The approach began as an attempt to systematise the traditional definitional
technique in lexical semantics – i.e., stating the meaning of a word (in a particular utterance) by means of an exact paraphrase in other words. As recognised
by seventeenth century thinkers such as Arnauld, Descartes, Pascal, and, above
all, Leibniz, this procedure can only succeed if the paraphrasing is done in terms
which are semantically simpler (i.e., easier to understand) than the term being defined. Otherwise the analysis gets bogged down in circularity and terminological
obscurity. Assuming it is possible to avoid circularity and infinite regress, it follows
that every natural language must contain a non-arbitrary and irreducible ‘semantic core’ which would be left after all the decomposable expressions had been dealt
with. This semantic core must have a language-like structure, with a lexicon of indefinable expressions (semantic primes) and a grammar; that is, some principles
governing how the lexical elements can be combined. The semantic primes and
their principles of combination constitute a kind of ‘mini-language’. It is furthermore assumed, as a working hypothesis, that at this most basic level of semantic
analysis there is substantial identity between the languages of the world:2 in effect,
the semantic primes and elementary combinatorial grammar of different natural
languages coincide. This assumption is supported by a large and growing body of
empirical cross-linguistic research.
After thirty years of trial-and-error experimentation in different semantic
domains, and taking into account a number of careful cross-linguistic studies
(Goddard & Wierzbicka 1994; Goddard 1997; Goddard & Wierzbicka 2002), the
current inventory of proposed semantic primes numbers in the mid-sixties. It
is listed in Appendix 1. Examples include substantive and determiner-like elements
such as I, you, something/thing, someone, this, other, one, and two;
predicate-like elements such as do, happen, move, think, know, want, and say;
descriptive and evaluational elements like big, small, good and bad; spatial
and temporal elements such as where/place, here, above, below, near, far,
when/time, before, and after; and logical elements such as because, if, not,
can, and maybe. In the NSM system, one states the meaning of a word (grammatical construction, etc.) in terms of an extended paraphrase or ‘explication’
couched entirely within the natural semantic metalanguage. In this way it is hoped
to achieve maximum granularity and transparency of semantic description, and at
the same time minimise the problem of terminological ethnocentrism; that is, the
danger of adopting a mode of semantic representation which is tied to one particular language (English), and which carries with it conceptual baggage from that
language.
In some respects NSM semantics can be seen as a classical approach to semantics, especially in its commitment to representation in discrete, propositional
terms. However, it is quite unlike other so-called classical approaches to semantics,
which have rightly attracted the ire of cognitive linguists. First, NSM explications
are not bundles of semantic features. They are essentially texts composed in a specified minimal subset of ordinary language. Second, the proposed primes are not
abstract in any sense, but are identified with word meanings of ordinary natural
language. This means that they are grounded in everyday linguistic experience.
Third, the NSM approach is not linked in any way with so-called Objectivism –
i.e., the view that linguistic expressions get their meaning from correspondences
with aspects of an objective, language-independent reality. On the contrary, the
NSM metalanguage contains sundry elements which are inherently subjective,
vague, and evaluational (such as, for example, like and good). Fourth, it is entirely possible to incorporate conceptual prototypes, scenarios, and so on, within
NSM explications.
From the exposition up to this point, it might appear that the NSM program is
primarily about semantic universals, but it is equally about linguistic relativity and
diversity. If the number of semantic primes is a mere 65 or so, it follows that the
vast bulk of the vocabulary and syntax of any language is not language-universal,
but language-specific. As Langacker has put it, in the context of acknowledging
parallels between his own work and that of Wierzbicka:
In positing her universal semantic metalanguage, Wierzbicka claims that all languages exhibit a fundamental commonality in their lexicogrammatical structure.
At the same time, the limited array of elements in this metalanguage are combinable to form higher-order semantic structures of indefinite complexity and
essentially infinite variability. This unity-in-diversity is not unlike the great profusion of life forms on earth, all governed by strands of DNA comprising different
sequences of just four nucleotide bases.
(Langacker 1999: 215)
The NSM program has produced numerous studies of language-specific semantics in a range of languages – studies of culture-specific lexical items (e.g., kin
categories, colours, values, emotions, speech acts, natural kinds), of illocutionary
devices (especially particles and conversational routines), and of morphosyntax
(e.g., number marking, passives, causatives, case constructions, evidentials). The
languages include English, French, Polish, Russian, Spanish, Chinese, Cree, Ewe,
Lao, Hawaiian Creole English, Japanese, Korean, Malay, Mangaaba-Mbula, and
Yankunytjatjara, among others. A selection of these studies is listed in Appendix 2.
We will sample a small portion of this work in Section 3.
2. The indispensability of verbal explication
In this section I wish to argue that verbal explication has an indispensable role in
cognitive linguistics – not in place of, but alongside other, more schematic modes
of representation.
Diagrams are not enough
Even if one grants that diagrammatic, or other non-verbal, means are sufficient
to depict certain kinds of concept (spatial and dimensional concepts, concrete objects, concrete part-whole relations, numbers perhaps), surely not all concepts are
amenable to such a treatment. The reason that spatial and dimensional concepts
(e.g., inside, above, below, big, small) lend themselves to diagrammatic representation is that one can rely on an analogue (iconic) relationship between the diagram
and the modelled reality. For example, to depict the idea of inside (‘containment’)
one can present one figure inside another; to depict one object as above or below
another, one can rely on an analogous spatial relationship on a page (taking the
top of the page to represent the ‘up’ dimension); to depict the contrast between big
and small one can present two figures, one big and one small. The Figure-Ground
relationship can be conveyed visually by making the Figure visually ‘heavier’ (e.g.,
thicker, darker, shaded). Diagrams like those in Figure 1 are an everyday feature of
cognitive linguistics.
Figure 1a. Containment
Figure 1b. X above Y
Figure 1c. She went out of the room
I will argue in a moment that these seemingly transparent visual depictions
are not as semiotically ‘pure’ as they may seem; but for the time being, suppose
we grant that they do the job they are intended to do. How can the diagrammatic
mode be extended to abstract concepts – i.e., to concepts that lack physical or
perceptual correlates? For example, how could we represent evaluational notions
(good and bad) in a purely visual medium? How to depict the difference between
‘thinking that such-and-such’ and ‘knowing that such-and-such’? How to depict
the relationship of similarity (like) or the notions of potentiality (can) or possibility
(maybe)? I am assuming, of course, that it is necessary to depict such notions
in some fashion if we are to faithfully model the conceptual content of language,
but this seems a thoroughly reasonable assumption. It is commonly accepted that
countless lexical items embody evaluational dimensions, and that many languages
have special benefactive and adversative constructions (linked with the notions of
good and bad, respectively). Similarly, the notions of thinking and knowing are implicit in numerous lexical items, at least in English (doubt, wonder, prove, believe,
etc.), and are involved in evidential markers and constructions in many languages.
Similarity, potentiality and possibility are recurrent and pervasive dimensions of
conceptual structure in the world’s languages.
It is of course a simple matter to set up some symbolic conventions that could
enable us to express such concepts in a visual mode. Trivially, one could designate
good by a tick (✓) and bad by a cross (x). One could depict the mental state of
a person ‘thinking that . . . ’ by a thought balloon with a thin wavy line, but use
a thick blocked line for ‘knowing that . . . ’. But clearly devices like these have a
fundamentally different character to the analogue (iconic) representations given
in Figure 1. Ticks, crosses, and so on, are symbolic in nature, bearing no particular iconic relationship with the intended meaning. Essentially they are just visual
substitutes (codes) for the words whose meanings they represent. To know what is
intended by a tick or by a thought balloon, one has to learn a particular culture-specific convention, and this learning itself depends on words. From a semiotic
perspective, a tick or a cross is parasitic upon the verbal sign it represents.
Now it may be pointed out that schematic diagrams for certain abstract concepts have played a prominent role in cognitive linguistics. For example, Figure
2 below is from Mark Johnson’s (1987) ground-breaking book The Body in the
Mind, and Figure 3 comes from Leonard Talmy’s (1988) influential article on
force dynamics. Johnson and Talmy refer to these diagrams as depicting image-schematic gestalts or experiential schemas. My point is that they cannot do their
intended job without the assistance of verbal captions and explanations. For example,
without verbal explanation I doubt very much if anyone would interpret
Figure 2 as depicting ‘compulsion’. We need the caption and the accompanying
explanation: “In such cases of compulsion, the force comes from somewhere, has
a given magnitude, moves along a path, and has a direction. We can represent this
image-schematic gestalt structure with the visual image below. Here the dark arrow represents an actual force vector and the broken arrow denotes a potential
force vector or trajectory” (Johnson 1987: 45).
Figure 2. Compulsion
Figure 3a. Force-dynamic pattern for a sentence like, The ball kept rolling because of the wind blowing on it
Figure 3b. Force-dynamic pattern for a sentence like, The shed kept standing despite the gale wind blowing against it
Similarly, it is highly unlikely that anyone would understand Figure 3 (from
Talmy 1988) without the accompanying legend, which tells us that the graphic
elements of the circle and the shape with a concave side indicate the entities involved in the force dynamic scenario, the so-called Agonist and the Antagonist,
respectively; that > and • represent the intrinsic force tendency towards action or
towards rest, that the + and – signs indicate the “balance of strengths”, and the
line at the bottom of each diagram gives the “resultant of the force interaction”
(either action or rest, as in (a) and (b), respectively). In short, the diagrams are
not semiotically “self-contained”.
Even apparently self-explanatory diagrams such as Figure 1a for ‘containment’
are not necessarily as simple as they look. Compare that Figure with the very similar Figure 4a below. This was employed by Hawkins (1984) to designate not in (or
‘containment’), but on. Hawkins’ drawing makes sense, however, within his own
set of conventions, which include his adoption of the ellipse shape as representing surface, as in Figure 4b. Once again, the point is that the captions play a
vital interpretive role.
Notice that I am not saying that the diagrams are perfectly equivalent to the
verbal glosses of what they mean. I accept that diagrams have the capacity to convey gestalt or figural properties in a way that cannot be duplicated in words, and
that properties of this kind may be very important for our understanding of how
language works as part of the overall cognitive system.3
Figure 4a. ON (profile only)
Figure 4b. SURFACE configuration
I am not saying that cognitive linguists should give up diagrams and use only
verbal paraphrases instead. However, diagrams cannot achieve their purpose without verbal support, and we therefore must have some theory about the nature of
the verbal items that form an essential part of the representational system. What,
for example, is the status of terms such as ‘vector’, ‘Agonist’, ‘force tendency’, and
‘path’? Are these terms primitives of the representational system, and if not, how
can their own conceptual content be analysed? Does it matter that such terms are
technical and unknown to the speech community at large? Does it matter that they
have no equivalents in most languages of the world, thus effectively tying the representational system to one language – i.e., English? In my view there are some
fundamental issues here for cognitive linguistics.
One further point: even simple diagrams often (perhaps always) rely on
culture-specific, ‘Western’ interpretive conventions and iconography. Interpretive
conventions that arise from Western literacy practices (such as the institution of
writing, the convention that print is read from left to right, and the existence of
books and other printed materials as portable individual objects) can seem so
natural to the encultured person that their artificiality is seldom noticed. For example, it seems very natural to Westerners that the ‘top’ of the page (i.e., the side
canonically held furthest from the body) can represent a higher position than the
‘bottom’ of the page. It seems natural that moving from left to right across the
page can represent the passage of time (as when in generative parlance we speak of
a word or phrase being ‘in left-most position’, meaning that it is pronounced before the rest of the sentence). Furthermore, the institutions of representational art,
and more recently photography, have entrenched the convention that, all other
things being equal, images will be read as representing shapes viewed from one
side and “in perspective”.
We are reminded of the culture-specific nature of these conventions when we
consider cultures that lack literacy (as most cultures do) and representational art.
In the traditional Aboriginal cultures of the Australian Western Desert, for example, visual representations are usually made on the ground (as sand-drawings) or
on the human body (as ceremonial designs). The usual viewpoint is from above
(an aerial view) rather than from one side. In sand-drawings the placement of elements is usually done with respect to an absolute, external frame of reference;
for example, a figure placed on the east side of the drawing represents someone
who is on the east in the scene being depicted. For someone raised in this tradition, even Western diagrams like those in Figure 1 above will not convey the
intended meanings.
Figures 5 and 6 show some figures from typical Western Desert sand-drawings
(Bardon 1979; Munn 1973: 120). In their own cultural context, figures such as the
U-shape and the concentric circles are instantly recognisable as depicting human
figures and camp or waterhole, respectively. Similarly, depictions of animal tracks
(such as kangaroo, emu, and dingo) are instantly recognisable as indicating the
presence or movements of those animals.
Figure 5a. A person
Figure 5b. A camp
Figure 5c. Two people in camp
Figure 6a. Kangaroo track
Figure 6b. Emu track
Figure 6c. Dingo track
Taking the perspective of a non-Western culture can dramatise the fact that
something like the arrow symbol (→) of Western iconography, which is heavily
relied upon in cognitive linguistic diagrams, is by no means a transparent and
purely iconic sign of movement or directionality. For someone raised in the traditional Central Australian cultures, for example, it looks more like an emu track
(Figure 6b) than anything else. The culture-specific character of visual representation warrants more detailed treatment, but for present purposes it is enough to
note that signs such as the arrow symbol and the use of left-to-right sequencing to
represent the passage of time are another way in which schematic diagrams may
covertly assume verbal support – at least, if we aspire to a representational system
which can be used across languages.
Scenarios, models and conceptual metaphor
I turn now to the role of verbal (or quasi-verbal) representation as used in work on
prototypical scenarios, cultural models, etc. and in work on conceptual metaphor.
My position is that the value of much of this work is compromised because the
language of the representations is not sufficiently theorised.
As representative of prototypical scenarios, consider the influential treatment
of anger in Lakoff (1987; cf. Lakoff & Kövecses 1987: 213f.). This is a five-stage
scenario which opens as follows:
Stage 1, Offending Event: Wrongdoer offends S. Wrongdoer is at fault. The
offending event displeases S. The intensity of the offense outweighs the intensity of the retribution (which equals zero at this point), thus creating an
imbalance. The offense causes anger to come into existence.
Stage 2, Anger: Anger exists. S experiences physiological effects (heat, pressure,
agitation). Anger exerts force on the Self to attempt an act of retribution.
Subsequent stages portray an attempt to control the anger, a loss of control leading
to an outbreak of ‘angry behaviour’, and a final stage of retribution such that the
intensity of the retribution balances that of the offense and the anger disappears.
The basic idea behind the proposal is widely accepted: that the meaning of the
English word anger is based on an ideal or prototypical scenario which involves an
experiencer construing someone else as having done something bad, and because
of that, experiencing a ‘bad feeling’ and a concomitant desire to do something bad
to this person in return. However, the formulation as given above does not actually
say this. Instead of using simple terms such as ‘do’, ‘bad’, ‘want’, ‘because’ and so
on, it is phrased in complex words such as offending event and retribution, which
obscure the semantic content rather than making it explicit. Words like offend and
retribution are surely of comparable (or greater) complexity than anger itself, and
just as deserving of explication. Presumably, additional prototypical scenarios will
be required for them. How are they to be phrased? What are the implications of
“scenarios within scenarios”? Does it matter that a child may know the word anger
(or angry) prior to acquiring the words offend and retribution?4 These are serious
questions and it is unsettling to think that they have not yet been widely identified
and addressed within cognitive linguistics.
Another cognitive linguistic tool which employs a propositional (or quasi-propositional)5 style of representation is conceptual metaphor, a notion which has
proved enormously fertile since it was introduced by Lakoff and Johnson (1980).
Canonical examples include those shown in (1a) and (1b) below, along with some
of the expressions which are supposed to instantiate them. These illustrate conceptual metaphors of the so-called ‘ontological’ type, which are supposed to establish
a set of figurative correspondences between the elements of two domains: a concrete source domain and an abstract target domain.
(1) a. THEORIES ARE BUILDINGS
       We will show that the theory is without foundation.
       We need to buttress that argument with more support.
       Some of the arguments are well constructed.
    b. ANGER IS THE HEAT OF LIQUID IN A CONTAINER
       She really got steamed up.
       He was seething.
       He just exploded.
As work on conceptual metaphors advanced, however, certain problems became
apparent. On the one hand, the correspondences between source and target domains are not comprehensive; one cannot, for example, speak of a theory having
walls or a roof, or as having inhabitants. On the other hand, many attested correspondences are not specific to particular source domains or target domains;
rather, the mappings are many-to-many (Kövecses 1995; Grady 1997). One proposal which may alleviate both problems is to re-cast the formulation in terms of
broader and more general metaphorical mappings.6 For example, Grady (1997)
proposed the metaphors shown in (1c); cf. Kövecses’ (1995: 326–328) Complex
Systems Metaphor.
(1) c. ORGANISATION IS PHYSICAL STRUCTURE
       PERSISTING IS REMAINING ERECT
I have a lot of sympathy for Grady’s proposal, but from the point of view of the
language of representation, the metaphors in (1c) are framed in terms (such as ‘organisation’, ‘structure’, and ‘persisting’) which are more abstract and remote from
ordinary usage than those in (1a) and (1b). If these terms were supposed to have
a privileged theoretical status this could be problematical, but my impression is
that Grady does not intend them as such. He evidently does not intend them to
be semantically transparent (i.e., self-explanatory), because he goes to some effort
to explain them. For example, he explains (quoting the American Heritage Dictionary) that the term ‘structure’ is to be understood as implying “a complex entity
composed of arranged parts”. The ‘parts’ of a theory (such as its premises, claims,
arguments, and supporting facts) can be seen as “arranged in certain logical relationships” in an analogous fashion to the arrangements of the physical parts
of a complex physical object. In similar fashion, Kövecses (1995: 328) explains
that ‘complex systems’ (which include theories, society, and complex interpersonal
relationships such as marriage and friendship) resemble complex objects in the following ways: “they do not exist first and then they are made; they are made for a
purpose; they have a function; they have a large number of parts that interact with
each other; they require effort to make and maintain; the stronger they are the
longer they last”.
This kind of conceptual unpackaging is moving in the right direction. Complex notions are being resolved (or partially resolved) into simpler notions, and in
the process semantic relations, which were implicit, are being made explicit. For
example, Grady and Kövecses have identified two key components of ‘structure’
as ‘something which has many parts’, and which is ‘made by people’. Once this
unpackaging has been done, the proposition that the same schema can be applied
both to abstract objects and to concrete physical objects becomes much easier to
appreciate, and, in my view, much easier to accept. Of course there is still more to
‘structure’ in the relevant sense but for present purposes we need not grapple with
the substantive details. My concern is rather with the language of the representation. If complex terms like ‘structure’, ‘organisation’ and ‘persisting’ are not the
end of the line, but merely “shorthand” (Grady 1997: 274) for other, more articulated meanings, then what is their theoretical status? Are they intermediate steps
in the chain of analysis (in the same way that, for example, certain complex organic molecules can be broken down into simpler molecules before being further
decomposed into atoms of their constituent elements)? Or are they merely approximations on the road to more elaborate and explicit analyses? In either case,
no analysis is complete until it has been resolved into the simplest possible terms.
It is legitimate to expect some theoretical account of the role of verbal elements in
the representation of conceptual metaphors (schemas, etc.).
One further theoretical issue is the question of language-specificity in the representational system. Does it or does it not matter if, at a particular level of analysis,
the terms of the analysis are specific to the language at issue? Plausibly, the answer
could depend on the level of the analysis. At an initial and fairly concrete level, one
might expect the conceptual metaphors of Russian, Japanese, or Yankunytjatjara
to fall out in terms of language-specific words of Russian, Japanese or Yankunytjatjara. At a deeper or more articulated level of analysis, however, one might expect
the terms of the analysis to become less language-specific. Endorsing a proposal
by Lakoff (1993) to this effect, Cienki (1998: 141) provides comparative evidence
from English and Russian that “higher-level metaphors for event structure are
the ones that are more likely to be shared cross-culturally, while the lower-level
metaphors are more likely to vary across cultures”.
In my view this is a very interesting proposal, worthy of systematic research
across a range of languages and cultures. On the other hand, an equally provocative (and incompatible) proposal has also been made in the literature, namely,
Mühlhäusler’s (1995) claim that even the most fundamental metaphorical mappings can have a language-specific character, so that what is literal in one language
is metaphorical in another. An issue like this goes to the heart of the cognitive linguistic project to understand the nature of human conceptualisation. Though we
cannot enter this debate here (cf. Goddard 1996a), I am mentioning it to support
my general point that issues of profound theoretical importance hinge on the metalanguage of representation, and to urge cognitive linguists to engage with these
issues in a more sustained fashion.
In summary, I have argued in this section: (a) that no matter how valuable diagrammatic representations may be, they cannot do without verbal representation, and
(b) that in any case verbal representation has played a leading role in cognitive
linguistic practice, in the areas of scenarios, schemas, and conceptual metaphors.
However, I have also argued that (c) cognitive linguistics has not sufficiently problematicised the role of verbal representations, and thus (d) runs the risk of unwittingly employing contradictory or self-defeating practices, and (e) at the same
time misses the opportunity to focus on certain fundamental theoretical issues.
As far as I am aware, the only stream of cognitive linguistics that has a well-developed position on these issues, backed by wide-ranging empirical research, is
the natural semantic metalanguage approach. That is, only this approach problematicises and theorises the role of verbal elements in semantic/conceptual representations.
3. The challenge of abstract, culture-laden vocabulary
From a methodological point of view, the need for verbal explication seems particularly pressing in relation to complex abstract vocabulary, such as terms for
emotions and attitudes (e.g., happy vs. joyful, love vs. pity), values and social ideals
(e.g., honest vs. sincere, freedom vs. duty), and speech-acts (e.g., praise vs. compliment, vow vs. swear). For words like these there seems to be no alternative to a
propositional style of representation: essentially, a verbal explication. The fact that
words of this kind tend to be highly culture specific poses an extra descriptive challenge, while at the same time adding a dimension of theoretical importance for any
approach to language that seeks to articulate culture-specific conceptualisations.
As mentioned, Anna Wierzbicka is responsible for numerous studies of
abstract culture-specific vocabulary, especially in European languages and in
Japanese; see especially Wierzbicka (1992, 1997, 1999, in press). In this section,
however, I draw on research by other NSM researchers, myself and Catherine
Travis, in specific relation to value terminology. The studies to be summarised here
both involve explicating subtle differences between apparently close translation
equivalents across languages (Malay and English, Japanese and English). My intention is to illustrate the effectiveness of the paraphrase method in cross-linguistic
applications, and at the same time to throw up a challenge to cognitive linguists
who reject this methodology to show how they would cope with the same data.
Malay ikhlas vs. English sincere
Goddard (2001a) presents a contrastive semantic analysis of the Malay cultural key
word ikhlas, and its conventional English translation equivalent sincere. In English
language newspapers in Malaysia, it is not uncommon to see sincerity identified
as one of the most important values. For example, Dr. Mahathir Mohamad, the
then Prime Minister of Malaysia, was reported as telling the 1996 General Assembly of the UMNO political party that the party supported a culture based on
“good manners, discipline, hard work, sincerity and fairness” (New Straits Times
11/10/96, 17). The National Literature Laureate, Abdullah Hussain, included sincerity among the list of values he said could be strengthened by literature (New
Straits Times 10/4/96, Life and Times, 9). A newspaper column by an Islamic educator was headed “Sincerity is pure and absolute and the panacea against all vices”
(New Straits Times 20/4/96).
Reading examples like these, one gets the feeling that the word sincerity is being
used in a peculiarly Malaysian sense – perhaps something more akin to ‘selflessness’. It comes as no surprise to find such usages have their source in the Malay
word ikhlas. Though ikhlas is usually given the gloss ‘sincere’ in bilingual Malay-English dictionaries, it can be used in a broader range of contexts than English
sincere or sincerely. There is certainly an overlap in the range of use. In particular,
ikhlas can be used to indicate that a person who is conveying some kind of ‘positive
message’ deserves to be believed. In formal contexts, this use of ikhlas can often be
translated as ‘sincere’, as in (2). In informal situations, it sounds more natural to
translate ikhlas using a phrase such as ‘(to) really mean it’. For example, someone
offering a compliment could back it up by saying (3).7
(2) Nampak-nya dia ber-cakap dengan ikhlas.
    look-3     3sg intr-talk with   sincere
    He seemed to be speaking sincerely (with ikhlas).
(3) Percaya-lah. Saya betul-betul   ikhlas.
    believe-emph I    really-really sincere
    Believe me. I really mean it.
There are, however, many contexts in which English sincere can be used but in
which ikhlas is quite impossible. For example, in English one can speak of sincerely
believing something, sincerely admiring someone, or sincerely wanting something
(see below). None of these uses are possible with Malay ikhlas. Conversely, unlike
sincere, Malay ikhlas is frequently coupled with beri ‘give’, tolong ‘help’, and other
benefactive verbs. For example, to urge someone to accept a gift one could say:
(4) Saya beri dengan ikhlas, terima-lah.
    I    give with   sincere receive-emph
    I’m giving (it) with ikhlas, accept it.
The former Malaysian P.M., Dr. Mahathir, a well-known commentator on Malay
culture, has said that Malay people have a tendency to suspect a “hidden agenda”
(the proverbial udang di balik batu ‘prawn under the rock’) behind any good
gesture: ‘Tetapi kita orang Melayu terutamanya suka sangat memikir, “Kalau dia
memberi kepada saya sesuatu apa tujuan di sebaliknya?” Kita selalu bertanya’, “But
people, especially Malays, tend to harbour thoughts (suspicions). ‘If he gives me
something, what’s the hidden motive behind it?’ we always ask” (Utusan 7/8/96,
6). To say that something is done dengan ikhlas is to repudiate the idea that there
is any hidden, self-interested motive.
In a similar vein, ikhlas is often used about cinta (roughly, ‘romantic love’) and
about domestic relationships. Used about love, ikhlas has about the same emotional “weight” as the English word true in the expression true love. It is a common
word in pop songs; for example, the singer Nurul’s (Sept. 1996) album and hit
song Ikhlasnya Cintaku ‘My love is so ikhlas’. In saying this, I don’t want to suggest
that ikhlas means the same thing as true as in true love, where, roughly speaking,
what is at issue is the “faithfulness” of the love. Rather, ikhlas is concerned with the
“purity” of the lover’s motives. For example:
(5) Cinta sejati  lahir dari hati  yang ikhlas  dan niat-nya untuk
    love  genuine born  from heart lig  sincere and desire-3 for
    kekal   selamanya. Bila hati-nya ikhlas  kita,    men-cinta-i  dia
    lasting forever    when heart-3  sincere 1du.incl tr-love-appl 3sg
    akan ber-kawan   dengan kita     tanpa   niat   buruk.
    will intr-friend with   1du.incl without desire bad
    Genuine love is born of a heart which is ikhlas and it seeks after permanence. When someone loves with an ikhlas heart he befriends us without any
    bad motives.
So far we have seen ikhlas being used in contexts where, to speak metaphorically,
it indicates that a person acts with a “pure” motive. Ikhlas can also be used to
indicate that something is done “freely”, and not as a result of being under pressure
or coercion. In discussing love and marriage, it is not uncommon to find ikhlas
opposed to terpaksa ‘forced’; for example:
(6) Per-kahwin-an   itu  biar-lah ikhlas. I tak setuju kahwin
    noml-marry-noml that let-emph sincere I not agree  marry
    ter-paksa.
    inv-force
    Marriage should be free (ikhlas). I won’t agree with a forced marriage.
The same usage can be found in politics too. For example, in July 1996 a group of
members from one political party crossed over en bloc to a rival party. In an English
language newspaper, a spokesperson was reported as saying that “he and the other
members of S46 were sincere in declaring themselves as members of Pas. . . ‘We
were not forced or persuaded by any party to join Pas’, he said” (New Straits Times
25/7/96, 4).
With this broad range of use, how can the meaning of ikhlas be explicated? In
particular, is it polysemous or not? Some Malay dictionaries give unitary definitions such as hati suci ‘pure of heart’ and putih hati ‘white hearted’, but figurative
expressions like these do not really make the meaning explicit (plus, they would
not be translatable into some languages). Other dictionary definitions are disjunctive, for example rela atau jujur ‘willing or honest’ (Kamus Harian Federal),
implying polysemy.
I suggest that ikhlas can be explicated as in [A]. This depicts an act (which
could include a speech-act) as being ikhlas if (a) the person wants to do it, (b) because he or she thinks it would be good to do so, and (c) not for any other reason.
The first two components combine the elements of “voluntariness” and “good intentions”, and the final component rules out any other causal factors being involved
(thus excluding self-interest or coercion).8
[A] X did something dengan ikhlas =
    X did something because X wanted to do it
    X wanted to do it because X thought like this:
        it will be good if I do this
    not because of anything else
Turning now to English sincere, English dictionaries tend to concentrate on the
“genuineness” aspect of the meaning. For example, the Little Oxford Dictionary
gives ‘free from pretence or deceit, genuine, frank, not assumed or put on’. This
formulation is satisfactory, as far as it goes, but it does not make explicit the fact
that one speaks of someone saying something sincerely, being sincere, etc., only
in relation to words or actions that can be seen in a positive light. For example,
one can sincerely thank, sincerely apologise, or sincerely praise, but not *sincerely
threaten or *sincerely abuse; similarly, one can sincerely admire or sincerely appreciate someone, but hardly *sincerely despise them; one may smile sincerely, but not
*snarl sincerely.
Although sincere (sincerely, etc.) can be used in a wide variety of contexts, it is
always connected with what we can call “self-expression” on the part of the speaker.
Obviously this applies in the case of the verb say itself and other speech-act verbs,
and also with expressive actions such as smile and weep. The ‘self-expression factor’ is less obvious in connection with attitude verbs such as admire and appreciate,
verbs of intention such as seek, try, and intend, and with believe (or related nouns
such as belief and conviction). However, when one considers examples such as
the following, it is clear enough that they all imply some verbal expression by
the subject.
(7) a. We sincerely appreciate your efforts.
b. Her admiration for him was sincere and unreserved.
(8) a. We sincerely hope you will take advantage of our offer.
b. Brezhnev sincerely sought peace.
(9) a. We sincerely believe that wisdom will prevail.
b. He sincerely believed that he had a mission from God.
Often, as in the (a) examples, the subject is first-person (I or we), in which case
the sentence amounts to a profession of attitude, intention, or belief by the speaker.
But even with third-person subjects, as in the (b) examples, the term sincerely (sincere, etc.) implies some ‘act of saying’. For instance, it wouldn’t make sense to say
that She sincerely admired Bill Clinton unless she had expressed this admiration to
someone. One reflection of this fact is that such sentences become unacceptable if
adverbs like secretly or privately are inserted into them (e.g., *Secretly, she sincerely
admired Bill Clinton; *Privately, he sincerely believed he had a mission from God).
Since it seems that sincerely has a special affiliation with saying, explication [B]
below sets out its meaning in the frame involving saying something. The first part
alludes to the potential perception that X spoke not from the heart, so to speak,
but because of an expectation that someone else would approve of it (because of
thinking ‘someone will think it is good if I say this’). The phrasing here is compatible with various possible motives, such as to create a good impression or to satisfy
social expectations. The next component repudiates this potential perception. X’s
real reason for speaking is that X was thinking: ‘I want to say what I think, I want
to say what I feel’.
[B] X said it sincerely =
    X said something
    someone could think that X said it because X thought like this:
        someone will think that it is good if I say this
    X didn’t say it because of this
    X said it because X thought like this:
        I want to say what I think, I want to say what I feel
    people think that it is good when someone does this
Comparing explications [A] and [B], it should be plain that the resemblance between sincere and Malay ikhlas is rather superficial. As Trilling (1972: 2) says, sincere
“refers primarily to a congruence between avowal and actual feeling”. Ikhlas, in
contrast, is not primarily about one’s true motives and feelings, but about the
goodness of one’s intentions. How could such differences be brought out
purely in terms of diagrams, without recourse to verbal explication?
Japanese omoiyari vs. English empathy
Travis (1998a) presents an insightful contrastive semantic analysis of the Japanese
cultural key word omoiyari, and its nearest English translation equivalent empathy. (Other glosses found in Japanese-English dictionaries, and in scholarly
commentaries, include ‘kind’, ‘considerate’, ‘thoughtful’, ‘sympathetic’, ‘compassionate’, ‘sensitive’, and ‘caring’.) Travis argues that a full understanding of omoiyari
provides valuable insights into Japanese culture, revealing a great deal about the
Japanese indirect communicative style, the importance of being “in tune” with
others’ unexpressed desires and feelings and the “interdependence” on which
group relations are based in Japan. Drawing on a wide range of intercultural commentaries (e.g., Lebra 1976; Barnlund 1975), she argues that omoiyari represents a
kind of intuitive understanding of the unexpressed feelings, desires and thoughts
of others, and doing something for those others on the basis of this understanding.
For example, sentence (10) is about the writer’s ex-husband.9 The use of omoiyari implies that the ex-husband did these things for the writer without her communicating to him that she would like him to do them. It implies an understanding
of the writer’s feelings, and also her desires, in terms of what she would need to live
on ‘without any trouble’ in the manner to which she has become accustomed.
(10) Watashi ga   hyaku-made    ikite-mo         komaranai-yoo      ni,
     I       subj hundred-until live:ger-even.if trouble:neg-ensure dat
     bantan     totonoete     okuridashite kudasai-mashi-ta.   (Rikon
     everything get.ready:ger set.up:ger   give.to.me-pol-past divorce
     shi-ta-to    iu     koto.) Konna     yasashii, omoiyari no   aru
     do-past-quot called thing  this.much kindness  empathy  poss be
     otto iya, wakare-mashi-ta   node      otoko-tte hajimete       desu.
     man  no   separate-pol-past therefore man-quot  first.time:ger is:pol
     He sent me away (I mean, divorced me), setting up everything for me so that
     I wouldn’t be in any trouble if I lived to be 100. I have never known such a
     yasashii (‘kind’) husband – no, I mean man, we’ve split up – with omoiyari.
Consider also the following example, where not having omoiyari (omoiyari ga nai)
implies a lack of understanding of others. This comes from the Japanese psychologist Takeo Doi’s (1971: 4) discussion of Japanese and American entertaining styles,
in which he comments on the markedly different treatment guests receive in these
two countries. While in America the guest is given a series of choices about what to
drink and how they would like it served, in Japan the host assesses what the guest
would like and serves it. The host is expected to know the guest’s desires, and to
automatically satisfy them. As for the possibility in the West of inviting guests to
“help themselves”:
(11) ‘Go jibun o   tasuke-nasai’ de        wa  funare-na      kyaku ni
     hon self  obj help-imp      well.then top unfamiliar-adj guest dat
     taishite      amari ni  mo   omoiyari no   nai kotoba-to
     regarding:ger very  dat also empathy  poss neg word-quot
     omowarenai ka.
     think:neg  ques
     To leave a guest unfamiliar with the house to ‘help himself ’ would seem
     excessively lacking in omoiyari.
In addition to having an understanding of another person, an essential part of
omoiyari is actually doing something on the basis of this understanding. This has
already been evidenced in the two examples given above, which imply that the
person with omoiyari has done or would do something for another. Similarly, a
person could not be described as having omoiyari for someone who was sick or in
trouble simply on account of “empathising” with that person. It would be necessary to actually go on and do something about it – otherwise, one would be said
simply to kawaisoo ni omou ‘feel sorry for him/her’, rather than to have omoiyari.
Another subtle point about the meaning of omoiyari is that it does not necessarily imply a focus on the other person’s feelings (specifically, wanting the other
person to feel good). The following quote describes the writer’s grandmother as
having omoiyari, and refers specifically to the grandmother’s wish to die at a time
when it was neither too hot nor too cold, so as not to cause her family the meiwaku
‘trouble’ of holding a funeral at an inconvenient time. Clearly there is no implication that the grandmother wanted to die at the right time of year so that others
would feel something good.
(12) Watashi-tachi wa, donna    toki-mo   minna  no   koto  o
     I-plural      top how.much time-even we.all poss thing obj
     kangaete  kure-ta         sobo   no   koto  o   sonna     fuu
     think:ger give.to.me-past granny poss thing obj that.much manner
     ni  hanashi, ima-made  no   yasashi-sa omoiyari o   aratamete
     dat story    now-until poss kind-ness  empathy  obj again:ger
     kanji-ta  no  desu.
     feel-past prt it.is.so:pol
     When saying such things about our grandmother, who always used to think
     of us, we all felt even more strongly her yasashii-ness (‘kindness’) and her
     omoiyari.
On the basis of this and a good deal of other evidence, Travis (1998a) proposes an
explication for the frame omoiyari ga aru ‘to have omoiyari’. It is presented here in
a modified form.
[C] X has omoiyari =
    X often thinks like this about other people:
        I can know what this person feels
        I can know what this person wants
        this person doesn’t have to say anything about it to me
        I can do some good things for this person because of this
        I want to do these things
    X does some things because of this
The explication is framed in terms of how X ‘often thinks about other people’ to
capture the notion that it refers to a permanent characteristic of someone’s personality, as opposed to something that manifests itself in a one-off incident, and that
it is not uniquely directed towards any particular other person or group of people. Subsequent components represent the notion of having an understanding of
another person, or of being “in tune” with them, without verbal communication.
The penultimate pair of components reflect the subject’s (imputed) belief that they
can do things which would be to the other person’s benefit and that they wish to
do so. The final component reflects the fact that omoiyari implies that this attitude
translates into practice in some way, i.e., that the subject actually does some things
(without specifying whether or not they are in fact beneficial to the other person
in a particular situation).
As Travis says, an explication like this is not particularly lengthy or complicated. Presented in this way, omoiyari is relatively easy to understand, as opposed
to the very complex concept it seems to be when explained via a list of apparently
close English equivalents (such as empathetic, caring, sensitive, thoughtful, considerate, and so on), each of which is both somewhat similar to and somewhat different
from omoiyari. To underscore this point, Travis presents a parallel analysis of the English concept of empathy. This word does not, of course, name a particularly salient concept
in Anglo society, but it is frequently employed in the cultural literature on Japan
as a gloss for omoiyari. Essentially, Travis’ explication (slightly modified) presents
empathy as a capacity to appreciate someone else’s bad feelings, based on being
able to imagine how it would feel to be in the same situation.
[D] X has empathy =
    X can think like this about someone else:
        I know that something bad happened to this person
        I know that this person feels something bad now because of this
        I can know how this person feels
        because when I think about it, I can know how I would feel
            if something like this happened to me
There are at least three differences between omoiyari and empathy. First, empathy is focused specifically on bad feelings; one cannot empathise with someone
who is feeling good (for example, if someone announces that they have won a trip
around the world, the response I really empathise with you is quite inappropriate
unless intended sarcastically). Second, the kind of understanding evident in empathy is not based on intuition, but on imagining oneself in the same situation as
another person, “putting oneself in their shoes”. Third, empathy does not imply
that one actually does anything for that person on the basis of one’s understanding, as omoiyari does. The existence of these various differences does not diminish
the fact that omoiyari and empathy both imply some kind of understanding of
others, but equally the importance of the shared component should not be overstated. As Travis (1998a) says, the most illuminating perspective is to be found not
in the recognition that the meanings are somehow similar, but in ascertaining exactly where the meanings coincide and where they vary. How this exploration of
cross-cultural semantics could possibly be conducted without verbal explication is
entirely unclear.
4. Concluding remarks
In Section 2, I argued on theoretical and methodological grounds for the indispensability of verbal explication. In particular I argued that diagrams cannot
stand alone without verbal support, and, moreover, that cognitive linguistic diagrams often rely on complex culture-specific iconographic conventions, which are
“smuggled in” without the necessary acknowledgment or explanation. In Section
3, I sought to show that by taking a fine-grained approach to verbal explication
one can deal with subtle nuances of culture-rich vocabulary with a degree of success which would be unattainable by other means. In this concluding section I will
take the argument one step further, by briefly considering the status of the NSM
approach to verbal explication as compared with other possible approaches. Before that, however, I want to reiterate my view that the indispensability of verbal
explication does not mean that one can or should give up diagrammatic representation. On the contrary, diagrammatic (schematic, figural) representation can
have a valuable role to play in depicting meaningful aspects of language which are
not symbolic or conceptual in nature, such as iconic-indexical effects, psycholinguistic processes, and experiential image schemas;10 see Goddard (2002b). I would
still insist, however, that even when used for these purposes diagrammatic representation cannot stand alone. It always requires some semiotic support in the form
of verbal explanation.
Now as Dirk Geeraerts (p.c. 2001) has pointed out to me, even if the arguments advanced in the main body of this paper convincingly establish the need for
verbal explication (paraphrase, definitional analysis), they are largely neutral with
respect to the choice of natural semantic metalanguage as the descriptive language:
“The argument against a purely diagrammatic form of representation supports
any form of propositional representation, whether it is couched in the NSM language, a featural representation, dictionary-like entries, or even formal semantics”.
I will therefore briefly mention several arguments in favour of the NSM approach
in comparison with these alternatives.
The most basic point is that cognitive linguistics cannot continue to approach
verbal explication in a casual manner, disregarding theoretical and empirical
grounding. If one compares dictionary-style entries and feature-based representations, on the one hand, with the NSM approach, on the other, then in my view
one sees a very sharp contrast. The NSM approach has a well developed theoretical
basis and a large body of empirical support from cross-linguistic studies conducted
over several decades (see Wierzbicka 1996; Goddard & Wierzbicka Eds. 2002 and
the works listed in Appendix 2), while the alternatives do not. As for formal semantics, it has a well-developed theory and methodology but its range of application
is very narrow, and its theoretical commitments are orthogonal to those of cognitive linguistics. In any case, I would like to see cognitive linguists who reject the
notion of paraphrase within a constrained metalanguage take up the challenge of
developing a systematic alternative method of verbal explication – be it structuralist feature-style representations, or conventional dictionary-style explanations, or
even some modified form of formal semantics. Better to see a range of positions
being debated and discussed, than to have continued theoretical indifference to a
fundamental issue.
Unlike the abstract and technical categories of feature-style analysis and formal semantics, semantic primes are demonstrably present as word-meanings in
basic vocabulary. They are grounded in the everyday linguistic experience of language users – apparently in all languages. On account of their definitional simplicity, they provide for a maximally fine-grained, explicit and transparent depiction
of conceptual meanings. Being framed in natural language (albeit a standardised
and constrained subset of natural language), NSM explications can be substituted
directly or indirectly into contexts of use and their accuracy assessed against the
evidence of usage and against native speaker intuitions.11
Finally and importantly, the natural semantic metalanguage approach is committed to avoiding the terminological ethnocentrism that arises when the vocabulary of English, and other European languages, is uncritically used as a descriptive
vocabulary for semantics. It seems obvious that to represent the concepts of widely
divergent languages and cultures in English-specific terms is necessarily to distort
the linguistic conceptualisations inherent in those languages – unless, that is, one
assumes that English and other European languages are specially “gifted” as tools
of conceptual representation. Is this indeed the assumption of those who steadfastly ignore the non-translatability of their descriptive metalanguage?12 Or are we
to assume that ethnocentrism is inevitable in cognitive linguistics (perhaps in the
interpretive sciences generally) and that one must simply “grin and bear it”? Even
so, one would like to see this position explained and defended, and to ask how the
concomitant analytical relativism can be minimised or counteracted.
The NSM approach provides cognitive linguistics with a well theorised, empirically grounded, and non-ethnocentric methodology for verbal explication. To
what extent “mainstream” cognitive linguists will choose to adopt and implement
the approach in their own work remains to be seen. I have the impression that
interest in NSM work is growing, especially among the new generation of younger
cognitive linguists. In any case, I hope to have shown in this chapter that the cognitive linguistics “mainstream” can benefit from considering the principles behind
the NSM approach.
Notes
* For helpful comments on earlier versions of this paper I would like to thank Nick Enfield,
Catherine Travis, June Luchjenbroers, Dirk Geeraerts, and an anonymous reviewer.
All correspondence concerning this article should be addressed to Cliff Goddard, School of Languages, Cultures and Linguistics, University of New England, Armidale N.S.W. 2351, Australia.
1. A good deal of Geeraerts’ (1999) critique is directed against what he sees as Wierzbicka’s
excessive reliance on semantic intuition and her “outspoken idealistic commitments”. Against
this he counterposes a so-called “empiricist” approach which proceeds “not by denying the
importance of intuition, to be sure, but by supplementing and supporting any reliance on
introspection with corpus analysis or experimentation” (1999: 170). In recent years, however,
Wierzbicka has been relying increasingly on corpus-based evidence, e.g., Wierzbicka (2002b, in
press). Geeraerts also accuses NSM researchers of a “monosemous bias”, but this charge appears
ill-conceived when one considers the meticulous documentation of grammatical and lexical polysemy undertaken in NSM studies such as Wierzbicka (1988, 1998), Goddard (2000, 2003),
among others.
2. Exponents of semantic primes in different languages are of course not expected to be equivalent in every respect. While their primary (simplest) senses can be matched across languages,
their secondary, polysemic meanings may differ widely. For example, English feel and Malay
rasa have the same primary sense, but English feel has a secondary meaning related to ‘touching’
which is not shared by the Malay word, while Malay rasa has a secondary meaning ‘taste’ which is
not shared by English feel (Goddard 2002a). It should also be pointed out that the term ‘lexical’
is used in a broad sense to include not only words, but also bound morphemes and phrasemes.
Even when exponents of semantic primes take the form of single words, there is no need for them
to be morphologically simple, and they can also have variant forms (allolexes or allomorphs).
All these factors mean that testing the cross-linguistic viability of the proposed lexical primes
is no straightforward matter. It requires rich and reliable data, and careful language-internal
analysis of polysemy and allolexy (cf. Goddard & Wierzbicka Eds., 2002).
3. That is, I am not falling into the “standard trap”, as characterised by Johnson (1987: 4), of
saying that “since we are bound to talk about preconceptual and non-propositional aspects of
experience always in propositional terms, it must follow that they are themselves propositional
in nature”.
4. A defender of the “retribution scenario” could perhaps argue that the wording is unimportant
because the scenario is not intended to represent a propositional meaning, but this “catch all”
move would considerably weaken the verifiability of the scenario. From the point of view of
representing the conceptual reality of a young child, it would surely be preferable to re-formulate
the scenario into simpler terms which are familiar to young children. For NSM studies of early
conceptual and lexical acquisition, see Goddard (2001c) and Tien (forthcoming).
5. Lakoff (1993) has argued that, despite appearances, conceptual metaphors are not propositional but are merely shorthand for a set of preconceptual correspondences or mappings.
This would make statements of conceptual metaphor “quasi-propositional”, rather than literally
propositional in nature.
6. An interesting consequence of generalising conceptual metaphors is that it partially undermines the original claim of Lakoff and Johnson (1980) that abstract concepts are understood in
terms of concrete, experience-near concepts. As noted by Grady (1997: 273), a metaphor such

as ORGANISATION IS STRUCTURE is framed in terms which are too general to represent
experiential domains.
7. Interlinear gloss symbols are as follows. 1du.incl: first-person dual inclusive, 3: third person,
3sg: third-person singular, adj: adjectiviser, appl: applicative suffix, dat: dative marker, emph:
emphatic, ger: gerundive, hon: honorific, imp: imperative, intr: intransitive prefix, inv: involuntary prefix, lig: ligature, neg: negative, noml: nominaliser, obj: object marker, past: past
tense, pol: politeness marker, poss: possessive, prt: particle, ques: question particle, quot:
quotative, subj: subject marker, top: topic marker, tr: transitive prefix.
8. Although the NSM approach assumes that words have stable meanings which can be captured
in paraphrase explications, such as [A], this does not mean that the approach is committed to
a narrow invariance hypothesis. It is recognised that in natural discourse the interpretation of
words in context is always influenced by linguistic context and by situational and cultural factors.
Important aspects of such interpretative processes can be modelled using the theory of cultural
scripts (cf. Goddard & Wierzbicka Eds., 2004).
9. Interlinear glosses have been added to the following three Japanese examples from Travis
(1998a).
10. Since this point is sometimes misunderstood, it is worth stating explicitly that upholding the
irreducibility of symbolic (conceptual) meaning in no way commits one to denying the existence
of experiential schemas. One may very well accept that embodied, preconceptual experiential
schemas underlie, constrain, and support the emergence of conceptual meaning without accepting that conceptual meaning is reducible to experiential schemas. Johnson (1987: 5) expresses a
similar view, though from the opposite perspective, so to speak: “I am perfectly happy with talk
of the conceptual/propositional content of an utterance, but only insofar as we are aware that
this propositional content is possible only by virtue of a complex web of non-propositional schematic
structures that emerge from our bodily experience” (italics in original).
11. Needless to say, these merits do not safeguard the analyst against all error. No doubt a good
deal of the extant NSM work could be revised and improved, as shown in fact by successive
revisions and improvements undertaken by NSM scholars themselves, in various domains.
12. I do not wish to imply that NSM researchers are the only cognitive linguists who are concerned with these issues. In particular one thinks of anthropologically oriented researchers, such
as Palmer (2003), who has argued that indigenous conceptualisations are best revealed by the
language-internal explanations and commentaries of native speakers.
References
Amberber, Mengistu (in press). Semantic primes and their grammar in Amharic. In C. Goddard
(Ed.), Crosslinguistic Semantics. Amsterdam: John Benjamins.
Amberber, Mengistu (2003). The grammatical encoding of thinking in Amharic. Cognitive
Linguistics, 14(2/3), 195–220.
Amberber, Mengistu (2001). Testing emotional universals in Amharic. In J. Harkins & A.
Wierzbicka (Eds.), Emotions in Crosslinguistic Perspective (pp. 35–67). Berlin: Mouton de
Gruyter.
Ameka, Felix (2002). Cultural scripting of body parts for emotions: On ‘jealousy’ and related
emotions in Ewe. Pragmatics & Cognition, 10(1), 1–25.
Ameka, Felix (1996). Body parts in Ewe. In H. Chappell & W. McGregor (Eds.), The Grammar
of Inalienability (pp. 783–840). Berlin: Mouton de Gruyter.
Ameka, Felix (1994). Ewe. In C. Goddard & A. Wierzbicka (Eds.), Semantic and Lexical
Universals – Theory and Empirical Findings (pp. 57–86). Amsterdam: John Benjamins.
Athanasiadou, A. & E. Tabakowska (Eds.). (1998). Speaking of Emotions: Conceptualization and
expression. Berlin: Mouton de Gruyter.
Bardon, Geoff (1979). Aboriginal Art of the Western Desert. Adelaide: Rigby.
Barnlund, Dean (1975). Public and Private Self in Japan and the United States: Communicative
styles of two cultures. Tokyo: Simul.
Bugenhagen, Robert D. (2002). The syntax of semantic primes in Mangaaba-Mbula. In C.
Goddard & A. Wierzbicka (Eds.), Meaning and Universal Grammar – Theory and Empirical
Findings. Volume II (pp. 1–64). Amsterdam: John Benjamins.
Bugenhagen, Robert D. (2001). Emotions and the nature of persons in Mbula. In J. Harkins &
A. Wierzbicka (Eds.), Emotions in Crosslinguistic Perspective (pp. 73–118). Berlin: Mouton
de Gruyter.
Bugenhagen, Robert D. (1994). The exponents of semantic primitives in Mangap-Mbula. In C.
Goddard & A. Wierzbicka (Eds.), Semantic and Lexical Universals – Theory and Empirical
Findings (pp. 87–108). Amsterdam: John Benjamins.
Chappell, Hilary (2002). The universal syntax of semantic primes in Mandarin Chinese. In C.
Goddard & A. Wierzbicka (Eds.), Meaning and Universal Grammar – Theory and Empirical
Findings. Volume I (pp. 243–322). Amsterdam: John Benjamins.
Chappell, Hilary (1994). Mandarin semantic primitives. In C. Goddard & A. Wierzbicka
(Eds.), Semantic and Lexical Universals – Theory and Empirical Findings (pp. 109–148).
Amsterdam: John Benjamins.
Chappell, Hilary (1986). The passive of bodily effect in Chinese. Studies in Language, 10, 271–
296.
Cienki, Alan (1998). straight: An image schema and its metaphorical extensions. Cognitive
Linguistics, 9, 107–149.
Doi, Takeo (1971). Amae no Koozoo [The Anatomy of Dependence]. Tokyo: Koobundoo.
Enfield, N. J. (2002). Combinatoric properties of Natural Semantic Metalanguage expressions
in Lao. In C. Goddard & A. Wierzbicka (Eds.), Meaning and Universal Grammar – Theory
and Empirical Findings. Volume II (pp. 145–256). Amsterdam: John Benjamins.
Enfield, N. J. (1999). On the indispensability of semantics: Defining the ‘vacuous’. In J. Mey
& A. Boguslawski (Eds.), ‘E Pluribus Una’. The One in the Many. Special Issue of RASK,
International Journal of Language and Communication, 9(10), 285–304.
Geeraerts, Dirk (1999). Idealistic and empiricist tendencies in cognitive semantics. In T. Janssen
& G. Redeker (Eds.), Cognitive Linguistics: Foundations, scope and methodology (pp. 163–
194). Berlin/New York: Mouton de Gruyter.
Goddard, Cliff (2003). Dynamic ter- in Malay (Bahasa Melayu): A study in grammatical
polysemy. Studies in Language, 27(2), 287–322.
Goddard, Cliff (2002a). Semantic primes and universal grammar in Malay (Bahasa Melayu).
In C. Goddard & A. Wierzbicka (Eds.), Meaning and Universal Grammar – Theory and
Empirical Findings, Volume I (pp. 87–172). Amsterdam: John Benjamins.
Goddard, Cliff (2002b). Ethnosyntax, ethnopragmatics, sign-functions, and culture. In N. J.
Enfield (Ed.), Ethnosyntax. Explorations in Grammar and Culture (pp. 52–73). Oxford:
Oxford University Press.
Goddard, Cliff (2001a). Sabar, ikhlas, setia – patient, sincere, loyal? A contrastive semantic study
of some “virtues” in Malay and English. Journal of Pragmatics, 33, 653–681.
Goddard, Cliff (2001b). Hati: A key word in the Malay vocabulary of emotion. In J. Harkins &
A. Wierzbicka (Eds.), Emotions in Crosslinguistic Perspective (pp. 167–195). Berlin: Mouton
de Gruyter.
Goddard, Cliff (2001c). Conceptual primes in early language development. In M. Pütz,
S. Niemeier, & R. Dirven (Eds.), Applied Cognitive Linguistics I: Theory and language
acquisition (pp. 193–227). Berlin/New York: Mouton de Gruyter.
Goddard, Cliff (2000). Polysemy: A problem of definition. In Y. Ravin & C. Leacock (Eds.),
Polysemy and Ambiguity: Theoretical and applied approaches (pp. 129–151). New York:
Oxford University Press.
Goddard, Cliff (1998a). Semantic Analysis: A practical introduction. Oxford: Oxford University Press.
Goddard, Cliff (1998b). Bad arguments against semantic primitives. Theoretical Linguistics,
24(2/3), 129–156.
Goddard, Cliff (1996a). Cross-linguistic research on metaphor. Language & Communication,
16(2), 145–151.
Goddard, Cliff (1996b). The “social emotions” of Malay (Bahasa Melayu). Ethos, 24(3), 426–464.
Goddard, Cliff (1994). Lexical primitives in Yankunytjatjara. In C. Goddard & A. Wierzbicka
(Eds.), Semantic and Lexical Universals – Theory and Empirical Findings (pp. 229–262).
Amsterdam: John Benjamins.
Goddard, Cliff (1992). Traditional Yankunytjatjara ways of speaking – A semantic perspective.
Australian Journal of Linguistics, 12, 93–122.
Goddard, Cliff (1991a). Testing the translatability of semantic primitives into an Australian
Aboriginal language. Anthropological Linguistics, 33(1), 31–56.
Goddard, Cliff (1991b). Anger in the Western Desert – A case study in cross-cultural semantics
of emotion. Man, 26(2), 265–279.
Goddard, Cliff (1990). The lexical semantics of ‘good feelings’ in Yankunytjatjara. Australian
Journal of Linguistics, 10(2), 257–292.
Goddard, Cliff (Ed.). (1997). Studies in the Syntax of Universal Semantic Primitives. Special issue
of Language Sciences, 19(3).
Goddard, Cliff & Anna Wierzbicka (Eds.). (2004). Cultural Scripts. Special issue of Intercultural
Pragmatics, 1(2).
Goddard, Cliff & Anna Wierzbicka (Eds.). (2002). Meaning and Universal Grammar – Theory
and Empirical Findings. Volumes I and II. Amsterdam: John Benjamins.
Goddard, Cliff & Anna Wierzbicka (Eds.). (1994). Semantic and Lexical Universals – Theory and
Empirical Findings. Amsterdam: John Benjamins.
Grady, Joseph E. (1997). Theories are buildings revisited. Cognitive Linguistics, 8(4), 267–290.
Harkins, Jean (2001). Talking about anger in Central Australia. In J. Harkins & A. Wierzbicka
(Eds.), Emotions in Crosslinguistic Perspective (pp. 197–215). Berlin: Mouton de Gruyter.
Harkins, Jean & David P. Wilkins (1994). Mparntwe Arrernte and the search for lexical
universals. In C. Goddard & A. Wierzbicka (Eds.), Semantic and Lexical Universals – Theory
and Empirical Findings (pp. 285–310). Amsterdam: John Benjamins.
Hasada, Rie (2001). Explicating the meaning of sound-symbolic Japanese emotion terms. In
J. Harkins & A. Wierzbicka (Eds.), Emotions in Crosslinguistic Perspective (pp. 217–253).
Berlin: Mouton de Gruyter.
Hasada, Rie (1998). Sound symbolic emotion words in Japanese. In A. Athanasiadou & E.
Tabakowska (Eds.), Speaking of Emotions: Conceptualization and expression (pp. 83–98).
Berlin: Mouton de Gruyter.
Hawkins, Bruce (1984). The semantics of English spatial prepositions. PhD thesis. University of
California.
Johnson, Mark (1987). The Body in the Mind. Chicago: Chicago University Press.
Junker, Marie-Odile (in press a). Semantic primes and their grammar in a polysynthetic
language: East Cree. In C. Goddard (Ed.), Crosslinguistic Semantics. Amsterdam: John
Benjamins.
Junker, Marie-Odile (in press b). Are there emotional universals? Evidence from the native
American language East Cree. Culture & Psychology.
Junker, Marie-Odile (2003). A native American view of the “mind” as seen in the lexicon of
cognition in East Cree. Cognitive Linguistics, 14(2/3), 167–194.
Kamus Harian Federal: Bahasa Malaysia – Inggeris – Bahasa Malaysia (1995). Mohd Salleh Daud
(Ed.). Kuala Lumpur: Federal Publications.
Kornacki, Pawel (2001). Concepts of anger in Chinese. In J. Harkins & A. Wierzbicka (Eds.),
Emotions in Crosslinguistic Perspective (pp. 255–289). Berlin: Mouton de Gruyter.
Kornacki, Pawel (1995). Aspects of Chinese cultural psychology as reflected in the Chinese
lexicon. PhD Thesis. Australian National University.
Kövecses, Zoltán (1995). American friendship and the scope of metaphor. Cognitive Linguistics,
6(4), 315–346.
Lakoff, George (1993). The contemporary theory of metaphor. In A. Ortony (Ed.), Metaphor
and Thought (pp. 202–251). Cambridge: Cambridge University Press.
Lakoff, George (1990). The Invariance Hypothesis: Is abstract reason based on image-schemas?
Cognitive Linguistics, 1(1), 39–74.
Lakoff, George (1987). Women, Fire and Dangerous Things. Chicago: Chicago University Press.
Lakoff, George & Mark Johnson (1980). Metaphors We Live By. Chicago: The University of
Chicago Press.
Lakoff, George & Zoltán Kövecses (1987). The cognitive model of anger inherent in American
English. In D. Holland & N. Quinn (Eds.), Cultural Models in Language and Thought (pp.
195–221). Cambridge: Cambridge University Press.
Langacker, Ronald W. (1999). A study in unified diversity: English and Mixtec locatives. In J.
Mey & A. Boguslawski (Eds.), ‘E Pluribus Una’. The One in the Many (pp. 215–256). Odense:
Odense University Press.
Langacker, Ronald W. (1990). Concept, Image, and Symbol. The Cognitive Basis of Grammar.
Berlin: Mouton de Gruyter.
Lebra, Takie Sugiyama (1976). Japanese Patterns of Behavior. Honolulu: The University Press of
Hawaii.
McCawley, James D. (1983). Review of Anna Wierzbicka’s Lingua Mentalis: The semantics of
natural language. Language, 59(3), 654–659.
Mühlhäusler, Peter (1995). Metaphors others live by. Language and Communication, 15(3), 281–
288.
Munn, Nancy D. (1973). Walbiri Iconography. Ithaca: Cornell University Press.
Niemeier, Susanne (1997). Introduction. In S. Niemeier & R. Dirven (Eds.), The Language of
Emotions. Amsterdam: John Benjamins.
Onishi, Masayuki (1997). The grammar of mental predicates in Japanese. Language Sciences,
19(3), 219–233.
Onishi, Masayuki (1994). Semantic primitives in Japanese. In C. Goddard & A. Wierzbicka
(Eds.), Semantic and Lexical Universals – Theory and Empirical Findings (pp. 361–386).
Amsterdam: John Benjamins.
Palmer, Gary (2003). Talking about thinking in Tagalog. Cognitive Linguistics, 14(2/3), 251–280.
Peeters, Bert (2002). Métalangue sémantique naturelle au service de l’étude du transculturel.
Travaux de linguistique, 45, 83–101.
Peeters, Bert (2000). “S’Engager” vs. “To Show Restraint”: Linguistic and cultural relativity in
discourse management. In S. Niemeier & R. Dirven (Eds.), Evidence for Linguistic Relativity
(pp. 193–222). Amsterdam: John Benjamins.
Peeters, Bert (1997a). Using the natural semantic metalanguage in the French classroom. Paper
delivered at Fifth International Cognitive Linguistics Conference, Amsterdam.
Peeters, Bert (1997b). The syntax of time and space primitives in French. Language Sciences,
19(3), 235–244.
Peeters, Bert (1994). Semantic and lexical universals in French. In C. Goddard & A. Wierzbicka
(Eds.), Semantic and Lexical Universals – Theory and Empirical Findings (pp. 423–444).
Amsterdam: John Benjamins.
Stanwood, Ryo E. (1999). On the Adequacy of Hawai’i Creole English. PhD dissertation.
University of Hawai’i.
Stanwood, Ryo E. (1997). The primitive syntax of mental predicates in Hawai‘i Creole English:
A text-based study. Language Sciences, 19(3), 209–217.
Talmy, Leonard (1988). Force dynamics in language and cognition. Cognitive Science, 12, 49–100.
Tien, Adrian (2005). The Semantics of Children’s Mandarin Chinese: The first four years. PhD
thesis. University of New England.
Travis, Catherine (2003). The semantics of the Spanish subjunctive. Its use in the natural
semantic metalanguage. Cognitive Linguistics, 14(1), 47–69.
Travis, Catherine (2002). La Metalengua Semántica Natural: The Natural Semantic
Metalanguage of Spanish. In C. Goddard & A. Wierzbicka (Eds.), Meaning and Universal
Grammar – Theory and Empirical Findings. Volume I (pp. 173–242). Amsterdam: John
Benjamins.
Travis, Catherine (1998a). Omoiyari as a core Japanese value: Japanese-style empathy? In
A. Athanasiadou & E. Tabakowska (Eds.), Speaking of Emotions: Conceptualization and
expression (pp. 83–103). Berlin: Mouton de Gruyter.
Travis, Catherine (1998b). Bueno: A Spanish interactive discourse marker. BLS, 24, 268–279.
Trilling, Lionel (1972). Sincerity and authenticity. London: Oxford University Press.
Wierzbicka, Anna (in press). English Meaning & Culture. New York: Oxford University Press.
Wierzbicka, Anna (2002a). Semantic primes and universal grammar in Polish. In C. Goddard
& A. Wierzbicka (Eds.), Meaning and Universal Grammar – Theory and Empirical Findings.
Volume II (pp. 65–144). Amsterdam: John Benjamins.
Wierzbicka, Anna (2002b). Right and wrong: From philosophy to everyday discourse. Discourse
Studies, 4, 225–252.
Wierzbicka, Anna (1999). Emotions Across Languages and Cultures: Diversity and universals.
Cambridge: Cambridge University Press.
Wierzbicka, Anna (1998). The semantics of English causative constructions in a universal-typological perspective. In M. Tomasello (Ed.), The New Psychology of Language (pp. 113–153).
Mahwah, NJ: Lawrence Erlbaum.
Wierzbicka, Anna (1997). Understanding Cultures Through Their Key Words. Oxford: Oxford
University Press.
Wierzbicka, Anna (1996). Semantics, Primes and Universals. Oxford: Oxford University Press.
Wierzbicka, Anna (1992). Semantics, Culture and Cognition. Oxford: Oxford University Press.
Wierzbicka, Anna (1988). The Semantics of Grammar. Amsterdam: John Benjamins.
Wierzbicka, Anna (1972). Semantic Primitives. Translated by Anna Wierzbicka and John
Besemeres. Frankfurt/M.: Athenäum Verlag.
Wilkins, David P. (1986). Particles/clitics for criticism and complaint in Mparntwe Arrernte
(Aranda). Journal of Pragmatics, 10(5), 575–596.
Wilkins, David P. (2000). Ants, ancestors and medicine: A semantic and pragmatic account
of classifier constructions in Arrernte (Central Australia). In G. Senft (Ed.), Systems of
Nominal Classification (pp. 147–216). Cambridge: Cambridge University Press.
Ye, Zhengdao (2004). The Chinese folk model of facial expressions: A linguistic perspective.
Culture & Psychology, 10(2), 195–222.
Ye, Zhengdao (2002). Different modes of describing emotions in Chinese: Bodily changes,
sensations and bodily images. Pragmatics and Cognition, 10(1/2), 321–356.
Ye, Zhengdao (2001). An inquiry into “sadness” in Chinese. In J. Harkins & A. Wierzbicka (Eds.),
Emotions in Crosslinguistic Perspective (pp. 359–404). Berlin: Mouton de Gruyter.
Yoon, Kyung-Joo (2004). Korean maum vs. English heart and mind: Contrastive semantics of
cultural concepts. In C. Moskosky (Ed.), Proceedings of the 2003 Conference of the Australian
Linguistics Society. [www.newcastle.edu.au/school/lang-media/news/als2003/proceedings.html]
Yoon, Kyung-Joo (2003). Constructing a Korean Natural Semantic Metalanguage. PhD thesis.
The Australian National University.
Appendix 1
Semantic primes – English exponents (after Goddard & Wierzbicka Eds., 2002)
Substantives:               I, you, someone, something/thing, people, body
Determiners:                this, the same, other/else
Quantifiers:                one, two, all, much/many, some
Descriptors:                big, small
Evaluators:                 good, bad
Intensifier:                very
Mental predicates:          want, feel, think, know, see, hear
Speech:                     say, word, true
Events and actions:         do, happen, move
Existence and possession:   there is, have
Life and death:             live, die
Time:                       when/time, now, after, before, a long time, a short time, for some time, moment
Space:                      where/place, here, above, below, side, near, far, inside, touching, be (somewhere)
Logical concepts:           not, maybe, if, can, because
Augmentor:                  more
Taxonomy, partonomy:        kind of, part of
Similarity:                 like
Notes: • primes exist as the meanings of lexical units (not at the level of lexemes) • exponents
of primes may be words, bound morphemes, or phrasemes • they can be formally, i.e. morphologically, complex • they can have different morphosyntactic properties, including word-class,
in different languages • they can have combinatorial variants (allolexes) • each prime has
well-specified syntactic (combinatorial) properties.
Appendix 2
Selected NSM studies of languages other than English
Comprehensive studies
Language                        Primes and syntax                             Descriptive semantic studies
Korean                          Yoon (2003)                                   Yoon (2004)
Lao (Tai)                       Enfield (2002)                                Enfield (1999)
Mangaaba-Mbula (Austronesian)   Bugenhagen (1994, 2002)                       Bugenhagen (2001)
Malay (Austronesian)            Goddard (2002a)                               Goddard (1996b, 1997, 2001a, b)
Mandarin Chinese (Sinitic)      Chappell (1994, 2002), Kornacki (1995, 2001)  Chappell (1986), Ye (2001, 2002, 2004)
Polish (Indo-European)          Wierzbicka (2002a)                            Wierzbicka (1997)
Spanish (Indo-European)         Travis (2002)                                 Travis (1998b, 2003)
Hawaii Creole English           Stanwood (1997, 1999)                         –

Partial studies
Language                        Primes and syntax                             Descriptive semantic studies
Amharic (Ethiosemitic)          –                                             Amberber (2001, 2003, in press)
Arrernte (Pama-Nyungan)         Harkins/Wilkins (1994)                        Harkins (2001), Wilkins (1986, 2000)
Cree (Algonquian)               Junker (in press a)                           Junker (2003, in press b)
Ewe (Niger-Congo)               Ameka (1994)                                  Ameka (1996, 2002)
French (Indo-European)          Peeters (1994, 1997b)                         Peeters (2000, 2002)
Japanese                        Onishi (1994, 1997)                           Hasada (1998, 2001), Travis (1998a)
Yankunytjatjara (Pama-Nyungan)  Goddard (1991a, 1994)                         Goddard (1990, 1991b, 1992)
For a more comprehensive listing, consult the NSM Homepage:
www.une.edu.au/arts/LCL/disciplines/linguistics/nsmpage.htm
chapter 
“How do you know she’s a woman?”
Features, prototypes and category stress in Turkish
kadın and kız
Robin Turner
Bilkent University, Turkey
This paper examines Turkish words for girls and women in order to investigate
the relationship between categorization, culture and personal interaction. In
doing so, it serves as a test case for a model which attempts to integrate prototype
and feature-based categorisation based on a distinction between defining and
typical features, both of which are subdivided into strong and weak features (the
latter being more heavily dependent on context). I also consider a phenomenon I
term category stress resulting from situations where there is a conflict between
feature-based and prototype categorisation.
Keywords: Turkish, categorization, prototype, stress
.
Introduction
It has become a common-place observation that different cultures categorise phenomena in different ways. We “cut nature up, organize it into concepts, and ascribe
significances as we do, largely because we are parties to an agreement to organize
it in this way” (Whorf 1956: 214). If we can avoid the armchair cultural linguistics
of the “Eskimos have twenty words for snow” variety, a comparison of categories
across cultures can reveal much, not only about the cultures involved, but also
about the nature of categorisation itself.1 This is perhaps clearest where ‘natural
kinds’ and ‘functional kinds’ (Lehrer 1990: 372) overlap. It may not come as a surprise that some languages draw no distinction between a turtle and a tortoise,
or a solicitor and a barrister, but when the boundaries of such basic concepts as
‘man’ or ‘woman’ are drawn differently, we might expect this to be indicative of a
difference in attitudes towards these concepts which is not merely linguistic.
The categories woman and girl provide a good example of the interaction
between ‘natural’ and ‘functional’ kinds. Humanness (as distinct from humanity)
and femaleness (as distinct from femininity) can be regarded as the result of natural discontinuities. Apart from a few extremely fuzzy cases, one requires no specific
cultural apparatus to perceive what is or is not human or female. In contrast, the
other major element in the categorisation of these terms in English is adulthood,
which is not only culturally specific but also context-specific: someone may be referred to as a ‘girl’ in one context and a ‘woman’ in another, and choosing the
appropriate term requires considerable sociolinguistic competence.
Of course, there is much more to the category woman than the supposedly
simple features [+human], [+female] and [+adult]. However, as I shall argue
later, there is some value in adopting such a traditional semantic analysis alongside
the more current prototype-based view. In deciding whether a particular human
female should be classed as a woman or a girl, simply looking for the presence
or absence of the feature [adult] is obviously simplistic; nevertheless, we can assume that, in English, our idea of adulthood, and the extent to which it applies
to a particular person in a particular context, is the most important factor in the
equation.
In other languages and cultures, though, adulthood may not be the most significant factor involved. The Turkish terms kız and kadın approximate to ‘girl’
and ‘woman’ respectively, but to refer to an unmarried woman as a kadın would
be a serious faux pas, since the most important factor in distinguishing between
kız and kadın is not age but sexual experience; all things being equal, a kız is a
virgin and a kadın is not.
It is important to bear in mind that there is an asymmetry between terms for
men and women here. Not only is sexual experience unimportant in the transition from oğlan (‘boy’) to erkek (‘man’), but the former term is itself rare: when it is
necessary to specify a male child, the term erkek çocuğu (‘man child’) is more common. Erkek seems to have only one defining feature, [+male], since it is commonly
used for male animals as well (e.g. erkek köpeği – ‘dog’ as opposed to ‘bitch’).
In traditional semantic terms, this cultural difference in categorisation can be
explained quite simply: in both English and Turkish a woman is [+human] [+female], but the languages differ in ascribing the feature [+adult] in one case, and
[–virgin] in the other. This approach is, however, inadequate in explaining examples such as “I’m going out with the girls”, which in English may be uttered by a
seventy-year-old, or in Turkish where kız can be used to greet any female friend,
irrespective of age.
(1) N’aber, kız?
what news girl?
?“How’s things, girl?”
Furthermore, there are terms such as ‘International Women’s Day’ or ‘women’s
sports’, which in both languages refer to all female humans, not just adults or non-virgins.
On the other hand, a ‘fuzzy’ prototype-based analysis is on its own equally
inadequate. A middle-aged, unmarried woman (spinster) is almost as far-removed
from the prototype of kız as she is from that of girl, but outside certain specific
contexts, she still may not be placed in the kadın category: category boundaries,
while changeable, are often anything but fuzzy.
In this study of the terms kız and kadın, I will argue that both feature-based
and prototype-based models are necessary in order to explain categorisation acts;
however, instead of a simple binary feature bundle I use a variable weighted feature approach that involves a distinction between ‘defining’ and ‘typical’ features
(which indicate category membership and prototypicality respectively), and a further distinction between those features that remain fairly constant and those whose
salience varies according to context and communicative intent. Furthermore, I also
propose the concept of ‘category stress’, which can be seen as a kind of cognitive
dissonance resulting from disparity between feature-based and prototype-based
categorisation processes. This may result from a number of factors, both contextual and cultural; it also often results in infelicitous categorisations or a search
for alternative categories. Thus, in situations where the “strictly speaking” use of
kız would apply to an item far removed from the prototype (as in the middle-aged spinster example), alternative terms such as bayan or hanım (both roughly
meaning ‘lady’) may be used.
2. Views of categorisation
Since the publication of Lakoff’s (1987) Women, Fire and Dangerous Things, it
has become common to divide theories of categorisation into traditional, feature-based semantics in one camp, and cognitive approaches based on prototypes,
metonymy and metaphor in the other, with Aristotle cast as the villain of the piece
(Wierzbicka 1990: 364). However, Aristotle himself had a more sophisticated view
of categorisation than is commonly supposed, and, with his distinction between
‘essential’ and ‘accidental’ attributes, was the first to introduce the idea that not all
features are of equal importance. The problem with the Aristotelian view lies not
so much in the distinction between essential and accidental attributes as in its failure to realise that some accidental (i.e. non-defining) attributes are anything but
accidental. To give one of Aristotle’s favourite examples, “white man” (Metaphysics,
VII(6): 1031), the attribute ‘white’ is obviously not essential to being a man, but
falls within an accepted colour-range that is probably an element in the process
of categorising a creature as a ‘man’: “white man” and “black man” indicate differences between men, but “green man” or “purple man” imply something odd is
going on (maybe what we are referring to is not actually a man but a leprechaun or
an alien, or maybe the colour term is being used metaphorically). If we view features as “focal values in a continuous cognitive space” (Jackendoff 1992: 205), we
need some set of rules for determining their relationship and relative importance,
rather than simply lumping them together.
The realisation that not all features are created equal gave rise to the “weighted
feature-bundle” approach (Coleman & Kay 1981). A simple feature bundle fails
to describe the internal structure of a category, nor does it give an accurate picture of its relationship with other categories (Langacker 1987: 19–20). Therefore
an alternative is to rank features from most to least essential. However, while the
idea of assigning different weightings to features is useful, it is still necessary to
draw a distinction between types of feature in terms of those that define a category and those that establish centrality within that category. For this reason Lehrer
(1974) proposed a distinction between ‘obligatory’ and ‘optional’ features, and
similar approaches have been adopted by Lipka (1986), and Wierzbicka (1985).
What these approaches have in common is an attempt to reconcile feature- and
prototype-based categorisations.
From a different perspective, Jackendoff (1983) and Pustejovsky (1995) have
also addressed this problem. Jackendoff in particular suggests that the combination of an atomistic feature-based system with preference rules can explain “categories with fuzzy boundaries and family resemblance properties à la Wittgenstein
and Rosch” (1992: 206). In any case, it is obvious that mere resemblance to a prototype is in itself insufficient as a basis for categorisation. As Wierzbicka (1990: 350)
points out, resemblance does not explain why “an ostrich is a bird but a bat is
not”, as the latter is in many ways closer to our celebrated prototypical robin than
the former. Furthermore, Cruse (1990: 388) argues, “It is not easy to see how the
boundaries of a category can be derived from its prototypes.”
Another problem is raised by context. Cruse points out that “It is at least possible that different criteria are used with different categories, and perhaps even
different criteria on different occasions of judgement with the same category,
under different sorts of contextual pressure” (1990: 384). Following Hymes,
the role of language as a device for categorizing experience and its role as an instrument of communication cannot be so separated, and indeed, the latter includes
the former. This is the more true when a language, as is often the case, affords
alternative ways of categorizing the same experience, so that the patterns of selection among such alternatives must be determined in actual contexts of use.
(Hymes 1972: 33)
It is clear, then, that categorisation is partly determined by contextual factors and
partly by the speaker’s state of mind and intention in communicating.
A categorisation act may also be influenced by a scenario: “a culturally defined sequence of actions; a story schema” (Palmer 1996: 75). Holland and Skinner
claim that female American students’ categorisations of types of men are based
on “a taken-for-granted relationship between males and females”, and that by using specialised terms, such as ‘jock’, ‘nerd’ and so on, women “relate types to a
set of scenarios in which the prototypical male/female relationship is disrupted”
(1987: 103). Disruption of a scenario results in what I have termed ‘category stress’,
and often leads to a search for alternative categories. In this case, such categories as ‘jock’ or ‘nerd’ could be seen as a way out when a male fails to meet the
requirements of the scenario (e.g. by spending the whole of a date talking about
football or computers). Our middle-aged unmarried Turkish woman is another
example, since in the prototypical scenario she would have married sometime between the ages of fifteen and twenty-five (Atalay 1992: 271), and would have made
the transitions from [+virgin] to [–virgin] and [–adult] to [+adult], more-or-less simultaneously. In Turkey, as in many cultures, the concepts of adulthood
and marriage are intertwined, especially for women; marriage provides a rite of
passage that enables the normally fuzzy boundary between child and adult to become much more clear-cut. A late or very early marriage can thus give rise to
category stress.
Finally, diachronic factors also play a part
in categorisation and category stress; indeed, historical linguistics as a field is largely
concerned with processes of change in categories, such as amelioration. Societies change, and their languages change with them “through time and
incessant patter” (Palmer 1996: 6), although there is frequently a time lag. Because
cultural models organise large amounts of information successfully and are thus
resistant to change (Holland & Skinner 1987: 105), the result is category stress.
From this review of the literature, we may raise the following hypotheses:
1. Both features and prototypes play an important role in categorisation; neither
approach on its own is adequate.
2. Features are not of equal importance in assigning items to a category, and can
be broadly grouped into ‘defining features’ and ‘typical features’.
3. While the status of some features remains fairly constant, that of others varies
according to a number of contextual and communicative factors.
4. Contextual and cultural factors may lead to disparity between feature- and
prototype-based categorisation (category stress).
In the rest of this paper I will test these hypotheses using the terms kız and
kadın as benchmarks. With the exception of example (13), which is presented as
a “theoretically possible” sentence, the phrases and sentences used are examples of
Standard Turkish. These data include utterances by friends and relatives (referred
to by their initials) and popular media.2 Some examples come from concordances
provided by Petek Kurtböke from her “Ozturk Corpus” (for a more detailed description of the data-collecting procedure and results of the concordancing, see
Turner 1998).3
3. Defining features of kız and kadın
I have claimed that in assigning items to the categories kız and kadın, the feature
[±virgin] is more important than [±adult], in contrast to the English categories
girl and woman, where the reverse is the case. However, this claim obviously
needs to be tested if it is not to fall into the “twenty words for snow” category.
The first piece of evidence is that the question given in (2) would receive an answer based
on the referent’s sexual experience or marital status rather than age; it can function
either as “Is she a virgin?” or “Is she married?”
(2) kız mı, kadın mı?
girl int. woman int.
“Girl or woman?”
In the prototypical scenario, loss of virginity and marriage coincide, but in practice, of course, they often do not. It is therefore necessary to establish which
criterion – [±virgin] or [±married] – is more important in categorising such
peripheral cases. After all, in some languages, such as Greek, it is marriage which
makes one a woman (gineka), rather than a girl (kopela). An example that illustrates the priority of [±virgin] is given in (3).
(3) Sen orta-okul-da-yken kadın ol-muş-sun
you middle-school-loc.-while woman become-said-2ndsing.
“They say you became a woman when you were at Middle School.”
(Comedian “Huysuz Virjin” to singer Ajda Pekkan, Star TV, 11/7/98)
Kadın olmak, ‘to become a woman’, is also defined as ‘to have one’s hymen broken’
(Türk Dil Kurumu 1998). Additionally, the medical test to establish virginity (still
legal, though widely condemned) is colloquially known as kız kontrolü, ‘girl test’.
We may conclude, then, that [+virgin] is a more important feature of kız
than [–adult], though not as vital as [+human] and [+female], since it may, on
occasion, be overridden, as we shall see later. [+virgin] in kız and [–virgin] in
kadın are defining features, but weak ones; an absence of strong defining features,
such as [+human] and [+female], marks a usage as clearly metaphorical, as in
the case of kız neyi, the smallest size of reed flute (ney), which is a metaphorical
extension of the kız category.
4. Typical features
Although [±adult] is not of prime importance in distinguishing between kız and kadın,
it is obviously important in establishing centrality in a category, and thus may be
termed a ‘typical feature’. I would also argue that it is a ‘strong’ typical feature, in
that in some contexts it may override [±virgin].
One case is where someone is not a virgin, but is nevertheless very young. An
example was provided by the media furore surrounding the marriage of Sarah, a
fourteen-year-old English tourist, to a Turkish waiter. Had Sarah been Turkish this
might not have been worthy of comment, since in this area of Turkey (Kahramanmaraş) marriage at this age, though illegal, is still common; she would thus have
made the normal transition from kız to kadın. However, once the case made the
press, Sarah was largely referred to as kız rather than the technically accurate kadın.
When I pointed this out to a Turkish-speaker, the reaction was “well, I suppose
strictly speaking she is a kadın, but . . . ” (ŞH). Similarly, outside medical contexts,
children who are the victims of rape are generally referred to as kız, as would be a girl
who had broken her hymen accidentally (TK, AA).
What seems to be happening here is a disruption to the normal scenario, or,
as I have called it, category stress. Lack of a typical feature is not usually enough
to justify exclusion from a category, but extreme cases can change the weighting of
features, so that a strong typical feature can override a weak defining feature. In the
case of Sarah, her age was seen as sufficiently atypical to override the [–virgin]
feature. Placing her in the kız category also added to the sense of moral outrage,
which draws on other typical features (or connotations) of kız, such as innocence
and vulnerability. These are associated with the feature [–adult], but probably
more so for girls than boys. On their own, these weak typical features would not
be sufficient to override a defining feature, but may add their weight to this process
in combination with [–adult].
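The interaction between feature types described here can be made concrete with a small sketch. The following Python fragment is an illustration only and not part of the analysis itself: the names Referent and categorise, and the numerical thresholds ADULT_AGE and EXTREME_YOUTH, are assumptions introduced for the example, since no cut-off ages are proposed in the text. It treats [+human] and [+female] as strong defining features, [±virgin] as a weak defining feature, and [±adult] as a strong typical feature that overrides [–virgin] only in extreme cases. Category stress is flagged whenever the feature-based and prototype-based assignments disagree.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical thresholds; the text gives no numerical cut-offs.
ADULT_AGE = 18        # assumed boundary for [+adult]
EXTREME_YOUTH = 15    # assumed age below which [-adult] overrides [-virgin]


@dataclass(frozen=True)
class Referent:
    human: bool
    female: bool
    virgin: bool
    age: int


def categorise(r: Referent) -> Tuple[Optional[str], bool]:
    """Return (label, stressed); label is 'kız', 'kadın' or None."""
    # Strong defining features: without them neither category applies literally.
    if not (r.human and r.female):
        return None, False

    # The weak defining feature [±virgin] gives the feature-based assignment.
    by_feature = "kız" if r.virgin else "kadın"

    # The strong typical feature [±adult] gives the prototype-based assignment.
    by_prototype = "kadın" if r.age >= ADULT_AGE else "kız"

    # Category stress: the two assignments disagree.
    stressed = by_feature != by_prototype

    label = by_feature
    # Only [-adult] overriding [-virgin] is modelled; the reverse is
    # described in the text as very rare.
    if by_feature == "kadın" and r.age < EXTREME_YOUTH:
        label = "kız"
    return label, stressed


sarah = Referent(human=True, female=True, virgin=False, age=14)
spinster = Referent(human=True, female=True, virgin=True, age=45)
print(categorise(sarah))     # ('kız', True)
print(categorise(spinster))  # ('kız', True)
```

Both peripheral cases come out as kız with category stress flagged, but for different reasons: in the first the typical feature overrides the weak defining feature, while in the second the defining feature holds despite the distance from the prototype.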
Not all cases need be as extreme as that of Sarah, though, and there is a fair
degree of latitude in whether to refer to a married woman as kadın or kız. In addition to the absolute age of the person referred to, her age relative to the speaker
can play a part. An older woman may refer to a younger woman as kız irrespective
of the marital status of the latter.4
It is, however, very rare for the reverse to occur – i.e., for [+adult] to override
[+virgin]. Outside contexts in which [±adult] becomes irrelevant (which will be
discussed later), I have observed hardly any instances of virgins being referred to
as kadın.
Another typical feature of kız is [+intimate], since one thing girls prototypically do is form close peer friendships. The phrase kız kıza (“girl-to-girl”) conjures
up images of intimate conversation, while erkek erkeğe has the same connotations
as its English equivalent, “man-to-man”: the emphasis is less on intimacy (though
that may be involved) and more on honesty, or in competitive situations, fairness
(erkek erkeğe dövüşmek translates as “fight fairly”). Kız kıza is a popular title for
websites dealing with “girl-talk” (e.g. Turkstudent.net 2005) and the peer intimacy
aspect has been used to market products. As we have seen in the case of “going
out with the girls”, in both English and Turkish [+intimate] can be regarded
as a strong typical feature, since it may override both [+virgin] and [–adult],
though this is not true for all languages.5 In Turkish a woman going out with her
workmates may utter a sentence like (4); in the case recorded here, nearly all the ‘girls’
were married or divorced. Interestingly, there seems to be no male equivalent
(cf. “going out with the boys/lads”). Again, then, we see an asymmetry between
gender-specific terms.
(4) kız-lar-la gid-iyor-um
girl-pl.-with go-prog.-1stsing.
“I’m going with the girls”
(NT)
There seem to be two processes at work in this apparent miscategorisation, along
with other examples, such as hadi kızlar! (‘Come on, girls!’). The first is that the
presence of even one unmarried woman in the group would make the use of kadın
infelicitous; the second is the peer-friendship element. As in English, the term
would probably not be used by an outsider, especially a male one, since what would
be understood there would not be [+intimate] but [–adult]; in other words, it
would be seen as patronising.
The case is clearer when only one person is being addressed. As we saw earlier, N’aber kız? (‘How’s things girl?’) may be used to greet any female friend,
though its use is probably rather more common in female-female than male-female exchanges.
Kız is also used paternalistically, emphasizing the [–adult] feature, and this
use is particularly common with the first person singular possessive (kızım, ‘my
girl’). This may well be a metaphorical extension of the polysemic meaning of kiz
as ‘daughter’, which I will discuss later. Like the intimate use, it may be felicitous or
infelicitous depending also on whether the speaker is seen as occupying an appropriate social/discourse role. Older friends, relatives and sometimes even strangers
are often expected to play a fatherly/motherly role, so this use of kız or kızım may
be appropriate, but it can equally well be seen as condescending. For example,
in a television debate (Siyaset Meydanı), an older male participant repeatedly addressed a young woman with whom he was arguing as kızım, and although she did
not verbally object, her anger was visible. As a viewer put it:
(5) Görü-yor mu-sun nasıl aşağılı-yor
see-prog. int.-2ndsing. how lower-prog.
“Do you see that? He’s really putting her down.”
(NT)
Note that there is no danger of infelicity when referring to a third party as kız, as
mentioned earlier.
5. Context and topic
Women and men
We have seen how context and communicative intent can cause a strong typical
feature to override a weak defining feature. In addition, strong contextual or topical pressure may sometimes simply eliminate a weak defining feature, as in
the aforementioned ‘International Women’s Day’ example. As in English, topic-based categorization in terms of “women as opposed to men” results in kadın being
stripped down to its strong defining features [+human] and [+female].6 Thus, it
is possible to say kadın to someone who would normally be a member of kız:
(6) Dünya Kadın Gün-ün kutlu olsun
world woman day-2ndpos. celebrated be-3dimp.
“Congratulations on ‘International Woman’s Day’.”
(NT, to NH, opening telephone conversation)
Similar cases arise when kadın is found in collocation with erkek (‘man’) or with
haklar (‘rights’) as in the following examples:
(7) kadın mı, erkek mi?
woman int. man int.
“Male or female?”
(ŞH)
(8) kadın ve erkek iş-çi-ler-i
woman and man work-er-pl.-pos.
“male and female workers”
(Ozturk Corpus)
(9) köprü-ler-in altı-ndan, kadın hak-lar-ı-ndan, kadın-erkek
bridge-pl.-gen. under-abl. woman right-pl.-pos-abl. woman-man
eşit-liğ-i-nden yan-a çok su-lar geç-tiğ-i için
equal-ness-pos.-abl. side-dat. very water-pl. pass-part.-pos. for
“As for women’s rights and male-female equality, much water has flowed
under the bridge.”
(Milliyet, 7/12/92)
This raises the question of whether we have a case of polysemy: one distinct
meaning of kadın as [+human][+female] and [–virgin], and another meaning
as simply [+human] and [+female]. However, as Wierzbicka (1992: 14) argues,
“polysemy must never be postulated lightly”. Since all members of the first kadın
category postulated are automatically members of the second, it might be more
parsimonious to assume that there is just one kadın category, and this category
may expand, in certain cases, by losing the [–virgin] feature. The same applies, of
course, to English woman, where the [+adult] feature is dropped under the same
circumstances.
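The single-category reading can be illustrated with a short sketch. This is a hypothetical illustration, not a claim about how speakers process the term: the names STRONG_DEFINING, WEAK_DEFINING, kadin_features and is_kadin are assumptions introduced for the example. One feature set is kept for kadın, and the weak defining feature is simply dropped when the context contrasts women with men, so no second lexical entry is needed.

```python
# Strong defining features of kadın, retained in every context.
STRONG_DEFINING = {"human": True, "female": True}
# Weak defining feature, dropped under a 'women as opposed to men' contrast.
WEAK_DEFINING = {"virgin": False}


def kadin_features(contrast_with_men: bool) -> dict:
    """Return the features required for kadın in the given context."""
    features = dict(STRONG_DEFINING)
    if not contrast_with_men:
        features.update(WEAK_DEFINING)
    return features


def is_kadin(referent: dict, contrast_with_men: bool = False) -> bool:
    """Check a referent against the context-dependent feature set."""
    required = kadin_features(contrast_with_men)
    return all(referent.get(name) == value for name, value in required.items())


unmarried_friend = {"human": True, "female": True, "virgin": True}
print(is_kadin(unmarried_friend))                          # False
print(is_kadin(unmarried_friend, contrast_with_men=True))  # True
```

Because the contrastive feature set is a subset of the default one, any referent accepted in the default context is also accepted in the contrastive context, which is the observation behind the parsimony argument.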
Other collocations
Common collocations to do with services have the effect of eliminating the
[±virgin] distinction. No one would assume that a kadın kuaförü (‘women’s hairdresser’) would only cut the hair of married women. Similarly, it is rare to think
that sexual experience is a prerequisite for seeing a kadın doktoru (‘gynaecologist’),
and if one were thrown out of a kız yurdu (‘girls’ hall of residence’) for not being
a virgin, it would be on moralistic rather than semantic grounds. The first two
cases employ the ‘women as opposed to men’ sense of kadın, while with kız yurdu
[–adult] overrides [+virgin].
In the case of occupations, the prototypical member of the collocated category
pushes out exceptions. Thus a kadın doktor (without the accusative/possessive -u
suffix) simply means ‘female doctor’. Aside from the use of kadın to mean woman
as opposed to man, it is the case that most doctors are married, and the small
number of doctors in the kız category is insufficient to warrant use of the phrase
kız doktor.
A similar consideration applies in the case of kız öğrencisi, which literally
means ‘girl student’ but in practice conveys ‘female student’. Turkish tends to force
a choice between kız and kadın for ‘female’, since the literal word for female, dişi,
is generally only used (i) for animals, (ii) as an insult through metaphorical extension (BÇ), (iii) as a way of emphasising female sexuality, again perhaps using the
animal metaphor, or (iv) in collocations that are seen as somehow ‘odd’, such as
dişi Rambo, ‘female Rambo’ (Milliyet 27/3/99). Prototypically, female students are
members of kız, and this is even extended to those students who obviously do not
belong to this category. The following television news headline illustrates this:
(10) Profesör-ler, kız öğrenci-ler-i kullan-ıyor
Professor-pl. girl student-pl.-acc. use-prog.
“Professors are using girl students.”
(Star TV News, 2/2/98)
‘Use’, in this case, means ‘have sex with’, referring to a scandal at Izmir University. Obviously if the students in question have been ‘used’, they are not strictly
speaking members of kız, but the collocation overrides this and perhaps also adds
to the sense of outrage. There is also a certain ambiguity here, though, since there
may be a sense that the professors are actually deflowering students, who would, at
the time, be classed as kız. Note that again the asymmetry applies: the male counterpart of kız öğrencisi is erkek öğrencisi (‘man student’), not oğlan öğrencisi (‘boy
student’), as seen in example (11).
(11) Brighton College, yaklaşık 500 erkek ve kız öğrencisi olan
Brighton College close-to 500 man and girl student-pos. being
yatılı bir okuldur
boarding a school-is
“Brighton College is a boarding school for around 500 male and female students.”
(Promeths.com 2005)
Collocation requires lexical conformity almost by definition. It is therefore not
surprising that collocations are based on prototypical instances, and these exclude
atypical cases.
6. Causes and effects of category stress
I have argued that category stress occurs when there is a disparity between the
results of feature-based and prototype-based categorisations. Sometimes this disparity is inconsequential, as when we call something a cup because it is used for
drinking, even though it may actually look more like a bowl. However, with categories like woman and kadın, there is more at stake. The prototypes have more
psychological impact; miscategorisation can have undesirable consequences; and
difficulty in categorisation is more stressful. All three defining features of woman
have a direct impact on social identity, and fuzziness or ambiguity in any of
these can result in discomfort, humour or even fear. To see how these reactions can be
exploited, consider Lolita ([±adult]), Twelfth Night ([±female]) or Invasion of
the Body Snatchers ([±human]).
Normally, we would expect category stress to be a rare phenomenon; if this
routinely occurred at the boundaries of categories, such categories would probably
be altered over time to avoid confusion or infelicitous usage. It would be premature to say whether features are abstracted from prototypes or whether prototypes are
constructed from commonly occurring features. In either case, the two work in
parallel; otherwise they would not work at all. Nevertheless, at the periphery of
a category there are bound to be some items that strike us as ‘odd’, like flightless
birds or promiscuous priests.
A more interesting cause of category stress is social change. I have stated that
there is usually a lag between social change and linguistic change, and this is probably greater the more socially and psychologically salient a particular aspect of the
cultural model is. In Turkey, the main causes of category stress in kız and kadın
are the rise in the average age of marriage and cultural Westernisation. In the past,
early marriage was the norm, but now it is rare for women to marry before the age
of twenty.7 Despite the strong social sanctions still in operation, this has inevitably
led to an increase in pre-marital sexual activity amongst young women (something
which had always been considered normal in young men).
As previously mentioned, one occasional result of category stress is humour.
An ostrich is seen as a funny kind of bird, and a promiscuous priest might be the
subject of a joke.8 An example of this in the case of kız is the following line spoken
by a stereotypical ‘schoolmarm’ in a comic film:
(12) Yaş-ım otuz beş. Ve kızım. El değ-me-miş. . .
age-gen.1st thirty five and girl-1st hand place-neg.-part.
“My age is thirty-five. And I am a ?girl. Undefiled . . .”
(Hababam Sınıfı)
This is quite culture-specific humour, arising from the contrast between the strict
definition (‘she is a girl’) and the prototype (‘she is not what one would expect on
hearing the word’), plus the fact that one would not normally allude so obviously
to one’s virginity.
A more common effect of category stress is alternative categorization – i.e.,
use of a different word. Hanım and bayan are both acceptable alternatives, though
somewhat formal. These can be used when one is not sure of the status of the
person in question, and they are also commonly used of older, unmarried women, in
order to avoid the [–adult] implications of kız.
Bayan, although literally meaning ‘lady’ (and also a formal title similar to ‘Ms’)
seems to be becoming a neutral term with defining features [+human] and [+female] with a typical feature [+adult].9 It is, for example, the normal term used
in sports, as in (13):
(13) tek bayan-lar-da
single lady-pl.-loc.
“In the women’s [tennis] singles.”
(Ozturk Corpus)
An extreme example of bayan shedding its ‘ladylike’ associations is
(14) bayan terörist-i
?lady terrorist-pos.
“female terrorist”
(TRT News)
However, this use was still greeted with amusement by a Turkish colleague (AA),
which seems to indicate that escaping from one form of category stress may lead
to another.
As for the features of kız and kadın themselves, it is possible that these may
eventually change to reflect changes in the cultural model. However, given the continued importance attached to virginity in Turkish society as a whole (as opposed to the
progressive urban elite), this seems highly unlikely in the near future.
. Caveats and conclusions
Perhaps because of the fluid nature of ‘meaning’, it is all too easy to think up a
semantic theory and find language examples that seem to justify it. In using kadın
and kız, I have deliberately chosen terms and contexts that place considerable
strain on a semantic model, and an ability to perform successfully in interpreting the linguistic data should not be taken as proof of the validity of that model,
but merely serve as an indication that it has potential. In particular, it should be
stressed that this model makes no claims with regard to neurology; it attempts to
explain linguistic behaviour in cognitively plausible terms, but does not presume
to assert that the brain processes semantic information in exactly the same way as
the model proposes.
A point worth emphasising is that the use of traditional semantic notation
should not be interpreted as support for the idea of atomistic binary features.
Features are themselves categories, and are subject to fuzziness, prototype effects,
metaphorical extension, and so forth. Writing, for example, [+virgin] is simply a convenient way of indicating that in the view of the person performing the
categorisation act, the item to be categorised fits their minimum criteria for virginity. Even such an apparently non-gradable category as virgin has peripheral
members, such as “technical virgin”; criteria for membership may also vary across
cultures, so strictly speaking I should have used the term [±bakire] rather than
[±virgin].10
Assignment of a positive or negative sign to features is also somewhat arbitrary. [+female] could be, and often is, written as [–male]. My choice of the
former is simply a reflection of an assumption that femaleness is not perceived
simply as the absence of maleness. I retained the conventional [–adult], rather
than using [+child], because children may be viewed teleologically as potential
adults, while women are not viewed as potential men. In the case of [+virgin] a
positive rather than a negative feature was used because in both English and Turkish virginity is seen as a positive attribute or even a possession; something that may
be ‘lost’ (English) or ‘broken’ (Turkish). This may not only be due to the cultural
importance attached to virginity, but also to the physical existence of the hymen.
It would also be possible to develop the model further to give a better idea of
the internal structure of a category. Some features are subordinates of others; for
example [+virgin] implies [+human], since one would not normally speak of a
virgin cat. Similarly, some features are typical features of categories alluded to by
other features; for example [+vulnerable] is a typical feature of [child] – i.e.,
[–adult].
One application of the model that has not been much examined in this study
is its use in describing metaphor. For example, the item kız neyi (the smallest ney,
or reed flute) is clearly metaphorical, as is (15):
(15) kız gibi araba
girl like car
“beautiful new car”
In kız neyi, the weak typical feature [+small], itself a typical feature of [–adult],
is used to create the metaphor; in (15) a metaphorical extension of [+virgin]
is used – kız gibi is explained as bozulmamış – literally meaning ‘unbroken’ or
‘unspoiled’ (AH). We can postulate, therefore, that a metaphor may not simply
involve a transfer of features from a source to a target domain, but a creative
metaphorical extension of features themselves: a kind of ‘meta-metaphor’. This
follows naturally from the assumption that features are themselves categories.
This type of feature-based approach could be useful in distinguishing between
deeply-buried metaphors, and ‘obvious’ metaphors of the kız neyi or ‘female joint’
type. It is possible that the obviousness of the metaphor has an inverse relationship to the number and strength of features transferred from the source to the
target domain; it may also depend on whether features are transferred “as is” or
are themselves metaphorically extended. For example, when Captain Kirk says
of the Enterprise “She is a beautiful woman, and I love her!” he is deliberately
confusing an object with a human. The starship lacks the features [+human],
[+female] and [+adult], and the metaphor succeeds by taking the one feature
[+female] and metaphorically extending it. On the other hand, referring to a pet
as ‘she’, rather than the customary ‘it’ for animals, is less obviously metaphorical,
since the pet possesses at least one defining feature of woman, [+female], in its own
feature-bundle, rather than in a metaphorically extended form.
The notion of category stress is, I have suggested, of use in illuminating some
types of sociolinguistic behaviour, language change and even art (as in the Twelfth
Night example), whether or not it is coupled to the particular feature analysis discussed here. Categorisation, as I have argued, reveals much about culture and I
would suggest that ‘stressful’ categorisation acts may be particularly revealing. In
fact, my interest in the kadın and kız categories, and my realisation that virginity is not only socially but also linguistically significant, arose from the following
embarrassing exchange (in English) with a Turkish student:
(16) Author: “She’s a nice woman.”
Student: “How do you know she’s a woman?”
Notes
. The idea that Eskimos have twenty words for snow is a linguistic ‘urban myth’ ably exploded
by Geoffrey Pullum in The Great Eskimo Vocabulary Hoax (1991).
. Subjects referred to are:
AA: Female, 25, English teacher
BÇ: Male, 19, student
AH: Male, 55, news photographer
NH: Female, 25, travel agent
ŞH: Female, 45 (?), housewife
TK: Female, 28, English teacher
NT: Female, 28, ceramics teacher
All are native speakers of Turkish, living in Ankara and born in either Ankara or Izmir.
. The Ozturk Corpus is collected from Australian Turkish community newspapers. With a
few exceptions (Kurtböke 1996) which are not relevant to this study, the corpus can be seen
as representative of Standard Turkish.
. I recently noticed my wife doing this, and when asked why, received the answer “I don’t
know – probably because she’s ten years younger than me.”
. In Greek, for example, an adult female does not ‘go out with the girls’, but with the women –
ginekes (Sophia Piperis, personal communication, 1998).
. This also eliminates the asymmetry of kadın and erkek, by providing erkek with the feature
[+human], since we are obviously not talking about ‘women as opposed to male creatures of any
species’. Incidentally, erkek, the commonest term for ‘man’, is never used in the sense of ‘human
being’. Occasionally its near-synonym, adam, is used like this, but normally one would say insan
(‘person’). Similarly, the problem of the ambiguous male third person pronoun does not arise,
since Turkish pronouns have no gender.
. For example, in 1991 only 28% of female 15 to 19-year-olds were married, compared to 82%
of women in the 50–54 age range who had married at 19 or younger (calculated from figures in
Atalay 1992: 271).
. It is interesting that jokes may employ both stereotypes and peripheral category members,
as in the Turkish joke about the frustrated housewife and the blind imam (unfortunately so
culture-specific as to be virtually untranslatable).
. One may refer to a child as bayan, but it is normally modified as küçük bayan, ‘little lady’.
. The Turkish for ‘virgin’, bakire (from Arabic), simply means a female with an unbroken
hymen, so a technical virgin is still a virgin. This can in turn lead to category stress, since the
importance placed on the hymen encourages a lot of ‘technicality’, in a manner reminiscent of
1950s America.
References
Aristotle (1987). Metaphysics. In J. L. Ackrill (Ed.), A New Aristotle Reader. Oxford: Clarendon
Press.
Atalay, Besir (1992). Türk Aile Yapısı Araştırması. Ankara: DPT Sosyal Planlama Genel
Müdürlüğü.
Coleman, Linda & Paul Kay (1981). Prototype Semantics: the English word, lie. Language, 57,
26–44.
Cruse, D. A. (1990). Prototype theory and lexical semantics. In S. Tsohatzidis (Ed.), Meanings
and Prototypes: Studies in linguistic categorization (pp. 382–402). London: Routledge.
Holland, Dorothy & Debra Skinner (1987). Prestige and intimacy: the cultural models behind
Americans’ talk about gender types. In D. Holland & N. Quinn (Eds.), Cultural Models in
Language and Thought. Cambridge: Cambridge Univ. Press.
Hymes, Dell (1972). Towards Ethnographies of Communication: the analysis of communicative
events. In Pier Paolo Giglioli (Ed.), Language and Social Context. Harmondsworth: Penguin.
Jackendoff, Ray S. (1983). Semantics and Cognition. Cambridge, MA: The MIT Press.
Jackendoff, Ray S. (1992). Languages of the Mind: Essays on mental representation. Cambridge,
MA.
Kurtböke, N. Petek (1996). A Corpus-Based Analysis of the Turkish Community Newspapers in
Australia: a progress report. In Proceedings of the VIIIth International Conference on Turkish
Linguistics. August 7–9 1996. Ankara.
Lakoff, George (1987). Women, Fire, and Dangerous Things: What categories reveal about the
mind. Univ. Chicago Press.
Langacker, Ronald W. (1987). Foundations of Cognitive Grammar: volume I: theoretical
prerequisites. Stanford, CA: Stanford University Press.
Lehrer, Adrienne (1974). Semantic Fields and Lexical Structure. Amsterdam: North-Holland.
Lehrer, Adrienne (1990). Prototype theory and its implications for lexical analysis. In S.
Tsohatzidis (Ed.), Meanings and Prototypes: Studies in linguistic categorization (pp. 368–381).
London: Routledge.
Lipka, Leonhard (1986). Semantic Features and Prototype Theory in English Lexicography. In D.
Kastovsky & A. Szwedek (Eds.), Linguistics Across Historical and Geographical Boundaries.
Berlin: Mouton de Gruyter.
Palmer, Gary (1996). Towards a Theory of Cultural Linguistics. Austin: University of Texas Press.
Promeths.com (2005). [Online] “İngiltere’de Ortaokul ve Lisede Misafir Öğrencisi” Available at
http://www.promeths.com/programlar/ortaokullise/ingiltere.php
Pullum, Geoffrey K. (1991). The great Eskimo vocabulary hoax, and other irreverent essays on the
study of language. Chicago: University of Chicago Press.
Pustejovsky, James (1995). The Generative Lexicon. Cambridge, MA: MIT Press.
Türk Dil Kurumu (1998). Güncel Türkçe Sözlük [on line] Available at: http://tdk.org.tr/
sozluk.html
Turkstudent.net (2005). Kız kıza. [Online] Available at: http://www.turkstudent.net/cat/558
Turner, Robin (1998). Culture, context and categorisation: a feature- and prototype-based study of
Turkish terms for women. Unpublished MA dissertation, Surrey University.
Whorf, Benjamin L. (1956). Language, Thought and Reality: selected writings of Benjamin Lee
Whorf (Ed. John B. Carroll). New York: Wiley.
Wierzbicka, Anna (1985). Lexicography and Conceptual Analysis. Ann Arbor: Karoma.
Wierzbicka, Anna (1990). ‘Prototypes Save’: on the uses and abuses of the notion of ‘prototype’
in linguistics and related fields. In Savas L. Tsohatzidis (Ed.), Meanings and Prototypes:
studies in linguistic categorization. London: Routledge.
Wierzbicka, Anna (1992). Semantics, Culture, and Cognition: universal human concepts in
culture-specific configurations. Oxford: Oxford University Press.
chapter 
Cross-linguistic polysemy in tactile verbs*
Iraide Ibarretxe-Antuñano
University of Zaragoza
The link between the semantic field of tactile perception and that of emotions
has been long established (Kurath 1921; Buck 1949). Within the framework of
Cognitive Semantics, Sweetser (1990) analyses the semantic extensions that occur
in perception verbs. Taking Sweetser’s study as a starting point, in the first half of
this paper, I analyse the metaphorical scope of tactile verbs, not only in English,
but also in two other languages, Basque and Spanish. In the second half, I explain
how these polysemous structures are obtained, what the semantic packaging of
these extended meanings is. In other words, how the semantic content of the
lexical items (tactile verb and arguments) interacts and contributes to the
creation of each semantic extension.
Keywords: polysemy, metaphor, touch, cognitive linguistics
.
Introduction: Tactile perception and emotions
The link between the semantic field of tactile perception and that of emotions has
been long established. Authors such as Kurath (1921) and Buck (1949) pointed out
the relationship between these two domains in Indo-European languages as early as
the first half of the twentieth century. Although these studies are thorough investigations into the etymology and polysemous senses of tactile words, they do not
provide a motivated account of why these different meanings are related to these
words in particular. More recent studies within the cognitive semantics framework (Lakoff 1987; Johnson 1987; Langacker 1987, 1991) have tried to show that
the polysemous structure of tactile words is motivated. That is to say, the fact that
a lexical item has different meanings is not whimsical, but motivated by our experience and understanding of the world. These different meanings are not random,
but structured by means of cognitive devices such as metaphor.
Within the cognitive semantics model, Sweetser (1990) analyses the semantic extensions that occur in perception verbs. Like Kurath and Buck, she relates
the physical sense of touch to emotional feeling and to the general sense of perception.1
She also proposes that these extended meanings are not particular to English
only, but cross-linguistic.
Taking Sweetser’s study as a starting point, in the first half of this paper, I have
analysed the different meanings conveyed by tactile verbs, not only in English but
in two other languages: Basque and Spanish. Following Kövecses’ (1995, 2000) terminology, the ‘metaphorical scope’ of the verbs of touch in these three languages
seems to be broader than that proposed by the studies mentioned above.2
The aim of this paper, however, is not only to give an account of the meanings conveyed by these verbs in these three different languages, but also to explain
how these polysemous structures are obtained, what the semantic packaging of
these extended meanings is. In other words, how the semantic content of the lexical items (tactile verb and arguments) interacts and contributes to the creation of
each semantic extension. Previous studies on polysemy (Brugman 1981; Vandeloise 1991; Herskovits 1986; among many others) offer detailed descriptions of
the different polysemous senses of specific lexical items, the relations that hold
among them, the conceptual motivation for such relations, and so on. However, these analyses do not explicitly address the question of whether
these different meanings arise from the different senses of a polysemous verb
through the interaction between the semantics of the verb and that of its arguments, or
whether it is the choice of a particular argument that really determines the different
meanings. I examine this issue in the second half of this paper.
. Metaphorical scope of tactile verbs revisited
The semantic field of tactile perception is usually linked only to the domain of
emotion. However, a review of the different meanings that these verbs can convey
in English, Basque and Spanish shows that they map not only onto the
field of emotions but also onto other semantic fields.3
The verbs used in this case are touch in English,4 ukitu in Basque,5 and tocar
in Spanish. The linguistic data come from three different sources: (i) monolingual
and bilingual dictionaries; (ii) several corpora: English (The Lancaster-Oslo/Bergen
Corpus-LOB, The British National Corpus-BNC),6 Basque (Present-day Basque
Reference Corpus-EEBS), and Spanish (Reference Corpus for Present-day Spanish-CREA); and (iii) examples for the most part constructed by me, occasionally on
the basis of an utterance that I have seen or heard used.7 Native speakers were
always consulted concerning the naturalness of these examples.
In the first instance there are two concrete extended meanings found in the
three languages. One meaning is ‘to partake of food or drink’ as in (1), (2) and (3).
(1) John hardly touched the food
(2) Jonek ez du ia janaria ikutu
john.erg neg aux.3s hardly food.abs touch.per
“John hardly touched the food”
(3) Juan apenas ha tocado la comida
john hardly has touched the food
“John hardly touched the food”
In these three examples we learn that John did not eat much of his food, so in
these cases, the meaning is ‘to partake of food’. If we change the direct object food
for drink, then the meaning will be ‘to partake of drink’ instead. It has been suggested (Barcelona, p.c.) that instead of having the meaning ‘to partake of food or
drink’, which is too specific, it would be better to propose a more general meaning like ‘to partake of something’. That would cover not only sentences like John
hardly touched the food, but also examples like I didn’t touch a penny of your money.
Although this proposal is sensible to some extent, I keep the former for two reasons. First, because several dictionaries contain this entry as a separate one (cf.
am, col). Second, because intuitively these two sentences do not imply exactly the
same meaning. In my opinion, the inferences resulting from the two examples are
different. A sentence like John hardly touched the food can only make reference to
one action ‘to eat’ (or ‘to drink’ if we change the direct object to a drink), and
the verb ‘to touch’ can be replaced by the verb ‘to taste’. In the second sentence,
the verb ‘to touch’ is not related to the meaning ‘to eat’ (or ‘to drink’) and therefore, this substitution for ‘to taste’ is not possible. Here the meaning refers more
to the fact that I have not taken any money from that person, where ‘taken’ can be
understood as the physical action of grabbing something, if not ‘to steal’ it.
Another physical meaning is ‘to affect’ as in (4), (5) and (6).
(4) Just don’t touch anything in my room
(am)
(5) Nork ukitu nau, nork ukitu ditu nire soinekoak?
who.erg touch.per aux.1s who.erg touch.per aux.3s my dress.abs.pl
“Who touched me, who touched my dresses?”
(is)
(6) ¿Quién tocó mis vestidos?
Who touched my dresses
“Who touched my dresses?”
These three examples imply that not only has physical contact occurred, but there
has also been a change of location. In (4), the speaker does not want the other person to change anything in his/her room, whereas in both (5) and (6) the speaker is
asking who moved the dresses from the place
they were in before. This meaning, which I term ‘to affect’, also has a metaphorical
extension as we shall see below.8
As far as metaphorical meanings are concerned, there are three meanings:
‘to affect’, ‘to reach’ and ‘to deal with’. We have already seen that ‘to affect’ can
be understood physically as in (4), (5), and (6), but it also has a metaphorical
interpretation as in the examples below.
(7) The appeal touched her heart
(lob)
(8) Edertasunak ukitu du azkenean Iñakiren bihotz gogorra
beauty.erg touch.per aux.3s end.loc iñaki.gen heart strong.abs
“In the end, beauty changed Iñaki’s hard feelings”
(is)
(9) Juan le tocó el corazón a María
john she.dat touched the heart to mary
“John touched Mary’s heart”
(cse)
In these examples what is affected is the emotional side of the person in question.
In (7), the appeal was very emotive to this person; she was not able to remain
with the same feelings or ideas she had before hearing it. In (8), Iñaki’s feelings
are changed too, as a result of the beauty that he saw in a person or thing. Finally,
in (9), John also affected, i.e. changed, Mary’s feelings. Although the emotional
perspective of touch has been seen as an independent metaphorical mapping
(Sweetser 1990: 37/43), I would like to include it as part of this wider meaning
domain ‘to affect’. There are other examples in these languages where we have the
same ‘contact-to-effect’ chain and that can also be included under this label. For
instance, in Basque there is the expression ardoa ukitu, (lit.) ‘touch wine’, which
means that the wine is spoilt and can no longer be drunk. In Spanish, when a person wins the lottery it is very common to say Me tocó la lotería, (lit.) ‘the lottery
touched me’, in which case the lottery is the agent that provokes the change in me;
that is to say, I became rich.
A second metaphorical meaning is ‘to reach’ as in (10), (11), and (12) below.
(10) He touched the high point in his career
(col)
(11) 1685etik aurrera agintearen gailurra ukitu zuena
1685.abl forward mandate.gen top.abs touch.per aux.3s.who
“He who reached the top of his mandate from 1685 onwards”
(is)
(12) Ha tocado el punto más alto de su carrera
has touched the point most high of his career
“He has reached the peak of his career”
(osd)
These three examples imply that there is a point, an aim to be reached or that the
moment to do something or end-point has arrived. In (10), (11) and (12), this
end-point is the success achieved in a career.9 In other cases, as in (13) and (14),
the end-point is spatial.10
(13) The ship touches at Tenerife
(col)
(14) meros transeúntes que han tocado puerto mientras hacían cruceros. . .
mere.p passerby.p which have touched port while made cruises
“. . .Just travellers who arrive here while they are on a cruise. . .”
(crea)
The ship in (13) and the passengers in (14) have arrived at their destination, at the
dock. In both examples, the fact that the ship is going to stay in the dock for a brief
period of time is also implied.11 In Spanish, however, this is not always the case:
(15) El barco tocó puerto ayer
the ship touched port yesterday
“The ship arrived yesterday”
In (15), the information we are given is simply that the ship arrived, but not about
the length of time it will stay.
In Spanish there is a further usage of this meaning ‘to reach’ in the sense of
‘reaching the time to do something’, as in (16), where it is implied that the time
to pay has come, and in (17), where we are about to reach the end of a five-year
period. This usage is
interesting because it is etymologically related to the onomatopoeic origin of the
verb tocar. In the past, the tolling of bells was used to announce events in villages.
Even today one can hear church bells calling people to prayer. In
Spanish this is referred to as tocar a misa, (lit.) ‘touch to mass’. Nowadays, we do
not use bells for these matters anymore, but we use the same construction tocar a,
which reflects this tradition, to indicate that the time to do something has come.
The end point is temporal in these examples.
(16) Tocan a pagar
touch.3.p to pay
“It is time to pay”
(rae)
(17) Durante estos cinco años que ya tocan a su fin. . .
while these five years which already touch.3.p to their end
“In this five year period that is about to end. . .”
(crea)
A third metaphorical meaning in the sense of touch is ‘to deal with’ as in (18),
(19), and (20).
(18) I wouldn’t touch that business
(am)
(19) Nik ez nuke gai hori ikutuko
I.erg neg aux.1s topic that.abs touch.f
“I wouldn’t touch that issue”
(20) Hasta el momento no ha tocado el tema de la alfabetización
until the moment neg has touched the topic of the literacy
“Until now he hasn’t dealt with the literacy issue”
(crea)
In these examples we are told that these people do not want or have not yet had the
chance to deal with a specific subject (a business in (18), and some kind of issue in
(19) and (20)). If we insert adverbial expressions such as luzez, ‘for a long time’, or
en muchas ocasiones, ‘on many occasions’, the meaning ‘deal with’ shifts
slightly, as in examples (21) and (22).
(21) Unibertsitate-gaia luzaz ukitu dut
university-topic.abs long.in touch.per aux.1s
“I’ve dealt with university matters for a long time”
(is)
(22) En muchas ocasiones hemos tocado el tema de una posible intervención de las fuerzas armadas
on many occasions have.1p touched the topic of a possible intervention of the forces armed
“We have dealt with a possible intervention by the armed forces on many
occasions”
(crea)
Due to the semantics of these specific adverbial expressions, what we imply is that
we have dealt with the same subject for quite a long time, repeatedly. As a result
we become very familiar with the subject, and come to know it fairly well. The
meaning shifts from ‘deal with’ to ‘be familiar with’ (know by experience).
Tocar can also mean ‘to deal with superficially’, as can touch in English when a word
like barely and/or the preposition on is inserted, as in (23) and (24) respectively.
(23) He barely touched on the incident in his speech
(amgd)
(24) To some extent I shall be touching on points already made by previous speakers
(lob)
In summary, the four major semantic extensions in tactile verbs analysed in this
section include: ‘partake of food/drink’, ‘affect’ (physically and metaphorically),
‘reach’, and ‘deal with’. These semantic extensions represent four different ways in
which the domain of tactile perception is conceptually linked to different experiential domains. These ‘links’ or mappings between domains are shared by the
three languages under investigation, English, Basque and Spanish. The fact that
these polysemes are found in different genetically unrelated languages should not
come as a surprise. These mappings take place at a conceptual level. As has been argued in the Cognitive Linguistics literature (cf. ‘embodiment’, Johnson 1987), this
conceptual level represents the way we understand and interact with the world;
our own experience of what surrounds us. We, as human beings, have the same
perceptual apparatus for the sense of touch, and therefore, it is only natural that
the experiences that we have with the sense of touch – how we perceive with this
sense, its limitations and advantages, the type of information available through
this sense – are used as the conceptual basis for these metaphorical meanings.12
But, in the case of Basque, English, and Spanish speakers, we also have to bear in
mind that they are entrenched in the same Western culture (cf. Gibbs 1999), and
therefore, they share – at least, as far as these meanings are concerned – a common
view and conceptualisation of the tactile sense.
The role of culture in conceptualisation is an important issue because it has
been shown that in some cases the ‘embodiment’ of the senses is not sufficient
to explain why certain sensory modalities are linked to certain cognitive
processes (cf. Classen 1993; and Howes 1991, for an enlightening exposition of
how the senses are conceptualised by different cultures). As Ong (1967 [1991: 26–
27]) puts it:
Cultures vary greatly in their exploitation of the various senses and in the way
in which they relate their conceptual apparatus to the various senses. It has been
a commonplace that the ancient Hebrews and the ancient Greeks differed in the
value they set on the auditory. The Hebrews tended to think of understanding
as a kind of hearing, whereas the Greeks thought of it more as a kind of seeing,
although far less exclusively as seeing than post-Cartesian Western man generally
has tended to do.
The relation between visual/auditory perception and cognition is a good example to show that culture really matters. In Cognitive Linguistics, the link between
vision and cognition has been generally accepted as one of the most consistently
universal mappings in this domain. Authors such as Sweetser (1990) suggest that vision has primacy as the modality from which verbs of higher intellection, such
as ‘knowing’, ‘understanding’, and ‘thinking’, are recruited, whereas hearing verbs,
such as hear or listen, would not take these readings, because they are more “connected with the specifically communicative aspects of understanding, rather than
with intellection at large” (1990: 43). Although it is true that this correspondence
is systematically found in many languages, and certainly, in the three languages
under investigation here (cf. Ibarretxe-Antuñano 1999a, 2002), it is far from being
universal. Evans and Wilkins (2000) have shown that Australian languages do not
conceptualise intellection as vision, but as hearing. Furthermore, these authors
claim that one of the possible reasons that could explain why Australian languages behave differently has to be found in the cultural and social practices of
the Aboriginal people.
In our case, we find that there are no significant differences in the conceptualisation of the sense of touch in English, Basque, and Spanish. Speakers of these
three languages, therefore, seem to share both their experience and understanding
of the tactile sense, and the background and practices of Western culture – despite
individual differences.
Another issue that I would like to point out is that the semantic extensions
that we have discussed in this paper are not the only ones that can be found in
English, Basque, and Spanish. As discussed elsewhere (Ibarretxe-Antuñano 1999a,
2002), each of these languages creates further mappings from the domain of touch
onto other semantic domains. In English, for example, the verb touch can convey
the meaning of ‘to ask for a loan’, as in Touch a friend for five dollars (am). In
Basque, we also find the semantic extension ‘to consider, to weigh up’ with the
verb haztatu ‘touch’. In Spanish, the verb tocar ‘touch’ also means ‘to be a relative’
and ‘to fall to’. However, it is important to notice that the usage of these meanings is
quite peripheral in comparison with the other extensions discussed above. There
are two reasons that support this: (i) these extensions – especially in the case of
English and Basque – are restricted to certain dialectal variations. The meaning ‘to
ask for a loan’ is typical of American English, and the Basque verb haztatu is more
common in northern dialects; (ii) these meanings are hardly ever used. Although
we would need a statistical analysis of corpus data to be really sure about their
status, a random search of a hundred examples in the corpora that we have used
in these three languages retrieves no cases of these usages.
The following sections examine how these polysemous senses are lexicalised.
The main goal is to test whether these semantic extensions emerge from interaction between the semantic content of the verb and that of its arguments; and then,
to determine what elements intervene in the lexicalisation, as well as to what extent each element is semantically responsible for such meanings. Unlike the mappings
described in Section 2, which are shared across the three languages, the lexicalisation tools and techniques that languages possess
vary from one language to another. As a consequence, the lexicalisation patterns of these
semantic extensions are language-specific rather than cross-linguistic.
. Compositional polysemy: The semantic packaging of lexical items
A word is understood as polysemous if all its multiple meanings are systematically related. One of the most important goals in Cognitive Linguistics has been to
show that the multiple semantic extensions of a lexical item are related not in an
arbitrary but in a systematic and natural way by means of several cognitive mechanisms such as image schemas, metaphor and metonymy. Numerous studies within
this framework have shown that this is a strong hypothesis. A classical example is
the analysis of the preposition ‘over’ (Brugman 1981; Lakoff 1987).
These authors offer a very detailed exposition of the relationships among the
different semantic extensions of the preposition ‘over’. However, neither of them
explicitly acknowledges that these meanings are possible not only thanks to the
semantic content of the preposition itself, but also to the insertion of very specific
lexical items. Let us consider some examples to illustrate this point.
The central meaning of ‘over’ is one that combines elements of both ‘above’
and ‘across’ as in (25). The ‘above-across’ meaning has several variants as in (26),
(27), and (28):
(25) The plane flew over
(26) The bird flew over the yard
(27) Sam climbed over the wall
(28) Sausalito is over the bridge
These are just four different examples taken from Brugman’s analysis of the preposition ‘over’. According to this author, the central sense of the preposition ‘over’
(‘above-across’), has different variants depending on (i) the contact or no contact
between the LM and TR; (ii) the position and extension of the LM, and (iii) the
endpoint focus. However, not all these extra bits of information are contained in
the preposition itself; instead, they are contained in other elements of the sentence.
For instance, the fact that in some cases ‘over’ implies contact, is not inferred from
the preposition but from the verb used. In (27), the information provided by the
verb, ‘climb’, automatically entails that there is contact between the subject, “Sam”
(the TR) and “the wall” (the LM), because it is impossible to climb a wall without touching it. In a similar way, the no-contact characteristic of ‘over’ in (25) and
(26) is also implied by the verb ‘to fly’. In most cases, when we say that something
is flying, we visualise the flying object (bird, plane. . .) as not touching any surface.
In (27), the additional information that the LM is vertical, is not only provided
by the LM (“the wall”) itself, but also by the verb ‘to climb’, which by default implies an upward movement. Even in the case of end-point focus, as in (28), where the meaning is said not to be added by anything in the sentence but to be “the result of a general
process that applies in many, but not all English prepositions” (Lakoff 1987: 424),
the other members of the sentence contribute to this meaning. Without the static
verb ‘to be’, which implies that there is no movement, and ‘the bridge’ (a structure
with a beginning and an end), the end-point focus could not be inferred.
Based on these examples,13 it can be argued that the polysemy in the preposition ‘over’ is not only obtained by the semantic content of this preposition, but
also in conjunction with the semantic content of the words that accompany it in
the sentence in which it occurs. The emergence of different senses from an interaction between a preposition and co-occurring elements is not an isolated case.
A similar situation can be found in the case of the semantic extensions of tactile
verbs described in Section 2. For example:
(29) John hardly touched the food
One of the cross-linguistic extensions of tactile verbs is ‘to partake of food (or
drink)’ as illustrated in (1), reproduced here as (29). The reason why we interpret
this sentence with this sense lies not only in the presence of the verb ‘to touch’,
but also in those elements that directly complement it, such as “the food” and the
adverbial “hardly”. Without either of these two elements, it would be impossible
to infer a meaning like ‘to partake of food’. If we removed the adverbial, as in John
touched the food, the meaning would correspond to either the prototypical meaning of touch, or to the semantic extension ‘affect’. If we change the complement
that denotes some kind of edible object for some other concrete element as in John
hardly touched the table, the interpretation of this sentence would be the same as
in the case before: a prototypical ‘touch’ or ‘affect’. The same situation occurs both
in Basque and Spanish.
(30) Jonek janaria ikutu du
john.erg food.abs touch.perf aux.3s
(31) Juan tocó la comida
john touched the food
(32) Jonek ez du ia mahaia ikutu
john.erg neg aux.3s almost table.abs touch.perf
(33) Juan apenas tocó la mesa
john hardly touched the table
If the adverbs ia ez and apenas are removed, as in (30) and (31),14 or if the complement is exchanged for one such as mahai and mesa, as in (32) and (33), we obtain
interpretations similar to those in the English examples. Therefore, it is possible
to predict that whenever the verb ‘to touch’ takes a complement that refers to an edible
object, together with an adverbial such as ‘hardly’, the meaning is ‘to partake of food’.
The situation is somewhat different in the following examples.15
(34) Nork ukitu ditu nire soinekoak?
who.erg touch.perf aux my dress.abs.p
“Who touched my dresses?”
(35) Ha tocado el punto más alto de su carrera
has touched the point more high of his career
“He touched the highest point in his career”
In (34), the extended meaning is ‘to affect, physically’. Someone has changed the
state in which the clothes were and we want to know who that person is. In order
to infer this meaning we need an entity that is able to carry out the action of touching,
as well as an entity that can be touched by the subject. Unlike in (33), where the
choice of both subject and complement is not very wide, there are many entities
that can carry out both tasks (see previous examples (30), (31), (32), and (33)).
This meaning does not depend upon such a restrictive choice of arguments.
The same statement can be made about the second sentence. In (35), the
meaning is ‘to reach’. In this case, the fact that an end-point is implied is not only
conveyed by the nature of the tactile verb itself, but also by the complement el
punto más alto, ‘the highest point’, that denotes a limit to that metaphorical action
of ‘touching’. And since el punto más alto is without dimension, we get the achievement reading of ‘to reach’. As in the other examples, there are many other entities,
like ‘bottom’ and ‘eternity’, that can be placed in this position.
In these two examples, the semantics of the other elements of the sentence
plays a role in the overall meaning, but the importance of these elements is not
as decisive as in the previous example (33). In order to obtain the meanings ‘to
affect, physically’ and ‘to reach’, it is necessary to have subjects who are able to
touch, and complements that can be touched. The achievement of that meaning,
however, is not as dependent on these arguments as in (33). In (34) and (35) the
intrinsic meaning of the verb itself plays a much more important role than that of
its arguments.
Finally, we have sentence (36) with the meaning ‘to affect’.
(36) John touched Mary
This sentence is highly ambiguous; several interpretations are available simultaneously.
Sentence (36) can imply physical contact between John and Mary, i.e. the prototypical meaning of touch; the meaning ‘to affect, physically’, as in a situation
where Mary does not expect John and he makes her shiver when he touches her;
and the meaning ‘to affect, metaphorically’, in which case an emotional reaction
from Mary is implied. Without any more information about the context in which
this sentence is uttered, one cannot decide whether (36) should be interpreted
physically or metaphorically.
Unlike in the other examples, the interpretation of (36) cannot be predicted from the semantic properties of the arguments that the verb takes. “John” and “Mary” are too vague to
constrain the semantic extension that takes place in this example. In the case of
the meaning ‘to partake of food’, the complement “the food” constrains the semantic extension of the verb, because there are not many things that can be
done with food apart from eating it, cooking it, and so on. With “John” and “Mary” the case
is different: the possibilities for these two entities are infinite, and yet the meanings include only the prototypical meaning and ‘to affect’ (both physically and
metaphorically).
In sum, based on these sets of examples, we can divide these polysemous senses
into two groups. On the one hand, examples like (36), where it is not possible
to predict what the interpretation is by means of the choice of arguments, are
called ‘unpredictable’ cases of polysemy; and on the other hand, those where the
choice of arguments leads to a specific predictable extension of meaning are called
‘predictable’ cases.16 The latter are further classified according to the degree of
influence of the semantics of the arguments involved: a meaning such as
‘to partake of food’ is mainly determined by the arguments and other elements
in the sentence, whereas in meanings like ‘to affect, physically’ and ‘to reach’
it is the verb that mainly governs the choice of arguments and meaning.
The former are called ‘argument-driven extensions’ and the latter, ‘verb-driven
extensions’. These two groups, therefore, reveal that the weight of the semantics of
the different elements in the overall meaning of a sentence is not the same in all
extended meanings, but hierarchically organised according to the degree of influence of the lexical items involved. I call this graded
involvement of elements in the creation of polysemes ‘compositional polysemy’.
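The typology just described can be summarised in a small data structure. The following is only an illustrative sketch, not part of the analysis itself: the names `Degree`, `Reading`, `READINGS` and `is_predictable` are hypothetical labels introduced here, and the entries simply restate how examples (29), (34), (35) and (36) are classified in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Degree(Enum):
    """Degrees of compositionality proposed in this section (labels are illustrative)."""
    UNPREDICTABLE = "unpredictable"
    ARGUMENT_DRIVEN = "argument-driven"  # predictable: the arguments mainly fix the sense
    VERB_DRIVEN = "verb-driven"          # predictable: the verb mainly fixes the sense


@dataclass(frozen=True)
class Reading:
    example: str     # sentence as cited in the text
    senses: tuple    # interpretations the text attributes to the sentence
    degree: Degree


def is_predictable(reading: Reading) -> bool:
    """A reading counts as predictable unless it belongs to the unpredictable class."""
    return reading.degree is not Degree.UNPREDICTABLE


# Classification of the examples as discussed in the section.
READINGS = [
    Reading("John hardly touched the food",
            ("to partake of food",), Degree.ARGUMENT_DRIVEN),
    Reading("Nork ukitu ditu nire soinekoak?",
            ("to affect, physically",), Degree.VERB_DRIVEN),
    Reading("Ha tocado el punto más alto de su carrera",
            ("to reach",), Degree.VERB_DRIVEN),
    Reading("John touched Mary",
            ("physical contact", "to affect, physically", "to affect, metaphorically"),
            Degree.UNPREDICTABLE),
]

if __name__ == "__main__":
    for r in READINGS:
        print(f"{r.example!r}: {r.degree.value}, predictable={is_predictable(r)}")
```

The point of the sketch is only that the unpredictable case carries several senses at once, while each predictable case is tied to a single sense.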
Although three different types of semantic extensions have been proposed so
far, it is important to notice that all these meanings must have something in common in order to be extended from the physical sense of touch, and also in order
to explain why the same extensions of meaning happen in English, Basque and
Spanish. Otherwise, it would be impossible to say why other sentences like (37) are
ruled out.
(37) Peter touched the joke
The reason why this example is not felicitous when no context is given lies in the
fact that “the joke” is not a ‘touchable’ type of concept – i.e., a joke cannot be
touched in any possible abstract way, as el punto más alto, ‘the highest point’, is in
example (35) above. From a cognitive linguistics point of view, the fact that “the
joke” is not licensed with the verb ‘to touch’ stems from the way we experience
this sense in our lives, that is, from the human embodiment of this sense (Johnson 1987).
Therefore, all these meanings must fulfil the ‘verb property requirement’, which
in the case of tactile verbs is the condition of being ‘touchable’. That is, the verb
arguments must be able to touch, if they are subjects, or be able to be touched, if
they are complements.
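The ‘verb property requirement’ can be pictured as a simple filter on the arguments of the verb. The sketch below is a toy illustration under stated assumptions: the sets `CAN_TOUCH` and `TOUCHABLE` and the function `satisfies_touch_requirement` are hypothetical, and their contents merely encode the judgements given in the text (for instance, that ‘the highest point’ is touchable in an abstract way while ‘the joke’ is not). No claim is made that such features could be listed exhaustively.

```python
# Hypothetical toy lexicon; membership encodes the judgements reported in the text.
CAN_TOUCH = {"John", "Peter", "Juan", "Jon"}          # possible subjects
TOUCHABLE = {"the food", "the table", "Mary",
             "el punto más alto"}                     # possible complements
# "the joke" is deliberately absent: the text treats it as not 'touchable'.


def satisfies_touch_requirement(subject: str, complement: str) -> bool:
    """Return True if the subject can touch and the complement can be touched."""
    return subject in CAN_TOUCH and complement in TOUCHABLE


if __name__ == "__main__":
    print(satisfies_touch_requirement("John", "the food"))   # example (29)
    print(satisfies_touch_requirement("Peter", "the joke"))  # example (37)
```

Under these assumptions, (29) passes the filter and (37) does not, which mirrors the contrast drawn above.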
4. Cross-linguistic polysemy: Meaning and lexicalisation across languages
In the previous section, I have shown that extended meanings are obtained by the
interaction of the semantic content of both the perception verb and its complements. The role of the semantics of both the perception verb and its complements
is not the same in all extended meanings; in some cases, the verb is more important
and in some other cases, the complements are.
One common characteristic of all the examples analysed in Section 3 is that
the explanations provided were applicable to the same cases in the three languages
under investigation. In other words, the same elements were crucial in the lexicalisation of those meanings in English, Basque and Spanish, and as a result they were
classified under the same degree of compositionality.
However, this is not always the case. In most cases, as authors such
as Talmy (1991, 2000) have shown, languages show a great deal of variation in
mapping lexical resources onto semantic domains. The systematic relations between semantic elements – meaning – and surface elements – linguistic forms –
do not usually show one-to-one correspondence across language types. In fact,
this relationship may take different forms, with multiple semantic elements being
expressed by one surface element, or a single semantic element being expressed
by multiple surface elements. Let us illustrate this point with the example of
unpredictable polysemy, as in John touched Mary.
If we translate this sentence into Basque (38), and Spanish (39), using exactly
the same elements (“John”, “Mary”, and “touch”), the results are quite different.
(38) Jonek Miren ukitu zuen
john.erg mary.abs touch aux.3s
“John touched Mary”
(39) Juan tocó a María
john touched to mary
“John touched Mary”
In both languages, the only possible interpretation of (38) and (39) is the prototypical meaning of physical touching. In these sentences, it is understood that ‘John
physically touched Mary’. In no way can they have the metaphorical ambiguity that
exists in the English version. This is not to say that it is impossible to express the
metaphorical reading ‘to affect’ in these two languages with tactile verbs. This is
perfectly possible as we saw in examples (8) and (9) in Section 2.
In Basque as well as in Spanish the mapping between the physical domain of
‘touch’ and that of ‘to affect’ is also allowed; but in order to obtain this meaning
it is necessary to add a verb complement that denotes feelings. The direct object –
bihotz gogorra ‘hard heart’ in (8), and el corazón ‘the heart’ in (9) – fulfils this necessity. The heart in these examples is not understood as a physical object, but
as the seat of feeling. In the cognitive approach literature, ‘heart’ is a metaphorical realisation of the image schema of a container, where heart is a container
for feelings (Kövecses 1986; Lakoff & Johnson 1980). In fact, as Moliner (1983)
points out, in Spanish the verb tocar needs expressions, such as el corazón, ‘the
heart’, el amor propio, ‘one’s own pride’, la dignidad, ‘dignity’, in order to imply this
interpretation.17
These examples show that, although the same semantic mappings between
different domains take place cross-linguistically, the strategies that each language
follows to express such meanings are different. What in one language can be expressed by a single lexical item (i.e., a verb), in other languages may require
several lexical items (i.e., a verb and arguments) to generate the same meaning.
This statement has important implications for our theory of polysemy and its
cross-linguistic character.
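The contrast between a shared conceptual mapping and language-specific means of expressing it can be laid out as a small lookup table. This is a hedged sketch only: `Strategy`, `AFFECT_METAPHORICALLY` and `languages_requiring_complement` are hypothetical names, and the table restates what is reported above for the meaning ‘to affect, metaphorically’, namely that the English verb can carry it alone, as in (36), whereas Basque and Spanish need a complement denoting feelings, as in (8) and (9).

```python
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    verb_alone_suffices: bool          # can the tactile verb carry the sense by itself?
    example_complement: Optional[str]  # complement cited in the text, if one is needed


# Lexicalisation of 'to affect, metaphorically' as described in this section.
AFFECT_METAPHORICALLY = {
    "English": Strategy(True, None),
    "Basque": Strategy(False, "bihotz gogorra 'hard heart'"),
    "Spanish": Strategy(False, "el corazón 'the heart'"),
}


def languages_requiring_complement(table: dict) -> list:
    """List the languages in which the verb needs a feeling-denoting complement."""
    return sorted(lang for lang, s in table.items() if not s.verb_alone_suffices)


if __name__ == "__main__":
    print(languages_requiring_complement(AFFECT_METAPHORICALLY))
```

The same conceptual mapping appears in every row; only the elements needed to express it differ from one language to another.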
First of all, it is important to make a distinction between conceptual mappings
on the one hand, and overt realisations of those conceptual mappings on the other;
between the links established between different domains of experience – those discussed in Section 2 – and the different strategies that languages follow to overtly
express those links. In other words, one issue concerns our conceptualisation of
the world, which is shared by all humans with the same cultural background; the
other concerns the linguistic means that each particular language has to lexicalise those
conceptualisations.
In previous analyses of polysemous lexical items (cf. Brugman 1981; Lakoff
1987), there was no distinction between these two concepts. If a lexical item was to
be taken as polysemous in itself, that is to say if polysemous senses were localised
in one lexical item without taking into account the semantic content of the other
words that co-occur with this lexical item, then both conceptual structure and
overt expression of such conceptual structure were the same. If the conceptual
structure were cross-linguistic, and conceptual structure and the overt expression
of such conceptual structure were the same, then, transitively, it could be argued
that both were cross-linguistic.
However, I have shown that this is not the case. Lexical items are not generally
polysemous in themselves, unless they are cases of ‘unpredictable polysemy’. They
need the help of the semantic content of other lexical items in order to obtain those
polysemous senses, and, as shown in this section, the lexical items required
to trigger and build the different extended polysemous readings are not the same
in every language.
It is for these reasons that I will consider that the verbs themselves are not
polysemous, but that the conceptual domain of sense perception is polysemous.
The different mappings presented in Section 2 are not to be taken as semantic extensions of the perception verbs themselves, but polysemous senses of the
conceptual domain of sense perception. I will call the group of these extended
meanings ‘conceptual polysemy’.
In sum, I argue that when we analyse the meanings that take place in a semantic field, we need to distinguish and address two different sides. On the one
hand, we need to establish its ‘conceptual polysemy’, i.e. the conceptual mappings
that take place between different domains of experience. This conceptual polysemy
is constrained by the bodily basis of the semantic field under analysis. Because
this bodily basis is shared by and common to all humans with the same cultural
background, conceptual polysemy is cross-linguistic. On the other hand, it is necessary to establish which elements are involved in the creation of such conceptual
polysemy, and to what extent their semantic content participates in the creation
of such extended meanings. Therefore, conceptual polysemy can be considered a
cross-linguistic phenomenon, but the classification of extended meanings under
the three different degrees of compositionality is only a language-particular one.
5. Conclusions
In this paper I have examined the polysemy that exists in tactile verbs, one per
language, in three languages, English, Basque and Spanish, of which Basque is genetically unrelated to the other two.
The two major concerns raised include on the one hand, a description of the semantic extensions that take place in this domain of tactile perception; and on the
other, the study of lexicalisation patterns and elements needed to convey these
polysemous senses.
As I have shown, tactile verbs do not only express physical contact. There
are five semantic extensions shared by these three languages: ‘to partake of
food/drink’, ‘to affect, physically’, ‘to affect, metaphorically’, ‘to reach’, and ‘to deal
with’. The last three senses prove that the metaphorical scope of this domain is
much more productive than that described in other studies (Sweetser 1990).
With respect to the lexicalisation of these meanings, I have proposed the idea
of ‘compositional polysemy’, i.e. different polysemes of a lexical item – the tactile
verb in this case – are obtained through the interaction of the semantic content
of both the lexical item itself and its different co-occurring elements. The weight
of the semantics of these elements in the creation of these semantic extensions is
not the same; it varies according to the degree of semantic influence of these elements on the overall meaning. That is to say, in some meanings, the role played
by the arguments of the verb is crucial. In some other meanings it is the verb that
governs the choice of arguments and meaning. These cases are predictable polysemous meanings: the former is an ‘argument-driven extension’, and the latter
a ‘verb-driven extension’. Finally, there is a third class of meanings, where interpretation is not predictable by means of the choice of arguments. These are
unpredictable cases of polysemy. Our model for the analysis of polysemy, therefore, can be situated between what is known as the maximization of polysemy, i.e.
the word itself carries most of the polysemous workload and speakers just have
to choose correctly in context, and the minimization of polysemy, i.e. most of the
workload is on the speaker’s side who has to interpret the meaning from the context (see Behrens 1999; Cruse 1986, 2000; Lyons 1977, 1995, among others). On
the one hand, our proposal recognises the importance of contextual elements in
the creation of polysemous senses, but on the other, it establishes the necessity
for (i) a graded typology of contextual involvement, and (ii) a constraint that restricts the participation of co-occurring elements to those that are compatible with
the conceptual properties characterising the polysemous word analysed – tactile
verbs in this paper. These last two elements differentiate our analysis from more
traditional pragmatic approaches to polysemy, where many of these polysemous
senses are the result of contextual effects (cf. Sperber & Wilson 1995; see Nerlich
& Clarke 2001, for more information about polysemy in relation to pragmatics). Finally, it has been shown that the phenomenon of compositional polysemy
is found cross-linguistically. What differs from language to language is the degree
of compositionality of a given semantic extension, which is language-specific.
Notes
* This research is supported by Grant BFI99.53.DK from the Basque Country Government’s
Department of Education, Universities and Research. I would like to thank June Luchjenbroers
and an anonymous referee for their valuable comments, and especially June for her never-ending
patience. The author can be contacted at <[email protected]>.
1. Sweetser argues that in all Indo-European languages, the verb to feel is the same as the verb
indicating general perception. For instance, the verb sentir (< Latin sentire) in Spanish. However,
this is overstated because it does not hold in languages such as Russian (Moiseeva 1998: 160).
2. “The scope of metaphor is simply the full range of cases, that is, all the possible target domains, to which a given specific source concept (such as war, building, fire) applies” (Kövecses
2000: 81).
3. Due to space constraints, in this paper I limit myself to enumerating and describing what those
semantic fields are; I do not go into much detail about the cognitive mechanisms – metaphor
and metonymy, for example – that make such mappings between conceptual domains possible.
Those interested in this topic may consult Ibarretxe-Antuñano (1999a, 1999b, 2000, 2003).
4. In each of these languages there are more verbal realisations of the sense of touch than just
the specific verb I have chosen to illustrate the main theoretical points put forward in this paper.
The fact that I only analyse one per language is only due to length restrictions. The theoretical
claims are therefore applicable to any tactile verb, or to any perceptual verb for that matter, as I
have shown elsewhere (Ibarretxe-Antuñano 1999a).
5. Ukitu is the verb used in Standard Basque. In some of the examples discussed in this section,
the verb ikutu is also used. This is a variant in the Guipuzcoan and Biscayan dialects.
6. The right to use the BNC is granted by Oxford University Press to researchers working on the
FrameNet project, International Computer Science Institute and the Univ. California, Berkeley.
7. These examples occur without any bracketed indication of the source.
8. It has been suggested by one of the anonymous reviewers that these two physical semantic
extensions in touch, ‘partake’ and ‘affect’, could be considered different interpretations of the
literal physical touch calculated from the minimally necessary condition of touching for each
of these activities (eating, taking, etc.), instead of conventional meanings. It is true that both
activities, either when we eat/drink or cause a physical effect on something, require physical
contact to be performed. If we are going to eat something, we necessarily have to touch the
food; if we want to change the place where something is, we have to touch it. However, I think
that these activities go beyond physical touch and cannot be considered as simple implicatures of
touching. Touching is a necessary condition for these activities, but not a sufficient explanation.
In both cases, the physical activity – partaking, affecting – stands for the action that caused this
result, the touching, and therefore, they can be considered cases of the metonymy result for
action.
9. This positive interpretation is explained in Lakoff and Johnson (1980). (10), (11), and (12)
are examples of what they call ‘orientational’ metaphors: “metaphorical concept that organises a
whole system of concepts with respect to one another” (1980: 15). Up is always related to good,
high status and it is opposed to down, which implies bad, low status; as in the expression to
touch bottom.
10. The fact that the endpoint in these examples is spatial, i.e. there is a physical destination to
which these people arrived – the dock –, causes an ambiguous interpretation. On the one hand,
there is a physical contact between the ship and the dock and therefore, this interpretation could
be understood as metonymical. However, on the other hand, there is a metaphorical mapping
in these sentences because the expression touch at refers to the activity of arriving or reaching a
destination. I would like to thank one of the anonymous reviewers for drawing this point to my
attention.
11. This is possible thanks to the semantic contribution of the preposition at in (13) and that of
the phrase mientras hacían cruceros ‘while they were cruising’ in (14). As I will explain in more
detail in Section 3, these are cases of compositional polysemy.
12. For a detailed discussion on the conceptual basis of the semantic extensions of tactile verbs,
see Ibarretxe-Antuñano (2000). In this paper, the sense of touch is characterised in terms of prototypical properties. These are drawn from psychological and physiological descriptions of this
sense. Each semantic extension selects a number of these properties. These selected properties
are to be taken as the bodily basis for the semantic extensions. For instance, the meaning ‘to
reach’, selects three properties: (i) <contact>: the perceiver must have physical contact with the
object perceived, (ii) <closeness>: the object perceived must be in the vicinity of the perceiver,
and (iii) <limits>: the perceiver is aware of the boundaries imposed by the object perceived.
13. More discussion on other extensions of over can be found in Ibarretxe-Antuñano (1999a:
183).
14. The Basque equivalent to the English adverb hardly is the adverb ia ‘almost’ together with
the negation ez.
15. In order to save some unnecessary repetition of the same explanations in each of these examples, I only include one example per language. The same explanations are applicable to the
equivalent sentences in the other two languages reproduced in Section 2.
16. The labels ‘predictable’ and ‘unpredictable’ are not to be taken just as descriptive terms for
the individual examples analysed in this section. In our opinion, in every polysemous word, we
can find semantic extensions that can be easily ‘predicted’ or ‘guessed’ by the semantics of the
co-occurring elements, and semantic extensions that cannot be ‘predicted’ or ‘guessed’.
17. As I have shown elsewhere (Ibarretxe-Antuñano 1999c), there is another possibility to lexicalise this meaning in Basque: to change the verb ukitu for the etymologically related hunkitu.
The latter is generally used in the metaphorical sense, and does not refer to the physical touching
unless an adjunct denoting a physical instrument such as eskuz ‘with the hand’ is inserted.

References
Aulestia, Gorka (1989). Basque-English Dictionary. Reno and Las Vegas: University of Nevada
Press. (au)
Behrens, Leila (1999). Aspects of polysemy. In David A. Cruse, Franz Hundsnurscher, Michael
Job, & Peter Rolf Lutzeier (Eds.), Lexicologie – Lexicology, Vol. 1 (pp. 135–167). Berlin:
Walter de Gruyter.
Brugman, Claudia (1981). The Story of Over. MA Thesis. University of California at Berkeley.
Buck, Carl D. (1949). A Dictionary of Selected Synonyms in the Principal Indo-European
Languages. Chicago: Chicago University Press.
Classen, Constance (1993). Worlds of Sense: Exploring the Senses in History and Across Cultures.
London: Routledge.
Collins English Dictionary and Thesaurus (1993). Italy: Harper Collins Publishers. (col)
Collins Spanish-English-Spanish Dictionary (1996). Glasgow: Harper Collins Publishers. (cse)
Cruse, David A. (1986). Lexical Semantics. Cambridge: Cambridge University Press.
Cruse, David A. (2000). Meaning in Language. Cambridge: Cambridge University Press.
Diccionario de la Real Academia de la Lengua Española (1984). Madrid: RAE. (rae)
Evans, Nick & David Wilkins (2000). In the mind’s ear: The semantic extensions of perception
verbs in Australian languages. Language, 76(3), 546–592.
Gibbs, Raymond W. Jr. (1999). Taking metaphor out of our heads and putting it into the cultural
world. In Raymond W. Gibbs, Jr. & Gerard J. Steen (Eds.), Metaphor in Cognitive Linguistics
(pp. 145–166). Amsterdam and Philadelphia: John Benjamins.
Herskovits, Anna (1986). Language and Spatial Cognition: An Interdisciplinary Study of the
Prepositions in English. Cambridge: Cambridge University Press.
Howes, David (Ed.). (1991). Varieties of Sensory Experience: A Sourcebook in the Anthropology of
the Senses. Toronto: University of Toronto Press.
Ibarretxe-Antuñano, Iraide (1999a). Polysemy and Metaphor in Perception Verbs: A Crosslinguistic Study. PhD Thesis. University of Edinburgh.
Ibarretxe-Antuñano, Iraide (1999b). Metaphorical mappings in the sense of smell. In Raymond
W. Gibbs, Jr. & Gerard J. Steen (Eds.), Metaphor in Cognitive Linguistics (pp. 29–45).
Amsterdam and Philadelphia: John Benjamins.
Ibarretxe-Antuñano, Iraide (1999c). Predictable vs. unpredictable polysemy. In S. J. Hwang &
Arle Lommel (Eds.), LACUS Forum, 25, 201–211.
Ibarretxe-Antuñano, Iraide (2000). An inside look at the semantic extensions in tactile verbs.
In Francisco J. Ruiz de Mendoza (Coord.), Panorama actual de la lingüística aplicada.
Conocimiento, procesamiento y uso del lenguaje (pp. 1053–1060). Logroño: Universidad de
La Rioja.
Ibarretxe-Antuñano, Iraide (2002). Mind-as-body as a cross-linguistic conceptual metaphor.
Miscelánea. A Journal of English and American Studies, 25, 93–119.
Ibarretxe-Antuñano, Iraide (2003). El cómo y el porqué de la polisemia de los verbos de
percepción. In Clara Molina, María Luisa Blanco, Juana Marín, Ana Laura Rodríguez,
& Manuela Romano (Eds.), Cognitive Linguistics in Spain at the turn of the century / La
Lingüística Cognitiva en España en el cambio de siglo (pp. 213–228). Madrid: Universidad
Autónoma de Madrid.
Johnson, Mark (1987). The Body in the Mind. The Bodily Basis of Meaning, Imagination and
Reason. Chicago: Chicago University Press.
Kövecses, Zoltán (1986). Metaphors of Anger, Pride, and Love: A Lexical Approach to the Study of
Concepts. Amsterdam and Philadelphia: John Benjamins.
Kövecses, Zoltán (1995). American Friendship and the Scope of Metaphor. Cognitive Linguistics,
6(4), 315–346.
Kövecses, Zoltán (2000). The Scope of Metaphor. In Antonio Barcelona (Ed.), Metaphor and
Metonymy at the Crossroads. A Cognitive Perspective (pp. 79–92). Berlin and New York:
Mouton de Gruyter.
Kurath, Hans (1921). The Semantic Sources of the Words for the Emotions in Sanskrit, Greek, Latin
and the Germanic Languages. Menasha, WI: George Banta.
Lakoff, George (1987). Women, Fire and Dangerous Things. What Categories Reveal about the
Mind. Chicago: Chicago University Press.
Lakoff, George & Mark Johnson (1980). Metaphors We Live By. Chicago: Chicago University
Press.
Langacker, Ronald W. (1987). Foundations of Cognitive Grammar, Vol. I: Theoretical Prerequisites.
Stanford, CA: Stanford University Press.
Langacker, Ronald W. (1991). Foundations of Cognitive Grammar, Vol. II: Descriptive Application.
Stanford, CA: Stanford University Press.
Lyons, John (1977). Semantics. Cambridge: Cambridge University Press.
Lyons, John (1995). Linguistic Semantics. Cambridge: Cambridge University Press.
Moiseeva, Nadezda (1998). Verbs of perception in Russian. In M. Giger, T. Menzel, & B. Wiemer
(Eds.), Lexikologie und Sprachveränderung in der Slavia. Studia Slavica Oldenburgensia 2 (pp.
153–164). Oldenburg: Bibliotheks- und Informationssystem der Universität Oldenburg.
Moliner, María (1983). Diccionario del Uso del Español. Madrid: Gredos.
Nerlich, Brigitte & David D. Clarke (2001). Ambiguities we live by: Towards a pragmatics of
polysemy. Journal of Pragmatics, 33, 1–20.
Oxford Spanish Dictionary (1994). Oxford, New York, Madrid: OUP. (osd)
Ong, Walter J. (1967). The shifting sensorium. In D. Howes (Ed.), Varieties of Sensory Experience:
A Sourcebook in the Anthropology of the Senses (pp. 25–30). Toronto: University of Toronto
Press.
Sarasola, Ibon (1984–1995). Hauta-lanerako Euskal Hiztegia. Zarauz: Itxaropena. (is)
Sperber, Dan & Deirdre Wilson (1995). Relevance. Communication and Cognition. Oxford:
Oxford University Press.
Sweetser, Eve E. (1990). From Etymology to Pragmatics. Metaphorical and Cultural Aspects of
Semantic Structure. Cambridge: Cambridge University Press.
Talmy, Leonard (1991). Path to realisation: A typology of event conflation. In Proceedings of the
Seventeenth Annual Meeting of the Berkeley Linguistics Society (pp. 480–519).
Talmy, Leonard (2000). Toward a Cognitive Semantics. Cambridge, MA: MIT Press.
The American Heritage Dictionary (1992). Houghton Mifflin Company. 3rd edition. (am)
Vandeloise, Claude (1991). Spatial Prepositions: A Case Study from French. Chicago: Chicago
University Press.
chapter 
How experience structures the
conceptualization of causality*
Maarten Lemmens
Université Lille 3, France
The present article sketches some variations in the conceptualization of causative
events, in particular as coded by a subgroup of lexical causatives – i.e., verbs of
killing. However, my analysis aims to go one step further than a mere description
of these variations by showing their experiential alignment, adopting a moderate
experiential point of view. By examining a considerably large corpus of lexical
causatives, and the variations between the event construals that they entail, the
present paper will outline some of the factors that play a role in structuring our
experience, and thus our coding of causation. The analysis shows how the choice
between two different models at work in the grammar of causative events, viz.
the transitive or the ergative model, aligns in subtle ways with the specifics of the
event experienced.
Keywords: lexical causatives, causative alternation, transitivity, ergativity
.
Introduction
One of the basic tenets of Cognitive Grammar is that meaning is equated with conceptualization. That is, semantic structure is defined as conceptualization “tailored
to the specifications of linguistic convention” (Langacker 1987: 99).1 The meaning
of a linguistic expression is a cognitive structure characterized relative to cognitive
domains “where a domain can be any sort of conceptualization: a perceptual experience, a concept, a conceptual complex, an elaborate knowledge system, etc.”
(Langacker 1991a: 3). As Lakoff (1987) has shown, most of these are idealized
cognitive models (ICMs). Such models are similar to Fillmore’s frames, defined
as “unified frameworks of knowledge, or coherent schematizations of experience”
(Fillmore 1985: 223).
Against the background of such larger conceptual structures, linguistic structures impose their own specifications, which brings us to another pivotal claim of
Cognitive Grammar, viz. that “linguistic expressions and grammatical constructions
embody conventional imagery” (Langacker 1988: 7). Meaning thus relies on
our ability to conceptualize the same object or situation in different ways. As Casad
summarized it: “the speaker’s ability to conceptualize situations in a variety of ways
is, in fact, the foundation of cognitive semantics” (1995: 23).
The present article sketches some such variations in the conceptualization
of causative events, in particular as coded by a subgroup of lexical causatives –
i.e., verbs of killing. However, my analysis aims to go one step further than a mere
description of these variations by showing their experiential alignment, adopting
a moderate experiential point of view, as proposed by Lakoff (1987) and others.
Lakoff defines the experientialist strategy as an attempt “to characterize meaning in terms of the nature and experience of the organisms doing the thinking”
(1987: 266). In line with what was said before, experience is not to be understood
in an individual sense, but in a broad sense: “the totality of human experience and
everything that plays a role in it” (1987: 266).
By examining a considerably large corpus of lexical causatives, and the variations between the event construals that they entail, the present paper will outline
some of the factors that play a role in structuring our experience, and thus our coding of causation.2 The choice of a causative model, viz. transitive or ergative, aligns
in subtle ways with the specifics of the event experienced. For example, the ergative
predilection of the suffocate verbs (e.g., ‘asphyxiate’, ‘suffocate’, or ‘choke’), as
emerging from both historical and contemporary data, can be explained as having
an experiential basis. My data further suggest that the opposition between external and internal causation aligns firstly with the paradigmatic opposition between
the transitive and the ergative; and secondly, within the ergative model, with the
opposition between ‘effective’ and ‘non-effective’ constructions (or ‘causative’ vs.
‘non-causative’ in more traditional terminology). Experience from a more culturally or ideologically coloured point of view will be at the heart of the changes that
have occurred in the semantic and constructional evolution of ‘abort’. By looking
at the data in this way, we can arrive at a better characterization of the conceptual
structures of these verbs as well as the constructions in which they occur, revealing
coding patterns and/or tendencies that would otherwise have gone unnoticed.
2. Two models of causation
In general, my analysis of lexical causatives follows Davidse’s (1991, 1992) account
which posits that the English grammar of actions and events is governed by two
distinct causative models, viz. the transitive and ergative paradigms. These two
models represent different ways of conceptualizing causative processes, implying
different conceptual centres and different participant relations. In Halliday’s terms,
they are said to project different ‘inherent voice’ relations. The following summary
hardly does justice to Davidse’s innovative work and merely mentions the most
basic distinctions relevant to my present purpose.3
The transitive paradigm, as realized in the example, John killed Mary, centres around an Agent, who directs a prototypically volitional action onto an inert
Affected. In more specific terms, in the transitive system the Actor-Process combination is the nuclear building block: it can be isolated in an objectless transitive
(e.g., Soldiers trained to kill), where the Actor is the transitive instantiation of a
more schematic Agent. This system is a linear one that prototypically extends to
the right to incorporate a fully passive Affected, called the ‘Goal’.
The ergative paradigm, in contrast, centres around the Affected, which in addition to being affected is also active. Its conceptual independence is reflected
in the fact that this participant can be isolated in a one-participant construction with an ‘agentive’ participant – e.g., Mary suffocated. In an ergative construal
(with either one or two participants), the process is conceptually dependent on the
‘Medium’, which is the entity that is affected yet also co-participates in the event
(much like a medium in the ESP sense). The process-medium cluster is semi-autonomous vis-à-vis the ergative Agent, called the ‘Instigator’. In other words,
unlike the linear transitive, the ergative system is a nuclear one with two processual layers: the instigated process and the instigation of the process, which need
not be co-extensive in time or space (Davidse 1991: 67ff.). It is left-oriented, in that
the basic conceptualization may be opened up to include the instigator.
Unlike many Asian, Australian or Amerindian languages, English does not
indicate transitivity and/or ergativity by overt case marking. They manifest themselves in more covert ways as reflected in, among other things, different alternation patterns. In his cognitive reinterpretation of nominative/accusative and
ergative/absolutive case marking, Langacker also observes that the transitive and
ergative patterns are not only coded by morphological markings, but find “numerous other linguistic manifestations” (1991b: 381). On the basis of the most essential alternation patterns, the transitive and ergative paradigms can be distinguished
as in Table 1.
Table 1. Paradigmatic instantiations

                     Paradigmatic instantiations
Construction         Transitive                  Ergative
EFFECTIVE            John killed Mary            John suffocated Mary
Ag-Proc-Af           Actor-Process-Goal          Instigator-Process-Medium
OBJECTLESS           John killed
                     Actor-Process-(Goal)
NON-EFFECTIVE        Mary died                   Mary suffocated
AcMe-Proc            Actor-Process               Medium-Process
PSEUDO-EFFECTIVE     Mary died a slow death      The house blew a fuse
(φ)Ag-Proc-(φ)Af     Actor-Process-Range         Setting-Process-Medium

The effective constructions are more specific instantiations of the agent-process-affected schema. While formally identical, the underlying semantics
is different for the transitive and the ergative instantiations: for the latter the
Affected still co-participates, as reflected in the possibility of forming an ergative non-effective.4 The objectless transitive maximizes the transitive focus on
the actor-process unit. However, it is to be regarded as an effective construction, since the Goal is still very much implied (cf. Rice 1988; see also Lemmens
1998b: 140–146 for a more elaborate description).5 Such an objectless agent-process construction is not possible with ergative verbs as they centre on the
Affected. For example, in the sentence, John suffocated, John cannot be interpreted
as the Agent who causes someone else’s suffocation; only as the entity who has
suffocated (i.e., affected by the verb). The semantic value of the ergative, non-effective construction is that it neutralizes whether the process was self-instigated
or instigated by an external Instigator. As Smith (1978) has argued, a construction
is positively marked for the features of external control, as well as independent
activity (cf. also Haspelmath 1993: 90). The ergative effective resolves the voice
vagueness. The traditional intransitive, here called ‘transitive non-effective’ (e.g.,
Mary died; John stumbled), is regarded as a subtype of the transitive paradigm, as
it too centres around a volitional or non-volitional Agent. The pseudo-effectives,
not really relevant to the present discussion, have one participant that is not a
true participant (marked by the symbol φ); for the transitives, it is a pseudo-Goal
(called a ‘Range’); and for the ergatives, a setting functions as a pseudo-Instigator
(see Davidse 1991: 115–140). Note that the area of variability is different for both,
and pertains to a non-nuclear participant.
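The paradigmatic distinctions summarized in Table 1 can also be written down as a small lookup structure. The following is only a minimal sketch: the dictionary layout and the function names (constructions, subject_role) are illustrative assumptions and not part of Davidse's account; the example sentences and role labels are those of Table 1.

```python
# Constructions from Table 1, keyed by (paradigm, construction type).
# Each entry pairs the example sentence with its participant-role structure.
# The layout and names are illustrative assumptions, not part of the source.
PARADIGMS = {
    ("transitive", "effective"): ("John killed Mary", "Actor-Process-Goal"),
    ("transitive", "objectless"): ("John killed", "Actor-Process-(Goal)"),
    ("transitive", "non-effective"): ("Mary died", "Actor-Process"),
    ("transitive", "pseudo-effective"): ("Mary died a slow death", "Actor-Process-Range"),
    ("ergative", "effective"): ("John suffocated Mary", "Instigator-Process-Medium"),
    ("ergative", "non-effective"): ("Mary suffocated", "Medium-Process"),
    ("ergative", "pseudo-effective"): ("The house blew a fuse", "Setting-Process-Medium"),
}


def constructions(paradigm):
    """Return the construction types that Table 1 lists for a paradigm."""
    return sorted(c for (p, c) in PARADIGMS if p == paradigm)


def subject_role(paradigm, construction):
    """Return the role of the first participant, or None if Table 1 has no such cell."""
    entry = PARADIGMS.get((paradigm, construction))
    return entry[1].split("-")[0] if entry else None


print(constructions("ergative"))
print(subject_role("ergative", "non-effective"))
print(subject_role("ergative", "objectless"))
```

In this sketch the absence of an ("ergative", "objectless") entry mirrors the observation that an objectless agent-process construction is not available for ergative verbs, and the subject of the ergative non-effective comes out as the Medium rather than an Actor.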
3. Transitivity and ergativity in the field of killing
In her classification of verbs, Levin (1993) distinguishes two categories of verbs of
killing, the murder verbs which are manner-neutral and the poison verbs which
“lexicalize a means” – i.e., are “verbs which relate to actions which can be ways of
killing” (1993: 232). Next to these, she distinguishes a group of suffocate verbs,
as a subgroup of processes involving the body and defines them as “[relating] to
the disruption of breathing” (1993: 224). While Levin’s classification is correct for
the murder verbs, her semantic grid is not refined enough when it comes to her
class of poison verbs, which is a relatively heterogeneous group both lexically and
grammatically, something of which she herself is well aware. To remedy some of
these shortcomings, I propose an alternative classification of the field which does
more justice to the verbs’ lexical as well as constructional prototypes. Diagrammatically, my classification can be represented as in Figure 1 (the labels of the
categories have been underlined; members are in italics).
[Figure 1. General classification of verbs of killing. Paradigm labels: transitive, ergative. Category labels: murder, suffocate, decapitate, instrument, starve, action. Verbs shown: lynch, butcher, kill, slaughter, execute, slay, massacre, murder, assassinate, throttle, suffocate, stifle, strangle, asphyxiate, choke, smother, drown, decollate, behead, knife, garrot, starve, famish, stab, shoot, hang.]
While this figure is a gross oversimplification, it nonetheless sheds some light
on the internal structure of the field and the paradigmatic home-ground of some
of the items. Most obvious is the transitive predilection of the whole field. The
reason is straightforward: the concept of killing someone is quite compatible with
the meaning of the transitive paradigm, in which an inert Goal is affected by the
action of a volitional Actor. As indicated by the thickness of the circle, the murder verbs are prototypical within the field, and their markedly transitive character
radiates to the rest of the field.
The other group relevant to the present article is that of the suffocate verbs.
Levin’s original group of suffocate verbs has been extended to include the prototypically transitive verbs ‘smother’, ‘strangle’ and ‘throttle’. Despite these three,
the suffocate verbs are prototypically ergative (as are the starve verbs), as will
be elaborated below.
Two important nuances need to be added to the diagram. First, there are differences in the prototypicality of the different subgroups, which also show prototype effects. Moreover, the boundaries of the subgroups cannot always be sharply
delineated. In other words, the lexical field of killing emerges as a prototypebased category, whose external boundaries are not sharply delineated, and whose
internal structure shows considerable flexibility and differences in salience (see
Lemmens 1998b for more discussion). Second, the diagram presents a constructional (i.e., paradigmatic) classification of verbs as fixed and stable; whereas, in
fact, these are features of the specific clause construal in which the lexical and
constructional meanings combine to form a complex semantically well-motivated
unit (cf. Note 4; see also Lemmens 1998b for detailed descriptions of semantically
motivated paradigmatic shifts).
4. Examples of experiential grounding
In view of the different conceptual nuclei of the transitive and the ergative models, Actor vs. Medium, it is sensible to assume that events that are perceived as
centring around either of these participants will activate a different coding model.
While watertight predictions are impossible to make, the data examined confirm
this view. As said, the field of killing generally tilts to the transitive side, which
aligns with the typical experience of a kill-event as involving a (typically) volitional
Agent who does something to an inert (and involuntary) Goal. In some events, the
Affected is experientially more salient, leading to an ergative coding. The following
subsection describes some specific investigations into this experiential grounding.
Focus on volitionality
The more salient the volitionality of an Actor, the more likely that a typical transitive coding will be used. In the field of ‘killing’, this usually means a conceptualization in terms of a murder verb. Consider the case of a premeditated murder. In
such events, the intentionality of the Actors is quite salient, as they definitely plan
to kill someone. Given the availability of a particular lexical term to code such an
event, ‘murder’, this will most likely be the coding used. As Geeraerts, Bakema and
Grondelaers (1994) have shown, the more unique a referent, the more likely it is
that a unique term will be used. Conversely, the more a referent is described in
terms of a certain concept, the more that concept can be said to be entrenched.
Clearly, a given event may be coded in alternate ways, and thus, the choice of
‘murder’ cannot be rigidly predicted.
It can be noted in passing that the said importance of the victim in an assassination event does not constitute evidence that the construction should be ergative.
As for the ergative system, it is not the Affected’s importance tout court, but their
degree of (co-)participation in the event – i.e., its importance vis-à-vis the process
itself – that opens up ergative constructions (e.g., the ergative non-effective). This,
it can be added, is definitely not possible with ‘assassinate’.
Accidental causation
The murder verbs saliently incorporate the notion of volitionality into their semantic structure, and resist codings with non-volitional or inanimate Actors. Ironically, the verb most often cited in the literature as typically transitive is the general
verb ‘kill’, which in fact conforms least to the intentional Actor prototype of transitives. Its generality as well as non-prototypicality are explained by the interaction
of a number of factors:
1. its high lexical flexibility (the data show a 22% metaphor ratio vs. less than 9%
for the other murder verbs)
2. a less stringent implication of goal-achievement (as for instance reflected in
the common hyperbolical use, mostly in the progressives – e.g., my feet are
killing me.)
3. the possibility of a non-volitional Actor – e.g., he killed the woman without will
or conscious mind.
4. the possibility of an inanimate Actor – e.g., they were killed by stray bullets or
shrapnel.
It is precisely the latter environment that may at first be difficult to explain, yet
allows interesting comparisons with die verbs (e.g., die, perish, etc.). While the
semantic difference between ‘kill’ and ‘die’ may typically be unproblematic, there
seems to be a strong degree of overlap when it comes to coding casualties in accidents, diseases, and the like. A coding in terms of ‘kill’ for casualties in accidents
is quite frequent and occurs in more than 71% of such events described in the
wsj corpus, as opposed to 23% in terms of ‘die’ (cf. Table 2 below). This may be
surprising, given the absence of a volitional Actor, which would typically trigger
a transitive coding. Taking a ‘plane crash’ as an example, the most common codings can be represented as in Figure 2 (see Langacker 1991b, for the diagrammatic
conventions).
[Figure 2. Settings and Agents. Panels: (a) die in a crash; (b) be killed in a crash; (c) be killed by a crash; (d) the crash killed 112 people. Diagram labels: setting, in, tr (trajector), lm (landmark).]
In all four constructions, the event (‘the crash’) is nominalized and consequently represented as a thing (indicated by a circle), lethally affecting the victim
(the change of state is represented by the squiggly line). The difference between
(c) and (d) represents the semantic purport of the passive construction, viz. a
trajector/landmark reversal (Emanatian 1993; cf. also Langacker 1987: 120ff.).
Of the codings with ‘kill’, some 45% code the accident as the Actor, diagram (c)
or (d), which makes it the largest subgroup of inanimate Actors with ‘kill’ (60%).
This presents clear counterevidence to Dirven’s (1993: 95) claim that “killed by [. . .]
is incompatible with circumstantial causes such as accidents”, from which he deduces that the circumstances can be expressed only by means of ‘in. . .’, as in, be
killed in an accident. What is true, of course, is that for the latter construction, the
agent is highly schematic (as indicated by the cross-hatching), and the accident
cannot be added overtly as the agent – e.g., *in that crash 111 people were killed by
it. The reason is that within one series of temporally contiguous segments of an
action chain, which Ryder (1991) conveniently terms ‘episodes’, one and the same
participant cannot simultaneously function as Setting and as Agent.6
The absence of an overt Agent in a ‘circumstantial passive’ construction is experientially grounded: in events like the ones reported, there is usually no Agent
more salient than the event itself. If anything is to function as Agent it is either
the event itself or a participant situated ‘within’ that event, and one that furthermore acquires sufficient prominence to be construed as Agent. While the latter is
in principle possible (e.g., In the explosion, seven people were killed by flying shards,
one person died from a heart attack), it is marked, as indicated by its total absence
in all corpora consulted (containing over 500 passives with the verb ‘kill’). Sansò
(2000) correctly observes that the passive leaves the trajector (here the victim) on
stage while removing the causer: “the speaker can choose to focus on the patient,
thus displaying empathy with him, or can choose to embrace the maximal scope
of what is on-stage, thus conceptualizing the event as a whole” (2000: 3).
At first sight, the use of ‘kill’ in the above context may not seem to align
with our immediate experience, whence probably Dirven’s observation. The relationship between ‘kill’ and ‘die’ in the above contexts can indeed be puzzling
(cf. DeLancey 1984), and probes into the difference between Agents and Causers.
Within the scope of this paper, I cannot elaborate on this issue (see Lemmens
1998b: 123–126), but merely indicate that a coding with ‘kill’ still furnishes a more
active construal; whereas that with ‘die’ brings in a cause that straddles the border
between a nuclear participant and a circumstantial setting (see also Davidse 1991;
Talmy 1985). Given the overlap between the two verbs in the coding of accidents
and diseases, an absolute generalization is impossible to make. Nevertheless, the
corpus reveals a clear pattern, tabulated in Table 2.
Table 2. Distribution of kill and die for accidents and diseases

                        Type of event
Lexical item            accident    disease    Total
die      Freq                 36        102      138
         Col Pct           23.2%      85.7%    50.4%
         Row Pct           26.1%      73.9%
kill     Freq                111         17      128
         Col Pct           71.6%      14.3%    46.7%
         Row Pct           86.7%      13.3%
other    Freq                  8          0        8
         Col Pct            5.1%       0.0%
         Row Pct            100%
Total    Freq                155        119      274
         Row Pct           56.6%      43.4%   100.0%

As can be deduced from the table, the ratio of ‘kill’ and ‘die’ is reversed for the
two different events. It can be noted in passing that neither the type of accident
or disease, nor the number of people killed, can account for the different conceptualizations; on the whole, the contexts are comparable if not identical. Table
2 nevertheless indicates the prototypical conceptualizations: ‘kill’ will more readily be used for events that are prototypically caused by external and perceptually
distinguishable causes; whereas ‘die’ will be more typical in cases of less perceptible
causation, often of the kind that comes ‘from within’. I hold the view that in
the prototypical case a transitive construal entails a maximally ‘external’ point of
view: the Actor is an entity external to the Goal and directly impinging on it, from
the outside.
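The percentages in Table 2 follow directly from the raw frequencies. As a minimal sketch, assuming the frequency layout of Table 2 (the names FREQ, column_percentages and row_percentages are illustrative and not part of the original study, and the zero for ‘other’ with diseases is inferred from the column total, 119 = 102 + 17), the column and row percentages can be recomputed as follows:

```python
# Frequencies from Table 2 (lexical item x type of event).
# Assumption: the 'other'/'disease' cell is 0, inferred from the column total.
FREQ = {
    "die": {"accident": 36, "disease": 102},
    "kill": {"accident": 111, "disease": 17},
    "other": {"accident": 8, "disease": 0},
}
EVENTS = ["accident", "disease"]


def column_percentages(freq):
    """Share of each lexical item within one type of event (Col Pct)."""
    totals = {e: sum(row[e] for row in freq.values()) for e in EVENTS}
    return {
        item: {e: 100.0 * row[e] / totals[e] for e in EVENTS}
        for item, row in freq.items()
    }


def row_percentages(freq):
    """Share of each type of event within one lexical item (Row Pct)."""
    result = {}
    for item, row in freq.items():
        total = sum(row.values())
        result[item] = {e: 100.0 * row[e] / total for e in EVENTS}
    return result


col = column_percentages(FREQ)
row = row_percentages(FREQ)
for item in FREQ:
    # Print both percentage types, rounded to one decimal place.
    print(item,
          {e: round(col[item][e], 1) for e in EVENTS},
          {e: round(row[item][e], 1) for e in EVENTS})
```

Read by column, the sketch makes the reversal explicit: ‘kill’ accounts for 71.6% of the accident codings but only 14.3% of the disease codings, whereas ‘die’ accounts for 23.2% and 85.7% respectively.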
The ergative predilection of the suffocate verbs
The internal-external alignment emerges even more strongly from the group of
suffocate verbs where it motivates not only the opposition between the (prototypically) transitive members of the group (‘strangle’, ‘throttle’, and ‘smother’) and
the ergative ones, but also within the latter group, the typical occurrence of an effective construction when combined with an external cause. The suffocate verbs
can be distinguished by which part of the respiratory system is affected (lungs,
throat, mouth, nose), and how it is typically affected (constriction, immersion,
coverage, etc.). This distinction can be represented as in Figure 3.
[Figure 3. Internal/external and ergative/transitive correlation. The diagram runs from an external zone to an internal zone: transitive (strangle, throttle); transitive-(ergative) (smother, stifle); ergative (drown, choke, suffocate, asphyxiate).]
Although this distinction does not represent a rigid dichotomy, it gives a first
assessment of the experiential basis of the ergative preference of the group as a
whole, as well as the transitive character of some of its members and the mixed
character of some others. The latter group are those that oscillate (both synchronically and diachronically) between the two paradigms. Within the confines of this
article, these cannot be elaborated in full detail; some discussion of such particular
uses is found in Lemmens (2005a, subm.). That is, the ergative conception seems
to align strongly with how a suffocation process is experienced, for a number
of reasons.
First, the causes of suffocation events, although multifarious, are typically imperceptible, as in the case of gases or (lung) diseases. This imperceptibility encourages a conception of suffocation as independent of the cause or instigation, which
is an essential feature of an ergative construal. The lower perceptibility of a cause
often leads to different (metaphorical) conceptualizations of it (e.g., as covering,
enveloping, etc.), which gives rise to different lexical choices (‘smother’, ‘drown’,
etc.). Clearly a transitive conception (e.g., with the cause as Actor) is not fully
excluded, but the corpus shows that this is uncommon with these types of clauses.
Secondly, the conceptual independence of the caused process is reinforced
by the (prototypical) temporal distance between the instigation and the consequences, leading to an enhanced focus on the process itself. Moreover, as is
also typical of an ergative conception of events, there is a low salience of goalachievement, which contrasts sharply with the foregrounding of this property in
the case of prototypically transitive verbs, such as the murder verbs, but also
‘strangle’ and ‘throttle’. The latter two verbs typically code a process that ends in the
death of the victim; whereas for the ergative suffocate verbs this feature is much
less salient (though not excluded). Consider, for instance, the common usage of
‘choke’, in reference to (mostly non-lethal) swallowing the wrong way. The suffocate verbs also often occur in hyperbolical uses, in which speakers (deliberately)
exclude the end point of the process from the conceptualization, focusing instead
on the suffering of the Medium. (Recall that for the murder verbs only ‘kill’ occurs in such hyperbolical constructions.) Such usage is also common with ‘starve’,
when it refers to the state of being very hungry and excludes the lethal outcome; for
some speakers this has been attested as the default case. In other words, an ergative
conception can come to focus strongly on the Medium’s activity, thereby not only
excluding the instigation but also the end-point of the process. It can be noted that
the hyperbolical uses only occur in the -ing form, in line with the semantics of this
form, viz. the exclusion of the endpoints of the process coded by the verb.
Thirdly, the distinction between the external and internal parts of the affected
respiratory system is an additional experiential factor that motivates the transitive/ergative distinction. It is quite logical that the external perspective lines up
with the focus on the Agent, since, most naturally, the affected parts are those that
can be easily accessed by an external Agent. Moreover, the more external parts of
the respiratory system are also those not really active in the respiration process.
As is typical of transitives, ‘smother’, ‘strangle’, and ‘throttle’ altogether omit the
notion of the victim’s participation. These verbs conceptualize the interruption of
normal respiration as ‘externally’ inflicted on a fully inert patient. In other words,
the more perceptible (and consequently also more external) the cause, the more
likely it is that a transitive conception is used to encode the event. Clearly, it is not
excluded that typically ergative verbs, like ‘suffocate’ or ‘choke’, are selected to encode events where the cause is clearly and exclusively external. What is striking, however,
is that in those cases the coding is typically effective (‘causative’), an intuition that
is confirmed by both diachronic and synchronic data – see examples below.
(1) The broider’d band That underbraced his helmet at the chin. . . Choaked him. (oed, 1790)
(2) The man who choked the Emir (oed, 1866)
(3) . . . suddenly he’d . . . grab him by the throat and choke him. (wsj)
Alternatively, both diachronic and synchronic data indicate that a non-effective
construction occurs with internal causes. For example, one of the typical uses of
‘choke’ is in the context of swallowing the wrong way; an event in which the cause
is maximally internal (an unbreathable substance entering the trachea), and the
bodily (re)action of the victim is quite salient. As can be expected, a non-effective
construction is used, and is, in fact, the only possibility:
(4) a. Riddle laughs so hard he starts to choke on his salad.
b. *the salad choked Riddle.
(5) a. Sorrille almost choked on his tongue.
b. *Sorrille was almost choked by his tongue.
The b-sentences can only be acceptable when the salad and the tongue are, in one
way or another, seen as external causes. It could be argued that in reference to
swallowing the wrong way, ‘choke’ no longer codes an instigatable process, but
realizes an intransitive construal of the event, with the cause, the unbreathable substance, in a periphrastic coding in an ‘on-’ complement.7 Once again, one should
not misinterpret these observations as indicating that the ontological situation
rigidly predicts the construal. Nevertheless, the data unequivocally reveal a subtle alignment between the specifics of the event and the occurrence of an effective
or a non-effective construal, or an ergative and transitive coding.
The ergative predilection of the suffocate verbs (as opposed to the Agent-oriented murder verbs) also emerges from a diachronic analysis of verbs of killing.
Halliday (1985: 146) observes that “the coming of [the ergative] pattern to predominance in the system of modern English is one of a number of related developments that have been taking place in the language over the past five hundred years
or more”. My data show that such an ergativization may show up unexpectedly
(e.g., with prototypically intransitive ‘die’), but also that particularly the suffocate verbs seem to have been subject to ergativization (see Lemmens 1998a).
From the late 16th century up to the early 20th century, all suffocate verbs
started to occur in ergative constructions. On top of that, there was considerable
lexical overlapping, with some verbs encroaching on the semantic space of others.
For example, the verb ‘strangle’, which in present-day English is fairly restricted in semantic coverage, came to be used to report on deaths caused by the sword,
disease, poisoning, drowning or suffocation by gases, etc. In these cases ‘strangle’ often occurred in ergative non-effective constructions – consider examples
(6)–(8) below.
(6) Hanybal . . . stranglyd with poisoun.
(oed, 1443)
(7) The swearde shal strangle them
(oed, 1535)
(8) She fell into the pond yesterday . . . She nearly strangled . . .
(nov)
This lexical and constructional flexibility is absent for the murder verbs, for which
no ergativized use has been attested in any of the corpora. They are too strongly
tied to their Agent-centredness to allow a conceptual shift of focus to the second
participant involved in the process.
The ergativization process that characterized Early Modern English has not
yet been analysed to the fullest, and it is unclear what motivates it for the whole
of the English lexicon. I like to believe that it can be explained by changes in how
humans came to see (i.e., experience) the world, but this is a hypothesis that
needs to be verified in a much larger context than I have done so far. In any case,
the ergativization of the suffocate verbs can be explained against the background
of the typical conceptualisation of a suffocation event.
Ideologically determined transitivization
That the experience of events is highly influenced by cultural and ideological assumptions is nicely illustrated by the lexical and constructional evolution of the
verb ‘abort’. This has been described in full detail elsewhere (see Lemmens 1997,
1998a), but it can be reiterated here that changes in the way we interact with
the world have had a definite impact on the conceptualization of an abort-event.
Previously, the conceptualization underlying the item pertained to a spontaneous
termination of a pregnancy, and logically, it activated the ergative model. Indeed,
the second participant, the fetus, was experienced as having the potential of self-instigating and sustaining the process, which escaped the control of the woman.
However, medical and technological advances have given us greater control over
such processes: parents and doctors are now more in control, as volitional beings
who can target the abortion process onto a fetus or, in more recent usages, onto
the woman. Logically, then, the causative model, activated by present-day ‘abort’,
is the transitive one in which the fetus is reduced to an inert Goal, or even omitted
from the episode altogether in an objectless transitive – e.g., the mother can abort,
if she so chooses; or in a relatively recent type of construction with the woman as
Affected – e.g., the doctor aborted the woman, or fewer women are aborted. Here are
two recent examples (returned by a Google search):
(9) Many Tibetan women are aborted and sterilised after the first birth
(www.hsph.harvard.edu/Organizations/healthnet/SAsia/repro3/tibet.html)
(10) in Communist China where women are aborted by government order when
pregnant for the second time
(www.creationism.org/csshs/v14n1p15.htm)
Apparently, ‘abort’ is also evolving in its lexical structure. While in the wsj corpus dating from 1989, the literal (prototypical) use can still be argued to be that
of prematurely terminating a pregnancy (75% of the cases), recent data (and recent comments by native speakers) suggest that more and more, the prototype is
shifting to the more schematic meaning of “to terminate a process”, of which the
halting of a pregnancy is a specific instantiation. It cannot be denied, however, that
this instantiation differs from the others in its selection of a causative model, as it
only activates the transitive, whereas the others can invoke the ergative model –
e.g., the takeoff aborted, vs. the pilot aborted the takeoff. This type of usage has become quite common in the domain of computer terminology – e.g., the program
will abort.
Conclusions and prospects
The above analyses have illustrated that the characteristics of a specific event subtly influence the way in which this event is conceptualized, in ways that may not
be immediately obvious from introspection. While variations in conceptualization cannot be excluded, some clear tendencies have been highlighted, from which
a certain predictive power can be distilled.
In a nutshell, the more volitional a participant seems to be whilst engaged in
some causative process, the more likely it is that s/he will surface as a volitional
Actor in a transitive construction; the more autonomous the process, vis-à-vis its
cause, the more likely an ergative conception will be triggered. Moreover, within
the field of ‘killing’, another parameter has been shown to be relevant – i.e., the internal or external nature of the cause. While these conclusions are quite compatible
with what is commonly assumed in Cognitive Linguistics (e.g., Lakoff 1987: 54–
55), I do not characterize it in terms of directness or indirectness of causation,
but in terms of participation in the process, which for the transitive and ergative
construal is quite different.
The present study is only the beginning of a more fully-fledged onomasiological analysis – i.e., one that starts from the referent and examines how it is
coded. Further research is warranted, preferably of the type that does not start
from textual material. Clearly, corpus analysis, such as that underlying the present research, is most relevant for discovering certain tendencies that would otherwise
go unnoticed. For example, in addition to the observations above, my analysis of
unprototypical (i.e., inanimate) Actors with ‘kill’ has revealed that there is a certain alignment with the type of Affected: Goals that are lower on the Silverstein
hierarchy (e.g., low-level organisms) tend to take a low-level Actor as well (e.g., the
antibody kills infected cells). This correlation is worth exploring further with other
types of events, and may correct some of the widely held views on the prototypicality of certain types of Agents. A possible drawback inherent to corpus analysis
is that it is still very much a posteriori: it starts from a conceptualization and tries
to discover the experiential motivation behind it. To counterbalance that bias, one
could set up experiments to elicit narrations from speakers when watching a filmed
event, for example (cf. also Slobin 1996, 2000; Lemmens 2005b on this type of research). Such an investigation can shed further light on different mental imagery
employed in encoding an event and what triggers it.
Notes
* All correspondence concerning this article should be sent to Maarten Lemmens, at Université
Lille3, U.F.R. Angellier (English), Lille, France. Email: [email protected]
1. This paper regroups and reconsiders some of the more elaborate descriptions in Lemmens
(1997, 1998b). I thank the anonymous reviewer for the valuable comments on an earlier version
of this paper. Responsibility for the final product is of course mine.
2. The corpora used are the ACL-Wall Street Journal Corpus (WSJ: 5,353,500 words); the Leuven Drama Corpus (1,029,660 words); a collection of contemporary American short stories
(1,066,875 words); a collection of 19th century novels (746,525 words); and citations from the
OED on CD-ROM (11,713 citations).
3. The reader is referred to Davidse (1991, 1992) for more elaborate descriptions. For my own
modifications to her theory and some changes in terminology, see Lemmens (1998b).
4. Not all constructions with prototypically ergative verbs allow the non-effective construction – e.g., John opened a tin of baked beans (Davidse 1991: 63) or The horn drowned out the
opening bell (see Lemmens 1998b: 178–187). These constructions no longer activate the ergative
model. For further discussion, see the cited works.
5. Among other things, the two works cited will provide ample illustration that virtually all
transitive verbs allow the objectless transitive in the proper context.
6. See also Langacker’s notion of “scope of predication”, Nishimura’s (1994) “coherent actions
chain”, or DeLancey’s (1984) “event schema”.
7. On the use of on (and with in other contexts), see Lemmens (1998b: 168–173).
References
Casad, Eugene (1995). Seeing It in More than One Way. In John Taylor & Robert E. MacLaury
(Eds.), Language and the Cognitive Construal of the World (pp. 23–49). Berlin & New York:
Mouton de Gruyter.
Davidse, Kristin (1991). Categories of Experiential Grammar. Unpublished PhD Thesis, K. U.
Leuven.
Davidse, Kristin (1992). Transitivity/Ergativity: The Janus-Headed Grammar of Actions and
Events. In M. Davies & L. Ravelli (Eds.), Advances in Systemic Linguistics (pp. 105–135).
London: Pinter.
DeLancey, Scott (1984). Notes on Agentivity and Causation. Studies in Language, 8, 181–213.
Dirven, René (1993). Dividing up Physical and Mental Space into Conceptual Categories by
means of English Prepositions. In Cornelia Zelinsky-Wibbelt (Ed.), The Semantics of
Prepositions. From Mental Processing to Natural Language Processing (pp. 73–97). Berlin &
New York: Mouton de Gruyter.
Emanatian, Michele (1993). Figure-Ground Reversal in Grammar. Paper presented at the Third
International Cognitive Linguistics Association Conference, Leuven, 18–23 July.
Fillmore, Charles J. (1985). Frames and The Semantics of Understanding. Quaderni di
Semantica, 6, 222–254.
Geeraerts, Dirk, Peter Bakema, & Stefan Grondelaers (1994). The Structure of Lexical Variation:
Meaning, Naming, and Context. Berlin & New York: Mouton de Gruyter.
Halliday, M. A. K. (1985). An Introduction to Functional Grammar. London: Arnold.
Haspelmath, Martin (1993). More on the Typology of Inchoative/Causative Verb Alternations.
In Bernard Comrie & Maria Polinsky (Eds.), Causatives and Transitivity (pp. 87–120).
Amsterdam & Philadelphia: John Benjamins.
Lakoff, G. (1987). Women, Fire and Dangerous Things. What Categories Reveal about the Mind.
Chicago: Chicago University Press.
Langacker, Ronald W. (1987). Foundations of Cognitive Grammar, Vol. 1. Stanford: Stanford
University Press.
Langacker, Ronald W. (1988). An Overview of Cognitive Grammar. In Brygida Rudzka-Ostyn (Ed.), Topics in Cognitive Linguistics (pp. 3–48). Amsterdam & Philadelphia: John
Benjamins.
Langacker, Ronald W. (1991a). Concept, Image, and Symbol. Berlin & New York: Mouton de
Gruyter.
Langacker, Ronald W. (1991b). Foundations of Cognitive Grammar, Vol. 2. Stanford: Stanford
Univ. Press.
Lemmens, Maarten (1997). The Influence of World Conception on Transitivity and Ergativity:
a Case Study. In Eve Sweetser, Kee Dong Lee, & Marjolijn Verspoor (Eds.), Lexical and
Syntactic Constructions and the Construction of Meaning (pp. 363–382). Amsterdam &
Philadelphia: John Benjamins.
Lemmens, Maarten (1998a). The experiential basis of lexical and constructional flexibility: a
diachronic and synchronic study. Leuvense Bijdragen, 87, 79–113.
Lemmens, Maarten (1998b). Lexical Perspectives on Transitivity and Ergativity. Causative
Constructions in English [CILT 166]. Amsterdam/Philadelphia: Benjamins.
Lemmens, Maarten (2005a). Des constructions causatives sans objet: un complément à l’analyse
récente de Goldberg. In Claude Delmas & Mireille Quivy (Eds.), 6 Etudes en linguistique
anglaise, CERCLES Revue pluridisciplinaire du monde anglophone [Occasional Papers] (pp.
79–113). Université de Rouen.
Lemmens, Maarten (2005b). Motion and location: toward a cognitive typology. In Geneviève
Girard-Gillet (Ed.), Parcours linguistiques. Domaine anglais [CIEREC Travaux 122] (pp.
223–244).
Lemmens, Maarten (forthcoming). More on objectless transitives and ergativization patterns in
English. Thematic issue of Constructions.
Levin, Beth (1993). English Verb Classes and Alternations. A Preliminary Investigation. Chicago:
Chicago University Press.
Nishimura, Yoshiki (1994). Agentivity in Cognitive Linguistics. In W. Nöth (Ed.), Origins of
Semiosis (pp. 487–530). Berlin, New York: Mouton.
Rice, Sally (1988). Unlikely Lexical Entries. Berkeley Linguistics Society, 14, 202–212.
Ryder, Mary-Ellen (1991). Mixers, Mufflers and Mousers: The Extending of the -er Suffix as a
Case of Prototype Reanalysis. Berkeley Linguistics Society, 17, 299–311.
Sansò, A. (2000). The domain of demotion: a new view of passive constructions. Paper presented
at the 2nd International conference on Contrastive Semantics & Pragmatics. Cambridge, 10–
13 September, 2000.
Smith, Carlota S. (1978). Jespersen’s ‘Move and Change’ Class and Causative Verbs in English. In
M. A. Jazayery, E. C. Polome, & W. Winter (Eds.), Linguistic and Literary Studies in Honor
of Archibald A. Hill. Vol. II: Descriptive Studies (pp. 101–109). The Hague: Mouton.
Slobin, Dan I. (1996). Two ways to travel: Verbs of motion in English and Spanish. In M. S.
Shibatani & S. A. Thompson (Eds.), Grammatical Constructions: Their form and meaning
(pp. 195–200). The Hague: Mouton.
Slobin, Dan I. (2000). Saturation of a semantic field. Paper presented at the conference on
Language Culture and Cognition, Leiden, 22–23 March, 2000.
Talmy, Leonard (1985). Lexicalization Patterns: Semantic Structure in Lexical Forms. In
Timothy Shopen (Ed.), Language Typology and Syntactic Description. Vol. III: Grammatical
Categories and the Lexicon (pp. 57–149). Cambridge: Cambridge University Press.
JB[v.20020404] Prn:13/02/2006; 13:35
F: HCP1513.tex / p.1 (47-109)
chapter 
Internal state predicates in Japanese
A cognitive approach*
Satoshi Uehara
Tohoku University, Japan
This paper examines internal state predicates, or “subjective predicates”, in
Japanese, which exhibit some grammatical behavior different from their
counterparts in other languages like English. It employs Langacker’s framework
on subjectivity (1985, 1991a), and argues that these predicates represent what
Langacker calls the “egocentric viewing arrangement”, in which the construal of
the event/situation is optimally subjective. The discussion presented in this
paper demonstrates that this “subjective construal” analysis of Japanese internal
state predicates can uniformly account for the grammatical behaviors exhibited
by them, namely, the person restriction, the implicit reference to their
experiencer subject, and their formation of the so-called “double nominative”
constructions. It furthermore discusses implications for the cognitive
framework’s cross-linguistic applicability.
Keywords: subjectivity, internal state predicates, viewing arrangement, person
restriction, Japanese
.
Introduction
It is widely known that Japanese possesses a large group of internal state predicates called subjective predicates (Kuroda 1973; Kuno 1973; Aoki 1986; Iwasaki
1993; Backhouse 1993; Uehara 1998b). They are so called because they denote, by
default, the speaker’s internal states, such as feelings and emotional reactions. They
have attracted linguists’ attention because they exhibit some grammatical behavior different from internal state predicates in other languages like English. In this
paper I will employ a cognitive linguistic framework for subjectivity (Langacker
1985, 1991a) in order to analyze the internal state predicates in Japanese, and I will
argue that these predicates represent what Langacker calls “subjective construal”
(1991a: 316), in which the construal of the situation (by the conceptualizer) is
highly subjective. I will also demonstrate that this framework can uniformly account for the grammatical behaviors exhibited by internal state predicates, and
also that it is compatible with, and thus receives strong support from, the recent
discourse studies on Japanese (Iwasaki 1993; Uehara 1998b).
Japanese internal state predicates
Following Iwasaki’s (1993: 15) description, internal state predicates are said to
be those “adjectival predicates showing sensations (atui ‘hot’), emotions (kanasii
‘sad’) or desires (hosii ‘want’)”. Of course, other languages like English have words
with similar semantic functions (e.g., the English glosses for the Japanese internal
state predicates given above). The most basic and widely known characteristic of
internal state predicates in Japanese is that they strictly require the sentence subject
to be in the first person when used in simple declarative sentences. An example is
uresii ‘glad’, illustrated in (1) and (2):1, 2
(1) Watasi wa uresii.
1.sg top glad.nonpast
“I am glad.”
(2) *Zyon wa uresii.
John top glad.nonpast
“John is glad.”
To express the proposition in (2) in Japanese, one must make explicit reference to
the evidence on which it is based. For example, one must say something like “John
looks glad”, “John is showing the signs of being glad”, and so on, as illustrated in
(3) and (4) below:
(3) Zyon wa uresii yoo-da.
John top glad appear.nonpast
“John appears to be glad.”
(4) Zyon wa uresi-gatte iru.
John top glad-showing the sign of.nonpast
“John shows the signs of being glad.”
Two points should be noted here about the person restriction of Japanese internal state predicates. First, this restriction is lifted in ‘non-reportive’ or noncommunicative discourse such as the literary mode (Kuroda 1973). This seems
to be because the point of view in sentences in the literary mode is not necessarily
associated with the narrator, but can be freely interpreted with respect to a specific character in the story.3 Thus, although the sentence in (2) sounds infelicitous
and is starred in naturally-occurring, communicative discourse in Japanese, this
would be acceptable in the literary mode. Secondly, manifestation of this restriction in the internal state predicates of a language is a matter of degree, and that
degree seems to vary from language to language. Thus some may point out that
internal state predicates in other languages (e.g., in English) do show a similar restriction. In the case of Japanese, however, this structural restriction appears to be
relatively strong, which may explain the attention this restriction in Japanese has
attracted in the linguistics literature, as well as the large variety of Japanese evidential markers used to get around this restriction (see Aoki’s 1986 discussion of
Japanese evidential markers).
Other properties of Japanese internal state predicates
The most well-known of the characteristics of internal state predicates in Japanese
is the restriction regarding the person of predicate subjects discussed in the previous section. However, it is not the only characteristic property, and I will discuss
here two other properties that are much less well-known but are highly relevant
to an understanding of this phenomenon. The first is that the experiencer subjects
of these internal state predicates are typically unexpressed in Japanese discourse.
Japanese is a ‘pro-drop’ language, and utterances frequently do not linguistically
code their clausal arguments. This is especially true of the experiencer subject
argument of internal state predicates – i.e., the subject argument whose role is experiencer (cognizer) of the state denoted by them, such as watasi, ‘I’ in (1). Thus,
Uehara (1998b), who analyzed an English novel and its Japanese translation, has
shown that by default the first person (cognizer) subject of the internal state predicates is unexpressed unless some discourse factors favor explicit mention of it. For
instance, example (1) given earlier is typically expressed without the first person
subject, as in (5):
(5) Uresii.
glad.nonpast
“I am glad.”
One can easily see a functional link between this second property of the internal
state predicates in Japanese and the first. Since the predicate form can carry the
function of expressing the (first) person of its experiencer (i.e., if the predicate
form is not marked with any evidential markers, then the experiencer is in the
first person), there is no need for the first person experiencer to be overt; it can be
unexpressed without causing any ambiguity.
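The functional link just described can be restated as a small decision procedure. The following Python sketch is purely illustrative and is not part of the analysis in the literature cited: the names `Clause`, `infer_experiencer` and `violates_person_restriction` are hypothetical, the lexicon contains only forms cited in this paper, and the treatment of evidentially marked clauses without an overt subject (left undecided) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny lexicon: only a form cited in the text.
FIRST_PERSON_FORMS = {"watasi"}


@dataclass
class Clause:
    predicate: str                       # e.g. "uresii"
    evidential: Optional[str] = None     # e.g. "yoo-da", "-gatte iru"; None if unmarked
    overt_subject: Optional[str] = None  # e.g. "watasi", "Zyon"; None if unexpressed


def infer_experiencer(clause: Clause) -> Optional[str]:
    """Return "first", "non-first", or None when the form does not decide."""
    if clause.overt_subject is not None:
        is_first = clause.overt_subject.lower() in FIRST_PERSON_FORMS
        return "first" if is_first else "non-first"
    if clause.evidential is None:
        # Unmarked predicate form: the experiencer is the speaker, as in (5).
        return "first"
    # Assumption: an evidential marker alone leaves the person undecided here.
    return None


def violates_person_restriction(clause: Clause, literary_mode: bool = False) -> bool:
    """True for an unmarked predicate with an overt non-first-person subject, as in (2)."""
    if literary_mode:
        # The restriction is lifted in non-reportive discourse (Kuroda 1973).
        return False
    return (
        clause.evidential is None
        and clause.overt_subject is not None
        and clause.overt_subject.lower() not in FIRST_PERSON_FORMS
    )
```

On this sketch, `infer_experiencer(Clause("uresii"))` yields the first person, mirroring (5), while `violates_person_restriction(Clause("uresii", overt_subject="Zyon"))` flags the starred sentence in (2).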
The other property of internal state predicates in Japanese to be noted here,
which does not appear to be in any way related to the other two, is that internal
state predicates are among those predicates which take what Kuno (1973: 79) calls
“Ga for Object Marking”. The particle ga is a nominative marker, and is usually
used to mark the subject of a sentence (while the particle o is used to mark the
object). However, according to Kuno, ga is used to mark the object for the predicates in question. Since the ‘real’ subject whose role is experiencer is marked with
ga as well (or frequently topicalized and marked with the topic marker wa), internal state predicates form what is known as a ‘double nominative’ (or ‘double
subject’) construction, as illustrated by using an internal state predicate of desire
hosii, ‘want’, in (6):
(6) (Watasi ga/wa) sake ga hosii.
1.sg nom/top sake nom want.nonpast
“I want sake.”
In fact, they constitute a major group of the double nominative constructions, and
Kuno lists the following in (7) as subjective predicates in his list of predicates which
take ga for object marking:4
(7) -tai ‘be anxious to’
arigatai ‘to be grateful for’
hazukasii ‘to be bashful/ashamed of ’
hosii ‘to want’
itosii ‘to think tenderly of ’
kawaii ‘to hold dear’
kutiosii ‘to be regretful of ’
natukasii ‘to miss, to feel yearning for’
netamasii ‘to be jealous of ’
nikurasii ‘to be hateful of ’
omosiroi ‘to be interested in’
osorosii ‘to be afraid/fearful of ’
urayamasii ‘to be envious of ’
-Tai ‘be anxious to’ at the top of (7) combines with verbal stems to yield compound
adjectives ‘be anxious/want to do . . . ’, and marks the verbal object with ga as in
(8) and (9):
(8) (Watasi wa) sake ga nomi-tai.
1.sg top sake nom drink-be anxious to.nonpast
“I am anxious to/want to drink sake.”
(9) (Watasi wa) kono hon ga yomi-tai.
1.sg top this book nom read-be anxious to.nonpast
“I am anxious to/want to read this book.”
As a result, the number of subjective predicates involving ga for object marking is
not limited to the dozen or so listed above in (7); the instances can be easily multiplied.
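The productivity of the -tai pattern can be made concrete with a short sketch. The Python fragment below is only an illustration of the surface pattern seen in (6), (8) and (9), under simplifying assumptions: the function names `desiderative` and `internal_state_clause` are hypothetical, the experiencer is always topicalized with wa when expressed, and no morphophonology is modelled.

```python
from typing import Optional


def desiderative(verb_stem: str) -> str:
    """Attach -tai to a verbal stem, e.g. "nomi" -> "nomi-tai"."""
    return f"{verb_stem}-tai"


def internal_state_clause(obj: str, predicate: str,
                          experiencer: Optional[str] = None) -> str:
    """Build a romanized clause in which the object is marked with ga.

    The experiencer is optional, reflecting its typical non-expression;
    when present it is marked with the topic marker wa (an assumption
    made for simplicity; ga is also possible, as in (6)).
    """
    parts = []
    if experiencer is not None:
        parts.append(f"{experiencer} wa")
    parts.append(f"{obj} ga")
    parts.append(predicate)
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."
```

For instance, `internal_state_clause("sake", desiderative("nomi"), "watasi")` reproduces the full form of (8), and any further verbal stem yields a new subjective predicate of the same type.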
In this section, internal state predicates in Japanese have been examined in
terms of the characteristic properties that distinguish them from their close equivalents in other languages like English. These are: (1) there is a strong
structural restriction in terms of the person of their subjects; (2) they typically
appear without their (experiencer) subjects; and (3) the nominative marker ga
in these constructions marks what are considered to be grammatical objects. Although these three properties of internal state predicates in Japanese have been
discussed, or described, separately from each other in the existing linguistic literature, they have never before been treated as a single system. In the next section
I will consider a cognitive semantic analysis of these internal state predicates, to
see if these three characteristics can come together to reveal common, underlying
principles.
Cognitive Grammar approach to subjectivity
One of the foundational claims of cognitive semantics is that “an expression’s
meaning cannot be reduced to an objective characterization of the situation
described: equally important for linguistic semantics is how the conceptualizer chooses to construe the situation and portray it for expressive purposes”
(Langacker 1991a: 315). In analyzing subjective predicates in Japanese, a formal
framework is needed by which the relationship between the speaker (i.e., the conceptualizer) and event described (as well as the role the speaker plays in conceptualizing the event) can be explicitly described and examined. Langacker’s (1985,
1991a, and 1991b) cognitive semantic theory of subjectivity represents one such
framework, and provides a theoretical vocabulary for analyzing the internal state
predicates in Japanese; the necessary constructs are briefly introduced below.
Definition of linguistic subjectivity
Langacker explains ‘subjectivity’ as follows: “Subjectivity pertains to the observer
role in viewing situations where the observer/observed asymmetry is maximized”
(Langacker 1985: 109). Consider such a viewing situation, which he calls an optimal viewing arrangement, diagrammed in Figure 1a. In the diagram, ‘S’ stands for
the subject of conception (or observer), ‘O’ for the object of conception (or observed), the arrow for the direction of conception, and the broken-line circle for
the objective scene.
With respect to an optimal viewing arrangement in Figure 1a, he notes, “S
can be characterized as maximally subjective, and O as maximally objective”
(Langacker 1985: 121). This optimal viewing arrangement is contrasted with what
he calls the egocentric viewing arrangement, diagrammed in Figure 1b, where the
locus of viewing attention is expanded to include the position of S and his/her
immediate surroundings. The subject of conception S is no longer simply an observer, but to some degree an object of conception as well, and in this situation,
S receives a more objective construal while the scene conceived becomes more
subjective. Thus, the conceptualization diagrammed in Figure 1b represents the
semantic structure of ‘subjective’ expressions, and Langacker defines a subjective
expression “as one that includes the ground – or some facet of the ground – in its
scope of predication (i.e., its base)” (Langacker 1985: 113, emphasis in the original). (The term ‘ground’ is used in Cognitive Grammar to indicate the speech
event, its setting, and its participants.)
Figure 1. (a) Optimal viewing arrangement; (b) egocentric viewing arrangement. From Langacker (1985: 121)
The subjectivity scale
As is obvious from expressions like ‘maximally subjective’ and ‘more subjective’,
subjectivity is a matter of degree, depending on how prominently the ground is
conceived in the overall conceptualization. Thus, Langacker introduces the notion
of a subjectivity scale, along which linguistic expressions can be ranked. The pair
of sentences in (10) illustrates two levels of the gradience (Langacker 1991a: 326).
(10) a. Vanessa is sitting across the table from Veronica.
b. Vanessa is sitting across the table from me.
So far as the locative relationships are concerned, the sense of across in (10a) is fully
objective in that it profiles the spatial configuration without regard to speaker-hearer position.5 (10b) is more subjective in that one of the participants, namely
the reference point for the across relation, is the speaker (first person); the expression includes the ground in its scope of predication, as in the conceptualization
pattern represented by the diagram in Figure 1b above.
Langacker identifies a further subjective level of construal, using the same situation for the already subjective construal in (10b). The contrast between the two
levels, according to Langacker (1991a: 328), is reflected in the following pair of
sentences in (11):
(11) a. Vanessa is sitting across the table from me. (=10b)
b. Vanessa is sitting across the table.
Figure 2. Panels (a) and (b)
The two describe precisely the same spatial configuration, and they both presuppose the reference point for the spatial configuration they describe (so (11b) may
sound awkward without context). The only structural difference is that the presupposed reference point is covert in (11b). Conceptually, they are both subjective in
that they take the speaker as the reference point. However, the formal distinction
between overt (11a) and covert (11b) reference to the speaker iconically reflects
its being construed with a lesser or greater degree of subjectivity.6 (11a) suggests a
detached outlook in which the speaker treats her own participation as being on a
par with anybody else’s (‘objective construal’ of the speaker), whereas (11b) identifies the reference point with the speaker and portrays the situation “through her
eyes”. In other words, the scene depicted by the sentence in (11a) and that in (11b)
are like (a) and (b), respectively, in Figure 2.
In Figure 2a, the person with the glasses is the speaker, myself, while in Figure
2b, I the speaker sit on this side of the table, the vantage point for the scene. The
thick line in Figure 2b represents the speaker’s camera angle and indicates that the
vantage point is specified with the speaker. In (11b), as Traugott (1995: 49) puts
it, “what is strengthened is specifically the subjective stance of the speaker.” Thus,
of the two already subjective expressions in (11), b is the more subjective. The
representations in Figure 2 of the scenes depicted by the sentences in (11a) and
(11b) illustrate why (12a) is considerably more natural than (12b):
(12) a. Look! My picture’s in the paper! And Vanessa is sitting across the table from me!
b. ?Look! My picture’s in the paper! And Vanessa is sitting across the table!
Langacker (1991a: 329) explains as follows: “Examining a picture of oneself involves self-construal that has a high degree of objectivity, for it literally implies
an external vantage point. This is consistent with the objectivity conveyed by the
speaker’s explicit self-reference in the final clause of (12a), but inconsistent with
the subjectivity signaled by the lack of the speaker’s explicit self-reference in (12b).”
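The three levels of construal distinguished so far, as in (10a), (10b)/(11a) and (11b), can be summarized as an ordered scale. The Python sketch below is offered only as a mnemonic for that ordering: the names `Subjectivity` and `classify` are hypothetical, the set of speaker-anchored forms is limited to those in examples (10) to (14), and it assumes that an unexpressed reference point is understood as the speaker.

```python
from enum import IntEnum
from typing import Optional


class Subjectivity(IntEnum):
    OBJECTIVE = 0          # reference point is not part of the ground, as in (10a)
    SUBJECTIVE_OVERT = 1   # speaker is the reference point and is mentioned, as in (11a)
    SUBJECTIVE_COVERT = 2  # speaker is the reference point but left implicit, as in (11b)


# Hypothetical list of speaker-anchored forms, taken from (10)-(14) only.
SPEAKER_FORMS = {"me", "here"}


def classify(reference_point: Optional[str]) -> Subjectivity:
    """Rank a spatial expression by how its reference point is coded."""
    if reference_point is None:
        # Assumption: a covert reference point is identified with the speaker.
        return Subjectivity.SUBJECTIVE_COVERT
    if reference_point.lower() in SPEAKER_FORMS:
        return Subjectivity.SUBJECTIVE_OVERT
    return Subjectivity.OBJECTIVE
```

Because the levels are ordered integers, the ranking can be compared directly: `classify("Veronica") < classify("me") < classify(None)` holds, matching the progression from (10a) through (11a) to (11b).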
A formal way of representing the fine-grained contrast between two subjective
structures like the pair in (11) is called for, since they both represent the same situation in the diagram in Figure 1b, and since those in Figure 2 depict only one
specific instance (with the two ‘slots’ elaborated with Vanessa and myself). In fact,
the sitting-across case in (11) represents a productive, spatial configuration pattern
in English as illustrated in (13) and (14) [taken from Langacker 1985, 1991a]:
(13) a. There is snow all around me.
b. There is snow all around.
(14) a. The store is through the tunnel from here.
b. The store is through the tunnel.
Thus, many spatial configuration expressions in English have a pair of established senses analogous to those of sitting-across. The difference between the two
semantic structures is shown in Figure 3, using Cognitive Grammar style representations.
In Figure 3, tr abbreviates the trajector, the technical term for the ‘figure
within a profiled relation’, and lm the landmark, the ‘less prominent entity in the
relation’. In the case of (11), the trajector is Vanessa, and the landmark is the
table in the across relationship. The profiled across relationship is represented by
the thick dotted arrow, whose starting point indicates R, the reference point with
respect to which the trajector is located. In (a), Sp, the speaker, takes a vantage
point external to the described situation (Sp’), where the reference point happens
to be the speaker, while in (b), the reference point is identified with the speaker’s
vantage point.
Figure 3. Panels (a) and (b). OS: onstage region; thick dotted arrow: across relation; tr: trajector; lm: landmark; rp: reference point; Sp: speaker
Subjective construal and Japanese internal state predicates
Having examined the Cognitive Grammar analysis of the sitting-across predicate
in English, one can easily see that exactly the same analysis holds for subjective, spatial
configuration expressions in Japanese as well, such as the me-no-mae-ni-suwatteiru (‘sitting-right-in-front’) predicate in (15):
(15) a. Hanako ga watasi no me no mae ni suwatte iru.
Hanako nom 1.sg gen eye gen front at sitting be.nonpast
“Hanako is sitting right in front of me.”
b. Hanako ga me no mae ni suwatte iru.
Hanako nom eye gen front at sitting be.nonpast
“Hanako is sitting right in front.”
Thus, basically Figures 2a and 3a and their discussions apply to (15a) and Figures 2b and 3b to (15b). The Cognitive Grammar analysis can uniformly apply
to the subjectivity phenomena in spatial configuration expressions in English
and Japanese.
I will now propose a similar line of analysis for the internal state predicates
in Japanese, which are also called subjective predicates. By examining similarities
and differences between the two phenomena, I will demonstrate that the Cognitive
Grammar approach to subjectivity not only has cross-linguistic applicability but
can also help explicate cross-linguistic variations in linguistic subjectivity.
The speaker’s role
Internal state predicates in Japanese resemble the (English) sentences of spatial
relation in (11) in that the speaker (the conceptualizer) plays a prominent role
in their conceptual structure. Whether or not encoded linguistically, the speaker
occupies the role of experiencer/cognizer of a state denoted by the predicate in the
former, and that of viewer of a denoted situation in the latter. Consider the hosii
‘want’ example reproduced in (16):
(16) a. Watasi wa sake ga hosii.
1.sg top sake nom want.nonpast
“I (explicit) want sake.”
b. Sake ga hosii.
sake nom want.nonpast
“I (implicit) want sake.”
The hosii ‘want’ sentences in (16) describe a cognizer’s psychological state, where
the speaker is the cognizer, and functions as the subject (as opposed to the object) of cognition. In a very similar manner, the sitting-across sentences in (11)
describe a spatial configuration, where the speaker is the reference point for the
figure in the across relationship, and functions as the viewer, i.e., the subject of
perception. Thus, in either case, the speaker functions as the subject of conception. The Japanese internal state predicate patterns resemble the patterns of spatial
configuration in another respect. In both, the (b) pattern, where the subject of
conception is not linguistically coded, typically assumes the speaker for that implicit role. Thus ‘speaker’, not just a schematic ‘person’, is present in the semantic
structure of (b) sentences. These two points of resemblance motivate positing, for
the two subjective ‘want’ sentences in (16a) and (16b), the semantic structures
illustrated in Figures 4a and 4b, respectively.
In Figure 4a, the cognizer of Japanese internal state predicates (i.e., the
speaker) is explicitly encoded as in (16a), and the speaker is somewhat objectively
construed since he puts himself more or less onstage and sees himself as if through
the eyes of someone else. In Figure 4b, in contrast, the speaker is not linguistically
encoded but is necessarily evoked as in (16b), and the construal of the situation is
highly subjective since the speaker is equated with the cognizer and serves as the
vantage point for what is conceptualized.
The scenes depicted by the specific sentences Watasi wa sake ga hosii ‘I (explicit) want sake’ in (16a) and Sake ga hosii ‘I (implicit) want sake’ in (16b) can be
shown in Figure 5a and Figure 5b, respectively.
In the scene in Figure 5a, the person with the glasses on the left, who is wanting
and imagining the sake, is the speaker, myself. The scene in Figure 5b, on the other
hand, is what I the speaker cognize, that is, what appears in the conceptual space of
the person on the left in Figure 5a, whose viewpoint is the assumed vantage point
for the conceptualization. Again, the thick line represents the speaker’s cognition
and indicates that the vantage point is specified with the speaker.7
Figure 4. Panels (a) and (b). OS: onstage region; (line symbol): cognizing relation; cr: cognizer; cd: cognized entity; Sp: speaker
Figure 5. Panels (a) and (b).
Difference in the default pattern in subjectivity
Japanese internal state predicates differ most from spatial configuration expressions (in English and Japanese) in the default degree of subjectivity. Although
construals of different degrees of subjectivity are available in both types, a close examination of the behavioral and structural patterns of each construal in the two
types reveals that the default construal differs between them. Objective construal
seems to be more of the default case in the spatial configuration expressions, while
in the Japanese internal state predicates the subjective perspective is the norm. This
can be seen in the fact that the most subjective construal in spatial configuration
expressions is compositional in its formulation, while that in Japanese internal
state predicates is non-compositional and its subjectivity is inherent to the lexicon
(as their alternative name subjective predicates suggests).
The (English) predications of spatial relations (e.g., across, around) have nothing inherently subjective in their semantic structure. In other words, their profile
does not involve any element of the speaker (the ground). Thus around, for example, has a profile whose trajector and landmark are both instantiated by things
in the objective scene, so that the composite structure has the whole profile in
the on-stage region. The composite relational profile becomes subjective only
if the spatial predication is combined with the ground (as in Figure 1b) in its
compositional process.
In sharp contrast with the spatial configuration expressions, Japanese internal
state predicates can be analyzed as subjective in their lexical structure. The cognizer slot (‘e-site’) of their lexical semantic structure is specified with the ground,
which can be made overt by the compositional process with its overt form such
as watasi, ‘I’. In terms of the lexical structure organization, Japanese internal state
predicates thus resemble a ‘deictic verb’, come (Langacker 1985), or kuru, ‘come’,
in Japanese, for that matter, where the goal position of its movement is specified
with the ground. Its ground element can be made overt, as in, He came here, but
its phonological presence is not required (or “subcategorized”) as in, He came.
Thus the subjective pattern in spatial configuration expressions in (11) is subjective only by virtue of its composition with the ground, while the subjective pattern
in Japanese internal state predicates in (16) is subjective by itself.
This contrast in the default structure status is in fact compatible with other
observations about them. The compositional status of the subjectivity of the spatial configuration patterns is evident in Langacker’s description of the subjective
construal as the extension of an objective one, and the fact that in some situations,
like in (12) above, the most subjective construal becomes infelicitous.
The lexical and default status of the subjectivity of the Japanese internal state
predicate patterns in (16) is nicely in line with its typological markedness pattern.8 Japanese internal state predicates can be used in three patterns: with the
unexpressed first person cognizer, with the explicit first person cognizer, and with
the third person cognizer. Of the three, the pattern with the third person cognizer
is the most marked, since the predicate has to be structurally marked (i.e., with
evidential markers, as in §2). Of the two less marked patterns with the first person cognizer, the unexpressed cognizer pattern is the less marked one behaviorally
since it is the typical and most frequently attested pattern for them (as in §2).
Thus, in the case of internal state predicates in Japanese, the markedness order (from less to more marked) is: (i) the subjective construal of the speaker (i.e.,
(16b)); (ii) the more objective construal of him/her (i.e., (16a)); (iii) the objective
construal of the third person (e.g., (3) and (4)). In fact, to account for the obligatory use of
evidential markers in the most objective pattern, it can be postulated that the most
subjective construal (the first person cognizer) is so inherent to the semantic structure of internal state predicates in Japanese that the maximally objective
construal, where the third person takes the cognizer role, is rendered unacceptable
as it is. Such obvious markedness patterns, structural or behavioral, do not appear
to exist in the example of spatial relations.
In sum, the two conceptual structures, spatial configuration expressions and
Japanese internal state predicates, resemble each other as seen in the previous section, but the similarity is present only at the composite structure level. They
differ in their compositional pattern, and, more importantly, in their component
lexical structure.
The role in the event structure
The analysis of Japanese internal state predicates so far presented offers an interesting account of the nominative ga for object marking, the other of their
characteristic properties. In Cognitive Grammar, the grammatical subject is defined as the trajector of the clause level relational predication, and the trajector is
in turn defined as the primary figure in a profiled relation. The most salient figure
in the unmarked type of conceptual structure (the most subjective construal) of
internal state predicates in Japanese is what the speaker conceives – i.e., the object of her cognition (see Figure 4b and, more specifically, Figure 5b). In fact, it is
the only profiled ‘thing’ in her conceptual space,9 where the construal of the cognizer speaker is subjective and she recedes from the scope of her conception into
its background. This means that this cognized thing is the trajector, and therefore
the subject rather than the object of the predicates. Thus, the use of the nominative case marker ga for the cognized thing is not without reason: the cognized
thing is marked with the nominative because it is the subject in the lexical semantic
structure of internal state predicates in question.
Figure 6.
This pattern parallels that of the aforementioned deictic verb kuru, ‘come’,
where the thing moving toward the speaker is the trajector/subject and is marked
with the nominative marker ga in Japanese. Let us compare the scene depicted by
an internal state predicate hosii ‘want’ in Figure 5b to that for the deictic verb come
in Figure 6 [slightly modified from a visual aid for the verb come in Hatasa et al.
(www.sla.purdue.edu/fll/JapanProj/FLClipart/)].
In Figure 6, the thing/person moving toward the speaker is the object of perception (or ‘the perceived thing’) from the vantage point of the speaker at the goal
point of the profiled movement, but its position in the overall conceptual structure
(with the speaker simply assumed in the background) motivates it being marked
with the nominative.
This analysis of the cognized entity as the subject, rather than the object of
internal state predicates in Japanese naturally raises another question regarding
the status of the other overt nominal (i.e., the speaker) in the somewhat objective construal in Figures 4a and 5a. The present analysis, accordingly, analyzes it
more or less as an entity in the discourse space, somewhat external to the lexicosyntactic semantic structure of the internal state predicates. As Uehara (1998b)
has shown, the speaker is made overt only when discourse factors (e.g., focus)
favor explicit mention of the cognizer role for Japanese internal state predicates.
In other words, the somewhat objective construal in Figure 4a holds only when
the discourse space is evoked. Hence, the overt cognizer role is in the discourse
space, and need not be assigned any grammatical role in the lexical or sentence
level conceptual structure.10
This can be illustrated again with the parallel case of the deictic motion verb
kuru, ‘come’. The speaker’s role as the perceiver of the thing moving toward her is
crucial in the deictic motion profile whether she is linguistically encoded or not.
However, its importance lies in its function as the vantage point for the conceptualization, and differs from that of the subject, the most salient figure in a relational
profile. Thus, the moving thing is the subject; the cognizer speaker is not.
A large corpus text analysis conducted and published in 1997 by Kokuritu
Kokugo Kenkyujo (The National Language Research Institute) strongly supports
the claim here. They examined a total of 3146 instances of the nominative particle
ga in the text corpus, and made a list of all the semantic roles they occupy and
the predicates they occur with. All and only instances of ga for such internal state
predicates as listed in (7) above are used with what is cognized (among what they
call ‘objective’). They attested instances of ga with the cognizer role for internal
state predicates as well (among what they call ‘experiencer’), but all their predicate
forms are marked with the evidential marker -garu, ‘to show the signs of ’, such
as hosi-garu, ‘show signs of wanting’, indicating that the cognizer in question is
not the speaker (see (4) in §2). Unfortunately, they do not extend their analysis
to cover the 3477 nouns marked with the topic particle wa in the data, except that
they note that they would posit ga as the underlying case marker for 3382 instances
(97.3%) of them (1997: 210). Therefore, the text discourse data shows that only
the cognized thing is marked with the nominative particle ga for the unmarked
forms of internal state predicates, while for their forms with evidential markers,
the (third person) cognizer is marked with ga. The first person cognizer, when
overt, is most likely to be marked with the discourse marker of topic.
In sum, the cognitive semantic analysis presented here can nicely account for
the otherwise problematic use of the nominative particle ga for the grammatical
object, as well as the other characteristic properties of internal state predicates in
Japanese. Their lexical semantic structure is inherently specified with the speaker
as the cognizer, which is realized as their person restriction to the effect that some
extra morpheme (evidential marker) is required to rid themselves of the lexical
specification in expressing the third person’s internal states. The same inherent
specification with the speaker leads to their default and most frequent use in which
the cognizer speaker is assumed and covert. The cognizer role of Japanese internal
state predicates, which is specified with the speaker, is in the background (like the
speaker’s role in the verb come), leaving the stimulus role as the most salient figure
in the onstage region – the best candidate for the predicate subject – and thus motivating its nominative marking. The construal patterns of internal state predicates
in Japanese motivate their behavior in morphology, syntax and discourse.
Subjective construal in the Japanese language
The discussion so far demonstrates that internal state predicates in Japanese can
be best characterized as ‘deictic’ verbs (Langacker 1985) on a par with the verb
come in English or kuru, ‘come’, in Japanese. Their lexical semantic structure is
characterized with the inherent presence of the speaker, or the ground element,
in the base somewhere off of its central, onstage region. What they profile is the
object of conception from the vantage point of the speaker. In the case of internal
state predicates in Japanese, what they profile is what is cognized by the speaker,
where the speaker’s role as the cognizer is assumed and in the background.
The fact that the lexical entries of a deictic nature exist not only for motion
verbs but also for internal state predicates in Japanese, and that such internal state
predicates are large in number and productive as well in forming other subjective
predicates combining with stem forms of verbs in general, suggests the following:
the speaker’s perspective is rather frequently assumed in Japanese and prominent
for the viewing arrangement for expressions in the language. In other words, the
subjective construal more or less represents the unmarked pattern in Japanese,
unlike other languages such as English.
Some universalists may doubt that the speaker’s perspective is typically assumed and represents the unmarked pattern in Japanese, despite strong support
from recent studies on Japanese discourse (Iwasaki 1993; Uehara 1998b). These
studies have demonstrated that the relationship between the speaker’s perspective and the sentential subject in Japanese is more direct than in other languages like
English. For example, Uehara (1998b) analyzed an English story and its Japanese
translation, and observed that when the speaker is coded as the subject in English
sentences such as in (17), these are typically rendered into Japanese like in (17’)
[see also Iwasaki 1993: 80]:
(17) Then I saw a girl standing there.
(17’) Suruto, onnanoko ga soko ni tatte ita.
then a girl nom there at standing be.past
(lit.) “Then, a girl was standing there.”
In the contrastive patterns in (17) and (17’), the speaker’s role as discoverer of a
situation, which is encoded explicitly as the subject of the main (discovery) predicate in English, is structurally missing altogether along with the predicate of the
discovery in the typical Japanese construal. Here again, the speaker’s perspective is
assumed and linguistically unencoded in Japanese.
Getting back to the cognitive analysis of the nominative ga for the grammatical object, we can see the parallelism in the sentences in (17) and (17’). One can
see that a girl is the grammatical object in, I saw a girl standing there, but not in,
a girl was standing there. Ga appears to mark the grammatical object only if we
(force ourselves to) take the more objective construal typical in English, for the
originally ‘subjective’ predicates in Japanese. The ga-marked nominal is in fact
functioning as the grammatical subject in the unmarked subjective conception of
them, as in (17’).
The cross-linguistic variation in the unmarked viewing arrangements pointed
out here has a profound implication for theories of verbal semantics. In cognitive as well as many other linguistic theories, discussions of event structures
assume the optimal viewing arrangement as the unmarked one, where “the roles
of the observer and the observed are fully distinct” (Langacker 1991b: 550). Thus,
in Cognitive Grammar the ‘canonical’ event model represents “the normal observation of a prototypical action ... a single event observed from a vantage point
external to its setting” (1991b: 545 [emphasis added]), and accordingly, the canonical viewing arrangement “incorporates the canonical event model” [emphasis in
the original].
Languages like Japanese, therefore, present a counter-example to the canonical status of such a viewing arrangement. In the viewing arrangement prominent
in Japanese, the roles of the observer and the observed are not fully distinct, and
events are frequently observed from a vantage point internal to their setting. The
canonical event model is in fact canonical as the event structure model for languages like English, but not so for Japanese. The term ‘canonical’ in the ‘canonical
event model’, then, should not be taken in the cross-linguistic sense, but in a
language-specific sense.
Taking a cross-linguistic approach and characterizing the event structure
model in terms of different viewing arrangements does not, of course, undermine
any of the previous, fruitful work in verbal semantics. Rather, it adds another
dimension to them, by making it possible now to discuss the canonical viewing
arrangement and the canonical event model for each language, and differences
in the degree of subjectivity among languages’ canonical viewing arrangements.
This newly added dimension is the “subjective axis” (Achard 1996: 1168) viewing arrangement, and without it events are assumed to hold in the “objective
axis” (Achard 1996) alone. The subjective axis is, as it were, the ‘third’ dimension
added to the former, ‘two dimensional’ event structure model. (Event structures
are depicted two-dimensionally, anyway.) The cognitive analysis presented above
for the problematic case marking of internal state predicates in Japanese thus represents an approach using this new event structure model. The traditional ga for
the grammatical object analysis, on the other hand, is a natural consequence of its
theoretical presupposition that event structures be of two dimensions.
Furthermore, the current research can arguably shed new light on the study of
linguistic subjectivity itself. For example, the existence of speaker-perspective oriented languages like Japanese seems to suggest cases of objectification, as opposed
to those of subjectification, exemplified by Langacker (1991a) with the synchronic
compositional process of spatial configuration expressions.11 The semantic compositional process of internal state predicates in Japanese, in fact, represents the
synchronic process of objectification: lexically subjective, internal state predicates
become less subjectively construed in their compositional process to the sentence
and discourse levels. Taking an evidential marker in expressing the third person’s internal states as in (3) and (4) is literally a process of objectifying lexically
subjective, internal state predicates, with the evidential marker functioning as an
objectifying morpheme.
Cases of the diachronic objectification process in Japanese must await future research,12 but Uehara’s (1998a) analysis of the major parts of speech in
Japanese suggests a positive outlook. Uehara has shown that adjectival predicates
are structurally divided into two categories, (Canonical) Adjectives and Nominal Adjectives, and that the former are represented by words in the native stratum,
while the latter are words of later coinage, mostly of non-native origin. Interestingly, most, if not all, subjective predicates (including all of those listed in (7)
above) belong to the Canonical Adjective category. This suggests that later additions to the language’s lexical stock are all words of non-subjective type, possibly
leading to an objectification of the overall lexical structures. Of course, further and
more detailed research on this aspect of linguistic subjectivity is necessary.13
Conclusion
In this paper I have given a cognitive semantic account for the grammatical behaviors exhibited by internal state predicates in Japanese. In this account, their
grammatical behaviors, such as ga being used for object marking and the impossibility of a third person subject, all come together and have conceptual motivations.
The analysis of internal state predicates in Japanese presented here crucially relies on, and therefore provides support for, the Cognitive Grammar theory of
subjectivity. This approach has cross-linguistic applicability in that it is possible to uniformly examine and analyze subjectivity-related phenomena in more
than one language, and thus provide a mechanism for considering cross-linguistic
variations in subjectivity, as well as handle language-specific realizations of this
phenomenon.
Notes
* Earlier versions of this paper were presented at the Research Issues for Cognitive Linguistics
Workshop at the Australian Linguistics Institute in July 1998; and at the Fourth Conference
on Conceptual Structure, Discourse and Language, in October 1998. I am grateful to the audiences
of those conferences, especially to Eve Sweetser and Ron Langacker for their valuable
comments. I would also like to thank June Luchjenbroers, the editor of the volume, and anonymous reviewers for their insightful comments and encouragement. Bob Sanders and Andrew
Barke also deserve my thanks for checking my English. All the remaining faults are my own. The
research was supported in part by a 1998 Grant-in-Aid from the Ministry of Education, Science and Culture (# 10680303). All correspondence concerning this article should be sent to:
Satoshi Uehara, Graduate School of International Cultural Studies, Tohoku University, Japan.
Email: [email protected]
1. The following abbreviations are used in the gloss: ACC = Accusative; GEN = Genitive;
NOM = Nominative; NONPAST = Nonpast tense marker; PAST = Past tense marker; PL =
Plural; POL = Politeness marker; SG = Singular; TOP = Topic marker.
2. In normal circumstances, the subjective predicates in Japanese can be used only with the 1st
person singular experiencer. Only in some contexts, however, where the speaker can know and
representatively express the emotion shared by her group members, is the plural form possible:
(18) Konna kekkoona mono o itadaite, watasitati wa hizyooni uresii desu.
this valuable thing acc receiving 1.pl top extremely glad.nonpast pol
‘We are extremely glad receiving this valuable thing (from you).’
It should also be noted here that the term subjective subsumes a number of related concepts
even in cognitive linguistics alone, and that the term here is different from that in Achard (1996),
which analyzes the French equivalent of want (vouloir) to be inherently subjective. The current
analysis examines the subjectivity of main predicates while his discusses that of complement
clauses. The following French data from his analysis (slightly modified) shows that the person
restriction dealt with here for Japanese subjective predicates does not apply to French vouloir ‘want’:
a. Je veux revenir. ‘I want to come back.’
b. Jean veut revenir. ‘John wants to come back.’
3. Similar distinctions in mode of discourse in other languages are noted in Benveniste (1971)
and Banfield (1982). Banfield (1982: 12), for example, compares Kuno’s ‘non-reportive’ style in
Japanese to “a literary style known to modern grammarians under the French term style indirect
libre and the German erlebte Rede.”
4. There are predicates, other than subjective ones, which also form the double nominative constructions in Japanese (e.g. dekiru ‘capable/possible’ as in Taroo wa tenisu ga dekiru (Taro top
tennis nom capable.nonpast) ‘Taro is capable of/good at playing tennis.’). See Croft (1991) for
an analysis from a cognitive and typological perspective on such non-subjective, double nominative (and other non-canonical case-marking) constructions in Japanese and other languages.
The double nominative construction in Japanese arguably constitutes a case of constructional
polysemy, where non-subjective senses can be argued to be extensions from the core, subjective
ones. However, it is beyond the scope of the current paper. See the discussion on the diachronic
objectification process below for a relevant point.
5. Thus, Langacker intends that (10a) is spoken by someone other than Veronica: As shown
in Figure 1a, Veronica is fully distinct from the speaker as his object of conception O in (10a).
Interestingly, however, we can also think of a situation in which (10a) is spoken by Veronica
herself (a more felicitous example would probably be Talk to George uttered by the former US
President George Bush pointing to himself). Such an interpretation of (10a), which is called
(10a’) here, is more subjective than (10a) because the former does refer to the speaker. (10a’) is
more objective than (10b) because the former uses a non-deictic expression.
It should be noted in this connection that Langacker (1991a) discusses the ‘maximally objective’
level (e.g. Vanessa jumped across the table), compared to which, the senses of across in (10a) and
(10b) are more subjective in that they represent not concrete motion, but the abstract construal
of a conceptualizer (the speaker) tracing a mental path “in order to locate the trajector vis-à-vis
the reference point” (ibid.: 327). This is another kind of subjectification phenomenon and is often
referred to as “subjective motion”.
6. Traugott and Dasher (2002: 98) argue against Langacker’s analysis here, observing
that the reference point is not necessarily the speaker but could be someone else for
sentence (11b), given certain contexts for it. However, they seem to be confusing “maximally
subjective” in the basic level and that in the secondary level or the cases of what I call “perspective transfer” (Uehara 1998b), where the speaker/narrator more or less takes, and describes the
situation from, the perspective of someone else. The case under discussion for (11b) is the basic
level case where the speaker is the default reference point. (11b) in that sense depicts what the
speaker sees in real time.
It should also be noted in passing that according to Traugott (1995), this principle that zero
expression is iconic of maximum subjectivity holds only for subjectification processes involving “constructions which originate in argument structure (events, particularly motion events,
and the participants in them)” (Traugott 1985: 48), but not for those of others such as adversative connectives and focus particles. In contrast with Langacker’s, Traugott’s discussion of
subjectification originated with the latter.
7. Without that specification, the viewpoint can be anybody else’s, and such scenes are objectively construed ones, like the one for the expression Sake ga aru (sake nom exist) ‘There is some
sake.’
8. The concept of markedness, as typologically interpreted, can handle multi-valued categories
(e.g. “most/more/less/least marked” in addition to “marked/unmarked” in the classical two-valued model) and thus connect markedness patterns to typological hierarchies. Please see Croft
(1990), for example, for a detailed discussion of the model of typological markedness.
9. ‘Thing’ here is a technical term in Cognitive Grammar, which designates a region in some
domain, and corresponds to the semantic pole of a noun.
. It should be noted that whether a certain role is external or internal to the lexical semantic
structure is a matter of degree (Cognitive Grammar assumes the lexicon-syntax-discourse continuum). Thus, some internal state predicates in Japanese may have their cognizer role almost as
inherent as those internal to them, which may explain why many take the cognizer role for the
subject argument of such internal state predicates degrading their stimulus role to their object
argument.
. Traugott (1995) points out that “Langacker’s work focuses primarily on ‘subjectivity’ as a
gradient phenomenon found synchronically,” (p. 32).
. Some native speakers of Japanese have pointed out that in their speech some of the internal
state predicates no longer require any evidential markers in expressing the third person’s internal
states. If research finds that more internal state predicates have lost their person restriction in
more people’s speech, it can constitute an instance of diachronic objectification.
Internal state predicates in Japanese 
. Ikegami (1999: 93) claims that the so-called “pro-drop” nature of Japanese, whereby third-person (and other person) subjects are frequently omitted, is the result of an extension from the
“cognizer-less” nature of “prototypical” subjective predicates.
References
Achard, Michel (1996). Perspective and syntactic realization: French sentential complements.
Linguistics, 34, 1159–1198.
Aoki, Haruo (1986). Evidentials in Japanese. In Chafe & Nichols (Eds.), Evidentiality: The
linguistic encoding of epistemology. Norwood: Ablex.
Backhouse, A. E. (1993). The Japanese language: An introduction. Oxford: Oxford University
Press.
Banfield, Ann (1982). Unspeakable sentences. Boston: Routledge and Kegan Paul.
Benveniste, Emile (1971). Problems in general linguistics. (Translation by Mary E. Meek). Coral
Gables: University of Miami Press.
Croft, William (1990). Typology and universals. Cambridge: Cambridge University Press.
Croft, William (1991). Syntactic categories and grammatical relations: The cognitive organization
of information. Chicago: University of Chicago Press.
Ikegami, Yoshihiko (1999). Nihongo rashisa no naka no “syukansei” (“Subjectivity” in Japanese-like-ness). Gengo (Language), 84–94. Tokyo: Taishûkan.
Iwasaki, Shoichi (1993). Subjectivity in grammar and discourse: Theoretical considerations and a
case study of Japanese spoken discourse. Philadelphia, PA: John Benjamins.
Kokuritsu Kokugo Kenkyûjo (1997). Nihongo ni okeru hyôsôkaku to shinsôkaku no taiôkankei
(Cases and Japanese postpositions), Kokuritsu Kokugo Kenkyûjo Report 113. Tokyo:
Sanseidô.
Kuno, Susumu (1973). The structure of the Japanese language. Cambridge, MA: MIT Press.
Kuroda, S.-Y. (1973). Where epistemology, style and grammar meet: A case study from Japanese.
In S. R. Anderson & P. Kiparsky (Eds.), A Festschrift for Morris Halle (pp. 377–391). New York:
Holt, Rinehart and Winston.
Langacker, Ronald W. (1985). Observations and speculations on subjectivity. In Haiman (Ed.),
Iconicity in Syntax. Amsterdam/Philadelphia: John Benjamins.
Langacker, Ronald W. (1987). Foundations of Cognitive Grammar. Vol. 1, Theoretical prerequisites.
Stanford: Stanford University Press.
Langacker, Ronald W. (1991a). Concept, image, and symbol: The cognitive basis of grammar.
Berlin: Mouton de Gruyter.
Langacker, Ronald W. (1991b). Foundations of Cognitive Grammar. Vol. 2, Descriptive application. Stanford: Stanford University Press.
Traugott, Elizabeth Closs (1995). Subjectification in grammaticalisation. In Stein & Wright
(Eds.), Subjectivity and subjectivisation: Linguistic perspectives (pp. 31–54). Cambridge:
Cambridge University Press.
Traugott, Elizabeth Closs & Richard B. Dasher (2002). Regularity in semantic change. Cambridge:
Cambridge University Press.
Uehara, Satoshi (1998a). Syntactic categories in Japanese: A cognitive and typological introduction.
Tokyo: Kurosio Publishers.
Uehara, Satoshi (1998b). Pronoun drop and perspective in Japanese. In Akatsuka, Hoji, Iwasaki,
Sohn, & Strauss (Eds.), Japanese/Korean linguistics Vol. 7 (pp. 275–289). Stanford: CSLI.
chapter 
Figure, ground and connexity
Evidence from Xhosa narrative
David Gough
Christchurch Polytechnic Institute of Technology, New Zealand
This article examines how the discourse concepts of background/foreground and
connexity are significant organisational factors in the Xhosa verbal system. After
defining background/foreground and connexity, evidence from Xhosa folk
narrative is presented to show how these features provide a richer and more
coherent explanation of the structure of the verbal system than the traditional
analysis. The article advocates an orientation to language which holds that its
nature can and should be explained in terms of discourse/cognitive factors rather
than seeing it as a discrete, separate and internally describable ‘module’.
Keywords: background, foreground, connexity, discourse, Xhosa (Bantu)
.
Introduction
In this paper the perspective taken is that discourse pragmatic and cognitive factors are essential to understanding the way in which language is organised and
structured. Such an approach focuses on the notion that the nature of language
can and, indeed, should be described in terms outside language itself – beyond,
that is, the dominant conceptualisation of language which sees it as an independent mental module or faculty and therefore being the way it is because it is the
way it is (for a convincing critique of this perspective, see Givón 1990). This paper
will argue specifically that the concepts of ‘grounding’ and ‘connexity’ contribute
significantly to explaining the structure of, at least, the Xhosa verbal system. It will
do so by focussing on data taken from folk narratives.
. Connexity/dependence
Analysis of discourse reveals a basic principle: the relative syntactic dependence
of a clause signals its relative conceptual connection or integration to its discourse
context (Gough 1986: 79; see also Givón 1990 for a similar perspective).
The concept of connexity in this regard refers to the degree to which information in a particular clause is seen as, on the one hand, independent from or, on
the other hand, integrated with or ‘dependent on’ the information in a previous
clause. While there are a variety of lexical and grammatical features that encode
such connexity (cf. Thompson 1987), of particular interest to this paper is the fact
that in many languages ‘dependent’ or ‘subordinate’ verb forms tend to display fewer of the
prototypical features of verbs, such as tense-aspect, modality and agreement (see
Gough 1986), which together constitute the traditional category of ‘finiteness’ (see
also Givón 1990; Carlson 1992).
. Background and foreground information
The distinction between background and foreground is, of course, basic to human perception. It is also one of the most basic concepts in discourse analysis (see
Wallace 1982; Givón 1987; and Tomlin et al. 1997 for overviews). In metaphorical terms the foreground event clauses of a narrative form its skeleton – its basic
structure, which advances the story itself. The event clauses are arranged in terms
of temporal sequence forming an event line. According to Hopper and Thompson
(1980) background information adds flesh to this skeleton, not advancing the story
but rather characterising the backdrop against which the story develops. For this
reason it is also known as durative descriptive information (to be referred to as d/d
information in the discussion below).
(1) a. Yahamba lahamba.
       He travelled and travelled.
    b. Lithe lisahamba njalo ladibana nomvundla.
       While he was so travelling, he met a rabbit.
    c. Lafika ijoni labuza kumvundla ukuba khange liwubone umvundla.
       The soldier arrived and asked the rabbit whether it had seen a rabbit at all.
    d. Umvundla lo nawo wayenxiba indevu apha phezu komlomo.
       (The rabbit was wearing a moustache here above the mouth)
    e. Wabuza umvundla, ‘kunjani lo mvundla uwufunayo?’
       The rabbit asked, ‘What’s this rabbit like that you’re looking for?’
    f. Lathi elijoni ukuphendula ukuphendula, ‘Ufana nawe.’
       The soldier answered, ‘He looks like you.’
    g. Wathi umvundla, ‘Hayi, zange ndiwubone umvundla oneendevu.’
       The rabbit said, ‘I’ve never seen a rabbit with a beard.’
    h. Wathi umvundla, ‘Hayi, hamba, mhlawumbi uphazamile.’
       The rabbit said again, ‘No, go, maybe you’re mistaken.’
    i. Hayi ke, nejoni laqonda okokuba mhlawumbi liphazamile.
       Anyway, the soldier too thought he was perhaps mistaken.
    j. Lahamba, labuyela umva.
       He travelled and went back.
    k. Lithe lisahamba njalo, laqonda ukuba, ‘Hayi. . . ’
       While he was so travelling, he thought, ‘No. . . ’
Here we may note that each successive event clause advances the story line and
that it is either temporally or causally consequential to the clause that precedes it.
Changing the order of any of these clauses would change our interpretation of the
events they encode. The d/d information, however, is off the event line. We may
note that (d) for example, is not temporally or causally related to the events that
precede or follow it. Rather it represents parenthetical background information
necessary for the comprehension of the events.
In conceptual terms, the distinction between durative descriptive and foregrounded ‘event’ information can be seen in terms of temporal grounding. Such
temporal grounding is parallel to the organisation of visual information. According to Eysenck (1984: 33) a fundamental way in which visual information is organised is the ‘segregation of the visual field into one part called the figure and
another part called the ground’. In general, the figure has ‘thing-like’ qualities, is
well-defined and bounded; while the ground in which the figure is perceived is,
in contrast, continuous, less definite and boundless. An example of this is the figure of a house perceived against the background of the sky. Events can be seen as
temporal figures: perceived as temporally bound and discrete against a temporal
background of continuous and durative situations. Such grounding, which is basic to perception, thus also appears to form an important organisational principle
in language. Wallace (1982: 214), for instance, presents the hypothesis that certain linguistic categories “function to differentiate linguistic figure from linguistic
ground”, a perspective more recently illustrated in Langacker’s Cognitive Grammar, as discussed by Ungerer and Schmid (1996). Similarly, Longacre (1981: 329)
notes that the figure-ground categories, once distinguished solely on a semantic basis, are “more and more seen to correlate with the morphosyntactics of the world’s
languages”. My analysis supports this particular perspective.
. Some points on past approaches
The framework sketched above is of significance for two broad reasons. Firstly, it
allows an alternative analysis of Xhosa verbal categories as opposed to the dominant taxonomic framework which continues to hold sway in the categorisation of
Bantu verbal forms. Traditionally, this framework, known as the Dokean framework,
uses the terms ‘mood’ and ‘tense’ as convenient labels to refer to quite
diverse verbal categories. (The framework was named after the Bantuist Clement
C. Doke. For a critique see Khoali 1993.) The result was a mixed bag of verbal
inflections all falling under the same general rubric with little indication of systematicity. Mood itself is rather vaguely defined (beyond simply being a ‘form of the
verb’), and the discussion of these various ‘forms’ does not posit any underlying systematicity. Traditional accounts (e.g. Davey 1973) simply name the
‘moods’ (typically indicative, consecutive, participial and subjunctive) and then go
on to describe these as disparate items. With regard to two of the verbal categories
discussed in this chapter, for instance, the consecutive is described simply as a ‘subordinate mood type’ signalling consecutive actions in the past (Davey 1973: 106),
while the participial mood is merely described as a verb form used for actions simultaneous to those in a main clause (Davey 1973: 106; Du Plessis 1978: 135). As
will be demonstrated in more detail in this chapter, these verbal categories appear,
however, to be systematically structured around key discourse concepts.
The second reason why a functionally oriented framework is significant is
that it allows some insight into some of the debates that have emerged on discourse pragmatic accounts of grammar, particularly surrounding the coding of
foreground and background information. Research has specifically questioned the
claim that there is a straightforward relationship between subordinate clauses and
background information on the one hand, and between independent clauses and
foreground information on the other (cf. Thompson 1987; and Wallace 1987 for
some questions in this regard). As grounding and dependence are treated separately here, this may present an alternative approach to the issues involved.
. Connexity, grounding and Xhosa narrative
The concepts of grounding and connexity referred to above appear to form the
organisational basis of a good deal of the Xhosa verbal system. In particular I will
show that the verbal forms referred to as the participial, consecutive and indicative moods, as well as the so-called ‘continuous tense’, rather than being isolated
grammatical structures, form a sub-system that is structured around grounding
and connexity.
The consecutive mood
The consecutive marker is -a- (to be referred to here as cons). The structure of the
consecutive is: Subject concord-a-Verb Stem, for example,
(2) ixhego  li-a-thetha > ixhego lathetha
    old-man he-cons-talk
    “and the old man spoke”
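The template Subject concord-a-Verb stem can be sketched in a few lines of Python. This is a minimal illustration, not a morphological analyser: the fused prefixes are listed by hand rather than derived by phonological rule, and only surface forms that occur in this chapter's examples are included. The underlying concords other than li- are assumptions made for the sketch, as are the names FUSED_CONSECUTIVE_PREFIX and consecutive.

```python
# Illustrative sketch only. The table lists "subject concord + -a-" as it
# surfaces in forms quoted in this chapter; it is not a full paradigm.
# Underlying concords other than "li" are assumptions of this sketch.
FUSED_CONSECUTIVE_PREFIX = {
    "li": "la",   # li-a-thetha > lathetha
    "u": "wa",    # wathenga, wagoduka
    "ku": "kwa",  # kwasuka, kwaphuma
    "ba": "ba",   # baya
}


def consecutive(subject_concord: str, stem: str) -> str:
    """Return the surface consecutive form for a listed subject concord."""
    if subject_concord not in FUSED_CONSECUTIVE_PREFIX:
        raise ValueError(f"concord {subject_concord!r} is not in the sketch table")
    return FUSED_CONSECUTIVE_PREFIX[subject_concord] + stem


if __name__ == "__main__":
    for concord, stem in [("li", "thetha"), ("u", "thenga"), ("ku", "suka")]:
        print(f"{concord}-a-{stem} > {consecutive(concord, stem)}")
```

A lookup table is used instead of a vowel-coalescence rule so that the sketch claims nothing about Xhosa phonology beyond the forms actually cited.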
The consecutive has been traditionally described as a ‘subordinate mood type’
with the function of, inter alia, encoding consecutive actions in the past (Davey
1973: 106). Consider the following example:
(3) UThemba uye            evenkileni wathenga    ukutya
    Themba  he-perf-ind-go loc-shop   he-cons-buy food
    wagoduka
    he-cons-go-home
    “Themba went to the shop, bought food and went home.”
Here the first (non-consecutive) clause of the sentence uses the ‘independent’ indicative mood (perfect) while the following (consecutive) clauses use the dependent
consecutive mood. Connection is thus not expressed through an overt conjunction
such as ‘and’ in English, but rather through a verbal inflection.
It is significant to note that the consecutive is not marked for tense; it inherits
polarity from a preceding main clause; and it has limited aspectual marking in
relation to verb forms traditionally regarded as ‘independent’. It is thus marked
for less finiteness than other verb forms. The following is a textual example of the
consecutive taken from a folk narrative:
(4) a. wabetha     kuyo  ephondweni
       he-cons-hit to-it loc-horn
       “He hit it on the horn”
    b. kwasuka    kwaphuma         ukutya
       it-cons-go it-cons-come-out food
       “and some food came out”
    c. watya
       he-cons-eat
       “and he ate”
    d. wahlutha
       he-cons-full
       “and got full”
    e. wagoduka
       he-cons-go-home
       “and went home.”
The consecutive according to this approach encodes two things: connexity and
foregrounded event information. Unlike the indicative past or perfect, the consecutive is marked for connexity, signalled by its less than finite form, to the clause
that precedes it. Furthermore, unlike the participial which, as we shall see, also
encodes such connexity, it does not involve a focus on the internal structure of
the situation it encodes. All the consecutive clauses in (3), for example, refer to
temporally bounded situations that move the time of the story forward, and all
can be answers to the question, “what happened then?”. With no focus on either
the internal structure of the situation or its temporal orientation, the focus of the
consecutive is the occurrence of the event itself.
If the consecutive signals connexity, then breaks in the conceptual relatedness
of the narrative should be indicated by the non-use of the consecutive. In such
places the so-called independent indicative mood should occur. This is indeed
supported by the following example (here indperf indicates the indicative perfect) taken from a Xhosa folk narrative. Note here that example (4.2) follows on
from example (4.1).
(4.1) a. hayi ke uhambile        ke   umntwana nenqwelo      yakhe
         no-then she-travel-perf then child    with-carriage of-her
         “So then, the child travelled with her carriage.”
      b. wayifihla       ke   lo   mntwana inqwelo  etyholweni
         she-cons-it-hid then this child   carriage loc-bush
         “Then the child hid the carriage in the bush.”
      c. wafika           apha emdanisweni
         she-cons-arrived here loc-dance
         “She arrived here at the dance.”
      d. yaye   inkosi idanisa       nezaa      ntombi zimbini
         he-pct chief  he-part-dance with-those girls  they-two
         “The chief was dancing with those two girls.”
(4.2) a. hayi okunene uyithathile      le   ntombi isangena       emnyango
         no   truly   he-her-take-perf this girl   she-part-enter loc-doorway
         “So then truly, he took the girl as she entered the door.”
      b. wayixhwila        ngoko
         he-cons-her-twirl then
         “He twirled her around then,”
      c. wathi       nanku   umfazi ungenile
         he-cons-say here-is wife   she-part-enter-perf
         “and said, ‘This is my wife, she has entered,’”
      d. wadinisa      naye     ngobusuku  bonke
         he-cons-dance with-her with-night all
         “and he danced with her the whole night.”
In both of these cases, the sections are distinct: in Givón’s terms (1990: 826), there
is a thematic break between these sections. In (4.1) the common orientation of the
clauses is the series of events leading up to the girl’s arrival at the chief’s party. The
ideas in (4.2) are distinct from those in (4.1) as the orientation now switches to
focus on the chief’s actions. Just as there is a break in conceptual connexity, there
is a matching break in syntactic connexity or dependence with the occurrence of a
clause using the indicative mood.
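The prediction that a thematic break coincides with an indicative clause can be illustrated with a small Python sketch. It takes the verb-form tags from the glosses of (4.1) and (4.2) and opens a new section at every indicative perfect clause. The function name split_at_thematic_breaks and the decision to let a pct clause stay inside the current section are assumptions of the sketch, not part of the analysis above.

```python
from typing import FrozenSet, List, Tuple

# (clause label, verb-form tag) pairs read off the glosses of (4.1) and (4.2).
CLAUSES: List[Tuple[str, str]] = [
    ("4.1a", "indperf"), ("4.1b", "cons"), ("4.1c", "cons"), ("4.1d", "pct"),
    ("4.2a", "indperf"), ("4.2b", "cons"), ("4.2c", "cons"), ("4.2d", "cons"),
]


def split_at_thematic_breaks(
    clauses: List[Tuple[str, str]],
    break_tags: FrozenSet[str] = frozenset({"indperf"}),
) -> List[List[str]]:
    """Group clause labels into sections, opening a new one at each break tag."""
    sections: List[List[str]] = []
    for label, tag in clauses:
        # Assumption: only the tags in break_tags signal a break in connexity.
        if tag in break_tags or not sections:
            sections.append([])
        sections[-1].append(label)
    return sections


if __name__ == "__main__":
    for section in split_at_thematic_breaks(CLAUSES):
        print(section)
```

On this input the sketch yields two sections, one per numbered example, which matches the division of the narrative described above.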
The participial mood
The form of the positive participial is: Subject Concord + Verb stem, for example,
(5) ixhego  li-cula > ixhego licula
    old-man he-part-sing
    “the old man singing”
The participial morpheme itself is realised supra-segmentally through certain perturbations of the tonal form of the verb (historically as the result of a high-toned
morpheme -*ki-) and it has a durative significance. It occurs in subordinate clauses
and while it displays both polarity and a range of aspectual markings, it is not
marked for tense, with the time orientation being an inherited feature of the
associated independent clause.
Consider the following individual examples with their associated discourse
contexts:
(6) a. baya         emdanisweni elila           njalo     lo   mntwana
       they-cons-go loc-dance   she-part-crying like-this this child
       “They went to the dance, this child crying so.”
    b. wahamba         ethwela        umthwalo
       she-cons-travel she-part-carry load
       “Then she travelled, carrying her load.”
    c. wafika         engekho
       he-cons-arrive she-neg-part-there
       “Then he arrived, she not being there.”
Traditionally participial clauses of the above type have been described as a mood
type occurring only in subordinate clauses and encoding actions simultaneous to
those in the main clause (for example, Du Plessis 1978: 135). If this were an adequate description, then the information encoded in the participial would have the
same status as that encoded in consecutive clauses, that is, foreground
events. However, it appears that the information is of a different status, encoding instead background information as defined above. The participial clauses in
the examples above, as well as participial clauses more generally, do not, I claim,
code events and thus do not form part of the event line advancing the story.
They, like the consecutive, encode syntactic connexity to the clauses they follow.
Unlike the consecutive, however, they are marked for ‘durative’ aspect, and thus,
rather than representing actions or events, they encode unbounded temporally
continuous situations. It is in terms of these situations that the associated consecutives, representing bounded events, are foregrounded. The situation is therefore not,
as traditional descriptions would have it, simultaneous to the event, but forms,
rather, its durative background so that the bounded and momentary event is located within the temporally durative framework established by the participial. In
(6a) above, for example, the event of the girl’s going to the dance is given the
temporal backdrop of the girl’s crying and in (6b) the girl’s travelling is similarly
located in the durative backdrop of her carrying a load. Neither of these clauses
contributes to the movement of narrative time.
Research into the participial in other Bantu languages supports this view. Wald
(1975) and Poulos (1982) argue, respectively, that in Swahili and Zulu the participial is, in both form and function, a temporal relative clause. Poulos (1982: 210)
states that the participial, like other relative clauses, has a ‘restrictive force’; what
participial clauses restrict as relative clauses is the “dimension of time” (1982: 219).
This approach is supportive of the present view of the participial in terms of its
backgrounding function.
The continuous tense
The form of the so-called continuous tense is Subject Concord-a-(ye/be) participial,
for example:
(7) si-a-(ye/be) sihamba        > sasihamba
    we-past-pct  we-part-travel
    “we were travelling”
The form given above has been traditionally labelled the (remote) past continuous tense (pct) which has been described as indicating “an action which was in
progress . . . at some time in the past” (Davey 1973: 87).
The pct, typically a fully finite form, is a compound utilising an auxiliary verb
-be (also realised as -ye and optionally elided), which encodes the notion of ‘being’, preceded by the past tense marker -a-. As complement to this auxiliary, the
participial indicates the temporal domain or durational situation of this being. In
the illustration above the being is restricted to the temporal domain of ‘travelling’.
The pct encodes, in terms of this durational basis, an unbounded situation as opposed to an event. It is important to note in this respect that the pct does not as
a whole form the durative background of a contingent event as does the participial on its own. Rather, the pct indicates an independent ‘scene’. In narrative, pcts
usually cluster together to form the initial setting of the tale, which functions as
an orientation to the body of the story events. Consider the following example:
(8) a. kwakukho          umntwana ekwakusithiwa    ngujon   nabanye
       it-pct-it-present child    part-it-pct-said cop-John with-others
       abantwana bakokwabo
       children  of-home
       “There once was a child called John and other children at home.”
    b. ke   ngoku ke   lo   mntwana wayengathandwa       kokwabo
       then now   then this child   he-pct-neg-like-pass cop-home
       enikwa            iinkonzo zombona
       he-part-give-pass husks    of-maize
       “Now then, this child was not liked at home, being given maize husks.”
In such settings there is no focus on the movement of narrative time as such.
Rather, the durative setting orientating the audience to the story world is described
before the events occurring in this backdrop are described. The following examples illustrate the use of pcts, not in the initial setting, but in the body of the
narrative itself:
(9) a. lafika         ijoni   labuza      kumvundla  ukuba khange
       he-cons-arrive soldier he-cons-ask loc-rabbit that  ever
       uwubone        na   umvundla
       he-it-see-subj ques rabbit
       “The soldier arrived and asked the rabbit whether it had seen a rabbit at all.”
    b. umvundla nawo    wayenxiba   indevu    apha phezu komlomo
       rabbit   with-it he-pct-wear moustache here above of-mouth
       (“The rabbit was wearing a moustache here above the mouth”)
    c. wabuza      umvundla unjani lo   mvundla uwufunayo
       he-cons-ask rabbit   it-how this rabbit  you-it-want-rel
       “The rabbit asked, ‘What’s this rabbit like that you’re looking for?’”
In these examples we may see that pct clauses are clearly off the event line, representing background information. The pct forms are thus backgrounding in
function. They encode, not the bounded events holding only for the moment
of their occurrence, but temporally unbounded situations which hold for the
narrative world in general. Furthermore, unlike the participial, the pct indicates an
independent scene.
We are now in a position to see how the concepts of grounding and connexity are fundamental to the organisation of the Xhosa verbal system. This can be
represented in the following diagram:
Table 1. Grounding and Coherence relations

                  GROUNDING:
COHESION:         Foreground (event)        Background (non-event)
Connected         Consecutive mood          Participial mood
Nonconnected      Indicative mood,
                  non-continuous aspect
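The two-way classification argued for in this section can also be written out as a small lookup, given here as a Python sketch. The feature values are read off the prose of the preceding sections (the consecutive as connected and foregrounded, the participial as connected and backgrounded, the indicative past or perfect as independent and foregrounded, the pct as independent and backgrounded). The class and function names are assumptions of the sketch, and the placement of the pct is taken from the prose rather than from Table 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VerbForm:
    name: str
    connected: bool   # marked for connexity to the preceding clause
    foreground: bool  # codes a bounded event on the event line


# Feature values follow the prose description in this chapter.
FORMS: List[VerbForm] = [
    VerbForm("consecutive", connected=True, foreground=True),
    VerbForm("participial", connected=True, foreground=False),
    VerbForm("indicative past/perfect", connected=False, foreground=True),
    VerbForm("past continuous (pct)", connected=False, foreground=False),
]


def forms_with(connected: bool, foreground: bool) -> List[str]:
    """Return the names of the forms that carry the given feature values."""
    return [
        form.name
        for form in FORMS
        if form.connected == connected and form.foreground == foreground
    ]


if __name__ == "__main__":
    for connected in (True, False):
        for foreground in (True, False):
            print(connected, foreground, forms_with(connected, foreground))
```

Treating grounding and connexity as two independent Boolean features mirrors the point that the two notions are handled separately in this framework.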
. Discussion
The framework proposed here is an attempt to show that in Xhosa there is a systematic basis, in terms of discourse functions, to what have been labelled fairly
arbitrarily as ‘moods’. Through this paper, I hope to have demonstrated the value
of an orientation to language which holds that its nature can and should be explained in terms of factors outside of language, as narrowly conceived by some
branches of both traditional taxonomic and current theoretical language study.
Without this orientation that does not see language as the product of a separate
‘module’, we will remain at the whim of a view of language that is effectively removed, abstracted and isolated from the humans whose cognitive activities it is
supposed to define. From this perspective, it is hoped that the concept of connexity, in addition to that of grounding as explored in this paper, may allow some
insight into language study as an essentially human endeavour.
References
Carlson, R. (1992). Narrative, subjunctive and finiteness. Journal of African Languages and
Linguistics, 13, 59–85.
Davey, A. S. (1973). The Moods and Tenses of the Verb in Xhosa. Master’s dissertation,
University of South Africa, Pretoria.
Du Plessis, J. A. (1978). Isixhosa 4. Goodwood: Audiovista.
Eysenck, M. W. (1984). A Handbook of Cognitive Psychology. London: Lawrence Erlbaum.
Givón, T. (1987). Beyond Background and Foreground. In R. Tomlin (Ed.), Coherence and
Grounding in Discourse (pp. 175–187). Philadelphia: John Benjamins.
Givón, T. (1990). Syntax: A Functional Typological Introduction, Vol. II. Amsterdam: John
Benjamins.
Gough, D. (1986). Xhosa Narrative: An Analysis of the Production and Linguistic Properties of
Discourse with Particular Reference to Iintsomi Texts. Doctoral thesis, Rhodes University.
Hopper, P. J. & S. A. Thompson (1980). Transitivity in grammar and discourse. Language, 56,
251–299.
Khoali, B. T. (1993). Cole’s Dokean model: Issues and Implications. South African Journal of
African Languages, 13(1), 29–32.
Longacre, R. (1981). A spectrum and profile approach to discourse analysis. Text, 1(4), 337–359.
Poulos, G. (1982). Issues in Zulu Relativization. Department of African Languages, Rhodes
University. Communication No. 7.
Thompson, S. A. (1987). “Subordination” and Narrative Event Structure. In R. Tomlin (Ed.),
Coherence and Grounding in Discourse (pp. 435–452). Philadelphia: John Benjamins.
Tomlin, Russell S., L. Forrest, M.-M. Pu, & H. K. Myung (1997). Discourse Semantics. In T. van
Dijk (Ed.), Discourse as Structure and Process (pp. 63–111). London: Sage.
Ungerer, F. & H.-J. Schmid (1996). An introduction to cognitive linguistics. Harlow: Addison
Wesley Longman.
Wald, B. (1975). Variation in the System of Tense Markers of Mombasa Swahili. Doctoral thesis,
Columbia University, New York.
Wallace, S. (1982). Figure and Ground: The Interrelationships of Linguistic Categories. In P. J.
Hopper (Ed.), Tense-aspect between Semantics and Pragmatics (pp. 201–223). Philadelphia:
John Benjamins.
chapter 
Discourse organization and coherence
Ming-Ming Pu
University of Maine at Farmington
This chapter investigates discourse organization and coherence from a cognitive
perspective and demonstrates that stories produced in different forms and
languages are strikingly similar with regard to their structural organization,
coherence building, and event coding. Speakers/writers are generally quite
sensitive to the episode boundary information, and organize narratives into
separate yet interrelated episodes. They seek and achieve coherence through
establishing story frame, focusing on the central character, systematically
tracking references, and maintaining topic continuity. Discourse
organization and coherence establishment seem to be a systematic and even
automatic process, which is governed by our underlying cognitive activities and
driven by our subconscious attempt to enable our addressee to establish mental
representations congruent with our own in discourse processing.
Keywords: discourse structure, discourse coherence, cognitive activities, episode
and episode boundaries
.
Introduction
Researchers in various fields have investigated and shed light on how speakers and
writers organize discourse to achieve coherence in terms of thematic structure
and information units. It has been shown that coherence is not only an observable artifact of the external text or discourse, but also a cognitive phenomenon
in the mind that processes the discourse. Van Dijk and Kintsch (1983), for example, propose the construction of a mental representation of text as consisting of
both microstructure and macrostructure, reflecting local and global organization
respectively. Similarly, Givón (1995: 63) argues that text is represented in part as a
network of connected nodes (chunks). This network structure displays both hierarchical organization, where nodes are connected both ‘upward’ and ‘downward’
to other hierarchically adjacent nodes, and sequential chaining, where nodes are
connected to both preceding and following sequentially adjacent nodes. Furthermore, many studies (Chafe 1992, 1994; Fox 1987; Lichtenberk 1996; Pu 1995; and
Tomlin 1987) have explicated the importance not only of speakers’ and writers’ underlying cognitive constraints but also of their awareness of the addressee in discourse
processing, as shown by their signaling of discourse units and prompting of information retrieval.
These studies demonstrate how speakers employ explicit versus implicit anaphora
to mark changes of episode, shifts in location, and interventions of main storyline. Chafe (1992, 1994), in particular, describes the constraints upon speakers in
casual and unplanned conversation and how speakers assess the current status of a
given idea/event/referent in their listeners’ mind and systematically verbalize it as
given/old, accessible, or new information in an ongoing discourse.
Although researchers have agreed that cognitive operations underlie the overall discourse structure to guarantee coherence in the external discourse, it is not
always clear how mental representations of discourse are construed during comprehension, and how they are realized during production. One of the most elusive
issues is discourse structure, which is paramount in the study of
discourse organization and coherence; yet structural units such as episode,
paragraph, event, and theme are not well defined conceptually or theoretically
and are prone to misinterpretation. The identification and discussion of mental representations of these discourse units are likewise problematic because
they are based mostly on text-oriented notions such as ‘paragraph’, ‘discourse segment’, ‘sequence of thematically related sentences’, etc., and hence risk
circularity.
Taking a cognitive approach as well, the present study investigates the cognitive activities underlying discourse
organization and coherence, specifically the structure of episodes and its mental
representation. The study first tries to define and identify, independently of linguistic information, the conceptual structure of the episode, and then addresses the issue
of how these structural units are construed and represented in discourse comprehension and production. The study uses narrative data elicited from both English
and Mandarin Chinese speakers to demonstrate the universal characteristics of
discourse organization and information packaging, which hold regardless of the speakers’ linguistic background since discourse processing is constrained by general human
cognitive activities (Chafe 1994; Gernsbacher 1990; Tomlin 1987).
The following section details a narrative study, and in subsequent sections I
will consider a number of arguments and claims that have appeared in the literature relating to discourse structure and coherence, using speech and written
samples taken from the study.
Discourse organization and coherence 
. The narrative study
The narrative study was conducted to examine how speakers process incoming
information and organize it into a structured and coherent discourse, and how
they deliver such a structure in discourse production. The different conditions
and tasks were designed to test whether the structural unit of episode has psychological
relevance, and to identify the common characteristics or ‘universal rules’ of structural
organization and information packaging in producing narratives, given some general cognitive activities underlying discourse processing.
Episode and episode boundary
The present study argues that the basic structural unit of narrative discourse is
the episode, which corresponds to the speaker’s mental representations of a narrative. Since the construction of episodes plays a crucial role in the present study,
definitions are given below for the theoretical concepts of episode and episode
boundary, which are drawn basically from Chafe (1994), van Dijk and Kintsch
(1983), Pu (1995) and Tomlin (1987).
An episode is defined cognitively as a memory unit in the flow of information processing. Linguistically, it is a semantic unit subsumed under a macroproposition, which functions to unify ideas of the unit. The macroproposition is
generally a topical expression, featuring a global predicate (that denotes a global
event or actions), a specific cast of participants, and/or time and place coordinates.
Episodes in a discourse may be of varying length or scope.
An episode is conceived of as a part of a whole discourse, having a beginning
and an end. The beginning and end of an episode are defined in terms of propositions subsumed under the same macroproposition, while the propositions preceding the first and following the last proposition of an episode should be subsumed
under different macropropositions. The transition between macro-propositions
represents episode boundaries. They are normally marked by expressions denoting changes in time, place, scenery, participants, perspective, possible world, etc.
Cognitively, boundaries may also be manifestations of attention shifts.
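The definition of episode boundary given above can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, that every proposition has already been paired with the macroproposition that subsumes it; the proposition paraphrases and the function names `segment_into_episodes` and `boundary_indices` are hypothetical and are not part of the study's procedure.

```python
from itertools import groupby

# Hypothetical input: (proposition, macroproposition) pairs in discourse order.
# The paraphrases are illustrative only; they are not coded data from the study.
propositions = [
    ("boy climbs up the window", "Boy and Tennis Ball"),
    ("boy finds the ball in a pot", "Boy and Tennis Ball"),
    ("boy stands on a chair to swat the fly", "Boy and Fly"),
    ("boy swats the newspapers", "Boy and Fly"),
    ("boy meets an old lady", "Boy and Lobster"),
    ("boy opens the bag", "Boy and Lobster"),
]


def segment_into_episodes(props):
    """Group consecutive propositions subsumed under the same macroproposition."""
    return [
        (macro, [text for text, _ in group])
        for macro, group in groupby(props, key=lambda pair: pair[1])
    ]


def boundary_indices(props):
    """Indices of propositions whose macroproposition differs from the previous one."""
    return [
        i for i in range(1, len(props))
        if props[i][1] != props[i - 1][1]
    ]


episodes = segment_into_episodes(propositions)
for macro, members in episodes:
    print(macro, len(members))
print(boundary_indices(propositions))
```

In this toy representation an episode is simply a maximal run of propositions with the same macroproposition, and a boundary is any position where the macroproposition changes.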
Studies have shown the existence of episodes as chunks in narrative memory, and episode formation appears to be a virtually automatic process in story
processing (Black & Bower 1979; Guindon & Kintsch 1982). Other research into
story comprehension (Haberlandt, Berian, & Sandson 1980; Gernsbacher 1990)
suggests that cognitive processes inside an episode are different from those at
or around the episode boundary. Comprehenders map the current information onto a developing structure within an episode when incoming information coheres with
the previously presented information, while they shift from actively building one
structure to start another between episodes when incoming information is less coherent. The process of shifting costs more mental effort than mapping and thus
comprehenders have more difficulty accessing information that occurs after an
episode boundary than before a boundary (Anderson, Garrod, & Sanford 1983).
It seems that comprehenders are quite sensitive to the cues that prompt them
to carry out either a mapping or a shifting process. On the other hand, in order to convey their intended message successfully, speakers and writers must give
their addressees signals or cues to help them build up a discourse representation
congruent with their own. The present study aims to investigate and demonstrate how speakers and writers organize narrative discourse into episodes during
narrative production and how they orally convey the structural network to their
addressees.
Stimulus material
The stimulus material came from a children’s picture storybook (Krahn 1981),
which has no written text and depicts several adventures of a little boy on a certain
day. The book consists of 8 episodes, each of which has 8 pictures and is headed by
a subtitle denoting a particular adventure with a picture clock showing the time
of the day. In each of the episodes, the main character, a little boy named Alex
Pumpernickel, is accompanied by a different secondary character of either the
same or a different gender. Three episodes were chosen for our study. A total of 24
pictures (Krahn, F. (1981). Here comes Alex Pumpernickel! Boston: Little, Brown &
Co), with the subtitle and picture clock removed, were made into an adapted picture book of 12 pages (2 pictures per page). The purpose of the experiment was to
establish whether the subjects would perceive, organize, produce and retrieve the
non-verbal information as episodes, as would be predicted by the episode theory
(Schank & Abelson 1977; van Dijk & Kintsch 1978).
Visual rather than verbal material was chosen because (1) the processing and organization of information are considered general rather than language-specific cognitive activities (Bagget 1979); (2) with the subtitle and picture clock
removed from the stimulus material, the subjects’ recognition of episodes in this
experiment would be independent of linguistic information, and we would thus
avoid risking the problem of circularity in defining and identifying episodes in
discourse; and (3) the picture book consists of separate but related episodes.
If episodes have psychological content, subjects should be able to identify and
store these episodes as memory representations, and recall them verbally as
separate episodes.
. Method and procedures
There were two narrative tasks for the subjects: an oral on-line (i.e., impromptu)
description of the picture sequence and a recall of the pictures afterwards. The
instructions were presented to the subjects in written form and did not mention
or suggest that the pictures ‘tell a story’ or ‘stories’. In the on-line task, subjects were
asked to describe each picture while paging through the picture sequence for the
first time. It was expected, as explained earlier, that subjects would recognize visual
cues at the beginning of an episode, and would employ larger coding material at
such junctures, to lay a foundation for the new substructure and signal the shift
to the listener. In the recall task following the on-line description, subjects were
asked to retell the picture sequence from memory.
The purpose of the dual-narrative task was to see how a speaker would construct a narrative without a specific discourse plan (i.e., without knowing what
was happening next), as contrasted to a planned or structured oral narrative from
memory. The recall task, in turn, was carried out in either oral or written form: half of the subjects retold the story orally and the other half wrote
the recall.
Forty subjects participated voluntarily in the experiment. Twenty were native
English speakers from Northern State University in the United States, and twenty
were native Mandarin Chinese speakers from the Central China University of Finance and Economics in China. All subjects were undergraduates and about half in
each group were women.
. Results and discussion
In general, the speakers and writers of both languages produced very similar narratives in terms of episode organization, event coding and information patterning.
They recognized the three episodes in the picture sequence and used them in their
story construction. They followed the main story-line and encoded the important
events of each episode. They also processed story information such as given-new
and background-foreground in a consistent way. The remaining sections of this
paper will discuss the general characteristics of episode construction, information processing and event encoding. I will use oral and written narrative samples
from both languages to illustrate these characteristics, with each example coded
to capture: the relevant language (E-English or C-Chinese); task type (O-on-line
or R-recall); recall mode (RS-spoken recall or RW-written recall); and the subject
number (1 to 20).
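The coding scheme for example labels can be made explicit with a small sketch in Python. The pattern below is inferred from the description above and from labels such as EO3, ERS5 and CRW2; the names `ExampleCode` and `parse_example_code` are hypothetical helpers introduced here for illustration only.

```python
import re
from typing import NamedTuple, Optional


class ExampleCode(NamedTuple):
    language: str               # "English" or "Chinese"
    task: str                   # "on-line" or "recall"
    recall_mode: Optional[str]  # "spoken", "written", or None for the on-line task
    subject: int                # subject number, 1 to 20


# Language letter, then task/recall mode, then subject number.
CODE_PATTERN = re.compile(r"^(E|C)(O|RS|RW)(\d{1,2})$")

LANGUAGES = {"E": "English", "C": "Chinese"}
TASKS = {
    "O": ("on-line", None),
    "RS": ("recall", "spoken"),
    "RW": ("recall", "written"),
}


def parse_example_code(code: str) -> ExampleCode:
    """Split a label such as 'EO3' or 'CRW2' into its components."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a valid example code: {code!r}")
    language, task, number = match.groups()
    subject = int(number)
    if not 1 <= subject <= 20:
        raise ValueError(f"subject number out of range: {subject}")
    task_name, recall_mode = TASKS[task]
    return ExampleCode(LANGUAGES[language], task_name, recall_mode, subject)


print(parse_example_code("EO3"))
print(parse_example_code("CRW2"))
```

Under this reading, EO3 denotes the on-line description by English subject 3, and CRW2 the written recall by Chinese subject 2.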
Episodic structure
Many studies have demonstrated the existence of episodes as memory chunks in
discourse processing. Speakers, who are constrained by working memory limitations, would try to organize the overall discourse contributions into smaller semantic units, each of which is dominated by a macroproposition. Comprehenders,
on the other hand, would capture this episode structure in their mental representation by building separate substructures to represent each episode (Gernsbacher
1990) because whatever portion of the incoming information that is to survive
in longer-term memory must be translated rapidly into some form of episodic
mental representation (Givón 1995: 62). The results of the present study give support to the psychological relevance of episode structure in discourse production
and comprehension: although there was no written/linguistic clue in the stimulus
material suggesting there were three episodes in the picture sequence, subjects of
both languages recognized them, often with overt remarks. In both on-line and
recall conditions, subjects consistently organized the picture sequence into three
semantic units in their narrative production (as was intended by the author in
the original picture storybook), and frequently signaled and separated the units
linguistically.
More interestingly, in the recall task five subjects (three English and two Chinese) could only recall two of the episodes at first and then realized that one
(always the middle) episode was missing from their recall. The way they finally
recalled the second episode (‘Boy and Fly’) was informative. Each subject first recalled the macroproposition, and then the whole episode came flowing out. Some
exact wordings used by the subjects are: “Well, I remembered it’s the boy chasing
the fly”, “Okay, it’s about the kid swatting a fly,” or “Yes, it’s about the child and the
fly.” The memory lapse in the recall task provides further evidence for the existence
of episodes as chunks in memory and the monitoring role that macropropositions
play in discourse processing: information is organized, stored, retrieved, and forgotten as episodes, and macropropositions function to unify ideas of the episode.
In the on-line description task, speakers were very sensitive to the nonlinguistic cues of episode shifts, such as change of location, change of scenery,
change of activities, and change of characters, which were used in building episode
structure. Most speakers recognized boundaries between episodes in the picture
sequence and marked the beginning of a new episode in their oral narratives accordingly. A new episode normally started with an adverbial phrase of time or location,
as exemplified by the following.
(1) Outside in the backyard, the boy is playing tennis with a girl. . . .
(EO3)
(2) And then, the boy is walking on the street. . . .
(EO6)
(3) zhe  yi.ding shi ling.yi.ge gu.shi, yin.wei zhe  nan.hai xian.zai zai ke.ting     li, . . .
    this must    be  another    story   because this boy     now      at  living-room in
“This must be another story because the boy is now in a living room. . . . ”
(CO2)
(4) ran.hou, zhe  nan.hai chu.qu he   yi.ge nu.hai da   wang.qiu, . . .
    then     this boy     go-out with a     girl   play tennis
“After that, the boy goes out to play tennis with a girl. . . . ”
(CO4)
In most cases, the adverbial phrase is accompanied by a reinstatement of the major
character with a full NP. The function of the full NP is twofold. First,
building a new mental structure for a new episode consumes more cognitive effort
on the part of the speaker, for whom information of the previous episode becomes
less accessible at this point (Chafe 1994; Gernsbacher 1990). The speakers would
then use a full NP or a proper name to quickly reactivate reference because “[t]he
less predictable the information is, or the more important, the more prominent or
larger coding it will receive” (Givón 1993: 196, emphases in the original). Second,
the use of a full NP at the beginning of a new episode serves as a signal to the
listener, who needs to build a new mental structure for the incoming episode.
In the written narrative the episode boundary was made even more explicit. In
addition to adverbial phrases, eight out of ten English writers and all ten Chinese
writers used blank lines, numerical devices, or paragraph structure with or without indentation to separate episodes. Of the two English subjects who recalled
their story in one written piece without any visual demarcation, one managed to
indicate a new episode by repeating and underlining an adverb at the beginning of
the episode: “next, . . . ”.
The general characteristics of the language user’s perception and formation
of discourse units demonstrate the nature of discourse organization and the importance of episode structure in language production and comprehension. The
story information was not only hierarchically organized and produced as a series
of episodes, but was also stored and retrieved in that form.
Achieving and maintaining coherence
This section examines how speakers construct macrostructures for episodes, and
how they relate events within an episode both linearly and hierarchically to achieve
local and global coherence. Our data show that during the on-line description
task when the episodic structure was being built, subjects tried to seek coherence
both externally in non-verbal materials and internally in the mind. Specifically, they
construed a story frame (with temporal and spatial reference and central character

effect), and maintained referential and topical continuity to achieve coherence of
the discourse.
Story frame
Although ‘story-telling’ was not mentioned in the instruction of the narrative
tasks, almost all subjects were prepared to tell a story of some sort at the beginning. They tried to organize the not-yet-known information into a familiar and
controllable structure or frame – a story, thus making their first attempt at obtaining discourse coherence. The story frame sets a macrostructure for the discourse,
to which incoming information can be related and explained. The following examples are the typical start of the on-line description.
(5) Once upon a time, there was a little boy
(EO4)
(6) The story starts with a boy . . .
(EO2)
(7) zai zhe.ge xiao   gu.shi li, wo kan.jian yi.ge nan.hai, . . .
    at  this   little story  in  I  see      a     boy
“In this little story, I see a boy . . . ”
(CO7)
(8) xian.zai wo yao  gei ni.men jiang yi.ge gu.shi
    now      I  want to  you    tell  a     story
“Now I’m going to tell you a story . . . ”
(CO3)
Once the ‘story-frame’ was set but little other information was available, subjects
tried to derive macropropositions as quickly as possible so as to relate subordinate
actions and information to the macro-proposition (see also Guindon & Kintsch
1982; Kintsch 1995) in a story. One such macroproposition is the establishment
of the central character. Subjects quickly identified the central character at the beginning of the on-line task (as shown in (5)–(7) above), and then concentrated on his
actions and purposes to achieve discourse coherence. Though required to describe
each picture in the storybook, which contains a great deal of information, subjects did not describe indiscriminately everything in the picture sequence but were
more concerned about the actions and goals of the central character, and the cause
and outcome of the actions and events. They elaborated on the pictures that were
regarded as important in carrying out the story line, and explained the events and
actions that added to the understanding of the story, but only touched upon (some
even omitted) the pictures that were not critically related to the theme of the story.
Our data show that the overall mentions of the main character are more than twice
as many as those of any secondary character. In the first episode (‘Boy and Tennis
Ball’), for example, despite both characters appearing together in each of the eight
pictures, two thirds of the subjects focused on the boy, describing his actions and
adventure in detail yet mentioning the other character only occasionally. Examples
are given in the following passages.
(9) He’s climbing up the window, and he’s looking into the house for the ball. He
goes into the house and tries to find the ball, . . . and all the while, the little
girl is standing there, watching. . . .
(EO1)
(10) ta pa.shang chuang.zi, wang   wu   li kan, ta faxiang na   qiu  zai
     he climb-up window     toward room in see  he find    that ball at
     yige guo li. yu.shi ta pa.jin wu   li, xiang ba na.ge qiu  cong guo
     a    pot in  so     he get-in room in  want  om that  ball from pot
     li lao   chu.lai, . . . zui.hou ta ba qiu  lao.le chu.lai, shang.mian
     in scoop out            finally he om ball scoop  out      around-side
     zhan.le xu.duo tang.xi ran.hou ta he  xiao   nu.hai ba qiu  na.dao
     stick   much   candy   then    he and little girl   om ball take-to
     hou.yuan  li, . . .
     back-yard in
     (om = object marker)
“He climbs up the window and looks into the room, and he finds the ball in
a pot. So he gets into the room and tries to scoop the ball out of the pot. . . .
Finally he gets the ball out of the pot, which is wrapped in sticky candy. Then
he and the little girl take the ball to the back-yard, . . . ”
(CO2)
The central character is the back-bone of the story, chaining actions and events
throughout the main storyline. Focusing on the central character is a very important strategy in storytelling: it allows speakers to be selective in presenting
the incoming information, enables them to stay on the main storyline, and hence
helps them obtain and maintain coherence. The ‘central character’ strategy
plays an important role not only in achieving discourse coherence but also in facilitating comprehension. A story is considered coherent and easy to comprehend
as long as the actions, intentions and purposes of the central character are stated
and explained, while those of secondary characters can be marginalized. Indeed,
as observed by Garrod and Sanford (1988: 174), “an unexplained action on the
part of a main character results in a delay to processing the sentence in which that
action occurs, while such an action on part of a secondary character results in no
such delay.”
Accompanying the central character in the early stage of the on-line task is the
set-up of the temporal and spatial reference, which not only delineates the story
frame but also functions to mark the shift or transition at the beginning of a new
episode. For example,
(11) Once upon a time, there was a little boy, . . .
(EO4)
(12) Late that day, the boy is out on the street walking, . . .
(EO11)
(13) tian wan.le, nanhai hui.dao wu.li, . . .
     it   late    boy    return  house
“It’s late now, the boy goes back to the house, . . . ”
(CO2)
(14) yige qing.lang.de xiawu     yige nanhai he  yige nuhai zai . . .
     a    fine         afternoon a    boy    and a    girl  are
“One fine afternoon, a boy and a girl are playing . . . ”
(CO3)
It is of interest here that the time of an episode or events only existed in the
speaker’s mind since nothing in the visual stimuli themselves indicates time with
the removal of the picture clock. The spatial reference, on the other hand, is established when speakers take cues from the pictures. As mentioned previously,
these subjects were very sensitive to boundary information and employed it in
encoding events. In the picture sequence the most readily available episode-shift
information was a change of location, such as from a living room to a street, from
the street to a backyard, etc. Subjects immediately recognized the shift and marked
it linguistically in their narratives to lay the foundation, so to speak, for the new
episode. Some of the examples are:
(15) Outside on the street, the boy . . .
(EO7)
(16) xian.zai zhe  nan.hai chu.xian.zai da.jie shang. . .
     now      this boy     appear at    street on
“Now the boy appears on the street. . . . ”
(CO5)
Once temporal and/or spatial references are set globally, they are maintained locally throughout an episode to give the listener a coherent time frame and the
spatial orientation of the episode. The following examples contain some of the
typical episode-medial time and locative phrases, which help achieve and maintain
local coherence of the episode.
(17) Just as he puts the newspapers together, . . .
(EO6)
(18) He climbs onto a chair next to the couch, . . .
(EO15)
(19) deng lao tai.tai yi   zhuan.guo jie    jiao, . . .
     wait old lady    just turn-over street corner
“As soon as the old lady turns around the street corner . . . ”
(CO9)
(20) chuangzi li shi yi.ge chu.fang, . . .
     window   in is  a     kitchen
“It’s a kitchen (inside the window), . . . ”
(CO17)
The use of temporal and spatial reference is another attempt that speakers make
to obtain and maintain coherence of the discourse. Although explicit cohesion
markers, as mentioned above, occurred frequently in our narrative data, they are
neither necessary nor sufficient to make a discourse coherent. During the on-line
description, subjects also maintained temporal and spatial coherence implicitly
by connecting the order of events sequentially, and moreover, they sought and
achieved discourse coherence by relating subordinate actions and events hierarchically to the higher level goal or dominant macroproposition of the episode,
once the temporal and/or spatial framework was set. This was shown in speakers’ descriptions when they were puzzled by an action or a motive of the central
character in an episode. Though unsure of the goal or purpose of a particular action or scene, subjects would try to tie it to the macroproposition of the episode.
For example, in the second episode (‘boy and fly’), many speakers were not certain
why the boy was messing with the newspapers in pictures 6 and 7. They paused,
hesitated, and/or expressed uncertainty about the ongoing event, but managed
nevertheless to come up with explanations that contribute to the theme of the
episode, viz., the boy’s attempt to swat a fly. The following excerpts exemplify such
an effort.
(21) He’s going through the papers . . . I guess he’s looking for the fly.
(EO8)
(22) ran.hou, ta ba bao.zhi   pao   qi.lai, ren.de dao.chu dou.shi. ta zai
     then     he om newspaper throw up      throw  everywhere       he is
     ta yi.ding shi xiang rang na.ge cang.ying fei chu.lai
     he must    is  want  let  that  fly       fly out
“Then, he throws the newspapers everywhere. He must be trying to get the fly
to fly out.”
Furthermore, when a surprising outcome or climax occurred late in an episode,
subjects would try to make sense out of it and incorporate it into the developing
episode, especially if it did not meet the speaker’s earlier expectations. For example,
in the third episode (‘boy and lobster’), the main character’s true objective did not
become evident until the 7th picture, in which the boy opens the bag he helps the
old lady carry. Every subject described the event and many commented:
(23) The boy didn’t really want to help the lady, he was just too curious.
(EO9)
(24) ta hen  xiang zhi.dao bao li cang.zhe she.me dong.xi, suoyi ta cai
     he very want  know    bag in hide     what   thing    so    he just
     yao   bang.mang
     offer help
“He was curious about what’s in the bag. That was why he offered help.”
(CO13)
Our narrative data have given further support to the linear and hierarchical organization of discourse, demonstrating how subjects link micropropositions to one
another at a local level, and at the same time relate them to the global macroproposition of an episode. In general, subjects used cues from the picture sequence to
maintain local coherence, but more importantly, it is their extensive background
knowledge, along with the picture information, that enabled them to achieve
global coherence, i.e., to infer goals and plans, and use them to explain actions
and events throughout discourse. These inferences hold over large distances in a

network and are made regardless of local coherence being possible (Trabasso, Suh,
& Payton 1995: 212).
In the recall task, on the other hand, subjects used the same strategies in
achieving temporal, spatial and thematic coherence, but were more organized and
concise in their recall since they already had the settings, plans, actions and
goals of the episodes in mind and were freer in their choice of picture/event description. The following passages exemplify orally recalled episodes that
follow the main story-line and describe only the major events.
(25) The next episode has to do with the boy attempting to swat a fly. He is on a chair
trying to swat a fly. He leaps off the chair to get the fly and swats the newspapers
on his dad instead. His dad sits up, looks around, and loses the newspapers on the
floor. The boy searches through the papers to find the fly swatter and once again
goes after the fly.
(ERS5)
(26) tian wan le, xiao   nan.hai hui    jia le. zai lu.shang kanjian yige
     it   dark    little boy     return home    at  street   see     an
     lao taitai lin.zhe liang.ge hen  chen.de daizi. xiao   nan.hai hen  xiang
     old lady   carry   two      very heavy   bag    little boy     very want
     zhidao bao.li shi sheme jiu  pao guo.qu yao  bang lao tai.tai ti
     know   bag in is  what  just run over   want help old lady    carry
     dai.zi. lao tai.tai hen  gan.dong jiu  ba yige dai.zi gei  ta  bei
     bag     old lady    very touched  just om a    bag    give him carry
     dang lao tai.tai zhuan.shen jin.ru yi.ge xiao  xiang.zi, xiao.hai
     when old lady    turn       enter  a     small alley     little-kid
     gan.jin   dun.xia.lai, ba dai.zi da.kai. mei xiang.dao dai.zi li
     hurriedly squat down   om bag    open    not expect    bag    in
     pa.chu    yi.zhi da  long.xia, yao.le ta  yi.kou. ta teng.de zhi.du,
     climb.out a      big lobster   bite   him a.bite  he hurt    cry
     dan.ye zhi.hao wu.ke.nai.he.di geng.zai lao tai.tai houmian hui
     but    have.to helpless        follow   old woman   behind  return
     jia.le
     home
“It was late and the little boy went home. (He) saw an old lady on the street,
carrying two heavy bags. The boy very much wanted to know what’s inside the
bag, so (he) ran over to help the old lady. The old lady was touched and gave
him a bag to carry. When the old lady turned into an alley, the kid hurriedly
squatted down and opened the bag. Out climbed a big lobster unexpectedly
and bit him. He was hurt and crying, and helplessly followed the old lady
home.”
(CRS2)
Reference tracking
Another implicit way of establishing and maintaining coherence is reference tracking, which subjects managed in consistent ways. It has long been noted that there is
a correlation between the cognitive status of a referent and the linguistic form encoding the referent. Researchers have demonstrated that forms that signal the most
restrictive cognitive status (in high focus) are always those with the least phonetic content, namely unstressed pronouns, clitics, and zero pronominals (Chafe
1987; Givón 1989; Gundel, Hedberg, & Zacharski 1993; Pu 1995; Tomlin & Pu
1991). Indeed, our narrative data show that once the central character, ‘the little
boy’, was established at the beginning of the storytelling, it was very frequently encoded by pronominals (i.e., lexical and zero pronouns) throughout the remainder
of the narrative because it was the focus of the subjects’ attention in their description tasks. The supporting character (i.e., the old lady, the man, or the little girl,
respectively), on the other hand, resides mostly outside of the subjects’ focus of
attention due to the limited capacity of focal attention (Just & Carpenter 1992;
Gathercole & Baddeley 1993; Gundel 1998), and is therefore frequently referred to
by full NPs. Examples (25) and (26) above are taken from the spoken recall data,
where both subjects systematically pronominalized the central character and nominalized the secondary character within the episode, even though the secondary
character (i.e., ‘his dad’ in (25) and ‘the old lady’ in (26)) was just mentioned in
the preceding sentence. Examples (27) and (28) below are taken from the on-line
task and the written recall respectively, which reveal the same patterns of reference
tracking in the narrative.
(27) A little boy is walking on the street. He meets an old lady carrying some bags. He
asks the lady what’s in the bags, and the lady gives him one of the bags. The lady
walks off and he’s holding the bag. . . .
(EO11)
(28) ta zhai.zai yi.ge yi.zi shang da   cang.ying. cang.ying fei wang
     he stand-on a     chair on    swat fly        fly       fly toward
     sha.fa shang de yi.dui bao.zhi   shang. ta hui.pai       da  quo.qu,
     sofa   on       a-pile newspaper on     he raise-swatter hit over
     que bu.liao    jin.xin.le bao.zhi   di.xia tang.zhe.de yi.ge nanren zhe
     but not-expect awake      newspaper under  lie         a     man    this
     nanren zheng tang zai sha.fa hang shui.jiao, bei ta  da.xin.le, hen
     man    just  lie  on  sofa   on   sleep      by  him hit-awake  very
     sheng.qi. nan.ren yi.xia.zi zuo.qi.lai, . . .
     angry     man     suddenly  sit.up
“He was standing on a chair to swat the fly. The fly flew toward a pile of newspapers on the couch. He raised the swatter to hit it, only to wake a man who
was lying under the newspapers. The man was sleeping on the couch and was
very angry when woken up by him. The man sat up suddenly . . . ”
(CRW2)

Table 1. Reference-tracking results
        English                       Chinese
        Central       Secondary       Central       Secondary
          N      %      N      %      N      %      N      %  Total
NP      246  26.42    297  65.56    239  27.44    313  78.25   1095
PN      494  53.06    116  25.61    285  32.72     29   7.25    924
Zero    191  20.52     40   8.83    347  39.84     58  14.50    636
Total   931           453           871           400          2655
Table 2. Boundary results for all tasks

          On-line task       Oral Recall        Written Recall
          NP    PN   Zero    NP    PN   Zero    NP    PN   Zero   Total
English   50    10      0    29     1      0    28     2      0     120
Chinese   51     9      0    28     2      0    30     0      0     120
Total    101    19      0    57     3      0    58     2      0     240

Table 1 presents the use of anaphors in tracking reference in our narrative
data: the tokens of full NPs (=NP), lexical pronouns (=PN), and zero
anaphors (=Zero), together with their distribution rates, are given separately
for the central and the secondary characters.
Table 1 shows distinct patterns of character tracking in narrating the
story: subjects focused their attention on the central character throughout the narrative
(the anaphoric tokens for the central character are about twice as many as those for
the secondary characters), and consistently used less explicit coding forms to refer
to it because of its privileged cognitive status. On average, lexical and zero
pronouns account for about 73% of all anaphors referring to the main character,
whereas these reduced forms account for only about 28% of all references made to
the supporting characters.
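The averages quoted above can be checked against the token counts in Table 1. The following is a minimal sketch, not part of the original analysis: the dictionary layout and function names are illustrative, and it assumes that "on average" means the unweighted mean of the English and Chinese rates.

```python
# Token counts transcribed from Table 1
# (NP = full NP, PN = lexical pronoun, Zero = zero anaphor).
TABLE1 = {
    ("English", "central"):   {"NP": 246, "PN": 494, "Zero": 191},
    ("English", "secondary"): {"NP": 297, "PN": 116, "Zero": 40},
    ("Chinese", "central"):   {"NP": 239, "PN": 285, "Zero": 347},
    ("Chinese", "secondary"): {"NP": 313, "PN": 29,  "Zero": 58},
}


def rates(counts):
    """Percentage share of each anaphoric form within one group of counts."""
    total = sum(counts.values())
    return {form: 100 * n / total for form, n in counts.items()}


def mean_rate(role, forms):
    """Mean over the two language groups of the summed rates of `forms`.

    Assumption: the chapter's 'on average' is an unweighted mean over
    English and Chinese; the source does not spell this out.
    """
    per_language = [
        sum(rates(counts)[form] for form in forms)
        for (_, r), counts in TABLE1.items()
        if r == role
    ]
    return sum(per_language) / len(per_language)


def tokens(role):
    """Total anaphoric tokens for one character role, pooled over languages."""
    return sum(sum(c.values()) for (_, r), c in TABLE1.items() if r == role)


central_reduced = mean_rate("central", ["PN", "Zero"])
secondary_reduced = mean_rate("secondary", ["PN", "Zero"])
central_full_np = mean_rate("central", ["NP"])
token_ratio = tokens("central") / tokens("secondary")

print(f"reduced forms, central character:    {central_reduced:.1f}%")
print(f"reduced forms, secondary characters: {secondary_reduced:.1f}%")
print(f"full NPs, central character:         {central_full_np:.1f}%")
print(f"central / secondary token ratio:     {token_ratio:.2f}")
```

Under that assumption the figures come out at roughly 73%, 28% and 27%, with a central-to-secondary token ratio of about 2.1, which agrees with the values reported in the text.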
Also of interest in reference management is the ‘boundary effect’, which
accounts for the relatively high rate of full NPs referring to the central character
(about 27% on average), as discussed briefly in the section on episodic structure.
Although the central character of the story remains the same throughout the three
episodes, subjects nonetheless used a full NP (e.g., a definite or demonstrative NP,
or a repeated proper name) to reinstate the referent at the beginning of
a new episode, regardless of its referential distance (i.e., the number of clauses
between the current and the last mention of the referent; see Givón 1987). The
boundary results are presented in Table 2, which lists, for each narrative task,
the anaphoric forms used for the first mention of the central character at the
beginning of an episode.
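To make the size of the boundary effect in Table 2 easier to compare across tasks, the share of full NPs among episode-initial mentions can be computed per task. This is an illustrative sketch only: the data structure and the function name `full_np_rate` are assumptions introduced here, and the counts are simply those listed in Table 2.

```python
# Episode-initial mentions of the central character, from Table 2.
# Each tuple is (NP, PN, Zero).
TABLE2 = {
    "on-line task":   {"English": (50, 10, 0), "Chinese": (51, 9, 0)},
    "oral recall":    {"English": (29, 1, 0),  "Chinese": (28, 2, 0)},
    "written recall": {"English": (28, 2, 0),  "Chinese": (30, 0, 0)},
}


def full_np_rate(task, language=None):
    """Percentage of episode-initial mentions coded by a full NP.

    With `language=None` the two language groups are pooled.
    """
    rows = TABLE2[task]
    selected = [rows[language]] if language else list(rows.values())
    np_tokens = sum(row[0] for row in selected)
    all_tokens = sum(sum(row) for row in selected)
    return 100 * np_tokens / all_tokens


for task in TABLE2:
    print(f"{task:15s} full-NP rate: {full_np_rate(task):.1f}%")
```

Pooled over both languages, the full-NP rate is about 84% in the on-line task and 95% or higher in the two recall tasks, and no zero anaphor occurs at an episode boundary in any task.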
The boundary effect is found to be very strong in our narrative study. In the
on-line task, the majority of subjects (13 in Chinese and 14 in English groups)
used a full NP for the first mention of the central character in each episode. In the
recall task, be it oral or written, when subjects had established the three episodes
in their mental representations, the boundary effect is shown to be even stronger:
the overwhelming majority (19 in Chinese and 18 in English) consistently used
a full NP to reinstate the central character at the beginning of each of the three
episodes. The boundary effect is again a manifestation of the cognitive constraints
and activities underlying the pronominalization process. Within an episode, while
speakers’ attention is sustained, a referent that has been in focus (e.g., the central
character) keeps its cognitively privileged status as the most accessible and
identifiable referent, and speakers use a less explicit anaphor to code it.
Between episodes, however, when speakers’ attention shifts, a referent that has been
in focus loses its privileged activation status because of the change in
memorial and attentional processes and becomes less accessible, at which juncture
speakers use an explicit anaphor to reactivate it.
Speakers’ referential choice is governed not only by their own cognitive activities
but also, in part, by their assessment of the hearers’ cognitive
status with respect to a particular referent, in order to facilitate comprehension.
Speakers would use pronominals for the central character within an episode to
maintain referential coherence and to keep listeners focused on the same character.
At the beginning of a new episode, however, they would ease listeners’
shifting process by using a self-defining NP for quick and easy reactivation of
the same referent, because shifting is cognitively more costly than mapping, and
comprehenders thus have more difficulty accessing information that occurs after a
unit boundary than information within a unit (Gernsbacher 1990).
Topic continuity
Closely related to the strategy of reference tracking is the establishment of topic
continuity, another important means of maintaining local discourse coherence.
Topic continuity is best embodied in a topic chain that consists of several clauses
over a span of discourse, within which each clause is understood as being about the
same topic (Li & Thompson 1979: 33). In a topic chain, the topic is set up in the
first clause and typically left unspecified (i.e., with zero anaphora) in subsequent
clauses because its referent is most accessible, identifiable and recoverable from
discourse context.
It has been argued that topic chains are largely responsible for the prevalence
of zero anaphora in Chinese discourse. Indeed, our recall data show that topic
chains are a common device used by Chinese speakers in their coding of events
within an episode, where a topic persists over a span of discourse. For example,
(29) ta  zhan.zai  yi.zi.shang,  ju.zhe  cang.yin pai,  kan.jian  cang.ying  luo
     he  stand in  chair-on      raise   swatter        see       fly        fall
     zai  bao.zhi    shang,  jiu   hao.bu.yu.yu.di  pai.le  xia.qu,  que  bu.liao
     at   newspaper  on      just  not.hesitate     swat    down     but  not-expect
     pai.zai  yi.ge  ren  shen.shang, . . .
     hit-at   a      man  body
“He stood on a chair, Ø poised his flyswatter, Ø saw the fly fall onto a pile of
newspapers, and Ø swung the flyswatter down without hesitation, but Ø hit a
man instead . . . ”
(CRS15)
(30) ta  zou   shang.qian,  re.xin.di  yao    bang  lao  nai.nai  na     na.ge  xiao
     he  walk  forward      eagerly    offer  help  old  granny   carry  that   small
     bao,  jie.guo  cheng  nai.nai  guai.wan     shi   tou.tou     da.kai  kou.dai,
     bag   end-up   as     granny   turn-corner  time  stealthily  open    bag
     jie.guo  fa.xian  shi  yi.dai  pang.xie
     end-up   find     is   a-bag   crab
“He steps forward, Ø eagerly offers to help the old granny carry a small bag,
as the old granny turns the corner, Ø secretly opens the bag, and Ø finds a bag
full of crabs.”
(CRS9)
Passage (29) describes, in a single topic chain, the action sequence of the boy swatting a
fly. The whole chain is about one topic, the boy. Once the topic is established in
the first clause of the action sequence, it is encoded by a zero anaphor in the
remainder of the sequence. In fact, topic chains can be formed in Chinese discourse
regardless of whether there is intervening material between two clauses containing
the topic, and regardless of whether another discourse entity may
cause referential ambiguity. Passage (30) above is another excerpt taken from the
Chinese oral recall data. The topic is again ‘the boy’. Although an intervening
clause in the middle of this event sequence, i.e., ‘as the old granny turns the
corner’, mentions a referent other than the topic, the topic chain resumes
immediately after it. Moreover, two characters are described in the
event sequence, but the chain of zero anaphora refers unambiguously to the topic,
even though the last two zero anaphors could syntactically be coreferential with
the secondary character, ‘the old granny’.
It is not surprising that Chinese speakers used topic chains to keep the story
flowing within an episode because Chinese is considered a discourse-oriented and
topic-prominent language. English, in contrast, is regarded as a subject-oriented
language, where an explicit subject is usually required. Nevertheless, the use of
topic chains is not uncommon in the English recalls. When the topic persists over a
span of discourse and can be easily identified, English speakers leave it unspecified,
as do Chinese speakers. For example,
(31) Then the curiosity of the boy got the better of him and he stopped. He opened up
the bag, but Ø was bitten by this big lobster that jumped out of the bag. He was
scared, Ø cried a little bit, Ø picked up the bag, and Ø followed his mother back
into the house.
(ERS12)
The excerpt, taken from the English spoken recall data, describes the climax and
ending of the third episode (Boy and Lobster). Within the episode, the
central character, ‘the boy’, is repeatedly referred to by either a lexical pronoun or
a zero anaphor because the referent is continuous and already activated in the
preceding clause. Nevertheless, example (31) shows that zero anaphora and
lexical pronouns are not used interchangeably in English discourse. In this passage,
the climax of the episode is described in two clauses (i.e., ‘he opened up the
bag, but Ø was bitten . . .’), the first of which creates suspense and
the second of which reveals the unexpected outcome. If a lexical pronoun had been
used to refer to ‘the boy’ in the second clause, the tight cause-result sequence would
have been broken and the continuity of the climax lost. Similarly, the next four clauses,
which describe the ending of the episode, are chained by zero anaphora and depict an
action sequence of the topic, viz., the boy. As in Chinese, such topic chains
indicate maximum coherence within an episode and are used, among other things, to
encode action or event sequences of the same topic. When such
maximum coherence is disrupted within an episode, however, the topic chain ends. In the
above example, there is a transition or minor thematic gap between the climax
and the ending of the episode (i.e., ‘he was scared, Ø cried a little bit . . .’), where
the speaker used a lexical pronoun to end the last topic chain and start the next
one. The alternation between lexical and zero pronouns is further exemplified in
(32) below.
(32) The boy looks fairly upset. He starts to try to straighten the newspapers, but he
(uh), kind of gives up, Ø gathers them together, Ø throws them toward the man,
and Ø continues to pursue the fly with the flyswatter.
(ERS15)
This passage depicts the boy’s persistent pursuit of the fly in the second episode
(Boy and Fly), where the first clause describes how ‘the boy’ looks, and a series of
clauses then describes what he does (i.e., ‘he starts to try to straighten
the newspapers, but he . . .’). Although all the clauses in the series are about the same
topic, there is a minor thematic gap between the first clause of the series and the rest:
the boy’s attempt to straighten the newspapers conflicts with his purpose of
swatting the fly. Hence the topic chain does not start at the first clause of the
series, but after the minor discontinuity. Whereas zero anaphora
is employed in the topic chain to describe the boy’s continued effort to pursue his
goal, a lexical pronoun is used (‘but he (uh), kind of gives up . . .’) at the juncture
of the minor thematic gap.
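The segmentation described for (31) and (32) can be sketched as a simple procedure over clauses coded for the form that refers to the topic. This is a simplified illustration rather than the coding scheme used in the study: the function `topic_chains` and the clause codings are assumptions made here, a chain is taken to be an overt form followed by one or more zero anaphors, and intervening clauses about another referent, as in (30), are not handled.

```python
from typing import List, Tuple

# (clause text, form coding the topic: "NP", "PN" or "Zero")
Clause = Tuple[str, str]


def topic_chains(clauses: List[Clause]) -> List[List[Clause]]:
    """Group clauses into topic chains.

    A chain opens with an overt form (NP or PN) and continues for as long
    as the topic is coded by a zero anaphor. A new overt form closes the
    current chain. Single clauses with no zero continuation are not chains.
    """
    chains: List[List[Clause]] = []
    current: List[Clause] = []
    for clause in clauses:
        _, form = clause
        if form == "Zero" and current:
            current.append(clause)
            continue
        if len(current) > 1:
            chains.append(current)
        # An overt form opens a potential chain; a stray zero does not.
        current = [clause] if form != "Zero" else []
    if len(current) > 1:
        chains.append(current)
    return chains


# Codings of the boy-topic clauses in (31) and (32), read off the examples.
EXAMPLE_31 = [
    ("He opened up the bag", "PN"),
    ("but Ø was bitten by this big lobster", "Zero"),
    ("He was scared", "PN"),
    ("Ø cried a little bit", "Zero"),
    ("Ø picked up the bag", "Zero"),
    ("Ø followed his mother back into the house", "Zero"),
]
EXAMPLE_32 = [
    ("The boy looks fairly upset", "NP"),
    ("He starts to try to straighten the newspapers", "PN"),
    ("but he (uh), kind of gives up", "PN"),
    ("Ø gathers them together", "Zero"),
    ("Ø throws them toward the man", "Zero"),
    ("Ø continues to pursue the fly", "Zero"),
]

chains_31 = topic_chains(EXAMPLE_31)
chains_32 = topic_chains(EXAMPLE_32)
for chain in chains_31 + chains_32:
    print(" | ".join(text for text, _ in chain))
```

On these codings the sketch yields two chains for (31), the climax and the ending, and a single chain for (32) that begins at ‘but he (uh), kind of gives up’, i.e., after the minor thematic gap rather than at the first clause of the series.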

The correlation between a minor thematic gap and the use of a lexical pronoun
within an episode is further evidenced in our on-line data. In the on-line description
task, minor discontinuities exist not so much in the picture sequence of an
episode as in subjects’ mental representations: the page-turning itself creates
a gap between the description of the last pair of pictures and that of the next
pair, at which point subjects did not know what to expect but had to be prepared
to comprehend the incoming information quickly and connect it with the previously
presented information in order to tell a coherent story. Therefore, when the
description continued after a page turn, speakers would use a lexical pronoun to
resume reference to the central character, even though the new pair of pictures
continued to depict the same action or event sequence of the same referent. For example,
(33) The old lady walks off, and he’s holding the bag. And he looks into the bag, he
looks interested. (laugh . . . ) A crab comes out and bites him on the hand. The
boy looks pretty upset, and he follows the old lady home.
(EO13)
(34) ranhou  ta  ba  bao.zhi  ren    xiang  ta ba,   ta  kan.jian  cang.ying
     then    he  om  paper    throw  to     his dad  he  see       fly
     cong  bao.zhi  li  fei  chu.lai,  ta  you    qu  zhui   cang.ying  le
     from  paper    in  fly  out       he  again  go  chase  fly
“He then threw the paper at his dad, he saw the fly fly out from inside the
papers, and he ran after the fly again.”
(CO11)
(35) The boy goes through all the papers, looking for something, maybe the fly. The
boy throws the newspaper onto the man on the couch. He then finds, . . . spots the
fly, and he continues chasing the fly.
(EO9)
(36) ta  gou    bu.zhao  cang.ying,  suoyi  ta  cong  yi.zi  shang  tiao  xia.lai
     he  reach  not      fly         so     he  from  chair  on     jump  down
     da,   dan  ta  que      yi.pai.zi  pai.zai  bao.zhi  shang
     swat  but  he  instead  a-swatter  hit-at   paper    on
“He can’t reach the fly, so he jumps off the chair, but he swats the newspapers
instead.”
(CO17)
Passages (33)–(36) describe the same scenes from the second and third
episodes as do passages (29)–(32). The latter, however, frequently employ topic
chains, whereas in the former the same topic is realized by a more explicit anaphor
in each clause. As explained above, page-turning imposes a minor gap
on sustained attention in the description task, and an attention gap in the mind
results in a thematic discontinuity in the text. The operation of topic continuity in both
English and Chinese narrative production reflects a general cognitive principle of
language processing: “Expend only as much energy on a task as is required for
its performance” (Givón 1983: 18). In other words, the least explicit anaphora
signals the greatest thematic or topic continuity, while more explicit anaphora bridges
thematic gaps or signals thematic discontinuity of various degrees.
Conclusion
The present study has demonstrated, with data taken from a narrative study, that
stories produced in different forms and languages are strikingly similar with regard
to their structural organization, coherence building, and event coding. Speakers
and writers are largely responsive to episode boundary information, and generally
organize their narratives into separate yet interrelated episodes. Within an
episode, when incoming information is mapped onto previously presented
information, speakers seek and achieve local and global coherence by
establishing a story frame, focusing on the central character, systematically tracking
references, and maintaining topic continuity. Between episodes, when speakers
(and listeners) shift from actively building one structure to starting another,
they try to mark the episode boundary. The boundary effect not only
reflects speakers’ mental representations of episodes in discourse production, but
also signals to their addressees the advent of such a boundary in order to
facilitate comprehension. In general, discourse organization and coherence
establishment seem to be a systematic and even automatic process, governed
by our underlying cognitive activities and driven by our subconscious attempt to
enable our addressees to establish mental representations congruent with our own
in discourse processing.
References
Anderson, A., S. Garrod, & A. Sanford (1983). The accessibility of pronominal antecedents as
a function of episode shifts in narrative text. Quarterly journal of experimental psychology,
35A, 427–440.
Baggett, P. (1979). Structurally equivalent stories in movie and text and the effect of the medium
on recall. Journal of verbal learning and verbal behavior, 18, 333–356.
Black, J. B. & G. H. Bower (1979). Episodes as chunks in narrative memory. Journal of verbal
learning and verbal behavior, 18, 109–118.
Chafe, Wallace (1992). The flow of ideas in a sample of written language. In W. C. Mann & S.
A. Thompson (Eds.), Discourse description: Diverse linguistic analyses of a fund-raising text
(pp. 268–294). Amsterdam: John Benjamins.
Chafe, Wallace (1994). Discourse, consciousness, and time: The flow and displacement of conscious
experience in speaking and writing. Chicago: The University of Chicago Press.
Fox, B. A. (1987). Anaphora in popular written English narratives. In R. S. Tomlin (Ed.),
Coherence and grounding in discourse (pp. 121–167). Amsterdam: John Benjamins.
Garrod, S. C. & A. J. Sanford (1988). Thematic subjecthood and cognitive constraints on
discourse structure. Journal of pragmatics, 12, 57–72.
Gathercole, S. E. & A. D. Baddeley (1993). Working memory and language. Hillsdale, NJ:
Lawrence Erlbaum.
Gernsbacher, M. A. (1990). Language comprehension as structure building. Hillsdale: Erlbaum.
Givón, T. (1983). Topic continuity and word order pragmatics in Ute. In T. Givón (Ed.), Topic
continuity in discourse: Quantitative cross-language studies (pp. 343–363). Amsterdam: John
Benjamins.
Givón, T. (1987). Beyond foreground and background. In R. Tomlin (Ed.), Coherence and
grounding in discourse (pp. 173–188). Amsterdam: John Benjamins.
Givón, T. (1989). Mind, code and context: Essays in pragmatics. New Jersey: Erlbaum.
Givón, T. (1993). English Grammar: A Function-based Introduction, Vol. I & II. Amsterdam: John
Benjamins.
Givón, T. (1995). Coherence in text vs. coherence in mind. In M. A. Gernsbacher & T. Givón
(Eds.), Coherence in spontaneous text: Typological studies in language 31 (pp. 59–115).
Amsterdam: John Benjamins.
Guindon, R. & W. Kintsch (1982). Priming macrostructures. Technical report. Colorado:
University of Colorado.
Gundel, J. K. (1998). Centering Theory and the Givenness Hierarchy: Towards a Synthesis. In
M. Walker, A. Joshi, & E. Prince (Eds.), Centering theory in discourse (pp. 183–198). Oxford:
Clarendon Press.
Gundel, J., N. Hedberg, & R. Zacharski (1993). Cognitive status and the form of referring
expressions in discourse. Language, 69, 274–307.
Haberlandt, K., C. Berian, & J. Sandson (1980). The episode schema in story processing. Journal
of verbal learning and verbal behavior, 19, 635–651.
Just, M. A. & P. A. Carpenter (1992). A capacity theory of comprehension: Individual differences
in working memory. Psychological Review, 99(1), 122–149.
Kintsch, W. (1995). How readers construct situation models for stories. In M. A. Gernsbacher
& T. Givón (Eds.), Coherence in spontaneous text: Typological studies in language 31 (pp.
139–160). Amsterdam: John Benjamins.
Krahn, F. (1981). Here comes Alex Pumpernickel! Boston: Little, Brown & Co.
Lichtenberk, F. (1996). Patterns of anaphora in To’aba’ita narrative discourse. In B. Fox (Ed.),
Studies in anaphora: Typological studies in language 33 (pp. 379–411). Amsterdam: John
Benjamins.
Li, Charles N. & S. A. Thompson (1979). Third person pronouns and zero-pronouns in Chinese
discourse. In T. Givón (Ed.), Discourse and syntax (pp. 311–335). New York: Academic
Press.
Pu, Ming-Ming (1995). Anaphoric patterning in English and Mandarin narrative production.
Discourse processes, 19(2), 279–300.
Schank, Roger C. & R. P. Abelson (1977). Scripts, plans, goals and understanding. Hillsdale, NJ:
Erlbaum.
Tomlin, Russell S. (1987). Linguistic reflections on cognitive events. In R. S. Tomlin (Ed.),
Coherence and grounding in discourse: Outcome of a symposium (pp. 455–479). Amsterdam:
John Benjamins.
Tomlin, R. S. & M. M. Pu (1991). The management of reference in Mandarin discourse.
Cognitive linguistics, 2(1), 65–93.
Trabasso, Tom, Soyoung Suh, & Paula Payton (1995). Explanatory coherence in understanding
and talking about events. In M. A. Gernsbacher & T. Givón (Eds.), Coherence in spontaneous
text: Typological studies in language 31 (pp. 189–214). Amsterdam: John Benjamins.
van Dijk, T. & W. Kintsch (1978). Cognitive psychology and discourse: retelling and
summarizing stories. In W. U. Dressler (Ed.), Current trends in text linguistics (pp. 61–81).
Berlin and New York: Mouton de Gruyter.
van Dijk, T. & W. Kintsch (1983). Strategies in discourse comprehension. NY: Academic Press.
JB[v.20020404] Prn:21/04/2006; 9:24
F: HCP15NI.tex / p.1 (48-198)
Name index
A
Achard, Michel , , 
Allan, Scott , 
Amberber, Mengistu 
Ameka, Felix 
Anderson, A. , 
Aoki, Haruo , 
Ariel, Mira 
Arin, Dorothea Neal , , ,

Aristotle , 
Aslin, R. 
Atalay, Besir , 
Athanasiadou, A. 
B
Backhouse, A. E. 
Baddeley, Alan , 
Baggett, P. 
Bakema, Peter 
Baker, C. 
Ball, T. M. , 
Banfield, Ann 
Bardon, Geoff 
Barnlund, Dean 
Barsalou, Larry W. 
Basso, Keith , 
Bates, Elizabeth A. , , ,

Bauer, Laurie 
Bell, Allan , –, 
Bensch, P. A. 
Benveniste, Emile 
Berian, C. 
Bernárdez, Enrique 
Bever, T. 
Black, J. B. 
Blumstein, Sheila E. 
Boomer, David S. 
Boroditsky, L. 
Bower, G. H. 
Bowerman, Melissa , , ,
–, 
Bownds, M. D. 
Brown, G. , , 
Brown, Roger , , 
Brugman, Claudia , , ,

Buck, Carl D. 
Bugenhagen, Robert D. 
Burgess, C. –, , 
C
Carlson, R. 
Carpenter, K. , 
Carroll, David W. , , 
Carroll, Pat , , 
Casad, Eugene 
Chafe, Wallace , , –,
, 
Chalkley, M. 
Chao, Y.-R. 
Chappell, Hilary 
Chater, N. 
Chomsky, Noam 
Church, Kenneth W. , , ,
, –, 
Cienki, Alan , 
Clark, Eve , , , , , ,
, , , 
Clark, Herbert H. , , , ,
, , , , , 
Cocude, M. 
Coleman, Linda 
Colston, H. 
Comrie, Bernard 
Contini-Morava, Ellen , ,
, 
Cooper, L. A. 
Coulson, Seana , , , , ,
, , 
Creider, Chet 
Croft, William , , , 
Cruse, D. A. , 
Cutler, Anne 
D
Davey, A. S. , , 
Davidse, Kristin –, ,

Dechert, Herbert W. , 
DeLancey, Scott , 
Denis, M. 
Deutsch, W. 
Deverson, Tony 
Dirven, René , 
Dixon, Robert M. W. , 
Doi, Takeo 
Du Plessis, J. A. , 
Duranti, Alessandro , 
E
Elman, Jeffrey, L. , , , ,
, , , , 
Emanatian, Michele 
Enfield, Nick J. , 
Erman, Britt 
Evans, Zoe 
Eysenck, M. W. 
F
Fauconnier, Gilles , , , ,

Feld, Steven 
Fillmore, Charles J. , , 
Finch, S. 
Fodor, Jerry 
Forceville, Charles 
Fortune, G. 
Fox, Barbara A. , 
Friedrich, Paul 
G
Ganong, William F. 
Garrod, S. C. , 
Gaskell, M. 
Geeraerts, Dirk , , , ,

JB[v.20020404] Prn:21/04/2006; 9:24
F: HCP15NI.tex / p.2 (198-338)
 Name index
Gernsbacher, M. A. , ,
, , 
Gibbs, Raymond W. Jr. , ,
, , 
Givón, Talmy , , , ,
, , , , , 
Glenberg, A. M. , , 
Goddard, Cliff , , , ,
, , –
Goldberg, Adele , 
Gordon, Elizabeth , , 
Gough, Dave , , , 
Grice, Paul 
Grondelaers, Stefan , 
Guindon, R. , 
Gundel, J. 
Guthrie, Malcolm –
H
Haaften, Ton van 
Haberlandt, K. 
Halliday, M. A. K. , 
Hannan, M. 
Harkins, Jean 
Hart, B. 
Hasada, Rie 
Haspelmath, Martin 
Hawkins, Bruce 
Hebb, Donald 
Hedberg, N. 
Herskovits, Anna 
Hinton, Geoffrey 
Hoenkamp, Edward , ,
–, 
Holland, Dorothy 
Holmes, Janet , –, 
Hopper, R. J. 
Hutchins, Edwin 
Hymes, Dell 
I
Ibarretxe-Antuñano, Iraide ,
, , , , 
Ikegami, Yoshihiko 
Iwasaki, Shoichi , , 
J
Jackendoff, Ray , , ,
–, 
Janda, Laura 
Johnson, Mark H. , , , ,
, , , , , , ,
, , , , , 
Johnson-Laird, P. N. , , ,
, , , , , , ,
, , , , , , ,

Junker, Marie-Odile 
K
Karmiloff-Smith, Annette ,
, 
Kay, Paul , 
Kempen, Gerard , ,
–, 
Kempson, Ruth 
Kendon, Adam 
Kennedy, J. M. 
Kerzel, D 
Khoali, B. T. 
Kintsch, W. , , , 
Kitto, Catherine 
Klahr, D. , 
Klatt, Dennis H. , 
Kohonen, T. 
Kokuritsu Kokugo Kenkyûjo

Kornacki, Pawel 
Kosslyn, S. M. , 
Kövecses, Zoltán , , ,
, , 
Krahn, F. 
Krauss, R. M. 
Kuczaj, S. 
Kuipers, Joel C. 
Kuno, Susumu , , , 
Kurath, Hans 
Kuroda, S.-Y. , 
Kurtböke, N. Petek , 
L
Lachter, J. 
Lakoff, George , –, –,
, , , , , , , ,
, , , , , , ,
, , , , , , ,
, , , 
Lambrecht, Knud 
Langacker, Ronald W. , ,
–, , , , –, ,
, , , , , , ,
–, , , , , ,
, , , , , , 
Leakey, Louis S. B. , 
Lebra, Takie Sugiyama 
Lee, P. U. , , 
Lehrer, Adrienne , 
Leinbach, J. , 
Lemmens, Maarten , , ,
, , , , , ,

Levelt, Willem J. M. –,
, 
Levin, Beth , 
Li, Charles N. , , , , ,
, , , –, –,

Li, Ping , , , , , ,
, , –, –, 
Liddle, Scott 
Lipka, Leonhard 
Longacre, R. 
Lucas, Margery M. 
Luchjenbroers, June , , , ,
, , , , , , , ,

Lund, K. –, , 
M
Maclagan, Margaret A. , 
MacWhinney, Brian , , ,
, , , , , –,
, , 
Mandler, Jean M. 
Mann, V. A. , 
Maratsos, M. 
Marchand, H. 
Marchman, Virginia A. 
Marslen-Wilson, William D.
, , , 
Matlock, Teenie , , , , ,
, 
Matsumoto, Yo , 
McCawley, James D. 
McClelland, James L. , ,
, 
McCloud, S. 
McNeill, David , 
Metzler, J. 
Miikkulainen, R , 
Miller, G. A. 
Moiseeva, Nadezda 
Moliner, María 
Munn, Nancy D. 
Mühlhäusler, Peter 
Mylne, Tom , , , 
JB[v.20020404] Prn:21/04/2006; 9:24
F: HCP15NI.tex / p.3 (338-480)
Name index 
N
Newmeyer, F. J. 
Newport, E. 
Niemeier, Susanne 
Nishimura, Yoshiki 
Nolan, Francis J , 
Noordman, Leo 
Nooteboom, Sieb 
O
Olbrechts-Tyteca, L. 
Onishi, Masayuki 
P
Palmer, Gary , –, , –,
, , , , , –, , ,
, 
Pardoen, Justine A. 
Parisi, D. , 
Parker, Simon 
Payton, Paula 
Pederson, E. , 
Peeters, Bert , 
Perelman, C. 
Pinker, Steven , , 
Plunkett, Kim , 
Prince, Alan 
Pu, Ming-Ming –, , ,

Pullum, Geoffrey K. 
Pustejovsky, James 
Pylyshyn, Zenon 
R
Radden, G. 
Rader, Russell , , 
Raupauch, Marius , 
Redington, M. 
Reiser, B. J. 
Repp, Bruno H. , 
Rice, S. , 
Risley, T. 
Rumelhart, David , , ,

Ryder, Mary-Ellen 
S
Saffran, J. 
Sanders, G. , 
Sanders, Ted , 
Sandson, J. 
Sanford, A. J. , 
Sansò, A. 
Saussure, Ferdinand de 
Schank, Roger C. 
Schilperoord, Joost , , ,
, , , , 
Schmid, H.-J. 
Schvaneveldt, Roger W. 
Scollon, Ron 
Seidenberg, Mark 
Shepard, R. N. 
Shirai, Y. 
Silverstein, Michael , 
Skinner, Debra 
Slobin, Dan I. , , 
Smith, Carlota S. 
Spitulnik, Debra A. , , , 
Spitzer, M. , , 
Spooren, Wilbert , 
Stanwood, Ryo E. 
Starks, Donna , 
Stevens, Kenneth N. 
Stubbs, Michael 
Suh, Soyoung 
Sweetser, Eve , , , , ,
, , , , 
Swinney, David A. 
T
Tabakowska, E. 
Talmy, Leonard , , , , ,
, , , , 
Tanenhaus, Michael K. 
Thompson, Sandra A. , ,
, 
Tomasello, Michael 
Tomlin, Russell S. , , ,
, 
Trabasso, Tom 
Traugot, Elizabeth Closs 
Travis, Catherine , , ,
, , , 
Trilling, Lionel 
Turner, Mark , , , , ,
, , 
Turner, Robin , , , , ,
, , 
Tversky, B. , , , 
Tyler, Lorraine K. , 
U
Uehara, Satoshi , , –,
, , –
Ungerer, F 
V
van Dijk, Teun A. , , 
Vandeloise, Claude 
Vendler, Z 
Verhagen, Arie , , , ,
–
W
Wald, B. 
Wallace, S. , –
Warren, Beatrice , , , ,
, 
Warren, Paul , , , , ,

Watson, Catherine I. , ,

Whorf, Benjamin Lee –,
, , , , 
Wierzbicka, Anna , , –,
, , –, , , 
Wilkins, David P. , 
Williams, R. 
Wong Scollon, Suzanne 
Woodman, Claudia , , ,
, , 
Y
Ye, Zhengdao 
Yoon, Kyung-Joo 
Z
Zacharski, R. 
JB[v.20020404] Prn:12/04/2006; 9:47
F: HCP15SI.tex / p.1 (48-178)
Subject index
A
accessibility , 
accessible (information) , ,
, , , 
acquisition , , , , –,
–, –, , , 
activation , , , , ,
, , , , , , 
actor , , –, , , , ,
, –, , 
addressee , , , , , ,
, 
affective, affected (case) , ,
, , , , , ,
–, , , , 
agent (case) , , , , –,
, , , , , , , ,
, , , 
agreement , , , , , ,

ambiguity , , , , ,
, , , 
animacy , 
argument , , , , , , ,
, , , , , , ,
, , , , , , 
association , , , , 
attention –, , , , , ,
, , , , , , ,
, , , , , ,
–, 
Australian , , , , , ,
, , , , , 
Austronesian , , , , , 
autonomous 
B
back-propagation , , 
background (v. foreground) ,
, , , , , , ,
, –, –,
–, , 
Bantu , , , , –, , ,
, , , 
Basque , , , , ,
–, , , ,
–
beats 
Blending Theory , , , 
(Conceptual Integration
Theory) , , 
completion , 
composition , , 
elaboration , , 
blends , , , , , 
‘bottom-up’ (processing) 
C
case , , , , , , , , ,
, , , , , , , , ,
, , , , , , , , ,
, , , , , , ,
, , –, –, ,
, , , , , , ,
, , , , ,
–, , , ,
–, , , , ,
, , , , , ,
–, , 
categorization , , , , ,

causality , 
causation , , , , 
causative , , , , 
external , , , ,
, , –, , ,
, , , , 
internal , , , , ,
, , , –, ,
–, , , –,
, 
non-causative 
causatives , , 
Chewa , 
Chinese , , , , , ,
, , –, –
cluster tree , 
cognition , , , , , , ,
, , , 
cognitive
domain , , , , ,
–, , , , , , ,
, , , , , ,
–, –, , ,
, 
factors , , , , , ,
, , , , , ,
, , –, , ,
, , , , 
Grammar –, , , , ,
–, , , , , ,
–, , , ,
–, , , , ,
, , , , , ,
, , , , 
model , , , , , ,
, , , , , , ,
, , –, ,
–, , , , ,
, , , , –,
, , , , ,
–, , 
Semantics –, , , , , ,
, , , , , , , ,
, , , , –,
, , , , , ,
, , , , , ,
, , 
status , , , , , ,
, , , , , ,
, , , , , ,
, , –, , ,
, –, , , ,
–
coherence , , , , ,
, –, , , 
cohesion 
cohesion markers 
‘comfort zone’ , , –, 
completion , 
composition , , 
comprehension –, , , , ,
, , , , , , ,
, , , , , , 
concept , –, , , , ,
, , , , , , , ,
, , , , , , ,
, , , , , , ,
, , , 
conceptual –, , , , , –,
, , , , , –,
–, , , , , –,
, , , , –, ,
, , –, –,
, , , –, ,
, , , –, ,
–, –, , ,
, –, –, ,
, , , , , ,
, , , , 
blending , , –, –,
, 
dependency , , 
structure , , , , , , , ,
, , , , , , , ,
–, –, –, , ,
, , , , , , ,
, , , , –,
, , , , , ,
, , , , , ,
, , , , , ,
, , , , , ,
, , , –, ,
, , , , –,
–, , 
conceptual blends 
Conceptual Integration Theory
(Blending Theory) , , 
conceptualizer , , , ,
, 
configuration , , ,
–, , , 
connectionism , 
connexity , , , –,
, 
constraint , , , 
construal , , , , , , ,
–, , , , , ,
–, , , , ,
, , –, 
construction , , , , , ,
, , , , , , , ,
, –, –, , ,
, , , , –,
, , , , , ,
, 
context , , , , , , ,
, , , , , , , ,
, , , , , , ,
, , –, , , ,
, , , , , ,
, 
conversation –, , , ,
, 
cooperative 
corpus , , , , , , ,
, , , , , , ,
, , , , , , ,
, , , , , 
cross-linguistic , , , , ,
, , , , , , ,
, , , , , , 
cryptotype , –, –,
–
cue , , , , –, ,

cultural linguistics –, , 
culture , , , , , , , ,
, , , , , , , ,
, , , , , 
culture-specific , , , ,
, , , , , , 
D
decontextualized (image) 
default , , , , , ,
, , , , , 
deictic , , , –, ,
–
deixis 
dependence , , , 
discourse function 
discourse production , ,

discourse structure , , ,
, 
discourse unit 
discursives –
Dokean (framework) 
domain , , , , , –,
, , , , , , , ,
, , , , –,
–, , , , 
double subject 
durative , , –
Dutch , , , –, ,
–, , 
Dyirbal , , –, , 
E
effective constructions 
egocentric , 
elaboration , , 
emblem , 
embodiment , , 
emergent , , , , , , ,
, , –
emergent structure , 
emotions , , , , , ,
, , , , 
English , –, , , , , ,
, , , –, , –,
, , , , , –,
, , , –, , ,
–, , , , , ,
, –, –, , ,
–, , , , ,
, , , , , –,
, –, , , ,
, , , –, –
episode , –
episode boundary , , ,
, 
ergative , , , , –,
–
event , , , , , , , ,
, , , , , –,
–, –, , ,
, , , , ,
–, –, , , ,
–
experiencer , , –, , ,
, , –, , , 
experiential , , , , , ,
, , , , , ,
, , 
F
F-space , , , –
‘inside’ , , , , ,
–, , , , ,
, , 
‘outside’ , , , , , ,
–, –, , ,
, , , , , ,

feature overlap , , 
fictive (motion) , , , –,
–, –
figurative , –, , 
figure , –, , , , , ,
, , , , –, –,
, , –, –,
–, –, , , ,
–, , , 
focus (of attention) , , ,
–, , , , , , , ,
, , , , , , ,
, , , , , ,
, , , , , , 
force dynamics , , 
foreground (v. background)
, , , 
Formal, Formalist , , , ,
, , , , , , ,
, , , , , 
frame , , , , , , ,
–, , , , , ,
, –, 
function words , , ,
–, –, –,
–
functional , , , , , ,
, –, , , , ,
, , , , 
functorization –, ,
–, , 
G
gender , , 
generativist 
gesture –, , , , –,
, 
complex gestures , , 
deictic gestures 
simple gestures , 
given (information) , , ,
, , , , , –, , ,
, –, , , , ,
, , , , , , ,
, , , , , , ,
, , , , , ,
–, , , , ,
, , , , , 
goal (case) , , , , , ,
, –, , –, ,
, , , , , 
ground , , , , , , ,
, , , , 
grounding , , , , ,
, , , , , , 
H
hidden units 
homophony , , , ,

I
iconic , , , , , ,

Idealized Cognitive Model
(ICM) , , , 
ideology 
image schema (schemata) ,

Incremental Procedural
Grammar 
indexicals , , 
Indo-European , , 
inference , 
innateness 
input , , , , , , , ,
, , , , , ,
–, , , –,

data (information) –, ,
, , , –, , ,
, , , , , ,
–, , , , ,
, , , –, ,
, , , , , ,
, , , , , ,
, , , , , ,
, –
spaces , , , , , , ,
, , –, –
input spaces , 
instigator , , , 
integration , , , –, ,
, –, , , , 
interactive , , 
internal state –, , ,
–
L
Landmark (LM) , , ,

language acquisition , , ,
, –, , , , 
learning , , , –,
–, –, 
lexical co-occurrence –,
, 
lexicalization, lexicalisation ,
–, , , –, ,
, , 
lexically driven , , ,
–, , , , 
M
Mandarin (Chinese) , , ,

mapping , , , , , ,
–, , , , , ,
, , , 
markers , , , , ,
, , , , , ,
, , 
case , , , , , , , ,
, , , , , , , ,
, , , , , , , ,
, , , , , , ,
, , , , ,
–, –, , ,
, , , , , ,
, , , , ,
–, , , ,
–, , , , ,
, , , , , ,
–, , 
cohesion 
evidential , , , ,
, 
nominative , , ,
–, –, 
tense , , , , , ,
, , , , , ,

topic , , , , , , ,
, , , , , ,
, , , , ,
–
medium , , , , 
memory , , , , –,
, , , , , ,
–
mental spaces , , , , , ,
, , –
metalanguage , , , ,
, , , 
metaphor , , , , , ,
, , , , , –, ,
, , , , , , ,


metaphorical scope , , 
model , , , , , , , ,
, , , , , , ,
–, , –, ,
, , , , , ,
, –, , , ,
, , –, , 
computational , , , , ,
, –, , , ,
, , , 
module , 
mood (participial) , ,
–
motion , , –, , , ,

mutual 
ground , , , , , ,
, , , , , 
information , , , –, ,
, , , , –, , ,
–, , , , –,
, , , , , ,
, , , , , ,
, , , –, ,
, , , , , ,
, , , , ,
–, , , –,
, , 
N
narrative –, , , –,
, , –, , , ,
, , 
network , , , , , , ,
–, , , , 
neural , , , , 
semantic –, , , , ,
, , , , , , , ,
, , , , , ,
–, –, –,
, , , , –,
, , , , –,
, , , , , ,
, , –, , ,
, , , , , ,
, , , –, ,
, , , 
neutralization , , , 
complete , , , , ,
, , , , , 
partial , , , , , ,
, , 
new (information) , , , , ,
, , , , , , , , ,
, , , , , , ,
, , , –, –,
, , , , , , ,
, , , , , –,
, , , , 
New Zealand English , ,
, , 
O
om , –, , , ,
, , , 
output , , –, 
layer , 
pattern , , , , , ,
, , , , , ,
, , , , , ,
–, 
P
Pakeha , 
pantomime 
paradigm , , , , –
parallel (processing) , , ,
, , , , , , 
path , , , , , , ,

patient , , , , , 
pattern completion 
pause patterns , , , ,
, , 
perception , , , , , ,
, , , , –,
, , , , 
performance 
perspective , , , , , ,
, , , , , , ,
, , , , , ,
, , , , –,
, , 
polysemy , , , , ,
, , , –, 
problem solving 
processing –, , , , , ,
, , , , , , ,
, , , , , ,
, –, , , ,
, , –, , ,

bottom-up 
language –, , , , –,
, , , , , –, ,
, , , , , –,
–, , , , , ,
, –, , –,
, , –, –,
, , , , –,
, , , –,
–, –, , ,
, , , , , ,
–, , , –,
, , , , , ,

phonetic , , –, ,
, –, , , 
top-down 
production , , , , , , ,
, –, –, ,
–, –, , , ,
, –, , , , 
projection 
partial , , , , , ,
, , 
prominence , 
proposition , , , ,
, , , , 
prototype , , –, , ,
, , , , , , ,
, 
prototype effects , , 
R
range (pseudo-goal) , , , ,
, , , , , , , ,
, , , , , , ,
, , , , , ,
, 
recall , , –, –
recognition , –, , ,
–, , 
recovery , , , , 
reference , , , , , ,
, , , , , , ,
, , , , , , ,
, , , , , ,
–
register 
relativism 
role , –, , , , , , ,
, , , , , –,
, , , , , ,
, , , , , ,
, , –, , ,
, , , , , ,
, , , , ,
–, , –, ,
, , 
semantic –, , , , ,
, , , , , , , ,
, , , , , ,
–, –, –,
, , , , –,
, , , , –,
, , , , , ,
, , –, , ,
, , , , , ,
, , , –, ,
, , , 
syntactic , , , , ,
–, , , –,
, , 
rule , , , , , , ,
, , 
S
salience , , , , , , ,
, 
scenario , , , , , , ,
, , , –, , , , ,
, , , , , ,
–
schema , , , , , ,
, –, , , , ,
, , 
schematicity 
schematization , , 
scope , , , , , ,
, , , , , ,
, 
self organization
network –
map –
semantic
extension , , , , , ,
, , , , , , ,
, , , –,
–, –, , 
field , , , , , , , ,
, , , , , ,
, –, , 
structure , , , , , , , ,
, , , , , , , ,
–, –, –, , ,
, , , , , , ,
, , , , –,
, , , , , ,
, , , , , ,
, , , , , ,
, , , , , ,
, , , –, ,
, , , , –,
–, , 
semantic space , 
semantics –, , , , , , ,
, , , , , , ,
, , , , –,
, , , , , ,
, , , , , ,
, , 
cognitive –, , , –, ,
, , , , , , , ,
–, , , , , ,
, , , , ,
–, , –, ,
–, , , , ,
, , , , –,
, , , –, ,
, , , , , ,
–, , , ,
–, , –, ,

formal , , , , ,
, , , , , ,
, , , 
Shona , , , , –, , 
sign language 
social , , , , , , , ,
, , , , , , , , ,
, , , , , , ,
, 
space , , , , , , –,
, , , , , , –,
, , –, , ,
–, , , , ,
, , , , , ,
, , 
blended , , , , , ,

mental –, , , , , ,
, , , , , , , ,
–, , , , ,
, , , , , ,
, , , –, ,
, , , 
physical , , , , , ,
, , , , , , , ,
–, , , –, ,
, , , , ,
–, –
Spanish , , , , , ,
–, , , , ,

speaker –, , , , , , ,
, –, –, , , ,
–, , –, , ,
, , , , , ,
, , –, ,
–, , , , ,
, , , 
speech act , 
statistical learning , , ,
, 
structure building , , 
subjectivity , –, ,
, –
subjectivity scale 
subordinate mood type , 
surrogate 
symbolic , , , , , , ,
, , , , , , ,

T
Tagalog , , , , –, ,
, , 
temporal grounding 
tense , , , , , , ,
, , , , , 
conceptualization , , ,
, , , , –,
, , –, , ,

continuous , , , ,
, , , , 
thematic coherence 
theme , , 
theory , , , , , , , ,
, , , , , , , –,
, , , , , ,
–, , , , , ,
, , , , , 
thinking for speaking 
‘top-down’ (processing) 
topic , , , , , , , ,
, , , , , , ,
, , , –
topic continuity , , ,

training , , 
trajector , , , , , –,
, , , –, 
trajectory (TR) , , , ,

transitive , , , , , ,
–, –, , 
Turkish , , , , , ,
, , –
turn-taking 
U
unit , , , , , , ,
, , , , , 
processing –, , , , ,
, , , , , , ,
, , , , , ,
, , –, , ,
, , , –, ,
, 
universal –, , , ,
, 
usage-based 
V
variation , , , , ,
, , , , 
verb , , , , , , , , ,
, –, , –, ,
–, –, , ,
, , , , , ,
–, , –, ,
–, , , ,
–, , , , ,

‘killing’ , , , , ,
–, , 
emotion , , –, , ,
, , , 
motion , , –, , ,
, 
processing –, , , , ,
, , , , , , ,
, , , , , ,
, , –, , ,
, , , –, ,
, 
viewpoint , , 
visual , , , , , –,
, , , , , , 
voice , , , , , , , ,
, , , , , , , 
W
weights , , –, , 
Whorf –, , , , ,

working memory , –,

X
Xhosa , , , , , ,
, 
In the series Human Cognitive Processing the following titles have been published thus far or are
scheduled for publication:
17 LANGLOTZ, Andreas: Idiomatic Creativity. A cognitive-linguistic model of idiom-representation and idiom-variation in English. 2006. xii, 326 pp.
16 TSUR, Reuven: ‘Kubla Khan’ – Poetic Structure, Hypnotic Quality and Cognitive Style. A study in mental, vocal
and critical performance. 2006. xii, 252 pp.
15 LUCHJENBROERS, June (ed.): Cognitive Linguistics Investigations. Across languages, fields and philosophical
boundaries. 2006. xiii, 334 pp.
14 ITKONEN, Esa: Analogy as Structure and Process. Approaches in linguistics, cognitive psychology and
philosophy of science. 2005. xiv, 249 pp.
13 PRANDI, Michele: The Building Blocks of Meaning. Ideas for a philosophical grammar. 2004. xviii, 521 pp.
12 EVANS, Vyvyan: The Structure of Time. Language, meaning and temporal cognition. 2004. x, 286 pp.
11 SHELLEY, Cameron: Multiple Analogies in Science and Philosophy. 2003. xvi, 168 pp.
10 SKOUSEN, Royal, Deryle LONSDALE and Dilworth B. PARKINSON (eds.): Analogical Modeling. An
exemplar-based approach to language. 2002. x, 417 pp.
9 GRAUMANN, Carl Friedrich and Werner KALLMEYER (eds.): Perspective and Perspectivation in Discourse.
2002. vi, 401 pp.
8 SANDERS, Ted J.M., Joost SCHILPEROORD and Wilbert SPOOREN (eds.): Text Representation. Linguistic
and psycholinguistic aspects. 2001. viii, 364 pp.
7 SCHLESINGER, Izchak M., Tamar KEREN-PORTNOY and Tamar PARUSH: The Structure of Arguments.
2001. xx, 264 pp.
6 FORTESCUE, Michael: Pattern and Process. A Whiteheadian perspective on linguistics. 2001. viii, 312 pp.
5 NUYTS, Jan: Epistemic Modality, Language, and Conceptualization. A cognitive-pragmatic perspective. 2001.
xx, 429 pp.
4 PANTHER, Klaus-Uwe and Günter RADDEN (eds.): Metonymy in Language and Thought. 1999. vii, 410 pp.
3 FUCHS, Catherine and Stéphane ROBERT (eds.): Language Diversity and Cognitive Representations. 1999.
x, 229 pp.
2 COOPER, David L.: Linguistic Attractors. The cognitive dynamics of language acquisition and change. 1999.
xv, 375 pp.
1 YU, Ning: The Contemporary Theory of Metaphor. A perspective from Chinese. 1998. x, 278 pp.