WORDS AND MEANINGS: ARBITRARINESS, LISTING AND LEXICALISATION

László Kristó

No one disputes the fact that linguistic signs are arbitrary. But it is often easier to discover a truth than to assign it to its correct place. The principle stated above is the organising principle for the whole of linguistics (...) The consequences which flow from this principle are innumerable. It is true that they do not all appear at first equally evident. One discovers them after many circuitous deviations, and so realises the fundamental importance of the principle.
Ferdinand de Saussure1

What’s in a name? that which we call a rose, / By any other word would smell as sweet
William Shakespeare

0 Prologue

This essay explores the nature of lexical storage in the mind. The ideas outlined in it are by no means mine; rather, I wish to sum up what I find the most interesting recent views on the subject.

The paper is written primarily for the non-linguist. The reason for this is twofold. First, most essays in this volume are on literary topics, which is why the readership is more likely to be inclined towards literature rather than theoretical linguistics. Second, it is my firm belief that this topic is potentially of great interest to literary historians, critics and philologists; indeed, to anyone who works with language, even if not as a linguist. I have therefore attempted to formulate my points as informally as possible, so that readers without a special training in the relevant fields of linguistic science would have no difficulty in following the main trains of thought. On the other hand, I hope that linguists who do not specialise in this particular topic will also find this paper interesting.

This paper is structured as follows. In Section 1, I summarise the main points about the basic concept of arbitrariness. Sections 2 and 4 deal with possible definitions of the elusive concept “word”.
In Sections 3 and 5, I present a current view on where words come from for the speaker, i.e., what mental devices produce the words the speaker utters. Section 6 gives an overview of the widespread phenomenon of lexicalisation and shows that the model presented in previous sections can accommodate it quite conveniently. Finally, Section 7 concludes the discussion.

1 What we talk about when we talk about arbitrariness

It is commonplace wisdom, taught in introductory linguistic courses, that the linguistic sign is arbitrary. A sign, as understood in semiotics, is the union of some physical form with a meaning. No physically existing entity is in itself a sign until a particular meaning is attached to it. The letter K, for example, is a sign because it has a meaning: it is a graphic form denoting the sound [k] in (most) Greek and Latin based alphabetic systems. For an illiterate person, or for someone who does not use a Greco-Latin writing system, K is not a sign but a mere graphic shape2.

1 Saussure (1983:68)

2 Please note that “meaning” is used here in a rather wide sense, something like “function”.

Signs which are used specifically for purposes of communication are of two kinds: symbols and icons. Symbols are arbitrary signs, in which the relationship between form and meaning is purely conventional, a matter of sheer agreement, which must be learnt. Traffic lights are, for instance, symbols. Icons are non-arbitrary signs, in which there is a perceivable, logical or natural connection between form and meaning; in other words, the form suggests the meaning, and vice versa. A picture of a hamburger above the entrance to a fast food restaurant is a good example.

The idea that the linguistic sign (roughly, the word; but see below) is a symbol has long been noted, but in modern linguistics, it was first clearly articulated and emphasised by the Swiss linguist Ferdinand de Saussure (1857–1913).
To illustrate with Steven Pinker’s brilliant example (Pinker 1994): the sound string3 dog does not resemble the animal it denotes. It does not have four legs, it isn’t furry, it doesn’t bark or wag its tail, and so on. The sole reason why English speakers think of Canis familiaris when hearing the string dog is that in their childhood, they learnt to associate the concept with that string. The most striking confirmation of the conventional nature of the linguistic sign is that different languages have different signs (differing in sound or meaning or both): Canis familiaris is called sobáka in Russian, chien in French, perro in Spanish, Hund in German, pes in Czech, ci in Welsh, cão in Portuguese, for instance; Hungarian happens to have two words for it, viz. eb and kutya. These strings resemble neither each other nor the animal they denote. Furthermore, the denoted concepts are conventional, too: for instance, there is no English noun denoting the same concept as Hungarian asztal (covered in English partly by desk, partly by table, but the latter word also corresponds to Hungarian tábla/táblázat); on the other hand, no Hungarian verb means the same as English get. There are many other manifestations of arbitrariness, but I will spare the reader further examples.

Nevertheless, it is clear that many words are not quite arbitrary, but iconic. The English adjective catlike, for instance, has a completely transparent meaning, something like ‘resembling a cat’, e.g., A huge catlike animal attacked the tourist. Even if one has never heard the word before, one can guess its meaning, because it is compositional: the word consists of the root cat and the affix –like, and its meaning is the “sum total” of the meanings of these constituents4. Structuralist linguistics in the first half of the 20th century, of course, noted this, and introduced the term morpheme to denote the minimal (unanalysable) sign.
Under this interpretation, the word is not the smallest unit of meaning: the morpheme is. Polymorphemic words, then, are not arbitrary but iconic.

3 I use conventional orthographic representations (rather than phonetic transcriptions) here, but note the term sound string: what counts is the pronounced form represented by the given (conventional) spelling. A sound behaves in the same way in English (and all other languages) however it is spelt, cf. English words like kit, can, queen, chemical, beginning with different letters but the same sound. Note also that what I’m talking about in this essay is also valid for illiterate speakers (children, for example!), for whom the spelling conventions are completely irrelevant, not to mention the majority of languages, which are not written in any form.

4 In fact, this is due to the predictable behaviour of the suffix. If one hears a sentence A huge wuglike animal attacked the tourist, one cannot know what the animal is like, but it undoubtedly looks like a wug, whatever a wug may be. Of course, wug is not a conventional word in everyday English (but it is used by linguists as a nonsense word).

So far, the picture seems clear, but the clarity is illusory. In fact, many polymorphemic words are not (or not entirely) iconic. The word curiosity, for example, in the sentence I entered a shop which turned out to be full of old curiosities is clearly arbitrary: based on the meaning of curious and –ity, one expects curiosity to mean ‘curiousness’. Indeed, it can have that meaning (cf. Curiosity killed the cat), but the meaning in our sentence does not follow from the meaning of the word’s constituents: it must be learnt. In other words, curiosity is formally composite (polymorphemic), but semantically it isn’t. As opposed to words like catlike, which are semantically transparent, curiosity is opaque. A Hungarian example is the adjective népszerű ‘popular’, which does not mean ‘resembling the people’ (cf.
macskaszerű ‘catlike’, hajszerű ‘hairlike’, and many others)5.

All right, one might say, so what? Clearly, exceptions do exist. The fact that there are some such items does not contradict the basic principle that composite signs are iconic. However, closer inspection shows that this view is untenable, simply because of the sheer number of “exceptions”. Languages abound in non-iconic composite signs. Indeed, possibly the majority of compounds, and a great number of derived words, are arbitrary, an issue this essay will largely be about. Dismissing non-iconic composite signs as “exceptions” leaves a substantial portion of the lexicon unexplained (and it is counterintuitive as well to treat so many items as exceptions). Worse still, the great number of such items may suggest something very essential about the nature of the lexicon.

Not only do we find opaque polymorphemic words: there are a number of instances (in a variety of languages) of meaningless morphemes. A classic instance in English is the “berry” names cran-berry, huckle-berry, etc., where the first morpheme has no meaning. Even worse are prefixed items such as be-gin or re-ceive, where neither the prefix nor the root has a meaning on its own: it is their combination that is meaningful. Hungarian examples include the morpheme üld- in üldöz ‘chase’ or egy- in egyház ‘church’, among others6. Observations of this kind led to the view that it is not the morpheme, but the word, that should be regarded as the minimal sign7. This view is basically compatible with the model I will present here (see Section 7, too).

2 What we talk about when we talk about words (Part I)

So far, we have been talking about the word as if it were self-evident what a word is. The reality is far from that, as formulated so appropriately in Radford et al.
(1999:146):

Of all linguistic constructs, the word is probably closest to familiar physical objects, but, as the history of physical science has shown, beneath these everyday objects lies a world (...) which is organised in ways which few of us can readily understand.

The chief reason why the concept is so elusive is that our intuitive, everyday notion of the word corresponds to several linguistically significant entities. In this section, I present some “word definitions” which are relevant to the subsequent discussion. There are many other aspects of defining the word; readers are referred to the essay entitled “On the notion word and its role as a phonological constituent”, this volume.

In one sense, the word is a string of sounds, e.g., chant, particular, houses, invented, etc. In writing, such strings are usually spelt as one unit (i.e., without an internal space)8. We can call them word forms: they are phonetic forms that are words. This is one definition of the word.

5 As usual, I use single quotes (‘...’) to give meanings.

6 Hungarians are tempted to object here, saying that egy means ‘one’. Note, however, that the meaning of the numeral is perfectly incompatible with egyház; it is more realistic to say that this is a case of homonymy, like mész ‘lime’ vs. mész ‘thou goest’. (Historically, the egy in egyház used to mean ‘holy’, but as an independent word, it has been lost; hence the present-day situation. We’ll discuss this situation later.)

7 This stance is argued for in Aronoff (1976), probably the most influential work on the issue within the generative tradition.

8 Not always, though: compounds in English are often written as two orthographic words, e.g., student committee, history teacher, White House, etc. Occasionally, it happens in Hungarian, too, e.g., Fehér Ház ‘White House’.

There is, however, a more abstract sense of the word. Suppose one wants to convey the act of ‘acquiring something from someone by giving money for it’. The common English
verb expressing this particular act is buy. Note, however, that it does not always appear in this form, as shown by the sentences below:

(1) (a) The children always buy chocolate in Mr Evans’ shop.
    (b) Jane often buys books on early English history.
    (c) I bought a new watch yesterday.
    (d) Mother was buying some fruit when it started to rain.

Like most English verbs, the one meaning ‘acquire something from someone by giving money for it’ is realised by four different word forms: in this case, buy, buys, bought, buying. The choice between them is grammatically governed rather than being dependent on the speaker’s intentions, as shown by the ungrammatical sentences in (2)9:

(2) (a) *The children always buys chocolate in Mr Evans’ shop.
    (b) *Jane often buying books on early English history.
    (c) *I buy a new watch yesterday.
    (d) *Mother was bought some fruit when it started to rain.

Which form is used depends, then, on the word’s relationship to other items in the sentence as well as some grammatical properties of the sentence itself (tense, aspect, voice, etc.). In a sense, therefore, these four forms are one and the same word. The word in this sense is usually called a lexeme. The word forms that realise the lexeme constitute its paradigm. Lexemes are often printed in small caps to distinguish them from word forms, a practice I adopt here.

Consider now the following sentences:

(3) (a) I usually cut the bread on the table.
    (b) Yesterday I cut the bread in the sink.
    (c) The boy has cut his finger and it’s bleeding badly.

Compare these with the ones in (4):

(4) (a) I usually drink coffee in the morning.
    (b) Jake often drank milk when he was a boy.
    (c) I think you have drunk too much, let’s go home.

The lexeme DRINK is realised by three different word forms in (4), but CUT always shows up as cut.
Yet the three instances of cut are, in a sense, different words: in (a), it represents the present tense of CUT, in (b), we find a past tense form, and in (c), it is a past participle. (Many English verbs, such as DRINK, have different forms representing these grammatical categories.) The ‘present tense of CUT’ is, grammatically speaking, a different word from ‘the past tense of CUT’: we can call them different grammatical words10.

The different word forms acting as members of a single paradigm are related to each other by inflectional morphology. In English (as in Hungarian), inflection usually takes the form of suffixation: ‘CUT-Sg3Present’ is realised as the base (stem) cut plus a suffix –s (cuts), etc. Often, however, a different base is used, as sang for ‘SING-Past’ (rather than *singed), a phenomenon known as suppletion. It must be noted, though, that regular inflection (both in English and in Hungarian) is based on suffixation11.

One final note: the word form cut can realise another lexeme, too, as in the sentence Leila has a bad cut on her face: here, the word form is a member of a different paradigm, that of the noun CUT. In cases like this, when a form realises different lexemes, we talk about homonymy, a very frequent phenomenon in natural language. (Homonymy within a paradigm, when a single form represents several grammatical words, as illustrated by (3), is referred to as syncretism.)

9 As usual, the asterisk before the sentence indicates ill-formedness.

10 The terminology used here follows Lyons (1981) and Katamba (1993). For a different terminology, see, e.g., Matthews (1974).

3 Sources of words: Morphology and the Lexicon (Part I)

One of the most fascinating properties of human language is its sheer vastness. Using a few dozen speech sounds, combining them in a finite number of ways, human beings can produce and understand a potentially infinite number of sentences.
Not only is the number of sentences awesome: the number of words in any natural language is incredibly high, too (even if probably not infinite)12. But where do words come from? How does a speaker “find” the word to express the concept he/she wants to express? It may be illuminating to first consider the way sentences are produced. In essence, sentences are not memorised bits and pieces of information: speakers assemble them “on line”, as it were, according to the syntactic rules of their language13. This fact is the chief reason why humans can so easily communicate their ideas to others. (Imagine the alternative where sentences are learnt one by one, rather than constructed in situ!) Now, let us turn to words. First of all, let us remind ourselves that knowledge of language includes grammatical and lexical knowledge — the latter is, simplifying somewhat, the knowledge of lexemes (though what in fact constitutes lexical knowledge is a matter of debate; it will be taken up later)14. Lexemes are, à la de Saussure, (arbitrary) pairings of sound and meaning, so lexical knowledge includes knowing what concepts are denoted by what strings and vice versa. In English, for instance, the concept ‘female sibling’ is denoted by the sound string sister; in (European) Portuguese, the string garoto denotes the concept ‘small white coffee’. English has no lexeme which unites a string with this meaning: the concept is lexically coded in European Portuguese, but not in English (or Hungarian)15. Of course, lexical coding does not only involve simplex arbitrary signs; the word bookshop, for instance, is a conventional one to denote a ‘shop where books are sold’, and it is clearly composite and iconic. Items of this sort — conventionalised lexemes — are stored in memory, that is, the mental lexicon. If I want to refer to a concept that is lexically coded, I simply retrieve the relevant lexeme from my mental lexicon, and insert it in the appropriate place in the sentence. 
This process, clearly, is different from how sentences are produced. In fact, sentence production assumes prior lexical retrieval: lexemes are retrieved from memory as “ready-made” items, then put together “on line”, in the appropriate grammatical form, by the rules of syntax.

11 The term suppletion is often reserved for pairs such as go/went or bad/worse, where the two bases represent historically different lexemes. In this view, sing/sang is not a suppletive pair, because the two forms were at some point regularly related to each other and share what is historically the same (lexeme) base. Synchronically, however, I can see no particular reason why sing/sang and go/went should be treated differently: in both pairs the forms are related to each other in an irregular (arbitrary) fashion and must be learnt as exceptions. In other words, neither sang nor went is iconic, as opposed to worked, for instance.

12 Sentences are potentially infinite in number because there is no principled (grammatical) limit to their length; and if a language can have infinitely long sentences, they can be infinitely different. Words do not seem to behave that way, though see Bauer (1983) for a discussion.

13 Except for idiomatic/conventionalised ones, of course, such as Let sleeping dogs lie or The early bird catches the worm, etc., which are learnt as units.

14 Plus, of course, pragmatic competence, e.g., that Hello! is not used in English to take leave but it is used so in Hungarian, or that English speakers don’t wish Good appetite before a meal but continentals do, etc.

15 To the best of my knowledge, garoto hasn’t got this meaning in Brazilian Portuguese.

So far so good; but speakers often utter words that are probably not conventionalised ones but nonce items, used on the spot, for the particular occasion, to express some concept for which no ready-made item is available.
To use a familiar example, if I hear the sentence from my neighbour My son adores Tarzanlike heroes, I will know immediately what he wishes to express. Moreover, if I want to convey to him the (unlikely) situation that my kid, on the other hand, is fond of heroes who look like Jabba the Hutt (the vile disgusting monster of the Star Wars saga), I can reply without hesitation, Oh, my son loves Jabbalike ones. Note that this may well be the very first occasion you have ever come across the word Jabbalike; indeed, it is the very first time I have ever produced it, and I have never heard/read it either. In actual fact, speakers produce (and understand) such nonce forms quite frequently, unawares, as it were: the production of such words seems to be as automatic as the production of a sentence16. As alluded to above, the reason why these particular adjectives exhibit such a regular behaviour is the highly predictable (iconic) nature — the semantic transparency — of the derivational affix –like; we may even formulate a general statement describing word formation using this affix17: (5) [[X]N like]Adj = ‘looking like X’ Or, to state it in “prose”: (6) English speakers can form an adjective from a noun X using the suffix –like, and the meaning of Xlike is ‘looking like X’. In general, semantically transparent derivational affixes tend to be productive, i.e., used freely to form new words. This is understandable: if all words formed with the affix show the same meaning pattern (as in (5) above), speakers will not hesitate to use it to coin a nonce word, as they can be certain that listeners will not misunderstand them18. The very fact that such “on line” production is possible suggests that speakers are not limited to what is already stored in memory. Instead, natural language equips humans with a device to coin new items if necessary, and this device behaves much like grammar: not only is it used “on line”, but it is also rule-governed. 
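The word-formation rule in (5) and (6) can also be written out as a small procedure. The following Python sketch is only an illustration of the rule as stated, not a claim about how speakers actually compute it; the names Word and like_adjective, the part-of-speech labels and the informal glosses are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str     # spelling, standing in for the sound string
    pos: str      # part of speech label, e.g. "N", "V", "Adj" (assumed labels)
    meaning: str  # informal gloss


def like_adjective(base: Word) -> Word:
    """Apply rule (5): [[X]N like]Adj = 'looking like X'."""
    if base.pos != "N":
        # Rule (5) is stated for noun bases only.
        raise ValueError(f"-like expects a noun base, got {base.pos}: {base.form}")
    return Word(
        form=base.form + "like",
        pos="Adj",
        meaning=f"looking like {base.meaning}",
    )


# A nonce formation assembled on the spot from two stored pieces.
jabba = Word("Jabba", "N", "Jabba the Hutt")
nonce = like_adjective(jabba)
print(nonce.form, nonce.pos, nonce.meaning)
```

Nothing like Jabbalike needs to be stored in advance here: the output form, its part of speech and its gloss all follow from the base and the rule, which is what semantic transparency amounts to.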
As is clear from the formulation in (5) and (6), our suffix –like can’t be added to just any old base: the base must be a noun. Speakers follow this mental rule19 with amazing consistency: no one would say *I feel swimlike today or *She was wearing a greenlike dress to convey the idea that they ‘feel like swimming’ or that the colour of the dress is ‘something resembling green’, even though the concepts themselves are conceivable20.

16 Native Hungarian speakers might find a Hungarian example illuminating. Imagine someone describing a fat, disgusting man as follows: Ritka ocsmány dagadék, tudod, olyan Jabbaszerű ‘He’s an exceptionally hideous fatso, you know, sort of Jabbalike’.

17 I follow the formalism used in Aronoff (1976, passim). The bracketing serves to indicate wordhood: X and Xlike are enclosed between exclusive pairs of brackets, showing that they are word forms; like has no exclusive pair of brackets, showing that it isn’t.

18 It must be added, though, that productivity is a complex issue that depends on many factors apart from semantic transparency. See Aronoff (1976: Ch 3), Bauer (2001, passim), for example.

19 The word rule is used here, of course, to mean “mental (subconscious) regularity or pattern” rather than in the prescriptive sense (= speakers do not behave according to it because that’s what is written in the grammar books).

This rule-governed nature of word formation (that it is used not merely according to semantic necessity but on the basis of grammatical (formal) categories such as parts of speech, etc.) suggests that it be included in the grammatical component of linguistic knowledge. Of course, just like syntax, it takes its “ready-made” elements from the mental lexicon (I know the noun Jabba and the affix –like, and put them together “on line”). Schematically:

(7) Sources of words

    LEXICON               WORD-FORMATION (GRAMMAR)
    Conventional words    Nonce words

As the diagram shows, the lexicon is the ultimate source of words.
The word GRAMMAR is put in parentheses because Grammar, of course, includes much more than word-formation rules. Indeed, it is reasonable to ask which component of the grammar is responsible for the task of word formation. To put it differently, what does the “word formation component” resemble? In one sense, as mentioned above, it works much like syntax, i.e., sentence production: it is an on-line process, predictable and rule-governed. On the other hand, there is a difference, too: it produces words from words, which is not a property of syntax: the latter takes words as the minimal (ready-made) constituents and combines them into phrasal and clausal structures21. Moreover, the syntax “cannot see” inside words: syntactically speaking, an adjective formed from another word (e.g., unhappy from happy) is an unanalysable element just like a monomorphemic adjective (e.g., silly), and it can (or cannot) be used in the same syntactic environments, as shown by the examples in (8):

(8) (a) This girl looks silly/unhappy.      (subject complement)
    (b) the silly/unhappy girl              (attributive function)
    (c) *a silly/unhappy arrived            (*subject)
    (d) *the girl sang silly/unhappy        (*modifying a verb)

The number of differences suggests that word formation is not carried out by the syntax. The question is, what else is there in the grammar which behaves in a rule-governed, on-line way? In fact, there is something: inflectional morphology, such as the formation of English plurals in –(e)s or past tenses in –(e)d. For example, I wish to express the idea of ‘(me) working yesterday’. The form I will use is worked, as in I worked an awful lot yesterday. The formation of worked bears a ghostly resemblance to that of Jabbalike:

(9) [[X]V ed]V = ‘X-past’

20 As testified by the word greenish, expressing precisely this idea. We might say that -ish is the (partial) “adjectival” counterpart of –like.
21 Such as the combination of an Article and a Noun into a Noun Phrase (the boy) or an Adverb and an Adjective into an Adjectival Phrase (really good). This isn’t characteristic of word formation.

The suffix –ed is freely added to verbs to express the category ‘past’. There is an important difference, of course: inflection does not form new lexemes, but serves to indicate grammatical categories and syntactic relations (one result being that it never changes the part of speech of the word, whereas word formation may22). This, however, is more a difference in function than a difference in kind: in other words, we may say that both inflection and word formation produce new word forms, but the function of the process is different: inflection creates grammatical words, whereas word formation produces new lexemes. This view is adopted and argued for in Aronoff (1994, passim) as well as Aronoff and Anshen (1998)23.

The observant reader will have noted, however, that the above statement (i.e., that the suffix –ed is freely added to verbs to express the category ‘past’) is not true: there are many verbs which do not form their past tense forms in this manner, e.g., sing, keep, buy, teach → sang, kept, bought, taught. It seems, therefore, that inflection is, after all, not that similar to word formation: there are irregularities. How can we account for them? First of all, let us note that word formation also shows such “irregularities”. Consider the non-existent words in (10):

(10) *ungood, *unsad, *unnice, *unstupid

To understand irregularities, it is perhaps useful to consider why the words in (10) are ill-formed.
After all, the prefix un-, added to an adjective, forms another adjective with the opposite meaning, as formulated in (11):

(11) [un [X]Adj]Adj = ‘not X’

Here are some examples:

(12) unhappy, un-English, unnatural, unkind

Sometimes there are phonological reasons why a word formation process is not applicable in a specific case: for instance, the adverb-forming suffix –ly (as in sadly) cannot be added to adjectives which themselves end in –ly (hence *sillily), and the suffix –er (as in Londoner, meaning ‘inhabitant of X’) fails to attach to nouns ending in –er (hence *Denverer). Often, the inhibiting reason is semantic: the verbal prefix un- (as in unlock, meaning ‘reverse the effect of X-ing’) doesn’t attach to verbs whose effect cannot be undone, since it would express an inconceivable concept (hence *unbreak, *uneat). The reason why *ungood et al. are not possible is not a phonological or semantic one, however; instead, it is lexical. Simply, *ungood doesn’t exist because there is already a word (lexeme) for the intended concept: BAD. The concept is already lexically coded. This phenomenon has been called blocking: “the nonoccurrence of one form due to the simple existence of another” (Aronoff 1976:43). Note, however, that there are instances when blocking fails to take effect, as shown by (existing) words such as unhappy (cf. sad), unwise (cf. stupid), etc.24

22 It doesn’t have to, cf. happy → unhappy (both adjectives), or brother → brotherhood (both nouns).

We can now return to why verbs such as sing, keep, buy, teach, etc. cannot receive the past suffix –ed to yield *singed, *keeped, *buyed, *teached, etc. The reason is that they already have a past form of their own, viz. sang, kept, bought, taught; recall that irregular past
forms are stored in memory (the mental lexicon). Essentially, then, what we have here is a case of blocking: the concept ‘SING-past’, for instance, is lexically coded in English, hence the expected form (the grammatically produced, “on-line” one) is blocked25. The concept ‘WORK-past’ is not coded lexically, on the other hand: it is assembled on line, using the verb work and the suffix –ed. In other words, regular verbs simply haven’t got a past tense form, in the sense that there is no past form for them stored in memory!

23 May I note that what I present here is, to put it mildly, oversimplified: this view is by no means universally accepted, and many linguists take inflection to be a syntactic phenomenon (for functional reasons, among others), but it would be hopelessly beyond the scope of this paper to present arguments for or against.

24 But note also that unhappy isn’t entirely synonymous with sad, and neither is unwise the same as stupid; indeed, all such cases seem to be of this kind, i.e., there’s always a semantic (or at least stylistic) difference.

Just as with word formation, blocking sometimes fails with inflection, too. Several irregulars can form their past tense (or plural, etc.) in the regular, on-line way, e.g., dream, learn, burn, bet; in such cases, we have doublets (dreamt/dreamed, etc.). In time, the regular forms can take over and the irregular ones are lost, as happened with help, whose past form used to be holp. The most reasonable explanation for how doublets arise is that inflected forms are provided in the same way as lexemes: partly by the lexicon, partly by the grammar (in fact, we can now be more precise and use morphology instead of grammar)26. Here, then, is the overall proposed scheme:

(13) Sources of words (Completed to include inflected forms)

    LEXICON                 MORPHOLOGY
    Conventional words      Nonce words
    Irregular inflection    Regular inflection

This dual model of morphology can be called the “Words and Rules” theory27.
According to it, then, speakers produce words in two ways: by lexical retrieval (= conventionally existing lexemes and irregular inflected forms) or by lexical retrieval of elements, to be assembled “on line” by the rules of morphology (nonce words and regularly inflected forms).

25 The same goes, of course, for all irregulars, including nouns (men, mice, children, etc.), adjectives (better, worse, etc.) and adverbs (fast, hard, etc.).

26 This issue will be elaborated and made more precise in Section 5.

27 The most easily accessible description of the model is Pinker (1999), a book I warmly recommend to Anglicists, including non-linguists (especially because it is a popularising book, written primarily for nonspecialists).

4 What we talk about when we talk about words (Part II)

The model outlined above groups conventionally used words and irregular inflected forms together into a single class, which calls for a name: what do we call this group? Let us first see if one of our existing word definitions could serve this purpose. It is clear that the term word form will not do, for nonce words and regularly inflected forms are word forms, too. The term grammatical word could be used for irregulars, but again, worked is a grammatical word as well. Besides, there is no way conventional or nonce lexemes, which are independent of grammatical function, could be sensibly called “grammatical words”. The term lexeme is out, too, for two reasons. First, irregular inflected forms are not independent lexemes. If the items retrieved directly from the lexicon are all lexemes, then we are forced to say that brought and bring are different lexemes, but this contradicts the way we defined the lexeme, since the two forms are members of the same paradigm, that of the lexeme BRING.
Second, it would be awkward to say that nonce words like Jabbalike are not lexemes: they show all the properties that conventional lexemes do (they mean something, have a part of speech category and inflect according to it, etc.). It seems, therefore, that we need a new term. Pinker (1994, 1999) proposes the term listeme, based on the common property of conventional lexemes and irregularly inflected forms, namely that they are listed in the lexicon, i.e., memorised and, accordingly, retrieved from there when needed. The term listeme, then, is based on the phenomenon of listing: basically, storage in memory.

Using this term has several advantages. First, as just mentioned, it covers the concepts ‘conventional lexeme’ and ‘irregularly inflected form’. Second, it enables one to restrict the term lexeme to word-level items. It has long been a problem how linguists should term idiomatic phrases (kick the bucket) and sentences (Let sleeping dogs lie), which are certainly memorised; but are they lexemes? After all, it is quite counterintuitive to say that WORK is the same type of thing as Let sleeping dogs lie (they differ in everything except that they are memorised bits). In this model, idioms can safely be called listemes. At the same time, affixes, which aren’t words, can be referred to as listemes, too, since they are memorised bits. Third, it enables one to use the term lexeme for both stored (listed) and unstored word-level items28. We can now set up a table:

(14)
                                  Lexeme    Listeme
Has a part of speech category?    YES       YES/NO
Is it one word form?              YES       YES/NO
Memorised (= listed)?             YES/NO    YES

The table in (15) shows some examples:

(15)
             Independent lexeme?                     Listeme?                  Grammatical word?
play         YES                                     YES                       YES: PLAY-Present
played       NO: related to play by inflection       NO: regular inflection    YES: PLAY-Past
bring        YES                                     YES                       YES: BRING-Present
brought      NO: related to bring by inflection      YES: irregular            YES: BRING-Past
by           YES: a preposition                      YES                       YES29
-ed          NO: not a word form                     YES                       NO
-ly          NO: not a word form                     YES                       NO
curiosity    YES                                     YES                       YES: CURIOSITY-Sg
Jabbalike    YES                                     NO30                      YES

28 Only those unstored word-level items, of course, which are produced by word formation (e.g., Jabbalike); inflected (regular) forms like worked are not independent lexemes: this is the difference between word formation and inflection. Hence, Jabbalike is a different lexeme from Jabba (its base noun), but worked is not a different lexeme from work: they realise the same lexeme (just like buy and bought).

29 Note that it is the single form realising the preposition: in English, prepositions are never inflected. In some languages, though, they are, as in Welsh (e.g., wrth, pronounced roughly oorth, ‘by’ has forms such as wrtha ‘by me’, wrthi ‘by her’, etc.) or Hungarian (alatt ‘below’, alattam ‘below me’, alattad ‘below you’, etc.). Traditional Hungarian grammars do not call this type of word a preposition but a postposition, yet the two things are really the same, the difference being in word order (like the definite article being put after the noun in Swedish or Romanian, for example, as opposed to English or Hungarian). The same goes for adjectives.

30 Transparent words may be listed too, as will be shown in Section 6.

5 Sources of words: Morphology and the Lexicon, Part II

The “Words-and-Rules” model suggests a balanced division of labour: when one needs to produce a word form to express a concept, it is either retrieved directly from the lexicon as a listeme, or assembled from listemes by the morphology. If I want to convey the concept ‘not downloadable’, I check the lexicon for a listeme meaning precisely that; since there is none, the morphology steps in to produce undownloadable. The same procedure doesn’t take place with ‘not good’, because prior checking in the lexicon provides bad, hence I will not say *ungood.
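For readers who find a procedural sketch helpful, the lookup-then-assemble procedure just described can be written out as a few lines of Python. This is only a toy illustration under my own assumptions, not part of the “Words and Rules” literature: the names LEXICON, assemble and produce are invented for the purpose, the lexicon holds just three entries, and the “rules” are reduced to bare prefixing and suffixing with no attention to spelling or pronunciation.

```python
# Toy lexicon: (operation, base) -> listed form. Contents are illustrative only.
LEXICON = {
    ("not", "good"): "bad",
    ("past", "bring"): "brought",
    ("past", "sing"): "sang",
}


def assemble(operation, base):
    """Regular morphology (hypothetical, much simplified): build a form on line."""
    if operation == "not":
        return "un" + base
    if operation == "past":
        return base + "ed"
    raise ValueError(f"no regular rule for {operation!r}")


def produce(operation, base):
    """Check the lexicon first; a listed form blocks the regular one."""
    listed = LEXICON.get((operation, base))
    if listed is not None:
        return listed                  # retrieved as a listeme
    return assemble(operation, base)   # nothing listed: the morphology steps in


print(produce("not", "downloadable"))  # undownloadable
print(produce("not", "good"))          # bad (so *ungood is never built)
```

The one design point worth noticing is the order of the two steps in produce: the rule is consulted only when the lexicon comes up empty, which is exactly what blocking amounts to in this strict “division of labour” version of the model.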
The same goes for inflection: if I wish to say ‘WALK-Past’, the lexicon fails to produce a listed form, and the morphology assembles walked; with ‘BRING-Past’, the lexicon comes up with brought, a listed form, blocking the production of *bringed. In fact, the choice between the two methods isn’t simply an “either-or” one. Instead, it has been proposed that lexical retrieval and the morphology do not work according to a neat “division of labour” pattern, but are rivals31. (This is especially true for inflection, since it must be admitted that one needs to use inflected forms all the time, but new lexemes are relatively more rarely necessary.) When a word form is needed, the lexicon and the morphology are invoked simultaneously, and the faster one wins. One proof of this is the existence of doublets such as burnt/burned, mentioned earlier. How do such doublets arise? In the overwhelming majority of cases, the irregular forms are older32. (Or, as many people, including several linguists, like to say, “original”. As a historian, I show some reluctance in using this word: language is in a constant state of change, and nothing is really “original”, unless we go back to Babel.) This is surely not a coincidence: but how does it happen? The scapegoats for the phenomenon have long been children: they, it is said, don’t know the “correct” (How funny it is to use this word for something that’s irregular from a regular grammatical viewpoint!) past form, and produce a regular one. If a significant number of kids go on using it, it will slowly but surely replace the irregular one33. While there is something in this view, it fails to explain an important point: why are there still so many irregulars? Why don’t the darn kids regularise sing, come, go, get, and so on? The funny thing is that they do. At about the age of three, children start to produce forms like singed, comed, goed, getted, etc. 
This comes normally at the age when they discover grammatical regularities, including the regular past tense formation: that’s when their morphology starts to work. Later on, however, they memorise the irregulars which, in turn, block the regular formations. But why doesn’t this happen in all cases? The answer, as so often in science, is surprisingly simple: it’s called memory. Note that irregulars are (must be, because they are arbitrary) stored. Now, memory is fallible, and it relies heavily on frequency: the more we hear something, the more likely it is to stay in our permanent memory. This isn’t just the case with words, but with any kind of information stored in the brain: if something does not come up again and again, we are more likely to forget it; why should words be any different? Indeed, irregulars are constantly being regularised, except for the most frequent ones! It is no accident that the irregularly inflected words, in all languages, are among the most frequent items (the verb be, the most frequent one of all, is notoriously irregular all over the place34). And this is where the answer lies: the more frequent a form is, the more easily retrievable it is.

The model which treats the lexicon and the morphology as rivals gives a straightforward explanation: if the item is frequent, and the stored irregular form is, as a result, easily retrieved, the lexicon wins: it provides the form faster. If, on the other hand, the needed item is relatively rare, memory may fail, and the morphology will be faster in producing the output. In other words, the morphology “doesn’t wait for” the lexicon until the latter manages to come up with something: it starts assembling the regular item at the same time as the lexicon starts the search for an irregular (listed) one. If there’s no listed (= irregular) form, it is only natural that the (regular) morphology wins. If there is one, the outcome depends on which is faster. Historically speaking, the rarer an irregular item is, the more likely it is to become regular.

31 E.g., Aronoff and Anshen (1998).

32 In some cases, the opposite happens, too, as with the verb catch in English, which was borrowed into Middle English from Anglo-Norman (the dialect of Norman French spoken in post-conquest England), and was adopted as a regular word, as newcomers usually are, without an “own” (listed) past form: hence, people said catched when referring to ‘CATCH-Past’. The irregular form caught was probably born on the analogy of similarly sounding verbs like teach, possibly as a jocular form, but later on, it caught (“catched”) on and became listed, blocking the production of catched. Think of the playful Hungarian irregular plural tanarak ‘teachers’, based on madarak ‘birds’ (← madár), besides the regular tanárok.

33 The other “explanation” is that people are lazy, forgetting their own mother tongue, and it is going to the dogs, but this explanation is, of course, not quite serious.

The “Words and Rules” model, incorporating the idea that the Lexicon and the Morphology are rivals, offers an ideal starting point for the final part of our discussion.

6 Arbitrariness, listing, and lexicalisation

Though we seem to have assigned many things to their appropriate place, one important issue must still be clarified: the relationship between listing and arbitrariness. More precisely, what is the relationship between listed (stored) and arbitrary items? In one direction, the answer is quite straightforward. As said before, arbitrary signs, precisely because they are arbitrary, are stored in the lexicon (they are memorised bits). There are non-arbitrary (iconic) forms, however: what about them? The traditional mainstream answer has been that the lexicon is the storehouse of arbitrary information, and the rest (whatever is predictable) is provided by the grammar (the phonology, the morphology, and the syntax).
This is roughly the position of the classical model of Generative Grammar as outlined in Chomsky’s early works on syntax (Chomsky (1957, 1965)) as well as the classical generative work on phonology (Chomsky and Halle (1968)). The essence of the claim is that only arbitrary forms are listed. Regular (predictable) ones are not, so lexemes that are semantically transparent (such as unnatural, happiness, Jabbalike, etc.) are assembled on line each time the speaker produces them (from un+natural, happy+ness, Jabba+like, respectively), and this is precisely what the hearer does, albeit in the opposite direction: he/she uses the morphology to analyse these forms into their constituent bits, and it is these bits that are looked up in the lexicon. The same goes for inflection: the memory does not store items such as dogs, happier, played, etc., but the forms are put together on line each time the speaker opens his or her mouth (from dog+s, happy+er, play+ed, respectively).

The argument goes like this: since the word happiness, for instance, is semantically transparent (non-arbitrary, because its meaning follows automatically from the meaning of the constituents), there is no need to memorise it when one comes across it first, because it can be safely assembled any time it is needed (and it can be understood any time one hears it later on). The same goes for regular inflection. The chief motivation behind this view is that it was long assumed that the mind tries to minimise lexical storage, since “memorisation is expensive” (i.e., it burdens the brain). If the regular morphological rules are available anyway, why not make as much use of them as possible? And they are available, as shown by the fact that you understand Jabbalike and other nonce words you have never heard before, and that it is no problem, either, to find out that the form pockled in He drank a beer and then pockled off is a past tense form even though pockle is a non-word. Clearly, it “costs” the memory less.
34 A good illustration of the highly irregular nature of be comes right from English. In Old English, a number of verbs had two different stems in the past, one serving for first and third person singular, and another one for the rest. By now, this distinction has been levelled out and English verbs have one single past form, except be!

Recent psycholinguistic research has revealed, however, that this assumption may be false: the memory can easily cope with a much larger amount of stored information than assumed before. Furthermore, there is a widespread phenomenon in natural language which is hardly understandable if we maintain this view: lexicalisation.

The term lexicalisation has been used in two different (though related) senses in the literature. The first sense of it is what I have called lexical coding; I will not use it in this sense here. The other sense involves composite signs. It has long been noted that compounds and derived words which were at some point transparent can lose their transparency and become arbitrary signs.

Let us start with compounds. The English word cupboard, for example, used to mean ‘a board where cups are held’, and its meaning was as transparent as that of the modern word bookshelf. In time, however, the object it denoted underwent physical changes (it ceased to be a board, but was transformed into what is more like a chest, then people started to store other things in it besides cups, etc.), but the name remained, and the transparent relation between cup + board and cupboard was lost: it became an opaque (arbitrary) item35. This is one instance of lexicalisation, due to the word being retained while the concept changes. A similar example is lord, from Old English hlaf-weard (lit. “loafward”, i.e., ‘guardian of the bread’), whose semantic content moved to ‘leader of the household’, then ‘master’, etc. (The phonology followed suit: it became reduced to hlavord, then by Early Middle English, loverd, then even further to lord.)
Another type of lexicalisation is when one of the components is lost as an independent word but the compound is retained. Examples include playwright (and some other items ending in –wright), where the wright part used to be an independent word meaning ‘maker, craftsman’, but as it has been lost as an independent word, the meaning of playwright is no longer predictable from the meanings of the parts (this is obvious: wright, a non-word, doesn’t mean anything) — so it has become a minimal (arbitrary) sign. In Hungarian, egyház ‘church’ is a similar example, the first element being an Old Hungarian adjective meaning ‘holy’, but as it was lost as an independent word, the compound became arbitrary. The same thing can happen to derived words. The English gerund building, for instance, came to be used as a concrete noun (‘edifice’). Since the suffix –ing has no regular function of this kind (i.e., forming concrete nouns, such as the products of the act — ratifying, for instance, doesn’t mean the product of the action), building in this sense became a separate word (different from the gerund, which still exists, as in I love building cottages) that is arbitrary. A similar Hungarian example is tojás ‘egg’, which used to be the gerund of tojik ‘lay eggs’ — note that, here, too, the word still exists in an iconic gerund function, as in the sentence A tyúk tojás közben megdöglött ‘The hen died while laying eggs’. An affix can die out, too, as it happened to the abstract nominal suffix –th, as in depth, width, length, etc. These are clearly arbitrary forms, since –th is no longer a regular (productive) affix (it only survives in a handful of words, and cannot be used to form new lexemes); note also the highly irregular (unpredictable) alteration of the form of the base (not *longth, for instance). Lexicalisation, then, may be defined as follows: a form is lexicalised if it could not be produced by the application of regular morphological processes. 
In other words, this means that lexicalised items must be listed in the lexicon, because their meaning is not predictable. Lexicalisation is a very frequent phenomenon in natural language: the reader is invited to check a few dozen compounds or derived words, and I am sure that he/she will find several lexicalised ones among them. But how can lexicalised forms come into being? How does lexicalisation fit into the model outlined here?

35 The phonology followed suit, reducing the stress on the second element, and eliminating the monomorphemically illicit pb cluster; in fact, were it not for the archaic spelling, no one would ever think of cupboard as something having to do with cups. See also “On the notion word and its role as a phonological constituent”, this volume.

It is the basic nature of the linguistic sign that it’s arbitrary. So far, we have treated this as a mere observation that holds for many items, specifically for listemes. However, the arbitrariness of the linguistic sign has a very serious consequence, historically speaking. If the relationship between form and meaning is a matter of mere convention, there is no reason why this relationship ought to be constant. In other words, arbitrariness implies the possibility of change, either in the form or in the meaning. It is arbitrariness that sanctions linguistic change.

And that’s exactly what we find. Both sides of the sign keep changing all the time in all languages, without exception. Some examples will suffice: the English words time and house were pronounced something like “teem” and “hoos” in Middle English, night was pronounced roughly like Modern German nicht, etc.; meanings change, too, often accompanied by changes in form: tide, for example, used to be pronounced “teed” in Old English, and it meant ‘time, hour’ (a meaning that it has lost as an independent word, but cf.
noontide); or, to cite a recent example, web has come to mean ‘internet’; in Hungarian, király ‘king’ has taken on an adjectival role, meaning ‘cool, very good’ in colloquial usage; etc. Check an etymological dictionary, such as Onions (1966), and you will find that most words used to mean something different from what they mean today, and were possibly also pronounced differently. Arbitrariness, then, almost “calls for” change. Once listed, words “persist and change”, as Mark Aronoff puts it36. No listeme is safe from the turn of the tides.

But why do non-arbitrary items change (i.e., become lexicalised)? The assumption presented at the beginning of this section is that only arbitrary forms are stored in memory (= are listemes): predictable ones are not. If they are re-created each time the speaker uses them according to independent (general) morphological rules, how can they undergo a change in meaning? This is a grave problem with the “only-arbitrary-forms-are-listed” view, a problem I like to call the lexicalisation paradox.

Let us illustrate it with one specific example. Consider the highly productive English suffix –able, whose morphology and semantics can be described as in (16):

(16) [[X]_V able]_Adj = ‘able to be X-ed’

That is, it is added to verbs to form adjectives with the meaning specified in (16); here are some examples: countable, returnable, washable, downloadable, etc. Consider now the item readable: it does not simply mean ‘able to be read’, but ‘enjoyable to read’. It clearly shifts away from its expected meaning, taking on an unpredictable one, i.e., it is lexicalised. But if productive word formation (and the one in (16) surely is productive) is an on-line process whose products are not listed, how can this happen? Specifically, I form this word using the regular morphology when I need it, but the resulting form is not stored in my memory (not listed). Next time I need it, I produce it again from read and –able using the regular morphology.
Note that the meaning of the constituents has undergone no change: how can the meaning of the word change then? This is the paradox I referred to above: in order to undergo lexicalisation, words must be listed; if an item is not listed, but produced “on line”, it is impossible for it to undergo meaning shift, hence, lexicalisation.

36 Aronoff (1976:18)

The paradox is only apparent, though: the problem is not with lexicalisation, but with the “only-arbitrary-forms-are-listed” assumption. The problem is that this view assumes the following logic: arbitrary forms must be listed, so they are; non-arbitrary forms need not be listed, so they aren’t. Here’s the mistake: while it is true that what must be listed is obviously listed, the second half of the assumption is not true: if something need not be listed, it does not mean it mustn’t be: it can. The adherents of the “Words and Rules” model, in fact, assume this very stance: the model merely says that non-arbitrary (regular) forms need not be listed; it doesn’t say they can’t be. In other words, regular forms can be produced anytime “on line”, if needed; this is not true for irregular (let’s now say: lexicalised) ones. If one presumes that certain regular formations are listed, the lexicalisation paradox disappears in the twinkling of an eye: all we need to say is that readable became listed, and then it started its journey towards arbitrariness. The only criterion for lexicalisation, then, is mere lexical listing, rather than arbitrariness: if something is listed, it will sooner or later become lexicalised, no matter if it’s arbitrary or not at the point when it’s listed.

But what is, then, listed, from the set of non-arbitrary forms? The answer, again, is provided by memory. Recall what we said about frequency: the more frequent something is, the easier it is to retain in permanent memory.
But there’s more to it: if something is frequent enough, it will be memorised (think of TV commercials, for instance, which one often learns by heart even without intending to do so). The most striking proof comes from irregular inflection. As said before, the irregularly inflected items in any language tend to be among the most frequent ones. But there is more to it. The reader is invited to make a guess at what are the ten most frequent verbs in English. Pinker (1999:122) provides a list of the “top ten” — here it is (with decreasing frequency): be, have, do, say, make, go, take, come, see, get37. And now, surprise, surprise: they are all irregulars! By contrast, rare words are basically always regular in any language. We can safely assume that frequently used regularly inflected/derived forms are also listed, and — because of their frequency — they are so easily retrievable from memory that speakers will produce them that way, rather than assembling them on line. Once in the lexicon, they are treated as listemes, and they can undergo all sorts of unpredictable changes — or, for that matter, they can resist the reorganisation of the regular morphology. In fact, that’s how irregular inflected items mostly come about. The plural men, for instance, was at some point perfectly regularly related to man. In time, though, the plural formation rule that produced it ceased to be productive, and another plural rule replaced it as the regular one: the modern –(e)s plural. Frequent words, though, resisted the change: man retained its irregular plural. This is only explicable if we assume that men had long been listed, blocking the formation of *mans. 
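The two ideas developed in Sections 5 and 6, the race between lexicon and morphology and the listing of frequent regular forms, can be put together in a small Python sketch. Everything in it is an assumption made for the sake of illustration: the function names, the formula relating frequency to retrieval time, the numerical thresholds and the toy nouns are all mine, and the two “plural rules” are bare suffixation, not a reconstruction of any actual stage of English.

```python
ASSEMBLY_TIME = 1.0  # hypothetical fixed cost of assembling a form by rule


def retrieval_time(frequency):
    """Hypothetical: the more frequent an item, the faster memory delivers it."""
    return 100.0 / frequency if frequency > 0 else float("inf")


def produce(lexeme, listed, frequency, rule):
    """Race between memory and rule: the listed form wins only if it is faster."""
    form = listed.get(lexeme)
    if form is not None and retrieval_time(frequency.get(lexeme, 0)) < ASSEMBLY_TIME:
        return form
    return rule(lexeme)


def update_listing(listed, frequency, rule, threshold):
    """Forms used often enough get memorised, although the rule could build them."""
    for lexeme, freq in frequency.items():
        if freq >= threshold and lexeme not in listed:
            listed[lexeme] = rule(lexeme)


def old_rule(noun):
    """Toy plural rule, productive at the earlier stage."""
    return noun + "en"


def new_rule(noun):
    """Toy plural rule that later replaces the old one."""
    return noun + "s"


# Invented frequencies: "ox" is common, the made-up noun "blick" is rare.
frequency = {"ox": 500, "blick": 3}
listed = {}
update_listing(listed, frequency, old_rule, threshold=200)  # only "oxen" gets listed

print(produce("ox", listed, frequency, new_rule))     # oxen: listed and fast, so it survives
print(produce("blick", listed, frequency, new_rule))  # blicks: never listed, follows the new rule
```

The point of the sketch is that nothing marks oxen as special when it is listed: it is a perfectly regular product of the rule then in force. It only becomes “irregular” afterwards, when the rule changes and the listed form stays put. Adding a rare item to the listed dictionary by hand shows the other half of the story: its retrieval is too slow, the rule wins the race, and the item is regularised.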
Finally, here is a new, this time (hopefully) final version of the word sources diagram:

(17) Sources of words (final version)

LEXICON                 MORPHOLOGY
Conventional words      Nonce words
Irregular inflection    Regular inflection

37 Based on the number of occurrences in a million words of text; be leads with 39,175; the last one is get with 1,486.

7 Epilogue

I hope to have presented a roughly coherent picture of how words and meanings are related to each other, and how they are made available for the speaker, either by lexical retrieval or by the morphology, revealing a kind of “great chain”, a perpetual move from the lexicon, into the lexicon. At the heart of the matter is the Saussurean doctrine of arbitrariness, a truly essential property of the lexicon which, as a kind of “black hole”, constantly attracts words into itself, showing, I hope, that the word (in the listeme sense) does have a central role in the life of languages.

References

Aronoff, Mark (1976) Word formation in generative grammar. Cambridge, MA: MIT Press.
Aronoff, Mark (1994) Morphology by itself. Cambridge, MA: MIT Press.
Aronoff, Mark and Frank Anshen (1998) Morphology and the lexicon: Lexicalisation and productivity. In: Spencer, Andrew and Arnold M. Zwicky (eds.) The handbook of morphology. Blackwell.
Bauer, Laurie (1983) English word-formation. Cambridge: CUP.
Bauer, Laurie (2001) Morphological productivity. Cambridge: CUP.
Chomsky, Noam (1957) Syntactic structures. The Hague: Mouton.
Chomsky, Noam (1965) Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam and Morris Halle (1968) The sound pattern of English. New York: Harper and Row.
Katamba, Francis (1993) Morphology. MacMillan Press.
Lyons, John (1981) Language and linguistics. Cambridge: CUP.
Matthews, P.H. (1974) Morphology. Cambridge: CUP.
Onions, C. T. (1966) The Oxford dictionary of English etymology. Oxford: Clarendon.
Pinker, Steven (1994) The language instinct: How the mind creates language. New York: Harper Collins.
Pinker, Steven (1999) Words and rules: The ingredients of language. New York: Harper Collins.
Radford et al. (1999) = Radford, A., M. Atkinson, D. Britain, H. Clahsen and A. Spencer. Linguistics: An introduction. Cambridge: CUP.
Saussure, Ferdinand de (1983) Course in general linguistics. London: Duckworth.