Draft 23/01/2008 08:32:56 Comments invited

Good versus bad antonyms: using textual and experimental techniques to measure canonicity¹

Carita Paradis, Caroline Willners, M. Lynne Murphy & Steven Jones

Abstract: We combine corpus methodology with experimental methods to gain insights into the nature of antonymy as a lexico-semantic relation and the degree of entrenchment and routinization of antonymic word-pairs in memory.

1. Introduction

It has long been assumed in the linguistics literature that contrast is fundamental to human thinking and that antonymy as a lexico-semantic relation plays an important role in organizing and constraining languages’ vocabularies (Lyons 1977, Cruse 1986, Fellbaum 1998, Murphy 2006, Paradis & Willners 2006). While corpus-based methodologies and experimental techniques have been used to investigate antonymy, little has been done to combine the insights available from these methods.² The purpose of this paper is to fill this gap and thereby add to our knowledge of the nature of antonymy through empirical results informed by both corpus and experimental data. The central issue concerns ‘goodness of antonymy’ or ‘degree of antonym canonicity’, and the data consist of adjectives that speakers consider to be strongly antonymic and adjectives that have no strong candidate for this relationship. For instance, it is likely that speakers of English would regard slow – fast as a good example of a pair of strongly antonymic adjectives, while slow – quick and slow – rapid may be perceived as less good pairings, and fast – dull as a less good pairing than either of these. All these pairs in turn will be better examples of antonymy than unrelated pairs such as slow – black or synonyms such as slow – dull. Our hypothesis is thus that there is a scale of ‘antonym goodness’ where some pairings are perceived as very good pairs of antonyms, while others are less good and variable with respect to strength of affinity.
This challenges the notion that a strict contrast exists between canonical and non-canonical antonyms (or direct and indirect, or lexical and conceptual, which describe similar dichotomies), as assumed in some of the literature (e.g. Gross & Miller 1990). We assume that different techniques and different experimental tasks produce slightly different results, but that there will always be a limited core of highly opposable couplings that are lexicalized as pairs, while most couplings are more or less strongly related. There are probably various converging reasons for people’s perceptions of ‘goodness of antonym pairings’, e.g. frequency of co-occurrence, co-occurrence in certain types of constructions (e.g. whether slow or fast), stylistic co-occurrence preferences and pairwise acquisition.

¹ Thanks to Joost van de Weijer for help with the statistics, to Anders Sjöström for help with producing figures and to Simone Löhndorf for help with data collection.
² Corpus studies: Muehleisen 1997, Willners 2001, Jones 2002, 2006, 2007, Murphy & Jones (2005), Murphy, Paradis, Willners & Jones (in preparation), Jones, Murphy, Willners & Paradis (2007). Experiments: Deese 1965, Becker 1980, Herrmann et al. 1979, Paradis & Willners 2006.

The theoretical implications of our study concern the extent to which antonymy is a lexical relation rather than merely a semantic/conceptual relation. Methodologically, the study has implications for cross-linguistic investigations of antonymy. Both textual evidence and psycholinguistic evidence are used in order to shed light on these issues. The data, consisting of pairs of antonyms that co-occur in sentences significantly more often than chance would predict, were retrieved from the British National Corpus (henceforth the BNC³) and used as test items in two different types of experiments, an elicitation experiment and a judgement experiment. The paper is organized as follows.
We start with a discussion of the notions of antonymy and canonicity and a review of previous work in relation to them. Section 3 presents the aim and the general hypothesis of the present study. Section 4 presents the corpus-driven method for the retrieval of data for the experiments, the motivation for using such a method and the final test set. Sections 5 and 6 present the design and the results of the judgement and elicitation experiments from the point of view of ‘degree of canonicity’, followed by a general discussion of the results in Section 7. Section 8 concludes the discussion.

2. Antonymy and canonicity

Antonyms are at the same time minimally and maximally different from one another. They express the same conceptual property, but they denote opposite poles/parts of that domain (Paradis 1997: 43–59, Willners 2001: 17, Murphy 2003: 43–45).⁴ Word meanings that most people associate with antonymy are adjectival. This is reflected in what we find in dictionaries. The majority of antonyms provided in a learner’s dictionary are adjectivals (Paradis & Willners 2007). Most of the pairings that we intuitively judge to be good pairings are gradable adjectives. Gradable adjectival meanings that have antonyms are either UNBOUNDED, expressing a range on a SCALE, such as good – bad, or BOUNDED, expressing a definite ‘either-or’ mode and able to express totality and partiality, such as dead – alive and disastrous – terrific (Paradis 2001, forthcoming; Croft and Cruse 2004: 164–92; Paradis & Willners 2006), but there are also non-gradables such as male – female.⁵ This raises the issue of there being more strongly associated lexico-semantic pairings, less strongly associated pairings and lexically unrelated pairings that may very well be used contrastively given the right contextual requirements.
³ The British National Corpus is a 100 million-word corpus of written and spoken British English, see http://www.natcorp.ox.ac.uk.
⁴ For various definitions and studies of antonymy see Lehrer & Lehrer 1982; Cruse 1986; Muehleisen 1997; Paradis 1997, 2001; Fellbaum 1998; Jones 2002; Murphy 2003; Paradis & Willners 2006a, as well as http://www.f.waseda.jp/vicky/complexica/index.html.
⁵ The different types of gradability configurations in language can be tested using degree modifiers as criterial elements. UNBOUNDED gradable meanings are combinable with scaling degree modifiers, e.g. very good, BOUNDED gradable meanings with totality modifiers, e.g. absolutely terrific, and non-gradable meanings are not combinable with any degree modifiers at all (Paradis 1997). We are using ‘antonymy’ as a cover term for all different kinds of oppositeness in this paper. This is different from how the term is used in Paradis (1997, 2000a, 2000b, 2001, forthcoming), Cruse (1986) and Croft and Cruse (2004), where antonymy is reserved for opposites that are associated with a SCALE.

Throughout the linguistic and psycholinguistic literature, researchers have distinguished between two types of opposites: so-called ‘lexical/direct/canonical’ and ‘semantic/indirect/non-canonical’ opposites. Members of the former category are closely linked both semantically and lexically, while members of the latter category are opposites only by virtue of their semantic incompatibility. This view assumes that canonical antonyms, unlike non-canonicals, are represented in the mental lexicon as word pairs, and so their relation is not re-calculated each time the words are encountered. But while these notions have repercussions for linguistic theories, they have never been defined in a principled way.
When researchers distinguish between canonical and non-canonical antonyms for psycholinguistic experiments for which this distinction is important, or when lexicographers decide which relations to represent in their dictionaries or databases (for example, Princeton WordNet), they do so intuitively and often with unbalanced and irregular results (Sampson 2000, Murphy 2003, Paradis & Willners 2007). Since the lexical structure of the Princeton WordNet presupposes the existence of direct antonyms, its compilers are forced to make up a place-holder for missing members. For instance, angry has no partner and therefore UNANGRY is supplied as the antonym. Antonyms have also been used as stimuli in ERP studies (Bentin 1987), but again the results of the data collection are questionable. The selection of antonym pairs was based on a list of 120 words given to 6 students. The students were asked to specify the best antonym they could think of. The 80 pairs that were selected for the study were those for which the same antonym was unanimously chosen. This seems to be a very high figure for what could be categorized as canonical antonyms. Figure 1 shows the Princeton WordNet model for the antonym pair dry – wet (Gross & Miller 1990: 268).

Figure 1 here

As Figure 1 shows, the WordNet model distinguishes between direct and indirect antonyms. The direct antonyms are lexically related, while the indirect ones are linked to the direct antonyms by virtue of being members of their conceptual synonym-sets. The direct antonyms are central to the structure of the adjectival vocabulary in this model. Psycholinguistic indicators of antonym canonicity include the tendency of antonyms to elicit one another (and not any other semantic opposites) in psychological tests such as free word association (Palermo & Jenkins 1964, Deese 1965, Charles & Miller 1989) and the tendency for speakers to identify them as opposites at a faster speed (Herrmann et al. 1979, Gross et al. 1989, Charles et al. 1994).
For instance, Charles et al. (1994) found that indirect antonym reaction times were affected by the semantic distance between the members of the pair, while reaction times for direct antonyms were not. Moreover, in semantic priming tests, canonical antonyms have been found to prime each other more strongly than non-canonical opposites (Becker 1980). However, this might be an over-simplified means of classifying antonyms, as there is evidence (for example, Herrmann et al. 1986) that canonicity is a scalar rather than an absolute property. In one of their experiments, Herrmann et al. asked informants to rate word pairs on a scale from one to five. From the results of their experiment it emerges that there is a scale of ‘goodness of antonyms’ with scores ranging from 5.00 (maximize – minimize) to 1.14 (courageous – diseased, clever – accepting, daring – sick). Herrmann et al. (1986: 134–135) define antonymy in terms of four relational elements. The first element concerns the clarity of the dimensions on which the pairs of antonyms are based. Their assumption is that the clearer the relation is, the better the antonym pairing. For instance, according to them the dimension on which good – bad is based is clearer than the dimension on which holy – bad relies. The clarity stems from the single component goodness for the first pair, as compared to the latter pair, which they claim relies on at least two components, goodness and moral correctness. In other words, the clearer the dimension, the stronger the antonymic relation. Secondly, the dimension has to be predominantly denotative rather than predominantly connotative. The third element is concerned with the position of the word meaning on the dimensions. In order to be good antonyms, the word pairs should occupy opposite sides of the midpoint, e.g. hot – cold, rather than the same side, e.g. cool – cold (Ogden 1932, Osgood et al. 1957).
Finally, the distances from the midpoint should be of equal magnitude. Each of these elements is a necessary but not a sufficient condition for antonymy, which means that word pairs can fail to conform to the definition of antonymy by failing any one of the four conditions. Herrmann et al. propose that semantic memory tasks involve the processing of relation elements inherent in the stimulus words. The selection of the relational elements was carried out through an elaborate process. The pairs varied on three of the relation elements described above: clarity, nature of oppositeness and symmetry. The variation across the elements was ascertained by Herrmann et al., who constructed 50 pairs that were denotatively opposed and 50 that were connotatively opposed. The former pairs were drawn from Webster’s New Dictionary of Synonyms (1973). Both good and poor examples were selected by the authors. The connotative pairs were also selected by the authors. They started out with a pair of words that involved both the denotative and the connotative dimension of the words, e.g. popular – unpopular. They then selected one of the members based on its connotative meaning and matched it with a connotative synonym, e.g. unpopular – shy, and finally substituted the first member with the connotative opposite, e.g. popular – shy. Finally, pairs were constructed in sets of two pairs opposed on the same dimension but at different distances from the middle of the dimension, e.g. huge – tiny as well as large – small. Furthermore, all pairs varied in clarity. The dimensions underlying the pairings were developed by informants, e.g. WEALTH as the dimension for rich – poor, and the clarity value was subsequently computed on the basis of the agreement across the informants with respect to the dimensions suggested.
Symmetry of opposition was assessed by another group of informants, who were asked to indicate the relative positions of the members of a pair on the dimension by making marks on a line. Based on these marks the element of symmetry was assessed. In the subsequent judgement experiment the informants rated the 100 pairs for degree of antonymy on a scale from ‘not antonyms’ (1) to ‘perfect antonyms’ (5). The results show that the degree of antonymy was influenced by the three antonym elements, i.e. that the two words are denotatively opposed, that the dimension of denotative opposition is sufficiently clear, and that the opposition of the two words is symmetric about the centre of the dimension.⁶

While the standard view of antonymy is that direct antonyms are represented in the lexicon, our take on antonym relations, canonical as well as non-canonical, is that they are conceptual in nature and that some lexical pairings are routinized and perceived as strongly coupled pairings, or as ‘the’ pairs of antonyms. The assumption in this study is that pairs that have lexicalized status are pairs that co-occur frequently in text (Paradis & Willners 2007), and we therefore use them as test items in the design of the experiment (see Section 4). In her model of lexical knowledge, Murphy (2000: 330) makes three distinctions (yet no claims are made as to whether it is actually possible to set up the dividing lines). She argues that the human mind holds the following general kinds of information. Firstly, it holds lexical knowledge of words, which allows people to use words competently. This means that we know how to use the word short in a sentence. Secondly, conceptual knowledge relating to words, or knowledge about words, includes facts that we store in memory that make it possible for us to have a conversation about short.
Finally, conceptual knowledge relating to the denotata of words, or knowledge of/about the world, involves knowledge of lexical relations and how categories relate to one another. For instance, it concerns how the meaning of short is conceptually represented in relation to other meanings, such as opposite meanings. In more traditional terms one may say that these three types of knowledge represent our knowledge of the word, knowledge about its sense, and finally knowledge of/about the entity or relation itself, the referent.

⁶ Herrmann et al.’s study also involved a study of synonymy and similarity. The results of that experiment showed that the ratings of degree of synonymy were influenced by two elements of similarity, i.e. that the denotative meanings of two words overlap and that the denotative overlap is total. As we show in Section 4, our study is methodologically different from Herrmann et al.’s in that most of the test items are retrieved by corpus-driven methods, that we combine a judgement experiment and an elicitation experiment, and that new experimental software has been developed, which contributes to a better experimental environment.

The above three distinctions are useful in the discussion of any theory of lexical semantics, irrespective of whether we think that we are able to identify the dividing lines between the three types of knowledge or not. It follows from this that there is no lexicon in the sense of a repository containing semantic and grammatical information about lexical items, i.e. defined as a component that holds specifications that are both arbitrary and relevant to linguistic competence. In this study we restrict ourselves to a model that is subdivided into two types of conceptual knowledge that form a continuum. The first type is (i) conceptual knowledge of/about the words of the encoding language.
The source of lexical items is the set of conventional routines available to users of a specific language. These routines may potentially be referred to as the lexicon. We do not attempt (or even think it is possible) to separate linguistic knowledge from non-linguistic knowledge in a general and principled way. Lexical items evoke and are evoked by concepts involving all kinds of meaning specifications, relevant or irrelevant in the actual usage-event, and for our purpose we see no point in postulating a separate non-conceptual type of lexical knowledge. The other type is (ii) conceptual knowledge of/about the world, which, of course, is essential to meaning and inseparable from conceptual knowledge of/about words once connections have been established. Conceptual space is highly structured and will thereby ease lexical retrieval (Murphy 2003: conclusion, Paradis 2003: 225–227).

In addition to evidence of antonym canonicity at the conceptual level, there is also textual evidence that supports degrees of lexical canonicity. Justeson & Katz (1991, 1992) and Willners (2001) established that members of canonical pairs tend to co-occur at higher than chance rates and that canonical pairings are preferred over other semantically possible pairings (Willners 2001). Knowing antonym pairs is not just a matter of knowing set phrases in which they occur, like the long and the short of it or neither here nor there; antonym pairs co-occur across a range of different phrases. Indeed, Mettinger (1994, 1999), Fellbaum (1995), Jones (2002, 2006, 2007), Murphy & Jones (forthcoming), Murphy, Paradis, Willners & Jones (in preparation) and Jones, Murphy, Paradis & Willners (2007) have used both written and spoken corpora to demonstrate that antonyms frequently co-occur in contexts such as more X than Y, difference between X and Y, X rather than Y.
Treating relations as relations among concepts of words, rather than relations between lexical items, is consistent with a number of facts about the behaviour of relations. Firstly, lexical relations display prototypicality effects in that there are ‘better’ and ‘less good’ instances of relations. In other words, not only is dry the most salient and well-established antonym of wet, but the relation as such may also be perceived as a better antonym relation than, say, tough – tender. When asked to give examples of opposites, people most often offer pairs like good – bad, weak – strong, black – white and small – large, i.e. common lexical items in conventionalized pairings along salient dimensions. Secondly, just like non-linguistic concepts, relations in language are about construals of similarity, contrast and inclusion. For instance, antonyms may play a role in metonymization and metaphorization. At times new metonymic or metaphorical coinages seem to be triggered by relations. One such example is slow food as the opposite of fast food. Finally, lexical pairs are learnt as pairs or construed as such in the same contexts. Canonicity plays a role in new uses of one member of a pair in a salient relation. When a member of a pair of antonyms acquires a new sense, the opposition can be carried into a new domain, which is an indication that we perceive the words as related also in that domain. Lehrer (2002) notes that if two lexical items are in a strong relation with one another, the relation can be transposed by analogy to other senses of those words. Lehrer illustrates this with He traded in his hot car for a cold one. Along the temperature dimension, hot contrasts with cold, and the relation is carried over to a dimension related to whether the car was legally or illegally acquired.
For speakers to be able to understand cold in this sense when it is first encountered, they must first of all know the meaning of hot car and they must also be familiar with the canonicity of the antonym relation underlying hot and cold in the temperature dimension. Murphy (2006) gives examples of the same phenomenon using black and white. For instance, black was in regular use before white in expressions such as black coffee, white coffee, black market, white market, black people, white people, black box testing, white box testing, while the opposite is true of white light and black light (visible/invisible types of radiation).

3. Aim

The general aim of this study is to gain new insights into the nature of antonymy as a lexico-semantic relation of contrast. Our hypothesis is that semantically opposed pairs of adjectives are distributed on a scale from what we might call canonical antonyms to pairings that are hardly antonyms at all. Characteristic of canonical antonyms is that they are conventionalized expressions of the opposing poles. Such pairings are relatively few and they differ significantly from other pairings that are potentially opposable. The great majority of antonym pairings are more loosely connected to one another. Like other categories, the category of antonymy shows prototypicality effects and has internal structure. Our secondary aim is to combine and compare textual and psycholinguistic methodologies in order to see if they yield the same results in the pursuit of more strongly routinized pairings and the continuum of less strongly associated pairs. The principle and method of selection of test items is based on corpus-driven statistical methods described in Section 4, and the items are subsequently put to the test in two different types of experiments, a judgement experiment and an elicitation experiment (see Sections 5 and 6 respectively).
Our hypothesis is that irrespective of the techniques used, the results will select the same pairings as the best examples of antonyms.

4. Method of data extraction

As reported above, antonyms co-occur in sentences significantly more often than chance would predict, and some antonym pairs co-occur more often than others. The pairs that have been shown to co-occur most often in sentences are also the pairings that speakers intuitively deem to be good antonym pairs when asked. The rationale for the selection of test items for the experiments profits from the statistical findings of co-occurrence of word pairs in textual studies previously carried out by Justeson & Katz (1991), Willners (2001) and Jones (2002). The null hypothesis underlying these corpus-driven textual analyses, i.e. Justeson & Katz and Willners, is that all the words in a corpus are randomly distributed. Both studies reject the null hypothesis, i.e. there are word-pairs that occur in the same sentence much more often than expected. Willners (2001) further showed that, for a given scale, the antonym words co-occurred significantly more often than all other possible pairings.

Using the insights of previous work on antonym co-occurrence as our point of departure, we developed a methodology for selecting data for our experiments. We agreed on a set of seven dimensions that we perceived to be central meaning dimensions in human communication and therefore good candidates for a high degree of canonicity. We identified the pairs of antonyms that we thought were the best word pairs of these dimensions (see Table 1) and checked that the antonyms were all represented as direct antonyms in Princeton WordNet.⁷

Table 1 here

The word pairs in Table 1 were then searched in the BNC using a computer program called Coco, developed by Willners (2001: 83) and Willners & Holtsberg (2001).
Coco calculates expected and observed sentential co-occurrences of words in a given set and their levels of probability. Unlike the program used by Justeson & Katz (1991), Coco has the advantage of taking sentence length variations into account, which gives more accurate statistics. It was established that the seven adjective pairs co-occurred significantly at the sentence level.

Table 2 here

The results of using Coco on the seven word pairs in the BNC are shown in Table 2. N1 and N2 are the numbers of times that Word1 and Word2 occur in the corpus. Co is the number of times they co-occur in the same sentence, while ExpctCo is the number of times chance would predict them to co-occur.⁸ The rightmost column, p-value, shows the probability of finding the number of co-occurrences actually observed, or more, if the co-occurrences were due to pure chance alone. The calculations were made under the assumption that all words are randomly distributed in the corpus, and the p-values are all lower than 10⁻⁴.

Next, all synonyms of the 14 adjectives were collected from Princeton WordNet. This resulted in a list of words potentially related to each dimension: for example, in the SPEED dimension, fast and all its synonyms given in Princeton WordNet, slow and all its synonyms, and so on. All of the words in this list, regardless of their semantic relation, were searched for sentential co-occurrence in the BNC in all possible pairings and orderings. It was established that the seven adjective pairs co-occurred significantly at sentence level, as did many of their various synonyms. Table 3 shows the pairs in the BNC related to the SPEED dimension that co-occur five times or more in the corpus with a p-value of 10⁻⁴ or lower.
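As an illustration of the kind of calculation summarized in Table 2, the following is a minimal sketch and not the Coco program itself. It assumes a simplified model in which each word occurs at most once per sentence and sentence length is ignored (Coco, by contrast, takes sentence length into account), so that the number of co-occurrences follows a hypergeometric distribution. The function names and the toy figures are illustrative assumptions and are not taken from the BNC.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_cooccurrences(n_sentences, n1, n2):
    """Expected number of sentences containing both words if the n1
    sentences with Word1 and the n2 sentences with Word2 were scattered
    at random over n_sentences sentences (simplified assumption)."""
    return n1 * n2 / n_sentences


def p_value_at_least(n_sentences, n1, n2, observed):
    """Probability of `observed` or more co-occurrences under the same
    random-distribution assumption (upper tail of the hypergeometric)."""
    log_denominator = log_comb(n_sentences, n2)
    total = 0.0
    for k in range(observed, min(n1, n2) + 1):
        # Skip impossible outcomes: not enough sentences without Word1.
        if n2 - k > n_sentences - n1:
            continue
        log_p = (log_comb(n1, k)
                 + log_comb(n_sentences - n1, n2 - k)
                 - log_denominator)
        total += math.exp(log_p)
    return min(total, 1.0)


# Toy figures only: 1000 sentences, Word1 in 50 of them, Word2 in 40.
expected = expected_cooccurrences(1000, 50, 40)
p = p_value_at_least(1000, 50, 40, observed=10)
print(expected, p)
```

In this simplified model a pair would count as significantly co-occurring when the returned probability falls at or below the chosen level, for instance the 10⁻⁴ level used in the text.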
Table 3 here

It is worth noting that the matching of all synonyms in a certain dimension throws up antonym co-occurrences, synonym co-occurrences as well as co-occurrences that might be neither antonyms nor synonyms in any context. For the dimension of SPEED, Table 3 shows that there are significantly co-occurring antonyms, such as fast – slow, rapid – slow and quick – slow, and significantly co-occurring synonyms, such as fast – quick, fast – rapid, boring – dull and sudden – swift, as well as pairs that might be neither antonyms nor synonyms in any context but nevertheless co-occur significantly, such as dense – hot. Two of the word-pairs that we assume to be canonical did not top the lists when sorted according to number of sentential co-occurrences: dark – light was on an even footing with, for example, black – white, blue – white and black – dark, while narrow – wide was found to be less frequent than word pairs that rather belong to the more complex dimension of SIZE, such as big – large. We used the antonym pairs that co-occur significantly at a level of 10⁻⁴ and that occur in the corpus more than five times as a basis for our experiments on antonym canonicity.

⁷ Our intuition was later confirmed by the corpus data, with figures showing a strong tendency for the seven pairs to co-occur in text. For a more detailed description of the methodology see Paradis & Willners (2007). The idea of using a number of dimensions as the starting-point is motivated by the fact that we carry out cross-linguistic experiments and are therefore keen to establish a robust ground for comparisons.
⁸ It deserves to be pointed out already here that Word1 and Word2 are to be considered as two different variables in Tables 2 and 3. They are not given in any particular order. In Table 6, however, they are ordered on semantic grounds to match the design of the experiment.
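The search over all possible pairings of a word list can be pictured with the following minimal sketch. It is not the program used in the study: it assumes a corpus that is already split into tokenized sentences, counts each word at most once per sentence, treats pairs as unordered, and leaves the frequency threshold as a parameter. The function names and the four toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sentence_cooccurrences(sentences, vocabulary):
    """Count, for every unordered pair of vocabulary words, the number of
    sentences in which both occur, and for every word the number of
    sentences in which it occurs."""
    vocab = set(vocabulary)
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        # Sorting gives every pair a fixed (alphabetical) orientation.
        present = sorted(vocab.intersection(sentence))
        word_counts.update(present)
        pair_counts.update(combinations(present, 2))
    return word_counts, pair_counts


def frequent_pairs(pair_counts, min_count):
    """Keep only the pairs that co-occur at least min_count times."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical toy corpus, for illustration only.
toy_sentences = [
    ["the", "fast", "car", "passed", "the", "slow", "truck"],
    ["a", "slow", "start", "and", "a", "fast", "finish"],
    ["quick", "and", "rapid", "growth"],
    ["a", "slow", "day"],
]
words, pairs = sentence_cooccurrences(
    toy_sentences, ["fast", "slow", "quick", "rapid"])
print(frequent_pairs(pairs, min_count=2))
```

A significance test on the resulting counts would still be needed before a pair qualified as a candidate test item; the sketch covers only the counting and the frequency cut-off.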
The test items

As stated in the aim, Section 3, the hypothesis we are testing is that oppositeness is a continuum and opposite pairings are distributed on a scale from canonical ‘well-established’ antonyms to pairings that are not ‘well-established’ as lexicalized pairs. However, we also predict that there is a group of strongly lexicalized antonym pairings that differ significantly from the rest of the antonyms, and that the potentially opposable synonyms and unrelated pairings in turn differ significantly from antonyms. For this reason we included in the test set the antonyms listed in Table 1, which were all found to be well-established intuitively as well as in terms of sentential co-occurrence. We call them canonical antonyms to distinguish the two antonym conditions. Using WordNet and dictionaries we tagged the significantly co-occurring word-pairs according to semantic relation: antonym, synonym or unrelated. An additional criterion to qualify as antonyms in the test set was that they should all be compatible with scalar degree modifiers such as very. For each dimension we included two pairs of antonyms, two pairs of synonyms and one pair of co-occurring adjectives that did not appear to be related at all, but which still co-occurred with a p-value of 10⁻⁴ or lower in our searches. Table 4 shows the complete set of pairs retrieved by this method from the BNC: altogether 42 pairs.

Table 4 here

In addition to the BNC-derived pairs, our test set is also based on a subset of the 100 pairs in Herrmann et al.’s (1986) data set. Their experiment includes mainly adjectives but also verbs and nouns, and it shows that there is a scale of ‘goodness of antonyms’ with scores ranging from 5.00 (maximize – minimize) to 1.14 (courageous – diseased, clever – accepting, daring – sick).
Since our study focuses on scalar adjectives, we have excluded the verbs, nouns and non-scalar adjectives from Herrmann et al.’s list, leaving us with 63 scalar adjective pairs that had not already qualified in our BNC test set. We sorted the word pairs according to decreasing scores and then picked out every 6th pair counting from the bottom of the list. This method left us with the eleven word pairs shown in Table 5. The order of the pairs is the order given by Herrmann et al. (1986: 148–150).

Table 5 here

The left column of Table 5 shows the 11 pairs we selected from Herrmann et al.’s data set. The column in the middle shows the mean scores given by the experiment participants, and the right column shows our categorization of the pairs as antonyms or not. The word pairs were categorized according to the same principles as the word pairs in Table 4. Intuitively, we conceive of beautiful – ugly as a routinized pairing, and of immaculate – filthy down to sober – exciting as less strongly paired. All the pairs with a score lower than 2.50 we assign to the group of unrelated pairings. The entire test set is presented in Table 6 below. In the next section, we present the judgement experiment, the predictions, the design and the results.

5. Judgement experiment

This section describes the judgement experiment, in which participants were asked to evaluate word pairings in terms of how good they thought each pair is as a pair of antonyms. The experiment was carried out through a computer interface. The design of the screen is shown in Figure 2.

Figure 2 here

The participants were presented with questions of the form How good is X – Y as a pair of opposites? and How good is Y – X as a pair of opposites?, as shown in Figure 2. The question was formulated using good (not bad) in order for the participants to understand the questions as impartial how-questions. How bad is fat – lean as a pair of opposites?, by contrast, presupposes ‘badness’.
This principle is consistent with Lehrer’s (1985: 400) markedness properties for members of antonym pairs. For instance, good in How GOOD is it? with the principal tone on good carries no supposition as to which part of the scale of MERIT is involved (Cruse 1986). This may be an overstatement; a weaker formulation is that good in How GOOD is it? is more impartial than bad in How BAD is it? The end-points of the scale were designated with both icons and text. On the left-hand side there is a sad face with the text very bad underneath. On the right-hand side there is a happy face with the text excellent underneath. We assume that the ordering of the word pairs has an effect on the participants’ evaluations. For that reason we presented half of the participants with the test items in the order Word 1 – Word 2 and the other half in the opposite order, Word 2 – Word 1. The task of the participants was then to tick a box on a scale consisting of eleven boxes. Our predictions were as follows.
• The eight test pairings that we categorized as canonical will receive an average score that is significantly higher than the other word pairs in the test set.
• The order of presentation will not significantly affect judgements of ‘goodness’ in canonical and non-canonical antonyms. The presentation of an antonym pair as Word 1 – Word 2 will give the same results as the presentation of the same pair as Word 2 – Word 1. The principle for the ordering of pairs is presented in Table 6 below.
• There will be significant differences between the judgements about canonical antonyms, non-canonical antonyms, synonyms and unrelated pairings, with canonical antonyms at one extreme, unrelated at the other extreme, and antonyms and synonyms in between.
• The judgements about goodness of oppositeness will be significantly faster for canonical antonyms than for antonyms. The judgements will be significantly faster for antonyms than for synonyms and unrelated pairings.
Stimuli
The stimuli in the judgement experiment were presented in pairs. The test items were automatically randomized for each participant. The order of the presentation of the individual pairs was designed so that half of the participants were given the test items in the order Word 1 – Word 2, while the other half were presented with the words in reverse order, i.e. Word 2 – Word 1, as listed in Table 6. We call the two conditions non-reversed (Word 1 – Word 2) and reversed (Word 2 – Word 1) respectively.
Table 6 here
The principle underlying the ordering of the pairs is one of polarity. Word 1 denotes ‘little’ of the property expressed by both members of the pair, and inversely Word 2 denotes ‘much’ of the property, e.g. slow (little) – fast (much) of the property SPEED. In the cases where there is no clear pattern of lacking vs. having the property denoted, the meanings are evaluative, e.g. bad – good. The principle we used for them is that Word 1 denotes negative evaluation, which we viewed as corresponding to ‘little of the property’, and Word 2 denotes positive evaluation, which we viewed as corresponding to ‘much of the property’, e.g. bad (negative) – good (positive) of the property of MERIT. These distinctions apply to the two sets of antonyms only, i.e. canonical antonyms and antonyms. The ordering predictions are not applicable to the synonym and unrelated categories.
Participants
Fifty native speakers of English participated in the judgement test. None of them had previously participated in the elicitation test. Thirty-two of the participants were women and 18 were men; they were between 20 and 70 years old. All had English as their first language. Six participants had a parent with a native language other than English (French, Polish, Hebrew, Welsh and Greek) and one participant’s parents both spoke Luganda.
Procedure
The judgement experiment was performed using E-prime as experimental software.
E-prime is a commercially available Windows-based presentation program with a graphical interface, a scripting language similar to Visual Basic and response collection features.9 E-prime logged the ratings as well as the response times in separate files for each of the participants. The participants were presented with a new screen for each word pair (see Figure 2). The task of the participants was to tick a box on a scale consisting of eleven boxes. The screen immediately disappeared upon clicking, which prevented the participants from going back and changing their responses. The screen also disappeared if the participant accidentally clicked outside the boxes, in which case the answer was marked as a zero. Between each judgement task a screen with only an asterisk was presented; when the participants were ready for the next task they clicked on the screen with the mouse. Before the actual test started, the participants were asked to give some personal data (name, age, sex, occupation, native language and parents’ native language[s]). Then some instructions followed such as how to click the mouse and information about the fact that the test was self-paced. Each participant had two test trials before the actual judgement test of the 53 test items. The purpose of the study was revealed to the participants in the instructions. As has already been mentioned, the judgement experiment was divided into two parts: 25 participants were given the test set as Non-Reverse (Word 1 – Word 2, e.g. slow – fast) and 25 participants were given the test set in the reverse order: Reverse (Word 2 – Word 1, e.g. fast – slow). This was done to measure whether directionality influenced the results in any way. In both these experiments, the Non-Reverse and the Reverse, a subject analysis and an item analysis were performed. The factors involved were directionality, category (canonical antonyms, antonyms, synonyms and unrelated) and the interaction between directionality and category. 
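The counterbalanced presentation described above can be sketched in a few lines of Python. This is only an illustration, not the E-prime script used in the study: the function name `build_sessions`, the alternating assignment of participants to conditions and the fixed random seed are all assumptions made for the example.

```python
import random


def build_sessions(pairs, participant_ids, seed=0):
    """Return {participant_id: (condition, list of (first, second) pairs)}.

    Assumption: participants alternate between the two conditions, which
    yields an even split; the paper only states that each half of the
    participants saw one order.
    """
    rng = random.Random(seed)
    sessions = {}
    for index, pid in enumerate(participant_ids):
        reverse = index % 2 == 1
        # Swap the members of every pair in the reversed condition.
        items = [(w2, w1) if reverse else (w1, w2) for w1, w2 in pairs]
        rng.shuffle(items)  # fresh random item order for every participant
        sessions[pid] = ("reversed" if reverse else "non-reversed", items)
    return sessions


# Three pairs from the test set, listed as Word 1 - Word 2.
pairs = [("slow", "fast"), ("bad", "good"), ("weak", "strong")]
sessions = build_sessions(pairs, participant_ids=range(50))
```

With 50 participant identifiers the sketch produces 25 non-reversed and 25 reversed sessions, each containing every pair exactly once.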
In the subject analysis each participant was the basic element for analysis. All judgements for the individual participants were averaged within each of the four conditions, yielding four numbers per participant. Then a repeated-measures analysis of variance (ANOVA) was performed on both data sets. In the item analysis each item (i.e. word pair) was the basic element for analysis. The judgements given by each participant on each condition were averaged, resulting in four numbers for each item, and a Univariate General Linear Model analysis was performed. Finally, Bonferroni’s post-hoc test was used to compare the differences between the categories. The same procedure was used for the response times.
9 For more information about E-prime, see http://www.pstnet.com/products/e-prime/.
5.1. Results of judgement experiment
This section reports on the results of the judgement experiment. Three different aspects were measured: (i) whether the differences between the four categories (canonical antonyms, antonyms, synonyms and unrelated pairings) were significant and their levels of strength of opposability, (ii) directionality and (iii) response time. Before going into details about the results, it should be mentioned that there were 10 zeroes among the responses due to the fact that some participants had ticked outside the boxes on the scale. They were included in the analysis. Also, by mistake, the test item gloomy – bright appeared as gloomy – gloomy on the screen. The word pair was excluded from the analysis, leaving us with 52 test items in total.
5.1.1 Canonical antonyms, antonyms, synonyms and unrelated pairings
Table 7 shows that the mean response value was 10.63 for the canonical antonyms, 7.66 for the antonyms, 1.63 for the synonyms and 1.77 for the unrelated pairs.
The standard deviation is much larger for the antonyms than for the other three categories; that is, the informants agree less strongly about the judgement of the non-canonical antonyms than of the other categories of word pairs. Since the order did not have any effect (see Section 5.1.2), we conflated all data here.
Table 7 here
The results for each of the word pairs in the judgement test are presented in Table 8, sorted according to falling response mean. It shows that the participants were in agreement about the canonical antonyms. The top nine word pairs of antonyms, which we call canonical, yield mean responses over the subjects between 10.30 and 10.82. The next 17 word pairs were classified as antonyms before the test, and their mean responses vary between 3.04 and 9.92. The variation is also reflected in the standard deviation, which is much larger for the antonyms than for the other three categories: it is 3.23 for the antonyms, while it varies between 1.20 and 1.51 for the other categories. The antonyms with a mean response close to 10 are stronger opposites than those further down in the table, which appear to be more context-bound, i.e. only good antonyms in particular contexts or sub-senses.
Table 8 here
One of our main hypotheses was that the variation in strength of lexical coupling would yield significant differences between four different groups of word pairs: canonical antonyms, antonyms, synonyms and unrelated word pairs. We also expected very strong agreement among the participants concerning the canonical antonyms. We did find significant differences between the judgements of the canonical antonyms, the antonyms and the other two categories, synonyms and unrelated word pairs. This is evident from Table 8, where the top eight pairs are the antonyms that we chose to call canonical antonyms, followed by the rest of the antonyms, 19 pairs. Synonyms and unrelated word pairs are mixed in the bottom part of the table.
However, there was no significant difference between synonyms and unrelated word pairs.
Figure 3 here
According to the repeated-measures ANOVA for the subject analysis and the Univariate General Linear Model analysis for the item analysis, the differences between the canonical antonyms and antonyms as well as antonyms and the two other categories (synonyms and unrelated) were significant both in the subject analysis (F1[3,147] = 1625.775, p < 0.001) and in the item analysis (F2[3,48] = 95.736, p < 0.001). Post-hoc comparisons using Tukey’s HSD procedure suggested that the four conditions form three subgroups: (1) canonical antonyms, (2) antonyms and (3) synonyms and unrelated.
5.1.2 Directionality
As has already been mentioned, the test was performed in such a way that half of the participants were presented with the test items in the order Word 1 – Word 2, and the other half in reversed order, Word 2 – Word 1. We assumed that the order would have an impact on the results, at least for the canonical antonyms. In both parts of the experiment, the non-reverse and the reverse, a subject analysis and an item analysis were performed. The factors involved were directionality, category (canonical antonyms, antonyms, synonyms and unrelated), and the interaction between directionality and category. Contrary to our predictions, the statistical analysis shows that directionality does not have any effect on the judgements, yielding very low F-values: F1[1,48] = 0.558, p = 0.459; F2[1,96] = 0.101, p = 0.751. The interaction between directionality and category shows no effect either: F1[3,144] = 1.582, p = 0.196; F2[3,96] = 0.186, p = 0.906. Category, on the other hand, has an effect: F1[3,144] = 1645.082, p < 0.001; F2[3,96] = 187.449, p < 0.001. Figure 4 shows that the two test batches (marked with REV = 0 and REV = 1) follow the same pattern.
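As an aside on the F1 and F2 statistics, the two aggregations behind them (by participant and by item) and the degrees of freedom one would expect from the design can be sketched as follows. The sketch assumes textbook designs: a one-way repeated-measures layout for the subject analysis and a one-way between-items layout for the item analysis. The record layout, the function names and the three demo ratings are invented for illustration and are not taken from the study's data files.

```python
from collections import defaultdict
from statistics import mean


def cell_means(ratings, unit):
    """Average the ratings within each (unit, category) cell.

    ratings: iterable of dicts with keys 'participant', 'item',
             'category' and 'rating' (hypothetical record layout).
    unit:    'participant' for the subject analysis (F1),
             'item' for the item analysis (F2).
    """
    cells = defaultdict(list)
    for record in ratings:
        cells[(record[unit], record["category"])].append(record["rating"])
    return {cell: mean(values) for cell, values in cells.items()}


def df_repeated(n_units, n_levels):
    """Degrees of freedom for a one-way repeated-measures effect."""
    return n_levels - 1, (n_units - 1) * (n_levels - 1)


def df_between(n_units, n_groups):
    """Degrees of freedom for a one-way between-units effect."""
    return n_groups - 1, n_units - n_groups


# Made-up ratings, only to show the two ways of averaging.
demo = [
    {"participant": 1, "item": "slow-fast", "category": "canonical", "rating": 11},
    {"participant": 1, "item": "bad-good", "category": "canonical", "rating": 10},
    {"participant": 2, "item": "slow-fast", "category": "canonical", "rating": 9},
]
subject_means = cell_means(demo, "participant")
item_means = cell_means(demo, "item")
```

Under these assumptions, `df_repeated(50, 4)` gives (3, 147) and `df_between(52, 4)` gives (3, 48), which agree with the degrees of freedom reported for the category effect in Section 5.1.1, and `df_between(50, 2)` gives (1, 48), which agrees with the subject analysis of directionality.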
Since the direction does not have an impact on the results, the data for the two directions are treated as one batch, as in 5.1.1.
Figure 4 here
5.1.3. Response times
Although the experiment was self-paced, i.e. the participants could use the time they needed for each judgement, we did record response times. Since we did not control for timing, we cannot draw any far-reaching conclusions about the time it took for the participants to make their decisions. Still, it may be of some interest to see that the response times varied greatly across the conditions (see Table 9 and Figure 5). The overall effect was significant in the subject analysis (F1[3,147] = 27.256, p < 0.001) and in the item analysis (F2[3,48] = 23.733, p < 0.001). According to the post-hoc test using Tukey’s HSD procedure, the canonical antonyms take significantly less time to process than the unrelated word pairs and the antonyms do. The antonyms also take significantly longer to process than the synonyms, while there is no significant difference between the response times of the synonyms and the unrelated word pairs. The main result of the analysis of the response times is that the canonical antonyms are significantly faster to process than the antonyms.
Table 9 here
Figure 5 here
6. Elicitation experiment
This section reports on the design and the results of the elicitation experiment. The participants were asked to provide the best opposite for all the individual test items. Our predictions are as follows:
• The 16 test items that we deem to have canonical antonyms will elicit only one another.
• The test items that we do not deem to be canonical will elicit varying numbers of antonyms – the better the antonym pairing, the fewer the elicited antonyms.
• The elicitation experiment will produce a curve from high participant agreement (only one antonym suggested) to low participant agreement (many suggested antonyms).
Stimuli and procedure
The test set for the elicitation test involves the individual adjectives that were extracted as co-occurring pairs from the BNC and the selected word pairs from Herrmann et al.’s list of adjectives, which were perceived by informants as better and less good examples of antonyms. Some of the individual words occur in more than one pair, i.e. they might occur once, twice or three times. For instance, small occurs three times and large occurs twice (see Table 6). All doubles and triplets were removed from the elicitation test set, which means that small and large occur once each in the elicitation experiment. When this was done, the order of the adjectives was automatically randomized (see Appendix A for instructions to the elicitation test). All in all, the test contains 85 stimulus words. The words were presented to the participants in the same order on the test occasion. The participants were asked to write down the best opposites they could think of for each of the 85 stimulus words in the test set. For instance,
The opposite of LITTLE is __________________
The opposite of DELIGHTFUL is __________________
The experiment was performed using paper and pencil, and the participants were instructed to do the test sequentially, that is, to start from word one, work their way through the experiment and not go back to check or change anything. We did not control for time, but the participants were asked to write the first opposite word that came to their mind (see the instructions to the participants in Appendix A). Each participant also filled in a cover page with information about name, sex, age, occupation, native language and parents’ native language(s). All the responses were then coded into a database using the stimulus words as anchor words.
Participants
Fifty native speakers of English participated in the elicitation experiment: 36 women and 14 men between 19 and 88 years of age. The experiments were carried out in Brighton, England.
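A minimal sketch of how responses coded with the stimulus word as anchor can be tallied is given below. It is an illustration only: the function names are hypothetical, the normalization of responses (trimming and lower-casing) is an assumption, and the two stimuli in the example use the counts reported in the results for bad (good: 50) and slow (fast: 45, quick: 5).

```python
from collections import Counter, defaultdict


def tally(responses):
    """Count elicited antonyms per stimulus word.

    responses: iterable of (stimulus, response) pairs; response is None
    when a participant left the line blank.
    """
    table = defaultdict(Counter)
    for stimulus, response in responses:
        counts = table[stimulus]  # registers the stimulus even if omitted
        if response is not None:
            counts[response.strip().lower()] += 1
    return table


def by_diversity(table):
    """Stimuli ordered from fewest to most distinct elicited antonyms."""
    return sorted(table, key=lambda s: (len(table[s]), s))


responses = (
    [("bad", "good")] * 50
    + [("slow", "fast")] * 45
    + [("slow", "quick")] * 5
)
table = tally(responses)
ranking = by_diversity(table)
```

Ordering the stimuli by the number of distinct responses, as `by_diversity` does, corresponds to ranking the test items by response diversity.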
6.1 Results of elicitation experiment
The data were analyzed with respect to the total distribution of responses across participants; omitted responses were identified, and strength of bidirectional elicitability was measured through cluster analysis.
6.1.1. Distribution of participant responses
The result of the elicitation experiment indicates that a continuum of lexical association exists between antonym pairs. The results in Appendix B have been listed in order of the number of participants’ responses, that is, response diversity. At the top of the list we find the test words for which all participants suggested one and the same antonym, given in brackets: bad (good), beautiful (ugly), clean (dirty), heavy (light), hot (cold), poor (rich) and weak (strong). Then the test items for which the participants suggested two opposites in total are listed, e.g. narrow (wide, broad) and slow (fast, quick), then the stimulus words with three different answers, and so on. The very last item is calm, for which 29 different antonyms were suggested by the 50 participants. The shape of the list of elicited antonyms across test items in Appendix B strongly suggests a scale of canonicity from very good matches to test items with no clear partners. While Appendix B gives all the elicited antonyms across the test items, it does not provide information about the scores for the various individual elicited responses. Figure 6 gives an example of the numbers of the responses to the test item pale. Dark was suggested by 21 participants and therefore was the ‘best’ antonym, followed by dull, dim, gloomy, stupid and obscure with steadily decreasing numbers.
Figure 6 here
Figure 7 gives the complete three-dimensional picture of the responses. The X-axis gives the total number of the antonyms suggested across each test word. The Y-axis shows all the test items, of which every tenth word is supplied along the axis.
The Z-axis shows the number of participant responses per antonym. The bars represent the various elicited antonyms in response to the test items. The height of the bars indicates the number of participants who suggested the antonym in question. There is a gradual decrease across stimuli in participant agreement on the best antonym for a given word. All the low bars at the front are bars representing a single antonym suggested by one experiment participant.
Figure 7 here
6.1.2 Omitted responses
Although the participants in the elicitation experiment were asked to give opposites for all words, not all participants responded to all the test items. Out of a total of 4,250 responses (85 test words x 50 participants), participants failed to supply an antonym in 94 cases. The majority of those 94 omissions occurred among the test items that attracted the highest number of suggested antonyms. Table 10 shows the test words for which all participants suggested an antonym and the test items for which responses were omitted by at least one participant.
Table 10 here
The two groups are about the same size: there are 41 words in the group where all participants answered and 44 words in the group with omitted answers. The mean of the number of suggested antonyms in the left column of Table 10, i.e. the test items for which all 50 participants suggested antonyms, is 5.3 and the median is 3. In comparison, the mean of the number of suggested antonyms in the column to the right (where some participants failed to suggest antonyms) is 12.5 and the median is 13. This means that the test items for which participants abstained from providing an antonym elicited many more suggestions on average than did the ones for which all participants suggested an antonym.
6.1.3.
Bidirectionality
In addition to the distribution of the responses for all the test items across all the participants, we also investigated to what extent the test items elicited one another in both directions. For instance, all 50 participants gave strong as an antonym of weak, good for bad, ugly for beautiful and light for heavy, but the pattern was not the same in the other direction. This is part of the information in Appendix B and Figure 7, but it is not obvious from the way the information is presented. For the test items that speakers of English intuitively deem to be good pairs of antonyms, the strong agreement held true in both directions, although not at the level of a one-to-one match, but a one-to-two or one-to-three match (see Appendix B). For example, while 50 participants supplied good as the best opposite of bad, two antonyms were suggested for good: bad by 42 participants and evil by 8 participants. This points to the possibility that there is a stronger relationship between good and bad than between good and evil. Dark and heavy were given for light, and beautiful, pretty and attractive for ugly, which shows that there are differences depending on which member of a given pair the participants were confronted with (cf. the results of the judgement experiment, Section 5). In summary, Figure 5 shows that the more canonical pairs elicit only one or two antonyms, while there is a steady increase in numbers of preferred antonyms the further we move toward the right-hand side of the figure.
6.1.4 Cluster analysis
In order to shed light on the strength of the lexicalized oppositeness in the elicitation experiment, a cluster analysis of strength of antonymic affinity between the lexical items that co-occurred in both directions was performed. More precisely, a hierarchical agglomerative cluster analysis using the Ward amalgamation strategy was performed on the subset of the data that were bidirectional.
Agglomerative cluster analysis is a bottom-up method that takes each entity, i.e. in this case each antonym pairing, as a single cluster to start with and then builds larger and larger clusters by grouping together entities on the basis of similarity. It merges the closest clusters in an iterative fashion by satisfying a number of similarity criteria until the whole dataset forms one cluster. The advantage of cluster analysis is that it highlights associations between features as well as the hierarchical relations between these associations. It is not a confirmatory analysis but a useful tool for exploratory purposes (Glynn et al. 2007, Gries & Divjak forthcoming). It is important to note that for this experiment only items that were also test items were eligible as candidates for participation in bidirectional relations. This means that not all of the pairings suggested by the participants were included in the cluster analysis. The participants were free to suggest any word they thought was the best antonym. For instance, quick was considered the best antonym of slow by five of the participants (as compared to 45 for fast), but since quick was not included among the stimulus items, the pairing was not measured in the cluster analysis. The results of the cluster analysis are, however, comparable to the results of sentential co-occurrence of antonyms in the corpus data and the results of the judgement experiment, since the same word pairs are included (see Section 4 about the test set). Figure 8 shows the dendrogram produced by the cluster analysis. It is a hierarchical structure of clusters with two branches at the top. The left branch hosts Cluster 1 and Cluster 2 and the right branch Cluster 3 and Cluster 4. The closeness of the fork to the sub-clusters reveals that there is a closer relation between Clusters 3 and 4 than between Clusters 1 and 2.
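The bottom-up merging procedure can be illustrated with a small, self-contained sketch of agglomerative clustering under Ward's criterion. It is not the analysis that produced Figure 8: the feature vectors are an assumption (the number of participants eliciting the pair in each direction), only the bad – good values (50 and 42) come from the results above, and the entries labelled `hyp-1` to `hyp-3` are invented placeholders.

```python
def ward_merges(points):
    """Agglomerative clustering with Ward's criterion.

    points: {label: (x, y, ...)}. Returns the merge history as a list of
    (members_a, members_b, increase_in_within_cluster_sum_of_squares).
    """
    # Each cluster is stored as members -> (centroid, size).
    clusters = {(label,): (list(vec), 1) for label, vec in points.items()}
    history = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                (ca, na), (cb, nb) = clusters[a], clusters[b]
                dist2 = sum((p - q) ** 2 for p, q in zip(ca, cb))
                # Ward: growth in the error sum of squares if a and b merge.
                cost = na * nb / (na + nb) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ca, na), (cb, nb) = clusters.pop(a), clusters.pop(b)
        centroid = [(na * p + nb * q) / (na + nb) for p, q in zip(ca, cb)]
        clusters[tuple(sorted(a + b))] = (centroid, na + nb)
        history.append((a, b, cost))
    return history


# (forward count, backward count); only bad-good uses reported values.
points = {
    "bad-good": (50, 42),
    "hyp-1": (48, 45),  # invented placeholder
    "hyp-2": (12, 9),   # invented placeholder
    "hyp-3": (10, 14),  # invented placeholder
}
history = ward_merges(points)
```

In this toy example the two strongly bidirectional pairings are merged first and the two weak ones second, before the final merge joins everything into one cluster. The order of merges is the information that a dendrogram displays.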
Figure 8 here
The actual pairings are given in the boxes at the end of the branches. There are fewer pairs at the end of the left-most branches than at the ends of the branches on the right-hand side. Six of the word pairs in Cluster 1 were included in the test set as canonical antonyms (subscripted c in Figure 8): bad – good, beautiful – ugly, soft – hard, light – dark, narrow – wide, and weak – strong. The rest of the word pairs in Cluster 1 were not included as pairs in the experiment. They appear in Cluster 1 because the seed word was in the data set, the participants were simply asked to provide the best antonym they could think of, and the pairings were suggested by a large number of the participants. Cluster 2 includes two word pairs featured in the test set as canonical: large – small and thick – thin. Both of these word pairs have different combinatorial preferences (cf. big – large – small – little and thick – thin – fat – fit – slim) or are clearly polysemous like heavy – light (cf. dark – light). The rest of the word pairs in Cluster 2 are good examples of opposability. They were, however, not among the pairings that we deemed canonical in the design of the test set, e.g. feeble – strong, filthy – clean, enormous – tiny.
7. Discussion
The main goal of this paper has been to investigate the nature of the relation of antonymy in language using a set of English adjectives as test items. Since lexicalization of antonyms is most common for adjectival meanings, and among them for scalar adjectives, the scope of our study was limited to scalar adjectives (Paradis & Willners 2007). The issues at the heart of the present investigation concern what kind of lexical meanings language users consider to be good pairings and what the characteristics of such ‘lexicalized’ pairings might be. We investigated whether there is support for the standard view of a dichotomy of two different types of antonyms in language, i.e.
canonical antonyms, which are associated by convention, e.g. fast and slow, and non-canonical antonyms, which are not felt to be conventional pairings but are associated by semantic opposition in particular contexts, e.g. slow – clever, experienced – green and cool – pathetic, as suggested by e.g. Gross & Miller (1990). Our secondary goal has been to make use of different observational techniques in order to uncover as many as possible of the complexities concerning strength of antonym opposability in text, discourse and memory. Two different types of experiments, a judgement experiment and an elicitation experiment, were designed on the basis of a corpus-driven methodology. This combination of a principled textual method for retrieving test items with psycholinguistic experiments is of crucial importance to the study.10 Our main hypothesis was that opposite pairs of adjectives are distributed on a scale from canonical antonyms to hardly antonyms at all. While the number of less strongly associated antonyms is assumed to be in principle infinite, in that almost any two meanings can make a pair of antonyms given a suitable context, the number of conventionally associated pairings was assumed to be very limited. We expected the category of antonymy to have a core of prototypical antonyms which are associated by semantic opposition as well as by linguistic convention. In contrast to the standard view, we also predicted that, like any other category, the category of antonymy has internal structure, i.e. a core of strongly related lexical pairings and a steady increase in partners towards the borders of the category.
We found it important to carry out the retrieval of data for the experiments in a principled way that involved as little interference from the members of the research group as possible. For that reason we opted for a corpus-driven methodology of data extraction with minimum involvement of intuitive judgements by native speakers and minimum assumptions made by the research team.11 As has already been mentioned, it is well known that antonyms co-occur in sentences significantly more often than chance would predict (Justeson & Katz 1991, Jones 2002, 2006) and that canonical antonyms co-occur more often than contextually restricted antonyms (Willners 2001). It is important to note that our retrieval technique threw up not only antonym pairs but also pairs of synonyms and pairs that are unrelated. This means that additional manual classification of the pairings was in fact also necessary. This methodological circumstance was useful, since we were also looking for other types of pairings along the similarity–difference dimension to be used as conditions in the experiment. Four conditions were investigated: (i) canonical antonyms consisting of the pairings that occurred most frequently in the corpus search, (ii) the rest of the antonyms, (iii) synonyms and (iv) unrelated word pairs. The bonus of using a corpus-driven method of data extraction was that this technique also yields results in itself. The corpus evidence showed that the patterns for pairs that demonstrate strongly significant co-occurrence in text coincide with the patterns for strong and less strong canonicity judgement in the experiments (see Table 3).
10 In addition to the techniques used in the present study, we have carried out extensive corpus-based investigations on the use of antonyms in both speech and writing, which we return to for comparison later in this section.
We supplemented the corpus-driven data (42 pairs) with a set of test items (11 pairs) from Herrmann et al.’s data set in order to be able to compare their results with ours. The antonyms, synonyms and unrelated pairings that were retrieved on the basis of sentential co-occurrence in the BNC and the 11 pairs already investigated in Herrmann et al.’s experiments were used as test items in the experiments, and the participants were asked to respond to How good is X – Y as a pair of opposites? The pairs formed the four classes that we chose to call canonical antonyms, antonyms, synonyms and unrelated pairs. It was important for us to include synonyms in the experiment too since, like antonyms, their relation is based on both similarity and difference. The synonym relation is a similarity relation between the meanings of different forms, while the antonym relation, seen from the point of view of lexical opposition, is a relation of difference across both meanings and forms. However, this difference presupposes that the meanings represent opposite poles/parts on the same dimension. It is therefore a reasonable assumption that the participants would judge synonyms to be low-degree opposites, with scores lower than those for antonyms but higher than those for unrelated pairs. Our prediction that the eight antonym pairs that scored the highest in the corpus as well as the top-scoring pair beautiful – ugly from Herrmann et al.’s study form a group of canonical antonyms was borne out. It was shown that weak – strong, small – large, light – dark, narrow – wide, thin – thick, bad – good, slow – fast and ugly – beautiful were judged by the participants as examples of very good antonym pairs, i.e. the ones that we call canonical antonyms. They differ significantly from the rest of the antonym pairings in the judgement experiment.
Their mean score was 10.63 out of 11 and the standard deviation only 1.2, while the mean for the other antonyms was 7.66 with a relatively high standard deviation of 3.23. As predicted, the scores for the synonyms and the unrelated pairs were very low on the 11-point scale, 1.63 and 1.77 respectively. What was unexpected was that the synonyms and the unrelated pairings did not differ significantly and that both categories had low standard deviations. We expected these categories to give rise to results indicating a high degree of variation in the judgements given by the participants. The standard deviations, however, show that the participants agreed to a large extent about the ratings for the canonical antonyms, the synonyms and the unrelated, while there was a great deal of disagreement regarding antonyms, i.e. the large group with pairings that obtained lower figures for co-occurrence in the BNC.
11 In current empirical research where corpora are used, a distinction is being made between corpus-based and corpus-driven methodologies (Francis 1993, Tognini-Bonelli 2001: 65-100, Storjohann 2005, Paradis & Willners 2007). The distinction is that the corpus-based methodology makes use of the corpus to test hypotheses, expound theories or retrieve real examples, while in corpus-driven methodologies the corpus as such serves as the empirical basis from which researchers extract their data with minimum prior assumption. In the latter approach all claims are made on the basis of the corpus evidence, with the necessary proviso that the researcher determines the search items in the first place.
The judgement experiment also ruled out the possibility that some of the pairings may be judged as better antonyms in the opposite order. The experiment was run in both directions, i.e.
Word 1 – Word 2 and Word 2 – Word 1, and the result was that there was no significant difference with respect to the order of the individual antonyms, at least not according to the ordering principles employed in the experiment. The upshot of the judgement experiment is that there is a small set of canonical antonyms and a much larger set of antonyms, and that there is no difference between synonyms and unrelated pairings with respect to degree of goodness/badness of opposition. The result of the judgement experiment reflects the figures for sentential co-occurrence in the BNC. This is also reflected in the response times. Moreover, regarding the 11 pairs that we included from Herrmann et al.’s (1986) study, the results of our judgement experiment were consistent with Herrmann et al.’s results, as shown in Figure 9 below.
Figure 9 here
Next, the lexical pairs that we retrieved from the corpus and used as test items in the judgement experiment were also used in our elicitation experiment. In this experiment the task of the participants was to provide the best antonym they could think of for a given seed word. The first prediction we made for the outcome of the elicitation experiment was that the individual members of the canonical pairings would elicit one another, i.e. given good all the participants would suggest bad and vice versa. Secondly, it was predicted that the rest of the seed words would elicit varying numbers of antonyms. The more strongly a test item is associated with an antonym, the smaller the number of suggested antonyms and the higher the participant agreement. The elicitation experiment thus complements both the corpus search and the judgement experiment in that it was designed to tap the participants’ memory for non-contextualized lexico-semantic information. The overall prediction was that the elicitations would form a curve from total participant agreement for the strongly lexicalized antonyms to low participant agreement for the weakly lexicalized pairings.
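The prediction can be made concrete with a small sketch. It assumes that the responses for each seed word are available as a tally of elicited antonyms (a hypothetical data layout, not necessarily the one used in the study), and it takes the number of distinct antonyms and the share of participants giving the most frequent antonym as simple indicators of strength of association. The counts are those reported in this section for good and bad among the fifty participants; the function name agreement is illustrative.

```python
from collections import Counter

# Hypothetical layout: seed word -> tally of elicited antonyms.
# Counts follow the figures reported in the text for 50 participants.
responses = {
    "bad": Counter({"good": 50}),
    "good": Counter({"bad": 42, "evil": 8}),
}

def agreement(tally):
    """Return (number of distinct antonyms, share of the most frequent one)."""
    total = sum(tally.values())
    top_count = tally.most_common(1)[0][1]
    return len(tally), top_count / total

for seed, tally in responses.items():
    distinct, share = agreement(tally)
    print(f"{seed}: {distinct} distinct antonym(s), top share {share:.2f}")
```

On this measure a seed word with one distinct antonym and a top share of 1.0 sits at the strongly lexicalized end of the predicted curve, while seed words with many distinct antonyms and a low top share sit at the weakly lexicalized end.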
There were altogether 85 test items in the elicitation experiment: all the individual adjectives retrieved from the corpus and the selection from Herrmann et al.’s study. This means that the participants were asked to provide antonyms for all the test items, including the ones that were retrieved as synonyms and unrelated pairings from the corpus and subsequently used in the synonym and unrelated conditions in the judgement experiment. Comparisons of lexicalized antonym affinity could therefore only be carried out on the items that were included as pairs of antonyms in the judgement experiment. The principal outcome of the elicitation experiment is that there is a limited number of test items that elicit only one or two opposites. All the participants suggested one and the same antonym for bad (good), beautiful (ugly), clean (dirty), heavy (light), hot (cold), poor (rich) and weak (strong), but in this direction only. Then there was a group of test items that yielded two antonyms: black, fast, narrow, slow, soft, good, hard, open, big, easy, white, light and dark. The next group yielded three antonyms (large, rapid, small, ugly, exciting, thick), and so on. The test items that elicited the highest number of antonyms were great (18), firm (18), daring (19), confused (19), bold (19), mediocre (19), yielding (20), irritated (21), alert (22), disturbed (23), slight (26), delightful (27) and calm (29). Clearly, the test items that elicited few antonyms are all words that are very frequent in the language, and they are stylistically neutral with respect to genre and level of formality, i.e. they can be used in a wide range of ontological contexts along the same meaning dimension. The actual elicitations for each of the test words are shown in Appendix B, which is a two-dimensional representation of the distribution of every elicitation across all the test words.
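The ordering of seed words by the number of distinct antonyms they elicited can be sketched as follows. The counts are the ones given above for the two extremes of the distribution; the dictionary layout and the variable names are illustrative assumptions, and the intermediate groups are left out because their full membership is not listed in the text.

```python
# Number of distinct antonyms elicited per seed word, as reported in the text.
# Only the two extremes of the distribution are included here.
distinct_antonyms = {
    "bad": 1, "beautiful": 1, "clean": 1, "heavy": 1, "hot": 1, "poor": 1, "weak": 1,
    "great": 18, "firm": 18, "daring": 19, "confused": 19, "bold": 19,
    "mediocre": 19, "yielding": 20, "irritated": 21, "alert": 22,
    "disturbed": 23, "slight": 26, "delightful": 27, "calm": 29,
}

# Order the seed words by rising number of distinct responses (as in Appendix B).
ranked = sorted(distinct_antonyms.items(), key=lambda item: item[1])

# Seed words for which all participants gave one and the same antonym.
unanimous = [word for word, n in ranked if n == 1]

# Seed word with the largest number of distinct antonyms.
most_varied = ranked[-1]

print(unanimous)
print(most_varied)
```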
Figure 5 shows the 3D distribution of elicitations across all the test items and also the number of times each antonym was elicited. The top left-hand side of Figure 5 shows the 7 test items for which all the participants suggested one and the same antonym, and at the bottom right-hand side of the figure are the test items that gave rise to the largest number of elicitations. There is a steady cline between the two extremes. There is also a cline within the scope of each test item, for instance from the antonym that most participants considered the best antonym of calm, i.e. stressed, to the antonym that only one participant considered to be a good antonym of calm, i.e. troubled. This raises the question of polysemy patterns and contextual restrictions. The design of the experiment was such that there were no constraints on the elicitation. On the contrary, the purpose of the elicitation experiment, in contrast to the judgement experiment, was to encourage the participants to respond spontaneously to the trigger word. This means that the participants set their own contexts and chose the antonyms accordingly. The pattern of the curve is then not only a reflection of a word’s antonym partner in a certain context but also a reflection of its various readings in different contexts as well as the relative strength of these when no context is given. For instance, while all the participants provided good as an antonym of bad, both bad and evil were suggested for good. This reflects the underlying language ontologies, since lexico-semantic relations, both paradigmatic and syntagmatic, are relations between lexical items in a given reading, based on ontological relations between the contextual readings of these words (Paradis 2004, 2005).
Figure 10 here
As Figure 10 shows, forty-two out of the fifty participants suggested bad when given good, while only eight suggested evil.
This result indicates a strong coupling between good and bad. Put differently, for the majority of the participants good is more strongly coupled with bad than with evil, which suggests that the ontological relation between the concepts underlying the readings of good and bad is more salient for most of the participants than the ontological relation between the concepts underlying the readings of good and evil. The pattern across the responses for good in contrast to bad and evil is a simple pattern which positions itself among the strongly lexicalized pairings. Figure 10 also shows that mediocre – good has a very weak relation.
Figure 11 here
Figure 11 shows a more complex web of relations involving thick – thin. Thick elicits thin in 45 cases, while thin only elicits thick in 13 cases. Instead, thin has a strong relation to fat and so does fat to thin. Both lean and slim have a relatively strong relation with fat, but this relation is very weak in the other direction. The reason for this is probably that fat has a wider application in more contexts, while lean and slim are more limited to the contexts of meat and human bodies respectively. The same is true of the relationship between thick and fine, where fine suggests a context that involves slices of something or the like, while thick has wider application. In addition, a cluster analysis was carried out in order to see if all the pairings that were deemed canonical in the judgement experiment were also felt to be strong pairings in the elicitation experiment. The result of the cluster analysis is shown in Figure 6. The pairings fall into two main clusters.
The smaller cluster, with the 21 stronger couplings, contains the following pairings: fat – thin, big – small, bad – good, beautiful – ugly, soft – hard, light – dark, narrow – wide, weak – strong, black – white, fast – slow (Cluster 1); feeble – strong, filthy – clean, enormous – tiny, slim – fat, broad – narrow, lean – fat, heavy – light, rapid – slow, large – small, thick – thin, evil – good (Cluster 2). This cluster contains partly different pairings from the ones in the judgement experiment. The reason for this is that all the individual words from the data bank that we retrieved from the corpus were given as seed words in the elicitation experiment. This means that some pairs that rank highly on this list, e.g. big – small, black – white and fat – thin, were not included as pairings in the judgement experiment, and for that reason no comparison can be made concerning them. In addition to our eight canonical pairings, this list also contains other pairings that native speakers intuitively perceive as good pairings. None of the canonical pairings from the judgement experiment was found in the larger group consisting of 38 pairings. In relation to the number of elicitations given, the number of times the participants abstained from providing an antonym is of interest. In the group of test items for which all the participants suggested antonyms, the mean number of suggestions is 5.3 and the median 3, while for the group where participants neglected to suggest an antonym, the mean is 12.5 and the median is 13. This indicates that the partners of test items for which there is a lexicalized antonymic partner are easier to retrieve from memory, while test items which are not members of lexicalized antonymic pairings are more demanding for the participants, which results in more omitted responses as well as a larger number of different elicitations.
The corpus-driven retrieval of data for this study and the two different types of experiments may on the surface look as if they yield slightly different results. The results of the corpus-driven extractions show clearly that there is a limited number of very strong pairings and a larger number of pairings which are strongly coupled but less so than the ones we called the canonical ones. The same pairings were also judged to be the best pairings in the judgement experiment, in which the participants were instructed to make conscious decisions about good vs. bad antonyms. The canonical antonyms were shown to be significantly different from the rest of the antonym pairings, and these antonyms in turn were shown to differ significantly from synonyms and unrelated pairings. In contrast to the result of the judgement experiment, the result of the elicitation experiment points up a cline. At one end there is a very limited number of test items, i.e. bad, beautiful, clean, heavy, hot, poor and weak, for which the participants were in total agreement about the best antonym; at the other end there is the test item calm, for which the participants delivered 29 different suggestions. Among these 29 suggestions there was in turn a continuum from stronger to very weak agreement in terms of how many participants suggested each of the 29 antonyms. The results of the elicitation experiment are not in conflict with the results from the judgement experiment. In the judgement experiment three significantly different groups crystallized (canonical antonyms, antonyms and the rest), while when the participants were given more freedom, as in the elicitation experiment, the result was instead a cline with no clear cut-off points. The evidence from the elicitation experiment reveals the internal prototypicality structure of the category of antonymy, while the judgement experiment shows that there is in fact a significant difference between canonical antonyms and other more contextual and less lexicalized pairings.
The seemingly diverging results are of course partly due to the nature of the experiments as well as their design and the statistical calculations. In the judgement experiment two distinct types were assumed as part of the design, while no such assumptions were involved in the elicitation experiment, either in the design or in the statistical calculations. The corpus-driven extraction method was not associated with any assumptions at all besides the actual dimensions selected for the scope of this study. We thus analyze the results from the different parts of this study as pointing in the same direction, namely that at the level of lexico-semantic relations there are few strongly associated pairs that we might want to call canonical antonyms and there is also a large group of pairings that are less strongly associated, at least at the lexical level and with minimum ontological context. We interpret these seemingly contradictory results as an indication of a core of lexico-semantic antonymy surrounded by a large number of pairings ranging from more to less strongly lexically coupled. Granted that we take the patterning of the results of this study to reveal something about the category of antonymy in language, the picture that emerges is a prototypicality structure with a centre and a steadily fading strength of lexicalized opposability, and so it proceeds ad infinitum. It should again be emphasized that these results reflect our corpus-driven method and the two different experiment types. In another study (Jones et al. 2007), we used the World Wide Web as a corpus.
We approached the issue of antonym canonicity by building specifically on research that has demonstrated the tendency of antonyms to favour certain lexico-grammatical constructions in discourse, such as X and Y alike, between X and Y, both X and Y, either X or Y, from X to Y, X versus Y and whether X or Y (Justeson and Katz 1991, Mettinger 1994, Fellbaum 1995, Jones 2002, 2006, 2007, Murphy et al. submitted). In that study, we argued that the best antonym pairs of a language can reasonably be expected to co-occur with the highest fidelity in such constructions, that is, they will co-occur with each other, in preference to other semantically plausible pairings, across the widest possible range of appropriate contexts. Seven constructions were used for retrieval of a range of contrast items across a number of seed words, and a strong correlation emerged between those items retrieved most frequently and adjectives cited as ‘good opposites’ in the elicitation experiment. Indeed, in the case of nine of the ten seed words selected as a starting point for the Google searches, i.e. beautiful, poor, open, large, rapid, exciting, strong, wide, thin and dull, the adjectives retrieved most often in the searches were the same as the adjectives that were suggested most often by the participants in the elicitation experiment. The exception was thin, which did not retrieve fat most frequently in the textual googling study; the antonym retrieved most commonly was instead thick, which ranked second in the elicitation experiment. However, there is agreement between the googling study and the corpus-driven ranking in the present study, where thick emerged with the strongest co-occurrence figures. Unfortunately, the thin – fat pairing was not a test item in the judgement experiment. A very striking result from the googling study was that the second most common antonym of open is laparoscopic, the two words describing two types of surgery (Jones et al. in press).
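The frame-based retrieval can be illustrated with a short sketch. It assumes, for illustration only, that each of the seven constructions listed above is turned into two search frames by placing the seed word in either the X or the Y slot and a wildcard in the other, which would give fourteen frames per seed word. The function name search_frames and the wildcard notation are hypothetical and are not taken from the googling study.

```python
# The seven contrastive constructions named in the text.
CONSTRUCTIONS = [
    "{x} and {y} alike",
    "between {x} and {y}",
    "both {x} and {y}",
    "either {x} or {y}",
    "from {x} to {y}",
    "{x} versus {y}",
    "whether {x} or {y}",
]

def search_frames(seed, wildcard="*"):
    """Build wildcard frames with the seed word in the X slot and in the Y slot."""
    frames = []
    for pattern in CONSTRUCTIONS:
        frames.append(pattern.format(x=seed, y=wildcard))
        frames.append(pattern.format(x=wildcard, y=seed))
    return frames

frames = search_frames("laparoscopic")
print(len(frames))
print(frames[:2])
```

Counting which word fills the wildcard slot most often across the frames would then give a ranking of candidate antonyms for the seed word.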
When used as a seed word, laparoscopic retrieved open in thirteen of the fourteen frames and accounted for nearly 60% of all output (only three adjectives retrieved any antonym at a higher rate in this entire study). This repeated co-occurrence within a large proportion of antonymic frames makes laparoscopic – open a strong candidate to be considered a canonical pair, even if most English speakers would be very unlikely to propose it in an elicitation test. This pair thus provides evidence that the antonyms favoured in elicitation experiments are not necessarily the same as those that are coupled in texts. Context-free elicitation reflects both frequency and associative strength because participants are attempting to supply familiar opposites. Therefore, the better known the pair, the more strongly its members will elicit one another. However, antonyms operating in more restricted contexts (such as laparoscopic – open or, to use Murphy’s (2003:178) morphological illustration, derivational – inflectional, or Paradis & Willners’ (2007:271) morphological – phonological) may well be stronger in terms of their opposability, even though they are less widely known. The results presented here confirm that strength of association is separable from plain word/sense frequency and that less common pairs may be as canonical within their particular register as the everyday pairs identified in elicitation tests. Again, this demonstrates that different techniques yield slightly different results for a complex relation such as antonymy. In addition, it is clear from the textual results above that antonymy is a syntagmatic as well as a paradigmatic relation. Murphy (2006) argues that antonym pairs constitute a particular type of construction in the Construction Grammar sense.
Her position relies on three observations about antonymy in discourse: (i) antonyms tend to co-occur in sentences, as was shown in Willners (2001) and in the corpus-driven part of the present study, (ii) they tend to co-occur in particular contrastive constructions, as was shown in e.g. Jones (2007) and in the googling study reported above (Jones et al. 2007), and (iii) antonymy is lexical as well as semantic in nature, as we have shown in this study. Construction Grammar theorists (Goldberg 1992, 2006, Jackendoff 1998, Kay & Fillmore 1999) emphasize that a true account of a language’s grammar must be maximalist and account for all the types of constructions that occur in it, not just some set of core structures. It is in this spirit that the antonym construction idea was explored, and the theory could be extended to account for preferences for putting particular words together even when those words are not associated with any particular phrasal construction. Canonical antonym pairs fit the constructional bill in that they are form-meaning associations. The form of the Antonym Construction proposed is that of a word pair with matching syntactic category and semantic frame, and its meaning guarantees that the two members of the pair are incompatible and contrastive.
8. Conclusion
The aim of this paper was to investigate the nature of antonymy in language using scalar adjectival meanings in English as test items in order to find out whether there are good antonym pairings, less good pairings and pairings that are not antonymous at all. In addition to that question there are two corollary questions, namely, if there are better and worse pairings, what are the lexical meanings involved and why. An important secondary aim of the study was to carry out the investigation using different methods in order to see in what way the results were influenced by the different techniques. For that purpose we used both textual and psycholinguistic methods.
The textual method was mainly used to generate data, but in itself it also yielded results. The extraction of the test items for the psycholinguistic experiments was carried out using the computer program Coco for retrieval of antonyms, synonyms and unrelated pairings along seven dimensions of meaning, namely SPEED, LUMINOSITY, STRENGTH, SIZE, WIDTH, MERIT and THICKNESS, primarily represented by the pairings slow – fast, light – dark, weak – strong, small – large, narrow – wide, bad – good and thin – thick respectively. We also included a small subset of test items from a study by Herrmann et al. (1986), of which the strongest pairing was beautiful – ugly. These test items, together with combinations of their various synonyms forming antonym, synonym and unrelated pairs that co-occur significantly in sentences in the BNC, were used as the main body of test items for the experiments. Two types of experiments were carried out: a judgement experiment in which the participants were asked to evaluate the goodness of a given word pair on a scale from very bad to excellent, and an elicitation experiment in which the participants were given the complete list of the same test items and were asked to provide the best antonym of all the individual seed words. The hypothesis under investigation was that a scale exists between, at the one extreme, pairings that are strongly lexicalized as antonyms and, at the other extreme, pairings which may be opposable in some context and pairings for which it is very difficult to think of a context in which they could be used as antonyms. We also hypothesized that there will be a very limited number of excellent antonyms for which there will be complete or almost complete agreement across the participants about their special status as canonical antonyms. These pairings will coincide with the pairings retrieved from the corpus with the strongest figures for sentential co-occurrence, i.e.
the above-mentioned pairs that are the principal representatives of the meaning dimensions. Both hypotheses were borne out, with some qualification. The corpus-driven method of extraction showed that the above pairings co-occurred sententially more strongly than any of the other pairings that were used in the experiment, although the level of significance for all of them was set to at least 10⁻⁴. The results from the judgement experiment revealed a significant difference between the above eight pairings and the rest of the antonym pairings as well as a difference between the antonym pairings and the synonym pairings. There was no significant difference between the synonyms and the unrelated pairings. The result of the elicitation experiment, in which the participants suggested the best antonym and in doing so implicitly determined the ontological basis for the pairing themselves, presents a slightly different picture of antonyms in language. The pairings for which most participants agreed as to their partners were more or less consistent with the canonical pairings from the judgement experiment in the sense that all the pairings that were judged to be excellent pairings in the judgement experiment were also among the pairings with the highest participant agreement and the fewest suggested antonyms per seed word; in some cases all of the participants were in full agreement about the best antonym. This investigation thus shows that on the one hand there is a distinct but small group of what we might call lexicalized canonical antonyms. This was shown most clearly by the outcome of the judgement experiment, since the design and the statistical calculations of that experiment were geared towards the boundaries between the four conditions, i.e. canonical antonyms, antonyms, synonyms and unrelated.
On the other hand, the elicitation experiment points up a continuum from excellent antonym pairings with total participant consensus to pairings with a steady decrease in agreement. The elicitation experiment and the attendant calculations were designed to shed light on the structure rather than on the borders. The outcomes of both the experiments and the corpus-driven retrievals converge in a picture of the category of antonymic lexical meanings in English as a prototypicality structure, ranging from a clear core of excellent representatives of the category to category members on the outskirts that are hard to conceive of as antonym pairings and for which a partner does not come easily. On the basis of these results, we can conclude that the lexical items that receive the status of canonicity and lexicalization as pairings are all frequent in the language.12 They top the list both taken individually and in pairs in the corpus study. The actual frequency for adjectival meanings of this type is taken to be a sign of their being applicable in a large range of ontological meaning structures and useful in a large range of contexts, individually and as pairs, in which case the dimensional meaning structure is guaranteed. The relationship between the members of the canonical pairings is also symmetrical in the majority of the cases. This means that order does not seem to play any role in the judgement of goodness, and in the case of elicitations both members trigger the other to the same extent. Another factor that seems to be of importance for the best pairings is the salience of the dimension. The dimension of which the canonical antonyms are representatives is salient in the sense that it is easily identifiable. For instance, the SPEED dimension underlying slow – fast is easily identifiable, while the dimension behind, say, rare – abundant, calm – disturbed, lean – fat, open – laparoscopic or narrow – open is not.
This has to do with the more specialized ontological applications of these adjectives to nominal meanings, which concern different readings and sometimes also different meanings of these words, and with their confinement to certain very restricted styles and genres. This also means that polysemy and multiple readings as such do not prevent a word from participating in a canonical relation with another word. Examples are light – dark and light – heavy, and narrow – wide and narrow – open. We argue that lexicalized pairings are learnt as pairs early on because they are contextually versatile.
12 This does not mean that items that are less frequent in language cannot form strongly lexicalized pairings. For instance, had we included verbs, it is most likely that maximize – minimize would have scored high both in terms of sentential co-occurrence and in the experimental investigations, as indeed was shown by Herrmann et al. (1986).
Contextual versatility is a reflection of ontological versatility, i.e. the use potential of these antonyms applies in a wide range of ontological domains, and they are frequent in constructions and contrasting frames in text and discourse. The theoretical implications of our investigation are first and foremost that antonymy is both a lexical relation between members of more or less strong couplings and a conceptual relation in that contrast is always a possibility in meaning construal. Pairings for which the participants suggested many different antonyms are more likely to be contextually idiosyncratic, i.e. not strongly routinized as a pair in our minds, or very weakly lexicalized due to extreme genre or register restrictions. By and large any two lexical meanings can be construed as a pair of antonyms. Contrast is an extremely powerful construal in human thinking, important both to the organization of coherent discourse and to the mental organization of the vocabulary.
This research is important for any theory of mental models. It is also of great importance to language technological applications such as further development of semantic webs, lexicology, applied teaching and learning and cross-linguistic lexical typology. Last but not least, research on language and cognition calls for evidence from different sources and cross-fertilization of scientific techniques. Our next step will be extended studies using corpus methodologies as well as both psycholinguistic and neurolinguistic methods in order to shed light on contrast in text, language and memory.
9. References
Becker, C. A. (1980) Semantic context effects in visual word recognition: an analysis of semantic strategies. Memory and Cognition 8, 493–512.
Charles, W. G. & G. A. Miller (1989) Contexts of antonymous adjectives. Applied Psycholinguistics 10, 357–375.
Charles, W. G., M. A. Reed & D. Derryberry (1994) Conceptual and associative processing in antonymy and synonymy. Applied Psycholinguistics 15, 329–354.
Croft, W. & D.A. Cruse (2004) Cognitive Linguistics. Cambridge: Cambridge University Press.
Cruse, D.A. (1986) Lexical Semantics. Cambridge: Cambridge University Press.
Deese, J. (1965) The structure of associations in language and thought. Baltimore: Johns Hopkins University Press.
Fellbaum, C. (1995)
Fellbaum, C. (ed.) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Francis, G. (1993) A corpus driven approach to grammar: principles, methods and examples. In M. Baker, G. Francis & E. Tognini-Bonelli (eds.) 137-156.
Glynn, D., D. Geeraerts & D. Speelman (2007) Entrenchment and social cognition. A corpus-driven study in the methodology of language description. Paper given at the first SALC (Swedish Association for Language and Cognition) conference at Lund University, 29 November – 1 December.
Gries, S. & D.S. Divjak (forthcoming) Behavioral profiles: a corpus-based approach towards cognitive semantic analysis.
In V. Evans & S.S. Pourcel (eds.), New directions in cognitive linguistics. Amsterdam: John Benjamins.
Goldberg, A. E. (2006) Constructions at work. The nature of generalization in language. Oxford: Oxford University Press.
Gross, D., U. Fischer & G.A. Miller (1989) The organization of adjectival meanings. Journal of Memory and Language 28, 92-106.
Gross, D. & K.J. Miller (1990) Adjectives in WordNet. In WordNet: an on-line lexical database. Special issue of International Journal of Lexicography 3 (4), 265-267.
Halliday, M.A.K. & R. Hasan (1976) Cohesion in English. Harlow: Longman.
Herrmann, D. J., R. Chaffin, G. Conti, D. Peters & P. H. Robbins (1979) Comprehension of antonymy and the generality of categorization models. Journal of Experimental Psychology: Human Learning and Memory 5, 585–597.
Herrmann, D.J., R. Chaffin, M.P. Daniel & R.S. Wool (1986) The role of elements of relation definition in antonym and synonym comprehension. Zeitschrift für Psychologie 194, 133-53.
Jackendoff, R. (1997) The architecture of the language faculty. Cambridge, MA: MIT Press.
Jones, S. (2002) Antonymy: a corpus-based perspective. London: Routledge.
Jones, S. (2006) The discourse functions of antonymy in spoken English. Text and Talk 26 (1), 191-216.
Jones, S. (2007) ‘Opposites’ in discourse: a comparison of antonym use across four domains. Journal of Pragmatics 39 (6), 1105-1119.
Jones, S. (2005) Using corpora to investigate antonym acquisition. International Journal of Corpus Linguistics 10 (3), 401-422.
Jones, S., M. L. Murphy, C. Paradis & C. Willners (2007) Googling for opposites: a phraseological approach to assessing antonym canonicity. Corpora.
Justeson, J. S. & S.M. Katz (1991) Co-occurrences of antonymous adjectives and their contexts. Computational Linguistics 17, 1-19.
Justeson, J. S. & S. M. Katz (1992) Redefining antonymy: the textual structure of a semantic relation. Literary and Linguistic Computing 7, 176-184.
Kay, P. & C. Fillmore (1999) Grammatical constructions and linguistic generalizations: the What's X doing Y. Language 75, 1-33.
Lehrer, A. (1985) Markedness and antonymy. Journal of Linguistics 21, 397–429.
Lehrer, A. (2002) Gradable antonymy and complementarity. In A. Cruse, F. Hundsnurscher, M. Job & P. Lutzeier (eds.), Handbook of lexicology. Berlin: de Gruyter.
Lehrer, A. & K. Lehrer (1982) Antonymy. Linguistics and Philosophy 5, 483-501.
Lyons, J. (1977) Semantics Vol. 1. Cambridge: Cambridge University Press.
Mettinger, A. (1994) Aspects of semantic opposition. Oxford: Clarendon Press.
Mettinger, A. (1999) Contrastivity: the interplay of language and mind. Unpublished Habilitationsschrift, University of Vienna.
Miller, G.A., R. Beckwith, C. Fellbaum, D. Gross & K.J. Miller (1990) Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography 3 (4), 235-244.
Muehleisen, V. (1997) Antonymy and Semantic Range in English. PhD diss. Evanston, IL: Northwestern University.
Murphy, M.L. (2003) Semantic relations and the lexicon: antonyms, synonyms and other semantic paradigms. Cambridge: Cambridge University Press.
Murphy, M. L. (2006) Antonyms as lexical constructions: or, why paradigmatic construction is not an oxymoron. Constructions SV1-8/2006 (www.constructions-online.de, urn:nbn:de:0009-4-6857, ISSN 1860-2010) 35.
Murphy, M.L., C. Paradis, C. Willners & S. Jones (in preparation) Antonyms in Swedish and English discourse.
Ogden, C. K. (1932) Opposition: A linguistic and psychological analysis. Bloomington: Indiana University Press.
Osgood, C., G. Suci & P. Tannenbaum (1957) The measurement of meaning. Urbana, IL: University of Illinois Press.
Palermo, D.S. & J.J. Jenkins (1964) Word association norms: grade school through college. Minneapolis: University of Minnesota Press.
Paradis, C. (1997) Degree modifiers of adjectives in spoken British English. Lund Studies in English 92.
Lund, Lund University Press. Paradis, C. (2001) Adjectives and boundedness. Cognitive Linguistics 12.1: 47-65. Paradis, C. (2003) ‘Is the notion of linguistic competence relevant in Cognitive Linguistics?’ Annual Review of Cognitive Linguistics 1.247-271. Paradis, C. 2004. Where does metonymy stop? Senses, facets and active zones. Metaphor and symbol, 19 (4) 245–264. Paradis, C. (2005) Ontologies and construal in lexical semantics. Axiomathes 15: 541-573. Paradis, C. (forthcoming) Configurations, construals and change: expressions of DEGREE. English Language and Lingusitics. Paradis, C. & C. Willners. (2006) Antonymy and negation: the boundedness hypothesis. Journal of Pragmatics, 38 (7) 1051 –1080 Paradis, C. & Willners C. (2007) Antonyms in dictionary entries: methodlogical aspects. Studia Linguistica. 61 (3) 261-277. Rubenstein, H. & J.B. Goodenough (1965) Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633 Sampson, G. (2000) Review of C. Fellbaum (ed.), WordNet: an electronic lexical database. International Journal of Lexicography 13:54-59. Scriver, A. (2006) Semantic Distance in WordNet: A Simplified and Improved Measure of Semantic Relatedness. Master Thesis, University of Waterloo, Canada. Storjohann, P. 2005 Corpus-driven vs. corpus-based approach to the study of relational patterns. Proceedings of the Corpus Linguistics Conference 2005 in Birmingham. Vol. 1, no. 1, http://www.corpus.bham.ac.uk/conference2005/index.htm. Stockholm-Umeå Corpus. 23 November 2005. http://www.ling.su.se/DaLi. Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Studies in Corpus Linguistics 6. Amsterdam/Philadelphia: John Benjamins. Willners, C. (2001) Antonyms in Context. A Corpus-based Semantic Analysis of Swedish Descriptive Adjectives . Travaux de Institut de Linguistique de Lund 40. Lund, Sweden: Dept. of Linguistics, Lund University. Willners, C. & A. Holtsberg (2001). ‘Statistics for sentential co-occurrence.’ Working Papers Lund, Sweden: Dept. 
of Linguistics, Lund University 48, 135-148. 7. Other resources British National Corpus. 24 November 2005. http://www.natcorp.ox.ac.uk/ The Comparative Lexical Relations Group. 24 November 2005. http://www.f.waseda.jp/vicky/complexica/index.html 26 Draft 23/01/200808:32:56 Comments invited 11. Appendix A: Questionnaire You are going to be given a list with 85 English words. For each word, write down the word that you think is the best opposite for it in the blank line next to it. • • • • • Don’t think too hard about it — write the first opposite that you think of. There are no ‘wrong’ answers. Give only one answer for each word. Give opposites for all the words, even when the word doesn’t seem to have an obvious opposite. Don’t use the word not in order to create an opposite phrase. Your answer should be one word. Example: The opposite of MASCULINE is _______________ You might answer feminine. You may leave the experiment whenever you want to. Your answers are anonymous and will be handled confidentially. If you are interested in knowing about the aims or results of this experiment or if you have any other questions, please let us know. 27 Draft 23/01/200808:32:56 Comments invited 12. Appendix B: Responses to English stimuli The responses for each stimulus are ordered according to falling frequency. The stimuli are ordered according to rising number of responses. Omitted responses are not included. 
bad: good
beautiful: ugly
clean: dirty
heavy: light
hot: cold
poor: rich
weak: strong
young: old
black: white, colour
fast: slow, fast
narrow: wide, broad
slow: fast, quick
soft: hard, rough
good: bad, evil
hard: soft, easy
open: closed, shut
big: small, little
easy: hard, difficult
white: black, dark
light: dark, heavy
dark: light, pale
large: small, little, slim
rapid: slow, sluggish, fast
small: big, large, tall
ugly: beautiful, pretty, attractive
exciting: boring, dull, unexciting
thick: thin, clever, fine
strong: weak, feeble, mild, slight
wide: narrow, thin, skinny, slim
evil: good, kind, angelic, pure
thin: fat, thick, overweight, wide
sober: drunk, frivolous, inebriated, intoxicated, pissed
filthy: clean, spotless, immaculate, pristine, sparkling
huge: tiny, small, little, minute, petite
sick: well, healthy, fine, ill, yum
enormous: tiny, miniscule, small, little, minute, slight
dull: bright, exciting, interesting, shiny, lively, sharp
bright: dark, dull, dim, gloomy, stupid, obscure
fat: thin, slim, lean, skinny, thick, wrong
rare: common, comonplace, ubiquitous, frequent, plentiful, well-known
feeble: strong, robust, hard, impressive, powerful, steadfast
broad: narrow, thin, slim, small, lean, slight
smooth: rough, bumpy, hard, jagged, hairy, resistent
healthy: unhealthy, sick, ill, lame, diseased, poorly, sickly
tiny: huge, large, big, enormous, massive, giant, gigantic
lean: fat, fatty, flabby, large, plump, support, stocky, wide
heroic: cowardly, unheroic, scared, wimpish, villainous, disappointing, reticent, weak
glad: sad, unhappy, sorry, upset, disappointed, regretful, cross, worried
bare: covered, clothed, dressed, abundant, cluttered, full, loaded, patterned
slim: fat, broad, big, chubby, wide, large, obese, plump, round
tough: weak, tender, easy, soft, flimsy, gentle, sensitive, weedy, wimpy
gradual: immediate, sudden, rapid, fast, quickly, instant, abrupt, incremental, swift
tired: awake, energetic, alert, lively, fresh, wakeful, energized, peppy, perky, rested
sudden: gradual, slow, prolonged, expected, incremental, immediate, delayed, foreseen, infrequent, predictable
idle: busy, active, energetic, hard-working, working, awake, conscious, diligent, industrious, pro-active, workaholic
gloomy: bright, happy, cheerful, cheery, light, sunny, clear, illumined, nice, merry, pleasant
tender: tough, rough, hard, well-done, cold, robust, chewy, harsh, mean, nash, strong, uncaring
pale: dark, bright, tanned, bold, brown, coloured, red, ruddy, colourful, healthy, rosey, swarthy, vivid
nervous: calm, confident, bold, brave, relaxed, alert, assured, excited, fine, innervous, ready, steady, uncaring
limited: unlimited, extensive, abundent, comprehensive, endless, plenty, available, broad, capacious, common, fat, infinite, widespread
robust: weak, fragile, feeble, flimsy, shoddy, thin, brittle, frail, lethagic, natural, skinny, slim, vunerable
fine: thick, coarse, bad, bold, dull, wide, blunt, clumsy, cloudy, mad, ok, rough, wet, unwell
abundant: scarce, rare, sparse, little, lacking, disciplined, few, limited, needed, none, meagre, plentiful, sparing, threadbare
pure: impure, tainted, contaminated, corrupt, dirty, tarnished, evil, adulterated, bad, foul, mixture, sinful, unclean, unpure
immaculate: untidy, dirty, messy, filthy, scruffy, dishevelled, boring, faulty, ramshacky, spotted, stained, tarnished, terrible, tawdry
civil: uncivil, rude, anarchic, barbaric, belligerent, childish, corperate, couth, horrible, impolite, mean, nasty, military, savage, unfair
extensive: limited, small, intensive, narrow, restricted, brief, minimal, constrained, inextensive, insufficient, scanty, short, superficial, sparse, unextensive, vague
grim: nice, happy, bright, cheerful, pleasant, positive, hopeful, good, pleasant, carefree, clear, cosy, fun, jolly, reassuring, welcoming
slender: fat, broad, plump, wide, bulky, chubby, thick, well-built, big, chunky, curvy, lean, massive, obese, podgy, portly, rotund
delicate: robust, strong, tough, sturdy, hardy, rough, coarse, unbreakable, bold, bulky, crude, course, gross, hard, hard-wearing, harsh, heavy
immediate: later, delayed, slow, gradual, distant, deferred, extended, anon, eventually, far, forever, longterm, pending, postponed, prolonged, soon, whenever
modest: boastful, immodest, arrogant, bigheaded, brash, conceited, extravagant, vain, outgoing, blasé, confident, forward, ignorant, modest, proud, quiet, shy
great: small, rubbish, terrible, bad, average, awful, crap, dreadful, insignificant, lowly, mediocre, microscopic, obscure, ok, shit, tiny, poor, unremarkable
firm: soft, weak, floppy, lenient, wobbly, flexible, flimsy, gentle, groundless, relenting, lax, limp, loose, saggy, shaky, undecided, unsolid, unstable
confused: clear, understood, knowing, sure, lucid, organised, certain, alert, clued-up, clearheaded, coherent, comprehending, confident, enlightened, fine, focused, notconfused, scatty, together
bold: timid, shy, cowardly, faint, fine, italic, thin, nervous, cautious, faded, feint, frightened, hairy, meek, quiet, scared, timorous, weak, yellow
daring: cowardly, timid, nervous, scared, boring, carefully, cautious, shy, afraid, careful, fat, faltering, fearful, reticient, safe, staid, undaring, wimpish
mediocre: outstanding, excellent, exceptional, brilliant, amazing, good, great, challenging, charge, clever, extreme, fair, interesting, mediocre, rare, special, superb, unusual, wicked
yielding: unyielding, firm, resisting, dormant, hard, stubborn, agressive, dying, fighting, fixed, lose, losing, obdurate, rigid, steadfast, steamrollering, strong, stuck, tough, unproductive
irritated: calm, content, relaxed, amused, fine, placid, serene, soothed, comfortable, easy, even, good-humoured, happy, laid-back, normal, ok, patient, pleased, tranquil, unperturbed, unruffled
alert: sleepy, tired, asleep, dozy, oblivious, distracted, dull, drowsy, groggy, lazy, slow, apathetic, awake, complacent, dim, dopey, lethargic, spacey, torpid, unaware, unconscious, unresponsive
disturbed: calm, undisturbed, sane, peaceful, settled, stable, untouched, alone, balanced, content, fine, ignored, normal, quiet, relaxed, together, tranquil, unaffected, uninterrupted, untroubled, welcome, well-adjusted, well-balanced
slight: large, great, strong, big, heavy, considerable, enormous, huge, major, substantial, very, alot, extensive, heavyset, lots, marked, massive, plenty, pronounced, robust, rough, severe, thick, unslight, well-built, wide
delightful: horrible, awful, unpleasant, boring, disgusting, repulsive, tedious, abhorrent, annoying, crap, difficult, distasteful, dredful, dull, grim, hateful, horrendous, horrid, irritating, miserable, nasty, repellent, revolting, rubbish, terrible, uninteresting, yuk
calm: stressed, stormy, rough, agitated, excited, hyper, panicked, angry, annoyed, anxious, choppy, crazy, flustered, frantic, frenzied, hectic, hubbub, hysterical, irrate, irrational, jumpy, lively, loud, nervous, neurotic, rage, reckless, tense, troubled

Figures and Tables

[Figure 1: wet and dry, each surrounded by a crescent of its synonyms.]
Figure 1. The direct relation of antonymy as illustrated by wet and dry. The synonym sets of wet (i.e. watery, damp, moist, humid, soggy) and dry (i.e. parched, arid, anhydrous, sere, dried-up) appear as crescents round wet and dry respectively. They are all indirect antonyms of the direct ones (the figure is adapted from Gross & Miller 1990: 268).

DIMENSION    Antonyms
SPEED        slow-fast
LUMINOSITY   light-dark
STRENGTH     weak-strong
SIZE         small-large
WIDTH        narrow-wide
MERIT        bad-good
THICKNESS    thin-thick
Table 1. Seven dimensions and their corresponding canonical antonym pairs in English.

Word 1   Word 2   N1      N2       Co     Expct Co   P-value
slow     fast     5760    6707     163    9.6609     0.0000
dark     light    12907   12396    402    40.0103    0.0000
strong   weak     19550   4522     455    22.1076    0.0000
large    small    47184   51865    3642   611.9756   0.0000
narrow   wide     5338    16812    191    22.4421    0.0000
bad      good     26204   124542   1957   816.1094   0.0000
thick    thin     5119    5536     130    7.0867     0.0000
Table 2. Sentential co-occurrences of the canonical antonyms in the test set.
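The Expct Co column in Table 2 is consistent with a simple independence estimate, in which the expected number of sentences containing both words is N1 × N2 divided by the number of sentences in the corpus. The Python sketch below illustrates that arithmetic only; it does not reproduce the p-value computation. The function names are illustrative, and the sentence count of 3,998,850 is an assumption: it is not stated here and was back-calculated from the rows of Table 2, so it should be read as approximate.

```python
def expected_cooccurrence(n1: int, n2: int, n_sentences: int) -> float:
    """Expected number of sentences containing both words under independence."""
    return n1 * n2 / n_sentences


# ASSUMPTION: approximate sentence count, back-calculated from Table 2.
ASSUMED_SENTENCES = 3_998_850

# (word 1, word 2, N1, N2, observed co-occurrences) copied from Table 2.
TABLE_2 = [
    ("slow", "fast", 5760, 6707, 163),
    ("dark", "light", 12907, 12396, 402),
    ("strong", "weak", 19550, 4522, 455),
    ("large", "small", 47184, 51865, 3642),
    ("narrow", "wide", 5338, 16812, 191),
    ("bad", "good", 26204, 124542, 1957),
    ("thick", "thin", 5119, 5536, 130),
]


def summarize(rows, n_sentences=ASSUMED_SENTENCES):
    """Return (pair, expected count, observed/expected ratio) for each row."""
    summary = []
    for w1, w2, n1, n2, observed in rows:
        expected = expected_cooccurrence(n1, n2, n_sentences)
        summary.append((f"{w1}-{w2}", expected, observed / expected))
    return summary


if __name__ == "__main__":
    for pair, expected, ratio in summarize(TABLE_2):
        print(f"{pair:12s} expected={expected:9.4f} observed/expected={ratio:6.1f}")
```

The observed-to-expected ratio is included only as a quick way to see how far each pair exceeds chance co-occurrence; it is not a statistic reported in the tables.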
Word 1     Word 2       N1     N2     Co    Expct Co   P-value
fast       slow         6707   5760   163   9.6609     0.0000
rapid      slow         3526   5760   54    5.0789     0.0000
quick      slow         6670   5760   39    9.6076     0.0000
fast       quick        6707   6670   34    11.1871    0.0000
firm       smooth       6157   3052   34    4.6991     0.0000
fast       rapid        6707   3526   29    5.9139     0.0000
gradual    sudden       1066   3920   22    1.0450     0.0000
gradual    slow         1066   5760   22    1.5355     0.0000
gradual    immediate    1066   6104   18    1.6272     0.0000
boring     dull         1669   1837   17    0.7667     0.0000
dense      hot          1060   9445   15    2.5036     0.0000
sudden     swift        3920   920    14    0.9019     0.0000
dull       slow         1837   5760   14    2.6460     0.0000
instant    quick        1638   6670   13    2.7322     0.0000
lazy       slow         819    5760   10    1.1797     0.0000
lazy       stupid       819    3234   9     0.6624     0.0000
slow       tedious      5760   543    9     0.7821     0.0000
delayed    immediate    450    6104   9     0.6869     0.0000
fast       high-speed   6707   359    8     0.6021     0.0000
slow       sluggish     5760   220    8     0.3169     0.0000
smooth     swift        3052   920    7     0.7022     0.0000
faithful   loyal        1005   1320   7     0.3317     0.0000
dense      smooth       1060   3052   7     0.8090     0.0000
dumb       stupid       755    3234   7     0.6106     0.0000
boring     tedious      1669   543    6     0.2266     0.0000
fast       speeding     6707   104    6     0.1744     0.0000
Table 3. Sentential co-occurrences of synonyms of fast and slow in the BNC with p-value ≤ 10^-4, co-occurring more than 5 times.

Canonical antonyms   Antonyms (2 per canonical pair)       Synonyms (2 per canonical pair)   Unrelated
slow-fast            slow-sudden, gradual-immediate        slow-dull, fast-rapid             hot-smooth
light-dark           gloomy-bright, pure-black             light-pale, dark-grim             clean-easy
weak-strong          delicate-robust, tender-tough         weak-feeble, strong-firm          slight-soft
small-large          modest-great, small-enormous          small-tiny, large-huge            heroic-young
narrow-wide          narrow-open, limited-extensive        narrow-slender, wide-broad        bare-slender
bad-good             bad-mediocre, evil-good               bad-poor, good-healthy            big-white
thin-thick           lean-fat, rare-abundant               thin-fine, thick-heavy            pale-slim
Table 4. The test items retrieved from the BNC.
Antonym pairs         Score   Category
beautiful-ugly        4.90    Canonical antonyms
immaculate-filthy     4.62    Antonyms
tired-alert           4.14    Antonyms
disturbed-calm        3.95    Antonyms
hard-yielding         3.28    Antonyms
glad-irritated        3.00    Antonyms
sober-exciting        2.67    Antonyms
nervous-idle          2.24    Unrelated
delightful-confused   1.90    Unrelated
bold-civil            1.57    Unrelated
daring-sick           1.14    Unrelated
Table 5. The sample of eleven antonym pairs of decreasing degrees of goodness of antonymy from Herrmann et al.’s (1986) test items.

[Figure 2: on-screen prompt "How good is slow-fast as a pair of opposites?" with a response scale running from "very bad" to "excellent".]
Figure 2. Screen snapshot: An example of a judgement task in the online experiment.

Canonical antonyms
Word 1      Word 2
slow        fast
light       dark
weak        strong
small       large
narrow      wide
bad         good
thin        thick
ugly        beautiful

Antonyms
Word 1      Word 2
slow        sudden
gradual     immediate
gloomy      bright
pure        black
delicate    robust
tender      tough
modest      great
small       enormous
narrow      open
limited     extensive
mediocre    bad
evil        good
lean        fat
rare        abundant
filthy      immaculate
calm        disturbed
tired       alert
hard        yielding
sober       exciting
irritated   glad

Synonyms
Word 1      Word 2
slow        dull
fast        rapid
light       pale
dark        grim
weak        feeble
strong      firm
small       tiny
large       huge
narrow      slender
wide        broad
bad         poor
good        healthy
thin        fine
thick       heavy

Unrelated
Word 1      Word 2
hot         smooth
clean       easy
slight      soft
heroic      young
bare        slender
big         white
pale        slim
nervous     idle
delightful  confused
bold        civil
daring      sick
Table 6. The complete set of test items for the judgement experiment.

Category             Mean    Std. Deviation
Canonical antonyms   10.63   1.20
Antonyms             7.66    3.23
Synonyms             1.63    1.51
Unrelated            1.77    1.30
Table 7. Mean responses for canonical antonyms, antonyms, synonyms and unrelated word pairs.

[Figure 3: chart of mean response and standard deviation for each category (Canonical antonyms, Antonyms, Synonyms, Unrelated); y-axis from 0.00 to 14.00.]
Figure 3. Mean responses for canonical antonyms, antonyms, synonyms and unrelated word pairs.
Word 1       Word 2       Mean    Std. Dev.   Category
weak         strong       10.82   0.44        C
small        large        10.82   0.39        C
light        dark         10.68   1.35        C
narrow       wide         10.66   0.72        C
thin         thick        10.66   0.77        C
bad          good         10.64   1.21        C
slow         fast         10.42   1.73        C
ugly         beautiful    10.30   1.93        C
evil         good         10.24   1.65        A
limited      extensive    9.92    1.18        A
delicate     robust       9.84    1.11        A
lean         fat          9.76    1.76        A
rare         abundant     9.66    1.71        A
small        enormous     9.30    2.01        A
filthy       immaculate   9.30    2.17        A
calm         disturbed    9.30    1.69        A
tired        alert        9.10    1.22        A
tender       tough        9.08    2.11        A
gradual      immediate    8.78    1.84        A
hard         yielding     7.60    2.65        A
slow         sudden       6.44    3.04        A
narrow       open         5.82    3.32        A
sober        exciting     5.38    2.64        A
irritated    glad         5.20    2.84        A
modest       great        4.72    2.98        A
pure         black        3.04    2.45        A
mediocre     bad          3.00    1.98        A
bold         civil        2.88    1.96        U
idle         nervous      2.40    1.80        U
delightful   confused     2.28    1.59        U
bad          poor         2.26    2.42        S
good         healthy      1.96    1.76        S
wide         broad        1.88    2.44        S
light        pale         1.76    1.62        S
small        tiny         1.66    1.80        S
narrow       slender      1.64    1.72        S
slight       soft         1.58    0.84        U
slow         dull         1.58    1.07        S
hot          smooth       1.58    1.30        U
thin         fine         1.58    1.14        S
bare         slender      1.56    1.09        U
heroic       young        1.54    0.97        U
daring       sick         1.52    0.89        U
strong       firm         1.48    0.79        S
fast         rapid        1.48    1.61        S
dark         grim         1.46    0.71        S
clean        easy         1.46    0.65        U
thick        heavy        1.44    0.84        S
pale         slim         1.40    0.73        U
weak         feeble       1.38    0.64        S
large        huge         1.32    0.84        S
big          white        1.32    0.65        U
Table 8. Mean responses, standard deviations and categorization of the word pairs in the test set (C = canonical antonyms, A = antonyms, S = synonyms, U = unrelated).

[Figure 4: "Estimated Marginal Means of resp_mean"; x-axis: category (1.00 to 4.00); y-axis: Estimated Marginal Means (0.00 to 12.00); separate lines for REV = 0 and REV = 1.]
Figure 4. Directionality: There is no significant difference between the mean answers of the two test batches.

Category             Mean response time   Std. Deviation
Canonical antonyms   4303.5933            1699.90
Antonyms             7648.5294            3517.32
Synonyms             5446.1427            2504.50
Unrelated            6381.9891            3080.64
Table 9. Mean response times for canonical antonyms, antonyms, synonyms and unrelated word pairs.
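As a consistency check, the category means reported in Table 7 can be recomputed from the per-pair means in Table 8. The Python sketch below does this for the canonical antonyms (C) and the synonyms (S), for which Table 7 reports 10.63 and 1.63. The values are copied from Table 8; the names PAIR_MEANS and category_mean are illustrative, and averaging the per-pair means is an assumption about how the category means were obtained (it is equivalent to averaging all individual responses only if every pair received the same number of judgements).

```python
from statistics import mean

# Per-pair mean responses copied from Table 8.
# "C" = canonical antonyms, "S" = synonyms.
PAIR_MEANS = {
    "C": [10.82, 10.82, 10.68, 10.66, 10.66, 10.64, 10.42, 10.30],
    "S": [2.26, 1.96, 1.88, 1.76, 1.66, 1.64, 1.58, 1.58,
          1.48, 1.48, 1.46, 1.44, 1.38, 1.32],
}


def category_mean(category: str) -> float:
    """Unweighted mean of the per-pair means for one category code."""
    return mean(PAIR_MEANS[category])


if __name__ == "__main__":
    for code in PAIR_MEANS:
        print(f"{code}: {category_mean(code):.3f} over {len(PAIR_MEANS[code])} pairs")
```

The same check could be extended to the other two categories by adding their rows from Table 8 to the dictionary.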
[Figure 5: chart of mean response times for each category (Canonical antonyms, Synonyms, Unrelated, Antonyms); y-axis from 0 to 14000.]
Figure 5. Mean response times for canonical antonyms, antonyms, synonyms and unrelated word pairs.

[Figure 6: bar chart; y-axis from 0 to 25. Bar values as extracted: dark 21, dull 14, dim 7, gloomy 5, stupid 2, obscure 1.]
Figure 6. The participants’ responses to pale.

Figure 7. The distribution of English antonyms in the Elicitation experiment. The Y-axis gives the test items, of which every tenth is written out in full; the X-axis gives the number of suggested antonyms across the participants given on the Z-axis.

Table 10. Test words for which all participants suggested an antonym and test words for which some of the participants did not provide an antonym.

Test words with 50 responses
Number of suggested antonyms   Test words
1                              bad, beautiful, clean, heavy, hot, poor, weak
2                              black, fast, narrow, slow, soft, good, hard, open, big, easy
3                              large, rapid, small, ugly, exciting
4                              strong, wide, evil, thin
5                              filthy, huge
6                              enormous, dull, bright, smooth
7                              healthy, tiny
9                              tough
10                             tired
11                             idle
12                             tender
18                             great
22                             alert
29                             calm

Test words with omitted responses
Number of suggested antonyms   Test words
1                              young
2                              white, light, dark
3                              thick
5                              sober, sick
6                              fat, rare, feeble, broad
8                              lean, heroic, glad, bare
9                              gradual, slim
10                             sudden
11                             gloomy
13                             pale, nervous, limited, robust
14                             fine, abundant, pure, immaculate
15                             civil
16                             extensive, grim
17                             slender, delicate, immediate, modest
18                             firm
19                             daring, confused, bold, mediocre
20                             yielding
21                             irritated
23                             disturbed
26                             slight
27                             delightful

[Figure 8: dendrogram; scale from 0 to 25. Leaf labels in extracted order, with the superscript c of the original shown as ^c:]
Cluster 1: fat-thin, big-small, bad-good^c, beautiful-ugly^c, soft-hard^c, light-dark^c, narrow-wide^c, weak-strong^c, black-white, fast-slow
Cluster 2: feeble-strong, filthy-clean, enormous-tiny, slim-fat, broad-narrow, lean-fat, heavy-light, rapid-slow, large-small^c, thick-thin^c, evil-good
Cluster 3: tender-tough, sudden-gradual, huge-tiny, pale-dark, robust-weak, firm-soft, gloomy-bright, fine-thick, huge-small, bright-dark, great-small, tough-weak, disturbed-calm, tiny-large, delicate-robust, sudden-slow, gradual-immediate, limited-extensive
Cluster 4: slight-robust, tender-strong, limited-broad, mediocre-good, yielding-hard, gloomy-light, slender-broad, tough-soft, enormous-small, gradual-fast, slender-wide, extensive-narrow, firm-weak, robust-feeble, slight-strong, gradual-rapid, pale-bright, abundant-rare, delicate-tough, grim-bright, delicate-strong, immediate-slow, immaculate-filthy, alert-tired
Figure 8. Dendrogram of the bidirectional data.

[Figure 9: chart comparing Herrmann's score and Judgement test score; y-axis from 0.00% to 100.00%; x-axis: beautiful-ugly, immaculate-filthy, tired-alert, disturbed-calm, hard-yielding, glad-irritated, sober-exciting, nervous-idle, delightful-confused, bold-civil, daring-sick.]
Figure 9. The scorings of Herrmann et al.’s (1986) word-pairs in relation to the present judgement test.

Figure 10. Relations between good, bad, evil and mediocre based on the elicitation experiment. The number of responses is marked by each arrow.

Figure 11. Relations between fat, lean, slim, thin, thick, and fine based on the elicitation experiment. The number of responses is marked by each arrow.

[Figure 12: scale marked from 0 to 11 with the labels Canonical, Antonyms, Synonyms and Unrelated.]
Figure 12. Distribution of mean responses of canonical antonyms, antonyms, synonyms and unrelated word pairs along the 11-box scale.