Lexical repulsion between sense-related pairs

John Benjamins Publishing Company
This is a contribution from International Journal of Corpus Linguistics 12:3
© 2007. John Benjamins Publishing Company
Antoinette Renouf and Jayeeta Banerjee
University of Central England
This paper builds on the groundwork and setting up of methods for an innovative approach to analysing text. We have proposed that there is a hitherto
unexplored textual feature, which we call ‘repulsion’, which operates on the
construction of meaning in an opposing way to that of word collocation. To
illustrate, we do not say cheerfully happy even though we say blissfully happy.
We focus on ‘lexical repulsion’, by which we mean the intuitively-observed
tendency in conventional language use for certain pairs of words not to occur together, for no apparent reason other than convention. Our goal is to establish how repulsion as a whole operates and whether it can be assigned the
status of an objectively measurable ‘force’. It is anticipated that this approach
will have wide implications for corpus linguistics and NLP. In this paper, we
take the particular case of repulsion between sense-related word pairs.
Keywords: lexical repulsion, attraction, collocation, corpus linguistics,
synonyms, antonyms
.
Introduction
. Background
‘Corpus linguistics’ is a broad term, encompassing a cycle of theoretical and
practical activities which both precede and include the discovery in a corpus of
texts of many different kinds of information about language in use. The corpus
linguistic cycle typically begins with a hypothesis which brings a twinkle to
the eye of the corpus designer, then moves through the birth pangs of corpus
creation, through the detailed scrutiny of each instance of a word in a given
text corpus once constructed, and on to the extraction from the corpus of
more generalised, class-level knowledge, which in turn spawns new hypotheses
(Renouf 2007). In this paper, we focus on the latter stages of this methodological cycle: the procedures associated with the extraction of information about
word use from a corpus, and the derivation from this of knowledge about textual organisation.
Since corpus linguistics as we know it began around forty years ago with
the creation of the Brown Corpus (Kučera & Francis 1967), the bread and butter of corpus analytical practice has been the perusal of words arrayed in alphabetically-ordered contexts, with particular attention to their preferred word
neighbours. This routine serves to identify the semantic and other features of a
word which are revealed by the presence of its corresponding collocational
associations. The term ‘collocation’ (as defined by Firth (1957)) typically characterises the situation whereby a word is not evenly or randomly distributed
across texts, but is found close to its preferred word partners (or ‘collocates’). In
our previous work, the focus has been on the circumstances under which this
lexical ‘attraction’1 occurs, whether in adjacent word pairings or discontinuous
phrasal or grammatical frameworks. This fact of the language has in turn been
transformed into a ‘methodological tool’ for discovering further information;
for example, in the discovery of the meaning of an unknown word (e.g. Renouf
1996).
The possibility of the existence of repulsion had lain in the recesses of the
RDUES2 Unit’s collective consciousness through the years of collocational
study, and was finally disinterred for inspection a few years ago. Our intuitive
awareness was that one routinely utters some word combinations, such as Merry Christmas, Happy Christmas and Happy Birthday, but not Merry Birthday.
We tested this apparent phenomenon of avoidance against our existing z-score
collocational (span +/–1, case insensitive) statistics, and found that it indeed
proved to be identifiable. The measures for the four word pairs above revealed
(see Table 1) that, while Pairs 1 (Merry + Christmas), 2 (Happy + Birthday)
and 4 (Happy + Christmas) collocate strongly, Pair 3 (Merry + Birthday) does
not collocate at all and produces a negative z-score.
Table 1. Collocation of merry, happy, christmas and birthday

Pair   Word1 (corpus freq.)   Word2 (corpus freq.)   Collocates   Z score
1      merry (2326)           christmas (90,670)     450          393.205
2      happy (8323)           birthday (2416)        526          516.16
3      merry (2326)           birthday (2416)        0            –1.014
4      happy (8323)           christmas (90,670)     299          196.010

This finding encouraged us to proceed to develop and test systematically a hypothesis about the existence and nature of ‘repulsion’ in text. Given that our
methodology is based on collocational considerations, we naturally focus on
‘lexical’ repulsion. By ‘lexical repulsion’, we mean the observed tendency in
conventional language use for certain pairs of words not to occur together.
Of course, we know that repulsion, if it exists, does so at several levels of
description and generality. This is also suggested by the existing studies of ‘co-occurrence restriction’ in language use. Other fields have traditionally focused
on what is linguistically allowable, stated in terms of constraints. In grammar,
a vast and established body of scholarship deals with the rules governing word-class co-occurrence, concord and syntactic sequencing (e.g. Blache et al. 2003).
In semantics, there is a long-established tradition of studying ‘selectional constraints’ on word co-occurrence, encompassing both non-computational and
computational approaches (e.g. Resnik 1997). Morphologists and phonologists
talk of ‘blocking’ in word formation (e.g. Andrews 1990; Aronoff 1976; Suzuki
1998; Yip 1998; Kim 1998). Translators (e.g. Laviosa-Braithwaite 1996) and
foreign language teachers (e.g. Bonci 2004) refer impressionistically to ‘collocational constraints’, ‘restrictions’ and ‘clashes’ in relation to the preference
for certain pairings, such as round of applause over round of clapping, and take
a nap over take a sleep.
However, we find no real guidance elsewhere3 on ‘lexical repulsion’ per se,
in terms of active repulsion, as a measure of distance between two words, and
of its scope and potential as a supplementary tool in text analysis.
1.2 Goals of the Repulsion Project
So against the background outlined so far, of our own intuitions and of work in
associated fields, our study goals are as follows.
1.2.1 Notion of a force
We wish to introduce the notion that there is another ‘force’, which we call
‘repulsion’, which operates on the construction of text in an opposing way to
that of collocation or ‘lexical attraction’. The goal of the study is to establish
some understanding of how this phenomenon operates, and whether it can be
assigned the status of an objective and measurable feature of textual organisation.
1.2.2 Active distancing
Specifically, we investigate the existence and measurability of ‘active’ distancing
which may operate between words, rather than the known, routine constraints
on co-occurrence imposed by grammatical and other more easily observable
norms, and as distinct from ‘indifference’ (see 3.3).
1.2.3 Lexical repulsion
We focus particularly on ‘lexical repulsion’, the trickiest case, where there appears to be no explanation for the non-co-occurrence of two particular words
beyond tradition, and no objective means of prediction or rule to apply. This case
is a continuing thorn in the side of English language learners of all mother
tongues. Crucially, we seek to isolate the exceptional cases of active ‘repulsion’
(see Section 3.2) between two words in a text from the commoner relationship
of ‘indifference’,4 as defined in Section 3.3.
1.2.4 Semantic repulsion
We shall also conduct a small-scale investigation into another area of repulsion which we assume will also be measurable by collocational means: that is,
‘semantic repulsion’ (e.g. bus and butter).
1.2.5 Applications
We shall investigate ways (some of which we already have in mind) in which
the knowledge thereby cumulatively gained can be exploited to provide solutions to some problems in language teaching and NLP.
2. The scope of this paper
This paper is based on our initial investigations into lexical repulsion within
the larger project, as reported in Renouf and Banerjee (2007 and forthcoming). We focus in the paper on the repulsion obtaining between sense-related
pairings, with reference to synonymy and antonymy. At this point, however,
our interest is not in executing a rigorous and exhaustive investigation of sense
relations and repulsion per se.
The reason for prioritising sense-related items is two-fold. The first is that
sense-related pairings are likely to embody repulsion in its strongest form, and
thus generate a more linguistically significant sub-set of repelled items. Synonyms can be expected by virtue of their shared meanings to share significant
collocates in text. So where two synonyms repel each other’s collocates, the
repulsion is particularly surprising and noteworthy. In principle, one could
expect the converse with antonyms: that the absence of repulsion would be
surprising. The second reason for prioritising sense-related pairs is a practical
one: if we focus only on repulsion between these pairs, we reduce the amount
of indifferent and irrelevant output generated, given that an individual word attracts only a few favoured collocates, and is more or less indifferent or “hostile”
to the rest of the vocabulary. In this paper, we also focus only on the contiguous
collocates of each synonym pair, in order to eliminate any ambiguity between
repulsion and simply close collocation.
3. Definition of terms used
As a prelude to the study proper, we shall clarify our use of the key terms in
the study.
3.1 Collocation
This is a property of language whereby words are not randomly distributed
across texts, but occur close to certain preferred word partners (or ‘collocates’),
a relationship refined according to their particular roles in each textual domain,
genre and text type. Collocation exists between both adjacent word pairings
and discontinuous phrasal or grammatical frameworks; and the significance
of co-occurrence is deemed to be measurable by the application of appropriate
statistical algorithms.
3.2 Repulsion
By ‘repulsion’, we mean the intuitively-observed tendency in conventional language use for certain pairs of words not to occur together, not simply due to
the fact that they are semantically incompatible, or grammatically disallowed,
or morphologically or phonologically blocked, but where there appears to be
no explanation other than convention. For instance, for no apparent reason, it
is conventional in English to say sheer guts, but not utter guts; and utter peace
but not sheer peace.
3.3 Indifference
Furthermore, non-co-occurrence is not necessarily a matter of indifference. ‘Indifference’ refers to the situation where two words are in a statistically neutral relationship as regards proximity; there is no statistical evidence that Word A is singling out Word B to be either significantly close to it or significantly distant from it.
3.4 Force
We borrow the term ‘force’ from the sciences, to reinforce our ‘attraction’ and
‘repulsion’ metaphors. In general, a force is an action on an object that causes
its momentum to change; in electromagnetism, a force is the repulsion of like,
and attraction of unlike, charges. The fact that electrically-charged particles
can also repel each other allows us to extend the metaphor to characterise the
relationship of ‘repulsion’. In the corpus linguistic context, however, we use the
term loosely, to mean a “statistically-significant tendency for something to
happen, specifically for certain words to repel or be repelled by each other”.
Our concept of repulsion is represented diagrammatically in Figure 1.
Word A and Word B in Figure 1 are synonyms, shown within their respective ‘collocational spaces’. The circular area on the left represents the significant collocate set for Word A, and the circular area on the right represents
the significant collocate set for Word B. The overlap between their two sets
of collocates represents their shared significant collocates. The middle area of
the circle (between the overlap and crescent) for each word represents those
collocates which might also collocate with the other word, but only weakly
or insignificantly. Meanwhile, the extreme outer crescents represent areas of
actual repulsion: by Word B of Word A’s collocates, and by Word A of Word
B’s collocates.

[Figure 1. Graphic representation of ‘Repulsion’: two overlapping circles, the Word A collocate space and the Word B collocate space. Overlap: shared A/B collocates (strong collocation with both A and B). Outer crescent of the Word A space: repels word B, but strong collocation with word A. Outer crescent of the Word B space: repels word A, but strong collocation with word B.]
4. Data and method
The source data used in our study consisted of a corpus of 800 million words
of written text from the domain of UK broadsheet journalism, the Independent
and the Guardian newspapers.5 In view of this data selection, it will be realised
that when we make reference to instances of lexical repulsion in this paper, we
only speak of that which is characteristic of “broadsheet” journalistic text. The
texts cover the period 1989 to end 2006 in an unbroken stretch, and so allow observation of recent usage. Since the limits of repulsion cannot, by definition,
extend beyond the confines of a complete text, it was informative to retain the
mark-up which indicates the boundary of each article.
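To make the role of the article boundary concrete, the following is a minimal sketch of counting adjacent (span +/–1), case-insensitive word pairs without letting a pair straddle two articles. It is an illustration only, not the project's actual software: the function name `count_adjacent_pairs`, the whitespace tokenisation and the pooling of both word orders into one unordered pair are all assumptions made here.

```python
from collections import Counter

def count_adjacent_pairs(articles):
    """Count words and adjacent word pairs, article by article.

    Assumptions (not from the paper): each article is a plain string,
    tokens are split on whitespace, and (a, b) is pooled with (b, a).
    """
    word_counts = Counter()
    pair_counts = Counter()
    for article in articles:
        tokens = [token.lower() for token in article.split()]  # case insensitive
        word_counts.update(tokens)
        # zip over one article at a time, so no pair crosses an article boundary
        for left, right in zip(tokens, tokens[1:]):
            pair_counts[tuple(sorted((left, right)))] += 1
    return word_counts, pair_counts

# Two toy "articles"; the last word of the first never pairs with
# the first word of the second.
articles = ["Merry Christmas to all", "happy birthday and merry Christmas"]
word_counts, pair_counts = count_adjacent_pairs(articles)
```

Processing one article at a time is the simplest way to respect the boundary mark-up: the pair (all, happy) is never counted, even though the two words are adjacent in the concatenated data.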
The study entails the interaction of linguistics, statistics and software tool
development. The approach taken was the usual iterative, stochastic method of
articulating hypotheses, of developing tools and statistical measures to allow
them to be tested on the selected language data, and of ultimately evolving these into a
system which can extract the knowledge on a sufficiently large scale to allow its
application in the fields of language teaching and NLP.
4.1 Measuring repulsion
The statistical measures of repulsion which we applied were based on relative
frequency in relation to the 800 million word corpus as a whole. We have built
‘collocational profiles’ containing this information for each word in newspaper
text, and used a z-score cut-off to identify only the most significant collocates.
Collocational z-scores are a measure of the strength of a relationship based on
comparing (a) the frequency with which two observed words collocate within
a given span with (b) their expected frequency in a body of text if the occurrence of one word of the pair were at random relative to the other word. We
demonstrate the statistical thresholds (z-score cut-offs), with reference to the
synonymous word pair nearly and almost, as follows:
– for ‘repulsion’: the strength of association was set at ≤ –2; that is, the words
  nearly + certainly exhibit repulsion because their strength of association is
  –5.959
– for ‘weak collocation’: the strength of association lay between –2 and 2;
  that is, the words nearly + commonplace exhibit weak collocation, with a
  strength of association of –1.27, as do the words almost + 50ft, with a strength
  of association of –1.103
– for ‘strong collocation’: the strength of association was set at ≥ 2; that is, the
  words almost + half have a collocation strength of 261.703, while the words
  nearly + doubled have one of 284.358
It is important to note that the statistical thresholds for both collocational attraction and repulsion are thus set on the same scale, albeit at different points.
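The thresholds above can be sketched in a few lines of code. The cut-offs (≤ –2, between –2 and 2, ≥ 2) are taken from the text; the z-score formula itself is not given in this paper, so the version below, z = (O – E) / sqrt(E(1 – p)), is one common formulation and is an assumption here, as are the function names and the default span of 2 positions (one on either side of the node word).

```python
import math

REPULSION_MAX = -2.0  # z <= -2: repulsion (threshold from the text)
STRONG_MIN = 2.0      # z >= 2: strong collocation (threshold from the text)

def z_score(observed, node_freq, collocate_freq, corpus_size, span=2):
    """Collocational z-score under an ASSUMED formulation.

    p is the chance of meeting the collocate at any one position;
    expected is the co-occurrence count if the two words were placed
    at random relative to each other.
    """
    p = collocate_freq / corpus_size
    expected = p * node_freq * span
    return (observed - expected) / math.sqrt(expected * (1 - p))

def classify(z):
    """Map a z-score onto the three categories of association."""
    if z <= REPULSION_MAX:
        return "repulsion"
    if z >= STRONG_MIN:
        return "strong collocation"
    return "weak collocation"

# Scores quoted in the text for nearly + certainly, nearly + commonplace
# and almost + half.
labels = [classify(z) for z in (-5.959, -1.27, 261.703)]
```

Because attraction and repulsion are read off the same scale, a single function suffices for both; only the sign and size of the score decide the category.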
5. Lexical repulsion between sense-related word-pairs: synonyms
It was essentially a pragmatic move to focus on the repulsion behaviour of synonymous pairs, as explained, and an exhaustive study of all aspects of the phenomenon was beyond the scope of this project. There were many possible ways
to proceed. We had previously composed a candidate inventory based on three
main sources: our intuition (and that of others, e.g. Palmer (1981)); the products
of our ACRONYM collocational similarity measures (Renouf 1996; Pacey et al.
1998); and those of targeted lexical signalling (e.g. Renouf 2001). But we began
this research by looking at classic examples of close synonymy within different
word classes and between polysemous items which are known to cause problems for language learners. From that beginning, our method of selection has
largely been one of exploratory boot-strapping, moving on to a new synonym
pair which suggested itself in examining a previous pair. Sometimes this was
“good research practice”: changing one linguistic variable, such as shifting to a
different word class, in order to throw particular light, even if obscure. At other times it
was to break a deadlock, or out of curiosity. In due course, we intend to shape
this incidental study into a more structured whole, investigating for example
how the typicality of behaviour varies according to scales of concreteness to
abstraction, monosemy to polysemy, and low to high frequency.
5.1 Case studies on synonym pairs to demonstrate new repulsion methodology
The following case studies have been selected to illustrate the phenomenon
of repulsion within different word classes. The pairs chosen are: almost and
nearly, seat and chair, argue and discuss, and pretty and attractive.
5.1.1 Synonymous pair: ‘almost’ and ‘nearly’
5.1.1.1 Selection
The adverbial synonyms almost and nearly were selected because they are staples in the pedagogic challenge to English language teachers to explain the
appropriate use of near synonyms to learners. The general view (see also recent
analysis by Kjellmer (2003)) is that it is impossible to differentiate between
these two words sensibly. Linguists assert ‘attitudinal’ or ‘evaluative’ differences
as the key, with one being positively and the other negatively connotative, but
this nuance does not hold for long before a counter-example emerges. Most
dictionaries, even the corpus-based Collins-Cobuild Dictionary (2006), define
each word in terms of the other. We decided to see if our collocational measures (see 4.1) could reveal anything further about the pair.
5.1.1.2 Identifying degrees of association between synonyms
A cross tabulation (Table 2) was used to display the joint distribution of the
three categories of association: repulsion, weak collocation and strong collocation, between the collocates of almost and nearly.
Table 2. almost and nearly repulsion cross-tabulation

                      almost:      almost:            almost:
nearly                repulsion    weak collocation   strong collocation   Total
repulsion                                             78                   78
weak collocation                                      1999                 1999
strong collocation    28           335                762                  1127
Total                 28           335                2839                 3202

The word almost is more than twice as frequent as nearly, their frequencies
being 232,689 and 85,431 respectively, and Table 2 shows that almost collocates strongly with more word types (2839) than does nearly (1127). Thus,
nearly operates within a more restricted domain and/or range of functions,
and consequently repels more word types (78) than almost (28) in the shared
collocate space. The two words share many strong collocates: 762, in fact. As
shown in Table 2, these 762 collocates occupy the common shaded area between the
two circles of Figure 1, while 78 and 28 items occupy the crescents on either edge of the
circles respectively, representing repulsion. There are 1999 strong collocates of
almost which only weakly collocate with nearly, and conversely 335 strong collocates of nearly which only weakly collocate with almost. The other boxes are
empty because we have been considering only the significant collocate space
of the two words, and in this space, we are not likely to find any words that are
either repelled by both almost and nearly, or that weakly collocate with each of
them. Only when we consider all collocates of almost and nearly can we expect
these boxes to be filled. We shall later in our study also consider the contents
of these boxes. Table 3 lists some of the words that either almost or nearly repel
or attract.

Table 3. Collocates exhibiting different textual uses of almost and nearly

almost repels (nearly attracts):
far, man, Not, men, year, million, Christmas, taking, including, quarter, 68,
Over, average, pretty, weren’t, Very, 77, aged, metres, sent, 1990, 31, married,
put, population, serving, 2.3, purchased

nearly repels (almost attracts):
see, seems, become, times, feel, look, certainly, sure, appears, knew, expect,
felt, looks, seemed, remains, immediately, word, worse, remain, feels, physical,
surely, straight, equally, exactly, directly, academic, regarded, essential, hear,
quiet, considered, desperate, bound, appear, guarantee, wish, intense,
definitely, remained, imagine, afraid, entirely, Victorian, classical, daily, sweet,
concentrate, focused, precisely, hidden, secondary, comic, amounts, depends,
inevitably, romantic, unknown, acceptable, monthly, overnight, routine, sheer,
bizarre, quietly, qualify, rely, immediate, completely

both attract:
half, 40, doubled, two-thirds, pounds, two, 200, three, twice, spent, one-third,
killed, six, lost, dollars, impossible, seven, lasted, worth, drowned, died,
tripled, eight, enough, fell, 25, 40000, 1m, 2m, 100000, trebled, 400000, ruined,
5m, 2, spend, collapsed, blew, 250000, 5bn, went, broke, 1400, extinct, 13,
reached, 300m, blind, two-and-a-half, attracting, scuppered, After, scored,
17000, 70000, 1300, 900000, midnight, 10m, bankrupt, raised, 2.5
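A cross-tabulation of this kind can be derived mechanically from two collocational profiles. The sketch below assumes, purely for illustration, that each profile is a dictionary mapping a candidate collocate to its z-score with the node word; `category` and `cross_tabulate` are hypothetical names, not the project's tools. Three of the scores in the toy profiles are quoted in Section 4.1 (almost + half, nearly + certainly, nearly + commonplace); the rest are invented to populate the other cells.

```python
from collections import Counter

def category(z):
    """Thresholds from Section 4.1: <= -2 repulsion, >= 2 strong, else weak."""
    if z <= -2:
        return "repulsion"
    if z >= 2:
        return "strong collocation"
    return "weak collocation"

def cross_tabulate(profile_a, profile_b):
    """Count collocates by (category with word A, category with word B).

    Only the significant collocate space is kept: a collocate must be
    strong for at least one of the two words, which is why the remaining
    cells of the table stay empty.
    """
    table = Counter()
    for word in profile_a.keys() & profile_b.keys():
        cat_a, cat_b = category(profile_a[word]), category(profile_b[word])
        if "strong collocation" in (cat_a, cat_b):
            table[(cat_a, cat_b)] += 1
    return table

# Toy profiles; scores not quoted in the paper are invented.
almost = {"half": 261.703, "certainly": 30.0, "commonplace": 5.0, "aged": -3.0, "the": 0.5}
nearly = {"half": 150.0, "certainly": -5.959, "commonplace": -1.27, "aged": 12.0, "the": 0.3}
table = cross_tabulate(almost, nearly)
```

Here "the" is weak for both words and so falls outside the significant collocate space, mirroring the empty boxes discussed above.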
5.1.1.3 Linguistic analysis and interpretation
As might be expected for words expressing approximation, there is a prevalence of words representing numerals in the results. Of the 762 words that collocate strongly with both words, 241 are numerals, with or without a unit of
measurement attached (e.g. 7m) and this does not include numbers written as
words (e.g. five).
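A count of this kind depends on how a ‘numeral’ token is recognised. As a rough sketch only, the pattern below accepts digits with optional thousands separators, an optional decimal part and a short unit suffix, and deliberately rejects numbers written as words. The pattern, the list of suffixes and the name `is_numeral` are assumptions for illustration; the paper does not state how the 241 numerals were identified.

```python
import re

# Assumed pattern: digits, optional commas, optional decimal part,
# optional unit suffix such as m, bn, k or ft.
NUMERAL = re.compile(r"^\d[\d,]*(?:\.\d+)?(?:m|bn|k|ft)?$", re.IGNORECASE)

def is_numeral(token):
    """True for tokens such as 40, 2.5, 7m or 5bn; False for half or five."""
    return NUMERAL.match(token) is not None

# A few shared collocates of almost and nearly taken from Table 3.
shared = ["half", "40", "two-thirds", "1m", "5bn", "2.5", "40000", "midnight"]
numerals = [token for token in shared if is_numeral(token)]
```

On this sample the filter keeps 40, 1m, 5bn, 2.5 and 40000, and leaves out half, two-thirds and midnight.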
The word almost seems to repel precise numbers, whereas nearly is found
modifying numerals, e.g. nearly 5, nearly 68 and so on. Consistent with this, almost repels verbs of “achievement” or “milestones”, such as aged, sent, married,
serving, purchased, which frequently occur in text with a unit of measurement
(e.g. married nearly 10 years) quantifying the achievement, and which collocate
with nearly. Further, in contrast with nearly, almost repels pretty and very, adverbials which can emphasise achievement.
The word nearly, meanwhile, repels verbs of perception, including some
forms of the lemmata LOOK, FEEL, HEAR, which collocate significantly with
almost. Nearly also repels some stative verbs, such as APPEAR, SEEM, BECOME,
verbs of reported opinion, such as considered, regarded; adverbs of “certainty”,
such as surely, definitely, completely, immediately; and verbs of “reassurance”,
such as bound, rely, guarantee.
Since other repelled items in Table 3 seem to be consistent with this analysis, let us tentatively claim that the data do show us a semantic difference
between these synonyms: that almost generally avoids physical measurement,
while nearly repels contexts of modality, used in hedging. In other words, nearly seems to modify precise numbers, while almost contributes to a discourse
which is down-playing certainty.
The words almost and nearly have a shared area of collocates, which seems
business-oriented, involving verbs to do with money (spend, blow, lost, doubled,
trebled, tripled) and a number of verbs of catastrophe, both financial (bankrupt, broke, ruined) and existential (blind, scuppered, collapsed, killed, died,
drowned).
5.1.2 Synonyms: ‘seat’ and ‘chair’
5.1.2.1 Selection
The words seat and chair were selected because their semantic relationship had
been shown by our earlier ACRONYM project to vacillate in text. There was
an assumption in pre-computational lexical semantics that seat was a superordinate of chair, which hierarchical relationship we have observed to be inconsistent in real text. Rather, we tend to view the two as distinguishable referentially.
5.1.2.2 Identifying degrees of association between synonyms
The word seat occurs more often (41,244 occs.) than chair (27,214 occs.) in the
corpus. The cross tabulation in Table 4 shows three categories of association:
repulsion, weak collocation and strong collocation, between the collocates of
chair and seat.
Table 4. seat and chair repulsion cross-tabulation

                      chair:       chair:             chair:
seat                  repulsion    weak collocation   strong collocation   Total
repulsion                                             22                   22
weak collocation                                      374                  374
strong collocation    29           505                70                   604
Total                 29           505                466                  1000

Table 4 shows that seat collocates strongly with more word types (604) than does chair
(466), indicating that seat operates over a less restricted range of functions,
and repels fewer word types (22) than chair (29). The two words share only
70 strong collocates, which occupy the common shaded area between the two
circles, whereas 29 and 22 items occupy the crescents on either edge of the
circles respectively, representing repulsion. There are 374 strong collocates of
chair which only weakly collocate with seat, and 505 strong collocates of seat
which only weakly collocate with chair.
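The totals quoted from Table 4 follow directly from its five filled cells, which can be checked with a short calculation. The dictionary layout and the shortened category labels below are choices made for this sketch; the counts are those reported for seat and chair.

```python
# Filled cells of Table 4, keyed as (category with seat, category with chair).
cells = {
    ("strong", "repulsion"): 29,   # strong for seat, repelled by chair
    ("strong", "weak"): 505,       # strong for seat, weak for chair
    ("repulsion", "strong"): 22,   # strong for chair, repelled by seat
    ("weak", "strong"): 374,       # strong for chair, weak for seat
    ("strong", "strong"): 70,      # shared strong collocates
}

# Marginal totals: all collocates that are strong for each word.
seat_strong = sum(n for (seat, _), n in cells.items() if seat == "strong")
chair_strong = sum(n for (_, chair), n in cells.items() if chair == "strong")
total = sum(cells.values())

print(seat_strong, chair_strong, total)  # 604 466 1000
```

The shared collocates are counted in both marginal totals, which is why 604 and 466 add up to more than the 1000 word types in the combined collocate space.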
Table 5. Collocates exhibiting different textual uses of seat and chair

chair repels (seat attracts):
country, family, West, best, child, third, East, sales, prices, ground, front,
economy, South, left, free, space, second, race, majority, sale, numbers, class,
compared, target, middle, train, parliament, fit, held

seat repels (chair attracts):
woman, today’s, national, year’s, made, Davies, Professor, King, Campbell,
Smith, Jones, Commission, university, man, black, MP, party, personal, easy,
leg, office, deputy

both attract:
electric, director’s, editor’s, empty, leather, wooden, plastic, comfy,
manager’s, executive’s, folding, reclining, former, comfortable, vacant, cane,
garden, presenter’s, upholstered, opposite, captain’s, Labour, speaker’s,
favourite, leader’s, covers, nearby, courtside, independent, minister’s, facing,
non-executive, upright, child’s, uncomfortable, next, vacated, padded,
currently, Democrat

5.1.2.3 Linguistic analysis and interpretation
In Table 5, it can be seen that, in our journalistic corpus, each word specifically repels
the other’s phrasal “completives”: seat thus repels woman, national, Commission, university, man, easy, leg, office, deputy, and a list of surnames of chairmen; while chair repels country and family (as in country seat and family seat in
reference to ancestral homes), child, West, East, South, prices, numbers, train,
parliament.
Extrapolating further, one can say that the commoner word seat appears to
repel references to holding or carrying out academic or public office, particularly
in the role of chairperson, while the rarer word chair repels references to the
holding of inherited territory or property, the gaining of seats in government,
seats and seat prices in the theatre or cinema, and seats or seating spaces in
cars and aeroplanes (and possibly other forms of transport). In other words,
they are differentiated referentially as follows: a seat is generally an inherited
property or parliamentary representation, or alternatively a kind of paid seating space, while a chair is generally a role as chairperson in academic or other
public contexts, including governmental.
As for the information derivable from the right hand column of Table 5
about the collocates shared by both seat and chair, we learn that both synonyms can refer to “things to sit on”, with seat capable of referring either to the
object as a whole or specifically to its seating pad (e.g. padded, upholstered).
The collocates, which may themselves have more than one reference, also play a
role in determining the reference of their combinations with seat and chair in context.
The shared collocates appear to suggest chair more strongly. This is because
they indeed generally collocate more strongly with chair. It is because the word
seat itself is much more frequent in the corpus than chair that chair’s collocates
also sometimes co-occur with it (that is, corpus frequency is affecting the significance scores). Moreover, where the collocate in question, e.g. reclining, is
rare in the corpus, the significance scores for its co-occurrence are bigger. This
fact is known, and we have as an ongoing task the modification of statistical
thresholds and measures.
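The frequency effect just described can be illustrated numerically. The sketch below reuses an assumed z-score formulation, z = (O – E) / sqrt(E(1 – p)), which is not specified in this paper, and holds the node word and the observed count constant while varying only the corpus frequency of the collocate. The corpus size (800 million) and the node frequency (27,214, that of chair) come from the text; the observed count of 5 and the two collocate frequencies are invented for the illustration.

```python
import math

def z_score(observed, node_freq, collocate_freq, corpus_size, span=2):
    """Assumed formulation: z = (O - E) / sqrt(E * (1 - p))."""
    p = collocate_freq / corpus_size
    expected = p * node_freq * span
    return (observed - expected) / math.sqrt(expected * (1 - p))

CORPUS_SIZE = 800_000_000  # corpus size reported in Section 4
CHAIR_FREQ = 27_214        # frequency of chair reported in the text

# Same node word and same observed count (invented: 5 co-occurrences);
# only the frequency of the collocate differs (both values invented).
z_rare = z_score(5, CHAIR_FREQ, 500, CORPUS_SIZE)
z_common = z_score(5, CHAIR_FREQ, 50_000, CORPUS_SIZE)
```

With these figures the rare collocate clears the threshold of 2 for strong collocation by a wide margin, while the common one does not, although both were observed the same number of times. This is the sensitivity that motivates revisiting the thresholds and measures.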
5.1.3 Synonyms: ‘argue’ and ‘discuss’
5.1.3.1 Selection
The synonyms argue and discuss were selected as verbs with similar frequencies, argue occurring 35,419 times and discuss 34,660 times. Intuitively, they
might be regarded as differing in terms of degree; folk wisdom regards the verb
argue as meaning [discuss + (the semantic component of) ‘anger’]. We were
interested in seeing whether this was confirmed in real text.
5.1.3.2 Identifying degrees of association between synonyms
As shown in Table 6, argue and discuss have similar frequencies and thus
similar numbers of strong collocates (277 and 244 respectively). The strong collocates that
argue and discuss share are fewer than those they repel.
The words argue and discuss share only 14 strong collocates; argue repels
58 words and discuss repels 19 words in the shared collocate space. The fact
that argue repels about three times as many collocates as discuss implies that
discuss is used more selectively in the corpus.
Table 6. argue and discuss repulsion cross-tabulation (span 1)

                      argue:       argue:             argue:
discuss               repulsion    weak collocation   strong collocation   Total
repulsion                                             19                   19
weak collocation                                      244                  244
strong collocation    58           172                14                   244
Total                 58           172                277                  507

5.1.3.3 Linguistic analysis and interpretation
The pattern of repulsion, as revealed in Tables 6 and 7, seems to be as follows.
The word discuss particularly repels words which refer to people and institutions who are cited in newspapers as being in a position to put forward an
argument: people, companies, industry, authorities, scientists, analysts, banks,
etc.6 On the other hand, it does not significantly repel any words identifying
what the argument or proposition might be (since these are rendered too disparate by the favoured syntax of the word argue for any right-hand collocates
to emerge as sufficiently statistically significant to count).
The word argue, meanwhile, repels many words which refer to the problems of society which are reported on in journalism, and ways to deal with
them. Specifically, these repelled items consist of adjectives and nouns which
designate a particular area or topic of interest, such as political, economic, human, industrial, nuclear; trade, peace, sex, security, safety, arms. They also comprise adjectives indicating the (time) scale involved: future, possible, potential,
proposed, regional, global.
Argue also repels some discourse-organising nouns with future reference,
such as plans, strategy, policies, ideas, policy, efforts. Argue further repels certain
progressive verb forms with semantic implications of futurity, such as putting,
setting, increasing, buying, moving.
Table 7. Collocates exhibiting different textual uses of argue and discuss

argue repels (discuss attracts): political, such, interest, future, things, possible, security, economic, human, problems, terms, trade, plans, potential, peace, questions, sex, military, personal, reports, common, events, drugs, ways, details, individual, moving, aid, global, nuclear, strategy, putting, policies, progress, buying, ideas, safety, England’s, arms, policy, issues, sexual, setting, alternative, fears, proposed, industrial, joint, efforts, increasing, turning, concerns, relations, various, alleged, regional, allegations

discuss repels (argue attracts): still, people, companies, now, instead, others, industry, banks, supporters, authorities, Americans, Tories, agencies, states, doctors, easily, groups, heads, organisations, Scientists, Observers, Analysts

both attract: economics, let’s, writers, openly, publicly, today, endlessly, even, leaders, seriously, privately, ministers, authors, experts
Overall, it was found that argue repels more of the collocates of discuss than
vice versa. This can be explained by the fact that the word argue is used to
report a proposition or thesis: in over 50% of its occurrences (22,736 of 39,540) it takes the
right-hand collocate that, serving as a reporting verb in the construction
‘<NN> argue(d) that…’, with the result that it can precede almost any word,
whereas discuss is used more frequently in constructions such as ‘discuss + <JJ>
+ <NN>…’ or ‘discuss + <Ving> + <JJ> + <NN>…’.
What the two synonyms share can be seen in the right-hand column of Table 7 to be particular nouns which seem to denote human subjects who can both
“discuss” and “argue”: ministers, authors, experts, leaders, writers. The pair also
both collocate with certain adverbs of manner: openly, publicly, privately, endlessly, seriously, which can be employed to characterise both these verbal activities.
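The proportion quoted above for argue followed by that is a simple ratio of counts, and the same kind of figure can be obtained for any node word and right-hand collocate. The sketch below is illustrative only: it assumes a corpus already split into word tokens, and the function name `right_collocate_share` and the toy sentence are invented for the example.

```python
from collections import Counter

def right_collocate_share(tokens, node_forms, collocate):
    """Share of node-word occurrences immediately followed by `collocate`.

    A node word in final position has no right-hand neighbour and is
    therefore not counted.
    """
    followers = Counter()
    hits = 0
    for current, following in zip(tokens, tokens[1:]):
        if current.lower() in node_forms:
            hits += 1
            followers[following.lower()] += 1
    return followers[collocate] / hits if hits else 0.0

# The figures quoted in the text give the ratio directly.
print(round(22_736 / 39_540, 3))  # 0.575

# Toy illustration on an invented sentence.
toy = "Analysts argue that rates will rise but critics argued otherwise".split()
print(right_collocate_share(toy, {"argue", "argued"}, "that"))  # 0.5
```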
5.1.4 Synonyms: ‘pretty’ and ‘attractive’
5.1.4.1 Selection
The synonyms pretty and attractive were automatically extracted from the
‘nymic’ thesaural output of our ACRONYM system as two adjectives which
can be synonymous in relation to physical attributes. The word pretty is more than
twice as frequent as attractive in the corpus (70,767 and 32,283 respectively).
This is as expected, given that pretty in fact functions not just as an evaluative
adjective but also as a degree adverbial, while attractive is restricted to the former of these functions.
Table 8. attractive and pretty repulsion cross-tabulation

attractive \ pretty   repulsion   weak collocation   strong collocation   Total
repulsion                     -                  -                   43      43
weak collocation              -                  -                 1324    1324
strong collocation          131                298                  140     569
Total                       131                298                 1507    1936
5.1.4.2 Identifying degrees of association between synonyms
Table 8 shows that though pretty is more frequent in the corpus and has a greater number of strong collocates (1,507) than attractive (569), pretty
repels significantly more collocates (131) than attractive (43). This implies that attractive
modifies a wider range of words than does pretty. The two words share more strong collocates (140) than either repels. This follows as expected from their
synonymic relationship.
5.1.4.3 Linguistic analysis and interpretation
Table 9 shows, perhaps unsurprisingly but logically, that the word pretty repels
nouns with human reference which are not described as pretty: man, men, person, people, team, quality. Above all, however, pretty repels words which deal
with aspects of inanimate business and finance: deal, offer, price, tax, interest,
shares, return, investment, insurance, stock, business, fund, growth, financial,
and so on.
The word attractive, in contrast, repels words that refer to states of affairs
that often are or are becoming difficult to handle. These include adjectives such
as hard, difficult, poor, bad, impossible, complex, unlikely, important, serious,
significant, but also more neutral terms, such as average, similar, and some semantically positive items such as sure, clear, high, firm, full, happy, keen.

Table 9. Collocates exhibiting different textual uses of attractive and pretty

attractive repels (pretty attracts): get, sure, good, kept, got, early, hard, average, already, similar, important, significant, big, unlikely, difficult, firm, close, quickly, getting, happy, serious, nearly, soon, clear, gone, moving, comes, probably, mean, actually, poor, feeling, worked, keen, near, impossible, high, sitting, full, complex, bad, hot, felt

pretty repels (attractive attracts): way, People, man, income, team, performance, deal, subject, tax, markets, interest, single, shares, career, men, price, business, even, financial, insurance, return, property, find, offer, longer, Business, investment, provide, form, buy, market, source, growth, character, prices, target, rate, choice, least, benefits, quality, range, idea, rates, bid, terms, person, products, opportunity, businesses, building, figure, stock, policies, areas, ideas, compared, partner

both attract: look, looks, sight, particularly, young, confident, blonde, girl, sounds, extremely, looked, woman, girls, slim, seem, football, damned, looking, village, seemed, sound, still, intelligent, conventionally, enough, exceptionally, women, pictures, terribly, little, picture, smart, strikingly, equally, neat, female
To put this more simply, pretty functions primarily in journalism as an
adverbial, used to emphasise the seriousness of bad situations or the positivity of good ones. Meanwhile, it is perhaps a sad reflection on UK broadsheet
journalism that attractive is more closely associated with business deals than
with human appearance.
Looking to the final column in Table 9, we see that pretty and attractive
call a truce, and find common ground in strongly collocating with words dealing with physical good looks and sometimes intellectual calibre. Their shared
semantics of good looks is particularly attributed to the female stereotype: girl,
woman, women, little, neat, slim, young, blonde, confident. Both pretty and attractive are significantly often preceded by adverbs of degree such as: particularly, extremely, damned, exceptionally, strikingly, terribly.
6. Lexical repulsion between sense-related word-pairs: antonyms
Our attention to antonyms has so far been cursory: our intention is not to take
on a rigorous a-priori classification of antonymy, which would be beyond our
scope. Instead, we have side-stepped our main research thrust, and selected
a couple of antonyms, in the first instance, to gain a different perspective on
repulsion. Antonyms represent another subset of the lexicon which is likely to
generate restricted and surprising output.
Antonymy poses of course an even greater problem of interpretation than
synonymy, since our intuitions are weaker for this aspect of the thesaurus. We
know from the outset that, as with synonyms, antonymic pairs are only partially
antonymic, and will thus only repel certain of each other’s collocates. Furthermore, they will, in spite of their contrastive meanings, share some collocates.
6.1 Case studies on antonym pairs to demonstrate new repulsion methodology
For this case-study, we have selected just two antonymous pairs, black and white
exemplifying complementarity and hot and cold representing gradability.
6.1.1 Antonyms: ‘black’ and ‘white’
6.1.1.1 Selection
The words black and white are complementary antonyms, expressing absolutely
opposite concepts. We selected them as an instance of ‘core’ antonyms, that is
to say the most archetypal, in being conceptually the most neutral with respect to various
scales of degree, evaluation and so on (e.g. Carter 1987), and because they both
refer to colours, physical appearance and racial type, whose effect on the pattern
of repulsion we were curious to observe.
6.1.1.2 Identifying degrees of association between antonyms
As expected, these core antonyms generally seem to share more words in the
strongly collocating group and repel only a few selective words in their shared
collocate space. The pair are highly frequent and also fairly equi-frequent in
the corpus, black with 139,269 occurrences and white with 120,013, strongly
collocating with 1,232 words and repelling only 77 and 73 words respectively,
as seen in Table 10.
6.1.1.3 Linguistic analysis and interpretation
Table 11 shows that both black and white, as expected of core words in the language, do more than denote the obvious contrasts of colour, physical appearance and race. Many of the words repelled by black form established idiomatic
phrases with white such as the metaphorical lie, paper, goods, meat, house, palace, heat, spirit; and the literal: board, (great white) hope. Similarly, many words
repelled by white combine in metaphors with black, such as day, book, market,
the new, rain, mark. Through these runs a common thread that black repels a
positive and white a negative connotation.

Table 10. black and white repulsion cross-tabulation

black \ white         repulsion   weak collocation   strong collocation   Total
repulsion                     -                  -                   73      73
weak collocation              -                  -                 1513    1513
strong collocation           77               1477                 1232    2786
Total                        77               1477                 2818    4372
When white is viewed individually, a repulsion profile unfolds which again
reflects the conventionally negative or problematic associations with black, almost exclusively in the racial context. White repels issues referred to as experience, rights, struggle, and topics such as crime, criminal, employment, pensions.
On the other hand, white also repels a series of words which associate black positively with success: successful, talented, ambitious, leading, respected, popular,
and with successful social positions: leader, leading, governor, politics, shadow,
MPs, Tory. These are often regarded as newsworthy in being oldest, first, new.
On a different but also positive tack, white repels words which assume a connotation of sophistication in association with black as a colour term: lacquer,
bra, Lycra, limousine, moustache.

Table 11. Collocates exhibiting different textual uses of black and white

black repels (white attracts): government, working, health, building, house, hope, board, energy, skills, parliament, hopes, bed, alternative, pictures, goods, regime, rose, week’s, image, fears, room, transport, ruling, eat, buildings, heat, Australian, garden, spirit, winter, lie, dead, landscape, English, photographs, smile, tourists, beach, meat, dominated, clinical, clean, medium, rooms, Red, justice, tender, lies, palace, footage, nights, port, paper

white repels (black attracts): day, market, new, book, first, experience, rights, leader, art, Tory, history, Africa, successful, term, newspaper, presence, mood, radio, Britain’s, criminal, talent, politics, dark

white repels (continued), interleaved with both attract: writing, pepper, bra, rain, leather, artists, England’s, pudding, stuff, pitch, community, Lycra, struggle, people, urban, music, hair, king, crime, velvet, girl, leading, peppercorns, limousine, weekly, teenager, glossy, arts, smoke, lacquer, MPs, man, moustache, mark, shiny, gun, tie, affect, young, joke, trousers, referee, lace, jokes, matt, shadow, armbands, ambitious, thick, governor, box, respected, men, awareness, dress, pensions, ink, unemployment, satin, popular, magic, oldest, Africans
The word black, meanwhile, repels the major semantic areas associated
with white. These include race: regime, dominated, ruling, Australian, tourists;
but also the colour of décor: room, bed, winter, rose, dead, clinical; and the colour of landscape (through sun or possibly snow): garden, landscape, beach.
Absolute or canonical opposites, particularly the high-frequency, core pairings like black and white, are also most likely to share collocates, since they often
co-occur in fixed phrases and frameworks of the kind not black but white, both
black and white. These phrases can be metaphorical, as in in black and white (=
in writing) as well as literal, as in black and white (= monochromatic).
In the latter case, the phrasal nature of black and white leads it to combine
as a noun-phrase modifier with words such as pictures, image, photos, montage.
When observing repulsion over a span of 1, this can throw the scores out, since
the phrases generate a strong collocation score with white but a falsely high
repulsion score with black.
The current output also cannot show that black repels classes of multiword phrase such as government/parliamentary/transport white paper. Clearly,
all this points to the necessity to extend this study to units above individual
word level.
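One way of picturing the extension to multi-word units is to join known fixed phrases into single tokens before span-1 neighbours are counted, so that a following noun is credited to the phrase rather than to its final word alone. The following is a minimal sketch under that assumption; it is not the procedure used in the study, and the phrase list, the example sentence and the function names `merge_phrases` and `span1_neighbours` are invented for illustration.

```python
from collections import Counter

def merge_phrases(tokens, phrases):
    """Join each listed multi-word phrase into one token, longest match first."""
    ordered = sorted((tuple(p.split()) for p in phrases), key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for phrase in ordered:
            window = tuple(t.lower() for t in tokens[i:i + len(phrase)])
            if window == phrase:
                merged.append("_".join(phrase))
                i += len(phrase)
                break
        else:
            # No phrase starts at this position: keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged

def span1_neighbours(tokens, node):
    """Count the words immediately to the left or right of `node` (span of 1)."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token.lower() == node:
            if i > 0:
                counts[tokens[i - 1].lower()] += 1
            if i + 1 < len(tokens):
                counts[tokens[i + 1].lower()] += 1
    return counts

text = "old black and white pictures hung beside a white paper".split()
print(span1_neighbours(text, "white"))             # 'pictures' is credited to white alone
merged = merge_phrases(text, ["black and white", "white paper"])
print(merged)
print(span1_neighbours(merged, "black_and_white"))  # 'pictures' now belongs to the phrase
```

With the phrases merged, pictures no longer counts as a span-1 neighbour of white on its own, and white paper is available as a unit whose left-hand collocates (government, parliamentary, transport) could be examined in the same way.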
6..2
Antonyms: ‘hot’ and‘ cold’
6..2. Selection
The terms hot and cold are classic, core gradable antonyms, sitting at two ends
of the spectrum but different from black and white in being (more readily)
amenable to modification, by adverbs of degree such as very. They were selected for this, and also because they are almost equally frequent in the corpus
(44,674 and 45,672 respectively).
6.1.2.2 Identifying degrees of association between antonyms
Table 12 shows that hot and cold share many strong collocates (252) and repel
only a few collocates of the other (28 and 38 respectively), which
probably indicates that they share several functions and contexts purely by virtue of being core items.
Table 12. hot and cold repulsion cross-tabulation

hot \ cold            repulsion   weak collocation   strong collocation   Total
repulsion                     -                  -                   28      28
weak collocation              -                  -                  436     436
strong collocation           38                564                  252     854
Total                        38                564                  716    1318
6.1.2.3 Linguistic analysis and interpretation
Table 13 shows the words that both hot and cold attract and repel. To some
extent, we see similar patterns of repulsion here to those for black and white.
These core, multi-functional words override any notion of straightforward
antonymy where they each combine strongly in fixed idiomatic expressions.
Thus, hot repels the favoured pairings of cold as follows: cold meaning “unsolicited” in call, calling; cold meaning “pitiless” as with heart, truth, reality, eye;
cold meaning “not warm”, as with front, dead; and cold meaning “viral infection”, as with caught, common, bad. There is also a series of verbs which hot
repels, which clearly combine phrasally with cold: left, goes, went, gone. Conversely, cold repels the collocates combining significantly with hot in many,
more widely semantically-distributed idiomatic combinations as follows: (illegal) money, (immediate) line, (high-achieving) shot, (likely to succeed) favourite, (responsibility-carrying) seats, (strong) competition, (vibrant, sexy)
red, (highly-charged) dispute.
The significantly repelling items which are not idiomatically fixed with
these two antonyms confirm but also enlarge the picture. There are many food
and weather terms in their output: cold repels July, August; hot repels December, January, February, March. There is no doubt that hot, in spite of its equifrequency with cold, is the favoured term in journalism, and so cold repels the
many different collocates of hot which denote topical matters, such as issue,
issues, news, subject, debates; and which name fashionable topics: crime, sex,
band, books, fashion, gay, takeover; as well as journalistic adjectives and adverbs
such as currently, quick, latest, new, young, small. Cold also repels collocates
of hot in its “sexy” sense: in addition to those collocates in fixed expressions
above, we find weekend, video.

Table 13. Collocates exhibiting different textual uses of hot and cold

hot repels (cold attracts): left, went, old, front, gone, common, bad, March, dead, cash, economic, clear, goes, post, call, February, heart, truth, caught, peace, January, December, voters, north, eye, calling, reality, hard

cold repels (hot attracts): money, new, young, line, issue, news, shot, race, keep, subject, issues, debate, books, crime, August, favourite, currently, followed, date, fashion, July, band, became, video, natural, sex, pace, comes, seats, latest, gay, competition, weekend, small, takeover, red, dispute, property

both attract: water, spots, air, weather, potato, summer, oven, springs, bath, dry, spot, drinks, meals, long, milk, baths, showers, hot, shower, breath, day, iron, food, extremely, plate, blowing, unusually, serve, sweet, climate, towels, spring, uncomfortably, really, served, dishes, toast, add, blow, white, exceptionally, cup
On looking at our entire list of collocates for these words, those displaying
weak repulsion/collocation included, we discover that there are many further
words which do not quite make the threshold for the ‘cold repels…’ group but
which are, nevertheless, semantically interesting in being phrasal completives
of hot: chocolate, tub, seat, pursuit, topic, dog, cakes, piping, coals, tip, dinners,
chilli, etc.
The advantage gained from studying antonyms as well as synonyms is that
it contributes a new perspective to our study of repulsion while incidentally
furthering our understanding of the nature of sense relations. We shall thus
continue with the investigation of other sense relations.
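The counts reported for the four case studies can be set side by side to see, for each pair, whether the shared strong collocates outnumber the words that either member repels. The snippet below is only a convenience for that comparison: the figures are read off the cross-tabulations in Tables 6, 8, 10 and 12, while the data structure `PAIRS` and the function name `shares_more_than_it_repels` are invented for the illustration and imply no particular statistical test.

```python
# pair -> (shared strong collocates, the two repulsion counts for the pair),
# as read off the cross-tabulations in Tables 6, 8, 10 and 12.
PAIRS = {
    ("argue", "discuss"): (14, (58, 19)),
    ("attractive", "pretty"): (140, (43, 131)),
    ("black", "white"): (1232, (73, 77)),
    ("hot", "cold"): (252, (28, 38)),
}

def shares_more_than_it_repels(shared, repelled):
    """True when the shared strong collocates outnumber either repulsion count."""
    return shared > max(repelled)

for (first, second), (shared, repelled) in PAIRS.items():
    if shares_more_than_it_repels(shared, repelled):
        verdict = "shares more than it repels"
    else:
        verdict = "repels more than it shares"
    print(f"{first}/{second}: shared={shared}, repelled={repelled} -> {verdict}")
```

On these figures only argue and discuss repel more than they share, which matches the observations made in the individual case studies.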
7. Concluding remarks
At the outset, we knew that synonymy and antonymy, for various well-rehearsed reasons (e.g. Palmer 1981), are only partial phenomena, and that synonyms differ in meaning according to the particular functions they each fulfil,
and the contexts in which they typically occur, just as antonyms do something
like the reverse of this.
In our studies so far, we have discovered that where synonyms differ along
such parameters, they actively repel certain of each other’s collocates. A further
finding is that the differences between two synonyms which cannot readily be
ascertained by consulting the mental lexicon, appearing just to be arbitrary and
conventional, show up robustly in our repulsion output (bearing in mind that
this is based on journalistic corpus data) as being systematic and explicable
differences in terms of register or style, but above all, semantic.
Of course, it should be borne in mind that our output lists contain actively
repelled words identified specifically according to measures of lexical repulsion, but this does not prevent some of these instances of repulsion from being
further explicable in terms of phonological, grammatical or other co-occurrence constraints.
From a language-pedagogic point of view, if lexical repulsion is not just
an arbitrary matter of convention but is explicable in terms of semantic and
other qualities which in principle should be accessible to us within our mental
lexicon, then this finding is rather fundamental — it indicates that what a language learner ideally needs to learn — and to be taught — is a finer awareness
of these less intuitively accessible features of words as they are used in text.
Unfortunately, this awareness does not seem to be intuitively accessible even
to native speaker linguists. One practical application of our research could accordingly be in the generation of inventories of lexical repulsion lists: lists of
words which cannot normally combine naturally with a given headword, for
the benefit of language learners and non-native-speakers wishing to optimise
the quality of their textual composition. We are currently using a corpus of UK
newspaper texts for analysis but our approach will be extended to other text
types; it will of course also be interesting to investigate repulsion patterns in
spoken language data.
It is hoped that a new ‘repulsion’ measure will emerge from the research,
which complements and supplements the use of existing collocation measures
in another area of application, NLP. For instance, repulsion scores could be
used to advance the field of spell-checking, by identifying well-formed but
contextually inappropriately spelled or selected words on the basis of the aberrant presence of dis-preferred or repelled collocates.
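As a rough indication of how such a check might look, the sketch below flags adjacent word pairs in which one word appears on the other's repulsion list. It is a toy under stated assumptions rather than a description of any existing spell-checker: the function name `flag_repelled` is hypothetical, only a span of 1 is considered, and the small list is built from two entries in the ‘pretty repels’ column of Table 9.

```python
def flag_repelled(tokens, repulsion_lists):
    """Return (position, headword, neighbour) triples where adjacent words repel.

    `repulsion_lists` maps a headword to the set of words it is taken to
    repel at a span of 1; the contents used below are illustrative only.
    """
    flags = []
    for i in range(len(tokens) - 1):
        left, right = tokens[i].lower(), tokens[i + 1].lower()
        if right in repulsion_lists.get(left, set()):
            flags.append((i, left, right))
        elif left in repulsion_lists.get(right, set()):
            flags.append((i + 1, right, left))
    return flags

# Toy list taken from two entries in the 'pretty repels' column of Table 9.
REPULSION = {"pretty": {"investment", "offer"}}

sentence = "The bank made a pretty offer to shareholders".split()
print(flag_repelled(sentence, REPULSION))  # [(4, 'pretty', 'offer')]
```

A flagged pair would be a prompt for the writer to reconsider the word choice, not proof of an error, since repelled combinations can still occur legitimately.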
If concluding remarks are ordered from specific to general, then perhaps
the most general conclusion we can offer is the observation that corpus linguistics is vital in the search for repulsion. We may know of words within our own
mental lexicon which we try to avoid, personal shibboleths, just as we prefer
some words. But whereas we can perceive lexical collocation (or attraction)
each time we read or hear English text, we cannot as individuals observe the
phenomenon of lexical repulsion, except through a sizeable cross-section of
the shared repository of language that a large text corpus represents.
We have given a taste of our early findings concerning the phenomenon of
repulsion as a possible force in text, and in particular lexical repulsion, and this
specifically between synonyms and antonyms. Our plan is to move on to investigate the following areas: modified statistical thresholds and measures, repulsion spans, directionality of repulsion in fixed phrases, as well as text at phrase
level, repulsion across sentence boundaries, and the effect of case sensitivity.
We have found that semantic distinctions play a major role in lexical repulsion, but we also plan a small-scale study of ‘semantic’ repulsion per se; that is
to say, to study word pairs which are semantically incompatible, such as plug
and horse. Of course, these words could be identified manually, by semantic
componential analysis, but we are interested in discovering how far we can
distinguish them by collocational means.
Acknowledgement
We acknowledge with gratitude the financial support for the Repulsion Project awarded by
the EPSRC (grant no. EP/D502551/1). The authors would also like to thank the two anonymous reviewers and the editors of this issue for their comments on the various versions of
this manuscript.
Notes
1. The metaphor of ‘attraction’ comes from the sciences, where it refers to a type of magnetic
relationship, that of being drawn together, that arises between electrically charged particles
that are in motion. It is thus a convenient metaphor for the characteristic preferences for cooccurrence that are well-known to exist between two (or more) particular words in text.
2. The Research and Development Unit for English Studies (http://rdues.uce.ac.uk). This
Unit began at the University of Birmingham in 1988, moved to the University of Liverpool
in 1994, and returned to UCE, Birmingham, in 2004. The Director, Antoinette Renouf, and
the statistician, Paul Davies, have been in place since the outset; the remaining research
team has changed in personnel through the years. Currently, it is composed of software
experts Jayeeta Banerjee, Andrew Kehoe and Matt Gee. The Unit has primarily engaged in
English corpus linguistic description which has led to the creation of automated and semiautomated software systems, which in turn extract various types of knowledge about lexical and textual meaning, and language change, from large textual databases, primarily (for
convenience) broadsheet journalism.
3. Beeferman et al. (1997), in ‘A Model of Lexical Attraction and Repulsion’, in fact only
discusses the ‘lexical exclusion principle’ — specifically, that exact word repetition occurs
less frequently within shorter text spans.
4. This ‘indifference’ is a complex matter, which we cannot go into fully here. We use it according to a minimal definition, simply for the case where words do not significantly attract
or repel each other. A more comprehensive definition would see it as a relative, not absolute,
term, both in degree and in function. The construction of a single article or text on a single
topic ultimately requires that all words in it are selected interdependently. But here, a whole
network of larger-scale relationships involving long-range collocation is probably involved.
5. The shift from the Independent to the Guardian took place in 2000, when the distribution
of Independent data moved out of the hands of the Financial Times and into the Lexis-Nexis
database, whence it is only retrievable at commercial rates. The Guardian, at the moment of
writing, still allows a free download to be made.
6. Though we are viewing this interaction collocationally, there is a grammatical point here:
discuss and argue are base verb forms, and each form of the lexeme will only interact positively or negatively with word forms which are grammatically compatible — here, we find
that these are plural nouns.
References
Andrews, A. (1990). Unification and Morphological Blocking. Natural Language & Linguistic Theory, 8, 507–57.
Aronoff, M. (1976). Word Formation in Generative Grammar. MIT Press.
Beeferman, D., Berger, A. & Lafferty, J. (1997). A Model of Lexical Attraction and Repulsion.
In Proceedings of the 35th Annual Meeting of the ACL and 8th Conference of the EACL
(pp. 373–380). Association for Computational Linguistics, NJ, USA.
Blache P., Guenot, M. L. & Van Rullen, T. (2003). A corpus-based technique for grammar
development. In D. Archer, P. Rayson, A. Wilson & T. McEnery (Eds.), Proceedings of
Corpus Linguistics 2003 (pp. 123–131). University of Lancaster.
Bonci, A. (2004). Collocations in Italian L2. A case control study. PhD Thesis, Royal Holloway
College, University of London.
Carter, R. (1987). Is there a Core Vocabulary? Some Implications for Language Teaching.
Applied Linguistics, 8 (2), 178–193. Oxford: Oxford University Press.
Collins Cobuild English Language Dictionary (2006). (5th edition). Sinclair, J. et al. (Eds.).
London/Glasgow: William Collins Sons & Co. Ltd.
Firth, J. R. (1957). Modes of meaning. In J. R. Firth: Papers in Linguistics 1934–1951 (pp.
190–215). London: Oxford University Press.
Kim, D. W. (1998). Finding the Reader in Literary Computing. In R. Woolridge, W. McCarty
& W. Winder (Eds.), CH Working Papers series, A.11. Jointly publ. with TEXT Technology, 8.1. Wright State University. Available at: http://www.chass.utoronto.ca/epc/chwp/
kim/
Kjellmer, G. (2003). Synonymy and corpus work: on almost and nearly. ICAME Journal, 27,
19–27.
Kučera, H. & Francis, W. N. (1967). Computational Analysis of Present-Day American English. Providence: Brown University Press.
Laviosa-Braithwaite, S. (1996). Comparable Corpora: Towards a Corpus Linguistic Methodology for the Empirical Study of Translation. In M. Thelen & B. Lewandowska-Tomaszczyk (Eds), Translation and Meaning, Part 3 (pp.153–163). Maastricht.
Pacey, M., Collier, A. J. & Renouf, A. J. (1998). Refining the Automatic Identification of Conceptual Relations in Large-scale Corpora. In E. Charniak (Ed.), Proceedings of the Sixth
Workshop on Very Large Corpora, Montreal, 15–16 August 1998 (pp. 76–84). COLINGACL.
Palmer, F. R. (1981). Semantics. Cambridge: Cambridge University Press.
Renouf, A. J. (2007). Corpus Development 25 years on: from super-corpus to cyber-corpus.
In R. Facchinetti (Ed.), Corpus Linguistics twenty-five years on: Selected papers of the
twenty-fifth International Conference on English Language Research on Computerised
Corpora, Verona, May 2004 (pp. 27–49). Atlanta/New York: Rodopi.
Renouf, A. J. (2001). Lexical Signals of Word Relations. In M. Scott & G. Thompson (Eds.),
Patterns of Text: in honour of Michael Hoey (pp. 35–54). Amsterdam/ Philadelphia:
John Benjamins Publishing Co.
Renouf, A. J. (1996). The ACRONYM Project: Discovering the Textual Thesaurus. In I. Lancashire, C. Meyer & C. Percy (Eds.), Papers from English Language Research on Computerized Corpora (ICAME 16) (pp. 171–187). Amsterdam/ New York: Rodopi.
Renouf, A. J. & Banerjee, J. (Forthcoming). The Phenomenon of Repulsion in text. In C.
Leclère et al. (Eds.), Special edition of Proceedings of 25th International Conference on
Lexis and Grammar, Palermo, Sicily, Sept. 6–10, 2006, Lingvisticae Investigationes,
Amsterdam: John Benjamins.
Renouf, A. J. & Banerjee, J. (2007). The Search for Repulsion: a new corpus analytical approach. In eVARIENG: Methodological Interfaces as online Proceedings of 27th International ICAME Conference, May 2006, Hanasaari, Finland. Available at: http://www.
helsinki.fi/varieng/journal/volumes/index.html
Resnik, P. (1997). Selectional Preference and Sense Disambiguation. Presented at the ACL
SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How? April
4–5, 1997, Washington, D.C.
Suzuki, S. (1998). A Typological Investigation of Dissimilation. Unpublished doctoral dissertation, University of Arizona.
Yip, M. (1998). Identity avoidance in phonology and morphology. In S. LaPointe, D. Brentari
& P. Farrell (Eds.), Morphology and its Relation to Phonology and Syntax (pp. 216–246).
Stanford, CA: CSLI Publications.
Authors’ address:
Antoinette Renouf and Jayeeta Banerjee
Research and Development Unit for English Studies,
School of English, University of Central England in Birmingham
Perry Barr, Birmingham B42 2SU
[email protected], [email protected]