How good is a decent student? - Create and Use Your home

How good is a decent student?
An experimental investigation on gradable adjectives and scalar implicatures
March 15th, 2012
Andrea Beltrama
University of Chicago
[email protected]
1. Introduction
Scalar implicatures represent a neat example of how pragmatic inferences can affect the
semantic interpretation of an utterance. However, the way in which they are licensed by different kinds
of scale structures is not fully understood. In this study, we designed an experiment to investigate how
these inferences are computed with respect to adjective scales. In particular, we wanted to look at two
factors potentially affecting their calculation: the strength of different adjectives on the same scale and
the effect of contextual information on how this strength was perceived. Our results are partially
surprising, in that only the weakest adjectives on the scale (i.e. decent) systematically triggered
negative inferences to higher value on the scale. Adjectives with middle strength (i.e. good), instead,
were always processed exactly in the same way as the strongest ones (i.e. excellent), even when the
context of utterance was manipulated so as to underline the under-informativity of the former with
respect to the latter. We explain this results by proposing that adjectives like excellent (defined as
“strong adjectives” in the paper) should be analyzed as non-gradable, and therefore share a scale with
the weaker adjectives only from a pragmatic perspective, but not from a semantic one. We suggest that
scalar alternatives, as revealed also by several patterns independent from the experiment, are more
easily evoked for adjectives which take degree arguments and are actually part of a semantic scale. For
this reason, implicatures from good to excellent, while in principle available, were not actually
computed in the experimental task. The paper is structured as follows. In section 2 we provide some
background on adjectives and implicatures. In section 3 we discuss the experimental design and the
significance of the results. In section 4 we outline several observations on the semantic and pragmatic
behavior of the three categories of adjectives used in the experiment, focusing on the non gradable
nature of strong adjectives. In section 5 we propose an account for the experimental results based on
the distinction between semantic and pragmatic scales.
2. Background
2.1 Implicatures
According to Grice, the interpretation of a sentence must take not only the truth-conditional content
(what is said), but also what is implicated, that is, the part of speaker’s meaning that must be inferred
from the context and from particular assumptions about the communicative situation. A significant part
of what is implicated is represented by conversational implicatures, that is, those inferences triggered
by the assumption that the speaker is being cooperative and is respecting the maxims of quantity,
quality, relevance and manner. An example is provided in (1)
(1) Some people left the party
In the sentence, the content of what is said is that at least (and possibly all) some people left the party.
What is implicated, however, corresponds to a stricter interpretation, paraphrasable as “some people
left the party, but some people didn’t”. The implicature that not all people left is based on the
assumption that the speaker is being cooperative and is respecting the conversational maxims,
including the Quantity Maxim (“make you contribution as informative as required” 1967:26). As a
consequence, the speaker is expected not to withhold information that she would be in a position to
provide. For example, if she knows that all people left the party, she is expected to say that all people
left the party. By the same token, if the speaker utters (1), it is inferred that she cannot truthfully say
that all people left the party, and therefore that not all people left the party. Hence, the more restrictive
interpretation of the quantifier, even though, from the point of view of what is said, some is consistent
with all.
Implicatures triggered by the Quantity Maxim, as those in (1), are also known as scalar
implicatures and have received a significant amount of attention in the literature. A particularly
influential model has been proposed by Horn (1984). In his view, scalar implicatures are triggered by
entailment scales, total orderings of terms in which each expression entails the weaker terms and
implicates the negation of the stronger terms. The amount of informativeness/strength of an expression
translates into its position on the scale. In the example above, the relevant entailment scale triggering
the implicature would be <some, many, all>. Each term on the scale entails the terms to its right and
implicates the negation of the terms to its left.
Prominent examples of such scales in natural language are conjunctions (<and, or>), adverbs of
frequency (<sometimes, often, always>), expressions of probability (<possible, likely, impossible>)
and, crucially, also some gradable adjectives (<warm, hot>). According to Horn (2005), in fact,
adjective pairs such as <warm, hot> are governed by entailment relations, and as such give rise to
scalar inferences.
(2) It’s warm
While what is said is that “it is warm and possibly hot”, what is implicated is that it is warm and not
hot1. Other authors, however, pointed out that entailment-based orderings are not a necessary condition
for scalar inferences. Hirschberg (1991), in particular, suggests that scalar inferences might be licensed
by scales that do not exhibit any semantic relations between their members, but exist only depending
on a specific context, as in (3). In this example, there is no independent lexical entailment relation
between Chicago and Denver. The scalar ordering in which Chicago stands in a weaker position with
respect to Denver is merely based on the specific context in which Robin is travelling from east to
west.
(3) Robin has made it to Chicago => Robin hasn’t made it to Denver
(context: Robin is travelling westward from New York to California)
The difference between semantic and pragmatic orderings will be particularly relevant in the discussion
of our experiment, in which we are going to suggest that adjective scales are also based on pragmatic
scales, and not only on semantic ones.
Scalar implicatures have also been widely investigate in language processing and language
acquisition. There is consistent experimental evidence (Noveck and Posada 2003, Bott and Noveck
2004) that the computation of implicatures requires longer processing times for the utterances on which
1
“If it’s hot, it’s (a fortiori) warm, but if I know it’s hot, the assertion that it’s warm can be echoed and rejected as (not false
but) insufficiently informative”
the implicatures have been drawn. By the same token, the performance of children in computing
implicatures has proven generally poor compared to adults (Noveck 2001, Papafragou and Musolino
2003, Pouscoulous et a. 2007). This fact has been explained in terms of the scarcity of cognitive
resources available to first language learners when it comes to integrate what is said with what is
implicated. In general, it has been demonstrated that scalar inferences translate into an extra processing
effort for speakers.
2.2. Adjectives
Although Horn suggested that adjective scales operate through the very same mechanism of other
implicature triggers, the bulk of the relevant work on these inferences has been carried out on
quantifiers and conjunctions. Adjective scales, nevertheless constitute an interesting object of
investigation, as they are often formed by multiple terms distinguished by no sharp boundaries. This
fact offers the possibility of testing how different degrees of strength with fuzzy boundaries affect the
computation of scalar inferences, an issue which cannot be investigated with other, less vague types of
scales (i.e., quantifiers and conjunctions). This is precisely the general question underlying the
experiment discussed in this paper. Before addressing it in more detail, however, some background
discussion on the semantics of gradable adjectives is in order to frame the ensuing investigation.
2.2.1 Adjectives and degrees
It is agreed upon that gradable adjectives require the introduction of type d (degree) in the
semantic ontology, together with the traditional other types (individuals, truth values and so forth)2.
Following Kennedy (2007)3, we assume (i) that they denote relations between individuals and abstract
representations of measurement, or degrees (semantic type e,d) and (ii) that a set of degrees totally
ordered with respect to some dimension constitutes a scale. Dimensions along which adjective scales
are commonly ordered include size (i.e. big), temperature (i.e warm), height (i.e. tall), intensity (i.e.
strong) and many others. The measure function of type <e,d> is then converted into a property of
individuals of type <e,t> by means of degree morphology, which imposes a requirement (depending on
the specific morpheme) on the degree derived by applying the adjective to its argument. While degree
morphology is often overt (modifiers like very, comparative morphemes like more), this approach
2
3
For a semantic analysis of gradability that does not make use of degrees, see Klein 1982.
There have been other slightly different implementations, but the core assumptions are agreed upon.
stipulates a covert, phonologically unrealized morpheme for the positive form4. The function of this
covert morpheme is to introduce a contextually appropriate standard that needs to be exceeded by the
degree of the property at stake, in order for the adjective to truthfully apply. For instance, the adjective
tall denotes the property of having a degree of tallness that is at least as great as the prevailing standard
given5. A denotation of the silent positive morpheme (“pos”) and its composition with a gradable
adjective is offered below in (4), where “g” is a gradable adjective and “s” stands for the standard
function which returns the degree to be equalled or exceeded in order for the adjective to be true.
(4) [|pos|]= λg<d,t>λx<e>. g(x)> s(g)
[| [DegP pos [AP tall]] |] = λx tall(x) > s(tall)
Under these assumptions, and going back to the issue of entailment scales and implicatures,
there seems to be a tight connection between the notion of strength and the notion of degrees. If a scale
is a total ordering of degrees, then the strength of an expression boils down to how high the degree of
the property encoded by the adjective must be in order for that adjective to truthfully apply. The higher
the degree, the stronger the expression on the scale. If we take into consideration expressions encoding
different degrees along the dimension of goodness, for example, the most intuitive way of representing
the relevant scale is in (4)
(5)
2.2.2 Adjectives and vagueness
Adjective scales like the one above display two distinct properties which have been identified in
the semantic literature as indicators of vagueness. First, they have truth conditions which may vary
according to the context. The same performance in a determinate activity, for example, can be good or
excellent if performed by a beginner, but not so good if performed by a person with long experience in
that activity. Quantifiers, instead, are less vague in this sense: given a set of individuals, the distinction
4
5
For a slightly different analysis, see Rett 2007.
Considering only relative gradable adjectives and not adjectives with an absolute standard
between all and many will always be drawn in the same way, regardless of whether these individuals
are humans, non humans or inanimate entities, and regardless of the context. Second, there are infinite
borderline cases between the meaning of these expressions. In many situations, distinguishing between
a good and an excellent performance is simply impossible6. Once again, quantifiers do not exhibit this
property, as the boundary between many and all is always clear-cut and discrete, with no in-between
cases.
In addition, we note that the examples of adjective scales listed above are part of the category of
relative adjectives (Kennedy and McNally 2005). As such, they are associated with scales that are
open-ended at both their extremes (tall, expensive, warm, good are all relevant examples).7 As a
consequence of these properties, not only do they display vague boundaries between the terms on the
scale, but they also display vagueness at the boundaries of the scale itself. Vagueness at the borders of
the scale has sparked two different issues of investigation in the semantic literature: the issue of the
indifference gap for vagueness at the lower end of the scale, and the issue of adjectival extremeness for
the upper portion of the scale.
As for vagueness at the lower end, it has been pointed out (Sapir 1954) that the lower portion of
an adjective scale shades into a “zone of indifference” which separates the scale from the scale with
opposite polarity. For instance, the zone of indifference relative to good will comprise the degrees to
which neither good nor bad applies. Following this intuition, Krifka (2007) pointed out that it is
possible to find expressions that intuitively index a “mild degree” of the relevant property of the scale,
and yet are at least partially comprised in this vagueness gap. A particular class of adjectives displaying
this property, which will be henceforth referred to as weak adjectives, seems to hinge around this lower
portion of the scale which starts overlapping with the indifference area. Examples of this category
include adjectives like decent, acceptable, edible, pretty, and others. As will be shown in the second
part of the paper, it is this category of adjectives that systematically trigger implicatures with respect to
stronger expression on the same ordering.
6
In order to handle the challenged posed by vagueness, Kennedy (2007) propose that “the objects that the positive form is
true of ‘stand out’ in the context of utterance, relative to the kind of measurement that the adjective encodes”. This proposal
limits the predication of an adjective to context in which the relevant property clearly applies. However, while this solution
works for the denotation of a single expression, it does not eliminate vagueness when it comes to consider an adjective scale
in its entirety: the boundaries between the various expressions are fuzzy in nature, and there will always be countless
borderline cases between them.
7
These scales contrast with partially closed scales, which do have either a fixed endpoint (i.e. full) or a fixed starting point
(i.e. dirty), or with totally closed scales (close), which are bounded on both sides. For all these scales, the standard is not
variable, but is associated with the relevant boundary of the scale (Kennedy and McNally 2005).
Regarding vagueness at the upper end, it is well known that there is no ontological boundary to
scales ordered along dimensions such as beauty, tallness, price and the like. The question, however, is
whether there is at least a lexical maximum for such orderings. This is a particularly important question
in connection to the investigation of implicatures. Expressions atop of a scale, in fact, are more
informative than any other term on the scale and, conversely, lower terms on the scale are likely to be
perceived as underinformative with respect to the scalar maximums. As a consequence, scalar
maximums represent the alternatives that can be pragmatically negated via implicature whenever any
of the lower scalar terms is used. Various authors, including Paradis (2001) and Morzycki (2009)
argued that there is a natural class of adjectives that serve as a maximum. Examples of these adjectives,
also referred to as extreme adjectives, include excellent, gorgeous, huge, sweltering, ecstatic, brilliant.
However, their status with respect to the rest of the scale is not entirely clear. It has been pointed out
that these expressions display properties that are not typically associated with gradable adjectives. For
instance, they resist comparative constructions and intensification with very (a thorough set of
examples will be provided in section 4.1). Some authors (Morcyzki 2009 and 2010) treated extreme
adjectives as gradable and suggested that they exhibit these unexpected properties because they index
degrees that go “off the scale” and too high to be salient. Other authors, instead, (Kennedy, p.c.)
proposed that such expressions might not even considered to be gradable, and therefore are not part of
any scale. Assessing how the different nature of extreme adjectives might interact with their scalar
position and their accessibility for scalar inferences is a crucial question, which will be thoroughly
investigated in our experiment.
2.3 Adjectives and implicatures
Vagueness as defined here poses serious problems for investigating these inferences within the
category. Since the perception of differences in strength is crucial to license implicature, the lack of a
clear boundary between the meaning of stronger and weaker lexical items might make underinformativity much harder to assess, complicating the computation of the inferences. Moreover, fuzzy
boundaries pose a practical problem for designing experiments, both in terms of creating suitable
stimuli and in terms of creating contexts in which vagueness does not interfere with the inference. In
light of these features, adjectives have been largely understudied with respect to the computation of
scalar implicatures. The main experimental contribution to the topic has been provided by Doran et al.
(2008). In this study, the authors wanted to investigate whether different kinds of scales – cardinal
numbers, quantifiers, ranked orderings and indeed gradable adjectives - would be processed in the same
way with respect to scalar inferences. In particular, they were interested into how often implicatures
would “penetrate” into the truth-conditional interpretation of the sentences. Their overarching goal was
to verify whether a unified theory of scalar implicatures could be supported by processing evidence.
Stimuli for each of these scales were presented in various conditions, differing as to whether alternative
values on the scale were evoked or not8. As far as adjectives are concerned, the tested scales consisted
of triplets of terms of different strength, such as <huge, big, average>. The subjects were asked to read
dialogues like the one in (6), in which Irene asks Sam to describe her a fact (without evoking
alternative values to the SI trigger, condition (a), or with alternatives evoked, condition (b)) and Sam
utters an underinformative description of the fact containing a SI trigger. The FACT portrays an
“extreme” situation, which is supposed to require as strong an expression as possible to be adequately
described. To clarify, in the example provided below the best descriptor of the fact is supposed to be
huge (the highest on the scale), while Sam describes it as big (the middle one on the scale).
(6) <huge, big, average>
Irene: a) What size is Jeremy? / b) Is Jeremy average, big or huge?
Sam: He’s big.
FACT: Jeremy can’t fit in an airplane seat.
Is Sam telling the true?
No: SI (big --> ¬ huge) / Yes SI (big ≠> ¬ huge)
After the each stimulus, the participant is required to perform a Truth Value Judgment Task by
answering the question “Is Sam telling the truth?”. Given that the description is supposed to be
underinformative, responding “no” (i.e. saying that Sam is telling the false) would signal that the
implicature, and more generally what is implicated, did play a crucial role in the interpretation of the
sentence, while responding “yes” would mean that the sentence was only interpreted on the basis of
what is said. The results show that for adjectives SIs were calculated significantly less frequently than
for other scales. Moreover, the rate of SIs associated with adjectives significantly increased when
alternative values were evoked (version (b) of Irene’s question), showing that adjectives were the
trigger with the highest sensitivity to the discourse condition. The authors explain these findings by
8
A further manipulation was the perspective to be taken for answering the True/False question. Participants had to answer
by taking their own point of view or the point of view of an imaginary character (Literal Lucy) that was famous for
interpreting everything in a literal way. This further manipulation is not discussed here.
suggesting that adjective scales, compared to other entailment scales, are domain-dependent and have
fuzzy boundaries, and are therefore less likely to trigger scalar inferences.
While this conclusion should not be too surprising, given the peculiarities of adjective scales
discussed above, a particular fact about this experimental design is worth of notice here. Although the
stimuli always consisted of adjectives scales with three different adjectives, the description that had to
be judged by the subjects included only adjectives in the middle of the scale. This means that
implicatures triggered by the weakest term on the scale were not considered. For instance, given the
scale <average, big, huge>, the tested inference was “big --> not huge”, and never “average --> not big/
not huge”. This fact gives rise to two different concerns in relation to Doran et al’s conclusions. The
first concern has to do with the extent to which these results can be generalized. Given the a way an
entailment scale works, and given that the authors seem to suggest that <average, big, huge> is an
example of such scale9, both the lower and middle term are legitimate triggers, as they are both entailed
by the strongest term (huge). Therefore, an exhaustive investigation of how truth conditional and
implicated content interact for this scale should have included data for the low-most trigger too. What
Doran et al. showed, instead, is only limited to implicatures triggered by big and other middle
adjectives of the scale. The second concern is connected to the methodology, and specifically to the
presence of a condition in which alternative values on the scale were explicitly evoked (Is this big? vs
Is this average, big or huge? for describing a passenger occupying two seats on a plane). This
manipulation turned out to significantly boost the rate of NO responses (indicators of scalar
implicatures, according to the design), but raises the issue that evoked alternatives might have an effect
regardless of the fact that they are on the same scale or not. For instance, the fact that huge makes big
inadequate for describing a certain fact does not mean that it is a stronger term on the same scale, but
just that it is a better descriptor for the scenario, regardless of whether it is a scalar alternative to the
other one or not. As a consequence, there is no certainty that the increase of NO responses is actually
due to an increase in scalar implicature calculations, as the authors claim.
In summary, the results of Doran et al.’s investigation, while having the undisputable merit of
showing that adjective scales do not pattern with other kinds of scale with respect to the calculation of
implicatures, warrant a careful reconsideration. On the one hand, not testing the weakest terms on the
scale undermines the universality of their claims; on the other hand having explicit alternatives might
bring about effects that are not necessarily related to the computation of scalar implicatures. In our own
9
It seems at least doubtful that average could be considered to be on the same scale as big or huge. If anything, average
already entails the negation of big/huge.
investigation, which we are now turning to discuss, we avoided these issues by testing scales with
respect to all the available triggers and by deciding not to explicitly evoke any alternative in the
experimental task.
3. The experiment
3.1 Questions
As can be recalled from the previous experiment, a peculiarity of adjective scales is that they
are often composed of multiple terms with increasing strength and ordered along a continuum. As such,
they contain at least two distinct triggers for scalar inferences. Given a three-place scale such as
<decent, good, excellent>, the two potentially available inferences would be in (7).
(7) (i) decent (weakest trigger) --> not good/not excellent
(ii) good (middle trigger) --> not excellent
This situation seems to be characteristic to adjectives, as other scales either do not have multiple terms
(i.e., conjunctions) or do not exhibit such a continuum (i.e, quantifiers). In this regard, the question
informing our study concerns the interaction between vagueness, the strength continuum and the actual
computation of the implicatures. In particular, we wanted to verify whether there is a proportional
correspondence between the strength continuum and the amount of implicatures actually computed on
a certain scale. In order to look at this, we manipulated the strength of the triggers in the experiment to
assess whether weak adjectives, having the least amount of strength and informativity, trigger
implicatures significantly more often than middle adjectives, or, vice versa, there is no significant
different between the two triggers.
The other goal of this study is to assess the interaction between strength perception (and
therefore implicature calculation) and the context-sensitivity of adjectives, which is a distinctive feature
of the category. As can be recalled, Doran et al. showed that participants were more likely to infer not
huge from big when huge was made lexically explicit in the discourse context, whereas implicatures
associated with other scales did not display this kind of context-sensitivity. In out study, we wanted to
verify how manipulations of context information can impact the perception of adjectival strength and
ultimately the computation of implicatures. For instance, given the <decent, good, excellent> scale,
how sensitive is the amount of implicature triggered by decent to change in the context provided? And
how sensitive is the amount of implicatures associated with good ? In order to investigate this
question, we presented our stimuli in two different discourse contexts, one of which was designed to
stress the under-informativity of the adjective at stake (see 3.2.3). We predict that the context
manipulation should have a particularly noticeable effect for the interpretation of middle adjectives.
Since these expressions are closer in strength to strong adjectives, it could be the case that their contrast
with respect to strong adjectives would be hard to detect in a neutral context, inhibiting the calculation
of the relative implicature (good --> not excellent). However, with a context that stresses the underinformativity of the adjective, the contrast in strength should resurface, stressing the lower strength of
the adjective and facilitating the implicature. As far as other other categories of adjectives are
concerned, weak adjectives are expected to be perceived as less informative and trigger implicatures to
a higher rate than the other two in both conditions, while strong adjectives are expected to be always
perceived as more informative than the other two, regardless of the context.
3.2 Methods
3.2.1 Materials
Two factors were manipulated in the design: informativity (conditions (a) through (d)) and
discourse context (conditions (e) through (h)). The materials used in the experiment consisted of
twenty-four adjective scales in Italian. Each scale contained three adjectives of different strengths: 1) a
weak adjective; 2) a middle adjective; 3) a strong adjective. The adjectives in each scale were
provisionally assumed to form an entailment scale in Horn’s sense, with the strong adjective logically
entailing the middle and the weak ones, and these latter two possibly implicating the negation of the
former. An example of such a scale from the experiment, already mentioned above, is <excellent, good,
decent>. In addition, each scale also contained an outright contradictory adjective (bad, for the scale
considered here), which was meant to serve as baseline condition. Before proceeding with the
experiment, a pilot test was run on some native speakers of Italian to ensure that the adjectives in each
scale were clearly perceived to have different strength and to be related to the same dimension. In order
to do so, we ran classic entailment tests, predicting that sentences negating a weaker term and affirming
a stronger one (i.e “x is good but not decent”) would be ruled out as contradictory, while sentences with
the reverse pattern would be accepted (i.e. “x is decent, but not good”). Contradictory effects with the
former construction, moreover, would arise only if these adjectives are actually perceived as related to
the same dimension, and therefore as sharing a scale of some sort.
3.2.2. Informativity manipulation
As far as the strength manipulation is concerned, for each scale a two-sentence scenario was
created. The first sentence was a statement containing an adjective in one of the four conditions (weak,
middle, strong and contradictory). The second one portrayed a situation (the equivalent of Doran’s
“fact”) linked to the first one via the causative connector that’s why (per questo in Italian). The second
statement always described a rather “extreme” situation. Its function was to force a construal of the first
sentence as a cause, creating a context for which only the strong adjective would be adequately
informative, whereas the middle and the weak ones would result underinformative, although not
contradictory. Crucially, however, the strong adjective was never explicitly mentioned in the second
sentence, and was therefore never lexically activated (other than in the conditions in which it appeared
in the first sentence). More in general, no stronger or weaker alternative was ever provided in the
experiment, so that participants could only see one adjective per trial. This constitutes a key difference
with respect to Doran’s experiment, where scalar alternatives were made explicit (see also 3.2.3).
Across the four conditions, the adjective in the first sentence is the only element that changes, and
would therefore be the only factor responsible for any difference in the interpretation of the conditions
A full paradigm is shown below for the “excellence” entailment scale. The rest of the items can be
found in the appendix.
(8)
a.Mark è uno studente discreto. Per questo è stato preso ad Harvard per un dottorato. (weak)
Mark is a decent student. That’s why he has been accepted to Harvard for a Ph.D
b.Mark è uno studente buono. Per questo è stato preso ad Harvard per un dottorato. (middle)
Mark is a good student. That’s why he has been accepted to Harvard for a Ph.D
c.Mark è uno studente eccellente. Per questo è stato preso ad Harvard per un dottorato. (strong)
Mark is an excellent student. That’s why he has been accepted to Harvard for a Ph.D
d.Mark è uno studente scarso. Per questo è stato preso ad Harvard per un dottorato. (contradictory)
Mark is a bad student. That’s why he has been accepted to Harvard for a Ph.D
3.2.3 Context manipulation
For the discourse context manipulation (conditions (e) through (h)), four three-sentence
scenarios for each scale were also created. In these new trials, the two sentences described above were
preceded by a third sentence, which for practical reasons will be referred to as “context sentence”. In
the example discussed above, for instance, the context sentence was the The competition for entering
top schools is very tough (in Italian: la competizione per entrare nelle scuole più prestigiose è
estremamente serrata). The function of this new information was to build a more constrained context
for the interpretation of the sentence. This new sentence was meant to reinforce the participants to
reassess the informativity of each scenario more carefully, and to raise the threshold of informativity
required to have an adequate description. The ultimate goal was to underscore the fact that any
adjective different than the strongest one (that is, decent or good) would make the general scenario
under-informative.
With this further manipulation, there are now 8 conditions for each scale, resulting from a
2(context) x 4 (adjective strength) design. The eight stimuli in each adjective scale were distributed into
eight different lists with a Latin Square design, so that each list had 24 experimental sentences and 30
fillers. A full paradigm for an adjective scale is the following (note that conditions a-d are identical
with the previous experiment).
(9)
(a-d) Without context sentence:
Mark è
uno studente {(a).discreto/(b).buono/(c).eccellente/(d).scarso}. Per questo è stato
preso ad Harvard per un dottorato.
Mark is a {(a). decent/ (b) good /(c) excellent / (d) bad} student. That’s why he has been accepted to
Harvard for a Ph.D
(e-f) With context sentence
La competizione per entrare nelle università più prestigiose è estremamente serrata. Mark è uno
studente {(a).discreto/(b).buono/(c).eccellente/(d).scarso}. Per questo è stato preso ad Harvard per un
dottorato.
The competition for entering top schools is very tough. Mark is a {(a). decent/ (b) good /(c) excellent
/ (d) bad} student. That’s why he has been accepted to Harvard for a Ph.D
Moreover, each trial contained a critical word (“Harvard”, in bold face). The critical word is the first
lexical item which provided a precise idea about the plausibility of the of the causal connection
between the last sentence and the preceding one/ones, and it was never the last word of the sentence.
In addition to these sentences, 30 fillers were also created. They had the same two-sentence structure as
the test sentences and introduced a cause-effect relation as well, but lacked the target adjective. The
fillers were equally divided between plausible and blatantly implausible cause-effect scenarios:
(10a) Plausibile filler:
Oggi piove a dirotto. Per questo prendo l’ombrello.
Today it’s pouring down. That’s why I grab the umbrella
(10b) Implausible filler
Marco ha mangiato 8 kg di tortellini. Per questo ha ancora fame
Mark has eaten 8 kilograms of tortellini. That’s why he’s still hungry
The four stimuli in each adjective scale were distributed into four different lists with a Latin
Square design and presented in a randomized order, such that each participant saw only one condition
for each scale. The fillers were the same for every list and were mixed with the experimental sentences.
Each subjects, therefore, saw 54 trials in total.
3.2.4 Participants
42 native speakers of Italian of age comprised between 18 and 40 participated to the
experiment. 22
subjects were graduate or undergraduate students affiliated to the University of
Chicago, and 20 were students affiliated to an Italian university or already in possession of a B.A.
diploma. Subjects were recruited through announcements posted online, personal email
communications and word-of-mouth advertising. Subjects tested in Chicago were paid $ 10, while
subjects tested in Italy were paid 5 Euros.
3.2.5 Procedure
The eight experimental lists were coded with the support of the software E-Prime. The
experiment combined an acceptability rating with a self-paced reading task. In each scenario, including
fillers, the first or the first two sentences (the one containing the enriched context, in (e-h), and the one
containing the adjective) appeared on the screen as a whole chunk, whereas the last sentence – the one
introduced by the causative – could be read word by word by pressing the space bar. In this way,
reading times for each word, including the Critical Word, could be recorded. After reading the last
word of the second sentence, subjects were asked to provide an acceptability rating on the plausibility
of the scenario. The question was “How sensible is the second sentence given the first one?” (In Italian:
Quanto è sensata la seconda frase rispetto alla prima?), and participants were asked to answer by
providing a score ranging from 1 (completely nonsensical) to 5 (perfectly sensible). Besides explaining
the task, instructions given at the beginning of the experiment invited participants to leave aside
markedly ironic or sarcastic interpretations, and to evaluate each scenario on the basis of general
common sense. Moreover, they also explicitly asked the participants to focus on the content of the
sentences, and to disregards style, form or orthographic errors they should encounter. These additional
specifications were made to ensure that acceptability judgments were given with the same criteria
across all the subjects. Response times for each scenario – that is, how long it took the subjects to
provide the acceptability score – were also recorded.
3.2.6 Predictions
In this study we assess the calculation of implicatures in a continuous rather than in a discrete
way. As far as judgments are concerned, we anticipate inferences to degrade the acceptability of the
scenario in presence of an underinformative adjective. Therefore, we assume the amount of
implicatures to be inversely proportional to the acceptability rating. The more subjects draw scalar
inferences on under-informative adjectives (weak and middle condition), the less acceptable the
sentences containing them should be with respect to those containing strong adjectives. This measure
allows us to test our research questions: (i) whether both weak and middle adjectives are perceived as
under-informative, and therefore triggered implicatures with respect to the strong ones; (ii) whether
weak adjectives are perceived as more underinformative, and therefore trigger more implicatures than
middle adjectives; (iii) whether context has any effect on the perceived under-informativity of middle
adjectives. For instance, given the usual <decent, good, excellent> scale, looking at the acceptability
score would allow us to evaluate for both contexts conditions (i) if the scenario containing decent and
the scenario containing good are both lower in acceptability than the scenario containing excellent and
(ii) if, in turn, the weak adjective decent is associated with lower scores than the middle trigger good
and (iii) if middle adjectives are associated with significantly lower acceptability judgments in
presence of the context condition. Crucially, the relation between acceptability and implicatures is
exclusively limited to under-informative adjectives. In fact, we expect the contradictory adjective to
record a significantly lower rating than any other condition. This would not reflect a higher amount of
implicatures, of course, but merely the fact that contradiction is more disruptive than underinformativity.
As far as response and reading times are concerned, we also predict that drawing scalar
implicatures would cause a slowdown in the processing of these sentences. As to the exact point at
which this should emerge, it is still unclear whether inferences are drawn online, as the sentence is read
in real time, or at the end of it, after it has been completely processed. Evidence for both hypothesis can
be found in the literature (for the former hypothesis, see Nieuwland, Ditman & Kuperberg 2010; for
the latter see Bott and Noveck 2004). We predict that, if implicatures are processed in real time, we
should observe longer reading times at the Critical Word on (i) weak and middle adjectives with
respect to strong ones and eventually (ii) on weak adjectives with respect to middle ones and (iii) on
weak and middle adjectives in presence of the context sentence. On the other hand, if implicatures are
processed at the end of the sentence, we should expect longer response times when it comes to provide
the acceptability judgment for the same conditions listed above.
3.3 Results
The results relative to four subjects were excluded as outliers. For the rest of the participants, the main
effect of each of the two independent variables - strength of the adjectives and presence/absence of the
context sentence - was tested by carrying out two way ANOVA. As expected, contradictory adjectives
were correctly perceived as such by subjects and received a significantly lower acceptability judgment
than every other category. Therefore, we will not consider them further in the analysis reported below.
3.3.1 Acceptability judgments
The results for the acceptability judgments are reported in table 1 and illustrated in graph 1.
Table 1: acceptability judgments
Condition
Response
(1-5)
a. Pretty
(weak)
3.16
No context sentence
b. Beautiful c. Wonderful
(middle)
(strong)
3.94
4.11
d. Ugly
(contrad)
1.40
e. Pretty
(weak)
2.54
Context sentence
f. Beautiful
g. Wonderful
(middle)
(strong)
3.34
3.33
h. Ugly
(contrad)
1.59
A main effect of adjective strength was found, with F(2,76) = 31.7 (p < .0001) by-subject and F(2,46) =
20.0 (p< .001) by-item. A main effect of context was also found, with F(1,38) = 41.7 (p < .001) bysubject and F(2,46) = 28.2 (p< .001) by-item. On the contrary, no interaction was found, with F (2,76)
= 0.6 (p > 0.5) by-subject and F (2,46) = 0.6 (p> 0.5) by-item.
As far as strength is concerned, paired t-tests showed that judgments for the contradictory
condition (d and h) were significantly lower than those for the other conditions, as expected (p<0.001
for both). Within the other three conditions, weak adjectives turned out to be judged significantly lower
than the other two, as paired t-test comparison showed, both with and without the context sentences.
For the weak vs middle comparison, t(38)= 5.2 (p<.001) without context sentence and t(38) = 4.9
(p<.001) with context sentence. For the weak vs strong comparison, t(38)= 6.1 (p<.001) without the
context sentence (a vs c) and t(38) = 4.8 (p<.001) with the context sentence (e vs f). No significant
difference, instead, was found for the corresponding comparisons between middle and strong
adjectives. As far as context is concerned, paired t-tests showed that for all conditions, except for the
contradictory one, the version with context was deemed significantly less acceptable than the version
without context. For weak adjectives, t(38)=3.8 (p<.0001). For middle adjectives, t(38)=4.0 (p<.0001).
For strong ones, t(38)=5.4 (p<.0001).
Acceptability rating
4,5
4
3,5
3
2,5
2
1,5
1
0,5
0
without context
with context
Weak
Middle
Strong
Contradictory
Graph 1: acceptability judgments. The y bar indicates the average 1-5 judgment. The x bar shows the
four different conditions, each with data for the context sentence and without the context sentence.
Significant differences: Context vs no context for all conditions. Weak vs middle/strong with and
without context.
3.3.2 Response times
We now take into consideration response times for the acceptability judgments, reported in table
2 and plotted in graph 2. This measure indicates the amount of time necessary for the subjects to
evaluate the informativity of the scenario. Response times above 8000 ms were removed for the data
analysis.
Table 2: response times
Response
time (ms)
a. Pretty
(weak)
3190
Condition
No context sentence
b. Beautiful c. Wonderful
e. Pretty
(middle)
(strong)
(weak)
2656
2639
2901
Context sentence
f. Beautiful
g. Wonderful
(middle)
(strong)
3090
3288
No main effects of strength or context were found. For strength, F (2,76) = 0.5 (p >0.5) by-subject and
F (2,46) = 0.4 (p > 0.5) by-item. For context, F (1,38) = 2.2 (p > 0.05) by-subject and F (2,46) = 1.9 (p
> 0.05) by-item. However, a significant interaction effect was found, with F (2,76) = 3.3 (p <0.05) bysubject and F (2,46) = 5.2 (p<0.01) by-item.
Paired t-test comparisons showed a strength effect among the conditions without context (a-c),
with weak adjectives recording significantly longer response times than middle and strong. As far as
the weak vs middle comparison is concerned, t (38)= -2.2 (p<0.05). For the weak vs strong comparison,
t(38) = -2.4 (p<0.05). No difference was found between middle and weak adjectives. As far as context
is concerned, the condition with the additional context sentence had no effect on weak adjectives, but
recorded significantly longer response times for middle and strong ones. For middle adjectives, t(38) =
-1.95 (p<0.05). For strong adjectives, t(38) = -3.3 (p<0.001).
Response RT to answering the questions
4000
3500
3000
2500
without context
with context
2000
1500
1000
500
0
Weak
Middle
Strong
Graph 2: response times for giving the acceptability judgment. The y bar indicates the average time in
milliseconds. The x bar shows the three different conditions that were considered in the analysis, each with data
for the context sentence and without the context sentence. Times for the contradictory condition were not
considered. Significant differences: Middle with context vs middle without context; strong with context vs
strong without context; weak without context vs middle/strong without context.
3.3.3 Reading times at the Critical Word
The third kind of data collected in the experiment were reading times at the Critical Word. Both the
Critical Word region and the two regions immediately following it were checked. No significant effect
was found with respect to reading times in any of these regions
3.4 Discussion
The most robust, and perhaps surprising, pattern that we observe is that, for each scale, there is
a clear divide between weak adjectives on the one hand and middle/strong adjectives on the other hand,
which are not distinguishable from each other. It is interesting to notice that this situation does not
change even under the effect of the context manipulation. All three categories of adjectives are
degraded by the same amount in presence of the context sentence, so that the configuration with it is
exactly the same as the configuration without it: weak adjectives are deemed significantly lower than
middle and strong ones, and strong ones pattern exactly in the same way. I will now turn to discuss the
strength and context effects in greater detail.
3.5.1 Strength effect
As can be recalled, the function of the informativity manipulation (see 3.2.2) was to verify
whether there is a correlation between the strength continuum of the adjectives on a scale and the
amount of implicatures actually triggered by these adjectives. If this is the case, we expect that, given
the strength continuum “weak < middle”, the corresponding amount of triggered implicatures should
therefore be “weak > middle”. However, this prediction was not borne out. While weak adjectives did
generate implicatures, middle ones did not and were interpreted exactly as the strong ones and did not
trigger any inference.
Weak adjectives were consistently rated lower than the other two, both with and without the
additional context sentence. As we discussed in 3.2.6, we take this to be evidence that they were
perceived as under-informative, and that they triggered scalar implicatures to the negation of the
stronger expressions on the scale. Crucially, their ratings were still significantly higher than those for
the contradictory condition. This showed that having a contradictory adjective made the whole scenario
completely unacceptable, as we expected. This also meant that speakers captured the difference
between mere contradiction, which made the scenario completely unacceptable, and underinformativity, which only made it “less-than-perfectly-fine”, but still potentially consistent with the
facts, as predicted by the theory of implicatures. On the contrary, the behavior of middle adjectives,
which constitute the other potential trigger for implicature, turned out to be basically identical to that of
strong adjectives, suggesting that the two types of expressions were processed in the very same way
and no scalar inference was drawn on these expressions with respect to the stronger terms.
This pattern is confirmed by data concerning response times. In the conditions without context
sentence, scenarios containing weak adjectives caused a significant delay in the attribution of the
acceptability judgment (graph 2), suggesting that some cognitive overload was actually present, and
that it surfaced at the end of the sentence. On the contrary, scenarios containing middle and strong
adjectives always required the same time to be evaluated with respect to their acceptability, once again
suggesting that there was no difference in processing between the two classes.
From a theoretical standpoint, the fact that only one of the two potential triggers actually gave
rise to these inferences poses a puzzle to an account of scalar inferences based on entailment scales. If
each of the adjective triplets really constitutes a scale, then why do middle adjectives not implicate the
negation of the stronger terms, much like many does with all in the <some, many, all> scale tested by
Doran et al.? One explanation would be that, along the lines suggested by Doran, a peculiar property of
adjectives, namely vagueness, negatively affects the possibility of drawing inferences in spite of their
entailment-based ordering. Since the borders between the meaning of adjectives are blurry and flexible,
the interpretation of an adjective can be accommodated upwards without excluding stronger scalar
values, making the detection of under-informativity virtually impossible. As the authors put it, “a hot
day may be a felicitous description of a sweltering day, while Many people had a good time is
generally an infelicitous description of everyone having a good time” (Doran et al., 23). However, there
is a major problem to this account. If what blocked implicatures from middle adjectives is just
vagueness, we should expect the context sentence to raise the amount of implicatures associated with
these expressions, as the manipulation is supposed to mark the boundary between middle and strong
adjectives in a more clear way and stress the under-informativity of the former with respect to the
latter. However, data for the relevant condition show that this is not the case, as there is no more
differentiation between middle and strong expression with the context sentence than without it (see
3.5.2). The explanation that we are going to propose in section 4 and 5 is of a different nature, and will
hinge on the distinction between semantic and pragmatic scales discussed in section 2.1. In particular,
we are going to show that only weak and middle adjectives share a lexical scale, while strong
adjectives, being non gradable, are connected by the other two in a looser way. For this reason they are
not immediately accessible alternatives for the computation of implicatures.
3.5.2 Context effect
The function of the context sentence was to invite participants to assess the informativity of
each scenario more carefully and, ultimately, raise the threshold of strength required by the scenario to
be adequately described. In our predictions, this would make implicatures easier to calculate for underinformative adjectives. On the one hand, we expect that weak adjectives, which already trigger
implicatures to a significant extent in the neutral context, would be perceived as even more underinformative, generating a higher amount of inferences. On the other hand, we expect the context
sentence to bring to surface the distinction between middle and strong adjectives and finally push
speakers to trigger implicatures from the former to the negation of the latter.
To our partial surprise, while the prediction with regard to weak adjectives was borne out, the
one with respect to middle ones was not. In fact, we observe a generalized, sharp decrease across
conditions, as every class of adjectives is downgraded to exactly the same extent. For weak adjectives,
this shows that their under-informativity, already detected without the context sentence, was even more
apparent, pushing the scenario further down on the acceptability scale. The puzzling data point, instead,
concerns middle adjectives, whose acceptability value keeps being identical to that of strong adjectives.
If the sharp degrading involved middle adjectives alone, with strong adjectives retaining a higher
acceptability value, one could argue that, when prompted to consider informativity more carefully,
subjects finally ended up calculating the implicature on mid-scale expressions as well. However, while
context did degrade the acceptability rating on middle adjectives, it also degraded the acceptability of
extreme ones to the same extent, keeping the two categories undistinguishable. This effect undermines
the idea that context facilitated the implicatures on middle adjectives: if strong adjectives constitute the
alternative that is negated by that the inference, they cannot under any circumstance receive the same
acceptability score as the implicature triggers. Whatever the reason of the degrading on middle
adjectives is, it cannot be the computation of an inference to the negation of strong adjectives.
Context also had an effect on response times. Here, contrary to what happened with the
judgments, the three categories were not affected in a homogenous way. For weak adjectives there
were no significant differences between the two conditions. If anything, there is a tendency for the
condition with the context sentence to require a shorter time, in spite of the increased rate of
implicature calculation. While at first glance this finding appears to be contradictory - we expect lower
judgment means and higher implicature rates to correlate with higher response times -, it could be the
case that the context sentence made implicatures more available and ultimately easier to calculate,
reducing (or, at least, not increasing) the cognitive effort associated with them. Where there is a clear
effect of context, instead, is with middle and strong adjectives, which once again behave similarly. For
both these categories, there turned out to be a significant increase in the response time with the context
sentence, suggesting that the additional information brought about some extra processing cost.
However, as hinted above, it is very unlikely that this delay is due to implicature calculation for at least
two reasons. First, if this really were an implicature, we would first expect a difference in ratings
between middle and strong adjectives to arise, which is not the case. Second, it is hard to individuate
the stronger expression that would be negated by the implicature in this case. It cannot be the strong
adjective itself, as it shows a degraded rating too. And even if it were an intensified version of the
extreme adjective (something like downright gorgeous), this would not explain why such effect, not
even in a slighter form, never surfaced in the no context condition.
In general, these results constitute a problematic point for a theory that treats adjective scales as
entailment-based orderings, as it shows that scale-mates with obviously different strengths end up
being processed in the same way. What the experimental results suggest, in fact, is that middle and
strong adjectives, in a context which evokes only one of the at the time, are interpreted as independent
and relatively interchangeable objects. In other words, seeing good or excellent did not make any
differences for the speakers in interpreting the scenarios. Should this lead us to conclude that good and
excellent are, in fact, equivalent? Unless one is willing to give in to this counterintuitive solution, the
alternative way out of this puzzle is hypothesize that, contrary to our initial assumptions, middle and
strong adjectives might just not constitute an entailment-based ordering from a semantic point of view,
as we are going to argue in section 4. Under this view, the fact that good and excellent receive the same
acceptability judgments and are affected by the context manipulation to the same extent is no longer
problematic. Since these objects are now seen as semantically independent, we should not expect the
interpretation of good to be intrinsically connected to the one of excellent. Rather, the two adjectives
should not evoke each other as two semantic scale-mates would do. Therefore, given that in the
experimental task only one adjective at a time was used and its potential alternatives were never made
explicit, it is reasonable that middle and strong adjectives received the same acceptability judgment,
both with and without the context sentence
3.5.3 Interim conclusion
In the experimental results, middle and strong adjectives could not be teased apart under any
condition and any measurement (ratings and response times) that we adopted. The initial assumption of
this experiment is that the adjectives in each triplet were ordered along the same dimension and formed
a proper entailment-scale. According to this view, scale-mate adjectives only differed for the degree of
the relevant property that they were associated with, and were otherwise assumed to have the same
semantic properties and be part of a homogenous class. The results of the study, however, are not
consistent with the predictions of such account, and cast some doubts on the idea that adjective scales
as described here are proper entailment-based orderings in Horn’s sense. One way out of the puzzle
would be to claim that there is no actual difference in strength between mid-scale and high-scale
adjectives. However, this would be at best counterintuitive. The other solution, and the one we are
going to argue for in the remainder of the paper, would be to revise the initial assumption, showing that
what we thought to be an homogenous scale is actually a more complex object. In particular, we will
propose that, while low-scale and mid-scale terms actually form an entailment based pair, extreme
adjectives are not part of this semantic scale, but only of a looser, less specific pragmatic ordering.
While good and excellent can be considered two ways to describe the same dimension with different
strength, and therefore share a scale in a pragmatic sense, they do not share a semantic scale, as they
use entirely different semantic representations. In particular, we are going to propose that the one for
good and middle adjectives makes reference to degrees, while the one for excellent and strong
adjectives doesn’t. Because of this, middle and strong adjectives did not evoke each other as scalar
alternatives in the experimental task.
4. Categories of adjectives
Before outlining the details of this proposal, we temporarily leave aside the experiment to explore a
series of independent semantic/pragmatic facts related to these three categories of adjectives. As it
turns out, the facts discussed below pattern in a way that is consistent with the experimental results and
suggest that there is more than a mere strength difference between the three categories. In particular we
will focus on two main generalizations: the non gradable nature of strong adjectives and the resistance
of middle adjectives towards modifiers that presuppose the presence of a scalar alternative. We argue in
section 5 that these facts are evidence that extreme adjectives should not be considered to part of the
lexical scale to which weak and middle ones belong.
4.1 Weak adjectives
These adjectives were those which triggered implicatures more consistently. Members of this
group in Italian are discreto, decente, accettabile, benestante, carina, but a similar analysis can apply to
their English counterparts decent, acceptable, well-off, pretty and others. Contrary to what is the case
for strong adjectives (see 4.2), there is no mention nor theoretical analysis of low-scale adjectives in the
literature. In this section I will point out several facts and patterns concerning these expressions. A
series of semantic/pragmatic tests help us distinguish this kind of adjectives from the others and show
that the relation that holds between weak and middle adjectives is much different, and ultimately
tighter, than the one that holds between middle and extreme adjectives.
The first observation comes from the interaction of these adjectives with modifiers like a
malapena (“barely”) and almeno (“at least”), which target the lower portion of the scale and evoke
potential alternatives. In every context, low-scale adjectives are felicitous with these expressions,
showing that their occurrence easily evokes stronger alternatives on the scale. Middle adjectives,
however, are not generally felicitous and do not seem to evoke stronger counterparts10. Only upon
explicit mention of an extreme adjective in the context of utterance (13) can they combine with these
expressions. And even in this case, middle adjectives are still less natural than weak adjectives. Their
use in this construction seems to suggest a metalinguistic judgment, rather than a proper evaluation
over degrees.
(11a) Per essere ammessa al festino, devi essere almeno {carina/??bella/??splendida}.
To be admitted to the party, you must be at least {pretty /??beautiful /?? gorgeous}.
(11b) Sei a malapena {carina/??bella/??splendida}, non può essere invitata al festino.
You are barely {pretty/ ??beautiful/ ??gorgeous}, you can’t be invited to the party.
(12a) Per sederti a questo ristorante devi essere almeno {benestante/??ricco/??miliardario}.
To sit at this restaurant, you must be at least
{well off/?? rich/?? billionaire}
10
The only way in which middle adjectives in (11-12) could be acceptable is by forcing a reading in which “at least”
evokes alternatives to the dimension picked out by the scale. For instance, (11) could be read as “you must be at least
beautiful, assuming that you are not smart or funny”.
(12b) Sei a malapena {benestante/??ricco/??miliardario}, non puoi sederti a questo ristorante.
You are barely {well off/ ??rich/ billionaire}, you can’t be seated at this restaurant.
(13) Per essere invitata al festino devi essere splendida, ma tu sei a malapena bella
To be invited to the party you must be gorgeous, but you are barely beautiful
Another pattern is provided by expressions that reinforce the implicature, like nulla più in
Italian (nothing more, in English). We expect that these expressions can combine with adjectives that
have a stronger counterpart on the scale. However, we also observe that, when these are used out of the
blue, with no specific context built around the sentence, this expression is much more natural with
weak rather than with middle adjectives. Once again, this suggests that, while the former seem to
always evoke a stronger alternative (14a-b), the latter can do so only upon explicit mention of an
extreme adjective (see (14c)). (14c), as observed for (13) ad well, also has a metalinguistic flavor.
(14a) Sofia è {carina / ??bella}, niente più
Sofia is {pretty / ??beautiful}, nothing more.
(14b) Queste condizioni di lavoro sono {accettabili /??valide}, nulla più.
These working conditions are {acceptable / ??valid}, nothing more
(14c) Cercavo delle condizioni di lavoro eccezionali. Ma queste condizioni sono valide, nulla più.
I was looking for excellent working conditions. But these are valid, nothing more.
The third fact that we observe is that low-scale adjectives cannot be used to express properties to
particularly high degrees. As a consequence, they are odd with modifiers such a as estremamente11
(“extremely”), a livelli massimi (“to a crazy extent”), straordinariamente (“extraordinarily”). Middle
adjectives like good can instead combine with such modifiers. This pattern seems to suggest that, while
the use of hyperbolic/expressive modifiers is blocked with low adjectives for the salient presence of
stronger scale mates, middle adjectives do not seem to have a salient stronger counterpart that prevents
them from combining with the aforementioned expressions. In other words, if the oddness of extremely
decent can be explained by the fact that a very similar meaning could have been expressed by a
stronger scalar mate like good, then one should also expect that extremely good (and similar
expression) would be blocked by the presence of an extreme adjective like excellent. But this does not
11
Here, we are not referring to sentential uses of incredibilmente, in which the adverb does not combine with the adjective
but operates on the whole speech act. Such cases are obviously compatible with any kind of adjective, but are not our focus
here.
happen, suggesting that the availability of excellent as an alternative to good is not as salient as the
availability of good as an alternative to decent.
(15a)Le condizioni di questo lavoro sono incredibilmente {??accettabili / valide}
These working conditions are unbelievably
{??acceptable/ valid}
(15b) Questa pasta è estremamente {??mangiabile/buona}.
This pasta is extremely
{??eatable/good}
Finally, we observe that weak adjectives support degree modification and comparison more smoothly
than strong adjectives12 (see 4.2), as examples in (16) show. This is a crucial property that ties together
weak and middle adjectives and distinguishes them from the strong ones. If weak and middle adjectives
are both gradable, then it makes sense to assume that they share a scale in a semantic sense, as they are
both take degree arguments in their semantic representation. By the same token, if strong adjectives are
not gradable (see 4.2), then it trivially follows that they could never be part of the same semantic scale,
intended as an ordering of degrees, to which weak and middle adjectives belong. At best, they can
share a scale of pragmatic nature, as we will suggest below.
(16a) Sofia è più {carina / ??splendida} di Giovanna
Sofia is { prettier / more gorgeous} than Giovanna.
(16b) Sofia è abbastanza / molto / piuttosto carina {carina /?? splendida}
Sofia is quite / very / rather pretty / gorgeous
(16c) Queste condizioni di lavoro sono più {accettabili / ??straordinarie} di quelle
These working conditions are more {acceptable / ??fabulous} than those.
(16d) Le condizioni di lavoro sono abbastanza / molto / piuttosto {accettabili / ??favolose}
The working conditions are quite / very / rather {acceptable / ?? fabulous}
Summarizing the discussion in this paragraph, it turns out that weak adjectives are gradable, are
marked for having low strength and seem to evoke stronger alternatives more easily than middle
adjectives do. This asymmetry mirrors the asymmetry of the test results, and is going to be crucial in
developing our account of the data. As for the gradability of weak adjectives, I am going to suggest that
12
It must be said that, while low-scale adjectives generally interact with degree modification/comparison better than
extreme ones, some low scale adjectives seem to support these phenomena better than others. Moreover, it is also the case
that, in general, middle adjectives are those which support degree modification best.
this property, together with the fact that middle adjectives are also gradable, confirm the intuition that
both these classes make reference to degree representation (contrary to strong adjectives, see 4.2), and
therefore that they share a scale in a semantic sense. As for the fact that weak adjectives evoke middle
adjectives as alternatives regardless of the contest, this suggests that these two categories are closely
tied together, and that their difference in strength is encoded at a grammatical level. Conversely, the
fact that middle adjectives do not normally evoke strong adjectives as alternatives suggest that the
relationship between middle and strong adjectives is of a looser nature than the one between weak and
middle adjectives. Again, this is consistent with the experimental finding that middle adjectives did not
give rise to implicatures to the negation of stronger ones.
4.2 Strong adjectives
It has been suggested in the literature (see 2.3) that adjectives indexing a particularly high
degree of a property form a natural class on their own, as they display a series of properties that are not
shared by other gradable adjectives. A non exhaustive list of such adjectives in English includes
gorgeous, gigantic, fantastic, excellent, huge. Crucially, the strong adjectives that we used for all the
scales in the experiment are part of this category. Paradis (2001) introduced the term extreme adjectives
for the category, although its full formalization is due to Morzycki (2009 and 2010). In light of this, the
expressions “strong adjectives” and “extreme adjectives” will be used in an interchangeable way for the
remainder of the paper. Here, we provide evidence that these adjectives, contrary to the proposals
suggested by Paradis and Morzycki, are in fact non gradable, and therefore do not make use of degree
arguments. In light of this and of the patterns discussed in 4.1, we suggest that strong adjectives do not
share a scale with weak and middle ones in a semantic sense, but merely in a pragmatic one. Before
outlining our proposal, however, we review Morzycki`s analysis of these expressions.
Morzycki notes that these adjectives show patterns that should be unexpected for gradable
adjectives. First, they take their own class of modifiers (simply, flat-out, downright, balls-out and
more), which are not as acceptable with other non extreme gradable adjectives (17). Second, they are
resistant to explicit comparative constructions (18). Third, they are resistant to intensification with
“very” and other degree modifiers (19). All the examples here are from Morzycki 2010, and they apply
to Italian as well. In addition to these properties, we add the observation that these adjectives are also
resistant to how and other measure phrases (20).
(17) a. simply gigantic/??big
b. just gorgeous/??pretty
It: semplicemente gigantesco/??grande
It. a dir poco stupendo / ?? bello
c. flat-out crazy/??sane
It. Proprio matto / sano
d. ﬂat-out excellent/??good It. Proprio eccellente / adeguato
(18) a.? Godzilla is more gigantic than Mothra. It. ?? Godzilla è più gigantesco di Monthra
b.? Monkeys are less marvelous than ferrets. It. ?? Le scimmie sono meno meravigliose dei furetti
(19) a. very ??excellent/ ??marvelous/ ??fantastic/ good It. molto ??eccellente /??meraviglioso / buono
b. very ??gigantic/ big
It. molto ??gigantesco / grande
c. very ?? gorgeous/pretty
It. molto ??meraviglioso / bello
(20)a. How ??excellent/good is this performance?
It. Quanto è ??eccellente/buona questa
performance?
b. How ??gorgeous/pretty is that girl?
It. Quanto è ??splendida/bella quella ragazza?
c. How ??gigantic/big is that monster?
It. Quanto è ??enorme / grande quell mostro?
As an explanation to these facts, Morzycki proposes that extreme adjectives “impose that a degree be
so great as to exceed any of the degrees that are at-issue in the discourse” (Morzycki 2009, 7).
According to him, the reason why they resist comparatives and intensification with very is that making
fine-grained distinctions among degrees that are so high on a scale becomes pragmatically irrelevant.
Although extreme adjectives are located on the same lexical scale as their non-extreme counterparts
(hence, the entailment patters excellent => good), they are off the perspective scale, which is
constituted of all the degrees that are contextually salient in a given situation. Hence, the oddness of the
examples proposed above, which all attempt at providing further, contextually irrelevant distinctions
given the perspective scale. In Morzycki’s account, the denotation of a DP containing excellent is
provided in ()13. It can be seen that excellent is using the very same standard as good, with the further
requirement that that highest degree salient in the context (max(C) in the denotation) be exceeded as
well.
(21) [| [DegP pos [AP excellent]] |] = λx. big(x) > s(big) ^ big(x) > max(C)
Along these lines, Morzycki suggests that those modifiers which typically combine with these
adjectives, which he calls extreme degree modifiers, must modify the relevant context by introducing a
domain-widening operation on the domain of available degrees. This operation can retrieve the degrees
13
For uniformity, I reported the denotation using the same formalism and the same type theoretical assumptions held by
Kennedy 2007 and used in section 2.2 of this paper. Morzycki slightly departs from these in assuming that gradable
adjectives are of type <e,dt> and not <d,t>, and that POS is of type <<e,dt>et> instead of <dt,et>, and therefore his
formalism is slightly different than the one used here. These differences, however, are not relevant to the main point of this
paper and will therefore be set aside.
out of the perspective scale and bring them back into “at-issueness” such that the degrees out of context
become accessible again, and can therefore be targeted by the extreme degree modifiers. Since this
domain widening operation would result to be redundant with degrees that are already at issue, extreme
degree modifiers are not felicitous with non-extreme adjectives. By contrast, in Morzycki’s view,
degree modifiers like very do not bring about such a domain widening operation. As they can only
modify degrees that are already contextually salient, they are not able to access those that are off the
contextual scale. For this reason, they are considered to be ineffective with extreme adjectives, whose
degree is by definition beyond contextual salience.14
While this account appears to be a plausible explanation of these data, it is not entirely
consistent with some further facts that can be observed with respect to the category. First of all, a
prediction of Morzycki’s theory would be that, if the unacceptability of degree modification with
extreme adjectives is only of pragmatic nature, degree constructions should significantly improve in
contexts which make these extreme values salient. But examples in (22) show that this is not the case.
The use of a non-extreme expressions, instead, always works fine and seems to be a significant better
alternative to the extreme adjective.15
(22a) Sia Kobe Bryant che Magic Johnson sono giocatori di basket eccellenti. Ma Michael Jordan è
più {??eccellente / bravo} di loro.
Both Kobe Bryant and Magic Johnson are excellent basketball players. But Michael Jordan is
{??more excellent / better} than them
(22b) ?? Kobe Bryant e Larry Bird sono eccellenti, ma Michael Jordan è molto eccellente.
?? While Kobe Bryant and Larry Birds are excellent, Michael Jordan is very excellent.
(22c) Kobe Bryant è eccellente. E quanto è {?? eccellente / bravo} Michael Jordan?
Kobe Bryant is excellent. And how {??excellent / good} is Michael Jordan?
14
Other authors suggested that these adjectives are mapped onto sub-scales with a different structure than the scales they are
supposed to be part of. Rett (2008), for instance, suggests that extreme adjectives are lower-bounded, as they correspond to
subscales with a minimum point. Interestingly, however, lower-bounded scales are typically associated with minimum
standard adjectives (e.g. dirty, opaque), whereas scales of beauty or goodness, to which adjectives like excellent and
gorgeous are supposed to belong, are considered to be unbounded and with a relative standard (Kennedy and McNally
2005). While Rett’s discussion of extreme adjectives is only a marginal footnote in a work dedicated to different topics, her
hypothesis confirms the idea that this class of expressions has something inherently different from other adjectives and
requires a special theoretical treatment
15
Morzycki points out that adding the focus particle “even” would significantly improve the use of extreme adjectives in
comparatives like those in (13). While this is true, it is also true that “even” seems to license comparison with extreme
adjectives also out of the blue, regardless of whether extreme degrees have been made salient in the discourse or not. For
instance, “Michael Jordan is even more excellent than Kobe Bryant” seems better than “Michael Jordan is more excellent
than Kobe Bryant”. In light of this, the improvements brought about by even could be due to the specific semantic work
carried out by the particle, and not so much about degrees being already salient or not. Our suggestion, intuitively, is that
“even” could introduce a coercion operation that turns extreme adjectives in gradable adjectives, or that it is forcing some
kind of metalinguistic comparison.
Therefore, there should be something more than just discourse salience that makes comparatives and
intensification with very odd for these expressions16.
A second challenge is presented by extreme degree modifiers, those expressions that seem to
combine only with extreme adjectives. Morzycki’s argument is that they constitute a proof of their
gradability, by bringing about a domain-widening operation which retrieves degrees that would
otherwise be so extreme to be out of context. However, as Morzycki himself suggests, the
aforementioned modifiers seem to encode, at least to a certain extent, some kind of expressive
meaning. For instance, they are resistant to negation (23a), they provide very little truth conditional
contribution for contrastive constructions (23b.1), and they are difficult to paraphrase, as in (23.c) (for
other properties of expressive meaning, see Potts 2007).
(23a) ??Kobe Bryant non è proprio eccellente (intended: è solo eccellente)
??Kobe Bryant isn’t downright excellent (intended: he is only excellent)
(23b.1)
(23b.2)
(23.c).
?? Si dice che Kobe Bryant sia proprio eccellente, ma io direi che è solo eccellente.
?? While Kobe Bryant is claimed to be downright excellent, I would say that he is just excellent
Si dice che Kobe Bryant sia molto bravo, ma io direi che è solo bravo.
While Kobe Bryant is claimed to be very good, I would say that he is just good.
Kobe Bryant is downright excellent (≠ the degree to which Kobe Bryant is excellent would
exceed the degree to which a player is excellent)
If extreme degree modifiers have a significant expressive component, then a crucial argument
for the gradability of extreme adjectives, namely the fact that their degrees can be modified provided
appropriate domain widening, is seriously weakened. As is well known, expressive meaning is not
particularly sensitive to scale structure, nor to gradability. Therefore, the fact that this class of
intensifiers can combine with extreme adjectives cannot be taken as evidence that extreme adjectives
encode a scale. If anything, together with the fact that, by contrast, true degree modifiers cannot
combine with these adjectives, this suggests that extreme adjectives are not gradable.
16
Note that these facts would be problematic even for an account a-la Paradis, according to whom extreme adjectives are
lexical maximums of scales. This story would account for the resistance to very, but would fall short of explaining the
resistance to comparatives and how phrases. Comparatives and how phrases are indeed perfectly licensed by adjectives
associated with upper-closed scales, such as full or straight, as Kennedy and McNally (2005) observed (x is more full than
y, how full is x?).
A third observation, already pointed out by Cruse (1986) and cited by Morzycki himself, is that
only extreme adjective support a certain kind of prosodic prominence: “That van is
(huuuuuuuuuuuuuuuuuuge / ??biiiiiiiiiiiiiiiiiiig)!” or Kevin Spacey is (fantaaaastic /??goooooooooood)
! (from Morzycki 2010, 4). While Morzycki takes this to be evidence that such prosodic prominence is
only licensed by high degrees, an alternative story would be that extreme adjectives encode themselves
an expressive component that make them compatible with such prominence (and with the
aforementioned modifiers). Although this component alone is not be the only factor distinguishing
these adjectives from their non extreme counterparts, it shows that one reason why extreme adjectives
are perceived as stronger does not necessarily entail the fact that they must be linked to higher degrees
on the same scale, expressive meaning being relatively independent from propositional content.
Finally, as also mentioned by Morzycki, not all extreme adjectives have a univocal counterpart. While
excellent seems to be connected to good and gorgeous with beautiful, there seems to be no obvious
counterpart to extraordinary or fantastic, or marvellous. If these adjectives are all part of a scale in a
lexical sense, then it should be rather automatic to point out what the lower scale-mates are.
In light of these problems, a more radical account has been suggested by Kennedy (personal
communication, also cited in Morzycki 2010), according to whom these adjectives do not take degree
arguments at all, and should therefore be analyzed as non gradable. This proposal seems to be the most
straightforward solution for accounting for the data in (17-20). In this view, the property associated
with excellent is seen as a simple, stand-alone, non-scalar property, and not as the result of an operation
that returns a (presumably very high) degree along the dimension of goodness. A denotation which
does not refer to standards or degree types is proposed in (24)
(24) [|excellent|] = λx. excellent (x)
This proposal can capture the resistance of this expressions to comparatives and degree modifiers.
Under this view, extreme degree modifiers are not quantifying over degrees, but are performing
different kinds of operations, most likely expressive intensification (see data in (23) for preliminary
evidence).
A crucial consequence of this view is that, while excellent is definitely providing a semantic
contribution related to the dimension of goodness, it is not part of the same entailment-based ordering
as good and decent, which are instead gradable. If a scale is a total ordering of degrees, and if excellent
does not encode degrees, then it cannot be part of this scale, at least in a lexical sense. The intuition that
these adjectives and their gradable counterparts are somehow related can still be captured by positing
that these terms form a scale of pragmatic nature, as decent, good and excellent are all ways of talking
about a similar dimension with different strength. The crucial aspect is that, while for the former two
strength is mapped onto a shared degree-based scale, for the latter it is not. As we will suggest in 5, this
difference is important to capture the experimental results.
4.3. Middle adjectives
The final subclass of adjectives is the one of middle adjectives. The use of the term “middle”
does not make reference to an imaginary point halfway through the scale, but simply indicates that
these adjectives are neither marked for being particularly strong, nor for being particularly weak. Their
semantic properties are those that are predicted for regular gradable adjectives in their positive form,
and have already been shown in a contrastive fashion with the other two categories described above.
On the one hand, contrary to extreme adjectives, they are fully gradable with respect to comparatives
(17), intensification with very (18) and other degree constructions (19-20). As such, they fully behave
as scalar predicates. On the other hand, contrary to weak adjectives, they are somewhat resistant to
modifiers targeting low region of the scale such as barely and at least (11), or to modifiers that
presuppose the natural existence of a stronger alternative such as nothing more (12). This shows that,
while it is intuitively the case that middle adjectives have extreme adjectives as stronger alternatives,
their effective availability is contingent upon the discourse condition. Weak adjectives, instead, always
seem to have salient stronger alternatives, regardless of the context.
4.4 Summary
In conclusion, the evidence discussed in this and in the preceding sections suggest that there is
more than a mere quantitative difference between weak, middle and strong/extreme adjectives. What
seemed to be an internally homogenous scale turned out to be a more internally differentiated object,
both in light of the experimental results and of independent pragmatic/semantic observations. In
particular, two aspects are significant in this regard. First, while weak and middle adjectives support
degree modification, strong adjectives are resistant to it, and therefore do not behave as predicates
belonging to a scale. Second, while weak adjectives seem to naturally evoke stronger alternatives in
any context of use, even when no context information is provided, middle adjectives are much less
likely to do so. It follows that the relationship between decent and good is tighter than the one between
excellent and any of these two expressions.
This idea would have important implications for the calculation of implicatures, and could
provide a new way to look at the experimental result. In section 5 we are going to suggest that the
distinction between semantic and pragmatic scales could also be reflected in the way in which these
scales are processed. We are going to propose that implicatures within semantic scales can go through
more easily than implicatures within pragmatic ones. According to this view, weak adjectives trigger
implicatures since they have middle adjectives as semantic scale-mates, whereas middle adjectives do
not generate inferences because they do not have any stronger semantic scale-mate, strong adjectives
being connected to them only via a pragmatic relation.
5. Strength, classes and inferences
In this section we outline a proposal to capture the experimental results and the patterns that we
pointed out in section 4. Our argument is that, contra Morzycki 2010, extreme adjectives are not part of
the same lexical scale as weak and middle adjectives, and can only be considered to form a pragmatic
scale together with the other two. Weak and middle adjectives, however, do share a scale also in the
lexical sense, making scalar inferences easier to draw.
The distinction between semantic and pragmatic scales has been pointed out by Hirschberg (1991),
when she argues, contra Horn (1984), that entailment is a sufficient but not necessary condition to
trigger implicatures (see section 1). As she provides examples of non-entailment based orders, she
argues that the only condition to license scalar inference is a contextually salient ordering, as in (3).
Gradable adjectives, however, have been treated by Horn and the following literature as proper
entailment scales. As discussed in section 2, their semantics can be analyzed in terms of relations
between individuals and degrees, and the related scales as total orderings of these degrees. A key
consequence of this account is that the notion of strength boils down to the notion of degree: the higher
a degree indexed by an adjective is, the stronger the adjective should be. Moreover, these expressions
are expected to support degree modification, comparison and any other semantic operation targeting
degrees.
These predictions, however, as it emerged in the experimental results and in the ensuing
discussion, are not entirely met. In the experiment, middle and extreme adjectives were always
interpreted in the same way. Extreme adjectives did not turn out to be accessible alternatives for the
calculation of implicatures on middle adjectives. At the same time, extreme adjectives do not even
exhibit the properties of gradable adjectives in the first place. We explain this fact by arguing,
following Kennedy, that extreme adjectives are, de facto, non gradable. As such they do not take
degree arguments and they do not map degrees onto the total ordering that constitutes a lexical scale.
For instance, in this view the property associated with excellent is seen as a simple, stand-alone, nonscalar property, and not as the result of an operation that returns a (presumably very high) degree along
the dimension of goodness. As a consequence, excellent and good are not lexically connected, but are
rather two distinct properties that are related to similar dimensions. They constitute an ordering in a
broad, pragmatic sense, as they represent different ways of talking about the same thing with different
strengths, but they are not making use of the same semantic representation. Accordingly, only good
makes reference to degrees and requires a pos morpheme and a standard to be exceeded, whereas
excellent is a simple property with no reference to degree arguments or to a relative standard, which is
vice versa present for good. A proposal for a denotation that captures this difference is offered in (25)
and (26).
(25) [|excellent|] = λx. excellent (x)
(26)[| [DegP pos [AP good]] |] = λx good(x) > s(good)
On the one hand, this idea explains the apparent entailment effects observed above (#excellent
but not good, vs good but not excellent). On the other hand, it captures in a straightforward way the
resistance of extreme adjectives to operations targeting degrees and the fact, brought to surface by the
experiment, that middle adjectives do not seem to evoke extreme adjectives as salient alternatives. For
the first pattern, there is simply no degree to be targeted. For the second one, if the alternative is not
part of the same scale as the predicate at stake, then it could also be harder to evoke, even with
appropriate context manipulations. Inferences on semantic scales could go through more automatically
and more smoothly than inferences on pragmatic scales, although further experimental investigation is
needed to support this idea. Note that, even with extreme adjectives out of the semantic scale, we are
still able to account for the fact that weak adjectives were consistently considered to be
underinformative across both context conditions. These expressions are fully gradable and naturally
combine with modifiers presupposing alternatives. In light of this, they seem to encode a semantic
scale which has middle adjectives as stronger scale-mates, licensing scalar inferences in the
experimental task.
It has been pointed out by Horn (personal communication) that other examples of adjectival
extremeness (i.e. modals) appear to behave differently than the way predicted by this account. In
particular, Horn’s point is that, for a scale such as <possible, likely, certain>, the inference from the
middle adjective (likely) to the negation of the strong adjective (certain) seems much more likely to be
drawn than with the adjectives used in the experiment. Our response is that modal scales seem to be
structured differently than the relative adjective scales used in the experiment. In particular, certain
behaves as a fully gradable predicate and does not exhibit the properties of extreme adjectives. For
instance, it can take comparatives (27) and can combine with degree modifiers/intensifiers (28).
Moreover certain seems to encode an upper-closed scale (Kennedy and McNally 2005), whose lower,
non maximal portion can be naturally occupied by lexicalizations of lower degrees. Examples in (29)
shows that it supports maximum standard slack regulators and does not give rise to entailments to the
positive form when accompanied by modifiers that exclude accomplishment of the maximum.
(27) John was more certain than Mike about the fact
(28) I’m very certain that this is the case
(29) a. completely/absolutely/entirely certain
b. {half/almost} certain ≠> certain
We take these examples to be evidence that certain, contrary to extreme adjectives, encodes a scale in
its semantic representation, and is therefore linked to its weaker counterparts in a different, deeper
sense. Hence, the intuition that inferences from either of possible / certain to the negation of the
stronger ones should go through more easily than with pragmatic scales.
6. Conclusion and further research
This experimental study aimed at investigating the computation of scalar implicatures on gradable
adjectives. Our goals were twofold. First, we aimed at verifying whether there was any correspondence
between the strength continuum on the adjective scale and the amount of scalar implicatures that each
adjective would generate with respect to a certain scenario. Second, we wanted to assess how
manipulations of context information would influence the perception of strength and ultimately the
computation of the implicatures. As for the first question, it turned out that the amount of computed
scalar inferences did not pattern into a continuum. Rather, only weak adjectives triggered implicatures
consistently, whereas middle and strong adjectives, in spite of their obvious difference in strength, were
processed exactly in the same way. As for the second question, context information turned out to affect
all the three categories of adjectives examined in the study, and not only those which were supposed to
be underinformative. After taking into consideration several independent facts on the semantic and
pragmatic behavior of these classes of adjectives, we argue that only weak and middle adjectives
constitute a scale in a semantic sense. Strong adjectives, as they turn out to be not gradable, can enter a
scalar relation with the other two only from a pragmatic point of view. They represent a way of
describing the same thing with higher strength, but they do not make of degree orderings in their
semantic representations. Because of this, inferences from middle adjectives to the negation of strong
adjectives are more difficult to draw, and in fact were not computed at all in the experimental task. The
distinction between semantic and pragmatic scales, first pointed out in the work of Hirschberg, has not
been investigated from an experimental point of view. Further research is in order to assess whether
this theoretical distinction consistently pattern with a processing difference, with semantic scales, as
suggested by this study, triggering inferences more easily than pragmatic ones. A possible field of
investigation could be represented by modal scales, which, contrary to relative adjectives, seem to be
mapped onto a semantic scale. The prediction would be that, with this ordering, a continuum would
finally emerge, and that also mid-scale elements would able to trigger implicatures with respect to top
scale ones. In general, as most of the exiting research on implicatures has focused on quantifiers, the
investigation of other scales and the ways in which they are processed constitutes a promising, and still
unexplored, area of research.
References
Bott, L. & Noveck, I.A. (2004). Some utterances are underinformative: The onset and time course of
scalar inferences. Journal of Memory and Language, 51(3), 437-457
Doran, Ryan, Rachel Baker, Yaron McNabb, Meredith Larson, and Gregory Ward, 2008,“On the NonUnified Nature of Scalar Implicature: An Empirical Investigation,” in International Review of
Pragmatics, 1:1-38
Grice, P. (1975). Logic and conversation, in P. Cole & J. Morgan (ed.), Syntax and Semantics, 3:
Speech Acts, pp. 41–58, New York: Academic Press. Reprinted in H. P. Grice (ed.), Studies in the Way
of Words, pp. 22–40, Cambridge, MA: Harvard University Press (1989)
Horn, Laurence, 1984. “A new taxonomy for pragmatic inference: Q-based and R-based implicature.”
In D. Schiffrin (ed.), Meaning, Form and Use in Context (GURT '84), 11-42.
Horn, Laurence, 2005. “Implicature”. In L. Horn and Ward, G. (eds) The handbook of pragmatics.
Blackwell Publishing. 3-29
Kennedy, Christopher & Louise McNally. 2005. Scale structure, degree modification and the semantics
of gradable predicates. Language 81(2). 345–381.
Kennedy, C. 2007 “Vagueness and Grammar: The Semantics of Relative and Absolute Gradable
Adjectives”, Linguistics and Philosophy 30.1
Krifka, M (2007). Negated antonyms: Creating and filling the gap. in Uli Sauerland and Penka Stateva
(eds.), Presupposition and implicature in compositional semantics, Palgrave Studies in Pragmatics,
Language and Cognition, Palgrave Macmillan, 163-177.
Morcyzki, Marcin. 2009. “Degree Modification of Estreme Adjectives”. To appear in the Proceedings
of the Chicago Linguistics Society 45.
Morzycki, Marcin. 2010. Adjectival extremeness: Degree modification and contextually restricted
scales. To appear in Natural Language and Linguistic Theory.
Papafragou, A. and J. Musolino. Scalar implicatures: experiments at the semantics-pragmatics
interface. Cognition 86 (2003). 253-282
Pouscoulous N., I. Noveck, G. Politzer, A.Bastide (2007). Processing costs and implicature
development. Language Acquisition, 14(4), 347 – 375
Paradis, Carita. (2001). “Adjectives and boundedness”. Cognitive linguistics 12.1: 47-65
Rett, Jessica (2008): Degree modiﬁcation in natural language Rutgers University: PhD
Dissertation.
Appendix: adjectives and stimuli
<Discreto, bravo, eccezionale> (<decent, good, excellent>)
La competizione per entrare nelle università di prestigio è serratissima. Luca è uno studente bravo
(good). Per questo l’hanno preso ad Harvard per un dottorato.
The competition for entering top colleges might be very tough. Luca is a good (decent / excellent)
student That’s why he’s been accepted to Harvard for a Ph.D program.
<Fattibile, facile, ridicolo> (<doable, easy, ridiculous>)
La prova di abilitazione consiste in 924 domande a risposta multipla. Il livello è facile. Per questo tutti i
candidati prenderanno il massimo senza fare errori.
The qualifying exam typically consists of 924 multiple choice questions to be concluded in two hours.
This year exam was easy (doable/ridiculous). That’s why all students got the maximum with no
problem.
<Mogio, Triste, Depresso> (<slightly sad, sad, depressed)
Al giorno d’oggi i social network offrono una costante fonte di consolazione e di allegria. Davide è
triste. Per questo pensa al suicidio con convinzione.
Nowadays, social networks offer a constant source of consolation and cheerfulness. Davide is sad
(melancholic / depressed). That’s why he really considers committing suicide.
<Carina, Bella, Splendida> (<Pretty, Beautiful, Very beautiful)
Alle selezioni per i concorsi di bellezza si presentano sempre migliaia di partecipanti. Sofia è una
ragazza bella. Per questo ha partecipato a Miss Mondo nel 2008.
Thousands of candidates enrol to selections for beauty contests. Sofia is a beautiful (pretty /
marvellous) girl. That’s why she was admitted to Miss Mondo (a beauty contest) in 2008
<Decente, Bravo, Superbo> (<decent, good, superb>)
Le discoteche sono incredibilmente esigenti nel scegliere chi mette la musica. Fabio è un DJ decente.
Per questo tutti i fine settimana viene invitato tutte le sere da un posto diverso per mettere dischi
Clubs are highly demandino in picking people who should play the music. Fabio is a decent DJ. That’s
why he is invited every night to a different club.
<Passabile, valida, irresistibile> <passable, valid, irresistible>
Fare il cavatore è improbo. Giulio ha ricevuto da una miniera un’offerta valida. Per questo ha deciso di
lavorare lì per il resto deoi suoi giorni
Working as a miner is tough. Giulio received a valid offer from a mine. That’s why he decided to work
there for the rest of his days
<Mangiabile, Buono, Delizioso> (<Eatable, Good, Delicioius>)
I colleghi sono sempre molto severi nel valutare l’operato di un lavoratore. Totò cucina una zuppa di
pesce buona (good). Per questo i suoi amici cuochi lo ammirano da sempre.
Workers are incredibly strict in commenting on their colleagues’ performances. Toto cooks a good
(eatable vs delicious) fish soup. That’s why his chef friends admire him
<Benestante, Ricco, Miliardario> (<getting by, rich, billionaire>)
I prezzi nel mercato immobiliare sono sempre più alti. La famiglia di Luca è benestante
(ricca/miliardaria). Per questo ha comprato una villa al mare.
Prices on the real estate market are higher and higher. Luca’s family is getting by (rich, billionaire).
That’s why they bough a villa on the seaside.
<Accettabile, buono, strepitoso> (<Acceptable, good, fantastic>)
I genitori di Marco l’hanno educato a essere molto ambizioso. Le condizioni di lavoro offertegli da una
certa azienda sono gradevoli (pleasurable). Per questo avere un posto lì rappresenta un sogno per lui
Marco’s parents have taught him to always be very ambitious. Working conditions in that company are
pleasurable (acceptable/fantastic). That’s why a job there is considered as a dream by him.
<Interessato, Appassionato, Malato> (<Interessato, Appassionato, Malato>)
In questo convento, le regole impongono di andare a letto presto. Luca è appassionato (passionate)
alla musica classica. Per questo ogni notte ascolta Vivaldi fino al mattino nella sua stanza.
In this retirement house, guests have to go to bed early. Luca is passionate (interested / fanatic) with
classic music. That’s why every night he listens to Vivaldi until the following morning in his living
room.
<Modesto, scarso, inguardabile> (<mediocre, bad, horrible>)
Al campetto il livello è molto vario. Giuseppe a basket è un giocatore scarso. Per questo nessuno lo
vuole mai in squadra.
At the playground there is a wide array of basketball skills. Giuseppe is a bad (mediocre / horrible)
player. That’s why he is s hated by teams’ fans.
<Soddisfatto, Felice, Estasiato> (<satisfied, happy, ecstatic>)
Giulio è sempre molto pacato nel manifestare i suoi sentimenti. Dopo l’esame era soddisfatto. Per
questo si è messo a saltellare per casa.
Giulio is always very cautious with manifesting his feeling. After the exam he was satisfied. That’s why
he was jumping around the house.
<Modesta, Scarsa, Pessima> (<modest, low-quality, horrible>)
A Borgomartino c’è un unico ristorante nel giro di 40 km. La cucina è scarsa. Per questo gli abitanti
mangiano in casa 365 giorni all’anno..
In the town of Borgomarino there’s only a restaurant within a 50 km range. The food they serve is of
low quality (mediocre / disgusting) That’s why the local inhabitants eat at home 365 days a year.
<Increspato, mosso, burrascoso> (<wrinkled, wavy, treacherous>)
L’intervento della guardia costiera costa in media 1000 Euro a chi lo richiede. Il mare era mosso. Per
questo ho chiamato l’SOS con la radiolina.
Coastguard police charges an average of 1000 dollars for every intervention they have to make. The
sea was wavy (slightly wavy vs stormy). That’s why we called the SOS with the walkie-talkie.
<Informale, Burbero, Villano> (<Informal, Rude, Unbearable>)
Nelle piccole comunità il livello di confidenza tra gli abitanti è massimo. Luigi ha un modo di fare
burbero. Per questo il resto del paese l’ha emarginato dalla vita quotidiana.
In small villages the level of intimacy between people is very high. Luigi has rude (informal /
unbearable) manners. That’s why they have been marginalized by the community.
<Interessato, Attratto, Innamorato> (<interested, attracted, folle>)
Lunghe relazioni mettono a dura prova anche i sentimenti più intensi. Luca è attratto da Giulia. Per
questo le ho chiesto di sposarla quando l’ho vista
Long lasting relationships can challenge even the deepest feelings. Luca is attracted ( interested / crazy
) to Julia. That’s why he asked her to marry him when he saw her. (marriage: questo ha l’effetto
giusto)
<Sveglio, Brillante, Geniale> (<smart, brilliant, genius>)
La competizione per il riconoscimento pubblico nella scienza è estremamente dura. Luca è un
ricercatore brillante. Merita di vincere un Nobel per la fisica.
Competition for public recognition in science is extremely hard. Luca is a smart (good vs genius-like)
scientist. That’s why he deserves a Nobel Prize for physics.
<Simpatizzante, Tifoso, Invasato> (<Interested, Passionate, Fanatic>)
Condizioni meteo avverse possono paralizzare intere zone urbane. Marco è tifoso (supporter) del
Milan. Per questo è andato allo stadio mentre grandinava sulla città.
Severe weather sometimes paralyze whole urban neighborhoods. Marco is a supporter (sympathizer /
fanatic) of Milan. That’s why he went to the stadium while it was hailing on the city
<Teso, Forte, Devastante> (<Tense, Strong, Devastating>)
Il mese scorso la protezione civile ha deciso di costruire una barriera protettiva attorno agli spazi
pubblici . Oggi il vento è forte (strong). Per questo alcuni alberi sono crollati davanti alle scuole.
In the last month the county government decided to build a protective wall around public spaces .
Today the wind is strong (tense / devastating). That’s why some trees have fallen down in front of the
school.
<Noioso, Molesto, Odioso> (<Boring, Annoying, Odious>)
<Ammaccato, Danneggiato, Distrutto> (<bruised, damaged, destroyed>)
Vista la crisi del mercato, nel limite del possibile si cerca di evitare nuovi acquisti. In seguito
all’incidente, la nostra macchina è danneggiata. La porteremo a rottamare dal concessionario.
In light of the crisis, people always try to avoid buying new stuff. After the accident, our car is
damaged (bruised / destroyed). That’s why we are going to scrap it.
<Positiva, buona, mostruosa> <positive, good, monster>
In Italia lo sport interessa solo se si tratta di calcio. L’ultima gara di Alberto Tomba è stata positiva.
Per questo era in apertura del TG117 delle tredici.
In Italy, only soccer is considered to be an interesting sport. Alberto Tomba’s (a skier) performance
has been good (positive / extraordinary). That’s why it was the headline of the TG1 at 1 pm.
<Moderato, Rischioso, Estremo> (<moderate, risky, extreme>)
La diffusione di racchette e ramponi all’avanguardia ha facilitato di molto l’escursionismo. Il sentiero
che Giulio e Franca vogliono percorrere domani è rischioso. Per questo potrebbero morire se lo
faranno.
The invention of state-of-the-art crampons and rackets has made outdoor sports much easier. The trail
that Giulio and Franca would like to hike tomorrow is dangerous (demanding / extreme). That’s why
they could die if they go for it.
<Deciso, coraggioso, matto> <determined, brave, insane>
In montagna molte persone sono morte per le difficili condizioni ambientali. Francesca è un alpinista
spericolato (reckless). Per questo vuole scalare l’Everest senza corda il prossimo inverno.
Tough environmental conditions have made mountain excursions deadly for many people. Francesca is
a brave (determined / insane) climber. That’s why she can climb the Everest with no rope on next
winter.
<Curioso, strano, assurdo> <curous, weird, absurd>
Gli animali domestici hanno istinti molto diversi da quelli dell’uomo. Il gatto di Stefano si comporta in
modo strano. Per questo vuole portarlo dal veterinario al più presto.
Pets have very different instincts than humans. Stefanos’ cat is showing a weird (curious / absurd)
behavior. That’s why he wants to take him to the vet as soon as possible.

Download Report