
Scalar implicatures 1
Possibly all of that and then some: Scalar implicatures are understood in two
steps.
John M. Tomlinson, Jr.
Todd M. Bailey
Lewis Bott
Cardiff University
Address for Correspondence:
John M. Tomlinson, Jr.
School of Psychology
Tower Building, Park Place
Cardiff University
Cardiff CF10 3AT
United Kingdom
Email: [email protected]
Abstract
This paper tested between one-step and two-step processing models of scalar
implicatures. Participants saw sentences that could be interpreted with or without
scalar implicatures, such as some elephants are mammals, and responded true or
false by mouse-clicking on the corresponding part of a computer screen. We
then analysed the mouse trajectories to provide information about processing
prior to the completed interpretation. Experiment 1 found that when participants
were trained to respond with an upper-bound interpretation, their initial mouse
movements deviated towards the lower-bound response. Experiment 2 replicated
the findings from Experiment 1 in a context in which no training was provided.
Experiment 3 ruled out an alternative explanation of the findings that
participants delay processing of the scalar term until after processing the
sentence frame. Our findings therefore support a two-step approach to
processing scalar implicatures. We discuss the results in relation to previous
findings demonstrating costs of processing scalar implicatures, and theoretical
accounts of pragmatic inference.
To communicate more efficiently, speakers often imply information instead of
explicitly stating it. Consider this example:
1A) Nowadays, teenagers are tethered to their smart phones.
1B) Some are.
Here, B is a teenager who distances himself from people of his age who
seemingly never put down their mobile phones. By saying, “some are,” he
communicates several points. First, he confirms that there are indeed teenagers who
match A’s description. Second, and most importantly for the purposes of this paper,
he implies that there is a significant group of teenagers who do not use their phones
excessively. Additional meaning could also be derived from B’s statement given more
context, such as that B is not one of those people constantly checking their
smartphone, or that perhaps B is reluctant to provide the precise numbers of teenagers
who fit A’s stereotype.
In order to derive inferences like those above, however, a number of procedures
must be in place. The listener must know which of an infinite number of potential
inferences the speaker intended her to draw and, in order to be efficient, the
inferences must be derived in a very short space of time. Grice’s (1975; 1989)
maxims of communication provide one set of rules that are in part a solution to the
first of these problems. In this paper, we also consider how inferences are derived, but
our focus is on explaining their processing rather than their linguistic distribution.
Following Bott and Noveck (2004), Breheny, Katsos and Williams (2006) and Huang
and Snedeker (2009), we use scalar implicatures, a type of inference illustrated above,
as a test case for exploring how language understanders use pragmatic principles to
infer intended meanings.
Scalar implicatures
When B says “Some are,” in (1) he implies that not all teenagers use their
phones excessively. This inference can be described using Grice’s (1975) cooperative
principle and general reasoning abilities. The Gricean explanation involves a series of
steps. First, some sort of literal meaning is computed (e.g. “at least some…”) and the
listener considers what else the speaker could have said that would also be informative
and relevant. For example, in (1), the speaker said, “some are,” but he could have
said, “all are”, which would have been more informative than some, and still relevant.
In the subsequent step the listener infers that because the speaker did not say all, it is
not the case that all holds. Finally, by combining what the speaker said, “some are,”
with the not all inference derived in the previous step, the listener arrives at the final
interpretation, some but not all are.
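The Gricean derivation above is essentially algorithmic, and can be caricatured in a few lines of code. The sketch below is purely illustrative: proportions of a set stand in for meanings, and nothing here is a claim about the actual cognitive mechanism.

```python
# Toy illustration of the Gricean derivation of "some but not all".
# Meanings are represented as predicates over the proportion (0.0-1.0)
# of the subject set for which the predicate holds.

def literal_some(p):
    """Literal (lower-bound) meaning: at least some, i.e. more than none."""
    return p > 0.0

def literal_all(p):
    """The stronger scalar alternative the speaker could have used."""
    return p == 1.0

def enrich(literal, stronger_alternative):
    """Step 2: the speaker avoided the stronger term, so infer its negation."""
    return lambda p: literal(p) and not stronger_alternative(p)

some_but_not_all = enrich(literal_some, literal_all)

# "some are" is literally true when all are, but the enriched meaning is not:
assert literal_some(1.0) is True
assert some_but_not_all(1.0) is False
assert some_but_not_all(0.4) is True
```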
The inference in (1) is an example from a broader group of inferences known as
scalar implicatures (see Geurts, 2010, for a thorough discussion). In general, scalar
implicatures occur when a speaker uses a weak term from a semantic scale (see Horn,
1972, 1989), such as some < many < all. Under these circumstances the listener is
licensed to derive the inference that the stronger elements in the scale do not hold.
Other examples of scalar implicatures include the use of may to imply not must, of or to
imply not both, and of warm to imply not hot. Accounts of the linguistic environments
that give rise to scalar implicatures have been developed and formalized by many
others since Grice’s seminal work (e.g., Gazdar, 1979; Chierchia, 2004, 2006;
Levinson, 2000; van Rooij & Schulz 2004; Sauerland, 2004; Sperber & Wilson,
1986/1995). Some of these theories are close in spirit to Grice’s original, such as the
Neo-Gricean accounts of Levinson or Horn and the globalist accounts of van Rooij
and Schulz; others are more grammatical in nature, such as those of Chierchia or
Fox (2007); while still other explanations involve even further departures from Gricean principles,
such as Sperber and Wilson’s. One of the reasons for the popularity of scalar
implicatures is that they occur in highly structured semantic environments (e.g.,
upward entailing environments), yet it is possible to cancel or defease the inference, a
defining feature of pragmatic phenomena (e.g., as in, “some are… in fact all of them
are”). Scalar implicatures thus lie on the boundary between semantics and pragmatics
(see Horn, 2006) and are an ideal area in which to integrate advancements from both
disciplines. More relevant for our purposes, the highly structured behavior of scalar
implicatures and the strong predictions that follow from theories of scalar
implicatures have meant that research has begun into how these inferences are
processed, as we describe in the next section.
Processing scalar implicatures
One interpretation of Levinson (2000) is that scalar implicatures are default
inferences that occur automatically on every occasion on which the meaning
of the scalar term (e.g., some) is computed. Thus, the scalar term will always trigger
the scalar implicature, but the implicature may sometimes be cancelled. The
obligatory nature of the implicature generates particularly strong psycholinguistic
predictions that have now been tested using a variety of different paradigms.
According to the default view of scalar implicatures, people should take longer to
derive literal meanings like “at least some” than implicature meanings like “some but
not all…”. However, there is now a large body of contrary evidence showing that
literal meanings can be processed faster than implicature meanings (e.g. Bott &
Noveck, 2004; Bott, Bailey & Grodner, 2012; Breheny, Katsos & Williams, 2006;
Huang & Snedeker, 2009; but see Grodner, Klein, Carbary, & Tanenhaus, 2010).
These results argue against a strong default view of scalar implicatures.
More contentious, however, is what causes the delay for scalar implicatures, and
consequently which theories might be able to explain how scalar implicatures are
derived. With respect to sentence verification tasks (Bott & Noveck, 2004; Noveck &
Posada, 2003; Bott, Bailey, & Grodner, 2012), it seems likely that part of the
observed response time delay is caused by verifying the extra propositional
complexity inherent in interpreting scalar implicatures (although not all of the delay,
see Bott et al.). For example, verifying some [and possibly all] elephants are
mammals requires searching for elephants that are mammals, whereas verifying some
[but not all] elephants are mammals requires an additional search for elephants that
are not mammals. There may also be costs involved in dividing the subject set into a
referent and a complement set in scalar implicatures, which is not present in literal
meanings (see Grodner, Gibson & Watson, 2005, for evidence of additional
processing cost when a reader must instantiate a complement set in addition to a
reference set). Similarly, processing the negation of the complement in scalar
implicatures (e.g., not all elephants) may take time (see Clark & Chase, 1972). A
further possibility is that the delay is caused by people having to reject an initial
interpretation (e.g., the literal meaning of the sentence), before the implicature can be
derived. Finally, in addition to some of the above, delays observed in visual world
studies might also be explained by a mismatch between the way in which the visual
world is encoded and the target sentence. For example, in Huang and Snedeker (2009),
participants may have encoded the visual referent in one sense, such as, the girl with
two of the socks, and been asked to click on the same referent expressed differently,
such as, “click on the girl with some of the socks.” The mismatch between the two
may have contributed to the delay in interpreting the scalar term (see Degen &
Tanenhaus, 2009; and see Grodner et al. 2010).
The range of explanations for these costs makes it particularly difficult to establish
which theories are consistent with the empirical findings. In the next section we group
these costs together in a way that maps onto different explanations for how scalar
implicatures are derived. Because the accounts relate to the sequences of processing
steps, we refer to them as one-step and two-step models.
One-step vs two-step processing models
We consider two-step models to be those in which a distinct and accessible
meaning is derived prior to the implicature meaning. Several different theories are
possible; the most obvious is a processing version of a Gricean account. Under
this view, a listener must first compute the literal meaning of the sentence and its
possible alternatives (Step 1), and then, assuming the speaker is informative and
reliable, the listener enriches the literal meaning with the implicature (Step 2). This
type of two-step model assumes that derivation of the implicature (Step 2) is
contingent on consideration of the literal meaning, its alternatives, and various
contextual factors (Step 1), that is, the output of Step 1 is necessary to execute Step 2.
Other, less modular accounts can also be considered two-step models. For example,
the decision to proceed onto Step 2 processing might not be entirely contingent on the
output of Step 1. The derivation of the implicature may occur automatically after
computation of an accessible prior meaning. The common theme running through
two-step models is that some form of accessible meaning must be computed and
pragmatically enriched prior to the completed interpretation.
One-step models, on the other hand, do not assume multiple, sequential steps
of processing. Instead, one or other of the meanings associated with the scalar term
would be retrieved and directly integrated into the sentence. It would not be necessary
to consider and reject one interpretation before deriving the other. One-step models
are more compatible with typical constraint-based models of processing in which
contextual, grammatical and other factors are all computed in parallel to provide the
best guess at the appropriate interpretation (e.g., MacDonald, Pearlmutter &
Seidenberg, 1994; Traxler, Pickering, & Clifton, 1998; van Gompel, Pickering,
Pearson, & Liversedge, 2005; and Degen & Tanenhaus, 2011, explicitly propose a
model of this sort for scalar implicatures). What we have in mind is a model that
determines whether to incorporate an upper-bound meaning¹ into the scalar term on
probabilistic grounds. Multiple cues would determine whether the upper-bound
interpretation would be derived, such as the presence of the partitive (of), a
conditional term (e.g., if), or negation terms (e.g., not). Crucially, however, the
classical Gricean apparatus necessary for deriving scalar implicatures, such as
considering what the speaker could have said, would not be represented in the model.
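A one-step, cue-driven decision of this kind can be sketched as a simple probabilistic classifier. The specific cue names and weights below are invented for illustration only; constraint-based accounts such as Degen and Tanenhaus's are far richer than this caricature.

```python
import math

# Hypothetical sketch of a one-step, cue-based decision: surface cues are
# combined probabilistically to decide whether the upper-bound reading is
# incorporated directly. Cue names, weights, and bias are invented.

CUE_WEIGHTS = {"partitive_of": 1.2, "conditional_if": -0.8, "negation": -1.5}
BIAS = 0.2

def p_upper_bound(cues):
    """Logistic combination of binary cues -> probability of 'some but not all'."""
    score = BIAS + sum(CUE_WEIGHTS[c] * int(present) for c, present in cues.items())
    return 1.0 / (1.0 + math.exp(-score))

# Under these toy weights, a partitive context ("some of the ...") raises the
# probability of the upper-bound reading; an embedded negation lowers it.
p1 = p_upper_bound({"partitive_of": True, "conditional_if": False, "negation": False})
p2 = p_upper_bound({"partitive_of": False, "conditional_if": False, "negation": True})
assert p1 > 0.5 > p2
```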
Both types of models can predict that costs of scalar implicature should arise
under some circumstances. In particular, one-step models account for response time
delays observed in verification tasks by assuming the delay is caused by the added
complexity of the upper-bound sentence representation. As we discussed above, the
upper-bound interpretation requires forming additional sentence representations
relative to the lower-bound interpretation, such as the “not all” component of some
[but not all] elephants are mammals. Participants might be slower to compute a
meaningful upper-bound sentence representation than a lower-bound sentence
representation; hence they would experience a longer delay before feeling confident
enough to produce a response. Two-step models can equally explain the delay by
assuming there is an extra processing step required in deriving the implicature relative
to the literal meaning. For example, participants might first need to access and enrich
the literal meaning when computing the upper-bound interpretation, but not when
they compute the lower bound. Both of these accounts are therefore compatible with
the sentence verification literature on scalar implicatures.²

¹ We refer to scalar sentences as having upper-bound meanings when they have
some but not all interpretations, and lower-bound meanings when they have the literal
meaning, or some and possibly all interpretations, consistent with Breheny et al.
(2006). The terminology refers to the scale having a bounded meaning at the upper
end of the semantic scale in the implicature case (something less than all).
This paper tests between the two accounts. While they both predict delayed
scalar implicatures, they make different predictions about participants’ reasoning
prior to the completed response. Two-step models predict that participants will pass
through a step in which they first access the literal meaning before enriching it to form
the scalar implicature, but one-step models predict that participants will directly
incorporate the implicature interpretation into the sentence representation. We used a
sentence verification paradigm in which participants had to classify scalar sentences
as true or false. The experimental sentences were underinformative, such as some
elephants are mammals, and so could be considered false if participants derived the
scalar implicature (the upper-bound interpretation was false) or true if participants did
not (the lower-bound interpretation was true), just as in Bott and Noveck (2004).
Instead of using a simple button press to collect responses, however, we used a
mouse-tracking paradigm. The mouse trajectories provide vital information about
participants’ reasoning prior to their final response in a way that would not be
observable with reaction times alone. In the following sections we introduce the
mouse-tracking paradigm and explain the predictions from the one-step and two-step
models in more detail.

² One exception to this might be the SAT results of Bott et al. (2012),
Experiment 2. Bott et al. argued that if complexity differences alone were responsible
for the observed delay of scalar implicatures relative to literal meanings, sentences
that were equally complex but differed in whether they contained an implicature
should be processed equally quickly. They tested this by comparing sentences with
only some, as in only some elephants are mammals, against the implicature sentences
with plain some, as in some [but not all] elephants are mammals. Instead of equal
processing time, however, they observed faster processing for the only sentences. A
one-step model therefore has to explain these results without resorting to an
additional stage of processing in the implicature sentence relative to the only sentence.
Mouse tracking
Mouse-tracking has been used successfully to investigate a broad range of
phenomena: phonetic categorization (Spivey, Grosjean, & Knoblich, 2005), restricted
relative clause ambiguities in syntax processing (Farmer, Anderson & Spivey, 2007),
prejudice and stereotypes in social cognition (Freeman & Ambady, 2010), lie
detection (Duran, Dale, & McNamara, 2010), and various other domains (see
Freeman, Dale & Farmer, 2011, for a more extensive review). Participants’ mouse
movements during responses reveal that attentional processes and decision-making
processes are not mutually exclusive and distinct; rather motor movements
demonstrate online integration of both at their earliest steps of processing, suggesting
people initiate motor movements towards targets in a manner similar to saccadic eye
movements (Spivey, 2007).
In a typical mouse-tracking experiment the mouse starts at the bottom of the
screen in the centre, with two targets placed in the top left and right corners, perhaps
two pictures, or “true” and “false” response options. The participant then hears or sees
a sentence and clicks on one of the targets at the top of the screen with the mouse. The
directness of the mouse trajectory from the bottom of the screen to the target provides
information about the underlying cognitive processes of the formulation and selection
of a response. When participants are confident of their response, mouse paths depart
early from the central axis and progress directly to the target. Conversely, when
participants have difficulty and/or are oscillating between two responses, they
generally hover around or move the mouse up the centre of the screen for a longer
period resulting in a less direct mouse trajectory towards the target. If participants
initially consider one option before ultimately selecting the other, mouse paths
initially head away from the ultimate target (e.g., Dale & Duran, 2011). Although
there are many advantages of mouse-tracking relative to simple button presses and
arguably advantages of mouse tracking relative to eyetracking (see Spivey & Dale,
2006; Spivey, Dale, Knoblich, & Grosjean, 2010), our reason for using mouse-tracking
is that the mouse path provides information about the interpretation processes
that occur prior to the completed interpretation (the ultimate response).
Overview of experiments and predictions
Participants read categorical sentences, such as some elephants are mammals,
and clicked on either “true” or “false.” The true response corresponds to the lower-bound
interpretation (at least some) and the false response to the upper-bound
interpretation (some but not all). In Experiments 1 and 3 participants received training
on one or other of the interpretations and in Experiment 2 they made a free choice
about which interpretation they thought was more appropriate. Participants were
allowed to move the mouse when the final word was presented and were instructed to
click on the targets at the top two corners of the screen as quickly as possible. Along
with accuracy, the x- and y- coordinates of participants’ mouse movements were
recorded over the time course of their response.
Predictions for the one- and two-step models are as follows. One-step models
assume that the appropriate interpretation of the scalar term is incorporated directly
into the sentence representation. Whether participants ultimately interpret some with
the lower-bound, as in some [and possibly all], or with the upper-bound, some [but not
all], they do not consider and reject a prior meaning. However, the added complexity
of the upper-bound interpretation means that participants require more time to come
to a decision about whether the sentence is true or false. In mouse tracking terms,
participants interpreting an upper-bound sentence should therefore move their mouse
further up the vertical axis than participants interpreting a lower-bound sentence. The
proportion of the mouse trajectory that favours neither a true interpretation nor a false
interpretation should be greater for the upper-bound interpretations than the lower-bound
interpretations. The left panel of Figure 1 illustrates these predictions.
Two-step models do not explain the delay observed for scalar implicatures
purely in terms of the differential complexity across interpretations. According to
two-step models, interpretations of the upper-bound sentence involve first accessing
an initial meaning before the completed implicature. Participants might start off by
pushing the mouse up the screen while decoding the sentence, but then, when
interpreting upper-bound sentences, they should first move their mouse towards
lower-bound targets (“true”) and then shift towards upper-bound targets (“false”) in a
secondary step. When deriving lower-bound interpretations, only one interpretation
would be needed and so the mouse should proceed directly to the target. This pattern
would look similar to the patterns when interpreting negation in Dale and Duran
(2011). The right panel of Figure 1 illustrates this prediction.
Experiment 1
Experiment 1 compared lower-bound and upper-bound interpretations of
sentences like some elephants are mammals. Participants read propositional
statements modified by quantifiers some or all, and decided whether each sentence
was true or false. Participants classified six types of sentences, including the critical
sentences, such as some elephants are mammals, and five control sentences. Critical
sentences were true if responses were based on the lower-bound meaning and false if
their responses were based on the upper-bound meaning. The five control sentence
types were the same as those of Bott et al. (2012) and were designed so that
participants could not predict whether the sentence was going to be true or false from
the quantifier-subject relationship. For example, on reading some elephants are,
participants were not able to predict the truth of the completed sentence.
To experimentally control the interpretation assigned to the critical sentences
and to make sure that participants understood the task, the study began with a training
phase (see Bott & Noveck, 2004, Experiment 1; and Rips, 1975). In the logical
condition, participants received feedback encouraging a lower-bound, logical
interpretation (a true response). In the pragmatic condition, feedback encouraged an
upper-bound, pragmatic interpretation (false). Feedback to the other types of
sentences was identical across conditions.
One-step and two-step models make different predictions about the mouse
trajectories for the critical sentences. One-step models predict that upper-bound
interpretations will be generally more complex than lower bound interpretations, but
not qualitatively different. Mouse trajectories (to the false response option) for critical
sentences should therefore depart from the zero point of the x-axis later for upper-bound
interpretations than trajectories (to the true response option) for lower-bound
interpretations but trajectories should look similar overall. In contrast, two-step
models predict an altogether different path for upper-bound interpretations than lower
bound interpretations. In the first step of processing the upper-bound interpretations,
mouse paths should be consistent with the lower-bound interpretation (true), and it is
only later, after further enrichment, that participants will move towards the upper-bound
response option (false).
Methods
Participants. Forty Cardiff University psychology students participated for
course credit. All were native speakers of English.
Stimuli. Sentence frames were of the form “Quantifier X are Y”. The
quantifier was either all or some, and X and Y were either exemplars or categories,
depending on the sentence type. Sentences were generated from 18 categories and 30
exemplars. We constructed six different types of sentences: 1) some critical (“Some
elephants are mammals”), 2) some true (superordinate) (“Some mammals are
elephants”), 3) all true (“All elephants are mammals”), 4) some false (“Some
elephants are insects”), 5) all false (“All elephants are insects”) and 6) some true
subordinate (“Some elephants are Indian”).
Design.
Participants were presented with a biasing context that encouraged either an
upper-bound or a lower-bound interpretation. The context consisted of a training
phase in which feedback (“correct” or “incorrect”) was given to reinforce either the
upper-bound or lower-bound interpretations of some. Participants were randomly
allocated to a pragmatic condition or a logical condition. No feedback was given in
the experimental phase.
The training phase consisted of 100 trials, 25 of which were critical items along
with 15 some true items, 15 some superordinate true items, 15 some false items, 15 all false items,
and 15 all true items. These items were different from the items used in the main
experiment and were not rotated across conditions.
Each item in the design was formed by using an exemplar, e.g., elephants, in
one of six versions corresponding to the six sentence conditions above. The design
was entirely within-subject and participants saw each exemplar in all six conditions.
In total, there were 30 exemplars by 6 conditions, making 180 sentences in the
experiment. All sentences were presented in a different random order for each
participant. True and False targets were counterbalanced across lists, whereby half of
the participants had true targets on the left and false targets on the right and the other
half of participants had the opposite arrangement.
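The design logic described above (30 exemplars × 6 conditions = 180 trials, shuffled per participant, with response sides counterbalanced) can be sketched as follows. The exemplar and condition names are placeholders, not the experiment's materials.

```python
import random

# Sketch of the stimulus-list logic: 30 exemplars x 6 sentence types
# = 180 trials, presented in a different random order for each participant,
# with true/false target sides counterbalanced across participants.

EXEMPLARS = [f"exemplar_{i}" for i in range(30)]
SENTENCE_TYPES = ["some_critical", "some_super_true", "all_true",
                  "some_false", "all_false", "some_sub_true"]

def build_trial_list(participant_id):
    trials = [(ex, st) for ex in EXEMPLARS for st in SENTENCE_TYPES]
    rng = random.Random(participant_id)  # different order per participant
    rng.shuffle(trials)
    # Counterbalance response sides across participants
    true_side = "left" if participant_id % 2 == 0 else "right"
    return trials, true_side

trials, side = build_trial_list(7)
assert len(trials) == 180
assert len(set(trials)) == 180  # each exemplar appears once per condition
```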
Procedure. The Mousetracker software (Freeman & Ambady, 2010) was used
to run the experiment. Participants received instructions prior to starting the
experiment. They were told to evaluate the truth of statements about the world by
either clicking on the TRUE or FALSE boxes at the top-left or top-right of the screen.
Positioning of the response boxes was counterbalanced across participants. However,
each participant had a consistent position of the boxes across the training and test
sessions. Participants started the trial by clicking on the START box at the bottom
center of the screen. Sentences were presented word by word in the middle of the
screen at a rate of 300ms per word. Participants could move their mouse toward the
response boxes at the onset of the presentation of the final word in the sentence. If
participants had not initiated any mouse movement within 500ms of the final word,
they received a warning telling them to respond more quickly.
Results
Preprocessing
For all of the experiments presented in this study, responses were considered
outliers if mouse trajectories were more than three standard deviations from the mean area
under the curve (see below) of all responses. We also removed incorrect responses
from all conditions. In Experiment 1, 1.5% of responses were excluded as outliers
and 7% as incorrect responses.
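In code, the exclusion rule amounts to something like the following sketch (the AUC values below are invented for illustration).

```python
import statistics

# Minimal sketch of the exclusion rule: a trial counts as an outlier if its
# area under the curve (AUC) lies more than three standard deviations from
# the mean AUC of all responses.

def filter_outliers(aucs, n_sd=3.0):
    mean = statistics.mean(aucs)
    sd = statistics.stdev(aucs)
    return [a for a in aucs if abs(a - mean) <= n_sd * sd]

aucs = [0.25] * 19 + [5.0]  # one extreme trajectory among typical ones
kept = filter_outliers(aucs)
assert 5.0 not in kept
assert len(kept) == 19
```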
To compare the mouse movements of responses with different response times,
e.g., a 900ms response vs. a 1200ms response, we normalized the time course of the
mouse paths into 101 time steps, as is standard in mouse tracking (Freeman &
Ambady, 2010). The time steps also provide a rough space of potential areas of
processing during the response: response initiation stages (time steps 1-25),
early/middle stages (time steps 26-50), middle/late stages (time steps 51-75), and late
stages (time steps 76-101). In general, the majority of response initiation stages
cluster around the start button, i.e. participants rarely initiate their response in the first
couple of time steps, and the majority of time steps in the late stages cluster around
the response button because participants generally stop moving the mouse before
initiating a click. The coordinates corresponding to each time-step quartile and each
condition are shown in Appendix 1.
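The time normalization can be sketched as linear interpolation onto 101 equally spaced steps; the Analyser program's exact implementation may differ in detail.

```python
import numpy as np

# Sketch of the time normalization: resample each raw (x, y) trajectory,
# whatever its duration, onto 101 equally spaced time steps by linear
# interpolation, so that responses of different durations (e.g., 900 ms
# vs. 1200 ms) can be compared step by step.

def time_normalize(xs, ys, n_steps=101):
    t_raw = np.linspace(0.0, 1.0, num=len(xs))
    t_new = np.linspace(0.0, 1.0, num=n_steps)
    return np.interp(t_new, t_raw, xs), np.interp(t_new, t_raw, ys)

x_norm, y_norm = time_normalize([0.0, -0.1, 0.6, 1.0], [0.0, 0.5, 1.0, 1.5])
assert len(x_norm) == 101 and len(y_norm) == 101
```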
At each time step the position of the mouse was represented by (X, Y)
coordinates, with X ranging from -1 to 1 and Y ranging from 0 to 1.5 as shown in
Figure 1. This 2 x 1.5 rectangle roughly corresponds to classical CRT monitors and
facilitates comparisons across different screen resolutions. The Analyser program in
the Mousetracker suite (Freeman & Ambady, 2010) was used to perform the
normalization.
Measures
We assessed mouse trajectories on two measures: area under the curve (AUC)
and deviation towards the competitor response. AUC provides a general measure of
processing difficulty for each condition (Freeman & Ambady, 2010). It is calculated
by computing an idealized response trajectory (a straight line between each
trajectory’s start and end points) and determining the geometric area between the
actual trajectory and the idealized trajectory. Mouse paths that deviate far from the
ideal trajectory will have a large AUC, and the more difficult the task, the greater the
AUC (Farmer, Anderson, & Spivey, 2007; Freeman, Ambady, Rule, & Johnson,
2008). Response time differences observed by Bott and Noveck (2004) and others
might therefore be reflected in AUC.
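As a sketch, AUC can be computed as the integrated perpendicular distance between the observed trajectory and the idealized straight line from start to end point; the Analyser program's exact computation may differ in detail.

```python
import numpy as np

# Sketch of the AUC measure: the geometric area between a trajectory and
# the idealized straight line joining its start and end points, here
# approximated by trapezoidal integration of the perpendicular distance
# over the normalized time steps.

def auc(xs, ys):
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    start = np.array([xs[0], ys[0]])
    end = np.array([xs[-1], ys[-1]])
    direction = (end - start) / np.linalg.norm(end - start)
    pts = np.stack([xs, ys], axis=1) - start
    # Perpendicular distance of each point from the ideal straight line
    deviation = np.abs(pts[:, 0] * direction[1] - pts[:, 1] * direction[0])
    dt = 1.0 / (len(xs) - 1)
    return float(np.sum((deviation[:-1] + deviation[1:]) / 2.0) * dt)

straight = auc([0.0, 0.5, 1.0], [0.0, 0.75, 1.5])  # on the ideal line
curved = auc([0.0, -0.5, 1.0], [0.0, 0.75, 1.5])   # deviates leftward
assert straight < 1e-9 < curved
```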
While AUC is a good overall measure of processing difficulty, it does not
discriminate between the one-step and two-step models discussed above. Both models
predict that trajectories for upper-bound interpretations will deviate more from an
idealized trajectory than trajectories for lower-bound interpretations. To better
distinguish these two accounts, we also analysed the extent to which trajectories
initially veered away from the target response towards the competitor response.
Deviation towards the competitor response (Xneg) was computed for each time step
as the deviation along the X axis in the direction of the incorrect response (i.e.
towards “true” for the upper-bound interpretations of the critical sentences, and
“false” for lower-bound interpretations of these items). Two-step models predict
substantial deviation towards the competitor response, but one-step models do not.
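A minimal sketch of the Xneg measure, assuming trajectories toward a left-side target are mirrored first so that the target always lies at positive x:

```python
import numpy as np

# Sketch of the Xneg measure: at each normalized time step, how far the
# x-coordinate has strayed toward the competitor response. With the target
# at positive x, any negative x value counts as deviation toward the
# competitor; values on the target side contribute zero.

def competitor_deviation(xs, target_side="right"):
    xs = np.asarray(xs, dtype=float)
    if target_side == "left":
        xs = -xs  # mirror so the target is always at positive x
    return np.minimum(xs, 0.0)

xneg = competitor_deviation([0.0, -0.1, -0.24, 0.3, 0.9])
assert xneg.min() == -0.24  # largest pull toward the competitor
assert xneg[-1] == 0.0      # response ends on the target side
```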
Analysis
Figures 2 and 3 display mouse paths for participants in the logical and
pragmatic conditions, respectively. Mouse paths for the some critical sentences are
shown by the dark crosses. In the logical condition the critical sentences are true.
Correspondingly, the mouse paths display a simple arc towards the true response
option. Conversely the critical sentences for the pragmatic condition are false. Here,
however, the mouse paths substantially deviate towards true before crossing back
over the medial axis towards false. This abrupt shift in the pragmatic condition for
critical sentences supports a two-step processing model. While there are several
control sentences that deviate slightly towards the opposing response before returning
to the correct response, such as the some subordinate true, these deviations appear
much smaller and occur earlier in the paths than the deviation for the pragmatic
sentences. These are likely to be due to response bias effects, as we discuss below.
To establish whether these patterns were significant, we first tested for
differences in AUC across sentence type and training. We used a linear
mixed-effect regression model (Baayen, Davidson, & Bates, 2008) with sentence type
(some critical, some false, some subordinate true, some (superordinate) true, all false,
all true) and training (pragmatic vs. logical) as predictors. This revealed a significant
interaction between sentence type and training, t = 9.68, p < .001. The AUC for each
of the conditions is shown in Table 1. In particular, AUC for the critical sentences
was significantly greater for upper-bound interpretations than lower-bound ones, t =
26.60, p < .001, suggesting greater complexity for upper-bound interpretations. This
is consistent with Bott and Noveck (2004) and others who found that participants
experience delays when deriving upper-bound interpretations.
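For readers unfamiliar with the measure, AUC can be sketched in a few lines. The following is an illustrative pure-Python version of the standard mouse-tracking area-under-the-curve measure; the function name and toy trajectory are ours, and the reported analyses were of course run with linear mixed-effects models over the full data.

```python
def trajectory_auc(xs, ys):
    """Signed area between an observed, time-normalized mouse trajectory and
    the straight line joining its start and end points. Larger |AUC| indicates
    greater deviation (e.g., attraction towards the competitor response).
    An illustration only; the exact implementation may differ."""
    n = len(xs)
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = (dx * dx + dy * dy) ** 0.5
    devs = []
    for i in range(n):
        t = i / (n - 1)
        ix = xs[0] + t * dx          # idealized straight-line point
        iy = ys[0] + t * dy
        # Signed perpendicular distance of the observed point from the line.
        devs.append((dx * (ys[i] - iy) - dy * (xs[i] - ix)) / norm)
    # Integrate the deviations along the ideal line (trapezoid rule).
    step = norm / (n - 1)
    return sum((a + b) / 2 * step for a, b in zip(devs, devs[1:]))

# A path that bows one unit sideways encloses area 1.0 with its chord:
print(trajectory_auc([0, 1, 2], [0, 1, 0]))  # → 1.0
```

A perfectly straight path yields an AUC of zero, which is why greater AUC for upper-bound responses indicates greater deviation from a direct movement to the target.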
We also analysed the x-coordinates to establish whether they were significantly
negative (i.e., in the competitor half of the trajectory). For pragmatic participants, the
largest mean negative deviation for the critical sentences occurred at time step X_43,
M = -.24. A one-sample t-test showed that this was significant for participants, t(20)
= 6.03, p < 0.001, and items, t(29) = 9.98, p < 0.001. In contrast, there were no
negative mean x-coordinates for the critical sentences in the logical condition.
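The logic of this analysis can be sketched as follows. This is an illustration with hypothetical data; the function name is ours, and the reported analysis additionally tested each mean against zero with one-sample t-tests by participants and items.

```python
def largest_negative_deviation(trajectories):
    """Find the time step with the largest negative mean x-coordinate across
    a set of time-normalized trajectories (negative x = the competitor half
    of the screen). Returns (step_index, mean_x), or None if no step has a
    negative mean."""
    n_steps = len(trajectories[0])
    n_trials = len(trajectories)
    means = [sum(t[i] for t in trajectories) / n_trials for i in range(n_steps)]
    step, mean_x = min(enumerate(means), key=lambda pair: pair[1])
    return (step, mean_x) if mean_x < 0 else None

# Two hypothetical trajectories that dip towards the competitor mid-flight:
trials = [[0.0, -0.125, -0.25, 0.5, 1.0],
          [0.0, -0.25, -0.5, 0.25, 1.0]]
print(largest_negative_deviation(trials))  # → (2, -0.375)
```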
Several of the control sentences also displayed significant negative deviations.
For the logical condition, all false and some false conditions displayed small negative
deviations, and for the pragmatic condition similar negative deviations were observed
for the some subordinate true condition. These negative deviations were both smaller
than those associated with the critical sentences, all t1(19)’s > 3.06, p’s < .001, all
t2(29)’s > 3.44, p’s < .001. The deviations also occurred much earlier in the
trajectories than the critical-sentence deviation: the control deviations occurred at
X_23, X_25, and X_28, respectively, whereas the critical-sentence deviation occurred at X_43.
We suggest that these were due to response biases caused by an imbalance in
the ratio of true to false responding in the training phase. For logical sentences, there
was a predominance of true sentences overall, hence participants displayed negative
mouse paths whenever they responded false. For pragmatic participants, while there
was a more equal ratio of true to false sentences overall, sentences that involved some
and a subordinate item (e.g., elephants) were predominantly false, hence there were
initial trajectories towards false. Our view that these effects are due to response biases
is driven partly by the size and timing of these effects (small and early relative to the
negative paths in the critical sentences), and, to foreshadow the results of Experiment
2, partly because we do not replicate the negative trajectories when we reduce the
imbalance in true/false responding.
Discussion
Mouse paths in the pragmatic condition deviated towards the competitor
response (true) prior to the correct response (false) for upper-bound interpretations.
No such deviation, however, was observed for lower-bound interpretations. Initially,
mouse trajectories for upper-bound responses reliably headed towards true and
only later corrected towards false. Because participants initially moved the mouse
towards true, this strongly suggests that they did not initially access the
upper-bound meaning of the scalar term. These results are consistent with two-step
models, in which participants begin with a lower-bound interpretation and only
subsequently derive an upper-bound interpretation for the critical sentences. The
results are not consistent with one-step models, which predict monotonic trajectories
in the direction of false responses for upper-bound interpretations.
In this study, participants were trained to respond to the critical sentences in a
particular way. It is conceivable that participants might ordinarily be inclined to
respond true to our experimental sentences, but in the pragmatic condition they
responded false in order to conform to the experimental demands introduced by the
feedback during the training phase. This could explain an initial deviation towards the
true response, followed by a later, metalinguistic deviation towards false. In order to
test the possibility that our results were an artifact of task demands, we repeated
Experiment 1 but without providing participants with feedback on the critical
sentences.
Experiment 2
Experiment 2 compared lower-bound and upper-bound interpretations of
sentences but in an unbiased context. Participants had to verify categorical sentences,
just as they did in Experiment 1, but, crucially, they did not receive feedback on their
responses to the critical sentences. There were therefore no experimenter demands to
make false responses to the critical sentences and participants could respond with
their most natural interpretation.
We introduced one further change in this experiment relative to Experiment 1.
In Experiment 1, there were three true control sentences and two false control
sentences. This meant that there was a response bias towards true responding,
especially in the logical condition after training. While it is unlikely that response
biases could explain the trajectories that we observed for the critical sentences (the
effects on the critical sentences were later and stronger than those associated with
response biases), we wanted to eliminate this possibility. We therefore replaced one
of the true control sentences, some subordinate true, with another false control
condition, all super false, such as all mammals are elephants (this meant that the
control sentences we used were identical in form to those of Bott and Noveck, 2004),
and adjusted the proportions of the sentences assigned to each condition so that there
was an equal ratio of true to false items overall (100 trials in the practice phase, 50:50
true:false). The ratio of true to false was also balanced within each quantifier so that
there was an approximately equal ratio within some sentences (25:20) and within all
sentences (25:30). The critical sentences were not included in these calculations
because participants were able to choose which interpretation they preferred.
Method
Participants. Thirty-two undergraduate students from Cardiff University
participated in Experiment 2 in exchange for cash payment. All were native speakers
of English.
Stimuli. We used the same stimuli base as Experiment 1. However, because we
changed one of the control conditions, the sentences presented in this experiment
differed from those in Experiment 1.
Design. All participants underwent a practice phase of 100 trials. These trials
consisted of 15 all super false sentences, 25 some true sentences, 20 some false
sentences, 15 all false sentences, and 25 all true sentences. The breakdown was done
in this way to ensure that each quantifier had roughly an equal chance of being true or
false. No critical sentences were included in the practice session.
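As a quick check of the arithmetic, the counts above yield the ratios stated earlier (a sketch; the condition labels follow the Design section):

```python
# Practice-phase trial counts for Experiment 2, as (condition, correct response).
practice = {
    ("all super false", "false"): 15,
    ("some true", "true"): 25,
    ("some false", "false"): 20,
    ("all false", "false"): 15,
    ("all true", "true"): 25,
}

total = sum(practice.values())
trues = sum(n for (cond, resp), n in practice.items() if resp == "true")
some_t = sum(n for (c, r), n in practice.items() if c.startswith("some") and r == "true")
some_f = sum(n for (c, r), n in practice.items() if c.startswith("some") and r == "false")
all_t = sum(n for (c, r), n in practice.items() if c.startswith("all") and r == "true")
all_f = sum(n for (c, r), n in practice.items() if c.startswith("all") and r == "false")

print(total, trues, total - trues)  # → 100 50 50 (overall 50:50 true:false)
print(some_t, some_f)               # → 25 20 (within some)
print(all_t, all_f)                 # → 25 30 (within all)
```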
The main experiment contained 180 trials. There were six conditions and all
conditions had an equal number of trials. Thirty exemplars were selected equally
across different category types and participants were presented with the six different
sentence types built around each exemplar (see Appendix A). Participants did not
receive any feedback during the main experiment.
Procedure. The experimental procedure was the same as Experiment 1,
although participants were not trained on critical items.
Results
Preprocessing
We removed 1.8% of the responses as outliers. A further 3% of responses were
removed from the control conditions because they were incorrect.
Accuracy
Accuracy rates are shown in Table 2. Participants answered the control
questions extremely accurately, as they did in Experiment 1. For the critical
sentences, participants responded with a slight bias towards upper-bound
interpretations (false) but the sample still generated a high proportion of lower-bound
responses. Overall the accuracy rates are similar to Bott and Noveck (2004;
Experiment 3), who also observed a 59% upper-bound response rate.
Mouse tracking
Figures 4 and 5 show the average mouse trajectories for the false and true
responses respectively. For the critical sentences, the data are the average mouse
paths to true or false respectively. The pattern is very similar to the results of
Experiment 1: Upper-bound responses deviate towards the competitor response,
whereas lower-bound responses do not.
We analysed the results using a repeated measures, mixed model design.
Predictors were quantifier (all, some control, some critical), and response (true,
false). Because all had two false response conditions (all false and all super false),
two separate analyses were done with each of these two conditions in the all false
cell.
Analysis of AUC revealed a significant interaction of quantifier and response, t
= 9.68, p < .001 (with all false) and t = 7.24, p < .001 (with all super false). In
particular, AUC was significantly greater for upper-bound interpretations than
lower-bound ones, t = 18.31, p < .001, again suggesting greater deviation for upper-bound
interpretations.
To assess what was causing the differences in AUC across conditions, we
examined the negative x-coordinates, just as we did in Experiment 1. For the upper-bound
responses, the largest negative deviation was at time step X_41, M = -.27,
which was significantly different from zero for participants, t(28) = 3.78, p < 0.001, and items, t(29) = 6.99, p <
0.001. The data from 3 participants were not included in this analysis because they
responded “true” to all of the critical sentences.
One potential concern with the repeated measures analysis is that it includes
participants who might be very unsure of their responses. Participants who are
generally unsure may deviate towards either of the response options prior to the final
decision. This effect may be much greater for upper-bound responses (false) because
participants could have a general response bias towards true. To avoid this possibility,
we divided participants into logical responders and pragmatic responders depending
on whether the majority of their responses (at least 65%) were lower-bound (logical) or
upper-bound (pragmatic). This generated two groups of participants who were
relatively confident about their responses and on which we could conduct an analysis
similar to that of Experiment 1.
Classifying on the basis of the majority of responses resulted in 14 out of 32
logical responders, 14 pragmatic responders, and four participants who were not
included in this analysis because neither their lower-bound nor their upper-bound
responses exceeded 65%. We removed the minority
“incorrect” responses from both groups. Analysis of AUC revealed a significant
interaction between group and condition, t = 8.77, p < .001 (with all false) and t =
6.84, p < .001 (with all super false). In the analysis of negative x-coordinates, the
upper-bound group significantly deviated towards true at X_41 (M = -15.2), t1(13) =
4.22, p < 0.001, t2(29) = 6.77, p < 0.001. The lower-bound group had no negative
coordinates, nor did any of the control conditions apart from the some subordinate
true condition, although its deviations did not differ significantly from zero. In
short, analysis of the between-subject data leads to the same conclusion as the within-subject analysis.
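The classification rule just described can be sketched as follows (an illustration with hypothetical data; the function name and data format are ours):

```python
def classify_responder(responses, threshold=0.65):
    """Classify a participant from their responses to the critical sentences.
    'false' = upper-bound (pragmatic); 'true' = lower-bound (logical).
    Returns None when neither response type reaches the threshold, in which
    case the participant is excluded from the between-group analysis."""
    prop_false = sum(r == "false" for r in responses) / len(responses)
    if prop_false >= threshold:
        return "pragmatic"
    if 1 - prop_false >= threshold:
        return "logical"
    return None

# Hypothetical participant answering "false" on 20 of 30 critical trials (~67%):
print(classify_responder(["false"] * 20 + ["true"] * 10))  # → pragmatic
```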
Discussion
Experiment 2 examined mouse trajectories for spontaneous lower-bound and
upper-bound interpretations of critical sentences. The mouse paths for upper bound
responses were similar to those of Experiment 1, in which participants were trained to
interpret these sentences one way or another. In both experiments mouse paths of the
upper-bound responses deviated towards the competitor response before completing
their trajectories towards false, but the same pattern was not observed for the lower-bound responses. Experiments 1 and 2 are therefore consistent with participants
processing scalar implicatures in two steps.
One explanation for our findings is that the initial attraction towards the true
response might arise from a mismatch between the truth of the embedded proposition
(elephants are mammals, true) and the truth of the quantified sentence (some
elephants are mammals, false in the pragmatic condition). If participants delayed
processing of the quantifier until after they had fully processed the embedded
proposition, they would display initial deviations towards true in exactly the pattern
that we observed. Delayed processing of the quantifier would be an entirely sensible
processing strategy within our task and in other, more naturalistic situations. Content
words carry by far the most information in any conversational exchange and it makes
sense for listeners to focus their attention on these components of the sentence.
Furthermore, quantifiers need elements over which they quantify, and to some degree,
people cannot fully interpret quantifiers until they know what it is that is being
quantified.
In Experiments 1 and 2 the lower-bound meaning of a sentence with some was
the same as the meaning of the embedded proposition without the quantifier, so some
elephants are mammals (lower-bound) means elephants are mammals, and so forth.
Not processing the quantifier appears to be exactly what one does to understand the
lower-bound meaning of the critical sentences. However, if participants were ignoring
quantifiers in general, or processing them only after the end of the sentences, the same
effects should be seen for other sentences (not just for the critical condition). In that
case, sentences with propositions like mammals are elephants would give rise to
mouse movements in the same initial direction regardless of the quantifier. Yet results
from such sentences in Experiment 2 show trajectories moving directly towards true
when the quantifier is some and towards false when the quantifier is all.
Although we did not observe evidence for delayed processing of quantifiers in
sentences based on propositions like mammals are elephants, it is conceivable that
these propositions are confounded by idiosyncratic factors that differentiate them
from the critical experimental sentences of Experiments 1 and 2. In particular, the
subject-predicate relationship was different in these control sentences (supercategory-exemplar) than it was in the some critical sentences (exemplar-supercategory). We
therefore designed Experiment 3 to test whether nonmonotonic mouse paths would be
observed with sentences that involved the same form of embedded proposition as
the critical sentences but used a different quantifier.
Experiment 3
In Experiment 3 we compared the critical sentences from Experiments 1 and 2
with sentences that used no as the quantifier, such as no elephants are mammals. The no
quantifier reverses the truth of the embedded sentence, just as with the upper-bound
interpretation of the some critical sentences, but it does not involve a scalar
implicature. If the nonmonotonic mouse paths are due to the mismatch between
sentence truth and embedded proposition, nonmonotonic mouse paths should also be
observed with the comparable no sentences.
All participants received upper-bound training, just as in the pragmatic
condition of Experiment 1, so that the some critical sentences were false. In addition
to the some critical sentences (some elephants are mammals) and the no comparison
sentences (no elephants are mammals), we used two some control sentences, two no
control sentences, and two all control sentences.
Method
Participants. Twenty-four undergraduate students from Cardiff University
participated in exchange for cash payment. All were native speakers of English.
Stimuli. We used three some conditions: some false (“some elephants are
reptiles”); some true (“some elephants are Indian”); and some critical; and three no
conditions: no false (“no elephants are Indian”); no true (“no elephants are cars”); and
no comparison (“no elephants are mammals”). We also used the two all conditions
from Experiment 1, all true (“all elephants are mammals”) and all false (“all
elephants are reptiles”). Because of the high proportion of false responses this design
would have generated, however, we added 30 more all true sentences as fillers. These
used new exemplars and were not counterbalanced with the main conditions of the
experiment.
Because the number of conditions was increased from six to eight, eight
sentences in each condition were dropped so that the experiment took about the same
length of time as the first two experiments (~25 minutes).
Design. The experiment used a within-participants design. All participants
underwent a practice phase of 100 items, which consisted of 10 all false items along
with 12 some true items, 10 some false items, 20 some critical items, 12 all true
items, 12 no comparison items, 12 no false items, and 12 no true items.
The main experiment contained 206 trials. Of these, 176 were experimental
trials and the remaining 30 were filler trials. The experimental trials involved counterbalanced items distributed equally across the eight conditions (22 trials per condition).
The filler trials were designed to counter-balance the number of true and false
responses and were all of the all true form.
Procedure. The experimental procedure was the same as the Experiment 1
pragmatic condition in that participants were trained to respond to some critical items
as false.
Results
Preprocessing
We removed 1.3% of responses as outliers. A further 6% of responses were
removed because they were incorrect. Table 8 shows the accuracy rates for all
conditions.
Mouse tracking
Figures 7 and 8 show the average mouse trajectories for the false response
conditions and true response conditions respectively. The pattern is very similar to the
results of Experiment 1: Upper-bound responses deviate towards the true responses
before arriving at false responses. Of principal interest for this experiment, however,
were the no comparison sentences. The average trajectory for these sentences does
not look like the trajectory for the critical some sentences. Only the critical some
sentences show deviation towards the competitor response; the no comparison
sentences push up the vertical axis initially, but when the trajectory deviates from the
axis it heads for the target response, not the competitor response.
We analyzed AUC using quantifier (some, no) and sentence type (true, false,
comparison) as predictors along with subjects and items as random effects. We
omitted the all sentences from the global analysis because our hypotheses concerned
the some and no quantifiers, and because the all sentences involved only two levels
(true, false). This analysis revealed a significant interaction between quantifier
and sentence type on AUC, t = 13.85, p < .001. The AUC for each of the conditions is
shown in Table 3.
We further analyzed the x-coordinates to establish whether they were
significantly negative (i.e., in the competitor half of the trajectory). For critical
sentences, the largest mean negative deviation occurred at time step X_40, M = -.23.
A one sample t-test showed that this was significant for participants, t(23) = 5.47, p <
0.001, and items, t(21) = 10.21, p < 0.001. Conversely, the no comparison sentences
showed no signs of the negativity shown by the critical sentences.
Two control conditions showed negative mean x-coordinates during part of the
average response paths. First, the all false condition, shown in Figure 7, had a
slight deviation towards true. The largest average negative deviation was at time step
X_30, M = -.07. However, one-sample t-tests revealed the pattern was not significant,
t1(23) = 1.65, p = 0.11, t2(21) = 1.75, p = 0.09. The trend most likely reflects the
result of having 30 extra all true items as fillers to counterbalance an otherwise
overwhelming bias towards false responses. Second, sentences in the no true condition, such as
no elephants are reptiles, also showed negative deviation away from the target. The
largest average deviation was at time step X_46, M = -.30 and the one sample t-test
shows that this was significant for both subjects, t(23) = 7.42, p < 0.001, and items,
t(21) = 17.49, p < 0.001. We discuss this result below.
Discussion
Experiment 3 tested whether the nonmonotonic mouse paths for the some critical
sentences arose because the truth of the embedded proposition differed from the
truth of the quantified sentence. We compared some critical sentences against other
sentences that had the same mismatch between sentence truth and embedded
proposition and the same exemplar category relationship, but lacked the scalar
implicature. These were sentences that involved no, as in no elephants are mammals.
Whilst we replicated the nonmonotonic mouse paths for the some critical sentences,
we did not find any evidence of nonmonotonic paths for comparable no sentences. We
can therefore rule out the possibility that the effects we observed with some were due
to a general mismatch between sentence truth and embedded proposition.
Interestingly, the no true sentences, such as no elephants are reptiles, showed
significant, negative deviations. Participants initially directed their mouse towards
false, and then reversed the trajectory to click on true. It is not clear to us why this
pattern occurred. There is likely some response bias towards false for no sentences,
since 2/3 of these were false, but the deviation appears much stronger than other
response bias effects we have observed. Another possibility is that for the no-true
condition, participants were processing these sentences in two steps, with the
affirmative proposition processed before the negation (as in a classical linguistic model
of negation, see Wason & Johnson-Laird, 1972; and see Clark & Chase, 1972, and
Dale & Duran, 2010, for similar mousetracking results to our own). We cannot
determine which of these explanations is responsible for the no true deviation but it is
important to remember that the only comparable no condition to the critical some
sentences did not show any negative deviation. When the subject-predicate
relationship was equivalent across the some and no conditions, as in some elephants
are mammals and no elephants are mammals, the two sentences were processed in
very different ways. A delayed quantifier account cannot fully explain why strong
negative deviations were observed for the some critical sentences but not for the no comparison sentences.
General Discussion
Our goal in these experiments was to test between two processing models of
scalar implicatures. According to one-step models, either an upper bound or lower
bound interpretation can be derived directly; differences in processing speeds between
upper- and lower-bound interpretations (e.g., Bott & Noveck, 2004; Bott, Bailey, &
Grodner, 2012; Huang & Snedeker, 2009; Breheny, Katsos, & Williams, 2006) are
attributed to the presumed extra complexity of upper bound interpretations relative to
lower-bound interpretations. In contrast, two-step models of implicatures attribute
differences in processing speeds to an extra processing step inherent in the derivation
of implicatures. According to the two-step model, lower-bound, literal interpretations
are fundamental and are generated automatically; in appropriate contexts, upper-bound
interpretations are subsequently derived from the corresponding lower-bound
interpretations. Across three experiments, we found that when participants made
upper-bound interpretations they first deviated towards the lower-bound response
option (true), before clicking on the upper-bound response option (false); in contrast,
when participants made lower-bound interpretations they went directly towards the
target response option.
These results indicate that delayed upper-bound responses observed in other
studies (e.g., Bott & Noveck, 2004; Bott, Bailey, & Grodner, 2012; Feeney,
Scrafton, Duckworth, & Handley, 2004; Noveck, 2001; Noveck & Posada, 2003;
Pijnacker, Hagoort, Buitelaar, Teunisse, & Geurts, 2009) were not due to general
indecision or lack of evidence, but due to a belief that the sentence was true prior to
believing that the sentence was false. In short, the results support a two-step model of
implicature processing.
Context and generalization
The hypotheses tested by Bott and Noveck (2004), Breheny et al. (2006) and
Huang and Snedeker (2009) all involved testing a default implicature model (motivated
by Levinson, 2000). Part of the reason that this hypothesis was tested is that it made
very strong predictions about when the implicature was derived; specifically, a default
model predicts that an implicature is derived on every occasion (and then sometimes
cancelled). A similar question that arises from our data is whether Step 1 is an
obligatory processing step or whether there are some contexts in which this step need
not occur and the implicature could be derived directly. Perhaps, for example, there
are contexts in which the not all component of an implicature is far more salient than
it was in ours, in which case people might be able to jump immediately to Step 2
processing.
Part of the answer to this question can already be seen in the data from Experiment
2. In this experiment, participants were not provided with feedback about whether
they should derive the lower-bound or the upper-bound interpretation. This
experiment therefore provides a test of whether the task generated a context in which
participants were biased towards one of the interpretations. Our results, however,
suggest that there was no strong contextual bias either way. Participants made
upper-bound responses on approximately 60% of trials, the same rate as in Bott and Noveck
(2004). Furthermore, when we analyzed participants who responded with a majority
upper-bound response, those participants also displayed significant negative
deviation. Thus, even participants who find the context strongly biased towards an
upper-bound interpretation deviated towards true before clicking on false. While we
cannot eliminate the possibility that there is an even stronger context that would
obviate the need for Step 1 processing, it is clear that our task is not strongly lower-bound biased (otherwise participants would have responded "true" all of the time), and that
there are participants who are very confident that the general context points to an
upper-bound interpretation.
The more general answer to the question of whether Step 1 processing is
obligatory depends on the cognitive interpretation of Step 1 and Step 2 processing.
This in turn depends on the theories underpinning scalar implicatures. We discuss
these possibilities below.
Implications for models
Our results illustrate that participants interpret upper-bound meanings in two
steps but lower-bound meanings in a single step. There are a variety of explanations
for what the steps mean cognitively, however. Here we discuss this issue and the
implications for theories of implicatures. We first discuss a classical Gricean account,
followed by Relevance Theory (Sperber & Wilson, 1986) and then grammatical models of
scalar implicatures (typified by Chierchia, 2004, 2006, and Chierchia, Fox, & Spector,
2008). We conclude with a discussion of possible constraint-based models of scalar
implicatures.
Our data are generally consistent with a direct implementation of the classical
Gricean account of implicature processing. According to this view, when participants
derived upper-bound interpretations, they first decoded the words and applied the
normal rules of semantic composition to derive a coherent, literal sentence
interpretation. Because this resulted in a sentence that was true, participants directed
their mouse towards the true response option (Step 1). At some point during this
mouse movement, however, the sentence was compared with what the speaker could
have said (what the sentence could have been), participants applied the quantity
maxim, and finally derived the not all inference. At this point, Step 2, participants
computed that the sentence was false and changed trajectories towards the false
response option.
The Gricean processing model provides a good explanation for our data.
Nonetheless, there are difficulties in translating the classical Gricean model into
processing terms. One issue is that in order to derive the implicature, participants
must compare the informativeness of the literal meaning with the informativeness of
the alternatives. In this view, Step 1 and Step 2 are discrete processing stages that are
executed sequentially in a strict, logical sense (Step 2 cannot occur without the output
of Step 1). Such strict modularity, however, is inconsistent with current conceptions
of psycholinguistics that emphasize the incrementality of comprehension. Perhaps
more problematically, a strict modular account may not be internally coherent. Pure
modular accounts require that discrete representations, and the ensuing response
decisions, must be computed sequentially and entirely before a motor action is
initiated. From this perspective it is hard to explain early deviations towards the
competitor response, because these seem to involve premature initiation of motor
action before the response decision is complete (Spivey, 2007). A less modular, more
incremental account is that participants could be enriching parts of the literal meaning
automatically, that is, Step 2 processing could begin before the output of Step 1 was
completed. This account would suggest that the use of informativeness of the
proposition to derive the implicature could be integrated into the implicature process
at a much earlier stage than a direct Gricean account would suggest. Future research
needs to examine more precisely the minimal amount of information needed for
pragmatic enrichment to begin.
Relevance theory (Sperber & Wilson, 1986) also offers accounts of how scalar
implicatures are derived (see Carston, 1998; Noveck & Sperber, 2007). Relevance
Theory states that all linguistic messages are first decoded for conceptual content and
then they are pragmatically enriched. The enrichment procedure should occur
gradually until a listener has reached an internal criterion for relevance. The relevance
criterion is determined by a trade-off between the costs to derive the inference and the
benefits in doing so. Relevance theory assumes that scalar implicatures are just one
example of pragmatic inferences that are derived in order to maximize the perceived
relevance, no different to any other type of inference. This can be contrasted with a
Neo-Gricean view (e.g. Horn, 1972, Levinson, 2000), say, in which scalar
implicatures are linked to particular lexical terms, or with a Gricean view that
assumes scalar implicatures invoke particular maxims (Quantity). To what extent,
then, is Relevance theory consistent with a two-step model? Although there is a
general emphasis on parallel, gradual enrichment, rather than the abrupt changes in
interpretation that we observe, Relevance assumes that there is an initial decoding
stage in which context-independent meaning is retrieved. The enrichment process
applies after the decoding stage. One possibility for explaining our results, then, is
that the initial trajectories towards true correspond to the stage during which
participants decode the linguistic input, and that the change in mouse trajectory
happens after the enrichment process begins. An alternative is that the enrichment
process begins much earlier but that there is a gradual build up of information during
the negative trajectories. At some point the information gleaned from the enrichment
process would indicate that an upper-bound interpretation is most likely, and the
mouse trajectories would change. While both of these accounts are consistent with
Relevance, they also raise questions about the enrichment process itself. For example,
it is not clear what computations the enrichment process undertakes. A Gricean
account might include computing alternatives, comparing informativeness of those
alternatives, and engaging in deductive inference, all of which could be seen as
time-consuming, but Relevance Theory does not propose that these are part of the
enrichment process. Second, it is not clear why the output during the enrichment
process should be consciously accessible for the upper-bound interpretations.
Enrichment is described as unconscious, yet participants' mouse trajectories were
directed at true. There is no reason why the mouse paths could not have moved
vertically up the y-axis, as in the left panel of Figure 1, or indeed as in some of our
control conditions.
An alternative to both Relevance theory and the Gricean model comes from accounts
that consider scalar implicatures to be part of compositional semantics, rather than
post-compositional pragmatics (see Chierchia, 2004, 2006; Chierchia, Fox, & Spector,
2008; although note that none of these authors have explicitly made processing predictions).
These models propose that there is a silent only operator that applies to the scalar term
at the earliest possible opportunity, that is, during the compositional process. Such
accounts do not assume that the processor necessarily engages in Gricean reasoning to
derive the implicature, but that the only operator activates some pre-compiled set of
operations to exclude stronger alternatives. From this perspective, our results suggest
that the only operator must be applied at a fairly late stage in the compositional
process: an initial interpretation of the sentence is formed first, driving initial
mouse trajectories towards the logical response option in our task, and the only
operator applies afterwards. Crucially, because the only operator is part of the
composition process, this account requires emerging interpretations to be accessible
throughout the composition process, or at least before composition is complete.
Finally, we consider potential constraint-based and statistical models of
language processing. Because there has been no detailed specification of such a
model, we restrict our discussion to models that assume the implicature is determined
using probabilistic cues, and that do not have an explicit representation of pragmatic
maxims. In our view, the most straightforward prediction would be that the not all
component of some would be incorporated directly into the sentence representation in
a single step. In the sentence verification paradigm that we used, the greater
complexity associated with verifying upper-bound interpretations would lead to a
longer period of time in which the participants would be unable to make a decision,
that is, a greater period of uncertainty, but there would be no reason to predict initial
deviations towards the true response. Here we discuss other possible predictions and
models.
One explanation for the nonmonotonic mouse paths is that participants initially
derive the lower-bound interpretation of the sentence but then reappraise this decision
later in the trial. Participants start off with the incorrect some interpretation, the lower
bound, but later rectify the error and derive the upper-bound. In this respect the
phenomenon is similar to “garden-path” effects in syntactic parsing (e.g., Bever,
1970; Frazier & Rayner, 1982): the listener initially parses the sentence incorrectly
but later settles on the correct syntactic analysis. Is it possible that constraint-based
models can explain our data in the same way that they explain
behavior with garden-path sentences (Bates & MacWhinney, 1989; Elman, Hare, &
McRae, 2004; Farmer, Anderson & Spivey, 2007; MacDonald, Pearlmutter, &
Seidenberg, 1994)? Constraint-based models account for (syntactic) garden-path data
by maintaining that context initially biases participants towards one syntactic
interpretation (the garden path interpretation), and then the nonconforming syntactic
information later in the sentence reverses the initial activation levels to favor the
second (and correct) syntactic frame. We think that there are two factors that make it
difficult for a constraint-based explanation to explain our data in the same way. First,
there is no reason to think that the context biases participants towards the
lower-bound interpretation in our study, as we describe above. In particular, the results of
Experiment 2 demonstrate that a strong bias towards a lower-bound context is highly
unlikely. The second difficulty of equating our data with garden-path phenomena is
that with implicature sentences, there is no later part of the sentence that conflicts
with the initial interpretation. For example, after interpreting “some elephants are”
with the lower-bound interpretation of some, the word “mammals” does not make the
sentence more anomalous than if the upper-bound interpretation had been derived.
This contrasts with garden path sentences, in which the initial syntactic parse is found
to conflict with later parts of the sentence, hence forcing costly reanalysis. It is this
conflicting information that allows constraint-based models to reverse their initial
syntactic choice and derive a different syntactic frame. Without any exogenous
pressure to derive an alternative interpretation of the sentence, a constraint-based
model would have to suggest endogenous reasons for the change in path, such as the
underinformative nature of the utterance. At this point, however, the constraint-based
model becomes very similar to a more classical pragmatic explanation of the
phenomenon.
Constraint-based models might also account for our data by maintaining that
while the correct form of the sentence is incorporated into the representation
immediately (in one stage), verification of the sentence proceeds in multiple stages.
Verification of the upper-bound sentences could involve first verifying the reference
set of the proposition and only later verifying the complement set. For example, to
verify some elephants are mammals, the elephants are mammals component would
first be checked, and then the complement, elephants that are not mammals. This
could account for the mouse paths because verification of the reference set would
result in a true response but verification of the complement set would result in a false
response. Mouse trajectories would therefore reverse at the point at which verification
of the complement was completed. While there is no reason why such a two-step
statistical model cannot exist, we make several points with respect to its
parsimony.
First, the two-step statistical model described above assumes that the decision
to derive the implicature is determined independently from the verification process:
statistical constraints determine whether the implicature will be derived but a
different, sequential process verifies the sentence. This raises a number of issues. For
example, separating verification from interpretation (or understanding) is
conceptually very difficult. Understanding a sentence involves integrating its
propositional content into an internal representation, and integration involves
comparing new information with old information (verification). In some sense, then,
verification and interpretation are synonymous. Completely separating derivation
from verification would therefore involve separating derivation from interpretation,
and, in contrast to this proposal, it seems likely that the decision to derive the
implicature depends to some extent on the meaning of the sentence (see Bonnefon,
Feeney, & Villejoubert, 2009). A related problem is that it is very inefficient to first
derive an upper-bound interpretation when the lower-bound interpretation is patently
false. This is because the upper-bound interpretation cannot be true if the lower-bound
is false. For example, it would be very strange for a listener to derive the upper-bound
meaning to, “some of the children are in the classroom,” when the listener knows that
none of the children are in the classroom. At least in some contexts, the most efficient
strategy is to delay deriving the implicature until after the lower-bound has been
verified.
Second, the appeal of statistical models is that much of the standard, Gricean
pragmatic machinery might not need to exist at a representational level. However,
without the pragmatic machinery, there is no obvious explanation for why the
sentence should be processed in two steps rather than in a single step. A Gricean
account requires the literal meaning of the sentence in order to make informativeness
judgments and other decisions, but a statistical model has no such requirement. In
other words the two-step process is independently motivated in a Gricean account but
not in a statistical account. There is no more reason for the upper-bound interpretation
of some to involve two steps than there is for other quantifiers like every, or all, to
involve two steps.
Conclusion
We view our research as making two major contributions to the understanding
of scalar implicatures and language processing in general. First, with respect to
previous research into scalar implicatures using this paradigm (e.g., Bott & Noveck,
2004; Bott, Bailey, & Grodner, 2012; Feeney, Scrafton, Duckworth, & Handley,
2004; Noveck, 2001; Noveck & Posada, 2003; Pijnacker, Hagoort, Buitelaar,
Teunisse, & Geurts, 2009), our studies reveal why upper-bound responses are
delayed: participants believe that underinformative sentences are true prior to
believing that they are false. Upper-bound responses could have been delayed because
participants lacked sufficient evidence to make a response earlier than they did
(perhaps because they needed extra time to formulate the more complicated sentence
representation), but instead, participants initially formed an accessible but incorrect
judgment about the truth of the sentence. Second, our results place constraints on
models of how people derive implicatures: any model must predict that people have
conscious access to the lower-bound interpretation prior to the upper-bound. Several
versions of pre-existing models are incompatible with this claim. Of course, there are
versions of these models that are more in line with our data, and we look forward to
future research that identifies and tests these alternatives.
References
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with
crossed random effects for subjects and items. Journal of Memory and Language,
59, 390-412.
Bates, E., & MacWhinney, B. (1989). Functionalism and the competitive model. In B.
MacWhinney and E. Bates (Eds.), The crosslinguistic study of sentence
processing (pp. 3-73). New York: Cambridge University Press.
Bever, T. G. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (Ed.),
Cognition and the development of language (pp. 279-362).
Bonnefon, J. F., Feeney, A., & Villejoubert, G. (2009). When some is actually all:
Scalar inferences in face-threatening contexts. Cognition, 112, 249-258.
Bott, L. & Noveck, I.A. (2004). Some utterances are underinformative: The onset and
time course of scalar inferences. Journal of Memory and Language, 51, 437-457.
Bott, L., Bailey, T.M., & Grodner, D. (2012). Distinguishing speed from accuracy in
scalar implicatures. Journal of Memory & Language, 66, 123-142.
Breheny, R., Katsos, N., & Williams, J. (2006). Are generalized scalar implicatures
generated by default? An on-line investigation into the role of context in
generating pragmatic inferences. Cognition, 100, 434-463.
Carston, R. (1998). Informativeness, relevance and scalar implicature. In R. Carston
& S. Uchida (Eds.), Relevance theory: Applications and implications (pp. 179-236).
Amsterdam: John Benjamins.
Chierchia, G. (2004). Scalar implicatures, polarity phenomena, and the
syntax/pragmatics interface. In A. Belletti (Ed.), Structures and beyond (pp. 39-103).
Oxford: Oxford University Press.
Chierchia, G. (2006). Broaden your views: Implicatures of domain widening and the
logicality of language. Linguistic Inquiry, 37, 535-590.
Chierchia, G., Fox, D., & Spector, B. (2008). The grammatical view of scalar
implicatures and the relationship between semantics and pragmatics. In Handbook
of Semantics. New York: Mouton de Gruyter.
Clark, H. H., & Chase, W. G. (1972). On the process of comparing sentences against
pictures. Cognitive Psychology, 3, 472-517.
Dale, R., & Duran, N. D. (2011). The cognitive dynamics of negated sentence
verification. Cognitive Science, 35, 983-996.
Degen, J., & Tanenhaus, M. K. (2011). Making inferences: The case of scalar
implicature processing. In L. Carlson, C. Hölscher, & T. Shipley (Eds.),
Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp.
3299-3304).
Duran, N. D., Dale, R. & McNamara, D. (2010). The action dynamics of overcoming
the truth. Psychonomic Bulletin & Review, 17, 486-491.
Elman, J. L., Hare, M., & McRae, K. (2004). Cues, constraints, and competition in
sentence processing. In M. Tomasello & D. Slobin (Eds.), Beyond Nature-Nurture:
Essays in Honor of Elizabeth Bates (pp. 111-138). Mahwah, NJ:
Lawrence Erlbaum Associates.
Farmer, T., Anderson, S., & Spivey, M. J. (2007). Gradiency and visual context in
syntactic garden paths. Journal of Memory and Language, 57, 570-595.
Feeney, A., Scrafton, S., Duckworth, A., & Handley, S. J. (2004). The story of some:
Everyday pragmatic inferences by children and adults. Canadian Journal of
Experimental Psychology, 58, 121-132.
Fox, D. (2007). Free choice and the theory of scalar implicatures. In U. Sauerland &
P. Stateva (Eds.), Presupposition and implicature in compositional semantics.
Palgrave Macmillan.
Freeman, J. B., Ambady, N., Rule, N. O., & Johnson, K. L. (2008). Will a category cue
attract you? Motor output reveals dynamic competition across person construal.
Journal of Experimental Psychology: General, 137, 673-690.
Freeman, J.B. & Ambady, N. (2010). MouseTracker: Software for studying real-time
mental processing using a computer mouse-tracking method. Behavior Research
Methods, 42, 226-241.
Freeman, J. B., Dale, R., & Farmer, T. A. (2011). Hand in motion reveals mind in
motion. Frontiers in Psychology, 2, 59.
Frazier L. & Rayner, K. (1982). Making and correcting errors during sentence
comprehension: Eye movements in the analysis of structurally ambiguous
sentences. Cognitive Psychology, 14, 178-210.
Gazdar, G. (1979). Pragmatics: Implicature, presupposition, and logical form. New
York: Academic Press.
Geurts, B. (2010). Quantity implicatures. Cambridge: Cambridge University Press.
Van Gompel, R.P.G., Pickering, M.J., Pearson, J., & Liversedge, S.P. (2005).
Evidence against competition during syntactic ambiguity resolution. Journal of
Memory and Language, 52, 284-307.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax
and semantics 3: Speech acts (pp. 41-58).
Grice, H. P. (1989). Studies in the way of words. Cambridge, MA: Harvard
University Press.
Grodner, D., Gibson, E., & Watson, D. (2005). The influence of contextual contrast
on syntactic processing: Evidence for strong interaction in sentence
comprehension. Cognition, 95, 276-296.
Grodner, D., Klein, N. M., Carbary, K. M., & Tanenhaus, M. K. (2010). ‘‘Some”, and
possibly all, scalar inferences are not delayed: Evidence for immediate pragmatic
enrichment. Cognition, 116, 42–55.
Horn, L. R. (1972). On the semantic properties of logical operators in English. Ph.D.
thesis, University of California, Los Angeles.
Horn, L. R. (1989). A natural history of negation. Chicago, IL: University of
Chicago Press.
Horn, L. R. (2006). The border wars. In K. von Heusinger & K. P. Turner (Eds.),
Where semantics meets pragmatics (pp. 21-48). Oxford: Elsevier.
Huang, Y. T., & Snedeker, J. (2009). Online interpretation of scalar quantifiers:
Insight into the semantics-pragmatics interface. Cognitive Psychology, 58, 376-415.
Levinson, S. C. (2000) Presumptive meanings. Cambridge, Mass.: MIT Press.
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature
of syntactic ambiguity resolution. Psychological Review, 101, 676-703.
Noveck, I. A. (2001). When children are more logical than adults: Experimental
investigations of scalar implicature. Cognition, 78, 165-188.
Noveck, I. A., & Posada, A. (2003). Characterizing the time course of an implicature:
An evoked potentials study. Brain and Language, 85, 203-210.
Noveck, I. A., & Sperber, D. (2007). The why and how of experimental pragmatics:
The case of ‘scalar inferences’. In N. Burton-Roberts (Ed.), Pragmatics. Palgrave Macmillan.
Pijnacker, J., Hagoort, P., Buitelaar, J., Teunisse, J., & Geurts, B. (2009). Pragmatic
inferences in high-functioning adults with autism and Asperger syndrome. Journal
of Autism and Developmental Disorders, 39, 607-618.
Rips, L. (1975). Quantification and semantic memory. Cognitive Psychology, 7, 307-340.
van Rooij, R., & Schulz, K. (2004). Exhaustive interpretation of complex sentences.
Journal of Logic, Language, and Information, 13, 491-519.
Sauerland, U. (2004). Scalar implicatures in complex sentences. Linguistics and
Philosophy, 27: 367–391.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and cognition
(2nd ed.). Oxford: Blackwell.
Spivey, M., Grosjean, M., & Knoblich, G. (2005). Continuous attraction toward
phonological competitors: Thinking with your hands. Proceedings of the National
Academy of Sciences, 102, 10393-10398.
Spivey, M. J., & Dale, R. (2006). Continuous temporal dynamics in real-time
cognition. Current Directions in Psychological Science, 15, 207-211.
Spivey, M. J. (2007). The Continuity of Mind. New York: Oxford University Press.
Spivey, M. J., Dale, R., Knoblich, G., & Grosjean, M. (2010). Do Curved Reaching
Movements Emerge from Competing Perceptions? A Reply to van der Wel et al.
(2009). Journal of Experimental Psychology: Human Perception and
Performance, 36, 251-254.
Traxler, M. J., Pickering, M. J., & Clifton, C., Jr. (1998). Adjunct attachment is not a
form of lexical ambiguity resolution. Journal of Memory and Language, 39, 558-592.
Wason, P. C., & Johnson-Laird, P. N. (1972). Psychology of reasoning: Structure and
content. Cambridge, MA: Harvard University Press.
Acknowledgements
This work was funded by ESRC Award RES-062-23-2410 to L. Bott, T. M. Bailey, and
D. Grodner.
Table 1
Experiment 1: Mean AUC Values and Accuracy for Literal and Pragmatic Conditions

                             Literal                    Pragmatic
Condition                    AUC M(SD)     Accuracy     AUC M(SD)     Accuracy
All True                     0.30 (0.84)     97%        0.71 (1.25)     96%
All False                    0.84 (1.22)     98%        0.37 (1.22)     98%
Some True                    0.34 (0.92)     98%        0.91 (1.41)     94%
Some False                   0.84 (1.18)     97%        0.32 (0.83)     98%
Some True Superordinate      1.16 (1.25)     95%        1.44 (1.61)     96%
Some Critical                0.25 (0.80)     97%        2.26 (1.48)     87%
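The AUC values in this table are, as is standard in mouse-tracking analyses (e.g., Freeman & Ambady, 2010), the geometric area between the observed trajectory and an idealized straight line from the start position to the chosen response. The sketch below illustrates one way that computation can be carried out; it is our own illustration with hypothetical coordinates, not the analysis code used in these experiments.

```python
import math

def auc(xs, ys):
    """Area between an observed trajectory and the straight line joining
    its first and last points, approximated by trapezoidal integration of
    the signed perpendicular deviation from that line."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x1 - x0, y1 - y0)

    def dev(x, y):
        # signed perpendicular distance of (x, y) from the ideal line
        return ((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / length

    def proj(x, y):
        # distance along the ideal line of the projection of (x, y)
        return ((x1 - x0) * (x - x0) + (y1 - y0) * (y - y0)) / length

    area = 0.0
    for i in range(1, len(xs)):
        d0, d1 = dev(xs[i - 1], ys[i - 1]), dev(xs[i], ys[i])
        p0, p1 = proj(xs[i - 1], ys[i - 1]), proj(xs[i], ys[i])
        area += 0.5 * (d0 + d1) * (p1 - p0)  # trapezoid rule
    return area
```

Deviations on opposite sides of the ideal line carry opposite signs, so a trajectory that crosses the line contributes both positive and negative area under this convention.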
Table 2
Experiment 2: Mean AUC Values and Accuracy

Condition                           AUC M(SD)     Accuracy
All True                            0.84 (1.47)     93%
All False                           0.29 (0.79)     99%
All Superordinate False             1.09 (1.55)     92%
Some True                           0.80 (1.44)     95%
Some False                          0.33 (0.91)     98%
Some Critical (True Responses)      0.84 (1.45)     41%
Some Critical (False Responses)     1.89 (1.78)     59%
Table 3
Experiment 3: Mean AUC Values and Accuracy for Semantic and Pragmatic
Responders

Condition          AUC M(SD)     Accuracy
All True           0.67 (1.39)     94%
All False          1.28 (1.47)     92%
Some True          1.32 (1.40)     90%
Some False         0.57 (1.13)     97%
Some Critical      2.08 (1.88)     78%
No True            2.34 (1.60)     93%
No False           1.02 (1.46)     93%
Figures
Figure 1.
Figure 2. The average mouse trajectories in Experiment 1 for the logical condition.
The data points represent the average x- and y- positions of each of the 101 time
steps. Mouse trajectories for each condition are averaged across the counterbalanced
right and left positions of the targets.
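The resampling and averaging described in this caption can be sketched as follows. This is an illustrative reconstruction only: it interpolates over sample index rather than raw timestamps, and the function names are our own, not those of the software actually used.

```python
def resample(points, steps=101):
    """Linearly interpolate a recorded (x, y) trajectory onto a fixed
    number of normalized time steps (a simplification: tools such as
    MouseTracker interpolate over the recorded timestamps)."""
    n = len(points)
    out = []
    for k in range(steps):
        pos = k * (n - 1) / (steps - 1)  # fractional index into the samples
        i = int(pos)
        frac = pos - i
        if i >= n - 1:
            out.append(points[-1])
        else:
            (xa, ya), (xb, yb) = points[i], points[i + 1]
            out.append((xa + frac * (xb - xa), ya + frac * (yb - ya)))
    return out

def average_trajectories(trials, target_sides, steps=101):
    """Average resampled trajectories point by point, mirroring the x-axis
    on left-target trials so all trials share one response frame."""
    resampled = []
    for traj, side in zip(trials, target_sides):
        pts = resample(traj, steps)
        if side == "left":
            pts = [(-x, y) for x, y in pts]
        resampled.append(pts)
    m = len(resampled)
    return [(sum(t[k][0] for t in resampled) / m,
             sum(t[k][1] for t in resampled) / m) for k in range(steps)]
```

Because every trial is mapped onto the same 101 normalized steps, trajectories of different durations can be averaged point by point.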
Figure 3. The average mouse trajectories in Experiment 1 for the pragmatic condition.
The data points represent the average x- and y- positions of each of the 101 time
steps. Mouse trajectories for each condition are averaged across the counterbalanced
right and left positions of the targets.
Figure 4. The average mouse trajectories in Experiment 2 for the false response
conditions. The data points represent the average x- and y- positions of each of the
101 time steps. Mouse trajectories for each condition are averaged across the
counterbalanced right and left positions of the targets.
Figure 5. The average mouse trajectories in Experiment 2 for the true response
conditions. The data points represent the average x- and y- positions of each of the
101 time steps. Mouse trajectories for each condition are averaged across the
counterbalanced right and left positions of the targets.
Figure 6. The average mouse trajectories in Experiment 2 for the false response
conditions for pragmatic responders. The data points represent the average x- and
y-positions of each of the 101 time steps. Mouse trajectories for each condition are
averaged across the counterbalanced right and left positions of the targets.
Figure 7. The average mouse trajectories in Experiment 3 for the false response
conditions. The data points represent the average x- and y- positions of each of the
101 time steps. Mouse trajectories for each condition are averaged across the
counterbalanced right and left positions of the targets. Control conditions not shown
on the graph have been removed to make the figure more readable.
Figure 8. The average mouse trajectories in Experiment 3 for the true response
conditions. The data points represent the average x- and y- positions of each of the
101 time steps. Mouse trajectories for each condition are averaged across the
counterbalanced right and left positions of the targets.
Appendix 1
Table A1
Experiment 1: Mean X-Coordinates for Literal and Pragmatic Conditions across
Normalized Time Bins

Condition              Bin 1 (1-25)    Bin 2 (26-50)   Bin 3 (51-75)   Bin 4 (76-101)
                       M(SD)           M(SD)           M(SD)           M(SD)
Literal
  All True             -0.07 (.17)     -0.39 (.22)     -0.72 (.22)     -0.82 (.09)
  All False            -0.06 (.16)      0.05 (.19)      0.60 (.35)      0.83 (.08)
  Some True            -0.05 (.14)     -0.34 (.26)     -0.71 (.31)     -0.84 (.08)
  Some Super True      -0.04 (.14)     -0.14 (.28)     -0.57 (.33)     -0.84 (.08)
  Some False           -0.06 (.16)      0.18 (.30)      0.62 (.33)      0.83 (.08)
  Some Critical        -0.07 (.14)     -0.39 (.24)     -0.73 (.34)     -0.83 (.08)
Pragmatic
  All True             -0.06 (.14)     -0.23 (.26)     -0.61 (.34)     -0.83 (.10)
  All False             0.03 (.11)      0.30 (.21)      0.66 (.29)      0.82 (.09)
  Some Super True       0.04 (.14)      0.02 (.37)     -0.48 (.36)     -0.81 (.16)
  Some True             0.05 (.13)     -0.24 (.21)     -0.50 (.41)     -0.81 (.10)
  Some False           -0.03 (.12)      0.32 (.23)      0.67 (.26)      0.83 (.08)
  Some Critical        -0.04 (.18)     -0.22 (.32)      0.16 (.43)      0.79 (.13)

Note: Negative values indicate movement towards the TRUE response; positive values indicate movement towards
the FALSE response.
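The binned means in these appendix tables can be obtained by averaging the x-coordinate of each 101-step normalized trajectory within four windows of normalized time. A minimal sketch, assuming 0-based step indices and the bin boundaries given in the column headers (`bin_means` is a hypothetical helper, not the authors' code):

```python
def bin_means(xs, bins=((0, 25), (25, 50), (50, 75), (75, 101))):
    """Mean x-coordinate within four windows of the 101 normalized time
    steps. Windows follow the table headers (Bin 1: steps 1-25, etc.),
    expressed here as 0-based, half-open Python slices."""
    return [sum(xs[a:b]) / (b - a) for a, b in bins]
```

Applied to each condition's averaged trajectory, this yields one value per bin, matching the four columns of Tables A1-A3.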
Table A2
Experiment 2: Mean X-Coordinates across Normalized Time Bins

Condition                          Bin 1 (1-25)    Bin 2 (26-50)   Bin 3 (51-75)   Bin 4 (76-101)
                                   M(SD)           M(SD)           M(SD)           M(SD)
All True                           -0.01 (.17)     -0.18 (.34)     -0.55 (.43)     -0.80 (.13)
All False                           0.001 (.13)     0.25 (.32)      0.69 (.34)      0.83 (.12)
All Super False                     0.01 (.16)      0.04 (.33)      0.41 (.48)      0.79 (.08)
Some True                          -0.02 (.15)     -0.15 (.33)     -0.54 (.41)     -0.80 (.11)
Some False                          0.01 (.12)      0.24 (.32)      0.64 (.44)      0.77 (.19)
Some Critical (True responses)     -0.04 (.17)     -0.25 (.34)     -0.57 (.39)     -0.80 (.12)
Some Critical (False responses)    -0.03 (.19)     -0.18 (.38)      0.21 (.53)      0.78 (.17)

Note: Negative values indicate movement towards the TRUE response; positive values indicate movement towards
the FALSE response.
Table A3
Experiment 3: Mean X-Coordinates across Normalized Time Bins

Condition          Bin 1 (1-25)    Bin 2 (26-50)   Bin 3 (51-75)   Bin 4 (76-101)
                   M(SD)           M(SD)           M(SD)           M(SD)
All False          -0.03 (.20)     -0.03 (.45)      0.39 (.48)      0.80 (.13)
All True           -0.04 (.17)     -0.25 (.42)     -0.60 (.38)     -0.81 (.11)
No Comparison       0.01 (.20)      0.06 (.42)      0.37 (.45)      0.80 (.13)
No False            0.00 (.18)      0.11 (.42)      0.47 (.45)      0.80 (.12)
No True             0.03 (.19)      0.19 (.24)     -0.13 (.50)     -0.80 (.12)
Some True           0.00 (.18)     -0.07 (.41)     -0.50 (.39)     -0.81 (.11)
Some Critical      -0.03 (.22)     -0.20 (.36)      0.23 (.52)      0.79 (.15)
Some False         -0.01 (.17)      0.16 (.35)      0.61 (.33)      0.83 (.11)

Note: Negative values indicate movement towards the TRUE response; positive values indicate movement towards
the FALSE response.