Subitizing: Magical numbers or mere superstition?

Psychological Research
Psychologische Forschung
Psychol Res (1992) 54:80-90
© Springer-Verlag 1992
J. D. Balakrishnan¹ and F. Gregory Ashby²
¹ Department of Psychology, Northwestern University, Evanston, IL 60208, USA
² Department of Psychology, University of California, Santa Barbara, CA 93106, USA
Received May 7, 1991/Accepted September 12, 1991
Summary. It is widely believed that humans are endowed
with a specialized numerical process, called subitizing,
which enables them to apprehend rapidly and accurately
the numerosity of small sets of objects. A major part of the
evidence for this process is a purported discontinuity in the
mean response time (RT) versus numerosity curves at
about 4 elements, when subjects enumerate up to 7 or more
elements in a visual display. In this article, RT data collected in a speeded enumeration experiment are subjected
to a variety of statistical analyses, including several tests on
the RT distributions. None of these tests reveals a significant discontinuity as numerosity increases. The data do
suggest a strong stochastic dominance in RT by display
numerosity, indicating that the mental effort required to
enumerate does increase with each additional element in
the display, both within and beyond the putative subitizing
range.
Introduction
When a small set of discrete elements is briefly presented
to human observers, the number of elements, or "numerosity," of the set is ascertained quickly and accurately, with
little or no feeling of conscious effort (Chi & Klahr, 1975;
Kaufman, Lord, Reese, & Volkmann, 1949; Mandler &
Shebo, 1982). Kaufman et al. (1949) called this remarkable
capacity subitizing, and contrasted it with the lengthy and
error-prone enumeration of larger sets. Although its definition can be strictly tied to performance measures, subitizing is commonly thought to represent a unique numerical
ability, perhaps present even at birth (Starkey & Cooper,
1980; Starkey, Spelke, & Gelman, 1983).
The existence of a fast, "automatic" process with limited scope has been hypothesized for many years, dating back to the earliest studies of "span of attention" or "immediate apprehension" (Cattell, 1886; Wundt, 1896). Despite this long history, however, the statistical evidence for such a process remains weak. Much of it depends upon the exact shape of the function relating mean response time (MRT) to numerosity in the speeded-enumeration experiment. Methods of corroborating theories about subitizing using these RT data have not included rigorous statistical tests.
Offprint requests to: J. D. Balakrishnan
Two quantitative models of the RT versus numerosity
function have been proposed. The first of these (Chi &
Klahr, 1975; Klahr & Wallace, 1976; Oyama, Kikuchi, &
Ichihara, 1981; Simons & Langheinrich, 1982) partitions
the function into three separate regions, each associated
with a different numerical procedure. The first region (subitizing) increases linearly with a small slope (about 50 ms per element) from 1 to 3 or 4 elements. The second region increases linearly with a considerably larger slope (about 250 ms per element) from 4 to 6 or 7 elements. The large slope is
thought to reflect covert counting, or subitizing of subgroups and addition of the partial sums (Klahr & Wallace,
1976; van Oeffelen & Vos, 1982a).
Both linear segments are obtained by simple regression on MRT, and the large difference in slopes is seen as prima facie evidence of the existence of two distinct enumeration
processes. When display duration is limited and the stimulus numerosity exceeds about 8 elements, there is little or
no change in the MRT function, although the error rate
continues to increase. The third region is therefore defined
by a constant, asymptotic RT equal to about 1.5 s (Mandler
& Shebo, 1982; Kaufman et al., 1949). Such performance
is usually attributed to a gross estimation process (Kaufman et al., 1949; Krueger, 1984).
The complete model for brief stimulus displays may be defined as

$$
\mathrm{RT}_{i,j} =
\begin{cases}
a_1 i + b_1 + n_{i,j}, & i \le m,\\
a_2 i + b_2 + n_{i,j}, & m < i \le l,\\
c + n_{l,j}, & l + 1 \le i,
\end{cases}
\tag{1}
$$
where i is the stimulus numerosity, j is the trial number, m is 3 or 4, l is about 6 or 7, n is zero-mean noise whose variance depends upon i for i up to l, and $a_k$ and $b_k$ are free parameters. Because the theoretical issues of most interest depend upon the linearity of the first two regions of the MRT versus numerosity function, this model will be referred to as the "bilinear" model.
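To illustrate how the first two regions of Equation 1 are estimated, the sketch below fits each region by ordinary least squares on MRT. It is a minimal sketch, assuming the breakpoints m and l are known in advance rather than estimated; the helper names `ols_line` and `fit_bilinear` and the MRT values in the usage block are hypothetical.

```python
from typing import Dict, List, Tuple


def ols_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_bilinear(mrt: Dict[int, float], m: int, l: int) -> Dict[str, Tuple[float, float]]:
    """Fit (a_1, b_1) on i <= m and (a_2, b_2) on m < i <= l, as in Equation 1.

    The breakpoints m and l are supplied, not estimated; numerosities above l
    (the constant-RT region) are ignored.
    """
    region1 = sorted(i for i in mrt if i <= m)
    region2 = sorted(i for i in mrt if m < i <= l)
    return {
        "region1": ols_line(region1, [mrt[i] for i in region1]),
        "region2": ols_line(region2, [mrt[i] for i in region2]),
    }


if __name__ == "__main__":
    # Hypothetical MRTs (ms) constructed with slopes of 50 and 250 ms per element.
    mrt = {1: 500, 2: 550, 3: 600, 4: 850, 5: 1100, 6: 1350, 7: 1600, 8: 1600}
    print(fit_bilinear(mrt, m=3, l=7))
```

With real data the breakpoints are themselves uncertain, which is one reason a slope difference from two separate regressions is weak evidence for two processes.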
The alternative model postulates a single, exponential
increase in MRT for up to about 6 elements (Kaufman et
al., 1949; Von Szeliski, 1924). Formally,
$$
\mathrm{RT}_{i,j} =
\begin{cases}
\beta e^{\alpha i} + n_{i,j}, & 1 \le i \le w,\\
c + n_{l,j}, & w + 2 \le i,
\end{cases}
\tag{2}
$$

where w is 6 or 7, $\alpha$ and $\beta$ are free parameters, and i and n are defined as before.
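Equation 2 has a property used later in the argument: the mean function $\beta e^{\alpha i}$ has slope $\alpha\beta e^{\alpha i}$, so the slope is proportional to MRT itself. Equivalently, the relative increments $(\mathrm{MRT}_{i+1} - \mathrm{MRT}_i)/\mathrm{MRT}_i$ all equal $e^{\alpha} - 1$. The sketch below, assuming noise-free hypothetical MRTs, estimates $\alpha$ and $\beta$ by least squares on log(MRT). This log-linear fit is a convenient stand-in and not necessarily the criterion used for the fits reported in this article; the function names are illustrative.

```python
import math
from typing import List, Tuple


def fit_log_linear(ns: List[int], mrt: List[float]) -> Tuple[float, float]:
    """Least squares on log(MRT) = log(beta) + alpha * i; returns (alpha, beta).

    Errors are weighted on the log scale, so this only approximates a
    least-squares fit of Equation 2 in milliseconds.
    """
    logs = [math.log(y) for y in mrt]
    n = len(ns)
    mx, my = sum(ns) / n, sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in ns)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ns, logs))
    alpha = sxy / sxx
    return alpha, math.exp(my - alpha * mx)


def relative_increments(mrt: List[float]) -> List[float]:
    """(MRT_{i+1} - MRT_i) / MRT_i; constant (e**alpha - 1) under Equation 2."""
    return [(b - a) / a for a, b in zip(mrt, mrt[1:])]


if __name__ == "__main__":
    ns = [1, 2, 3, 4, 5, 6]
    mrt = [400.0 * math.exp(0.2 * i) for i in ns]  # hypothetical noise-free MRTs (ms)
    print(fit_log_linear(ns, mrt))
    print(relative_increments(mrt))
```

If the relative increments drift systematically with i, the slope is not proportional to MRT and Equation 2 is in doubt.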
One motivation for the Equation 2 model comes from
theories of psychophysical discrimination. Specifically,
enumeration might be achieved by correlating numerosity
with physical properties of the display, such as energy,
local density, or contrast. Under this view, the total RT is
dominated by encoding processes in pattern recognition
rather than by numerical processes per se. Recognition of
"canonical" geometric patterns (i. e., "dots," "lines," and
"triangles") has also been held to account for the extremely
rapid enumeration of small sets (Mandler & Shebo, 1982;
Neisser, 1967; Woodworth & Schlosberg, 1954).
Elsewhere (Balakrishnan & Ashby, 1991) it was shown
that neither of these two models provides an adequate
account of the MRT data. The bilinear model was ruled out
because the slope of the MRT function was consistently
increasing, and the exponential model was ruled out because the slope was not proportional to MRT. A more
complex relationship between numerosity and enumeration processes is therefore indicated, which is unfortunate
in the sense that there are not sufficient data in the traditional measures (MRT and percentage correct) to evaluate
the more complex models statistically.
In the present paper, very large samples of responses
were collected from individual subjects who participated
over a two-week period. These data were used to obtain
accurate estimates of the complete RT distributions, and
therefore to test several general classes of RT models that
postulate a limited-capacity numerical process. In addition
to a more rigorous test of the discontinuity hypothesis, the
tests establish a detailed set of criteria by which future
theories of enumeration processes can be judged. The analyses include tests of stochastic dominance in RT, estimates of the number of modes in the RT-density functions,
and tests of independent, discrete stage models of
enumeration. More complete descriptions of the nature and
purposes of these tests are given in the two sections that
follow.
Stochastic dominance in RT
The central empirical question to be addressed in this article is whether a capacity-limited numerical process
manifests itself in the RT data of speeded enumeration. If
there is evidence of a qualitative shift in the effect of
display numerosity on the capacity to enumerate, then this
would constitute important evidence of such a process. If
no such evidence exists, then the dual-process model is
more difficult to justify and the single-process model becomes more attractive. Given that the summary statistics,
i. e., MRT, RT variance, and percentage correct, may not be
sufficiently powerful to decide the issue, it is necessary to
consider other measures as well. Therefore, a means of
choosing these measures and assessing their implications
for capacity is needed.
If subitizing is a nondeterministic process whose completion time varies from trial to trial, then there may be no fixed ordering of completion times from one trial to the next, even when the task load (i.e., stimulus numerosity) is increased. We therefore introduce the notion of stochastic dominance (Townsend, 1988; Townsend & Ashby, 1983). Stochastic dominance is an ordering of probability measures (here, RT distributions) defined over a set of observations, rather than of individual completion times. Although any number of such measures might be chosen for analysis, some of them are more appropriate tests of models than others.
One of the weakest measures of stochastic dominance is
currently the most popular one in experimental psychology, i.e., the MRT. This statistic is a weak measure in the
sense that an ordering in MRT need not imply an ordering
in some of the other, more "sufficient" RT statistics, in
particular those related to the complete RT distribution.
Three of these distribution functions are of particular importance: (1) the cumulative RT distribution, $F(t) = P(\mathrm{RT} \le t)$; (2) the RT density function, $f(t) = \frac{d}{dt}F(t)$; and (3) the RT hazard-rate function, $h(t) = f(t)/[1 - F(t)]$. Although each of these functions is uniquely determined by each of the others, there are reasons for empirically estimating them separately.
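Given a sample of RTs for one display size, all three functions can be estimated on a time grid. The sketch below uses the empirical cumulative distribution and a simple bin-count density, and forms the hazard as $f(t)/[1 - F(t)]$. This deliberately crude estimator is an illustration only; it is not the random-smoothing hazard estimator or the adaptive-kernel density estimator used for the figures in this article, and the helper names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Tuple


def ecdf(sample: List[float], t: float) -> float:
    """Empirical cumulative distribution F(t) = P(RT <= t)."""
    s = sorted(sample)
    return bisect_right(s, t) / len(s)


def binned_estimates(sample: List[float], edges: List[float]) -> List[Tuple[float, float, float]]:
    """Crude (F, f, h) estimates for each bin (lo, hi] given by consecutive edges.

    F: empirical F(lo) at the start of the bin.
    f: proportion of all RTs falling in the bin, divided by the bin width.
    h: f / (1 - F(lo)), i.e. the proportion of responses not yet made by lo
       that are made within the bin, per unit time.
    """
    s = sorted(sample)
    n = len(s)
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        width = hi - lo
        done_by_lo = bisect_right(s, lo)
        in_bin = bisect_right(s, hi) - done_by_lo
        F = done_by_lo / n
        f = in_bin / (n * width)
        h = f / (1.0 - F) if done_by_lo < n else float("nan")
        rows.append((F, f, h))
    return rows


def exponential_quantiles(rate: float, n: int) -> List[float]:
    """Deterministic 'sample' of n exponential quantiles, for checking the estimator."""
    return [-math.log(1.0 - (i - 0.5) / n) / rate for i in range(1, n + 1)]


if __name__ == "__main__":
    # An exponential distribution has a constant hazard, so every bin should
    # give roughly the same h (slightly below the rate, because of binning).
    rts = exponential_quantiles(rate=5.0, n=10000)
    for F, f, h in binned_estimates(rts, [0.05 * j for j in range(11)]):
        print(f"F={F:.3f}  f={f:.3f}  h={h:.3f}")
```

The hazard estimate divides by $1 - F$, which shrinks toward zero in the right tail; this is why hazard estimates there carry large estimator variance.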
Townsend and Ashby (1983) showed that the strongest measure of stochastic dominance occurs if the likelihood ratio

$$
L_{k-1}(t) = \frac{f_{k-1}(t)}{f_k(t)} \tag{3}
$$

is nonincreasing in t, where k is the experimenter-determined load. This ordering is strongest in the sense that it implies an ordering in the hazard-rate functions, which in turn implies an ordering in the cumulative distributions and in MRT. The complete dominance hierarchy is

$$
L_{k-1}(t) \text{ nonincreasing in } t
\;\Longrightarrow\; h_{k-1}(t) \ge h_k(t)
\;\Longrightarrow\; F_{k-1}(t) \ge F_k(t)
\;\Longrightarrow\; \mathrm{MRT}_k \ge \mathrm{MRT}_{k-1}.
$$
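The two lowest levels of this hierarchy can be checked directly on samples. The sketch below, with hypothetical helper names and toy data, compares MRTs and empirical cumulative distributions on a time grid. The usage example shows a case where MRT is ordered but the cumulative distributions cross, which is the sense in which MRT is the weaker measure.

```python
from bisect import bisect_right
from statistics import fmean
from typing import List


def cdf_violations(rt_prev: List[float], rt_next: List[float], grid: List[float]) -> List[float]:
    """Grid times t at which the empirical F_{k-1}(t) < F_k(t).

    The hierarchy requires F_{k-1}(t) >= F_k(t) for all t, so an empty
    result is consistent with dominance at the CDF level (on this grid).
    """
    a, b = sorted(rt_prev), sorted(rt_next)
    return [t for t in grid
            if bisect_right(a, t) / len(a) < bisect_right(b, t) / len(b)]


def mrt_ordered(rt_prev: List[float], rt_next: List[float]) -> bool:
    """Weakest level of the hierarchy: MRT_k >= MRT_{k-1}."""
    return fmean(rt_next) >= fmean(rt_prev)


if __name__ == "__main__":
    prev, nxt = [0.3, 0.9], [0.5, 0.8]  # toy RTs (s) for loads k-1 and k
    print(mrt_ordered(prev, nxt))                              # MRT is ordered...
    print(cdf_violations(prev, nxt, [0.35, 0.55, 0.85, 1.0]))  # ...but the CDFs cross
```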
Distribution models
Obviously there are reasons for estimating RT distributions
besides establishing their dominance relations. Also of interest is the family of functions that might best describe
these distributions and their theoretical justification. Such
an analysis should have some impact on the question of a
discrete shift in numerical processes. It so happens that one
of these three functions, the hazard-rate function, is particularly suited to the evaluation of RT models (Luce, 1986;
Townsend & Ashby, 1983). The hazard-rate function is the instantaneous response rate, that is, the likelihood that a response occurs at time t, given that it has not yet occurred.
Its basic shape, increasing or decreasing, can therefore
more clearly reflect the temporal changes in a system's
capacity to perform.
An example of the diagnosticity of this function is the
fact that empirical estimates of it unequivocally rule out the
traditional candidates for the RT distribution, including the
gamma, log-normal, and ex-Gaussian (an exponential convolved with a normal distribution) for signal detection
times ("simple RT"), even though these distributions appear to provide quite adequate fits to the RT density (Burbeck & Luce, 1982; Luce, 1986; Ratcliff, 1978). We show
here that these distributions may be similarly rejected in
the case of the speeded-enumeration experiment, and elsewhere (Ashby, Tein, & Balakrishnan, 1991) we show the
same result for the memory-scanning experiment. The consistent pattern of results for both simple- and choice-RT
experiments has broad implications for RT theory.
Exponential pure insertion
Suppose that the random variable describing the total RT in
the choice-RT experiment can be decomposed into a sum
of independent random variables, with some of these variables being directly connected to controllable stimulus
conditions. Specifically, assume that by the addition of an
element to the stimulus display in the enumeration task, a
component-processing time is added to the sum without
affecting the other components. Such conditions on the
effects of stimulus manipulations are known as "pure insertion" (Sternberg, 1969). Note that independence is possible without pure insertion holding, and similarly pure
insertion can hold without independence. Various structural assumptions, including both parallel and serial models,
can lead to one or the other (or neither) state of affairs
(Colonius, 1990, on dependence plus pure insertion;
Townsend & Ashby, 1983, on parallel versus serial
models).
Now suppose that the distribution associated with the added component is exponential, with processing rate $\lambda_k$. Ashby and Townsend (1980) showed that under these assumptions,

$$
\frac{f_k(t)}{F_{k-1}(t) - F_k(t)} = \lambda_k \tag{4}
$$

for all t for which the ratio is defined. This means that a good test of the model would be to plot the estimated ratio as a function of t and verify that there is no significant trend with time, or alternatively to plot $f_k(t)$ against $F_{k-1}(t) - F_k(t)$ and verify that the relation is linear with zero intercept and slope equal to $\lambda_k$. A statistical test is therefore possible under the auspices of the general linear model.
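Equation 4 can be verified in a case where everything is available in closed form. Assume, purely for illustration, that $T_{k-1}$ is itself exponential with rate $\mu$; then $T_k = T_{k-1} + X$, with X exponential of rate $\lambda$ and independent of $T_{k-1}$, has a hypoexponential distribution, and the ratio on the left of Equation 4 should equal $\lambda$ at every t. The rates and function names below are hypothetical.

```python
import math


def hypoexp_pdf(t: float, mu: float, lam: float) -> float:
    """Density of T_k = T_{k-1} + X, T_{k-1} ~ Exp(mu), X ~ Exp(lam), mu != lam."""
    return mu * lam / (lam - mu) * (math.exp(-mu * t) - math.exp(-lam * t))


def hypoexp_cdf(t: float, mu: float, lam: float) -> float:
    """Distribution function F_k(t) of the same sum."""
    return 1.0 - (lam * math.exp(-mu * t) - mu * math.exp(-lam * t)) / (lam - mu)


def exp_cdf(t: float, mu: float) -> float:
    """F_{k-1}(t) for the assumed exponential T_{k-1}."""
    return 1.0 - math.exp(-mu * t)


def insertion_ratio(t: float, mu: float, lam: float) -> float:
    """Left-hand side of Equation 4: f_k(t) / [F_{k-1}(t) - F_k(t)]."""
    return hypoexp_pdf(t, mu, lam) / (exp_cdf(t, mu) - hypoexp_cdf(t, mu, lam))


if __name__ == "__main__":
    mu, lam = 3.0, 8.0  # hypothetical rates (per second)
    for t in (0.1, 0.3, 0.6, 1.0, 1.5):
        print(f"t={t:.1f} s  ratio={insertion_ratio(t, mu, lam):.6f}")
```

With empirical estimates in place of these closed forms, the ratio would scatter around a constant, and the question becomes whether any trend with t is statistically reliable.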
Before evaluation of the Equation 4 ratio, several
simple preliminary tests of the exponential insertion model
may be sufficient to reject it at the outset. First, the assumption of additivity plus independence implies an ordering at
the level of the cumulative-distribution functions (Ashby,
1982). Estimates of the cumulative distributions from the
cumulative RT histogram are easily available, and probably adequate.
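A second test arises from the fact that the mean and standard deviation of the exponential distribution are equal. Because the variance of a sum of independent random variables is the sum of their variances, this property implies that the mean and variance estimates should satisfy the following relation:

$$
\mathrm{MRT}_k - \mathrm{MRT}_{k-1} = \left[\mathrm{Var}_k - \mathrm{Var}_{k-1}\right]^{1/2}. \tag{5}
$$

Under pure insertion the left side estimates the mean, $1/\lambda_k$, of the inserted component, and the right side estimates its standard deviation, which is also $1/\lambda_k$. Although variance estimates have a relatively large standard error, the large samples needed to estimate the distribution functions also keep this error small.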
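Equation 5 can be checked from two samples of RTs in a few lines. The sketch below also simulates hypothetical data that satisfy exponential pure insertion by construction (an arbitrary normal-plus-exponential base time, with an independent exponential stage of rate $\lambda = 10$ per second added for load k), so both sides should come out near $1/\lambda = 0.1$ s. All names and parameter values are illustrative assumptions, and unlike Table 2 no censoring is applied.

```python
import math
import random
from statistics import fmean, pvariance
from typing import List, Optional, Tuple


def equation5_sides(rt_prev: List[float], rt_next: List[float]) -> Tuple[float, Optional[float]]:
    """Return (MRT_k - MRT_{k-1}, [Var_k - Var_{k-1}] ** 0.5).

    Under exponential pure insertion both values estimate 1/lambda_k.
    The right side is None when the estimated variance decreases
    (the case marked *** in Table 2).
    """
    d_mean = fmean(rt_next) - fmean(rt_prev)
    d_var = pvariance(rt_next) - pvariance(rt_prev)
    return d_mean, (math.sqrt(d_var) if d_var > 0 else None)


def simulate_insertion(n: int, lam: float, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Hypothetical RTs (s) for loads k-1 and k, drawn on separate trials.

    The base time (normal plus exponential) is an arbitrary assumption;
    load k adds an independent Exp(lam) stage, i.e. pure insertion.
    """
    rng = random.Random(seed)

    def base() -> float:
        return rng.gauss(0.5, 0.05) + rng.expovariate(20.0)

    prev = [base() for _ in range(n)]
    nxt = [base() + rng.expovariate(lam) for _ in range(n)]
    return prev, nxt


if __name__ == "__main__":
    prev, nxt = simulate_insertion(100_000, lam=10.0)
    print(equation5_sides(prev, nxt))  # both values should be close to 0.1 s
```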
Finally, taking the derivative of Equation 4 with respect to t, Ashby (1982) noted that

$$
\frac{d}{dt} f_k(t) = \lambda_k \left[f_{k-1}(t) - f_k(t)\right], \tag{6}
$$

which implies that the density functions $f_{k-1}(t)$ and $f_k(t)$ should intersect at the mode of $f_k(t)$, where its derivative is zero.
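Equation 6 and its consequence for the mode can be checked with the same kind of hypothetical closed-form case: an exponential $T_{k-1}$ with rate $\mu$ plus an inserted exponential stage with rate $\lambda$. Setting the derivative of $f_k$ to zero gives the mode $t^* = \ln(\lambda/\mu)/(\lambda - \mu)$, where the two densities should coincide. The rates and function names are assumptions for illustration.

```python
import math


def exp_pdf(t: float, mu: float) -> float:
    """f_{k-1}(t) for an assumed exponential T_{k-1} with rate mu."""
    return mu * math.exp(-mu * t)


def hypoexp_pdf(t: float, mu: float, lam: float) -> float:
    """f_k(t) for T_k = T_{k-1} + Exp(lam), independent, mu != lam."""
    return mu * lam / (lam - mu) * (math.exp(-mu * t) - math.exp(-lam * t))


def hypoexp_mode(mu: float, lam: float) -> float:
    """Time where d f_k / dt = 0, i.e. lam * exp(-lam t) = mu * exp(-mu t)."""
    return math.log(lam / mu) / (lam - mu)


if __name__ == "__main__":
    mu, lam = 3.0, 8.0  # hypothetical rates (per second)
    t_star = hypoexp_mode(mu, lam)
    # The two densities should be equal at the mode of f_k.
    print(t_star, exp_pdf(t_star, mu), hypoexp_pdf(t_star, mu, lam))
```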
The result in Equation 4 is stated in an "if and only if"
fashion, and so is potentially a very strong test of the
exponential pure-insertion model. It is always possible,
however, that other models that do not assume pure insertion to hold might approximate the Equation 4 relation.
Ratcliff (1988) found that a diffusion model of memory
scanning (essentially a continuous-time random walk) can
mimic the pure exponential-insertion assumptions to some
extent.
The question of identifiability ultimately remains, but it
is nevertheless important to establish the degree to which
the additive independence model is satisfied statistically,
because this places some important constraints on the models, continuous or discrete stage, that should be considered.
For example, Ratcliff (1988) also found that very impressive fits to the RT-density functions produced by the diffusion model yielded very poor fits to the ratio of the distributions given by Equation 4. With respect to subitizing,
if the model holds well within the subitizing range, but not
beyond, or if the reverse is true, then this would be evidence for a change in process. If there is no change in fit,
then this places important further constraints on subitizing
models that do assume a change in process.
In the experiment described below, each of four subjects participated in many experimental sessions, providing
a large corpus of RT data that could be used to investigate
each of the statistical tests outlined above. The results
suggest that there is no statistical discontinuity in enumeration. Rather, the temporal demands of enumeration increase continuously with increases in stimulus numerosity,
both within and beyond the putative subitizing range.
Method
Subjects. Four undergraduate students at the University of California, Santa Barbara, participated. They were paid $5.00 per hour.
Stimuli. The stimuli were horizontally linear arrays of 1-8 solid colored blocks. The blocks were presented on a grayish-white background for 200 ms, followed by a dark field. Different colors were used in order to decrease the correlation between numerosity and stimulus energy. The display monitor had a vertical refresh rate of 60 Hz; 8 colors were used: red, green, blue, yellow, pink, brown, light gray, and dark gray; luminance values were 7.30, 21.89, 56.62, 28.31, 62.46, 4.90, 21.30, and 9.43 cd/m², for the 8 colors, respectively. These colors were selected randomly for each trial, without replacement. The locations of the blocks on a given trial were based on a fixed template.
For displays of two or more elements, the left- and right-most locations of the template were always used; hence the total length of the display was fixed. The position of a single-element display was chosen randomly from the set. Visual angle of the individual blocks was 0.24° horizontally, 1.22° vertically. For two of the subjects (Group 1), there were 10 locations in all, and the total visual angle was 6.47°. For the other two subjects (Group 2), there were 14 locations in all and the visual angle was 8.35°. The number of locations for Group 2 was increased in order to ensure that subjects could not count the number of empty locations (even though this would be difficult, since the actual locations were not marked).
Procedure. Subjects wore a lapel microphone which was connected to a voice key (an in-house device) with an input to the controlling computer (IBM XT). The timing board in the computer was accurate to within 4 ms. To begin each trial, subjects depressed the space bar on the keyboard in front of them. They were instructed to enter on this keyboard their first response to the stimulus, regardless of whether they had subsequently changed their mind.
For Group 1, accuracy was emphasized. The subjects were instructed to respond as quickly as possible, but also told that they should at all costs attempt to keep their error rates at less than 10% for each numerosity. Current error-rate feedback was given after each error trial and at the end of the practice session. For Group 2, speed was emphasized, no reference was made to accuracy, and no feedback was provided. Group-1 subjects were informed about the maximum display size; Group-2 subjects were not. This manipulation for Group 2 was also intended to reduce the potential for alternative strategies at large displays (see below for discussion).
The experiment required approximately 10 2-hour sessions for each subject. For the first of these sessions, subjects received 80 practice trials, 10 per numerosity. For subsequent sessions, there were 16 warm-up trials, 2 per numerosity.
Results and discussion
Summary statistics for each of the 4 subjects are given in Figure 1. As is customary, the RT measures are based on correct responses only. RTs less than 250 ms were excluded (less than 2% of the total sample), the cutoff being suggested by the virtual absence of responses between 250 and 350 ms.
[Figure 1: RT mean, RT SD, and accuracy plotted against numerosity for Subjects 1-4; plot not reproduced.]
Fig. 1. Summary statistics for speeded enumeration of linear displays of 200-ms duration. The RT mean and SD estimates are based on correct responses only.
Within the subitizing range, that is, up to 3 or 4 elements, there is consistent evidence for an ordering in MRT
by numerosity, under both instruction conditions. (Note
that, owing to the very large sample sizes, the standard
error of the estimated means is very small in relation to the
increases seen in the figure.) In two cases this increase is
quite small, i.e., numerosities 1-2 for Subject 1 (5 ms),
and 2-3 for Subject 3 (7 ms). However, the RT variance
does increase in both these instances. Note also that the
increase in MRT from 1 to 2 is considerably larger than
that from 2 to 3 for both subjects in the speeded-instruction
condition. Balakrishnan and Ashby (1991) found this to be
a consistent effect under similar instructions, in an experiment with many more subjects, but relatively smaller
sample sizes per subject. Their stimuli included a condition
in which the spacing between elements was fixed. Hence,
this result cannot be accounted for by the more central
location on average of the single-element display.
At larger display numerosities, several additional effects are worth mentioning. First, Subject 1 was barely able
to achieve the minimum targeted 90% accuracy for large
displays, whereas Subject 2 clearly was not. Indeed, Subject 2's accuracy is no better than that of the speeded
subjects, and the RTs are somewhat faster overall. Second,
at the largest numerosities (i.e., 7-8), the slope of the
MRT function decreases for subjects under speed instructions, and changes sign for subjects instructed for accuracy.
Accompanying these RT effects is a continuous decline in
response accuracy in the speed condition, and an increase
for the largest numerosity in the accuracy condition. "End-anchor effects" such as the latter have been noted elsewhere (Kaufman et al., 1949; Klahr & Wallace, 1976). One
account of them is that subjects begin to use partial information about the stimulus and a sophisticated guessing
strategy when the display numerosity is large. Further evidence of such a strategy is discussed below.
Next consider the two models of MRT discussed earlier. As noted previously, these models can be ruled out because of a consistent pattern of violations over subjects. Although this type of analysis is not possible in the present design, some of the same effects are illustrated in these data. For example, the bilinear model can be ruled out immediately for the speeded group, since the functions are noticeably nonlinear for numerosities 1-3. With elimination of the single-element conditions, the functions appear to be concave upward. For Subject 1, there is virtually no increase in MRT from 1 to 2 elements, but there is a noticeable increase from 2 to 3 elements. Hence the bilinear model can be ruled out for a different reason in this case.
[Figure 2: MRT versus numerosity for Subjects 1-4 with fitted exponential functions; plot not reproduced.]
Fig. 2. Least squares estimates of the exponential model (Equation 2) for MRT. Each function is displaced vertically by some amount for purposes of legibility.
Table 1. Matrix of response probabilities and MRTs for data averaged over subjects. RT in ms is in 1st row, response probability in 2nd, for each display size. RTs for entries with response probabilities less than .005 are omitted for purposes of legibility.
[Table 1 body (display sizes 1-8 by responses 1-9) not reproduced.]
Violations of the exponential model are similarly
robust. Figure 2 shows the best-fitting versions (in the
sense of minimum sum squared error) of the Equation 2
model. The results suggest that log (RT) has at least a
quadratic component, and possibly higher components as
well.
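The curvature in log(RT) noted above can be screened for with a simple descriptive check: if MRT followed Equation 2 exactly, log(MRT) would be linear in i and its second differences would vanish. The sketch uses hypothetical values only; it is not the least-squares fit reported in Figure 2, and with real data the second differences would have to be judged against their sampling error.

```python
import math
from typing import List


def log_second_differences(mrt: List[float]) -> List[float]:
    """Second differences of log(MRT) across consecutive numerosities.

    Zero (up to rounding) when MRT = beta * exp(alpha * i); a run of
    same-signed nonzero values indicates curvature in log(MRT), e.g. a
    quadratic component, which Equation 2 cannot produce.
    """
    logs = [math.log(y) for y in mrt]
    return [logs[i + 1] - 2.0 * logs[i] + logs[i - 1] for i in range(1, len(logs) - 1)]


if __name__ == "__main__":
    exact = [400.0 * math.exp(0.2 * i) for i in range(1, 7)]                  # follows Equation 2
    curved = [math.exp(6.0 + 0.05 * i + 0.02 * i * i) for i in range(1, 7)]  # quadratic in log
    print(log_second_differences(exact))
    print(log_second_differences(curved))
```

For a purely quadratic log(MRT) with coefficient $c_2$ on $i^2$, every second difference equals $2c_2$, so a constant positive run points to an upward-curving log function.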
The error rates shown in Figure 1 suggest virtually
perfect performance up to about 4 elements. Although
accuracy data have sometimes been used to define subitizing, this is an ambiguous definition in the sense that many
different experiments show a similar pattern of accuracy,
whereas the RT data can be quite different (Broadbent,
1975).
Table 1 shows the complete confusion matrix, including
response proportions and MRT by stimulus-response pair,
for the data combined over the 4 subjects. The most important results are that (1) errors begin to show a consistent
pattern at 5 elements, i. e., they are more frequently caused
by overestimation than by underestimation, but this proportion decreases with numerosity; and (2) MRT increases
with numerosity reported. Both these results have some
precedent in the literature (Oyama et al., 1981; van Oeffelen & Vos, 1982b).
With respect to the more common statistical measures of performance, then, these data are qualitatively similar to previous results. On this basis it seems unlikely that unique features of the present experiment, such as the amount of practice that subjects received or the display configurations used, could account for the model failures (see Balakrishnan & Ashby, 1991, for further arguments based on comparison among previously reported data).
Cumulative-distribution functions
Cumulative histogram estimates of the RT-distribution
functions are given in Figure 3. Once again it is useful to
describe these results first within and then beyond the
supposed subitizing limit. At small numerosities, violations of ordering occur when the MRT difference is small (Subject 1, 1-2; Subject 3, 2-3), but the dominance is reasonably good otherwise. Violations are mostly limited to the extreme tails. For example, the ordering is not consistent above the 95th percentile for numerosities 1-2 or below the 5th percentile for numerosities 2-3 in the case of Subject 2. However, a Kolmogorov-Smirnov test of the hypothesis that $F_k(t) > F_{k-1}(t)$ was not significant in any of these cases (largest of the maximum differences = 0.01, p > 0.05), which suggests that the dominance may hold even in the extreme tails.
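A directional Kolmogorov-Smirnov comparison of this kind can be sketched as follows. The statistic is $D^+ = \max_t [F_k(t) - F_{k-1}(t)]$, the largest amount by which the higher-load distribution exceeds the lower-load one, and the p-value uses the standard large-sample approximation $\exp(-2 D^{+2} nm/(n+m))$ for the one-sided two-sample test. This is an illustration under those assumptions; the article does not state which KS variant or p-value computation was used, and the function name is hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Tuple


def one_sided_ks(rt_prev: List[float], rt_next: List[float]) -> Tuple[float, float]:
    """One-sided two-sample KS test of F_k(t) > F_{k-1}(t) for some t.

    Returns D+ = max_t [F_k(t) - F_{k-1}(t)] (0 if never positive), evaluated
    at every observed RT, and the large-sample p-value exp(-2 D+^2 nm/(n+m)).
    """
    a, b = sorted(rt_prev), sorted(rt_next)
    n, m = len(a), len(b)
    d_plus = 0.0
    for t in a + b:  # both step functions only change at observed RTs
        d_plus = max(d_plus, bisect_right(b, t) / m - bisect_right(a, t) / n)
    return d_plus, math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))


if __name__ == "__main__":
    print(one_sided_ks([0.4, 0.5, 0.6], [0.5, 0.6, 0.7]))  # ordered: D+ = 0, p = 1
    print(one_sided_ks([0.5, 0.6, 0.7], [0.4, 0.5, 0.6]))  # reversed: D+ = 1/3
```

The asymptotic p-value is only a rough guide for very small samples, but with the sample sizes collected here it is a reasonable approximation.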
Strong and statistically significant violations (minimum
of maximum differences = 0.10, p <0.01) of ordering
occur at numerosities 6 - 8 for Subjects 1 and 2, and at 7 - 8
for Subject 4. For some subjects, both the error rate and
MRT are nevertheless increasing over these intervals.
Also, these cumulative distributions have significantly
lower tails, that is, they do not saturate as quickly, even
when MRT is smaller. Both these results are consistent
with the assumption of a mixture of slow and fast enumeration processes at the largest numerosities, which could
indicate sophisticated guessing (see above).
[Figure 3: cumulative probability versus response time (s), one panel per subject; plot not reproduced.]
Fig. 3. Cumulative-RT distribution functions by subject and display size.
Hazard-rate functions
Recall that the hazard-rate function represents the likelihood that a response occurs at time t, conditioned on it not
occurring prior to t. An ordering in the hazard-rate functions implies an ordering in the cumulative-distribution
functions and in the MRTs; hence there is at least the
potential to resolve some of the cases cited above in which
the cumulative ordering was not consistent.
Estimates of the hazard-rate functions were obtained by
the method of "random smoothing" (Miller & Singpurwalla, 1977; see Appendix A). These estimates are
shown in Figure 4. The results are complex in the sense that
strong violations of ordering occur in isolated cases for
Subjects 1-3, but not for Subject 4. In the tails, the ordering relations appear similar to results from the cumulative
distributions, but there are some striking exceptions. For
example, for Subject 3, the hazard-rate functions for 1 and
2 elements cross roughly 100 ms sooner than the point at
which the cumulative distributions intersect, and the noisy
indeterminacy in the cumulative estimates becomes a
strong violation in the hazard-rate function.
A more striking result than the order of these functions
is their shape as a function of the display conditions. For
Subjects 1-3, the hazard rate at small numerosities rises to
a peak, then descends to a flat, non-zero tail, indicating that
the density function is asymptotically exponential. For the
largest displays, the functions are monotonic with a constant tail. A very similar pattern has been found for RT to
detect a change (simple RT), with change of intensity playing the same role as numerosity in modulating the shape of
the function (Burbeck & Luce, 1982). This result is important for two reasons. First, it provides converging evidence
of the validity of the estimates, which is important because
of the large estimator variance that must exist in the tail.
Second, it suggests that a very general property of the
human information-processing system is responsible for
the decrease in response rate with time.
Likelihood ratios
The final distributional analysis of stochastic dominance is
the likelihood-ratio test given by Equation 3. To perform
this test, Ashby, Tein, and Balakrishnan (1991) suggest
plotting the latency ROC curve, i.e., $1 - F_{k-1}(t)$ against $1 - F_k(t)$. A well-known result from signal-detection theory states that the likelihood ratio $f_k(t)/f_{k-1}(t)$ is nondecreasing (equivalently, $L_{k-1}(t)$ in Equation 3 is nonincreasing) if and only if this ROC curve is convex (Laming, 1973; Peterson, Birdsall, & Fox, 1954). Because there are too many data to present in figures, the following is a summary of the basic results.
[Figure 4: estimated hazard rate versus response time (s) by subject and display size; plot not reproduced.]
Fig. 4. Hazard-rate estimates with random smoothing (Miller & Singpurwalla, 1977).
In many instances, the evidence for dominance is quite strong. Violations once again are frequent at numerosities greater than 6 (as should be expected, given the previous analyses). Also, for Subjects 1-3 there is a consistent tendency for the extreme right tail of these ROC curves (i.e., the very slow responses) to become linear with slope 1; that is, there is little difference in density for very long RTs both within and beyond the subitizing range. Similar violations occur at the lower 5th-10th percentiles for Subjects 2 and 3; otherwise the effect is limited to the right tail. Note that very slow RTs are typically excluded from RT analyses, since they are likely to result from less-than-perfect measurement conditions and lack of perfect attentiveness on the part of subjects. Overall, the data suggest a very strong stochastic dominance, up to the highest level of the dominance hierarchy (the likelihood-ratio ordering).
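The convexity check behind the latency ROC analysis can be sketched with survival functions passed in as callables, so that either model-based or empirical estimates could be used. Assuming, for illustration, exponential RTs with rate 8 per second for load k-1 and 4 per second for load k, the ROC is $y = x^2$ and is convex; swapping the rates gives a concave curve. Convexity is checked through chord slopes, which must not decrease as x increases. The function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def latency_roc(surv_prev: Callable[[float], float],
                surv_next: Callable[[float], float],
                grid: List[float]) -> List[Tuple[float, float]]:
    """Points (x, y) = (1 - F_k(t), 1 - F_{k-1}(t)) on a time grid, sorted by x."""
    return sorted((surv_next(t), surv_prev(t)) for t in grid)


def is_convex(points: List[Tuple[float, float]], tol: float = 1e-12) -> bool:
    """True if chord slopes between successive points never decrease (within tol)."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:]) if x2 > x1]
    return all(s2 >= s1 - tol for s1, s2 in zip(slopes, slopes[1:]))


def faster(t: float) -> float:
    """Hypothetical survival function 1 - F_{k-1}(t)."""
    return math.exp(-8.0 * t)


def slower(t: float) -> float:
    """Hypothetical survival function 1 - F_k(t)."""
    return math.exp(-4.0 * t)


if __name__ == "__main__":
    grid = [0.02 * j for j in range(1, 101)]  # 0.02 to 2.0 s
    print(is_convex(latency_roc(faster, slower, grid)))  # dominance holds: y = x**2
    print(is_convex(latency_roc(slower, faster, grid)))  # reversed loads: y = x**0.5
```

A tail segment of slope 1, as described above, corresponds to densities that are nearly equal for very long RTs.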
Summary
There are small but frequent violations of stochastic dominance in the extreme right tail of the RT distributions.
Apart from this the data are consistent with the view that
the amount of mental effort (as measured by processing
time) that is necessary to complete the task increases with
each increase in the number of display elements up to at
least 6. No substantial evidence is obtained for a unique
process that operates up to 3 or 4 elements. Although it still
may be possible to account for the lack of a discontinuity in
the RT data by assuming that the limit of the subitizing
function is random between trials, the evidence as it stands
does not favor this view over a single-process model for
displays up to 6.
Testing exponential pure insertion
Since the results in Figure 3 provide fairly strong evidence
for an ordering at the level of cumulative distributions, a
test of exponential pure insertion is warranted. The first
prediction to be examined is therefore the expected mean
and variance of the inserted component (see Equation 5
above). These data are given in Table 2, for each subject by
display numerosity. In several cases the predictions are
quite accurate, and there is general agreement between the
strength of the ordering in Figure 3 and the Equation 5
prediction. Further, for at least 3 of the 4 subjects there
appears to be no systematic pattern of violation of the
predictions. Subject 4's data are unique in that the predicted values are consistently below the actual values
within the subitizing range.

[Figure: estimated RT-density curves for numerosities 1-6, one panel per subject; x-axis: Response Time (secs)]
Fig. 5. Adaptive smoothing estimates of the RT-density functions by subject and display size.

Table 2. Test of Equation 5 predictions for mean response time (MRT)
and variance (Var) as they increase with numerosity. Data were censored
at ±2.5 SDs. *** indicates a decrease in estimated variance (i.e., the
model fails), and times are in ms.
Columns: MRT = $\mathrm{MRT}_k - \mathrm{MRT}_{k-1}$; Var = $(\mathrm{Var}_k - \mathrm{Var}_{k-1})^{1/2}$.

Group 1 (Accuracy instructions)
         Subject 1        Subject 2
Stage    MRT    Var       MRT    Var
1-2       -1     25        53     20
2-3       40     23        47     49
3-4      129    150        70     67
4-5      296    268       114    119
5-6      239    413       157    166
6-7       53    427       -82    ***
7-8     -379    ***      -108    ***

Group 2 (Speed instructions)
         Subject 3        Subject 4
Stage    MRT    Var       MRT    Var
1-2       84    ***        59     37
2-3        1     39        32    ***
3-4       97     78       100     59
4-5       96    105       213    156
5-6      203    176       151    173
6-7       39    172       122    189
7-8       75    222        61    ***
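The Table 2 comparison can be sketched as shown below. The sketch rests on two assumptions. First, "censored at ±2.5 SDs" is taken to mean discarding RTs more than 2.5 sample SDs from the condition mean. Second, Equation 5 is taken to equate the mean increase with the square root of the variance increase, as it must for an inserted exponential stage (mean $1/\lambda$, variance $1/\lambda^2$). The helper names are hypothetical. `None` plays the role of the *** entries.

```python
import numpy as np

def trim(rts, n_sd=2.5):
    """Drop RTs more than n_sd sample SDs from the mean (assumed reading of 'censored')."""
    rts = np.asarray(rts, dtype=float)
    m, s = rts.mean(), rts.std(ddof=1)
    return rts[np.abs(rts - m) <= n_sd * s]

def equation5_rows(rts_by_k, n_sd=2.5):
    """For each step k-1 -> k, return (stage, MRT_k - MRT_{k-1}, sqrt(Var_k - Var_{k-1})).

    Under exponential pure insertion both differences estimate 1/lambda.
    A variance decrease is reported as None (the '***' entries of Table 2).
    """
    levels = sorted(rts_by_k)
    stats = {}
    for k in levels:
        x = trim(rts_by_k[k], n_sd)
        stats[k] = (x.mean(), x.var(ddof=1))
    rows = []
    for prev, cur in zip(levels, levels[1:]):
        d_mean = stats[cur][0] - stats[prev][0]
        d_var = stats[cur][1] - stats[prev][1]
        d_sd = float(np.sqrt(d_var)) if d_var >= 0 else None
        rows.append((f"{prev}-{cur}", float(d_mean), d_sd))
    return rows
```

As a sanity check, simulate $T_k$ as a normal base time plus $k$ exponential stages with a 100-ms mean. With a large sample, both differences come out close to 100 ms before any real data are involved.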
The next step is to plot the RT-density functions, and to
determine whether these are unimodal and satisfy the condition that $f_{k-1}(t) = f_k(t)$ at the mode of $f_k(t)$. Figure 5 shows
the estimated functions for each of the 4 subjects, for
numerosities 1-6 (higher numerosities would clutter the
figures and are unnecessary for the purposes of
this preliminary test). The estimates were obtained by the
method of "adaptive kernels," one of the most efficient
methods known (Silverman, 1986; see Appendix B).
Obviously, the density estimates become increasingly
multimodal as the smoothing is decreased or the sample
variance is increased (i.e., as numerosity increases). Even so,
there does not appear to be any consistent or robust evidence for bimodal density functions near 4 elements. Rather, the functions appear to be unimodal and skewed to the
right, much as they are known to be in other choice-RT
experiments (Hockley, 1984; Luce, 1986; Ratcliff, 1978).
Hence there is no evidence of a bimodality that could
provide a compelling argument for a shift in numerical
processes.
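The mode condition, and the slope and intercept predictions used later, follow in a few lines from one assumption. We assume Equation 4 takes the standard exponential pure-insertion form of Ashby and Townsend (1980), $f_k(t) = \lambda[F_{k-1}(t) - F_k(t)]$, which is equivalent to a constant ratio $f_k(t)/[F_{k-1}(t) - F_k(t)] = \lambda$. The symbol $\lambda$ for the rate of the inserted stage is our notation. It is unrelated to the bandwidth factors $\lambda_i$ of Appendix B.

```latex
% Assumption: T_k = T_{k-1} + X, with X ~ Exponential(\lambda) independent of T_{k-1}.
\begin{align*}
F_k(t) &= \int_0^t f_{k-1}(s)\,\bigl[1 - e^{-\lambda(t-s)}\bigr]\,ds
        = F_{k-1}(t) - \int_0^t f_{k-1}(s)\,e^{-\lambda(t-s)}\,ds,\\
f_k(t) &= \frac{d}{dt}F_k(t) = \lambda \int_0^t f_{k-1}(s)\,e^{-\lambda(t-s)}\,ds
        = \lambda\,\bigl[F_{k-1}(t) - F_k(t)\bigr],\\
f_k'(t) &= \lambda\,\bigl[f_{k-1}(t) - f_k(t)\bigr]
  \;\Rightarrow\; f_{k-1}(t^*) = f_k(t^*) \text{ at the mode } t^* \text{ of } f_k,\\
E[T_k] - E[T_{k-1}] &= 1/\lambda, \qquad \operatorname{Var}[T_k] - \operatorname{Var}[T_{k-1}] = 1/\lambda^2.
\end{align*}
```

The last line motivates comparing the mean increase with the square root of the variance increase. The second line implies that regressing $F_{k-1}(t) - F_k(t)$ on $f_k(t)$ should give slope $1/\lambda$ (the MRT increase) and intercept 0.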
[Figure: scatterplot panels for f_2(t) through f_5(t) against F_{k-1}(t) - F_k(t)]
Fig. 6. Scatterplots of $f_k(t)$ versus $F_{k-1}(t) - F_k(t)$ for Subject 2.
In many cases the density estimates do appear to intersect close to a mode, thereby justifying the more rigorous
Equation 4 test. Note that, from the point of view of probability theory, the RT distribution of a mixture of processes
is not likely to have an exponential distribution, whereas
that of a single process can very naturally assume this
form. Further, Ashby and Townsend (1980) found fairly
strong support for exponential pure insertion with each
increase in memory-set size in a memory-scanning experiment. If subitizing is replaced by a combination of subitizing and other operations at some limiting numerosity, then
the exponential-insertion model might be expected to hold
within the subitizing range, but not at its limits or beyond.
The fit of the model will also be examined for the cases in
which the preliminary tests suggest a strong violation,
since this provides a useful standard for comparison.
Distribution-ratio test
Figure 6 shows the results of plotting $f_k(t)$ against $[F_{k-1}(t) - F_k(t)]$ for one subject (Subject 2). Table 3 gives the
regression coefficients and significance values for the intercept. The density estimates are obtained with h = 25 (see
Appendix B), and the cumulative distributions are estimated as above.
The results in Table 3 may be summarized as follows.
First, the model does poorly for Subject 1 at all numerosities (where the intercept is not significantly different from
zero, the regression fit is relatively poor compared to other
subjects), but this contrasts with very good fits overall for
the other 3 subjects. Note that Subject 1 was significantly
more accurate, and also showed considerably larger RT
variance (see above). Second, the model is soundly rejected in all cases for numerosities 7-8. Third, in 3 of 4
cases, the model is also rejected for numerosities 6-7.
Finally, the increase in MRT is well predicted by the slope
of the regression lines for numerosities up to 6 (compare
also the variances in Table 2). Thus, for numerosities up to
6, the model receives fairly good support.
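The distribution-ratio regression behind Table 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It evaluates both sides on an evenly spaced grid between the 2.5th and 97.5th percentiles of the numerosity-$k$ RTs and uses an empirical CDF. It also substitutes a fixed-bandwidth Gaussian kernel (h = 25 ms) for the adaptive kernels of Appendix B. Under exponential pure insertion, the slope α should approximate the mean of the inserted stage and the intercept β should be near zero. The function names are hypothetical.

```python
import numpy as np

def ecdf(data, t):
    """Empirical CDF of `data` evaluated at the points `t`."""
    data = np.sort(np.asarray(data, dtype=float))
    return np.searchsorted(data, t, side="right") / data.size

def gaussian_density(data, t, h):
    """Fixed-bandwidth Gaussian kernel density estimate (stand-in for adaptive kernels)."""
    u = (np.asarray(t, dtype=float)[:, None] - np.asarray(data, dtype=float)[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

def ratio_regression(rts_prev, rts_cur, h=25.0, n_grid=200):
    """Regress F_{k-1}(t) - F_k(t) on f_k(t); return slope alpha, intercept beta, and r."""
    lo, hi = np.percentile(rts_cur, [2.5, 97.5])
    t = np.linspace(lo, hi, n_grid)
    y = ecdf(rts_prev, t) - ecdf(rts_cur, t)   # left-hand side of the regression
    x = gaussian_density(rts_cur, t, h)        # predictor f_k(t)
    alpha, beta = np.polyfit(x, y, 1)          # polyfit returns [slope, intercept]
    r = np.corrcoef(x, y)[0, 1]
    return float(alpha), float(beta), float(r)
```

Consider simulated data in which the larger display adds an exponential stage with a 100-ms mean. The fitted slope then lands near 100 and the intercept near 0. Kernel smoothing slightly flattens the estimated density near its mode, which is one reason to interpret small departures cautiously.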
The first panel of Figure 6 represents one condition in
which preliminary tests indicated that the model should
fail. The bowed shape of the plot indicates a significant
linear trend (and possibly higher-order terms as well) in the
ratio of Equation 4. A similar, but less pronounced, effect
is detectable in several of the other cases as well, even
though the data are well approximated by a linear function.
At a sufficiently detailed level of analysis, then, the model
may be rejected. However, the fact that the model performs
extremely well in some cases (e.g., for Subject 2) suggests
that the correct model may include the exponential model
as an important special case.

Table 3. Linear-regression analysis on $F_{k-1}(t) - F_k(t) = \alpha f_k(t) + \beta$, by
instruction group and stage. The empirical increase in MRT is denoted by
Δμ, α is the slope of the regression line, β is the intercept, and R is the
multiple-regression value.

Group 1 (Accuracy instructions)
         Subject 1                        Subject 2
Stage    Δμ     α      β        R         Δμ     α      β        R
1-2        5     -4    .001     .637       49     48    .001     .907
2-3       31     43   -.001*    .990       50     49   -.001     .993
3-4      163    126    .013*    .974       70     69    .002     .943
4-5      288    308   -.023*    .914      125    119   -.002     .979
5-6      258    156   -.019*    .858      163    166    .001     .942
6-7       64    -93    .011*    .741      -83    -57   -.014*    .712
7-8     -350    -63   -.063*    .532     -121    -72   -.033*    .784

Group 2 (Speed instructions)
         Subject 3                        Subject 4
Stage    Δμ     α      β        R         Δμ     α      β        R
1-2       79     68    .007     .816       59     57    .001     .945
2-3        7     -4    .004*    .560       31     29    .001     .950
3-4       92     98   -.001     .963      103     96    .004     .931
4-5      105    106   -.004     .981      231    208    .011     .880
5-6      205    199    .001     .962      155    149   -.008     .928
6-7       63      4    .017*    .091      130    131   -.013     .943
7-8       56     13    .023*    .162      -71    -71    .015*    .895

Note: The slope of the regression line, α, should predict the MRT
increase (Δμ) for successive numerosities. Significant intercepts, p < .05,
are violations of the model and are indicated by an asterisk. Each of the
regression-line slopes was significant, p < .01.
Summary
The exponential pure-insertion model provides a good first
approximation to the data, but nevertheless can be rejected
under close scrutiny. The correct description may be a
similar, but slightly more general, model, in which the
assumptions of pure insertion hold, but the inserted processing-time distribution perhaps comes from another
member of the exponential family. Once again there is no
indication of a statistical discontinuity pointing to a change
in numerical process prior to 6 elements. The density functions appear unimodal and skewed to the right, and the fit
of the exponential pure-insertion model is good both within
and beyond the putative subitizing range.
Conclusions
Various distributional tests were used to examine whether
statistical measures of enumeration time can serve as a
benchmark for the existence and limits of a unique numerical process in humans. No such measure was discovered.
Instead, the distributional analyses suggest a continuous
increase in the mental effort required to enumerate for
displays up to 6 elements. Although the absence of a statistically robust discontinuity might be explained by some
variation in the capacity of the proposed mechanism, such
arguments will not be compelling without some form of
empirical support.
There are currently two alternative accounts of the dramatic change in enumeration time as numerosity increases
that do not appeal to a specialized numerical process. One
of these proposes that subjects are able to recognize
canonical geometric patterns existing in two-dimensional
displays of small numerosity (Mandler & Shebo, 1982;
Neisser, 1967). In the experiments reported above, however, the elements were always presented in a one-dimensional, linear array, yet subjects were fast and accurate at
the smallest numerosities. It is not obvious how a pattern-recognition process could explain these results.
The alternative theory is that subitizing is a reflection of
the limited capacity of visual attention, which will manifest
itself in almost any simple cognitive task. In several respects the RT data would be more consistent with this point
of view. For example, the distributional analyses suggested
that a single-process model would be capable of predicting
the enumeration data for up to about 6 elements. This value
is within the range of Miller's (1956) classical estimate of
attentional capacity. The fact that subitizing is usually considered to be limited to only 4 elements can be explained
by the choice of statistical measure. That is, Miller's estimates are based on 50% accuracy of responses, whereas
subitizing is described as fast and very accurate enumeration. When extremely high accuracy (e.g., 99% correct) is
used as the criterion for other tasks, then about 3 or 4
independent items is the limit (Broadbent, 1975). To conclude, then, it appears that there is very little evidence that
subitizing is a unique numerical ability, and ample evidence that it is not.
Acknowledgements. This research was supported in part by National
Science Foundation Grant BNS88-19 403 to the second author. The first
author was also supported by grants MH44 640 to Roger Ratcliff and
AFOSR 90-0246 (jointly funded by NSF) to Gail McCoon.
Appendix A
The random smoothing method of Miller and Singpurwalla (1977) is an
"adaptive smoothing" algorithm in which the time interval of the estimated hazard rate is adjusted in order to maintain a fixed sample size
within the smoothing window. For each estimate, the smoothing constant
was set to 100 for purposes of figure legibility; the results were fundamentally the same with a constant as small as 3, followed by the application of a 10-ms Hamming window (in essence, a weighted running mean within a
window) to these estimates.
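Below is a sketch of a generic fixed-count hazard estimator in the spirit of this description. It is not a reproduction of Miller and Singpurwalla's (1977) algorithm. The function name is hypothetical, the smoothing constant is assumed to be the number of RTs m per window, and the 10-ms Hamming post-smoothing is omitted. Each estimate is the number of completions in the window divided by the total time that still-pending responses spend inside it.

```python
import numpy as np

def fixed_count_hazard(rts, m=100):
    """Hazard-rate estimates from windows holding a fixed number m of RTs.

    For sorted RTs t_0 < t_1 < ..., window i spans (t_i, t_{i+m}] and contains
    m completions; the estimate is m divided by the total time that responses
    still pending at t_i spend inside the window.
    Returns (window midpoints, hazard estimates).
    """
    t = np.sort(np.asarray(rts, dtype=float))
    mids, rates = [], []
    for i in range(t.size - m):
        start, end = t[i], t[i + m]
        # exposure: each pending response contributes time until it finishes or the window ends
        exposure = np.sum(np.minimum(t[i + 1:], end) - start)
        mids.append(0.5 * (start + end))
        rates.append(m / exposure)
    return np.array(mids), np.array(rates)
```

For exponentially distributed RTs with a 100-ms mean, the estimates hover around the constant hazard of 0.01 per ms, which makes this a convenient check of the implementation. Because the window width varies with the data, sparse tails automatically receive wider windows.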
Appendix B
The adaptive-kernels estimate of the density function is obtained as
follows. Let $\{X_i\}$, $i = 1, \ldots, n$, be the ordered set of RTs for a given stimulus numerosity, and let $\tilde{f}(X_i)$ be a "pilot estimate" of the density function at time $X_i$.
Local bandwidth factors, $\lambda_i$, are computed using

$$\lambda_i = \left[\tilde{f}(X_i)/g\right]^{-\alpha},$$

where $g$ is the geometric mean of the $\tilde{f}(X_i)$, and $\alpha$ is a sensitivity parameter set to 0.5. The adaptive kernel estimate of $f_k(t)$ is then given by

$$\hat{f}_k(t) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h\lambda_i}\,K\!\left(\frac{t - X_i}{h\lambda_i}\right),$$

where $K$ is a kernel function and $h$ is the smoothing-window size. The
function chosen for $K$ was the Epanechnikov kernel,

$$K(t) = \begin{cases} \dfrac{3}{4\sqrt{5}}\left(1 - \dfrac{t^2}{5}\right), & -\sqrt{5} \le t \le \sqrt{5},\\[4pt] 0, & \text{otherwise.} \end{cases}$$

The pilot estimate was obtained using the method of Parzen (1960) with
a small Gaussian kernel. The density estimates in Figure 5 were then
obtained with h = 25.
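The adaptive-kernels recipe above translates directly into NumPy. In this sketch the pilot bandwidth `pilot_h` (10 ms) is an assumption, since the text says only that the Gaussian pilot kernel was "small". The defaults h = 25 and α = 0.5 follow the values stated above, and the function names are illustrative.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)

def epanechnikov(u):
    """Epanechnikov kernel, unit-variance form with support [-sqrt(5), sqrt(5)]."""
    u = np.asarray(u, dtype=float)
    k = 3.0 / (4.0 * SQRT5) * (1.0 - u**2 / 5.0)
    return np.where(np.abs(u) <= SQRT5, k, 0.0)

def gaussian_pilot(x, data, h):
    """Fixed-bandwidth Gaussian (Parzen) estimate evaluated at the points x."""
    u = (np.asarray(x, dtype=float)[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * np.sqrt(2.0 * np.pi))

def adaptive_kde(t, rts, h=25.0, alpha=0.5, pilot_h=10.0):
    """Adaptive-kernel density estimate of the RT density at the points t."""
    data = np.sort(np.asarray(rts, dtype=float))       # ordered set {X_i}
    pilot = gaussian_pilot(data, data, pilot_h)         # pilot estimate at each X_i
    g = np.exp(np.mean(np.log(pilot)))                  # geometric mean of the pilot values
    widths = h * (pilot / g) ** (-alpha)                # h * lambda_i
    u = (np.asarray(t, dtype=float)[:, None] - data[None, :]) / widths[None, :]
    return (epanechnikov(u) / widths[None, :]).mean(axis=1)
```

Each scaled kernel integrates to one, so the estimate does too. Points in sparse regions (low pilot density) receive wider kernels. This smooths the long right tail of an RT distribution without oversmoothing its mode.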
References
Ashby, F. G. (1982). Testing the assumptions of exponential, additive
reaction time models. Memory & Cognition, 10, 125-134.
Ashby, F. G., & Townsend, J. T. (1980). Decomposing the reaction time
distribution: Pure insertion and selective influence revisited. Journal
of Mathematical Psychology, 21, 93-123.
Ashby, F. G., Tein, J., & Balakrishnan, J. D. (1991). Response time
distributions in memory scanning. Submitted.
Balakrishnan, J. D., & Ashby, F. G. (1991). Is subitizing a unique
numerical ability? Perception & Psychophysics, 50, 555-564.
Broadbent, D. E. (1975). The magic number seven after fifteen years. In
A. Wilkes & A. Kennedy (Eds.), Studies in long term memory. New
York: Wiley.
Burbeck, S. L., & Luce, R. D. (1982). Evidence from auditory simple
reaction times for both change and level detectors. Perception &
Psychophysics, 32, 117-133.
Cattell, J. M. (1886). Ueber die Trägheit der Netzhaut und des Sehzentrums. Philosophische Studien, 3, 94-127.
Chi, M. T. H., & Klahr, D. (1975). Span and rate of apprehension in children and adults. Journal of Experimental Child Psychology, 19, 434-439.
Colonius, H. (1990). Possibly dependent probability summation of reaction time. Journal of Mathematical Psychology, 34, 253-275.
Hockley, W. E. (1984). Analysis of response time distributions in the study of cognitive processes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 598-615.
Kaufman, E. L., Lord, M. W., Reese, T. W., & Volkmann, J. (1949). The discrimination of visual number. American Journal of Psychology, 62, 498-525.
Klahr, D., & Wallace, J. G. (1976). Cognitive development: An information-processing view. Hillsdale, NJ: Erlbaum.
Krueger, L. E. (1984). Perceived numerosity: A comparison of magnitude production, magnitude estimation, and discrimination judgements. Perception & Psychophysics, 35, 536-542.
Laming, D. (1973). Mathematical psychology. New York: Academic Press.
Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. New York: Oxford University Press.
Mandler, G., & Shebo, B. J. (1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111, 1-22.
Miller, D. R., & Singpurwalla, N. D. (1977). Failure rate estimation
using random smoothing. National Technical Information Service,
No. AD-A040 999/5ST.
Miller, G. A. (1956). The magical number seven plus or minus two:
Some limits on our capacity to process information. Psychological
Review, 63, 81-97.
Neisser, U. (1967). Cognitive psychology. Englewood Cliffs, NJ: Prentice-Hall.
van Oeffelen, M. P., & Vos, P. G. (1982a). Configurational effects on
the enumeration of dots: Counting by groups. Memory & Cognition,
10, 396-404.
van Oeffelen, M. P., & Vos, P. G. (1982b). A probabilistic model for the
discrimination of visual number. Perception & Psychophysics, 32,
163-170.
Oyama, T., Kikuchi, T., & Ichihara, S. (1981). Span of attention, backward masking, and reaction time. Perception & Psychophysics, 29,
106-112.
Parzen, E. (1960). Modern probability theory and its applications. New
York: Wiley.
Peterson, W. W., Birdsall, T. G., & Fox, W. C. (1954). The theory of signal detectability. Trans. IRE Professional Group on Information Theory, PGIT-4, 171-212.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59-108.
Ratcliff, R. (1988). A note on mimicking additive reaction time models. Journal of Mathematical Psychology, 32, 192-204.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman & Hall.
Simons, D., & Langheinrich, D. (1982). What is magic about the magical number four? Psychological Research, 44, 283-294.
Starkey, P., & Cooper, R. G. (1980). Perception of number by human infants. Science, 210, 1033-1035.
Starkey, P., Spelke, E. S., & Gelman, R. (1983). Detection of intermodal numerical correspondences by human infants. Science, 222, 179-181.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. In W. G. Koster (Ed.), Attention and performance (Vol. 2). Amsterdam: North Holland.
Townsend, J. T. (1988). The truth and consequences of ordinal differences in statistical distributions: Toward a theory of hierarchical inference. Psychological Bulletin, 108, 551-567.
Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge: Cambridge University Press.
Von Szeliski, V. (1924). Relation between the quantity perceived and the time of perception. Journal of Experimental Psychology, 7, 135-147.
Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology. New York: Holt.
Wundt, W. (1896). Grundriss der Psychologie. Leipzig: Engelmann.