Psychological Research / Psychologische Forschung
Psychol Res (1992) 54:80-90
© Springer-Verlag 1992

Subitizing: Magical numbers or mere superstition?

J. D. Balakrishnan¹ and F. Gregory Ashby²

¹ Department of Psychology, Northwestern University, Evanston, IL 60208, USA
² Department of Psychology, University of California, Santa Barbara, CA 93106, USA

Received May 7, 1991/Accepted September 12, 1991

Summary. It is widely believed that humans are endowed with a specialized numerical process, called subitizing, which enables them to apprehend rapidly and accurately the numerosity of small sets of objects. A major part of the evidence for this process is a purported discontinuity in the mean response time (RT) versus numerosity curves at about 4 elements, when subjects enumerate up to 7 or more elements in a visual display. In this article, RT data collected in a speeded enumeration experiment are subjected to a variety of statistical analyses, including several tests on the RT distributions. None of these tests reveals a significant discontinuity as numerosity increases. The data do suggest a strong stochastic dominance in RT by display numerosity, indicating that the mental effort required to enumerate does increase with each additional element in the display, both within and beyond the putative subitizing range.

Introduction

When a small set of discrete elements is briefly presented to human observers, the number of elements, or "numerosity," of the set is ascertained quickly and accurately, with little or no feeling of conscious effort (Chi & Klahr, 1975; Kaufman, Lord, Reese, & Volkmann, 1949; Mandler & Shebo, 1982). Kaufman et al. (1949) called this remarkable capacity subitizing, and contrasted it with the lengthy and error-prone enumeration of larger sets.
Although its definition can be strictly tied to performance measures, subitizing is commonly thought to represent a unique numerical ability, perhaps present even at birth (Starkey & Cooper, 1980; Starkey, Spelke, & Gelman, 1983).

Offprint requests to: J. D. Balakrishnan

The existence of a fast, "automatic" process with limited scope has been hypothesized for many years, dating back to the earliest studies of "span of attention" or "immediate apprehension" (Cattell, 1886; Wundt, 1896). Despite this long history, however, the statistical evidence for such a process remains weak. Much of it depends upon the exact shape of the function relating mean response time (MRT) to numerosity in the speeded-enumeration experiment. Methods of corroborating theories about subitizing using these RT data have not included rigorous statistical tests.

Two quantitative models of the RT versus numerosity function have been proposed. The first of these (Chi & Klahr, 1975; Klahr & Wallace, 1976; Oyama, Kikuchi, & Ichihara, 1981; Simons & Langheinrich, 1982) partitions the function into three separate regions, each associated with a different numerical procedure. The first region (subitizing) is linearly increasing with a small slope (about 50 ms per element) from 1 to 3 or 4 elements. The second region is linearly increasing with a considerably larger slope (about 250 ms per element) from 4 to 6 or 7 elements. The large slope is thought to reflect covert counting, or subitizing of subgroups and addition of the partial sums (Klahr & Wallace, 1976; van Oeffelen & Vos, 1982a). Both functions are obtained by simple regression on MRT, and the large difference in slopes is seen as prima facie evidence of the existence of two distinct enumeration processes. When display duration is limited and the stimulus numerosity exceeds about 8 elements, there is little or no change in the MRT function, although the error rate continues to increase.
The third region is therefore defined by a constant, asymptotic RT equal to about 1.5 s (Mandler & Shebo, 1982; Kaufman et al., 1949). Such performance is usually attributed to a gross estimation process (Kaufman et al., 1949; Krueger, 1984). The complete model for brief stimulus displays may be defined as

\[
RT_{i,j} =
\begin{cases}
a_1 i + b_1 + n_{i,j}, & i \le m \\
a_2 i + b_2 + n_{i,j}, & m < i \le l \\
c + n_{i,j}, & l + 1 \le i
\end{cases}
\tag{1}
\]

where $i$ is the stimulus numerosity, $j$ is the trial number, $m$ is 3 or 4, $l$ is about 6 or 7, $n$ is zero-mean noise whose variance depends upon $i$ for $i$ up to $l$, and $a_k$ and $b_k$ are free parameters. Because the theoretical issues of most interest depend upon the linearity of the first two regions of the MRT versus numerosity function, this model will be referred to as the "bilinear" model.

The alternative model postulates a single, exponential increase in MRT for up to about 6 elements (Kaufman et al., 1949; Von Szeliski, 1924). Formally,

\[
RT_{i,j} =
\begin{cases}
\beta e^{\alpha i} + n_{i,j}, & 1 \le i \le w \\
c + n_{i,j}, & w + 2 \le i
\end{cases}
\tag{2}
\]

where $w$ is 6 or 7, $\alpha$ and $\beta$ are free parameters, and $i$ and $n$ are defined as before. One motivation for the Equation 2 model comes from theories of psychophysical discrimination. Specifically, enumeration might be achieved by correlating numerosity with physical properties of the display, such as energy, local density, or contrast. Under this view, the total RT is dominated by encoding processes in pattern recognition rather than by numerical processes per se. Recognition of "canonical" geometric patterns (i.e., "dots," "lines," and "triangles") has also been held to account for the extremely rapid enumeration of small sets (Mandler & Shebo, 1982; Neisser, 1967; Woodworth & Schlosberg, 1954).

Elsewhere (Balakrishnan & Ashby, 1991) it was shown that neither of these two models provides an adequate account of the MRT data.
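As a rough illustration of how the two candidate functions differ in shape, the following sketch evaluates the noise-free means of the bilinear (Equation 1) and exponential (Equation 2) models. The slopes of 50 and 250 ms and the 1.5-s asymptote are the round figures quoted in the text; the intercepts, the values of `alpha` and `beta`, and all function names are hypothetical choices made only for this example and are not estimates from any data set.

```python
import math

def bilinear_mrt(i, a1=50.0, b1=450.0, a2=250.0, b2=-350.0, c=1500.0, m=4, l=7):
    """Noise-free mean of the Equation 1 model at numerosity i.

    The intercepts b1 and b2 are hypothetical; b2 is chosen so that the
    two linear segments meet at i = m.
    """
    if i <= m:
        return a1 * i + b1      # small-slope ("subitizing") region
    if i <= l:
        return a2 * i + b2      # large-slope ("counting") region
    return c                    # asymptotic ("estimation") region

def exponential_mrt(i, alpha=0.2, beta=400.0):
    """Noise-free mean of the rising part of the Equation 2 model (i <= w)."""
    return beta * math.exp(alpha * i)

def successive_differences(values):
    """Increase in MRT produced by each additional element."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

bilinear = [bilinear_mrt(i) for i in range(1, 8)]
exponential = [exponential_mrt(i) for i in range(1, 7)]

# Bilinear model: the increase per element is constant within each region.
bilinear_slopes = successive_differences(bilinear)

# Exponential model: the increase per element is a fixed proportion
# (exp(alpha) - 1) of the current MRT.
exp_ratios = [d / v for d, v in
              zip(successive_differences(exponential), exponential)]
```

Under these assumed parameters the bilinear increments take only two values, whereas the exponential increments grow in proportion to MRT itself; these are the two signatures that distinguish the models at the level of mean RT.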
The bilinear model was ruled out because the slope of the MRT function was consistently increasing, and the exponential model was ruled out because the slope was not proportional to MRT. A more complex relationship between numerosity and enumeration processes is therefore indicated, which is unfortunate in the sense that there are not sufficient data in the traditional measures (MRT and percentage correct) to evaluate the more complex models statistically.

In the present paper, very large samples of responses were collected from individual subjects who participated over a two-week period. These data were used to obtain accurate estimates of the complete RT distributions, and therefore to test several general classes of RT models that postulate a limited-capacity numerical process. In addition to a more rigorous test of the discontinuity hypothesis, the tests establish a detailed set of criteria by which future theories of enumeration processes can be judged. The analyses include tests of stochastic dominance in RT, estimates of the number of modes in the RT-density functions, and tests of independent, discrete stage models of enumeration. More complete descriptions of the nature and purposes of these tests are given in the two sections that follow.

Stochastic dominance in RT

The central empirical question to be addressed in this article is whether a capacity-limited numerical process manifests itself in the RT data of speeded enumeration. If there is evidence of a qualitative shift in the effect of display numerosity on the capacity to enumerate, then this would constitute important evidence of such a process. If no such evidence exists, then the dual-process model is more difficult to justify and the single-process model becomes more attractive. Given that the summary statistics, i.e., MRT, RT variance, and percentage correct, may not be sufficiently powerful to decide the issue, it is necessary to consider other measures as well.
Therefore, a means of choosing these measures and assessing their implications for capacity is needed. Assuming that subitizing is a non-deterministic process whose completion time varies from trial to trial, then there may be no fixed ordering of completion times from one trial to the next, even when the task load (i.e., stimulus numerosity) is increased. We therefore introduce the notion of stochastic dominance (Townsend, 1988; Townsend & Ashby, 1983).

Stochastic dominance is the ordering of a probability measure that is defined over a set of observations. Although any number of such measures might be chosen for analysis, some of them are more appropriate tests of models than others. One of the weakest measures of stochastic dominance is currently the most popular one in experimental psychology, i.e., the MRT. This statistic is a weak measure in the sense that an ordering in MRT need not imply an ordering in some of the other, more "sufficient" RT statistics, in particular those related to the complete RT distribution. Three of these distribution functions are of particular importance: (1) the cumulative RT distribution, $F(t) = P(RT \le t)$; (2) the RT density function, $f(t) = \frac{d}{dt} F(t)$; and (3) the RT hazard-rate function, $h(t) = f(t)/[1 - F(t)]$. Although each of these functions is uniquely determined by each of the others, there are reasons for empirically estimating them separately.

Townsend and Ashby (1983) showed that the strongest measure of stochastic dominance occurs if

\[
L_{k-1}(t) = \frac{f_{k-1}(t)}{f_k(t)}
\tag{3}
\]

is nonincreasing in $t$, where $k$ is the experimenter-determined load. This ordering is strongest in the sense that it implies an ordering in the hazard-rate functions, which implies an ordering in the cumulative distributions and in MRT. The complete dominance hierarchy is

\[
L_{k-1}(t) \text{ non-increasing in } t
\;\Rightarrow\; h_{k-1}(t) \ge h_k(t)
\;\Rightarrow\; F_{k-1}(t) \ge F_k(t)
\;\Rightarrow\; MRT_k \ge MRT_{k-1}.
\]
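The lower levels of this hierarchy can be checked directly from samples of RTs. The sketch below is a minimal illustration, assuming an empirical cumulative distribution and a crude binned hazard estimate; these are generic estimators chosen for brevity, not the estimators used in this article. The simulated RTs, the seed, and all function names are hypothetical.

```python
import bisect
import random

def ecdf(sample):
    """Return the empirical F(t): the proportion of RTs that are <= t."""
    data = sorted(sample)
    n = len(data)
    return lambda t: bisect.bisect_right(data, t) / n

def binned_hazard(sample, edges):
    """Crude hazard estimate per bin: responses falling in the bin divided
    by (responses not yet made at the bin start * bin width)."""
    data = sorted(sample)
    n = len(data)
    rates = []
    for lo, hi in zip(edges, edges[1:]):
        start = bisect.bisect_left(data, lo)
        at_risk = n - start                          # RTs >= lo
        in_bin = bisect.bisect_left(data, hi) - start  # lo <= RT < hi
        rates.append(in_bin / (at_risk * (hi - lo)) if at_risk else float("nan"))
    return rates

def cdf_dominates(faster, slower, grid):
    """True if F_faster(t) >= F_slower(t) at every grid point."""
    f_fast, f_slow = ecdf(faster), ecdf(slower)
    return all(f_fast(t) >= f_slow(t) for t in grid)

# Hypothetical data (ms): load k adds a nonnegative duration to each
# load k-1 trial, so the cumulative ordering holds by construction.
rng = random.Random(1)
rt_prev = [300.0 + rng.gammavariate(2.0, 60.0) for _ in range(2000)]
rt_curr = [rt + rng.expovariate(1.0 / 80.0) for rt in rt_prev]

cdf_ordered = cdf_dominates(rt_prev, rt_curr, range(300, 1501, 25))
mean_ordered = sum(rt_curr) / len(rt_curr) >= sum(rt_prev) / len(rt_prev)

edges = list(range(300, 701, 100))
hazard_prev = binned_hazard(rt_prev, edges)
hazard_curr = binned_hazard(rt_curr, edges)
```

With real data the two samples are independent rather than paired, so the cumulative ordering is not guaranteed and would be assessed with a statistical test; the hazard estimates can then be inspected bin by bin for the stronger ordering.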
Distribution models

Obviously there are reasons for estimating RT distributions besides establishing their dominance relations. Also of interest is the family of functions that might best describe these distributions and their theoretical justification. Such an analysis should have some impact on the question of a discrete shift in numerical processes.

It so happens that one of these three functions, the hazard-rate function, is particularly suited to the evaluation of RT models (Luce, 1986; Townsend & Ashby, 1983). The hazard-rate function is the instantaneous response rate, that is, the likelihood that a response occurs at time t, given that it has not yet occurred. Its basic shape, increasing or decreasing, can therefore more clearly reflect the temporal changes in a system's capacity to perform. An example of the diagnosticity of this function is the fact that empirical estimates of it unequivocally rule out the traditional candidates for the RT distribution, including the gamma, log-normal, and ex-Gaussian (an exponential convolved with a normal distribution) for signal detection times ("simple RT"), even though these distributions appear to provide quite adequate fits to the RT density (Burbeck & Luce, 1982; Luce, 1986; Ratcliff, 1978). We show here that these distributions may be similarly rejected in the case of the speeded-enumeration experiment, and elsewhere (Ashby, Tein, & Balakrishnan, 1991) we show the same result for the memory-scanning experiment. The consistent pattern of results for both simple- and choice-RT experiments has broad implications for RT theory.

Exponential pure insertion

Suppose that the random variable describing the total RT in the choice-RT experiment can be decomposed into a sum of independent random variables, with some of these variables being directly connected to controllable stimulus conditions.
Specifically, assume that by the addition of an element to the stimulus display in the enumeration task, a component-processing time is added to the sum without affecting the other components. Such conditions on the effects of stimulus manipulations are known as "pure insertion" (Sternberg, 1969). Note that independence is possible without pure insertion holding, and similarly pure insertion can hold without independence. Various structural assumptions, including both parallel and serial models, can lead to one or the other (or neither) state of affairs (Colonius, 1990, on dependence plus pure insertion; Townsend & Ashby, 1983, on parallel versus serial models).

Now suppose that the distribution associated with the added component is exponential, with processing rate $\lambda_k$. Ashby and Townsend (1980) showed that under these assumptions,

\[
\frac{f_k(t)}{F_{k-1}(t) - F_k(t)} = \lambda_k
\tag{4}
\]

for all $t$ for which the ratio is defined. This means that a good test of the model would be to plot the estimated ratio as a function of $t$, and verify that there is no significant trend with time, or alternatively to plot $f_k(t)$ against $F_{k-1}(t) - F_k(t)$ and verify that it is linear with zero intercept and slope equal to $\lambda_k$. A statistical test is therefore possible under the auspices of the general linear model.

Before evaluation of the Equation 4 ratio, several simple preliminary tests of the exponential insertion model may be sufficient to reject it at the outset. First, the assumption of additivity plus independence implies an ordering at the level of the cumulative-distribution functions (Ashby, 1982). Estimates of the cumulative distributions from the cumulative RT histogram are easily available, and probably adequate. A second test arises from the fact that the mean and standard deviation of the exponential function are equal.
Because the variance of a sum of independent random variables is the sum of their variances, this property implies that the mean and variance estimates should satisfy the following relation:

\[
MRT_k - MRT_{k-1} = [Var_k - Var_{k-1}]^{1/2}
\tag{5}
\]

Although variance estimates have a relatively large standard error, it is necessary to obtain large samples to estimate the distribution functions as well. Finally, taking the derivative of Equation 4 with respect to $t$, Ashby (1982) noted that

\[
\frac{d}{dt} f_k(t) = \lambda_k [f_{k-1}(t) - f_k(t)],
\tag{6}
\]

which implies that the density functions $f_{k-1}(t)$ and $f_k(t)$ should intersect at the mode of $f_k(t)$, where its derivative is zero.

The result in Equation 4 is stated in an "if and only if" fashion, and so is potentially a very strong test of the exponential pure-insertion model. It is always possible, however, that other models that do not assume pure insertion to hold might approximate the Equation 4 relation. Ratcliff (1988) found that a diffusion model of memory scanning (essentially a continuous-time random walk) can mimic the pure exponential-insertion assumptions to some extent. The question of identifiability ultimately remains, but it is nevertheless important to establish the degree to which the additive independence model is satisfied statistically, because this places some important constraints on the models, continuous or discrete stage, that should be considered. For example, Ratcliff (1988) also found that very impressive fits to the RT-density functions produced by the diffusion model yielded very poor fits to the ratio of the distributions given by Equation 4.

With respect to subitizing, if the model holds well within the subitizing range, but not beyond, or if the reverse is true, then this would be evidence for a change in process. If there is no change in fit, then this places important further constraints on subitizing models that do assume a change in process.
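The three predictions of exponential pure insertion (Equations 4, 5, and 6) can be verified on a case that has a closed form. The sketch below assumes, purely for illustration, that the processing time before insertion is itself exponential with a mean of 200 ms and that the inserted stage is exponential with a mean of 100 ms; the sum of the two then has a known (hypoexponential) distribution. These durations and all names are hypothetical and are not estimates from the experiment.

```python
import math

BASE_MEAN = 200.0      # hypothetical mean (ms) of processing at load k-1
INSERTED_MEAN = 100.0  # hypothetical mean (ms) of the inserted stage
mu = 1.0 / BASE_MEAN           # rate of the load k-1 distribution
lam = 1.0 / INSERTED_MEAN      # rate of the inserted stage (lambda_k)

def cdf_prev(t):
    """F_{k-1}(t): exponential with rate mu."""
    return 1.0 - math.exp(-mu * t)

def pdf_prev(t):
    """f_{k-1}(t)."""
    return mu * math.exp(-mu * t)

def cdf_curr(t):
    """F_k(t): sum of independent exponentials with rates mu and lam."""
    return 1.0 - (lam * math.exp(-mu * t) - mu * math.exp(-lam * t)) / (lam - mu)

def pdf_curr(t):
    """f_k(t): density of the same sum."""
    return lam * mu * (math.exp(-mu * t) - math.exp(-lam * t)) / (lam - mu)

def distribution_ratio(t):
    """Left-hand side of Equation 4."""
    return pdf_curr(t) / (cdf_prev(t) - cdf_curr(t))

# Equation 4: the ratio should equal lam at every t.
ratios = [distribution_ratio(t) for t in range(50, 1001, 50)]

# Equation 5: increase in mean equals root of the increase in variance.
mean_increase = (BASE_MEAN + INSERTED_MEAN) - BASE_MEAN
sd_from_variances = math.sqrt((BASE_MEAN**2 + INSERTED_MEAN**2) - BASE_MEAN**2)

# Equation 6: the two densities intersect at the mode of f_k.
mode_curr = math.log(lam / mu) / (lam - mu)
```

In this idealized case the ratio is flat in t, the mean and variance increments agree exactly, and the densities cross at the mode; with empirical estimates the same quantities would be compared only up to sampling error.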
In the experiment described below, each of four subjects participated in many experimental sessions, providing a large corpus of RT data that could be used to investigate each of the statistical tests outlined above. The results suggest that there is no statistical discontinuity in enumeration. Rather, the temporal demands of enumeration increase continuously with increases in stimulus numerosity, both within and beyond the putative subitizing range.

Method

Subjects. Four undergraduate students at the University of California, Santa Barbara, participated. They were paid $5.00 per hour.

Stimuli. The stimuli were horizontally linear arrays of 1-8 solid colored blocks. The blocks were presented on a grayish-white background for 200 ms, followed by a dark field. Different colors were used, in order to decrease the correlation between numerosity and stimulus energy. The display monitor had a vertical refresh rate of 60 Hz; 8 colors were used: red, green, blue, yellow, pink, brown, light gray, and dark gray; luminance values were 7.30, 21.89, 56.62, 28.31, 62.46, 4.90, 21.30, and 9.43 cd/m², for the 8 colors, respectively. These colors were selected randomly for each trial, without replacement. The locations of the blocks on a given trial were based on a fixed template. For more than two-element displays, the left- and right-most locations of the template were always used; hence the total length of the display was fixed. The position of a single-element display was chosen randomly from the set. Visual angle of the individual blocks was 0.24° horizontally, 1.22° vertically. For two of the subjects (Group 1), there were 10 locations in all, and the total visual angle was 6.47°. For the other two subjects (Group 2), there were 14 locations in all and the visual angle was 8.35°. The number of locations for Group 2 was increased in order to ensure that subjects could not count the number of empty locations (even though this would be difficult, since the actual locations were not marked).

Procedure. Subjects wore a lapel microphone which was connected to a voice key (an in-house device) with an input to the controlling computer (IBM XT). The timing board in the computer was accurate to within 4 ms. To begin each trial, subjects depressed the space bar on the keyboard in front of them. They were instructed to enter on this keyboard their first response to the stimulus, regardless of whether they had subsequently changed their mind. For Group 1, accuracy was emphasized. The subjects were instructed to respond as quickly as possible, but also told that they should at all costs attempt to keep their error rates at less than 10% for each numerosity. Current error-rate feedback was given after each error trial and at the end of the practice session. For Group 2, speed was emphasized, no reference was made to accuracy, and no feedback was provided. Group-1 subjects were informed about the maximum display size, Group-2 subjects were not. This manipulation for Group 2 was also intended to reduce the potential for alternative strategies at large displays (see below for discussion). The experiment required approximately 10 2-hour sessions for each subject. For the first of these sessions, subjects received 80 practice trials, 10 per numerosity. For subsequent sessions, there were 16 warm-up trials, 2 per numerosity.

Fig. 1. Summary statistics for speeded enumeration of linear displays of 200-ms duration, plotted against numerosity. The RT mean and SD estimates are based on correct responses only.

Results and discussion

Summary statistics for each of the 4 subjects are given in Figure 1. As is customary, the RT measures are based on correct responses only.
RTs less than 250 ms were excluded (less than 2% of the total sample), the cutoff being suggested by the virtual absence of responses between 250 and 350 ms.

Within the subitizing range, that is, up to 3 or 4 elements, there is consistent evidence for an ordering in MRT by numerosity, under both instruction conditions. (Note that, owing to the very large sample sizes, the standard error of the estimated means is very small in relation to the increases seen in the figure.) In two cases this increase is quite small, i.e., numerosities 1-2 for Subject 1 (5 ms), and 2-3 for Subject 3 (7 ms). However, the RT variance does increase in both these instances. Note also that the increase in MRT from 1 to 2 is considerably larger than that from 2 to 3 for both subjects in the speeded-instruction condition. Balakrishnan and Ashby (1991) found this to be a consistent effect under similar instructions, in an experiment with many more subjects, but relatively smaller sample sizes per subject. Their stimuli included a condition in which the spacing between elements was fixed. Hence, this result cannot be accounted for by the more central location on average of the single-element display.

At larger display numerosities, several additional effects are worth mentioning. First, Subject 1 was barely able to achieve the minimum targeted 90% accuracy for large displays, whereas Subject 2 clearly was not. Indeed, Subject 2's accuracy is no better than that of the speeded subjects, and the RTs are somewhat faster overall. Second, at the largest numerosities (i.e., 7-8), the slope of the MRT function decreases for subjects under speed instructions, and changes sign for subjects instructed for accuracy. Accompanying these RT effects is a continuous decline in response accuracy in the speed condition, and an increase for the largest numerosity in the accuracy condition. "End-anchor effects" such as the latter have been noted elsewhere (Kaufman et al., 1949; Klahr & Wallace, 1976).
One account of them is that subjects begin to use partial information about the stimulus and a sophisticated guessing strategy when the display numerosity is large. Further evidence of such a strategy is discussed below.

Next consider the two models of MRT discussed earlier. As noted previously, these models can be ruled out because of a consistent pattern of violations over subjects. Although this type of analysis is not possible in the present design, some of the same effects are illustrated in these data. For example, the bilinear model can be ruled out immediately for the speeded group, since the functions are noticeably nonlinear for numerosities 1-3. With elimination of the single-element conditions, the functions appear to be concave upward. For Subject 1, there is virtually no increase in MRT from 1 to 2 elements, but there is a noticeable increase from 2 to 3 elements. Hence the linear model can be ruled out for a different reason in this case. Violations of the exponential model are similarly robust. Figure 2 shows the best-fitting versions (in the sense of minimum sum squared error) of the Equation 2 model. The results suggest that log (RT) has at least a quadratic component, and possibly higher components as well.

Fig. 2. Least squares estimates of the exponential model (Equation 2) for MRT, plotted against numerosity for Subjects 1-4. Each function is displaced vertically by some amount for purposes of legibility.

The error rates shown in Figure 1 suggest virtually perfect performance up to about 4 elements. Although accuracy data have sometimes been used to define subitizing, this is an ambiguous definition in the sense that many different experiments show a similar pattern of accuracy, whereas the RT data can be quite different (Broadbent, 1975). Table 1 shows the complete confusion matrix, including response proportions and MRT by stimulus-response pair, for the data combined over the 4 subjects. The most important results are that (1) errors begin to show a consistent pattern at 5 elements, i.e., they are more frequently caused by overestimation than by underestimation, but this proportion decreases with numerosity; and (2) MRT increases with numerosity reported. Both these results have some precedent in the literature (Oyama et al., 1981; van Oeffelen & Vos, 1982b).

With respect to the more common statistical measures of performance, then, these data are qualitatively similar to previous results. On this basis it seems unlikely that unique features of the present experiment, such as the amount of practice that subjects receive or the display configurations used, could account for the model failures (see Balakrishnan & Ashby, 1991, for further arguments based on comparison among previously reported data).

Table 1. Matrix of response probabilities and MRTs for data averaged over subjects. RT in ms is in 1st row, response probability in 2nd, for each display size. RTs for entries with response probabilities less than .005 are omitted for purposes of legibility.
(Rows: display size; columns: response. The cell positions did not survive extraction. The surviving entries, in extraction order and paired as MRT (probability), are: 512 (.969), 3,350 (.006); 558 (.983); 590 (.991); 697 (.961), 790 (.024); 857 (.017), 887 (.905), 961 (.062); 1,130 (.065), 1,189 (.009); 1,092 (.790), 1,141 (.132), 1,029 (.008); 1,040 (.149), 1,139 (.692), 1,129 (.146); 1,133 (.011), 1,002 (.289), 969 (.643), 1,067 (.049).)

Cumulative-distribution functions

Cumulative histogram estimates of the RT-distribution functions are given in Figure 3. Once again it is useful to describe these results first within and then beyond the supposed subitizing limit. At small numerosities, violations of ordering occur when the MRT difference is small (Subject 1, 1-2; Subject 3, 2-3), but the dominance is reasonably good otherwise. Violations are mostly limited to the extreme tails. For example, the ordering is not consistent above the 95th percentile for numerosities 1-2 or below the 5th percentile for numerosities 2-3 in the case of Subject 2.
However, a Kolmogorov-Smirnov test of the hypothesis that $F_k(t) > F_{k-1}(t)$ was not significant in any of these cases (maximum of maximum differences = 0.01, p > 0.05), which suggests that the dominance may hold even in the extreme tails.

Strong and statistically significant violations (minimum of maximum differences = 0.10, p < 0.01) of ordering occur at numerosities 6-8 for Subjects 1 and 2, and at 7-8 for Subject 4. For some subjects, both the error rate and MRT are nevertheless increasing over these intervals. Also, these cumulative distributions have significantly lower tails, that is, they do not saturate as quickly, even when MRT is smaller. Both these results are consistent with the assumption of a mixture of slow and fast enumeration processes at the largest numerosities, which could indicate sophisticated guessing (see above).

Fig. 3. Cumulative-RT distribution functions by subject and display size (x-axis: response time in seconds).

Hazard-rate functions

Recall that the hazard-rate function represents the likelihood that a response occurs at time t, conditioned on it not occurring prior to t. An ordering in the hazard-rate functions implies an ordering in the cumulative-distribution functions and in the MRTs; hence there is at least the potential to resolve some of the cases cited above in which the cumulative ordering was not consistent. Estimates of the hazard-rate functions were obtained by the method of "random smoothing" (Miller & Singpurwalla, 1977; see Appendix A). These estimates are shown in Figure 4. The results are complex in the sense that strong violations of ordering occur in isolated cases for Subjects 1-3, but not for Subject 4.
In the tails, the ordering relations appear similar to results from the cumulative distributions, but there are some striking exceptions. For example, for Subject 3, the hazard-rate functions for 1 and 2 elements cross roughly 100 ms sooner than the point at which the cumulative distributions intersect, and the noisy indeterminacy in the cumulative estimates becomes a strong violation in the hazard-rate function.

A more striking result than the order of these functions is their shape as a function of the display conditions. For Subjects 1-3, the hazard rate at small numerosities rises to a peak, then descends to a flat, non-zero tail, indicating that the density function is asymptotically exponential. For the largest displays, the functions are monotonic with a constant tail. A very similar pattern has been found for RT to detect a change (simple RT), with change of intensity playing the same role as numerosity in modulating the shape of the function (Burbeck & Luce, 1982). This result is important for two reasons. First, it provides converging evidence of the validity of the estimates, which is important because of the large estimator variance that must exist in the tail. Second, it suggests that a very general property of the human information-processing system is responsible for the decrease in response rate with time.

Likelihood ratios

The final distributional analysis of stochastic dominance is the likelihood-ratio test given by Equation 3. To perform this test, Ashby, Tein, and Balakrishnan (1991) suggest plotting the latency ROC curve, i.e., $1 - F_{k-1}(t)$ against $1 - F_k(t)$. A well-known result from signal-detection theory states that the likelihood ratio is nondecreasing if and only if this ROC curve is convex (Laming, 1973; Peterson, Birdsall, & Fox, 1954).
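The geometry of the latency ROC curve can be sketched with two distributions whose density ratio is known to be monotone. In the example below both RT distributions are assumed to be exponential (means of 300 and 450 ms), so that the curve is $y = x^{1.5}$ and therefore convex; the distributions, the time grid, and the function names are hypothetical choices for illustration, and the chord-slope check is a generic convexity test rather than the procedure used in this article.

```python
import math

def latency_roc(survivor_prev, survivor_curr, times):
    """Points (1 - F_k(t), 1 - F_{k-1}(t)) of the latency ROC curve,
    sorted by the horizontal coordinate."""
    points = {(survivor_curr(t), survivor_prev(t)) for t in times}
    return sorted(points)

def is_convex(points, tol=1e-12):
    """True if the slopes of successive chords never decrease."""
    slopes = [(y1 - y0) / (x1 - x0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])
              if x1 > x0]
    return all(later >= earlier - tol
               for earlier, later in zip(slopes, slopes[1:]))

# Hypothetical survivor functions 1 - F(t); load k-1 is the faster one.
def survivor_prev(t):
    return math.exp(-t / 300.0)

def survivor_curr(t):
    return math.exp(-t / 450.0)

times = range(0, 2001, 50)
roc = latency_roc(survivor_prev, survivor_curr, times)
roc_is_convex = is_convex(roc)

# Exchanging the two loads gives y = x**(2/3), which is concave.
swapped_is_convex = is_convex(latency_roc(survivor_curr, survivor_prev, times))
```

With data, the survivor functions would be replaced by one minus the empirical cumulative distributions, and departures from convexity would have to be judged against sampling noise, which is largest in the tails.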
Because there are too many data to present in figures, the following is a summary of the basic results. In many instances, the evidence for dominance is quite strong. Violations once again are frequent at numerosities greater than 6 (as should be expected, given the previous analyses). Also, for Subjects 1-3 there is a consistent tendency for the extreme right tail of these functions (i.e., the very slow responses) to become linear with slope 1; that is, there is little difference in density for very long RTs both within and beyond the subitizing range. Similar violations occur at the lower 5th-10th percentiles for Subjects 2 and 3; otherwise the effect is limited to the right tail. Note that very slow RTs are typically excluded from RT analyses, since they are likely to result from less-than-perfect measurement conditions and lack of perfect attentiveness on the part of subjects. Overall, the data suggest a very strong stochastic dominance up to the highest level.

Fig. 4. Hazard-rate estimates with random smoothing (Miller & Singpurwalla, 1977), by subject and display size (x-axis: response time in seconds).

Summary

There are small but frequent violations of stochastic dominance in the extreme right tail of the RT distributions.
Apart from this the data are consistent with the view that the amount of mental effort (as measured by processing time) that is necessary to complete the task increases with each increase in the number of display elements up to at least 6. No substantial evidence is obtained for a unique process that operates up to 3 or 4 elements. Although it still may be possible to account for the lack of a discontinuity in the RT data by assuming that the limit of the subitizing function is random between trials, the evidence as it stands does not favor this view over a single-process model for displays up to 6.

Testing exponential pure insertion

Since the results in Figure 3 provide fairly strong evidence for an ordering at the level of cumulative distributions, a test of exponential pure insertion is warranted. The first prediction to be examined is therefore the expected mean and variance of the inserted component (see Equation 5 above). These data are given in Table 2, for each subject by display numerosity. In several cases the predictions are quite accurate, and there is general agreement between the strength of the ordering in Figure 3 and the Equation 5 prediction. Further, for at least 3 of the 4 subjects there appears to be no systematic pattern of violation of the predictions. Subject 4's data are unique in that the predicted values are consistently below the actual values within the subitizing range.

Table 2. Test of Equation 5 predictions for mean response time (MRT) and variance (Var) as they increase with numerosity. Data were censored at ±2.5 SDs. *** indicates a decrease in estimated variance (i.e., the model fails), and times are in ms. Columns: dMRT = MRT_k - MRT_{k-1}; dSD = (Var_k - Var_{k-1})^{1/2}; rows are numerosity k.

Group 1 (Accuracy instructions)
        Subject 1         Subject 2
k       dMRT    dSD       dMRT    dSD
2         -1     25         53     20
3         40     23         47     49
4        129    150         70     67
5        296    268        114    119
6        239    413        157    166
7         53    427        -82    ***
8       -379    ***       -108    ***

Group 2 (Speed instructions)
        Subject 3         Subject 4
k       dMRT    dSD       dMRT    dSD
2         84    ***         59     37
3          1     39         32    ***
4         97     78        100     59
5         96    105        213    156
6        203    176        151    173
7         39    172        122    189
8         75    222         61    ***

The next step is to plot the RT-density functions, and to determine whether these are unimodal and satisfy the condition that $f_{k-1}(t) = f_k(t)$ at the mode of $f_k(t)$. Figure 5 shows the estimated functions for each of the 4 subjects, for numerosities 1-6 (higher numerosities would confuse the figures somewhat and are unnecessary for the purposes of this preliminary test). The estimates were obtained by the method of "adaptive kernels," one of the most efficient methods known (Silverman, 1986; see Appendix B). Obviously, the density estimates become increasingly multimodal as the smoothing is decreased or the sample variance is increased (i.e., increasing numerosity). Even so there does not appear to be any consistent or robust evidence for bimodal density functions near 4 elements. Rather, the functions appear to be unimodal and skewed to the right, much as they are known to be in other choice-RT experiments (Hockley, 1984; Luce, 1986; Ratcliff, 1978). Hence there is no evidence of a bimodality that could provide a compelling argument for a shift in numerical processes.

Fig. 5. Adaptive smoothing estimates of the RT-density functions by subject and display size (x-axis: response time in seconds).

In many cases the density estimates do appear to intersect close to a mode, thereby justifying the more rigorous Equation 4 test. Note that, from the point of view of probability theory, the RT distribution of a mixture of processes is not likely to have an exponential distribution, whereas that of a single process can very naturally assume this form. Further, Ashby and Townsend (1980) found fairly strong support for exponential pure insertion with each increase in memory-set size in a memory-scanning experiment. If subitizing is replaced by a combination of subitizing and other operations at some limiting numerosity, then the exponential-insertion model might be expected to hold within the subitizing range, but not at its limits or beyond. The fit of the model will also be examined for the cases in which the preliminary tests suggest a strong violation, since this provides a useful standard for comparison.

Fig. 6. Scatterplots of $f_k(t)$ versus $F_{k-1}(t) - F_k(t)$ for Subject 2 (panels for $f_2(t)$, $f_3(t)$, $f_4(t)$, and $f_5(t)$).

Distribution-ratio test

Figure 6 shows the results of plotting $f_k(t)$ against $[F_{k-1}(t) - F_k(t)]$ for one subject (Subject 2). Table 3 gives the regression coefficients and significance values for the intercept. The density estimates are obtained with h = 25 (see Appendix B), and the cumulative distributions are estimated as above. The results in Table 3 may be summarized as follows. First, the model does poorly for Subject 1 at all numerosities (where the intercept is not significantly different from zero, the regression fit is relatively poor compared to other subjects), but this contrasts with very good fits overall for the other 3 subjects.
Note that Subject 1 was significantly more accurate, and also showed considerably larger RT variance (see above). Second, the model is soundly rejected in all cases for numerosities 7-8. Third, in 3 of 4 cases, the model is also rejected for numerosities 6-7. Finally, the increase in MRT is well predicted by the slope of the regression lines for numerosities up to 6 (compare also the variances in Table 2). Thus, for numerosities up to 6, the model receives fairly good support.

Table 3. Linear-regression analysis on $F_{k-1}(t) - F_k(t) = \alpha f_k(t) + \beta$, by instruction group and stage. The empirical increase in MRT is denoted by $\Delta\mu$, $\alpha$ is the slope of the regression line, $\beta$ is the intercept, and R is the multiple-regression value.

Group 1 (Accuracy)
Stage        1-2      2-3      3-4      4-5      5-6      6-7      7-8
Subject 1
  Δμ           5       31      163      288      258       64     -350
  α           -4       43      126      308      156      -93      -63
  β         .001    -.001*    .013*   -.023*   -.019*    .011*   -.063*
  R         .637     .990     .974     .914     .858     .741     .532
Subject 2
  Δμ          49       50       70      125      163      -83     -121
  α           48       49       69      119      166      -57      -72
  β         .001    -.001     .002    -.002     .001    -.014*   -.033*
  R         .907     .993     .943     .979     .942     .712     .784

Group 2 (Speed instructions)
Stage        1-2      2-3      3-4      4-5      5-6      6-7      7-8
Subject 3
  Δμ          79        7       92      105      205       63       56
  α           68       -4       98      106      199        4       13
  β         .007     .004*   -.001    -.004     .001     .017*    .023*
  R         .816     .560     .963     .981     .962     .091     .162
Subject 4
  Δμ          59       31      103      231      155      130      -71
  α           57       29       96      208      149      131      -71
  β         .001     .001     .004     .011    -.008    -.013     .015*
  R         .945     .950     .931     .880     .928     .943     .895

Note: The slope of the regression line, $\alpha$, should predict the MRT increase ($\Delta\mu$) for successive numerosities. Significant intercepts, p < .05, are violations of the model and are indicated by an asterisk. Each of the regression-line slopes was significant, p < .01.

The first panel of Figure 6 represents one condition in which preliminary tests indicated that the model should fail. The bowed shape of the plot indicates a significant linear trend (and possibly higher-order terms as well) in the ratio of Equation 4.
A similar, but less pronounced, effect is detectable in several of the other cases as well, even though the data are well approximated by a linear function. At a sufficiently detailed level of analysis, then, the model may be rejected. However, the fact that the model performs extremely well in some cases (e.g., for Subject 2) suggests that the correct model may include the exponential model as an important special case.

Summary

The exponential pure-insertion model provides a good first approximation to the data, but nevertheless can be rejected under close scrutiny. The correct description may be a similar, but slightly more general, model, in which the assumptions of pure insertion hold, but the inserted processing-time distribution perhaps comes from another member of the exponential family. Once again there is no indication of a statistical discontinuity pointing to a change in numerical process prior to 6 elements. The density functions appear unimodal and skewed to the right, and the fit of the exponential pure-insertion model is good both within and beyond the putative subitizing range.

Conclusions

Various distributional tests were used to examine whether statistical measures of enumeration time can serve as a benchmark for the existence and limits of a unique numerical process in humans. No such measure was discovered. Instead, the distributional analyses suggest a continuous increase in the mental effort required to enumerate for displays up to 6 elements. Although the absence of a statistically robust discontinuity might be explained by some variation in the capacity of the proposed mechanism, such arguments will not be compelling without some form of empirical support.

There are currently two alternative accounts of the dramatic change in enumeration time as numerosity increases that do not appeal to a specialized numerical process.
One of these proposes that subjects are able to recognize canonical geometric patterns existing in two-dimensional displays of small numerosity (Mandler & Shebo, 1982; Neisser, 1967). In the experiments reported above, however, the elements were always presented in a one-dimensional, linear array, yet subjects were fast and accurate at the smallest numerosities. It is not obvious how a pattern-recognition process could explain these results.

The alternative theory is that subitizing is a reflection of the limited capacity of visual attention, which will manifest itself in almost any simple cognitive task. In several respects the RT data would be more consistent with this point of view. For example, the distributional analyses suggested that a single-process model would be capable of predicting the enumeration data for up to about 6 elements. This value is within the range of Miller's (1956) classical estimate of attentional capacity. The fact that subitizing is usually considered to be limited to only 4 elements can be explained by the choice of statistical measure. That is, Miller's estimates are based on 50% accuracy of responses, whereas subitizing is described as fast and very accurate enumeration. When extremely high accuracy (e.g., 99% correct) is used as the criterion for other tasks, then about 3 or 4 independent items is the limit (Broadbent, 1975).

To conclude, then, it appears that there is very little evidence that subitizing is a unique numerical ability, and ample evidence that it is not.

Acknowledgements. This research was supported in part by National Science Foundation Grant BNS88-19 403 to the second author. The first author was also supported by grants MH44 640 to Roger Ratcliff and AFOSR 90-0246 (jointly funded by NSF) to Gail McKoon.
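
As a rough illustration of the two checks applied in the section on exponential pure insertion (the Equation 5 moment comparison of Table 2 and the distribution-ratio regression of Table 3), the following Python sketch may help. It is not the authors' code. The function names, the simulated data (Gaussian base times plus one exponential stage with a 100-ms mean), and the time grid are all assumptions made for illustration, and a fixed-bandwidth Epanechnikov estimate stands in for the adaptive-kernel estimate of Appendix B. The significance test on the intercept is omitted.

```python
import math
import random
import statistics

SQRT5 = math.sqrt(5.0)


def epanechnikov(u):
    """Epanechnikov kernel with unit variance, supported on [-sqrt(5), sqrt(5)]."""
    return 0.75 / SQRT5 * (1.0 - u * u / 5.0) if abs(u) <= SQRT5 else 0.0


def density(sample, t, h):
    """Fixed-bandwidth kernel estimate of the RT density at time t (per ms)."""
    return sum(epanechnikov((t - x) / h) for x in sample) / (len(sample) * h)


def ecdf(sample, t):
    """Empirical cumulative distribution function at time t."""
    return sum(x <= t for x in sample) / len(sample)


def moment_check(rt_prev, rt_next):
    """Moment comparison in the spirit of Equation 5.

    Returns (MRT_k - MRT_{k-1}, sqrt(Var_k - Var_{k-1})). Under exponential
    pure insertion both estimate the mean of the inserted stage. The second
    entry is None when the estimated variance decreases (*** in Table 2).
    """
    mean_gain = statistics.mean(rt_next) - statistics.mean(rt_prev)
    var_gain = statistics.variance(rt_next) - statistics.variance(rt_prev)
    return mean_gain, (math.sqrt(var_gain) if var_gain >= 0 else None)


def ratio_regression(rt_prev, rt_next, grid, h=25.0):
    """Least-squares fit of F_{k-1}(t) - F_k(t) = alpha * f_k(t) + beta on grid."""
    xs = [density(rt_next, t, h) for t in grid]
    ys = [ecdf(rt_prev, t) - ecdf(rt_next, t) for t in grid]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, my - alpha * mx


def simulate(n=2000, stage_mean=100.0, seed=0):
    """Hypothetical data: Gaussian base RTs plus one exponential stage (ms)."""
    rng = random.Random(seed)
    rt_prev = [rng.gauss(400.0, 50.0) for _ in range(n)]
    rt_next = [rng.gauss(400.0, 50.0) + rng.expovariate(1.0 / stage_mean)
               for _ in range(n)]
    return rt_prev, rt_next


rt_prev, rt_next = simulate()
print(moment_check(rt_prev, rt_next))
print(ratio_regression(rt_prev, rt_next, grid=range(200, 1001, 10)))
```

With data generated under the model, both moment estimates and the slope should fall near the 100-ms stage mean and the intercept near zero. A clearly nonzero intercept, or a bowed scatterplot of the fitted points, would correspond to the kind of violation marked by asterisks in Table 3.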
Appendix A

The random smoothing method of Miller and Singpurwalla (1977) is an "adaptive smoothing" algorithm in which the time interval of the estimated hazard rate is adjusted in order to maintain a fixed sample size within the smoothing window. For each estimate, the smoothing constant was set to 100, for purposes of figure legibility; the results were fundamentally the same when the constant was as small as 3 and a 10-ms Hamming window (in essence, a running mean within a window) was then applied to these estimates.

Appendix B

The adaptive-kernels estimate of the density function is obtained as follows. Let $\{X_i\}$ be the ordered set of RTs for a given stimulus numerosity, and let $\tilde{f}(X_i)$ be a "pilot estimate" of the density function at time $X_i$. Local bandwidth factors, $\lambda_i$, are computed using

$$\lambda_i = \left[\frac{\tilde{f}(X_i)}{g}\right]^{-\alpha},$$

where $g$ is the geometric mean of the $\tilde{f}(X_i)$, and $\alpha$ is a sensitivity parameter set to 0.5. The adaptive kernel estimate of $f_k(t)$ is then given by

$$\hat{f}_k(t) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h \lambda_i} K\!\left(\frac{t - X_i}{h \lambda_i}\right),$$

where $K$ is a kernel function and $h$ is the smoothing-window size. The function chosen for $K$ was the Epanechnikov kernel,

$$K(t) = \begin{cases} \dfrac{3}{4\sqrt{5}}\left(1 - \dfrac{t^2}{5}\right), & -\sqrt{5} \le t \le \sqrt{5} \\ 0, & \text{otherwise.} \end{cases}$$

The pilot estimate was obtained using the method of Parzen (1960) with a small Gaussian kernel. The density estimates in Figure 5 were then obtained with h = 25.

References

Ashby, F. G. (1982). Testing the assumptions of exponential, additive reaction time models. Memory & Cognition, 10, 125-134.
Ashby, F. G., & Townsend, J. T. (1980). Decomposing the reaction time distribution: Pure insertion and selective influence revisited. Journal of Mathematical Psychology, 21, 93-123.
Ashby, F. G., Tein, J., & Balakrishnan, J. D. (1991). Response time distributions in memory scanning. Submitted.
Balakrishnan, J. D., & Ashby, F. G. (1991). Is subitizing a unique numerical ability? Perception & Psychophysics, 50, 555-564.
Broadbent, D. E. (1975). The magic number seven after fifteen years. In A. Wilkes & A. Kennedy (Eds.), Studies in long term memory. New York: Wiley.
Burbeck, S. L., & Luce, R. D. (1982). Evidence from auditory simple reaction times for both change and level detectors. Perception & Psychophysics, 32, 117-133.
Cattell, J. M. (1886). Ueber die Trägheit der Netzhaut und des Sehzentrums. Philosophische Studien, 3, 94-127.
Chi, M. T. H., & Klahr, D. (1975). Span and rate of apprehension in children and adults. Journal of Experimental Child Psychology, 19, 434-439.
Colonius, H. (1990). Possibly dependent probability summation of reaction time. Journal of Mathematical Psychology, 34, 253-275.
Hockley, W. E. (1984). Analysis of response time distributions in the study of cognitive processes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 598-615.
Kaufman, E. L., Lord, M. W., Reese, T. W., & Volkmann, J. (1949). The discrimination of visual number. American Journal of Psychology, 62, 498-525.
Klahr, D., & Wallace, J. G. (1976). Cognitive development: An information-processing view. Hillsdale, NJ: Erlbaum.
Krueger, L. E. (1984). Perceived numerosity: A comparison of magnitude production, magnitude estimation, and discrimination judgements. Perception & Psychophysics, 35, 536-542.
Laming, D. (1973). Mathematical psychology. New York: Academic Press.
Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. New York: Oxford University Press.
Mandler, G., & Shebo, B. J. (1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111, 1-22.
Miller, D. R., & Singpurwalla, N. D. (1977). Failure rate estimation using random smoothing. National Technical Information Service, No. AD-A040 999/5ST.
Miller, G. A. (1956). The magical number seven plus or minus two: Some limits on our capacity to process information. Psychological Review, 63, 81-97.
Neisser, U. (1967). Cognitive psychology. Englewood Cliffs, NJ: Prentice-Hall.
van Oeffelen, M. P., & Vos, P. G. (1982a). Configurational effects on the enumeration of dots: Counting by groups. Memory & Cognition, 10, 396-404.
van Oeffelen, M. P., & Vos, P. G. (1982b). A probabilistic model for the discrimination of visual number. Perception & Psychophysics, 32, 163-170.
Oyama, T., Kikuchi, T., & Ichihara, S. (1981). Span of attention, backward masking, and reaction time. Perception & Psychophysics, 29, 106-112.
Parzen, E. (1960). Modern probability theory and its applications. New York: Wiley.
Peterson, W. W., Birdsall, T. G., & Fox, W. C. (1954). The theory of signal detectability. Trans. IRE Professional Group on Information Theory, PGIT-4, 171-212.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59-108.
Ratcliff, R. (1988). A note on mimicking additive reaction time models. Journal of Mathematical Psychology, 32, 192-204.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman & Hall.
Simons, D., & Langheinrich, D. (1982). What is magic about the magical number four? Psychological Research, 44, 283-294.
Starkey, P., & Cooper, R. G. (1980). Perception of number by human infants. Science, 210, 1033-1035.
Starkey, P., Spelke, E. S., & Gelman, R. (1983). Detection of intermodal numerical correspondences by human infants. Science, 222, 179-181.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. In W. G. Koster (Ed.), Attention and performance (Vol. 2). Amsterdam: North Holland.
Townsend, J. T. (1988). The truth and consequences of ordinal differences in statistical distributions: Toward a theory of hierarchical inference. Psychological Bulletin, 108, 551-567.
Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge: Cambridge University Press.
Von Szeliski, V. (1924). Relation between the quantity perceived and the time of perception. Journal of Experimental Psychology, 7, 135-147.
Woodworth, R. S., & Schlosberg, H. (1954). Experimental psychology. New York: Holt.
Wundt, W. (1896). Grundriß der Psychologie. Leipzig: Engelmann.
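
The adaptive-kernel estimator described in Appendix B can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names and the pilot bandwidth `h_pilot` are hypothetical (the source says only that a small Gaussian kernel was used for the pilot estimate), and the bandwidth-factor formula is the standard form given by Silverman (1986), assumed here to be the one intended.

```python
import math

SQRT5 = math.sqrt(5.0)


def epanechnikov(u):
    """Epanechnikov kernel with unit variance, supported on [-sqrt(5), sqrt(5)]."""
    return 0.75 / SQRT5 * (1.0 - u * u / 5.0) if abs(u) <= SQRT5 else 0.0


def gaussian(u):
    """Standard normal density, used only for the pilot estimate."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def pilot_density(sample, t, h_pilot):
    """Fixed-bandwidth Gaussian-kernel pilot estimate of the density at t."""
    return sum(gaussian((t - x) / h_pilot) for x in sample) / (len(sample) * h_pilot)


def bandwidth_factors(sample, h_pilot, sensitivity=0.5):
    """Local factors lambda_i = (pilot(X_i) / g) ** (-sensitivity).

    g is the geometric mean of the pilot values at the sample points, so
    sparse regions get factors above 1 and dense regions factors below 1.
    """
    pilot = [pilot_density(sample, x, h_pilot) for x in sample]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [(p / g) ** (-sensitivity) for p in pilot]


def adaptive_density(sample, t, h, factors):
    """Adaptive-kernel estimate at t: each point uses its own window h * lambda_i."""
    total = 0.0
    for x, lam in zip(sample, factors):
        width = h * lam
        total += epanechnikov((t - x) / width) / width
    return total / len(sample)


# Hypothetical RTs in ms: a tight cluster plus one slow, isolated response.
rts = [310.0, 320.0, 325.0, 330.0, 335.0, 340.0, 350.0, 600.0]
lams = bandwidth_factors(rts, h_pilot=20.0)
print([round(lam, 2) for lam in lams])
print(adaptive_density(rts, 330.0, 25.0, lams))
```

The isolated response receives the largest bandwidth factor, so the tail of the estimate is smoothed more heavily than the dense region near the mode. This is the property that makes the method suited to right-skewed RT densities.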