COGNITIVE PSYCHOLOGY 23, 393-419 (1991)

Priming Contour-Deleted Images: Evidence for Intermediate Representations in Visual Object Recognition

IRVING BIEDERMAN AND ERIC E. COOPER
University of Minnesota

The speed and accuracy of perceptual recognition of a briefly presented picture of an object is facilitated by its prior presentation. Picture priming tasks were used to assess whether the facilitation is a function of the repetition of: (a) the object's image features (viz., vertices and edges), (b) the object model (e.g., that it is a grand piano), or (c) a representation intermediate between (a) and (b) consisting of convex or singly concave components of the object, roughly corresponding to the object's parts. Subjects viewed pictures with half their contour removed by deleting either (a) every other image feature from each part, or (b) half the components. On a second (primed) block of trials, subjects saw: (a) the identical image that they viewed on the first block, (b) the complement, which had the missing contours, or (c) a same name-different exemplar of the object class (e.g., a grand piano when an upright piano had been shown on the first block). With deletion of features, speed and accuracy of naming identical and complementary images were equivalent, indicating that none of the priming could be attributed to the features actually present in the image. Performance with both types of image enjoyed an advantage over that with the different exemplars, establishing that the priming was visual, rather than verbal or conceptual. With deletion of the components, performance with identical images was much better than that with their complements. The latter were equivalent to the different exemplars, indicating that all the visual priming of an image of an object is through the activation of a representation of its components in specified relations. In terms of a recent neural net implementation of object recognition (Hummel & Biederman, in press), the results suggest that the locus of object priming may be at changes in the weight matrix for a geon assembly layer, where units have self-organized to represent combinations of convex or singly concave components (or geons) and their attributes (e.g., aspect ratio, orientation, and relations with other geons such as TOP-OF). The results of these experiments provide evidence for the psychological reality of intermediate representations in real-time visual object recognition. © 1991 Academic Press, Inc.

This research was supported by AFOSR Research Grant 88-0231 to I.B. and an NSF Graduate Fellowship to E.E.C. We thank S. W. Kohlmeyer for his programming support and Nancy Kanwisher, Stephen E. Palmer, and an anonymous reviewer for their helpful comments. Correspondence and reprint requests should be addressed to Irving Biederman, Department of Psychology, University of Minnesota, Elliott Hall, 75 East River Road, Minneapolis, MN 55455, or [email protected].

Although we can generally identify pictures of objects at a glance, the speed and accuracy of such perception is greatly facilitated by a prior viewing of that same picture (e.g., Bartram, 1974). What is the representation in memory that mediates this benefit of repetition?
The present investigation measured the contribution of: (a) image features, viz., lines and vertices, (b) convex or singly concave components roughly corresponding to the object's parts, and (c) object models, e.g., that the object was a grand piano, in such a repetition priming task. Surprisingly, all the effects of visual priming could be attributed to activation of the object's components and none to the features present in the image or the object model.

For objects that can be identified through their simple line drawings, a number of authors have argued for a representation that consists of an arrangement of simple parts (e.g., Palmer, 1975, 1977; Guzman, 1971; Marr & Nishihara, 1978; Brooks, 1981; Tversky & Hemenway, 1984; Biederman, 1987a). In Biederman's Recognition-by-Components (RBC) theory, for example, the parts are simple volumetric primitives, termed geons, that can be determined from a general viewpoint and are robust to occlusion and noise. An image of a complex object is decomposed into convex or singly concave regions, and the edges and vertices in each region activate the closest fitting geon. According to the theory, the events leading to recognition are: (a) the representations for the particular image features that are in view, viz., the edges and vertices, are activated; (b) these in turn activate the geons and their relations; and then (c) an arrangement of geons activates an object model, which is a representation of the complete object, large regions, or different views of it. There is little dispute that the first and last stages would be required for object recognition. Although advantages of a part-based representation have been argued on computational and perceptual grounds (Biederman, 1987a),¹ there has been no direct evidence for their role in real-time object recognition. Indeed, several recent proposals for object recognition, albeit by machine rather than human, posit direct activation of a representation of a potential object model from image features, without any reference to the decomposition of an image into components (Lowe, 1987; Ullman, 1989). This is achieved, top-down, by testing an object model initially suggested by a few image features.

¹ The computational advantage includes a capacity to represent a vast number of objects, including novel instances, with a modest number of intermediate primitives and relations. The perceptual advantage includes viewpoint invariance and robustness to noise on the basis of qualitative discriminations. Despite these theoretical advantages, evidence for the psychological reality of intermediate representations in visual pattern recognition has not been previously documented.

Our investigation as to whether object priming was mediated by image features, components, or models assumed that the shape descriptors (features or components) remained in their original relations. The actual issue under examination, therefore, was not whether features or components were responsible for the priming but rather whether features in their specific relations or the components in their specified relations mediated priming. The theoretical implications of this distinction are considered in the discussion.

We report two experiments which tested the role of simple convex or singly concave components in object priming.² In both experiments, subjects named a series of briefly presented, 50% contour-deleted pictures of objects in two blocks of trials.
The same names were appropriate for the pictures in both blocks. We assume that for a picture priming task, the priming can be regarded as comprising two components, one visual and the other nonvisual. The second block of trials of both experiments included same name-different exemplar instances of the classes in the first block of trials. We assume that the advantage in performance of these pictures over the first block of trials reflected nonvisual priming such as would be associated with the name, e.g., "piano," and the general concept (of a PIANO) of the class, as well as any nonspecific transfer, though the latter should not, strictly speaking, be regarded as priming. In both experiments, repetition of the identical image resulted in considerably more priming than that from same name-different exemplar trials. This difference, we assume, represented visual priming, and our interest was in the description of the representation that was accounting for the advantage.

Both experiments investigated the visual component of priming by comparing the magnitude of priming of images that were identical to those seen on the first block with their complements, which contained the remaining half of the contour. In Experiment I, the images were prepared by deleting every other edge and vertex from each component. Because of the redundancy of the features for component activation (Biederman, 1987a), the components (and the object) remained identifiable. Experiment I thus provided a test of whether the visual priming was based on a reinstatement of the original line segments and vertices or on the activation of a more global intermediate representation, such as a geon, which could be activated from either image. We found that the identical and complementary conditions were equivalent, indicating that none of the priming could be attributed to activation of the original vertices and edges that were present in the image.

² We use the term "components" to refer to simple convex or singly concave parts of an object rather than "geons" because the experiments did not provide a test of the particular components assumed by Biederman (1987a) (although we know of no other proposals for intermediate representations that might motivate the contour deletion operations in Experiment I or account for these results).

Experiment II provided a test of the degree to which the priming observed in Experiment I could be attributed to a specific high-level object model, for example, a grand piano, rather than to a representation of the components. This was done by comparing the priming of identical and complementary images where the members of a complementary pair each had approximately half of its (intact) components. Equivalence between identical and complementary images would indicate that all the visual priming observed in Experiment I could be attributed to activation of an object model and none to the components. Equivalence between the complementary condition and the same name-different exemplar condition would indicate that all of the visual priming in these experiments (taken together) could be attributed to activation of components and none to higher-level models.

EXPERIMENT I

The fundamental comparison in this experiment was between the magnitude of identical and complementary image priming when the complements each contained every other vertex and edge of each component.
Presumably, either member of a complementary pair would activate the same components, although some of the components might be ambiguous or unidentifiable in both members. If the representation responsible for priming is the actual features present in the image, it might be expected that the identical condition would show more priming than the complements. That such a result might be plausible is suggested by Jacoby, Baker, and Brooks (1989), who argued that visual priming was a function of the reinstatement of the conditions of original stimulation. Without a theory of "conditions" or a criterion of when conditions are sufficiently identical to qualify as a "reinstatement," it is difficult to evaluate such a claim rigorously. A strong form of it would be contrary to a theory of stimulus representation, such as RBC, that assumes that different images under different conditions can activate a common representation.

Method

Unless otherwise noted, the method employed in Experiment I was used for all the other experiments.

Subjects. The subjects were 64 native English speakers with normal or corrected-to-normal vision. They participated for payment or for research experience points for the Introductory Psychology course.

Stimuli. Pictures of 48 common objects, each with a readily available basic-level name, were drawn in Cricket Draw. The 48 objects were composed of 24 pairs of objects that had the same basic-level name but different part compositions, such as a grand piano and an upright piano. Three examples are shown in Fig. 1. For one class, ELEPHANT (see Fig. 3), the two versions were shown in different poses. An effort was made to have the two exemplars of a class appear as dissimilar as possible, subject to the constraint that both versions be readily identifiable. However, the readily identifiable criterion meant that, on average, there was generally greater similarity between the pairs of a class than between classes (a difference that would work against the conclusions of this experiment). (The within-class similarity effects were probably modest. In both experiments, the Different Exemplar conditions had RTs and error rates markedly higher than those of the Identical conditions.)

FIG. 1. Three of the object classes used in Experiment I on complementary feature priming. (Columns: Complementary Image 1; Complementary Image 2; Same Name, Different Exemplar.) Left and middle columns: Examples of contour-deleted complementary images. From each component of each image, alternate vertices and edges have been removed so that each component had 50% of the contour of the original. When superimposed, the members of a complementary pair would make an intact figure with no overlap in contour. Assuming that the image in the left column was originally shown on the first (priming) block, the figure in the middle column would be an instance of complementary priming and the right figure would be a different exemplar (same name) control. Additional examples of contour-deleted images are shown in Fig. 9 (Appendix C).

Two complementary versions of each of the 48 pictures were prepared by deleting every other edge and vertex from each part of the 48 object pictures, as shown in the left and middle columns of Fig. 1. When edges were deleted, a small portion of the edge was retained to define the adjacent vertex.
Because deleting the long sides of a large component seriously impaired the identifiability of that object at brief durations, long edges (equal to or greater than 1.65°) were considered as two separate edges. (For some objects, unless the long edges were split into two, it would have been impossible for each version to have 50% of the contour.) The detailed set of rules for creating these complementary images is described in Appendix A. Each version contained approximately 50% of the contour from each component. Each image thus contained half the contour from the original intact object. The two versions, when superimposed, formed an intact picture without any overlap in contour. The contour-deletion procedure typically allowed each component and, consequently, each of the objects (with a sufficiently long exposure duration) to be identified. When a component could not be completely identified in one image, e.g., a region with curved sides that might represent a component with a cross section that expanded and contracted or else a component with a curved axis, it was similarly ambiguous in the other image.

Each picture was shown on a high-resolution (1024 x 768) monitor (Mitsubishi Model HL6605) controlled by a Macintosh II. The maximum extent of each image could be contained in a square whose sides subtended a visual angle of 5.6°.

Procedure. The pictures were presented in two blocks of trials, a first priming block and a second primed block. Approximately 7 min intervened, on average, between priming and corresponding primed trials for a given class. On both blocks, the subject named each picture with its basic-level term, e.g., "piano," as it was shown. To decrease the likelihood that the subjects would use other names for the stimuli, prior to the presentation of the experimental stimuli, subjects read the names of the objects from their terminal. They were told (correctly) that these were the names of the objects that they were to see in the experiment. (In three experiments in our laboratory we have never found this aspect of the procedure, which is designed to reduce naming variability, so that subjects say "car" and not "auto" or "automobile," for example, to interact with any stimulus variable or even to reduce RTs.) On the first block the pictures were shown for 500 ms and on the second block they were shown for 200 ms. In both cases they were followed by a random-appearing arrangement of lines which served as a mask. The naming RTs were recorded through a Lafayette voice key. Reaction time and accuracy feedback were displayed after each trial. The subject pressed a mouse button to start each trial. A fixation dot would then be presented for 500 ms, followed by the object picture, which was, in turn, followed by a 500-ms mask.

Design and analysis. On the first block of trials, subjects viewed one member of each of the 24 basic-level pairs. On the second block, for each object viewed on the first block, subjects would see either the identical image that had been shown on the first block, its complement with the remaining edges and vertices, or a different exemplar with the same name. For each trial type, half the images were in an original orientation and the other half were in mirror-image reversed orientation. The sequences of images were balanced across subjects so that the mean serial position of every image in every condition was the same, with all members of a pair of objects and the complements of each member serving equally often as priming and primed stimuli.
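A rough sketch of this kind of counterbalancing follows. The exact rotation scheme is not specified in the text, so the assignment rule, data structures, and names below are only illustrative assumptions; the point is that the primed condition (and the exemplar and complement roles) rotate over object pairs across subject groups, and that the two subjects in a group receive a sequence and its reverse so that mean serial position is equated across conditions.

import random

CONDITIONS = ["identical", "complement", "different_exemplar"]

def assign_conditions(object_pairs, group_index):
    """Hypothetical rotation of each object pair through the three primed
    conditions, and through exemplar/complement roles, across subject groups."""
    assignments = []
    for i, pair in enumerate(object_pairs):
        condition = CONDITIONS[(i + group_index) % len(CONDITIONS)]
        prime_exemplar = (i + group_index) % 2        # which member of the pair primes
        prime_version = (i + group_index // 3) % 2    # which complementary version primes
        assignments.append({"pair": pair, "condition": condition,
                            "prime_exemplar": prime_exemplar,
                            "prime_version": prime_version})
    return assignments

def trial_orders(assignments, seed=0):
    """Two subjects per group: one receives a shuffled order, the other its
    reverse, so each trial's mean serial position over the pair is (N + 1) / 2."""
    order = list(assignments)
    random.Random(seed).shuffle(order)
    return order, list(reversed(order))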
An analysis of variance design for the data from the second block was constructed by defining one fixed factor of Condition (Identical, Complement, or Different Exemplar) and two random factors, Subjects and Groups. The Groups factor had two subjects nested within each of 32 groups. These two subjects saw exactly the same objects in the same conditions (one in forward order, the other in reverse order). Variance between Groups included variations in the difficulty of particular stimuli in particular conditions and thus served the goal of accounting for variance due to particular objects being in particular conditions.

Results and Discussion

The results are shown in Fig. 2. Mean correct RTs and error rates were sharply lower (by 135 ms and 11.5%) on the second trial block than the first [t(63) = 5.97 and 4.02, for RTs and error rates, respectively; both ps < .001]. The actual benefit in block 2 from the prior viewing in block 1 is underestimated in that the exposure durations in block 2 were only 200 ms, compared to the 500-ms exposure durations used for block 1.

FIG. 2. Mean correct naming reaction times (RTs) and error rates for Experiment I, by condition (1st Block, Identical, Complement, Diff. Exemplar); an annotation in the figure marks the nonvisual priming component. The second block data are for those trials where the object was correctly named on the first block. (Inclusion of those trials where the first block was in error did not alter the pattern of the results, although it did increase variability.) Asterisks on a second block bar indicate that the marked condition differed significantly from the other second block conditions (** = p < .01; *** = p < .001 by Least Significant Difference Test).

On the second trial block, naming RTs and error rates for the Different Exemplar condition were lower than the RTs and error rates on the first trial block, an effect that would represent nonvisual benefits of generalized practice from the first to the second block, name priming, and concept priming, e.g., that something is a piano. (Appendix B describes an experiment in which it is demonstrated that a significant component of the nonvisual priming was indeed name-concept priming, rather than general practice.) The three second-block conditions differed significantly: F(2,62) = 8.76, p < .001, for RTs, and F(2,62) = 18.92, p < .001, for errors. The advantage of the Identical and Complementary conditions over the Different Exemplar condition represented visual priming. These differences were significant by Least Significant Difference (LSD) Test (p < .01 for RTs and p < .001 for errors). (The experiment was sufficiently sensitive that a 40-ms difference in RTs and a 4.31% difference in error rates would have been significant at α = .05 by the LSD test for the data from the second block.)
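For reference, the pairwise LSD criterion invoked here has the standard Fisher form; this is a sketch only, since the error mean square and per-condition n of the second-block analysis are not reported in the text:

\mathrm{LSD} = t_{\alpha/2,\; df_{\mathrm{error}}} \, \sqrt{2\, MS_{\mathrm{error}} / n}

A difference between two condition means larger than this value is declared significant at the stated α; the 40-ms and 4.31% figures quoted above correspond to differences that would just exceed this criterion for the second-block RT and error data.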
Most important, the Identical and Complementary image conditions were equivalent, indicating that there was no priming of the specific edges and vertices that were present in the image. This result indicates that priming was of a representation more global than that specifying the image features: either of the components composing an object model or the specific object model itself, e.g., a grand piano. Possible alternative explanations for these results are considered under General Discussion.

Effects of reflection changes. Two other experiments with pictures similar to the intact versions of these stimuli (Biederman & Cooper, in press, a) and informal observations established that subjects were readily aware of when an object was presented in mirror-image reflection. Nonetheless, the mirror-reversed images in Experiment I enjoyed substantial priming. Indeed, reflection had no effect on naming RTs (but there was a modest effect on errors): Mean RTs (and error rates) for original orientation and mirror-reversed stimuli averaged 741 ms (8.3%) and 740 ms (13.4%), respectively, t(63) < 1.00 for RTs and 1.22 for errors, both ps > .20. It would seem, therefore, that objects are represented as an arrangement of components that are, to a large extent, mirror-image invariant.

This dissociation between factors that may affect recognition memory (for left-right orientation in the present case) and perceptual processing is a well-documented characteristic of repetition priming (Schacter, 1987; Jacoby & Dallas, 1981). However, the effects in the present experiment reverse the typical finding. Here a variable, mirror reflection, for which subjects have good recognition memory was shown not to influence priming.³

EXPERIMENT II

The results of the first experiment eliminated image vertices and edges as the basis for visual priming of objects. This leaves two of the three possibilities described in the Introduction. Either the priming was of the components or of the specific object model (or some combination of the two). With respect to model priming, the longer RTs and higher error rates for the Different Exemplar condition compared to the Identical and Complementary conditions in Experiment I eliminated the possibility that priming was of the general concept PIANO or name "piano." However, it could be the case that even though the same name was used for both exemplars of a class, the concept actually primed was more specific: that of a GRAND PIANO.

A striking subjective aspect of the complementary images in Experiment I is that they require scrutiny before their differences become apparent, particularly if they are viewed alternately rather than simultaneously, side by side. If their equivalence were a consequence of having a familiar specific object model for that class, one might expect that the subjective equivalence would not be manifested if the complementary images were created from an unfamiliar object, such as the one shown in Fig. 3. Informal surveys suggest, and the reader may verify, that these images also exhibit the same subjective equivalence as that experienced with the familiar objects in Fig. 1. Unfamiliar objects cannot be named, so the procedure used in Experiment I could not be run with stimuli of the type shown in Fig. 3 to test whether specific object models were underlying the equivalence of Identical and Complementary conditions in Experiment I.

Experiment II was designed to provide a controlled test of the possibility of specific model priming. In this experiment, complementary pairs of images of object pictures were produced by deleting approximately half the components in each member to produce the kind of images shown in Fig. 4. Note that either the original or the complement would activate the same object model (e.g., a grand piano). As in the previous experiment, different exemplar-same name controls were included, as illustrated in the right column.
³ We have found that the dissociation between the priming of naming RTs and episodic recognition memory extends to translation and size, in addition to reflection (Biederman & Cooper, in press, a, b). Although explicit old-new shape judgments were affected by changes in position or size, naming RTs were not.

FIG. 3. An example of alternate feature deletion for a complementary pair of images of a nonsense object. Subjectively, the appreciation of the differences between the two images, when viewed sequentially, requires scrutiny.

This experiment assessed whether the visual priming observed in Experiment I could be attributed to the priming of specific object models. There are three plausible outcomes of this experiment. At one extreme would be a result in which the amount of priming with complementary images would be equivalent to that with the identical versions, with both superior to the Different Exemplar condition. Such an outcome would indicate that all of the priming observed in Experiment I could be attributed to specific object models and none to the specific components that were present in the image. At the other extreme, the amount of priming of Complementary and Different Exemplar conditions would be equivalent, with both showing less priming than the Identical condition. This latter result would indicate that all the visual priming in Experiment I could be attributed to the explicit presentation of the components. An intermediate result, with the Complementary condition showing less priming than the Identical condition but more than the Different Exemplar condition, would indicate that both the components and the object model contributed to the visual priming.

Method

The method, procedure, and design were the same as those used in Experiment I except that: (a) 32 subjects participated in the experiment, and (b) 32 object pictures (versus 48 in Experiment I), composed of 16 pairs of same name-different exemplar members, were produced by deleting alternate components in the image to produce the kind of images shown in the left and middle columns of Fig. 4. Fewer objects were used in this experiment because we could only think of 12 classes with the required specifications: The objects had to have at least six parts and be of a basic-level class that had different shaped exemplars. To increase the number of classes we included four animals, elephant, bird, dog, and rabbit, where the different exemplars were generated by picturing different poses of the animal. As in Experiment I, the different exemplars within a class generally had greater similarity than those between classes.

FIG. 4. Three of the object classes used in Experiment II on complementary component priming. (Columns: Complementary Image 1; Complementary Image 2; Same Name, Different Exemplar.) Left and middle columns: Examples of component-deleted complementary images. Each image had approximately 50% of the contour of the original intact picture. The two versions when superimposed would make an intact figure with no overlap of contour. Assuming that the image in the left column was originally shown on the first (priming) block, the figure in the middle column would be an instance of complementary priming and the right figure would be a different exemplar (same name) control.

The data from the animal classes were not noticeably distinguishable from those with manufactured objects.

Results and Discussion

The results are shown in Fig. 5.
Considerable priming was apparent in that overall mean correct RTs and error rates were sharply lower (by 124 ms and 8.8%) on the second trial block than the first [t(31) = 5.72 and 2.91, for RTs and error rates, ps < .001 and < .01, respectively]. The overall difference among the three block 2 conditions was significant: F(2,30) = 5.13, p < .02, for RTs and F(2,30) = 6.75, p < .01, for errors. Figure 5 shows that the Identical condition enjoyed an advantage over both the Complementary and Different Exemplar conditions (p < .05 for RTs and .01 for errors by LSD Test), which were indistinguishable from each other. (The experiment was sufficiently sensitive that a 55-ms difference in RTs and an 8.24% difference in error rates would have been significant at α = .05 by the LSD Test for the data from the second block.)

FIG. 5. RTs and error rates for Experiment II, by condition (1st Block, Identical, Complement, Diff. Exemplar). The second block data are for those trials where the object was correctly named on the first block. (Inclusion of those trials where the first block was in error did not alter the pattern of the results, although it did increase variability.) Asterisks on a second block bar indicate that the marked condition differs significantly from the other second block conditions (* = p < .05; ** = p < .01 by Least Significant Difference Test).

The equivalence of the Complementary and the Different Exemplar conditions indicated that there was virtually no priming across complements. That is, only the explicit presentation of the convex components resulted in any priming. The equivalence of the Complementary and Different Exemplar conditions shows that none of the priming could be attributed to activation of the specific object model.⁴

As noted previously, the similarity of the different exemplars within an object class was generally greater than that between classes. Some of the advantage of the Different Exemplar condition compared to block 1 performance could, therefore, represent visual priming. Would this fraction (if it existed) be a consequence of specific model priming? Given that no specific model priming was evidenced in the equivalence of Complementary and Different Exemplar conditions in Experiment II, it would seem implausible that some of the advantage of the Different Exemplar condition over block 1 performance would be a consequence of specific model priming.

GENERAL DISCUSSION

Experiment I, in which alternate vertices and edges were deleted from each component, showed that complementary and identical primes resulted in equivalent visual facilitation. Experiment II, in which approximately half the components were deleted from each image, showed that complements produced no visual priming. The results of the two experiments, taken together, indicate that all the visual priming in object naming can be attributed to the explicit (actual) presentation of the components in the image. None of the priming can be attributed to explicit image features, the vertices and edges that were present in the image, or the specific object model.

⁴ This experiment was run to determine if there was visual priming from the specific object models. Is it plausible that with these pictures of partial objects, subjects merely thought of them as collections of parts, say of a grand piano, without initially activating the concept of GRAND PIANO?
We think not, in that the naming RTs and error rates in this experiment were approximately what they were in Experiment I. To the extent that the speed and accuracy of saying "piano" to either a feature- or component-deleted image of a piano is a measure of the activation of a model for that object, the two procedures were roughly equivalent in their activation of object models. Both the deletion of components and the deletion of features result in longer RTs and higher error rates at brief exposures compared to those of intact objects. At brief exposure durations, identification speed and accuracy for component-deleted objects are superior to those for midsegment feature-deleted objects (Biederman, 1987a). But at longer durations this effect reverses and feature-deleted objects enjoy an advantage. These effects can be modeled as a two-stage cascade reflecting first the activation of the components and then the activation of a representation of the object model (Biederman, 1987b, 1988).

Possible Confounding Effects

Acceptance of the conclusions stated in the previous paragraph requires evaluation of possible confounding factors produced by these novel stimulus manipulations. Three are considered: (a) low-level filling-in of contour in Experiment I to produce the equivalence in identical and complementary priming, (b) symmetry of small or thin components to also facilitate completion of components in Experiment I, and (c) greater effects on the global shape (GS) from the component deletion of the images in Experiment II, compared to the feature deletion of those in Experiment I, that might have reduced the amount of complementary priming observed in Experiment II. Before considering these factors in detail, it should be noted that they would have to be completely effective in accounting for the pattern of results. In Experiment I, the low-level filling-in and the use of symmetry of small or thin components would have to account not just for some advantage of Complementary over Different Exemplar conditions, but for the equivalence of Identical and Complementary conditions also. The global shape factor for Experiment II would have to account not just for the advantage of the Identical condition over the Complementary condition, but for the equivalence of the Complementary and Different Exemplar conditions also. Such an account would also require that the sensitivity to global shape in Experiment II, which presumably eliminated all complementary priming, was not manifested in Experiment I, where there were slight differences in global shape.

Low-level filling-in. In Experiment I, is it plausible that (all) the deleted contour controlling speeded recognition could have been restored through local processes of smooth continuation? Deferring for the moment the possible restoration of segments, it is important to note that half the vertices in each image were also deleted. No low-level process for contour restoration that has been posited can restore vertices. We can appreciate the problem if we consider two converging but truncated edges, as shown in Fig. 6a. The edges might be extended to form an L vertex, as shown in Fig. 6b, two types of T vertices, as shown in Figs. 6c and 6d, depending on whether the left or the right edge formed the stem of the T, or two L vertices, as shown in Fig. 6e. Given that the vertices could not be completed, is it likely that the segments were restored?
FIG. 6. Four possible completions of an unconnected pair of segments. (a) Original segments. (b) Completion as an L vertex. (c) Completion as one T vertex. (d) Completion as another T vertex. (e) Completion as two L vertices.

A segment restoration process would be likely if the contour deletion had produced small gaps in the middle of segments. In the extreme, if every other pixel had been deleted, no difference would have been expected between identical and complementary images (though theories that posit exact image restoration might predict some effect). But the contour deletions for the stimuli in this experiment were large relative to the extents that might plausibly be bridged according to current theories of smooth continuation.

Two types of local segment continuation processes have been described by theorists, short range and long range. It would be unlikely for the deleted segments in the images of Experiment I to have been restored through a local, short-range process of smooth continuation, such as those described by Zucker and Davis (1988) (and Zucker, Dobbins, & Iverson, 1989) or Grossberg and Mingolla (1985). Zucker and Davis' (1988) routine, for example, would not bridge gaps that are more than five times the width of the line. Grossberg and Mingolla's (1985) theory would predict that for a constant amount of deletion, a distribution of the deletion among several small gaps should be less disruptive than concentration of the deletion in a single large gap. But Blickle (1989), using contour-deleted object pictures similar to those used in Experiment I, has shown that for images with, for example, 60% midsegment contour deletion, large gaps with 60% of the contour deleted have the same effect on recognition speed and accuracy as two gaps each with 30% deletion or three gaps each with 20% deletion. This equivalence broke down at very high deletion proportions, as described in the next paragraph.

Would a local long-range mechanism for smooth continuation restore these images? Explanations of local restoration have often been proposed to handle illusory contours (Kanizsa, 1979). However, there is good reason to believe that the images in Experiment I were not restored in that illusory contours are not seen when one views the stimuli. The most developed scheme for long-range grouping is Ullman and Sha'ashua's (1988) saliency implementation. Neither their proposal nor any of the others would restore vertices or whole segments. Evidence that the deletion of long segments is not (readily) restored derives from Blickle (1989), who showed that when 90% of a single long segment of a component was deleted, the disruption on object naming RTs and error rates was more severe than when 45% was removed from each of two different segments. That deletion of vertices and complete segments (as done in Experiment I) does result in more interference on recognition than that same amount of contour removed from the middle of each segment (which could be more readily restored through local processes) is shown in an experiment described in Appendix C. The midsegment kind of deletion would be restorable through a long-range mechanism of local smooth continuation such as that proposed by Ullman and Sha'ashua (1988).
It is, of course, impossible to completely rule out a local continuation theory when continuation is regarded as being "able to fill in figures in ways not yet imagined by even the cleverest of modern theorists," as conjectured by a reviewer of this work. The reader is invited, however, using local proximity, smoothness, and symmetry constraints, to restore the Experiment I stimuli shown in Figs. 1 and 9. It is important to note, in evaluating the possible role of contour completion, that illusory contours were not seen in these images. Our position is that "filling-in" need not have occurred. Instead, components could have been activated from the image information. Although these components can specify the deleted image features, we have no evidence that the missing features were actually generated, as either illusory or amodal contours (Kanizsa, 1979), when viewing the image in order to recognize it.

Symmetry of small or thin components. Many of the objects had small or thin components such that the deletion of one side of the component could readily "suggest" the other side if not the whole component itself. For example, in Fig. 1, the deletion of half the support strut of the grand piano top might be equivalent to the other side or the complete component. Put another way, deleting 50% of the contour from such a component might not have produced any effect on the recognition of the whole object. But the recognition performance of the images from Experiment I was dramatically lower than that for their intact versions. In other experiments (e.g., Biederman, 1987a; Biederman & Ju, 1988) with a 100-ms presentation duration, naming RTs of intact line drawings averaged approximately 700 ms with virtually no errors. In Experiment I, with a 500-ms exposure duration (five times the duration of the intact versions in the prior experiments), block 1 naming RTs averaged 882 ms with an error rate of 22.5%. If there was only a small (or completely absent) effect of contour deletion on symmetrical small or thin components, then recognition performance was being controlled by the larger components, and thus the symmetry of the small or thin components cannot account for the results of Experiment I.

Global shape. The deletion of components in Experiment II resulted in larger changes in GS than the deletion of the edges and vertices in Experiment I.⁵ Although GS could obviously be used when discriminating among a small set of items, e.g., a pencil and desk, we know of no evidence documenting GS as an independent factor in object recognition with a large and potentially unspecified set of images. Two studies from our laboratory suggest that, in fact, there is no effect of variations in GS on object recognition. In an experiment on the priming of object images rotated in depth, Biederman and Gerhardstein (1990, described in Biederman, Hilton, & Hummel, 1991) found no effect of rotation in depth to 135° (up to occlusion of parts), despite quite dramatic effects on GS. (See also Ellis & Allport, 1986.) In another experiment, Biederman (1987b) studied the effects on naming RTs of adding an inappropriate fourth component to a partial object with only three of its components present. (At least six components were required for the object to appear complete.) The addition of the component altered GS but no interference effect was found. The fourth component was large enough to affect recognition speed in that adding it as an appropriate component facilitated naming RTs.
In Experiment I, there also was no effect of mirror-image reflection. If GS is to be interpreted to include left versus right orientation, then the absence of an effect of orientation also documents the lack of an effect of GS.

⁵ Actually, it is easy to underestimate the effect of the deletions of edges and vertices on global shape in Experiment I and to overestimate it in the component-deletion procedure in Experiment II. For example, the complementary versions of the flashlight in the left and middle columns of Fig. 1 have somewhat different global shapes if the convex hull is drawn around the two versions. In some cases of component deletion, as with the upright piano in the right column in Fig. 4, the effects on global shape were quite modest. The subjective impression of global shape similarity for the stimuli in Experiment I may be due to the activation of the same components when one is looking at the two images. We have noted a similar subjective insensitivity to large differences in global shape when looking at objects that have been rotated in depth.

We assessed the effect of GS in Experiment II by calculating the aspect ratio of the smallest horizontally and vertically extended rectangle that could enclose the complementary images from that experiment. Figure 7 shows two pairs of component-deleted complements: the pair which had the largest difference in aspect ratio and the pair with the smallest difference in aspect ratio.

FIG. 7. Two complementary pairs of images from Experiment II showing the object (the wagon) whose complements differed most in aspect ratio and the object (the helicopter) whose complements differed least in aspect ratio.

If differences in aspect ratio had an effect on priming, then there should have been a positive correlation between the differences in aspect ratio of the complements and the magnitude of the difference between RTs for identical and complementary priming for each object. The value of this correlation was (nonsignificantly) in the opposite direction to what would be expected from an aspect ratio effect [r = -.259, t(30) = -1.47, p > .10, ns]. Although the differences in global shape of a complementary pair in Experiment I were small, differences were present nonetheless.
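The aspect-ratio analysis just described can be sketched as follows. The contour coordinates and per-object RT differences are not reproduced here, so the data structures and names are hypothetical, and the enclosing rectangle is approximated by the axis-aligned bounding box of an image's contour points.

from statistics import correlation  # Pearson's r; Python 3.10+

def aspect_ratio(points):
    """points: iterable of (x, y) contour coordinates for one contour-deleted image."""
    xs, ys = zip(*points)
    return (max(xs) - min(xs)) / (max(ys) - min(ys))

def aspect_ratio_difference(complement_a, complement_b):
    """Difference in aspect ratio between the two complements of one object."""
    return abs(aspect_ratio(complement_a) - aspect_ratio(complement_b))

def aspect_ratio_effect(ar_differences, rt_identical_advantage):
    """Correlate, over objects, the complements' aspect-ratio difference with the
    RT advantage of the Identical over the Complementary condition. A global-shape
    account predicts a positive correlation."""
    return correlation(ar_differences, rt_identical_advantage)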
In summary, none of the three possible confounding factors (low-level filling-in, symmetry of small and thin components, or GS) would appear to have any but a negligible effect on the pattern of results from these experiments.

What Was Primed?

As described in the Introduction, the comparison of whether priming was mediated by features or components in this investigation fixed the relations among the features or components. Consider a gedanken priming experiment in which the features or the components of an object prime were scrambled. We would expect little priming, if any, under such conditions (ignoring, for the moment, the finding in Experiment I that there was no priming of the features even when the relations among the features were intact). We would also expect little or no priming if a single feature or component was presented. What, then, are the theoretical implications of these considerations? We interpret the outcomes of Experiments I and II to indicate that what mediates priming is a component in a specified relation.⁶ In a neural net implementation of RBC, Hummel and Biederman (in press) have posited such a stage (layer) in their modeling of object recognition.

The first layer of the model consists of units that respond, like oriented and end-stopped cells, to local regions of the image. As output, a unit is activated that represents the components (geons in the model) of the object in their specified relations, independent of where the image is presented in the visual field of the model, its size, or its orientation in depth (up to occlusion of parts). The oriented edges activate units representing vertices, which, in turn, activate units for the geons themselves. A fundamental assumption of the implementation (and RBC) is that separate units are activated for geon identity (such as brick or cylinder) and geon relations (such as below or above). Separate representations avoid the forbidding combinatorics of having detectors for all possible combinations of geons and relations. In the neural net implementation, for a given object, a relation can be bound to a geon by synchrony of firing. For example, an object consisting of a cylinder on a brick might have the units representing brick firing simultaneously with the units for below, and the units for cylinder firing simultaneously with the units for above. (Still other units would code aspect ratio, orientation, relative size, etc. These units, too, would fire in synchrony with units for the geon whose attributes they represented.) These units serve as inputs to a geon feature assembly (GFA) layer. GFA cells self-organize to bind together, in a single unit or units, a given geon and its relations. Different units at this layer code different geons with their particular attributes. Objects are represented at the next layer by units that self-organize to respond to groups of units in the GFA layer.

The evidence from the actual and gedanken experiments reported here supports a locus of priming, in terms of this neural net model, at the geon feature assembly layer. There may be quite general conclusions to draw from this result. Consider recognition theories, such as those for speech, word, or object perception, that posit initial sensory encoding mechanisms, a modest number of primitives, such as phonemes, letters, or components, and then a capacity to represent combinations of the primitives to yield classification into hundreds of thousands of possibilities.

⁶ An analogy with visual word priming would be that little visual priming would be expected if only a single letter or a prime consisting of the letters of the target but in scrambled order was presented. This would not necessarily mean that letters were not relevant for the representation of words, but rather that it is the letter in a given position (relation) that is critical.

Although there may be critical developmental stages, as has been demonstrated with phoneme discrimination, in general it may be advantageous not to have sensory and primitive stages biased by recent activation, as those detectors need to be employed repeatedly and unpredictably for novel arrangements. Longer-term memory effects (as in priming experiments) would be best confined to those stages that are confronted with the combinatorial explosion of possibilities reflecting new classes that need to be formed. Indeed, in the neural net implementation, these priming effects would be modeled as changes in the weight matrix for self-organization at the geon feature assembly stage, rather than in the residual activation of the feature assembly units themselves. For many of the same reasons that biasing based on prior activation should be avoided at earlier stages, the former alternative, viz., changes in the weight matrix, would appear to be the preferred alternative. That is, we should be able to classify a novel image more quickly if that image (or its feature complement) is presented again because we now have built up a class for that image, but we do not want to bias what components and relations we see as a result of that prior experience.
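As a toy illustration of that distinction, and not of the Hummel and Biederman implementation itself, the sketch below treats a geon-feature-assembly-like unit as a weight vector over geon-plus-relation attribute units and models priming as a change in those weights rather than as residual activation; all names, sizes, and the learning rule are our own assumptions.

import numpy as np

class GeonAssemblyLayer:
    """Toy layer of assembly units, each a weight vector over attribute units
    (geon identity, aspect ratio, orientation, relations such as TOP-OF)."""

    def __init__(self, n_attributes, n_assemblies, learning_rate=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=(n_assemblies, n_attributes))
        self.lr = learning_rate

    def respond(self, attribute_pattern):
        """Return the best-matching assembly unit and its match strength for a
        binary pattern of active attribute units."""
        pattern = np.asarray(attribute_pattern, dtype=float)
        scores = self.weights @ pattern
        best = int(np.argmax(scores))
        return best, float(scores[best])

    def prime(self, attribute_pattern):
        """Model priming as a Hebbian-style move of the winning unit's weights
        toward the pattern it just matched, so the same components-in-relations
        are matched more strongly on a later (primed) trial; no residual
        activation is stored anywhere."""
        pattern = np.asarray(attribute_pattern, dtype=float)
        best, _ = self.respond(pattern)
        self.weights[best] += self.lr * (pattern - self.weights[best])
        return best

On this sketch, either member of a feature-deleted complementary pair would drive the same attribute pattern (the same geons in the same relations) and so would benefit from the same weight change, whereas a component-deleted complement drives a different pattern and would not.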
Relation to Other Theories

The results from these experiments are contrary to two recent theories of recognition based on model-based matching (Lowe, 1987; Ullman, 1989). Although these proposals were not necessarily designed to model human recognition, we can regard them as specific instantiations of model-based or top-down matching schemes. In Lowe's SCERPO model, an initial detection of long segments that are unlikely to be accidents of viewpoint is used to suggest a specific orientation of an exact model in three space. Additional image features are then tested against this model. No role is ascribed to middle-level representations, such as components. A serious problem for SCERPO, given the equivalence between identical and complementary priming observed in Experiment I, is how it would be able to infer missing features in the absence of an exact model for these particular pictures. If exact models were, somehow, divined for SCERPO to handle the results of Experiment I, it would then predict, incorrectly, equivalence between identical and complementary component image conditions in Experiment II. The same difficulties are encountered by Ullman's Feature Alignment Model, in which a few features from a wider range of possible features are used to suggest the orientation of an object. If a model of an object can be so successfully activated as to account for the equivalence of the identical and complementary feature conditions in Experiment I, it would also predict priming across component complements in Experiment II. Both the Lowe and Ullman models achieve recognition by matching an image against a specific orientation of an object. However, the equivalence in performance of original and mirror-image oriented images in Experiment I is contrary to that expectation. It would seem, therefore, that objects are represented as an arrangement of components that are mirror-image invariant.

The equivalence of identical and complementary primes in Experiment I is inconsistent with object representations that specify the features at the joins of components, as with the "winged features" described by Chen and Stockman (1989). The contour deletion extensively disrupted such features.

It should be noted that the absence of evidence for either feature priming (Experiment I) or specific model-based priming (Experiment II) is not a prediction derived from RBC. It would be an independent assumption whether the activation of representations of features or models would persist to contribute to visual priming. Conceivably all three possibilities (features, components, and models) could contribute to priming. However, evidence that a particular representation affected priming, components in the present case, can be taken as evidence for the existence of that representation.
There is an asymmetry in the inference, however: The finding that a class of representations did not affect priming, image features and object models in these tasks, does not mean that those representations do not exist.⁷

⁷ With the present designs, if repetition of features and object models had affected priming, the effects would probably have obscured the evidence for component priming. It should be recalled that feature priming would have been evidenced by an advantage of the identical over the complementary feature conditions in Experiment I, and object model priming would have been evidenced by an advantage of the complementary component condition in Experiment II over the same name-different exemplar condition. Barring additional experimental research, a quantitative analysis would then have been required to determine if the advantage of the identical condition over the complementary condition in Experiment II could not completely be accounted for by feature priming.

CONCLUSION

As described in Biederman (1987a), the image features in the geons (which correspond to what have been called components in the present investigation) from which objects are modeled are redundant in that a volume can be largely occluded (or have its contours deleted as in Experiment I), and the remaining features will often be sufficient to activate the geon, as they presumably did in Experiment I. The results of Experiment I have some plausibility, given a system that might have to recognize objects when they are partially occluded by small, textured surfaces, such as light foliage, or viewed at a slightly different orientation in depth. Recognition of noisy images should not be dependent on reinstatement of the identical configuration of the noise.

The results of these experiments are compatible with a model that posits intermediate shape components in specified relations that mediate object recognition. The benefits of a prior exposure of a picture might be in the development of units that assemble representations of the components, their attributes, and relations to other components.

APPENDIX A

This appendix consists of the instructions for creating the complementary images used in Experiment I, using Cricket Draw on a Macintosh II.

1. Adjust the size of the object so that the diagonal length is between 5.5 and 6.5 in. The line width should be 2.

2. Using the polygon option in Cricket Draw, place polygons over the vertices of the object according to the following scheme (the polygons, which are the same color as the background, are used to cover or reveal the contour in step 4 by placing them on or beneath the contour, respectively):

a. Leave 1/4 in. of each line forming the vertex.

b. If less than 1/4 in. of any edge is left after parsing the vertices, then both vertices of that edge should be included under one polygon.

c. If between 1/4 and 1/2 in. of any edge is left after parsing the vertices, then each vertex should be extended to include half of that edge.

3. After placing polygons over the vertices, polygons should be placed over the edges of the object according to the following scheme:

a. Edges between 0.5 and 1.5 in. in length should be covered by one polygon. After placing polygons over the vertices, no edge shorter than 0.5 in. should remain.

b. Edges greater than 1.5 in. should be divided in half, with two polygons placed over them.

4. The next step is to arrange the polygons to create two complementary images.
To expose contour, make the borders of the polygons white and send them to the back. To hide the contour, simply make the borders of the polygons white.

5. When deleting and exposing contour:

a. As closely as possible, equal amounts of contour should be present in each component in each image.

b. As closely as possible, equal amounts of contour should be present in each image as a whole.

c. Try to divide between each image, as evenly as possible, the edges at each vertex. For example, at a two-segment vertex, give one segment to one of the images and one to the other. At a three-segment vertex, give two edges to one and one to the other.
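The rules above were carried out by hand in Cricket Draw. As a simplified sketch of the underlying split (our own, with hypothetical names; the vertex-stub and polygon-placement details of steps 2-4 are ignored), the core operation is to alternate contour pieces between the two images within each component, halving edges longer than the rule 3b threshold, so that the two images share no contour and each carries about half of every component's contour.

import math

def edge_length(edge):
    (x1, y1), (x2, y2) = edge
    return math.hypot(x2 - x1, y2 - y1)

def split_edge(edge):
    """Divide a long edge at its midpoint so the two halves can be assigned to
    different complementary images (cf. rule 3b)."""
    (x1, y1), (x2, y2) = edge
    mid = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    return [((x1, y1), mid), (mid, (x2, y2))]

def complementary_images(components, long_edge=1.5):
    """components: list of components, each a list of edges ((x1, y1), (x2, y2))
    in drawing units (inches here). Returns two lists of edges that overlap in
    no contour and each contain roughly half of every component's contour."""
    image_a, image_b = [], []
    for component in components:
        pieces = []
        for edge in component:
            pieces.extend(split_edge(edge) if edge_length(edge) > long_edge else [edge])
        for i, piece in enumerate(pieces):  # alternate within each component
            (image_a if i % 2 == 0 else image_b).append(piece)
    return image_a, image_b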
APPENDIX C

This experiment compared the effects on recognition performance of deleting complete edges and vertices, as in Experiment I, with midsegment deletion. With the latter but not the former, edge restoration could be achieved through a local process, without resort to middle (e.g., component) or top (e.g., piano) representations.

Method

Sixteen subjects viewed 48 pictures of the objects. Half the pictures presented to each subject were the experimental stimuli, 3 of which are shown in the right column of Fig. 9. The other half had midsegment deletion, as shown in the left column of Fig. 9. Equal amounts (50%) of contour were removed from both types of images. The deletion type for the different objects was balanced across subjects. Each picture was shown for 200 ms and followed by a mask.

FIG. 9. Examples of the stimuli used in the experiment described in Appendix C. (Left column) Contour-deleted images with the contour removed from the middle of a segment. (Right column) Contour-deleted images from Experiment I produced by deleting every other edge and vertex. Equal amounts, 50% of the contour of the original, have been removed from both types of images.

Results

For the stimuli used in Experiment I (vertex and segment deletion), naming RTs averaged 947 ms with an error rate of 27.3%. Both the RTs and the error rates were greater than those for the midsegment-deleted stimuli, where mean correct RTs were 891 ms with an error rate of 16.7% [t(15) = 2.08, .05 < p < .10, for RTs, and t(15) = 4.96, p < .001, for errors]. Deleting vertices and complete segments thus rendered object images less identifiable than removing the same amount of contour at midsegment.
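The two deletion styles can be made concrete with a small sketch. The Python functions below are a simplification, not the actual stimulus-construction procedure: a contour is treated as a list of straight segments, midsegment deletion removes the middle half of every segment (leaving the vertices intact), and, as a crude stand-in for the Experiment I stimuli, vertex-and-segment deletion drops alternate segments together with their endpoints. Both remove 50% of the total contour length.

    # Simplified illustration of the two deletion styles in Fig. 9.
    # A segment is a pair of endpoints ((x1, y1), (x2, y2)).

    def midsegment_deletion(segments):
        """Keep the first and last quarter of each segment; drop the middle half."""
        kept = []
        for (x1, y1), (x2, y2) in segments:
            qx, qy = (x2 - x1) / 4.0, (y2 - y1) / 4.0
            kept.append(((x1, y1), (x1 + qx, y1 + qy)))   # quarter nearest one vertex
            kept.append(((x2 - qx, y2 - qy), (x2, y2)))   # quarter nearest the other
        return kept

    def vertex_and_segment_deletion(segments):
        """Drop every other segment entirely, vertices included (a simplification)."""
        return [seg for i, seg in enumerate(segments) if i % 2 == 0]

    square = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]
    print(len(midsegment_deletion(square)), "pieces after midsegment deletion")
    print(len(vertex_and_segment_deletion(square)), "segments after vertex/segment deletion")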
REFERENCES

Bartram, D. (1974). The role of visual and semantic codes in object naming. Cognitive Psychology, 6, 325-356.

Biederman, I. (1987a). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115-147.

Biederman, I. (1987b). Matching image edges to object memory. In Proceedings of the First International Conference on Computer Vision (pp. 386-392). New York: IEEE Computer Society.

Biederman, I. (1988). Aspects and extensions of a theory of human image understanding. In Z. Pylyshyn (Ed.), Computational processes in human vision (pp. 370-428). New York: Ablex.

Biederman, I., & Cooper, E. E. (in press, a). Evidence for complete translational and reflectional invariance in visual object priming. Perception.

Biederman, I., & Cooper, E. E. (in press, b). Scale invariance in visual object priming. Journal of Experimental Psychology: Human Perception and Performance.

Biederman, I., Hilton, H. J., & Hummel, J. E. (1991). Pattern goodness and pattern recognition. In J. R. Pomerantz & G. R. Lockhead (Eds.), The perception of structure. Washington, DC: American Psychological Association.

Biederman, I., & Ju, G. (1988). Surface vs. edge-based determinants of visual recognition. Cognitive Psychology, 20, 38-64.

Blickle, T. W. (1989). Recognition of contour deleted images. Unpublished doctoral dissertation, State University of New York at Buffalo.

Brooks, R. A. (1981). Symbolic reasoning among 3-D models and 2-D images. Artificial Intelligence, 17, 205-244.

Chen, S.-W., & Stockman, G. (1989). Object wings: 2 1/2-D primitives for 3-D recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 535-540).

Ellis, R., & Allport, D. A. (1986). Multiple levels of representation for visual objects: A behavioural study. In A. G. Cohen & J. R. Thomas (Eds.), Artificial intelligence and its applications (pp. 245-257). New York: Wiley.

Grossberg, S., & Mingolla, E. (1985). Neural dynamics of perceptual grouping: Textures, boundaries, and emergent segmentations. Perception & Psychophysics, 38, 141-171.

Guzman, A. (1971). Analysis of curved line drawings using context and global information. In Machine Intelligence (Vol. 6). Edinburgh: Edinburgh University Press.

Hummel, J. E., & Biederman, I. (in press). Dynamic binding in a neural network for shape recognition. Psychological Review.

Jacoby, L. L., Baker, J. G., & Brooks, L. R. (1989). Episodic effects on picture identification: Implications for theories of concept learning and theories of memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 275-281.

Jacoby, L. L., & Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-341.

Kanizsa, G. (1979). Organization in vision: Essays on Gestalt perception. New York: Praeger.

Lowe, D. G. (1987). The viewpoint consistency constraint. International Journal of Computer Vision, 1, 57-72.

Marr, D., & Nishihara, H. K. (1978). Representation and recognition of three-dimensional shapes. Proceedings of the Royal Society of London, Series B, 200, 269-294.

Palmer, S. E. (1975). Visual perception and world knowledge: Notes on a model of sensory-cognitive interaction. In D. A. Norman & D. E. Rumelhart (Eds.), Explorations in cognition (pp. 279-307). San Francisco: Freeman.

Palmer, S. E. (1977). Hierarchical structure in perceptual representation. Cognitive Psychology, 9, 441-474.

Schacter, D. L. (1987). Implicit memory: History and current status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 501-518.

Tversky, B., & Hemenway, K. (1984). Objects, parts, and categories. Journal of Experimental Psychology: General, 113, 169-193.

Ullman, S. (1989). Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193-254.

Ullman, S., & Sha'ashua, A. (1988). Structural saliency: The detection of globally salient structures using a locally connected network (Tech. Rep. No. 1061). Cambridge, MA: MIT Artificial Intelligence Laboratory.

Zucker, S. W., & Davis, S. (1988). Points and endpoints: A size/spacing constraint for dot grouping. Perception, 17, 229-247.

Zucker, S. W., Dobbins, A., & Iverson, L. (1989). Two stages of curve detection suggest two styles of visual computation. Neural Computation, 1, 68-81.

(Accepted March 19, 1990)