Manuscript under review. Please do not cite or quote without permission.

One Plus One Equals One: The Effects of Merging on Object Files

Stephen R. Mitroff¹, Brian J. Scholl², & Karen Wynn²

¹ Center for Cognitive Neuroscience and Department of Psychological and Brain Sciences, Duke University. ² Department of Psychology, Yale University.

A critical task in visual processing is keeping track of objects as the same persisting individuals over time. The operations involved in such processing can be assessed in terms of the effects of various manipulations on mid-level object-file representations. Here we study what has been claimed to be the most important principle of object persistence: objects must maintain a single unified boundary over time (the ‘cohesion principle’). We do so by measuring ‘object-specific preview benefits’ (OSPBs), wherein a ‘preview’ of information on a specific object speeds the recognition of that information at a later point when it appears again on the same object. When two objects smoothly merged into one, the underlying object-file representations were dramatically affected: the information from only one of the initial objects survived this cohesion violation to produce an OSPB (whereas OSPBs from both original objects remain robust in similar control displays without cohesion violations). These results demonstrate the power of the cohesion principle in the maintenance of mid-level visual representations, and demonstrate that a single object file cannot store information from more than one object.

Introduction

We live in a visual world of constant flux, and a critical task of visual processing is thus not only to segment parts of the incoming input into discrete objects, but also to keep track of objects as the same, persisting individuals over time. Explorations of visual object persistence often take the notion of object files as their starting point.
Object files (OFs) are episodic mid-level visual representations that track objects through spatiotemporal changes and store (and update) information about those objects’ properties (e.g., Kahneman, Treisman, & Gibbs, 1992). OFs are often intuitively characterized as ‘file folders’ in which an object’s information can be stored, and which are linked to physical objects in the world through ‘sticky’ pointers that track objects as they move. In this framework, the ‘folder’ represents the object per se, while all of an object’s properties are stored as entries in the folder. Because of these characteristics, OFs serve as a critical intermediate level of visual processing: an OF can survive, representing an object as the same enduring individual, even when its visual features are changing (“It’s red … now it’s blue”), and even when its recognized type is changing (“It’s a bird … no wait, it’s a plane”). In these examples, an object file is what represents the object as the same “it” in each case.

For helpful conversation and/or comments on earlier drafts, we thank Erik Cheries, Nic Noles, Pamela Yee, and the members of the Scholl, Chun, & Wynn labs at Yale University. We also thank Melody Lu for assistance with data collection. SRM was supported by NIMH #F32-MH66553-01. BJS & KW were supported by NSF #BCS-0132444. For reprints and correspondence contact Stephen R. Mitroff at Center for Cognitive Neuroscience, Duke University, Box 90999, Durham, NC 27708, [email protected].

Some features of OFs can be directly studied via the object-reviewing paradigm (Kahneman et al., 1992): observers view an initial ‘preview’ display that contains two or more objects, and a different letter is placed in each (see Figure 1). The letters then disappear, and all of the objects move to new locations. After this motion, a single ‘probe’ letter appears in one of the objects, and the observers must simply name the probe letter as quickly as possible.
When the probe happens to match one of the initial letters, responses are speeded, in a type of display-wide priming. In addition, however, observers are faster still to name the probe letter when it happens to match the letter initially presented on that same object, an effect which is termed an object-specific preview benefit (OSPB). More recently, a modified object-reviewing paradigm (used in the present study) was introduced in which observers must simply make a speeded response to indicate whether the probe letter had appeared anywhere in the initial display (e.g. Kruschke & Fragassi, 1996; Mitroff, Scholl, & Wynn, 2004; Noles, Scholl, & Mitroff, 2005). This paradigm yields a similar OSPB, which can be larger and more robust than letter-naming effects (Noles et al., 2005), and which can be used to study nonverbalizable visual features (Mitroff, Scholl, & Noles, under review). In both variants of the object-reviewing paradigm, OSPBs serve as an index of object persistence: manipulations that attenuate enduring object representations will result in weakened OSPBs.

Figure 1. Sample displays used in the original object-reviewing experiments of Kahneman et al. (1992), showing the Preview, Linking, and Probe displays for Congruent and Incongruent trials with static and with moving objects. In the static displays, the probe is seen as the same object as one of the previews, because it appears on the same object, in the same location. Objecthood and location are unconfounded in the moving displays. In each case, congruent information facilitates probe naming on the same object, relative to incongruent information. (These actual experiments also involved No-Match trials, not depicted here.)

After the initial demonstrations of OSPBs for static and dynamic objects of various types (Kahneman et al., 1992), an initial wave of research used object reviewing to explore the types of information which could be stored in object files. This work suggested that OFs can store information which is abstracted beyond superficial surface features, so that OSPBs will still be obtained when the probe differs from the preview in its specific visual features (e.g. the font of the letter) or even its format (e.g. words vs. line-drawings; see Gordon & Irwin, 1996, 2000; Henderson, 1994; Henderson & Anes, 1994). (Other studies confirm, however, that OFs can also store lower-level nonverbalizable visual features about specific object tokens; Mitroff et al., under review; Noles & Scholl, 2005.)

Recently, a second wave of research has also begun to explore the rules that constrain just how and when OFs are constructed and maintained (Mitroff, Scholl, & Wynn, 2004, 2005; Noles et al., 2005). This work has often taken a cue from similar investigations of ‘object cognition’ in young infants, which have resulted in a short list of critical ‘principles’ of object persistence (e.g. Spelke, 1990, 2000). Chief among these principles is that of cohesion: that an object must maintain a single, unified boundary over time. This principle, which has often been treated theoretically as the most important constraint on what it means to be an object (e.g. Bloom, 2000; Pinker, 1997), can be parsed into two basic components: (1) an object must always maintain a single unified boundary (i.e., it cannot split apart), and (2) the boundaries of multiple objects must always remain distinct (i.e., two objects cannot merge into one). While this first component has been addressed empirically both in the infant cognition and adult visual perception literatures, the second ‘boundedness’ component has received far less attention.
It has been shown that infants fail to represent non-cohesive substances (piles of sand) as bona fide objects (Huntley-Fenner, Carey, & Solimando, 2002), and that even the simple act of breaking an object into multiple pieces can disrupt numerical object cognition (Chiang & Wynn, 2000; Mitroff, Cheries, Wynn, & Scholl, 2005). Similarly, one adult study found that the maintenance of OFs was significantly impaired (though still present) when a single object smoothly split into two (Mitroff et al., 2004). Another study demonstrated that adults had difficulty attentionally tracking objects when they ‘poured’ in a substance-like manner from one location to another, but were not impaired when unitary objects instantly turned into a local perceptual group before moving (vanMarle & Scholl, 2003). Both of these results suggest that although cohesion violations do not completely destroy adults’ persisting object representations, the cohesion principle does guide and influence adult mid-level vision just as it affects infant object cognition.

Here we address the role of cohesion and ‘boundedness’ in object persistence by asking the following question: What happens to the corresponding OF representations of two objects when the objects are seen to smoothly merge into one? In other words, we are exploring whether OFs are constrained such that two OFs cannot both ‘point to’ or index the same object in the same location at the same time. After two objects merge, does the resulting OF contain information about both of the original objects, thus revealing a significant OSPB for both preview letters when probed? Or does the resulting OF only store one of the original objects’ feature-sets, and if so, which one?
Note that any cost associated with such a transformation would reveal a strict adherence to the cohesion principle: whereas ‘splitting’ necessarily requires the formation of a completely new second representation, the ‘merging’ in this study simply requires maintaining already established representations. Note also that while this study thus used a display manipulation that is in many ways the converse of our previous study of ‘splitting’ (Mitroff et al., 2004), it addresses a different set of important theoretical questions about the underlying architectural constraints on OFs, e.g. whether they can store multiple instances of the same property, and whether it is possible for two internal OF representations to ‘point’ to the same object in the world.

Table 1. Response times and object specific preview benefits (OSPBs) for each condition. All data are collapsed over trials in which the topmost object moved straight and those in which the bottommost object moved straight. Response times are in ms.

Condition          Congruent Trial    Incongruent Trial     OSPB        Statistical test
                   (Same Object)      (Different Object)
Merging Trials
  Top              499.64             555.45                55.81 ms    t(53) = 8.04, p < .001
  Bottom           548.94             555.45                 6.52 ms    t(53) = 0.97, p = .337
  Straight         582.83             603.79                20.95 ms    t(53) = 3.20, p = .002
Approach Trials
  Top              510.44             573.81                63.38 ms    t(53) = 6.27, p < .001
  Bottom           562.82             587.11                24.29 ms    t(53) = 2.15, p = .036
  Straight         598.56             623.43                24.87 ms    t(53) = 3.24, p = .018

Method

Fifty-seven Yale University undergraduates participated for course credit or payment. The data from three observers were removed because their overall response times were more than two standard deviations from the mean. The displays were presented on a Macintosh iMac computer using custom software written using the VisionShell graphics libraries (Comtois, 2004).
Each trial began with three circles (2 deg in diameter), presented as black outlines on a white background, drawn 2.49 deg to the left of the vertical midline, with one 4.49 deg above, one at, and one 4.49 deg below the horizontal midline. (Distance measures were calculated from the circles’ centers and all visual angles are based upon an approximate viewing distance of 50 cm.) After 500 ms, a letter (subtending 1 deg, drawn in a black monospaced font) appeared in each circle, drawn without replacement from the set ‘K, M, P, S, T, V’. After 1 s, these ‘preview letters’ disappeared, and the circles began their motion (always at 10 deg/s, for a total of 500 ms).

The different types of motions and conditions that were possible are depicted in Figure 2. Regardless of condition, one of the circles (either the topmost or bottommost) simply translated horizontally to the right, ending 2.49 deg to the right of center. The other two circles’ motions depended on the condition. In the Merging condition (two-thirds of the trials), they also moved to the right, but at the same time they merged into one identical circle, with one gradually moving up and the other gradually moving down, resulting in a single circle 2.24 deg above or below center. Until the two merging circles had completely combined, only their outermost shared contour was drawn (see the ‘Linking Motion’ column of Figure 2). In the Approach condition (one-third of the trials), the remaining two circles also translated rightward while gradually approaching each other, but they never fully touched, ending instead with one 1.20 deg from center and with a small 0.10 deg gap between the two circles (see Figure 2). Immediately after the motion ended, a single probe letter appeared in one of the final circles (equally often in each circle) and remained until response.
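The display geometry above can be checked with a little arithmetic. The following Python sketch is purely illustrative and is not the experiment software (which was written with VisionShell); the function name deg_to_cm and the variable names are assumptions introduced here. It converts visual angles to on-screen sizes at the stated approximate viewing distance of 50 cm, and compares the nominal travel distance (10 deg/s for 500 ms) with the horizontal displacement of the straight-moving circle.

```python
import math

VIEWING_DISTANCE_CM = 50.0  # approximate viewing distance given in the Method


def deg_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen extent (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


circle_diameter_cm = deg_to_cm(2.0)  # circles were 2 deg in diameter
letter_size_cm = deg_to_cm(1.0)      # letters subtended 1 deg

# Nominal travel distance: speed (deg/s) times duration (s).
nominal_travel_deg = 10.0 * 0.5

# Horizontal displacement of the straight-moving circle:
# from 2.49 deg left of center to 2.49 deg right of center.
straight_displacement_deg = 2.49 - (-2.49)

# Midpoint between two adjacent starting heights (4.49 deg and 0 deg),
# for comparison with the reported merged position of 2.24 deg.
merge_midpoint_deg = (4.49 + 0.0) / 2.0

print(f"circle diameter: {circle_diameter_cm:.2f} cm")
print(f"letter size: {letter_size_cm:.2f} cm")
print(f"nominal travel: {nominal_travel_deg:.2f} deg")
print(f"straight displacement: {straight_displacement_deg:.2f} deg")
print(f"merge midpoint: {merge_midpoint_deg:.3f} deg")
```

Under these assumptions, the 2 deg circles are roughly 1.75 cm across on the screen, and the nominal 5 deg of travel is close to the 4.98 deg horizontal displacement of the straight-moving circle.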
Observers made a speeded response, pressing one key to indicate that the probe letter was the same as any of the preview letters, or another key to indicate that it did not appear in the preview display. 50% of trials were ‘No-Match’ trials, in which the probe letter (drawn from the same set as the preview letters) did not appear in any of the original circles. Of the remaining ‘Match’ trials, 50% were ‘Congruent Matches’ in which the probe letter was the same as the preview letter that initially appeared on that circle (or for final circles that resulted from a merge, from either of the initial circles that combined). The remaining 50% of the Match trials were ‘Incongruent Matches’ in which the probe letter was the same as the preview letter that initially appeared on one of the other initial circles (or for final circles resulting from a merge, from the lone translating circle). After 20 practice trials, 432 test trials were presented in a different random order for each observer.

Results

Overall accuracy was high (Mean = 95.69%, SD = 2.96%) and all analyses were conducted on observers’ median response times, limited to correct trials. The primary measures of interest were OSPBs: the relative response time benefit in the Congruent Match condition (when the probe letter reappeared on the same object in which it was initially previewed) compared to the Incongruent Match condition (when the probed letter had initially appeared on a different object). The OSPBs that resulted from each variation of the Approach and Merging conditions are presented in Table 1, along with the associated statistical tests. In the Approach trials, wherein there was no cohesion violation, significant OSPBs were found for all three objects.
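The design proportions and the OSPB measure described above can be sketched as follows. This is an illustrative Python sketch, not the authors' analysis code: the function names and the toy response times are hypothetical, and it assumes that each observer's OSPB is the difference between their median correct response times in the Incongruent Match and Congruent Match conditions, with the resulting values averaged across observers.

```python
from statistics import mean, median

TEST_TRIALS = 432  # number of test trials reported in the Method


def trial_counts(total=TEST_TRIALS):
    """Marginal trial counts implied by the stated proportions."""
    merging = total * 2 // 3          # two-thirds Merging trials
    approach = total - merging        # one-third Approach trials
    no_match = total // 2             # 50% No-Match trials
    match = total - no_match
    congruent = match // 2            # 50% of Match trials
    incongruent = match - congruent
    return {
        "merging": merging,
        "approach": approach,
        "no_match": no_match,
        "congruent": congruent,
        "incongruent": incongruent,
    }


def observer_ospb(congruent_rts, incongruent_rts):
    """OSPB (ms) for one observer, computed from median correct RTs."""
    return median(incongruent_rts) - median(congruent_rts)


def mean_ospb(observers):
    """Average the per-observer OSPBs across observers."""
    return mean(
        observer_ospb(o["congruent"], o["incongruent"]) for o in observers
    )


# Hypothetical response times (ms) for two observers, for illustration only.
toy_observers = [
    {"congruent": [500, 520, 480], "incongruent": [560, 540, 580]},
    {"congruent": [510, 530, 490], "incongruent": [550, 570, 530]},
]

counts = trial_counts()
toy_result = mean_ospb(toy_observers)
print(counts)
print(toy_result)
```

A positive value indicates faster responses when the probe letter reappears on the object in which it was previewed; the toy data are chosen only to make the arithmetic easy to follow.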
In the Merging trials, in contrast, significant OSPBs were found only for the lone object undergoing straight motion, and for the letter which was initially previewed in the uppermost of the two objects which merged, but not for the letter which was initially previewed in the bottommost of the two objects which merged.¹

¹ Previous research using the object-reviewing paradigm has consistently found a general bias for larger OSPBs on objects initially encountered above the other objects in vertically oriented initial displays, and for objects initially encountered to the left of other objects in horizontally oriented initial displays (Gordon & Irwin, 1996, 2000; Mitroff et al., 2004, 2005; Noles et al., 2005). What is of theoretical importance here is thus that there is a systematic survival of only one of the preview letters (compared to the 3-item Approach condition), not necessarily that it is the top preview letter in particular.

Discussion

This study began by asking what happens to object files when two objects are seen to smoothly merge into one. The results were clear: in this situation, only one of the object files survives. Two further results indicated that this impairment was specific to the merging manipulation. First, a significant object-specific effect was still observed for an independent third object in each display which did not participate in the combination: thus the merging destroyed only one of the objects that participated in the combination, and did not impact the maintenance of object-specific information for the other object in the display. Second, no similar impairments were observed in the Approach trials when the display was as similar as possible, but did not involve any merging: here, robust object-specific effects were still observed for all three objects. In future work, it will be interesting to use this effect as a tool to see what other manipulations influence which object survives the combination, e.g.
contrasting initial objects which are of different sizes or salience.

Two conclusions follow from this pattern of results. First, the failure of both object files to survive the combination constitutes a strong demonstration of the importance of the cohesion principle in adults’ visual processing, and in doing so further supports the hypothesis that similar types of constraints control both infants’ object cognition and adults’ mid-level vision (Carey & Xu, 2001; Mitroff et al., 2004; Scholl & Leslie, 1999). Note that the impact of the cohesion violation in this study was more extreme than in previous experiments with ‘splitting’ objects in two ways. In this experiment, the probe letter appeared immediately after the two objects had merged, yet one of the underlying object files was still destroyed. This suggests that the cohesion violation directly cued the destruction of one of the object files, whereas in our previous study of ‘splitting’ violations (Mitroff et al., 2004) the motion continued after the split, leaving more time for any associated object files to decay. In addition, note that the cohesion violation employed in the ‘splitting’ experiments necessarily required extra processing (i.e. the construction of a new file), whereas the merging employed here only required continued maintenance of object files which had already been constructed.

A second conclusion from this study is that object files are controlled by the constraint that only one object’s properties can be stored in each file. Previous research indicated that object files are tied to the present, and thus fail to store past features of objects when those features change (Kahneman et al., 1992); the present study extends this result by demonstrating that object files are also limited spatiotemporally to a single object. This constraint that two objects cannot be represented as being in the same location may be a general feature of visual and cognitive processing (Bedford, 2004).
Both of these conclusions illustrate the existence and subtlety of the principles that underlie our perception of persisting objects in visual experience, and how they can be uncovered via the measurement of object-specific processing.

References

Bedford, F. (2004). Analysis of a constraint on perception, cognition, and development: One object, one place, one time. Journal of Experimental Psychology: Human Perception & Performance, 30, 907-912.

Bloom, P. (2000). How children learn the meanings of words. Cambridge, MA: MIT Press.

Carey, S., & Xu, F. (2001). Infant knowledge of objects: Beyond object files and object tracking. Cognition, 80, 179-213.

Chiang, W.-C., & Wynn, K. (2000). Infants’ tracking of objects and collections. Cognition, 77, 169-195.

Comtois, R. (2004). VisionShell PPC. [Software libraries]. Cambridge, MA: author.

Gordon, R., & Irwin, D. (1996). What’s in an object file? Evidence from priming studies. Perception & Psychophysics, 58, 1260-1277.

Gordon, R., & Irwin, D. (2000). The role of physical and conceptual properties in preserving object continuity. Journal of Experimental Psychology: Learning, Memory, & Cognition, 26, 136-150.

Henderson, J. (1994). Two representational systems in dynamic visual identification. Journal of Experimental Psychology: General, 123, 410-426.

Henderson, J. M., & Anes, M. D. (1994). Roles of object-file review and type priming in visual identification within and across eye fixations. Journal of Experimental Psychology: Human Perception and Performance, 20, 826-839.

Huntley-Fenner, G., Carey, S., & Solimando, A. (2002). Objects are individuals but stuff doesn’t count: Perceived rigidity and cohesiveness influence infants’ representations of small groups of distinct entities. Cognition, 85, 223-250.

Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files: Object-specific integration of information. Cognitive Psychology, 24, 174-219.

Kruschke, J. K., & Fragassi, M. M. (1996). The perception of causality: Feature binding in interacting objects. In Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society (pp. 441-446). Hillsdale, NJ: Erlbaum.

Mitroff, S. R., Cheries, E. W., Wynn, K., & Scholl, B. J. (2005). Cohesion as a principle of object persistence in infants and adults. Poster presented at the annual meeting of the Vision Sciences Society, 5/10/05, Sarasota, FL.

Mitroff, S. R., Scholl, B. J., & Noles, N. S. (under review). Object files can be purely episodic. Manuscript submitted for publication.

Mitroff, S. R., Scholl, B. J., & Wynn, K. (2004). Divide and conquer: How object files adapt when a persisting object splits into two. Psychological Science, 15, 420-425.

Mitroff, S. R., Scholl, B. J., & Wynn, K. (2005). The relationship between object files and conscious perception. Cognition, 96(1), 67-92.

Noles, N. S., & Scholl, B. J. (2005). What’s in an object file? Integral vs. separable features. Poster presented at the annual meeting of the Vision Sciences Society, 5/8/05, Sarasota, FL.

Noles, N. S., Scholl, B. J., & Mitroff, S. R. (2005). The persistence of object-file representations. Perception & Psychophysics, 67, 324-334.

Pinker, S. (1997). How the mind works. New York: Norton.

Scholl, B. J., & Leslie, A. M. (1999). Explaining the infant’s object concept: Beyond the perception/cognition dichotomy. In E. Lepore & Z. Pylyshyn (Eds.), What is cognitive science? (pp. 26-73). Oxford: Blackwell.

Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14, 29-56.

Spelke, E. S. (2000). Core knowledge. American Psychologist, 55, 1233-1243.

vanMarle, K., & Scholl, B. J. (2003). Attentive tracking of objects vs. substances. Psychological Science, 14, 498-504.
Figure 2. Depictions of Congruent-Match and Incongruent-Match trials for the six trial types (not to scale). Panels show the Preview Display, Linking Motion, and Probe Display over time for the Top, Bottom, and Straight Motion objects in the Merging Condition and in the Approach Condition. The observers’ task was to indicate as quickly as possible whether the final letter had appeared anywhere in the initial preview display on that trial. This illustration simplifies the actual experiment in that there are no examples of the No-Match trials (in which the final probe letter was not one of the preview letters), and that in the actual experiment, the topmost object traced the straight motion on half the trials.