Manuscript under review. Please do not cite or quote without permission.
One Plus One Equals One:
The Effects of Merging on Object Files
Stephen R. Mitroff¹, Brian J. Scholl², & Karen Wynn²
¹ Center for Cognitive Neuroscience and Department of Psychological and Brain Sciences, Duke University.
² Department of Psychology, Yale University.
A critical task in visual processing is keeping track of objects as the same persisting individuals over time. The operations involved
in such processing can be assessed in terms of the effects of various manipulations on mid-level object-file representations. Here we
study what has been claimed to be the most important principle of object persistence: objects must maintain a single unified
boundary over time (the ‘cohesion principle’). We do so by measuring ‘object-specific preview benefits’ (OSPBs), wherein a
‘preview’ of information on a specific object speeds the recognition of that information at a later point when it appears again on the
same object. When two objects smoothly merged into one, the underlying object-file representations were dramatically affected: the
information from only one of the initial objects survived this cohesion violation to produce an OSPB (whereas OSPBs from both
original objects remain robust in similar control displays without cohesion violations). These results demonstrate the power of the
cohesion principle in the maintenance of mid-level visual representations, and demonstrate that a single object file cannot store
information from more than one object.
Introduction
We live in a visual world of constant flux, and
a critical task of visual processing is thus not only
to segment parts of the incoming input into
discrete objects, but also to keep track of objects as
the same, persisting individuals over time.
Explorations of visual object persistence often take
the notion of object files as their starting point.
Object files (OFs) are episodic mid-level visual
representations that track objects through spatiotemporal changes and store (and update)
information about those objects’ properties (e.g.,
Kahneman, Treisman, & Gibbs, 1992). OFs are
often intuitively characterized as ‘file folders’ in
which an object’s information can be stored, and
which are linked to physical objects in the world
through ‘sticky’ pointers that track objects as they
move. In this framework, the ‘folder’ represents
the object per se, while all of an object’s properties
are stored as entries in the folder. Because of these
characteristics, OFs serve as a critical intermediate
level of visual processing: an OF can survive —
representing an object as the same enduring
individual — even when its visual features are
changing (“It’s red … now it’s blue”), and even
when its recognized type is changing (“It’s a bird
… no wait, it’s a plane”). In these examples, an
object file is what represents the object as the
same “it” in each case.
For helpful conversation and/or comments on earlier drafts,
we thank Erik Cheries, Nic Noles, Pamela Yee, and the
members of the Scholl, Chun, & Wynn labs at Yale University.
We also thank Melody Lu for assistance with data collection.
SRM was supported by NIMH #F32-MH66553-01. BJS & KW
were supported by NSF #BCS-0132444. For reprints and
correspondence contact Stephen R. Mitroff at Center for
Cognitive Neuroscience, Duke University, Box 90999, Durham,
NC 27708, [email protected].
Some features of OFs can be directly studied
via the object-reviewing paradigm (Kahneman et
al., 1992): observers view an initial ‘preview’
display that contains two or more objects, and a
different letter is placed in each (see Figure 1). The
letters then disappear, and all of the objects move
to new locations. After this motion, a single
‘probe’ letter appears in one of the objects, and the
observers must simply name the probe letter as
quickly as possible. When the probe happens to
match one of the initial letters, responses are
speeded, in a type of display-wide priming. In
addition, however, observers are faster still to
name the probe letter when it happens to match
the letter initially presented on that same object —
an effect which is termed an object-specific preview
benefit (OSPB). More recently, a modified object-reviewing paradigm (used in the present study)
was introduced in which observers must simply
make a speeded response to indicate whether the
probe letter had appeared anywhere in the initial
display (e.g. Kruschke & Fragassi, 1996; Mitroff,
Scholl, & Wynn, 2004; Noles, Scholl, & Mitroff,
2005). This paradigm yields a similar OSPB, which
can be larger and more robust than letter-naming
effects (Noles et al., 2005), and which can be used
to study nonverbalizable visual features (Mitroff,
Scholl, & Noles, under review). In both variants of
the object-reviewing paradigm, OSPBs serve as an
index of object persistence: manipulations that attenuate enduring object representations will result
in weakened OSPBs.
[Figure 1 appears here: Congruent Trials and Incongruent Trials, each shown as a Preview Display, a Linking Display (Static or Motion), and a Probe Display.]
Figure 1. Sample displays used in the original object-reviewing experiments of Kahneman et al. (1992). In the static displays, the
probe is seen as the same object as one of the previews, because it appears on the same object, in the same location. Objecthood and
location are unconfounded in the moving displays. In each case, congruent information facilitates probe naming on the same object,
relative to incongruent information. (These actual experiments also involved No-Match trials, not depicted here.)
After the initial demonstrations of OSPBs for
static and dynamic objects of various types
(Kahneman et al., 1992), an initial wave of
research used object reviewing to explore the
types of information which could be stored in
object files. This work suggested that OFs can
store information which is abstracted beyond
superficial surface features, so that OSPBs will
still be obtained when the probe differs from the
preview in its specific visual features (e.g. the font
of the letter) or even its format (e.g. words vs.
line-drawings; see Gordon & Irwin, 1996, 2000;
Henderson, 1994; Henderson & Anes, 1994).
(Other studies confirm, however, that OFs can
also store lower-level nonverbalizable visual
features about specific object tokens; Mitroff et al.,
under review; Noles & Scholl, 2005.) Recently, a
second wave of research has also begun to explore
the rules that constrain just how and when OFs
are constructed and maintained (Mitroff, Scholl, &
Wynn, 2004, 2005; Noles et al., 2005).
This work has often taken a cue from similar
investigations of ‘object cognition’ in young
infants, which have resulted in a short list of
critical ‘principles’ of object persistence (e.g.
Spelke, 1990, 2000). Chief among these principles
is that of cohesion — that an object must maintain
a single, unified boundary over time. This
principle, which has often been treated theoretically as the most important constraint on what
it means to be an object (e.g. Bloom, 2000; Pinker,
1997), can be parsed into two basic components:
(1) an object must always maintain a single
unified boundary (i.e., it cannot split apart), and
(2) the boundaries of multiple objects must always
remain distinct (i.e., two objects cannot merge
into one). While the first component has been
addressed empirically in both the infant cognition
and adult visual perception literatures, the second
‘boundedness’ component has received far less
attention. It has been shown that infants fail to
represent non-cohesive substances (piles of sand)
as bona fide objects (Huntley-Fenner, Carey, &
Solimando, 2002), and that even the simple act of
breaking an object into multiple pieces can
disrupt numerical object cognition (Chiang &
Wynn, 2000; Mitroff, Cheries, Wynn, & Scholl,
2005). Similarly, one adult study found that the
maintenance of OFs was significantly impaired
(though still present) when a single object
smoothly split into two (Mitroff et al., 2004).
Another study demonstrated that adults had
difficulty attentionally tracking objects when they
‘poured’ in a substance-like manner from one
location to another, but were not impaired when
unitary objects instantly turned into a local
perceptual group before moving (vanMarle &
Scholl, 2003). Both of these results suggest that
although cohesion violations do not completely
destroy adults’ persisting object representations,
the cohesion principle does guide and influence
adult mid-level vision just as it affects infant
object cognition.
Here we address the role of cohesion and
‘boundedness’ in object persistence by asking the
following question: What happens to the
corresponding OF representations of two objects
when the objects are seen to smoothly merge into
one? In other words, we are exploring whether
OFs are constrained such that two OFs cannot
both ‘point to’ or index the same object in the
same location at the same time. After two objects
merge, does the resulting OF contain information
about both of the original objects, thus revealing a
significant OSPB for both preview letters when
probed? Or does the resulting OF only store one
of the original objects’ feature-sets — and if so,
which one? Note that any cost associated with
such a transformation would reveal a strict
adherence to the cohesion principle: whereas
‘splitting’ necessarily requires the formation of a
completely new second representation, the ‘merging’ in this study simply requires maintaining
already established representations. Note also that
while this study thus used a display manipulation
that is in many ways the converse of our previous
study of ‘splitting’ (Mitroff et al., 2004), it
addresses a different set of important theoretical
questions about the underlying architectural
constraints on OFs, e.g. whether they can store
multiple instances of the same property, and
whether it is possible for two internal OF
representations to ‘point’ to the same object in the
world.
                    Response time (ms)
Condition           Congruent Trial    Incongruent Trial     Object Specific Preview Benefit (OSPB)
                    (Same Object)      (Different Object)
Merging Trials
  Top               499.64             555.45                55.81 ms    t(53) = 8.04, p < .001
  Bottom            548.94             555.45                 6.52 ms    t(53) = 0.97, p = .337
  Straight          582.83             603.79                20.95 ms    t(53) = 3.20, p = .002
Approach Trials
  Top               510.44             573.81                63.38 ms    t(53) = 6.27, p < .001
  Bottom            562.82             587.11                24.29 ms    t(53) = 2.15, p = .036
  Straight          598.56             623.43                24.87 ms    t(53) = 3.24, p = .018
Table 1. Response times and object specific preview benefits (OSPBs) for each condition. All data are collapsed over trials in which
the topmost object moved straight and those in which the bottommost object moved straight.
Method
Fifty-seven Yale University undergraduates
participated for course credit or payment. The
data from three observers were removed because
their overall response times were more than two
standard deviations from the mean. The displays
were presented on a Macintosh iMac computer
using custom software written using the VisionShell graphics libraries (Comtois, 2004). Each trial
began with three circles (2 deg in diameter),
presented as black outlines on a white
background, drawn 2.49 deg to the left of the
vertical midline, with one 4.49 deg above, one
at, and one 4.49 deg below the horizontal midline.
(Distance measures were calculated from the
circles’ centers and all visual angles are based
upon an approximate viewing distance of 50 cm.)
After 500 ms, a letter (subtending 1 deg, drawn in
a black monospaced font) appeared in each circle,
drawn without replacement from the set ‘K, M, P,
S, T, V’. After 1 s, these ‘preview letters’
disappeared, and the circles began their motion
(always at 10 deg/s, for a total of 500 ms). The
different types of motions and conditions that
were possible are depicted in Figure 2. Regardless
of condition, one of the circles (either the topmost
or bottommost) simply translated horizontally to
the right, ending 2.49 deg to the right of center.
The other two circles’ motions depended on the
condition. In the Merging condition (two-thirds of
the trials), they also moved to the right, but at the
same time they merged into one identical circle —
one gradually moving up and the other gradually
moving down, resulting in a single circle 2.24 deg
above or below center. Until the two merging
circles had completely combined, only their outermost shared contour was drawn (see the ‘Linking
Motion’ column of Figure 2). In the Approach
condition (one-third of the trials), the remaining
two circles also translated rightward while gradually approaching each other, but they never fully
touched, ending instead with one 1.20 deg from
center and with a small 0.10 deg gap between the
two circles (see Figure 2).
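The display geometry described above can be checked with a short calculation. The following is a minimal sketch, not the VisionShell code used in the experiment: the function names are hypothetical, the conversion assumes a stimulus centered on the line of sight, and the Method does not say whether the 10 deg/s speed refers to the horizontal component or to the full path of the merging circles, so both distances are reported.

```python
import math

VIEWING_DISTANCE_CM = 50.0  # approximate viewing distance given in the Method


def deg_to_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """On-screen extent (cm) of a stimulus subtending angle_deg of visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def travel_time_ms(distance_deg: float, speed_deg_per_s: float) -> float:
    """Time (ms) needed to cover distance_deg at a constant speed."""
    return 1000.0 * distance_deg / speed_deg_per_s


# Circle diameter of 2 deg, expressed in cm on the screen.
circle_diameter_cm = deg_to_cm(2.0)

# Every circle starts 2.49 deg left of center and ends 2.49 deg right of center.
horizontal_travel_deg = 2.49 + 2.49
straight_time_ms = travel_time_ms(horizontal_travel_deg, 10.0)

# A merging circle that starts 4.49 deg from center vertically and ends at
# 2.24 deg also shifts vertically, so its path is longer than the horizontal travel.
merge_path_deg = math.hypot(horizontal_travel_deg, 4.49 - 2.24)

print(f"circle diameter: {circle_diameter_cm:.2f} cm")
print(f"straight motion: {straight_time_ms:.0f} ms at 10 deg/s")
print(f"merging path length: {merge_path_deg:.2f} deg")
```

Under these assumptions the horizontal travel at 10 deg/s takes just under the stated 500 ms, which suggests that the reported speed describes the horizontal translation.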
Immediately after the motion ended, a single
probe letter appeared in one of the final circles
(equally often in each circle) and remained until
response. Observers made a speeded response,
pressing one key to indicate that the probe letter
was the same as any of the preview letters, or
another key to indicate that it did not appear in
the preview display. Half of the trials were ‘No-Match’ trials, in which the probe letter (drawn
from the same set as the preview letters) did not
appear in any of the original circles. Of the
remaining ‘Match’ trials, 50% were ‘Congruent
Matches’ in which the probe letter was the same
as the preview letter that initially appeared on
that circle (or for final circles that resulted from a
merge, from either of the initial circles that
combined). The remaining 50% of the trials were
‘Incongruent Matches’ in which the probe letter
was the same as the preview letter that initially
appeared on one of the other initial circles (or for
final circles resulting from a merge, from the lone
translating circle). After 20 practice trials, 432 test
trials were presented in a different random order
for each observer.
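The trial categories defined above can be summarized in a short sketch. This is an illustration only, not the software used in the experiment: the function names, the circle labels (`top`, `middle`, `bottom`), and the data layout are hypothetical. A final circle is described by the set of initial circles that ended up in it (two for a merged circle, one otherwise).

```python
from fractions import Fraction


def classify_trial(preview: dict, probe_letter: str, probe_sources: set) -> str:
    """Label a trial from the preview letters and the probed final circle.

    preview maps each initial circle to its preview letter; probe_sources holds
    the initial circle(s) that make up the final circle containing the probe.
    """
    if probe_letter not in preview.values():
        return "No-Match"
    if any(preview[circle] == probe_letter for circle in probe_sources):
        return "Congruent Match"
    return "Incongruent Match"


def trial_counts(total_trials: int) -> dict:
    """Marginal trial counts implied by the proportions stated in the Method."""
    half = Fraction(1, 2)
    match_trials = total_trials * half
    return {
        "No-Match": int(total_trials * half),
        "Congruent Match": int(match_trials * half),
        "Incongruent Match": int(match_trials * half),
        "Merging": int(total_trials * Fraction(2, 3)),
        "Approach": int(total_trials * Fraction(1, 3)),
    }


# Example: the top and middle circles merge; the bottom circle moves straight.
preview = {"top": "M", "middle": "T", "bottom": "P"}
merged_circle = {"top", "middle"}

print(classify_trial(preview, "T", merged_circle))  # letter came from the merge
print(classify_trial(preview, "P", merged_circle))  # letter came from the lone circle
print(classify_trial(preview, "K", merged_circle))  # letter was not previewed
print(trial_counts(432))
```

The counts are marginal totals only; how the Match and condition factors were crossed within the 432 test trials is not stated, so no per-cell counts are derived here.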
Results
Overall accuracy was high (Mean = 95.69%, SD
= 2.96%) and all analyses were conducted on
observers’ median response times, limited to
correct trials. The primary measures of interest
were OSPBs — the relative response time benefit
in the Congruent Match condition (when the
probe letter reappeared on the same object in
which it was initially previewed) compared to the
Incongruent Match condition (when the probed
letter had initially appeared on a different object).
The OSPBs that resulted from each variation of
the Approach and Merging conditions are presented in Table 1, along with the associated statistical
tests. In the Approach trials, wherein there was no
cohesion violation, significant OSPBs were found
for all three objects. In the Merging trials, in
contrast, significant OSPBs were found only for
the lone object undergoing straight motion, and
for the letter which was initially previewed in the
uppermost of the two objects which merged, but
not for the letter which was initially previewed in
the bottommost of the two objects which merged.¹
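The OSPB measure can be illustrated with a minimal sketch. The degrees of freedom in Table 1 are consistent with a within-observer comparison, but the specific test is an assumption here; the function names and the four-observer data are hypothetical and serve only to show the computation.

```python
import math
import statistics


def ospb(congruent_rt_ms: float, incongruent_rt_ms: float) -> float:
    """Object-specific preview benefit: Incongruent minus Congruent response time."""
    return incongruent_rt_ms - congruent_rt_ms


def paired_t(congruent: list, incongruent: list) -> tuple:
    """Paired t statistic and degrees of freedom for per-observer OSPBs.

    Each list holds one median response time (ms) per observer, in the same order.
    """
    diffs = [ospb(c, i) for c, i in zip(congruent, incongruent)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Condition means from Table 1 (Merging trials, top object).
print(round(ospb(499.64, 555.45), 2))

# Hypothetical medians for four observers, for illustration only.
congruent_medians = [500.0, 520.0, 480.0, 510.0]
incongruent_medians = [550.0, 560.0, 540.0, 570.0]
t_value, df = paired_t(congruent_medians, incongruent_medians)
print(f"t({df}) = {t_value:.2f}")
```

A positive OSPB means that responses were faster when the probe reappeared on the object that originally carried it.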
Discussion
This study began by asking what happens to
object files when two objects are seen to smoothly
merge into one. The results were clear: in this
situation, only one of the object files survives.
Two further results indicated that this impairment
was specific to the merging manipulation. First, a
significant object-specific effect was still observed
for an independent third object in each display
which did not participate in the combination: thus
the merging destroyed only one of the objects that
participated in the combination, and did not
impact the maintenance of object-specific information for the other object in the display. Second,
no similar impairments were observed in the
Approach trials when the display was as similar
as possible, but did not involve any merging:
here, robust object-specific effects were still
observed for all three objects. In future work, it
will be interesting to use this effect as a tool to see
what other manipulations influence which object
survives the combination (e.g., contrasting initial
objects which are of different sizes or salience).
¹ Previous research using the object-reviewing paradigm has
consistently found a general bias for larger OSPBs on objects
initially encountered above the other objects in vertically
oriented initial displays, and for objects initially encountered
to the left of other objects in horizontally oriented initial
displays (Gordon & Irwin, 1996, 2000; Mitroff et al., 2004, 2005;
Noles et al., 2005). What is of theoretical importance here is
thus that there is a systematic survival of only one of the
preview letters (compared to the 3-item Approach condition),
not necessarily that it is the top preview letter in particular.
Two conclusions follow from this pattern of
results. First, the failure of both object files to
survive the combination constitutes a strong
demonstration of the importance of the cohesion
principle in adults’ visual processing — and in
doing so further supports the hypothesis that
similar types of constraints control both infants’
object cognition and adults’ mid-level vision
(Carey & Xu, 2001; Mitroff et al., 2004; Scholl &
Leslie, 1999). Note that the impact of the cohesion
violation in this study was more extreme than in
previous experiments with ‘splitting’ objects, in two
ways. First, in this experiment, the probe letter
appeared immediately after the two objects had
merged, yet one of the underlying object files was
still destroyed. This suggests that the cohesion
violation directly cued the destruction of one of
the object files — whereas in our previous study
of ‘splitting’ violations (Mitroff et al., 2004) the
motion continued after the split, leaving more
time for any associated object files to decay. In
addition, note that the cohesion violation
employed in the ‘splitting’ experiments necessarily required extra processing (i.e. the
construction of a new file), whereas the merging
employed here only required continued maintenance of object files which had already been
constructed. A second conclusion from this study
is that object files are controlled by the constraint
that only one object’s properties can be stored in
each file. Previous research indicated that object
files are tied to the present, and thus fail to store
past features of objects when those features
change (Kahneman et al., 1992); the present study
extends this result by demonstrating that object
files are also limited spatiotemporally to a single
object. This constraint that two objects cannot be
represented as being in the same location may be
a general feature of visual and cognitive
processing (Bedford, 2004). Both of these conclusions illustrate the existence and subtlety of the
principles that underlie our perception of
persisting objects in visual experience, and how
they can be uncovered via the measurement of
object-specific processing.
References
Bedford, F. (2004). Analysis of a constraint on
perception, cognition, and development: One
object, one place, one time. Journal of
Experimental Psychology: Human Perception &
Performance, 30, 907 - 912.
Bloom, P. (2000). How children learn the meanings of
words. Cambridge, MA: MIT Press.
Carey, S., & Xu, F. (2001). Infant knowledge of
objects: Beyond object files and object tracking.
Cognition, 80, 179 - 213.
Chiang, W.-C., & Wynn, K. (2000). Infants’
tracking of objects and collections. Cognition, 77,
169 - 195.
Comtois, R. (2004). VisionShell PPC. [Software
libraries]. Cambridge, MA: author.
Gordon, R., & Irwin, D. (1996). What’s in an object
file? Evidence from priming studies. Perception
& Psychophysics, 58, 1260 - 1277.
Gordon, R., & Irwin, D. (2000). The role of
physical and conceptual properties in
preserving object continuity. Journal of
Experimental Psychology: Learning, Memory, &
Cognition, 26, 136 - 150.
Henderson, J. (1994). Two representational systems in dynamic visual identification. Journal of
Experimental Psychology: General, 123, 410 - 426.
Henderson, J. M., & Anes, M. D. (1994). Roles of
object-file review and type priming in visual
identification within and across eye fixations.
Journal of Experimental Psychology: Human
Perception and Performance, 20, 826 - 839.
Huntley-Fenner, G., Carey, S., & Solimando, A.
(2002). Objects are individuals but stuff doesn’t
count: Perceived rigidity and cohesiveness
influence infants’ representations of small
groups of distinct entities. Cognition, 85, 223 - 250.
Kahneman, D., Treisman, A., & Gibbs, B. J. (1992).
The reviewing of object files: Object-specific
integration of information. Cognitive Psychology,
24, 174 - 219.
Kruschke, J. K., & Fragassi, M. M. (1996). The
perception of causality: Feature binding in
interacting objects. In Proceedings of the
Eighteenth Annual Conference of the Cognitive
Science Society (pp. 441 - 446). Hillsdale, NJ:
Erlbaum.
Mitroff, S. R., Cheries, E. W., Wynn, K., & Scholl,
B. J. (2005). Cohesion as a principle of object
persistence in infants and adults. Poster
presented at the annual meeting of the Vision
Sciences Society, 5/10/05, Sarasota, FL.
Mitroff, S. R., Scholl, B. J., & Noles, N. S. (under
review). Object files can be purely episodic.
Manuscript submitted for publication.
Mitroff, S. R., Scholl, B. J., & Wynn, K. (2004). Divide
and conquer: How object files adapt when a
persisting object splits into two. Psychological
Science, 15, 420 - 425.
Mitroff, S. R., Scholl, B. J., & Wynn, K. (2005). The
relationship between object files and conscious
perception. Cognition, 96(1), 67 - 92.
Noles, N. S., & Scholl, B. J. (2005). What’s in an
object file? Integral vs. separable features.
Poster presented at the annual meeting of the
Vision Sciences Society, 5/8/05, Sarasota, FL.
Noles, N. S., Scholl, B. J., & Mitroff, S. R. (2005).
The persistence of object-file representations.
Perception & Psychophysics, 67, 324 - 334.
Pinker, S. (1997). How the mind works. New York:
Norton.
Scholl, B. J., & Leslie, A. M. (1999). Explaining the
infant’s object concept: Beyond the
perception/cognition dichotomy. In E. Lepore
& Z. Pylyshyn (Eds.), What is cognitive science?
(pp. 26 - 73). Oxford: Blackwell.
Spelke, E. S. (1990). Principles of object
perception. Cognitive Science, 14, 29 - 56.
Spelke, E. S. (2000). Core knowledge. American
Psychologist, 55, 1233 - 1243.
vanMarle, K., & Scholl, B. J. (2003). Attentive
tracking of objects vs. substances. Psychological
Science, 14, 498 - 504.
[Figure 2 appears here: two panels, Merging Condition and Approach Condition. Each panel shows Congruent Trials and Incongruent Trials as a Preview Display, Linking Motion, and Probe Display over time, for three trial types: Top of Merge (or Approach), Bottom of Merge (or Approach), and Straight Motion.]
Figure 2. Depictions of Congruent-Match and Incongruent-Match trials for the six trial types (not to scale). The observers’ task was
to indicate as quickly as possible whether the final letter had appeared anywhere in the initial preview display on that trial. This
illustration simplifies the actual experiment in that there are no examples of the No-Match trials (in which the final probe letter was
not one of the preview letters), and that in the actual experiment, the topmost object traced the straight motion on half the trials.