Journal of Experimental Psychology:
Human Perception and Performance
2006, Vol. 32, No. 6, 1405–1421
Copyright 2006 by the American Psychological Association
0096-1523/06/$12.00 DOI: 10.1037/0096-1523.32.6.1405
Perspective Taking Promotes Action Understanding and Learning
Sandra C. Lozano, Bridgette Martin Hard, and Barbara Tversky
Stanford University
People often learn actions by watching others. The authors propose and test the hypothesis that
perspective taking promotes encoding a hierarchical representation of an actor’s goals and subgoals—a
key process for observational learning. Observers segmented videos of an object assembly task into
coarse and fine action units. They described what happened in each unit from either the actor’s, their own,
or another observer’s perspective and later performed the assembly task themselves. Participants who
described the task from the actor’s perspective encoded actions more hierarchically during observation
and learned the task better.
Keywords: perspective taking, observational learning, action understanding, intentional inference, hierarchical encoding
One way to learn how to do things is to watch others do them.
A first step in learning by watching is to infer the goals of the
actions being performed. Even small children can infer goals and
use those goals as the basis for imitating others’ behavior (e.g.,
Meltzoff, 1995). Inferring goals becomes more complex for real-world tasks that consist of long, interrelated sequences of actions,
such as making a cake or assembling a piece of furniture. To learn
such tasks, observers need to infer how actions are organized into
goal–subgoal hierarchies (Hard, Lozano, & Tversky, in press;
Whiten, 2002; Zacks, Tversky, & Iyer, 2001).
Here, we explore whether the ability to infer goal–subgoal
organization in action is influenced by perspective taking. We take
our notion of perspective taking from Galinsky, Ku, and Wang
(2005), who defined it as “the process of imagining the world from
another’s vantage point or imagining oneself in another’s shoes”
(p. 110). Put differently, taking another person’s perspective implies establishing overlap between one’s own mental representations and the mental representations of the other person (e.g.,
Davis, Conklin, Smith, & Luce, 1996; Galinsky & Moskowitz,
2000; Vorauer & Cameron, 2002).1 We report a series of studies
that test whether perspective taking promotes understanding of
goal–subgoal organization in observed behavior, thereby promoting observational learning. But first, we assemble the pieces of
evidence underlying our reasoning.
Sandra C. Lozano, Bridgette Martin Hard, and Barbara Tversky, Department of Psychology, Stanford University.
This research was supported by Office of Naval Research Grants N00014-PP-1-O649, N000140110717, and N000140210534 to Stanford University. We thank Jane Solovyeva, Herb Clark, Jonathan Winawer, and Angela Kessell for their helpful comments.
Correspondence concerning this article should be addressed to Sandra C. Lozano, Department of Psychology, Stanford University, Building 01-420, Jordan Hall, Stanford, CA 94305. E-mail: [email protected]
1 Perspective taking can take many forms, none of which are mutually exclusive. That is, perspective taking can involve self–other overlap in the form of emotional, social, mentalistic, behavioral, or motor representations or all of the above simultaneously.

Action Is Planned and Encoded as a Hierarchy of Goals and Subgoals

People plan actions hierarchically according to an overarching goal that is decomposed into subgoals that are, in turn, decomposed into even smaller subgoals (Newell & Simon, 1972). These plans are instantiated into hierarchically organized behaviors, like making a bed or even playing a violin (e.g., Lashley, 1951). Imitative behavior also shows evidence of hierarchical organization: When people and even other primates imitate others’ behavior, they do so in a way that suggests they have encoded that behavior as a hierarchy of goals and subgoals (e.g., Byrne & Russon, 1998; Travis, 1997; Whiten, 2002).
In fact, people encode hierarchical organization when observing
behavior in real time, even when they are not intending to learn
that behavior. Evidence for spontaneous hierarchical organization
comes from several sources, notably using a segmentation task, in
which people observe a video of goal-oriented behavior, pressing
a key to indicate when, in their judgment, one action is completed
and the next begins (Newtson, 1973). The action boundaries that
people identify are referred to as breakpoints. For a wide range of
goal-directed behaviors, people reliably segment units of action
corresponding to the completion of goals and subgoals by the actor
(Baldwin, Baird, Saylor, & Clark, 2001; Hard, Zacks, & Tversky,
2006; Newtson, 1973; Zacks, Tversky, & Iyer, 2001). Zacks,
Tversky, and Iyer (2001) found that when asked to segment action
sequences into coarse and fine units on separate viewings, observers select units that are hierarchically nested: the boundaries of
coarse units coincide with the boundaries of fine units well above
chance. When observers are asked to report what happens in each
coarse or fine unit as they segment, in many cases they even give
descriptions of how sets of fine units can be summarized into a
coarse unit (Hard, Lozano, & Tversky, in press).
The nature of these descriptions, combined with the consistency
of hierarchical organization within and across observers, has been
taken to reflect observers’ attempts to understand observed behavior by encoding its hierarchical organization. Other paradigms
(e.g., Hard, Tversky, & Lang, in press; Martin, 2006; Zacks,
Tversky, & Iyer, 2001), including studies of brain activation during passive viewing (Zacks, Braver, et al., 2001), corroborate these
claims. One possible benefit of hierarchical encoding in terms of
goals and subgoals is an action representation with the structure of
an action plan, which in turn facilitates performance of the action
sequence by observers. Supporting this possibility, the degree of
hierarchical organization in segmentation predicts accuracy of
observational learning (Hard, Lozano, & Tversky, in press).2
People Describe Action From the Actor’s Perspective
Although people naturally and frequently describe the world
from their own point of view (e.g., Hart & Moore, 1973; Levelt,
1989; Piaget & Inhelder, 1956; Shelton & McNamara, 1997), there
are notable exceptions. One of these exceptions is in describing
observed actions (Lozano, Hard, & Tversky, in press). For example, in a recent study, observers gave play-by-play reports of an
action sequence performed by an actor who faced them (Hard,
Lozano, & Tversky, in press). Close examination of these reports
revealed that when participants included specific spatial information, such as the locations of objects or which of the actor’s hands
performed an action, they did so using the actor’s spatial reference
frame rather than their own. For example, participants were more
likely to say “She puts the block on her left” than “She puts the
block on my right.” This suggests that when observers describe
actions, they put themselves in the actor’s shoes. There is
evidence that observers of action put themselves in the actor’s
shoes, not only in their descriptions of actions but also at the neural
level: Observing action activates many of the same brain mechanisms involved in planning and executing action (e.g., Grafton,
Arbib, Fadiga, & Rizzolatti, 1996; Iacoboni, 2005; Iacoboni et al.,
1999). As some have put it, observing others’ actions produces
motor simulation, in which an individual internally copies those
actions (Fadiga, Craighero, & Olivier, 2005; cf. Rizzolatti, Fadiga,
Fogassi, & Gallese, 1999). These findings have been taken to mean
that a component of understanding others’ actions is mapping them
to actions of the self.
Outside the domain of action understanding, increased self–
other overlap during perspective taking3 can have useful social
consequences, such as increasing feelings of liking, rapport, empathy, and sympathy toward others (e.g., Batson, 1991; Chartrand
& Bargh, 1999; C. M. Cheng & Chartrand, 2003). The self– other
overlap that takes place when people observe actions has been
proposed to promote action understanding, perhaps by facilitating
inferences about the goals and intentions of others (e.g., Arbib &
Rizzolatti, 1996; Rizzolatti & Arbib, 1998) or by generating predictions that guide the perception of ongoing behavior (Wilson &
Knoblich, 2005). As yet, there is little direct evidence supporting
these proposals, however.
There is some recent evidence that perspective taking is related
to action understanding, specifically to hierarchical encoding of
goal–subgoal structure. In a study described earlier, in which
participants provided play-by-play descriptions as they segmented
a video of an object assembly task, participants who spontaneously
described actions from the actor’s perspective both showed more
hierarchical organization in their segmentation and assembled the
object better (Hard, Lozano, & Tversky, in press). These findings
were unexpected and only showed a correlation between perspective taking and hierarchical encoding. Thus, it remains an open
question as to whether perspective taking leads to better action
understanding or whether the relationship works in the opposite
direction.
Testing the Role of Perspective Taking in Action
Understanding and Learning
Taking the actor’s perspective might allow better encoding of
the hierarchical organization of the actor’s intentions (i.e., the
goals and subgoals of the task). By promoting hierarchical encoding, perspective taking should also promote observational learning.
Together, these predictions form the hypothesis investigated here.
In a series of studies, observers segmented a video of an object
assembly task at coarse and fine levels, describing what happened
in each segment as they segmented. In the first study, half of the
observers were instructed to describe the action from their own
perspective, using their own body as a reference frame, and half
from the actor’s perspective, using the actor’s body as a reference
frame. As predicted, those who described actions from the actor’s
perspective encoded actions more hierarchically and learned them
better than those who described actions from their own perspective. A follow-up study confirmed that observers naturally describe
action from an actor’s perspective, leading them to hierarchically
encode and learn actions better than observers instructed to describe from a self perspective. But surprisingly, instructions to
describe from the actor’s perspective enhanced hierarchical encoding and learning above and beyond what people would do naturally. The third study ruled out the possibility that taking any perspective other than one’s own, rather than the actor’s perspective specifically, is what facilitates action understanding and performance.
Study 1A: Describing From a Self Perspective Versus
From an Actor’s Perspective
The first study tested whether describing actions from an actor’s
perspective improves hierarchical encoding and learning. Participants watched a video of a person assembling an object, a TV cart,
pressing a key to indicate when they thought one action segment
ended and another began. They did this twice, once for the largest
units that made sense and once for the smallest units that made
sense. As they segmented, they described what happened, either
from their own perspective (e.g., “She puts the board on my right”)
or from the perspective of the actor (e.g., “She puts the board on
her left”). After describing and segmenting the video twice, participants were asked to assemble the TV cart. Half of the participants had been told of the assembly task but half had not, so that
effects of task awareness on performance could also be evaluated.
2 Hierarchical encoding was operationalized as the proportion of coarse
unit boundaries that fell after their closest fine unit boundary, a measure
that correlated highly with hierarchical descriptions of the action sequence.
3 It remains an open question whether perspective taking is driven more by seeing the self in the other or by seeing the other in the self. Regardless of the directionality of self–other overlap, the downstream consequences
of perspective taking (e.g., liking, rapport, empathy, sympathy, etc.) seem
to be the same (for a discussion, see Galinsky et al., 2005).
Method
Participants and Design
Forty Stanford University undergraduates participated in exchange for
course credit. A 2 × 2 × 2 × 2 mixed factorial design was used.
Segmentation level (fine, coarse) was varied within participant, and assigned perspective (actor, self), segmentation order (coarse–fine, fine–
coarse), and awareness of the later assembly task (aware, unaware) were
varied between participants.
Stimuli and Materials
All participants viewed two videos, one for practice and one for test. The
practice video, used for practicing the segmentation procedure, was created
by Zacks, Tversky, and Iyer (2001) and showed a woman assembling a
saxophone. The test video involved a woman assembling a TV cart made
by Talon Systems (Ontario, Canada). The TV cart measured 17 in. × 25 in. × 21 in. (43.2 cm × 63.5 cm × 53.3 cm) in size and consisted of two
sideboards, a lower shelf, an upper shelf, a support board, pegs for
attaching the support board, screws, screwdriver, and wheels (see top of
Figure 1). The actor faced the camera during filming and performed
approximately equal numbers of actions with her left and right hand,
following the script described in Appendix A.
All videos were presented on a 21-in. (53.3-cm) flat-screen computer
monitor. Response times were recorded with a keyboard attached to a
Macintosh G4 computer, using a program written in PsyScope 1.2.5
(Cohen, MacWhinney, Flatt, & Provost, 1993). Verbal descriptions were
recorded with a handheld tape recorder, and TV cart assembly performances were recorded with a digital video camera.
Procedure
At the beginning of the study, participants received a brief introduction
to segmentation. They were told that to make sense of their experiences,
people break them down into events of varying sizes, some small, some
large. Participants were never explicitly informed that large and small
events could be hierarchically related, so that we could observe how and
when hierarchical encoding occurred spontaneously and as a function of
the experimental manipulations. Participants did receive everyday examples of both large and small events, but these events were in no way related
to each other. The exact instructions that participants received are in
Appendix B.
Following this introduction, participants were told that they would see
two videos of a person assembling an object. Their job was to divide these
videos into separate event units by pressing the spacebar every time one
meaningful action ended and another began. Each time they pressed the
spacebar, participants were instructed to briefly describe, from the perspective they had been randomly assigned (actor or self), what happened in
the segment they had just observed. To demonstrate how to do this, we
showed participants a still frame from the practice video that depicted a
woman placing a saxophone on a table. Participants were given examples
of how this picture could be described either from an actor perspective or
self perspective and were then reminded of which of the two perspectives
they should describe from.
To practice segmenting and describing actions, participants viewed a
practice video of saxophone assembly and were instructed to mark whatever units felt natural and meaningful to them. Participants then viewed and
segmented the test video, which was 6 min 35 s long and showed an actor
assembling the TV cart in the same room where participants were being
tested. Half of the participants were instructed to indicate the smallest units
that seemed natural and meaningful to them (fine– coarse); the other half
were instructed to indicate the largest units that seemed natural and
meaningful (coarse–fine). Participants then segmented the test video a
second time according to the opposite unit-size instructions. Viewing the
videos twice was necessary to create a measure of hierarchical encoding (Hard, Lozano, & Tversky, in press; Hard, Tversky, & Lang, in press; Zacks, Tversky, & Iyer, 2001). For both viewings, participants were reminded to describe all action units they indicated in terms of their assigned perspective. Participants in the aware condition were also told that after segmenting and describing the test video twice, they would have to assemble a TV cart themselves. The experimenter was not present during the segmentation task.
Figure 1. Still frames from the object assembly videos used in Studies 1A and 1B (top) and Study 2 (bottom). The top still frame shows the actor with the fully assembled TV cart. The bottom still frame shows the observer (right) and actor (left) with the fully assembled horses and heart.
After performing the segmentation task, participants were presented with
all assembly materials needed to build the TV cart. The assembly materials
were placed in a central, neutral position on the same table used by the
actor in the test video. Participants were instructed that they could perform
the task however they wanted; their task was simply to assemble the TV
cart as quickly and as accurately as possible. Participants received no
further instructions, and any suggestion of which perspective to use during
assembly was explicitly avoided. Their performance was videotaped from
the same visual angle as the test video, and the experimenter was not
present during assembly.
Results
Overview of Analyses
For all results in the present and subsequent studies, dependent
measures were submitted to a factorial analysis of variance
(ANOVA), with all independent variables (e.g., assigned perspective, awareness of the later assembly task, segmentation order) as
factors, unless otherwise indicated. For all analyses, an alpha level
of less than .05 was used as the criterion for significance. Nonsignificant effects are noted with a marking of ns. As estimates of
effect size, a partial eta squared value (ηp²) is reported for significant ANOVA effects, and a Cohen’s d is reported for significant
t test effects.
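For concreteness, a minimal sketch of this kind of analysis in Python is given below. It is illustrative only, not the analysis code used in the study; the file and column names (study1a.csv, errors, perspective, awareness, order) are assumptions made for the example, and partial eta squared is computed from the sums of squares in the ANOVA table.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data: one row per participant with assigned perspective,
# awareness of the assembly task, segmentation order, and assembly errors.
df = pd.read_csv("study1a.csv")  # assumed file and column names

# Fully crossed between-subjects ANOVA on assembly errors.
model = smf.ols("errors ~ C(perspective) * C(awareness) * C(order)", data=df).fit()
table = anova_lm(model, typ=2)

# Partial eta squared: SS_effect / (SS_effect + SS_error) for each effect.
ss_error = table.loc["Residual", "sum_sq"]
table["partial_eta_sq"] = table["sum_sq"] / (table["sum_sq"] + ss_error)
print(table.round(3))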
Does Perspective Affect Hierarchical Encoding?
Hierarchical encoding was evaluated in two ways. The first
measure, enclosure, assessed the hierarchical organization of participants’ segmentation pattern, using a technique developed by
Hard, Lozano, and Tversky (in press). Enclosure is defined as the
proportion of coarse breakpoints that fall after their closest fine
breakpoint in time. When this proportion is high, then most of the
coarse breakpoints take account of or enclose all of the relevant
fine units. If a coarse breakpoint precedes the breakpoint of the
final fine segment, it violates strict hierarchical encoding. In the
present study, most but not all of the coarse breakpoints in fact fell
after the closest fine breakpoint (M = 5.33, SEM = 0.97), instead of before it (M = 3.07, SEM = 0.76), paired t(39) = 4.07, d =
0.66, replicating previous findings by Hard, Lozano, and Tversky
(in press). Further rationale behind enclosure scores is provided in
Appendix C along with a more detailed explanation of how they
are calculated.
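To make the measure concrete, the sketch below computes the proportion of coarse breakpoints that fall at or after their closest fine breakpoint. It is a reconstruction from the definition given here, not the authors’ scoring code, and the handling of ties and endpoints is an assumption.

from bisect import bisect_left
from typing import Sequence

def enclosure_score(coarse: Sequence[float], fine: Sequence[float]) -> float:
    """Proportion of coarse breakpoints falling after their closest fine breakpoint.

    coarse, fine: breakpoint times (in seconds) from the coarse and fine
    segmentation passes. Tie and endpoint handling are assumptions.
    """
    fine = sorted(fine)
    after = 0
    for c in sorted(coarse):
        i = bisect_left(fine, c)
        neighbors = fine[max(i - 1, 0):i + 1]  # fine breakpoints just before and after c
        closest = min(neighbors, key=lambda t: abs(t - c))
        if c >= closest:  # coarse boundary follows (encloses) its nearest fine boundary
            after += 1
    return after / len(coarse)

# Example: two of three coarse boundaries land just after a fine boundary.
print(enclosure_score(coarse=[30.2, 61.5, 95.0],
                      fine=[10.0, 29.8, 45.0, 61.0, 80.0, 96.0]))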
The second measure of hierarchical encoding assessed participants’ descriptions for hierarchical structure. This second measure
provided a more transparent assessment of hierarchical encoding,
and it was also used to validate the enclosure measure. During fine
segmentation, although observers were instructed to identify fine-level actions, some of them offered summaries that grouped the
preceding set of fine-level actions (e.g., “She inserted the first
screw,” “She inserted the second screw,” etc.) into a coarse-level
action (e.g., “She attached the top shelf”). The number of verbal
summaries each participant gave during fine segmentation was
used as a verbal measure of hierarchical encoding. Replicating
findings of Hard, Lozano, and Tversky (in press), the number of
verbal summaries correlated positively with enclosure scores,
r(38) = .41, validating enclosure scores as a measure of hierarchical encoding.
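This validation step is a simple Pearson correlation. A sketch of the computation follows; the short arrays are hypothetical stand-ins for the 40 observers’ actual scores.

from scipy.stats import pearsonr

# Hypothetical per-participant values; the study had 40 observers.
enclosure_scores = [0.55, 0.72, 0.40, 0.81, 0.63]  # placeholder data
verbal_summaries = [1, 3, 0, 4, 2]                  # placeholder data

r, p = pearsonr(enclosure_scores, verbal_summaries)
print(f"r({len(enclosure_scores) - 2}) = {r:.2f}, p = {p:.3f}")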
By both measures of hierarchical encoding, describing from an
actor perspective was superior to describing from a self perspective. As the top of Figure 2 shows, participants who described from
an actor perspective had higher enclosure scores than self perspective participants, F(1, 32) = 11.53, MSE = 0.28, ηp² = .40, and they used more verbal summaries (Ms = 2.45 and 1.00, SEMs = 0.38 and 0.31), F(1, 32) = 10.92, MSE = 0.28, ηp² = .33.
Figure 2. Mean enclosure scores (top) and number of assembly errors (bottom), as a function of assigned perspective, in Study 1A. Error bars represent standard errors of means.
Awareness of the later assembly task had no effect on hierarchical encoding and no interactions with other variables. Segmentation order did affect hierarchical encoding, however. Replicating findings from Hard, Lozano, and Tversky (in press), participants who segmented in coarse–fine order had higher enclosure scores (M = 0.71, SEM = 0.05) than did participants who segmented in fine–coarse order (M = 0.59, SEM = 0.03), F(1, 32) = 4.77, MSE = 0.28, ηp² = .15. The influence of segmentation order on verbal summaries was less clear. There was a significant interaction between segmentation order and perspective instructions, F(1, 32) = 10.92, ηp² = .25. Participants using an actor perspective summarized more often if they segmented in coarse–fine order (M = 3.30, SEM = 0.30) than in fine–coarse order (M = 1.60, SEM = 0.31), but participants using a self perspective summarized more often if they segmented in fine–coarse order (M = 1.60, SEM = 0.20) rather than in coarse–fine order (M = 0.40, SEM = 0.10).
Does Perspective Affect Learning?
Videotapes of assembly performance were coded for errors and
assembly time. Errors were counted even if the participant later
corrected them. Errors could take three forms: attaching pieces in
the wrong order (e.g., building the entire TV cart before trying to
insert the middle support board), attaching pieces that should not
be connected to each other (e.g., attaching wheels to the top shelf),
or attaching a piece in the wrong orientation (e.g., attaching the top
shelf upside down). On average, participants made 2.20 errors
(SEM = 0.22) and completed the TV cart assembly in 10 min (SEM = 35.33 s). Assembly errors positively correlated with assembly time, r(38) = .58, suggesting that participants did not
sacrifice accuracy for speed.
If hierarchical encoding facilitates learning, then measures of
hierarchical encoding (enclosure scores and verbal summaries)
should predict learning. Confirming this prediction, enclosure
scores and verbal summaries predicted fewer assembly errors,
rs(38) = –.50 and –.72, respectively. If hierarchical encoding
facilitates learning, then factors that improve hierarchical encoding
should also improve learning. Confirming this prediction, participants who described the assembly video from an actor perspective
not only had better hierarchical encoding but also made half as
many assembly errors, as the bottom of Figure 2 shows, F(1, 32) = 16.65, MSE = 0.34, ηp² = .43.
Similarly, segmenting in coarse–fine order (M = 1.95, SEM = 0.38) improved hierarchical encoding and led to fewer assembly errors than segmenting in fine–coarse order (M = 2.50, SEM = 0.38), F(1, 32) = 4.00, MSE = 0.34, ηp² = .19. However, segmentation order interacted with perspective, F(1, 32) = 7.15, ηp² = .22. When participants used an actor perspective, assembly performance did not depend on segmentation order, coarse–fine (M = 1.70, SEM = 0.30) or fine–coarse (M = 1.30, SEM = 0.21). When participants used a self perspective, assembly performance was better for those who segmented in coarse–fine (M = 2.20, SEM = 0.35) rather than fine–coarse (M = 3.70, SEM = 0.44) order.
Does Hierarchical Encoding Mediate Effects of
Perspective on Learning?
Assigned perspective affected measures of hierarchical encoding and also later assembly performance. Did perspective affect
these two variables independently or did hierarchical encoding
mediate the effect of perspective on learning? To address this
question, we performed a mediation analysis using the mediation
techniques of Baron and Kenny (1986), shown in Figure 3.
According to Baron and Kenny (1986) and Kenny and Judd
(1986), several preconditions must be met to establish mediation.
First, the initial variable (assigned perspective) must predict both
the potential mediator (hierarchical encoding, as measured by
enclosure scores) and the outcome variable (assembly errors). As
described in the top of Figure 4, a linear regression analysis
confirmed that assigned perspective4 predicted hierarchical encoding, t(1) = 15.17, and assembly errors, t(1) = –5.39. Second, the
potential mediator (enclosure) must predict the outcome variable
(assembly errors), even when controlling for the initial variable
Figure 3. Baron and Kenny’s (1986) mediation technique. Standardized
path coefficients are represented by a, b, c, and c′, where a represents the
association between the independent variable (IV) and the mediator, b
represents the association between the mediator and the dependent variable
(DV; when the IV is also a predictor of the DV), c represents the association between the IV and the DV, and c′ represents the association between
the IV and the DV when controlling for the mediator.
Figure 4. The mediation analyses testing whether hierarchical encoding
mediated the relationship between assigned perspective and assembly
errors. Values have been substituted for the corresponding variables described in Figure 3. Top: Mediation analysis for Study 1A. Bottom:
Mediation analysis for Study 2.
(assigned perspective). Indeed, hierarchical encoding predicted
assembly errors when controlling for perspective, t(1) = –2.52. On the basis of these correlations, a Sobel test (Sobel, 1982) showed that mediation was significant (z = –2.48). Finally, to determine whether hierarchical encoding completely mediated the effect of perspective on assembly errors, we must show that the initial variable (assigned perspective) no longer predicted the outcome variable (assembly errors) when controlling for the mediator (enclosure). When we controlled for hierarchical encoding, the effect of assigned perspective on assembly errors was no longer significant, t(1) = –0.24, ns. Thus, hierarchical encoding fully mediated
the effects of assigned perspective on assembly errors.
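A sketch of these regression steps and the Sobel test in Python is given below. It illustrates the Baron and Kenny (1986) procedure rather than reproducing the authors’ script; the data file and column names (study1a.csv, perspective, enclosure, errors) are assumptions for the example.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: perspective dummy coded (0 = self, 1 = actor),
# enclosure = hierarchical encoding, errors = assembly errors.
df = pd.read_csv("study1a.csv")  # assumed file and column names

# Step 1: the initial variable must predict the mediator (path a) and the outcome (path c).
m_a = smf.ols("enclosure ~ perspective", data=df).fit()
m_c = smf.ols("errors ~ perspective", data=df).fit()

# Step 2: the mediator must predict the outcome controlling for the initial
# variable (path b); under full mediation the direct path c' shrinks toward zero.
m_b = smf.ols("errors ~ perspective + enclosure", data=df).fit()

a, se_a = m_a.params["perspective"], m_a.bse["perspective"]
b, se_b = m_b.params["enclosure"], m_b.bse["enclosure"]

# Sobel (1982) test of the indirect effect a * b.
sobel_z = (a * b) / np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
print("a =", round(a, 2), "b =", round(b, 2),
      "c =", round(m_c.params["perspective"], 2),
      "c' =", round(m_b.params["perspective"], 2),
      "Sobel z =", round(sobel_z, 2))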
Does Perspective at Encoding Affect Perspective During
Assembly?
All participants performed the assembly task in the same room
and at the same table where the actor was filmed. Thus, videotapes
of assembly performances could be coded for the spatial perspective participants adopted during assembly (see Figure 5 for an
example). Most participants adopted the same perspective during
assembly that they had described during segmentation: 95% of
participants who had described from an actor perspective assembled the TV cart by taking the actor’s perspective, meaning that
they built the TV cart in the same orientation and stood on the
same side of the table as the actor. Of the participants who had
described the video from a self perspective, 75% assembled the TV
cart from a self perspective, meaning that they oriented the TV cart
pieces in the opposite direction as the actor and stood on the
opposite (observer) side of the table, χ²(1, N = 20) = 23.41. This result confirms a link between assembly perspective and the way that participants perceived and encoded observed actions. Neither segmentation order nor awareness of the later assembly task had reliable effects on participants’ later assembly perspective.
4 In Study 1A, assigned perspective was dummy coded: self perspective = 0, actor perspective = 1.
Figure 5. Illustration of the actor from the video (top), participants assembling from the actor’s perspective (middle), and participants assembling from a self perspective (bottom).
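The association between description perspective and assembly perspective is a chi-square test on a 2 × 2 contingency table. The sketch below uses hypothetical counts chosen only to mirror the reported 95% and 75% figures; it is not the actual data.

import numpy as np
from scipy.stats import chi2_contingency

# Rows: assigned description perspective (actor, self);
# columns: perspective adopted at assembly (actor, self).
# Counts are hypothetical, chosen to mirror the reported percentages.
table = np.array([[19, 1],
                  [5, 15]])

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")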
A fourth of self perspective participants performed the assembly
task from an actor perspective. This raises the possibility that self
perspective participants performed the assembly task
poorly not because they encoded a self perspective per se but
because some of them lacked the insight to perform the assembly
task from the same perspective they encoded. As a first way of
addressing this concern, we compared the assembly performances
of self perspective participants who chose to assemble from an
actor perspective with those who chose to assemble from a self
perspective. If incompatibility between action encoding and later
action execution accounted for the poor assembly performance in
the self perspective condition, then those who assembled from an
incompatible perspective should have made more assembly errors.
Surprisingly, participants who described from a self perspective
but performed assembly from an actor perspective (M = 1.67, SEM = 1.20) outperformed those who maintained a self perspective (M = 2.95, SEM = 0.33), making slightly but not reliably fewer assembly errors, t(19) = 1.60, ns.
As a second way of addressing this concern, we reanalyzed the
assembly error data and excluded participants in the self perspective condition who chose an actor perspective for assembly as well
as the 1 actor perspective participant who performed assembly
from a self perspective. Even with these participants excluded,
participants who described from an actor perspective made fewer assembly errors (M = 1.44, SEM = 0.20) than those who described from a self perspective (M = 3.33, SEM = 0.46), F(1, 25) = 24.50, MSE = 27.52, ηp² = .50. Combined, these analyses
indicate that differences in assembly performance were not due to
participants in the self perspective condition choosing an incompatible perspective at assembly.
Why Is Describing From an Actor Perspective Beneficial?
The above analyses show that describing actions from the actor’s perspective instead of one’s own improves hierarchical encoding and imitative learning. How might encoding actions from
the actor’s perspective serve action understanding? Are there other
differences between actor perspective and self perspective describers that provide clues? In this section, we explore some
possibilities.
Did perspective affect attention to action in general? Perhaps
adopting the actor’s perspective facilitates hierarchical encoding
and learning because it simply focuses attention on action more
generally. To address this possibility, we compared verbal descriptions of actor perspective and self perspective describers for differences in the overall type of information encoded. This analysis
showed that descriptions in the two conditions were remarkably
similar.
Each participant’s descriptions were broken down into separate
clauses, and each clause was classified into three mutually exclusive categories: action statements, depiction statements, or comments. Action statements contained action verbs and described
movements of the actor or objects (e.g., “She moved to the left side
of the table” or “She inserted the screw”). Depiction statements
contained verbs of possession or state-of-being verbs and conveyed physical or structural characteristics of objects (e.g., “The
cart has four wheels” or “The screws are silver”). Comments were
any statements that were unrelated to the video itself (e.g., “Oops,
I forgot to hit the spacebar for that last action I described”). For
each participant, the percentage of statements of each type was
calculated, and participants in the two perspective conditions were
compared. On average, 84% of participants’ descriptions were
action statements, 9% were depiction statements, and 7% were
comments. Actor perspective and self perspective participants did
not differ in the mean percentages of action, depiction, or comment
statements that they made (from all three categories, highest t[38]
= 1.10, ns). Thus, the content of descriptions did not differ as a
function of perspective.
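One way to tabulate such content codes is sketched below; it is illustrative only, and the participant identifiers, clause labels, and data structure are assumptions rather than the authors’ coding procedure.

from collections import Counter

# Each participant's clauses, hand-coded as "action", "depiction", or "comment".
# The example lists are hypothetical.
coded_clauses = {
    "p01": ["action", "action", "depiction", "action", "comment"],
    "p02": ["action", "action", "action", "depiction"],
}

for pid, clauses in coded_clauses.items():
    counts = Counter(clauses)
    total = len(clauses)
    pcts = {cat: 100 * counts[cat] / total for cat in ("action", "depiction", "comment")}
    print(pid, {cat: round(pct, 1) for cat, pct in pcts.items()})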
Did perspective affect number of segmented units? To determine whether perspective instructions influenced how coarsely or
finely participants segmented the action sequence, we examined
the numbers of fine and coarse units that they identified. During
segmentation, participants identified approximately three times as
many fine units (M = 29.90, SEM = 2.84) as coarse units (M = 8.90, SEM = 1.88), paired t(39) = 10.54, d = 1.76. This ratio of
coarse to fine units is consistent with previous findings using
everyday activities (Zacks, Tversky, & Iyer, 2001) as well as
abstract action sequences (Hard, Tversky, & Lang, in press).
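A sketch of the paired comparison and its effect size follows; the unit counts are hypothetical, and computing Cohen's d from the difference scores is one common convention, assumed here rather than taken from the article.

import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-participant unit counts from the two segmentation passes.
fine_units = np.array([28, 35, 22, 41, 30, 27])
coarse_units = np.array([9, 12, 6, 14, 8, 7])

t, p = ttest_rel(fine_units, coarse_units)
diff = fine_units - coarse_units
d = diff.mean() / diff.std(ddof=1)  # Cohen's d for paired data (one convention)
print(f"t({len(diff) - 1}) = {t:.2f}, p = {p:.4f}, d = {d:.2f}")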
Perspective did not affect how many coarse units participants
identified, but it did influence the number of fine units. Participants who described from an actor perspective identified more fine
units (M = 35.10, SEM = 2.67) than participants who described from a self perspective (M = 24.70, SEM = 2.67), F(1, 32) = 4.30, MSE = 0.17, ηp² = .12. Participants who segmented in fine–coarse order identified more fine units (M = 36.30, SEM = 3.55) than participants who segmented in coarse–fine order (M = 23.50, SEM = 3.55), F(1, 32) = 6.52, ηp² = .17. No other effects
or interactions on the number of action units segmented were
significant.
Can these differences in number of segmented fine units explain
why describing from an actor perspective led to better hierarchical
encoding and learning? Perhaps describing from an actor perspective is a more stimulating task than describing from a self perspective, leading to more attention and thus to encoding a finer level of
detail? The results do not support this hypothesis, because numbers
of segmented fine units did not predict enclosure scores or assembly errors. Numbers of segmented coarse units and the ratio of
coarse to fine units were equally nonpredictive.
Did perspective affect encoding of spatial information? Did
perspective instructions affect how often participants described
spatial information or the kind of spatial relations they attended to?
To answer this question, we broke down participants’ descriptions
into separate clauses. Each clause was categorized as containing
perspective-relevant information from one of four mutually exclusive categories: no perspective (e.g., “She inserted the screw”),
neutral perspective (e.g., “She inserted the screw from the side”),
actor perspective (e.g., “She inserted the screw on her left”), or self
perspective (e.g., “She inserted the screw on my right”). The total
number of descriptions in each category was determined for each
participant.5 These analyses yielded several key insights.
First, participants followed instructions: Self perspective descriptions were more common in the self perspective condition
(M = 9.85 descriptions, SEM = 1.35) than in the actor perspective condition (M = 0.20, SEM = 0.20), t(38) = 7.07, d = 2.26. In contrast, actor perspective descriptions were more common in the actor perspective condition (M = 14.45, SEM = 1.82) than in the self perspective condition (M = 1.70, SEM = 0.45), t(38) = 6.79, d = 2.15. The top of Figure 6 illustrates these differences in
proportions of descriptions. Second, regardless of the perspective
participants were assigned to describe, they were equally likely to
describe spatial information. This was determined by comparing
the two groups on the mean number of statements describing any
spatial perspective, be it neutral, actor, or self (for self perspective
describers, M = 17.65, SEM = 1.94; for actor perspective describers, M = 23.00, SEM = 2.86), t(38) = –1.55, ns. This result
suggests that any differences in the two groups were due to the
specific perspective that participants encoded, not to how much
attention participants were paying to spatial information in general.
Third, as Figure 6 shows, participants in the self perspective
condition were more likely to describe from the wrong (i.e.,
unassigned) perspective than participants in the actor perspective
condition. Although it was common for self perspective participants to incorrectly use the actor’s perspective at least once (M ⫽
1.44, SEM ⫽ 0.41), actor perspective participants almost never
used a self perspective (M ⫽ 0.20, SEM ⫽ 0.13), t(38) ⫽ 2.26, d ⫽
3.67. Self perspective describers were also more likely to qualify
their perspective (M ⫽ 2.11, SEM ⫽ 0.72) than actor perspective
describers (M ⫽ 0.10, SEM ⫽ 0.10), t(38) ⫽ 2.89, d ⫽ 5.00. That
is, they were more likely to state the perspective they were using
(e.g., “She inserted the screw on the right, but that’s only if it’s my
right that we’re talking about”). Combined, these results suggest
that describing action from a self perspective might actually be
more difficult than describing action from the actor’s perspective.
Could the difficulty of describing action from a self perspective
explain why self perspective describers had difficulty hierarchically encoding and learning actions? Perhaps describing from a
self perspective was such a demanding task that it interfered with
action processing? Additionally, self perspective describers gave
less consistent descriptions from their assigned perspective. Perhaps this led to a “mixed-up” representation of the spatial relations in the task. The data argue against both of these possibilities. Mistakes in description perspective were uncorrelated with hierarchical encoding and with later assembly performance, highest r(38) = .10, ns.
Self perspective and actor perspective participants also differed in the kind of spatial information they were likely to describe. Participants were far more likely to describe which of the actor’s hands, left or right, was used to perform a given action if they described from the actor’s perspective (M = 1.35, SEM = 0.21) instead of their own (M = 0.25, SEM = 0.20), t(38) = –3.77, d = 1.21. Actor perspective (M = 13.30, SEM = 1.92) and self perspective (M = 11.30, SEM = 1.54) describers did not differ in their likelihood of describing locations in space, as in left or right sides of the table, t(38) = –0.81, ns.
It is important to note that the number of descriptions concerning which of the actor’s hands performed an action predicted better hierarchical encoding, as measured by enclosure, r(38) = .35, and by verbal summaries, r(38) = .42. References to the actor’s hands also predicted fewer later assembly errors, r(38) = –.47. The number of descriptions concerning locations on the table predicted neither measure of hierarchical encoding, for enclosure, r(38) = .01, ns, and for verbal summaries, r(38) = –.02, ns, nor assembly errors, r(38) = –.15, ns. Given that actor perspective and self perspective describers differed both in their tendency to describe hands and in hierarchical encoding and that descriptions of hands correlated with hierarchical encoding, did differences in the tendency to describe hands mediate effects of perspective on hierarchical encoding? In fact, references to hands fully mediated effects of perspective on enclosure scores (see the top of Figure 7). First, a linear regression analysis confirmed that assigned perspective predicted both enclosure scores, t(1) = 4.64, and hand references, t(1) = 3.77. Second, hand references predicted hierarchical encoding when controlling for assigned perspective, t(1) = 5.51. According to a Sobel test, mediation was significant (z = 3.12). Finally, controlling for hand references, the effect of assigned perspective on hierarchical encoding was no longer significant, t(1) = 0.81, ns.
In sum, these results indicate two potentially revealing differences between actor perspective and self perspective describers. First, describing actions from an actor perspective appears to be easier than describing actions from a self perspective. Second, describing from an actor perspective focuses observers’ attention on the actor’s body. This increased attention on the actor’s body appears to be beneficial, as it accounts for the fact that actor perspective describers had higher hierarchical encoding than self perspective describers.
5 When participants switched perspectives within a description (e.g., “She put a block on the left. I mean the right”), only the final utterance (e.g., “the right”) was used to determine the perspective coding for that description. This rule was applied so that participants who made perspective errors would not appear to have an inflated number of perspective descriptions. Descriptions of this type were coded as a perspective error, however.
Figure 6. Mean proportion of description references made from each of the four spatial perspective categories, as a function of assigned perspective, for Study 1A (top) and Study 2 (bottom).
Discussion
The results from the present study support the hypothesis that
adopting the actor’s perspective facilitates both action understanding and action learning. In the present study, participants who
described actions from the actor’s perspective, rather than from
their own perspective, encoded the action sequence more hierarchically and later performed the action sequence faster and with
fewer errors. This effect held whether participants were learning
the actions intentionally or incidentally. Furthermore, adopting the
actor’s perspective was beneficial to action learning precisely
because it encouraged observers to encode actions hierarchically:
Hierarchical encoding mediated effects of description perspective on action learning.
Figure 7. Mediation analysis showing that references to the actor’s hands mediated effects of assigned perspective on hierarchical encoding, as measured by enclosure scores. Values have been substituted for the corresponding variables described in Figure 3. Top: Mediation analysis for Study 1A. Bottom: Mediation analysis for Study 2.
Notably, describing from the actor’s perspective elicited detailed descriptions of the actor’s body, and, in particular, of which
hand the actor was using to perform the actions. The number of
descriptions of the actor’s hands predicted hierarchical encoding
and, according to a mediation analysis, accounted for why actor
perspective and self perspective describers differed in hierarchical
encoding. This finding suggests that focusing on the hands that
accomplish the task is associated with better encoding of the task’s
goal–subgoal structure. In sum, actively describing action from the
actor’s perspective provides an effective link from action perception to action execution, far more effective than describing from
one’s own perspective.
Remarkably, adopting the actor’s perspective appeared to be the
more natural way to describe action. Self perspective describers
were more likely than actor perspective describers to describe the
wrong perspective by mistake and to qualify the perspective they
were describing. Although these description mistakes and qualifications were uncorrelated with hierarchical encoding and action
learning, it remains possible that describing from a self perspective
is unnatural and therefore interferes with action understanding.
Alternatively, describing from an actor perspective might enhance
it. These possibilities are not mutually exclusive— both might be
true. Study 1B addressed these possibilities by comparing participants in the present study with participants who were not instructed to adopt a perspective. If describing from a self perspective interferes with action understanding, then explicitly describing
from a self perspective should lead to worse hierarchical encoding
and learning than describing freely. If describing from an actor
perspective enhances action understanding, then describing from
an actor perspective should lead to better hierarchical encoding
and learning than describing freely.
Study 1B: Describing Freely Versus Describing From a
Self Perspective or an Actor’s Perspective
Method
Ten Stanford undergraduates from the same population of introductory
psychology students used in Study 1A participated in exchange for course
credit. The stimuli, materials, and procedure were identical to those used in
Study 1A, except that participants were not given instructions to describe
from any spatial perspective, nor were they given examples of the different
perspectives one might use to describe actions. Because awareness of the
later assembly task had no effect on performance in Study 1A, all participants in the present group were unaware that they would be performing the
assembly task themselves. Study 1B participants were run within 2 weeks
after completion of Study 1A. The 10 participants run in the present
condition were then compared with the 20 unaware participants from Study
1A, yielding a 3 × 2 factorial design, with assigned perspective (actor, self, free) and segmentation order (coarse–fine, fine–coarse) as between-subjects factors.
Results
Segmentation order did not affect any of the dependent measures reported below, nor did it interact with any other factors.
Thus, all data were collapsed across segmentation order. When an
effect of assigned perspective was reliable, post hoc analyses using
Dunnett’s (two-sided) t tests compared the actor perspective and
self perspective conditions with the free-describe condition. A
summary of the findings, including means and standard errors, is
reported in Table 1.
Differences in Number of Segmented Units
During segmentation, participants identified approximately four
times as many fine units (M = 29.53, SEM = 2.60) as coarse units (M = 7.67, SEM = 0.67), paired t(29) = 11.55, d = 1.86. Description perspective did not affect how many coarse units or fine units participants identified, both Fs(2, 27) < 1, ns.
Differences in Hierarchical Encoding
How do self perspective and actor perspective describers compare with free describers in hierarchical encoding? An ANOVA,
with assigned perspective as a between-subjects factor, revealed a
reliable difference among the three conditions in hierarchical encoding, as indexed by both enclosure scores, F(2, 27) = 21.47, MSE = 0.02, ηp² = .61, and verbal summaries, F(2, 27) = 29.24, MSE = 0.77, ηp² = .68. Self perspective participants showed
impaired hierarchical encoding compared with free-describe participants, with reliably lower enclosure scores and fewer verbal
summaries. Actor perspective participants showed enhanced hierarchical encoding compared with free-describe participants, with
reliably higher enclosure scores and more verbal summaries.
Table 1
Effects of Description Perspective on Dependent Measures in Study 1B

                                   Self perspective        Free describe           Actor perspective
Dependent variable                 M          SE           M          SE           M          SE
Enclosure                          0.41       0.04         0.59       0.05         0.85       0.05
Summary statements                 0.40       0.22         1.70       0.26         3.40       0.34
Assembly errors                    4.10       0.46         2.40       0.16         0.90       0.27
Assembly perspective               20% actor, 80% self     80% actor, 20% self     100% actor, 0% self
Total fine units                   22.40      5.96         32.30      6.26         33.90      8.01
Total coarse units                 6.50       1.15         7.00       1.03         7.70       1.35
Actor-perspective statements       2.00       0.65         2.40       0.40         15.00      2.40
Self perspective statements        8.60       0.64         0.40       0.16         0.00       0.00
Left–right-side references         13.20      1.20         12.00      1.72         11.50      1.65
Left–right-hand references         0.25       0.10         0.20       0.10         1.48       0.22
Differences in Learning
Assigned perspective reliably affected the number of assembly
errors participants made, F(2, 27) = 24.54, MSE = 1.04, ηp² = .65.
For action learning, as for hierarchical encoding, self perspective
describers showed impaired performance compared with free describers, and actor perspective describers showed enhanced performance. Self perspective participants made reliably more errors
than free-describe participants, and actor perspective participants
made reliably fewer errors than free-describe participants. Across
the three conditions, participants made an average of 2.47 errors
(SEM = 0.30) and completed the TV cart assembly in 10.25 min (SEM = 4.20 s). Assembly errors positively correlated with assembly time, r(28) = .72, suggesting that participants did not
sacrifice accuracy for speed.
Differences in Description and Assembly Perspectives
Whose perspective did free describers take when describing the
assembly task and when performing the assembly task themselves?
As in Study 1A, participants’ descriptions were divided into four
mutually exclusive categories: no perspective, neutral perspective,
actor perspective, or self perspective. Within the free-describe
condition, participants described more often from an actor perspective than a self perspective (M ⫽ 2.40, SEM ⫽ 0.40 for
number of actor perspective statements vs. M ⫽ 0.40, SEM ⫽ 0.16
for number of self perspective statements), paired t(9) ⫽ – 6.00,
d ⫽ 1.67, replicating findings by Hard, Lozano, and Tversky (in
press). Also, the more actor perspective statements that freedescribe participants gave, the more hierarchically they encoded
observed actions, as indexed both by enclosure scores, r(8) ⫽ .71,
and numbers of verbal summaries provided, r(8) ⫽ .54. Adopting
the actor’s perspective was not only more frequent in descriptions
of actions, it was more frequent in assembly performance: Free
describers were more likely to perform the assembly task from the
actor’s perspective (80%) rather than their own (20%).
How often did free describers describe from actor perspectives
and self perspectives compared with participants assigned to those
perspectives? Assigned perspective reliably affected the number of
actor perspective descriptions participants gave, F(2, 27) = 23.92, MSE = 0.13, ηp² = .67 (see Table 1). Although free describers
tended to adopt the actor’s perspective, they did so reliably less
often than actor perspective describers and equally as often as self
perspective describers. Free describers were, however, less likely
to describe from a self perspective than self perspective describers,
t(18) = 4.62, d = 2.52. Actor perspective describers never adopted
a self perspective.6
Analysis of the kind of spatial information participants described showed no differences across the three groups in how often
they described locations in space, as in left or right sides of the
table, F(2, 27) = 1.02, ns. The three groups did differ in how often they described which of the actor’s hands, left or right, was used to perform a given action, F(2, 27) = 13.90, MSE = 1.05, ηp² = .30
(see Table 1). Actor perspective describers made more hand references than did free describers, but free describers did not differ
from self perspective participants in how many hand references
they made.
Discussion
The present study confirms that observers naturally describe
actions from the actor’s perspective and that their tendency to do
so predicts hierarchical encoding and learning. Furthermore, instructing participants to describe from their own perspective led to
poorer hierarchical encoding and learning than instructing participants to describe freely. Describing from a self perspective might
interfere with hierarchical encoding and learning because it is
incompatible with the way action is naturally described: Forcing
observers to describe in an unnatural way might be difficult and
thus compete with other processes, such as hierarchical encoding.
It is also possible that describing from a self perspective is incompatible with the way action is naturally understood. We explore
this possibility further in the General Discussion.
Although observers spontaneously adopted the actor’s spatial
perspective to describe actions, they showed enhanced hierarchical
encoding and learning when they were explicitly instructed to
adopt the actor’s perspective. This result supports the hypothesis
that adopting the actor’s perspective serves action understanding,
specifically inferences about goal–subgoal structure. This result
also has practical implications: Calling observers’ attention to the
actor’s perspective is a useful means of improving action understanding and learning.
This leads to a larger question: Why is adopting the actor’s
perspective beneficial? One possibility, which we return to in the
General Discussion, is that adopting the actor’s perspective encourages observers to simulate observed action—this increased
simulation helps them infer how actions are organized. But it is
also possible that differences in attention or motivation could
explain the superior performance of actor perspective describers
relative to free perspective and self perspective describers. Describing from the actor’s perspective might be more engaging than
freely describing or than describing from one’s own perspective,
leading to fewer description errors and to richer encoding of the
observed task. In other words, it may be that describing from any
perspective other than one’s own, not necessarily from the actor’s
perspective, would improve hierarchical encoding and learning.
Study 2 examined whether action understanding and learning
are improved by adopting any perspective other than one’s own or
by adopting the actor’s perspective specifically. Study 2 addressed
this question by showing participants a video portraying an actor
and an observer, both of whom were rotated 90° from the participant viewing the video. Participants described actions from the
perspective of either the actor or the observer in the video and later
executed the action sequence themselves. If adopting any perspective other than one’s own is sufficient to improve hierarchical
encoding and learning, then the two groups should perform equivalently. A second aim of Study 2 was to generalize several findings
from Studies 1A and 1B to a different assembly task.
6 The mean number of self perspective descriptions for actor perspective
describers was not submitted to an ANOVA because it had no variance and
therefore violated the normality assumption.
Study 2: Describing From the Perspective of the Actor Versus From the Perspective of an Observer in the Scene

Method

Participants and Design

Sixteen Stanford University undergraduates participated in exchange for course credit. A 2 × 2 × 2 × 2 mixed factorial design was used. Segmentation level (fine, coarse) was varied within participant, and assigned perspective (actor, observer), segmentation order (coarse–fine, fine–coarse), and actor position (left or right) were varied between participants.

Stimuli and Materials
As in Studies 1A and 1B, participants viewed one practice video and one
test video. The practice video showed a female observer watching a female
actor make coffee. The test video contained the same observer and actor
but showed the actor assembling two horses and a heart using red, yellow,
green, and blue DUPLO blocks made by LEGO (Enfield, CT; see Appendix A for a detailed script of assembly). The test video was 3 min 28 s long.
In both videos, the observer and actor were 180° opposite each other and
at a 90° angle from the camera (see the bottom of Figure 1). Two versions
of the test video were created, each shown to half of the participants: The
actor was on the left side of the table in one video and on the right side of
the table in the other.
Procedure
Prior to testing, participants completed the Vandenberg Mental Rotation
Test, a measure of spatial ability (Vandenberg & Kuse, 1978). Aside from
this test, the procedure was identical to that of Study 1A, except that when
describing each video, participants were instructed to adopt a perspective
that was offset by 90° from their viewing perspective (see Appendix B for
the exact instructions that participants received). Participants were randomly assigned to a perspective, actor or observer, and were instructed to
describe all units that they segmented from that perspective. After performing the segmentation task, participants received the same instructions
for the assembly task used in Studies 1A and 1B.
Results and Discussion
Does Perspective Affect Hierarchical Encoding?
As in Studies 1A and 1B, hierarchical encoding was evaluated by both segmentation patterns (enclosure scores) and descriptions (verbal summaries). As before, these measures were correlated, r(14) = .57. As the top of Figure 8 shows, participants describing from an actor perspective (M = 1.63, SEM = 0.32) encoded actions more hierarchically than participants describing from an observer perspective (M = 0.13, SEM = 0.13), according to enclosure scores, F(1, 8) = 8.31, MSE = 0.17, ηp² = .38, and to verbal summaries, F(1, 8) = 19.64, MSE = 0.17, ηp² = .44. No other effects or interactions were reliable. Thus, it is adopting the actor’s perspective specifically, and not any other perspective, that enhances hierarchical encoding.
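Each F test above is accompanied by partial eta-squared (ηp²). As a point of reference, this effect size is conventionally computed from the ANOVA sums of squares as

```latex
% Conventional definition of partial eta-squared from ANOVA sums of squares
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```

so the values of .38 and .44 reported here indicate that assigned perspective accounted for roughly 38% and 44% of the effect-plus-error variance in the two measures.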
Figure 8. Mean enclosure scores (top) and number of assembly errors (bottom), as a function of assigned perspective, in Study 2. Error bars represent standard errors of means.

Does Perspective Affect Learning?

Videotapes of assembly performance were coded for errors (e.g., attaching a block of the wrong size or color to another block) and assembly time. On average, participants made 7.80 errors (SEM = 1.78) and completed the assembly task in 8.80 min (SEM = 105.00 s). Assembly errors positively correlated with assembly time, r(14) = .60, suggesting that there was no speed–accuracy trade-off in performance.

Consistent with findings from Study 1A, participants who described the actions from an actor perspective performed the task better than those who described from an observer perspective. As the bottom of Figure 8 shows, participants who described from an observer perspective made about four times as many assembly errors as those who described from an actor perspective, F(1, 8) = 13.96, MSE = 0.22, ηp² = .48. Participants who segmented in fine–coarse order made over twice as many errors (M = 11.13, SEM = 2.83) as participants who segmented in coarse–fine order (M = 4.50, SEM = 1.58), F(1, 8) = 7.78, ηp² = .27. No other effects or interactions were reliable.

Spatial ability, as measured by Mental Rotation Test scores, did not predict assembly time or errors. As in Studies 1A and 1B, higher enclosure scores predicted fewer errors on the later assembly task, r(14) = –.50. Similarly, participants who used more verbal summaries made fewer assembly errors, r(14) = –.60.

Does Hierarchical Encoding Mediate Effects of Perspective on Learning?

Can the effects of perspective on assembly errors be explained by changes in hierarchical encoding? To answer this question, we again conducted a mediation analysis using the techniques of Baron and Kenny (1986). As described in the bottom of Figure 4,
linear regression confirmed that assigned perspective (see Footnote 7) reliably predicted assembly errors, t(1) = –3.13, and hierarchical encoding, as measured by enclosure scores, t(1) = 3.44. Hierarchical encoding also predicted assembly errors when controlling for assigned perspective, t(1) = –4.29. A Sobel test confirmed that significant mediation had occurred (z = –2.39). When we controlled for hierarchical encoding, the effect of assigned perspective on assembly errors was no longer significant, t(1) = –0.59. Thus, hierarchical encoding fully mediated the effects of assigned perspective on assembly errors.
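To make the logic of this test concrete, the sketch below illustrates the Baron and Kenny (1986) regression steps and the Sobel (1982) z statistic in Python. The data are simulated purely for illustration; the statsmodels library, the variable names, and the generated values are assumptions, not the authors’ analysis code. Perspective is dummy coded 0/1 as in Footnote 7.

```python
# A minimal sketch of the Baron and Kenny (1986) mediation steps with a Sobel (1982)
# test, on simulated data for illustration only (not the authors' data or code).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
perspective = np.repeat([0, 1], 8)                               # 0 = observer, 1 = actor
enclosure = 0.2 + 0.4 * perspective + rng.normal(0, 0.15, 16)    # hypothetical mediator
errors = 12.0 - 10.0 * enclosure + rng.normal(0, 2.0, 16)        # hypothetical outcome

# Step 1: the predictor (perspective) should predict the outcome (assembly errors).
c_path = sm.OLS(errors, sm.add_constant(perspective)).fit()
# Step 2: the predictor should predict the mediator (enclosure); its slope is a.
a_path = sm.OLS(enclosure, sm.add_constant(perspective)).fit()
# Step 3: the mediator should predict the outcome controlling for the predictor (slope b),
# while the direct effect of the predictor shrinks toward zero.
both = sm.add_constant(np.column_stack([perspective, enclosure]))
b_path = sm.OLS(errors, both).fit()

a, se_a = a_path.params[1], a_path.bse[1]
b, se_b = b_path.params[2], b_path.bse[2]
sobel_z = (a * b) / np.sqrt(b**2 * se_a**2 + a**2 * se_b**2)     # Sobel test of the indirect path
print(f"total effect = {c_path.params[1]:.2f}, direct effect = {b_path.params[1]:.2f}, "
      f"Sobel z = {sobel_z:.2f}")
```

Full mediation corresponds to the direct effect becoming nonsignificant once the mediator is included, which is the pattern reported above.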
Does Perspective at Encoding Affect Perspective During
Assembly?
As in Study 1A, assembly perspective was consistent with encoding perspective. Of the participants who had described from an actor perspective, 100% performed the LEGO assembly task by taking that same (actor’s) perspective, meaning that they oriented the blocks as they had appeared to the actor in the video and stood on the same side of the table as the actor. Of participants who described the video from an observer perspective, 63% assembled from the observer’s perspective, meaning that they oriented the blocks as they had appeared to the observer in the video and stood on the observer’s side of the table, χ²(1, N = 8) = 9.29.
Notably, the actor’s perspective was the “preferred” perspective overall and was associated with better assembly performance: Participants in the observer perspective condition who performed assembly from the actor’s perspective made slightly fewer errors (M = 8.33, SEM = 1.15) than those who maintained an observer perspective during assembly (M = 12.25, SEM = 2.45). Furthermore, when we reanalyzed the assembly error data and excluded those who described from an observer perspective but chose an actor perspective for assembly, participants who described from an actor perspective still made fewer assembly errors (M = 3.38, SEM = 1.41) than those who described from an observer perspective (M = 8.08, SEM = 3.61), F(1, 11) = 11.43, MSE = 33.92, ηp² = .51. Collectively, these analyses indicate that differences in assembly performance were not attributable to observer perspective describers’ choosing an incompatible perspective at assembly.
Does Perspective Affect Attention to Action in General?
As in Study 1A, all descriptions were categorized as action, depiction, or comment statements. On average, 85% of participants’ descriptions were action statements, 12% were depiction statements, and 3% were comments. Consistent with findings from Study 1A, there were no differences between actor perspective and observer perspective participants in the mean percentages of action, depiction, or comment statements that they used (for all three statement types, highest t[14] = 1.18, ns).
Does Perspective Affect Number of Segmented Units?

Participants identified approximately four times as many fine units (M = 36.94, SEM = 4.92) as coarse units (M = 8.69, SEM = 1.06), paired t(15) = 8.20, d = 5.92. This ratio of fine to coarse units was slightly larger than that found in Study 1A but is equivalent to that found in previous research using this same assembly task (Hard, Lozano, & Tversky, in press). In contrast to Study 1A, actor (M = 36.25, SEM = 4.86) and observer (M = 37.63, SEM = 5.10) perspective participants did not differ in the number of fine units segmented. There were no effects of segmentation order on the number of segmented units (M = 36.94, SEM = 4.94 for fine–coarse vs. M = 34.35, SEM = 4.85 for coarse–fine). This difference from Study 1A is likely attributable to the task differences associated with assembling a TV cart versus assembling LEGO creations. Neither the total number of coarse units segmented, the total number of fine units segmented, nor the ratio of coarse to fine units segmented reliably predicted assembly errors.
Did Perspective Affect Encoding of Spatial Information?
As in Study 1A, descriptions were also categorized as no perspective, neutral perspective, actor perspective, or observer perspective. The results of this coding can be seen in the bottom of Figure 6. As in Study 1A, participants followed instructions: The only observer perspective descriptions were given by participants in the observer perspective condition (M = 9.00, SEM = 2.98). The only actor perspective descriptions were given by participants in the actor perspective condition (M = 8.38, SEM = 2.47). Furthermore, the mean number of spatial descriptions (descriptions that coded any perspective) did not differ between actor perspective (M = 22.25, SEM = 6.13) and observer perspective (M = 21.13, SEM = 4.39) participants, t(14) = 0.15, ns. Once again, later differences in assembly performance were due to the spatial perspective from which participants encoded actions, not to differences in attention to space more generally.
Similar to Study 1A, observer perspective participants had more difficulty maintaining their assigned perspective: 6 of the 8 observer perspective participants made description errors (M = 1.50, SEM = 0.87), whereas none of the 8 actor perspective participants did, Yates-corrected χ²(1, N = 16) = 6.67. Furthermore, 5 of the 8 observer perspective participants qualified their perspective (M = 0.63, SEM = 0.26), whereas none of the 8 actor perspective participants did, Yates-corrected χ²(1, N = 16) = 4.65 (see Footnote 8).
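To illustrate how such a test is run, the sketch below computes a Yates-corrected chi-square on a 2 × 2 table of counts that simply restates the pattern reported above (6 of 8 observer perspective vs. 0 of 8 actor perspective participants making description errors). The use of scipy is an assumption here, not a description of the authors’ software.

```python
# A minimal sketch of a Yates-corrected chi-square test on a 2 x 2 table of counts,
# here the description-error counts reported above; correction=True applies Yates's
# continuity correction.
from scipy.stats import chi2_contingency

counts = [[6, 2],   # observer perspective: made errors, made no errors
          [0, 8]]   # actor perspective:    made errors, made no errors
chi2, p, dof, expected = chi2_contingency(counts, correction=True)
print(f"Yates-corrected chi2({dof}) = {chi2:.2f}, p = {p:.3f}")  # chi2(1) = 6.67 for these counts
```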
Also similar to Study 1A, actor perspective participants (M = 3.00, SEM = 0.82) focused more on the actor’s body than observer perspective participants (M = 0.75, SEM = 0.37), giving more descriptions that indicated which hand, left or right, had performed an action, t(14) = 2.50, d = 1.25. Actor perspective (M = 5.38, SEM = 2.29) and observer perspective (M = 8.25, SEM = 2.74) participants did not differ in the number of times they described left or right locations on the table, t(14) = –0.81, ns. As in Study 1A, the number of references to the actor’s left or right hand predicted better hierarchical encoding, as measured by enclosure, r(14) = .91, and by verbal summaries, r(14) = .64. The number of descriptions of the actor’s hands also negatively correlated with later assembly errors, r(14) = –.69. Descriptions concerning left and right locations on the table did not predict hierarchical encoding and did not correlate with assembly performance.
Footnote 7. In Study 2, assigned perspective was dummy coded: observer perspective = 0, actor perspective = 1.

Footnote 8. Yates-corrected chi-square tests were adopted here instead of t tests because actor perspective participants made no perspective errors and never qualified their perspective, resulting in a violation of the normality assumption.
Finally, just as in Study 1A, references to hands fully mediated the effects of perspective on enclosure scores (see the bottom of Figure 7). A linear regression analysis confirmed that assigned perspective predicted both enclosure scores, t(1) = 2.73, and hand references, t(1) = 2.50. Hand references predicted enclosure scores when controlling for assigned perspective, t(1) = 6.18, and, according to a Sobel test, mediation was significant (z = 2.50). When we controlled for hand references, the effect of assigned perspective on enclosure scores was no longer significant, t(1) = 0.91, ns. Thus, references to hands fully mediated the effects of assigned perspective on hierarchical encoding.
General Discussion
Learning new skills through observation is not automatic, or
everyone would be expert skiers, dancers, and tennis players.
Nevertheless, people do acquire a wide range of complex skills
through observation. This simple fact suggests that people are
adept at translating tasks they see into tasks they can do. An
important component of this translation appears to be segmenting
and organizing an observed task into a hierarchical representation
of goals and subgoals—a representation that can be implemented
as an action plan (Hard, Lozano, & Tversky, in press; Zacks,
Tversky, & Iyer, 2001). Here, we have proposed that taking the
perspective of the actor while observing action facilitates hierarchical encoding of action and thus promotes action learning.
The studies reported here support that hypothesis. In one study,
participants observed and segmented an object assembly task
while giving a verbal play-by-play of the actions from the actor’s
or their own perspective. Describing actions from the actor’s
perspective instead of from their own led to better encoding of the
hierarchical goal–subgoal organization of those actions and better
subsequent performance of those actions. A follow-up to this study
showed that explicitly describing from an actor’s perspective was
superior for action understanding and learning relative to freely
describing, and both were superior to explicitly describing from a
self perspective. A final study showed that adopting just any perspective other than one’s own is not sufficient: It is adopting the actor’s perspective specifically that promotes hierarchical encoding and learning.
What are observers doing in the present studies when they are
“taking the actor’s perspective”? There are a number of possibilities that are not mutually exclusive. It may be that observers are
simply engaging in visuospatial perspective taking—imagining
where objects are located in space, relative to the actor. It may be
that observers are engaging in mentalistic perspective taking—
imagining what their own goals and subgoals would be if they
were executing the observed task themselves. Finally, it may be
that observers are engaging in motoric perspective taking—mapping observed actions onto a representation of their own body.
Although all of these possibilities might be true, the data do seem
to strongly support the idea that observers are engaging in motoric
perspective taking or simulation: When participants described
from the actor’s perspective, they spontaneously described which
hand was performing certain actions. This tendency to describe
which hand performed an action was associated with better hierarchical encoding and in fact seemed to explain why
actor perspective describers encoded action more hierarchically
than self perspective or observer perspective describers. In contrast, descriptions of the location of an object in space from the
actor’s perspective were not associated with hierarchical encoding.
The fact that observers spontaneously described which of the
actor’s hands performed an action when describing from the actor’s perspective is consistent with findings that motor simulation
occurs as if observers are mapping observed actions anatomically
to their own bodies. In one demonstration of this fact, observers
viewed simple actions, such as moving toward a red dot, performed by another person’s left or right hand. When observers
watched left-hand actions, motor evoked potentials were larger in
observers’ left hands, whereas when observers watched right-hand
actions, motor evoked potentials were larger in observers’ right
hands (Aziz-Zadeh, Maeda, Zaidel, Mazziotta, & Iacoboni, 2002).
Similar evidence for an anatomical mapping has been found when
people observe actions performed by the feet (Y. W. Cheng,
Tzeng, Hung, Decety, & Hsieh, 2005).
How might motor simulation explain findings from the present
studies? When people try to understand actions, some form of
motor simulation might be automatic, such that observers implicitly relate the actor’s body to their own. This view predicts that
describing actions from the actor’s perspective, especially which
hand is performing those actions, should be natural and easy. In
contrast, describing actions from one’s own or another observer’s perspective should be more difficult and might impair hierarchical encoding by competing directly with this simulation process.
Relative to freely describing, describing actions from a self perspective impairs action understanding, whereas describing actions from the actor’s perspective enhances it. This could mean that motor simulation can be
enhanced by encouraging observers to put themselves in the actor’s shoes. Consistent with this view, instructing participants to
explicitly adopt the actor’s perspective led to more descriptions
about the actor’s hands than instructing participants to describe
freely. Alternatively, encouraging observers to take the actor’s
perspective might change the way they use their motor simulations
for understanding observed actions and their organization (cf.
Barsalou, 1999, 2003; Wilson & Knoblich, 2005). Although motor
simulation might account for the present findings, it remains to be
seen whether describing actions from the actor’s perspective actually engages neural structures involved in planning and executing
actions. Future studies using transcranial magnetic stimulation or
functional magnetic resonance imaging methods could provide
valuable insight into the nature of the perspective-taking processes
observed in the present research.
The powerful links shown here among perspective taking, action
understanding, and action learning thus raise many questions. For
example, do benefits of adopting the actor’s perspective depend on
verbalizing that perspective, or are there nonverbal means of
perspective taking that are equally beneficial? Can taking an
actor’s perspective enhance understanding and performance of
other actions, in particular, the actions that are at the core of
effective social behavior? Also, what really happens when observers begin to think about space and actions performed in that space
from an actor’s point of view? The present data open the intriguing
possibility that spatial perspective taking provides a window into
the actor’s mind, giving observers insight into an actor’s goals,
intentions, and future behaviors.
References
Arbib, M. A., & Rizzolatti, G. (1996). Neural expectations: A possible evolutionary path from manual skills to language. Communication and Cognition, 29, 393–424.

Aziz-Zadeh, L., Maeda, F., Zaidel, E., Mazziotta, J., & Iacoboni, M. (2002). Lateralization in motor facilitation during action observation: A TMS study. Experimental Brain Research, 144, 127–131.

Baldwin, D. A., Baird, J. A., Saylor, M. M., & Clark, M. A. (2001). Infants parse dynamic action. Child Development, 72, 708–717.

Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.

Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22, 577–609.

Barsalou, L. W. (2003). Situated simulation in the human conceptual system. Language and Cognitive Processes, 18, 513–562.

Batson, C. D. (1991). The altruism question: Toward a social–psychological answer. Hillsdale, NJ: Erlbaum.

Byrne, R. W., & Russon, A. E. (1998). Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, 21, 667–709.

Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception–behavior link and social interaction. Journal of Personality and Social Psychology, 76, 893–910.

Cheng, C. M., & Chartrand, T. L. (2003). Self-monitoring without awareness: Using mimicry as a nonconscious affiliation strategy. Journal of Personality and Social Psychology, 85, 1170–1179.

Cheng, Y. W., Tzeng, O. J. L., Hung, D., Decety, J., & Hsieh, J. C. (2005). Modulation of spinal excitability during observation of bipedal locomotion. NeuroReport, 16, 1711–1714.

Cohen, J. D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: An interactive graphic system for designing and controlling experiments in the psychology laboratory using Macintosh computers. Behavior Research Methods, Instruments & Computers, 25, 257–271.

Davis, M. H., Conklin, L., Smith, A., & Luce, C. (1996). Effect of perspective taking on the cognitive representation of persons: A merging of self and other. Journal of Personality and Social Psychology, 70, 713–726.

Fadiga, L., Craighero, L., & Olivier, E. (2005). Human motor cortex excitability during the perception of others’ action. Current Opinion in Neurobiology, 15, 213–218.

Galinsky, A. D., Ku, G., & Wang, C. S. (2005). Perspective-taking and self–other overlap: Fostering social bonds and facilitating social coordination. Group Processes and Intergroup Relations, 8, 109–124.

Galinsky, A. D., & Moskowitz, G. B. (2000). Perspective-taking: Decreasing stereotype expression, stereotype accessibility, and in-group favoritism. Journal of Personality and Social Psychology, 78, 708–724.

Grafton, S. T., Arbib, M. A., Fadiga, L., & Rizzolatti, G. (1996). Localization of grasp representations in humans by positron emission tomography: II. Observation compared with imagination. Experimental Brain Research, 112, 103–111.

Hard, B. M., Lozano, S. C., & Tversky, B. (in press). Hierarchical encoding of behavior: Translating perception into action. Journal of Experimental Psychology: General.

Hard, B. M., Tversky, B., & Lang, D. (in press). Segmenting abstract events: Building event schemas. Memory and Cognition.

Hard, B. M., Zacks, J. M., & Tversky, B. (2006). Inferring structure in behavior: The role of goals and language. Unpublished manuscript, Stanford University and Washington University.

Hart, R. A., & Moore, G. T. (1973). The development of spatial cognition. In R. M. Downs & D. Stea (Eds.), Image and environment (pp. 246–288). Chicago: Aldine.

Iacoboni, M. (2005). Understanding others: Imitation, language, empathy. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From neuroscience to social science (Vol. 1, pp. 77–100). Cambridge, MA: MIT Press.

Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999, December). Cortical mechanisms of human imitation. Science, 286, 2526–2528.

Kenny, D. A., & Judd, C. M. (1986). Consequences of violating the independence assumption in analysis of variance. Psychological Bulletin, 99, 422–431.

Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior: The Hixon Symposium (pp. 112–146). Oxford, England: Wiley.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.

Lozano, S. C., Hard, B. M., & Tversky, B. (in press). Putting action in perspective. Cognition.

Martin, B. A. (2006). Reading the language of action: Hierarchical encoding of behavior. Unpublished doctoral dissertation, Stanford University.

Meltzoff, A. N. (1995). Understanding the intentions of others: Re-enactment of intended acts by 18-month-old children. Developmental Psychology, 31, 838–850.

Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.

Newtson, D. (1973). Attribution and the unit of perception of ongoing behavior. Journal of Personality and Social Psychology, 28, 28–38.

Piaget, J., & Inhelder, B. (1956). The child’s conception of space. London: Routledge and Kegan Paul.

Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neuroscience, 21, 188–194.

Rizzolatti, G., Fadiga, L., Fogassi, L., & Gallese, V. (1999). Resonance behaviors and mirror neurons. Archives Italiennes de Biologie, 137, 85–100.

Shelton, A. L., & McNamara, T. P. (1997). Multiple views of spatial memory. Psychonomic Bulletin and Review, 4, 102–106.

Sobel, M. E. (1982). Asymptotic intervals for indirect effects in structural equations models. In S. Leinhart (Ed.), Sociological methodology (pp. 290–312). San Francisco: Jossey-Bass.

Travis, L. L. (1997). Goal-based organization of event memory in toddlers. In P. W. van den Broek, P. J. Bauer, & T. Bourg (Eds.), Developmental spans in event comprehension and representation: Bridging fictional and actual events (pp. 111–138). Mahwah, NJ: Erlbaum.

Vandenberg, S. G., & Kuse, A. R. (1978). Mental rotations, a group test of three-dimensional spatial visualization. Perceptual and Motor Skills, 47, 599–604.

Vorauer, J. D., & Cameron, J. J. (2002). So close, and yet so far: Does collectivism foster transparency overestimation? Journal of Personality and Social Psychology, 83, 1344–1352.

Whiten, A. (2002). The imitator’s representation of the imitated: Ape and child. In A. N. Meltzoff & W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases (pp. 98–121). Cambridge, England: Cambridge University Press.

Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131, 460–473.

Zacks, J. M., Braver, T. S., Sheridan, M. A., Donaldson, D. I., Snyder, A. Z., Ollinger, J. M., et al. (2001). Human brain activity time-locked to perceptual event boundaries. Nature Neuroscience, 4, 651–655.

Zacks, J. M., Tversky, B., & Iyer, G. (2001). Perceiving, remembering, and communicating structure in events. Journal of Experimental Psychology: General, 130, 29–58.
Appendix A
Test Video Scripts
The following script describes the steps followed by the actor in
the TV cart assembly test video. All spatial locations mentioned in
the script are described relative to the actor. Steps listed in boldface italic font correspond to higher level actions.
1. The actor places four pegs in a line at the upper left corner of
the table, using both hands.
2. The actor places screws in a line below the pegs, using both
hands.
3. The actor places four wheels in a line below the screws,
using both hands.
4. The actor places a screwdriver below the wheels, using both
hands.
5. The actor places the one sideboard on the lower right corner
of the table and then stacks the other sideboard on top of the first
one, using both hands.
6. The actor places the support board above the stacked sideboards, using both hands.
7. The actor places the bottom shelf above the support board,
using both hands.
8. The actor stacks the top shelf on top of the bottom shelf,
using both hands.
9. The actor picks up the top shelf, flips it upside down, and
positions it in the center of the table, using both hands.
10. The actor has now finished organizing the parts on the
table.
11. The actor picks up the first sideboard and positions it
upright and perpendicular to the top shelf on the left side, using
both hands.
12. The actor picks up a screw and inserts it in the upper left
corner of the first sideboard and top shelf, using her left hand.
13. The actor picks up another screw and inserts it in the lower
left corner of the first sideboard and top shelf, using her left hand.
14. The actor picks up the screwdriver and screws in the
screws she just inserted in the first sideboard and top shelf,
starting with the upper left screw and then moving to the lower
left screw, using her left hand.
15. The actor has now finished attaching the first sideboard.
16. The actor picks up two pegs and inserts them inside of the
first sideboard, using her left hand.
17. The actor picks up the support board and attaches it to the
pegs on the inside of the first sideboard, using both hands.
18. The actor picks up two pegs and attaches them to the right
end of the support board, using her right hand.
19. The actor picks up the second sideboard and attaches it to
the pegs in the right side of the support board, using both hands.
20. The actor has now finished attaching the support board.
21. The actor picks up a screw and inserts it in the upper right
corner of the second sideboard and top shelf, using her right hand.
22. The actor picks up another screw and inserts it in the lower
right corner of the second sideboard and top shelf, using her right
hand.
23. The actor picks up the screwdriver and screws in the screws
she just inserted in the second sideboard and top shelf, starting
with the upper right screw and then moving to the lower right
screw, using her right hand.
24. The actor has now finished attaching the second sideboard.
25. The actor picks up the bottom shelf and positions it in
between the two sideboards, using both hands.
26. The actor picks up a screw and inserts it in the upper left
corner of the first sideboard and bottom shelf, using her left hand.
27. The actor picks up another screw and inserts it in the lower
left corner of the first sideboard and bottom shelf, using her left
hand.
28. The actor picks up the screwdriver and screws in the
screws she just inserted in the first sideboard and bottom shelf,
starting with the upper left screw and then moving to the lower
left screw, using her left hand.
29. The actor picks up a screw and inserts it in the upper right
corner of the second sideboard and bottom shelf, using her right
hand.
30. The actor picks up another screw and inserts it in the lower
right corner of the second sideboard and bottom shelf, using her
right hand.
31. The actor picks up the screwdriver and screws in the
screws she just inserted in the second sideboard and bottom
shelf, starting with the upper right screw and then moving to the
lower right screw, using her right hand.
32. The actor has now finished attaching the bottom shelf.
33. The actor picks up a wheel and inserts it in the upper left
corner of the first sideboard, using her left hand.
34. The actor picks up a wheel and inserts it in the upper right
corner of the second sideboard, using her right hand.
35. The actor picks up a wheel and inserts it in the lower right
corner of the second sideboard, using her right hand.
36. The actor picks up a wheel and inserts it in the lower left
corner of the first sideboard, using her left hand.
37. The actor has now finished attaching the wheels to the
cart.
38. The actor flips the now-completed television cart over, so
that it is now in an upright position, using both hands.
39. The TV cart is now complete.
The following script describes the steps followed by the actor in
the LEGO assembly test video. All spatial locations mentioned in
the script are described relative to the actor. Steps listed in boldface italic font correspond to higher level actions.
1. The actor places nine yellow blocks in a vertical line on the
left side of the table, using her left hand.
2. The actor places 12 red blocks in a vertical line to the right
of the yellow blocks, using her left hand.
3. The actor places nine green blocks in a vertical line to the
right of the red blocks, using her right hand.
4. The actor places 13 blue blocks in a vertical line to the right
of the green blocks, using her right hand.
5. The actor has now finished organizing the blocks on the
table.
6. The actor stacks three small blue blocks on top of a small red
block to form the first leg of a horse, using her right hand.
7. The actor stacks three small blue blocks on top of a small red
block to form the second leg of a horse, using her right hand.
8. The actor connects the two legs with a large yellow block,
using her right hand.
9. The actor stacks three small blue blocks on top of the large
yellow block to form the horse’s neck, using her right hand.
10. The actor places a medium blue block on top of the horse’s
neck to form its nose, using her right hand.
11. The actor places a small yellow block, with a picture of an
eyeball on it, on top of the horse’s nose, using her right hand.
12. The actor places a red block, shaped like a saddle, on top of
the large yellow block that forms the horse’s back, using her right
hand.
13. The actor places the completed horse on the right side of the
table, using her right hand.
14. The blue horse is now complete.
15. The actor stacks three small green blocks on top of a small
red block to form the first leg of a horse, using her left hand.
16. The actor stacks three small green blocks on top of a small
red block to form the second leg of a horse, using her left hand.
17. The actor connects the two legs with a large yellow block,
using her left hand.
18. The actor stacks three small green blocks on top of the large
yellow block to form the horse’s neck, using her left hand.
19. The actor places a medium green block on top of the horse’s
neck to form its nose, using her left hand.
20. The actor places a yellow block, with a picture of an eyeball
on it, on top of the horse’s nose, using her left hand.
21. The actor places a red block, shaped like a saddle, on top of
the large yellow block that forms the horse’s back, using her left
hand.
22. The actor places the completed horse on the left side of the
table, using her left hand.
23. The green horse is now complete.
24. The actor connects four small blue blocks together to form
a plus shape, using both hands.
25. The actor connects five small red blocks together to form a
staircase shape, using both hands.
26. The actor connects five small yellow blocks together to
form a staircase shape, using both hands.
27. The actor stacks the yellow staircase on top of the red staircase, using her right hand.
28. The actor connects the blue plus shape to the top of the
yellow staircase, so as to form a heart, using her left hand.
29. The actor places the completed heart in between the two
horses, using both hands.
30. The heart is now complete.
Appendix B
Video Segmentation Instructions
The following is the introduction to segmentation that all participants received:
Human experience is very complex. As we go about our day-to-day
lives, we encounter a lot of information that we need to make sense of.
One way that we do this is to break down our experiences into events.
For example, when you think about your day, you think about it in
terms of the events that happened, such as eating lunch or going to
class. These are examples of events that you were directly involved in.
You can think about all of these events on a variety of scales. For
example, you can think about the day in terms of very small events,
like reaching for the alarm clock, picking up a box of cereal, or
dropping your keys on the floor. You can also think of the day in
terms of larger events, such as eating lunch, riding to class, or
attending a party. Thus, we can think about events as being as big or
as small as we want.
The following is the introduction to action description that all
participants in Studies 1A and 1B received. Instruction differences corresponding to different assigned perspectives appear
in boldface font:
In this experiment, we are interested in how people understand
events when thinking about them from someone else’s (their own)
perspective. You will watch two videos involving a person assembling objects. We will ask you to divide this video into separate
events. You will do this by using the SPACEBAR to mark off
where you believe one event has ended and another event has
begun. Every time you press the spacebar, please briefly state, in
terms of the actor’s (your own) perspective, what happened in the
segment you just observed.
The following is the introduction to action description that all
participants in Study 2 received. Instruction differences corresponding to different assigned perspectives appear in boldface
font:
In this experiment, we are interested in how people understand events
when thinking about them from an actor’s (an observer’s) perspective. You will watch two videos involving a person assembling
objects. We will ask you to divide each video into separate events.
You will do this by using the SPACEBAR to mark off where you
believe one event has ended and another event has begun. Every time
you press the spacebar, please briefly state, in terms of the actor’s
(the observer’s) perspective, what happened in the segment you just
observed.
Appendix C
Calculation of Enclosure Scores
Enclosure is a measure of hierarchical encoding that takes into
account the conceptual relation between the boundaries of coarse
units—coarse breakpoints—and the boundaries of fine units—fine
breakpoints. If action is perceived hierarchically, then fine units
should represent substeps of a corresponding coarse unit. For
example, the fine units “She built one leg” “She built a second leg”
“She attached the two legs to a body” and “She built the neck and
head” are substeps of the coarse unit “She built the blue horse.”
Thus, if we pair a given coarse breakpoint (e.g., “She built the blue
horse”) with the closest fine breakpoint in time, that fine breakpoint should represent the final substep of that coarse unit (e.g.,
“She built the neck and head”). When this relationship between a
coarse breakpoint and its closest fine breakpoint holds true, the
fine breakpoint tends to be enclosed by, that is, fall before, the
corresponding coarse breakpoint. In previous studies (Hard,
Lozano, & Tversky, in press), and in the current ones, when fine
breakpoints are not enclosed by the corresponding coarse breakpoints, over 75% of the time they are not hierarchically related to
the coarse unit. In previous studies, and in the current ones, the
enclosure pattern is the dominant one: Within a participant, coarse
breakpoints more frequently follow their closest fine breakpoint
than precede it: Study 1A, paired t(39) = 4.07, d = 0.66; Study 2, paired t(15) = 6.06, d = 0.54.
The following is an example of how the enclosure score for each
participant is calculated:
Step 1. The left and center columns of Table C1 show a chronological list of the points in time (in milliseconds from the beginning of the video) that a participant marked as coarse and fine breakpoints. We begin by lining up each coarse breakpoint with the fine breakpoint it is temporally closest to; in Table C1, each coarse breakpoint appears in the same row as the fine breakpoint it is closest to.

Step 2. Once a coarse breakpoint is lined up with a fine breakpoint, we determine whether that coarse breakpoint falls temporally before or after the fine breakpoint. The results of this determination are shown in the final column of the table.

Step 3. We now calculate the numerator of the enclosure score. We do this by first checking for cases in which multiple coarse breakpoints share (i.e., are closest to) the same fine breakpoint. For each such case, we determine which of the coarse breakpoints the fine breakpoint is in fact closest to. Only this pairing is used in determining the participant’s enclosure score; the other pairing is excluded. In Table C1, shared cases are ones in which two coarse breakpoints are paired with a single fine breakpoint; the coarse breakpoint marked with an asterisk in each shared case is the one that counts toward the enclosure score. The numerator of the enclosure score is then equal to the total number of cases in which a coarse breakpoint falls after its nearest fine breakpoint. Thus, our example participant has an enclosure score numerator equal to 9.
Table C1
Timing Information for Breakpoints

Coarse breakpoints     Fine breakpoints     Is coarse breakpoint before or after fine breakpoint?
                       1,630
15,828                 12,839               After
                       28,443
33,428                 37,702               Before
44,606                 50,078               Before
57,843                 57,303               After
66,292                 66,713               Before
71,449                 70,159               After
75,351                 75,655               Before
83,828                 82,995               After
86,347                 86,238               After
90,515*, 93,472        91,303               Before, After
                       98,414
                       102,119
                       108,956
                       111,892
                       118,374
                       124,911
                       138,553
144,188                142,889              After
155,832                153,310              After
                       160,646
                       173,990
                       186,743
190,967, 192,802*      192,239              Before, After
200,147                197,879              After

Note. Each coarse breakpoint appears in the same row as the fine breakpoint it is temporally closest to. In each shared case (two coarse breakpoints paired with the same fine breakpoint), the coarse breakpoint marked with an asterisk is the one that counted toward the enclosure score.
Step 4. Enclosure is calculated by taking the numerator calculated in Step 3 and dividing it by the total number of coarse units. In our example, the participant has a total of 16 coarse units, so we calculate the enclosure score to be 9/16 = .56.
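The procedure in Steps 1–4 is mechanical enough to express directly. The Python sketch below is an illustration only, not the authors’ scoring code; it applies the steps to the breakpoint times in Table C1 and reproduces the worked example’s score of .56.

```python
# A minimal sketch of the enclosure-score calculation described in Appendix C
# (illustration only), applied to the example breakpoint times from Table C1.

coarse = [15828, 33428, 44606, 57843, 66292, 71449, 75351, 83828, 86347,
          90515, 93472, 144188, 155832, 190967, 192802, 200147]
fine = [1630, 12839, 28443, 37702, 50078, 57303, 66713, 70159, 75655, 82995,
        86238, 91303, 98414, 102119, 108956, 111892, 118374, 124911, 138553,
        142889, 153310, 160646, 173990, 186743, 192239, 197879]

def enclosure_score(coarse, fine):
    # Step 1: pair each coarse breakpoint with its temporally closest fine breakpoint.
    pairs = [(c, min(fine, key=lambda f: abs(c - f))) for c in coarse]
    # Step 3, shared cases: if several coarse breakpoints are closest to the same fine
    # breakpoint, keep only the coarse breakpoint that is in fact closest to it.
    kept = {}
    for c, f in pairs:
        if f not in kept or abs(c - f) < abs(kept[f] - f):
            kept[f] = c
    # Steps 2-3: the numerator counts kept pairings in which the coarse breakpoint
    # falls after (i.e., encloses) its nearest fine breakpoint.
    numerator = sum(1 for f, c in kept.items() if c > f)
    # Step 4: divide the numerator by the total number of coarse units.
    return numerator / len(coarse)

print(round(enclosure_score(coarse, fine), 2))  # 0.56 for the Table C1 example (9/16)
```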
Received May 29, 2005
Revision received April 12, 2006
Accepted April 13, 2006