Psychological Research
DOI 10.1007/s00426-014-0606-0

ORIGINAL ARTICLE

A new paper and pencil task reveals adult false belief reasoning bias

Patricia I. Coburn • Daniel M. Bernstein • Sander Begeer

Received: 14 October 2013 / Accepted: 23 August 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract Theory of mind (ToM) is the ability to take other people’s perspective by inferring their mental state. Most 6-year-olds pass the change-of-location false belief task that is commonly used to assess ToM. However, the change-of-location task is not suitable for individuals over 5 years of age, due to its discrete response options. In two experiments, we used a paper and pencil version of a modified change-of-location task (the Real Object Sandbox task) to assess false belief reasoning continuously rather than discretely in adults. Participants heard nine change-of-location scenarios and answered a critical question after each. The memory control questions only required the participant to remember the object’s original location, whereas the false belief questions required participants to take the perspective of the protagonist. Participants were more accurate on memory trials than trials requiring perspective taking, and performance on paper and pencil trials correlated with corresponding trials on the Real Object Sandbox task. The Paper and Pencil Sandbox task is a convenient continuous measure of ToM that could be administered to a wide range of age groups.

P. I. Coburn (✉) • D. M. Bernstein: Kwantlen Polytechnic University, Surrey, BC, Canada
Present Address: P. I. Coburn, Simon Fraser University, Burnaby, BC, Canada
S. Begeer: VU University, Amsterdam, The Netherlands
S. Begeer: University of Sydney, Sydney, Australia

Introduction

Perspective taking is essential to social interaction, and requires individuals to reason about others’ mental states including their beliefs, emotions and intentions (Nickerson, 1999).
Together these skills comprise theory of mind (ToM). Researchers often examine ToM by evaluating the understanding of mental states, most notably false beliefs, in young, typically developing children or in children with autism (Yirmiya, Erel, Shaked, & Solomonica-Levi, 1998; Wellman, Cross, & Watson, 2001). The most commonly used measure assesses first-order false belief reasoning, a component of ToM, with a change-of-location task (Wimmer & Perner, 1983; Baron-Cohen, Leslie, & Frith, 1985). However, this task is most appropriate for preschoolers and shows ceiling effects when administered to older children and adults. The current study uses a modified change-of-location task to reveal false belief reasoning errors in young adults.

False belief reasoning requires individuals to make judgements about another person’s behavior when that person has a false belief about a situation. The classic change-of-location task measures first-order false belief reasoning (understanding the thoughts and intentions of a person who holds a false belief) as a discrete variable, based on children’s responses to hypothetical scenarios about social interactions (Wimmer & Perner, 1983). To pass this task, individuals must take the perspective of a protagonist who is unaware that an object has moved from one location to a second location. One scenario depicts Sally and Anne playing with a ball. Sally places the ball in the cupboard and then leaves the room. While Sally is gone, Anne moves the ball from the cupboard to a box. Participants must decide where Sally will search for the ball upon returning, and therefore must ignore their own perspective and understand that Sally has a false belief about where the ball actually is. Participants who ignore their privileged knowledge are deemed to have a mature ToM, while those who fail to ignore their privileged knowledge are deemed to have an immature ToM.
Most children under 4 years of age fail the false belief task (although see Perner & Roessler, 2012). By age 6, the majority of typically developing children pass this task, leading to the assumption that children older than age 6 have developed a first-order ToM understanding (Wellman et al., 2001). However, few studies have tested the assumption that children older than six have first-order ToM using age-appropriate measures for first-order ToM (Miller, 2009; although see Carpendale & Chandler, 1996). Interestingly, the studies that use appropriate measures to focus on first-order ToM in normal adults generally reveal perspective-taking errors: Adults show an egocentric bias (a tendency to err according to privileged knowledge) in their responses, indicating a failure in first-order ToM (Keysar, Lin, & Barr, 2003; Surtees & Apperly, 2012). In one experiment, participants (aged 8, 10, and 21 years) viewed an avatar who stood facing one direction in a cartoon room, resulting in only three walls being visible to the avatar. The participants’ task was to take a self-perspective or the avatar’s perspective and determine how many dots along the walls of the room were visible (Surtees & Apperly, 2012). Participants heard an auditory recording stating that either they or the avatar could see N dots, where N was one to three dots, and were asked to indicate whether this was correct. All age groups showed egocentric errors by answering more accurately from their own perspective than from the avatar’s perspective. Similar findings emerged from a communication experiment in which one participant (the director) instructed another participant (the addressee) to move objects around a 4 × 4 array of slots, containing individual objects in each slot (Keysar, Lin, & Barr, 2003). The objects in some of the slots were only visible to the addressee, and hidden from the director.
To determine which object the director was referring to, the addressee had to employ ToM and assume the director’s perspective. In the experimental condition, the hidden object had an ambiguous name, similar to the target object that was to be moved. For example, when asked to relocate ‘the large cup’, in one slot there was a small cup and in another slot there was a larger cup, both of which were visible to the participant and the director. However, in the hidden slot there was an even larger cup. Although participants were aware that the director could not see the hidden object, surprisingly often they tried to move the hidden object when it fit the description of the to-be-moved object. Thus, participants were unable to ignore their privileged perspective when attempting to take the director’s naïve perspective. These findings stand in stark contrast to the mature false belief reasoning of children aged 6 and older who pass the classic ToM tasks. This discrepancy is likely due to the varying methods and tasks researchers use to assess ToM in different age groups. Having a task that could easily be administered to all age groups without procedural changes would add to the emerging literature on the development of perspective taking across the lifespan.

In order to develop a first-order false belief task for a wider age range, Sommerville, Bernstein, and Meltzoff (2013) adapted the classic change-of-location task to measure first-order false belief in preschoolers and adults using a continuous outcome scale. This new task, which we refer to here as the Real Object Sandbox task, uses multiple false belief and memory control trials and a virtually unlimited number of response options capable of detecting false belief on a continuum. In the task, participants face the researcher with a large Styrofoam-filled sandbox between them. The researcher reads a scenario involving a protagonist who places an object in one location within the box, and then leaves the room.
Another character then moves the object to a second location within the box. To illustrate these scenarios, the researcher locates and relocates real objects within the Styrofoam contained in the sandbox. The researcher asks the participant either a critical false belief or memory control question. The false belief question requires the individual to set aside his/her own privileged knowledge and take the naïve perspective of the protagonist, while the memory control question requires only that the individual remember the original location of the object. Compared to the classic change-of-location task, the Real Object Sandbox task allows for the object to be placed, and more critically, for the individual to respond to a location anywhere along the length of the 5-foot container. This allows for errors, operationalized as bias, to be measured as a continuous rather than a discrete variable.

Recently, we created a more convenient paper and pencil version of the Real Object Sandbox task (Begeer, Bernstein, van Wijhe, Scheeren, & Koot, 2012). We compared performance in typically developing adolescents and adolescents with autism, using a false belief scenario and a no-false belief control scenario, in which the protagonist, respectively, does not or does know the hidden object’s current location. In the no-false belief scenario, the protagonist hid a different object in a second location. Therefore, this trial did not involve a change of location of the original object. Both groups showed more bias on the false belief than the no-false belief scenario, while individuals with autism showed significantly more bias than the typically developing adolescents. Perspective-taking deficits in clinical populations have a profound effect on social functioning. Thus, the detection of these deficits is an important component of assessment.
Rather than using age-appropriate first-order ToM measures, various researchers have attempted to make ToM measures more suitable for older children and adults by increasing task complexity, resulting in advanced ToM measures, which primarily include ‘‘second-order’’ false belief tasks. These tasks assess the understanding of embedded mental states (e.g., thinking about what another person thinks about what a third person thinks; Tager-Flusberg & Sullivan, 1994) and successfully measure ToM development in older children (Perner & Wimmer, 1985). However, typically developing 9-year-olds often perform at ceiling on these tasks (Happé, 1994). Advanced ToM measures are generally unsuitable for (pre)school-aged children; moreover, the validity of these measures may be undermined by the fact that they assess related, general cognitive skills such as language and executive function rather than targeting core first-order ToM abilities (Tager-Flusberg & Sullivan, 1994; Scheeren, de Rosnay, Koot, & Begeer, 2013; see also Rakoczy, Harder-Kasten, & Sturm, 2012; Rubio-Fernandez & Geurts, 2012). Finally, individuals employ first-order ToM frequently in everyday interactions, for example when determining what object someone is describing and providing one’s conversation partner with adequate information. Conversely, people seem to use second-order ToM skills in specific situations, such as playing complex card games such as poker or bridge. First-order ToM skills are thus more frequently employed than second-order ToM skills during everyday social interactions.

Currently, there are few simple tasks that can be administered to preschoolers, older children and adults to measure false belief reasoning without altering the procedures. Developing a simple measure that could fulfill this requirement would be a significant contribution to ToM research.
The paper and pencil version of the Sandbox task could satisfy this requirement if the egocentric perspective-taking results from Begeer et al. (2012), which only examined adolescent performance, can be replicated in other age groups such as adults. Additionally, the Sandbox task version used by Begeer et al. (2012) included only one trial to measure false belief reasoning and one trial to measure no-false belief reasoning. Moreover, the latter condition involved participants’ memory for a different object being placed in a second location within the sandbox. It is, therefore, possible that this no-false belief control condition failed to control adequately for perspective taking in the false belief condition.

The purpose of the current study was to administer the Paper and Pencil Sandbox task to assess first-order false belief performance in young adults. In Experiment 1, we included four false belief and four memory control trials derived from the Real Object Sandbox task and measured location accuracy as a bias toward the incorrect location (Sommerville et al., 2013). Comparing performance across several trials of mental state and non-mental state reasoning provides a robust measure of false belief reasoning. Utilizing several memory control trials instead of a single no-false belief trial, as done by Begeer et al. (2012), could illuminate perspective-taking difficulties that may exist in the face of general cognitive abilities such as working memory. Based on prior work, we predicted more bias on the false belief trials than the memory control trials (cf. Begeer et al., 2012; Sommerville et al., 2013; Surtees & Apperly, 2012). In Experiment 2, we sought to validate the Paper and Pencil Sandbox task by comparing it to performance on the Real Object Sandbox task.
Experiment 1

We administered the Paper and Pencil Sandbox task to examine if individuals would show more bias on trials that require perspective taking than on trials that only require them to remember the original location of an object. This would indicate that adults demonstrate false belief reasoning bias.

Method

Participants

Eighty-one undergraduates from a Canadian university (M age = 20.80 years; range 18–29; 68 % female) received course credit through the psychology research pool. Participants completed the Paper and Pencil Sandbox task and then completed an unrelated task as part of another study.

Materials

Paper and Pencil Sandbox task

The Paper and Pencil Sandbox task is a modification of the original, Real Object Sandbox task that involved a 5-foot long Styrofoam-filled box (Sommerville et al., 2013; see Begeer et al., 2012). The Paper and Pencil Sandbox task used here involved two versions of a container (e.g., sandbox, freezer, garden plot) printed on one side of an 8.5″ × 11″ piece of paper (see Fig. 1a, b), and a single version of the container printed on the other side of the paper. The Paper and Pencil Sandbox task consisted of nine trials. On each trial, participants heard a different change-of-location scenario and then responded to a critical question. We used a word search book as a brief filler task between the scenario and response portion of each trial.

Fig. 1 a Example of page one: Paper and Pencil Sandbox scenario (wording for memory control and false belief was identical on the first page). b Example of flip side of page one: Paper and Pencil Sandbox scenario (memory control trial)

The first side of the paper had an ‘‘X’’ on each container to indicate the original location (L1), in which a story character hides an object, and the location to which the object is then moved (L2). The second side of the paper had no marking on the container to allow the participant to respond to the critical question.
Each container was 148 millimeters long. The participant heard a different scenario on each trial. There were nine trials and therefore nine different scenarios. In each scenario, a protagonist placed an object in the container (L1). While the protagonist was away (false belief and memory control trials), another character moved the object to another location (L2) within the container. On one trial (true belief), the protagonist watched as the other character moved the object. The four false belief trials required participants to take the protagonist’s perspective, who had a false belief about the object’s location, whereas the memory control trials only required participants to remember the original location of the objects. We included the true belief trial to prevent participants from realizing that the correct response on every other trial was L1. The correct response to the true belief trial was L2 rather than L1, because the protagonist watches the object being relocated. The trials followed a fixed order. The participants completed two memory control trials, two false belief trials, a true belief trial, two more false belief trials and finally two more memory control trials.

Design

We used a single factor (belief: false belief, memory control) within-subjects design.

Procedure

This study was approved by the Kwantlen Polytechnic University Research Ethics Board. Participants provided full informed consent before completing this study. Each participant heard nine scenarios and completed the task within 10 min. On a false belief trial, the researcher would read the first part of the scenario to the participant. For example, ‘‘Yoko and her mom have just come home from the grocery store and are putting the groceries away.
Yoko puts the ice cream in the freezer here and then goes outside to play.’’ The researcher then turned the paper to show the participant the container, here denoting the freezer, with an ‘‘X’’ on it to indicate the location (L1). The paper had both parts of the scenario on it; therefore, the researcher used a piece of paper to cover the part that was not meant to be viewed at that moment. After the participant had the opportunity to see L1, the researcher collected the paper and read the second part of the scenario, covering the first part of the scenario. ‘‘While Yoko is outside, her mom opens the freezer and moves the ice cream here.’’ The researcher then told the participant to search for words in a word search book. We administered this distractor task to prevent participants from using perceptual strategies to guide their search in the Sandbox task. After 20 s, the researcher turned the page to read the critical question: ‘‘When Yoko comes back, where will she look for the ice cream?’’ The researcher then showed the paper with another copy of the container on it. This one was blank, allowing the participant to write an ‘‘X’’ to indicate where they thought Yoko would look for the ice cream. The correct response to this trial was L1.

Table 1 Mean bias (standard deviation) in proportions for the Paper and Pencil Sandbox task and the Real Object Sandbox task (Experiment 1 and Experiment 2)

                  Experiment 1                            Experiment 2
                  Paper and pencil   Paper and pencil     Paper and pencil   Real object
                  70 mm              35 mm                35 mm              35 cm
False belief      0.17 (0.19)        0.06 (0.10)          0.06 (0.10)        0.06 (0.07)
Memory control    0.09 (0.17)        -0.02 (0.11)         0.04 (0.08)        0.04 (0.06)

On a memory control trial, the researcher would use the same procedures as on the false belief trial; however, the critical question was now worded to tap participants’ memory for the original placement of the object: ‘‘Then Yoko comes back.
Where did she put the ice cream before she went out to play?’’ Again, the correct response to this question was the first location.

Although the false belief and memory control trials are conceptually identical except for the critical question, they differ in a fundamental way: the false belief trials require that participants remember where the protagonist originally placed the object and also set aside their own privileged knowledge of L2 to respond from the perspective of the naïve protagonist; the memory control trials require only that participants remember where the protagonist originally placed the object. Participants in the Sandbox task have a tendency to report a location closer to L2 rather than the correct answer, L1. We refer to this systematic response bias as L2 responses. Importantly, in the false belief condition, errors can result from (1) a failure to ignore one’s own privileged information, or (2) a failure to remember L1. Based on the false belief condition alone, it is, therefore, impossible to say whether only one or both of these errors occur. To clarify the degree of failure to remember L1, we can examine the magnitude of bias in the memory control condition. Errors made in this condition are independent of false beliefs. Consequently, more errors in the false belief condition compared to the memory control condition indicate a failure in false belief reasoning and egocentric responding, that is, a failure to ignore one’s own privileged information.

For both false belief and memory control trials, we varied whether L2 moved to the right or the left of L1. This helped to prevent any inherent negative or positive bias for either the false belief or memory control trials. We also varied whether the distance between L1 and L2 was 35 or 70 mm. We compared bias on the false belief questions to bias on the memory control questions for both the short and long trials.
The correct response to every false belief and memory control scenario was L1, while the correct response to the one true belief scenario was L2. The true belief trial was included strictly to prevent participants from developing a response strategy; therefore, we did not analyze these responses. We expected that participants would show a greater shift towards L2 on the false belief trials than the memory control trials, indicating egocentric perspective taking, resulting from a failure to set aside the privileged knowledge of the second location.

Results

We measured location accuracy as the distance in mm between L1 and the participant’s answer. Any response that moved towards L2 we considered a positive bias, while movement in the opposite direction of L2 we considered a negative bias. We divided each bias score by the total length of the Paper and Pencil Sandbox to create a proportion score. Means and standard deviations for long and short trials are included in Table 1. Performance on false belief long trials correlated with performance on false belief short trials, r = 0.30, p = 0.007. Performance on the memory control long trials did not significantly correlate with performance on memory control short trials, r = 0.12, p = 0.29. We compared average bias on the two false belief long trials (70 mm) to the average bias on the two memory control long trials (70 mm). As expected, false belief reasoning bias was greater than memory control bias, t(80) = 3.03, p = 0.003, d = 0.43. Next, we compared the average bias on the two false belief short trials (35 mm) to the average bias on the two memory control short trials (35 mm). Again, false belief reasoning bias was greater than memory control bias, t(80) = 5.05, p < 0.001, d = 0.72.
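The scoring rule described in the Results above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring procedure: the function name `bias_proportion`, the coordinate convention (positions measured in mm from the left edge of the container), and the example positions are all assumptions made for the sketch. Only the 148 mm container length and the sign convention (positive toward L2, negative away from L2) come from the text.

```python
def bias_proportion(l1, l2, response, container_length):
    """Signed bias as a proportion of container length.

    Positive values mean the response was displaced from L1 toward L2;
    negative values mean it was displaced away from L2.
    Positions are assumed to be measured from the same edge of the container.
    """
    if l1 == l2:
        raise ValueError("L1 and L2 must differ")
    # +1 when L2 lies to the right of L1, -1 when it lies to the left
    direction = 1.0 if l2 > l1 else -1.0
    return direction * (response - l1) / container_length


CONTAINER_MM = 148.0  # container length reported for the paper version

# Hypothetical trial: L1 at 40 mm, L2 at 75 mm (35 mm to the right of L1)
toward = bias_proportion(40.0, 75.0, 49.0, CONTAINER_MM)    # 9 mm toward L2
away = bias_proportion(40.0, 75.0, 37.0, CONTAINER_MM)      # 3 mm away from L2

# Hypothetical mirror-image trial: L2 is 35 mm to the left of L1
mirrored = bias_proportion(100.0, 65.0, 91.0, CONTAINER_MM)  # 9 mm toward L2

print(round(toward, 3), round(away, 3), round(mirrored, 3))
```

Because the sign is defined relative to the direction of L2, a 9 mm displacement toward L2 yields the same positive score whether L2 lies to the right or the left of L1, which is what allows trials with opposite movement directions to be averaged.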
These results indicate that with a sufficiently sensitive task, adults demonstrate errors in false belief reasoning similar to those seen in young children on the classic change-of-location task (Birch & Bloom, 2007; Maehara & Umeda, 2013; Mitchell, Robinson, Isaacs, & Nye, 1996; although see Ryskin & Brown-Schmidt, 2014).

We ran additional analyses to ensure that the results were not due to a few participants showing extreme scores on specific trials. We evaluated each individual’s overall bias score by subtracting overall memory control from overall false belief reasoning bias. Sixty-three participants (77 % of the sample) showed more false belief than memory control bias on this task as indicated by positive difference scores. Chi-square analysis comparing those who did and did not show bias was significant, χ²(1, N = 81) = 25.00, p < 0.001.

As a measure of individual differences on our task, we examined the number of participants who performed without error. We arbitrarily defined ‘‘errorless’’ performance for the long trials and short trials separately as follows: average bias falling between -0.013 and +0.013 (proportion) for the long trials (2.7 % of the total distance of the sandbox), and average bias falling between -0.007 and +0.007 (proportion) for the short trials (1.35 % of the total distance of the sandbox). Five participants (0.06 proportion) exhibited errorless performance on the false belief long trials; 19 participants (0.23 proportion) exhibited errorless performance on the memory control long trials; 13 participants (0.16 proportion) exhibited errorless performance on the false belief short trials; 0 participants exhibited errorless performance on the memory control short trials.

Discussion

Most previous attempts to measure adult first-order false belief reasoning bias have failed because tasks designed for children were insensitive to adult performance. The results from Experiment 1 resemble those of Keysar et al.
(2003) as well as Surtees and Apperly (2012) who both argue that adults demonstrate perspective-taking errors. Egocentric perspective taking may be an inherently human characteristic that is not only present in children, but also in adults. The Paper and Pencil Sandbox task adds to a growing and necessary arsenal of tasks for measuring ToM in different ages (Devine & Hughes, 2013; Lagattuta, Sayfan, & Harvey, 2013; Surtees & Apperly, 2012). Our findings suggest that previous conceptions of ToM suddenly maturing at age 4 may reflect the categorical nature of standard ToM tasks. The Paper and Pencil Sandbox task may be more appropriate than a categorical belief reasoning task for measuring ToM performance in relative degrees rather than absolute terms. Also, researchers could potentially use the Paper and Pencil Sandbox task to measure first-order false belief reasoning bias in other language-competent age groups (e.g., preschoolers through older adults) without having to adjust the materials or procedures (cf., Bernstein, Erdfelder, Meltzoff, Peria, & Loftus, 2011). Examining trials of errorless performance revealed that 19 individuals showed no errors on the memory control long trials. However, there were no observed errorless memory control short trials. Rather, individuals had a tendency to show a negative bias on the short trials. This could suggest that the difference between false belief and memory control short trials resulted from individuals’ failure to remember the original location rather than a failure in perspective taking. In Experiment 2, we sought to replicate and extend the effects that we observed in Experiment 1. Specifically, in Experiment 2, we tried to validate the Paper and Pencil Sandbox task by comparing it to performance on the Real Object Sandbox task. Also, we examined whether participants would exhibit a pattern of errorless performance similar to what we observed in Experiment 1. 
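The chi-square statistic reported in the Experiment 1 Results (63 of 81 participants showing more false belief than memory control bias) can be checked by hand. The sketch below assumes a goodness-of-fit test against an equal split between the two groups; the text does not state the expected frequencies, so that null hypothesis is an assumption, and the helper name `chi_square_equal_split` is hypothetical.

```python
def chi_square_equal_split(count_a, count_b):
    """Chi-square goodness-of-fit statistic (1 df) against a 50/50 split.

    The equal-split null hypothesis is an assumption of this sketch.
    """
    n = count_a + count_b
    expected = n / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


n_total = 81     # participants in Experiment 1
n_more_fb = 63   # participants with a positive difference score

stat = chi_square_equal_split(n_more_fb, n_total - n_more_fb)
print(f"chi2(1, N = {n_total}) = {stat:.2f}")
```

Under that assumption the expected count is 40.5 per group, and the statistic works out to the value reported in the text.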
Experiment 2

To demonstrate that the Paper and Pencil Sandbox task and the Real Object Sandbox task tap a similar construct, we tested correlations between the two tasks. In particular, correlations between false belief and memory control performance on the Paper and Pencil and the Real Object version would add to the validity of the Paper and Pencil Sandbox task. Additionally, we sought to replicate procedures from Experiment 1 using only short trials. More positive bias on false belief compared to memory control trials would indicate that bias is driven by a failure in perspective taking rather than memory.

Method

Participants

Eighty-five undergraduates from a Canadian university (M age = 21.76 years; range 17–44 years; 83 % female) received course credit through the psychology research pool.

Materials

Paper and pencil sandbox task

We developed a new Paper and Pencil Sandbox task with novel scenarios to allow participants to complete both the Paper and Pencil and the Real Object version of the Sandbox task. This prevented the participants from hearing identical scenarios in the two versions of the Sandbox task. Additionally, because the Real Object Sandbox task holds the distance between locations constant, the distance between L1 and L2 on the Paper and Pencil Sandbox task in Experiment 2 was constant on all trials. We chose to hold the distance between L1 and L2 at 35 mm because this distance yielded a larger effect size than the 70 mm distance in Experiment 1, and we wanted to examine if the pattern of errorless performance on memory control trials that we observed in Experiment 1 would replicate. Finally, to control for possible order effects, we counterbalanced whether the participants started with a false belief or memory control trial.
We blocked the false belief and the memory control trials; therefore, participants completed four memory control trials or four false belief trials in succession before completing a true belief trial and then four false belief or four memory control trials, respectively. We also counterbalanced whether the participants started with the Real Object Sandbox task or the Paper and Pencil Sandbox task.

Real object sandbox task

The Real Object Sandbox task involved a 5-foot (150 cm) long Styrofoam-filled box (Sommerville et al., 2013). The procedures were similar to those for the Paper and Pencil Sandbox task described in Experiment 1 with the following exceptions: the participants stood facing the researcher who used real objects (approximately 50–75 mm in length) to depict each scenario; the objects were buried in the Styrofoam-filled sandbox which was placed between the researcher and the participant; and the distance between L1 and L2 was always 35 cm. After hearing the first part of the scenario, the participant turned around to complete the 20-s filler task. When the participant turned back around the researcher asked the critical question, to which the participant responded by pointing into the sandbox. The researcher then used a measuring tape that was attached to the opposite side of the sandbox (concealed from the participant) to note the response on each trial. To control for possible order effects, we counterbalanced whether the participants started with a false belief or memory control trial. If participants started with memory control (or false belief), they did so for both the Paper and Pencil, and the Real Object Sandbox task.

Design

We used a 2 (belief: false belief, memory control) × 2 (task: Paper and Pencil, Real Object) × 2 (task order: Real Object first, Paper and Pencil first) × 2 (belief order: false belief first, memory control first) mixed design.
Belief and task were within-subjects variables and task order and belief order were between-subjects variables.

Procedure

Participants provided full informed consent before completing this study. Procedures were similar to those described in Experiment 1. Each participant completed nine trials of either the Paper and Pencil Sandbox task or the Real Object Sandbox task. They then performed an unrelated filler task before completing nine different trials on the corresponding task. We expected a main effect of belief as evidenced by greater errors on false belief trials than memory control trials. We did not expect to observe an effect of task or a significant belief by task interaction.

Results

As in Experiment 1, we measured location accuracy as the distance between L1 and the participant’s answer. Any response that moved towards L2 we considered a positive bias, while movement in the opposite direction of L2 we considered a negative bias. We divided each bias score by the total length of the Sandbox (Paper and Pencil or Real Object) to create a proportion score. This allowed us to compare bias on the two tasks. We calculated overall false belief reasoning bias and overall memory control bias by averaging the four false belief trials on each task and by averaging the four memory control trials on each task, respectively. There were significant correlations between false belief performance on the Paper and Pencil Sandbox task and the Real Object Sandbox task, r = 0.43, p < 0.001, as well as memory control performance between the Paper and Pencil Sandbox task and the Real Object Sandbox task, r = 0.39, p < 0.001. A 2 (belief) × 2 (task) × 2 (task order) × 2 (belief order) Analysis of Variance was conducted to examine false belief performance across the two tasks. As predicted, there was a main effect of belief, F(1, 80) = 5.19, p = 0.03, partial η² = 0.06.
Participants showed more bias on false belief trials (M = 0.06, SD = 0.07) than memory control trials (M = 0.04, SD = 0.06). The main effect of task was not significant, F(1, 80) = 0.34, p = 0.57, nor was the belief by task interaction, F(1, 80) = 0.11, p = 0.74. There was a significant task-by-task-order effect, F(1, 80) = 17.09, p < 0.001. Participants showed less bias overall (false belief and memory control collapsed) on whichever Sandbox task they performed second. This was more pronounced on the Paper and Pencil Sandbox task (M = 0.03, SD = 0.11 when performed second; M = 0.07, SD = 0.10 when performed first) than on the Real Object Sandbox task (M = 0.04, SD = 0.08 when performed second; M = 0.06, SD = 0.08 when performed first). There were no other significant interactions. Means and standard deviations for the Paper and Pencil Sandbox and the Real Object Sandbox are included in Table 1.

As a measure of individual differences on our Paper and Pencil Sandbox task in Experiment 2, we examined the number of participants who performed without error. As in Experiment 1, we arbitrarily defined “errorless” performance as average bias (this time across all four trials) falling between -0.007 and +0.007 (proportion). Three (0.04 proportion) participants exhibited errorless performance on the false belief trials; nine (0.11 proportion) participants exhibited errorless performance on the memory control trials.

As a measure of individual differences on our Real Object Sandbox task, we examined the number of participants who performed without error. Again, we arbitrarily defined “errorless” performance as average bias (across all four trials) falling between -0.007 and +0.007 (proportion). Eleven (0.13 proportion) participants exhibited errorless performance on the false belief trials; 12 (0.14 proportion) participants exhibited errorless performance on the memory control trials.
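The scoring described in the Results can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function names, the example coordinates, and the decision to treat the 0.007 bounds as inclusive are assumptions made here. Only the 150 cm sandbox length, the 35 cm distance between L1 and L2, the four-trial average, and the 0.007 criterion come from the text.

```python
from statistics import mean

# Criterion from the text (a proportion of sandbox length); the authors call it arbitrary.
ERRORLESS_BOUND = 0.007


def bias_proportion(response, l1, l2, sandbox_length):
    """Signed bias as a proportion of sandbox length.

    Positive values mean the response moved from L1 towards L2;
    negative values mean it moved away from L2.
    """
    direction = 1.0 if l2 >= l1 else -1.0
    return direction * (response - l1) / sandbox_length


def mean_bias(trials, sandbox_length):
    """Average bias over trials given as (response, l1, l2) tuples."""
    return mean([bias_proportion(r, l1, l2, sandbox_length) for r, l1, l2 in trials])


def is_errorless(avg_bias, bound=ERRORLESS_BOUND):
    """Assumption: the bounds are inclusive; the text only says 'between'."""
    return -bound <= avg_bias <= bound


# Hypothetical Real Object trials (positions in cm along a 150 cm sandbox,
# with L1 and L2 always 35 cm apart). These numbers are invented for illustration.
example_trials = [
    (49.0, 40.0, 75.0),  # 9 cm towards L2
    (40.0, 40.0, 75.0),  # exactly on L1
    (37.0, 40.0, 75.0),  # 3 cm away from L2
    (46.0, 40.0, 75.0),  # 6 cm towards L2
]

average = mean_bias(example_trials, 150.0)
print(round(average, 3), is_errorless(average))
```

Dividing by sandbox length is what makes scores from the two tasks comparable, since the Paper and Pencil and Real Object sandboxes differ in physical size.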
One participant exhibited errorless performance on both the Paper and Pencil Sandbox and the Real Object Sandbox false belief trials (0.01 proportion). Three participants exhibited errorless performance on both the Paper and Pencil Sandbox and the Real Object Sandbox memory control trials (0.04 proportion).

Discussion

Significant correlations emerged between the Paper and Pencil Sandbox task and the Real Object Sandbox task. This suggests that the two Sandbox tasks are tapping a similar construct. Research comparing performance on different ToM tasks is informative and contributes to the literature. Correlations between ToM tasks are not always found and some research indicates that different tasks tap different skills (Brent, Rios, Happé, & Charman, 2004; Henry, Phillips, Ruffman, & Bailey, 2013). For example, some tasks, such as the Reading the Mind in the Eyes task (Baron-Cohen et al., 2001), appear to tap more emotional recognition or affective processing, while other tasks, such as the Sandbox tasks, tap more cognitive perspective taking (Henry et al., 2013). Additionally, results from a meta-analysis suggest that there is a relationship between performance on intelligence tasks and the Reading the Mind in the Eyes task (Baker, Peterson, Pulos, & Kirkland, 2014). This is but one example of research indicating that different ToM tasks could also be assessing various general cognitive abilities. Future research could consider running both versions of the Sandbox task with other advanced ToM tasks, such as the Strange Stories task (Happé, 1994), or the Reading the Mind in the Eyes task (Baron-Cohen et al., 2001). Investigation of how different ToM tasks relate to one another could further our understanding of ToM development.
The results from Experiment 2 show that on both the Paper and Pencil Sandbox task and the Real Object Sandbox task, participants have a tendency to shift more towards the outcome location (L2) when they are taking the perspective of the protagonist (false belief) compared to when they are simply remembering object location (memory control). There was one significant interaction, a task-by-task-order effect, suggesting that individuals showed less bias overall (higher accuracy) on whichever Sandbox task they performed second. The effect was more pronounced when participants completed the Paper and Pencil Sandbox task second than when they completed the Real Object Sandbox task second. This was most likely due to practice effects being more pronounced on the Paper and Pencil Sandbox task than on the Real Object Sandbox task. Additionally, it could be that performance on the Real Object Sandbox task is more stable than the version of the Paper and Pencil Sandbox task used in Experiment 2. Future research could seek to confirm this speculation.

General discussion

The Paper and Pencil Sandbox task is a convenient version of the Real Object Sandbox task. Both tasks provide continuous measures of false belief performance sensitive enough to detect perspective-taking bias in adults. In Experiment 1, participants showed more bias on false belief trials, which required perspective taking, than on memory control trials, which required memory for the original object location. Therefore, participants were more likely to shift their response to L2 (the outcome location) when they had to take the perspective of the protagonist than when they simply had to remember the object’s original location. In Experiment 2, we sought to validate the Paper and Pencil Sandbox task by having participants complete it along with the Real Object Sandbox task.
The Paper and Pencil Sandbox task correlated with the Real Object Sandbox task, suggesting that the two tasks tap a similar construct. There were some differences in performance on the Paper and Pencil Sandbox task between the two experiments. Specifically, the proportion of bias was greater in Experiment 1, where we tested the Paper and Pencil Sandbox task on its own, compared to Experiment 2, where we tested the Paper and Pencil Sandbox task along with the Real Object Sandbox task. We made several changes to the Paper and Pencil Sandbox task in Experiment 2 to allow it to be run with the Real Object Sandbox task: We created nine new scenarios and omitted the long trials from the Paper and Pencil Sandbox task. Also, in Experiment 2, participants completed 18 trials between the two Sandbox tasks rather than the nine trials they completed in Experiment 1. It could be that the task used in Experiment 1 is a more sensitive measure of false belief performance. For example, in Experiment 1 the Sandbox task consisted of both long and short trials. It could be that participants are less able to develop performance strategies when two different trial lengths are used. We also blocked the false belief and memory control trials in Experiment 2 rather than alternating them as in Experiment 1. Additionally, we did not observe the same pattern of errorless trials in both experiments. Recall that in Experiment 1, there were no errorless memory control short trials and the consideration was that the results were being driven by negative bias on the memory control trials instead of a positive bias on the false belief trials. In Experiment 2, nine participants performed without error on the memory control trials and the overall mean bias on memory control trials was positive. 
In sum, the observed differences in performance on the Paper and Pencil Sandbox task between the two experiments are most likely due to changes in procedures, including changes to the Paper and Pencil Sandbox task itself.

The current findings add to a line of research that highlights false belief reasoning bias in adults (see Royzman, Cassidy, & Baron, 2003). While both social and developmental psychologists have studied this topic extensively, the limited perspective-taking skills of normal adults seem to be inconsistent with many studies from the field of developmental psychology (Piaget, 1966; Tversky & Kahneman, 1974). In developmental psychology, it is often assumed that adults have a “full-blown” ToM after passing the first-order false belief task, suggesting that this capacity is in place much like our sensory information processes (Berk, 2012). Closer collaboration among social, cognitive, and developmental psychologists may add to our understanding of ToM in both children and adults. We believe that the Paper and Pencil Sandbox task could be used as a valuable tool to help bridge these subfields.

The results from both experiments resemble those of Begeer et al. (2012) who demonstrated that typically developing adolescents showed more bias on a single false belief trial than on a single no-false belief trial in a paper and pencil version of the Real Object Sandbox task. The no-false belief trial used by Begeer et al. (2012) depicted the character placing a new object in a second location, rather than changing the location of the original object. In the current study (Experiment 1), the effect size denoting the difference between the false belief trials and memory control trials (d = 0.43 for long trials and d = 0.72 for short trials) was greater than the effect size between the single false belief trial and the single no-false belief trial reported by Begeer and colleagues (d = 0.35).
In Experiment 2 of the current study, however, the effect size (d = 0.23) was smaller than that observed by Begeer et al. (d = 0.35). Additionally, Begeer and colleagues showed differences in performance between typically developing adolescents and adolescents with high-functioning autism. Identification of perspective-taking deficits is an important element of clinical assessment, particularly when evaluating and treating developmental disorders. Future research could extend the findings of Begeer et al. (2012) and employ the Sandbox tasks to compare performance of individuals with and without developmental disorders.

The two versions of the Sandbox task, the Paper and Pencil version and the Real Object version, could conveniently be administered to various age groups from preschoolers to older adults, without having to adjust the procedures. Either task could be used to explore developmental differences in ToM performance. Future research could investigate whether older and younger adults show performance differences on the Paper and Pencil Sandbox task, as demonstrated with the Real Object Sandbox task (Bernstein, Thornton, & Sommerville, 2011).

In sum, previous change-of-location tasks used only two locations and resulted in ceiling performance in most typically developing individuals by the age of 6, making these tasks unsuitable for detecting false belief errors in older children and adults (Birch & Bloom, 2007). The current study has demonstrated a convenient Paper and Pencil version of the Sandbox task, successfully measuring adult false belief errors (bias) in situations that require individuals to take another person’s perspective. The Paper and Pencil Sandbox task correlated with the Real Object Sandbox task, further establishing its validity as a measure of false belief reasoning. Researchers could use the Paper and Pencil Sandbox task to assess false belief reasoning, and could do so easily and inexpensively.
The Paper and Pencil Sandbox task could thus be a valuable contribution to research on the development of perspective taking.

Acknowledgments

We thank Lecia Desjarlais, Devon Currie, Pawan Kahlon, John Dema-ala, and Kate Morrison, who assisted with data collection, as well as Andre Aßfalg and Alessandro Metta for their comments. This work was supported by a grant from the Social Sciences and Humanities Research Council of Canada [410-20081681] and the Canada Research Chairs Program (950-228407).

References

Baker, C. A., Peterson, E., Pulos, S., & Kirkland, R. A. (2014). Eyes and IQ: a meta-analysis of the relationship between intelligence and “Reading the Mind in the Eyes”. Intelligence, 44, 78–92.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a “theory of mind”? Cognition, 21, 37–46.
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001). The “Reading the Mind in the Eyes” test revised version: a study with normal adults, and adults with Asperger syndrome or high-functioning autism. Journal of Child Psychology and Psychiatry, 42, 241–251.
Begeer, S., Bernstein, D. M., van Wijhe, J., Scheeren, A. M., & Koot, H. (2012). A continuous false belief task reveals egocentric biases in children and adolescents with Autism Spectrum Disorders. Autism, 16, 357–366.
Berk, L. E. (2012). Child development (9th ed.). Boston: Allyn and Bacon.
Bernstein, D. M., Erdfelder, E., Meltzoff, A. N., Peria, W., & Loftus, G. R. (2011). Hindsight bias from 3 to 95 years of age. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 378–391.
Bernstein, D. M., Thornton, W. L., & Sommerville, J. A. (2011). Theory of mind through the ages: older and middle-aged adults exhibit more errors than do younger adults on a continuous false-belief task. Experimental Aging Research, 37, 481–502.
Birch, S. A. J., & Bloom, P. (2007). The curse of knowledge in reasoning about false beliefs. Psychological Science, 18, 382–386.
Brent, E., Rios, P., Happé, F., & Charman, T. (2004). Performance of children with autism spectrum disorder on advanced theory of mind tasks. Autism, 8, 283–299.
Carpendale, J. I., & Chandler, M. J. (1996). On the distinction between false belief understanding and subscribing to an interpretive theory of mind. Child Development, 67, 1686–1706.
Devine, R. T., & Hughes, C. (2013). Silent films and strange stories: theory of mind, gender, and social experiences in middle childhood. Child Development, 84, 989–1003.
Happé, F. G. (1994). An advanced test of theory of mind: understanding the story characters’ thoughts and feelings by able autistic, mentally handicapped, and normal children and adults. Journal of Autism and Developmental Disorders, 24, 129–154.
Henry, J. D., Phillips, L. H., Ruffman, T., & Bailey, P. E. (2013). A meta-analytic review of age differences in theory of mind. Psychology and Aging, 28, 826–839.
Keysar, B., Lin, S., & Barr, D. J. (2003). Limits on theory of mind use in adults. Cognition, 89, 25–41.
Lagattuta, K., Sayfan, L., & Harvey, C. (2013). Beliefs about thought probability: evidence for persistent errors in mindreading and links to executive control. Child Development, 85, 659–674.
Maehara, Y., & Umeda, S. (2013). Reasoning bias in the recall of one’s own beliefs in a smarties task for adults. Japanese Psychological Research, 55, 292–301.
Miller, S. A. (2009). Children’s understanding of second-order mental states. Psychological Bulletin, 135, 749–773.
Mitchell, P., Robinson, E. J., Isaacs, J. E., & Nye, R. M. (1996). Contamination in reasoning about false belief: an instance of realist bias in adults but not children. Cognition, 59, 1–21.
Nickerson, R. S. (1999). How we know and sometimes misjudge—what others know: imputing one’s own knowledge to others. Psychological Bulletin, 125, 737–759.
Perner, J., & Roessler, J. (2012). From infants’ to children’s appreciation of belief. Trends in Cognitive Sciences, 16, 519–525.
Perner, J., & Wimmer, H. (1985). “John thinks that Mary thinks that…” Attribution of second-order beliefs by 5- to 10-year-old children. Journal of Experimental Child Psychology, 39, 437–471.
Piaget, J. (1966). Judgment and reasoning in the child. Totowa: Littlefield, Adams & Company.
Rakoczy, H., Harder-Kasten, A., & Sturm, L. (2012). The decline of theory of mind in old age is (partly) mediated by developmental changes in domain-general abilities. British Journal of Psychology, 103, 58–72.
Royzman, E. B., Cassidy, K. W., & Baron, J. (2003). “I know, you know” Epistemic egocentrism in children and adults. Review of General Psychology, 7, 38–65.
Rubio-Fernández, P., & Geurts, B. (2012). How to pass the false-belief task before your 4th birthday. Psychological Science, 24, 27–33.
Ryskin, R. A., & Brown-Schmidt, S. (2014). Do adults show a curse of knowledge in false-belief reasoning? A robust estimate of the true effect size. PLoS ONE, 9, e92406. doi:10.1371/journal.pone.0092406.
Scheeren, A. M., de Rosnay, M., Koot, H. M., & Begeer, S. (2013). Rethinking theory of mind in children and adolescents with autism. Journal of Child Psychology and Psychiatry, 54, 628–635.
Sommerville, J. A., Bernstein, D. M., & Meltzoff, A. N. (2013). Measuring beliefs in centimeters: private knowledge biases preschoolers’ and adults’ representation of others’ beliefs. Child Development, 84, 1846–1854.
Surtees, A. D. R., & Apperly, I. A. (2012). Egocentrism and automatic perspective taking in children and adults. Child Development, 83, 452–460.
Tager-Flusberg, H., & Sullivan, K. (1994). A second look at second-order belief attribution in autism. Journal of Autism and Developmental Disorders, 24, 577–586.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science, 185, 1124–1131.
Wellman, H. M., Cross, D., & Watson, J. (2001). Meta-analysis of theory-of-mind development: the truth about false belief. Child Development, 72, 655–684.
Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13, 103–128.
Yirmiya, N., Erel, O., Shaked, M., & Solomonica-Levi, D. (1998). Meta-analyses comparing theory of mind abilities of individuals with autism, individuals with mental retardation, and normally developing individuals. Psychological Bulletin, 124, 283–307.