Journal of Experimental Psychology: General 2008, Vol. 137, No. 1, 39 –51 Copyright 2008 by the American Psychological Association 0096-3445/08/$12.00 DOI: 10.1037/0096-3445.137.1.39 The Control of Instrumental Action Following Outcome Devaluation in Young Children Aged Between 1 and 4 Years U. M. H. Klossek, J. Russell, and A. Dickinson University of Cambridge To determine the role of action– outcome learning in the control of young children’s instrumental behavior, the authors trained 18- to 48-month-olds to manipulate visual icons on a touch-sensitive display to obtain different types of video clips as outcomes. Subsequently, one of the outcomes was devalued by repeated exposure, and children’s propensity to perform the trained actions was tested in extinction. On test, children with a mean age greater than 2.5 years performed the action trained with the devalued outcome less than those trained with the still-valued outcome, thereby demonstrating that their actions were mediated by action– outcome learning. By contrast, the instrumental responses of younger children (mean age ⬍2 years) were resistant to outcome devaluation and may have been elicited directly by the icons associated with each response, rather than mediated by a specific action– outcome expectation. Keywords: goal-directed action, instrumental learning, outcome devaluation, children bust, as the novel response was retained for as long as 5– 6 days by 2- to 3-month-olds in the absence of any further training or exposure to the training situation (Rovee-Collier, 1983). However, most standard instrumental learning paradigms are designed to assess learning simply in terms of response acquisition. Therefore, although demonstrating that infants possess remarkable learning capacities even in the first few months of life, these studies do not permit inferences to be drawn regarding the intentional status of these actions. Establishing the goal-directed status of an action requires the demonstration that it is performed specifically to bring about a particular outcome, rather than just that it is acquired when it is followed by an attractive effect. Determining the intentional status of an instrumental action in the absence of a reliable verbal report from the agent is, however, not an easy matter. To address this issue empirically, students of animal learning have developed a behavioral assay for intentional, goal-directed action by devaluing the outcome following instrumental training. The concept of goal-directedness assayed by this procedure is one in which the action is goal-directed if performance is mediated by the interaction between two representations (Dickinson, 1985, 1989): The first is a representation of the causal or contingent relationship between the action and the outcome, whereas the second is a representation of the current value of the outcome. It is mediation by the action– outcome representation that ensures that the action is directed toward obtaining the outcome, whereas the dependence on a representation of the outcome value ensures that the outcome functions as a goal in controlling performance. In the prototypical outcome devaluation experiment (e.g., Adams & Dickinson, 1981), animals are initially trained to perform an instrumental action to gain access to a particular outcome, which is subsequently devalued. Although the effectiveness of the devaluation treatment depends on the nature of the outcome, a common procedure is the induction of specific satiety for the outcome through simple exposure (Balleine & Dickinson, 1998b; Colwill & Rescorla, 1985). This devaluation treatment must occur Most human adults rarely regard the capacity for intentional, goal-directed instrumental action as a remarkable achievement. We do it all the time: We are able to perform an action in the belief that it will bring about a goal or a change in the world that fulfils our needs and desires. Critically, a course of intentional action can be selected and initiated even though at the time of performing the action, the goal is not physically present. By being able to rely on internal representations of the desired goal or outcome and of the causal action– outcome relationship, intentional, goal-directed action enables agents to transcend their current physical environment and extend their sphere of influence to objects and events beyond the immediate present and local space. Although the capacity for goal-directed action in normal human adults does not require empirical demonstration, the intentional status of young children’s instrumental actions is less clear. Studies of instrumental learning in infancy have shown that infants, and even newborns, are sensitive to the contingent consequences of their behavior (Kalnins & Bruner, 1973; Rovee & Rovee, 1969; Siqueland, 1969; see also Hauf, Elsner, & Aschersleben, 2004) and that they reliably acquire a new instrumental action when it immediately causes an attractive outcome, but not when an equivalent outcome is presented independently or noncontingently (e.g., Rovee-Collier, Morrongiello, Aron, & Kupersmidt, 1978; Siqueland & DeLucia, 1969). This learning was remarkably ro- U. M. H. Klossek, J. Russell, and A. Dickinson, Department of Experimental Psychology, University of Cambridge, Cambridge, England. This research was supported by the Biotechnology and Biological Sciences Research Council, Medical Research Council, and the University of Cambridge Behavioural and Clinical Neuroscience Institute, supported by a joint award from the Medical Research Council and the Wellcome Trust. We thank M. R. F. Aitken for his assistance. Correspondence concerning this article should be addressed to A. Dickinson, Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom. E-mail: [email protected] 39 40 KLOSSEK, RUSSELL, AND DICKINSON in the absence of the opportunity to perform the instrumental response to prevent the formation of any direct association between the instrumental action and the devaluation treatment. Finally, the impact of the outcome devaluation is assessed through the subsequent performance of the instrumental action. If the action is goal-directed, performance of the action trained with the now devalued outcome should be reduced in this test relative to a control condition in which the outcome is not devalued. The critical feature of this test is that postdevaluation performance is measured in extinction and therefore in absence of the outcome. Testing in extinction ensures that performance is mediated by the representations the instrumental contingency established during training rather than through the direct impact of the devalued outcome during the test. We applied the same logic to investigate whether young children represent their own instrumental actions in terms of specific action– outcome relationships. There is currently little evidence that young children are able to select a specific action on the basis of the current value or utility of the past outcome of the action. In fact, studies that examined problem solving or planning, using, for example, child-appropriate versions of standard adult planning tasks, such as the Tower-of-Hanoi disc-transfer puzzle (Simon, 1975), have often found that 2- to 3-year-olds perform relatively poorly (e.g., Klahr & Robinson, 1981; Kopp, O’Connor, & Finger, 1975; O’Sullivan, Mitchell, & Daehler, 2001; Welsh, 1991). However, these tasks were designed to assess relatively complex cognitive skills—such as reasoning, planning, and rule-guided behavior—and it remains unclear whether young children experience difficulty in acquiring and performing the more basic form of goal-directed action assayed by the outcome devaluation procedure. Moreover, there are reasons for believing that infants and young children might have difficulty selecting an action on the basis of the current value of the outcome. For example, they perform reaching responses and other actions that supposedly reflect goaldirectedness, problem solving, and means– end planning, such as pulling a cloth to retrieve an out-of-reach toy, even when there is no goal, that is, no toy to be retrieved (e.g., Smith, Thelen, Titzer, & McLin, 1999; Willatts, 1999). Young children also have the tendency to reach perseveratively in object retrieval tasks and frequently repeat a learned response even though the action continues to be ineffective in obtaining a desirable outcome (e.g., Fox, Kagan, & Weiskopf, 1979; Gratch & Landers, 1971; Schutte & Spencer, 2002; Spencer, Smith, & Thelen, 2001). The purpose of the present series of experiments was therefore to determine whether instrumental actions performed by young children are sensitive to devaluation of the outcome. Experiment 1 demonstrated that children are sensitive to the instrumental contingency between the target response (touching an icon on a touch sensitive screen) and the outcome (the presentation of a short video clip). Experiments 2 and 3 then investigated whether devaluing the outcome by specific satiety had an effect on the subsequent instrumental behavior of children ranging between 18 and 48 months of age. Finally, Experiment 4 confirmed that the sensitivity to outcome devaluation was mediated by the instrumental contingency between action and outcome rather than being due to the predictive significance of the icons to which the responses were directed. Experiment 1 To ascertain that the touch response could be brought under instrumental control, we trained children of the youngest age used in our experiments to reach out and touch two different screen areas marked by red and green butterfly icons. During acquisition, one of the actions was followed by short video clips featuring colorful animated scenes, whereas the other produced no outcome. Given that the video clips were effective rewards for the children, the children should have learned to perform the rewarded action in preference to the nonrewarded one if they were sensitive to the action– outcome contingency. The second phase assessed the effect of reversing the instrumental contingency so that the action rewarded during acquisition no longer produced the outcome, whereas the previously unrewarded response did. It was important to determine whether the target actions were sensitive to changes in the action– outcome contingency. Unless the children adapted to the shift in contingency by reversing their preference for the two actions, it was possible that response perseveration would have masked any sensitivity to outcome devaluation. Method Participants Participants were 5 boys and 3 girls, aged between 19 and 26 months, with a mean age of 22.8 months (SD ⫽ 2.5), who were recruited from independent day care centers. Four children did not complete the session because they became distracted (2) or because of interference by other children (2) and were therefore replaced. Apparatus and Stimuli The experiment was run on a Sony Vaio laptop computer (F808K and GRX315MP) connected to a stand-alone flat panel Taxan CV600 LCD monitor with an effective display area of 21.5 ⫻ 16 cm and a screen resolution of 640 ⫻ 480, which was equipped with a Microtouch capacitive touch-screen system. Two external computer speakers (HK 195) connected to the laptop were situated behind the touch screen. The software used for controlling stimulus presentation and recording of responses was written and compiled using Microsoft Visual Basic Professional 6.0. Two 9 ⫻ 7.5-cm icons of a red and green butterfly on a white background were displayed on the screen, one on the left side and the other on the right side. These icons acted as targets for the touch response. Eight short video clips from a children’s cartoon were used as the outcomes. During pretraining, four 10-s clips were used, whereas four different 24-s clips constituted the outcomes during acquisition and reversal. All clips were in color and included music and other sounds but had no explicit verbal content. Procedure In this and all of the following experiments, the children were tested while seated at a table within easy reach of the touchsensitive monitor in a familiar room in their regular day nursery. They were introduced to the apparatus and the investigator by a member of the nursery staff well-known to the child. The child’s OUTCOME DEVALUATION IN CHILDREN carer and the investigator then sat next to the child at the table for the duration of the session. In some cases, the child sat on the carer’s lap for part or all of the session. Pretraining. On pretraining trials, only one icon was displayed in alternation on either the right or left side of the screen. Touching the butterfly picture always caused the picture to disappear and triggered a 10-s video presentation. After observing the experimenter complete two demonstration trials, one with the left icon and the other with the right icon, the child completed two practice trials that were identical to the demonstration trials. The order of icon presentation (i.e., left icon first vs. right icon first) and icon color were counterbalanced across participants. Training and reversal. At the start of each acquisition trial, both butterfly icons were presented, thus providing the child with a choice of two possible responses. The actual response– outcome contingency assignments were such that only one of the responses was rewarded with a video outcome, whereas the other response produced no outcome. The first correct touch response on the activated icon therefore always completed a trial. For half of the children, touching the left icon was rewarded, whereas the remaining children needed to touch the right icon to obtain a video reward. As soon as the rewarded response had been performed, the butterfly pictures disappeared and a 24-s video clip was shown. Once a video clip ended, the video display disappeared, the butterfly pictures were shown again, and the next trial began. After the eight acquisition trials had been completed, the response– outcome contingency was reversed for a further eight trials. Results and Discussion Acquisition and reversal performance was assessed by a discrimination ratio in the form of the reciprocal of the total number of responses per trial. Because the first correct response always terminated a trial, a ratio of 1 represents perfect discrimination, with lower values indicating poorer discrimination. The ratios were analyzed by a mixed analysis of variance (ANOVA), with the between-children response variable differentiating performance when right or left responses were correct during acquisition. The within-child variables of phase and trial distinguished performance during acquisition and reversal and on the different trials of each phase, respectively. The reliability of the effects in this and all subsequent analyses was assessed against a Type I error rate of .05. The analysis revealed no significant effects of the response variable, and therefore we collapsed the discrimination ratios across this variable for presentation. As Figure 1 illustrates, the relative performance of the correct, rewarded response increased over trials, both during acquisition and following reversal of the original response– outcome relationships on Trial 9. This description was confirmed by the statistical analyses. The significant main effect of trial, F(1, 7) ⫽ 3.82, MSE ⫽ 0.08, did not interact with phase, F(7, 42) ⫽ 1.09, p ⫽ .386, MSE ⫽ 0.01. Separate analyses for linear trend using the coefficients ⫺7, ⫺5, ⫺3, ⫺1, 1, 3, 5, 7 revealed a significant linear trend for response reciprocals both during acquisition, F(1, 42) ⫽ 4.41, MSE ⫽ 0.49, and reversal, F(1, 42) ⫽ 4.16, MSE ⫽ 0.12, thereby demonstrating that there was a significant increase over trials in children’s tendency to perform the correct rewarded response relative to the incorrect, unrewarded response. 41 Figure 1. Mean discrimination ratios during two-trial blocks of acquisition and after reversal. Error bars represent standard error. In summary, children of the youngest age used in the present series of experiments showed an increasing and significant tendency to perform the action that caused the video outcome to occur. Moreover, when the action– outcome assignments were changed, the children reversed their responding accordingly. This experiment has therefore shown that video outcomes are effective rewards for children aged between approximately 1 and 2 years and that their touch responses are sensitive to the instrumental response–reward contingencies. After establishing these necessary prerequisites, the next experiment was designed to examine whether these instrumental actions are sensitive to outcome devaluation. Experiment 2 In the second experiment, children were initially trained to touch both colored butterfly icons. Touching one butterfly produced video clips from one cartoon series as the outcome, whereas touching the other icon produced clips from a second cartoon series. Following this instrumental training, one of the video outcomes was devalued by presenting it repeatedly to reduce its incentive value by inducing satiety for this video. To assess the effectiveness of this devaluation treatment, we gave half of the children a reacquisition test in which they were given the opportunity to obtain further video outcomes under conditions identical to training. If the incentive or goal value of the video outcome had been reduced by the devaluation procedure, the children should have responded less for this devalued video than for the other outcome, which should have retained its incentive value. The critical condition to assess whether children’s instrumental responses were goal-directed was the postdevaluation extinction test experienced by the remaining children. During the extinction test, children were also given the opportunity to perform the two instrumental responses on the same display that was used during training. In this condition, however, no outcomes were presented to ensure that children’s performance reflected the knowledge that had been acquired during instrumental training. If performance was mediated by knowledge about the specific response– outcome relationships experienced during training, and if children were able KLOSSEK, RUSSELL, AND DICKINSON 42 to integrate this knowledge with the information concerning the changes in outcome value, they should have performed the response trained with the devalued outcome less than the response that had been trained with the still valued outcome. To assess whether sensitivity to outcome devaluation varied with age, we recruited the children into three 10-month age bands between 16 and 48 months. Method Participants Seventy-two children aged between 17.7 and 46.7 months were allocated to the extinction condition (18 girls, 18 boys) and the reacquisition condition (17 girls, 19 boys). Within each condition, participants were divided into three 10-month age bands, each comprising approximately equal numbers of boys and girls (see Table 1): 16 to 26.9 months (Age Group 1), 27 to 36.9 months (Age Group 2), and 37 to 48 months (Age Group 3). An additional 43 children (10 in Age Group 1, 17 in Age Group 2, and 16 in Age Group 3) did not complete a full session and were replaced. The reasons were the following: interference by other children, the child’s carer, or other events (17); technical problems or experimenter error (13); spontaneously abandoning the task because of distraction (11); failure to acquire the required touch responses (2). Apparatus and Stimuli The touch-screen apparatus and butterfly icons were the same as those used in Experiment 1. Three 24-s video clips from each of two different and highly discriminable children’s cartoons were used as outcomes. All clips were in color and included music and sounds produced by the characters, but had no explicit verbal content. Procedure All except 2 of the children were tested at their regular day nursery in the same way as described in Experiment 1. The other 2 children completed the task at our departmental testing facility accompanied by their mothers. Unless stated otherwise, the general procedure was the same as in Experiment 1. Results and Discussion Table 1 Age Profile of the Children (in Months) Variable Experiment 2 Extinction condition Age Group 1 Age Group 2 Age Group 3 Reacquisition condition Age Group 1 Age Group 2 Age Group 3 Experiment 3 Age Group 1 Experiment 4 Age Group 2 Age Group 3 Instrumental training. During the 9-min instrumental training phase, the two butterfly icons were shown side by side in the center of the touch screen display. Initially, the investigator directed the child’s attention to the screen display and then demonstrated each touch response once, thereby triggering the appropriate video presentation. Thereafter, the child was encouraged to touch the pictures and received verbal encouragement and praise for doing so. Every touch on one of the pictures always triggered a clip from the same video series so that one response was always followed by a clip from one of the two cartoons and the other response by a clip from the other cartoon. The assignment of the cartoons to the responses was counterbalanced across children within each group. Simultaneous responses on both pictures did not trigger a video, and children were only allowed to use one hand at a time to touch the pictures. Successive touch responses on the same icon triggered the three successive clips from the appropriate cartoon in a fixed order, regardless of whether clips from the other cartoon were produced in between. Outcome devaluation. Following training, the butterfly icons disappeared and the three 24-s video clips from one of the two cartoons were repeated four times, each with an interval of 3 s between clips. For half the children, these clips were from one cartoon, with the remaining children being shown the clips from the other cartoon. Because response– outcome assignments and icon location were counterbalanced across participants, this also meant that for half of the children in each group, it was the video outcome associated with left responses that was devalued, and for the remaining half the video outcome associated with right responses was devalued. Extinction and reacquisition tests. During the tests, the children again had the same opportunity to touch the red and green butterfly icons as during instrumental training. For children in the extinction test, none of their responses triggered any video presentations for a period of 2 min. Thereafter, the icons were activated again for a final 4-min period, which was not part of the test but ended the session on a positive note by allowing the children once again to produce the videos. For the children in the reacquisition condition, touching the butterfly icons triggered video presentations in the same way as during training. To compensate for the time taken up by the video presentations, we designed the instrumental test in the reacquisition condition to last 6 min. Instrumental Training n M SD Range 12 12 12 22.8 32.4 41.1 2.4 2.8 3.5 17.7–26.0 28.2–36.9 37.5–46.7 12 12 12 22.8 32.1 41.4 2.6 2.5 2.5 18.5–25.9 28.1–35.7 38.7–45.9 24 22.5 2.6 16.2–26.3 16 16 31.0 40.1 2.2 2.2 27.2–34.4 36.1–43.9 The number of responses per minute was calculated for the time during which the children had opportunity to respond to the butterfly icons. Because the variance of the response rates increased with the mean, a square-root transformation was applied to these rates in this and all subsequent analyses before they were evaluated by a mixed ANOVA. The between-children variables were age and condition, which distinguished between performance of the children in the extinction and reacquisition conditions. The within-child variable of devaluation contrasted performance of the action trained with the video outcome that was subsequently devalued with performance of the action trained with the outcome that was still valued at the time of testing. As Table 2 illustrates, the mean rate of responding increased systematically with age, F(2, 60) ⫽ 14.0, MSE ⫽ 0.81. More OUTCOME DEVALUATION IN CHILDREN 43 Table 2 Mean Responses per Minute (Square-Root Transformed) During Instrumental Training To-be-devalued response Variable Experiment 2 Extinction condition Age Group 1 Age Group 2 Age Group 3 Reacquisition condition Age Group 1 Age Group 2 Age Group 3 Experiment 3 Age Group 1 Experiment 4 Age Group 2 Age Group 3 Valued response M SD (M SD) M SD (M SD) 4.6 8.0 11.1 3.5 7.2 5.0 (2.0 0.8) (2.6 1.2) (3.3 0.7) 6.4 8.4 9.4 5.0 5.0 5.3 (2.4 1.0) (2.8 0.8) (3.0 0.8) 5.0 7.6 9.9 3.3 4.2 3.3 (2.1 0.7) (2.7 0.8) (3.1 0.5) 3.7 6.7 10.3 3.2 4.2 5.7 (1.8 0.7) (2.5 0.8) (3.1 0.8) 4.6 4.0 (2.0 0.8) 4.5 4.0 (2.0 0.8) 3.6 3.3 1.9 2.3 (1.8 0.5) (1.7 0.6) 3.5 2.8 1.9 1.2 (1.8 0.5) (1.6 0.4) importantly, however, there was no evidence that the response rates for the actions trained with the “to-be-devalued” and valued outcomes differed reliably during training. There was no significant main effect of devaluation, nor did this variable enter into any significant interactions (Fs ⬍ 1). Extinction Test Because the response rates during instrumental training varied as a function of age, the rates during the tests were expressed as a percentage of the absolute training rate to minimize the contribution of between-children variance. In the extinction condition, omission of the video outcome on test produced rapid extinction of responding, and some children failed to respond at all in the second minute of the extinction test. Therefore, test rates are based on responding during the first minute of the extinction test. The top panel of Figure 2, which presents responding in the extinction test as a percentage of the training rate, shows that a substantial devaluation effect was observed in the two older groups in that these children performed the response trained with the now devalued outcome at a lower relative rate than the response trained with the valued outcome. By contrast, there was no evidence that the youngest children were sensitive to outcome devaluation, as the relative rates at which they performed valued and devalued responses were not markedly different. This description was confirmed by statistical analysis. Because the variance of the percentage responding during the tests increased with the mean (see Figure 2), this measure was square-root transformed prior to analysis in this and the subsequent analyses (see Table 3 for the transformed data). There was a significant Age ⫻ Devaluation interaction, F(1, 30) ⫽ 4.8, MSE ⫽ 27.1, and analyses of the simple main effects showed that the children performed the valued response more than the devalued one in Age Group 2, F(1, 11) ⫽ 12.0, MSE ⫽ 25.4, and Age Group 3, F(1, 11) ⫽ 19.5, MSE ⫽ 25.5, but not in Age Group 1 (F ⬍ 1). Furthermore, evidence that the magnitude of the devaluation effect varied with chronological age comes from the significant positive correlation between age and discrimination ratio, where the latter represents the number of valued responses as a ratio of the total number of responses (r ⫽ .36, p ⫽ .033). Reacquisition Test The pattern of responding during the test in the reacquisition condition shows that the absence of a devaluation effect in the extinction test for the youngest children was not due to a failure of the devaluation treatment to reduce the incentive value of the exposed video clips. As the bottom panel of Figure 2 and Table 3 Figure 2. Mean percentage responding by the different age groups during the test in the extinction condition (top panel) and reacquisition condition (bottom panel) for the responses trained with the valued and devalued outcomes. Error bars represent standard error. KLOSSEK, RUSSELL, AND DICKINSON 44 Table 3 Mean of the Square-Root Transformed Percentage Responding During the Tests Devalued response Variable Experiment 2 Extinction condition Age Group 1 Age Group 2 Age Group 3 Reacquisition condition Age Group 1 Age Group 2 Age Group 3 Experiment 3 Age Group 1 Experiment 4 Age Group 2 Age Group 3 Valued response M SD M SD 9.4 6.5 6.1 5.2 5.4 3.1 9.6 13.8 15.3 6.3 7.5 6.8 9.9 8.1 4.7 3.9 3.2 2.8 17.7 13.5 10.6 10.6 6.0 6.3 9.3 6.4 10.2 7.6 6.9 6.6 4.4 4.7 9.9 11.4 4.7 4.7 illustrate, when the video outcomes were presented contingent on responding during reacquisition, children of all ages performed the response that produced the devalued video less often than the one which produced the valued video, F(1, 30) ⫽ 22.9, MSE ⫽ 30.6. Moreover, the fact that the F ratio for the interaction between age and devaluation was less than 1 indicates that the magnitude of the devaluation effect did not vary with age. There was also a significant effect of devaluation in each age group, Fs(1, 11) ⬎ 4.9, and in contrast to the extinction condition, there was no significant correlation between age and relative performance of the valued response during the reacquisition test (r ⫽ .21, p ⫽ .22). In summary, the instrumental responses of children aged between 2 and 4 years appeared to be goal-directed. The significant devaluation effect observed in Age Groups 2 and 3 demonstrated that these children could integrate instrumental knowledge acquired during training with information about a subsequent change in the value of one of the outcomes and adapt their behavior appropriately. Unexpectedly, however, the present results did not provide evidence for goal-directed action in the 18- to 27-month-olds, as the children in this age group responded equally for valued and devalued outcomes in the extinction test. The performance of the youngest reacquisition group confirmed that the absence of a significant devaluation effect in the extinction condition was not due to a failure of youngest children to discriminate between the different types of video outcome, nor a result of the ineffectiveness of the devaluation procedure. When allowed to earn video outcomes directly in the reacquisition test, the youngest children did show a significant preference for the response whose outcome had not been devalued, indicating that the repeated presentations of the relevant video series prior to the test had been successful in reducing the incentive or goal value of the video outcome for this age group. Experiment 3 examined a possible reason why our devaluation procedure may have failed to detect the goal-directed nature of the actions of the youngest children. Experiment 3 In a series of studies, Rovee-Collier and colleagues (for a review, see Rovee-Collier, 1997) have shown that a failure of instrumental performance by young infants does not always reflect an absence of learning because the presentation of a reminder cue in the form of stimuli from the training context could reinstate performance. The presence of the reminder cue aided retrieval of the training memory and the deployment of this memory in the control of responding. Therefore, it is possible that the youngest children experienced difficulty in retrieving information about the outcome devaluation experience during the extinction test. The fact that the youngest children showed a substantial devaluation effect in the reacquisition condition suggests that actually presenting the video outcome during the test may have aided retrieval of information of the relative values of the two outcomes, thereby allowing the youngest children to manifest differential responding in accordance with outcome value. Therefore, we examined the effect of providing children in the youngest age group with a reminder cue in the form of a still picture from either the devalued or valued video during the extinction test in the hope that this cue would facilitate retrieval of information about the relative values of the two outcomes. If the absence of a devaluation effect in Experiment 2 was due to the fact that the youngest children had difficulty in retrieving this information, then the children should have been more likely to perform the devalued response less than the valued one in an extinction test conducted in the presence of a reminder cue. Method Participants, Apparatus, and Stimuli Twenty-four children (13 boys and 11 girls) matching the age profile of the youngest age group in Experiment 2 (see Table 1) were trained and tested with the same apparatus, stimuli, and video outcomes as in Experiment 2. An additional 16 children did not complete a full session and were replaced for the following reasons: abandoning the task spontaneously (5); fussing or crying (4); equipment malfunction or experimenter error (3); interference by other children, carers, or other events (2); failure to perform the required touch responses during instrumental training (2). Procedure The instrumental training and outcome devaluation procedures were the same as those used in Experiment 2. The children were trained to touch the two butterfly icons in order to be presented with different video outcomes, before the video that acted as the outcome for one of the responses was devalued by repeated exposure. The children then received an extinction test that was also identical to that in Experiment 2, except for the presence of a reminder cue. This 5 ⫻ 7-cm cue was a still picture taken from one of the video cartoons and located above the two butterfly pictures in the center of the screen. For half of the children, the cue was from the devalued video cartoon; the remaining half was presented with a cue from the valued cartoon. Touching the cue picture during the extinction test had no programmed consequences. OUTCOME DEVALUATION IN CHILDREN Results and Discussion As the response rates during instrumental training did not differ significantly for the children presented with the valued and devalued reminder cues during extinction testing (F ⬍ 1), the rates were collapsed for presentations. As Table 2 illustrates, these rates were very similar to those produced by the youngest children in the extinction and reacquisition conditions of Experiment 2, and indeed, the F ratio for the contrast between the response rates of children in the present experiment versus the youngest groups in Experiment 2 was less than 1. As in Experiment 2, the two responses were performed at the same rate during training (F ⬍ 1). During the extinction test, the presence of the reminder cues in the present experiment did not induce a devaluation effect. Neither the main effect of the reminder cue valence on the square-root transform of the percentage responding relative to the training rates (F ⬍ 1) nor the Cue Valence ⫻ Devaluation interaction, F(1, 20) ⫽ 1.1, p ⫽ .31, MSE ⫽ 37.7, was significant, so the test performance was collapsed across the valence of the reminder cue for presentation. The mean percentage response rates for the valued and devalued responses were 125.8 (SE ⫽ 35.4) and 158.8 (SE ⫽ 39.6), respectively; which, importantly, did not differ significantly following the square-root transformation (F ⬍ 1; see Table 3). The absence of a reliable devaluation effect was not due to the failure of the valence of the reminder cues to transfer to the extinction test. During this test, the children touched not only the target butterfly icons but also the reminder cues and did so more frequently in the case of the valued cue compared with the devalued cue, t(1, 22) ⫽ 2.17. The mean number of contacts with the devalued cue was 1.0 (SE ⫽ 0.5), but 3.5 (SE ⫽ 1.0) with the valued cue. Seven of the 12 children presented with the devalued cue did not touch it, whereas all but 1 of the 12 children touched the valued cue at some point during the test. Given these data, we think that it is unlikely that absence of a devaluation effect for the youngest children was due to a failure to retrieve information about the relative values of the two outcomes during the test. Rather, it is more likely that the process controlling their touch responding was insensitive to representations of the current values of the outcomes. As mediation by a representation of outcome value is one of our criteria for goal-directedness (see the introduction), responding by the youngest children was not goal-directed by this criterion. We consider the nature of this process in the General Discussion. However, the differential responding to the valued and devalued reminder cues not only strengthens the interpretation of the insensitivity of the youngest children to outcome devaluation but also raises interpretative issues for the outcome devaluation effect observed in the older children. touched the valued reminder cue without explicit training or instruction demonstrates that attractive visual stimuli can directly elicit touch responses. At issue, therefore, is whether the children touched the butterfly icons because they learned that these responses produced valued outcomes or because the icons themselves became attractive stimuli through their association with the video outcomes and, as a result, became capable of directly eliciting the response. In other words, the question is whether icon touching was an instrumental response generated by the specific action– outcome contingency or an elicited response generated by the predictive relationship between the butterfly icons and their associated outcomes. By our criteria for goal-directedness (Dickinson, 1985, 1989; Dickinson & Balleine, 1993), only in the former case would we characterize the touch response as goaldirected. The fact that touch responding by the older children was sensitive to outcome devaluation does not decide the issue because there is extensive evidence that a cue associated with an intrinsically attractive event also loses its attractiveness and hence its capacity to elicit approach responses when the associated event is devalued (e.g., Colwill & Motzkin, 1994). To determine whether a behavior is under instrumental control, one must show that it is mediated by the relationship between the action and its specific outcome rather than by an association between the outcome and a discrete stimulus, such as the butterfly icon. To this end, the relationships between the stimuli and outcomes need to be held constant, so only the specific action under investigation is uniquely related to the outcome that it produces. This condition is instantiated in the so-called bidirectional paradigm (Grindley, 1932), which is illustrated by the modification of our procedure used in Experiment 4. During instrumental training, the children were presented with a single icon in the center of the screen that they first had to touch and then drag, either to the left or to the right, in order to produce video outcomes. Dragging the icon to the left produced video clips from one cartoon and dragging it to the right produced clips from the other cartoon. Although the initial touch response directed at the icon could have been elicited by the conditioned attractiveness of the icon through its association with the outcomes, the contingency between each drag response and its video outcome was entirely arbitrary. Consequently, the direction of the drag response could not have been mediated by a stimulus– outcome association and must have reflected a sensitivity to the instrumental contingency. At issue, therefore, is whether these actions were sensitive to outcome devaluation when tested in extinction. As only the older children showed an outcome devaluation effect in Experiment 2, we assessed the effect of the devaluation on the bidirectional drag response in children in the two older age bands. Method Experiment 4 The key finding of Experiment 2 was the significant devaluation effect observed in the two older age groups, which suggests that the children of approximately 32 months and above were capable of goal-directed action that was based on knowledge of the instrumental action– outcome contingency. Strictly speaking, however, the procedure employed in Experiment 2 does not warrant this conclusion. The fact that children in Experiment 3 spontaneously 45 Participants Thirty-two children (16 girls and 16 boys) aged between 27 and 48 months were divided into two 10-month age bands corresponding to those for Age Groups 2 and 3 in Experiment 2 (see Table 1). An additional 17 children in Age Group 2 and 13 children in Age Group 3 did not complete a full session and were replaced for the following reasons: interference by others or external events (8), 46 KLOSSEK, RUSSELL, AND DICKINSON failure to acquire the required instrumental actions (6), failure to complete the minimum number of responses during instrumental training (6), equipment malfunction and experimenter error (5), abandoning the task because of distraction or fussing (5). Procedure Unless otherwise stated, the apparatus, stimuli, video outcomes, and general procedure were the same as those used in Experiment 2. In the present experiment, the instrumental actions required moving a small butterfly-shaped icon in different directions across the display. The display used during both training and test showed a small, 2 ⫻ 2-cm butterfly picture in the center of the screen, which served as a response manipulandum for the bidirectional action. When this square was touched, a smaller, red butterfly shape appeared under the child’s finger that could then be dragged across the touch-sensitive display as long as contact with the screen was maintained. If the butterfly icon was dragged far enough onto one of the two invisible trigger areas (each representing a vertical 5 ⫻ 16-cm strip) on the far left and far right of the display and then released, a video clip was presented. If the icon was dragged and released anywhere else on the screen, it disappeared and had to be picked up again from the center, and no video clip was shown. However, drags to the left and right that fell just short of the trigger areas and thus did not produce a video outcome were recorded as an instrumental response if they fell within the immediately adjacent area, which was of the same height as and half the width of each trigger area. Action– outcome assignments and type of video devalued were counterbalanced across children in each age group but were always consistent for individual children so that, for example, a left drag would always produce clips from one cartoon, whereas a drag to the right would start a video clip from the other cartoon. Pretraining. To train the children to perform the required instrumental actions, the experimenter demonstrated and explained the movement sequence to the child on the first two practice trials. Thereafter, the child completed three practice trials with each action. If necessary, the experimenter demonstrated the icon pick-up and icon drag sequence again during these trials. During pretraining, only two short, 8-s clips, one from each series, were used as outcomes. Instrumental training. After pretraining, three new 12-s clips from each cartoon were used as instrumental outcomes. Initially, each response was trained separately. Half of the children were first trained to perform drags to the left, whereas the remaining half performed drags to the right first. In each case, one half of the screen was blacked out, so that only one of the two responses could be performed until five video outcomes had been obtained by dragging the icon in the relevant direction. After both responses had received this separate training, the whole screen was shown, and the child was free to make responses in either direction until at least four video outcomes from each series had been triggered or 4 min had elapsed, depending on which criterion was met first. In the latter case, to proceed to the next stage, a child had to produce at least two outcomes for each response to ensure that she or he had experienced both action– outcome contingencies. Including the five outcomes earned by each response during the single response training, this criterion ensured that the effect of outcome devaluation was assessed after each response had earned no less than seven outcomes but not more than nine. Outcome devaluation. The outcome devaluation was similar to that used in Experiment 2. One set of three 12-s video clips was repeated five times with an interclip interval of 3 s. The devalued video outcome was counterbalanced with respect to the response to which it was assigned, left or right, and the cartoon from which it came. Extinction test. Following outcome devaluation, the children were free to perform both actions for 2 min. However, neither response was followed by a video outcome. After the extinction test, the responses once again generated their respective outcomes for 2 min. Results and Discussion Instrumental performance was assessed during the period of instrumental training, in which the children were free to make both responses. As Table 2 shows, prior to devaluation there was no difference in the rate at which children performed the two actions. Following square-root transformation of the response rates, the F ratios for the effect of devaluation and the interaction of this variable with age were both less than 1. Age also did not significantly affect responding, F(1, 28) ⫽ 1.78, MSE ⫽ 0.17, p ⫽ .19. As Figure 3 and Table 3 illustrate, the children showed a devaluation effect in the extinction test by performing the response trained with the valued outcome relatively more than those trained with the devalued outcome, F(1, 28) ⫽ 7.86, MSE ⫽ 29.6. However, we can only be confident about the reliability of this devaluation effect for Age Group 3. Although there was no reliable interaction between age and devaluation (F ⬍ 1), planned comparisons revealed that the effect of devaluation was significant for the older children, F(1, 14) ⫽ 6.5, MSE ⫽ 27.1, but not for those in the younger Age Group 2, F(1, 28) ⫽ 2.14, MSE ⫽ 31.98, p ⫽ .17. The significant overall devaluation effect, which did not interact with age, is consistent with the claim that the children’s instrumental performance was primarily mediated by knowledge of the Figure 3. Mean percentage responding by the different age groups during the test for the responses trained with the valued and devalued outcomes. Error bars represent standard error. OUTCOME DEVALUATION IN CHILDREN action– outcome contingencies. Even when both actions were directed toward a common response manipulandum, children over 3 years of age were able to encode a particular outcome in relation to a representation of the specific action that had produced it during training. Although the apparent devaluation effect in the test performance of children with a mean age of 31 months leads to the same conclusion for the younger children, the fact that the effect was not statistically reliable for Age Group 2 in the absence of an interaction with age makes the results difficult to interpret. If action– outcome learning did take place during training in both age groups, this learning was apparently less effective in controlling instrumental performance in the younger compared with the older group. General Discussion The main finding from this series of experiments is that children over 3 years of age are capable of goal-directed action as assessed by the outcome devaluation paradigm. In the absence of the outcomes during the extinction tests, these children performed both stimulus-directed responses (Experiment 2) and arbitrary bidirectional actions (Experiment 4) at a reduced level if the training outcome had been previously devalued. To do so, at least in the case of the bidirectional action, they must have encoded the specific contingency between action and outcome during training before integrating this information with a representation of the current value of the outcome. This capacity meets the criteria for goal-directed action offered by Dickinson and colleagues (Dickinson, 1985, 1989; Dickinson & Balleine, 1993). There are two classes of psychological accounts of goal-directed behavior, the associative and the cognitive, both of which come in a variety of forms. Associative two-process theories assume that the pairing of the stimulus (S) with the outcome (O) establishes a stimulus– outcome association (S 3 O) through a Pavlovian learning process. The second, instrumental learning process then generates an outcome–response association (O 3 R) through experience with the response– outcome contingency, thereby allowing performance to be controlled by an S 3 O 3 R associative chain. On the basis of the assumption that the devaluation treatment generally decreases the excitability of the outcome representation, this model predicts reduced responding following devaluation. When expressed in terms of folk psychology, the sight of a butterfly icon made the children think of the associated outcome, which in turn made them think of the response that had produced that outcome. If outcome devaluation decreased the likelihood that the children were thinking of the devalued outcome, they would have been less likely to perform the devalued response. Whether or not two-process theory can explain the devaluation effects observed in the present experiments, and especially Experiment 4, depends on the nature of the instrumental learning process that generates the O 3 R link of the chain. According to Trapold and Overmier (1972), this link arises from the reinforcement of an association between the activated representation of the outcome that is excited by the stimulus and the response through the classic stimulus–response reinforcement mechanism (Hull, 1943; Thorndike, 1911). However, the demonstration of a devaluation effect with the bidirectional paradigm in Experiment 4 is problematic for this account. As the single visual icon was equally paired with both outcomes, each response would have been reinforced in 47 the presence of activated representations of both outcomes so that devaluation of either of the outcomes should have had a comparable effect on both responses. This problem does not arise in version of two-process theory espoused by Pavlov (1932) and his students (Asratyan, 1974; Gormezano & Tait, 1976), who assumed the O 3 R association arises directly from the pairing of each response with its outcome. As a consequence, each association is both outcome and response specific. A devaluation effect with the bidirectional response is also not problematic for the associative– cybernetic model of goaldirected behavior (Dickinson, 1994; Sutton & Barto, 1981; Thorndike, 1931). According to this theory, the sight of the butterfly icon causes the child to think of the two available responses, a left drag and a right drag, which would each in turn retrieve a thought of the associated outcome through the learned R 3 O associations. If the outcome is valuable, the theory assumes that the activation of the outcome representation feeds back on the response representation or thought to cause the child to perform it. Therefore, this theory assumes that a goal-directed action is mediated through an S 3 R 3 O associative chain, in which the activation of the response by the stimulus is not sufficient to generate overt responding without excitatory feedback from an activated and positively evaluated outcome representation. When expressed more colloquially, the R 3 O association enables the child to evaluate covertly the consequences of each action before deciding which one to perform. Cognitive theories of goal-directed behavior also come in a variety of forms. There are a gamut of action theories (Greve, 2001), which assume that human goal-directed behavior is mediated by explicit, propositional-like representations of instrumental contingencies, whereas causal model theory argues that agents construct models of the causal efficacy of their actions (Waldmann & Walker, 2005). The scope of these cognitive theories is not restricted to human action and, in one form or another, they have also been applied to goal-directed behavior of nonhuman animals (Blaisdell, Sawa, Leising, & Waldmann, 2006; Dickinson, 1980; Heyes & Dickinson, 1990). It remains an important research issue whether the capacity of young children for goal-directed action depends on associative or cognitive processes. In contrast to the oldest children, the sensitivity of the two younger age groups to outcome devaluation was less conclusive. Although the same overall pattern of results was obtained in the 27- to 36-month-olds, planned subgroup analyses in Experiment 4 failed to yield a significant devaluation effect in this group when a bidirectional response procedure was used. The rationale for using a bidirectional assay was to establish the mediation of responding by a representation of the action– outcome relationship rather than a stimulus– outcome association. Given that the devaluation effect was highly reliable for Age Group 2 in Experiment 2, which used a procedure that may have engaged stimulus– outcome learning, one possible interpretation is that the period between 2 and 3 years of age brings about a transition in behavioral control from stimulus– outcome learning to fully intentional goal-directed action. For the youngest children, on the other hand, we could find no evidence that their responding was sensitive to outcome devaluation. Only when their responses actually produced the outcomes, as in the reacquisition test in Experiment 2, did they respond less for the devalued outcome than for the valued one, which estab- 48 KLOSSEK, RUSSELL, AND DICKINSON lished that the devaluation treatment was effective for the 18- to 27-month-olds. Because the presentation of the outcomes during the reacquisition test may have reminded these children of the different values of the outcomes, we presented reminder cues for the video outcomes during the extinction test in Experiment 3 to parallel the conditions in the reacquisition test. However, the results indicated that the absence of a devaluation effect in the youngest group was not due to a failure of the outcome devaluation to transfer to the test of performance in extinction, as devalued and valued responses were performed at equivalent rates even when children’s responses to the reminder cues concurrently registered the differential value of the outcomes. Although our procedure in Experiment 3 therefore showed effective transfer of current outcome value, it is possible that presenting the reminder cues throughout the instrumental test may not have been the most effective way to aid retrieval of children’s training memories. Effective reminder treatments have involved a pre-cuing procedure (e.g., Spear & Parsons, 1976), using either a salient cue from the original training context as in the present study (e.g., Rovee-Collier, Sullivan, Enright, Lucas, & Fagen, 1980), or a brief period of retraining administered in the original training context (e.g., Campbell & Jaynes, 1966; see also Adler, Wilk, & Rovee- Collier, 2000). It is also possible that the youngest children’s insensitivity to outcome devaluation reflected the fact that they did not receive sufficient training to learn about the action– outcome contingencies. Although the duration of instrumental training was identical in all three age groups, the older children in Experiment 2 responded at a faster rate and therefore, on average, experienced a greater number of actions that were followed by an outcome (Age Group 2 M ⫽ 18.8, SE ⫽ 0.6; Age Group 3 M ⫽ 19.6, SE ⫽ 0.3) than the youngest children (Age Group 1 M ⫽ 16.9, SE ⫽ 0.9). Although the differences between these group means is not large, younger children may simply require relatively more training than older children for reliable action– outcome learning. As a result, the total number of responses made during training in the youngest group may still have failed to provide the critical level of exposure to the instrumental contingencies necessary for effective encoding of the relevant relationships at this age (see also Barr, Dowden & Hayne, 1996). In evaluating this explanation, however, it should be noted that animal studies of the effect of amount of training on sensitivity to outcome devaluation found that instrumental behavior is initially established as goal-directed and then becomes autonomous of the current goal value with further training (Adams, 1982; Dickinson, Balleine, Watt, Gonzalez, & Boakes, 1995; Holland, 2004; Killcross & Coutureau, 2003). Therefore, we consider it unlikely that the insensitivity of the youngest children to outcome devaluation was due to insufficient training. Finally, it also seems implausible that the failure to detect a devaluation effect in the youngest children was due to a lack of power in our tests. A contrast between the performance of the devalued and valued actions of the children in Age Group 1 of Experiments 2 and 3 combined (n ⫽ 36) yielded a t value less than 1 with a power greater than .98 to detect a difference as large as that exhibited by Age Group 2 in Experiment 2. In summary, the learning process that mediated responding by the youngest children in our task remains undetermined. Although the 18- to 27-month-olds appeared to have access to the current, relative values of the outcomes and clearly remembered how to perform the instrumental actions on test, their knowledge about the current incentive value of the outcomes did not seem to impact the processes responsible for the control of responding. As already noted, such resistance to outcome devaluation has often been observed in nonhuman animals following instrumental training even in the presence of the devalued outcome, as in Experiment 3 (Dickinson, Nicholas, & Adams, 1983), and especially following overtraining (e.g., Adams, 1982; Dickinson et al., 1995; Holland, 2004). On the basis of these findings, Dickinson and colleagues (Dickinson, 1985, 1989; Dickinson et al., 1995) have argued that instrumental training engages concurrently two different learning systems, one underlying goal-directed action and the other mediating stimulus–response learning, which does not encode the identity of the outcome. In human adults, the operation of this latter, habitual mechanism is clearly revealed when we make a slip-ofaction because a particular stimulus elicits a well-trained but unintended response (Reason, 1992). Therefore, according to this theory, the goal-directed and habitual systems are in competition for the control of behavior, with the particular training history and test conditions favoring one system over the other. The distinction between the goal-directed and habitual systems has recently been validated in animal studies by their dissociation in the rat prefrontal cortex (Balleine & Dickinson, 1998a; Corbit & Balleine, 2003; Coutureau & Killcross, 2003; Killcross & Coutureau, 2003; Ostlund & Balleine, 2005) and dorsal (Yin, Knowlton, & Balleine, 2004, 2005) and ventral striatum (Corbit, Muir, & Balleine, 2001). Critically, the acquisition of goal-directed action depends on the integrity of the prelimbic area of the rat prefrontal cortex (Ostlund & Balleine, 2005), which is thought to be homologous to dorsal prefrontal structures in the primate brain (Preuss, 1995; Rushworth, Walton, Kennerley, & Bannermann, 2004). Moreover, sensitivity to outcome devaluation is impaired by lesions of the orbital frontal cortex in monkeys (Baxter, Parker, Lindner, Izquierdo, & Murray, 2000; Izquierdo, Suda, & Murray, 2004). This analysis raises the possibility that the actions of the youngest children were mediated by the habitual system, which accords with findings suggesting that 2-year-olds are significantly less skilled and more susceptible to perseverative, stimulus-controlled performance on problem solving and reasoning tasks than 3-yearold children (e.g., DeLoache & Sharon, 2005; O’Sullivan et al., 2001). Moreover, the development of the capacity for goaldirected action, at least as assessed by outcome devaluation, may depend on the maturation of the prefrontal cortex, which is known to undergo progressive development throughout early childhood (see e.g., Giedd et al., 1999; J. Huttenlocher, 1979; P. R. Huttenlocher, 1990, 2002; Jernigan et al., 1991; Mrzlijak, Uylings, van Eden, & Judas, 1990; Pfefferbaum et al., 1993; Sowell, Delis, Stiles, & Jernigan, 2001). Nonetheless, the failure to find any evidence of truly goaldirected action in the youngest children in the present study would seem to conflict with claims made by numerous studies regarding the appreciation of goal-directed action in much younger infants and toddlers. On closer inspection, however, the conflict may be more apparent than real. First, while our studies concerned children’s appreciation of the linkage between their own actions and arbitrarily related, not currently perceived goal states, experimental studies of goal-directed action in infants have often investigated surprise reactions (assessed by looking time; for a review, see OUTCOME DEVALUATION IN CHILDREN Schöner & Thelen, 2006) to the anomalous (nonrational) acts of other agents (e.g., Gergely & Csibra, 2003; Woodward, 1998). For a recent attempt to explain discrepancies between abilities measured by looking time and the later capacity to take action or frame a judgment on the basis of such knowledge, see Russell (in press). Secondly, demonstrations of insightful goal-directed behavior in older infants and toddlers have tended to employ the technique of deferred imitation, in which a novel action on a novel object is demonstrated on day one, not practiced at that time, and then evoked days or weeks afterwards (for a review, see e.g., Meltzoff, 2002). But although these studies tell us that children much younger than those who failed to demonstrate a devaluation effect here can remember and reproduce the particular form of another’s bodily act (Meltzoff, 1988) and may even appreciate why it was constrained to be that particular way (Gergely, Bekkering, & Király, 2002), no knowledge is being evoked of how their own actions, established by instrumental learning, relate to rewarding outcomes arbitrarily linked to them. Moreover, when viewed in a different light, our data accord rather well with those from another area of cognitive– developmental research: the early development of executive function. Executive-inhibition tasks are typically those in which a prepotent lure has to be resisted while holding an arbitrary rule in working memory. It is generally claimed that between 3 and 4 years of age performance on executive-inhibition tasks becomes efficient (Perner & Lang, 1999; Russell, Mauthner, Sharpe, & Tidswell, 1991). However, when simpler executive-inhibition tasks are used, such as those requiring detour reaching plus an arbitrary act (Bı́ro, 2001; Russell, Judah, & Johnson, 2007), 3 years is the approximate age at which performance becomes less stimulus-controlled and less susceptible to the perseverative repetition of a habitual response. Whatever the merits of the hypothesis that the transition in behavioral control to fully intentional goal-directed action occurs between 2 and 3 years, the present study has provided an empirical demonstration that by the time they are 3 years old, children are capable of true goal-directed action because they can pursue specific goals that are not in their immediate perceptual field in a way that is directly sensitive to changes in the current value of the goal and mediated by knowledge of the causal consequences of their actions. This capacity is an important component of becoming a fully autonomous intentional agent. References Adams, C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology: Comparative and Physiological Psychology, 34(B), 77–98. Adams, C. D., & Dickinson, A. (1981). Instrumental responding following reinforcer devaluation. The Quarterly Journal of Experimental Psychology: Comparative and Physiological Psychology, 33(B), 109 –121. Adler, S. A., Wilk, A., & Rovee-Collier, C. (2000). Reinstatement versus reactivation effects on active memory in infants. Journal of Experimental Child Psychology, 75, 93–115. Asratyan, E. A. (1974). Conditional reflex theory and motivational behaviour. Acta Neurobiologiae Experimentalis, 34, 15–31. Balleine, B. W., & Dickinson, A. (1998a). Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology, 37, 407– 419. Balleine, B. W., & Dickinson, A. (1998b). The role of incentive learning 49 in instrumental outcome revaluation by sensory-specific satiety. Animal Learning & Behavior, 26, 46 –59. Barr, R., Dowden, A., & Hayne, H. (1996). Developmental changes in deferred imitation by 6- to 24-month-old infants. Infant Behavior and Development, 19, 159 –170. Baxter, M. G., Parker, A., Lindner, C. C. C., Izquierdo, A. D., & Murray, E. A. (2000). Control of response selection by reinforcer value requires interaction of amygdala and orbital prefrontal cortex. Journal of Neuroscience, 20, 311– 4319. Bı́ro, S. (2001). The role of the arbitrariness of actions in autistic executive functions. Unpublished doctoral dissertation, University of Cambridge, Cambridge, England. Blaisdell, A. P., Sawa, K., Leising, K. J., & Waldmann, M. R. (2006, February 17). Causal reasoning in rats. Science, 311, 1020 –1022. Campbell, B. A., & Jaynes, J. (1966). Reinstatement. Psychological Review, 73, 478 – 480. Colwill, R. M., & Motzkin, D. K. (1994). Encoding of the unconditioned stimulus in Pavlovian conditioning. Animal Learning & Behavior, 22, 384 –394. Colwill, R. M., & Rescorla, R. A. (1985). Postconditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes, 11, 120 –132. Corbit, L. H., & Balleine, B. B. (2003). The role of prelimbic cortex in instrumental conditioning. Behavioral Brain Research, 146, 145–157. Corbit, L. H., Muir, J. L., & Balleine, B. W. (2001). The role of the nucleus accumbens in instrumental conditioning: Evidence of a functional dissociation between accumbens core and shell. Journal of Neuroscience, 21, 3251–3260. Coutureau, E., & Killcross, S. (2003). Inactivation of the infralimbic prefrontal cortex reinstates goal-directed responding in overtrained rats. Behavioural Brain Research, 146, 167–174. DeLoache, J. S., & Sharon, T. (2005). Symbols and similarity: You can get too much of a good thing. Journal of Cognition and Development, 6, 33– 49. Dickinson, A. (1980). Contemporary animal learning theory. Cambridge, England: Cambridge University Press Dickinson, A. (1985). Actions and habits: The development of behavioural autonomy. Philosophical Transactions of the Royal Society of London, Series B, 308, 67–78. Dickinson, A. (1989). Expectancy theory in animal conditioning. In S. B. Klein & R. R. Mowrer (Eds.), Contemporary learning theories: Pavlovian conditioning and the status of traditional learning theory (pp. 279 –308). Hillsdale, NJ: Erlbaum. Dickinson, A. (1994). Instrumental conditioning. In N. J. Mackintosh (Ed.), Animal learning and cognition (pp. 45–79). San Diego, CA: Academic Press. Dickinson, A., & Balleine, B. (1993). Actions and responses: The dual psychology of behaviour. In N. Eilan & R. McCarthy & B. Brewer (Eds.), Spatial representation (pp. 276 –293). Oxford, England: Blackwell. Dickinson, A., Balleine, B., Watt, A., Gonzalez, F., & Boakes, R. A. (1995). Motivational control after extended instrumental training. Animal Learning & Behavior, 23, 197–206. Dickinson, A., Nicholas, D. J., & Adams, C. D. (1983). The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Quarterly Journal of Experimental Psychology: Comparative and Physiological Psychology, 35(B), 35–51. Fox, R., Kagan, J., & Weiskopf, S. (1979). The growth of memory during infancy. Genetic Psychology Monographs, 99, 91–130. Gergely, G., Bekkering, H., & Király, I. (2002, February 14). Rational imitation in preverbal infants. Nature, 415, 755. Gergely, G., & Csibra, G. (2003). Teleological reasoning in infancy: The naive theory of rational action. Trends in Cognitive Sciences, 7, 287– 292. 50 KLOSSEK, RUSSELL, AND DICKINSON Giedd, J. N., Blumenthal, J., Jeffries, N. O., Castellanos, F. X., Liu, D., Zijdenbos, A., et al. (1999). Brain development during childhood and adolescence: A longitudinal MRI study. Nature Neuroscience, 2, 861– 863. Gormezano, I., & Tait, R. W. (1976). The Pavlovian analysis of instrumental conditioning. Pavlovian Journal of Biological Sciences, 11, 37–55. Gratch, G., & Landers, W. F. (1971). Stage IV of Piaget’s theory of infant’s object concepts: A longitudinal study. Child Development, 42, 359 –372. Greve, W. (2001). Traps and gaps in action explanation: Theoretical problems of a psychology of human action. Psychological Review, 108, 435– 451. Grindley, G. C. (1932). The formation of a simple habit in guinea pigs. British Journal of Psychology, 23, 127–147. Hauf, P., Elsner, B., & Aschersleben, G. (2004). The role of action effects in infants’ action control. Psychological Research, 68, 115–125. Heyes, C., & Dickinson, A. (1990). The intentionality of animal action. Mind and Language, 5, 87–103. Holland, P. C. (2004). Relations between Pavlovian-instrumental transfer and reinforcer devaluation. Journal of Experimental Psychology: Animal Behavior Processes, 30, 104 –117. Hull, C. L. (1943). Principles of behavior. New York: Appleton. Huttenlocher, J. (1979). Synaptic density in human frontal cortex: Developmental changes and effects of ageing. Brain Research, 163, 195–205. Huttenlocher, P. R. (1990). Morphometric study of human cerebral cortex development. Neuropsychologia, 28, 517–527. Huttenlocher, P. R. (2002). Morphometric study of human cerebral cortex development. In M. H. Johnson (Ed.), Brain development and cognition: A reader (2nd ed., pp. 117–128). Malden, MA: Blackwell. Izquierdo, A., Suda, R. K., & Murray, E. A. (2004). Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. Journal of Neuroscience, 24, 7540 –7548. Jernigan, T., Archibald, S. L., Berhow, M. T., Sowell, E. R., Foster, D. S., & Hesselink, J. R. (1991). Cerebral structure on MRI, Part I: Localization of age-related changes. Biological Psychiatry, 29, 55– 67. Kalnins, I., & Bruner, J. S. (1973). The coordination of visual observation and instrumental behaviour in early infancy. Perception, 2, 307–314. Killcross, S., & Coutureau, E. (2003). Coordination of actions and habits in the medial prefrontal cortex of rats. Cerebral Cortex, 13, 400 – 408. Klahr, D., & Robinson, M. (1981). Formal assessment of problem-solving and planning processes in preschool children. Cognitive Psychology, 13, 113–127. Kopp, C. B., O’Connor, M. J., & Finger, I. (1975). Task characteristics and a Stage VI sensorimotor problem. Child Development, 46, 569 –573. Meltzoff, A. N. (1988). Infant imitation after a 1-week delay: Long-term memory for novel acts and multiple stimuli. Developmental Psychology, 24, 470 – 476. Meltzoff, A. N. (2002). Imitation as a mechanism of social cognition: Origins of empathy, theory of mind, and the representation of action. In U. Goswami (Ed.), Handbook of child cognitive development (pp. 6 –25). Oxford, England: Blackwell. Mrzlijak, L., Uylings, H. B. M., van Eden, J., & Judas, M. (1990). Neuronal development in human prefrontal cortex in prenatal and postnatal states. In H. B. M. Uylings, J. van Eden, M. A. de Bruin, M. A. Corner, & M. G. P. Feenstra (Eds.), The prefrontal cortex: Its structure, function, and pathology (Vol. 85, pp. 185–222). Amsterdam: Elsevier. Ostlund, S. B., & Balleine, B. W. (2005). Lesions of medial prefrontal cortex disrupt the acquisition but not the expression of goal-directed learning. The Journal of Neuroscience, 25, 7763–7770. O’Sullivan, L. P., Mitchell, L. L., & Daehler, M. W. (2001). Representation and perseveration: Influences on young children’s representational insight. Journal of Cognition and Development, 2, 339 –365. Pavlov, I. P. (1932). The reply of a physiologist to a psychologist. Psychological Review, 39, 91–127. Perner, J., & Lang, B. (1999). Development of theory of mind and executive control. Trends in Cognitive Sciences, 3, 337–344. Pfefferbaum, A., Mathalon, D. H., Sullivan, E. V., Rawles, J. M., Zipursky, R. B., & Lim, K. O. (1993). A quantitative MRI study of changes in brain morphology from infancy to late adulthood. Archives of Neurology, 51, 874 – 887. Preuss, T. (1995). Do rats have a PFC? Journal of Cognitive Neuroscience, 7, 1–24. Reason, J. (1992). Human error. Cambridge, England: Cambridge University Press. Rovee, C. K., & Rovee, D. T. (1969). Conjugate reinforcement of infant exploratory behavior. Journal of Experimental Child Psychology, 8, 33–39. Rovee-Collier, C. K. (1983). Infants as problem-solvers: A psychobiological perspective. In M. D. Zeiler & P. Harzem (Eds.), Advances in the analysis of behaviour (Vol. 3, pp. 63–101). Oxford, England: Wiley. Rovee-Collier, C. (1997). Dissociations in infant memory: Rethinking the development of implicit and explicit memory. Psychological Review, 104, 467– 498. Rovee-Collier, C., Morrongiello, B. A., Aron, M., & Kupersmidt, J. (1978). Topographical response differentiation in three-month-old infants. Infant Behavior and Development, 1, 323–333. Rovee-Collier, C. K., Sullivan, E. V., Enright, M. K., Lucas, D., & Fagen, J. W. (1980, June 6). Reactivation of infant memory. Science, 208, 1159 –1161. Rushworth, M. F. S., Walton, M. E., Kennerley, S. W., & Bannermann, D. M. (2004). Action set and decisions in the medial frontal cortex. Trends in Cognitive Sciences, 8, 410 – 416. Russell, J. (in press). Controlling core knowledge: Conditions for the ascription of intentional states to self and others by children. Synthêse. Russell, J., Judah, G., & Johnson, S. (2007). Performance in a third-person versus a first-person version of an executive task in preschool children. Manuscript submitted for publication. Russell, J., Mauthner, N., Sharpe, S., & Tidswell, T. (1991). The “windows task” as a measure of strategic deception in preschoolers and autistic subjects. British Journal of Developmental Psychology, 9, 331–349. Schöner, G., & Thelen, E. (2006). Using dynamic field theory to rethink infant habituation. Psychological Review, 113, 273–299. Schutte, A. R., & Spencer, J. P. (2002). Generalizing the dynamic field theory of the A-not-B error beyond infancy: Three-year-olds’ delay- and experience-dependent location memory biases. Child Development, 73, 377– 404. Simon, H. A. (1975). The functional equivalence of problem-solving skills. Cognitive Psychology, 7, 268 –288. Siqueland, E. R. (1969, March). The development of instrumental exploratory behaviour during the first year of human life. Paper presented at the meeting of the Society for Research in Child Development, Santa Monica, CA. Siqueland, E. R., & DeLucia, C. A. (1969, September 12). Visual reinforcement of nonnutritive sucking in human infants. Science, 165, 1144 –1146. Smith, L. B., Thelen, E., Titzer, R., & McLin, D. (1999). Knowing in the context of acting: The task dynamics of the A-not-B error. Psychological Review, 106, 235–260. Sowell, E. R., Delis, D., Stiles, J., & Jernigan, T. L. (2001). Improved memory functioning and frontal lobe maturation between childhood and adolescence: A structural MRI study. Journal of the International Neuropsychological Society, 7, 312–322. Spear, N. E., & Parsons, P. (1976). Alleviation of forgetting by reactivation treatment: A preliminary analysis of the ontogeny of memory processing. In D. L. Medin, W. Roberts, & R. Davis (Eds.), Processes in animal memory (pp. 135–165). Hillsdale, NJ: Erlbaum. Spencer, J. P., Smith, L. B., & Thelen, E. (2001). Tests of a dynamic systems account of the A-not-B error: The influence of prior experience OUTCOME DEVALUATION IN CHILDREN on the spatial memory abilities of two-year-olds. Child Development, 72, 1327–1346. Sutton, R. S., & Barto, A. G. (1981). An adaptive network that constructs and uses an internal model of its world. Cognition and Brain Theory, 4, 217–246. Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York: Macmillan. Thorndike, E. L. (1931). Human learning. New York: Appleton Century Crofts. Trapold, M. A., & Overmier, J. B. (1972). The second learning process in instrumental learning. In A. A. Black & W. F. Prokasy (Eds.), Classical conditioning: II. Current research and theory (pp. 427– 452). New York: Appleton. Waldmann, M. R., & Walker, J. M. (2005). Competence and performance in causal learning. Learning and Behavior, 33, 211–229. Welsh, M. (1991). Rule-guided behaviour and self-monitoring on the Tower of Hanoi disc-transfer task. Cognitive Development, 6, 59 –76. 51 Willatts, P. (1999). Development of means-end planning in young infants: Pulling a support to retrieve a distant object. Developmental Psychology, 35, 651– 667. Woodward, A. L. (1998). Infants selectively encode the goal object of an actor’s reach. Cognition, 69, 1–34. Yin, H. H., Knowlton, B. J., & Balleine, B. B. (2004). Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. European Journal of Neuroscience, 19, 181– 189. Yin, H. H., Knowlton, B. J., & Balleine, B. B. (2005). Blockade of NMDA receptors in the dorsomedial striatum prevents action– outcome learning in instrumental conditioning. European Journal of Neuroscience, 22, 505–512. Received September 11, 2006 Revision received June 5, 2007 Accepted June 12, 2007 䡲 E-Mail Notification of Your Latest Issue Online! Would you like to know when the next issue of your favorite APA journal will be available online? This service is now available to you. Sign up at http://notify.apa.org/ and you will be notified by e-mail when issues of interest to you become available!
© Copyright 2026 Paperzz