The Contribution of Orbitofrontal Cortex to Action Selection

SEAN B. OSTLUND AND BERNARD W. BALLEINE

Department of Psychology and the Brain Research Institute, University of California at Los Angeles, Los Angeles, California, USA

ABSTRACT: A number of recent findings suggest that the orbitofrontal cortex (OFC) influences action selection by providing information about the incentive value of behavioral goals or outcomes. However, much of this evidence has been derived from experiments using Pavlovian conditioning preparations of one form or another, making it difficult to determine whether the OFC is selectively involved in stimulus–outcome learning or whether it plays a more general role in processing reward value. Although many theorists have argued that these are fundamentally similar processes (i.e., that stimulus–reward learning provides the basis for choosing between actions based on anticipated reward value), several behavioral findings indicate that they are, in fact, dissociable. We have recently investigated the role of the OFC in the control of free operant lever pressing using tests that independently target the effects of stimulus–outcome learning and outcome devaluation on performance. We found that OFC lesions disrupted the tendency of Pavlovian cues to facilitate instrumental performance but left intact the suppressive effects of outcome devaluation. Rather than processing goal value, therefore, we hypothesize that the contribution of the OFC to goal-directed action is limited to encoding predictive stimulus–outcome relationships that can bias instrumental response selection.

KEYWORDS: instrumental conditioning; goal-directed action; Pavlovian conditioning

INTRODUCTION

The orbitofrontal cortex (OFC) is thought by many to be a critical neural substrate of reward learning and action selection.1–6 How the OFC actually contributes to action selection, however, remains a matter of considerable debate.
Address for correspondence: Sean Ostlund, Department of Psychology, UCLA, Box 951563, Los Angeles, CA 90095-14563. Voice: 310.825.2998; fax: 310.206.5895. [email protected]

© 2007 New York Academy of Sciences. Ann. N.Y. Acad. Sci. 1121: 174–192 (2007). doi: 10.1196/annals.1401.033

One possibility is that it is responsible for processing the motivational value of expected rewards. In support of this general account, damage to the OFC has been shown to disrupt the sensitivity of conditioned responses, including anticipatory approach7–9 and conditioned reaching,10 to manipulations of expected reward value. Furthermore, a number of functional neuroimaging11,12 and single-unit recording studies13,14 have found anticipatory activity in the OFC corresponding to motivational features (e.g., magnitude and valence) of the expected outcome. However, other evidence suggests that the OFC contributes more to action selection than a simple evaluation of the incentive-motivational status of reward. For instance, some OFC neurons display anticipatory firing patterns related to the timing and location of reward delivery.15,16 In addition, although it is true that the anticipatory firing of some OFC neurons can be modulated by sensory-specific satiety (i.e., by selectively sating the subject on one of several food outcomes), the activity of a significant proportion of these neurons appears to be unaffected by this treatment.13 OFC neurons have also been shown to display preferences for sensory features of the anticipated outcome, like texture and taste.17 Such findings suggest that, rather than merely processing reward value, the OFC is involved in encoding a rich representation of the training outcome. In this paper we review several recent findings from our lab that call into question the simple reward processing view of OFC function.
Much of the evidence implicating the OFC in processing outcome value has been obtained using behavioral tasks that rely predominantly on stimulus–outcome (S-O) learning. As we will see, however, S-O learning tasks are ill suited for studying the processes that underlie truly goal-directed action selection, which involve deciding between different courses of action based on the value of their consequences. Using tasks that are better suited to assess goal-directed action selection, we have found that the OFC plays an important but highly selective role in this selection process. Although these experiments suggest that the OFC plays little, if any, direct role in the way that the relative reward value of the instrumental outcome affects action selection and choice, it does appear to be critically involved in the way animals extract information from predictive environmental cues about the likelihood of specific consequences and use that information to guide action selection; that is, the OFC appears to affect choice by influencing reward prediction rather than reward value.

GOAL-DIRECTED ACTIONS IN RATS

When it comes to action selection, not all actions are alike. Some are selected because they produce a desired outcome or goal whereas others are reflexively elicited whenever certain stimulus conditions arise. Of course, most naturally occurring activity can appear “goal directed” to an outside observer and, as a consequence, specific tests have to be conducted to ascertain the nature of the processes controlling any specific action. For example, it is perfectly rational for a woodlouse to seek out shade when it finds itself in direct sunlight, but we would not want to mistake its negative phototaxis for a deliberated, goal-directed action.
We therefore need empirically based criteria for distinguishing goal-directed actions from conditioned and unconditioned reflexes in nonhuman subjects. It has been argued that, in order to be considered goal directed, an action must satisfy two diagnostic criteria: the goal criterion and the contingency criterion.18,19 In order to satisfy the goal criterion, the performance of the action must be shown to depend on the desirability of its consequences or outcome. Goal-directed actions should therefore be sensitive to motivational manipulations that render the outcome either more or less valuable to the organism. To satisfy the contingency criterion, an action must be shown to be dependent on the specific consequences that it produces; that is, it must be demonstrated that the performance of the action is mediated by the subject’s knowledge of the specific contingency, or causal relationship, that exists between the action and its outcome.

Conditioned Approach Responses

To see how these criteria are applied, consider first a typical Pavlovian conditioning study in which a hungry rat is given repeated pairings between a tone and a food outcome (FIG. 1). Initially, aside from an orienting response, the tone will have little impact on the rat’s behavior. Over the course of training, however, the rat will come to approach the location of the food delivery whenever the tone is presented. Does this conditioned approach behavior satisfy the criteria for goal-directed action? In order to answer this question, we must know much more than the conditioning procedure that was used. We must determine what the subject actually learned during training. It turns out that there are at least three forms of learning that could support approach responding in this situation (FIG. 1). One possibility is that this conditioned approach behavior is the natural consequence of encoding the tone–food relationship that was scheduled by the experimenter.
According to this S-O view of learning, the tone comes to evoke an expectation of food, which will in turn elicit a set of species-typical foraging behaviors that prepare the rat for the food delivery, including, of course, the approach response. However, it is also true that the rat will tend to experience an incidental relationship between the approach response and food, since it must approach the food cup in order to consume the outcome. Encoding this approach–food, or response–outcome (R-O), relationship would also increase its performance of the approach response. In this case, however, the rat should approach the food cup because it believes that this action produces food, not because it anticipates food based on the cue presentation. A third possibility is that conditioned approach is supported by stimulus–response (S-R) learning.

FIGURE 1. Relationships that could support conditioned approach performance. See text for details.

Indeed, many early learning theorists posited that this single form of learning provides the basis for all acquired behavior.20 This view assumes that the outcome itself plays no direct role in controlling performance, but is instead responsible for modulating the strength of S-R associations according to a reinforcement/punishment process; whereas an appetitive outcome, or reinforcer, will tend to strengthen S-R associations, an aversive outcome, or punisher, will tend to weaken them. According to this account, conditioned approach behavior should be supported by a direct association between the tone stimulus and the approach response. These accounts make different predictions about the sensitivity of approach performance to post-training manipulations of outcome value. Both the R-O and S-O accounts assume that a representation of the training outcome serves as a critical mediating link in the chain of events that governs response selection.
Therefore, both accounts predict that a reduction in outcome value will produce a corresponding response decrement. In contrast, the S-R view assumes that the primary function of the outcome is catalytic; it reinforces the S-R association. As the outcome itself is not encoded in this associative structure, the S-R view predicts that post-training manipulations of outcome value should have no effect on performance. In contrast to this prediction, however, it has been repeatedly shown that manipulations of outcome value strongly affect conditioned approach performance.21,22 For example, Colwill and Motzkin22 trained rats on two distinct S-O contingencies, such that each stimulus (a tone and a light) was paired with a different outcome (sucrose solution and food pellets). One of the outcomes was then devalued through lithium chloride–induced conditioned taste aversion training (i.e., subjects were made nauseous after consuming the outcome). In a subsequent test, the rats displayed less conditioned approach to the food cup during the stimulus that had been paired with the now devalued outcome than during the other stimulus, whose training outcome remained valuable. Several features of this experiment are worth noting. First, the test was conducted in extinction to preclude an alternative interpretation based on the S-R account. Had the devalued outcome actually been delivered during this session, it would have had the opportunity to weaken the supporting S-R association. The observation of a devaluation effect in extinction, therefore, demonstrates that approach performance depends on an associative structure that incorporates a representation of the training outcome. Second, the outcome selectivity of this effect nicely rules out alternative interpretations based on nonspecific motivational or behavioral effects that might have resulted from the devaluation treatment.
For instance, had the subjects been generally nauseous or inactive, a response decrement should have been observed for both stimuli regardless of which outcome they signaled during training. With regard to our criteria of goal-directed action, conditioned approach performance clearly satisfies the goal criterion. However, considerable evidence suggests that it fails the contingency criterion. Assessments of conditioned approach suggest that this response is not controlled by the R-O contingency; that is, its performance is not dependent on its relationship to its consequences. Holland23 exposed hungry rats to a situation in which a tone was paired with food delivery. Unlike the case in a typical conditioned approach experiment, however, these rats had to refrain from approaching the food source during the tone presentation in order to gain access to the food outcome; that is, the food delivery was cancelled on any trial in which an approach response was performed. On any goal-directed analysis of approach responding, rats should never acquire an approach response in this situation. Nevertheless, relative to a group that received the signal and food in an unpaired relation, Holland found that rats on this schedule acquired and maintained the approach response during the tone even though this resulted in the loss of a significant portion of the food available to them. Indeed, their level of responding was indistinguishable from that of rats exposed to a consistent pairing between the signal and food without the omission contingency. These results suggest that this simple anticipatory approach response is not acquired through R-O learning, but is instead the product of S-O learning. If the conditioned approach response is indeed acquired through a learning process that encodes the S-O association, then it should be particularly sensitive to manipulations of the Pavlovian contingency. 
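The logic of Holland's omission test can be illustrated with a toy simulation. The sketch below is purely hypothetical (learning rates, trial counts, and the two idealized agents are our illustrative assumptions, not a model from the literature): an "S-O" agent approaches in proportion to the tone's predictive strength for food, while an "R-O" agent tracks what its own approach response earns. Under an omission schedule, where approaching cancels the food, only the R-O agent should stop responding.

```python
import random

def run_omission(agent, trials=500, lr=0.1, seed=0):
    """Toy omission schedule: food follows the tone ONLY if no approach occurs.

    agent == "S-O": approach probability tracks the tone->food prediction.
    agent == "R-O": approach probability tracks what approaching itself earns.
    Returns the mean approach probability over the last 100 trials.
    """
    rng = random.Random(seed)
    v_tone = 0.5      # S-O: predictive strength of the tone for food
    v_approach = 0.5  # R-O: predicted payoff of the approach response
    history = []
    for _ in range(trials):
        p_approach = v_tone if agent == "S-O" else v_approach
        history.append(p_approach)
        approached = rng.random() < p_approach
        food = not approached  # omission contingency: approaching cancels food
        # The tone-food prediction updates on every trial; the tone is still
        # paired with food whenever the rat withholds the response.
        v_tone += lr * (float(food) - v_tone)
        if approached:
            # R-O update applies only on trials where the action was performed,
            # and approaching never pays under omission.
            v_approach += lr * (0.0 - v_approach)
    return sum(history[-100:]) / 100

# The R-O agent stops approaching; the S-O agent keeps responding because
# the tone remains a partial predictor of food.
assert run_omission("R-O") < 0.05
assert run_omission("S-O") > 0.3
```

On this sketch, the S-O agent settles near an equilibrium where the tone still signals food on roughly half the trials, so approach persists despite the lost food, which parallels Holland's finding.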
In support of this hypothesis it has been shown that conditioned approach depends on how reliably the eliciting stimulus signals its particular outcome. An effective method for reducing the predictive status of a stimulus is to deliver its outcome noncontingently. This treatment ensures that, even though the stimulus continues to be paired with its outcome, it no longer serves as a reliable predictor of that outcome. Several studies have shown that this kind of manipulation, known as contingency degradation, weakens the capacity of a stimulus to elicit conditioned approach.24–26 For example, Delamater24 trained rats on two Pavlovian contingencies, such that each stimulus terminated with the delivery of a different food outcome. After this initial training phase, one of the S-O relationships was degraded by delivering its corresponding outcome noncontingently during the intertrial interval. The rats were found to adjust their approach behavior accordingly; that is, they approached the food cup less during the stimulus that no longer reliably signaled its outcome than during the other stimulus, even though both stimuli continued to be paired with their respective outcomes. Importantly, by employing an outcome-selective design, Delamater was able to control for nonspecific motivational (e.g., satiation) and behavioral (e.g., response competition) effects that might have complicated an interpretation of the results. This finding, therefore, provides strong evidence that conditioned approach is mediated primarily by the S-O contingency.

Free Operant Performance

The investigation of the learning and motivational processes that support the performance of conditioned approach has provided clear evidence that this response is controlled by predictive learning involving the S-O association and is not controlled by the R-O contingency.
In contrast, recent evidence suggests that responses trained in the free operant situation have all the hallmarks of goal-directed actions. In free operant conditioning rats are taught to perform arbitrary actions to gain access to some valued goal or other; in the paradigm case, hungry rats are taught to press a freely available lever to gain access to a food reward. There is, in fact, considerable evidence that free operant lever pressing in rats is sensitive to the causal relation between the action and its consequences; not only will rats stop responding if lever pressing no longer delivers a valued food, but they will also stop responding even faster if their performance cancels otherwise freely available food (i.e., leads to the omission of the outcome).27,28 Likewise, when the situation is arranged such that access to a particular rewarding outcome is equally probable whether a rat presses the lever that delivers that reward or not, rats quickly stop performing the specific response that gains access to that outcome while maintaining their performance of other actions.18,29,30 In addition, a number of studies have investigated the sensitivity of instrumental performance to post-training outcome devaluation.31,32 These studies have established that, under most training conditions, instrumental performance satisfies the goal criterion of goal-directed action. In a recent example, we trained hungry rats to press two levers, one for food pellets and another for sucrose solution.33 A specific-satiety treatment was then used to selectively devalue one of the two training outcomes; that is, rats were allowed to freely consume one outcome for 1 hour before being returned to the experimental chamber for a test in which they were allowed to choose between the two levers in extinction.
The rats were found to substantially reduce their performance of the action that had earned the pre-fed (devalued) outcome, but continued to perform the other action at a high rate. These findings confirm that a dichotomy exists between the processes that control conditioned responses, like the anticipatory approach response, and those that control goal-directed instrumental actions. Importantly, although both categories of behavior display sensitivity to outcome devaluation, indicating that neither is the product of a purely S-R structure, only instrumental performance exhibits sensitivity to manipulations of R-O contingency. The conditioned approach response and free operant lever pressing differ qualitatively; rats approach because they anticipate a valuable outcome, but they lever press because they believe that this action will produce a valuable outcome. This distinction is particularly important for researchers interested in the neural basis of action selection. It seems unlikely that much will be revealed about the neural processes unique to goal-directed action selection by studying the substrates of Pavlovian S-O learning. This distinction also raises a warning: it is possible to devise a behavioral task that uses nominally instrumental procedures but that generates S-O learning. If the behavioral product of the task can be construed as a conditioned approach response directed at the goal location or the signal itself, it is difficult to argue that it represents an arbitrary, instrumental action. In this case, care must be taken to determine what role the scheduled R-O contingency actually plays in controlling performance. But what about the influence of changes in outcome value on conditioned approach behavior? 
As with lever pressing, the approach response exhibits sensitivity to outcome devaluation. Although lever pressing and conditioned approach appear to be supported by fundamentally different associative structures, it therefore remains possible that they rely on a common reward process. Indeed, as we will see in the next section, it has been argued that Pavlovian learning provides the motivational support for instrumental performance. However, we will also review a number of recent findings that are incompatible with this view and that suggest, instead, that instrumental action selection is governed by a separate reward-evaluation process.

OUTCOME VALUE AND THE OFC

As mentioned in the introduction, there is considerable evidence that the OFC plays a role in processing outcome value. One of the most convincing pieces of evidence for this claim is the finding that although rats with OFC lesions readily acquire the anticipatory approach response to a stimulus paired with a food reward, their performance of this response is insensitive to the devaluation of the rewarding outcome.7–9 Similarly, the effect of outcome devaluation on approach performance can also be abolished by lesions of the basolateral amygdala34 (BLA), a structure that has long been implicated in the processing of the emotional properties of events.35 The BLA shares reciprocal connections with the OFC,36–39 and recent evidence suggests that the sensitivity of a conditioned reaching task to changes in outcome value depends on the interaction of the BLA and OFC. For example, Baxter et al.40 trained monkeys on a task in which the identity of a target object signaled which of two outcomes (e.g., fruit or peanut) could be earned by displacing that object.
The performance of unoperated monkeys on this task was found to be sensitive to outcome devaluation; when pre-fed to satiety on one of the two outcomes, the unoperated group tended to select an object that signaled the non–pre-fed (valued) outcome over an object that signaled the pre-fed (devalued) outcome. In contrast, the performance of monkeys with a unilateral lesion of the OFC plus a unilateral lesion of the contralateral BLA was found to be significantly less sensitive to outcome devaluation and, indeed, their impairment was similar to that observed when either of these structures was lesioned bilaterally.41,42 Furthermore, this deficit did not appear to be due to a simple additive effect of OFC and BLA damage, as monkeys with ipsilateral lesions of the OFC and BLA did not show an equivalent level of impairment.43 Generally, these results have been interpreted as suggesting that the BLA and OFC work in tandem as parts of a broader circuit that uses anticipated value to guide response selection. Other studies using a range of species and experimental techniques have implicated the OFC in outcome encoding. Single-unit recording studies have found cue-evoked, anticipatory firing in the OFC during go/no-go performance.14,44 Notably, however, the object displacement task used in primates, although nominally instrumental, involves a clear S-O learning component. When this is considered together with evidence that OFC lesions abolish the sensitivity of conditioned approach to outcome devaluation7–9 and disrupt the influence of Pavlovian outcome expectations on instrumental response selection,45 it becomes clear that there is scant evidence directly implicating the OFC in instrumental R-O learning. Although this might seem trivial (as mentioned above, sensitivity to outcome devaluation has been found in both Pavlovian conditioned responses and goal-directed instrumental actions), it is of prime importance.
In fact, current evidence suggests that instrumental and Pavlovian “values” are not mediated by a common process, but are determined by distinct evaluative processes mediated by independent neural systems.18,19,29

Pavlovian and Instrumental Values

Although they are clearly supported by distinct associative structures, it has long been assumed that Pavlovian conditioned responses and instrumental actions rely on a common incentive or reward process. This view is often expressed in the form of one or other version of two-process theory.46–48 As mentioned earlier, the fact that instrumental actions are typically performed on some manipulandum or other ensures that there is an embedded Pavlovian relationship between the sensory features of that manipulandum (e.g., the sight of a lever) and the outcome delivery. Many theories of instrumental performance place heavy emphasis on this relationship, proposing that the Pavlovian process that it engages is responsible for modulating the expression of instrumental learning; specifically, environmental cues that signal reward are assumed to facilitate or invigorate instrumental actions.
In line with this account, the influence of Pavlovian learning over instrumental performance can be demonstrated using the Pavlovian-instrumental transfer effect; introducing a stimulus that has been independently paired with food into the instrumental learning situation has long been known to facilitate instrumental performance.49 The transfer effect has since been well established and, in some cases, shown to be outcome specific.22,50,51 The top panel of FIGURE 2 depicts the results of a typical transfer experiment from our laboratory.33 Hungry rats were first given Pavlovian training with two S-O contingencies (e.g., white noise → grain pellets & tone → sucrose solution) before being given instrumental training on two R-O contingencies (e.g., left lever press → grain pellets & right lever press → sucrose solution). After training, the rats were returned to the chamber for a test session in which they were allowed to perform both actions in extinction (i.e., no outcomes were delivered) and, at various points in the session, the two stimuli were presented while the rats were lever pressing. As can be seen in the figure, each stimulus selectively facilitated the action with which it shared a common outcome (Same), relative to the action that earned a different outcome (Different). A number of theorists have argued that this Pavlovian-instrumental interaction is responsible for mediating the influence of manipulations of incentive motivation on instrumental performance.48 From this perspective, outcome devaluation is thought to affect performance indirectly by attenuating the facilitatory contribution of the Pavlovian process. Importantly, this account posits that the impact of outcome devaluation on both Pavlovian conditioned approach and instrumental lever pressing is mediated by the same incentive process that underlies the transfer effect. Consequently, all three phenomena should also depend on a common neural circuitry.
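The outcome-selective transfer design just described can be expressed as a simple lookup: a cue retrieves its Pavlovian outcome, which in turn primes whichever action earned that same outcome during instrumental training. The sketch below uses the example contingencies from the text; the dictionary structure and function name are our illustrative assumptions, not a model of the underlying mechanism.

```python
# S-O training: each cue predicts a distinct outcome.
pavlovian = {"noise": "grain", "tone": "sucrose"}
# R-O training: each action earns a distinct outcome.
instrumental = {"left_lever": "grain", "right_lever": "sucrose"}

def biased_action(cue):
    """Return the action sharing an outcome with the presented cue (the
    'Same' action in a transfer test), or None if no action matches."""
    expected_outcome = pavlovian[cue]
    for action, outcome in instrumental.items():
        if outcome == expected_outcome:
            return action
    return None

# Each cue selectively primes the Same action, not the Different one.
assert biased_action("noise") == "left_lever"   # both predict grain
assert biased_action("tone") == "right_lever"   # both predict sucrose
```

The point of the sketch is only that outcome identity is the mediating link: the cue and the action are never directly associated, yet the shared outcome representation suffices to bias selection.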
Given the evidence described above, one might predict that this circuit involves both the OFC and BLA. Alternatively, of course, it is possible that instrumental reward processing does not rely on Pavlovian learning. In this case, we should expect to find both behavioral and neural dissociations between these three phenomena.

Dissociating Transfer and Devaluation

A number of studies have shown that lesions of the BLA made before training disrupt both instrumental outcome devaluation and outcome-selective transfer.52–54 In addition, a preliminary study from our laboratory found that excitotoxic BLA lesions made after training were also effective in abolishing these effects.55 Thus, the BLA seems to be involved in incentive processing for both conditioned approach and instrumental performance and also appears to play a critical role in mediating the influence of Pavlovian learning over instrumental action selection. The OFC also appears to be involved in Pavlovian-instrumental transfer. For instance, we recently contrasted the effects of OFC lesions made before or after initial training on the transfer effect.33 These data are presented in the middle (Pre-training group) and bottom (Post-training group) panels of FIGURE 2, directly under the “typical” transfer data, which came from the sham-lesioned control group in this study. Although pre-training lesions had no apparent effect, post-training lesions did impair transfer performance.

FIGURE 2. The results of a Pavlovian-instrumental transfer test, plotted across successive 2-min trials for the Baseline (pre-stimulus) period and for stimuli Same and Different. The top panel presents a typical effect, taken from a sham-lesioned control group. The middle and bottom panels present the results of OFC-lesioned rats that underwent surgery either before (middle) or after (bottom) training.
This pattern of findings suggests that although the OFC contributes to transfer performance under normal conditions, it may not play an essential role. In the absence of the OFC, other structures, perhaps including the BLA, may be sufficient to support normal transfer performance. Of course, it is also possible that compensation occurred at the level of the OFC and that more complete damage to this region prior to training would have resulted in impairment. This finding clearly implicates the OFC in the control of instrumental performance. It is also consistent with the notion that the OFC contributes to a general incentive-motivational system that is responsible for both transfer and reward processing. In order to assess this claim, and thereby provide a more thorough characterization of the involvement of the OFC in instrumental action selection, we also investigated whether these lesions would have an impact on the sensitivity of instrumental performance to outcome devaluation. In contrast to the simple reward processing view of OFC function, however, we found that both pre- and post-training lesioned groups exhibited normal shifts in their choice performance after outcome devaluation. Instrumental outcome devaluation appears, therefore, to be dissociable from Pavlovian-instrumental transfer, at least at the level of the OFC.33 These findings indicate that the OFC is particularly important for using S-O learning to guide action selection. They also call into question the notion that reward evaluation in instrumental performance is mediated by a Pavlovian learning process. In fact, this is not the first piece of evidence against this latter view of incentive processing. For instance, Corbit and Balleine56 found that lesions of the prelimbic area of the prefrontal cortex (PL) also produce dissociable effects on instrumental outcome devaluation and Pavlovian-instrumental transfer.
In this study, however, the pattern of effects was the reverse of those observed in the OFC; that is, lesions of the PL abolished the sensitivity of instrumental performance to outcome devaluation but had no effect on transfer. PL lesions have also been shown to impair instrumental contingency degradation learning, indicating that this structure is important for encoding R-O associations.19,29,56 When taken together, therefore, these studies provide evidence that the respective contributions of S-O and R-O associations to instrumental action selection can be doubly dissociated at the level of the prefrontal cortex, with the OFC supporting the former and the PL supporting the latter. The incentive-motivational version of two-process theory is also challenged by studies that have investigated the nature of the transfer effect. For instance, the notion that Pavlovian learning provides motivational support for action selection is not easily reconciled with the selectivity typically found in transfer studies using outcomes of approximately the same motivational value (e.g., food pellets and sucrose solution). Indeed, Pavlovian cues have been shown to guide response selection based on features of the outcome representation that are motivationally neutral.57 Of course, such findings are not entirely incompatible with an incentive-motivational interpretation of transfer; they merely suggest that this phenomenon is supported, at least in part, by a sensorily rich expectation of the training outcome. More damning still, however, is evidence that transfer stimuli and outcome devaluation can act independently of one another in controlling performance. In order to explain the outcome devaluation effect, the two-process account assumes that the influence of Pavlovian learning over response selection is gated by the current value of the mediating outcome.
With standard instrumental lever pressing, the sight of the lever should remind the rat of its associated outcome, but whether this facilitates pressing or not must depend on how desirable that outcome is to the rat. Similarly, in transfer, the potency of a particular cue in facilitating performance should depend on the value of the anticipated outcome. In contrast to this prediction, several studies have found that reducing an outcome’s value does not affect its capacity to mediate Pavlovian-instrumental transfer.58–60

INFORMATION, CHOICE, AND THE OFC

Thus, there seems to be overwhelming evidence against the claim that the influence of reward value on instrumental response selection is mediated by S-O learning. We do not deny, of course, that Pavlovian cues influence choice. Rather, it is clear that the influences of reward value and of predictive cues on choice performance are mediated by distinct processes that are subserved by distinct neural systems. If, however, Pavlovian learning does not influence instrumental performance through an incentive or motivational mechanism, then it remains to be determined how it acts to guide action selection. Some two-process theories apply a cognitive interpretation to transfer-like phenomena,60–62 suggesting that Pavlovian cues have the capacity to elicit an expectation of their outcome that, in turn, results in the selection of any action (or actions) associated with that outcome. According to this account, the tendency of an environmental cue to remind the agent of a particular outcome is enough to bias the agent’s decision to perform an action associated with that outcome. In line with this interpretation, the influence of Pavlovian cues on action selection has been shown to depend on the predictive status of that cue.24,63 One piece of evidence comes from the Delamater24 study we mentioned earlier.
After selectively degrading one of two S-O contingencies and observing the effect of this treatment on conditioned approach performance, Delamater conducted a test of Pavlovian-instrumental transfer using these stimuli. He found that the stimulus from the degraded contingency had not only lost its capacity to elicit approach but was also ineffective in facilitating instrumental performance. In contrast, the stimulus from the other, nondegraded contingency was unaffected and generated clear evidence of transfer. When carefully considered, this finding suggests a possible hypothesis as to why OFC lesions disrupt the influence of Pavlovian cues on action selection. As mentioned above, Pavlovian conditioned responses are not under the subject's control; these responses are elicited by conditioned stimuli and, in that sense, are not truly selected in the way one selects the best option among alternatives. Although Pavlovian cues are of limited value in selecting a conditioned response, cues with a relatively high predictive validity are of considerable value in selecting between different courses of action. Stimuli that provide information about the likely payoff associated with selecting one as opposed to another course of action reduce uncertainty and, as such, can and should bias action selection. Hence, stimuli that are highly predictive of specific consequences are likely to bias choice toward actions associated with those consequences. This analysis of transfer explains why reducing the predictive validity of a cue reduces its influence on action selection. It also suggests that OFC lesions abolish transfer because they, too, remove the capacity of cues to provide information about their specific outcomes.
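The informational account just described can be summarized in a minimal sketch. This is our illustrative formalization, not a model proposed in the paper: the S-O and R-O maps, the actions, and the numerical "validity" values are all hypothetical placeholders, with a cue's predictive validity simply standing in for how reliably it signals its outcome.

```python
# Hypothetical stimulus-outcome (S-O) map: cue -> (outcome, predictive validity),
# and response-outcome (R-O) map: action -> outcome. All names are illustrative.
s_o = {"noise": ("grain", 0.9), "tone": ("sucrose", 0.9)}
r_o = {"left_press": "grain", "right_press": "sucrose"}

def biased_action(cue, baseline=0.5):
    """Return each action's selection weight given the current cue.

    The cue retrieves an outcome expectation; actions associated with that
    outcome receive a bias proportional to the cue's predictive validity.
    """
    outcome, validity = s_o.get(cue, (None, 0.0))
    return {
        action: baseline + (validity if r_o[action] == outcome else 0.0)
        for action in r_o
    }

weights = biased_action("noise")
# The grain-predictive cue biases choice toward the grain-earning action.
assert weights["left_press"] > weights["right_press"]
```

On this sketch, degrading a contingency (driving a cue's validity toward zero) removes the bias without touching the R-O map, paralleling the dissociation between transfer and devaluation reviewed above.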
From this perspective, the contribution of the OFC to goal-directed instrumental performance has little to do with processing reward value and instead involves encoding the predictive status of cues so that they may bias action selection. If this account is true, then animals with lesions of the OFC should be relatively insensitive to changes in the predictive status of Pavlovian cues. In order to test this hypothesis, we assessed the effect of OFC lesions on Pavlovian contingency degradation learning.33 Rats were trained to asymptote on two distinct S-O relationships (e.g., noise → grain and tone → sucrose) before being given sham or excitotoxic lesions of the OFC. Training resumed after a brief recovery period. During this phase, however, one of the two S-O relationships was degraded by delivering the corresponding outcome with a fixed probability throughout the session, regardless of whether its stimulus was present or absent. The results of this experiment are presented in FIGURE 3. The sham group displayed normal Pavlovian contingency learning, withholding their magazine approach performance during the stimulus that was paired with the noncontingent outcome (Degraded) but maintaining their performance to the control stimulus (Nondegraded). In contrast, the OFC group exhibited a general decrease in responding to both stimuli, regardless of their predictive status.

FIGURE 3. The results of Pavlovian contingency degradation training, plotted across successive 2-day blocks for the Degraded stimulus and the Nondegraded stimulus. The top panel displays the results of the sham-lesioned group and the bottom panel displays the results of the OFC-lesioned group. Open circles represent approach performance on the last day of Pavlovian training.
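The logic of the degradation manipulation can be made concrete with a standard contingency statistic, ΔP, the difference between the probability of the outcome given the stimulus and given its absence. The use of ΔP here is our illustration, not an analysis from the study itself, and the probabilities are arbitrary example values.

```python
def delta_p(p_o_given_s, p_o_given_no_s):
    """Contingency of a stimulus with an outcome: P(O|S) - P(O|~S)."""
    return p_o_given_s - p_o_given_no_s

# Initial training: the outcome occurs only when its stimulus is present,
# so the stimulus carries a positive contingency.
trained = delta_p(0.5, 0.0)

# Degradation phase: the outcome is delivered with the same fixed
# probability whether or not the stimulus is present, so the
# contingency falls to zero even though pairings still occur.
degraded = delta_p(0.5, 0.5)

assert trained > 0 and degraded == 0
```

Sham-lesioned animals behaved as if tracking something like this statistic, withholding approach to the zero-contingency cue, whereas OFC-lesioned animals did not.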
These data reveal that, although not necessary for conditioned approach performance per se, the OFC is critical for encoding and updating predictive S-O relationships; that is, for establishing the relative validity of predictive cues with respect to their specific consequences. Without this capacity, it seems likely that conditioned approach performance comes under the control of alternative learning processes. For instance, one interpretation of the nonspecific reduction in responding displayed by the OFC group is that their performance was supported by their intact capacity for R-O learning. In this experiment, of course, although no explicit contingency was scheduled between the approach response and food, occasional approach-food pairings could have supported some indiscriminate approach behavior (i.e., approach should have been equally likely during both stimuli and the intertrial interval). This account also fits nicely with an early finding that OFC lesions render conditioned approach performance abnormally sensitive to the introduction of an instrumental contingency,64 consistent with the view that this structure is selectively involved in S-O but not R-O learning.

CONCLUSIONS

These findings call into question the simple view that the OFC is generally involved in processing the incentive value of outcomes. As we have seen, the OFC of rodents, and the lateral but likely not the differentiated medial part of the OFC of primates65 (the latter being, perhaps, homologous to the rodent prelimbic cortex66), appears to be necessary for processing outcome value only when task performance is likely to be mediated by S-O learning. This analysis also applies to electrophysiological and functional imaging studies reporting evidence of OFC involvement in reward processing.
The fact that OFC activity can reflect the motivational value of expected outcomes is well established.11–14 These studies, however, tend to involve substantial S-O learning components, raising the possibility that this neural activity reflects a purely Pavlovian incentive process. Nevertheless, the OFC appears to be involved in more than just processing the incentive value of Pavlovian outcomes; it also appears to encode the predictive relationship between specific cues and their respective outcomes and, hence, is important in mediating the way that this learning influences instrumental performance. Although the OFC does not appear to mediate the influence of goal value over instrumental action selection, the findings reviewed here suggest that it is involved in the way information extracted from the environment is used to decide between alternative courses of action.

ACKNOWLEDGMENT

The preparation of this article and much of the research that it describes was supported by Grant No. 56446 from the National Institute of Mental Health.

REFERENCES

1. HOLLAND, P.C. & M. GALLAGHER. 2004. Amygdala-frontal interactions and reward expectancy. Curr. Opin. Neurobiol. 14: 148–155.
2. HOLLERMAN, J.R., L. TREMBLAY & W. SCHULTZ. 2000. Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior. Prog. Brain Res. 126: 193–215.
3. O'DOHERTY, J.P. 2004. Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol. 14: 769–776.
4. ROLLS, E.T. 2004. The functions of the orbitofrontal cortex. Brain Cogn. 55: 11–29.
5. ROBERTS, A.C. 2006. Primate orbitofrontal cortex and adaptive behaviour. Trends Cogn. Sci. 10: 83–90.
6. SCHOENBAUM, G. & M. ROESCH. 2005. Orbitofrontal cortex, associative learning, and expectancies. Neuron 47: 633–636.
7. GALLAGHER, M., R.W. MCMAHAN & G. SCHOENBAUM. 1999. Orbitofrontal cortex and representation of incentive value in associative learning. J. Neurosci. 19: 6610–6614.
8. PICKENS, C.L. et al. 2003. Different roles for orbitofrontal cortex and basolateral amygdala in a reinforcer devaluation task. J. Neurosci. 23: 11078–11084.
9. PICKENS, C.L. et al. 2005. Orbitofrontal lesions impair use of cue-outcome associations in a devaluation task. Behav. Neurosci. 119: 317–322.
10. IZQUIERDO, A., R.K. SUDA & E.A. MURRAY. 2004. Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. J. Neurosci. 24: 7540–7548.
11. GOTTFRIED, J.A., J. O'DOHERTY & R.J. DOLAN. 2003. Encoding predictive reward value in human amygdala and orbitofrontal cortex. Science 301: 1104–1107.
12. O'DOHERTY, J. et al. 2000. Sensory-specific satiety-related olfactory activation of the human orbitofrontal cortex. Neuroreport 11: 893–897.
13. CRITCHLEY, H.D. & E.T. ROLLS. 1996. Hunger and satiety modify the responses of olfactory and visual neurons in the primate orbitofrontal cortex. J. Neurophysiol. 75: 1673–1686.
14. TREMBLAY, L. & W. SCHULTZ. 1999. Relative reward preference in primate orbitofrontal cortex. Nature 398: 704–708.
15. LIPTON, P.A., P. ALVAREZ & H. EICHENBAUM. 1999. Crossmodal associative memory representations in rodent orbitofrontal cortex. Neuron 22: 349–359.
16. ROESCH, M.R., A.R. TAYLOR & G. SCHOENBAUM. 2006. Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation. Neuron 51: 509–520.
17. ROLLS, E.T. et al. 1999. Responses to the sensory properties of fat of neurons in the primate orbitofrontal cortex. J. Neurosci. 19: 1532–1540.
18. BALLEINE, B.W. & A. DICKINSON. 1998. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37: 407–419.
19. DICKINSON, A. & B.W. BALLEINE. 1993. Actions and responses: the dual psychology of behaviour. In Spatial Representation. N. Eilan, R. McCarthy & M.W. Brewer, Eds.: 277–293. Basil Blackwell Ltd. Oxford.
20. HULL, C.L. 1943. Principles of Behavior. Appleton. New York.
21. HOLLAND, P.C. & J.J. STRAUB. 1979. Differential effects of two ways of devaluing the unconditioned stimulus after Pavlovian appetitive conditioning. J. Exp. Psychol. Anim. Behav. Process 5: 65–78.
22. COLWILL, R.M. & D.K. MOTZKIN. 1994. Encoding of the unconditioned stimulus in Pavlovian conditioning. Anim. Learn. Behav. 22: 384–394.
23. HOLLAND, P.C. 1979. Differential effects of omission contingencies on various components of Pavlovian appetitive conditioned responding in rats. J. Exp. Psychol. Anim. Behav. Process 5: 178–193.
24. DELAMATER, A.R. 1995. Outcome-selective effects of intertrial reinforcement in Pavlovian appetitive conditioning. Anim. Learn. Behav. 23: 31–39.
25. DURLACH, P.J. & D.O. SHANE. 1993. The effect of intertrial food presentations on anticipatory goal-tracking in the rat. Q. J. Exper. Psychol. B. 46: 289–318.
26. FARWELL, B.J. & J.J. AYRES. 1979. Stimulus-reinforcer and response-reinforcer relations in the control of conditioned appetitive headpoking (goal tracking) in rats. Learn. Motiv. 10: 295–312.
27. DICKINSON, A. et al. 1998. Omission learning after instrumental pretraining. Q. J. Exper. Psychol. B. 51: 271–286.
28. UHL, C.N. 1974. Response elimination in rats with schedules of omission training, including yoked and response-independent reinforcement comparisons. Learn. Motiv. 5: 511–531.
29. BALLEINE, B.W. 2005. Neural bases of food-seeking: affect, arousal and reward in corticostriatolimbic circuits. Physiol. Behav. 86: 717–730.
30. HAMMOND, L.J. 1980. The effect of contingency upon the appetitive conditioning of free-operant behavior. J. Exp. Anal. Behav. 34: 297–304.
31. ADAMS, C.D. & A. DICKINSON. 1981. Instrumental responding following reinforcer devaluation. Q. J. Exp. Psychol. B. 33: 109–121.
32. COLWILL, R.M. & R.A. RESCORLA. 1985. Postconditioning devaluation of a reinforcer affects instrumental responding. J. Exp. Psychol. Anim. Behav. Process 11: 120–132.
33. OSTLUND, S.B. & B.W. BALLEINE. 2007. Orbitofrontal cortex mediates outcome encoding in Pavlovian but not instrumental conditioning. J. Neurosci. 27: 4819–4825.
34. HATFIELD, T. et al. 1996. Neurotoxic lesions of basolateral, but not central, amygdala interfere with Pavlovian second-order conditioning and reinforcer devaluation effects. J. Neurosci. 16: 5256–5265.
35. BALLEINE, B.W. & S. KILLCROSS. 2006. Parallel incentive processing: an integrated view of amygdala function. Trends Neurosci. 29: 272–279.
36. KITA, H. & S.T. KITAI. 1990. Amygdaloid projections to the frontal cortex and the striatum in the rat. J. Comp. Neurol. 298: 40–49.
37. KRETTEK, J.E. & J.L. PRICE. 1977. Projections from the amygdaloid complex to the cerebral cortex and thalamus in the rat and cat. J. Comp. Neurol. 172: 687–722.
38. MCDONALD, A.J. 1991. Organization of amygdaloid projections to the prefrontal cortex and associated striatum in the rat. Neuroscience 44: 1–14.
39. MCDONALD, A.J., F. MASCAGNI & L. GUO. 1996. Projections of the medial and lateral prefrontal cortices to the amygdala: a Phaseolus vulgaris leucoagglutinin study in the rat. Neuroscience 71: 55–75.
40. BAXTER, M.G. et al. 2000. Control of response selection by reinforcer value requires interaction of amygdala and orbital prefrontal cortex. J. Neurosci. 20: 4311–4319.
41. MALKOVA, L., D. GAFFAN & E.A. MURRAY. 1997. Excitotoxic lesions of the amygdala fail to produce impairment in visual learning for auditory secondary reinforcement but interfere with reinforcer devaluation effects in rhesus monkeys. J. Neurosci. 17: 6011–6020.
42. IZQUIERDO, A., R.K. SUDA & E.A. MURRAY. 2004. Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. J. Neurosci. 24: 7540–7548.
43. IZQUIERDO, A. & E.A. MURRAY. 2004. Combined unilateral lesions of the amygdala and orbital prefrontal cortex impair affective processing in rhesus monkeys. J. Neurophysiol. 91: 2023–2039.
44. SCHOENBAUM, G., A.A. CHIBA & M. GALLAGHER. 1998. Orbitofrontal cortex and basolateral amygdala encode expected outcomes during learning. Nat. Neurosci. 1: 155–159.
45. MCDANNALD, M.A. et al. 2005. Lesions of orbitofrontal cortex impair rats' differential outcome expectancy learning but not conditioned stimulus-potentiated feeding. J. Neurosci. 25: 4626–4632.
46. ASRATYAN, E.A. 1974. Conditioned reflex theory and motivational behavior. Acta Neurobiol. Exp. 34: 15–31.
47. BOLLES, R.C. 1972. Reinforcement, expectancy, and learning. Psychol. Rev. 79: 394–409.
48. RESCORLA, R.A. & R.L. SOLOMON. 1967. Two-process learning theory: relationships between Pavlovian conditioning and instrumental training. Psychol. Rev. 74: 151–183.
49. ESTES, W.K. 1943. Discriminative conditioning. I: a discriminative property of conditioned anticipation. J. Exp. Psychol. 32: 150–155.
50. COLWILL, R.M. & R.A. RESCORLA. 1988. Associations between the discriminative stimulus and the reinforcer in instrumental learning. J. Exp. Psychol. Anim. Behav. Process 14: 155–164.
51. KRUSE, J.M. et al. 1983. Pavlovian conditioned stimulus effects on instrumental choice behavior are reinforcer specific. Learn. Motiv. 14: 165–181.
52. BALLEINE, B.W., A.S. KILLCROSS & A. DICKINSON. 2003. The effect of lesions of the basolateral amygdala on instrumental conditioning. J. Neurosci. 23: 666–675.
53. BLUNDELL, P., G. HALL & S. KILLCROSS. 2001. Lesions of the basolateral amygdala disrupt selective aspects of reinforcer representation in rats. J. Neurosci. 21: 9018–9026.
54. CORBIT, L.H. & B.W. BALLEINE. 2005. Double dissociation of basolateral and central amygdala lesions on the general and outcome-specific forms of Pavlovian-instrumental transfer. J. Neurosci. 25: 962–970.
55. OSTLUND, S.B. & B.W. BALLEINE. 2005. Lesions of the orbitofrontal cortex disrupt Pavlovian, but not instrumental, outcome encoding. Soc. Neurosci. Abstracts Program No. 71.2.
56. CORBIT, L.H. & B.W. BALLEINE. 2003. The role of prelimbic cortex in instrumental conditioning. Behav. Brain Res. 146: 145–157.
57. FEDORCHAK, P.M. & R.C. BOLLES. 1986. Differential outcome effect using a biologically neutral outcome difference. J. Exp. Psychol. Anim. Behav. Process 12: 125–130.
58. COLWILL, R.M. & R.A. RESCORLA. 1990. Effect of reinforcer devaluation on discriminative control of instrumental behavior. J. Exp. Psychol. Anim. Behav. Process 16: 40–47.
59. HOLLAND, P.C. 2004. Relations between Pavlovian-instrumental transfer and reinforcer devaluation. J. Exp. Psychol. Anim. Behav. Process 30: 104–117.
60. RESCORLA, R.A. 1994. Transfer of instrumental control mediated by a devalued outcome. Anim. Learn. Behav. 22: 27–33.
61. BALLEINE, B.W. & S.B. OSTLUND. 2007. Still at the choice-point: action selection and initiation in instrumental conditioning. Ann. N. Y. Acad. Sci. 1104: 147–171.
62. TRAPOLD, M.A. & J.B. OVERMIER. 1972. The Second Learning Process in Instrumental Learning. Appleton-Century-Crofts. New York.
63. RESCORLA, R.A. 1999. Learning about qualitatively different outcomes during a blocking procedure. Learn. Behav. 27: 140–151.
64. CHUDASAMA, Y. & T.W. ROBBINS. 2003. Dissociable contributions of the orbitofrontal and infralimbic cortex to Pavlovian autoshaping and discrimination reversal learning: further evidence for the functional heterogeneity of the rodent frontal cortex. J. Neurosci. 23: 8771–8780.
65. PADOA-SCHIOPPA, C. & J.A. ASSAD. 2006. Neurons in the orbitofrontal cortex encode economic value. Nature 441: 223–226.
66. ONGUR, D. & J.L. PRICE. 2000. The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb. Cortex 10: 206–219.