The Contribution of Orbitofrontal Cortex to Action Selection
SEAN B. OSTLUND AND BERNARD W. BALLEINE
Department of Psychology and the Brain Research Institute, University
of California at Los Angeles, Los Angeles, California, USA
ABSTRACT: A number of recent findings suggest that the orbitofrontal
cortex (OFC) influences action selection by providing information about
the incentive value of behavioral goals or outcomes. However, much of
this evidence has been derived from experiments using Pavlovian conditioning preparations of one form or another, making it difficult to
determine whether the OFC is selectively involved in stimulus–outcome
learning or whether it plays a more general role in processing reward
value. Although many theorists have argued that these are fundamentally similar processes (i.e., that stimulus-reward learning provides the
basis for choosing between actions based on anticipated reward value),
several behavioral findings indicate that they are, in fact, dissociable.
We have recently investigated the role of the OFC in the control of free
operant lever pressing using tests that independently target the effect of
stimulus–outcome learning and outcome devaluation on performance.
We found that OFC lesions disrupted the tendency of Pavlovian cues to
facilitate instrumental performance but left intact the suppressive effects
of outcome devaluation. Rather than processing goal value, therefore, we
hypothesize that the contribution of the OFC to goal-directed action is
limited to encoding predictive stimulus–outcome relationships that can
bias instrumental response selection.
KEYWORDS: instrumental conditioning; goal-directed action; Pavlovian
conditioning
INTRODUCTION
The orbitofrontal cortex (OFC) is thought by many to be a critical neural
substrate of reward learning and action selection.1–6 How the OFC actually contributes to action selection, however, remains a matter of considerable debate.
One possibility is that it is responsible for processing the motivational value
of expected rewards. In support of this general account, damage to the OFC
has been shown to disrupt the sensitivity of conditioned responses, including
Address for correspondence: Sean Ostlund, Department of Psychology, UCLA, Box 951563, Los
Angeles, CA 90095-14563. Voice: 310.825.2998; fax: 310.206.5895.
[email protected]
© 2007 New York Academy of Sciences.
Ann. N.Y. Acad. Sci. 1121: 174–192 (2007). doi: 10.1196/annals.1401.033
anticipatory approach7–9 and conditioned reaching,10 to manipulations of expected reward value. Furthermore, a number of functional neuroimaging11,12
and single-unit recording studies13,14 have found anticipatory activity in the
OFC corresponding to motivational features (e.g., magnitude and valence) of
the expected outcome.
However, other evidence suggests that the OFC contributes more to action
selection than a simple evaluation of the incentive-motivational status of reward. For instance, some OFC neurons display anticipatory firing patterns
related to the timing and location of reward delivery.15,16 In addition, although
it is true that the anticipatory firing of some OFC neurons can be modulated by
sensory-specific satiety (i.e., by selectively sating the subject on one of several
food outcomes), the activity of a significant proportion of these neurons appears to be unaffected by this treatment.13 OFC neurons have also been shown
to display preferences for sensory features of the anticipated outcome, like
texture and taste.17 Such findings suggest that, rather than merely processing
reward value, the OFC is involved in encoding a rich representation of the
training outcome.
In this paper we review several recent findings from our lab that call into
question the simple reward processing view of OFC function. Much of the
evidence implicating the OFC in processing outcome value has been obtained
using behavioral tasks that rely predominantly on stimulus–outcome (S-O)
learning. As we will see, however, S-O learning tasks are ill suited for studying the processes that underlie truly goal-directed action selection and that
involve deciding between different courses of action based on the value of
their consequences. Using tasks that are better suited to assess goal-directed
action selection, however, we have found that the OFC plays an important
but highly selective role in this selection process. Although these experiments
suggest that the OFC plays little, if any, direct role in the way that the relative
reward value of the instrumental outcome affects action selection and choice,
it does appear to be critically involved in the way animals extract information
from predictive environmental cues regarding the likelihood
of certain consequences and use this information to guide action selection
accordingly; that is, the OFC appears to affect choice by influencing reward
prediction rather than reward value.
GOAL-DIRECTED ACTIONS IN RATS
When it comes to action selection, not all actions are alike. Some are selected
because they produce a desired outcome or goal whereas others are reflexively
elicited whenever certain stimulus conditions arise. Of course, most naturally
occurring activity can appear “goal directed” to an outside observer and, as
a consequence, specific tests have to be conducted to ascertain the nature
of the processes controlling any specific action. For example, it is perfectly
rational for a woodlouse to seek out shade when it finds itself in direct sunlight,
but we would not want to mistake its negative phototaxis for a deliberate,
goal-directed action. We therefore need empirically based criteria for distinguishing goal-directed actions from conditioned and unconditioned reflexes in
nonhuman subjects.
It has been argued that, in order to be considered goal directed, an action
must satisfy two diagnostic criteria: the goal criterion and the contingency
criterion.18,19 In order to satisfy the goal criterion, the performance of the action
must be shown to depend on the desirability of its consequences or outcome. Goal-directed actions should therefore be sensitive to motivational manipulations
that render the outcome either more or less valuable to the organism. To satisfy
the contingency criterion, an action must be shown to be dependent on the
specific consequences that it produces; that is, it must be demonstrated that
the performance of the action is mediated by the subject’s knowledge of the
specific contingency, or causal relationship, that exists between the action and
its outcome.
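The two criteria can be summarized as a simple decision rule. The sketch below is an illustrative gloss, not a formal model from the literature; the function name and inputs are our own:

```python
# Illustrative gloss on the two diagnostic criteria (not a formal model).
# An action counts as goal directed only if it passes both tests.
def is_goal_directed(sensitive_to_devaluation, sensitive_to_contingency):
    """Goal criterion: performance tracks the current value of the outcome.
    Contingency criterion: performance depends on the causal R-O relation."""
    return sensitive_to_devaluation and sensitive_to_contingency

# Free operant lever pressing passes both tests, whereas (as shown in the
# sections that follow) conditioned approach passes only the first.
print(is_goal_directed(True, True))    # True
print(is_goal_directed(True, False))   # False
```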
Conditioned Approach Responses
To see how these criteria are applied, consider first a typical Pavlovian conditioning study in which a hungry rat is given repeated pairings between a tone
and a food outcome (FIG. 1). Initially, aside from an orienting response, the
tone will have little impact on the rat’s behavior. Over the course of training,
however, the rat will come to approach the location of the food delivery whenever the tone is presented. Does this conditioned approach behavior satisfy
the criteria for goal-directed action? In order to answer this question, we must
know much more than the conditioning procedure that was used. We must
determine what the subject actually learned during training.
It turns out that there are at least three forms of learning that could support approach responding in this situation (FIG. 1). One possibility is that this
conditioned approach behavior is the natural consequence of encoding the
tone–food relationship that was scheduled by the experimenter. According to
this S-O view of learning, the tone comes to evoke an expectation of food,
which will in turn elicit a set of species-typical foraging behaviors that prepare the rat for the food delivery, including, of course, the approach response.
However, it is also true that the rat will tend to experience an incidental relationship between the approach response and food, since it must approach the
food cup in order to consume the outcome. Encoding this approach-food, or
response–outcome (R-O), relationship would also increase its performance
of the approach response. In this case, however, the rat should approach the food
cup because it believes that this action produces food, not because it anticipates
food based on the cue presentation.
FIGURE 1. Relationships that could support conditioned approach performance. See text for details.
A third possibility is that conditioned approach is supported by stimulus–response (S-R) learning. Indeed, many early learning theorists posited that this single form of learning provides the basis
for all acquired behavior.20 This view assumes that the outcome itself plays no
direct role in controlling performance, but is instead responsible for modulating the strength of S-R associations according to a reinforcement/punishment
process; whereas an appetitive outcome, or reinforcer, will tend to strengthen
S-R associations, an aversive outcome, or punisher, will tend to weaken them.
According to this account, conditioned approach behavior should be supported
by a direct association between the tone stimulus and the approach response.
These accounts make different predictions about the sensitivity of approach
performance to post-training manipulations of outcome value. Both the R-O
and S-O account assume that a representation of the training outcome serves as
a critical mediating link in the chain of events that governs response selection.
Therefore, both accounts predict that a reduction in outcome value will produce a corresponding response decrement. In contrast, the S-R view assumes
that the primary function of the outcome is catalytic; it reinforces the S-R association. As the outcome itself is not encoded in this associative structure, this S-R
view predicts that post-training manipulations of outcome value should have
no effect on performance. In contrast to this prediction, however, it has been
repeatedly shown that manipulations of outcome value strongly affect conditioned approach performance.21,22 For example, Colwill and Motzkin22 trained
rats on two distinct S-O contingencies, such that each of two stimuli (a tone and a light)
was paired with a different outcome (sucrose solution and food pellets). One of
the outcomes was then devalued through lithium chloride–induced conditioned
taste aversion training (i.e., subjects were made nauseous after consuming the
outcome). In a subsequent test, the rats displayed less conditioned approach to
the food cup during the stimulus that had been paired with the now devalued
outcome than during the other stimulus, whose training outcome remained
valuable. Several features of this experiment are worth noting. First, the test
was conducted in extinction to preclude an alternative interpretation based on
the S-R account. Had the devalued outcome actually been delivered during
this session, it would have had the opportunity to weaken the supporting S-R
association. The observation of a devaluation effect in extinction, therefore,
demonstrates that approach performance depends on an associative structure
that incorporates a representation of the training outcome. Second, the outcome
selectivity of this effect nicely rules out alternative interpretations based on
nonspecific motivational or behavioral effects that might have resulted from
the devaluation treatment. For instance, had the subjects been generally nauseous or inactive, a response decrement should have been observed for both
stimuli regardless of which outcome they signaled during training.
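The divergent predictions of the three accounts can be made concrete with a toy calculation. This is a minimal sketch of the associative logic, not a model used in these experiments; all names and values are illustrative:

```python
# Minimal sketch (not the authors' model) contrasting how the three
# associative structures respond to outcome devaluation in extinction.
def response_strength(structure, assoc=1.0, outcome_value=1.0):
    """Predicted response vigor during an extinction test."""
    if structure in ("S-O", "R-O"):
        # The outcome representation mediates performance, so its
        # current value gates responding.
        return assoc * outcome_value
    elif structure == "S-R":
        # The outcome is not part of the associative structure; it only
        # reinforced the S-R link during training, so post-training
        # devaluation leaves performance untouched.
        return assoc
    raise ValueError(structure)

for structure in ("S-O", "R-O", "S-R"):
    valued = response_strength(structure, outcome_value=1.0)
    devalued = response_strength(structure, outcome_value=0.0)
    print(structure, valued, devalued)
```

On this sketch, S-O and R-O structures predict that responding collapses after devaluation, whereas the S-R structure predicts no change until the devalued outcome is actually re-experienced and weakens the association, which is why testing in extinction is decisive.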
With regard to our criteria of goal-directed action, conditioned approach performance clearly satisfies the goal criterion. However, considerable evidence
suggests that it fails the contingency criterion. Assessments of conditioned
approach suggest that this response is not controlled by the R-O contingency;
that is, its performance is not dependent on its relationship to its consequences.
Holland23 exposed hungry rats to a situation in which a tone was paired with
food delivery. Unlike the case in a typical conditioned approach experiment,
however, these rats had to refrain from approaching the food source during
the tone presentation in order to gain access to the food outcome; that is, the
food delivery was cancelled on any trial in which an approach response was
performed. On any goal-directed analysis of approach responding, rats should
never acquire an approach response in this situation. Nevertheless, relative to a
group that received the signal and food in an unpaired relation, Holland found
that rats on this schedule acquired and maintained the approach response during the tone even though this resulted in the loss of a significant portion of the
food available to them. Indeed, their level of responding was indistinguishable
from that of rats exposed to a consistent pairing between the signal and food
without the omission contingency. These results suggest that this simple anticipatory approach response is not acquired through R-O learning, but is instead
the product of S-O learning.
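The force of Holland's result comes from simple arithmetic: under an omission schedule, food is delivered only on trials without an approach response, so persistent approach is directly costly. A toy calculation of expected earnings, with illustrative numbers:

```python
# Under an omission schedule, a trial's food is cancelled if the rat
# approaches, so expected food is proportional to (1 - p_approach).
# Any purely goal-directed (R-O) process should drive approach toward zero.
def expected_food(n_trials, p_approach):
    return n_trials * (1 - p_approach)

print(expected_food(100, 0.0))    # withhold always: 100.0 pellets
print(expected_food(100, 0.75))   # persistent approach: 25.0 pellets
```

That rats nonetheless maintained approach at this cost indicates the response is driven by the tone-food (S-O) relation rather than by its consequences.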
If the conditioned approach response is indeed acquired through a learning process that encodes the S-O association, then it should be particularly
sensitive to manipulations of the Pavlovian contingency. In support of this hypothesis it has been shown that conditioned approach depends on how reliably
the eliciting stimulus signals its particular outcome. An effective method for
reducing the predictive status of a stimulus is to deliver its outcome noncontingently. This treatment ensures that, even though the stimulus continues to
be paired with its outcome, it no longer serves as a reliable predictor of that
outcome. Several studies have shown that this kind of manipulation, known as
contingency degradation, weakens the capacity of a stimulus to elicit conditioned approach.24–26 For example, Delamater24 trained rats on two Pavlovian
contingencies, such that each stimulus terminated with the delivery of a different food outcome. After this initial training phase, one of the S-O relationships
was degraded by delivering its corresponding outcome noncontingently during
the intertrial interval. The rats were found to adjust their approach behavior
accordingly; that is, they approached the food cup less during the stimulus that
no longer reliably signaled its outcome than during the other stimulus, even
though both stimuli continued to be paired with their respective outcomes.
Importantly, by employing an outcome-selective design, Delamater was able
to control for nonspecific motivational (e.g., satiation) and behavioral (e.g.,
response competition) effects that might have complicated an interpretation of
the results. This finding, therefore, provides strong evidence that conditioned
approach is mediated primarily by the S-O contingency.
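Delamater's manipulation is naturally described in terms of the contingency, or ΔP, between stimulus and outcome. The sketch below uses our own illustrative probabilities, not his parameters, to show how noncontingent deliveries abolish predictiveness while sparing pairings:

```python
# A stimulus is informative to the extent that P(outcome | stimulus)
# exceeds P(outcome | no stimulus); this difference is the contingency.
def delta_p(p_o_given_s, p_o_given_no_s):
    return p_o_given_s - p_o_given_no_s

nondegraded = delta_p(1.0, 0.0)   # outcome only ever follows the stimulus
degraded = delta_p(1.0, 1.0)      # same pairings, but free outcomes in the ITI
print(nondegraded, degraded)      # pairing preserved, prediction lost
```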
Free Operant Performance
The investigation of the learning and motivational processes that support
the performance of conditioned approach has provided clear evidence that this
response is controlled by predictive learning involving the S-O association and
is not controlled by the R-O contingency. In contrast, recent evidence suggests
that responses trained in the free operant situation have all the hallmarks of
goal-directed actions. In free operant conditioning rats are taught to perform
arbitrary actions to gain access to some valued goal or other; in the paradigm
case, hungry rats are taught to press a freely available lever to gain access to
a food reward. There is, in fact, considerable evidence that free operant lever
pressing in rats is sensitive to the causal relation between the action and its
consequences; not only will rats stop responding if lever pressing no longer
delivers a valued food, but they will also stop responding even faster if their
performance cancels otherwise freely available food (i.e., leads to the omission
of the outcome).27,28 Likewise, when the situation is arranged such that access
to a particular rewarding outcome is equally probable whether a rat presses the
lever that delivers that reward or not, rats quickly stop performing the specific
response that gains access to that outcome while maintaining their performance
of other actions.18,29,30
Likewise, a number of studies have investigated the sensitivity of instrumental performance to post-training outcome devaluation.31,32 These studies have
established that, under most training conditions, instrumental performance satisfies the goal criterion of goal-directed action. In a recent example, we trained
hungry rats to press two levers, one for food pellets and another for sucrose solution.33 A specific-satiety treatment was then used to selectively devalue one
of the two training outcomes; that is, rats were allowed to freely consume one
outcome for 1 hour before being returned to the experimental chamber
for a test in which they were allowed to choose between the two levers in
extinction. The rats were found to substantially reduce their performance of
the action that had earned the pre-fed (devalued) outcome, but continued to
perform the other action at a high rate.
These findings confirm that a dichotomy exists between the processes that
control conditioned responses, like the anticipatory approach response, and
those that control goal-directed instrumental actions. Importantly, although
both categories of behavior display sensitivity to outcome devaluation, indicating that neither is the product of a purely S-R structure, only instrumental
performance exhibits sensitivity to manipulations of R-O contingency. The
conditioned approach response and free operant lever pressing differ qualitatively; rats approach because they anticipate a valuable outcome, but they lever
press because they believe that this action will produce a valuable outcome.
This distinction is particularly important for researchers interested in the neural basis of action selection. It seems unlikely that much will be revealed about
the neural processes unique to goal-directed action selection by studying the
substrates of Pavlovian S-O learning. This distinction also raises a warning: it
is possible to devise a behavioral task that uses nominally instrumental procedures but that generates S-O learning. If the behavioral product of the task
can be construed as a conditioned approach response directed at the goal location or the signal itself, it is difficult to argue that it represents an arbitrary,
instrumental action. In this case, care must be taken to determine what role the
scheduled R-O contingency actually plays in controlling performance.
But what about the influence of changes in outcome value on conditioned approach behavior? Like lever pressing, the approach response exhibits sensitivity to outcome devaluation; although the two responses seem to be supported by fundamentally different associative structures, it therefore remains possible that they rely on a common reward
process. Indeed, as we will see in the next section, it has been argued that Pavlovian learning provides the motivational support for instrumental performance.
However, we will also review a number of recent findings that are incompatible
with this view and that suggest, instead, that instrumental action selection is
governed by a separate reward-evaluation process.
OUTCOME VALUE AND THE OFC
As mentioned in the introduction, there is considerable evidence that the
OFC plays a role in processing outcome value. One of the most convincing
pieces of evidence for this claim is the finding that although rats with OFC
lesions readily acquire the anticipatory approach response to a stimulus paired
with a food reward, their performance of this response is insensitive to the
devaluation of the rewarding outcome.7–9 Similarly, the effect of outcome devaluation on approach performance can also be abolished by lesions of the
basolateral amygdala34 (BLA), a structure that has long been implicated in
the processing of the emotional properties of events.35 The BLA shares reciprocal connections with the OFC,36–39 and recent evidence suggests that the
sensitivity of a conditioned reaching task to changes in outcome value depends
on the interaction of the BLA and OFC. For example, Baxter et al.40 trained
monkeys on a task in which the identity of a target object signaled which of
two outcomes (e.g., fruit or peanut) could be earned by displacing that object.
The performance of unoperated monkeys on this task was found to be sensitive
to outcome devaluation; when pre-fed to satiety on one of the two outcomes,
the unoperated group tended to select an object that signaled the non–pre-fed
(valued) outcome over an object that signaled the pre-fed (devalued) outcome.
In contrast, the performance of monkeys with a unilateral lesion of the OFC
plus a unilateral lesion of the contralateral BLA was found to be significantly
less sensitive to outcome devaluation and, indeed, their impairment was similar
to that observed when either of these structures was lesioned bilaterally.41,42
Furthermore, this deficit did not appear to be due to a simple additive effect
of OFC and BLA damage, as monkeys with ipsilateral lesions of the OFC and
BLA did not show an equivalent level of impairment.43 Generally, these results
have been interpreted as suggesting that the BLA and OFC work in tandem as
parts of a broader circuit that uses anticipated value to guide response selection.
Other studies using a range of species and experimental techniques have implicated the OFC in outcome encoding. Single-unit recording studies have
found cue-evoked, anticipatory firing in the OFC during go/no-go performance.14,44 Note, however, that the object displacement task used in primates, although nominally instrumental, involves a clear S-O learning component. Taken together with evidence that OFC lesions abolish the sensitivity of conditioned approach to outcome devaluation7–9 and disrupt the influence of Pavlovian outcome expectations on instrumental response selection,45 this suggests that there is scant evidence directly implicating the OFC in instrumental R-O learning. Although this might seem trivial—as mentioned
above, sensitivity to outcome devaluation has been found in both Pavlovian
conditioned responses and goal-directed instrumental actions—it is of prime
importance. In fact, current evidence suggests that instrumental and Pavlovian
“values” are not mediated by a common process, but are in fact determined by
distinct evaluative processes mediated by independent neural systems.18,19,29
Pavlovian and Instrumental Values
Although they are clearly supported by distinct associative structures, it
has long been assumed that Pavlovian conditioned responses and instrumental actions rely on a common incentive or reward process. This view is often
expressed in the form of one or other version of two-process theory.46–48 As
mentioned earlier, the fact that instrumental actions are typically performed on
some manipulandum or other ensures that there is an embedded Pavlovian relationship between the sensory features of that manipulandum (e.g., the sight of
a lever) and the outcome delivery. Many theories of instrumental performance
place heavy emphasis on this relationship, proposing that the Pavlovian process
that it engages is responsible for modulating the expression of instrumental
learning; specifically, environmental cues that signal reward are assumed to
facilitate or invigorate instrumental actions.
In line with this account, the influence of Pavlovian learning over instrumental performance can be demonstrated using the Pavlovian-instrumental
transfer effect; introducing a stimulus that has been independently paired with
food into the instrumental learning situation has long been known to facilitate
instrumental performance.49 The transfer effect has since been well established
and, in some cases, shown to be outcome specific.22,50,51 The top panel of
FIGURE 2 depicts the results of a typical transfer experiment from our laboratory.33 Hungry rats were first given Pavlovian training with two S-O contingencies (e.g., white noise → grain pellets & tone → sucrose solution) before being
given instrumental training on two R-O contingencies (e.g., left lever press →
grain pellets & right lever press → sucrose solution). After training, the rats
were returned to the chamber for a test session in which they were allowed to
perform both actions in extinction (i.e., no outcomes were delivered) and, at
various points in the session, the two stimuli were presented while the rats were
lever pressing. As can be seen in the figure, each stimulus selectively facilitated the action with which it shared a common outcome (Same), relative to the action that earned a different outcome (Different).
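The logic of this outcome-selective transfer design can be sketched as a lookup over the two trained mappings. The stimulus, action, and outcome names below are illustrative stand-ins for the counterbalanced conditions:

```python
# Sketch of the outcome-specific transfer design described above.
pavlovian = {"noise": "grain", "tone": "sucrose"}       # S-O training
instrumental = {"left": "grain", "right": "sucrose"}    # R-O training

def predicted_bias(stimulus):
    """Outcome-specific transfer: a cue biases choice toward the action
    that earns the same outcome the cue predicts."""
    expected = pavlovian[stimulus]
    same = [a for a, o in instrumental.items() if o == expected]
    different = [a for a, o in instrumental.items() if o != expected]
    return same, different

same, different = predicted_bias("noise")
print(same, different)   # ['left'] ['right']
```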
A number of theorists have argued that this Pavlovian-instrumental interaction is responsible for mediating the influence of manipulations of incentive motivation on instrumental performance.48 From this perspective, outcome
devaluation is thought to affect performance indirectly by attenuating the facilitatory contribution of the Pavlovian process. Importantly, this account posits
that the impact of outcome devaluation on both Pavlovian conditioned approach
and instrumental lever pressing is mediated by the same incentive process that
underlies the transfer effect. Consequently, all three phenomena should also
depend on a common neural circuitry. Given the evidence described above, one
might predict that this circuit involves both the OFC and BLA. Alternatively,
of course, it is possible that instrumental reward processing does not rely on
Pavlovian learning. In this case, we should expect to find both behavioral and
neural dissociations between these three phenomena.
Dissociating Transfer and Devaluation
A number of studies have shown that lesions of the BLA made before
training disrupt both instrumental outcome devaluation and outcome-selective
transfer.52–54 In addition, a preliminary study from our laboratory found that
excitotoxic BLA lesions made after training were also effective in abolishing these effects.55 Thus, the BLA seems to be involved in incentive processing for both conditioned approach and instrumental performance and also appears to play a critical role in mediating the influence of Pavlovian learning over instrumental action selection.
FIGURE 2. The results of a Pavlovian-instrumental transfer test, plotted across successive 2-min trials for the Baseline (pre-stimulus) period and for stimuli Same and Different. The top panel presents a typical effect, taken from a sham-lesioned control group. The middle and bottom panels present the results of OFC-lesioned rats that underwent surgery either before (middle) or after (bottom) training.
The OFC also appears to be involved in Pavlovian-instrumental transfer.
For instance, we recently contrasted the effects of OFC lesions made before
or after initial training on the transfer effect.33 These data are presented in
the middle (Pre-training group) and bottom (Post-training group) panels of
FIGURE 2, directly under the "typical" transfer data, which came from the group
that served as a sham-lesioned control in this study. Although pre-training
lesions had no apparent effect, post-training lesions did impair transfer performance. This pattern of findings suggests that although the OFC contributes
to transfer performance under normal conditions, it may not play an essential role. In the absence of the OFC, other structures, perhaps including the
BLA, may be sufficient to support normal transfer performance. Of course, it
is also possible that compensation occurred at the level of the OFC and that
more complete damage to this region prior to training would have resulted in
impairment.
This finding clearly implicates the OFC in the control of instrumental performance. It is also consistent with the notion that the OFC contributes to
a general incentive-motivational system that is responsible for both transfer
and reward processing. In order to assess this claim, and thereby provide a
more thorough characterization of the involvement of the OFC in instrumental
action selection, we also investigated whether these lesions would have an impact on the sensitivity of instrumental performance to outcome devaluation. In
contrast to the simple reward processing view of OFC function, however, we
found that both pre- and post-training lesioned groups exhibited normal shifts
in their choice performance after outcome devaluation. Instrumental outcome
devaluation appears, therefore, to be dissociable from Pavlovian-instrumental
transfer, at least at the level of the OFC.33
These findings indicate that the OFC is particularly important for using
S-O learning to guide action selection. They also call into question the notion
that reward evaluation in instrumental performance is mediated by a Pavlovian
learning process. In fact, this is not the first piece of evidence against this latter
view of incentive processing. For instance, Corbit and Balleine56 found that
lesions of the prelimbic area of the prefrontal cortex (PL) also produce dissociable effects on instrumental outcome devaluation and Pavlovian-instrumental
transfer. In this study, however, the pattern of effects was the reverse of those
observed in the OFC; that is, lesions of the PL abolished the sensitivity of instrumental performance to outcome devaluation but had no effect on transfer. PL
lesions have also been shown to impair instrumental contingency degradation
learning, indicating that this structure is important for encoding R-O associations.19,29,56 When taken together, therefore, these studies provide evidence
that the respective contributions of S-O and R-O associations to instrumental
action selection can be doubly dissociated at the level of the prefrontal cortex,
with the OFC supporting the former and the PL supporting the latter.
The incentive-motivational version of two-process theory is also challenged
by studies that have investigated the nature of the transfer effect. For instance,
the notion that Pavlovian learning provides motivational support for action
selection is not easily reconciled with the selectivity typically found in transfer
studies using outcomes of approximately the same motivational value (e.g.,
food pellets and sucrose solution). Indeed, Pavlovian cues have been shown to
guide response selection based on features of the outcome representation that
are motivationally neutral.57 Of course, such findings are not entirely incompatible with an incentive-motivational interpretation of transfer; they merely
argue that this phenomenon is supported, at least in part, by a sensorily rich expectation of the training outcome. More damning still, however, is evidence that
transfer stimuli and outcome devaluation can act independently of one another
in controlling performance. In order to explain the outcome devaluation effect,
the two-process account assumes that the influence of Pavlovian learning over
response selection is gated by the current value of the mediating outcome. With
standard instrumental lever pressing, the sight of the lever should remind the
rat of its associated outcome, but whether this facilitates pressing or not must
depend on how desirable that outcome is to the rat. Similarly, in transfer, the
potency of a particular cue in facilitating performance should depend on the
value of the anticipated outcome. In contrast to this prediction, several studies
have found that reducing an outcome’s value does not affect its capacity to
mediate Pavlovian-instrumental transfer.58–60
INFORMATION, CHOICE, AND THE OFC
Thus, there seems to be overwhelming evidence against the claim that the
influence of reward value on instrumental response selection is mediated by
S-O learning. We do not deny, of course, that Pavlovian cues influence choice.
Rather, it is clear that the influences of reward value and of predictive cues
on choice performance are mediated by distinct processes that are subserved
by distinct neural systems. If, however, Pavlovian learning does not influence
instrumental performance through an incentive or motivational mechanism, then it remains to be determined how it acts to guide action selection.
Some two-process theories apply a cognitive interpretation to transfer-like
phenomena,60–62 suggesting that Pavlovian cues have the capacity to elicit an
expectation of their outcome that, in turn, results in the selection of any action
(or actions) associated with that outcome. According to this account, the tendency of an environmental cue to remind the agent of a particular outcome is
enough to bias the agent’s decision to perform an action associated with that
outcome. In line with this interpretation, the influence of Pavlovian cues on
action selection has been shown to depend on the predictive status of that
cue.24,63 One piece of evidence comes from the Delamater24 study we mentioned earlier. After selectively degrading one of two S-O contingencies and
observing the effect of this treatment on conditioned approach performance,
Delamater conducted a test of Pavlovian-instrumental transfer using these stimuli. He found that the stimulus from the degraded contingency had not only
lost its capacity to elicit approach, but was also ineffective in facilitating instrumental performance. In contrast, the stimulus from the nondegraded contingency was unaffected and generated clear evidence of transfer.
When carefully considered, this finding suggests a possible hypothesis as
to why OFC lesions disrupt the influence of Pavlovian cues on action selection. As mentioned above, Pavlovian conditioned responses are not under the
subject’s control; these responses are elicited by conditioned stimuli and, in
that sense, are not truly selected in the same way one selects the best option
among alternatives. Although Pavlovian cues are of limited value in selecting
a conditioned response, cues with a relatively high predictive validity are of
considerable value in selecting between different courses of action. Stimuli that provide information about the likely payoff of selecting one course of action over another reduce uncertainty and, as such, can and should bias action selection. Hence, stimuli that are highly predictive of
specific consequences are likely to bias choice toward actions associated with
those consequences. This analysis of transfer explains why reducing the predictive validity of a cue reduces its influence on action selection. It also suggests
that OFC lesions abolish transfer because they, too, remove the capacity of cues to provide information about their specific outcomes. From
this perspective, the contribution of the OFC to goal-directed instrumental performance has little to do with processing reward value and instead involves
encoding the predictive status of cues so that they may bias action selection.
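This informational account can be caricatured in a few lines of code. The sketch below is purely illustrative, not a model from the literature: the function name, the two-lever mapping, and all numbers are hypothetical. The point is only the logic of the account: a cue biases choice toward the action associated with the outcome the cue predicts, in proportion to the cue's predictive validity, so a degraded cue (validity near zero) leaves choice unbiased.

```python
# Hypothetical sketch of the informational account of transfer.
# A cue adds a bias to whichever action earns the outcome the cue
# predicts, scaled by the cue's predictive validity (e.g., its
# delta-P); a degraded cue has validity ~0 and so adds no bias.

def choice_bias(baseline, action_outcome, cue_outcome, cue_validity, weight=1.0):
    """Return biased response tendencies for each action.

    baseline       -- dict: action -> baseline response tendency
    action_outcome -- dict: action -> outcome that action earns
    cue_outcome    -- outcome predicted by the presented cue
    cue_validity   -- predictive validity of the cue (0 after degradation)
    """
    biased = dict(baseline)
    for action, outcome in action_outcome.items():
        if outcome == cue_outcome:
            biased[action] += weight * cue_validity
    return biased

# Two actions trained with different outcomes (values are arbitrary).
baseline = {"left_lever": 1.0, "right_lever": 1.0}
mapping = {"left_lever": "grain", "right_lever": "sucrose"}

# A valid grain-predictive cue biases choice toward the left lever ...
print(choice_bias(baseline, mapping, "grain", cue_validity=0.8))
# ... while a fully degraded cue leaves the two actions tied.
print(choice_bias(baseline, mapping, "grain", cue_validity=0.0))
```

On this reading, outcome-selective transfer falls out of the mapping between cues, outcomes, and actions, with no term for the outcome's current motivational value, which is why devaluation can leave transfer intact.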
If this account is true, then animals with lesions of the OFC should be
relatively insensitive to changes in the predictive status of Pavlovian cues.
In order to test this hypothesis, we assessed the effect of OFC lesions on
Pavlovian contingency degradation learning.33 Rats were trained to asymptote
on two distinct S-O relationships (e.g., noise → grain & tone → sucrose)
before being given sham or excitotoxic lesions of the OFC. Training resumed
after a brief recovery period. During this phase, however, one of the two S-O
relationships was degraded by delivering the corresponding outcome with a
fixed probability throughout the session regardless of whether its stimulus was
present or absent. The results of this experiment are presented in FIGURE 3. The
sham group displayed normal Pavlovian contingency learning, withholding magazine approach during the stimulus that was paired with the noncontingent outcome (Degraded) but maintaining responding to the control stimulus (Nondegraded). In contrast, the OFC group exhibited a
general decrease in responding to both stimuli, regardless of their predictive
status.
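The degradation manipulation can be made concrete with a short simulation. This is an illustrative sketch, not the experimental code: the per-step outcome probabilities and session parameters are hypothetical. It shows only that delivering the outcome at the same rate whether or not its stimulus is present drives the stimulus–outcome contingency (delta-P = P(O|S) − P(O|~S)) to zero, while a stimulus whose outcome occurs only in its presence retains a positive delta-P.

```python
import random

def simulate_contingency(p_outcome_given_s, p_outcome_given_no_s,
                         n_steps=100_000, p_stimulus=0.3, seed=0):
    """Estimate delta-P = P(O|S) - P(O|~S) from a simulated session."""
    rng = random.Random(seed)
    # stimulus present? -> [outcome count, step count]
    counts = {True: [0, 0], False: [0, 0]}
    for _ in range(n_steps):
        s = rng.random() < p_stimulus
        p = p_outcome_given_s if s else p_outcome_given_no_s
        counts[s][0] += rng.random() < p
        counts[s][1] += 1
    p_o_s = counts[True][0] / counts[True][1]
    p_o_no_s = counts[False][0] / counts[False][1]
    return p_o_s - p_o_no_s

# Nondegraded: outcome occurs only when the stimulus is present,
# so delta-P stays positive.
print(simulate_contingency(0.05, 0.0))
# Degraded: same outcome probability with the stimulus present or
# absent, so delta-P collapses toward zero.
print(simulate_contingency(0.05, 0.05))
```

A subject sensitive to delta-P should stop responding selectively to the degraded stimulus, as the sham group did; the OFC group's nonselective decline suggests this computation was lost.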
FIGURE 3. The results of Pavlovian contingency degradation training, plotted across
successive 2-day blocks for the Degraded stimulus and the Nondegraded stimulus. The top
panel displays the results of the sham-lesioned group and the bottom panel displays the
results of the OFC-lesioned group. Open circles represent approach performance on the
last day of Pavlovian training.
These data reveal that, although not necessary for conditioned approach
performance per se, the OFC is critical for encoding and updating predictive
S-O relationships; that is, for establishing the relative validity of predictive
cues with respect to their specific consequences. Without this capacity, it
seems likely that conditioned approach performance comes under the control of alternative learning processes. For instance, one interpretation of the
nonspecific reduction in responding displayed by the OFC group is that their
performance is supported by their intact capacity for R-O learning. In this experiment, of course, although no explicit contingency was scheduled between
the approach response and food, occasional approach-food pairings could have
supported some indiscriminate approach behavior (i.e., approach should have
been equally likely during both stimuli and the inter-trial interval). This account also fits nicely with an early finding that OFC lesions render conditioned
approach performance abnormally sensitive to the introduction of an instrumental contingency,64 confirming that this structure is selectively involved in
S-O but not R-O learning.
CONCLUSIONS
These findings call into question the simple view that the OFC is generally
involved in processing the incentive value of outcomes. As we have seen,
the OFC of rodents, and the lateral but likely not the differentiated medial part of the OFC of primates65 (the latter being, perhaps, homologous to the rodent prelimbic cortex66), appears to be necessary for processing outcome value only when task performance is likely to be mediated by S-O learning. This
analysis also applies to electrophysiological and functional imaging studies
reporting evidence of OFC involvement in reward processing. The fact that
OFC activity can reflect the motivational value of expected outcomes has been
well established.11–14 These studies, however, tend to involve substantial S-O
learning components, raising the possibility that this neural activity reflects a
purely Pavlovian incentive process.
Nevertheless, the OFC appears to be involved in more than just processing the incentive value of Pavlovian outcomes; it also appears to encode
the predictive relationship between specific cues and their respective outcomes and, hence, is important in mediating the way that this learning influences instrumental performance. Although the OFC does not appear to
mediate the influence of goal value over instrumental action selection, the
findings reviewed here suggest that it is involved in the way information extracted from the environment is used to decide between alternative courses of
action.
ACKNOWLEDGMENT
The preparation of this article and much of the research that it describes was
supported by Grant No. 56446 from the National Institute of Mental Health.
REFERENCES
1. HOLLAND, P.C. & M. GALLAGHER. 2004. Amygdala-frontal interactions and reward
expectancy. Curr. Opin. Neurobiol. 14: 148–155.
2. HOLLERMAN, J.R., L. TREMBLAY & W. SCHULTZ. 2000. Involvement of basal ganglia
and orbitofrontal cortex in goal-directed behavior. Prog. Brain Res. 126: 193–
215.
3. O’DOHERTY, J.P. 2004. Reward representations and reward-related learning in the
human brain: insights from neuroimaging. Curr. Opin. Neurobiol. 14: 769–776.
4. ROLLS, E.T. 2004. The functions of the orbitofrontal cortex. Brain Cogn. 55: 11–29.
5. ROBERTS, A.C. 2006. Primate orbitofrontal cortex and adaptive behaviour. Trends
Cogn. Sci. 10: 83–90.
6. SCHOENBAUM, G. & M. ROESCH. 2005. Orbitofrontal cortex, associative learning,
and expectancies. Neuron 47: 633–636.
7. GALLAGHER, M., R.W. MCMAHAN & G. SCHOENBAUM. 1999. Orbitofrontal cortex
and representation of incentive value in associative learning. J. Neurosci. 19:
6610–6614.
8. PICKENS, C.L. et al. 2003. Different roles for orbitofrontal cortex and basolateral
amygdala in a reinforcer devaluation task. J. Neurosci. 23: 11078–11084.
9. PICKENS, C.L. et al. 2005. Orbitofrontal lesions impair use of cue-outcome associations in a devaluation task. Behav. Neurosci. 119: 317–322.
10. IZQUIERDO, A., R.K. SUDA & E.A. MURRAY. 2004. Bilateral orbital prefrontal
cortex lesions in rhesus monkeys disrupt choices guided by both reward value
and reward contingency. J. Neurosci. 24: 7540–7548.
11. GOTTFRIED, J.A., J. O’DOHERTY & R.J. DOLAN. 2003. Encoding predictive reward
value in human amygdala and orbitofrontal cortex. Science 301: 1104–1107.
12. O’DOHERTY, J. et al. 2000. Sensory-specific satiety-related olfactory activation of
the human orbitofrontal cortex. Neuroreport 11: 893–897.
13. CRITCHLEY, H.D. & E.T. ROLLS. 1996. Hunger and satiety modify the responses of
olfactory and visual neurons in the primate orbitofrontal cortex. J. Neurophysiol.
75: 1673–1686.
14. TREMBLAY, L. & W. SCHULTZ. 1999. Relative reward preference in primate orbitofrontal cortex. Nature 398: 704–708.
15. LIPTON, P.A., P. ALVAREZ & H. EICHENBAUM. 1999. Crossmodal associative memory representations in rodent orbitofrontal cortex. Neuron 22: 349–359.
16. ROESCH, M.R., A.R. TAYLOR & G. SCHOENBAUM. 2006. Encoding of time-discounted rewards in orbitofrontal cortex is independent of value representation.
Neuron 51: 509–520.
17. ROLLS, E.T. et al. 1999. Responses to the sensory properties of fat of neurons in
the primate orbitofrontal cortex. J. Neurosci. 19: 1532–1540.
18. BALLEINE, B.W. & A. DICKINSON. 1998. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology
37: 407–419.
19. DICKINSON, A. & B.W. BALLEINE. 1993. Actions and responses: the dual psychology
of behaviour. In Spatial Representation. N. Eilan, R. McCarthy & M.W. Brewer,
Eds.: 277–293. Basil Blackwell Ltd. Oxford.
20. HULL, C.L. 1943. Principles of Behavior. Appleton. New York.
21. HOLLAND, P.C. & J.J. STRAUB. 1979. Differential effects of two ways of devaluing the unconditioned stimulus after Pavlovian appetitive conditioning. J. Exp.
Psychol. Anim. Behav. Process 5: 65–78.
22. COLWILL, R.M. & D.K. MOTZKIN. 1994. Encoding of the unconditioned stimulus
in Pavlovian conditioning. Anim. Learn. Behav. 22: 384–394.
23. HOLLAND, P.C. 1979. Differential effects of omission contingencies on various
components of Pavlovian appetitive conditioned responding in rats. J. Exp. Psychol. Anim. Behav. Process 5: 178–193.
24. DELAMATER, A.R. 1995. Outcome-selective effects of intertrial reinforcement in
Pavlovian appetitive conditioning. Anim. Learn. Behav. 23: 31–39.
25. DURLACH, P.J. & D.O. SHANE. 1993. The effect of intertrial food presentations on
anticipatory goal-tracking in the rat. Q. J. Exper. Psychol. B. 46: 289–318.
26. FARWELL, B.J. & J.J. AYRES. 1979. Stimulus-reinforcer and response-reinforcer
relations in the control of conditioned appetitive headpoking (goal tracking) in
rats. Learn. Motiv. 10: 295–312.
27. DICKINSON, A. et al. 1998. Omission learning after instrumental pretraining. Q. J.
Exper. Psychol. B. 51: 271–286.
28. UHL, C.N. 1974. Response elimination in rats with schedules of omission training,
including yoked and response-independent reinforcement comparisons. Learn.
Motiv. 5: 511–531.
29. BALLEINE, B.W. 2005. Neural bases of food-seeking: affect, arousal and reward in
corticostriatolimbic circuits. Physiol. Behav. 86: 717–730.
30. HAMMOND, L.J. 1980. The effect of contingency upon the appetitive conditioning
of free-operant behavior. J. Exp. Anal. Behav. 34: 297–304.
31. ADAMS, C.D. & A. DICKINSON. 1981. Instrumental responding following reinforcer
devaluation. Q. J. Exp. Psychol. B. 33: 109–121.
32. COLWILL, R.M. & R.A. RESCORLA. 1985. Postconditioning devaluation of a reinforcer affects instrumental responding. J. Exp. Psychol. Anim. Behav. Process
11: 120–132.
33. OSTLUND, S.B. & B.W. BALLEINE. 2007. Orbitofrontal cortex mediates outcome
encoding in Pavlovian but not instrumental conditioning. J. Neurosci. 27: 4819–
4825.
34. HATFIELD, T. et al. 1996. Neurotoxic lesions of basolateral, but not central, amygdala interfere with Pavlovian second-order conditioning and reinforcer devaluation effects. J. Neurosci. 16: 5256–5265.
35. BALLEINE, B.W. & S. KILLCROSS. 2006. Parallel incentive processing: an integrated
view of amygdala function. Trends Neurosci. 29: 272–279.
36. KITA, H. & S.T. KITAI. 1990. Amygdaloid projections to the frontal cortex and the
striatum in the rat. J. Comp. Neurol. 298: 40–49.
37. KRETTEK, J.E. & J.L. PRICE. 1977. Projections from the amygdaloid complex to the
cerebral cortex and thalamus in the rat and cat. J. Comp. Neurol. 172: 687–722.
38. MCDONALD, A.J. 1991. Organization of amygdaloid projections to the prefrontal
cortex and associated striatum in the rat. Neuroscience 44: 1–14.
39. MCDONALD, A.J., F. MASCAGNI & L. GUO. 1996. Projections of the medial and
lateral prefrontal cortices to the amygdala: a Phaseolus vulgaris leucoagglutinin
study in the rat. Neuroscience 71: 55–75.
40. BAXTER, M.G. et al. 2000. Control of response selection by reinforcer value requires interaction of amygdala and orbital prefrontal cortex. J. Neurosci. 20:
4311–4319.
41. MALKOVA, L., D. GAFFAN & E.A. MURRAY. 1997. Excitotoxic lesions of the amygdala fail to produce impairment in visual learning for auditory secondary reinforcement but interfere with reinforcer devaluation effects in rhesus monkeys. J.
Neurosci. 17: 6011–6020.
42. IZQUIERDO, A., R.K. SUDA & E.A. MURRAY. 2004. Bilateral orbital prefrontal
cortex lesions in rhesus monkeys disrupt choices guided by both reward value
and reward contingency. J. Neurosci. 24: 7540–7548.
43. IZQUIERDO, A. & E.A. MURRAY. 2004. Combined unilateral lesions of the amygdala
and orbital prefrontal cortex impair affective processing in rhesus monkeys. J.
Neurophysiol. 91: 2023–2039.
44. SCHOENBAUM, G., A.A. CHIBA & M. GALLAGHER. 1998. Orbitofrontal cortex and
basolateral amygdala encode expected outcomes during learning. Nat. Neurosci.
1: 155–159.
45. MCDANNALD, M.A. et al. 2005. Lesions of orbitofrontal cortex impair rats’ differential outcome expectancy learning but not conditioned stimulus-potentiated
feeding. J. Neurosci. 25: 4626–4632.
46. ASRATYAN, E.A. 1974. Conditioned reflex theory and motivational behavior. Acta
Neurobiol. Exp. 34: 15–31.
47. BOLLES, R.C. 1972. Reinforcement, expectancy, and learning. Psychol. Rev. 79:
394–409.
48. RESCORLA, R.A. & R.L. SOLOMON. 1967. Two-process learning theory: relationships between Pavlovian conditioning and instrumental training. Psychol. Rev.
74: 151–183.
49. ESTES, W.K. 1943. Discriminative conditioning. I: a discriminative property of
conditioned anticipation. J. Exp. Psychol. 32: 150–155.
50. COLWILL, R.M. & R.A. RESCORLA. 1988. Associations between the discriminative
stimulus and the reinforcer in instrumental learning. J. Exp. Psychol. Anim.
Behav. Process 14: 155–164.
51. KRUSE, J.M. et al. 1983. Pavlovian conditioned stimulus effects on instrumental
choice behavior are reinforcer specific. Learn. Motiv. 14: 165–181.
52. BALLEINE, B.W., A.S. KILLCROSS & A. DICKINSON. 2003. The effect of lesions of
the basolateral amygdala on instrumental conditioning. J. Neurosci. 23: 666–
675.
53. BLUNDELL, P., G. HALL & S. KILLCROSS. 2001. Lesions of the basolateral amygdala
disrupt selective aspects of reinforcer representation in rats. J. Neurosci. 21:
9018–9026.
54. CORBIT, L.H. & B.W. BALLEINE. 2005. Double dissociation of basolateral and
central amygdala lesions on the general and outcome-specific forms of Pavlovian-instrumental transfer. J. Neurosci. 25: 962–970.
55. OSTLUND, S.B. & B.W. BALLEINE. 2005. Lesions of the orbitofrontal cortex disrupt Pavlovian, but not instrumental, outcome encoding. Soc. Neuro. Abstracts
Program No. 71.2.
56. CORBIT, L.H. & B.W. BALLEINE. 2003. The role of prelimbic cortex in instrumental
conditioning. Behav. Brain Res. 146: 145–157.
57. FEDORCHAK, P.M. & R.C. BOLLES. 1986. Differential outcome effect using a biologically neutral outcome difference. J. Exp. Psychol. Anim. Behav. Process 12:
125–130.
58. COLWILL, R.M. & R.A. RESCORLA. 1990. Effect of reinforcer devaluation on discriminative control of instrumental behavior. J. Exp. Psychol. Anim. Behav. Process 16: 40–47.
59. HOLLAND, P.C. 2004. Relations between Pavlovian-instrumental transfer and reinforcer devaluation. J. Exp. Psychol. Anim. Behav. Process 30: 104–117.
60. RESCORLA, R.A. 1994. Transfer of instrumental control mediated by a devalued
outcome. Anim. Learn. Behav. 22: 27–33.
61. BALLEINE, B.W. & S.B. OSTLUND. 2007. Still at the choice-point: action selection
and initiation in instrumental conditioning. Ann. N. Y. Acad. Sci. 1104: 147–171.
62. TRAPOLD, M.A. & J.B. OVERMIER. 1972. The Second Learning Process in Instrumental Learning. Appleton-Century-Crofts. New York.
63. RESCORLA, R.A. 1999. Learning about qualitatively different outcomes during a
blocking procedure. Learn. Behav. 27: 140–151.
64. CHUDASAMA, Y. & T.W. ROBBINS. 2003. Dissociable contributions of the orbitofrontal and infralimbic cortex to pavlovian autoshaping and discrimination
reversal learning: further evidence for the functional heterogeneity of the rodent
frontal cortex. J. Neurosci. 23: 8771–8780.
65. PADOA-SCHIOPPA, C. & J.A. ASSAD. 2006. Neurons in the orbitofrontal cortex
encode economic value. Nature 441: 223–226.
66. ONGUR, D. & J.L. PRICE. 2000. The organization of networks within the orbital
and medial prefrontal cortex of rats, monkeys and humans. Cereb. Cortex 10:
206–219.