Oscillatory profiles of positive, negative and neutral feedback stimuli

International Journal of Psychophysiology 107 (2016) 37–43
Oscillatory profiles of positive, negative and neutral feedback stimuli
during adaptive decision making
Peng Li a, Travis E. Baker b, Chris Warren c, Hong Li a,⁎
a Brain Function and Psychological Science Research Center, Shenzhen University, Shenzhen, China
b Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, Canada
c Department of Psychology, Leiden University, Leiden, The Netherlands
Article info
Article history:
Received 20 October 2015
Received in revised form 22 June 2016
Accepted 30 June 2016
Available online 01 July 2016
Keywords:
Theta
Beta-gamma
Feedback-related negativity
Neutral feedback
Reinforcement learning
Abstract
The electrophysiological response to positive and negative feedback during reinforcement learning has been well
documented over the past two decades, yet little is known about the neural response to uninformative events
that often follow our actions. To address this issue, we recorded the electroencephalogram (EEG) during a
time-estimation task using both informative (positive and negative) and uninformative (neutral) feedback. In
the time-frequency domain, uninformative feedback elicited significantly less induced beta-gamma activity
than informative feedback. This result suggests that beta-gamma activity is particularly sensitive to feedback
that can guide behavioral adjustments, consistent with other work. In contrast, neither theta nor delta activity
was sensitive to the difference between negative and neutral feedback, though both frequencies discriminated
between positive and non-positive (neutral or negative) feedback. Interestingly, in the time domain, we observed a linear relationship in the amplitude of the feedback-related negativity (neutral > negative > positive),
a component of the event-related brain potential thought to index a specific kind of reinforcement learning signal
called a reward prediction error. Taken together, these results suggest that the reinforcement learning system
treats neutral feedback as a special case, providing valuable information about the electrophysiological measures
used to index the cognitive function of frontal midline cortex.
© 2016 Published by Elsevier B.V.
1. Introduction
Our ability to predict and evaluate the consequences of our actions is
fundamental to adaptive decision making. Reinforcement learning (RL)
theory holds that if an action is followed by positive feedback then that
action will have a greater probability of being performed again, whereas
if an action is followed by negative feedback then that action will have a
lesser probability of being performed again (i.e., Thorndike's Law of Effect; Catania, 1999). In everyday life, however, not all of our actions are followed by such binary consequences; many are followed instead by uninformative events. In
fact, the term neutral operants has long been used by RL theorists to describe responses from the environment that neither increase nor decrease the probability of a behavior being repeated (Skinner, 1938).
While observations of electrophysiological activity over frontal midline
cortex have motivated a wealth of experimental and theoretical analyses of RL, it remains unclear how uninformative feedback is ultimately
processed during trial-and-error learning tasks.
Over the last decade, both time domain and time-frequency domain
analyses of electrophysiological recordings have been increasingly used
in research concerned with neural processes that differentiate
performance feedback indicating positive outcomes (e.g., monetary
gain, correct feedback) from negative outcomes (e.g., monetary loss,
error feedback) (Weinberg et al., 2014). In the time domain, event-related brain potential (ERP) studies have revealed a negative-going deflection in the ERP that peaks over frontal-central recording sites
approximately 250 ms following feedback presentation. This feedback-locked ERP component, termed the feedback-related negativity
(FRN), is typically enhanced following unexpected task-relevant events
(e.g. negative feedback, errors) and is reduced or absent following positive feedback.¹ Interestingly, the few FRN studies examining neutral
feedback have produced largely mixed results (Holroyd et al., 2006;
Kujawa et al., 2013; Huang and Yu, 2014; Yu and Zhou, 2006). In particular, studies either report larger FRNs to neutral feedback compared to
negative and positive feedback (Müller et al., 2005; Kujawa et al.,
2013; Huang and Yu, 2014) or comparable FRN amplitudes between
neutral and negative feedback (Holroyd et al., 2006).²
⁎ Corresponding author at: No 3688, Nanhai Road, Nanshan District, Shenzhen 518060, China.
E-mail address: [email protected] (H. Li).
http://dx.doi.org/10.1016/j.ijpsycho.2016.06.018
0167-8760/© 2016 Published by Elsevier B.V.
¹ Recent evidence suggests that the difference in FRN amplitude between reward and
error trials results from a positive-going deflection, the reward positivity (Rew-P), elicited
by reward feedback (see Holroyd et al., 2008; Warren and Holroyd, 2012; Baker and
Holroyd, 2011; Proudfit, 2015). Because the Rew-P typically occurs during the time-range
of the FRN and P300, the difference-wave method is commonly used to isolate the reward
positivity from other ERP components by taking the difference between the ERPs to positive and negative feedback. For the purpose of this study, we focused our analysis on condition-specific ERP effects by measuring the amplitudes of the FRN elicited by neutral,
negative, and positive feedback.
Although neutral feedback has yet to be investigated in the time-frequency domain, electroencephalogram (EEG) oscillations in the theta
frequency range (4–8 Hz) recorded over frontal midline areas of the
scalp have been associated with outcome processing (Cavanagh and
Frank, 2014; Cavanagh et al., 2012), as well as other cognitive processes
related to effort, attention and motivation (for reviews, see Hsieh and
Ranganath, 2014; Mitchell et al., 2008). Notably, more frontal midline
theta power is observed following negative feedback compared to positive feedback, suggesting that this signal reflects an error-driven learning mechanism consistent with principles of reinforcement learning
(Cavanagh et al., 2010). However, others have argued that frontal midline theta reflects sensitivity to important cognitive events in general
rather than to errors in particular (Cavanagh et al., 2012), and signals
the deployment of control (Cavanagh and Frank, 2014). Furthermore,
power in the delta (1–4 Hz) and beta-gamma (20–30 Hz) frequency
ranges has been shown to increase following positive feedback compared to negative feedback (Bernat et al., 2011; Cohen et al., 2007;
HajiHosseini et al., 2012; Marco-Pallares et al., 2008). In particular, recent work has indicated that feedback-locked delta band activity
related to the reward positivity appears to be specific to surprising rewards,
but does not predict associated behavioral adjustments (Cavanagh,
2015). By contrast, feedback-locked beta-gamma activity is thought to
reflect a salience signal and has been associated with ongoing adjustments of behavior (HajiHosseini and Holroyd, 2015) and cognitive demand (Chen et al., 2012; Gilbert and Sigman, 2007; Lee et al., 2003).
While decomposing feedback-related EEG and ERPs into spectral
quantities has provided a thorough understanding of the cognitive processes underlying RL, much of the existing research characterizes feedback-locked oscillatory
activity (delta, theta, beta-gamma) as total spectral power. Unfortunately, this approach does not capture all the information available in
these signals because total power within a given frequency band consists of the stimulus phase-locked part of the EEG that gives rise to the
ERP, called the “evoked” power, and the non-phase-locked part of the
EEG that is invisible in the ERP, called the “induced” power (Tallon-Baudry and Bertrand, 1999). Importantly, current thinking holds that
these activities reflect different cognitive processes, such that evoked
activity reflects bottom-up neural activity, whereas induced activity is
thought to reflect top-down modulation (Tallon-Baudry and Bertrand,
1999). Indeed, it was recently demonstrated that total theta power is
equally sensitive to outcome valence and outcome probability; however, evoked theta power was mainly sensitive to outcome valence whereas induced theta power was mainly sensitive to outcome probability
(HajiHosseini and Holroyd, 2013). The role of delta and beta-gamma
band phase dynamics in feedback processing remains unknown. The
difference in dominant frequencies (delta, theta, and beta-gamma) between negative and positive feedback could provide a deeper understanding of the phase dynamics (evoked vs induced) at play during
trial-by-trial RL.
Furthermore, it is also important to consider the relationship between time domain and time-frequency domain measures. For example, a relationship between feedback-related delta activity and the
amplitude of the P300 component has been demonstrated (e.g. Bernat
et al., 2007; Cavanagh and Frank, 2014), possibly suggesting that the
evoked portion of delta following feedback may contribute to feedback-related differences observed in the amplitude of the P300. However, this relationship has never been formally tested. Further, theta
oscillations and the FRN have been extensively studied in parallel, decades-long literatures. Feedback-induced theta power and the FRN
occur at about the same time (200–400 ms post feedback) and share
the same scalp location (over the frontal midline), suggesting a functional relationship between these two phenomena. In particular, converging evidence across multiple methodologies indicates that the
anterior cingulate cortex (ACC) is the source of both frontal midline
theta oscillations (Cavanagh and Frank, 2014) and the FRN (Holroyd
and Yeung, 2012). Importantly, recent examinations of theta power
and the FRN have provided a nuanced account of their relationship
(HajiHosseini and Holroyd, 2013). Under this account, unexpected,
task-relevant events elicit an ACC-dependent control process that manifests in the frequency domain as theta oscillations over frontal-central
areas of the scalp (Cavanagh and Frank, 2014). In the time domain,
the “evoked” portion of this theta activity that is consistent in phase
across trials gives rise to the FRN (HajiHosseini and Holroyd, 2013; see
also Yeung et al., 2004). Although both measures provide valuable information about cognitive function of frontal midline cortex, it has recently
been argued that FRN amplitude is specifically sensitive to dopamine reinforcement learning signals whereas evoked theta power reflects the
ACC response to unexpected events.
² It is interesting to note that across 5 experiments reported in Holroyd et al. (2006), the
authors reported that neutral feedback stimuli elicited larger FRNs than did the negative
feedback stimuli, but the difference was not statistically significant (see Figs. 1 and 2, Holroyd et al.,
2006).
Given the relationship between theta oscillations and FRN, it is perhaps surprising that neutral feedback has not yet been investigated in
the time-frequency domain. Furthermore, because of the inconsistency
in FRN studies examining neutral operants, the functional role of the ACC
in the cognitive processes that underlie reinforcement learning remains
incomplete. Thus, in order to further our understanding of the cognitive
processes underlying informative and non-informative feedback during
RL, the electrophysiological response to the good, the bad, and the neutral needs to be further characterized. In the present study, we present
a harmonious application of both ERP (i.e. FRN) and time-frequency
(i.e. evoked and induced delta, theta, and beta-gamma power) approaches in an attempt to elucidate the discrete aspects of the electrophysiological dynamics between positive, negative, and neutral
feedback (Holroyd et al., 2012).
2. Methods
2.1. Participants and procedure
Nineteen undergraduate students (eight males) aged 18–23 years
participated in the experiment for monetary compensation. All participants had normal or corrected-to-normal vision, were right-handed
and had no neurological or psychological disorders. Two subjects were
excluded from the final analysis due to poor behavioral performance. The study was approved by the local ethics committee. Participants were asked to perform a time estimation task (e.g. Miltner et al.,
1997) that included neutral feedback. They were required to press the
spacebar following a cue (a 1500 Hz sound that lasted 50 ms) to indicate
that their estimate of 1 s had elapsed. Following their response, a feedback stimulus appeared on the screen indicating whether their estimation was correct (positive feedback, win 5 cents; a circle with a check
mark) or incorrect (negative feedback, 0 cents; a circle with a cross
mark), or withholding that information (neutral feedback, either 5 or 0
cents; a circle with nothing inside). Note that participants did not learn immediately
whether they had earned money on neutral-feedback trials, but they received the money for every correct response (in total, 50% of
the trials) at the end of the experiment. Participants were told that if
their reaction time was within the time window from 900 ms to
1100 ms, they would receive positive feedback; otherwise they would
get negative feedback. However, this time window narrowed by
10 ms if they responded correctly on the previous trial and widened
by 10 ms if they responded incorrectly on the previous trial. For 1/3 of
negative-feedback trials and 1/3 of positive-feedback trials, the appropriate feedback was randomly replaced with neutral feedback. Of the
288 trials, participants received 28% neutral feedback, 35% positive feedback and 37% negative feedback in total.
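The adaptive window and the masking of feedback described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' task code: the function and constant names are hypothetical, and it assumes that the 10 ms step is applied to each edge of a window centred on 1000 ms, that the window edges are inclusive, and that the window keeps updating according to true accuracy on neutral-feedback trials.

```python
import random

TARGET_MS = 1000       # participants estimate 1 s
STEP_MS = 10           # window change after each trial (from the text)
NEUTRAL_PROB = 1 / 3   # share of trials whose feedback is replaced by neutral


def run_trial(rt_ms, half_width_ms, rng=random):
    """Return (feedback, next_half_width_ms) for a single response.

    Assumption: the window is symmetric around TARGET_MS and the 10 ms
    step is applied to each edge; the source does not specify this.
    """
    correct = abs(rt_ms - TARGET_MS) <= half_width_ms
    if correct:
        # Narrow the window, but never below one step (assumed floor).
        next_half_width = max(STEP_MS, half_width_ms - STEP_MS)
    else:
        next_half_width = half_width_ms + STEP_MS
    # Mask the informative feedback on a random third of trials.
    if rng.random() < NEUTRAL_PROB:
        feedback = "neutral"
    else:
        feedback = "positive" if correct else "negative"
    return feedback, next_half_width
```

Starting from a half-width of 100 ms (the 900 to 1100 ms window), a staircase of this kind drives accuracy toward roughly 50%, which matches the near-equal shares of positive and negative feedback reported above.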
2.2. EEG Acquisition
The electroencephalogram (EEG) was recorded at 64 scalp sites
using tin electrodes mounted in an elastic cap (Brain Products, München,
Germany), with a ground electrode placed on the frontal midline and
references placed on the left and right mastoids. Vertical electrooculograms (EOGs) were recorded supra-orbitally and infra-orbitally relative
to the left eye. The horizontal EOG was recorded as the difference in activity from the right versus the left orbital rim. The impedance of all
electrodes was kept below 10 kΩ. The EEG and EOG were amplified
using a 0.05–100 Hz bandpass and continuously digitized at 500 Hz/
channel for offline analysis. The EEG data were further filtered offline
(0.1–40 Hz bandwidth) for ERP analysis. Then, ocular artifacts were
corrected using the eye movement correction algorithm described by
Gratton et al. (1983). Trials with EOG artifacts (mean EOG voltage exceeding 80 μV), trials with peak-to-peak deflections exceeding 80 μV, and trials contaminated by amplifier clipping were excluded from
averaging. Fewer than 5% of trials were rejected after preprocessing in
each of the three conditions.
2.3. Time-frequency analysis
To extract time-frequency information from EEG data associated
with feedback stimulus presentation, 2-s epochs centered on feedback
onset were extracted from the single-trial data. The EEG epochs were
convolved with a complex 7-cycle Morlet wavelet using custom-written
Matlab routines (see Marco-Pallares et al., 2008; HajiHosseini and
Holroyd, 2013) that implement the method described by Lachaux et
al. (1999). Changes in power over time (squared amplitude of the convolution between the signal and the wavelet) in the 1 to 40 Hz frequency range were computed for each single trial and averaged for each
subject and condition before creating grand averages across subjects.
The relative change in the power for each condition was determined
by averaging the baseline activity (100 ms prestimulus) across time
for each frequency and then subtracting the average from each data
point following stimulus presentation for the corresponding frequency.
For each subject, the total power was calculated as the average value of
time-frequency power across single trials, the evoked power was
determined directly from the averaged ERPs, and the induced power was
then obtained by subtracting the evoked power from the total
power (HajiHosseini and Holroyd, 2013; Behroozmand et al.,
2015). For statistical purposes, the mean induced and evoked power
was obtained within a 200 ms window following the onset of the feedback stimulus (delta [1–3 Hz], 300–500 ms; theta [4–8 Hz], 200–400 ms;
beta-gamma [20–40 Hz], 350–550 ms; cf. Cunillera et al.,
2012; HajiHosseini and Holroyd, 2015). All our analyses were restricted
to channel FCz.
2.4. ERP analysis
The FRN was quantified by first segmenting the EEG into 800 ms
epochs time-locked to the feedback stimulus, including a 200 ms baseline preceding the feedback. After baseline correction, FRN amplitude
was evaluated for each participant and feedback condition (positive,
negative, neutral) using a base-to-peak algorithm described in
Holroyd et al. (2006; see also Holroyd et al., 2003; Holroyd et al.,
2008), as follows. First, the most positive voltage within a 160 to
260 ms window following feedback presentation was taken as the
base of the FRN. Then, the sample with the most negative value within
a time window starting from the base of the FRN to 400 ms after feedback
presentation was taken as the peak. FRN amplitude was calculated as
the difference between these base and peak amplitudes. The algorithm
assigned 0 μV where no N200 was detected. FRN values at FCz were analyzed with a one-way analysis of variance (ANOVA).
3. Results
3.1. Trial-to-trial behavioral adjustment
The raw reaction time (RT) does not provide any more information
than accuracy in this kind of time estimation task. However, the absolute value of the change in RT (ΔRT) between trial N and trial N + 1
gives a measure of the behavioral adjustment following each of the
three types of feedback, and controls for slow changes in performance
over time (Dutilh et al., 2012). A one-way ANOVA was conducted on
these data with feedback type (three levels) as the independent variable. As
shown in Fig. 1A, the main effect of feedback valence was significant, F(2, 34) = 90.9, p < 0.001, η² = 0.85. Follow-up
pairwise comparisons revealed that ΔRT following negative feedback
(230 ± 37 ms) was significantly larger than ΔRT following neutral
feedback (178 ± 43 ms), t(16) = 8.95, p < 0.001, and ΔRT following
neutral feedback was significantly larger than ΔRT following positive
feedback (144 ± 27 ms), t(16) = 5.05, p < 0.001.
3.2. ERP results
Fig. 1B illustrates the ERPs elicited by the neutral, negative, and positive feedback. A one-way ANOVA on FRN amplitude with
feedback condition (negative, neutral, positive) as a factor revealed
a main effect of feedback,³ F(2, 32) = 21.09, p < 0.001, η² = 0.57.
Pairwise comparisons revealed that the FRN following neutral feedback
(M = −7.80 μV, SEM = 1.04) was significantly larger than the FRN following both negative feedback (M = −6.23 μV, SEM = 0.94), t(16) = 2.12,
p < 0.05, and positive feedback (M = −1.76 μV, SEM = 0.39), t(16) = 5.36,
p < 0.001. Consistent with previous research, the FRN following
negative feedback was larger than the FRN following positive feedback,
t(16) = 4.52, p < 0.001. The difference waves between negative and
positive and between neutral and positive feedback, and the corresponding scalp distributions, are shown in Fig. 1C and Fig. 1D.
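The base-to-peak scoring behind these FRN amplitudes (Section 2.4) can be illustrated with a short NumPy sketch. It is not the authors' code: the function name and arguments are hypothetical, window edges are assumed to be inclusive, and the "no N200 detected" case is assumed to correspond to no later sample falling below the base, which yields 0 μV automatically.

```python
import numpy as np


def frn_base_to_peak(erp_uv, times_ms, base_win=(160, 260), peak_end_ms=400):
    """Score FRN amplitude (peak minus base, in microvolts) from an averaged ERP.

    erp_uv   : 1-D array of voltages at one channel (e.g. FCz)
    times_ms : 1-D array of sample times relative to feedback onset
    """
    erp_uv = np.asarray(erp_uv, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)

    # Base: most positive sample between 160 and 260 ms.
    in_base = np.where((times_ms >= base_win[0]) & (times_ms <= base_win[1]))[0]
    base_idx = in_base[np.argmax(erp_uv[in_base])]

    # Peak: most negative sample from the base up to 400 ms.
    in_peak = np.where((times_ms >= times_ms[base_idx]) & (times_ms <= peak_end_ms))[0]
    peak_idx = in_peak[np.argmin(erp_uv[in_peak])]

    # Negative-going by construction; 0 if nothing falls below the base.
    return float(erp_uv[peak_idx] - erp_uv[base_idx])
```

With this sign convention a larger FRN is a more negative number, consistent with the means reported above.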
3.3. Time-frequency results
3.3.1. Delta (1–3 Hz)
Fig. 2 shows the time-frequency results for both induced and evoked
power across each feedback condition. Fig. 3 highlights the effect of
feedback on delta activity. A two-way ANOVA on delta activity with
Power (induced, evoked) and Feedback (positive, negative, neutral) revealed a main effect of Feedback, F(2, 32) = 8.81, p = 0.001, η² = 0.36.
Post-hoc analyses indicated that delta activity overall was characterized
by greater power for positive feedback (M = 0.18 dB, SEM = 0.02) than
negative (M = 0.11 dB, SEM = 0.02), t(16) = 3.96, p < 0.001, and neutral feedback (M = 0.10 dB, SEM = 0.02), t(16) = 3.97, p < 0.001. No differences in delta power were observed between negative and neutral
feedback, t(16) = 0.55, p = 0.59. The main effect of Power did not
reach significance, F(1, 16) < 1, p = 0.38, η² = 0.05. There was no significant interaction between Power and Feedback, F(2, 32) = 2.2,
p = 0.13, η² = 0.12.
3.3.2. Theta (4–8 Hz)
As shown in Fig. 2, a two-way ANOVA on theta activity with Power
(induced, evoked) and Feedback (positive, negative, neutral) revealed
a main effect of Power, F(1, 16) = 9.08, p < 0.01, η² = 0.36, a main effect
of Feedback, F(2, 32) = 4.48, p < 0.05, η² = 0.22, and an interaction between Power and Feedback, F(2, 32) = 9.94, p < 0.005, η² = 0.38 (Fig. 2A).
Post-hoc analyses indicated that theta was characterized by greater
induced power (M = 0.60 dB, SEM = 0.09) than evoked power (M =
0.36 dB, SEM = 0.06), t(16) = 3.01, p < 0.01. With regard to Feedback,
theta activity overall was characterized by reduced power for positive
feedback (M = 0.41 dB, SEM = 0.07) relative to negative feedback
(M = 0.55 dB, SEM = 0.09), t(16) = 2.68, p < 0.02, and neutral feedback (M =
0.48 dB, SEM = 0.07), t(16) = 2.56, p < 0.03. Theta power following negative feedback did not differ from that following neutral feedback,
t(16) = 1.26, p = 0.23. More importantly, post-hoc tests on the significant Power × Feedback interaction indicated that
evoked power following positive feedback (M = 0.16 dB, SEM = 0.03)
was significantly reduced compared to negative (M = 0.49 dB,
SEM = 0.10; t(16) = 3.82, p = 0.002) and neutral (M = 0.42 dB,
SEM = 0.06; t(16) = 5.20, p < 0.001) feedback. No differences were observed between evoked power following neutral and negative feedback,
t(16) = 1.32, p = 0.21. By contrast, induced power following positive
feedback was marginally larger (M = 0.65 dB, SEM =
0.12) than induced power following neutral feedback (M = 0.54 dB, SEM = 0.09), t(16) = 1.96,
p = 0.07. No other differences were detected (p > 0.05).
³ Note that the pattern of results is the same if the FRN is measured by a mean amplitude approach from 220 to 320 ms: the main effect of valence is significant, F(1.2,
19.9) = 24.08, p < 0.001, η² = 0.60. Pairwise comparisons further showed that neutral
feedback elicited a significantly larger FRN (M = 8.63 μV, SEM = 1.4) than negative feedback
(M = 10.88 μV, SEM = 1.62, p < 0.01). Negative feedback also elicited a larger FRN
than positive feedback (M = 16.46 μV, SEM = 2.21, p < 0.001).
Fig. 1. (A) Mean absolute change in RT across trials for each condition; error bars represent standard error; (B) Grand average ERP at FCz associated with neutral (green line),
negative (red line), and positive (blue line) feedback; (C) Difference wave between neutral and positive condition (Neutral-Positive DW) and difference wave between negative and
positive condition (Negative-Positive DW); (D) Scalp distributions of difference waves. (For interpretation of the references to color in this figure legend, the reader is referred to the
web version of this article.)
3.3.3. Beta-Gamma (20–40 Hz)
A two-way ANOVA on beta-gamma activity with Power (induced,
evoked) and Feedback (positive, negative, neutral) revealed a main effect of Feedback, F(2, 32) = 8.38, p = 0.001, η² = 0.34, and an interaction between Power and Feedback, F(2, 32) = 8.41, p = 0.001, η² =
0.35. The interaction (Fig. 3A and Fig. 3B) indicates that induced power
following neutral feedback (M = −0.043 dB, SEM = 0.039) was significantly reduced compared to negative (M = 0.064 dB, SEM = 0.040),
t(16) = 2.74, p < 0.02, and positive (M = 0.179 dB, SEM = 0.069),
t(16) = 3.47, p = 0.003, feedback. Induced beta-gamma power following positive feedback was larger than that following negative feedback, but the difference was only marginally significant, t(16) = 2.05, p = 0.057. No
significant difference was observed between the three feedback conditions for evoked power (all p > 0.05). In addition, the main effect of Power
was not significant, F(1, 16) = 2.65, p = 0.12, η² = 0.14. Post-hoc analyses indicated that beta-gamma activity overall was characterized by
greater power for positive feedback (M = 0.18 dB, SEM = 0.07) than
negative (M = 0.06 dB, SEM = 0.04), t(16) = 2.11, p = 0.051, and neutral (M = −0.04 dB, SEM = 0.04), t(16) = 3.19, p < 0.005, feedback. Finally, negative feedback elicited a stronger beta-gamma response than
neutral feedback, t(16) = 2.65, p < 0.02.
Fig. 2. Time-frequency representations: (A) Induced power following positive (left panel), negative (middle panel) and neutral (right panel) feedback; (B) Evoked
power following positive (left panel), negative (middle panel) and neutral (right panel) feedback. All data recorded at channel FCz.
Fig. 3. The time course of power change of delta, theta and beta-gamma band for positive, negative and neutral feedback: (A) The time course of the change in induced delta (left panel),
induced theta (middle panel), and induced beta-gamma (right panel) power associated with neutral (green solid line), negative (red solid line), and positive (blue solid line) feedback; (B)
The time course of the change in evoked delta (left panel), evoked beta-gamma (middle panel) and evoked-theta (right panel) power associated with neutral (green dash line), negative
(red dash line), and positive (blue dash line) feedback. All data recorded at channel FCz. (For interpretation of the references to color in this figure legend, the reader is referred to the web
version of this article.)
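The distinction between total, evoked and induced power that underlies the results above (Section 2.3) can be sketched as follows. This is a minimal NumPy illustration rather than the authors' Matlab routines: the function names are hypothetical, "7-cycle" is assumed to mean a ratio of centre frequency to spectral width of 7, the wavelet is normalised to unit energy by assumption, and baseline correction is omitted.

```python
import numpy as np


def morlet_power(signal, sfreq, freq, n_cycles=7):
    """Power over time at one frequency via complex Morlet convolution."""
    signal = np.asarray(signal, dtype=float)
    sigma_t = n_cycles / (2 * np.pi * freq)      # temporal width (assumed convention)
    half = int(np.ceil(5 * sigma_t * sfreq))     # wavelet support of +/- 5 sigma
    t = np.arange(-half, half + 1) / sfreq
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))   # unit energy (assumed)
    if wavelet.size > signal.size:
        raise ValueError("epoch too short for this frequency")
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2


def decompose_power(epochs, sfreq, freq):
    """Return (total, evoked, induced) power for epochs shaped (trials, samples)."""
    epochs = np.asarray(epochs, dtype=float)
    # Total: transform every trial, then average the power.
    total = np.mean([morlet_power(trial, sfreq, freq) for trial in epochs], axis=0)
    # Evoked: average the trials first (the ERP), then transform.
    evoked = morlet_power(epochs.mean(axis=0), sfreq, freq)
    # Induced: what remains after removing the phase-locked part.
    induced = total - evoked
    return total, evoked, induced
```

If every trial carries the same phase-locked waveform, induced power is near zero; if the oscillation is present on every trial but with a phase that cancels in the average, evoked power vanishes and all of the power is induced.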
4. Discussion
The purpose of this study was to apply both a time-frequency and
ERP analysis to the electrophysiological activity elicited by neutral feedback during an RL paradigm. Our ERP analysis revealed that neutral
feedback elicited a larger FRN than negative feedback, replicating
some previous work (Huang and Yu, 2014; Yu and Zhou, 2006), but inconsistent with other work (Holroyd et al., 2006). Our time-frequency
analysis revealed three oscillatory frequencies that were sensitive to
our feedback manipulation: theta, delta, and beta-gamma.
In our time-frequency analysis, the theta and delta bands exhibited a
similar pattern of sensitivity to feedback, whereby both frequency
bands distinguished positive feedback from each of negative and neutral
feedback, but did not distinguish between negative and neutral feedback. This similarity is in line with work from Bernat and colleagues,
demonstrating that the time-domain FRN is produced by differences
in both theta and delta power (Bernat et al., 2011; Bernat et al., 2015).
However, we observed differences in the time-domain FRN between
neutral and negative feedback, suggesting an additional layer to the
FRN that cannot be explained solely by theta and delta power. In contrast to theta and delta activity, beta-gamma activity differentiated neutral feedback from informative feedback, and thus we speculate that the
neutral-feedback FRN is produced by an interaction of two effects on
EEG oscillations: valence effects (theta and delta), and informative versus uninformative feedback effects (beta-gamma).
Perhaps our most important finding is that induced beta-gamma
power dissociated between uninformative feedback (i.e. neutral feedback) and informative feedback. Interestingly, the lack of induced
beta-gamma to neutral feedback coincided with the pattern of behavioral adjustments following feedback. In particular, subjects did not adjust their behavior systematically following neutral feedback, but
tended to keep their response pattern following positive feedback, and
made large adjustments in response time following negative feedback.
The beta-gamma range has been associated with behavioral adjustments in other work (e.g. Cunillera et al., 2012; Van de Vijver et al.,
2011), and has been shown to be more sensitive to valence in trial-and-error learning tasks that require ongoing adjustments of behavior
(such as the time-estimation task) (HajiHosseini and Holroyd, 2015).
Notably, it has been suggested that (total) beta-gamma activity represents a “motivational value signal” (HajiHosseini et al., 2012, p.1683),
and that beta-gamma increases are representative of increased attentional resources being applied to particularly important events. In this
context, our findings support the idea that induced beta-gamma reflects
a manifestation of a motivational value signal that energizes behavioral
adjustments when feedback is meaningful, but that is absent when
feedback provides no task-relevant information. Therefore, based on
previous literature and on our beta-gamma effects, we speculate that induced beta-gamma activity reflects a motivational response to feedback
(HajiHosseini and Holroyd, 2015) and active inhibition/disinhibition of
motor commands (Cavanagh and Frank, 2014), possibly highlighting a
top-down modulatory role during RL. The nature of induced delta and
beta-gamma may provide further insight into the neural computations
supporting RL.
An influential theory holds that the FRN is produced by the impact of
phasic increases and decreases in dopamine activity coding for positive
and negative reward prediction error signals (RPE) on ACC (RPE-ACC theory; Holroyd and Coles, 2002; Holroyd et al., 2008). RPEs constitute the
learning term in powerful reinforcement learning algorithms that indicate when events are “better” or “worse” than expected (Sutton and
Barto, 1998) and substantial evidence over the past decade has confirmed
that the FRN reflects an RPE signal (for reviews see Walsh and Anderson,
2012; Sambrook and Goslin, 2015). The RPE-ACC theory holds that FRN
amplitude is regulated up and down by dopamine RPE signals conveyed
to ACC. In particular, positive dopamine RPE signals conveyed to the
ACC following unexpected positive feedback suppress the production
of the FRN whereas negative dopamine RPE signals enhance FRN amplitude (Holroyd et al., 2008; see Proudfit, 2015 for review).
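As a concrete illustration of what "better" or "worse" than expected means, the RPE can be written as the difference between the received and the expected reward. The sketch below assumes a simple single-state delta rule, which is only one of the algorithms covered by Sutton and Barto (1998); the function name, the learning rate and the example values are hypothetical and are not taken from the present study.

```python
def rpe_update(expected_value, reward, learning_rate=0.5):
    """One delta-rule step: return (prediction_error, updated_expected_value)."""
    prediction_error = reward - expected_value          # positive: better than expected
    updated = expected_value + learning_rate * prediction_error
    return prediction_error, updated


# Hypothetical example: with an expected value of 2.5 cents,
# a 5-cent win gives an RPE of +2.5 and a 0-cent outcome an RPE of -2.5.
win_rpe, _ = rpe_update(2.5, 5.0)
loss_rpe, _ = rpe_update(2.5, 0.0)
```

Under the RPE-ACC theory, the sign and size of this quantity are what the dopamine signal is thought to convey to ACC.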
The FRN has been shown to categorically distinguish between positive RPEs and negative RPEs, showing a more negative voltage for the
latter. The RPE-ACC theory requires the FRN to show two further properties beyond this categorical distinction. That is, the FRN should be sensitive to both the prior likelihood of reward (likely, unlikely) and
outcome magnitude (how much better or worse than expected value
an outcome is). With this in mind, the larger FRN following neutral feedback suggests that the size of the negative RPE varied as either a function of outcome magnitude or outcome likelihood, or possibly both. In
regards to an effect of magnitude, it is possible that participants came
to view neutral feedback as “much” worse than expected because neutral
feedback gave no useable information. Alternatively, it is also possible
that feedback likelihood modulated the negative RPE size following
neutral feedback, resulting in a larger FRN amplitude. Because of the dynamics of trial-and-error learning tasks, subjects may have come to expect positive or negative feedback to follow their responses and
categorized uninformative feedback as a relatively rare event (1/3 of trials vs. 2/3 of trials giving informative feedback).
We note two potential limitations of our study. First, our analysis
was based on data from 17 subjects. Though this sample size is on par
with, and even exceeds, that of most ERP studies, it is small relative to the recent
trend of increasing sample sizes in ERP research. Second, our neutral
feedback stimulus was relatively distinctive compared to the feedback in
the positive and negative conditions, which could provoke a low-level
salience reaction in our subjects that would not be seen in response to negative or
positive feedback. That being said, the effects we report were exhibited
later in the EEG than components typically associated with low-level
physical features of stimuli, such as the N1 (Luck et al., 1990).
5. Conclusion
To the best of our knowledge, time-frequency and time-domain
analyses have yet to be used together to investigate neutral operants
during RL. Using neutral feedback during RL, we present further evidence of the role of induced beta-gamma oscillations as a motivational
signal for behavioral adjustment (HajiHosseini et al., 2012), and argue
that the FRN cannot be explained as an effect of theta oscillatory activity
alone (Cavanagh and Frank, 2014). Our results suggest that the FRN also
contains power from other frequency bands, including delta power
(Bernat et al., 2011, 2015), and beta-gamma. Lastly, these
findings motivate further study of how electrophysiological
responses to neutral feedback relate to individual differences (e.g.
Hirsh and Inzlicht, 2008; Gu et al., 2010; Li et al., 2015) and psychopathology (e.g. Proudfit, 2015; Baker et al., 2011; Morris et al., 2008) associated with RL. For instance, according to the “aberrant salience”
hypothesis, schizophrenia patients attribute salience to otherwise neutral environmental stimuli, and those stimuli may ultimately appear
meaningful and evoke delusional mood in patients (for review, see
Deserno et al., 2013). These issues are ripe for future investigations.
Acknowledgment
This work was supported by the National Natural Science Foundation
of China (NSFC 31300872 & 81171289), and the MOE (Ministry of Education
in China) Project of Humanities and Social Sciences (13YJC190013).
References
Baker, T.E., Stockwell, T., Barnes, G., Holroyd, C.B., 2011. Individual differences in substance dependence: at the intersection of brain, behaviour and cognition. Addict.
Biol. 16, 458–466.
Behroozmand, R., Ibrahim, N., Korzyukov, O., Robin, D.A., Larson, C.R., 2015. Functional
role of delta and theta band oscillations for auditory feedback processing during
vocal pitch motor control. Front. Neurosci. 9.
Bernat, E.M., Malone, S.M., Williams, W.J., Patrick, C.J., Iacono, W.G., 2007. Decomposing
delta, theta, and alpha time–frequency ERP activity from a visual oddball task using
PCA. Int. J. Psychophysiol. 64 (1), 62–74.
Bernat, E.M., Nelson, L.D., Steele, V.R., Gehring, W.J., Patrick, C.J., 2011. Externalizing psychopathology and gain–loss feedback in a simulated gambling task: dissociable components of brain response revealed by time-frequency analysis. J. Abnorm. Psychol.
120 (2), 352–364.
Bernat, E.M., Nelson, L.D., Baskin-Sommers, A.R., 2015. Time-frequency theta and delta
measures index separable components of feedback processing in a gambling task.
Psychophysiology 52 (5), 626–637.
Catania, A.C., 1999. Thorndike's legacy: learning, selection, and the law of effect. J. Exp.
Anal. Behav. 72 (3), 425–428.
Cavanagh, J.F., 2015. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216.
Cavanagh, J.F., Frank, M.J., 2014. Frontal theta as a mechanism for cognitive control.
Trends Cogn. Sci. 18 (8), 414–421.
Cavanagh, J.F., Frank, M.J., Klein, T.J., Allen, J.J., 2010. Frontal theta links prediction
errors to behavioral adaptation in reinforcement learning. NeuroImage 49 (4),
3198–3209.
Cavanagh, J.F., Figueroa, C.M., Cohen, M.X., Frank, M.J., 2012. Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cereb. Cortex 22,
2575–2586.
Chen, C.C., Kiebel, S.J., Kilner, J.M., Ward, N.S., Stephan, K.E., Wang, W.J., Friston, K.J., 2012.
A dynamic causal model for evoked and induced responses. NeuroImage 59 (1),
340–348.
Cohen, M.X., Elger, C.E., Ranganath, C., 2007. Reward expectation modulates feedback-related negativity and EEG spectra. NeuroImage 35, 968–978.
Cunillera, T., Fuentemilla, L., Periañez, J., Marco-Pallarès, J., Krämer, U.M., Càmara, E., ...
Rodríguez-Fornells, A., 2012. Brain oscillatory activity associated with task switching
and feedback processing. Cogn. Affect. Behav. Neurosci. 12 (1), 16–33.
Deserno, L., Boehme, R., Heinz, A., Schlagenhauf, F., 2013. Reinforcement learning and dopamine in schizophrenia: dimensions of symptoms or specific features of a disease
group? Front. Psychiatry 4, 1–16.
Dutilh, G., van Ravenzwaaij, D., Nieuwenhuis, S., van der Maas, H.L.J., Forstmann, B.U.,
Wagenmakers, E.J., 2012. How to measure post-error slowing: a confound and a simple solution. J. Math. Psychol. 56 (3), 208–216.
Gilbert, C.D., Sigman, M., 2007. Brain states: top-down influences in sensory processing.
Neuron 54 (5), 677–696.
Gratton, G., Coles, M.G., Donchin, E., 1983. A new method for offline removal of ocular artifact. Electroencephalogr. Clin. Neurophysiol. 55, 468–484.
Gu, R., Ge, Y., Jiang, Y., Luo, Y.J., 2010. Anxiety and outcome evaluation: the good, the bad
and the ambiguous. Biol. Psychol. 85, 200–206.
HajiHosseini, A., Holroyd, C.B., 2013. Frontal midline theta and N200 amplitude reflect
complementary information about expectancy and outcome evaluation. Psychophysiology 50, 550–562.
HajiHosseini, A., Holroyd, C.B., 2015. Sensitivity of frontal beta oscillations to reward valence but not probability. Neurosci. Lett. 602, 99–103.
HajiHosseini, A., Rodríguez-Fornells, A., Marco-Pallarés, J., 2012. The role of beta-gamma
oscillations in unexpected rewards processing. NeuroImage 60 (3), 1678–1685.
Hirsh, J.B., Inzlicht, M., 2008. The devil you know: neuroticism predicts neural response to
uncertainty. Psychol. Sci. 19, 962–967.
Holroyd, C.B., Coles, M.G.H., 2002. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol. Rev. 109,
679–709.
Holroyd, C.B., Yeung, N., 2012. Motivation of extended behaviors by anterior cingulate
cortex. Trends Cogn. Sci. 16 (2), 122–128.
Holroyd, C.B., Nieuwenhuis, S., Yeung, N., Cohen, J.D., 2003. Errors in reward prediction
are reflected in the event-related brain potential. Neuroreport 14 (18), 2481–2484.
Holroyd, C.B., Hajcak, G., Larsen, J.T., 2006. The good, the bad and the neutral: electrophysiological responses to feedback stimuli. Brain Res. 1105, 93–101.
Holroyd, C.B., Pakzad-Vaezi, K.L., Krigolson, O.E., 2008. The feedback correct-related positivity: sensitivity of the event-related brain potential to unexpected positive feedback. Psychophysiology 45 (5), 688–697.
Holroyd, C.B., HajiHosseini, A., Baker, T.E., 2012. ERPs and EEG oscillations, best friends
forever: comment on Cohen et al. Trends Cogn. Sci. 16 (4), 192.
Hsieh, L.T., Ranganath, C., 2014. Frontal midline theta oscillations during working memory
maintenance and episodic encoding and retrieval. NeuroImage 85, 721–729.
Huang, Y., Yu, R., 2014. The feedback-related negativity reflects “more or less” prediction
error in appetitive and aversive conditions. Front. Neurosci. 8.
Kujawa, A., Smith, E., Luhmann, C., Hajcak, G., 2013. The feedback negativity reflects favorable compared to nonfavorable outcomes based on global, not local, alternatives. Psychophysiology 50 (2), 134–138.
Lachaux, J.P., Rodriguez, E., Martinerie, J., Varela, F.J., 1999. Measuring phase synchrony in
brain signals. Hum. Brain Mapp. 8, 194–208.
Lee, K.H., Williams, L.M., Breakspear, M., Gordon, E., 2003. Synchronous gamma activity: a
review and contribution to an integrative neuroscience model of schizophrenia. Brain
Res. Rev. 41 (1), 57–78.
Li, P., Song, X., Wang, J., Zhou, X., Li, J., Lin, F., et al., 2015. Reduced sensitivity to neutral
feedback versus negative feedback in subjects with mild depression: evidence from
event-related potentials study. Brain Cogn. 100, 15–20.
Luck, S.J., Heinze, H.J., Mangun, G.R., Hillyard, S.A., 1990. Visual event-related potentials
index focused attention within bilateral stimulus arrays. II. Functional dissociation
of P1 and N1 components. Electroencephalogr. Clin. Neurophysiol. 75 (6), 528–542.
Marco-Pallares, J., Cucurell, D., Cunillera, T., García, R., Andrés-Pueyo, A., Münte, T.F.,
Rodríguez-Fornells, A., 2008. Human oscillatory activity associated to reward processing in a gambling task. Neuropsychologia 46, 241–248.
Miltner, W.H.R., Braun, C.H., Coles, M.G.H., 1997. Event-related brain potentials following
incorrect feedback in a time-estimation task: evidence for a “generic” neural system
for error detection. J. Cogn. Neurosci. 9, 788–798.
Mitchell, D.J., McNaughton, N., Flanagan, D., Kirk, I.J., 2008. Frontal-midline theta from the
perspective of hippocampal “theta”. Prog. Neurobiol. 86 (3), 156–185.
Morris, S.E., Heerey, E.A., Gold, J.M., Holroyd, C.B., 2008. Learning-related changes in brain
activity following errors and performance feedback in schizophrenia. Schizophr. Res.
99, 274–285.
Müller, S.V., Möller, J., Rodriguez-Fornells, A., Münte, T.F., 2005. Brain potentials related to
self-generated and external information used for performance monitoring. Clin.
Neurophysiol. 116, 63–74.
Proudfit, G.H., 2015. The reward positivity: from basic research on reward to a biomarker for depression. Psychophysiology.
Sambrook, T.D., Goslin, J., 2015. A neural reward prediction error revealed by a meta-analysis of ERPs using great grand averages. Psychol. Bull. 141 (1), 213–235.
Skinner, B.F., 1938. The Behavior of Organisms: An Experimental Analysis. Appleton-Century,
New York.
Sutton, R.S., Barto, A.G., 1998. Reinforcement Learning: An Introduction (Vol. 1, No. 1).
MIT Press, Cambridge.
Tallon-Baudry, C., Bertrand, O., 1999. Oscillatory gamma activity in humans and its role in
object representation. Trends Cogn. Sci. 3, 151–162.
Van de Vijver, I., Ridderinkhof, K.R., Cohen, M.X., 2011. Frontal oscillatory dynamics predict feedback learning and action adjustment. J. Cogn. Neurosci. 23 (12), 4106–4121.
Walsh, M.M., Anderson, J.R., 2012. Learning from experience: event-related potential correlates of reward processing, neural adaptation, and behavioral choice. Neurosci.
Biobehav. Rev. 36 (8), 1870–1884.
Warren, C.M., Holroyd, C.B., 2012. The impact of deliberative strategy dissociates ERP
components related to conflict processing vs. reinforcement learning. Front. Neurosci.
6 (43), 1–17.
Weinberg, A., Riesel, A., Proudfit, G.H., 2014. Show me the money: the impact of actual rewards and losses on the feedback negativity. Brain Cogn. 87, 134–139.
Yeung, N., Bogacz, R., Holroyd, C.B., Cohen, J.D., 2004. Detection of synchronized oscillations in the electroencephalogram: an evaluation of methods. Psychophysiology 41
(6), 822–832.
Yu, R., Zhou, X., 2006. Brain potentials associated with outcome expectation and outcome
evaluation. Neuroreport 17, 1649.