Behavioural Brain Research 291 (2015) 147–154
Research report
Drift diffusion model of reward and punishment learning in
schizophrenia: Modeling and experimental data
Ahmed A. Moustafa a,∗, Szabolcs Kéri b,c,d, Zsuzsanna Somlai e, Tarryn Balsdon a, Dorota Frydecka f, Blazej Misiak f,g, Corey White h

a School of Social Sciences and Psychology, Marcs Institute for Brain and Behaviour, University of Western Sydney, Penrith, NSW, Australia
b Nyírő Gyula Hospital–National Institute of Psychiatry and Addictions, Budapest, Hungary
c University of Szeged, Faculty of Medicine, Department of Physiology, Szeged, Hungary
d Budapest University of Technology and Economics, Department of Cognitive Science, Hungary
e Semmelweis University, Department of Psychiatry and Psychotherapy, Budapest, Hungary
f Wroclaw Medical University, Department and Clinic of Psychiatry, Wroclaw, Poland
g Wroclaw Medical University, Department of Genetics, Wroclaw, Poland
h Department of Psychology, Syracuse University, Syracuse, NY, USA
Highlights

• It is the first drift diffusion model of behavioral data from schizophrenia patients.
• Unlike controls, schizophrenia patients show punishment learning deficits.
• Schizophrenia patients show slow motor/encoding time.
• Unlike controls, schizophrenia patients use a strategy favoring accuracy over speed.
Article info
Article history:
Received 22 January 2015
Received in revised form 5 May 2015
Accepted 13 May 2015
Available online 22 May 2015
Keywords:
Schizophrenia
Reinforcement learning
Decision making
Reward
Punishment
Drift diffusion model (DDM)
Abstract
In this study, we tested reward- and punishment learning performance using a probabilistic classification
learning task in patients with schizophrenia (n = 37) and healthy controls (n = 48). We also fit subjects’ data
using a Drift Diffusion Model (DDM) of simple decisions to investigate which components of the decision
process differ between patients and controls. Modeling results show between-group differences in multiple components of the decision process. Specifically, patients had slower motor/encoding time, higher
response caution (favoring accuracy over speed), and a deficit in classification learning for punishment,
but not reward, trials. The results suggest that patients with schizophrenia adopt a compensatory strategy of favoring accuracy over speed to improve performance, yet still show signs of a deficit in learning
based on negative feedback. Our data highlight the importance of fitting models (particularly
drift diffusion models) to behavioral data. The implications of these findings are discussed relative to
theories of schizophrenia and cognitive processing.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
International diagnostic systems classify schizophrenia (SZ) as
a psychotic disorder, with several positive symptoms such as delusions and hallucinations, as well as negative symptoms, such as
affective flattening, alogia or avolition. However, cognitive deficits
∗ Corresponding author at: Department of Veterans Affairs, New Jersey Health
Care System, East Orange, New Jersey, United States of America.
E-mail address: [email protected] (A.A. Moustafa).
http://dx.doi.org/10.1016/j.bbr.2015.05.024
0166-4328/© 2015 Elsevier B.V. All rights reserved.
are also increasingly recognized as the core component of SZ symptomatology. These deficits occur in multiple domains of cognitive
functioning, with moderate to large effect sizes for impairments
across memory, motor performance, attention, IQ, executive function and working or verbal memory, compared to controls [1].
Notably, these deficits precede the onset of overt psychosis and
are a risk factor for the onset of SZ [2]. Several lines of evidence
also indicate that cognitive impairment may predict functional
outcomes, such as self-care, community functioning, and social
problem solving and furthermore, that cognitive impairment may
be a better predictor of these outcomes than psychotic symptoms
[3,4].
Research has indicated that SZ patients show learning and decision making deficits, especially in the context of rewards and
punishment. A deficit in updating the expected value of choices,
especially loss, and disruption in associative learning underlying
the representation of expectancies has been shown in the Iowa
Gambling task (IGT) [5–7], the Monetary Incentive Delay task
[8], the Wisconsin Card Sorting Task (WCST) [9], delayed reward
discounting, and reinforcement learning paradigms (Gold et al.
[19]). However, the deficit does not present in the same manner as the loss insensitivity of an orbitofrontal cortex lesion: in the IGT, patients do not always select significantly less from advantageous decks (as seen in patients with orbitofrontal cortex lesions [10]). Rather, patients make more perseveration errors, indicating a role of learning or, as Shurman et al. suggest, working memory (see also [11,12]). Furthermore, imaging studies have
revealed reduced activity in the ventral striatum during the anticipation of gain or loss compared with normal controls [13,14],
and reduced error-related negativity (ERN) in probabilistic learning tasks (Morris and co-workers) indicating an underlying deficit
in signaling prediction errors for value based learning and decision
making.
However, there are also inconsistent results. For instance, Morris et al. [15] later found that although response-related ERN was
reduced in SZ patients compared to controls, their feedback-related
ERN was intact. Furthermore, whilst Polli et al. [16] found that
SZ patients could immediately correct their errors in an antisaccade task, a later replication of this result accompanied by fMRI
[16] showed reduced error-related activity in both dorsal and rostral Anterior Cingulate Cortex (ACC) (even once medication was taken into account), which has been associated with perseveration
errors. Some studies have found little to no effect of reward. For
instance, Waltz et al. [17] found that whilst SZ patients showed
reduced activity for reward in a passive conditioning task, they
showed intact responses to unexpected reward omissions, which
is supported by a further conditioning experiment by Dowd and
Barch [18]. Of note, these experiments did not require participants to make value-based decisions. The differences across these results, and even across behavioral studies, emphasize the need for a more nuanced understanding: one cannot simply compare a saccade task with a reward learning task or with a conditioning task.
Clearly there are instances in which SZ patients show significantly
different behavior and neural responses from normal controls and
it is important to understand the nuances of what might be driving these differences in some cases, and why they lay dormant in
others.
In an attempt to account for abnormalities in reward learning,
Gold et al. [19] compared performance of SZ patients and normal
controls on a number of tasks, including the International
Affective Picture System ratings, delayed reward discounting, the
Wisconsin card sorting task, rapid reversal learning and reinforcement learning paradigms. Their results indicated that processing
deficits may be explained by an inability to fully represent value.
Gold et al. [19] linked this internal representational difference to
the differences in patients’ pleasantness ratings when they are
asked to imagine a scene to when they are shown the scene as
a picture. Patients displayed normal positive emotion when presented with visual stimuli, but displayed poor pleasantness ratings
when asked to imagine a scene, suggesting further that reports of
anhedonia in SZ patients may come down to testing procedures
that require a level of value representation that is impaired in
patients.
A further avenue of research to discern the nature of cognitive impairments in SZ patients may be found by deconstructing the decision making process to dissociate at which stage SZ patients differ from normal controls. In this study, we applied a drift
diffusion model (DDM), to behavioral data from patients with SZ
to understand the information processing mechanism of impaired
learning and decision making. We hypothesized that SZ patients would be impaired at learning from negative feedback, as suggested by prior studies, although here we use a DDM that takes into account both accuracy and reaction time. Further, given prior studies and observations of general slowness and motor impairments in SZ patients (see [49]), we hypothesized that patients would show a combination of increased response caution (favoring accuracy over speed) and slower motor execution time in comparison to controls.
1.1. Drift diffusion models
When comparing task performance between groups, it is important to note that multiple decision components could differ among
participants. Thus, for example, observing slower responses for the
SZ group could be indicative of a difference in response caution
rather than a deficit in reward learning. In these situations, reaction time models like the drift diffusion model (DDM) [20] can be
fitted to data to circumvent this problem. Notably, the DDM includes parameters that map onto psychological constructs,
allowing researchers to make comparisons of the intactness or disruption of different decision components in ways not possible with
behavioral data alone.
Because DDM is mathematically specified, it makes precise predictions about how the different components relate to reaction time
and accuracy. Importantly, this process can be inverted, whereby
observed behavioral data are fitted with DDM to estimate the values
of the decision components driving the behavior. This technique
has been widely applied to investigate processing differences across a range of domains [21,22]. By estimating the
values of the decision components for each participant, researchers
are able to make group comparisons of these psychologically meaningful parameters.
There are two main advantages to a DDM analysis over traditional RT or accuracy comparisons: increased specificity and
increased sensitivity. For specificity, the DDM allows identification
of which decision components account for behavioral performance
in the task. For example, slower RT could be due to slow motor
response (non-decision time), increased caution (boundary separation), and/or poorer task performance (drift rates). The model
allows these components to be separately estimated to disentangle how they contribute to the observed behaviour. For sensitivity,
past work has shown that DDM parameters are more sensitive to
small differences than RT or accuracy. Several studies have shown
that DDM parameters can detect differences that are not significant in the behavioural data (see [23,24]). This is because DDM
controls for the effects of each decision component, meaning that
any differences in response caution or bias are controlled for when
estimating task performance (i.e., drift rates, see [24,25]). For example, imagine that Participant A has poorer learning ability than
Participant B, but is more cautious when responding. This could
lead to equivalent accuracy between them, as the lower accuracy
from poor learning is offset by the higher accuracy from increased
caution. Thus comparing accuracy values alone is insufficient to
detect processing differences between them. In contrast, using
the DDM approach circumvents this problem because it estimates
multiple components of the process simultaneously, allowing
the conclusion that the participants differ in both caution and
learning.
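The Participant A/Participant B scenario above can be made concrete with the closed-form expressions for an unbiased DDM (starting point z = a/2): accuracy depends only on the product of drift rate and boundary separation, while mean decision time does not. The sketch below uses the conventional diffusion coefficient s = 0.1; the parameter values are illustrative, not estimates from this study:

```python
import math

S = 0.1  # diffusion coefficient (conventional scaling constant)

def accuracy(v, a):
    """P(correct) for an unbiased DDM (start z = a/2): a logistic
    function of the product v*a, so extra caution can offset a
    weaker drift rate."""
    return 1.0 / (1.0 + math.exp(-v * a / S**2))

def mean_decision_time(v, a):
    """Mean decision time for an unbiased DDM (start z = a/2)."""
    return (a / (2.0 * v)) * math.tanh(v * a / (2.0 * S**2))

# Participant A: poor learning (low drift) but high caution (wide a).
# Participant B: better learning, lower caution. Same v*a product,
# hence identical accuracy -- only RT (and the model) tells them apart.
acc_a, dt_a = accuracy(0.15, 0.20), mean_decision_time(0.15, 0.20)
acc_b, dt_b = accuracy(0.25, 0.12), mean_decision_time(0.25, 0.12)
```

Here both participants respond correctly about 95% of the time, yet Participant A's mean decision time is more than double Participant B's, which is exactly the dissociation that accuracy comparisons alone cannot detect.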
In this regard, DDM provides a principled method for comparing
different aspects of the decision process between SZ patients and
controls. This DDM approach was employed in the present study
to investigate which components differ between SZ patients and
controls in the reward and punishment learning task.
Table 1
Demographic and clinical characteristics of the participants. Data are expressed as mean (SD). The last row shows the antipsychotic regimen of the patients with schizophrenia in our sample; each number is the number of patients administered the corresponding antipsychotic medication. The study was approved by the local ethics board. After complete description of the study, written informed consent was obtained.

                              SZ (n = 37)    CONT (n = 48)   Statistical difference
M/F                           15/22          18/30           chi-square = 0.04, d.f. = 1 (p = 0.85)
Age (years)                   36.8 (10.2)    37.4 (11.0)     t = 0.34, d.f. = 83 (p = 0.74)
Education (years)             12.6 (2.5)     12.8 (3.0)      t = 0.27, d.f. = 83 (p = 0.79)
Duration of illness (years)   8.3 (6.6)      –               –
Number of episodes            5.4 (4.7)      –               –
GAF                           55.0 (19.3)    –               –
PANSS positive                12.6 (4.8)     –               –
PANSS negative                15.9 (6.9)     –               –
PANSS general                 27.3 (8.1)     –               –
Medications                   clozapine (3), olanzapine (11), risperidone (8), aripiprazole (2), amisulpride (2), haloperidol (2), flupenthixol (4), clozapine + haloperidol (2), quetiapine + risperidone (3)
2. Methods
Participants were 37 patients with schizophrenia and 48 healthy control volunteers with no psychiatric history. The patients were recruited at the Department of Psychiatry and Psychotherapy (Semmelweis University, Budapest, Hungary). The patients participated
in a psychosocial rehabilitation program and were not in an acute
psychotic state at the time of testing. The control volunteers were
hospital or university employees and their acquaintances who were
matched to the patients for age, gender, and education (Table 1, all
p’s > 0.05). A diagnosis of schizophrenia was based on the DSM-IV criteria [26]. All participants received the Mini International
Neuropsychiatric Interview Plus (MINI-Plus) [27]. Detailed medical
records were available from all patients. Subjects with substance
or alcohol use disorders were excluded from the study. General
functioning was assessed with the Global Assessment of Functioning (GAF) scale [26]. Clinical symptoms were evaluated with
the Positive and Negative Symptoms Scale (PANSS) [28] (Table 1).
These scales were administered by trained clinicians (Z.S. and S.K.)
who were blind to reward- and punishment-learning data at the
time of clinical assessment (inter-rater reliability: Cohen’s kappa
and inter-rater correlation r > 0.7). Assessment of the patients
was based on individual interviews with the patients and with
one of their family members. Patients and controls were matched
for tobacco smoking (30% of participants were heavy smokers in
both groups) because smoking may have an influence on reward learning [29]. Antipsychotic medications used by the patients are shown in Table 1. The average daily chlorpromazine-equivalent antipsychotic dose was 378.6 mg (S.D. = 236.0) [30].
2.1. Reward vs. punishment learning task
We used the same task as employed in prior studies [31–35]. On
each trial, participants viewed one of four images (S1–S4) (Fig. 1),
and were asked to guess whether it belonged to category A or
category B. Stimuli S1 and S3 belonged to category A with 80%
probability and to category B with 20% probability, while stimuli
S2 and S4 belonged to category B with 80% probability and to
category A with 20% probability. Stimuli S1 and S2 were used in
the reward-learning task. In this task if the participant correctly
guessed category membership on a trial with either of these stimuli,
a reward of +25 points was received; if the participant guessed
incorrectly, no feedback appeared. Stimuli S3 and S4 were used
in the punishment-learning task. In this task, if the participant
guessed incorrectly on a trial with either of these stimuli, a punishment of −25 points was received; correct guesses received no feedback.
The experiment was conducted on a Macintosh i-book, programmed in the SuperCard language (Allegiant Technologies, San
Diego, CA). The participant was seated in a quiet testing room at a
comfortable viewing distance from the screen. The keyboard was
masked except for two keys, labelled “A” and “B”, which the participant could use to enter responses. Before the experiment, the
participant received the following instructions: “In this experiment, you will be shown pictures, and you will guess whether those
pictures belong to category “A” or category “B”. A picture does not
always belong to the same category each time you see it. If you
guess correctly, you may win points. If you guess wrong, you may
lose points. You will see a running total of your points as you play.
We will start you off with a few points now. Press the mouse button
to begin practice.”
In the practice phase, the participant received sample trials from
the punishment- and reward-learning tasks. The practice stimuli
were not included in the experiment. The participant saw a practice
image, with a prompt to choose category A or B, and a running
tally of points at the lower right corner of the screen. The tally was
initialized to 500 points at the start of practice. The participant was
first instructed to press the “A” key, which resulted in a punishment of −25 and an updated point tally, and then the “B” key, which resulted
in no feedback. The participant then saw a second practice figure
and was instructed first to press the “B” key, which resulted in a reward of +25 and an updated point tally, and then the “A” key, which
resulted in no feedback.
After these two practice trials, a summary of instructions
appeared: “So . . . For some pictures, if you guess CORRECTLY, you
WIN points (but, if you guess incorrectly, you win nothing). For
other pictures, if you guess INCORRECTLY, you LOSE points (but, if
you guess correctly, you lose nothing). Your job is to win all the
points you can–and lose as few as you can. Remember that the
same picture does not always belong to the same category. Press the
mouse button to begin the experiment.” From here, the experiment
began. On each trial, the participant saw one of the four stimuli
(S1, S2, S3, or S4) and was prompted to guess whether it was an
“A” or a “B”. On trials in the reward-learning task (with stimuli
S1 or S2), correct answers were rewarded with positive feedback
and gain of 25 points; incorrect answers received no feedback. On
trials in the punishment-learning task (with stimuli S3 or S4), incorrect answers were punished with negative feedback and loss of
25 points; correct answers received no feedback. The no-feedback
outcome was thus ambiguous, as it could signal lack of reward (if
received during a trial with S1 or S2) or lack of punishment (if
received during a trial with S3 or S4).
The task contained 160 trials, divided into 4 blocks of 40 trials each. Within a block, trial order was randomized. Training on
the reward-learning task (S1 and S2) and punishment-learning task
(S3 and S4) were intermixed. Within each block, each stimulus
appeared 10 times, 8 times with the more common outcome (e.g.
category “A” for S1 and S3 and “B” for S2 and S4) and 2 times with
the less common outcome.
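The block structure described above (each stimulus 10 times per 40-trial block, with an 8/2 split between common and rare outcomes, reward feedback for S1/S2 and punishment feedback for S3/S4) can be sketched as follows. Function and variable names are ours, not from the original SuperCard program:

```python
import random

COMMON = {"S1": "A", "S2": "B", "S3": "A", "S4": "B"}  # 80% category
RARE = {"S1": "B", "S2": "A", "S3": "B", "S4": "A"}    # 20% category

def build_block(rng=random):
    """One 40-trial block: each stimulus appears 10 times,
    8 with its common category and 2 with the rare one."""
    trials = []
    for stim in ("S1", "S2", "S3", "S4"):
        trials += [(stim, COMMON[stim])] * 8 + [(stim, RARE[stim])] * 2
    rng.shuffle(trials)
    return trials

def feedback(stim, guess, answer):
    """Reward trials (S1/S2): +25 points for a correct guess, nothing
    otherwise. Punishment trials (S3/S4): -25 for an incorrect guess,
    nothing otherwise. 0 stands for the ambiguous no-feedback outcome."""
    correct = guess == answer
    if stim in ("S1", "S2"):
        return 25 if correct else 0
    return 0 if correct else -25
```

Note that `feedback` returns 0 both for an unrewarded reward trial and for an unpunished punishment trial, mirroring the ambiguity of the no-feedback outcome described in the text.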
At the end of the 160 trials, if the participant’s running tally of
points was less than 525 (i.e. no more than the 500 points awarded
Fig. 1. Screen shot of the reward-punishment task used in the present study. (A) On each trial, the participant saw one of four abstract shapes and was asked whether this
shape belonged to category A or B. (B) For some stimuli (S1 and S2), correct responses were rewarded with visual feedback and 25 points winnings in 80% of the trials,
whereas for other stimuli (S3 and S4), incorrect responses were punished with a loss of 25 points in 80% of the trials (see text for more details).
at the start of the experiment), additional trials were added on
which the participant’s response was always taken as correct, until
the tally was at least 525. This was added in order to minimize frustration in participants by ensuring that all participants terminated
the experiment with more points than they had started with. Data
from any such additional trials were not analyzed. Here, we used the following timing parameters. The response window remained open until the participant responded; this allowed us to investigate reaction time differences between the groups as well as across task conditions, which is important for DDM analyses. Trials were separated by 1-s intervals. Feedback was displayed for 1 s after the response. The
task duration was about 12–15 min. On each trial, the computer
recorded whether the participant made the optimal response (i.e.
category A for S1 and S3, and category B for S2 and S4), regardless
of actual outcome.
2.2. Drift diffusion model
The DDM belongs to a class of evidence accumulation models that posit that simple decisions involve the gradual accumulation of noisy evidence until a criterial amount is reached. In the
model, the decision process starts between the two boundaries that
correspond to the response options (Fig. 2). Over time, noisy evidence from the stimulus is sampled and accumulated until the
process reaches a boundary, signaling the commitment to that
response. The time taken to reach the boundary corresponds to
the decision time, and the overall response time is given by the decision time plus residual non-decision time. Non-decision time in the model (Ter) accounts for the duration of processes outside the
decision itself, namely encoding of the stimulus and execution of
the motor response.
In addition to the non-decision time component, DDM has three
primary components that affect decision processing. The distance between the two boundaries, the boundary separation (a), indexes response caution or speed/accuracy settings. A wide boundary separation means
that more evidence needs to be sampled to reach a boundary, so
responses will be slower. But the decision process is also less likely
to reach the wrong boundary due to noisy evidence, so responses
are simultaneously more accurate. Thus, boundary separation indicates how much evidence is required before committing to the
response and provides a measure of the speed/accuracy trade-off. The starting point of evidence accumulation (z) indicates a
response bias for one option over the other. If the starting point
is closer to one boundary, less evidence is required to reach that
decision than the alternative. Thus if the starting point is closer
to boundary A, responses for Option A will be more probable and
faster than for Option B. Finally, the drift rate (v) gives an index
of the direction and strength of the stimulus evidence driving the
accumulation process. Positive values of drift rate indicate evidence
for Option A and negative values indicate evidence for Option B.
Further, a large absolute value of drift rate indicates very strong
evidence for that option, which will result in fast responses and a
high probability of choosing that option. The drift rate is tied to the task at hand; in this case, it indicates how well the participant has learned to correctly classify the stimuli under the reward and punishment contingencies.
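The accumulation process described above can be simulated directly. The sketch below uses a simple Euler scheme with the conventional diffusion coefficient s = 0.1; the parameter values are illustrative, not the ones estimated in this study:

```python
import math
import random

def simulate_ddm(v, a, z, ter, s=0.1, dt=0.001, rng=random):
    """One DDM trial: evidence starts at z (0 < z < a) and drifts at
    rate v with Gaussian noise until a boundary is hit. Returns
    (choice, RT), where RT is decision time plus non-decision time."""
    x, t = z, 0.0
    sd = s * math.sqrt(dt)  # noise standard deviation per time step
    while 0.0 < x < a:
        x += v * dt + rng.gauss(0.0, sd)
        t += dt
    choice = "A" if x >= a else "B"
    return choice, t + ter

# A positive drift rate pushes most trials to the upper ('A') boundary.
random.seed(1)
trials = [simulate_ddm(v=0.2, a=0.12, z=0.06, ter=0.4) for _ in range(500)]
p_a = sum(choice == "A" for choice, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
```

Raising `a` in this sketch makes responses slower but more often correct, while raising `v` makes them both faster and more accurate, which is the qualitative signature the fitting procedure exploits.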
A DDM was fitted to each participant’s behavioral data using the chi-square (χ²) method [36]. The .1, .3, .5, .7, and .9 quantiles of the reaction time distribution were calculated for both correct and error
responses to represent the shape of the distributions. These quantiles were entered into the fitting routine along with the choice
probabilities. Then the fitting routine uses a simplex algorithm [37]
to adjust the parameter values and find the ones that provide the
closest match to the observed data (by minimizing the χ² value).
This process allows for the estimation of the different decision components in the DDM.
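The two building blocks named above, RT quantiles and the χ² misfit, can be sketched as below. The actual routine additionally simulates the DDM to obtain the predicted bin proportions and wraps the computation in a simplex search; the inputs in the example are hypothetical, not data from this study:

```python
QUANTILES = (0.1, 0.3, 0.5, 0.7, 0.9)

def rt_quantiles(rts):
    """The .1/.3/.5/.7/.9 RT quantiles summarizing one response
    distribution (correct or error)."""
    srt = sorted(rts)
    n = len(srt)
    return [srt[min(int(q * n), n - 1)] for q in QUANTILES]

def chi_square(obs_counts, pred_props, n_total):
    """Chi-square misfit between observed counts per quantile bin and
    the proportions the model predicts for the same bins."""
    total = 0.0
    for n_obs, p in zip(obs_counts, pred_props):
        expected = max(n_total * p, 1e-10)  # guard near-empty bins
        total += (n_obs - expected) ** 2 / expected
    return total

# The five quantiles cut the RT distribution into six bins holding
# .1/.2/.2/.2/.2/.1 of the trials; a perfect model predicts exactly that.
fit = chi_square([10, 20, 20, 20, 20, 10],
                 [0.1, 0.2, 0.2, 0.2, 0.2, 0.1], 100)
```

In the full routine this misfit is summed over correct and error responses (weighted by choice probability) and minimized by adjusting a, ter, z, and the drift rates.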
A standard DDM was used in this study, with the following
parameters estimated in the fitting routine: boundary separation
(a), non-decision time (ter), starting point (z), and drift rates (v),
for each condition. Additionally, DDM often includes variability
parameters to account for the fact that these values can vary from
trial to trial (see [38]). However, these variability parameters
are not well-estimated when there are limited observations (160)
in the data, so they were excluded when fitting the model. The
relatively low number of observations in each learning block (40)
also precludes fitting the model to the learning blocks separately to
assess how the parameters change over the course of the learning
blocks. That is, the 40 observations (10 per condition) in a block
are not sufficient to accurately estimate DDM parameters. Thus,
the model was fit to the overall data to investigate broad level
differences in performance between patients and controls.
To provide a thorough account of the data and statistical results,
each between-group comparison was presented with the t-value,
95% confidence interval, and the Bayes Factor (BF). The latter
was derived using an online package for a Bayesian t-test [50], with
the effect size set at .5 (small to moderate effect). Calculating BF
Fig. 2. Schematic of the drift diffusion model (DDM). See text for details.
from these tests provides the relative evidence for or against the
null hypothesis (i.e., no difference between groups). For example,
BF of 3 in favor of the alternative hypothesis indicates that the alternative hypothesis (effect size of .5) is 3 times more likely than the
null hypothesis based on the data. In general, BF of less than 3 indicates weak evidence, BF of 3-8 indicates moderate evidence, and
BF > 10 indicates strong evidence. It should be noted that BF has
the advantages of quantifying the strength of evidence and permitting evidence for the null hypothesis, the latter of which is not
possible with traditional frequentist statistics.
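We did not reimplement the JZS test from [50], but the logic of a Bayes factor can be illustrated with the simpler BIC approximation for a two-sample t-test (Wagenmakers, 2007). Its values are only rough analogues of the JZS BFs reported in the Results:

```python
import math

def bf10_bic(t, n1, n2):
    """BIC approximation to the Bayes factor BF10 for a two-sample
    t-test (Wagenmakers, 2007). This is NOT the JZS test used in the
    study, so values only roughly track the reported BFs."""
    n = n1 + n2
    df = n - 2
    bf01 = math.sqrt(n) * (1.0 + t * t / df) ** (-n / 2.0)
    return 1.0 / bf01  # invert to get evidence for the alternative

def label(bf10):
    """Rule-of-thumb categories close to the cutoffs in the text."""
    if bf10 > 10:
        return "strong"
    if bf10 >= 3:
        return "moderate"
    return "weak"

# Median-RT comparison from the Results: t(83) = 2.73, n = 37 + 48.
bf_rt = bf10_bic(2.73, 37, 48)
```

For this t-value the approximation lands in the moderate range, consistent with the JZS BF of 5.8 reported for the RT comparison, though the two methods need not agree exactly.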
3. Results
3.1. Behavioral results
The choice probabilities and reaction time quantiles, averaged
across all four blocks of the experiment, are shown in Fig. 3.
For the left panel, the choice probabilities show that both groups
learned to differentiate the stimuli, as S1 and S3 were more likely
to be categorized as Response A than S2 and S4. In brief, there
were no significant differences in response proportions between
patients and controls (all p’s > .4). The right panel shows the reaction time quantiles collapsed across all responses. As the figure
shows, SZ patients were significantly slower than controls (median
reaction time: t(83) = 2.73, p = .008, 95% CI: [85.4, 542.3], Cohen’s
d = .599). The resulting BF was 5.8 in favor of the alternative, indicating moderate evidence for a between-group difference in RT.
These behavioral data are interpreted below through the use of
DDM parameters.
3.2. Computational results
An important consideration when using DDM to decompose
data is that the estimated parameter values are only interpretable
if the model successfully fits the data. To assess this, the predicted
data from the best-fitting parameters are plotted against the observed data in Fig. 4. The primary criterion for assessing model fit in this situation is whether the model predictions match up with the observed behavioral data. The figure shows that the model
captured the choice probabilities and reaction time quantiles well,
supporting the interpretation of the estimated parameters. Further,
the best-fitting χ² values did not differ between SZ patients and controls (p = .27), and were in the range of fitting values from similar
studies (e.g., [24]).
The DDM parameters are shown in Fig. 4 for SZ patients and
controls. All parameter comparisons used a simple t-test to assess
differences between patients and controls. The results show that
multiple decision components differed between the groups. Non-decision time was significantly slower for SZ patients (t(83) = 2.94,
p = .004, 95% CI: [.035,.180], Cohen’s d = .645), indicating slower
encoding and/or motor execution time. The resulting BF was 9.24 in
favor of the alternative, indicating moderate to strong evidence for
a between-group difference in non-decision time. Boundary separation was likewise larger for SZ patients (t(83) = 3.27, p = .002,
95% CI: [.013,.054], Cohen’s d = .718), indicating more cautious
speed/accuracy settings for the patients. The resulting BF was 20.5
in favor of the alternative, indicating very strong evidence for a
between-group difference in boundary separation. For the starting point measure (z/a), higher values indicate a response bias
for Response A. There was no significant difference in starting
point between the groups (t(83)=.446, p = .657, 95% CI: [−.018,.028],
Cohen’s d = .097). The resulting BF was 3.1 in favor of the null, indicating weak evidence for no between-group difference in response
bias.
The drift rates, which provide an index of how well participants learned to select the correct response, showed stronger
evidence (better performance) for the control group in comparison with SZ patients. The drift rate measure in Fig. 4 is given
as a discriminability measure separately for reward and punishment trials: larger values indicate a better ability to correctly match
the stimulus with the response (S1 goes with response A, S2 goes
with response B, etc.). A mixed-ANOVA was conducted with group
(SZ, control) as the between-subject factor and condition (reward,
punishment) as the within-subject factor. The ANOVA showed a
trend for the main effect of group, with lower drift rates in the
SZ patients (F(1,164) = 3.62, p = .059), but no main effect of condition (F(1,164) = .018, p = .89) nor an interaction (F(1,164) = 1.27,
p = .26). Planned comparisons of the drift rates showed significantly
lower drift rates for SZ patients than controls for punishment trials (t(83) = 2.19, p = .032, 95% CI: [.005, .101], Cohen’s d = .482). The
resulting BF was 2.02 for the alternative, indicating weak evidence
for a between-group difference in drift rates for punishment trials.
Conversely, drift rates on reward trials showed no reliable difference between SZ patients and controls (t(83)=.635, p = .527, 95% CI:
[−.054,.104], Cohen’s d = .139). The resulting BF was 2.82 in favor of
the null, indicating weak evidence for no between-group difference
in drift rates for reward trials.
4. Discussion
Overall the DDM analysis shows that multiple decision components differ between patients and controls. Patients with SZ had
Fig. 3. Behavioral data from SZ patients and controls. Left panel shows response proportions for each of the four stimulus conditions. Right panel shows reaction time
quantiles for all responses. Error bars represent 95% confidence intervals.
slower encoding/motor time, more cautious speed/accuracy settings, and a relative deficit in learning to avoid the worse choice
for punishment trials. However, it should be noted that the evidence for the learning deficit in punishment trials was weak and
further studies will be needed to understand how robust it is.
Moreover, the group differences were strongest (as assessed by
BF) for the non-decision time component, suggesting that slower
encoding and motor time is the primary determinant of the
slower RTs in SZ patients. Overall, these results suggest a
multi-faceted profile of differences driving performance in this
reward/punishment learning task.
A notable finding from these results is that although SZ patients
did not have different response proportions compared to controls,
they had a significantly weaker drift rate from the DDM analysis
Fig. 4. DDM parameters averaged across participants. Error bars represent 95% confidence intervals. z/a refers to starting point measure.
for punishment trials. This discrepancy is likely driven by the group
difference in response caution: SZ patients were significantly more
cautious in their responding, which leads to slower reaction time
but also higher accuracy. Thus, our results suggest that the increase
in caution might be a compensatory strategy to improve accuracy
at the expense of response speed. Importantly, once this difference in response caution was controlled for in the DDM analysis, a specific deficit in learning for punishment trials emerged in the drift rates. These findings underscore the importance of controlling for differences in decision components that affect the behavioral measures of choice probabilities and reaction time (see [25]).
This study is the first to apply DDM to behavioral data from SZ
patients in a probabilistic learning task. The model-based analysis has several advantages over traditional comparisons of reaction
time and accuracy. First, the model accounts for all of the behavioral data simultaneously, including accuracy values and reaction
time distributions for correct and error responses. Thus, the full set
of behavioral data is taken into consideration when estimating the
DDM parameters. Second, the model provides more specificity in
the analyses, as the values of the different decision components can
be compared separately, such as response caution (speed/accuracy
trade-off), response bias for one of the options (starting point),
average rate at which information accumulates (drift rate), and non-decision time (encoding/motor time). Finally, the model provides more sensitivity to detect processing differences, as extraneous effects of the other parameters are controlled for (e.g., response caution is accounted for when estimating the drift rates).
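To make the mapping between these four components and behavior concrete, the diffusion process can be sketched with a minimal simulation. The parameter values below are arbitrary illustrations (the within-trial noise convention s = 0.1 follows Ratcliff and McKoon [38]); this is not the fitting routine used in the present study, which followed the methods of Ratcliff and Tuerlinckx [36].

```python
import numpy as np

def simulate_ddm(drift, boundary, start_frac, ndt,
                 n_trials=1000, dt=0.001, noise=0.1, seed=0):
    """Simulate the drift diffusion model: on each trial, evidence starts
    at a relative point (z/a = start_frac) between boundaries at 0 and
    `boundary`, accumulates at rate `drift` with Gaussian noise, and the
    non-decision time `ndt` (encoding/motor time) is added to each RT."""
    rng = np.random.default_rng(seed)
    rts, choices = [], []
    for _ in range(n_trials):
        x = start_frac * boundary  # starting point
        t = 0.0
        while 0.0 < x < boundary:  # accumulate until a boundary is hit
            x += drift * dt + noise * rng.normal() * np.sqrt(dt)
            t += dt
        rts.append(t + ndt)
        choices.append(1 if x >= boundary else 0)  # 1 = upper boundary
    return np.array(rts), np.array(choices)

# Illustrative run: positive drift favors the upper (correct) boundary,
# and no RT can be faster than the non-decision time.
rts, choices = simulate_ddm(drift=0.2, boundary=0.1, start_frac=0.5, ndt=0.3)
print(choices.mean(), rts.mean())
```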
Using the DDM, we found that SZ patients had slower encoding/motor time, were more cautious in responding, and had a specific deficit in learning to avoid the worse choice on punishment trials in comparison to healthy control subjects. As mentioned above, increased
response caution for patients could be indicative of a compensatory
strategy whereby accuracy is improved at the expense of response
speed. This compensatory strategy allowed the patients to perform
comparably to controls in terms of the choice accuracy. However,
once this difference in caution is taken into account through DDM, a
deficit in learning to select the appropriate response on punishment
trials (but not reward trials) still remains. This suggests a relative
deficit in punishment learning associated with SZ.
The punishment learning deficit revealed in this study has been reported previously [42]; however, one study did not confirm these findings (Waltz et al. [41]). Although the
deficit in drift rates for punishment trials was significant with
traditional frequentist analysis, the evidence for this deficit in
punishment learning was relatively weak based on the Bayesian
analysis (BF = 2.0). This weak effect might partially account for why
punishment learning deficits are detected in some studies but not
others. Future studies will be important for determining how reliable and robust this effect is. Interestingly, the model revealed
that this deficit is not necessarily caused by abnormal value estimation. Summerfield and Koechlin [39] showed that reward and
punishment valences bias the decision starting point of normal participants. Similarly, in this study participants’ starting point was
biased by reward, but SZ patients’ starting point did not differ significantly from controls. An important difference between the two
paradigms is that in the paradigm developed by Summerfield and
Koechlin, participants were explicitly told the value of each stimulus before the decision, whereas in the current study it had to be
learned. It appears that, at least in this study, the implicit learning of value did not bias SZ patients' decisions differently from those of healthy controls.
It is possible that our results are due to medication effects. It is difficult to dissociate medication effects from disease effects [40], as patients are medicated in most cognitive studies, and the medications used vary from study to study. These factors may explain conflicting results in the literature. For example, Waltz et al. [41] found that schizophrenia patients showed diminished reward learning compared to controls, whereas other studies have reported intact learning from reward but impaired learning from punishment in schizophrenia patients [42]. Thus, it is possible that our results are due to the medications used rather than to schizophrenia itself.
We found that longer non-decision time, larger boundaries and
slower drift rate all contribute to significantly slower reaction times
in SZ patients. These slower reaction times have been reported
across a number of different studies in SZ, thereby suggesting
a more general deficit [43,44]. Baving et al. [45] suggested that
patients with SZ show decision making impairment as they do
not retrieve information about the potential options, which would
perhaps account for the longer non-decision time, but might also
explain the slower drift rate. Given the evidence for reduced error-related negativity and response negativity in SZ patients [44], there is a theme within the literature of reduced evidence accumulation, or at least reduced "automatic retrieval" of this evidence for accumulation, which is in agreement with the suggestion made by Gold et al. [19] that decision-making deficits in SZ patients result from being unable to 'fully represent' the value of an outcome. Thus, SZ patients may adopt a compensatory strategy of widening their boundaries to allow more evidence to accumulate, albeit more slowly, improving accuracy at the cost of slower reaction times.
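This speed/accuracy trade-off follows directly from the model: with an unbiased starting point, both accuracy and mean decision time grow with the boundary separation. The sketch below uses standard closed-form DDM expressions (as in the EZ-diffusion literature) with illustrative parameter values, not values estimated from the present data.

```python
import numpy as np

def ddm_accuracy(v, a, s=0.1):
    """Probability of reaching the correct boundary for drift v, boundary
    separation a, and noise s, with an unbiased starting point (z = a/2)."""
    return 1.0 / (1.0 + np.exp(-v * a / s**2))

def ddm_mean_dt(v, a, s=0.1):
    """Mean decision time (excluding non-decision time) for the same setup."""
    return (a / (2.0 * v)) * np.tanh(v * a / (2.0 * s**2))

# Widening the boundary (a: 0.1 -> 0.2) at a fixed drift rate raises
# accuracy and lengthens the mean decision time.
for a in (0.1, 0.2):
    print(a, ddm_accuracy(0.2, a), ddm_mean_dt(0.2, a))
```

The same mechanism explains the pattern above: a wider boundary buys accuracy at the cost of slower responses, which is what a compensatory strategy in SZ patients would look like in the parameters.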
To our knowledge, this is the first study to apply the DDM to behavioral data from schizophrenia patients. Applying the DDM may provide a more detailed account of the nature of decision-making processes in SZ. In this study, SZ patients did not differ significantly from normal controls in the way their starting point was biased by the reward associated with the decision; rather, the deficit appeared to manifest in accessing this information, at the evidence accumulation and action planning phases. The results of
this study emphasize the need for further research into this area
and an examination of how these cognitive deficits may interact
with the positive and negative symptoms of SZ. In particular, it has been shown that sensitivity to feedback valence in the striatum is predictive of negative symptom severity (such as avolition or anhedonia) rather than of a diagnosis of schizophrenia itself [46,47]. Moreover, an important modulator of decision speed is motivational salience; thus, negative symptom severity may additionally contribute to the slower reaction times observed in schizophrenia patients [48].
Future studies could benefit from employing the DDM approach
used in the present study. For example, it is unclear to what extent
SZ patients show processing deficits in non-learning tasks; testing
SZ patients in a perceptual discrimination task and using DDM to
analyze the data could shed more light on the profile of cognitive
processes associated with SZ. Future studies can address this question and a range of others by using the DDM approach from the
present study.
References
[1] Heinrichs RW, Zakzanis KK. Neurocognitive deficit in schizophrenia: a quantitative review of the evidence. Neuropsychology 1998;12(3):426–45.
[2] Kahn RS, Keefe RS. Schizophrenia is a cognitive illness: time for a change in
focus. JAMA Psychiatr 2013;70(10):1107–12.
[3] Green MF, Kern RS, Braff DL, Mintz J. Neurocognitive deficits and functional
outcome in schizophrenia: are we measuring the right stuff? Schizophr Bull
2000;26(1):119–36.
[4] Velligan DI, Bow-Thomas CC, Mahurin RK, Miller AL, Halgunseth LC. Do specific neurocognitive deficits predict specific domains of community function in
schizophrenia? J Nerv Ment Dis 2000;188(8):518–24.
[5] Brambilla P, Perlini C, Bellani M, Tomelleri L, Ferro A, Cerruti S, et al. Increased
salience of gains versus decreased associative learning differentiate bipolar
disorder from schizophrenia during incentive decision making. Psychol Med
2013;43(3):571–80.
[6] Kester HM, Sevy S, Yechiam E, Burdick KE, Cervellione KL, Kumra S. Decisionmaking impairments in adolescents with early-onset schizophrenia. Schizophr
Res 2006;85(1-3):113–23.
[7] Lee Y, Kim YT, Seo E, Park O, Jeong SH, Kim SH, et al. Dissociation of emotional
decision-making from cognitive decision-making in chronic schizophrenia.
Psychiatr Res 2007;152(2-3):113–20.
[8] Simon JJ, Biller A, Walther S, Roesch-Ely D, Stippich C, Weisbrod M, et al. Neural
correlates of reward processing in schizophrenia—relationship to apathy and
depression. Schizophr Res 2010;118(1-3):154–61.
[9] Shurman B, Horan WP, Nuechterlein KH. Schizophrenia patients demonstrate a
distinctive pattern of decision-making impairment on the Iowa Gambling Task.
Schizophr Res 2005;72(2-3):215–24.
[10] Bechara A, Damasio AR, Damasio H, Anderson SW. Insensitivity to future
consequences following damage to human prefrontal cortex. Cognition
1994;50(1-3):7–15.
[11] Beninger RJ, Wasserman J, Zanibbi K, Charbonneau D, Mangels J, Beninger BV.
Typical and atypical antipsychotic medications differentially affect two nondeclarative memory tasks in schizophrenic patients: a double dissociation.
Schizophr Res 2003;61(2-3):281–92.
[12] Ritter LM, Meador-Woodruff JH, Dalack GW. Neurocognitive measures of prefrontal cortical dysfunction in schizophrenia. Schizophr Res 2004;68(1):65–73.
[13] Juckel G, Schlagenhauf F, Koslowski M, Filonov D, Wustenberg T, Villringer A,
et al. Dysfunction of ventral striatal reward prediction in schizophrenic patients
treated with typical, not atypical, neuroleptics. Psychopharmacology (Berl)
2006;187(2):222–8.
[14] Juckel G, Schlagenhauf F, Koslowski M, Wustenberg T, Villringer A, Knutson
B, et al. Dysfunction of ventral striatal reward prediction in schizophrenia.
Neuroimage 2006;29(2):409–16.
[15] Morris SE, Holroyd CB, Mann-Wrobel MC, Gold JM. Dissociation of response
and feedback negativity in schizophrenia: electrophysiological and computational evidence for a deficit in the representation of value. Front Hum Neurosci
2011;5:123.
[16] Polli FE, Barton JJ, Thakkar KN, Greve DN, Goff DC, Rauch SL, et al. Reduced
error-related activation in two anterior cingulate circuits is related to impaired
performance in schizophrenia. Brain 2008;131(Pt 4):971–86.
[17] Waltz JA, Schweitzer JB, Gold JM, Kurup PK, Ross TJ, Salmeron BJ, et al.
Patients with schizophrenia have a reduced neural response to both unpredictable and predictable primary reinforcers. Neuropsychopharmacology
2009;34(6):1567–77.
[18] Dowd EC, Barch DM. Pavlovian reward prediction and receipt in schizophrenia:
relationship to anhedonia. PLoS One 2012;7(5):e35622.
[19] Gold JM, Waltz JA, Prentice KJ, Morris SE, Heerey EA. Reward processing
in schizophrenia: a deficit in the representation of value. Schizophr Bull
2008;34(5):835–47.
[20] Ratcliff R. A theory of memory retrieval. Psychol Rev 1978;85:59–108.
[21] Gomez P, Perea M. Decomposing encoding and decisional components in
visual-word recognition: a diffusion model analysis. Q J Exp Psychol (Hove)
2014;67(12):2455–66.
[22] Petrov AA, Van Horn NM, Ratcliff R. Dissociable perceptual-learning
mechanisms revealed by diffusion-model analysis. Psychon Bull Rev
2011;18(3):490–7.
[23] Pe ML, Vandekerckhove J, Kuppens P. A diffusion model account of the relationship between the emotional flanker task and rumination and depression.
Emotion 2013;13(4):739–47.
[24] White CN, Ratcliff R, Vasey MW, McKoon G. Anxiety enhances threat processing
without competition among multiple inputs: a diffusion model analysis. Emotion 2010;10(5):662–77.
[25] White CN, Ratcliff R, Vasey MW, McKoon G. Using diffusion models to understand clinical disorders. J Math Psychol 2010;54(1):39–52.
[26] American Psychiatric Association, DSM-IV: Diagnostic and Statistical Manual
of Mental Disorders, fourth ed. American Psychiatric Association, Washington,
DC, 1994.
[27] Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, et al.
The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development
and validation of a structured diagnostic psychiatric interview for DSM-IV and
ICD-10. J Clin Psychiatr 1998;59(Suppl 20):22–33, quiz 34-57.
[28] Kay SR, Fiszbein A, Opler LA. The positive and negative syndrome scale (PANSS)
for schizophrenia. Schizophr Bull 1987;13(2):261–76.
[29] Yip SW, Sacco KA, George TP, Potenza MN. Risk/reward decision-making in
schizophrenia: a preliminary examination of the influence of tobacco smoking
and relationship to Wisconsin Card Sorting Task performance. Schizophr Res
2009;110(1-3):156–64.
[30] Woods SW. Chlorpromazine equivalent doses for the newer atypical antipsychotics. J Clin Psychiatr 2003;64(6):663–7.
[31] Bodi N, Keri S, Nagy H, Moustafa A, Myers CE, Daw N, et al. Reward-learning and
the novelty-seeking personality: a between- and within-subjects study of the
effects of dopamine agonists on young Parkinson’s patients. Brain 2009;132(Pt
9):2385–95.
[32] Keri S, Moustafa AA, Myers CE, Benedek G, Gluck MA. α-Synuclein gene duplication impairs reward learning. Proc Natl Acad Sci
2010;107(36):15992–4.
[33] Moustafa AA, Krishna R, Eissa AM, Hewedi DH. Factors underlying probabilistic and deterministic stimulus-response learning performance in
medicated and unmedicated patients with Parkinson’s disease. Neuropsychology 2013;27(4):498–510.
[34] Myers CE, Moustafa AA, Sheynin J, Vanmeenen KM, Gilbertson MW, Orr SP, et al.
Learning to obtain reward, but not avoid punishment, is affected by presence
of PTSD symptoms in male veterans: empirical data and computational model.
PLoS One 2013;8(8):e72508.
[35] Somlai Z, Moustafa AA, Keri S, Myers CE, Gluck MA. General functioning predicts
reward and punishment learning in schizophrenia. Schizophr Res 2011.
[36] Ratcliff R, Tuerlinckx F. Estimation of the parameters of the diffusion model:
approaches to dealing with contaminant reaction times and parameter variability. Psychon Bull Rev 2002;9:438–81.
[37] Nelder JA, Mead R. A simplex method for function minimization. Comput J 1965;7:308–13.
[38] Ratcliff R, McKoon G. The diffusion decision model: theory and data for twochoice decision tasks. Neural Comput 2008;20(4):873–922.
[39] Summerfield C, Koechlin E. Economic value biases uncertain perceptual choices
in the parietal and prefrontal cortices. Front Hum Neurosci 2010;4:208.
[40] Foerde K, Poldrack RA, Khan BJ, Sabb FW, Bookheimer SY, Bilder RM, et al.
Selective corticostriatal dysfunction in schizophrenia: examination of motor
and cognitive skill learning. Neuropsychology 2008;22(1):100–9.
[41] Waltz JA, Frank MJ, Wiecki TV, Gold JM. Altered probabilistic learning and
response biases in schizophrenia: behavioral evidence and neurocomputational modeling. Neuropsychology 2011;25(1):86–97.
[42] Fervaha G, Agid O, Foussias G, Remington G. Impairments in both reward and
punishment guided reinforcement learning in schizophrenia. Schizophr Res
2013;150(2-3):592–3.
[43] Hutton SB, Murphy FC, Joyce EM, Rogers RD, Cuthbert I, Barnes TR, et al. Decision making deficits in patients with first-episode and chronic schizophrenia.
Schizophr Res 2002;55(3):249–57.
[44] Morris SE, Heerey EA, Gold JM, Holroyd CB. Learning-related changes in brain
activity following errors and performance feedback in schizophrenia. Schizophr
Res 2008;99(1-3):274–85.
[45] Baving L, Wagner M, Cohen R, Rockstroh B. Increased semantic and repetition priming in schizophrenic patients. J Abnorm Psychol 2001;110(1):
67–75.
[46] Waltz JA, Kasanova Z, Ross TJ, Salmeron BJ, McMahon RP, Gold JM, et al. The roles
of reward, default, and executive control networks in set-shifting impairments
in schizophrenia. PLoS One 2013;8(2):e57257.
[47] Waltz JA, Schweitzer JB, Ross TJ, Kurup PK, Salmeron BJ, Rose EJ, et al. Abnormal
responses to monetary outcomes in cortex, but not in the basal ganglia, in
schizophrenia. Neuropsychopharmacology 2010;35(12):2427–39.
[48] Avila I, Lin SC. Motivational salience signal in the basal forebrain is coupled with faster and more precise decision speed. PLoS Biol 2014;12(3):
e1001811.
[49] Midorikawa A, Hashimoto R, Noguchi H. Impairment of motor dexterity
in schizophrenia assessed by a novel finger movement test. Psychiatry Res
2008;159(3):281–9, http://dx.doi.org/10.1016/j.psychres.2007.04.004.
[50] Morey RD, Rouder JN. Bayes factor approaches for testing interval null hypotheses. Psychol Methods 2011;16:406–19.