Gender differences in cognitive inhibition: Results from a meta

University of Essex
Department of Psychology
PS934 Research Project
IMPORTANT NOTE:
This document is a slightly modified version of an MSc thesis. It is the property of Espen
Arizelas Sjoberg. You are free to cite this document in your own research, but please alert
me if you do so ([email protected]). Papers for publishing based on this dissertation
are currently in the preparation stage. News on publications may be found at
iamalivep05.wordpress.com
Hope you find this paper useful!
Gender differences in cognitive inhibition: Results from a meta-analysis, a
negative priming Stroop task, and a stop-signal task
Espen A. Sjoberg
Supervisor: Geoff Cole
Date: 06/09/2013
Word count: 10,324
ABSTRACT
Gender differences in cognitive inhibition experiments have been largely overlooked in the
literature. An evolutionary hypothesis proposed that women should outperform men on
inhibition tasks due to a differential evolution of mating strategies. In the Stroop task,
however, it is believed that a possible female advantage may be due to superior verbal
abilities rather than inhibition abilities. To investigate this, Study 1 was a meta-analysis
conducted on the colour-word (incongruent) subtask of the Stroop test. A small (d = 0.12)
overall female advantage was found that persisted through different ages and cultures. It
was also found that the advantage increased with more accurate measurements. Study 2
was a negative priming Stroop task. A significant female advantage was found on both the
incongruent and negative priming trials, and a lack of an interaction suggests that females
outperformed males due to their increased verbal abilities. Study 3 was a stop-signal task,
where a significant female advantage in inhibition was found that could not be explained as
a speed/accuracy trade-off. Evidence suggests that gender differences are weak in cognitive
inhibition, except for the Stroop task where there is a small to moderate female advantage
best explained through superior verbal abilities in women. Evidence for an evolved
inhibition mechanism in women is weak outside of sexual contexts, though it may still be
present in tasks involving motor inhibition.
INTRODUCTION
Inhibition refers to the active or automatic suppression of a mental process or behaviour
(MacLeod, 2007). Sometimes this is investigated in social contexts, such as resisting
temptation, but usually inhibition is measured through a cognitive experiment where
conflicting mental processes are involved and at least one must be suppressed. Active
inhibition can be very difficult and is considered an executive function, primarily associated
with the prefrontal cortex in the brain (Fuster, 1984).
One aspect of inhibition that has rarely been investigated is gender differences. It is believed
that this is because in cognitive psychology gender differences are often not of interest
among many researchers (Halpern, 2000). In cognitive vision research less than 1% of
articles report gender data (Abramov, Gordon, Feldman, & Chavarga, 2012a). Possibly this is
because it is assumed that in lower-level cognitive tasks gender differences are too small to
have any impact on behaviour. Gender differences have been outlined in several cognitive
domains by Halpern (2000), but inhibition was not addressed in this review. Similarly,
Mitrushina, Boone, Razani, and D'elia (2005) mention several gender differences in cognitive
neuropsychological assessments, but do not talk about inhibition except the fact that the
Stroop task involves it. In their small (k = 10) meta-analysis on the Stroop task gender was
not investigated due to insufficient data. This speaks to the lack of research on gender
differences in cognitive inhibition, and a review of available studies is required.
Review of evidence of gender differences in inhibition tasks
Considering that many inhibition tasks are used for neurological assessments it is important
to establish if any gender advantages exist that may jeopardise the assessment of a patient.
There are several methods available that assess cognitive inhibition in varying degrees,
such as thought suppression (Wegner, Schneider, Carter III, & White, 1987), the stop-signal
task (Logan & Cowan, 1984), the anti-saccade task (Guitton, Buchtel, & Douglas, 1985), the
Wisconsin Card Sorting Task (Berg, 1948), and the Stroop (1935) task.
The Wisconsin Card Sorting Task (WCST) involves sorting cards into categories following an
unstated rule which may change during the task. This requires inhibition to some degree
because participants must inhibit their previous strategies if suddenly told they are sorting
cards wrong. Two studies on children found no sex differences (Boone, Ghaffarian, Lesser,
Hill-Gutierrez, & Berman, 1993; Rosselli & Ardila, 1993), but one study on adults found a
female advantage (Paniak, Miller, Murphy, & Patterson, 1996).
In thought suppression experiments participants are told to not think about specific
thoughts, which ironically will usually increase the frequency of the thought (Wegner et al.,
1987). Two studies found a female advantage (Rassin, 2003; Wegner & Zanakos, 1994), but
another found no difference (Wegner, Shortt, Blake, & Page, 1990).
The stop-signal task (or go/no-go task) involves responding rapidly to a specific stimulus that
occurs frequently, but withholding the response to another, less frequently occurring
stimulus. Four studies reporting gender data all found no difference in withheld responses (Li,
Huang, Constable, & Sinha, 2006; Li, Zhang, Duann, Yan, Sinha, & Mazure, 2009; Rucklidge &
Tannock, 2002; Thakkar, Congdon, Poldrack, Sabb, London, Cannon, & Bilder, in press).
In the anti-saccade task participants are presented with a flashed cue in their peripheral
vision and are told to look in the opposite direction. This is difficult because there is a strong
automatic tendency to look at the flashed cue (Roberts, Hager, & Heron, 1994). One study
found males to be slightly more accurate (Friedman, Miyake, Young, DeFries, Corley, &
Hewitt, 2008) while another found no difference (Luna, Garver, Urban, Lazar, & Sweeney,
2004).
In the colour-word subtask of the Stroop test participants have to name the ink colour of
incongruous colour-words (e.g. the word “red” written in blue ink). The general consensus
from several reviews is that gender differences do not exist in the colour-word subtask
(Bjorklund & Kipp, 1996; Jensen & Rohwer, 1966; Maccoby & Jacklin, 1974; MacLeod, 1991;
Mitrushina et al., 2005; Rovainen, 2011). However, multiple studies have been found that
report a significant female advantage (Baroun & Alansari, 2006; Davis, Jorgenson, Kritselis,
& Opella, 1981; Golden, 1974b; Peretti, 1969, 1971; Sarmany, 1977). This may suggest that
indeed there is a small female advantage that sometimes escapes significance in the Stroop
colour-word task. Regrettably, this has never been systematically investigated, but
considering the number of studies available the Stroop task is an excellent candidate for a
meta-analysis.
In sum, gender differences in cognitive inhibition tasks appear to be weak. An interesting
observation is that when gender differences are found they are usually in favour of females.
However, it is difficult to draw conclusions on the subject because studies reporting gender
data are relatively rare.
The evolution of a female inhibition mechanism
Bjorklund and Kipp (1996) proposed that women have an innate inhibition mechanism that
should lead them to outperform men in inhibition tasks, especially if the task relates to sex
or reproduction. This should occur because females in most animal species are choosy when
selecting partners, and in turn this means that females inhibit potentially unsuitable
partners to ensure that the best possible genes go into the next generation.
The hypothesis (henceforth referred to as the evolved inhibition hypothesis) is an extension
of Trivers' (1972) parental investment theory, which suggests that females and males have
evolved different mating strategies due to a differential investment in their offspring. In
most animal species the female pays a higher cost for having offspring, such as pregnancy
and birth. By contrast, the male investment in some species does not amount to more than
sperm contribution (Clutton-Brock, 2007). Because the females invest more in their
offspring it is to their advantage to evaluate several males and select the one who appears
to have the best genes (Janetos, 1980). In other words, females mate selectively, while
males mate indiscriminately.
Bjorklund and Kipp (1996) proposed that if females are choosy when selecting a mate it
would benefit them to inhibit their own behaviours when evaluating males: specifically, they
must inhibit any cues signalling sexual interest as well as avoid choosing a potentially
unsuitable mate.
Some indirect evidence exists suggesting that women show greater inhibition in contexts
related to sex and relationships. Women appear to shield their relationship more effectively
through behaviours such as avoiding thoughts or fantasies about other potential partners
(Beckmann, unpublished; Person, Terestman, Myers, Goldberg, & Salvadori, 1989). Women
also appear to inhibit their sexual arousal more effectively than men (Chivers, Seto,
Lalumiere, Laan, & Grimbos, 2010; Milhausen, Graham, Sanders, Yarber, & Maitland, 2010).
Even if they are implicitly aroused they show greater explicit inhibition compared to men
(Suschinsky, Lalumiere, & Chivers, 2009).
There appear to be several findings that support the hypothesis of a female inhibition
mechanism in social contexts related to reproduction. However, if women have evolved
such a mechanism one can make the suggestion that it should also be applicable to
cognitive inhibition. An evolved mechanism may give women an edge in performance over
men. Bjorklund and Kipp (1996) reviewed several cognitive inhibition paradigms, including
the Stroop task, but concluded that evidence was weak for a female advantage in cognition
and that the female inhibition advantage most likely only applies to reproductive contexts.
Present study
Out of the inhibition paradigms mentioned above, the Stroop task has the largest number of
studies available with gender data and is therefore an ideal candidate for a meta-analysis.
The proposed meta-analysis will systematically investigate any gender difference across all
published Stroop studies that have available gender data, and also attempt to identify
variables that affect any such difference.
In addition, an extended version of the negative priming Stroop task will be conducted to
compare any results with findings from the meta-analysis. In the negative priming version of
the Stroop task the colour to-be-named in one trial is identical to the ignored colour in the
preceding trial (Neill, 1977). This is believed to isolate the inhibition mechanism (Tipper,
Bourque, Anderson, & Brehaut, 1989), making it ideal to investigate gender differences in
cognitive inhibition.
A third experiment will be a stop-signal task. This will investigate if gender differences exist
in an inhibition task with more of a motor component (clicking a button). According to
Bjorklund and Kipp (1996), the female inhibition advantage should be slightly higher in more
behavioural tasks that involve motor movement, presumably because there are fewer
conflicting mental processes.
Three competing hypotheses of gender differences in inhibition
Any female advantage observed in the Stroop task is sometimes attributed to superior
female verbal abilities (Lee et al., 2004), specifically that women can name colours faster
than men (Golden, 1974b; MacLeod, 1991; Seo, Lee, Choo, Kim, Kim, Youn, Jhoo, & Woo,
2008; Stroop, 1935). That women have better verbal abilities than men has been well
documented in both adults (Chipman & Kimura, 1998; Hyde & Linn, 1988) and children
(Burman, Bitan, & Booth, 2008). It is therefore possible that any observed female advantage
in the Stroop task is not due to greater inhibition abilities, but a greater ability to verbally
label and name colours. Support for this hypothesis would be to find a female advantage
across all age groups in the meta-analysis, and that gender differences are unchanged
between the incongruent and negative priming conditions in the proposed negative priming
Stroop task.
An alternative explanation is the already outlined evolved inhibition hypothesis (Bjorklund &
Kipp, 1996). If a female Stroop advantage is due to an innate female mechanism evolved for
mating purposes, we would expect that any female advantage is only present after puberty.
Having an advantage in pre-puberty would not be beneficial because the female cannot
become pregnant yet. Another prediction from this hypothesis is that any observed female
advantage should be greater in the negative priming version of the Stroop task compared to
the incongruent version.
A third hypothesis is proposed by Broverman, Klaiber, Kobayashi, and Vogel (1968), who
suggested that hormones play a negative role on tasks involving inhibition, and that the
different levels of androgen and estrogen in males and females would create a male
advantage in inhibition tasks such as the Stroop. Halpern (2000) disregards this hypothesis
because the physiological mechanisms it is based on are questionable. This hypothesis
would only be supported if an overall male advantage is found in the Stroop meta-analysis.
STUDY 1: META-ANALYSIS OF THE COLOUR-WORD STROOP SUBTASK
Previous research and justification
No meta-analysis has previously been conducted on any Stroop measure with gender in
mind. Of interest to the present study are gender differences in the colour-word subtask, or
incongruent condition (henceforth referred to as “CW”), as this involves inhibition. The
general consensus of the literature appears to be that gender differences do not exist or are
at least not an important factor in the Stroop CW test. The problem is that the conclusions
of such reviews are highly subjective and gender effects are rarely discussed in any great
detail. Once again this speaks to the scarcity of interest in gender differences on the task:
Izawa and Silver (1988) found that of the 192 Stroop papers they reviewed, only 14 (or 7%)
reported gender data.
Interference or colour-word?
It can be argued that measuring gender differences in Stroop interference is a more
appropriate way to measure inhibition differences. However, this measure will by itself give
no insight into gender differences in inhibition because performance is calculated based on
both the colour naming and CW task, and performance on either task may not be known. It
would be more appropriate to investigate whether a CW gender difference exists and which
variables are likely to affect this.
Stroop versions
Since the original version was proposed in 1935 several modifications have been made and
there are a variety of Stroop tests available. Most versions involve: 1) reading colour-words
in black ink, 2) naming colour of objects, and 3) incongruent colour naming. Some versions
include a congruent condition where the word and ink match (Graf, Uttl, & Tuokko, 1995),
or an incongruent condition where the word must be read rather than the colour named (Dodrill,
1978; Stroop, 1935; Trenerry, Crosson, DeBoe, & Leber, 1989).
The setup in the CW condition is practically identical in all versions, except for number of
items used and response measurement. Most versions measure the time it takes to
complete a number of items, which is usually 100 items (Bohnen, Jolles, & Twijnstra, 1992;
Comalli, Wapner, & Werner, 1962; Daniel, 1972; Delis, Kaplan, & Kramer, 2001; Stroop,
1935). Alternatively, some versions measure the number of items named within a
timeframe, which is usually 45 seconds (Golden, 1978; Trenerry et al., 1989). Finally, some
studies use one-item trials where participants have to respond to a single incongruent trial
(Izawa & Silver, 1988; Jorgenson, Davis, Opella, & Angerstein, 1980; Laeng, Låg, & Brennen,
2005). There appears to be no uniform standard for this version of the Stroop task and the
setup varies slightly from study to study.
The main versions of the Stroop task have been summarised in Table 1. Note that some
studies make minor modifications of these versions for their own experiments, and these
are not listed.
METHOD
Literature search
Studies were found by searching with the keywords “Stroop” and “sex OR gender” in Google
Scholar, PsycARTICLES, and SAGE Journals. Studies were also found from the reference list
of retrieved papers. In addition, 22 authors who had previously conducted a Stroop task
were contacted for additional papers, published or unpublished. This yielded a response
rate of 15 (68%), and an additional 11 (two unpublished) papers were found.
Table 1: Summary of the different Stroop versions. W = reading colour words printed in black ink, CN = naming
colour of objects/items, CW = naming ink colour of colour words printed in incongruent colour, WC = reading
the colour words printed in incongruent colour, CG = naming ink colour of colour words printed in congruent
colour, CNW = naming ink colour of neutral words, SCW = same as CW except coloured triangles are around
some trials for added difficulty.

Version       Items per trial  Measurement    Trials          Comment
Single trial  1                Reaction time  CW, WC          Reaction time is the time from stimulus onset to verbal response. Number of trials varies from study to study.
Bohnen        100              Reaction time  W, CN, CW, SCW  CN trial uses blocks. SCW is identical to CW trial except some trials have coloured rectangles around the word.
Comalli       100              Reaction time  W, CN, CW       CN trial uses rectangles.
Daniel        100              Reaction time  W, CN, CW       Participants wear headphones and listen to colour names, 75% of which are incongruent to the trial.
Dodrill       176              Reaction time  WC, CW          -
Golden        100              No. of items   W, CN, CW       CN trial uses "XXXX".
Graf          27               Reaction time  W, CN, CG, CW   CN trial uses "XXXX".
Kaplan        100              Reaction time  W, CN, CW       CN trial uses blocks.
Malayalam     40               Reaction time  W, CN, CW       CN trial uses squares.
Stroop        100              Reaction time  W, CN, WC, CW   CN trial uses strips.
Trenerry      112              No. of items   WC, CW          -
Victoria      24               Reaction time  CN, CNW, CW     CN trial uses dots. CNW trial involves naming the ink colour of neutral words.
Selection criteria
1) The study had to report performance on an incongruent CW component. Some studies reported
Stroop performance data without clearly specifying what this data refers to (e.g.
Amato et al., 2006), while others used an overall combined Stroop performance
measurement that gave no insight into CW performance (e.g. Mekarski, Cutmore, &
Suboski, 1996). Several modified Stroop tasks exist that do not employ a CW
condition and these were not included.
2) Healthy participants. Several studies were found that used the Stroop task for
assessing cognitive functioning in neuropsychological patients, for example those
suffering with multiple sclerosis, ADHD, or schizophrenia. These were unsuitable for
comparison, but if healthy controls were used as a comparative group, the data from
these controls were used.
3) Studies could not employ experimental manipulations that would affect data. If a
study used experimental conditions that were designed to alter Stroop performance
(e.g. making participants anxious or giving them alcohol), then the data could not be
included. However, in cases where participants were grouped based on
characteristics such as education or intelligence, it was suitable to combine the
groups into one effect size.
4) Repetition trials were not included. A study by Connor, Franzen, and Sharp (1988)
found that practice or experience with Stroop tasks will not significantly alter sex
differences in Stroop performance, but including repetition trials will inflate the total
N of the sample. This will exaggerate any gender differences found. Similarly, studies
were excluded if they tested the same sample used in a different study, with one
exception: van der Elst, Molenberghs, van Boxtel, and Jolles (in press) tested a subsample from the van der Elst et al. (2006) sample 12 years after the original testing
phase, and it was elected to be included.
This yielded a total of 115 studies that fit the selection criteria. However, the
majority of these studies did not report adequate data or information to allow effect sizes to
be calculated, and the total number of effect sizes available through calculation was 88. In
all cases of missing information an attempt was made to contact the authors to ask for
additional data. Thirteen authors were able to accommodate the request, and this generated an
additional 38 effect sizes.
Following the selection criteria, the total number of effect sizes to be used in the analysis
was 126 across 60 studies, which included two unpublished studies, one Master's thesis, and
one PhD thesis. Total N = 21314, of which 9853 were male and 11382 were
female, across 21 countries, with an age range of 7-92. All of the included studies are listed
in Table 2.
One study by Lord and Taylor (1991) met the selection criteria, but reported data that
yielded effect sizes as high as d = 28.86 in favour of females! As this is likely a
methodological error, the study was treated as an outlier and excluded.
Table 2: Studies with effect size data included in the present study. A positive value signifies a female
advantage, and a negative value shows a male advantage. M = male participants, F = female participants, UG
= undergraduates, d = effect size Cohen’s d. Note that in the “Age” column the mean age is reported where
possible.
Study
Stroop
version
Country
Age
M
F
d
Afsaneh et al. (2012)
Alansari and Baroun (2004)
Single trial
Comalli
Comalli
Golden
Iran
United Kingdom
Kuwait
Saudi Arabia
30
21
21
16-65
31
36
60
99
47
34
80
99
-0.11
0.28
-0.12
0.07
Golden
Saudi Arabia
32
5
5
0.06
Golden
Trenerry
South Africa
Australia
28
79
12
52
21
317
0.03
0.17
Malayalam
Malayalam
Malayalam
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Comalli
Stroop
India
India
India
Italy
Italy
Italy
Italy
Italy
Italy
Italy
Kuwait
Greece
21-25
10-12
10-12
49
19-29
30-39
40-49
50-59
60-69
70-81
21
26
110
32
33
87
15
16
15
18
18
15
122
44
98
35
32
122
15
15
19
27
36
5
382
46
-0.76*
0.37
-0.41
0.17*
0.99*
0.54
-0.20
0.40
0.57
0.39
0.27*
0.09
Al-Ghatani, Obonsawin, Binshaig, and
Al-Moutaery (2011)
Al-Ghatani, Obonsawin, and AlMoutaery (2010)
Andrews (2009)
Anstey, Matters, Brown, and Lord
(2000)
Asha (1989)
Asha (1991)
Barbarotto et al. (1998)
Baroun and Alansari (2006)
Beratis, Rabavilas, Papadimitriou, and
Papageorgiou (2010)
Table 2: continued
Study
Stroop
version
Country
Age
M
F
d
Bettner et al. (1971)
Buck, Hillman, and Castelli (2008)
Christiansen and Oades (2010)
Cohen and Fischer (1980)
Comalli
Golden
Single trial
Custom
Custom
Custom
Custom
Custom
Custom
Golden
Golden
Golden
Golden
Trenerry
Trenerry
Trenerry
Single trial
Stroop
United States
United States
Germany
Germany
Germany
Germany
Germany
Germany
Germany
United States
United States
United States
United States
United States
United States
United States
United States
Netherlands
77-89
9
11
7
9
9
10
11
12
18-25
8
10
20
10
14
20
UG
24-64
8
41
15
12
12
12
12
12
12
17
20
20
20
12
12
12
15
290
16
33
22
12
12
12
12
12
12
23
20
20
20
12
12
12
38
157
0.53*
0.44*
0.42
-0.06
0.73
0.28
0.16
0.14
0.76
0.09
0.68
0.00
-0.14
0.24
0.15
1.24*
0.65*
0.42*
Golden
Golden
Stroop
Golden
Single trial
Single trial
Kaplan
Kaplan
Kaplan
Single trial
Victoria
Victoria
Victoria
Golden
Golden
Golden
Golden
Golden
Golden
Golden
Golden
Golden
Golden
Golden
Golden
Golden
Bohnen
Comalli
Comalli
Stroop
Stroop
Golden
Portugal
United States
Sweden
United States
United States
United States
United States
United States
United States
Norway
Hong Kong
Hong Kong
Hong Kong
Spain
Spain
Spain
Spain
Spain
Spain
Spain
Spain
Spain
United States
United States
United States
United States
South Africa
United States
United States
India
India
Spain
13
20
30
34
UG
18-24
50-64
65-74
75-89
23
12
14
16
55-61
62-64
65-67
68-70
71-73
74-76
77-79
80-82
82+
56-68
19
65
77
6-8
9-10
12-14
8-9
10-11
50-80
646
102
24
28
32
51
24
39
27
82
51
50
63
472
473
441
387
324
303
287
219
136
79
19
54
54
44
19
11
18
18
134
776
117
24
63
32
73
19
20
24
100
50
55
62
565
542
505
458
393
347
299
211
126
224
37
57
71
58
31
20
18
18
210
0.12
0.27*
-0.36
0.05
0.48
0.49*
0.32
0.10
0.27
0.56*
0.16
0.05
0.02
0.04
-0.03
-0.02
-0.07
-0.11
-0.08
-0.10
-0.04
0.02
0.32
0.06
0.49*
0.28
0.34
0.19
0.06
0.93*
-0.10
0.15
Connor et al. (1988)
Daniel, Pelotte, and Lewis (2000)
Davies and Rose (1999)
Davis et al. (1981)
de Grip, Bosma, Willems, and Van
Boxtel (2008)
Esgalhado and Pereira (2012)
Golden (1974b)
Gustafson and Källmén (1990)
Insua (2001)
Izawa and Silver (1988)
Jorgenson et al. (1980)
Kang et al. (2013)
Laeng et al. (2005)
Lee, Yuen, and Chan (2002)
Llinàs-Reglà, Vilalta-Franch,
López-Pousa, Calvó-Perxas,
and Garre-Olmo (2013)
Lucas et al. (2005)
Martin and Franzen (1989)
Moering, Schinka, Mortimer,
and Graves (2004)
Oosthuizen and Phipps (2012)
Panek, Rush, and Slade (1984)
Pati and Dash (1990)
Peña-Casanova et al. (2009)
Table 2: continued
Study
Stroop
version
Country
Age
M
F
d
Peretti (1969)
Custom
Custom
Custom
Custom
Golden
Single trial
United States
United States
United States
United States
Spain
United Kingdom
11-13
14-16
17-20
17-21
35
28
50
50
50
25
65
24
50
50
50
25
114
24
0.21
0.25
0.26
0.85*
0.17
1.15
Daniel
Golden
Stroop
Stroop
Daniel
Daniel
Stroop
Slovakia
Korea
India
India
Slovakia
Slovakia
Netherlands
22
71
18
20
Adults
12
12
35
208
25
25
10
47
31
22
356
25
25
30
47
37
0.89*
0.29*
-0.02
1.24*
1.11*
0.13
0.05
Comalli
United States
30
15
27
0.43*
Stroop
Stroop
Golden
United States
United States
United States
UG
UG
Adults
29
17
35
71
15
38
0.18
0.51
0.34
Victoria
Victoria
Victoria
Victoria
Victoria
Victoria
Victoria
Stroop
Canada
Canada
Canada
Canada
Canada
Canada
Canada
Netherlands
20-29
30-39
40-49
50-59
60-69
70-79
80-89
65
20
15
10
12
12
24
15
424
20
9
8
24
43
37
23
397
0.02
0.94
-0.15
0.24
0.35
0.47
0.12
0.14*
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Stroop
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
Netherlands
24-26
29-31
34-36
39-41
44-46
49-51
54-56
59-61
64-66
69-71
74-76
79-81
32-37
37-42
42-47
47-52
52-57
57-62
62-67
67-72
72-77
78
79
78
77
78
79
81
81
76
81
77
30
35
54
61
69
68
66
50
56
49
81
79
87
80
81
80
88
75
77
73
80
29
33
42
63
62
65
68
62
45
62
0.39*
0.28
-0.07
0.41*
0.68*
0.28
0.23
0.10
0.25
0.22
0.40*
-0.08
-0.18
0.50*
0.21
-0.11
0.38*
0.24
0.02
0.14
0.09
Peretti (1971)
Rognoni et al. (2013)
Sanders, Riggs, Simpson, and Davies
(unpublished)
Sarmany (1977)
Seo et al. (2008)
Singh (1991)
Sladekova and Daniels (1981)
Stins, Polderman, Boomsma, and de
Geus (2005)
Strickland, D'elia, James, and Stein
(1997)
Stroop (1935)
Swerdlow, Filion, Geyer, and Braff
(1995)
Troyer, Leach, and Strauss (2006)
van Boxtel, ten Tusscher, Metsemakers,
Willems, and Jolles (2001)
van der Elst et al. (2006)
van der Elst et al. (in press)
Table 2: continued
Study
Stroop
version
Country
Age
M
F
d
van der Elst et al. (in press)
Stroop
Stroop
Stroop
Comalli
Golden
Stroop
Comalli
Custom
Custom
Comalli
Trenerry
k = 126
Netherlands
Netherlands
Netherlands
Netherlands
Canada
Norway
Denmark
United States
United States
United States
Greece
77-82
82-87
87-92
85
22
34
71
UG
7-8
6-12
54
N:
36
12
2
294
23
182
44
59
45
68
337
9853
37
24
8
161
16
158
56
69
42
62
268
11382
0.05
0.73
0.26
0.23*
0.12
0.30*
0.02
-1.02*
0.47*
0.60*
0.02
van Exel et al. (2001)
Vanier (unpublished)
Vaskinn et al. (2011)
Vogel, Stokholm, and Jørgensen (2013)
Von Kluge (1992)
Wolf and Gow (1986)
Wolff et al. (1983)
Zalonis et al. (2009)
* significant gender difference
Analysis procedure
Cohen’s d was used as a measure of effect size as this is preferable when looking at
differences between groups, such as gender differences (Ellis, 2010; Hyde, 1990). In our
study Cohen’s d measures the standardized difference between the means of males and
females. The effect sizes were calculated based on reported means and standard deviations
as outlined in Cohen (1969), or by converting the relevant t, F, χ², p, or r statistic based on
formulas provided by Lipsey and Wilson (2001)¹. A positive effect size reflected a female
advantage on the task (such as faster reaction time or more items read compared to males),
and a negative value signified a male advantage.
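The sign convention and the two most common routes to d can be sketched as follows. This is a minimal illustration, not the calculator used in the thesis: the function names are hypothetical, the pooled standard deviation is an assumed choice of standardiser, and the conversion from t is the standard independent-groups formula.

```python
import math


def pooled_sd(sd_m, n_m, sd_f, n_f):
    """Pooled standard deviation of the male and female groups (assumed standardiser)."""
    return math.sqrt(((n_m - 1) * sd_m ** 2 + (n_f - 1) * sd_f ** 2) / (n_m + n_f - 2))


def cohens_d(mean_m, sd_m, n_m, mean_f, sd_f, n_f, lower_is_better=True):
    """Cohen's d signed so that a positive value is a female advantage.

    lower_is_better=True  -> completion/reaction time (smaller score is better)
    lower_is_better=False -> number of items named (larger score is better)
    """
    diff = (mean_m - mean_f) if lower_is_better else (mean_f - mean_m)
    return diff / pooled_sd(sd_m, n_m, sd_f, n_f)


def d_from_t(t, n_m, n_f):
    """Convert an independent-groups t statistic to d; the sign of t must
    already follow the female-advantage convention."""
    return t * math.sqrt((n_m + n_f) / (n_m * n_f))
```

For example, if males take 110 s and females 100 s on a timed version (both SD = 20, n = 30 per group), `cohens_d(110, 20, 30, 100, 20, 30)` gives 0.5, a female advantage.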
The meta-analytic procedure follows the method presented in Howell (2013), which allowed
the calculation of a weighted d that takes into account the sample size of both males and
females in the studies. Lipsey and Wilson (2001) outline how to convert this effect size into
a z-score that can be checked for significance. In addition, the homogeneity of effect sizes
(called Q, but reported as χ²) will be calculated in order to investigate whether a sample of
effect sizes was drawn from the same population. If a sample is homogenous it can
be concluded that the experiments are in effect replications of each other. In the event that
a sample is found to be heterogeneous the sample must be partitioned into sub-samples in
order to further investigate what variables influence performance.
¹ Dr. Jared DeFife is to be thanked for the use of his Excel Effect Size calculator which made conversions into d
much easier.
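A rough sketch of these three quantities (weighted d, its z-score, and Q) is given below. It assumes the common inverse-variance weighting described by Lipsey and Wilson (2001); the exact weighting in Howell (2013) may differ, and the function names are hypothetical.

```python
import math


def d_variance(d, n_m, n_f):
    """Approximate sampling variance of d for two independent groups."""
    return (n_m + n_f) / (n_m * n_f) + d ** 2 / (2 * (n_m + n_f))


def combine_effect_sizes(effects):
    """Combine (d, n_male, n_female) tuples into a weighted mean d.

    Returns the weighted mean, its 95% confidence interval, the z-score
    for the mean, and the homogeneity statistic Q with its degrees of freedom.
    """
    weights = [1.0 / d_variance(d, n_m, n_f) for d, n_m, n_f in effects]
    ds = [d for d, _, _ in effects]
    sum_w = sum(weights)
    mean_d = sum(w * d for w, d in zip(weights, ds)) / sum_w
    se = math.sqrt(1.0 / sum_w)  # standard error of the weighted mean
    q = sum(w * (d - mean_d) ** 2 for w, d in zip(weights, ds))
    return {
        "mean_d": mean_d,
        "ci_low": mean_d - 1.96 * se,
        "ci_high": mean_d + 1.96 * se,
        "z": mean_d / se,
        "q": q,  # compared against chi-square with k - 1 df
        "df": len(effects) - 1,
    }
```

Larger samples receive larger weights, so a single large study can dominate the mean. A Q that exceeds the critical χ² value for k - 1 degrees of freedom indicates heterogeneity.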
As outlined in Linn and Petersen (1985), achieving homogeneity in a meta-analysis can
sometimes be difficult and achieving near-homogeneity may be more appropriate.
Therefore the criterion set out in Voyer, Postma, Brake, and Imperato-McGinley (2007) was
followed: if a homogeneity analysis is significant at the p = .05 level but not at the p = .005
level, then for practical purposes the sample can be considered near-homogenous and
further partitioning is not necessary.
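The decision rule above can be written as a small helper. This is only a sketch: the function name and the returned labels are hypothetical, and the p-value of the homogeneity test is assumed to have been obtained elsewhere.

```python
def classify_homogeneity(p_value):
    """Classify a set of effect sizes from the p-value of its homogeneity test.

    Not significant at .05                     -> homogeneous
    Significant at .05 but not at .005         -> near-homogeneous
    Significant at .005                        -> heterogeneous (partition further)
    """
    if p_value >= 0.05:
        return "homogeneous"
    if p_value >= 0.005:
        return "near-homogeneous"
    return "heterogeneous"
```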
A fail-safe analysis will also be conducted to investigate a potential publication bias in the
literature where non-significant results are not published, called the file drawer problem.
This calculation gives insight into how many studies with non-significant results must be
conducted for the weighted d to become zero. The method employed is outlined in
Rosenthal (1979). If an effect size is found to be resistant to the file drawer problem it
suggests that a publication bias has not occurred.
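As an illustration, Rosenthal's (1979) fail-safe number is commonly computed from the individual z-scores of the k studies, and judged against a tolerance level of 5k + 10. Whether the thesis used exactly this form and this criterion for "resistant" is an assumption here, and the function names are hypothetical.

```python
def fail_safe_n(z_scores, z_crit=1.645):
    """Rosenthal-style fail-safe number from each study's z-score.

    z_crit is the one-tailed critical z at p = .05 (1.645 squared is about 2.706).
    """
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_crit ** 2 - k


def is_resistant(n_fs, k):
    """Assumed criterion: the fail-safe number exceeds the 5k + 10 tolerance level."""
    return n_fs > 5 * k + 10
```

With ten studies that each have z = 2.0, the fail-safe number is roughly 138, which exceeds the tolerance level of 60.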
Additional analyses
The data from the meta-analysis provides a unique opportunity to investigate certain
variables of the Stroop task, and so some additional exploratory analyses will take place.
First, the effect of age will be investigated as it has previously been found that performance
decreases with age (Ben-David & Schneider, 2009). Second, the CW data will be partitioned
into Stroop version to see if gender differences vary depending on which version of the
Stroop task is administered. Third, effect sizes from different cultures will be compared.
Fourth, a separate, smaller meta-analysis will be conducted on gender differences in the
negative priming CW Stroop task. Because these analyses are exploratory they will not be
partitioned further should they be heterogeneous unless it is relevant to the main analysis.
RESULTS
Main analysis: Gender differences in CW Stroop
The analysis of the 126 effect sizes revealed a mean weighted effect size of d = 0.121 (z =
8.577, p <.0001), showing an overall female advantage in the Stroop CW task, considered a
small effect size. The effect size was also resistant to the file drawer problem. However, the
effect size was not homogenous, χ²(125) = 303.714, p < .0001. Thus, the effect sizes in the
sample were not drawn from the same population, and further partitioning is required.
The effect sizes were partitioned into four age categories: under 13, 13-18, 19-64, and 65
years and older. If a study reported only the age range then the median age of the
range was used. Studies with an “undergraduate” or “adult” sample were put in the 19-64
age category. The age of the elderly category was based on the average retirement age of
countries included.
The partitioning resulted in significant between-group heterogeneity, χ²(3) = 16.124, p
<.0001, indicating a significant relation between age and gender on CW performance. The
results of this partitioning are summarised in Table 3. All groups showed a significant female
advantage, but only the younger than 13 group and the 13-18 group achieved homogeneity.
Thus, the adult (19-64) and elderly (65 and older) groups required further partitioning.
Table 3: Summary of meta-analytic statistics of gender differences in the CW Stroop subtask as a function of
age category

Age category      k     Weighted effect size (d)   Fail-safe   Test of            Homogeneity
                        with 95% CI                number      significance (Z)   statistic (χ²)
Overall           126   0.121 (0.093-0.149)        4051 ғ      8.577*             303.714*
Younger than 13   23    0.272 (0.161-0.387)        0           4.793*             24.941†
13-18             9     0.120 (0.031-0.208)        65 ғ        2.652*             14.777†
19-64             63    0.153 (0.112-0.195)        687 ғ       7.196*             193.699*
Older than 64     31    0.063 (0.019-0.107)        414 ғ       2.827*             54.328*

* p <.05
† homogeneity achieved
ғ resistant to the file drawer problem

The adult sample was partitioned by Stroop version used in the study. All modifications that
did not fit into any known Stroop version (Table 1), as well as versions only occurring once
(like Bohnen), were classified as “Other/Custom”. The partitioning is summarised in Table 4.
The partitioning resulted in significant between-group heterogeneity, χ²(7) = 102.105, p
<.0001, indicating a significant relationship between adulthood and Stroop version on CW
performance. Except for the Trenerry and Victoria versions, all named Stroop versions showed a
significant female advantage, and all achieved homogeneity or near-homogeneity. The Other/Custom
category showed a large male advantage, though not surprisingly this was heterogeneous.
Table 4: Summary of meta-analytic statistics of gender differences in the CW Stroop subtask for different
Stroop versions in the adult age category (ages 19-64).

Stroop version   k    Weighted effect size (d)    Fail-safe   Test of            Homogeneity
                      with 95% CI                 number      significance (Z)   statistic (χ²)
Single trial     6    0.501 (0.323-0.679)         0           5.527*             11.004†
Comalli          4    0.208 (0.049-0.368)         0           2.556*             4.419†
Daniel           2    0.982 (0.504-1.459)         0           4.029*             0.172†
Golden           14   0.075 (0.006-0.145)         381 ғ       2.125*             10.689†
Stroop           27   0.278 (0.210-0.346)         43          8.059*             47.655‡
Trenerry         2    0.032 (-0.126-0.189)        6           0.393              0.516†
Victoria         5    0.278 (-0.049-0.604)        66 ғ        1.668              3.641†
Other/Custom     3    -0.727 (-0.945-(-0.509))    0           -6.537*            13.496*

* p <.05
† homogeneity achieved
‡ near-homogeneity achieved
ғ resistant to the file drawer problem
The elderly sample was also partitioned by Stroop version used in the study. The
partitioning is summarised in Table 5. The partitioning resulted in significant between-group
heterogeneity, χ²(3) = 16.791, p <.001, indicating a significant relationship between
old age and Stroop version on CW performance. The Stroop and Comalli categories showed a
significant female advantage and achieved homogeneity, while the Golden version did not
reach significance.
Table 5: Summary of meta-analytic statistics of gender differences in the CW Stroop subtask for different
Stroop versions in the elderly age category (65 years and older).

Stroop version   k    Weighted effect size (d)   Fail-safe   Test of            Homogeneity
                      with 95% CI                number      significance (Z)   statistic (χ²)
Comalli          11   0.165 (0.058-0.272)        40 ғ        3.027*             1.800†
Golden           11   0.005 (-0.048-0.058)       0           0.200              27.388*
Stroop           4    0.241 (0.109-0.373)        33          3.583*             6.935†
Other/Custom     5    0.202 (-0.002-0.405)       12          1.939              1.257†

* p <.05
† homogeneity achieved
ғ resistant to the file drawer problem
Additional analysis: Cross-cultural
Only 8 studies reported the ethnicity of the sample and it was therefore more convenient to
group the effect sizes into four continents: North America, Europe, Asia, and Africa (no
studies were available from South America). The results are summarised in Table 6.
A significant female advantage was found in the North American, European, and Asian
samples, but none of them achieved homogeneity.
Table 6: Summary of meta-analytic statistics of gender differences in the CW Stroop subtask as a function of
continental participant sample.

Continent       k    Weighted effect size (d)   Fail-safe   Test of            Homogeneity
                     with 95% CI                number      significance (Z)   statistic (χ²)
Africa          2    0.261 (-0.086-0.608)       33 ғ        1.475              0.529†
Asia            17   0.114 (0.032-0.197)        0           2.714*             66.859*
Europe          65   0.107 (0.075-0.139)        640 ғ       6.573*             144.804*
North America   42   0.313 (0.237-0.390)        951 ғ       8.023*             97.107*

* p <.05
† homogeneity achieved
ғ resistant to the file drawer problem
Additional analysis: Stroop version
The effect sizes were grouped by Stroop version as outlined in Table 1 (no
studies used the Graf or Dodrill version). Some were classified into the appropriate version
based on the description of the CW condition. For instance, Alansari and Baroun (2004) used
the Golden version, but their study is better classified as Comalli because they measured
reaction time rather than number of items named.
The results are shown in Table 7. All versions showed a significant female advantage except
the Kaplan, Trenerry, and Victoria versions, which did not reach significance, and the
Malayalam version, which showed a significant male advantage that was not homogenous.
All of the significant female advantages achieved homogeneity or near-homogeneity, aside
from the Other/Custom category.
Table 7: Summary of meta-analytic statistics of gender differences in the CW Stroop subtask as a function of
Stroop version used.

Stroop version   k    Weighted effect size (d)    Fail-safe   Test of            Homogeneity
                      with 95% CI                 number      significance (Z)   statistic (χ²)
Single trial     7    0.476 (0.304-0.648)         0           5.433*             11.038†
Comalli          10   0.231 (0.123-0.338)         107 ғ       4.197*             10.515†
Daniel           3    0.478 (0.169-0.787)         0           3.034*             6.839‡
Golden           29   0.043 (0.005-0.082)         378 ғ       2.190*             49.319‡
Kaplan           3    0.222 (-0.104-0.548)        3           1.333              0.325†
Malayalam        3    -0.455 (-0.678-(-0.232))    0           -3.999*            15.323*
Stroop           43   0.234 (0.180-0.288)         46          8.488*             63.294‡
Trenerry         5    0.030 (-0.105-0.166)        0           0.436              8.653†
Victoria         10   0.161 (-0.004-0.326)        265 ғ       1.909              6.119†
Other/Custom     13   0.160 (0.017-0.303)         0           2.193*             49.967*

* p <.05
† homogeneity achieved
‡ near-homogeneity achieved
ғ resistant to the file drawer problem
Additional meta-analysis: Negative priming
Studies on negative priming Stroop tasks that included gender differences were rare. Only
three studies reporting gender data from healthy participants were identified (Christiansen
& Oades, 2010; Steel, Hemsley, & Jones, 1996; Visser, Das-Smaal, & Kwakman, 1996). A
fourth unpublished study was identified but the author reported the data lost (Harnishfeger,
Pope, & Kirijan, 1995).
The analysis found a weighted effect size d = 0.386 (0.148-0.625), though this was not
significant, z = 1.702, p = .09, nor resistant to the file drawer problem.
Table 8: Studies with gender effect size data employing a negative priming version of Stroop. A positive value
signifies a female advantage, and a negative value shows a male advantage. M = male participants, F = female
participants, d = effect size Cohen’s d.

Study                                    Stroop version   Country          Mean age   M     F     d
Christiansen and Oades (2010)            Single trial     Germany          11         15    22    0.48
Steel, Hemsley, and Jones (1996)         Single trial     United Kingdom   28         19    17    0.51
Visser, Das-Smaal, and Kwakman (1996)    Stroop           Netherlands      10         98    112   0.35*
k = 3                                                                      N:         132   151

* significant gender difference
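The weighted effect size for the three studies in Table 8 can be approximated with a short fixed-effect sketch. It assumes inverse-variance weighting and the usual large-sample variance approximation for Cohen's d; both are assumptions made for illustration, and the function names are hypothetical. Because the d values in Table 8 are rounded, small deviations from the reported d = 0.386 (0.148-0.625) are expected. The sketch does not attempt to reproduce the reported z value, whose computation is not restated in this section.

```python
import math

# (study, n_male, n_female, d) as listed in Table 8
STUDIES = [
    ("Christiansen & Oades (2010)", 15, 22, 0.48),
    ("Steel et al. (1996)", 19, 17, 0.51),
    ("Visser et al. (1996)", 98, 112, 0.35),
]


def d_variance(n1, n2, d):
    """Large-sample variance approximation for Cohen's d (an assumption here)."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def fixed_effect_summary(studies):
    """Return the weighted mean d, its 95% CI, and the homogeneity statistic Q."""
    weights = [1.0 / d_variance(n1, n2, d) for _, n1, n2, d in studies]
    effects = [d for _, _, _, d in studies]
    w_sum = sum(weights)
    d_bar = sum(w * d for w, d in zip(weights, effects)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    ci = (d_bar - 1.96 * se, d_bar + 1.96 * se)
    # Q is compared against a chi-square distribution with k - 1 degrees of freedom.
    q = sum(w * (d - d_bar) ** 2 for w, d in zip(weights, effects))
    return d_bar, ci, q


d_bar, (ci_low, ci_high), q = fixed_effect_summary(STUDIES)
print(f"weighted d = {d_bar:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f}), Q = {q:.3f}")
```

The large Visser et al. (1996) sample dominates the weights, which is why the pooled estimate sits close to its effect size of 0.35.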
DISCUSSION
A significant female advantage was found on the CW subtask, which persisted across age
groups, cultures, and Stroop versions. However, the female advantage appeared to vary by
age, actually showing the largest gender difference in pre-puberty. This is contrary to the
evolved inhibition hypothesis (Bjorklund & Kipp, 1996) where we would expect a significant
female advantage to only occur after puberty when pregnancy becomes possible. It may
therefore seem more likely that the female advantage is due to increased verbal abilities.
That girls develop language and verbal abilities earlier than boys supports this notion
(Burman et al., 2008).
Investigating the effects of Stroop versions suggested that the female advantage largely
depended on which Stroop version was used. Specifically, the more detail in the
measurement the bigger the difference becomes. The Golden and Trenerry versions, which
measure number of correctly named colours, showed practically no gender difference. By
contrast, the Stroop, Comalli, Daniel, Kaplan, and Victoria versions record reaction times,
which is arguably a more accurate measurement, and effect sizes varied from d = 0.161 to d
= 0.478, though the Kaplan and Victoria versions did not reach significance. An interesting
observation is that the effect size practically did not differ between the Stroop, Comalli, and
Kaplan versions (d = 0.234, 0.231, and 0.222, respectively), which is to be expected because
the CW condition is identical in all three versions. Finally, the single trial version where only
one word is presented at a time showed the largest significant female advantage (d = 0.476),
presumably because this measurement is even more accurate, measuring milliseconds.
The investigation of cross-cultural effects showed a significant female advantage across
North America, Europe, and Asia. These did not reach homogeneity, most likely due to
different Stroop versions used. Alternatively it may be due to the different languages
employed. An additional analysis based on language is not ideal because most studies came
from the US, Netherlands, or India. An analysis based on ethnicity would be ideal, but this
was rarely reported.
Finally, the negative priming analysis showed a female advantage, though this was not
significant as it was based on only three studies.
STUDY 2: NEGATIVE PRIMING STROOP TASK
Previous research and justification
The negative priming version of the Stroop task was first used by Neill (1977). In this version
of the task the colour to-be-named on one trial is identical to the ignored colour in the
preceding trial. This creates what Tipper (1985) called the negative priming effect, and this
slows down response speed because the task adds an additional interference process (Neill
& Westberry, 1987).
Because the negative priming trials isolate the inhibition mechanism (Tipper et al., 1989),
comparing performance on CW and negative priming trials should give a more accurate
measure of inhibition. Such a comparison has never been reported elsewhere with gender
differences in mind. If this negative priming interference difference is greater in men than in
women this would support the evolved inhibition hypothesis (Bjorklund & Kipp, 1996), but if
no difference is found it seems more likely that the female advantage is due to superior
verbal abilities in women.
As stated in Study 1, only three studies reported gender data on healthy participants using
the negative priming version (Christiansen & Oades, 2010; Steel et al., 1996; Visser et al.,
1996). Only Steel et al. (1996) tested an adult population, and found a non-significant
female advantage with a moderate effect size (d = 0.51).
In order to increase accuracy the Stroop task will be expanded to include more trials than
are used traditionally. Most Stroop tasks (see Table 1) only have one trial consisting of
20-100 items. A new version is proposed, consisting of 900 items spread over 30 trials using
both the colour-word condition and the negative priming condition, allowing the calculation
of a more accurate average performance. Errors will also be recorded to investigate if any
gender difference found is due to making more uncorrected errors.
METHOD
Participants
Participants were 64 adults ranging in age from 18 to 73. There were 32 males (mean age 33.2)
and 32 females (mean age 30.8). Males and females did not significantly differ in age, t (62)
= 0.645, p =.51 and can be considered homogenous.
Apparatus
The Stroop stimuli were presented using Microsoft PowerPoint. Reaction time was
measured using a stopwatch.
Material
A trial consisted of a PowerPoint slide with 30 colour-words presented, where the word was
printed in a conflicting ink colour (e.g. the word “red” written in blue ink). The colours RED,
BLUE, GREEN, WHITE, and BROWN were used. The words were in the font Times New
Roman, size 46. Each word appeared five times in every trial, and each ink colour six
times. For every created trial, the order of the words was randomised, as was the ink colour.
After this randomisation every trial was manually corrected to ensure that no congruent
items had arisen by chance and that no colour-word combination or ink colour was
repeated in a row.
A total of 30 trials were created (total of 900 items), which were split into two types: the
colour-word (CW) trials, and the negative priming (NP) trials. These were identical except
in one respect: in the NP trials the colour to-be-named in one trial was identical to the
colour to-be-ignored in the previous trial (excluding the first trial). An example of each trial
is shown in the Appendix. There were 15 CW trials and 15 NP trials. The 15 trials showed
high reliability with both CW, α = .977, and NP, α = .980.
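The constraints on the stimulus slides can be expressed as simple checks. The sketch below is illustrative only: it assumes that the negative priming rule applies to consecutive items within a slide (each ink colour equals the word ignored on the preceding item), and the function names are hypothetical. The generator does not balance word and ink frequencies, which the original slides did, so it demonstrates only the sequencing rule.

```python
import random

COLOURS = ["red", "blue", "green", "white", "brown"]


def is_valid_cw(items):
    """Check a list of (word, ink) pairs: no congruent item, no ink repeated in a row.

    A repeated colour-word combination implies a repeated ink colour,
    so the ink check covers both constraints.
    """
    for i, (word, ink) in enumerate(items):
        if word == ink:
            return False
        if i > 0 and ink == items[i - 1][1]:
            return False
    return True


def is_negative_priming(items):
    """True if every ink colour equals the word of the preceding item."""
    return all(items[i][1] == items[i - 1][0] for i in range(1, len(items)))


def make_np_sequence(n_items=30, rng=None):
    """Build an unbalanced NP sequence by chaining each ink to the previous word."""
    rng = rng or random.Random()
    word = rng.choice(COLOURS)
    ink = rng.choice([c for c in COLOURS if c != word])
    items = [(word, ink)]
    for _ in range(n_items - 1):
        ink = items[-1][0]  # the colour ignored on the previous item
        word = rng.choice([c for c in COLOURS if c != ink])  # keep the item incongruent
        items.append((word, ink))
    return items


sequence = make_np_sequence(30, random.Random(1))
print(is_valid_cw(sequence), is_negative_priming(sequence))
```

Because every item is incongruent, chaining the ink to the previous word automatically prevents the same ink colour from appearing twice in a row.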
Design
The experiment was a 2 x 2 mixed design with sex as the between-subjects variable and
Stroop type as the within-subjects variable. For the Stroop type there were two conditions:
CW and NP. The dependent variables were the time taken to name all the colours in a trial
(measured in seconds and centiseconds), number of errors made but corrected by the
participant (corrections), and number of errors made that were not corrected (full errors).
There were a total of 30 trials, 15 CW trials and 15 NP trials. The order of the trials was
randomised for every 16th participant.
Procedure
Prior to the experiment participants indicated by self-report that they had no colour
vision deficits. The experiment was presented on a computer, and the onset of the trials was
controlled by the experimenter.
Participants were instructed to name the ink colour of the printed words as quickly and as
accurately as possible. As practice they were given 10 CW items. If they realised they made
an error they were told to correct themselves. Between every trial the experimenter would
note the completion time before starting the next trial. After every 10 trials there was a short
break, lasting approximately 1 minute.
RESULTS
One male participant was removed from further analysis because he was unable to
distinguish between green and brown, despite insisting that he could clearly tell the
difference when prompted. For the remaining 63 participants a mean reaction time was
calculated for each participant for the CW and NP trials, and these are summarised in Figure
1. A 2 x 2 mixed ANOVA, with gender as the between-subjects variable and Stroop type as
the within-subjects variable, found a main effect of Stroop type, F (1, 61) = 154.138, p <.001,
ηp² = .716, suggesting that participants performed worse in the NP condition. A main effect
of gender was also found, F (1, 61) = 4.162, p < .05, ηp² = .064, suggesting an overall female
advantage. Follow-up two-tailed independent samples t-tests revealed a significant
female advantage in the CW condition, t (61) = 2.005, p < .05, d = 0.505, and in the NP
condition, t (61) = 1.981, p < .05, d = 0.512. No significant interaction was found between
sex and Stroop condition, F (1, 61) = 0.330, p = .568, ηp² = .005.
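For readers who wish to check the effect sizes, partial eta squared can be recovered from a reported F ratio and its degrees of freedom. The identity used below is standard for an effect tested against its own error term; the function name is hypothetical, and the inputs are the F values reported above for the main effects of Stroop type and gender.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio: SS_effect / (SS_effect + SS_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


eta_stroop_type = partial_eta_squared(154.138, 1, 61)
eta_gender = partial_eta_squared(4.162, 1, 61)
print(round(eta_stroop_type, 3), round(eta_gender, 3))
```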
In terms of errors, the mean (standard deviation) of corrected errors was 11.1 (9.2) for
males and 9.6 (5.7) for females. For full errors males made 3.1 (2.6) errors and females
made 2.9 (2.9).
A 2 x 2 mixed ANOVA, with sex as the between-subjects variable and error type as the
within-subjects variable, found a main effect of error type, F (1,61) = 70.749, p < .001, ηp² =
.537, showing that participants made more corrections than full errors. No main effect of
sex was found, F (1,61) = 0.605, p = .440, ηp² = .010, nor any significant interaction, F (1,61) =
0.540, p = .465, ηp² = .009.
[Figure 1 plot: reaction time in seconds (y-axis, 24.00-33.00) by Stroop condition (x-axis: Colour-Word, Negative Priming), shown separately for males and females.]
Figure 1: Mean reaction time to a 30-item trial grouped by Colour-Word (CW) and Negative Priming (NP) trials.
The mean (standard deviation) scores for males were 29.43 (7.52) for CW and 32.45 (8.13) for NP. For females they
were 26.36 (4.16) for CW and 29.23 (4.44) for NP.
Age correlated significantly with performance in both the CW condition, r = .386, p <.002,
and the NP condition, r = .299, p <.02, suggesting that performance decreased (higher RT) as
age increased. Age did not correlate with number of corrections, r = .137, p = .286, or full
errors, r = -.039, p = .759.
DISCUSSION
A significant female advantage was found in both CW and NP conditions, equivalent to a
moderate effect size. The observed NP effect size was almost identical to the effect size
found by Steel et al. (1996), which is the only other study to report negative priming gender
data from an adult sample. That no differences were found in number of errors also
suggests that the overall female advantage was not due to a speed/accuracy trade-off
where women were faster because they either made fewer corrections or more full errors.
As predicted, the NP condition was harder than the CW condition, with participants being
overall three seconds slower. The absence of a significant interaction suggests that women
and men suffered equally in the NP condition compared to the CW condition. This does not
support the evolved inhibition hypothesis (Bjorklund & Kipp, 1996). Most likely men and women
showed an equal amount of inhibition, but women still outperformed men due to their
superior verbal abilities (Lee et al., 2004; Watson & Kimura, 1991).
An interesting observation to note is that adding the negative priming effect size found in
Study 2 to the negative priming meta-analysis in Study 1 will render the meta-analysis
significant, z = 1.966, p = 0.049, and also homogenous, with an effect size of d = 0.409,
signifying a moderate female advantage.
STUDY 3: GO/NO-GO TASK
Previous research and justification
The stop-signal task usually involves simply clicking a button in one trial (go trials) and
withholding the response to another trial (stop trial). Often the task is used as a distracter
task in neurological studies and the stimuli tend to be very simple, such as using circles (Li et
al., 2009) or a letter such as an X or O (Rucklidge & Tannock, 2002).
Gender data are rare, but the study by Roberts, Newell, Simoes-Franklin, and Garavan
(2008) is highly relevant to the evolved inhibition hypothesis. They found that women in the
follicular phase showed increased inhibition to pictures of men, but not to women. This
suggests that when women are especially susceptible to pregnancy their inhibition skills
increase, but only to male stimuli. Of interest, however, is whether women show more
inhibition than men on a stop-signal task with basic stimuli such as geometric objects that
are unrelated to reproduction. Only four studies have reported gender data in a stop-signal
task, and all found no difference (Li et al., 2006; 2009; Rucklidge & Tannock, 2002; Thakkar
et al., in press).
A stop-signal task slightly modified from Li et al. (2009) is proposed. Their stimuli consisted
of only a circle (go trial), which sometimes turned into an X (stop trial). In the current
experiment four basic geometric objects will be used (square, circle, triangle, diamond), each
of which is sometimes accompanied by an X to indicate a stop trial.
Even though the stop-signal task involves active cognitive inhibition, it involves motor
movement through button pushing, and it may arguably be better classified as a cognitive-behavioural
or motor-inhibition experiment. According to Bjorklund and Kipp (1996), such
tasks tend to show a greater female advantage compared to tasks without motor movement
(such as Stroop).
Alexander, Packard and Peterson (2002) suggested that women process stimuli in the right
visual field more effectively than men. This would mean that if a female advantage is found,
it could be due to a superior performance by women in STOP trials that have an X to the right.
The stop trials will therefore also be analysed by visual field to assess if a right visual field
advantage in women accounts for any observed female advantage in the experiment.
METHOD
Participants
There were 66 participants, 33 of whom were male and 33 of whom were female. The
mean age was 24.2 for males and 25.6 for females, and the sample was homogenous, t (64)
= 1.421, p = .160.
Apparatus
SuperCard 4.5 was used to program the experiment.
Material
A trial in the experiment consisted of one image of either a GO trial or a STOP trial. A GO
trial was a picture of a square, rectangle, circle, or diamond. All of the shapes were in the
colour blue presented in the centre of a white background. An example is illustrated in
Figure 2. Their diameter was between 5 and 7 cm depending on the stimulus. A STOP trial was
identical to a GO trial except that an X was presented next to the shape, either to the right
or the left. There were 80 GO trials and 20 STOP trials.
Design
In the experiment sex was the between-subject variable. The dependent measure was
reaction time for the GO trials, and number of STOP trials without a response for both left
and right visual field. The order of the GO and STOP trials was randomised for each
participant.
Figure 2: Example of a GO trial and STOP trial. In the GO trials participants must click the button as soon as
the stimulus appears on the screen, while in the STOP trials participants must withhold their response.
Procedure
Participants received instructions telling them to rest their finger on the “B” button (which is
suitable for right and left handed participants) on the keyboard and press it as fast as
possible when a GO trial appears. Participants were told to do nothing when a STOP trial
appeared. There were 8 practice trials before the experiment started, consisting of six GO
trials and two STOP trials.
In the 80 GO trials the stimulus was presented for 2000 ms. A response slower than 1000 ms
would display the message “too slow”, and if there was no response the message “You
failed to respond” appeared. If the participant responded within 1000 ms the message “well
done” appeared on the screen. In a STOP trial the message “well done” appeared if no
response was given within 2000 ms, and the message “incorrect” if the participant clicked
the button at any time.
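The feedback rules described above can be summarised as a small decision function. The experiment itself was programmed in SuperCard, so the Python below is only an illustrative restatement; the function name and the convention of passing None for a missing response are assumptions.

```python
def feedback(trial_type, rt_ms):
    """Return the on-screen message for one trial.

    trial_type: "GO" or "STOP".
    rt_ms: response time in milliseconds, or None if no response
           was made during the 2000 ms presentation.
    """
    if trial_type == "GO":
        if rt_ms is None:
            return "You failed to respond"
        if rt_ms > 1000:
            return "too slow"
        return "well done"
    if trial_type == "STOP":
        # Any response at all on a STOP trial counts as a failure to inhibit.
        return "well done" if rt_ms is None else "incorrect"
    raise ValueError(f"unknown trial type: {trial_type}")


print(feedback("GO", 450), "|", feedback("STOP", 450))
```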
RESULTS
One female participant was discarded because the number of successful STOP trials was
more than four standard deviations below the mean. An average reaction time was
computed for each of the remaining 65 participants. These are summarised for men and
women in Table 9, along with the average number of trials successfully inhibited. In terms of
responses on the GO trials it was relatively rare to respond too slowly (after 1000ms): across
all participants there was a successful response rate of 98.07%.
Table 9: Descriptive statistics for the Stop-Signal task. Reported are means (standard deviations).

Measurement                                       Males          Females        Overall
No. successful GO trials                          79.39 (1.25)   77.41 (7.33)   78.42 (5.27)
No. successful STOP trials                        16.85 (3.54)   18.53 (1.32)   17.68 (2.79)
No. successful STOP trials - LEFT visual field    8.58 (1.62)    9.47 (0.92)    9.02 (1.39)
No. successful STOP trials - RIGHT visual field   8.27 (2.014)   9.06 (1.05)    8.66 (1.73)
Mean RT on GO trials (milliseconds)               458 (75)       464 (44)       461 (61)
Regarding reaction time, men and women did not significantly differ in their response time,
t (63) = -.334, p = .740, d = -0.083. However, with regard to the number of STOP trials successfully
inhibited, women performed significantly better than men, t (63) = -2.526, p < .02, d = 0.626.
This suggests that women were able to inhibit their responses more effectively, and this was
not due to a speed/accuracy trade-off because men and women did not differ in reaction
time on the GO trials.
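The effect size for successful STOP trials can be approximated from the summary statistics in Table 9. The sketch assumes a pooled-standard-deviation Cohen's d and group sizes of 33 males and 32 females (one female participant having been discarded); the function names are hypothetical, and small discrepancies from the reported values are expected because the table entries are rounded.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


def t_from_d(d, n_1, n_2):
    """Independent-samples t statistic implied by d and the two group sizes."""
    return d * math.sqrt(n_1 * n_2 / (n_1 + n_2))


# Successful STOP trials from Table 9: females first, so a positive d is a female advantage.
d_stop = cohens_d(18.53, 1.32, 32, 16.85, 3.54, 33)
t_stop = t_from_d(d_stop, 32, 33)
print(round(d_stop, 3), round(t_stop, 3))
```

The sign of t depends only on which group is entered first; the magnitude is what should be compared with the reported statistic.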
To investigate differences in visual field perception, a 2 x 2 mixed ANOVA was conducted
with sex as the between-subject variable and left/right STOP trial as the within-subject
variable. A significant main effect of visual field was found, F (1, 63) = 4.087, p = .048, ηp² =
.061, suggesting that both genders were more accurate on STOP trials in the left visual field
compared to the right. A main effect of sex was also found, F (1, 63) = 6.381, p = .014, ηp² =
.092, but no interaction, F (1, 63) = .086, p = .770, ηp² = .001. This suggested an overall
female advantage regardless of visual field.
DISCUSSION
Women successfully inhibited their response more often than men. As there was no
difference in reaction time on the GO trials, it suggests that the female advantage is not due
to women being slower and taking longer to react. Furthermore, females outperformed
males in both left and right visual fields, and both genders found left field stop signals
easier. The observed female advantage cannot be explained as a result of increased female
processing in the right visual field (Alexander et al., 2002).
The results suggest that women were able to suppress their motor responses more
effectively than men. This supports the evolved inhibition hypothesis. However, some of the
variance may be accounted for by verbal abilities: The X in the stop trials may have been
processed faster by women because they are known to have better verbal fluency (Weiss,
Ragland, Brensinger, Bilker, Deisenhammer, & Delazer, 2006). An interesting future study
would be to repeat the experiment, but instead of using an X instruct participants to
withhold responses to a specific geometric object. This would likely remove any issue of
verbal fluency.
However, that men and women did not differ in reaction time suggests that verbal abilities
can probably account for only a small amount of variance in the female advantage.
As the stop-signal task involves a finger movement it can arguably be better classified as
behavioural inhibition or motor inhibition, and Bjorklund and Kipp (1996) suggested that the
female inhibition advantage should be greater in such tasks compared to tasks with
more cognitive components such as the Stroop. Indeed, this does suggest that an evolved
female inhibition mechanism may exist, but it is weak in cognitive inhibition, moderate in
behavioural inhibition, and strong in social inhibition, exactly as suggested by Bjorklund and
Kipp (1996).
GENERAL DISCUSSION
Explaining the observed female advantage in the Stroop CW task and stop-signal task
The meta-analysis revealed a significant female advantage across all age groups. This does
not support the evolved inhibition hypothesis: if a superior inhibition mechanism has
evolved in women for reproductive purposes then this is unlikely to manifest before girls
reach puberty and are able to get pregnant. Additionally, in the negative priming Stroop task
men and women suffered equally in performance from the negative priming trials. As the
negative priming component isolates inhibition this result strongly suggests that the female
advantage observed in the CW task is not due to superior inhibition abilities, but rather
superior verbal abilities in women. That a female advantage exists even in children is most
likely the result of girls developing language and verbal skills faster than boys (Burman et al.,
2008). This hypothesis is also supported by Waber (1976), who found that early maturing
children perform better on a Stroop task than late maturing children. In addition, this
difference was greatest between late maturing boys and early maturing girls. Thus the
female advantage in the Stroop CW condition is most likely due to superior verbal abilities.
The hypothesis proposed by Broverman et al. (1968) is also clearly not supported as a male
advantage was not observed.
Another possible explanation for the female advantage is that females outperform males on
the Stroop task because they perceive colours slightly more accurately (Abramov, Gordon,
Feldman, & Chavarga, 2012b). However, this seems unlikely because the number of colour
combinations used in the Stroop task appears to have no effect on performance (Golden,
1974a; Logan, Zbrodoff, & Williamson, 1984). Furthermore, it has been suggested that
differences in the Stroop task occur at either the cognitive processing or output level, and
not at the perceptual level (MacLeod, 1991), making this explanation unlikely.
The evidence does not support the hypothesis that a female Stroop advantage is due to an
evolved inhibition mechanism. While there is some evidence that suggests that Stroop
performance is affected by a heritable component, this does not give any real insight into
how performance is affected by heritability. Three studies have found
strong correlations in performance between monozygotic (identical) twins reared together
(Friedman et al., 2008; Stins, van Baal, Polderman, Verhulst, & Boomsma, 2004), and reared
apart (Johnson, Bouchard Jr., Segal, Keyes, & Samuels, 2003). However, that males and
females show approximately the same correlation strength in performance only tells us that
a heritable component may be present, though it is not apparent what this component is.
The evolved inhibition hypothesis is partially supported by the female advantage found in
the stop-signal task. As predicted by Bjorklund and Kipp (1996), inhibition tasks with a
behavioural component such as a motor response are likely to create a larger female
advantage. Indeed, a moderate female advantage was found that could not be accounted
for by reaction times or verbal abilities, at least not in full. Most likely the effect size is also
higher due to fewer conflicting mental processes taking place, and it appears plausible to
assume that in fact the task may measure motor inhibition as opposed to cognitive
inhibition. It is unclear why the results from Study 3 differ from previous studies, but it may
be because the current version used more varied stimuli, and was also shorter in length. Li
et al.’s (2006; 2009) experiments lasted 40 minutes, and the task may instead be measuring
alertness rather than inhibition.
Evidence for the evolved inhibition mechanism is therefore weak in cognitive inhibition.
Indeed, as Bjorklund and Kipp (1996) themselves concluded, any such mechanism is likely
not domain-general and therefore appears to be largely absent in cognitive inhibition
experiments. Based on results from Roberts et al. (2008) as well as indirect social evidence
such as women being better at inhibiting sexual arousal (Chivers et al., 2010), the evidence
suggests that any evolved inhibition mechanism in women is only present in contexts related
to sex and reproduction, and to some degree in behavioural contexts such as motor tasks.
Indeed, studies that have looked at Stroop performance across the menstrual cycle have
not found any significant difference in performance between women in the follicular phase
and the luteal phase (Ioan, Sandulache, Avramescu, Ilie, Neascu, Zagrean, & Moldovan,
2007; Pehlivanoglu, Bayrak, Gurel, & Balkanci, 2012).
The effect of the Stroop version
The meta-analysis in Study 1 found that the female advantage depended on which version of
the Stroop task was utilised. Specifically, the more accurate the data a Stroop version
generates, the larger the female advantage becomes. The more traditional Stroop measures
of using 100 items on one card and recording completion time had significant, small effect
sizes around d = 0.23. Single trial versions, which measure milliseconds and are perhaps
even more accurate, showed a moderate female advantage with d = 0.476. By contrast,
both versions that count the number of correctly named colours showed practically no
gender difference at all (Golden, 1978; Trenerry et al., 1989). This finding may have a
profound impact on neurological studies where the Stroop task is often used to assess
cognitive functioning under the assumption that men and women do not differ in
performance.
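As an illustration only, the effect sizes discussed above can be read as standardised mean differences. The sketch below computes Cohen's d from group means and standard deviations using the pooled standard deviation. This is one common formulation and is assumed here, not taken from the analysis itself. The function name and the example values are hypothetical and are not data from Study 1. Completion times are used, so a positive d (male minus female) indicates a female advantage.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardised mean difference (group a minus group b) using the pooled SD."""
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Hypothetical completion times in seconds (illustrative values only):
# males: M = 62.0, SD = 10.0, n = 50; females: M = 60.0, SD = 10.0, n = 50.
d = cohens_d(mean_a=62.0, sd_a=10.0, n_a=50, mean_b=60.0, sd_b=10.0, n_b=50)
print(round(d, 2))  # 0.2, a small effect by Cohen's conventions
```

With equal standard deviations the pooled value equals the common standard deviation, so a two-second difference against a ten-second spread gives d = 0.2.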
One may argue that having single trials in the CW task creates less interference due to the
absence of other items in the display, but Izawa and Silver (1988) and Golden (1974a) found
that the number of items or colours in a trial appears to make no difference to performance.
Thus, the increased effect size is most likely due to more accurate measurements rather
than a reduced amount of conflicting colours. Furthermore, the moderate effect size from
Study 2 is higher than the female advantage observed in other reaction time Stroop versions
reported in Study 1. Most likely this is due to the length of the task, which generates a more
reliable and accurate estimate of performance compared to 100 items spread over one trial.
This strongly suggests that there is indeed a female advantage in the Stroop CW incongruent
condition, and this gender effect becomes larger with more accurate measurements.
Final remarks
The present findings indicate a female advantage in the Stroop task that is present across
ages and cultures. This advantage varies from small to moderate depending on the accuracy
of the measurements involved. The advantage is most likely due to increased verbal abilities
in females rather than superior inhibition abilities. This is supported by the finding that
children as well as adults show a female advantage, and that the observed female advantage
in Study 2 did not increase on negative priming trials.
Evidence for an evolved inhibition mechanism in females is weak in cognitive inhibition
tasks, where most studies have found no difference, and the observed female advantage in
Stroop is most likely due to verbal skills. However, as Bjorklund and Kipp (1996) predicted, inhibition
tasks with motor-behavioural components are more likely to show a female advantage, and
indeed this was found in the stop-signal task. A cognitive inhibition task made more
relevant to sex and reproduction would likely produce a larger female advantage.
REFERENCE LIST
Abramov, I., Gordon, J., Feldman, O., & Chavarga, A. (2012a). Sex & vision I: spatio-temporal
resolution. Biology of Sex Differences, 3(1), 1-14.
Abramov, I., Gordon, J., Feldman, O., & Chavarga, A. (2012b). Sex & vision II: color
appearance of monochromatic lights. Biology of Sex Differences, 3(1), 21-36.
Afsaneh, Z., Alireza, Z., Mehdi, T., Farzad, A., Reza, Z. M., Mehdi, M., & Mojtaba, K. S. (2012).
Assessment of selective attention with CSCWT (Computerized Stroop Color-Word
Test) among children and adults. US-China Education Review, A 1, 121-127.
Al-Ghatani, A. M., Obonsawin, M. C., & Al-Moutaery, K. R. (2010). The Arabic version of the
Stroop test and its equivalency to the English version. Pan Arab Journal of
Neurosurgery, 14(1), 112-115.
Al-Ghatani, A. M., Obonsawin, M. C., Binshaig, B. A., & Al-Moutaery, K. R. (2011). Saudi
normative data for the Wisconsin Card Sorting test, Stroop test, Test of Non-verbal
Intelligence-3, Picture Completion and Vocabulary (subtest of the Wechsler Adult
Intelligence Scale-Revised). Neurosciences (Riyadh), 16(1), 29-41.
Alansari, B., & Baroun, K. (2004). Gender and cultural performance on the Stroop color and
word test: a comparative study. Social Behavior and Personality, 32(3), 233-244.
Alexander, G. M., Packard, M. G., & Peterson, B. S. (2002). Sex and spatial positioning effects
on object location memory following intentional learning of object identities.
Neuropsychologia, 40, 1516-1522.
Amato, M. P., Portaccio, E., Goretti, B., Zipoli, V., Ricchiuti, L., De Caro, M. D., Patti, F.,
Vecchio, R., Sorbi, S., & Trojano, M. (2006). The Rao's Brief Repeatable Battery and
Stroop test: normative values with age, education and gender corrections in an
Italian population. Multiple Sclerosis, 12, 787-793.
Andrews, K. A. H. (2009). Normative indications for Xhosa-speaking unskilled workers on the
trail making test and the Stroop Test. Rhodes University.
Anstey, K. J., Matters, B., Brown, A. K., & Lord, S. R. (2000). Normative data on
neuropsychological tests for very old adults living in retirement villages and hostels.
The Clinical Neuropsychologist, 14(3), 309-317.
Asha, C. B. (1989). Constricted-flexible cognitive control as a function of creativity
and intelligence. Indian Journal of Community Guidance Service, 6(2), 49-59.
Asha, C. B. (1991). Effects of rearing on cognitive control. Psychological Studies, 36, 131-136.
Barbarotto, R., Laiacona, M., Frosio, R., Vecchio, M., Farinato, A., & Capitani, E. (1998). A
normative study on visual reaction times and two Stroop colour-word tests. The
Italian Journal of Neurological Sciences, 19(3), 161-170.
Baroun, K., & Alansari, B. (2006). Gender differences in performance on the Stroop test.
Social Behavior and Personality, 34(3), 309-318.
Beckmann, J. (unpublished). Maintaining and ending close relationships.
Ben-David, B. M., & Schneider, B. A. (2009). Sensory origin for Color-Word Stroop effects in
aging: a meta-analysis. Aging, Neuropsychology, and Cognition: A Journal on Normal
and Dysfunctional Development, 16(5), 505-534.
Beratis, I. N., Rabavilas, A., Papadimitriou, G. N., & Papageorgiou, C. (2010). Effect of
handedness on the Stroop Colour Word task. Laterality, 15(6), 597-609.
Berg, E. A. (1948). A simple objective technique for measuring flexibility in thinking. The
Journal of General Psychology, 39(1), 15-22.
Bettner, L. G., Jarvik, L. F., & Blum, J. E. (1971). Stroop color-word test, non-psychotic
organic brain syndrome, and chromosome loss in aged twins. Journal of Gerontology,
26(4), 458-469.
Bjorklund, D. F., & Kipp, K. (1996). Parental investment theory and gender differences in the
evolution of inhibition mechanisms. Psychological Bulletin, 120(2), 163-188.
Bohnen, N., Jolles, J., & Twijnstra, A. (1992). Modification of the Stroop Color Word Test
improves differentiation between patients with mild head injury and matched
controls. The Clinical Neuropsychologist, 6(2), 178-184.
Boone, K. B., Ghaffarian, S., Lesser, I. M., Hill-Gutierrez, E., & Berman, N. G. (1993).
Wisconsin Card Sorting Test performance in healthy, older adults: relationship to
age, sex, education, and IQ. Journal of Clinical Psychology, 49(1), 54-60.
Broverman, D. M., Klaiber, E. L., Kobayashi, Y., & Vogel, W. (1968). Roles of activation and
inhibition in sex differences in cognitive abilities. Psychological Review, 75(1), 23-50.
Buck, S. M., Hillman, C. H., & Castelli, D. M. (2008). The relation of aerobic fitness to Stroop
task performance in preadolescent children. Medicine and Science in Sports and
Exercise, 41(1), 166-172.
Burman, D. D., Bitan, T., & Booth, J. R. (2008). Sex differences in neural processing of
language among children. Neuropsychologia, 46(5), 1349-1362.
Chipman, K., & Kimura, D. (1998). An investigation of sex differences on incidental memory
for verbal and pictorial material. Learning and Individual Differences, 10(4), 259-272.
Chivers, M. L., Seto, M. C., Lalumiere, M. L., Laan, E., & Grimbos, T. (2010). Agreement of
self-reported and genital measures of sexual arousal in men and women: a
meta-analysis. Archives of Sexual Behaviour, 39(1), 5-56.
Christiansen, H., & Oades, R. D. (2010). Negative priming within a Stroop task in children and
adolescents with Attention-Deficit Hyperactivity Disorder, their Siblings, and
independent controls. Journal of Attention Disorders, 13(5), 497-504.
Clutton-Brock, T. H. (2007). Sexual selection in males and females. Science, 318, 1882-1885.
Cohen, A. S., & Fischer, H. (1980). Sex-specific processing of contradictory information by
elementary school pupils. Zeitschrift für Experimentelle und Angewandte
Psychologie, 27(1), 59-71.
Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences. New York: Academic
Press.
Comalli, P. E., Wapner, S., & Werner, H. (1962). Interference effects of Stroop Color-Word
test in childhood, adulthood, and aging. The Journal of Genetic Psychology, 100, 47-53.
Connor, A., Franzen, M., & Sharp, M. D. (1988). Effects of practice and differential
instructions on Stroop performance. International Journal of Clinical
Neuropsychology, 10(1), 1-4.
Daniel, D. B., Pelotte, M., & Lewis, J. (2000). Lack of sex differences on the Stroop
color-word test across three age groups. Perceptual and Motor Skills, 90, 483-484.
Daniel, J. (1972). Perceptual conflict as test of mental load and personality. Studia
Psychologia, 3, 237-238.
Davies, P. L., & Rose, J. D. (1999). Assessment of cognitive development in adolescents by
means of neuropsychological tasks. Developmental Psychology, 15(2), 227-248.
Davis, W. P., Jorgenson, C. B., Kritselis, A., & Opella, J. (1981). Hemispheric asymmetry in the
processing of Stroop stimuli: the effect of enhancement of spatial skills.
International Journal of Neuroscience, 15(3), 179-183.
de Grip, A., Bosma, H., Willems, D., & Van Boxtel, M. P. J. (2008). Job-worker mismatch and
cognitive decline. Oxford Economic Papers, 60(2), 237-253.
Delis, D., Kaplan, E., & Kramer, J. (2001). Delis-Kaplan Executive Function System. San
Antonio: The Psychological Corporation.
Dodrill, C. B. (1978). A neuropsychological battery for epilepsy. Epilepsia, 19(6), 611-623.
Ellis, P. D. (2010). The Essential Guide to Effect Sizes. Cambridge: Cambridge University
Press.
Esgalhado, G., & Pereira, H. (2012). Efeito do género e da escolaridade no teste Stroop: da
infância à adultez jovem. International Journal of Developmental and Educational
Psychology, 1(2), 77-85.
Friedman, N. P., Miyake, A., Young, S. E., DeFries, J. C., Corley, R. P., & Hewitt, J. K. (2008).
Individual differences in executive functions are almost entirely genetic in origin.
Journal of Experimental Psychology: General, 137(2), 201-225.
Fuster, J. M. (1984). The prefrontal cortex and temporal integration. In A. Peters & E. G.
Jones (Eds.), Cerebral Cortex: Association and Auditory Cortices (Vol. 4, pp. 151-177).
New York: Plenum Press.
Golden, C. J. (1974a). Effect of differing number of colors on the Stroop Color and Word
Test. Perceptual and Motor Skills, 39, 550.
Golden, C. J. (1974b). Sex differences in performance on the Stroop color and word test.
Perceptual and Motor Skills, 39, 1067-1070.
Golden, C. J. (1978). Stroop Color and Word Test: A Manual for Clinical and Experimental
Uses. Chicago: Stoelting.
Graf, P., Uttl, B., & Tuokko, H. (1995). Color- and picture-word Stroop tests: performance
changes in old age. Journal of Clinical and Experimental Neuropsychology, 17(3), 390-415.
Guitton, D., Buchtel, H. A., & Douglas, R. M. (1985). Frontal lobe lesions in man cause
difficulties in suppressing reflexive glances and in generating goal-directed saccades.
Experimental Brain Research, 58, 455-472.
Gustafson, R., & Källmén, H. (1990). Effects of alcohol on cognitive performance measured
with Stroop's color word test. Perceptual and Motor Skills, 71, 99-105.
Halpern, D. (2000). Sex Differences in Cognitive Abilities. New Jersey: Lawrence Erlbaum.
Harnishfeger, K. K., Pope, R. S., & Kirijan, J. C. (1995). An examination of sex differences in
cognitive inhibition tasks. Paper presented at the Conference for the Behavioral
Sciences.
Howell, D. C. (2013). Statistical Methods for Psychology (8th ed.). Boston: Wadsworth,
Cengage Learning.
Hyde, J. S. (1990). Meta-analysis and the psychology of gender differences. Signs: Journal of
Women in Culture and Society, 16(1), 55-73.
Hyde, J. S., & Linn, M. C. (1988). Gender differences in verbal ability: a meta-analysis.
Psychological Bulletin, 104(1), 53-69.
Insua, M. (2001). Performance on the Stroop Color and Word Test as a function of language
in bilinguals. Carlos Albizu University.
Ioan, S., Sandulache, M., Avramescu, S., Ilie, A., Neascu, A., Zagrean, L., & Moldovan, M.
(2007). Red is a distractor for men in competition. Evolution and Human Behavior,
28, 285-293.
Izawa, C., & Silver, N. C. (1988). Response variations to Stroop Color-Word stimuli. Genetic,
Social, and General Psychology Monographs, 114(2), 211-255.
Janetos, A. C. (1980). Strategies of female mate choice: a theoretical analysis. Behavioral
Ecology and Sociobiology, 7, 107-112.
Jensen, A. R., & Rohwer, W. D. (1966). The Stroop Color-Word test: a review. Acta
Psychologica, 25, 36-93.
Johnson, W., Bouchard Jr., T. J., Segal, N. L., Keyes, M., & Samuels, J. (2003). The Stroop
Color-Word Test: genetic and environmental influences: reading, mental ability and
personality correlates. Journal of Educational Psychology, 95(1), 58-65.
Jorgenson, C. B., Davis, J., Opella, J., & Angerstein, G. (1980). Hemispheric asymmetry in the
processing of Stroop stimuli: an examination of gender, hand-preference, and
language differences. International Journal of Neuroscience, 11(3), 165-169.
Kang, C., Lee, G. J., Yi, D., McPherson, S., Rogers, S., Tingus, K., & Po, H. (2013). Normative
data for healthy older adults and an abbreviated version of the Stroop test. The
Clinical Neuropsychologist, 27(2), 276-289.
Laeng, B., Låg, T., & Brennen, T. (2005). Reduced Stroop interference for opponent colors
may be due to input factors: evidence from individual differences and a neural
network simulation. Journal of Experimental Psychology: Human Perception and
Performance, 31(3), 438-452.
Lee, D. Y., Lee, K. U., Lee, J. H., Kim, K. W., Jhoo, J. H., Kim, J. C., Woo, S. I., Ha, J., & Woo, J. I.
(2004). A normative study of the CERAD neuropsychological assessment battery in
the Korean elderly. Journal of the International Neuropsychological Society, 10, 72-81.
Lee, T. M. C., Yuen, K. S. L., & Chan, C. C. H. (2002). Normative data for Neuropsychological
measures of fluency, attention, and memory measures for Hong Kong Chinese.
Journal of Clinical and Experimental Neuropsychology, 24(5), 615-632.
Li, C.-S. R., Huang, C., Constable, R. T., & Sinha, R. (2006). Gender differences in the neural
correlates of response inhibition during a stop signal task. NeuroImage, 32(4), 1918-1929.
Li, C.-S. R., Zhang, S., Duann, J.-R., Yan, P., Sinha, R., & Mazure, C. M. (2009). Gender
differences in cognitive control: an extended investigation of the stop signal task.
Brain Imaging and Behavior, 3(3), 262-276.
Linn, M. C., & Petersen, A. C. (1985). Emergence and characterization of sex differences in
spatial ability: a meta-analysis. Child Development, 56, 1479-1498.
Lipsey, M. W., & Wilson, D. B. (2001). Practical Meta-Analysis. Thousand Oaks: SAGE
Publications, Inc.
Llinàs-Reglà, J., Vilalta-Franch, J., López-Pousa, S., Calvó-Perxas, L., & Garre-Olmo, J. (2013).
Demographically adjusted norms for Catalan older adults on the Stroop Color and
Word Test. Archives of Clinical Neuropsychology, 28, 282-296.
Logan, G. D., & Cowan, W. B. (1984). On the ability to inhibit thought and action: a theory of
an act of control. Psychological Review, 91(3), 295-327.
Logan, G. D., Zbrodoff, N. J., & Williamson, J. (1984). Strategies in the color-word Stroop
task. Bulletin of the Psychonomic Society, 22(2), 135-138.
Lord, T., & Taylor, K. (1991). Monthly fluctuation in task concentration in female college
students. Perceptual and Motor Skills, 72, 435-439.
Lucas, J. A., Ivnik, R. J., Smith, G. E., Ferman, T. J., Willis, F. B., Petersen, R. C., &
Graff-Radford, N. R. (2005). Mayo's older African Americans normative studies: norms for
Boston Naming Test, Controlled Oral Word Association, Category Fluency, Animal
Naming, Token Test, Wrat-3 Reading, Trail Making Test, Stroop Test, and Judgment
of Line Orientation. The Clinical Neuropsychologist, 19(2), 243-269.
Luna, B., Garver, K. E., Urban, T. A., Lazar, N. A., & Sweeney, J. A. (2004). Maturation of
cognitive processes from Late Childhood to Adulthood. Child Development, 75(5),
1357-1372.
Maccoby, E. E., & Jacklin, C. N. (1974). The psychology of sex differences. Stanford: Stanford
University Press.
MacLeod, C. M. (1991). Half a century of research on the Stroop effect: an integrative review.
Psychological Bulletin, 109, 163-203.
MacLeod, C. M. (2007). The concept of inhibition in cognition. In D. S. Gorfein & C. M.
MacLeod (Eds.), Inhibition in Cognition. Washington, D.C.: American Psychological
Association.
Martin, N., & Franzen, M. (1989). The effect of anxiety on neuropsychological function.
International Journal of Clinical Neuropsychology, 11(1), 1-8.
Mekarski, J. E., Cutmore, T. R. H., & Suboski, W. (1996). Gender differences during
processing of the Stroop task. Perceptual and Motor Skills, 83, 563-568.
Milhausen, R., Graham, C., Sanders, S. A., Yarber, W. L., & Maitland, S. D. (2010). Validation
of the Sexual Excitation/Sexual Inhibition Inventory for women and men. Archives of
Sexual Behaviour, 39(5), 1091-1104.
Mitrushina, M., Boone, K. B., Razani, J., & D'elia, L. F. (2005). Handbook of Normative Data
for Neuropsychological Assessment (2nd ed.). New York: Oxford University Press.
Moering, R. G., Schinka, J. A., Mortimer, J. A., & Graves, A. B. (2004). Normative data for
elderly African Americans for the Stroop Color and Word Test. Archives of Clinical
Neuropsychology, 19, 61-71.
Neill, W. T. (1977). Inhibitory and facilitatory processes in selective attention. Journal of
Experimental Psychology: Human Perception and Performance, 3(3), 444-450.
Neill, W. T., & Westberry, R. L. (1987). Selective attention and the suppression of cognitive
noise. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(2),
327-334.
Oosthuizen, M. D., & Phipps, W. D. (2012). A preliminary standardisation of the Bohnen et
al. version of the Stroop Color-Word Test for Setswana-Speaking University students.
South African Journal of Psychology, 42, 411-422.
Panek, P. E., Rush, M. C., & Slade, L. A. (1984). Locus of the age-Stroop interference
relationship. Journal of Genetic Psychology, 145(2), 209-216.
Paniak, C., Miller, H. B., Murphy, D., & Patterson, L. (1996). Canadian developmental norms
for 9 to 14 year-olds on the Wisconsin Card Sorting Test. Canadian Journal of
Rehabilitation, 9(4), 233-237.
Pati, P., & Dash, A. S. (1990). Effects of grade, sex and achievement levels on intelligence,
incidental memory and Stroop scores. Psychological Studies, 35, 36-40.
Pehlivanoglu, B., Bayrak, S., Gurel, E. I., & Balkanci, Z. D. (2012). Effect of gender and
menstrual cycle on immune system response to acute mental stress: apoptosis as a
mediator. Neuroimmunomodulation, 19, 25-32.
Peña-Casanova, J., Quiñones-Úbeda, S., Gramunt-Fombuena, N., Quintana, M., Aguilar, M.,
Molinuevo, J. L., Serradell, M., Robles, A., Barquero, M. S., Payno, M., Antúnez, C.,
Martínez-Parra, C., Frank-García, A., Fernández, M., Alfonso, V., Sol, J. M., & Blesa, R.
(2009). Spanish Multicenter Normative Studies (NEURONORMA project): norms for
the Stroop color-word interference test and the Tower of London-Drexel. Archives of
Clinical Neuropsychology, 24(4), 413-429.
Peretti, P. O. (1969). Cross-sex and cross-educational level performance in color-word
interference task. Psychonomic Science, 16(6), 321-323.
Peretti, P. O. (1971). Effects of noncompetitive, competitive instructions, and sex on
performance in a color-word interference task. Journal of Psychology, 79, 67-70.
Person, E. S., Terestman, N., Myers, W. A., Goldberg, E. L., & Salvadori, C. (1989). Gender
differences in sexual behaviors and fantasies in a college population. Journal of Sex &
Marital Therapy, 15(3), 187-198.
Rassin, E. (2003). The White Bear Suppression Inventory (WBSI) focuses on failing
suppression attempts. European Journal of Personality, 17, 285-298.
Roberts, G. M. P., Newell, F., Simoes-Franklin, C., & Garavan, H. (2008). Menstrual cycle
phase modulates cognitive control over male but not female stimuli. Brain Research,
1224, 79-87.
Roberts, R. J., Hager, L. D., & Heron, C. (1994). Prefrontal cognitive processes: working
memory and inhibition in the antisaccade task. Journal of Experimental Psychology:
General, 123(4), 374-393.
Rognoni, T., Casals-Coll, M., Sánchez-Benavides, G., Quintana, M., Manero, R. M., Calvo, L.,
Palomo, R., Aranciva, F., Tamayo, F., & Peña-Casanova, J. (2013). Spanish normative
studies in young adults (NEURONORMA young adults project): Norms for Stroop
Color-Word Interference and Tower of London-Drexel University tests. Neurología,
28(2), 73-80.
Rosenthal, R. (1979). The "File Drawer Problem" and tolerance for null results. Psychological
Bulletin, 86(3), 638-641.
Rosselli, M., & Ardila, A. (1993). Developmental norms for the Wisconsin Card Sorting Test in
5- to 12-year-old children. Clinical Neuropsychologist, 7(2), 145-154.
Roivainen, E. (2011). Gender differences in processing speed: A review of recent research.
Learning and Individual Differences, 21, 145-149.
Rucklidge, J. J., & Tannock, R. (2002). Neuropsychological profiles of adolescents with ADHD:
effects of reading difficulties and gender. Journal of Child Psychology and Psychiatry,
43(8), 988-1003.
Sanders, G., Riggs, K. J., Simpson, A., & Davies, A. (unpublished). Women process visual
information faster in near space, but men in far space: Implications for
dorsal and ventral visual systems. 1-13.
Sarmany, I. (1977). Different performance in Stroop's interference test from the aspect of
personality and sex. Studia Psychologia, 19(1), 60-67.
Seo, E. H., Lee, D. Y., Choo, I. H., Kim, S. G., Kim, K. W., Youn, J. C., Jhoo, J. H., & Woo, J. I.
(2008). Normative study of the Stroop Color and Word Test in an educationally
diverse elderly population. International Journal of Geriatric Psychiatry, 23, 1020-1027.
Singh, S. P. (1991). Sex differences in cognitive functioning. Psycho-Lingua, 21(1), 47-50.
Sladekova, L., & Daniels, J. (1981). Differences in performance in Stroop's test from the
aspect of sex and age. Studia Psychologia, 23, 145-149.
Steel, C., Hemsley, D. R., & Jones, S. (1996). 'Cognitive inhibition' and schizotype as
measured by the Oxford-Liverpool Inventory of feelings and experiences. Personality
and Individual Differences, 20(6), 769-773.
Stins, J. F., Polderman, J. C., Boomsma, D. I., & de Geus, E. J. C. (2005). Response
interference and working memory in 12-year-old children. Child Neuropsychology,
11, 191-201.
Stins, J. F., van Baal, G. C. M., Polderman, J. C., Verhulst, F. C., & Boomsma, D. I. (2004).
Heritability of Stroop and flanker performance in 12-year old children. BMC
Neuroscience, 5(1), 49-57.
Strickland, T. L., D'elia, L. F., James, R., & Stein, R. (1997). Stroop color-word performance on
African Americans. The Clinical Neuropsychologist, 11(1), 87-90.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental
Psychology, 6, 643-662.
Suschinsky, K. D., Lalumiere, M. L., & Chivers, M. L. (2009). Sex differences in patterns of
genital sexual arousal: measurement artifacts or true phenomena? Archives of
Sexual Behaviour, 38(4), 559-573.
Swerdlow, N. R., Filion, D., Geyer, M. A., & Braff, D. L. (1995). "Normal" personality
correlates of sensorimotor, cognitive, and visuospatial gating. Biological Psychiatry,
37(5), 286-299.
Thakkar, K. N., Congdon, E., Poldrack, R. A., Sabb, F. W., London, E. D., Cannon, T. D., &
Bilder, R. M. (in press). Women are more sensitive than men to prior trial events in
the Stop-signal task. British Journal of Psychology, 1-19.
Tipper, S. P. (1985). The negative priming effect: inhibitory priming by ignored objects. The
Quarterly Journal of Experimental Psychology, 37A, 571-590.
Tipper, S. P., Bourque, T. A., Anderson, S. H., & Brehaut, J. (1989). Mechanisms of attention:
a developmental study. Journal of Experimental Child Psychology, 48, 353-378.
Trenerry, M. B., Crosson, B. J., DeBoe, J., & Leber, W. R. (1989). Stroop Neuropsychological
Screening Test Manual. Odessa: Psychological Assessment Resources.
Trivers, R. L. (1972). Parental investment and sexual selection. In B. Campbell (Ed.), Sexual
selection and the descent of man (pp. 136-179). Chicago: Aldine.
Troyer, A. K., Leach, L., & Strauss, E. (2006). Aging and response inhibition: normative data
for the Victoria Stroop Test. Aging, Neuropsychology, and Cognition, 13(1), 20-35.
van Boxtel, M. P. J., ten Tusscher, M. P. M., Metsemakers, J. F. M., Willems, B., & Jolles, J.
(2001). Visual determinants of reduced performance on the Stroop Color-Word Test
in normal aging individuals. Journal of Clinical and Experimental Neuropsychology,
23(5), 620-627.
van der Elst, W., Molenberghs, G., Van Boxtel, M. P. J., & Jolles, J. (in press). Establishing
normative data for repeated cognitive assessment: a comparison of different
statistical methods. Behavior Research Methods, 1-14.
van der Elst, W., van Boxtel, M. P. J., van Breukelen, G. J. P., & Jolles, J. (2006). The Stroop
color-word test: influence of age, sex, and education; and normative data for a large
sample across the adult age range. Assessment, 13(1), 62-79.
van Exel, E., Gussekloo, J., de Craen, A. J. M., der Wiel, A. B., Houx, P. J., Knook, D. L., &
Westendorp, R. G. J. (2001). Cognitive function in the oldest old: women perform
better than men. Journal of Neurology, Neurosurgery, & Psychiatry, 71(1), 29-32.
Vanier, M. (unpublished). Test de Stroop. 1-56.
Vaskinn, A., Sundet, K., Simonsen, C., Hellvin, T., Melle, I., & Andreassen, O. A. (2011). Sex
differences in neuropsychological performance and social functioning in
schizophrenia and bipolar disorder. Neuropsychology, 25(4), 499-510.
Visser, M., Das-Smaal, E., & Kwakman, H. (1996). Impulsivity and negative priming: evidence
for diminished cognitive inhibition in impulsive children. British Journal of Psychology,
87, 131-140.
Vogel, A., Stokholm, J., & Jørgensen, K. (2013). Performances on Symbol Digit Modalities
Test, Color Trails Test, and modified Stroop test in a healthy, elderly Danish sample.
Aging, Neuropsychology, and Cognition, 20(3), 370-382.
von Kluge, S. (1992). Trading accuracy for speed: gender differences on a Stroop task under
mild performance anxiety. Perceptual and Motor Skills, 75, 651-657.
Voyer, D., Postma, A., Brake, B., & Imperato-McGinley, J. (2007). Gender differences in
object location memory: a meta-analysis. Psychonomic Bulletin & Review, 14(1), 23-38.
Waber, D. (1976). Sex differences in cognition: a function of maturation rate? Science,
192(4239), 572-574.
Watson, N. V., & Kimura, D. (1991). Nontrivial sex differences in throwing and intercepting:
relation to psychometrically-defined spatial functions. Personality and Individual
Differences, 12(5), 375-385.
Wegner, D. M., Schneider, D. J., Carter III, S. R., & White, T. L. (1987). Paradoxical effects of
thought suppression. Journal of Personality and Social Psychology, 53(1), 5-13.
Wegner, D. M., Shortt, J. W., Blake, A. W., & Page, M. S. (1990). The suppression of exciting
thoughts. Journal of Personality and Social Psychology, 58(3), 409-418.
Wegner, D. M., & Zanakos, S. (1994). Chronic thought suppression. Journal of Personality,
62(4), 615-640.
Weiss, E. M., Ragland, J. D., Brensinger, C. M., Bilker, W. B., Deisenhammer, E. A., & Delazer,
M. (2006). Sex differences in clustering and switching in verbal fluency tasks. Journal
of the International Neuropsychological Society, 12, 502-509.
Wolf, M., & Gow, D. (1986). A longitudinal investigation of gender differences in language
and reading development. First Language, 6, 81-110.
Wolff, P. H., Hurwitz, I., Imamura, S., & Lee, K. W. (1983). Sex differences and ethnic
variations in speed of automatized namings. Neuropsychologia, 21, 283-288.
Zalonis, I., Christidi, F., Bonakis, A., Karaizou, E., Triantafyllou, N. I., Paraskevas, G., Kapaki,
E., & Vasilopoulos, D. (2009). The Stroop effect in Greek healthy population:
normative data for the Stroop Neuropsychological Screening Test. Archives of Clinical
Neuropsychology, 24, 81-88.
Appendix:
Example of a Colour-Word (incongruent) trial from Study 2
Example of a Negative Priming trial from Study 2