The Effect of Number of Trials and Stimulus Set on the Psychometric Qualities of
the Affective Misattribution Procedure
Yoav Bar-Anan
Ben-Gurion University of the Negev, Be’er Sheva
Brian A. Nosek
University of Virginia
A follow-up study to
A Comparative Investigation of Seven Indirect Attitude Measures
Yoav Bar-Anan and Brian A. Nosek
To cite this manuscript:
Bar-Anan, Y., & Nosek, B. A. (2013). The Effect of Number of Trials and Stimulus Set on the Psychometric
Qualities of the Affective Misattribution Procedure. Open Science Framework, bHNd2.
http://openscienceframework.org/project/bHNd2/
Five years after the study described in the main manuscript (Bar-Anan & Nosek, 2013),
we conducted a smaller follow-up study to address particular questions from the main study
about the Affective Misattribution Procedure (AMP; Payne et al., 2005). In the main study, the
AMP showed some promising qualities—good internal consistency, fair convergent validity and
the best discriminant validity for single category measurement. However, the AMP often trailed
behind many of the other measures, and was very sensitive to removal of participants with
extreme scores. The follow-up study examined three procedural features that might improve the
AMP’s psychometric qualities. The main study had used the most common practices at the time
of its design. First, we had used fewer critical trials for the AMP (48 trials)
than for the other indirect measures. In most cases, increasing the number of trials can improve
the reliability and validity of a measure. In the follow-up study, we used 120 critical trials (a
common number of critical trials in the Implicit Association Test—IAT, Greenwald, McGhee, &
Schwartz, 1998) and matched the number of trials across indirect measures used for comparison.
Second, we had selected the stimuli for the main study based on acceptable results with
those stimuli in previous studies (e.g., Bar-Anan et al., 2009; Nosek & Banaji, 2001). However,
we had never tested whether the stimuli are representative examples of their categories, nor did
we try to balance them on any objective criteria (e.g., facial expression). The selection of
stimuli may matter more for measures that are more sensitive to the individual items than to
the categories (i.e., the AMP and the Evaluative Priming Task [EPT]; Fazio, Sanbonmatsu,
Powell, & Kardes, 1986). Therefore, the follow-up study experimentally compared stimuli
selected especially for the AMP with the stimuli from the main study. Finally, the published
literature reports several different stimulus presentation durations for the AMP. To
generalize our results, the follow-up study used a different variation than the main study.
Method
A total of 2,563 participants (1,431 women, 756 men, 15 of unknown gender; mean age = 23.3,
SD = 11.9) completed the study online, following a procedure similar to the one described in
the main article. Each participant completed, in random order, two of the four indirect
measures and all the direct measures. All measures assessed race attitudes.
Stimuli. We used two stimuli sets, assigned randomly to each participant (i.e., the stimuli
set was a between-participant factor). One set was the same 12 stimuli used in the main study
(the NBA stimuli). The other set comprised 24 images of Black and White men used in the studies
that first presented the AMP (Payne et al., 2005). According to Payne et al., all these images
had neutral expressions and were matched on attractiveness ratings.
The indirect measures were the IAT, Brief-IAT (BIAT; Sriram & Greenwald, 2009), EPT
and AMP. The task procedures were identical to those of the main study, with the following
modifications.
AMP. We used three critical blocks (after the practice block), each with 40 trials – 20
with Black people primes and 20 with White people primes (there were no neutral primes).
Following a number of published studies (e.g., Bar-Anan & Nosek, 2012; Payne, Burkley, &
Stokes, 2008), each trial consisted of four screens presented in succession: the prime
(100 ms), a blank screen (100 ms), the target (100 ms), and then the mask (displayed until
response).
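As an illustration only, the AMP design described above (three critical blocks of 40 trials, half with Black primes and half with White primes, and four screens per trial) can be sketched in code. The names `SCREENS`, `Trial`, and `build_amp_schedule` are hypothetical, and shuffling the prime order within each block is an assumption; the text does not describe how trials were ordered.

```python
import random
from dataclasses import dataclass

# Screen sequence and durations (ms) as stated in the text.
# None means the mask stays on screen until the participant responds.
SCREENS = (("prime", 100), ("blank", 100), ("target", 100), ("mask", None))


@dataclass(frozen=True)
class Trial:
    block: int           # critical block number, 1-based
    prime_category: str  # "Black" or "White"


def build_amp_schedule(n_blocks=3, trials_per_block=40, seed=0):
    """Build critical trials with equal numbers of each prime category per block."""
    rng = random.Random(seed)
    schedule = []
    for block in range(1, n_blocks + 1):
        half = trials_per_block // 2
        categories = ["Black"] * half + ["White"] * half
        rng.shuffle(categories)  # assumption: random order within a block
        schedule.extend(Trial(block, category) for category in categories)
    return schedule


schedule = build_amp_schedule()
print(len(schedule))  # 120 critical trials in total
```

The sketch covers only the critical trials; the practice block and the specific prime and target images are outside its scope.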
BIAT. We removed four trials from each of the first two critical blocks, to reduce the
number of critical trials from 128 to 120. Additionally, whereas in the main study four of the
critical blocks used Good words as the focal attribute category and four used Bad words, in the
follow-up study all eight critical blocks of the BIAT used Good words as the focal
attribute category. This modification was based on conclusions from a separate investigation
regarding the best practices for the BIAT procedure, based on data from the main study (Nosek,
Bar-Anan, Sriram, & Greenwald, 2013). That investigation confirmed previous findings (Sriram
& Greenwald, 2009) that the BIAT is less reliable and valid when the focal attribute category is
negative than when it is positive.
EPT. We removed 20 trials from each of the three critical blocks, to reduce the number
of critical trials from 180 to 120 to be comparable to the other indirect measures. Target words
were all adjectives.
Direct measures. In addition to the thermometer, the preference question, and the
individual item ratings from the main study, participants also completed a modified version of
the Modern Racism Scale (MRS). Based on findings from the main study, we selected the four items that were the most
related to indirect measures and added four similar items that showed the strongest relation to
indirect measures in another study (Motyl, Schmidt, & Nosek, 2013).
Results
Data treatment and score computation were identical to the main study. Table 1
summarizes the main results and Table 2 presents more details. For the AMP sessions that used
the NBA stimuli, the score computed from the first 48 trials closely replicated most results of
the main study. The internal consistency was .66 (.69 in the main study), the average
correlation with direct measures was .32 (.31 in the main study), and the average correlation
with indirect measures was .27 (.21 in the main study). These psychometric indices improved
when we computed the score from the full task (120 trials; NBA stimuli only). The improvement
was largest for internal consistency (which increased to .83) and for the average correlation
with the direct measures (increased to .39), and was only slight for the average correlation
with the indirect measures (increased to .31).
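The gain in internal consistency from 48 to 120 trials can be compared with what classical test theory would predict. The sketch below applies the Spearman-Brown prophecy formula to the 48-trial value of .66. This is an illustration and not an analysis reported in the study; the function name `spearman_brown` is hypothetical, and the formula assumes that the added trials behave like the original ones.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# 48-trial internal consistency (.66), lengthened to 120 trials (factor 2.5).
predicted = spearman_brown(0.66, 120 / 48)
print(round(predicted, 2))  # 0.83
```

Under these assumptions, the predicted value matches the observed internal consistency of the full 120-trial score (.83).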
Table 1
Summary Results

        All stimuli sets   NBA stimuli      Payne’s stimuli   All stimuli sets,
                                                              without extreme 10%

White-preference (effect size)
IAT     0.75               0.75             0.75
BIAT    0.29               0.29             0.32
EPT     0.11               0.02             0.21
AMP     -0.14              -0.35 (-0.07)    0.00

Internal consistency
IAT     .86                .86              .86               .78 (13%)
BIAT    .85                .85              .84               .76 (14%)
EPT     .37                .39              .33               -.08 (13%)
AMP     .85                .83 (.66)        .86               .51 (46%)

Mean correlation with direct measures
IAT     .23                .25              .23               .18 (2%)
BIAT    .32                .31              .36               .25 (4%)
EPT     .13                .11              .13               .09 (1%)
AMP     .39                .39 (.30)        .39               .22 (10%)

Mean correlation with indirect measures
IAT     .29                .31              .26               .24 (3%)
BIAT    .41                .41              .40               .32 (7%)
EPT     .23                .23              .23               .11 (4%)
AMP     .28                .31 (.27)        .23               .17 (5%)

Notes. The NBA stimuli are the stimuli from the main study. Values in parentheses in the NBA
stimuli column show the performance on that criterion of the AMP score computed from the first
48 trials of the task with the NBA stimuli set. The White-preference effect size is Cohen’s d,
indicating the magnitude of the effect compared to 0 (no preference between Whites and Blacks).
Without extreme 10% = without the 10% most extreme scores (% shared variance lost in parentheses).
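Two of the quantities in Table 1 can be illustrated with simple arithmetic. The sketch below computes Cohen's d against 0 as the mean divided by the SD, using the IAT mean and SD from Table 2. It also reads "% shared variance lost" as the difference between the squared full-sample coefficient and the squared trimmed-sample coefficient. That reading is an assumption; it is consistent with the tabled values, but the text does not state the formula. Both function names are hypothetical.

```python
def cohens_d_vs_zero(mean, sd):
    """One-sample effect size: distance of the mean from 0 in SD units."""
    return mean / sd


def shared_variance_lost(r_full, r_trimmed):
    """Assumed reading: drop in squared coefficient after trimming extreme scores."""
    return r_full ** 2 - r_trimmed ** 2


# IAT, all stimuli sets: mean 0.30, SD 0.40 (Table 2).
print(round(cohens_d_vs_zero(0.30, 0.40), 2))  # 0.75

# AMP internal consistency: .85 in the full sample, .51 without the extreme 10%.
print(round(shared_variance_lost(0.85, 0.51) * 100))  # 46
```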
The stimuli set influenced the average preference score of the AMP and the EPT (Table
1), but had no significant effect on the psychometric qualities of any of the four measures. The
only noticeable difference was that the AMP had a marginally stronger relationship with the IAT
when using the NBA stimuli, r(168) = .314, than when using Payne’s stimuli, r(171) = .140,
Fisher’s z = 1.68, p = .09. None of the other differences was close to significance. Given the
number of comparisons, this single difference, which would be significant only under a relaxed
alpha criterion, is likely due to chance. Overall, these results suggest that the stimuli set has no impact on the most important
psychometric evaluation criteria for the indirect measures.
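The comparison of the two AMP-IAT correlations can be reproduced with the standard Fisher r-to-z test for independent correlations. The sketch below is a minimal illustration: it treats the values in parentheses (168 and 171) as the sample sizes, following the note to Table 2, and assumes a two-tailed p value. The function name `fisher_z_difference` is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Test the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))           # two-tailed normal p value
    return z, p


z, p = fisher_z_difference(0.314, 168, 0.140, 171)
print(round(z, 2), round(p, 2))  # 1.68 0.09
```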
Table 2
Detailed Results

                  IAT           BIAT          AMP           EPT           Thermometer   Items         Questionnaire

Overall
Mean (SD)         0.30 (0.40)   0.11 (0.38)   -0.03 (0.21)  0.06 (0.53)   0.27 (1.44)   -0.34 (1.13)  2.65 (0.98)
Cronbach’s α      .86 (1318)    .85 (1261)    .85 (1152)    .37 (1257)    -             -             .83 (2718)
IAT               -             .48 (404)     .23 (339)     .13 (415)     .26 (1260)    .19 (1268)    .24 (1266)
BIAT              -             -             .39 (352)     .34 (355)     .34 (1192)    .30 (1200)    .33 (1196)
AMP               -             -             -             .21 (349)     .40 (1088)    .49 (1087)    .27 (1089)
EPT               -             -             -             -             .11 (1202)    .18 (1205)    .09 (1202)
Thermometer       -             -             -             -             -             .51 (2749)    .31 (2740)
Items             -             -             -             -             -             -             .34 (2753)

NBA Set
Mean (SD)         0.30 (0.40)   0.11 (0.38)   -0.07 (0.20)  0.01 (0.54)   0.17 (1.43)   -0.73 (1.04)  2.64 (0.98)
Cronbach’s α      .86 (657)     .85 (670)     .83 (598)     .39 (648)     -             .79 (1325)    .83 (1408)
IAT               -             .47 (214)     .31 (168)     .14 (200)     .27 (626)     .20 (631)     .27 (630)
BIAT              -             -             .42 (180)     .34 (191)     .33 (635)     .25 (640)     .34 (636)
AMP               -             -             -             .21 (187)     .40 (558)     .49 (556)     .25 (558)
EPT               -             -             -             -             .09 (624)     .12 (625)     .12 (624)
Thermometer       -             -             -             -             -             .48 (1422)    .32 (1419)
Items             -             -             -             -             -             -             .35 (1425)
AMP (48 trials)   .27 (164)     .29 (178)     -             .24 (184)     .32 (549)     .38 (546)     .19 (548)

Payne’s Set
Mean (SD)         0.30 (0.40)   0.12 (0.37)   0.00 (0.22)   0.11 (0.53)   0.38 (1.44)   0.08 (1.08)   2.66 (0.98)
Cronbach’s α      .86 (661)     .84 (591)     .86 (554)     .33 (609)     -             .73 (1372)    .84 (1310)
IAT               -             .50 (190)     .14 (171)     .13 (215)     .24 (634)     .24 (637)     .21 (636)
BIAT              -             -             .35 (172)     .36 (164)     .35 (557)     .39 (560)     .33 (560)
AMP               -             -             -             .19 (162)     .39 (530)     .45 (531)     .30 (531)
EPT               -             -             -             -             .13 (578)     .18 (580)     .06 (578)
Thermometer       -             -             -             -             -             .56 (1327)    .30 (1321)
Items             -             -             -             -             -             -             .38 (1328)

Notes. For correlations and internal consistency, the relevant N is in parentheses. A dash
marks a cell with no reported value.
Finally, as in the main study, the AMP suffered the most among the indirect measures
from the removal of the 10% most extreme scores. However, perhaps because of its overall
improvement with more trials, the AMP without the 10% most extreme scores still showed
moderately good psychometric qualities, in comparison to its poor psychometric qualities
in the main study. In summary, the results of the follow-up study generally replicated the
results of the main study and confirmed its conclusions. As could be expected, the results also
suggest that adding trials to the AMP can improve its psychometric qualities.
References
Bar-Anan, Y., Nosek, B. A., & Vianello, M. (2009). The sorting paired features task: A measure
of association strengths. Experimental Psychology, 56, 329-343.
Bar-Anan, Y., & Nosek, B. A. (2012). Reporting intentional rating of the primes predicts
priming effects in the affective misattribution procedure. Personality and Social
Psychology Bulletin, 38, 1193-1207.
Bar-Anan, Y., & Nosek, B. A. (2013). A comparative investigation of seven indirect attitude
measures.
Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual
differences in implicit cognition: The Implicit Association Test. Journal of Personality and
Social Psychology, 74, 1464-1480.
Motyl, M., Schmidt, K., & Nosek, B. A. (2013). Unpublished data.
Nosek, B. A., & Banaji, M. R. (2001). The Go/No-Go Association Task. Social Cognition, 19,
625-666.
Nosek, B. A., Bar-Anan, Y., Sriram, N., & Greenwald, A. G. (2013). Understanding and using
the brief Implicit Association Test: I. Recommended scoring procedures. Unpublished
manuscript.
Payne, B. K., Burkley, M. A., & Stokes, M. B. (2008). Why do implicit and explicit attitude tests
diverge? The role of structural fit. Journal of Personality and Social Psychology, 94, 16-31.
Payne, B. K., Cheng, C. M., Govorun, O., & Stewart, B. D. (2005). An inkblot for attitudes:
Affect misattribution as indirect measurement. Journal of Personality and Social
Psychology, 89, 277-293.
Sriram, N., & Greenwald, A. G. (2009). The brief implicit association test. Experimental
Psychology, 56, 283-294.