Examining Response Categories: Does Minimizing Response Burden Maintain the Instrument?
Anneliese C. Bolland, John M. Bolland, Sara E. Tomek, and Heather M. Moore
The University of Alabama
Abstract
Researchers often cringe when faced with dichotomous data,
especially measured attitudes. Multiple response categories (e.g.,
Likert-type) are often encouraged because they increase response
variability. Yet instrument developers are also cautioned not to overburden respondents. Researchers have concluded that two-answer
formats are as reliable as Likert-type response options while also reducing completion time (Dolnicar & Grun, 2007; Komorita, 1963; Percy
et al., 1976).
The current study provides additional support for the use of dichotomous response categories, particularly in surveys that require cognitively efficient methods to measure attitudes and beliefs in hard-to-reach populations. Here, two scales (Identity Styles and Ego
Strengths) from three consecutive waves (2005, 2006, 2007) of the
Mobile Youth Survey (a multiple cohort longitudinal study of poverty
and adolescent risk conducted in Mobile, AL between 1998 and
2011) were analyzed. MANOVA with age as a covariate (MANCOVA) was used to examine differences among the 4-point scale (collected in 2005), the artificially dichotomized 4-point scale (collected in 2005), and the true dichotomous
scale (collected in 2006), controlling for age differences. Post-hoc
analyses were conducted to determine how age contributed to differences. Internal reliability for each version (4-point, artificially dichotomized, true dichotomous) of each subscale was compared. No differences were found between the artificially dichotomized Identity Styles
scale and the truly dichotomous scale.
As expected, reliability of the 4-point scale was slightly higher than
the reliability of either the artificially dichotomized or the truly dichotomous scale, yet reliability was generally consistent and relatively
high in each of the three Identity Styles subscales (values ranged between .56 and .80). Differences were found between the artificially dichotomized Ego Strengths scale and the truly dichotomous scale.
We also found that the youngest participants were most affected by the format
change. Consistency was higher for the older participants. Reliability
was generally consistent and relatively high in each of the eight subscales making up the Ego Strengths scale, regardless of the response format (values ranged between .44 and .69). Generally, consistency is maintained between the four-category and two-category scales.
Additionally, younger respondents are more affected by response format in the Ego Strengths scale. This result seems counterintuitive,
but younger participants may have increased difficulty maintaining
consistency in four categories of responses.
Introduction
Binary response options are becoming more favored on attitude surveys (Dolnicar & Grun, 2007). Dolnicar and Grun found two-answer formats equally reliable while also reducing completion time, so the number of questions in a survey can be increased. Our goal was to determine
how response format affects responses. Implicit in this question is an
answer to “Does the response format affect the response?” Considerations in these questions are age, development, the scale, and the subscales. What we really want to know is whether it makes practical and
empirical sense to use a dichotomous response on surveys, rather
than a Likert-type scale, particularly one with fewer than five response
options. This issue is particularly relevant when working with adolescents or individuals with limited cognitive ability because a Likert-type
scale may create more “noise” or be more confusing to the respondent, rather than detecting degrees of difference or direction for the researcher. When working with some populations, brevity is important,
and less complex response options may be best suited for these populations.
There are two primary camps, one favoring more response categories and one favoring fewer, and there is ample evidence to support both
positions. The question is whether the additional options add information or merely create data noise. What appears clear among the
mix of research findings is that it all depends on your measures and
possibly on the population. When measuring response styles or behavior, more options may be better, but if you are measuring attitudes
or attributes, fewer categories may work better, particularly if the degree of agreement or disagreement is not at issue. Scale reliability
appears to be more often unaffected by the number of response categories (see Dolnicar & Grun, 2007), while validity depends largely on
subjective interpretation (see Rossiter, 2002, 2011). Few studies use
repeated measures to compare response format; most simply attempt
to recode Likert-type scales, which becomes difficult depending on
whether there is a clear midpoint. Lee and colleagues (2002) found
cultural differences in response patterns dependent on the number of
choices, which may also be indicative of issues complex populations
may face when responding to surveys. Further, young people and less
educated people may have more difficulty with more response options
or degrees of intensity (e.g., strongly agree, strongly disagree) (Hartley
& MacLean, 2006). The current study provides additional support for
the use of dichotomous response categories, particularly in surveys
that require cognitively efficient methods to measure attitudes and beliefs in hard-to-reach populations.
Mobile Youth Survey
The MYS is a multiple cohort longitudinal study of poverty and adolescent risk conducted in Mobile, AL between 1998 and 2011. Participants are 10- to 18-year-old adolescents, with over 99% of the sample
Black American or mixed race and a mean household income of
$6,276 (Bolland, 2005). Current study data consist of the 2005, 2006,
and 2007 waves of the MYS.
Participants (N = 12,000+)
Adolescents Aged 9.75 to 19.25 Years
99.5% Non-White, Low-Income (M = $5,000/hh)
High-risk with Low IQ (M = 85.3; KBIT II; n = 463)
Design
Multiple Cohort Longitudinal Design
Cluster and Convenience Sampling
Targeted low-income neighborhoods
Intent is 100% response rate from these residents
Proctor-read with Scantron Response Forms
Data
Wave 8 responses – artificially dichotomized Likert-type scale
  Strongly Agree and Agree = Agree
  Strongly Disagree and Disagree = Disagree
Wave 9 and Wave 10 responses – truly dichotomous scale
  Agree vs. Disagree
Age groups
  Youngest (10-12 years of age)
  Middle (13-15 years of age)
  Oldest (16-18 years of age)
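The recoding and age grouping listed under Data can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the numeric codes (1 = Strongly Disagree through 4 = Strongly Agree), the function names, and the use of age in completed years are all assumptions, since the poster does not state them.

```python
# Assumed coding (not stated in the poster):
# 1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree.

def dichotomize(response: int) -> int:
    """Collapse a 4-point response into 0 (Disagree) or 1 (Agree)."""
    if response not in (1, 2, 3, 4):
        raise ValueError(f"unexpected response code: {response}")
    # Strongly Agree and Agree -> Agree; Strongly Disagree and Disagree -> Disagree.
    return 1 if response >= 3 else 0


def age_group(age: float) -> str:
    """Assign one of the three age bands, using completed years (an assumption)."""
    years = int(age)  # e.g. 12.9 counts as 12 completed years
    if 10 <= years <= 12:
        return "Youngest"
    if 13 <= years <= 15:
        return "Middle"
    if 16 <= years <= 18:
        return "Oldest"
    return "Out of range"


# Hypothetical Wave 8 responses to a single item.
wave8_item = [4, 3, 2, 1, 3]
wave8_dichotomized = [dichotomize(r) for r in wave8_item]
```

Because the scale has no midpoint, every 4-point response maps cleanly onto one of the two categories, which is what makes the artificially dichotomized scores directly comparable with the truly dichotomous ones.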
Results
No differences were found between the artificially dichotomized Identity Styles scale and the truly dichotomous scale. As expected, reliability of
the 4-point scale was slightly higher than the reliability of either the artificially dichotomized or the truly dichotomous scale, yet the reliability was
generally consistent and relatively high in each of the three Identity Styles subscales (values ranged between .56 and .80).
Differences were found between the artificially dichotomized Ego Strengths scale and the truly dichotomous scale, F(1, 1288) = 42.063,
p < .001. We also found that the youngest participants were most affected by the format change. Reliability was generally consistent and relatively high in
each of the eight subscales making up the Ego Strengths scale, regardless of the response format (values ranged between .44 and .69).
Table 1 – MANCOVA, Estimated Marginal Means, Identity Styles
Table 2 – MANCOVA, Estimated Marginal Means, Ego Strengths
Table 3 – MANCOVA, Estimated Marginal Means, Ego Strengths: Will
Table 4 – Cronbach’s Alpha Over Time, Identity Style Reliability
Table 5 – Cronbach’s Alpha Over Time, Ego Strength Reliability
Methodology
• MANOVA was run with age as a covariate (MANCOVA) so that we could control for
development.
• If there are no statistically significant differences between scales
(artificially dichotomized and truly dichotomous), there is no need
to examine whether a difference is more pronounced in a
particular age group.
• If there are no statistically significant differences between scales
(artificially dichotomized and truly dichotomous), there is no need
to examine the subscales or the items individually to
determine whether the participants seemed to have cognitive difficulty with one of the subscales or with one or more of the items.
• Internal reliability was compared across the different waves.
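Internal reliability is reported in Tables 4 and 5 as Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is shown here for illustration. The function name and the toy data are hypothetical; the poster does not say which software the authors used.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    All respondents must answer the same k items (k >= 2).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same number of items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical dichotomous (0 = Disagree, 1 = Agree) responses to a two-item subscale.
consistent = [[0, 0], [1, 1], [0, 0], [1, 1]]   # items always agree
unrelated = [[1, 1], [1, 0], [0, 0], [0, 1]]    # items are uncorrelated
alpha_consistent = cronbach_alpha(consistent)
alpha_unrelated = cronbach_alpha(unrelated)
```

The same function applies unchanged to 4-point and dichotomous item scores, so the three versions of a subscale can be compared on one footing.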
Conclusions
Generally, consistency is maintained between the four-category and two-category scales. Additionally, younger respondents are more affected by response format in the Ego Strengths scale. This result seems counterintuitive, but younger participants may have increased difficulty maintaining
consistency in four categories of responses. More research is needed to determine age effects in attitude scales.