
USING ICON ARRAY AS A VISUAL AID FOR
COMMUNICATING VALIDITY INFORMATION
Don C. Zhang
A Dissertation
Submitted to the Graduate College of Bowling Green
State University in partial fulfillment of
the requirements for the degree of
DOCTOR OF PHILOSOPHY
May 2016
Committee:
Scott Highhouse, Advisor
Priscilla K. Coleman
Graduate Faculty Representative
Richard Anderson
Margaret Brooks
ABSTRACT
Scott Highhouse, Advisor
To promote better decisions in the workplace, organizational researchers must
communicate the value of their scientific findings. Traditional statistics such as the correlation
coefficient are difficult to interpret. Graphical visual aids, such as Icon arrays, have recently
emerged as effective tools for simplifying probabilistic and statistical information. This
dissertation examined the benefits of the Icon array in communicating the validity of structured
interviews. People judged the Icon array as more useful than the Binomial Effect Size Display
(BESD) for communicating validity information. People were more engaged with the interactive
visual aid than its static counterpart, and judged the interactive visual aid more useful. Finally,
people performed better on an objective graph comprehension test when presented with an Icon
array than with the bar graph. The benefit of graphical displays (Icon array and bar graph), however, was moderated by individual differences in graph literacy. The bar graph and the BESD were more
useful for people with high (vs. low) graph literacy. The Icon array was equally useful for
people with high and low graph literacy.
To my parents, for their unwavering support
ACKNOWLEDGMENTS
I would like to thank all the friends I made at BGSU for making this five-year journey
beyond tolerable. I would also like to thank my dissertation committee members: Richard
Anderson, Margaret Brooks, and Priscilla Coleman for their invaluable feedback. Finally, I
would like to thank my adviser and mentor, Scott Highhouse, for his support, patience, and
guidance.
TABLE OF CONTENTS

INTRODUCTION
    Criterion Validity
    Predictive Validity of Structured Interviews
        Benefits of Structured Interviews
        Resistance Against Structured Interviews
    Methods for Communicating Validity
        Alternative Displays of Validity
    Graphical Visual Aids
        Icon Array
        Interactivity
    Individual Differences in Graph Literacy
METHOD
    Participants
    Stimulus Material
    Subjective Graph Literacy
    Design and Procedure
RESULTS
    Preliminary Analysis
        Subjective Graph Literacy
    Hypotheses Testing
        Perceived Visual Aid Usefulness
        Visual Aid Engagement
        Objective Comprehension Test
DISCUSSION
    Limitations and Future Direction
    Practical Implications
    Conclusion
REFERENCES
APPENDIX A. JOB SCREENING ITEMS
APPENDIX B. INTERVIEW VIGNETTE
APPENDIX C. OBJECTIVE COMPREHENSION TEST
APPENDIX D. DEPENDENT VARIABLES
APPENDIX E. DEMOGRAPHICS
APPENDIX F. INFORMED CONSENT
APPENDIX G. HSRB APPROVAL LETTER
LIST OF TABLES

1. Example of Taylor-Russell Table
2. Example of Binomial Effect Size Display
3. Characteristics of Decision-aids that Enhance Comprehension
4. Principal Axis Factoring Results for Subjective Graph Literacy
5. Means, Standard Deviations, Reliabilities and Inter-correlations of Variables
6. Principal Axis Factoring Results for the Dependent Variables
7. Principal Axis Factoring Results for Revised Dependent Variables
8. Summary of ANOVA Results for Perceived Visual Aid Usefulness
9. Mean Perceived Usefulness Across Visual Aids
10. Summary of ANOVA Results for Engagement
11. Summary of the Ordered Logistic Regression on Number of Correct Answers
12. Predicted Probabilities of Number of Correct Responses
13. Logistic Regression Analysis of Objective Comprehension Test Questions
LIST OF FIGURES

1. Example of Expectancy Chart
2. Example of Icon Array
3. Screenshot of Visual Aid Instructions
4. Screenshot of Interactive BESD
5. Screenshot of Interactive Bar Graph
6. Screenshot of Interactive Icon Array
7. Screenshot of Static BESD
8. Screenshot of Static Bar Graph
9. Screenshot of Static Icon Array
10. Histogram of Subjective Graph Literacy Scale
11. Plot of Means for Perceived Usefulness of Icon Array and Bar Graph
INTRODUCTION
To promote better decision-making, it is critical that scientists communicate the
significance of their research to relevant stakeholders. Haensly, Lupkowski, and McNamara (1987) noted “… the greatest impact of research stems from clearly communicating research
findings to policy makers and practitioners” (p. 63). Scientific findings, however, are typically
technical and difficult for a lay audience to understand. In social sciences, where considerable
research is quantitative and statistical in nature, it can be difficult to communicate the practical
implications of research findings. Although traditional effect size indices such as the correlation
coefficient are the norm for communicating effect sizes in an academic context, they are not
ideal for informing real-world decisions and predictions.
Employee selection research often uses Pearson’s correlation coefficient to describe the
predictive validity of a selection instrument (e.g., structured interviews). Whereas the correlation
coefficient is the standard for communicating predictive validity in the scientific literature, it is
difficult to comprehend for a general population (Brogden, 1946), often misunderstood (Lawshe
& Bolda, 1958), and not informative for decision-making (Beaton & Barone, 1981; Soyer &
Hogarth, 2012). Kuncel and Rigdon (2012) recommended two alternatives to the correlation
coefficient for improving understanding of research findings. The first recommendation was
using alternative effect size indices. The second recommendation was using graphical visual
aids. Some scholars have explored alternative effect size statistics such as the Common
Language Effect Size (CLES; Brooks, Dalal, & Nolan, 2014; McGraw & Wong, 1992). Other
research has examined tabular and graphical displays such as Binomial Effect Size Displays
(BESD) or Taylor-Russell tables (Murphy & Davidshofer, 1988; Rosenthal, 2005), and Expectancy Charts (Cascio, 1977; Lawshe & Bolda, 1958). More recently, researchers have begun examining non-traditional graphical visual aids such as Icon arrays (D. Zhang, Y. Zhang, Highhouse, & Brooks, 2014).
The present dissertation extends the current literature on using visual aids to
communicate predictive validity statistics of selection instruments. First, previous research on
communicating statistical effect size information focused on alternative numerical displays (e.g.,
BESD and CLES). Interpreting numerical information is cognitively demanding, especially for
people with low numeracy. This study examined graphical displays, which is more accessible for
a wide population. Second, research using graphical visual aids has examined the benefits of
different user interactivity on learning and decision-making (Cleveland & McGill 1985; Lowe,
2003; Mayer & Chandler, 2001; Zikmund-Fisher et al., 2014), but the effect of user interactivity
in the context of communicating effect sizes remains unexplored. In this study, I examined whether user interactivity moderates the usefulness of visual aids for enhancing the reader’s comprehension of
validity information. Finally, previous research shows that individual differences such as graph
literacy are related to graph comprehension (Okan, Garcia-Retamero, Cokely, & Maldonado,
2012; Galesic & Garcia-Retamero, 2011), yet no research to date has examined the role of graph
literacy in the context of using an interactive visual aid to communicate effect size information.
The current study also examined the moderating effect of graph literacy on the benefits of
graphical visual aids.
Kuncel and Rigdon (2012) suggested that future research on communication of Industrial
and Organizational Psychology findings should: (1) Explore tools that effectively communicate
the value of evidence-based organizational interventions, and (2) Understand the role of
individual differences in the comprehension of decision aids. The present dissertation examined
these two questions.
Criterion Validity
Criterion validity, or predictive validity, is the cornerstone of personnel selection
research. It represents the degree to which a predictor – often an assessment instrument or
technique – predicts a job-related criterion (e.g., job performance) (Borsboom, Mellenbergh, &
van Heerden, 2004). Schmidt and Hunter (1998) emphasized “From the point of view of
practical value, the most important property of a personnel assessment method is predictive
validity: the ability to predict future job performance, job-related learning, and other criteria” (p.
262). Adopting criterion-valid hiring practices can improve work performance, increase
profitability, and decrease counterproductive work behaviors (Arthur, 1994; Huselid, 1995;
Salgado, 2002; Schmidt & Hunter, 1998; Terpstra & Rozell, 1997).
In the scientific literature, the criterion validity of a predictor is most frequently
expressed as the Pearson product-moment correlation coefficient (r), which represents the linear
relationship between two variables (e.g., intelligence and job performance; Brogden, 1946).
Criterion validity can also be expressed incrementally when multiple predictors of a single
criterion are assessed simultaneously and the researcher intends to isolate the validity of a
predictor that has shared variance with others. For instance, Schmidt and Hunter (1998) meta-analytically examined the incremental validity of 18 predictors after accounting for the variance
explained by cognitive ability. Depending on the statistical assumptions and the intended use of
the validity results, reports also contain alternative, but mathematically similar indices to express
validity, such as the coefficient of determination (i.e., percent of variance explained),
standardized regression weights, or the slope of the regression line for the predictor-criterion
relationship.
The standard for measuring predictive validity in scientific writing – correlation
coefficient – is difficult for lay readers to comprehend (Cascio, 1977; Lawshe & Bolda, 1958;
Hoffrage, Lindsey, Hertwig, & Gigerenzer, 2000). Lawshe and Bolda (1958) declared, “To
explain the meaning of r to a non-statistician is next to impossible” (p. 353). Rynes, Colbert, and
Brown (2002) surveyed human resource (HR) professionals and found that a majority of them
(82%) were not aware that intelligence is a better predictor of job performance than conscientiousness, despite the difference in the meta-analytic validity coefficients (Schmidt &
Hunter, 1998; validity for intelligence = 0.51 vs. 0.31 for conscientiousness). Many also believe
that integrity tests are not valid predictors of behaviors on the job, even though meta-analyses
show that the validity coefficient for integrity tests for counterproductive workplace behaviors is
0.41 (Ones, Viswesvaran, & Schmidt, 1993). In education research, validity of standardized tests
(e.g., SAT) has also been misinterpreted and its utility marginalized (Mattern, Kobrin, Patterson,
Shaw, & Camara, 2009). Mattern et al. (2009) argued: “No matter how good a job one does to
collect, analyze, and present validity evidence, it may fall on deaf ears if the results are not
effectively communicated” (p. 229).
Paradoxically, practitioners value research and cite scientific findings as one of their
main justifications for HR-related decisions. Ryan and Sackett (1987) surveyed 163 individuals
who conduct individual assessments and found that the most common basis for choosing a test was published data (63%). However, as the evidence shows, readers looking at the same
results can come to different or misinformed interpretations. When the validity information for
the SAT as a college admissions test was published in 2008, the public formed various opposing
beliefs based on the same published validity data (Mattern et al., 2009). Some researchers have
focused on the end-users’ interpretation of validity, rather than the validity statistic itself.
Maciver, Anderson, Costa, and Evers (2014) argued that criterion-related validity alone lacks
context. They maintain that the concept of validity extends beyond the statistical metric;
interpretation of validity information depends on the context. Proper use of criterion-related
validity is contingent on the relation between the user’s interpretation of the test score and the
criterion outcome. Traditional validity statistics, such as those that focus on a variance-based
interpretation, do not communicate the implication of the statistic in an actual hiring scenario.
For example, knowing that a predictor explains 15% of the variance in the criterion does not help the user infer actual hiring outcomes, such as how many of the employees hired are expected to succeed or fail.
Due to the complexity and ambiguity of interpreting statistical validity, validity research
has arguably had a limited impact on HR-related decisions. Johns (1993) observed that the
adoption of I/O personnel practices in the workplace is “not influenced by technical merit” (p.
46), and that other institutional factors such as organizational politics and government
regulations are often the driving force for the adoption of personnel innovations at work. Guion
(2011) argued that decisions to use a hiring method often have less to do with the test’s
psychometric properties than with the organizational and political culture. In a study of over 53
organizations in both private and public sectors, Wolf and Jenkins (2006) surveyed managers
who were directly responsible for various recruitment and hiring decisions at their company.
They found that almost all private sector companies had increased their use of testing in the preceding five years. Of the twelve public sector and not-for-profit organizations, nine had increased their use of testing while the others maintained the same level. The authors also observed, based on semi-structured interviews with the managers, that the primary reasons for adopting standardized tests
for hiring were cost-effectiveness, organizational culture, and legal requirements; predictive
validity had only a subsidiary impact on the decisions to adopt evidence-based hiring practices.
In summary, predictive validity remains one of the most important criteria in evaluating
hiring outcomes in the scientific literature, and yet, it has had limited influence on organizational
change when compared with other organizational, political, and legal factors (Johns, 1993; Wolf
& Jenkins, 2006). One explanation is that many managers either are not aware of the validity evidence discovered in research or misinterpret the outcomes (Rynes et al., 2002). Validity information, as traditionally communicated in the scientific literature, is often unhelpful for informing real-world decisions because it is difficult to understand. Improving the accessibility of validity findings to managers is particularly important in an area where science has not informed practice at work: the use of structured interviews in employee selection.
Predictive Validity of Structured Interviews
One of the most valid and most neglected selection tools is the structured interview
(Buckley, Norris, & Wiese, 2000; Dipboye, 1997; Highhouse, 2008; van der Zee, Bakker, &
Bakker, 2002). The structured interview is typically characterized by the standardization of the
interview process (Levashina, Hartwell, Morgeson, & Campion, 2014), which includes the
generation of a pre-selected question list based on a job analysis and a quantitative and uniform
scoring procedure for all interviewees.
Benefits of Structured Interviews. Research has shown that the structured interview is
more predictive of job-relevant outcomes than the traditional (i.e., unstructured) interview (Barrick, Patton, & Haugland, 2000; Huffcutt & Arthur, 1994; McDaniel, Whetzel, Schmidt, & Maurer, 1994). A meta-analysis of 245 validity coefficients showed that the structured interview (ρ = 0.44) is significantly better than the unstructured interview (ρ = 0.33) at predicting future job
performance (McDaniel et al., 1994). In a separate meta-analysis, Huffcutt and Arthur (1994)
found that introducing even a small amount of structure to the interview process can improve
the validity of the interview from 0.20 to 0.35; whereas a fully structured interview had a validity
of 0.57. Unstructured interviews may even hinder the effectiveness of other hiring tools. For
instance, Dana, Dawes, and Peterson (2013) found that combining judgments made with
unstructured interviews with valid predictors (e.g., Grade Point Average) can actually lead to
worse predictions than when unstructured interviews are not administered.
Resistance Against Structured Interviews. Despite what the evidence suggests, many
managers still favor the traditional interview over the structured interview because of subjective
benefits such as the need for autonomy during the interview process (Dipboye, 1997; Nolan &
Highhouse, 2014), the need to exert influence on the applicant (Pfeffer & Lammerding, 1981),
and better applicant reactions (Latham & Finnegan, 1993; Schuler, 1993). Another reason for
the resistance to using structured interviews is the lack of awareness and understanding of their
increased predictive validity over traditional interviews (Priem & Rosenstein, 2000; Rynes,
2009). The lack of awareness can be attributed to factors such as the lack of formal education,
limited exposure to the research literature, and poor dissemination of research findings in
periodicals (Rynes, 2009). Even when managers are presented with research evidence, they are
often not equipped to interpret the statistical results, and therefore, perceive the findings as
uninteresting, unimportant, and abstract (Bailey & Eastman, 1996; Campbell, Daft, & Hulin, 1982). Furthermore, managers ignore the findings because they believe that the research does not
apply to their own unique situations (Highhouse, 2008). The lack of awareness of personnel selection research, combined with a general distrust of and discomfort with statistics (Ayres, 2008), keeps managers from integrating research findings from an area as quantitatively oriented as personnel selection (Rynes, 2009). Awareness and understanding can be improved by
finding more effective ways to communicate the utility of personnel selection research findings
(Kuncel & Rigdon, 2012; Rynes, 2002).
Methods for Communicating Validity
When a researcher wants to describe the validity of a particular predictor, he or she
typically turns to one of the aforementioned traditional effect size statistics, with Pearson’s correlation coefficient being the most common. To ease the interpretation of effect size statistics
for scientists, Cohen (1988) provided general guidelines of what constitutes a “small”,
“medium”, or “large” effect size in r terms. Traditional effect statistics are useful because they
allow researchers to compare their results on a standardized metric. However, the standardization
of effect statistics limits one’s ability to interpret their meaning across diverse real-world contexts,
such as the difference between experimental groups or the expected number of successful employee hires. Rosenthal and Rubin (1982) found that even experienced researchers were surprised that a correlation of 0.32 translated into an increase in an intervention’s success rate from 34% to 66%. Validity estimates communicated with traditional effect size statistics are often
underestimated by the public. For instance, critics of the SAT as a college admissions test have
stated “the SAT only adds 5.4 percent of variance explained by HSGPA alone” (Kidder &
Rosner, 2002, p. 193). However, these misunderstandings can easily be remedied by changing the
presentation of the same information (Bridgeman, Pollack, & Burton, 2004; Brooks et al., 2014;
Davidshofer & Murphy, 2005; Lawshe & Bolda, 1958; McGraw & Wong, 1992; Taylor &
Russell, 1939). In the following paragraphs, I outline several alternatives to traditional statistics
in communicating effect size and validity information.
Alternative Displays of Validity. Instead of a point estimate of the linear relation
between predictor and criterion, criterion validity can be expressed in terms of probabilities or
likelihoods of success based on a candidate’s score on a predictor (e.g., an intelligence test). Taylor-Russell tables, for instance, convert the correlation coefficient to the expected probability of
success given one’s standing on a particular predictor, the selection ratio, and success rate
(Murphy & Davidshofer, 1988; Taylor & Russell, 1939). To read the Taylor-Russell table, one
has to determine the selection ratio (the percentage of people hired from the applicant pool) and
the base-rate of success, which is the percentage of the population that would succeed on the job.
Next, based on the validity of the selection tool, as indicated by the correlation coefficient, one
can derive the proportion of successful hires based on the job candidate’s standing on the
predictor (i.e., selection test). For example, in Table 1, if 50% of the candidates are selected
based on a selection test with a validity of r = 0.25, and the population has a base-rate success of
30%, then about 37% of the chosen employees will succeed. Similarly, an expectancy chart
shows how one’s standing on a predictor relates to one’s standing on the criterion (Schrader,
1965; Lawshe & Bolda, 1958). In the expectancy chart (Figure 1), the reader is provided with the
probability of success for a particular candidate given his score. For instance, if John scored
between 16 and 20 on the selection test, the probability of his future success is 80%.
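To make the mechanics of the table concrete, the following sketch reproduces the worked example above by simulation, under the bivariate-normal model that underlies the Taylor-Russell tables. It is a minimal illustration in Python; the variable names are my own, and the sample is simulated rather than empirical.

    import numpy as np

    # Simulate the Taylor-Russell entry from the worked example above:
    # validity r = 0.25, selection ratio = 0.50, base rate of success = 0.30.
    rng = np.random.default_rng(0)
    validity, selection_ratio, base_rate = 0.25, 0.50, 0.30
    n = 1_000_000

    # Predictor and criterion scores drawn from a bivariate normal
    # distribution whose correlation equals the test's validity.
    cov = [[1.0, validity], [validity, 1.0]]
    predictor, criterion = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # Hire the top 50% on the predictor; "success" is the top 30% on the criterion.
    hired = predictor >= np.quantile(predictor, 1 - selection_ratio)
    success = criterion >= np.quantile(criterion, 1 - base_rate)

    print(round(success[hired].mean(), 2))  # ~0.37, matching the example above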
The binomial effect size display (BESD) simplifies the correlation coefficient by
presenting the expected success and failure in a 2x2 matrix. In a BESD, the cells of the matrix
are defined by (0.5 + r/2) × 100 and (0.5 − r/2) × 100, where r is the validity of the test or intervention
(Rosenthal & Rubin, 1982). For example, in Table 2, one can evaluate the effectiveness of a
Graduate Record Examination (GRE) training program by seeing the probability of
improvement: 65% of the people who took the training improved their GRE score while only
35% of the people who did not take the training showed improvement. Finally, the Common
Language Effect Size describes the difference between two groups (e.g., control vs. intervention
group) with the probability that a random score from one group will differ from the other. For
example, the effectiveness of the GRE program can be described as “there is a 60% chance that a
score from someone who took the GRE training will be better than someone without training.”
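The arithmetic behind both displays is simple enough to script. The sketch below (Python; a minimal illustration, not part of the study materials) implements the BESD cell formula quoted above, together with McGraw and Wong's (1992) normal-distribution version of the CLES; the assumption that this is the intended CLES variant is mine.

    from math import sqrt
    from statistics import NormalDist

    def besd_cells(r):
        """BESD success rates (%) implied by a correlation r (Rosenthal & Rubin, 1982)."""
        return (0.5 + r / 2) * 100, (0.5 - r / 2) * 100

    def cles(d):
        """P(a random score from group 1 exceeds one from group 2) for two
        normal groups separated by a standardized mean difference d."""
        return NormalDist().cdf(d / sqrt(2))

    print(besd_cells(0.30))  # (65.0, 35.0), matching the GRE example in Table 2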
Despite the number of alternative data presentation tools available, there are only a few
empirical examinations of their benefits for comprehension. Brooks et al. (2014) examined
several alternatives to traditional effect size displays (e.g., Pearson’s r). They presented
participants with the Common Language Effect Size and the Binomial Effect Size Display of the
effectiveness of a GRE training program, and found that both alternative effect size displays
were judged to be higher in understandability, usefulness, and effectiveness when compared with
traditional effect sizes such as the Pearson’s r and the coefficient of determination. Bridgeman et
al. (2004) used various expectancy charts to present validity information of SAT scores on
college academic outcomes. The “straightforward approach,” as they called it, showed the
percent of students that fall under different bands of college GPA based on SAT scores.
Although expectancy charts have often been employed to present both theoretical and empirical
expectancies of predictor-criterion data (e.g., Cascio, 1976; Tiffin & Vincent, 2006; Yankelevich, 2007), no research has directly examined their usefulness for communication compared with alternative presentation methods.
Graphical Visual Aids
Graphical visual aids have a long history in quantitative and scientific education (see
Shah & Hoeffner, 2002, for a review). The Joint Committee on Standards for Graphic Presentation
published a list of guidelines for using graphics in presenting quantitative data in 1915. These
guidelines have led to a stream of research examining how to best use graphical visual aids in a
variety of contexts such as education, decision-making, and communication (Bettman & Zins,
1979; Boucheix & Guignard, 2005; Carter, 1947).
There are several benefits of using graphical visual aids in communicating complex
numerical information. First, graphical visual aids take advantage of people’s automatic visual
perception abilities (Cleveland & McGill, 1985), which improves memorability (Denis, 1984;
Levie & Lentz, 1982) and understandability of quantitative information (MacDonald-Ross, 1977;
Winn, 1987). Second, graphical visual aids convey more information than quantitative
description (i.e., numbers and statistics) alone (Lewandowsky & Spence, 1989). Visual aids have
been shown to improve understanding and decision-making in education, finance, and medicine
(Ancker et al., 2006; Garcia-Retamero et al., 2012; Shah & Hoeffner, 2002; Volkov & Laing,
2012).
Icon Array. One type of graphical visual aid that has received modest attention in
medical decision-making research is the Icon array. Icon arrays are “graphical representations
consisting of a number of stick figures, faces, circles, or other icons symbolizing individuals…”
(Galesic et al., 2009, p. 210). Figure 2 shows an example Icon array that communicates the effectiveness of a medical treatment. The array contains 100 icons, each representing a single person in a sample. The sample can be hypothetical or empirical. The icons are separated by color: in the example, green icons represent individuals who are cured by the medication, red icons represent the uncured, and gray icons represent the untreated. Icon
arrays have been used in communicating the benefits and risks regarding health and medical
treatments (e.g., Fagerlin, Wang, & Ubel 2005; Feldman-Stewart, Kocovsky, McConnell,
Brundage, & Mackillop, 2000; Garcia-Retamero, Galesic, & Gigerenzer, 2010). Research has
shown that Icon arrays can improve understandability of risk information and help people make
more informed decisions (Garcia-Retamero et al., 2012; Galesic & Garcia-Retamero, 2011). For
instance, Garcia-Retamero and Hoffrage (2013) found that Icon arrays improved diagnostic judgments for both doctors and patients. Furthermore, Garcia-Retamero et al. (2010) showed that Icon arrays reduced cognitive biases such as base-rate neglect, leading to more accurate
judgments of risk.
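Because an Icon array is, at bottom, a frequency layout, a crude text version can be rendered in a few lines. The sketch below (Python) is only a schematic stand-in for Figure 2; the counts are hypothetical placeholders rather than the values in the actual figure.

    # Render a 10 x 10 icon array: 'C' = cured, 'X' = uncured, '.' = untreated.
    cured, uncured, untreated = 40, 35, 25                 # hypothetical frequencies
    icons = "C" * cured + "X" * uncured + "." * untreated  # one icon per person

    for row in range(10):                                  # ten icons per row
        print(" ".join(icons[row * 10:(row + 1) * 10]))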
Icon arrays provide several unique advantages over the previously studied displays of
effect size. First, the Icon array is the most salient graphical representation of frequency information: Icon arrays use individual icons to represent each unit of frequency, thereby bringing attention to the discrete properties of the data. Research has shown that when it comes to making probabilistic inferences, frequency representations are generally easier to understand and more useful than probabilities (Gigerenzer & Hoffrage, 1995; Hoffrage et al., 2000). Although the BESD and Expectancy Charts are also capable of representing frequency information, they usually display proportions or probabilities when used to communicate effect size information (e.g., Brooks et al., 2014). Second, the Icon array is the only one of these visual aids that can use different shapes to communicate information. The Icon array can represent individual candidates with silhouettes of people, which gives it higher iconicity than the other graphical representations. Iconicity is the degree to which the symbols used in a graph represent their real-life counterparts (Gaissmaier et al., 2012). Icons can also represent categories; for example, a happy face can represent a successful employee and an unhappy face an unsuccessful one. Finally, the Icon array can overcome base-rate neglect, a cognitive bias that interferes with interpreting and understanding probabilistic information (Bar-Hillel, 1980). The Icon array ameliorates base-rate neglect because the size of the array emphasizes the denominator of the fraction and, in turn, calls attention to the whole sample. Table 3 summarizes the characteristics of different representations of effect size that enhance comprehension. For all these reasons, I hypothesize that the Icon array will be perceived as more
useful for communicating the validity of structured interviews and easier to understand than both
the bar graph and the BESD.
Hypothesis 1.1: Managers will rate the Icon array as more useful than the bar
graph and the BESD for communicating the validity of structured interviews in
hiring.
Hypothesis 1.2: Managers will rate the Icon array as easier to comprehend than
the bar graph and the BESD for communicating the validity of structured
interviews in hiring.
Interactivity. With increased access to computers in the current digital age, more and
more visual aids are becoming computer-based, which allows graph makers to incorporate
animation and user interactivity (Lowe, 2003). Some scholars have also recommended
implementing interaction and animation in visual aids to communicate probabilistic information
(Spiegelhalter, Pearson, & Short, 2011). Animation and interactivity are interrelated concepts in
visual displays. Visual aids that are interactive are necessarily animated in some respect because interaction involves manipulating visual elements and controlling change in the presentation.
Animations and interactivity each have their advantages. Boucheix and Guignard (2005)
argued that animated presentations alone are more interactive than static visual aids. Interactivity
refers to giving the user control over the words or pictures (Mayer & Chandler, 2001). Aesthetically,
animated graphs are more interesting and attractive, which could enhance the engagement of the
user (Ancker, Weber, & Kukafka, 2011; Perez & White, 1985; Rieber, 1990). Animations are
ideal for displaying complex concepts and for communicating changes or trends over time
(Morrison, Tversky, & Betrancourt, 2000). When used properly, animations can reduce the
user’s cognitive load, which leads to better learning outcomes (Mayer & Chandler, 2001).
Boucheix and Guignard (2005) found that animation and user interactivity both improved
performance on the comprehension of a technical document. Mayer and Chandler (2001) found
that modest amount of interactivity improved deep learning of scientific concepts. However,
Gonzalez (1996) cautioned that the benefits of animation for decision-making are contingent on properties of the design such as transition smoothness, realism, and interactivity style. Interactive
and animated graphs may overwhelm users by imposing excessive information that could overload their cognitive resources (Lowe, 2003; Morrison & Tversky, 2000). In order for
animation and interactivity to be useful, they must be theory-based and not detract from what is
important in the graphs (Mayer & Chandler, 2001).
So far, only one study has implemented user interactivity in Icon array displays (Ancker et al., 2011). Their study examined user interactivity in Icon arrays for risk judgments. The
authors created a game-like task where the user clicked on masked squares to reveal the color of
the icon. Over time, the user learned the risk proportions as the squares were unmasked. They
found that the interactive visual aid did not significantly affect one’s perception of risk or
perceived usefulness of the visual aid. They also found that users with low familiarity with
computers were more confused by the interactive visual aids than by their static counterparts. One
limitation of this study is the complexity and novelty of the user interaction. Clicking squares in
a game-like manner is a very specific type of interaction that is unique only to a small set of
computer tasks, and is not one that most computer users are accustomed to.
The second limitation of the Ancker et al. (2011) study design is that the implementation
of user interactivity is confounded with the information presentation mode. In the interactive
visual aids, users learned the risk proportions over time – by clicking on individual icons – rather
than at once by looking at a single complete array. Previous research has shown that probability
judgments differ depending on whether the underlying distribution is learned over time (decision from
experience) or at once (decision from description) (Hau, Pleskac, Kiefer, & Hertwig, 2008).
Therefore, it is unclear whether incorporating a simpler interaction, while maintaining the information-gathering process, would improve the comprehension of the interactive visual aid.
The education literature has examined the effectiveness of interactive and animated
visual aids extensively, but decision-making research has only focused on static visual aids (e.g.,
Brooks et al., 2014; Garcia-Retamero & Dhami, 2011; Hess, Visschers, & Siegrist, 2011).
Interactive visual aids are just a small part of a much larger body of scholarship: human-computer interaction (HCI) (Preece, Rogers, Sharp, Benyon, Holland, & Carey, 1994). Because
of the broad nature of the term “interaction”, for the purpose of this study, I constrain the
interactive component of a visual aid by defining it as giving the user control over the basic
appearance of graphical elements (e.g., labels) and the delivery of information. The implementation of interactivity is elaborated in the Method section. I hypothesize that interactive visual aids
will improve user engagement with the decision-aid, and overall comprehension of the data. I
also hypothesize that people will judge the interactive visual aids as more useful for communicating the validity information than their static counterparts.
Hypothesis 2.1: Managers will rate the interactive visual aids as more useful than the
static counterparts of the respective graphs for communicating the validity of
structured interviews.
Hypothesis 2.2: Managers will rate the interactive visual aids as more engaging than the
static counterparts of the respective graphs for communicating the validity of
structured interviews.
Hypothesis 2.3: Managers will rate the interactive visual aids as easier to comprehend than
the static counterparts of the respective graphs for communicating the validity of
structured interviews.
Individual Differences in Graph Literacy
More recently, researchers have begun examining individual differences in the ability to
comprehend graphical information. Two main factors influence the comprehension of graphical
information. The first is content knowledge and the second is graph literacy (Shah & Hoeffner,
2002). Content knowledge is related to one’s interpretation of graphical data. People are more
likely to infer relations and trends in familiar than unfamiliar contexts. Lord et al. (1979) also
found that when information presented in graphs is inconsistent with one’s prior experience,
viewers are more likely to make systematic errors in judgments. Finally, expertise in the content
area also allows the user to make more meaningful interpretations of the data (Chase & Simon,
1976, Egan & Schwartz, 1992).
The second factor that influences graph comprehension is graph literacy, which is the
ability to comprehend graphically presented information (Galesic & Garcia-Retamero, 2011).
Research has shown that graphs are not equally effective communication tools for everyone.
Expert graph viewers are more capable of extracting abstract information from graphs, and less
likely to neglect important elements of a graph (Shah & Hoeffner, 2002; Shah & Freedman,
2011). Graph comprehension also takes less cognitive effort when the user is familiar with the
content or is proficient at reading graphs (Kosslyn, 1985). Finally, expert graph readers have
better memory for graphical displays because they are able to group graphical elements in
meaningful ways (Egan & Schwartz, 1992). Okan et al. (2012) found that visual aids (e.g., Icon array) improved risk comprehension more for individuals with high graph literacy than for those with low.
Given the importance of graph literacy in interpreting information presented in graphs, I
hypothesize that graphical displays (bar graph and Icon array) will be more beneficial for people with high graph literacy than for those with low.
Hypothesis 3.1: There will be an interaction between graph literacy and visual aid
type, such that graphical displays (Icon array and bar graph) will have a greater
effect on perceived usefulness of the visual aid over a non-graphical display (BESD) for managers with high graph literacy than for those with low.
Hypothesis 3.2: There will be an interaction between graph literacy and visual aid
type, such that graphical displays (Icon array and bar graph) will have a greater
effect on perceived comprehension of the visual aid over a non-graphical display (BESD) for managers with high graph literacy than for those with low.
Previous researchers have cautioned about the possible disadvantages of animation and interactivity (Mayer & Chandler, 2001; Morrison & Tversky, 2001). Too many simultaneous
graphical elements can overload one’s cognitive capacity and hinder comprehension (Chandler
& Sweller, 1991; Tindall-Ford, Chandler & Sweller, 1997). The potential cognitive overload
caused by extra graphical elements may be less taxing for individuals with high graph literacy
because they are already proficient at processing basic graphical elements, and therefore, have
the extra cognitive capacity to incorporate additional graphical elements. This suggests that the
addition of interactivity may benefit individuals with high graph literacy more because they have
more cognitive resources available for engaging with and processing the interactive component of
the visual aid.
Hypothesis 4.1: There will be an interaction between graph literacy and user
interactivity, such that interactivity will have a greater effect on perceived
usefulness for managers with high graph literacy than for those with low.
Hypothesis 4.2: There will be an interaction between graph literacy and user
interactivity, such that interactivity will have a greater effect on perceived
comprehension for managers with high graph literacy than for those with low.
Hypothesis 4.3: There will be an interaction between graph literacy and user
interactivity, such that interactivity will have a greater effect on perceived
engagement for managers with high graph literacy than for those with low.
Whereas the previous study on communicating validity information with Icon arrays was administered to a lay audience (Zhang et al., 2014), the present study narrows the target population to managers with experience in hiring. There are both theoretical and methodological advantages to surveying people with relevant context experience. First, doing so would improve the
external validity of the study. The purpose of using visual aids to simplify validity information is
to help people make better decisions with regard to choosing interview methods. Therefore,
surveying those who are in the position of making real world hiring decisions would maximize
the external validity of the study. Second, content knowledge is related to graph interpretation
(Shah & Hoeffner, 2002). People with low content knowledge have to expend additional cognitive resources to process the non-numerical information, whereas people with high content
knowledge are already familiar with concepts such as job interviews and job performance.
Content familiarity also affects people’s interpretation of the graphs. People tend to be better at
inferring relations in familiar contexts than in unfamiliar ones, and in situations where their prior knowledge aligns with the information presented in the graph. A lay population is usually not familiar with job interviews and, therefore, may process the information presented in the graphs differently than those who have experience.
METHOD
Participants
Data were collected on Amazon Mechanical Turk (MTurk). MTurk is a crowdsourcing
service where people participate in online tasks for modest pay. Past research has demonstrated
that the MTurk population generalizes well to a general adult population and that the service is a
valuable platform for conducting workplace-related experimental research (Highhouse & Zhang, 2015; Paolacci, Chandler, & Ipeirotis, 2010). Each participant received 75 cents for completing
the survey, which took approximately 10 minutes.
Multiple steps were taken to ensure that the sample included managers with experience in
conducting job interviews. First, participants who were interested in the survey completed short
screening questions that asked them to indicate their current employment status and employment
industry. Participants who were unemployed were excluded from the survey. Next, participants
saw a list of common work tasks across many occupations (e.g., interact with customers, data
analysis, manual labor, etc.) (Appendix A). These tasks were modified based on a sampling of
major job groups and job tasks listed on O*Net (Onetonline.org). Participants were instructed to
indicate up to five tasks that they most frequently engage in at work. I excluded participants who
did not include either “recruiting/interviewing” or “management” as one of their primary tasks.
Because the survey prevented the same participant from retaking it, participants were discouraged from faking by taking the survey multiple times until they fulfilled the selection criteria. Moreover, the large number of possible responses greatly reduced the likelihood of
participants figuring out the exclusion criteria by chance. Finally, at the end of the study,
participants were asked again whether they were currently in a managerial position and about their previous experience with interviewing job candidates. I excluded participants who did not have
interviewing experience or those who were not in a managerial position. This entire process
yielded 329 completed surveys. Twenty-four participants were removed for missing either of the two attention-check questions (e.g., “If you are still paying attention, please respond with
strongly agree”). The final sample of the study had 305 employees who were either in
managerial positions or had interviewing experience (52% male, Mean age = 37, SD = 10, 80%
Caucasian). Participants held occupations across a wide range of industries, the most popular
being: retail trade (11%), health care and social assistance (10%), and professional, scientific and
technical services (10%).
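For readers who want to mirror the screening logic, the sketch below shows one way to apply the exclusion rules with pandas. It is illustrative only; the column names and task labels are hypothetical placeholders, not the actual survey fields.

    import pandas as pd

    # Toy stand-in for the raw MTurk export.
    raw = pd.DataFrame({
        "employed": [True, True, False],
        "tasks": [["management", "data analysis"],
                  ["manual labor"],
                  ["recruiting/interviewing"]],
    })

    # Keep employed respondents who listed either screening task.
    keep = raw["employed"] & raw["tasks"].map(
        lambda t: "recruiting/interviewing" in t or "management" in t)
    screened = raw[keep]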
Stimulus Material
I created the graphical visual stimuli on Infogr.am (www.infogr.am). Infogr.am is an
online web service for creating custom charts. In the present study, the interactive visual aids were ones where the user could compare the validities of different hiring methods (random, traditional interview, and structured interview) by clicking radio buttons on the survey. The visual aid on the screen displayed the validity information for only one hiring method at a time. The user could take as much time as needed to process the information before clicking a radio button to see a different hiring method. Moreover, the user could go back and review charts that he or she had already seen. There was no constraint on the time spent on each chart or the order in which the charts were displayed. Users were free to engage with the charts in
any order they preferred until they were satisfied with the information. Figures 4 through 9 are screenshots of the static and interactive visual aids.
Subjective Graph Literacy
The subjective graph literacy scale was developed for this study. Some of the items are
modified from the Subjective Numeracy Scale (Fagerlin et al., 2007), which is a self-report
measure for one’s numerical abilities. A sample item is “I am good at creating graphs or charts of
numerical information.” Participants responded to the questionnaire using a 5-point response
scale (1= strongly disagree to 5= strongly agree). The items of the present scale are presented in
Table 4. Because the scale was developed specifically for this study, I conducted a principal axis
factoring with Oblimin rotation to assess its psychometric properties. A parallel analysis recommended a single-component solution, which explained 37% of the total variance with an eigenvalue of 2.93. All items reached the minimum factor loading required for retention (Tabachnick & Fidell, 2001). The scale also had acceptable internal consistency (α = 0.81).
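As a worked illustration of the internal-consistency figure reported above, the sketch below computes Cronbach's alpha from an item-response matrix. The data are simulated placeholders; only the formula is standard.

    import numpy as np

    def cronbach_alpha(responses):
        """Cronbach's alpha for an n-by-k matrix of item responses."""
        responses = np.asarray(responses, dtype=float)
        k = responses.shape[1]                          # number of items
        item_vars = responses.var(axis=0, ddof=1).sum()
        total_var = responses.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars / total_var)

    rng = np.random.default_rng(0)
    toy = (rng.normal(4, 0.7, size=(305, 8))            # simulated 1-5 ratings
           + rng.normal(0, 0.4, size=(305, 1))).clip(1, 5)
    print(round(cronbach_alpha(toy), 2))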
Design and Procedure
This study aimed to improve how statistical validity is communicated to managers. As
such, I examined three different visual aids for communicating validity information: Binomial
effect size display (BESD), bar graph, and Icon array. All three visual aids presented the same
statistical validity information for traditional interviews and structured interviews. I also
examined the benefits of user-interactive visual aids: each visual aid was either static or
interactive. In order to determine the effects of each visual aid and user-interactivity on the
hypothesized outcomes (perceived comprehension, perceived usefulness, engagement), I used a
randomized control trial design where participants were randomly assigned see validity
presented with one of the three visual aids and either a static or interactive version of the visual
aid.
23
The proposed study used a 3 (Visual Aid: BESD vs. Bar graph vs. Icon array) x 2
(Interactivity: Static vs. Interactive) between-subjects design. First, participants read a short
vignette describing the decision scenario, in which they were asked to assume the role of a manager. As a hypothetical manager, they had to choose between a traditional interview and a structured interview for their company (Appendix B). Following the vignette, the participants were
randomly assigned to one of six conditions, each associated with a different graphical visual aid.
Participants read a short description of the visual aid, which remained the same across all six
conditions. Participants had the opportunity to thoroughly examine the graph before proceeding
to the next page (Figure 3). Participants also answered four objective comprehension questions
while reviewing the graphs (Appendix C).
Next, participants continued to the dependent variables page where they responded to
questions regarding their attitudes toward the visual aid. Dependent variables are presented in
Appendix D. After completing the dependent variable questions, participants completed the subjective graph literacy scale. Finally, participants provided basic demographic information
along with their previous experience with making hiring decisions and computer use (Appendix
E). I tested participants’ attentiveness with two attention-check questions (e.g. “Please respond to
this question with ‘strongly disagree’”).
RESULTS
Preliminary Analysis
Means, standard deviations, item intercorrelations, and standardized Cronbach’s alphas of the study’s variables are presented in Table 5. Given that the measures were developed specifically for this study and the high correlation between the perceived usefulness and perceived comprehension measures (r = 0.70), I first conducted an exploratory factor analysis on the study variables to examine their factor structure. I used principal axis factoring with Oblimin rotation.
Although the measures aimed to assess three constructs (perceived comprehension, perceived usefulness, and engagement), parallel analysis retained two components with eigenvalues greater
than one. The first component (eigenvalue = 5.51) accounted for 40% of the total variance. The
second component (eigenvalue = 1.26) accounted for 21% of the total variance. The pattern matrix
from the two-factor solution revealed that the four items for perceived usefulness and two items
from perceived comprehension strongly loaded onto the same factor while the three items for
perceived engagement loaded onto a separate factor (Table 6). The reverse-coded item from the perceived comprehension measure did not reach the factor loading threshold (0.32) recommended for retaining the item (Tabachnick & Fidell, 2001).
I conducted a follow-up principal axis factoring analysis with Oblimin rotation, excluding the negatively worded item. The parallel analysis still recommended a two-component
solution. The first component (eigenvalue = 5.30) explained 44% of the variance and included
the six items intended to assess both perceived usefulness and perceived comprehension. The
second component (eigenvalue = 1.26) explained 23% of the variance and included the three
items intended to assess engagement. Table 7 shows the standardized factor loadings of the items
in the two-factor solution. Given that the items for both perceived usefulness and perceived comprehension loaded highly on the same factor, the six items were combined into a single measure of perceived visual aid usefulness. The negatively worded item was removed from the
analysis.
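For readers who want the shape of this analysis in code, the sketch below runs a two-factor principal axis factoring with Oblimin rotation on simulated stand-in data, using the third-party factor_analyzer package (an assumption on my part; the original analysis may have used different software).

    import numpy as np
    from factor_analyzer import FactorAnalyzer

    # Simulated stand-in for the nine retained item responses.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(305, 2))                  # two latent factors
    loadings = rng.uniform(0.4, 0.8, size=(2, 9))
    items = latent @ loadings + rng.normal(0, 0.6, size=(305, 9))

    # Principal axis factoring with Oblimin rotation, two factors.
    fa = FactorAnalyzer(n_factors=2, rotation="oblimin", method="principal")
    fa.fit(items)
    print(fa.loadings_)                                 # pattern matrix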
Finally, interview experience was not significantly correlated with any of the DVs,
whereas age was significantly correlated with perceived usefulness (r = 0.17) but not
engagement (r = 0.06). Computer experience was significantly correlated with engagement (r =
0.14) but not perceived usefulness (r = 0.09). There was no sex difference in any of the study’s
variables. Controlling for age or computer experience as covariates did not change the results of the tests of the study’s hypotheses. Therefore, these covariates were excluded from the reported ANOVAs for simplicity.
Subjective Graph Literacy. Subjective graph literacy was measured on an interval
scale. Given that the independent variables in the current study are categorical and contain more than two nominal groups, interaction effects between a categorical IV and a continuous IV are more difficult to interpret in a multiple regression analysis. Thus, graph literacy was dichotomized to improve the interpretability of the results. Given that the data are negatively skewed (Figure 10), the most natural cutoff point is the median. A median split was conducted to separate participants’ graph literacy scores into high and low groups. Subsequent tests of the hypothesized effects are reported as ANOVAs where graph literacy is treated as a
dichotomous factor. There are several shortcomings to artificially dichotomizing a continuous
variable. First, it reduces the variability of the results; second, it reduces the statistical power of
the analysis; third, the cut-off values are often subjective (Irwin & McClelland, 2003). Therefore,
to ensure that dichotomizing the subjective graph literacy measure did not reduce the power to
detect the hypothesized effects, I also tested the study’s hypotheses with multiple regression,
while keeping subjective graph literacy as a continuous variable. The results from the multiple
regression were congruent with those found in the ANOVAs.
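The sketch below illustrates the median split described above, again with simulated placeholder scores; the continuous scores are retained alongside the dichotomized factor, which is what allowed the parallel multiple regression check.

    import numpy as np

    rng = np.random.default_rng(0)
    literacy = rng.normal(4.0, 0.6, size=305).clip(1, 5)  # simulated scale scores

    cutoff = np.median(literacy)
    group = np.where(literacy > cutoff, "high", "low")    # dichotomized factor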
Hypotheses Testing
Perceived Visual Aid Usefulness. A summary of the analysis of variance is shown in
Table 8. There was a significant main effect of the visual aid type on participants’ perceived
visual aid usefulness, F(2,287) = 4.48, p < 0.05, η2 = 0.02. The Icon array was judged to be the
most useful (M = 4.37, SD = 0.57), followed by the bar graph (M = 4.28, SD = 0.72) and then the
BESD (M = 4.10, SD = 0.75). Tukey’s post-hoc comparison tests revealed a significant difference between the Icon array and the BESD (Hedges’ g = 0.40, p < 0.01) but not between the Icon array and the bar graph (Hedges’ g = 0.15, p = 0.62) or between the bar graph and the BESD (Hedges’ g = 0.23, p = 0.10). Therefore, Hypothesis 1.1 was partially supported. I found
support for Hypothesis 2.1, as there was a significant main effect of interactivity on perceived
visual aid usefulness, F(1,287) = 8.06, p < 0.01, η2 = 0.02. People judged the interactive visual aids as more useful (M = 4.37, SD = 0.64) than the static visual aids (M = 4.15, SD = 0.71, Hedges’ g = 0.32). There was also a main effect of subjective graph literacy, F(1,287) = 39.67, p
< 0.01, η2 = 0.14. People who scored above the median judged the visual aid to be more useful
(M = 4.46, SD = 0.65) than people who scored below the median (M = 3.96, SD = 0.73, Hedge’s
g = 0.73). However, there was no significant interaction between user interactivity and subjective
graph literacy on perceived usefulness, F(1,287) = 0.44, p = 0.51. Therefore, Hypotheses 4.1 and
4.2 were not supported.
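For readers who want to see how such omnibus and post-hoc tests fit together in code, the following minimal sketch runs Tukey's HSD on the hypothetical `df` from the earlier sketch; it is illustrative only.

```python
# Tukey's HSD across the three visual aid conditions (hypothetical data).
from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey = pairwise_tukeyhsd(endog=df["usefulness"],
                          groups=df["visual_aid"],
                          alpha=0.05)
print(tukey.summary())  # pairwise mean differences with family-wise adjusted p-values
```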
There was a significant interaction between the visual aid condition and subjective graph
literacy, F(2, 287) = 5.29, p < 0.01, η² = 0.04. As recommended by Rosnow and Rosenthal
(1991), I calculated individual cell means to better illustrate the interaction between
subjective graph literacy and visual aid type (Table 9). Furthermore, pairwise comparisons of the
perceived usefulness of each visual aid were made between people with high and low subjective
graph literacy. To control for inflated family-wise error in the multiple comparisons, the Type I
error rate was adjusted for the number of comparisons (3) using the Bonferroni correction
(Dunn, 1961), resulting in an adjusted Type I error rate of 0.017. There was no significant
difference in the perceived usefulness of the BESD between people with high subjective graph
literacy (M = 4.19, SD = 0.77) and people with low graph literacy (M = 4.01, SD = 0.74,
Hedges' g = 0.24), t(86) = 1.12, p = 0.27. However, there was a difference in the perceived
usefulness of the bar graph between people with high subjective graph literacy (M = 4.57, SD =
0.48) and people with low graph literacy (M = 3.81, SD = 0.78, Hedges' g = 1.23), t(55) = 5.73,
p < 0.01. There was also a difference in the perceived usefulness of the Icon array between
people with high subjective graph literacy (M = 4.52, SD = 0.56) and people with low graph
literacy (M = 4.10, SD = 0.48, Hedges' g = 0.78), t(86) = 4.10, p < 0.01. These results suggest
that subjective graph literacy played a role in people's attitudes toward the two graphical displays
(bar graph vs. Icon array) but not the table (BESD).
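The Bonferroni procedure described above can be sketched as follows; the condition labels are illustrative placeholders, not the study's actual coding.

```python
# Bonferroni-adjusted simple-effects t-tests (hypothetical column names).
from scipy import stats

alpha_adjusted = 0.05 / 3  # three comparisons -> 0.017, as in the text
for aid in ["BESD", "bar_graph", "icon_array"]:
    high = df.loc[(df["visual_aid"] == aid) & (df["gl_group"] == "high"), "usefulness"]
    low = df.loc[(df["visual_aid"] == aid) & (df["gl_group"] == "low"), "usefulness"]
    t, p = stats.ttest_ind(high, low)
    verdict = "significant" if p < alpha_adjusted else "not significant"
    print(f"{aid}: t = {t:.2f}, p = {p:.3f} ({verdict})")
```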
To better understand the role of subjective graph literacy in the perceived usefulness of
graphical displays, I conducted a follow-up ANOVA on the two graphical displays only (bar
graph vs. Icon array). There was no significant main effect of visual aid type on perceived
usefulness, F(1, 202) = 0.99, p = 0.32, η² = 0.00. There was, however, a main effect of subjective
graph literacy, F(1, 202) = 57.4, p < 0.01, η² = 0.28. There was also a significant interaction
between graph type and subjective graph literacy, F(1, 202) = 4.44, p < 0.05, η² = 0.03 (see
Figure 11). People with low graph literacy perceived the Icon array to be more useful than the
bar graph, t(63) = 2.17, p < 0.05, Hedges' g = 0.43, whereas people with high graph literacy did
not perceive the two graphical visual aids to differ in usefulness, t(130) = 0.55, p = 0.58, Hedges'
g = 0.11. These results suggest that people with high subjective graph literacy perceive the
bar graph and the Icon array as equally useful, whereas people with low graph literacy
perceive the Icon array to be more useful than the bar graph.
Visual Aid Engagement. A summary of the analysis of variance is shown in Table 10.
There was a main effect of user interactivity on people's engagement with the visual aid,
F(1, 281) = 9.85, p < 0.01, η² = 0.04. People who saw the interactive visual aid judged the visual
aid as more engaging (M = 4.24, SD = 0.84) than those who saw a static visual aid (M = 3.97, SD
= 0.72, Hedges' g = 0.36). Therefore, Hypothesis 2.2 was supported. There was also a significant
main effect of subjective graph literacy on visual aid engagement, F(1, 281) = 23.00, p < 0.01, η²
= 0.08. People who scored above the median on the subjective graph literacy scale rated the
visual aid as more engaging (M = 4.28, SD = 0.80) than those who scored below the median (M
= 3.85, SD = 0.74, Hedges' g = 0.55). However, there was no significant effect of visual aid type
on engagement, F(2, 281) = 1.37, p = 0.25, η² = 0.01, nor was there a significant interaction
between user interactivity and subjective graph literacy, F(1, 281) = 0.98, p = 0.32. Therefore,
Hypothesis 4.3 was not supported.
Objective Comprehension Test. I computed the total number of correct answers to the
objective comprehension questions for each subject. Scores ranged from zero to four. The four
objective comprehension questions varied in difficulty. The proportions of correct responses for
questions 1 through 4 were 97%, 94%, 91%, and 78%, respectively, which indicates that the
questions were too easy and produced a severe ceiling effect. Given that the variable is ordinal
and its distribution is highly skewed, I used ordered logistic regression to analyze the data.
Ordered logistic regression is a variant of logistic regression that allows for more than two
response categories (Hardin & Hilbe, 2007). It also makes fewer distributional
assumptions, such as normality of the variables and homogeneity of variances. To conduct an
ordered logistic regression with a categorical predictor with more than two levels, I created
k − 1 dummy-coded variables, with one level of the independent variable serving as the reference
against which the other two levels are compared. Because the test of objective comprehension
was not planned, I did not have a theoretical rationale for choosing the reference level. Therefore,
I took an empirical approach and first calculated the mean number of correct answers for the
BESD, bar graph, and Icon array conditions. The mean numbers of correct answers for the three
conditions were 3.62, 3.52, and 3.64, respectively, indicating that people who saw the bar graph
had the lowest average score and people who saw the BESD and Icon array both scored higher.
Therefore, I used the bar graph condition as the reference point. Table 11 contains the summary
of the ordered logistic regression analysis. The model likelihood ratio test showed that the
logistic model was a good fit for the data, χ²(4) = 11.24, p < 0.05. An ordered logistic regression
is interpreted in the same way as a logistic regression: the exponentiated B represents the odds
ratio of moving from one category of the DV to the next for every unit increase in the predictor.
As shown in the table, there was a marginally significant effect of the Icon array variable on the
number of correct responses, Exp(B) = 1.77, Wald's z = 1.85, p = 0.06. The odds of scoring in a
higher category were 1.77 times greater for participants who saw the Icon array than for those
who saw the bar graph. There was no significant effect of the BESD. Alternatively, one can
interpret the results by examining the predicted probabilities in Table 12, which shows the
predicted probability of each score in each of the three visual aid conditions. Participants in the
Icon array condition were more likely to achieve a perfect score (75.3%) than those in the bar
graph condition (63.2%).
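A hedged sketch of this analysis appears below, using the `OrderedModel` class from statsmodels. The DV column `n_correct` and the predictor names are assumptions carried over from the earlier sketches; the bar graph serves as the dummy-coding reference, as in the text.

```python
# Ordered logistic regression on the number of correct answers (0-4),
# with hypothetical column names and the bar graph as the reference level.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

X = pd.get_dummies(df["visual_aid"]).drop(columns="bar_graph")  # k - 1 dummies
X["interactive"] = df["interactive"]
X["graph_literacy_high"] = (df["gl_group"] == "high").astype(int)

fit = OrderedModel(df["n_correct"], X.astype(float), distr="logit").fit(
    method="bfgs", disp=False
)
print(fit.summary())
print(np.exp(fit.params[: X.shape[1]]))  # Exp(B): odds ratios for the predictors
```

Predicted probabilities per score category, analogous to Table 12, could then be obtained with `fit.predict(X)`.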
Separate analyses for each question showed that the effect was primarily driven by the
difference in performance on question four of the objective comprehension test. Logistic
regression showed that people who saw the Icon array were 3.07 times more likely to answer
question four correctly than people who saw the bar graph, Exp(B) = 3.07, Wald's z = 3.15, p <
0.01. There was a marginal difference in performance between people who saw the bar graph
and the BESD: people who saw the BESD were 1.71 times more likely to answer question four
correctly than people who saw the bar graph, Exp(B) = 1.71, Wald's z = 1.61, p = 0.10. Table 13
shows the summary of the logistic regression analysis.
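The per-question follow-up is an ordinary binary logistic regression; a minimal sketch, assuming a hypothetical 0/1 column `q4_correct` that scores question four, is shown below.

```python
# Binary logistic regression for question four (hypothetical column names).
import numpy as np
import pandas as pd
import statsmodels.api as sm

X = pd.get_dummies(df["visual_aid"]).drop(columns="bar_graph")  # bar graph = reference
X["interactive"] = df["interactive"]
X["graph_literacy"] = df["graph_literacy"]
X = sm.add_constant(X.astype(float))

fit_q4 = sm.Logit(df["q4_correct"].astype(int), X).fit(disp=False)
print(np.exp(fit_q4.params))  # odds ratios; cf. the Exp(B) values in Table 13
```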
DISCUSSION
Effective communication of statistical information is paramount for good managerial
decision-making (Brooks et al., 2014; Kuncel & Rigdon, 2012). The science-practitioner gap in
personnel selection can be partly attributed to the lack of awareness or understanding of the
validity evidence from academic research (Colbert, Rynes, & Brown, 2005; Rynes, Bartunek, &
Daft, 2001). The structured interview, specifically, is underutilized and undervalued because its
benefits are not well communicated to or understood by managers (Roulin & Bangerter,
2012; Van der Zee et al., 2002).
A growing body of research has shown that "non-traditional" presentations of numerical
information, such as the Binomial Effect Size Display and the Icon array, are more effective for
communicating complicated statistical and probabilistic information to non-experts (Ancker et
al., 2010; Brooks et al., 2014; Garcia-Retamero et al., 2013). Given the limited awareness of the
benefits of evidence-based hiring practices among the non-academic population, there is an
emerging need to improve how those benefits are communicated. Furthermore, with the
prevalence of technology in people's everyday lives, research should examine how digital
platforms (e.g., computers and mobile devices) can be integrated into communicating statistical
information. This dissertation pursued these goals by examining the benefits of user-interactive
graphical displays for the comprehension of statistical validity information.
As hypothesized, managers judged the Icon array as more useful than the BESD for
communicating the benefits of structured interviews. However, there was no significant
difference in perceived usefulness between the Icon array and the bar graph. These results
suggest that, although all three visual aids presented the same type of information (comparing
expected outcomes of various hiring aids), the physical presentation of that information does
influence people's perception of the visual aid's usefulness; there appears to be an advantage of
graphical displays over numerical-only displays. This is consistent with research on how
graphs are used to communicate difficult quantitative and probabilistic information in risk
communication and quantitative education (Gaissmaier et al., 2012; Garcia-Retamero & Cokely,
2013).
The perceived usefulness of the visual aids was moderated by individual differences in
subjective graph literacy. As expected, subjective graph literacy influenced people's perceived
usefulness of both graphical displays (bar graph and Icon array) but not the tabular display
(BESD). These results are consistent with the theoretical rationale behind the graph literacy
construct and with other empirical findings. Because subjective graph literacy measures
individual differences in the ability to extract and interpret information from a graphical display,
it should affect the perceived usefulness of graphs but not of tabular displays.
Previous research has also found that graphical visual aids are more useful for people with higher
graph literacy (Okan et al., 2012). These results also suggest that, for non-expert graph readers,
graphical displays are about as useful as the BESD.
More importantly, there was a difference in the role of graph literacy for the two
graphical displays (bar graph vs. Icon array). People with low graph literacy judged the Icon
array to be more useful than the bar graph while people with high graph literacy judged both
graphical displays as equally useful. These results suggest that the Icon array might not require
the same graph processing skills as a bar graph, which makes Icon arrays useful even for people
with low graph literacy. The benefits of the Icon array may be attributed to its design differences
from the bar graph. These design differences map onto the three components of graph
comprehension (Cleveland, 1993; Pinker, 1990; Shah, Mayer, & Hegarty, 1999): first, viewers
must identify key visual features (e.g., bars or icons); next, viewers must relate those visual
features to the conceptual relationships depicted in the graph (expected hiring outcomes
across interview methods); and finally, viewers must identify the concepts being quantified in
the graph (e.g., applicants). In an Icon array, the key features of the graph (human silhouettes)
are also representations of the concepts depicted in the graph (applicants). This feature
removes the need for graph viewers to identify abstract visual features and
associate them with a real-world concept, which is a central component of graph comprehension.
Individual human-shaped icons also automatically evoke an association with people,
whereas in a bar graph the association is less direct; the viewer has to refer to additional labels
on the axis to infer the meaning of the bars. In other words, the design of the Icon array removes
some of the barriers in the graph comprehension process, making it easier for even novice graph
readers to understand its meaning.
Visual aid type also affected people's scores on the objective graph comprehension test.
People who saw the Icon array and the BESD were more likely to make correct objective
inferences from the data than people who saw the bar graph. In other words, even though the two
graphical displays (Icon array and bar graph) were perceived as equally useful and easy to
understand, they differed when readers had to make objective inferences from the graphical
information. Moreover, even though the BESD was perceived as less useful than the bar graph,
people actually performed marginally better on the objective comprehension test when they saw
the BESD than when they saw the bar graph. These results suggest that the benefits of the
different displays may be contingent on the nature of the criteria and the problem-solving task.
The differential effects of visual aid type on the dependent variables can be attributed to
the proximity compatibility principle (Carswell, 1992; Carswell & Wickens, 1987), which states
that the usefulness of a graphical display depends on the nature of the task it is meant to support.
In graphical representations of numerical information, there is a trade-off between the
ability to accurately perceive precise numerical values and the ability to infer the gist of the data
(Shah & Hoeffner, 2002). Tabular displays of numbers, such as the BESD,
are best at presenting single point estimates but do not provide integrative gist information
about the pattern in the data (Guthrie, Weber, & Kimmerly, 1993). Bar graphs, on the other hand,
emphasize comparisons of numerical values across categories, as highlighted by the heights of
the bars, which makes them better suited to conveying gist information about comparisons of
quantity (Shah et al., 1999). The objective comprehension question that asks, "What percentage
of applicants are expected to succeed when using a traditional interview?" requires people to
extract exact numerical values from the visual aid, which the BESD displays best. Subjective
questions about the overall perceived utility of the display, by contrast, focus on the gist of the
data, which is easier to extract from a bar graph. The difference in task type between the
subjective and objective dependent variables may explain why the BESD, while perceived to be
less useful, actually resulted in slightly better performance on the objective comprehension test
than the bar graph.
The Icon array, on the other hand, has design features that highlight both gist
information and precise numerical estimates. Graphically, the Icon array resembles the bar graph
in that it conveys gist information through the overall size of the arrays. As in a bar graph,
numerical magnitude is represented by the height of the array, making the process of extracting
gist information similar to that in a bar graph. Moreover, because the Icon array uses individual
icons to represent frequencies, it also draws attention to the precise value of each array, making
it easy for the viewer to extract precise numerical information.
Results also showed that the objective measure of comprehension was uncorrelated with
the subjective measure of comprehension and usefulness. Objective comprehension assessed the
viewer’s ability to extract precise numerical information while the subjective measure of
comprehension assessed the viewer’s overall understanding of the gist information presented in
the graph. The two different types of comprehension may be differentially beneficial for different
organizational decisions and goals. If the goal is to communicate and compare the effectiveness
of assessment methods or interventions and to improve people’s intention to adopt these
methods, then it is more beneficial to use graphical visual aids that improve people’s
understanding of the overall pattern in the data. On the other hand, tabular displays have their
use. Precise quantitative information can be useful when making specific forecasts. For instance,
if a manager has to make precise performance projections based on a particular assessment
method, a tabular display with numerical information might be more useful. Nevertheless, as
shown in this study, the Icon array excels at presenting both gist information and precise
numerical information, making it the best of both worlds.
The benefits of user-interactive graphical displays are more controversial (Ancker et al.,
2011; Mayer & Chandler, 2001; Zikmund-Fisher et al., 2011). Many researchers have cautioned
that user-interactivity can backfire if the implementation is too complex, distracting, or does not
aid information processing (Gonzalez, 1996; Mayer, 2000; Morrison & Tversky, 2000). The
user-interactive component of the visual aids examined in this study was simple and intuitive:
participants did not need additional instructions to understand how to interact with the graphs.
Moreover, the interactive component prompted users to compare the efficacy of
the different hiring practices, which is relevant to the context of the information being
communicated. Thus, the user-interactivity in this study satisfies the basic requirements of
an effective implementation. As expected, people were more engaged with the interactive visual
aids and found the information easier to understand. There was, however, no interaction between
user-interactivity and subjective graph literacy. This finding suggests that the benefits of
user-interactivity, as implemented in this study, do not require high graph comprehension skills.
Limitations and Future Directions
There are several limitations to this study. First, the study assumes a selection ratio of
0.50; in other words, it assumes that 50% of the population of job applicants would be successful
at the particular job examined in the study. In the real world, the selection ratio varies
considerably across jobs. For more technically demanding jobs or executive-level positions, the
selection ratio could be much lower, and the expected outcomes of different hiring
methods would change accordingly. Future research should address this issue by presenting
the validity of different interviewing methods across different types of occupations and by
varying statistical parameters such as the selection ratio or the effect size. The second limitation
concerns the psychometric properties of the objective comprehension questions. The questions
used in this study were fairly easy for the sample: more than 90% of participants answered the
first three questions correctly. The limited variability in the criterion may have suppressed the
true statistical effects of the different visual aids. Future research should examine more difficult
objective comprehension questions.
Practical Implications
This study has several practical implications. Although most academic journals have
guidelines for reporting statistical information that enhance understandability and maintain
statistical rigor, no such guidelines exist for communicating validity information in
non-academic formats. The methods and principles of statistical communication examined in the
present study can serve as possible guidelines. The graphical displays used in the study can also
be used to communicate the value of various psychological services; for example, consulting
firms and test development companies can use these displays to communicate the value of their
services and products. Finally, simplifying scientific evidence can also reduce the
science-practice gap across many academic disciplines, especially those where evidence-based
practices are not always adopted.
Conclusion
Industrial and organizational psychologists have long struggled with persuading
organizations to adopt evidence-based hiring practices (Highhouse, 2008; Lawshe & Bolda,
1958; Rynes, 2009; Rynes, Colbert, & Brown, 2002). One major cause of this struggle is the lack
of clear and understandable means of communicating complex statistical evidence. This
dissertation improved on how statistical evidence is communicated by using Icon arrays.
Compared with non-graphical tables and bar graphs, Icon arrays were perceived to be easier to
understand and enhanced numerical interpretation of statistical information. Principles of
effective statistical communication, as demonstrated in this study, have the potential to inform
how scientists present their research to policy makers, managers, and practitioners across many
scientific disciplines.
REFERENCES
Ancker, J. S., Senathirajah, Y., Kukafka, R., & Starren, J. B. (2006). Design features of graphs in
health risk communication: a systematic review. Journal of the American Medical Informatics
Association, 13(6), 608–618.
Ancker, J. S., Weber, E. U., & Kukafka, R. (2011). Effects of game-like interactive graphics on risk
perceptions and decisions. Medical Decision Making, 31(1), 130–142.
Arthur, J. B. (1994). Effects of human resource systems on manufacturing performance and turnover.
Academy of Management Journal, 37(3), 670–687.
Ayres, I. (2008). Super Crunchers: how anything can be predicted. Hachette UK.
Bailey, J. R., & Eastman, W. N. (1996). Tensions between science and service in organizational
scholarship. The Journal of Applied Behavioral Science, 32(4), 350.
Barrick, M. R., Patton, G. K., & Haugland, S. N. (2000). Accuracy of interviewer judgments of job
applicant personality traits. Personnel Psychology, 53(4), 925–951.
Beaton, A. E., & Barone, J. L. (1981). The usefulness of selection tests in college admissions. ETS
Research Report Series, 1981(1), 1–17.
Bettman, J. R., & Zins, M. A. (1979). Information format and choice task effects in decision making.
Journal of Consumer Research, 141–153.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological
Review, 111(4), 1061.
Boucheix, J.-M., & Guignard, H. (2005). What animated illustrations conditions can improve
technical document comprehension in young students? Format, signaling and control of the
presentation. European Journal of Psychology of Education, 20(4), 369–388.
Bridgeman, B., Pollack, J., & Burton, N. (2004). Understanding what SAT Reasoning Test™ scores
add to high school grades: A straightforward approach. ETS Research Report Series, 2004(2), 1–
20.
Brogden, H. E. (1946). On the interpretation of the correlation coefficient as a measure of predictive
efficiency. Journal of Educational Psychology, 37(2), 65.
Brooks, M. E., Dalal, D. K., & Nolan, K. P. (2014). Are common language effect sizes easier to
understand than traditional effect sizes? The Journal of Applied Psychology, 99(2), 332–40.
Buckley, M. R., Christine Norris, A., & Wiese, D. S. (2000). A brief history of the selection
interview: May the next 100 years be more fruitful. Journal of Management History, 6(3), 113–
126.
Campbell, J. P., Daft, R. L., & Hulin, C. L. (1982). What to study: Generating and developing
research questions (Vol. 32). Beverly Hills, CA: Sage.
Carswell, C. M. (1992). Choosing specifiers: An evaluation of the basic tasks model of graphical
perception. Human Factors: The Journal of the Human Factors and Ergonomics Society, 34(5),
535–554.
Carter, L. F. (1947). An experiment on the design of tables and graphs used for presenting numerical
data. Journal of Applied Psychology, 31(6), 640.
Cascio, W. F. (1976). Turnover, biographical data, and fair employment practice. Journal of Applied
Psychology, 61(5), 576.
Cascio, W. F. (1977). Formal education and police officer performance. Journal of Police Science &
Administration, 5(1), 89–96.
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and
Instruction, 8(4), 293–332.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4(1), 55–81.
Cleveland, W. S., & McGill, R. (1985). Graphical perception and graphical methods for analyzing
scientific data. Science, 229(4716), 828–833.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd edition). Hillsdale, N.J:
Routledge.
Colbert, A. E., Rynes, S. L., & Brown, K. G. (2005). Who believes us? Understanding managers’
agreement with human resource research findings. The Journal of Applied Behavioral Science,
41(3), 304–325.
Dana, J., Dawes, R., & Peterson, N. (2013). Belief in the unstructured interview: The persistence of
an illusion. Judgment and Decision Making, 8(5), 512–520.
Murphy, K. R., & Davidshofer, C. O. (2005). Psychological testing: Principles and applications.
Upper Saddle River, NJ: Pearson/Prentice Hall.
Denis, M. (1984). Imagery and prose: A critical review of research on adults and children. Text –
Interdisciplinary Journal for the Study of Discourse, 4(4), 381–402.
Dipboye, R. L. (1997). Structured selection interviews: Why do they work? Why are they
underutilized? In International handbook of selection and assessment (pp. 455–474). London: J.
Wiley.
Egan, D. E., & Schwartz, B. J. (1979). Chunking in recall of symbolic drawings. Memory &
Cognition, 7(2), 149–158.
Fagerlin, A., Wang, C., & Ubel, P. A. (2005). Reducing the influence of anecdotal reasoning on
people’s health care decisions: is a picture worth a thousand statistics? Medical Decision
Making, 25(4), 398–405.
Fagerlin, A., Zikmund-Fisher, B. J., Ubel, P. A., Jankovic, A., Derry, H. A., & Smith, D. M. (2007).
Measuring Numeracy without a Math Test: Development of the Subjective Numeracy Scale.
Medical Decision Making, 27(5), 672–680. http://doi.org/10.1177/0272989X07304449
Feldman-Stewart, D., Kocovski, N., McConnell, B. A., Brundage, M. D., & Mackillop, W. J. (2000).
Perception of quantitative information for treatment decisions. Medical Decision Making, 20(2),
228–238.
Gaissmaier, W., Wegwarth, O., Skopec, D., Müller, A.-S., Broschinski, S., & Politi, M. C. (2012).
Numbers can be worth a thousand pictures: Individual differences in understanding graphical and
numerical representations of health-related information. Health Psychology, 31, 286–296.
Galesic, M., & Garcia-Retamero, R. (2011). Graph literacy: a cross-cultural comparison. Medical
Decision Making : An International Journal of the Society for Medical Decision Making, 31(3),
444–57. http://doi.org/10.1177/0272989X10373805
Galesic, M., Garcia-Retamero, R., & Gigerenzer, G. (2009). Using icon arrays to communicate
medical risks: overcoming low numeracy. Health Psychology, 28(2), 210.
Garcia-Retamero, R., & Cokely, E. T. (2013). Communicating Health Risks With Visual Aids.
Current Directions in Psychological Science, 22(5), 392–399.
http://doi.org/10.1177/0963721413491570
Garcia-Retamero, R., & Dhami, M. K. (2011). Pictures speak louder than numbers: on
communicating medical risks to immigrants with limited non-native language proficiency.
Health Expectations, 14, 46–57. http://doi.org/10.1111/j.1369-7625.2011.00670.x
Garcia-Retamero, R., Galesic, M., & Gigerenzer, G. (2010). Do icon arrays help reduce denominator
neglect? Medical Decision Making, 30(6), 672–684.
Garcia-Retamero, R., Okan, Y., & Cokely, E. T. (2012). Using visual aids to improve communication
of risks about health: a review. TheScientificWorldJournal, 2012, 562637.
http://doi.org/10.1100/2012/562637
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction:
Frequency formats. Psychological Review, 102(4), 684.
Gonzalez, C. (1996). Does animation in user interfaces improve decision making? In Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems (pp. 27–34). ACM.
Guion, R. M. (2011). Assessment, measurement, and prediction for personnel decisions. Taylor &
Francis.
Guthrie, J. T., Weber, S., & Kimmerly, N. (1993). Searching documents: Cognitive processes and
deficits in understanding graphs, tables, and illustrations. Contemporary Educational
Psychology, 18(2), 186–221.
Haensly, P. A., Lupkowski, A. E., & McNamara, J. F. (1987). The chart essay: A strategy for
communicating research findings to policymakers and practitioners. Educational Evaluation and
Policy Analysis, 9(1), 63–75.
Hardin, J. W., & Hilbe, J. M. (2007). Generalized linear models and extensions. Stata Press.
Hau, R., Pleskac, T. J., Kiefer, J., & Hertwig, R. (2008). The description–experience gap in risky
choice: the role of sample size and experienced probabilities. Journal of Behavioral Decision
Making, 21(5), 493–518. http://doi.org/10.1002/bdm.598
Hess, R., Visschers, V. H., & Siegrist, M. (2011). Risk communication with pictographs: the role of
numeracy and graph processing. Judgment and Decision Making, 6(3), 263–274.
Highhouse, S. (2008). Stubborn reliance on intuition and subjectivity in employee selection.
Industrial and Organizational Psychology, 1(3), 333–342.
Highhouse, S., & Zhang, D. (2015). The New Fruit Fly for Applied Psychological Research.
Industrial and Organizational Psychology, 8(02), 179–183. http://doi.org/10.1017/iop.2015.22
Hoffrage, U., Lindsey, S., Hertwig, R., & Gigerenzer, G. (2000). Communicating statistical
information. Science, 290(5500), 2261–2262.
Huffcutt, A. I., & Arthur, W. (1994). Hunter and Hunter (1984) revisited: Interview validity for
entry-level jobs. Journal of Applied Psychology, 79(2).
Huselid, M. A. (1995). The impact of human resource management practices on turnover,
productivity, and corporate financial performance. Academy of Management Journal, 38(3),
635–672.
Irwin, J. R., & McClelland, G. H. (2003). Negative Consequences of Dichotomizing Continuous
Predictor Variables. Journal of Marketing Research, 40(3), 366–371.
Johns, G. (1993). Constraints on the adoption of psychology-based personnel practices: lessons from
organizational innovation. Personnel Psychology, 46(3), 569–592.
Kidder, W. C., & Rosner, J. (2002). How the SAT Creates Built-in-Headwinds: An Educational and
Legal Analysis of Disparate Impact. Santa Clara L. Rev., 43, 131.
Kosslyn, S. M. (1985). Graphics and human information processing: a review of five books. Journal
of the American Statistical Association, 80(391), 499–512.
Kuncel, N., & Rigdon, J. (2012). Communicating Research Findings. In Handbook of Psychology,
Industrial and Organizational Psychology. John Wiley & Sons.
Latham, G. P., & Finnegan, B. J. (1993). Perceived practicality of unstructured, patterned, and
situational interviews. Personnel Selection and Assessment: Individual and Organizational
Perspectives, 41–55.
Lawshe, C. H., Bolda, R. A., Brune, R. L., & Auclair, G. (1958). Expectancy charts II. Their
theoretical development. Personnel Psychology, 11(4), 545–559.
Levashina, J., Hartwell, C. J., Morgeson, F. P., & Campion, M. A. (2014). The structured employment
interview: Narrative and quantitative review of the research literature. Personnel Psychology,
67(1), 241–293.
Levie, W. H., & Lentz, R. (1982). Effects of text illustrations: A review of research. ECTJ, 30(4),
195–232.
Lewandowsky, S., & Spence, I. (1989). The perception of statistical graphs. Sociological Methods &
Research, 18(2-3), 200–242.
Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The
effects of prior theories on subsequently considered evidence. Journal of Personality and Social
Psychology, 37(11), 2098.
Lowe, R. K. (2003). Animation and learning: Selective processing of information in dynamic
graphics. Learning and Instruction, 13(2), 157–176. http://doi.org/10.1016/S0959-4752(02)00018-X
Macdonald-Ross, M. (1977). How numbers are shown. AV Communication Review, 25(4), 359–409.
MacIver, R., Anderson, N., Costa, A.-C., & Evers, A. (2014). Validity of Interpretation: A user
validity perspective beyond the test score. International Journal of Selection and Assessment,
22(2), 149–164.
Mattern, K. D., Shaw, E. J., & Kobrin, J. L. (2011). An Alternative Presentation of Incremental
Validity Discrepant SAT and HSGPA Performance. Educational and Psychological
Measurement, 71(4), 638–662. http://doi.org/10.1177/0013164410383563
Mayer, R. E., & Chandler, P. (2001). When learning is just a click away: Does simple user interaction
foster deeper understanding of multimedia messages? Journal of Educational Psychology, 93(2),
390–397. http://doi.org/10.1037/0022-0663.93.2.390
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of
employment interviews: A comprehensive review and meta-analysis. Journal of Applied
Psychology, 79(4), 599.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological
Bulletin, 111(2), 361.
Carswell, C. M., & Wickens, C. D. (1987). Information integration and the object display: An
interaction of task demands and display superiority. Ergonomics, 30(3), 511–527.
Morrison, J. B., & Tversky, B. (2001). The (in)effectiveness of animation in instruction. In CHI '01
Extended Abstracts on Human Factors in Computing Systems (pp. 377–378). ACM.
Morrison, J. B., Tversky, B., & Betrancourt, M. (2000). Animation: Does it facilitate learning. In
AAAI spring symposium on smart graphics (pp. 53–59).
Murphy, K. R., & Davidshofer, C. O. (1988). Psychological testing. Principles, and Applications,
Englewood Cliffs.
Nolan, K. P., & Highhouse, S. (2014). Need for autonomy and resistance to standardized employee
selection practices. Human Performance, 27(4), 328–346.
Okan, Y., Garcia-Retamero, R., Cokely, E. T., & Maldonado, A. (2012). Individual differences in
graph literacy: Overcoming denominator neglect in risk comprehension. Journal of Behavioral
Decision Making, 25(4), 390–401.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test
validities: Findings and implications for personnel selection and theories of job performance.
Journal of Applied Psychology, 78(4), 679–703. http://doi.org/10.1037/0021-9010.78.4.679
Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on amazon mechanical
turk. Judgment and Decision Making, 5(5), 411–419.
Perez, E. C., & White, M. A. (1985). Student evaluation of motivational and learning attributes of
microcomputer software. Journal of Computer-Based Instruction. Retrieved from
http://psycnet.apa.org/psycinfo/1986-10370-001
Pfeffer, J., & Lammerding, C. (1981). Power in organizations (Vol. 33). Pitman Marshfield, MA.
Pinker, S. (1990). A theory of graph comprehension. Artificial Intelligence and the Future of Testing,
73–126.
Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., & Carey, T. (1994). Human-computer
interaction. Addison-Wesley Longman Ltd.
Priem, R. L., & Rosenstein, J. (2000). Is organization theory obvious to practitioners? A test of one
established theory. Organization Science, 11(5), 509–524.
Rieber, L. P. (1990). Using computer animated graphics in science instruction with children. Journal
of Educational Psychology, 82(1), 135.
Rosenthal, R. (2005). Binomial Effect Size Display. In Encyclopedia of Statistics in Behavioral
Science. John Wiley & Sons, Ltd.
Rosenthal, R., & Rubin, D. B. (1982). A simple, general purpose display of magnitude of
experimental effect. Journal of Educational Psychology, 74(2), 166.
Rosnow, R. L., & Rosenthal, R. (1991). If you’re looking at the cell means, you’re not looking at only
the interaction (unless all main effects are zero).
Roulin, N., & Bangerter, A. (2012). Understanding the academic–practitioner gap for structured
interviews: "Behavioral" interviews diffuse, "structured" interviews do not. International Journal
of Selection and Assessment, 20(2), 149–158.
Ryan, A. M., & Sackett, P. R. (1987). A survey of individual assessment practices by I/O
psychologists. Personnel Psychology, 40(3), 455–488.
Rynes, S. (2009). The research-practice gap in industrial-organizational psychology and related fields:
Challenges and potential solutions.
Rynes, S. L., Bartunek, J. M., & Daft, R. L. (2001). Across the Great Divide: Knowledge Creation
and Transfer between Practitioners and Academics. The Academy of Management Journal,
44(2), 340–355. http://doi.org/10.2307/3069460
Rynes, S. L., Colbert, A. E., & Brown, K. G. (2002). HR professionals’ beliefs about effective human
resource practices: Correspondence between research and practice. Human Resource
Management, 41(2), 149–174.
Salgado, J. F. (2002). The Big Five personality dimensions and counterproductive behaviors.
International Journal of Selection and Assessment, 10, 117–125.
Schmidt, F., & Hunter, J. (1998). The validity and utility of selection methods in personnel
psychology: Practical and theoretical implications of 85 years of research findings.
Psychological Bulletin, 124(2), 262–274.
Schrader, W. B. (1965). A taxonomy of expectancy tables. Journal of Educational Measurement,
2(1), 29–35.
Schuler, H. (1993). Social validity of selection situations: A concept and some empirical results. In H.
Schuler, J. L. Farr, & M. Smith (Eds.), Personnel selection and assessment: Individual and
organizational perspectives (pp. 11–26). Hillsdale, NJ, England: Lawrence Erlbaum Associates,
Inc.
Shah, P., & Hoeffner, J. (2002). Review of graph comprehension research: Implications for
instruction. Educational Psychology Review, 14(1), 47–69.
Shah, P., Mayer, R. E., & Hegarty, M. (1999). Graphs as aids to knowledge construction: Signaling
techniques for guiding the process of graph comprehension. Journal of Educational Psychology,
91(4), 690.
Soyer, E., & Hogarth, R. M. (2012). The illusion of predictability: How regression statistics mislead
experts. International Journal of Forecasting, 28(3), 695–711.
Spiegelhalter, D., Pearson, M., & Short, I. (2011). Visualizing Uncertainty About the Future. Science,
333(6048), 1393–1400. http://doi.org/10.1126/science.1191181
Tabachnick, B. G., Fidell, L. S., & Osterlind, S. J. (2001). Using multivariate statistics.
Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical
effectiveness of tests in selection: discussion and tables. Journal of Applied Psychology, 23(5),
565.
Terpstra, D. E., & Rozell, E. J. (1997). Why some potentially effective staffing practices are seldom
used. Public Personnel Management, 26(4), 483–495.
Tiffin, J., & Vincent, N. L. (1960). Comparison of empirical and theoretical expectancies. Personnel
Psychology, 13(1), 59–64.
Tindall-Ford, S., Chandler, P., & Sweller, J. (1997). When two sensory modes are better than one.
Journal of Experimental Psychology: Applied, 3(4), 257.
Van der Zee, K. I., Bakker, A. B., & Bakker, P. (2002). Why are structured interviews so rarely used
in personnel selection? Journal of Applied Psychology, 87(1), 176.
Volkov, A., & Laing, G. (2012). Assessing the Value of Graphical Presentations in Financial Reports.
Australasian Accounting, Business and Finance Journal, 6(3), 85–108.
Winn, W. (1989). The role of graphics in training documents: Toward an explanatory theory of how
they communicate. Professional Communication, IEEE Transactions on, 32(4), 300–309.
Wolf, A., & Jenkins, A. (2006). Explaining greater test use for selection: The role of HR professionals
in a world of expanding regulation. Human Resource Management Journal, 16(2), 193–213.
Yankelevich, M. (2007). Expectancy chart interpretation and use: Effects of presentation. Bowling
Green State University.
Zhang, D. C., Zhang, Y., Highhouse, S., & Brooks, M. E. (2014, November). Using Icon arrays to
communicate effect size information. Presented at the Annual Meeting of the Society for
Judgment and Decision Making, Long Beach, CA.
Zikmund-Fisher, B. J., Witteman, H. O., Fuhrel-Forbis, A., Exe, N. L., Kahn, V. C., & Dickson, M.
(2012). Animated Graphics for Comparing Two Risks: A Cautionary Tale. Journal of Medical
Internet Research, 14(4). http://doi.org/10.2196/jmir.2030
Table 1.
Example of Taylor-Russell Table (base rate of success = 30%)

Level of Validity                    Selection Ratio (SR)
(Pearson's r)       0.05    0.10    0.20    0.30    0.40    0.50
0.00                0.30    0.30    0.30    0.30    0.30    0.30
0.25                0.50    0.47    0.43    0.41    0.39    0.37
0.50                0.72    0.65    0.58    0.52    0.48    0.44
0.75                0.93    0.86    0.76    0.67    0.59    0.52
Table 2.
Example of Binomial Effect Size Display

                 Improvement    No Improvement
GRE Training     65%            35%
No Training      35%            65%
Table 3.
Characteristics of Decision-aids that Enhance Comprehension

Characteristic                 CLES    BESD         Expectancy Charts    Icon array
Frequency Representation       No      Sometimes    Sometimes            Yes
Graph Format                   No      No           Yes                  Yes
Table Format                   No      Yes          No                   No
High Iconicity                 No      No           No                   Yes
Overcomes base-rate neglect    No      No           No                   Yes
Table 4.
Principal Axis Factoring Results for Subjective Graph Literacy Scale

Item                                                                       Factor Loading
It is easy for me to understand information presented in a graph or
  a chart (e.g., pie chart, bar graph)                                     0.72
I find that complex information is easier to understand when it is
  supported by graphs or charts                                            0.64
I find it easier to communicate information to others using a graph
  or a chart                                                               0.70
I am good at creating graphs or charts of numerical information            0.62
When reading the newspaper or magazine, I find the graphs and charts
  very helpful                                                             0.71
I have a hard time making sense of data presented in a graph or chart     0.38
I am better than most people at visualizing information                   0.49
In my opinion, a picture is worth a thousand words                        0.47
Table 5.
Means, Standard Deviations, Reliabilities and Inter-correlations of Variables

Variable                     M       SD      1        2        3        4        5        6        7
1 Hiring Experience          2.80    1.10    (0.86)
2 Graph Literacy             4.10    0.52    -0.02    (0.81)
3 Computer Experience        4.50    0.46    0.03     0.26**   (0.65)
4 Perceived Usefulness       4.20    0.75    -0.02    0.34**   0.06     (0.92)
5 Visual-aid Engagement      4.10    0.79    0.00     0.36**   0.14*    0.55**   (0.85)
6 Perceived Comprehension    4.30    0.68    -0.03    0.35**   0.14*    0.70**   0.49**   (0.78)
7 Objective Comprehension    3.59    0.75    0.05     0.22**   0.14*    0.03     0.05     0.14*
8 Age                        37.35   10.69   0.18**   -0.02    -0.05    0.15**   0.06     0.16**   0.08

Notes. * p < 0.05, ** p < 0.01. Diagonals contain standardized Cronbach's alpha.
Table 6.
Principal Axis Factoring Results for the Dependent Variables

                                                                           Factor
Item                                                                       1        2
I would use this visual aid to communicate the advantages of
  structured interviews.                                                   0.82     0.00
I would recommend this visual aid to be used in presenting the
  advantages of a structured interview.                                    0.93     -0.03
I would like to have this visual aid to accompany information about
  the advantages of a structured interview                                 0.93     -0.06
This visual aid clearly demonstrates the advantages of structured
  interviews.                                                              0.80     0.06
The graphical visual aid was interesting                                   0.10     0.77
I was engaged in the graphical visual aid                                  -0.05    0.95
I was bored from looking at the visual aid                                 0.03     0.65
The information presented in the visual aid was confusing                  0.31     0.17
It was easy to understand the information about the different
  interview methods                                                        0.63     0.09
The visual aid made the advantage of structured interview easy to
  understand                                                               0.70     0.08
Table 7.
Principal Axis Factoring Results for Revised Dependent Variables

                                                                           Factor
                                                                           Visual aid
Item                                                                       usefulness    Engagement
I would use this visual aid to communicate the advantages of
  structured interviews.                                                   0.81          0.00
I would recommend this visual aid to be used in presenting the
  advantages of a structured interview.                                    0.91          -0.03
I would like to have this visual aid to accompany information about
  the advantages of a structured interview                                 0.92          -0.06
This visual aid clearly demonstrates the advantages of structured
  interviews.                                                              0.79          0.06
The graphical visual aid was interesting                                   0.11          0.77
I was engaged in the graphical visual aid                                  -0.05         0.98
I was bored from looking at the visual aid                                 0.05          0.68
It was easy to understand the information about the different
  interview methods                                                        0.62          0.09
The visual aid made the advantage of structured interview easy to
  understand                                                               0.69          0.08
Table 8.
Summary of ANOVA Results for Perceived Visual Aid Usefulness

Source                                        DF     MS       F
Interactivity                                 1      3.17     8.06**
Visual Aid                                    2      1.76     4.48*
Subjective Graph Literacy                     1      15.62    39.67**
Visual Aid x Interactivity                    2      0.67     1.70
Subjective Graph Literacy x Interactivity     1      0.17     0.44
Visual Aid x Subjective Graph Literacy        2      2.08     5.29**
Three-way Interaction                         2      0.06     0.16
Error                                         286    0.48

Notes. * p < 0.05, ** p < 0.01
Table 9.
Mean Perceived Usefulness Across Visual Aids

                  Graph Literacy
Visual Aid        Low            High
BESD              4.01 (0.74)    4.19 (0.77)
Bar graph         3.81 (0.78)    4.57 (0.48)
Icon array        4.10 (0.48)    4.52 (0.56)

Notes. Parentheses contain cell standard deviations.
Table 10.
Summary of ANOVA Results for Engagement

Source                                        DF     MS       F
Interactivity                                 1      5.59     9.85*
Visual Aid                                    2      0.77     1.36
Subjective Graph Literacy                     1      12.93    22.79**
Visual Aid x Interactivity                    2      1.10     1.93
Subjective Graph Literacy x Interactivity     1      0.66     1.16
Visual Aid x Subjective Graph Literacy        2      0.36     0.63
Three-way Interaction                         2      0.85     1.50
Error                                         281    0.57

Notes. * p < 0.01, ** p < 0.001
Table 11.
Summary of the Ordered Logistic Regression on Number of Correct Answers

Predictor                     B       Exp(B)    Wald's z
Interactivity                 0.07    1.07      0.29
Icon array                    0.57    1.77      1.85+
BESD                          0.36    1.43      1.18
Subjective Graph Literacy     0.60    1.82      2.81*

Notes. + p < 0.10, * p < 0.05
Table 12.
Predicted Probabilities of Number of Correct Responses

                  Total number of correct
Visual aid        1        2        3        4
BESD              1.9%     4.5%     21.4%    71.2%
Icon array        1.6%     3.7%     18.7%    75.3%
Bar graph         2.7%     6.2%     26.5%    63.2%
Table 13.
Logistic Regression Analysis of Objective Comprehension Test Questions

Predictor                     Question 1    Question 2    Question 3    Question 4
Subjective Graph Literacy     2.00          3.45          2.26          1.79*
User Interactivity            0.54          0.53          1.05          1.03
Icon array                    1.81          0.62          0.69          3.07**
BESD                          0.92          0.91          1.02          1.71+

Notes. Values are exponentiated B; + p < 0.10, * p < 0.05, ** p < 0.01
Figure 1.
Example of Expectancy Chart
[Horizontal bar chart showing the probability of success (0%–100%) for each test score range: 0–5, 6–10, 11–15, 16–20, and 21–25.]
Figure 2.
Example of Icon array

Figure 3.
Screenshot of Visual Aid Instructions

Figure 4.
Screenshot of Interactive BESD

Figure 5.
Screenshot of Interactive Bar Graph

Figure 6.
Screenshot of Interactive Icon array

Figure 7.
Screenshot of Static BESD

Figure 8.
Screenshot of Static Bar Graph

Figure 9.
Screenshot of Static Icon array

Figure 10.
Histogram of Subjective Graph Literacy Scale
Figure 11.
Plot of Means for Perceived Usefulness of Icon array and Bar Graph
[Line plot of mean perceived usefulness (scale 1–5) for the bar graph and the Icon array, with separate lines for low and high graph literacy.]
APPENDIX A: JOB SCREENING ITEMS
• Physical and manual labor
• Interacting with computers
• Teaching and instructing
• Entertainment/Performance
• Recruiting and Hiring
• Interact with customers
• Research and development
• Training and mentoring
• Management
• Care for patients
• Public speaking
• Planning and organizing
• Operate machinery
• Traveling
• Student
• Administration
• Lobbying
• Fund raising
• Writing
• Sales
• Transportation
• Analyst
• Food operation
APPENDIX B: INTERVIEW VIGNETTE
Imagine that you are the manager of a medium-sized software company in charge of hiring. You
are tasked with choosing between using a traditional interview and a structured interview when
assessing job candidates.
The structured interview requires that the same questions are asked of each applicant. Thus,
structured interviews restrict the freedom of the interviewer. But the advantage of the structured
interview is that it results in more accurate predictions and more successful hires than a
traditional interview.
Next, you will be presented with a visual aid that presents information about the different
interview methods. We would like your reaction to the visual aid.
APPENDIX C: OBJECTIVE COMPREHENSION TEST
1. Which hiring method will lead to the highest number of expected successful hires?
2. If 50 applicants were hired with the structured interview, how many of those hires are
expected to succeed on the job?
3. If you switched from a traditional interview to a structured interview, how many more
successful hires would be expected?
4. What percentage of applicants are expected to succeed when using a traditional
interview?
APPENDIX D: DEPENDENT VARIABLES
Visual Aid Usefulness
1. I would use this visual aid to communicate the advantages of structured interviews.
2. I would recommend this visual aid to be used in presenting the advantages of a structured
interview.
3. I would like to have this visual aid to accompany information about the advantages of a
structured interview
4. This visual aid clearly demonstrates the advantages of structured interviews.
Visual Aid Engagement
1. The graphical visual aid was interesting
2. I was engaged in the graphical visual aid
3. I was bored from looking at the visual aid
Visual Aid Comprehension
1. The information presented in the visual aid was confusing
2. It was easy to understand the information about the different interview methods
3. The visual aid made the advantage of structured interview easy to understand
APPENDIX E: DEMOGRAPHICS
Hiring Experience
• How many times have you made a hiring-related decision about a candidate?
  1: 0 times   2: 1-5 times   3: 5-10 times   4: 10-20 times   5: 20 or more times
• How many times have you made a hiring-related recommendation about a candidate?
  1: 0 times   2: 1-5 times   3: 5-10 times   4: 10-20 times   5: 20 or more times
• How many times have you interviewed a job candidate with a traditional interview?
  1: 0 times   2: 1-5 times   3: 5-10 times   4: 10-20 times   5: 20 or more times
• How many times have you interviewed a job candidate with a structured interview?
  1: 0 times   2: 1-5 times   3: 5-10 times   4: 10-20 times   5: 20 or more times
• How often do you read academic journal articles on the topic of human resources and/or
  management?
  1: Never   2: A few times a year   3: Every month   4: Every week   5: Almost every day
• What kind of formal training have you had in hiring-related practices? Check all that
  apply.
  o Professional workshop
  o Online training
  o Written manual
  o Informal instructions
  o Classroom instructions
  o Observation/shadowing

Computer Experience
• How often do you use a computer?
  1: Less than 1 hour per week   2: Less than 1 hour per day   3: 1-2 hours per day
  4: 3-6 hours per day   5: 6 or more hours per day
• How would you rate your knowledge about computer use?
  1: Much less than average   2: Less than average   3: About average   4: More than average
  5: Much more than average
• How comfortable are you with using a computer for everyday tasks?
  1: Not at all comfortable   2: Somewhat uncomfortable   3:   4: Somewhat comfortable
  5: Very comfortable
APPENDIX F: INFORMED CONSENT
APPENDIX G: HSRB APPROVAL LETTER