
Automation Bias, Accountability, and Verification Behaviors

Kathleen L. Mosier, San Jose State University Foundation at NASA Ames Research Ctr.
Linda J. Skitka and Mark D. Burdick, University of Illinois at Chicago
Susan T. Heers, Western Aerospace Labs at NASA Ames
Automated procedural and decision aids may in some cases have the paradoxical effect of increasing errors
rather than eliminating them. Results of recent research investigating the use of automated systems have indicated
the presence of automation bias, a term describing errors made when decision makers rely on automated cues as a
heuristic replacement for vigilant information seeking and processing (Mosier & Skitka, in press). Automation
commission errors, i.e., errors made when decision makers take inappropriate action because they over-attend to
automated information or directives, and automation omission errors, i.e., errors made when decision makers do
not take appropriate action because they are not informed of an imminent problem or situation by automated aids,
can result from this tendency.
A wide body of social psychological research has found that many cognitive biases and resultant errors can be
ameliorated by imposing pre-decisional accountability, which sensitizes decision makers to the need to construct
compelling justifications for their choices and how they make them. To what extent these effects generalize to
performance situations has yet to be empirically established. The two studies presented represent concurrent
efforts, with student and “glass cockpit” pilot samples, to determine the effects of accountability pressures on
automation bias and on verification of the accurate functioning of automated aids. Students (Experiment 1) and
commercial pilots (Experiment 2) performed simulated flight tasks using automated aids. In both studies,
participants who perceived themselves “accountable” for their strategies of interaction with the automation were
significantly more likely to verify its correct functioning, and committed significantly fewer automation-related
errors than those who did not report this perception.
Automated decision aids have been introduced into many work environments with the explicit goal of reducing human error. Flight management systems, for example, are taking increasing control of flight operations, such as calculating fuel efficient paths, navigation, detecting system malfunctions and abnormalities, in addition to flying the plane. Nuclear power plants and even medical diagnostics are similarly becoming more and more automated.

One of the purportedly beneficial facets of automation and automated decision aids is that they replace or supersede traditional displays of information with new, typically more salient cues. A potential danger in this is that the information search process of human operators may get short-circuited, i.e., operators may stop short at the automated display, and not double-check the operation of the automated system. Because of this, automated procedural and decision aids may in some cases have the paradoxical effect of increasing errors rather than eliminating them. Results of recent research investigating the use of automated systems have indicated the presence of automation bias, a term describing errors made when decision makers rely on automated cues as a heuristic replacement for vigilant information seeking and processing (Mosier & Skitka, in press). Potential negative effects of automation bias can be broken down into automation commission errors, i.e., errors made when decision makers take inappropriate action because they overattend to automated information or directives and do not attend to other environmental cues, and automation omission errors, i.e., errors made when decision makers do not take appropriate action because they are not informed of an imminent problem or situation by automated aids. Evidence of the tendency to make both omission and commission errors has been found in commercial air-crew self-reports (Mosier, Skitka, & Korte, 1994), as well as in non-flight decision making contexts with novices (Skitka & Mosier, 1994; Mosier, Skitka, & Heers, 1995).

A wide body of social psychological research has found that many cognitive biases and resultant errors can be ameliorated by imposing pre-decisional accountability, which sensitizes decision makers to the need to construct compelling justifications for their choices and how they make them. Accountability demands cause decision makers to employ more multidimensional, self-critical, and vigilant information seeking, and more complex data processing, and have been shown to reduce cognitive "freezing" or premature closure on judgmental problems (Kruglanski & Freund, 1983), and to lead decision makers to employ more consistent patterns of cue utilization (Hagafors & Brehmer, 1983). To what extent these effects generalize to performance situations has yet to be empirically established.
The two studies described below represent
concurrent efforts, with student and with “glass
cockpit” pilot samples, to determine the effects of
accountability pressures on automation bias and on
verification of the accurate functioning of automated
aids.
EXPERIMENT 1: STUDENT STUDY
METHOD
Participants were 121 university students. The primary experimental task was presented on a 13" color display monitor divided into four quadrants (see Figure 1), using the WindowPANES (Workload/PerformANcE Simulation) software (NASA Ames Research Center, 1989). Using a two-axis joystick, participants were required to track a target circle, presented in the upper right quadrant (all participants were right-handed), that moved as a function of the disturbance imposed by a sum-of-sines algorithm.
Figure 1. WindowPANES display (gauges labeled PRESS, TEMP, EPR, and EGT).
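For readers who want a concrete sense of the tracking dynamics, the sketch below generates a sum-of-sines disturbance and the kind of root mean squared tracking error that was fed back to participants; the frequencies, amplitudes, and sampling rate are illustrative assumptions, since the actual WindowPANES parameters are not reported here.

import math

# Hypothetical non-harmonic component frequencies (Hz) and amplitude; the real
# WindowPANES disturbance parameters are not given in the paper.
FREQS_X = [0.05, 0.13, 0.31]
FREQS_Y = [0.07, 0.17, 0.29]
AMPLITUDE = 20.0  # screen units per component

def disturbance(t):
    """Return the (x, y) displacement of the target circle at time t (seconds)."""
    x = sum(AMPLITUDE * math.sin(2 * math.pi * f * t) for f in FREQS_X)
    y = sum(AMPLITUDE * math.sin(2 * math.pi * f * t) for f in FREQS_Y)
    return x, y

def tracking_error(t, cursor_x, cursor_y):
    """Distance between the disturbed target position and the participant's cursor."""
    tx, ty = disturbance(t)
    return math.hypot(tx - cursor_x, ty - cursor_y)

# Root mean squared tracking error over a ten-minute trial sampled at 10 Hz,
# assuming (for illustration) a cursor held at the screen origin.
samples = [tracking_error(i * 0.1, 0.0, 0.0) for i in range(6000)]
rmse = math.sqrt(sum(e * e for e in samples) / len(samples))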
Fluctuating bar graphs representing four gauges
were presented in the bottom left quadrant. Subjects
were trained to push a
button on the control
board when three of the four gauges went out of
range. On the bottom right quadrant, subjects
monitored their progress on a map, and were to press
a button once they passed “waypoints” on the
display. The upper left quadrant contained the
“Automated Monitoring Aid” (AMA), which
presented directives prompting gauge resets and
waypoint button presses. The AMA also informed
operators of impending gauge failures, and prompted
a choice button response. Subjects were told,
however, that the AMA was not 100% reliable, and
that they could verify the accuracy of its choice
button response directives on another screen. The
verification display, which used the MouseLab decision laboratory software (Johnson, Payne, Schkade, & Bettman, 1991), was run on a second microcomputer and displayed on a second monitor.
Participants were instructed to open as many cells as
required by the directive (1-3) in order from left to
right to completely verify the directive’s accuracy
(i.e., whether or not the gauges were really failing
and whether the response button indicated by the
AMA was correct).
Design and Procedures
Participants were randomly assigned to
Accountability conditions. Four groups of
Accountable participants were instructed that their
task was to maximize (1) their Overall Performance,
or (2) Accuracy of responses, or (3) Quick Response
to events, or (4) Tracking. All four of these groups
were told that they were being constantly monitored
by the experimenter, and that they would be asked to
explain and justify their performance to the
experimenters in an interview following the task.
Participants in the Non-Accountable group were told
that their performance could not be monitored by the
experimenter, that their data were not being saved
past the end of each trial, and no mention of an
interview was made.
The number of boxes that had to be opened to
completely verify an automated directive was the
verification level for that event. The choice-response
gauge task provided a way to examine commission
errors (i.e., pressing an incorrect button) while
assessing verification behaviors. Opportunities for
omission errors occurred when subjects were not
prompted to reset gauges when necessary, or to indicate
when they passed a waypoint. All participants were
informed that the verification display data were 100%
accurate.
Participants completed a total of three training
and five experimental trials of ten minutes each.
Across the experimental trials there were 50 total "events,"
i.e., gauge events and waypoint checks. Omission
error opportunities occurred when three gauges went
into the red zone with no automated prompt to reset,
or when subjects passed waypoints and received no
prompt to report in. Each type occurred twice. Six
commission error opportunities were presented--all
of which consisted of automated directives for choice
response tasks that prompted the wrong button press
(subjects who verified would see the correct button
press indicated on the MouseLab display). At the
end of each trial, participants were given visual
feedback on their mean response time, percentage
correct, and root mean squared tracking error.
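The event and scoring structure described above lends itself to a simple representation; the sketch below is a hypothetical reconstruction (the field and function names are ours, not the experiment software's) of how complete verifications, commission errors, and omission errors could be tallied for a participant.

from dataclasses import dataclass

@dataclass
class Event:
    kind: str                  # "choice" (gauge choice-response), "gauge_reset", or "waypoint"
    verification_level: int    # cells needed to completely verify (1-3); 0 if none offered
    cells_opened: int          # MouseLab cells the participant actually opened
    prompted: bool             # whether the AMA issued a prompt for this event
    responded_correctly: bool  # whether the correct button was eventually pressed

def tally(events):
    """Count complete verifications, commission errors, and omission errors."""
    complete_verifications = sum(
        1 for e in events
        if e.verification_level > 0 and e.cells_opened >= e.verification_level)
    # Commission error: a wrong button press on a prompted choice-response event,
    # i.e., acting on an incorrect automated directive.
    commission_errors = sum(
        1 for e in events
        if e.kind == "choice" and e.prompted and not e.responded_correctly)
    # Omission error: failing to respond to a gauge or waypoint event for which
    # the AMA issued no prompt.
    omission_errors = sum(
        1 for e in events
        if e.kind in ("gauge_reset", "waypoint") and not e.prompted
        and not e.responded_correctly)
    return complete_verifications, commission_errors, omission_errors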
After the experimental trials, participants were
given a questionnaire that included manipulation
checks, and tapped perceptions of their experience
during the experimental task and their attitudes
toward automation. No "justification" interviews
were actually conducted.
RESULTS

Commission Errors and Verification Behavior. A mixed design analysis of variance (ANOVA) revealed a main effect for Accountability condition on the tendency to make commission errors [F(4, 111)=3.26, p<.05]. Subjects in the Overall Performance and Accuracy conditions made significantly fewer commission errors than subjects in the remaining three conditions [F(1, 111)=12.18, p<.001].

A similar pattern emerged with respect to which subjects were most likely to completely verify automated directives. Accountability had a significant effect on the number of complete verifications [F(4, 111)=4.53, p<.01]. Tukey tests indicated that subjects in the Overall Performance and Accuracy conditions were significantly more likely to completely verify than subjects in the Quick Response group. A significant Verification Level x Accountability interaction [F(2, 222)=2.87, p<.01] was also revealed. As shown in Figure 2, subjects in the Overall Performance accountability condition were dramatically more likely to completely verify in the three-level verification events than in the two- or one-level events, and also more likely to verify (especially at higher levels) than other groups, regardless of level.

Omission Errors. Did accountability similarly affect subjects' tendency to make omission errors? A one-way analysis of variance investigating the effect of accountability condition on the number of omission errors revealed that participants in the Overall Performance and Accuracy conditions made significantly fewer errors [F(4, 116)=2.49, p<.05; see Table 1] than those in other groups.

Table 1.
Number of Omission Errors as a Function of Accountability Condition.

  Accountable for:        Mean omission errors
  Overall Performance     1.18
  Accuracy                1.08
  Quick Response          1.45
  Tracking                2.12
  Not Accountable         1.78

In sum, imposing accountability for Overall Performance or for Accuracy had the effect of making participants more vigilant and more likely to verify the accuracy of automated information, and resulted in fewer errors. Correlational analyses of post-experiment questionnaires indicated that highly confident and comfortable subjects were less likely to make commission errors [r(111) = -.20, p<.05]. Significant correlations were found between omission errors and items relating to accountability, such as comfort in justifying strategies and perceptions that performance was being monitored [r(111) = -.15, p<.05].
Figure 2. Number of complete verifications by accountability condition and verification level.
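Analyses of this kind are straightforward to reproduce with modern tools; the following sketch assumes per-participant omission error counts grouped by accountability condition and a questionnaire rating per participant (all values shown are placeholders, not the study data) and uses scipy for the one-way ANOVA and the correlation.

from scipy import stats

# Placeholder omission-error counts per participant by accountability condition;
# the paper reports only the condition means (Table 1), not the raw data.
omission_errors = {
    "Overall Performance": [1, 2, 0, 1, 2],
    "Accuracy":            [1, 0, 2, 1, 1],
    "Quick Response":      [2, 1, 1, 2, 1],
    "Tracking":            [3, 2, 2, 1, 3],
    "Not Accountable":     [2, 2, 1, 2, 2],
}

# One-way ANOVA on omission errors as a function of condition (cf. Table 1).
f_value, p_value = stats.f_oneway(*omission_errors.values())

# Correlation between a questionnaire item (e.g., confidence) and commission
# errors, analogous to the r(111) = -.20 result reported above.
confidence  = [5, 4, 3, 5, 2, 4, 3, 5, 4, 2]
commissions = [0, 1, 2, 0, 3, 1, 2, 0, 1, 3]
r, p = stats.pearsonr(confidence, commissions)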
EXPERIMENT 2: COMMERCIAL PILOTS
METHOD
Participants in this study were 21 commercial
glass cockpit pilots (i.e., pilots of automated
aircraft). The part-task flight simulation facility used
in this experiment is modeled after the Advanced
Concepts Flight Simulator at Ames Research Center,
and employs two Silicon Graphics color monitors to
present glass displays of primary and secondary
flight displays, navigation and communication
information, and electronic checklists, as well as
Engine Indicating and Crew Alerting System
(EICAS) and Flight Management System (FMS) instruments. Subjects interacted with the controls
and displays of the aircraft through a touchscreen
overlaying the instruments needed to accomplish the
flight.
In addition to the aircraft displays, a secondary
tracking task was presented on one of the monitors to
provide a means of increasing the workload of the
pilots, and was incorporated into the automation
events described below. This task involved using a
joystick to keep displayed cross-hairs inside the
boundaries of a blue circle. When the cross-hairs
crossed the boundaries of the circle, the circle turned
red. Feedback on how much time the subject was
able to stay within the target circle was accumulated
and displayed to the subject.
Design and Procedures
Pilots were assigned to either one Accountable
condition or to a Non-Accountable group.
Accountable subjects were told that their performance
would be monitored, and that they would be
interviewed and asked to explain and justify their
performance and strategies in the use of automated systems at the end of the experiment. Additionally, a
video camera was placed in a prominent position in
the booth with Accountable participants. Pilots in the
Non-Accountable group were told that their
performance data could not be collected (due to a
computer malfunction) or linked to them personally
in any way, and no mention of an interview was
made. Pilots were trained individually on how each
of the components of the experimental task
functioned and were given time to practice.
Following training, subjects flew two legs (order
was counterbalanced): Los Angeles (LAX) to San
Francisco (SFO), and SFO to Sacramento (SMF).
The flight route was pre-loaded into the FMS prior to
beginning the trial. Subjects were instructed to
communicate with Air Traffic Control (ATC) through
textual datalink messages sent and received on the
CDU screen of their FMS. Clearances from ATC
(e.g., a change in altitude, speed, heading, or
frequency) could be auto-loaded into the appropriate
flight system, and correct loading could be verified
by checking the Mode Control Panel (MCP) or
navigation display. Pilots manually performed the
secondary tracking task from SFO-SMF whenever
they were above 5,000'. The secondary task was
automated on the LAX-SFO leg.
Four automation failures during these legs
offered the possibility for pilots to make omission
errors if they did not verify proper automation
functioning: 1) an altitude clearance misloaded into the flight control systems and was reflected by incorrect numbers on the MCP; 2) the flight system incorrectly executed a commanded heading change, and the improper execution was reflected on the navigational display; 3) a frequency change misloaded into the flight control systems and was reflected by incorrect numbers on the MCP; and 4) the
tracking task automation failed at 7,000' during the
LAX-SFO flight, which was signalled by the
boundary circle turning red. In all cases, verification
information was available on the appropriate display,
as it would be in the aircraft. One opportunity for a
commission error, a false "Engine Fire" message,
yielded no variance in responses (every pilot
committed the error), and is discussed in detail
elsewhere (Mosier & Skitka, 1996). Debriefing
forms included questions on flight experience, as
well as questions probing perceptions of
accountability and attitudes toward automation. No
"justification" interviews were conducted.
RESULTS
Numbers of errors did not vary significantly as a
function of manipulated accountability. They were,
however, correlated with total flight hours [r(20)=.49, p<.05] and with years of flight experience [r(20)=.46, p<.05], indicating that
increased experience decreased the likelihood of
catching the automation failures. Descriptive
analyses of the entire sample revealed that the altitude
load failure and the heading capture failure, the two
events most critical to aircraft operation safety,
remained undetected by 44% and 48% of the
participants respectively. The frequency misload
was undetected by 71% of pilot participants. The
tracking task automation failure, completely
irrelevant to flight functioning, was detected by all of
the participants.
In order to ascertain the underlying factors that
discriminate participants who were more likely to
verify automated tasks (and thus catch errors) from
those less likely to do so, pilots were classified
according to the number of omission errors they
committed. Those who missed two or three out of
three flight-related events were categorized as "high-
bias” participants, and those who missed none or
only one automation failure were placed into the “low
automation-bias" group. ANOVAs were conducted
using bias group as the independent variable and
responses on the debriefing questionnaire as
dependent variables.
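The classification and comparison described above can be sketched as follows; the two-or-more-misses threshold comes from the text, while the pilot records and nervousness ratings are placeholder values rather than the study data.

from scipy import stats

# (misses on the three flight-related omission events, nervousness rating) per pilot;
# placeholder values only.
pilots = [(0, 6), (1, 5), (2, 3), (3, 2), (1, 5), (2, 4), (0, 6), (3, 3)]

# Two or three misses -> "high-bias" group; none or one -> "low-bias" group.
high_bias = [rating for misses, rating in pilots if misses >= 2]
low_bias  = [rating for misses, rating in pilots if misses <= 1]

# With two groups, a one-way ANOVA on a questionnaire item is equivalent to an
# independent-samples t test (cf. the F(1,19) values reported below).
f_value, p_value = stats.f_oneway(low_bias, high_bias)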
Bias groups were statistically equivalent on
items such as comfort with the experiment,
confidence in their strategies, and confidence in
computers. However, low-bias participants reported
more nervousness [F(1,19)=7.08; p<.015], a higher sense of being evaluated on their performance [F(1,19)=2.21; p<.00] and strategies in use of the automation [F(1,19)=9.63; p<.006], and a stronger need to justify their interaction with the automation [F(1,19)=6.24; p<.02]. In other words, pilots who
reported a higher internalized sense of accountability
for their interactions with automation verified correct
automation functioning more often and committed
fewer errors.
DISCUSSION
Results of these studies demonstrate that the
perception that one is “accountable” for particular
aspects of performance affects one’s strategies in
performing the task. In the student study,
experimentally manipulated accountability decreased
the tendency to make errors under specific
conditions, i.e., when subjects were accountable for
their Overall Performance or Accuracy. Analysis of
the verification data indicates that accountability
effects occur largely due to the expected increase in
cognitive vigilance--subjects in these conditions were
more likely to completely verify than Non-Accountable subjects or subjects accountable for a
Quick Response or for Tracking. This effect was
especially pronounced as verification required
opening more boxes, making it more difficult or
costly to accomplish. Conversely, imposing
accountability for Tracking led to an increase of
automation-related errors, possibly because it
encouraged participants to concentrate attention on
the tracking task rather than on checking for proper
automation functioning.
Although experimentally manipulated accountability did not impact the automation bias of the pilot subjects, there is evidence that internalized accountability led to an increase in verification of automated functioning and fewer resultant errors. The fact that, for the pilot sample, this perception did not correspond with our external accountability manipulation indicates the need to establish whether accountability is a variable that can be significantly influenced in professional decision makers (i.e., pilots, who are already at a high level of personal responsibility for their conduct), or whether it is part of some innate personality construct. It is clear, however, that the perception of accountability for one's interaction with automation does encourage vigilant, proactive strategies and stimulates verification of the functioning of automated systems.
ACKNOWLEDGEMENTS
This research was supported by NASA grants NCC2-798,
NCC2-837, and NAS2-832. Special thanks to reviewers Mary
Connors, Ute Fischer, and Irene Laudeman. Susan Heers is
currently affiliated with Monterey Technologies.
REFERENCES
Hagafors, R., & Brehmer, B. (1983). Does having to justify one's decisions change the nature of the decision process? Organizational Behavior and Human Performance, 31, 223-232.

Johnson, E. J., Payne, J. W., Schkade, D. A., & Bettman, J. R. (1991). Monitoring information processing and decisions: The MouseLab system. Philadelphia: University of Pennsylvania, The Wharton School.

Kruglanski, A. W., & Freund, T. (1983). The freezing and unfreezing of lay inferences: Effects on impressional primacy, ethnic stereotyping, and numerical anchoring. Journal of Experimental Social Psychology, 19, 448-468.

Mosier, K. L., & Skitka, L. J. (in press). Human Decision Makers and Automated Decision Aids: Made for Each Other? In R. Parasuraman & M. Mouloua (Eds.), Automation and Human Performance: Theory and Applications. NJ: Lawrence Erlbaum Associates, Inc.

Mosier, K. L., Skitka, L. J., & Heers, S. T. (1995). Automation and accountability for performance. In R. S. Jensen & L. A. Rakovan (Eds.), Proceedings of the Eighth International Symposium on Aviation Psychology (pp. 221-226). Columbus, Ohio.

Mosier, K. L., Skitka, L. J., & Korte, K. J. (1994). Cognitive and social psychological issues in flight crew/automation interaction. Proceedings of the Automation Technology and Human Performance Conference, Sage.

NASA Ames Research Center (1989). WindowPANES: Workload PerformANcE Simulation. Moffett Field, CA: NASA Ames Research Center, Rotorcraft Human Factors Research Branch.

Skitka, L. J., & Mosier, K. L. (1994). Automation bias: When, where, why? Presented at the Annual Conference of the Society for Judgment and Decision Making.