PROCEEDINGS of the HUMAN FACTORS AND ERGONOMICS SOCIETY 40th ANNUAL MEETING-1996

Automation Bias, Accountability, and Verification Behaviors

Kathleen L. Mosier
San Jose State University Foundation at NASA Ames Research Center

Linda J. Skitka
Mark D. Burdick
University of Illinois at Chicago

Susan T. Heers
Western Aerospace Labs at NASA Ames

Automated procedural and decision aids may in some cases have the paradoxical effect of increasing errors rather than eliminating them. Results of recent research investigating the use of automated systems have indicated the presence of automation bias, a term describing errors made when decision makers rely on automated cues as a heuristic replacement for vigilant information seeking and processing (Mosier & Skitka, in press). Automation commission errors, i.e., errors made when decision makers take inappropriate action because they over-attend to automated information or directives, and automation omission errors, i.e., errors made when decision makers do not take appropriate action because they are not informed of an imminent problem or situation by automated aids, can result from this tendency. A wide body of social psychological research has found that many cognitive biases and resultant errors can be ameliorated by imposing pre-decisional accountability, which sensitizes decision makers to the need to construct compelling justifications for their choices and how they make them. To what extent these effects generalize to performance situations has yet to be empirically established. The two studies presented represent concurrent efforts, with student and "glass cockpit" pilot samples, to determine the effects of accountability pressures on automation bias and on verification of the accurate functioning of automated aids. Students (Experiment 1) and commercial pilots (Experiment 2) performed simulated flight tasks using automated aids. In both studies, participants who perceived themselves "accountable" for their strategies of interaction with the automation were significantly more likely to verify its correct functioning, and committed significantly fewer automation-related errors than those who did not report this perception.

Automated decision aids have been introduced into many work environments with the explicit goal of reducing human error. Flight management systems, for example, are taking increasing control of flight operations, such as calculating fuel-efficient paths, navigating, and detecting system malfunctions and abnormalities, in addition to flying the plane. Nuclear power plants and even medical diagnostics are similarly becoming more and more automated.

One of the purportedly beneficial facets of automation and automated decision aids is that they replace or supersede traditional displays of information with new, typically more salient cues. A potential danger in this is that the information search process of human operators may get short-circuited, i.e., operators may stop short at the automated display and not double-check the operation of the automated system. Because of this, automated procedural and decision aids may in some cases have the paradoxical effect of increasing errors rather than eliminating them. Results of recent research investigating the use of automated systems have indicated the presence of automation bias, a term describing errors made when decision makers rely on automated cues as a heuristic replacement for vigilant information seeking and processing (Mosier & Skitka, in press). Potential negative effects of automation bias can be broken down into automation commission errors, i.e., errors made when decision makers take inappropriate action because they overattend to automated information or directives and do not attend to other environmental cues, and automation omission errors, i.e., errors made when decision makers do not take appropriate action because they are not informed of an imminent problem or situation by automated aids. Evidence of the tendency to make both omission and commission errors has been found in commercial air-crew self-reports (Mosier, Skitka, & Korte, 1994), as well as in non-flight decision-making contexts with novices (Skitka & Mosier, 1994; Mosier, Skitka, & Heers, 1995).
A wide body of social psychological research has found that many cognitive biases and resultant errors can be ameliorated by imposing pre-decisional accountability, which sensitizes decision makers to the need to construct compelling justifications for their choices and for how they make them. Accountability demands cause decision makers to employ more multidimensional, self-critical, and vigilant information seeking and more complex data processing, and have been shown to reduce cognitive "freezing" or premature closure on judgmental problems (Kruglanski & Freund, 1983) and to lead decision makers to employ more consistent patterns of cue utilization (Hagafors & Brehmer, 1983). To what extent these effects generalize to performance situations has yet to be empirically established.

The two studies described below represent concurrent efforts, with student and with "glass cockpit" pilot samples, to determine the effects of accountability pressures on automation bias and on verification of the accurate functioning of automated aids.

EXPERIMENT 1: STUDENT STUDY

METHOD

Participants were 121 university students. The primary experimental task was presented on a 13-inch color display monitor using WindowPANES: Workload/PerformANcE Simulation (NASA Ames Research Center, 1989), which presents a screen with four quadrants (see Figure 1). Participants were required to track a target circle, presented in the upper right quadrant (all participants were right-handed), that moved as a function of a disturbance imposed by a sum-of-sines algorithm, using a two-axis joystick.

Figure 1. WindowPANES display (gauges labeled PRESS, TEMP, EPR, and EGT).

Fluctuating bar graphs representing four gauges were presented in the bottom left quadrant. Subjects were trained to push a button on the control board when three of the four gauges went out of range. In the bottom right quadrant, subjects monitored their progress on a map and were to press a button once they passed "waypoints" on the display. The upper left quadrant contained the "Automated Monitoring Aid" (AMA), which presented directives prompting gauge resets and waypoint button presses. The AMA also informed operators of impending gauge failures and prompted a choice button response. Subjects were told, however, that the AMA was not 100% reliable, and that they could verify the accuracy of its choice button response directives on another screen. The verification display, which used the MouseLab decision laboratory software (Johnson, Payne, Schkade, & Bettman, 1991), was run on a second microcomputer and displayed on a second monitor. Participants were instructed to open as many cells as required by the directive (1-3), in order from left to right, to completely verify the directive's accuracy (i.e., whether or not the gauges were really failing and whether the response button indicated by the AMA was correct).
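For readers unfamiliar with sum-of-sines forcing functions, the following minimal sketch illustrates how such a disturbance signal can be generated. The frequencies, amplitudes, phases, sampling rate, and function name are illustrative assumptions, not the parameters used in WindowPANES.

```python
import numpy as np

# Illustrative sketch of a sum-of-sines disturbance of the kind used to
# drive a tracking target. All parameter values are assumptions for
# demonstration only, not those of the WindowPANES task.
def sum_of_sines(duration_s=600.0, rate_hz=30.0,
                 freqs_hz=(0.05, 0.11, 0.23, 0.41),
                 amps=(1.0, 0.6, 0.35, 0.2),
                 phases=(0.0, 1.3, 2.1, 0.7)):
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    signal = np.zeros_like(t)
    for f, a, p in zip(freqs_hz, amps, phases):
        signal += a * np.sin(2.0 * np.pi * f * t + p)
    return t, signal

# Two independent sums of sines (e.g., with different phases) could drive
# the x and y coordinates of the target circle; the participant's joystick
# input would then offset the displayed position.
t, x_disturbance = sum_of_sines()
```

Using several non-harmonically related sine components makes the resulting motion appear quasi-random and difficult to anticipate, which is the usual rationale for this class of disturbance in manual tracking tasks.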
Design and Procedures

Participants were randomly assigned to Accountability conditions. Four groups of Accountable participants were instructed that their task was to maximize (1) their Overall Performance, (2) Accuracy of responses, (3) Quick Response to events, or (4) Tracking. All four of these groups were told that they were being constantly monitored by the experimenter and that they would be asked to explain and justify their performance to the experimenters in an interview following the task. Participants in the Non-Accountable group were told that their performance could not be monitored by the experimenter and that their data were not being saved past the end of each trial, and no mention of an interview was made.

The number of boxes that had to be opened to completely verify an automated directive was the verification level for that event. The choice-response gauge task provided a way to examine commission errors (i.e., pressing an incorrect button) while assessing verification behaviors. Opportunities for omission errors occurred when subjects were not prompted to reset gauges when necessary, or to indicate when they passed a waypoint. All participants were informed that the verification display data were 100% accurate.

Participants completed a total of three training and five experimental trials of ten minutes each. Across experimental trials there were 50 total "events," i.e., gauge events and waypoint checks. Omission error opportunities occurred when three gauges went into the red zone with no automated prompt to reset, or when subjects passed waypoints and received no prompt to report in. Each type occurred twice. Six commission error opportunities were presented, all of which consisted of automated directives for choice response tasks that prompted the wrong button press (subjects who verified would see the correct button press indicated on the MouseLab display). At the end of each trial, participants were given visual feedback on their mean response time, percentage correct, and root mean squared tracking error.

After the experimental trials, participants were given a questionnaire that included manipulation checks and tapped perceptions of their experience during the experimental task and their attitudes toward automation. No "justification" interviews were actually conducted.
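As a concrete illustration of the end-of-trial feedback described above (mean response time, percentage correct, and root mean squared tracking error), the sketch below computes the three measures. The arrays are placeholder values, not data from the experiment.

```python
import numpy as np

# Placeholder values for illustration only; not data from the study.
target_x = np.array([0.00, 0.40, 0.90, 1.20])      # target position samples
cursor_x = np.array([0.10, 0.30, 1.10, 1.00])      # participant cursor samples
rms_tracking_error = np.sqrt(np.mean((target_x - cursor_x) ** 2))

response_times_ms = np.array([640, 580, 720])      # per-event response times
correct_responses = np.array([True, True, False])  # per-event accuracy

mean_rt = response_times_ms.mean()
percent_correct = 100.0 * correct_responses.mean()

print(f"mean RT = {mean_rt:.0f} ms, "
      f"{percent_correct:.0f}% correct, "
      f"RMS tracking error = {rms_tracking_error:.3f}")
```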
RESULTS

Commission Errors and Verification Behavior. A mixed-design analysis of variance (ANOVA) revealed a main effect of Accountability condition on the tendency to make commission errors, F(4, 111) = 3.26, p < .05. Subjects in the Overall Performance and Accuracy conditions made significantly fewer commission errors than subjects in the remaining three conditions, F(1, 111) = 12.18, p < .001. A similar pattern emerged with respect to which subjects were most likely to completely verify automated directives. Accountability had a significant effect on the number of complete verifications, F(4, 111) = 4.53, p < .01. Tukey tests indicated that subjects in the Overall Performance and Accuracy conditions were significantly more likely to completely verify than subjects in the Quick Response group. A significant Verification Level x Accountability interaction, F(2, 222) = 2.87, p < .01, was also revealed. As shown in Figure 2, subjects in the Overall Performance accountability condition were dramatically more likely to completely verify in the three-level verification events than in the two- or one-level events, and also more likely to verify (especially at higher levels) than other groups, regardless of level.

Figure 2. Number of complete verifications by accountability condition and verification level (verification levels one through three on the x-axis; separate lines for each accountability condition).

Omission Errors. Did accountability similarly affect subjects' tendency to make omission errors? A one-way analysis of variance investigating the effect of accountability condition on the number of omission errors revealed that participants in the Overall Performance and Accuracy conditions made significantly fewer errors than those in other groups, F(4, 116) = 2.49, p < .05 (see Table 1).

Table 1. Number of Omission Errors as a Function of Accountability Condition.

  Accountable for:        Omission errors
  Overall Performance     1.18
  Accuracy                1.08
  Quick Response          1.45
  Tracking                2.12
  Not Accountable         1.78

In sum, imposing accountability for Overall Performance or for Accuracy had the effect of making participants more vigilant and more likely to verify the accuracy of automated information, and resulted in fewer errors. Correlational analyses of post-experiment questionnaires indicated that highly confident and comfortable subjects were less likely to make commission errors, r(111) = -.20, p < .05. Significant correlations were also found between omission errors and items relating to accountability, such as comfort in justifying strategies and perceptions that performance was being monitored, r(111) = -.15, p < .05.
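To make the form of these analyses concrete (this is not a reanalysis of the study data), the sketch below runs a one-way ANOVA across the five accountability conditions and a Pearson correlation of the kind reported above. The group sizes, means, and ratings are synthetic placeholders.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder data; group sizes and means are assumptions,
# not values taken from the experiment.
rng = np.random.default_rng(seed=1)

conditions = ["Overall Performance", "Accuracy", "Quick Response",
              "Tracking", "Not Accountable"]
# Synthetic omission-error counts, roughly 24 participants per condition.
omission_errors = [rng.poisson(lam=m, size=24)
                   for m in (1.2, 1.1, 1.5, 2.1, 1.8)]

f_stat, p_val = stats.f_oneway(*omission_errors)
print(f"One-way ANOVA across conditions: F = {f_stat:.2f}, p = {p_val:.3f}")

# Correlation between a synthetic confidence/comfort rating and the number
# of commission errors, analogous to the questionnaire analysis above.
confidence_rating = rng.normal(loc=5.0, scale=1.5, size=120)
commission_errors = rng.poisson(lam=1.0, size=120)
r, p = stats.pearsonr(confidence_rating, commission_errors)
print(f"Pearson correlation: r = {r:.2f}, p = {p:.3f}")
```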
EXPERIMENT 2: COMMERCIAL PILOTS

METHOD

Participants in this study were 21 commercial glass cockpit pilots (i.e., pilots of automated aircraft). The part-task flight simulation facility used in this experiment is modeled after the Advanced Concepts Flight Simulator at Ames Research Center and employs two Silicon Graphics color monitors to present glass displays of primary and secondary flight displays, navigation and communication information, and electronic checklists, as well as Engine Indicating and Crew Alerting System (EICAS) and Flight Management System (FMS) instruments. Subjects interacted with the controls and displays of the aircraft through a touchscreen overlaying the instruments needed to accomplish the flight. In addition to the aircraft displays, a secondary tracking task was presented on one of the monitors to provide a means of increasing the workload of the pilots, and was incorporated into the automation events described below. This task involved using a joystick to keep displayed cross-hairs inside the boundaries of a blue circle. When the cross-hairs crossed the boundaries of the circle, the circle turned red. Feedback on how much time the subject was able to stay within the target circle was accumulated and displayed to the subject.

Design and Procedures

Pilots were assigned to either an Accountable condition or to a Non-Accountable group. Accountable subjects were told that their performance would be monitored and that they would be interviewed and asked to explain and justify their performance and strategies in the use of automated systems at the end of the experiment. Additionally, a video camera was placed in a prominent position in the booth with Accountable participants. Pilots in the Non-Accountable group were told that their performance data could not be collected (due to a computer malfunction) or linked to them personally in any way, and no mention of an interview was made.

Pilots were trained individually on how each of the components of the experimental task functioned and were given time to practice. Following training, subjects flew two legs (order was counterbalanced): Los Angeles (LAX) to San Francisco (SFO), and SFO to Sacramento (SMF). The flight route was pre-loaded into the FMS prior to beginning the trial. Subjects were instructed to communicate with Air Traffic Control (ATC) through textual datalink messages sent and received on the CDU screen of their FMS. Clearances from ATC (e.g., a change in altitude, speed, heading, or frequency) could be auto-loaded into the appropriate flight system, and correct loading could be verified by checking the Mode Control Panel (MCP) or navigation display. Pilots manually performed the secondary tracking task on the SFO-SMF leg whenever they were above 5,000 ft. The secondary task was automated on the LAX-SFO leg.

Four automation failures during these legs offered the possibility for pilots to make omission errors if they did not verify proper automation functioning: 1) an altitude clearance misloaded into the flight control systems and was reflected by incorrect numbers on the MCP; 2) the flight system incorrectly executed a commanded heading change, and the improper execution was reflected on the navigational display; 3) a frequency change misloaded into the flight control systems and was reflected by incorrect numbers on the MCP; and 4) the tracking task automation failed at 7,000 ft during the LAX-SFO flight, which was signalled by the boundary circle turning red. In all cases, verification information was available on the appropriate display, as it would be in the aircraft. One opportunity for a commission error, a false "Engine Fire" message, yielded no variance in responses (every pilot committed the error) and is discussed in detail elsewhere (Mosier & Skitka, 1996). Debriefing forms included questions on flight experience, as well as questions probing perceptions of accountability and attitudes toward automation. No "justification" interviews were conducted.

RESULTS

Numbers of errors did not vary significantly as a function of manipulated accountability. They were, however, correlated with total flight hours, r(20) = .49, p < .05, and with years of flight experience, r(20) = .46, p < .05, indicating that increased experience decreased the likelihood of catching the automation failures. Descriptive analyses of the entire sample revealed that the altitude load failure and the heading capture failure, the two events most critical to aircraft operation safety, remained undetected by 44% and 48% of the participants, respectively. The frequency misload was undetected by 71% of pilot participants. The tracking task automation failure, completely irrelevant to flight functioning, was detected by all of the participants.
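The following hypothetical sketch shows one way such detection outcomes could be tabulated per pilot and per event. The detection matrix is synthetic (only loosely patterned on the percentages above), and the derived per-pilot omission counts anticipate the bias grouping described in the next paragraph.

```python
import numpy as np

# Hypothetical tabulation of detection outcomes; the matrix is synthetic,
# not the study data. Event names follow the failure list above.
events = ["altitude misload", "heading execution", "frequency misload",
          "tracking automation failure"]
rng = np.random.default_rng(seed=2)
# rows = 21 pilots, columns = 4 events; True means the failure was detected.
# Thresholds approximate the undetected rates reported above.
detected = rng.random((21, 4)) > np.array([0.44, 0.48, 0.71, 0.0])

undetected_pct = 100.0 * (~detected).mean(axis=0)
for name, pct in zip(events, undetected_pct):
    print(f"{name}: {pct:.0f}% undetected")

# Omission errors per pilot over the three flight-related events, the basis
# for the high- vs. low-automation-bias grouping described below.
flight_related_misses = (~detected[:, :3]).sum(axis=1)
high_bias = flight_related_misses >= 2
print(f"high-bias pilots: {high_bias.sum()} of {len(high_bias)}")
```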
In order to ascertain the underlying factors that discriminate participants who were more likely to verify automated tasks (and thus catch errors) from those less likely to do so, pilots were classified according to the number of omission errors they committed. Those who missed two or three out of three flight-related events were categorized as "high-bias" participants, and those who missed none or only one automation failure were placed into the "low automation-bias" group. ANOVAs were conducted using bias group as the independent variable and responses on the debriefing questionnaire as dependent variables. Bias groups were statistically equivalent on items such as comfort with the experiment, confidence in their strategies, and confidence in computers. However, low-bias participants reported more nervousness, F(1, 19) = 7.08, p < .015; a higher sense of being evaluated on their performance, F(1, 19) = 2.21, and on their strategies in the use of the automation, F(1, 19) = 9.63, p < .006; and a stronger need to justify their interaction with the automation, F(1, 19) = 6.24, p < .02. In other words, pilots who reported a higher internalized sense of accountability for their interactions with automation verified correct automation functioning more often and committed fewer errors.

DISCUSSION

Results of these studies demonstrate that the perception that one is "accountable" for particular aspects of performance affects one's strategies in performing the task. In the student study, experimentally manipulated accountability decreased the tendency to make errors under specific conditions, i.e., when subjects were accountable for their Overall Performance or Accuracy. Analysis of the verification data indicates that accountability effects occur largely due to the expected increase in cognitive vigilance: subjects in these conditions were more likely to completely verify than Non-Accountable subjects or subjects accountable for a Quick Response or for Tracking. This effect was especially pronounced as verification required opening more boxes, making it more difficult or costly to accomplish. Conversely, imposing accountability for Tracking led to an increase in automation-related errors, possibly because it encouraged participants to concentrate attention on the tracking task rather than on checking for proper automation functioning.

Although experimentally manipulated accountability did not impact the automation bias of the pilot subjects, there is evidence that internalized accountability led to an increase in verification of automated functioning and fewer resultant errors. Apparently, the sense that one is accountable for one's interaction with automation does encourage vigilant, proactive strategies and stimulates verification of the functioning of automated systems. The fact that, for the pilot sample, this perception did not correspond with our external accountability manipulation indicates the need to establish whether accountability is a variable that can be significantly influenced in professional decision makers (i.e., pilots, who are already at a high level of personal responsibility for their conduct), or if it is part of some innate personality construct.
It is clear, however, that the perception of accountability for one's interaction with automation does encourage vigilant, proactive strategies and stimulates verification of the functioning of automated systems.

ACKNOWLEDGEMENTS

This research was supported by NASA grants NCC2-798, NCC2-837, and NAS2-832. Special thanks to reviewers Mary Connors, Ute Fischer, and Irene Laudeman. Susan Heers is currently affiliated with Monterey Technologies.

REFERENCES

Hagafors, R., & Brehmer, B. (1983). Does having to justify one's decisions change the nature of the decision process? Organizational Behavior and Human Performance, 31, 223-232.

Johnson, E. J., Payne, J. W., Schkade, D. A., & Bettman, J. R. (1991). Monitoring information processing and decisions: The MouseLab system. Philadelphia: University of Pennsylvania, The Wharton School.

Kruglanski, A. W., & Freund, T. (1983). The freezing and unfreezing of lay inferences: Effects on impressional primacy, ethnic stereotyping, and numerical anchoring. Journal of Experimental Social Psychology, 19, 448-468.

Mosier, K. L., & Skitka, L. J. (in press). Human decision makers and automated decision aids: Made for each other? In R. Parasuraman & M. Mouloua (Eds.), Automation and Human Performance: Theory and Applications. NJ: Lawrence Erlbaum Associates, Inc.

Mosier, K. L., Skitka, L. J., & Heers, S. T. (1995). Automation and accountability for performance. In R. S. Jensen & L. A. Rakovan (Eds.), Proceedings of the Eighth International Symposium on Aviation Psychology (pp. 221-226). Columbus, Ohio.

Mosier, K. L., Skitka, L. J., & Korte, K. J. (1994). Cognitive and social psychological issues in flight crew/automation interaction. Proceedings of the Automation Technology and Human Performance Conference. Sage.

NASA Ames Research Center (1989). WindowPANES: Workload PerformANcE Simulation. Moffett Field, CA: NASA Ames Research Center, Rotorcraft Human Factors Research Branch.

Skitka, L. J., & Mosier, K. L. (1994). Automation bias: When, where, why? Presented at the Annual Conference of the Society for Judgment and Decision Making.