Eighth USA/Europe Air Traffic Management Research and Development Seminar (ATM2009)

Carbon Copy: The Benefits of Autonomous Cognitive Models of Air Traffic Controllers in Large-Scale Simulations

Steven Estes, Craig A. Bonaceto, Kevin Long, Dr. Scott H. Mills, & Dr. Frank Sogandares
The MITRE Corporation
McLean, VA
[email protected]

Abstract—NextGen proposes a suite of new technologies and procedures to be employed by the en route air traffic controller. Typically, the benefits of NextGen concepts are evaluated using fast-time simulation systems. However, these systems often ignore the human component of the National Airspace System (NAS) or represent it with little fidelity. Providing better estimates of NextGen productivity and efficiency benefits requires, in part, better models of human performance. In this paper we describe the construction and use of a cognitive model, Edgar, designed to address this problem.

Keywords-cognitive modeling; human performance; controller agents; controller workload

I. INTRODUCTION

Every year at research institutions across the country hundreds, if not thousands, of controllers and pilots participate in human-in-the-loop evaluations. Each controller or pilot typically spends several hours of their free time doing what they do during their work time: flying airplanes or keeping airplanes separated. By participating in these evaluations they help us understand and quantify how, for example, a new Cockpit Display of Traffic Information (CDTI) will affect a pilot's workload or how advanced automation will change the way a controller does his or her job.

While these evaluations tell us a lot about the effect of new technologies and procedures on the individual, it is often difficult to extrapolate the inverse effect: how the individual, equipped with these new technologies, will impact the National Airspace System (NAS). And, where fast-time simulation and the like are employed to do so, the human component of the system is represented with little or no fidelity.

Ideally, the NAS-wide benefit of any new technology would be tested using huge, human-in-the-loop evaluations where the human, technology, and environment are all represented with a high level of fidelity. Of course, even with as many controllers and pilots as are participating in evaluations today, this huge, high-fidelity evaluation would require many orders of magnitude more participants. During these evaluations each controller or pilot might spend a week or more controlling traffic and flying planes in concert with hundreds of other controllers. These evaluations would cover many control centers' worth of sectors and would run, literally, 24 hours a day, 7 days a week. This is, of course, a completely infeasible scenario. If large-scale simulations for the purpose of getting a glimpse of the system-wide picture are to be accomplished, bringing in hundreds of controllers for a single evaluation is not the answer. What is needed is a reasonable controller facsimile: a carbon copy of sorts.

In this paper we will discuss how human behavior can be replicated using cognitive modeling and the application of those models to the aviation domain, and we will describe our cognitive model, Edgar, and how it can be used in large-scale evaluations.

II. AUTONOMOUS COGNITIVE MODELING

The phrase "autonomous cognitive models" may or may not have any meaning to you, the reader. Those who are unfamiliar with the term might, reasonably, infer something like A.I. (artificial intelligence).
And, as a touchstone, this is a fine place to begin thinking about autonomous cognitive models. In the sense that cognitive models are software-based agents that act autonomously, there is a definite commonality. There is, however, also a critical distinction between the two. While some A.I. agents may be designed to seem human, there is no constraint that they must go through a "human-like" decision-making process. A cognitive agent, however, attempts to emulate not only observable behavior, but also the process by which the human arrives at that behavior. Further, as will be discussed later in this paper, the cognitive agent has a set of resources that are designed to be human-like. That is, the cognitive agent has human-like resources, such as memory and vision, all of which are limited in the same way the corresponding human systems are limited. Before delving much deeper into what cognitive modeling is and how we have applied it to the air traffic control domain, it will be helpful to provide some context.

A. A Cognitive Science Primer

Prior to the 1950s, cognitive science was a non-entity. This is not to say that studies of human cognition had not occurred until that time; they certainly had. Theory foundational to cognitive science had been generated as far back as Aristotle and Plato, but there had not been a system of study into which that theory had been incorporated and applied. But in the late fifties, the domain of cognitive science began to crystallize, and disparate theories of cognition came together [18]. Critical to the discussion at hand, this coalescing of ideas led to theories of the mind and accompanying models of attention, memory, language, learning, and the like.

B. The Model Human Processor

In 1983, Card, Moran, and Newell [6] introduced the Model Human Processor (MHP). The MHP was interesting because it not only provided a box-and-arrow model of human cognitive processes, but also acted as an engineering tool, enabling the user to make quantitative predictions about human performance. As shown in Figure 1, Card, Moran, and Newell gathered much of the quantitative research of the prior 30 years and applied it across the various constructs of cognitive psychology, describing the capacity, durability, and speed of various components of the cognitive system.

A task method, such as the file-saving example given later in this paper, provides a description of how a person might accomplish a goal. Cognitive models, however, go one step further, executing that method in accordance with a set of constraints like those shown in Figure 1. Those constraints act on the motor, perceptual, and cognitive resources of an architecture in a variety of ways. We broadly define these limits as being either temporal or resource constraints.

A temporal constraint ensures that each step is carried out according to an empirically derived human execution time, despite the fact that a computer could clearly execute any one of the steps in such a method far faster than any human could. For instance, as it takes a human, on average, 1100 ms to point at and click on an object on a display, the cognitive agent should likewise take roughly 1100 ms [6]. Depending on the fidelity of the architecture, this constraint may act at the task or symbolic level and may be bounded or enhanced by rules like Fitts' law [10], which gives a more precise execution time estimate based on the distance to and size of the display target.
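To make the temporal constraint concrete, consider the following sketch (Python; the constant values and function name are our own illustrative assumptions, not part of any particular architecture). It returns the 1100 ms population average when the target's geometry is unknown, and a Fitts' law estimate, in its common Shannon formulation, when the geometry is known:

import math

# Illustrative constants; a real architecture would calibrate these empirically.
FITTS_A = 0.230              # intercept, in seconds (assumed value)
FITTS_B = 0.166              # seconds per bit of index of difficulty (assumed value)
AVG_POINT_AND_CLICK = 1.100  # average point-and-click time in seconds [6]

def point_and_click_time(distance=None, width=None):
    # Fall back on the population average when target geometry is unknown.
    if distance is None or width is None:
        return AVG_POINT_AND_CLICK
    # Shannon formulation of Fitts' law: MT = a + b * log2(D/W + 1).
    index_of_difficulty = math.log2(distance / width + 1.0)
    return FITTS_A + FITTS_B * index_of_difficulty

# A target 300 pixels away and 20 pixels wide: log2(16) = 4 bits,
# so the simulated click is held for about 0.23 + 0.166 * 4 = 0.89 s.
print(point_and_click_time(distance=300, width=20))

Holding each simulated action for the predicted duration before the next step may execute is what keeps the agent from "clicking" faster than a human could.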
Resource constraints are slightly different, in that they determine the availability and/or capacity of cognitive, motor, and perceptual resources. In its simplest form, this means that a properly constrained cognitive model cannot, for instance, "see" two things at once or "say" two things at once. Depending on the architecture's fidelity, it may also be that working memory resources have capacity and durability limitations in accordance with human cognitive limits, so that a model's working memory is not functionally limitless.

Figure 1. Card, Moran, & Newell's Model Human Processor (MHP)

Cognitive models (a category that sometimes also includes human performance models) thus provide physical, workable representations of the way a human "thinks" and interacts with his or her environment. Cognitive models may be instantiated as paper-and-pencil or software models; the primary focus here is on software-based models. As such, we can also say that cognitive models are, for the purposes of our research, autonomous agents. Though many cognitive architectures, where an architecture is a formal representation of both behavior and processes, are capable of learning [2, 21], the models employed for this project represent only expert behavior and never learn in the course of execution. These models [14] are built by reconstructing the method used to accomplish a given task through controller interviews and literature review. As an example, if the goal is to save an electronic document, the method to accomplish that goal might be annotated as follows:

Method_for_Goal: Save_Text File
Step 1. Look_at Menu_Item whose Type is "File"
Step 2. Point & Click on the "File" menu
Step 3. Point & Click "Save As"
Step 4. Determine File Name
Step 5. Hands to Keyboard
Step 6. Type File Name
Step 7. Hands to Mouse

So, it is through this representation of human goals and methods, executed according to a set of human-like constraints, that we arrive at a cognitive model that might reasonably replicate human behavior.

III. COGNITIVE MODELING IN THE AVIATION DOMAIN

Cognitive modeling is a tool, and as with any tool, it is most effective when applied according to its intended purpose. The types of behavior to which this style of cognitive modeling is well suited are characterized, broadly speaking, by experts engaged in routine tasks within a relatively structured (i.e., highly rule-based) environment. Typically the task to be completed involves a human-computer interface. By this definition, the task of air traffic control is quite amenable to cognitive modeling. Of course, the tool's appropriateness does not automatically assure its utility. In deciding whether cognitive modeling may be usefully applied to problems of the aviation domain, the more basic question is, "When does cognitive modeling provide a unique and/or better solution than other analysis tools?" The answer is found in the concept of the Cognitive Model in the Loop (CITL) evaluation.

A. The CITL versus the HITL

As touched on in the introduction, an ideal HITL evaluation is one that might be called a 24 x 7 x 2 x 12 HITL: an evaluation that runs 24 hours a day, seven days a week, over two weeks, and involves twelve or more controllers, each working different sectors. The impracticality of such an evaluation is obvious, as should be its benefits.
With this number of controllers available over such a long period of time, evaluations could include much larger geographic areas and, as an added benefit, would not require the type of contrived, heavily scripted scenarios that are often employed to ensure specific events occur during the evaluation. The CITL concept thus allows on-demand access to dozens of surrogate controllers that can be used in exactly this type of idealized evaluation. CITLs may also be used in a hybrid HITL in which real controllers control traffic in the center sectors, but bordering sectors are controlled by cognitive models. This would allow air traffic to be delivered in a more realistic manner to the controller, and, with the addition of speech recognition and text-to-speech capabilities, models could conceivably communicate with the live center controllers. Models can also be used as pilots, eliminating the need for pseudo-pilots to take part in evaluations, as is typically done. All of this means far less manpower is necessary in enacting a HITL of this type.

B. Benefits of the CITL

Cognitive models, embedded within a simulation environment, allow for macro-level system results that are based on, among other things, micro-level cognitive data of a type typically gained only in human-in-the-loop evaluations. This then allows us to evaluate the impact of a new technology on something much closer to a system-wide scale. Perhaps just as importantly, these types of evaluations can happen early in the design process (even before working prototypes of the concept to be tested are developed). Because the models identify potential problems with a design, it becomes possible fairly early in concept development to either make changes for a more effective result or determine that the concept is not worth pursuing. The result is fewer design iterations and the ability to test with live controllers and pilots nearer to a concept's final form.

The CITL is, of course, not without drawbacks. Model Human Processor (MHP)-styled cognitive models assume expert, error-free behavior and often represent the visual system in a very rudimentary way. They are also very rigid ways of representing a decision-making process that may actually be quite flexible.

IV. EDGAR

In the fall of 2006 MITRE began developing its own en route controller model, Edgar, with the ultimate goal of creating an agent that could a) replicate the most common en route controller behaviors and b) be embedded as an autonomous agent within our real-time en route simulation environment. To that end, we began to look for a cognitive architecture (in this case, a software package) that would meet our requirements, which included:

• Relatively easy to learn and use
• Provisions to link the architecture to an outside simulation environment
• Providing a basic set of cognitive resources, including a visual system, working memory, long term memory, and procedural task execution
• Ability to multitask
• Temporal and mental workload estimates

A. Fidelity

Most of these requirements are answers to the implicit question of fidelity. That is, for our purposes, at what level of detail is it necessary to replicate human behavior? Is it required that the model specify down to the level of individual neurons, or is it sufficient to work at the unit-task level? Consider the image shown in Figure 2. The deer depicted in the image is clearly being reflected in the stream. Even without the context of the rest of the image, the reflection, drawn with much less detail, is still clearly recognizable. If the goal is only to communicate the image of a deer, either rendering would do. And, if the goal is to do so efficiently, the reflection is actually preferable. This is a reasonable, if imprecise, analogy for the level of detail at which we found it necessary to model en route controller behavior. Our goal is not to determine how individual neurons perform, but rather to quantify how a controller performs.
So, if the emergent behavior of the controller can be adequately represented with less detail and effort than is, for example, required for neurological models, it is preferable in an applied setting to use a lower-fidelity architecture, which will presumably be easier to learn, build, modify, and implement.

Figure 2. Fidelity [5]

In comparing the fidelity required to the fidelity provided by a large variety of cognitive architectures, we found a subset of architectures that addressed our needs, including CORE [22], APEX [11], ACT-R [2], and GLEAN [15]. Ultimately, we chose GLEAN. Among the major determinants were the language GLEAN is built in (it needed to be amenable to our lab infrastructure), the ease of modeling, and the fact that GLEAN is based on MHP-style cognitive modeling theory, with which we were already experienced.

B. GLEAN Cognitive Architecture

GLEAN formalizes a cognitive modeling technique known as GOMS (Goals, Operators, Methods, & Selection Rules), a derivative of the Model Human Processor. "Paper and pencil" versions of GOMS have been used very successfully at MITRE for a wide variety of projects, including analyses of tower, en route, and TRACON controllers, analysis of pilots using advanced delegated separation tools, and analyses of Traffic Managers using Next Generation Air Transportation System (NextGen) style automation [8, 9]. GLEAN provided an avenue to continue this same type of analysis, but on a much larger scale. Kieras [15] describes GOMSL/GLEAN in the following way:

"GOMSL runs in a simple cognitive architecture, implemented in the GLEAN simulation environment, that represents the assumed mechanisms and capabilities of the human information processing system. The cognitive architecture consists of a set of human-like processors and output capabilities, each constrained according to the limitations of the human cognitive system."

According to Kieras, the processors and output devices include:

"Auditory processor accepts either speech or sound inputs and makes them available to the cognitive processor, where they are tested or waited for by auditory operators. The visual processor accepts visual events that take the form of an object appearing with specified properties, or a change in a visual property of an existing visual object. Visual input is either searched or waited for with visual operators. Vocal output is produced when a vocal operator is executed, specifying an utterance to be produced, which is sent to the device or another simulated human auditory processor. Manual output takes many different forms depending on the type of movement specified by a manual motor operator. The architecture contains certain memory structures. Each perceptual processor has its own working memory. The cognitive processor's working memory has two partitions: an object store and a tag store. The tag store represents the conventional human working memory, and contains tags (labels), each with an associated value."
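As a rough illustration of these memory structures, the sketch below (Python; the class, method names, and eviction policy are our simplifying assumptions rather than GLEAN's actual implementation) separates an object store from a tag store of labeled values and enforces an assumed chunk limit:

class WorkingMemory:
    def __init__(self, tag_capacity=7):
        self.object_store = {}            # perceptual objects and their properties
        self.tag_store = {}               # tags (labels), each with an associated value
        self.tag_capacity = tag_capacity  # assumed chunk limit, per Miller [17]

    def store_object(self, name, properties):
        self.object_store[name] = properties

    def store_tag(self, label, value):
        # If the assumed capacity is exceeded, evict the oldest tag.
        # GLEAN's actual policy may differ; this line is purely illustrative.
        if label not in self.tag_store and len(self.tag_store) >= self.tag_capacity:
            del self.tag_store[next(iter(self.tag_store))]
        self.tag_store[label] = value

    def recall(self, label):
        return self.tag_store.get(label)  # None models a retrieval failure

wm = WorkingMemory()
wm.store_tag("<current_ac>", "AAL123")    # mirrors Store ... under <current_ac>
wm.store_tag("<ho_status>", "Eligible_Incoming")
print(wm.recall("<current_ac>"))          # prints AAL123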
" It is also worth noting that GLEAN is multi-threaded, which allows for simulated human multitasking, increasing the accuracy of the task time prediction. C. Model Building A representative controller model executes a set of controller tasks. The tasks included in that set determine, in part, how thoroughly the whole of controller behavior is modeled. The tasks selected for Edgar currently include: accepting handoffs, offering handoffs, maintaining spacing on a flow, identifying and resolving pairwise conflicts, and implementing Miles-in-Trail (MIT) restrictions. Each of these tasks was formalized using a form of Cognitive Task Analysis (CTA) known as Hierarchical Task Analysis (HTA). HTA both defines the knowledge required for completing a task and presents those tasks in the order in which they must be executed [16]. As an example, the HTA for the offer handoff task (also known as initiate handoff or exit) is shown in Figure 3. As shown in the figure, the controller begins by monitoring the Display System Replacement (DSR) for aircraft in position to be handed off. The task proceeds through the controller completing the necessary data entry and verbalizing the handoff to the pilot. It includes activities that controllers often complete to facilitate remembering like adjusting the length of the leader line as a reminder of which aircraft have checked in on the frequency. HTA, informed by literature review and controller interviews, was conducted for each of the previously described tasks. Once the HTA has been completed for a given task, it can be translated into GLEAN syntax (GOMSL). This process involves converting each cell of the above HTA into a goal or step in a method. The GOMSL syntax for the Initiate Handoff HTA is shown in Figure 4. Notice that some cells of the HTA HTA for Handoff Task Method_for_goal: Accept_Handoff Loop Step. Delete <current_ac>; Delete <next_ac>. Step Begin. Look_for_object_whose Handoff_Status is "Eligible_Incoming" and_store_under <current_ac>. Step log. Log <current_ac>. Step. Decide: If <current_ac> is Absent Then Goto Fine. Step. Accomplish_goal: Get Accept_Handoff_Data. Step. Accomplish_goal: Accept_Handoff Data_Entry. Step. Accomplish_goal: Accept_Handoff Data_Block_Cleanup. Step. Accomplish_goal: Say Hello. Step. Delete <current_cid>; Delete <current_acid>; Delete <carrier>. Step. Delete <flight_id>; Delete <ho_status>; Delete <next_ac>. Step. Goto Begin. Step Fine. Return_with_goal_accomplished. Method_for_goal: Get Accept_Handoff_Data Step 1. Store CID of <current_ac> under <current_cid>; Log <current_cid>. Step 2. Store ACID of <current_ac> under <current_acid>; Log <current_acid>. Step 3. Store Airline of <current_ac> under <carrier>; Log <carrier>. Step 4. Store Flight_ID of <current_ac> under <flight_id>; Log <flight_id>. Step 5. Store Handoff_Status of <current_ac> under <ho_status>; Log <ho_status>. Step 6. Return_with_goal_accomplished. Method_for_goal: Accept_Handoff Data_Entry Step 1. Type_in <current_acid>. Step 2. Verify Correct. Step 3. Keystroke Enter. Step 4. Return_with_goal_accomplished. Method_for_goal: Accept_Handoff Data_Block_Cleanup Step Begin. Type_in "6/2 ". Step. Type_in <current_acid>. Step. Verify Correct. Step. Keystroke Enter. Step. Log Leader_Length of <current_acid>; Log Leader_Direction of <current_acid>. Step. Return_with_goal_accomplished. Method_for_goal: Say Hello Step 1. Store Airline of <current_ac> under <carrier>; Store Flight_ID of <current_ac> under <flight_id>. Step 2. 
Step 2. Recall_LTM_item_whose Abbrev is <carrier> and_store_under <airline_name>.
Step 3. Keystroke Push_to_Talk.
Step 4. Speak Call_Sign of <airline_name>.
Step 5. Speak <flight_id>.
Step 6. Speak "Jacksonville Center, good day".
Step 7. Keystroke Release_Push_to_Talk.
Step 8. Return_with_goal_accomplished.

Figure 4. GOMSL Syntax for Accept Handoff Task

Notice that some cells of the HTA are steps of a larger goal (e.g., monitoring the DSR) while others are the basis of the goal itself (e.g., Accept_Handoff Data_Entry). It is sometimes the case that the HTA will not provide all the details necessary to complete the GLEAN model. For example, the HTA calls out the "Initiate DSR Handoff" task but does not specify the keystrokes necessary to execute that task. In the model, those steps must be enumerated, and this will often require further consultation with experts. The complete method for initiate handoff data entry is shown under "Initiate Handoff Data Entry."

D. Model Strategy for Conflict Resolution

Two of the more interesting tasks to model were spacing on a flow (maintaining the prescribed horizontal separation minima for aircraft following the same path) and pairwise conflict resolution (maintaining the prescribed horizontal and vertical separation minima for aircraft on different paths). Because these tasks are two of the more difficult controller tasks and the resulting task models provided insight into controller cognition, they merit further discussion.

There were two important principles, both culled from the literature and applied to the modeling effort, for replicating the process by which controllers resolve these flow and pairwise conflicts: Long Term Working Memory (LTWM) and satisficing using hierarchical decision making.

Controllers have the ability to store an exceptional amount of information in short term, or working, memory. While the average person can store about seven plus or minus two chunks of information in working memory [17], the expert controller has been found to store up to thirty pieces of information about a set of aircraft [4]. This capacity, which allows the controller to process large amounts of information, is extraordinarily useful as, beyond processing large amounts of "on-screen" information, the controller must analyze the traffic display according to knowledge stored in memory. Learned knowledge, or knowledge stored in Long Term Memory, may include information about aircraft, effects of altitude on performance, Standard Operating Procedures and Letters of Agreement for the sector, a large number of heuristics for determining separation, and a repository of knowledge based on prior experience with any number of different conflict geometries. The controller's ability to retain and quickly recall this information from memory may be accounted for by the theory of Long Term Working Memory [7]. The theory of Long Term Working Memory posits that experts in a given domain are able to enhance working memory through the use of enhanced retrieval structures. In short, the controller's knowledge of the domain is stored in such a way that it is easily retrievable when tied to cues in working memory. In practice, this effect has been seen in chess, medical diagnosis, short order cooking, and flight planning. Under this theory, controllers become highly efficient processors of information.

There is also a corollary capability that a controller must develop: a controller must also be an expert "forgetter." Even with the expanded capacities that Long Term Working Memory provides, there are still capacity limits, and new information will eventually overwrite or interfere with old information. Further, because air traffic control is a highly dynamic domain, there are distinct disadvantages to holding information in memory for long periods of time. For example, the current speed of an aircraft may be no indication of its future speed. The DSR also obviates the need to store information in memory for very long. It is clearly advantageous to the controller to rely on perfect knowledge in the world (the DSR) rather than imperfect knowledge in the head [19]. Other researchers have speculated that this type of forgetting exists in air traffic control [13], and theories like functional decay [1] have given algorithmic descriptions of accelerated forgetting among experts.

Functional decay and Long Term Working Memory have had important impacts on our controller agent. First, Edgar "forgets" as soon as information is no longer needed. This typically means information will be held in working memory for, at most, 10 seconds. Second, rather than approaching conflict problems mathematically, our model relies on many large sets of decision rules, stored in Long Term Memory and accessed via cues in working memory, just as a human controller would. Further, Edgar employs these decision rules under a strategy of satisficing (as does the controller), which entails finding a suitable solution to a problem that may be suboptimal. This strategy allows a controller to quickly determine whether a potential conflict will in fact result in a loss of separation. This, it would seem, is a reasonable tradeoff as a) the goal is to find a solution that will maintain separation and b) trading speed for a less-than-optimal solution accomplishes that goal safely.
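A minimal sketch of how these two principles might be realized in an agent is given below (Python; the 10-second constant comes from the text, but the data structures, names, and rule representation are illustrative assumptions, not Edgar's actual implementation):

import time

WM_RETENTION_S = 10.0  # Edgar holds an item for at most ~10 seconds (see text)

class DecayingWorkingMemory:
    def __init__(self):
        self._items = {}  # label -> (value, time stored)

    def store(self, label, value):
        self._items[label] = (value, time.time())

    def recall(self, label):
        entry = self._items.get(label)
        if entry is None:
            return None
        value, stored_at = entry
        if time.time() - stored_at > WM_RETENTION_S:
            del self._items[label]  # forgotten; re-read the DSR instead
            return None
        return value

def resolve_conflict(conflict, rules):
    # Satisficing search over decision rules stored in Long Term Memory:
    # return the clearance from the FIRST rule that applies and maintains
    # separation, even if a later rule would be more efficient. Each rule
    # is assumed to be an (applies, maintains_separation, clearance)
    # triple of callables.
    for applies, maintains_separation, clearance in rules:
        if applies(conflict) and maintains_separation(conflict):
            return clearance(conflict)  # good enough; stop searching
    return None  # no stored rule fits this geometry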
E. Overview of Model Execution

Edgar executes all the modeled tasks according to a control loop, much as a human controller would. In the control loop, Edgar begins by looking for aircraft in position to be accepted into the sector. Edgar then looks for aircraft for which a previously issued vector or altitude needs to be resolved. Edgar subsequently looks for and addresses any new conflicts (including pairwise and spacing on a flow). Edgar ends the loop by looking for aircraft in position to be handed off to the next sector, after which the loop starts again. A flow chart representing both the control loop and some aspects of Edgar's decision making process can be found in Figure 5.

Figure 5. Edgar's Control Loop
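Rendered in pseudocode, the loop amounts to something like the following sketch (Python; the function names are placeholders for the modeled tasks, each of which is a full set of GOMSL methods in the actual agent):

def accept_handoffs(sector): pass          # aircraft in position to enter
def resolve_open_clearances(sector): pass  # previously issued vectors/altitudes
def address_new_conflicts(sector): pass    # pairwise and spacing on a flow
def initiate_handoffs(sector): pass        # aircraft in position to exit

def control_loop(sector):
    # One iteration per sweep of the loop in Figure 5; each stub above
    # stands in for a modeled controller task.
    while sector.is_active():
        accept_handoffs(sector)
        resolve_open_clearances(sector)
        address_new_conflicts(sector)
        initiate_handoffs(sector)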
F. Model Outputs

When running a CITL evaluation, Edgar is instantiated in the sector or sectors of interest in our en route simulation environment. As Edgar runs, it generates a log file, or trace, of the controller model's activities. The model trace contains information about statement execution time, visual system use, and storage to and retrieval from working memory, among other items. An annotated example of one line of output from a model is shown in Figure 6.

Figure 6. Annotated GLEAN Output

When a trace is generated, it is fed through an analysis tool called CADRE (Cognitive Agent Data Reduction Effort). CADRE identifies all the methods executed, the execution time for each method, and the working memory load for each method. Based on the model output, statistics are generated to measure the controller's workload. Workload is characterized by the mental difficulty of, and the time available to do, the work.

Time available, or temporal workload, is a relatively straightforward measure. Using the statement time we can observe how long it took Edgar to accomplish a set of tasks. Where appropriate, that time can be compared to the amount of time available to accomplish those tasks. However, as CADRE is most often used to compare the efficiency of the controller with and without some new tool (for example, the amount of time to complete a handoff with data link versus over the voice channel), it is typically more interesting to compare total task time across tasks than to compare it to the total time available for completion of tasks.

Mental workload is measured in a different manner. CADRE assesses the mental difficulty of the work based on what may loosely be described as the amount of information the controller is dealing with during any given 50 millisecond cycle of the model's execution. More formally, it is the total number of chunks stored in the model's working memory resource. Each time a piece of information is stored to or deleted from the working memory resource, CADRE notes this change and updates the count of the total number of items stored in working memory. This count can then be compared against known working memory thresholds (as previously discussed, the average person can hold about seven chunks of information in working memory). This limited capacity means that working memory is often a bottleneck in problem solving. It is also a good indicator of the cognitive difficulty of the work. Interestingly, such quantitative and objective measures of working memory load can only be obtained with a model (a human cannot tell you how much information is stored in working memory). As such, it illustrates another way in which models are powerful: they open (slightly) the black-box process of cognition so that it may be observed. So, not only do we know the magnitude of the mental workload but, in cases where the work is found to be particularly difficult, the exact cause of that difficulty.
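A trace analyzer in the spirit of CADRE might derive the working memory load profile as in the sketch below (Python; the tuple-based trace format and names are invented for illustration, since CADRE's actual input is the GLEAN trace described above):

def wm_load_profile(trace):
    # trace: (time_ms, op, label) tuples, where op is "store" or "delete".
    # Returns (time_ms, items_in_wm) samples, one per working memory event.
    in_wm = set()
    profile = []
    for time_ms, op, label in trace:
        if op == "store":
            in_wm.add(label)
        elif op == "delete":
            in_wm.discard(label)
        profile.append((time_ms, len(in_wm)))
    return profile

trace = [(0, "store", "<current_ac>"), (50, "store", "<ho_status>"),
         (1200, "delete", "<ho_status>")]
for t, load in wm_load_profile(trace):
    flag = "  (near the seven-chunk threshold)" if load >= 7 else ""
    print(f"{t} ms: {load} item(s) in working memory{flag}")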
V. VALIDATION EXERCISE

When cognitive models are validated, it is almost always done by having a model and a human complete the same set of tasks and subsequently examining how well the model performance conforms to human performance. A similar method is employed here; however, rather than running human participants for the express purpose of testing the model, we gathered data from the literature.

Human performance is a broad term. In the course of evaluating a model, any number of aspects of performance may be examined, among them visual scan, decision accuracy, learning rates, and perceptual motor skill. In the coming years we will likely examine Edgar's ability to replicate human performance in most of these areas, but currently we are focused on temporal workload and task times.

One of the key constraints on increasing NAS capacity is the controller. The controller simply has a finite amount of time to complete tasks. Increases in demand will ultimately lead to a situation in which the amount of time available is exceeded by the total time necessary to complete the tasks at hand. As the time available to complete those tasks cannot be changed, an important component of increasing NAS capacity will be decreasing task time. As part of NextGen, automation will be provided to the controller that enables task time reduction. In evaluating and designing NextGen automation tools, one of the more important goals is quantifying exactly by how much task times are reduced and, as a result, how much capacity is gained in the NAS. So, inasmuch as Edgar can provide immediate benefit in evaluating questions of controller efficiency and productivity, verifying that Edgar is able to replicate controller task times is a high priority for this research.

To do so, CADRE was used to analyze task times generated by Edgar for a set of tasks for which we also had human data. In some cases, the way in which Edgar completed a task differed from the way the task was executed in the literature. Where those differences existed, Edgar was, for the purposes of the validation exercise, modified to execute the task as defined in the reference material. Although Edgar's task time may vary slightly run-to-run depending on things like the length of the clearances being issued, for this analysis we ran Edgar only once for each task. As such, Edgar produces a single value for comparison to the observed data culled from the literature.

A. Validation Results

In the first test, we compare the model and human times required to complete the transfer of communication and initial contact tasks, using data from a data communications study conducted by the Federal Aviation Administration (FAA) Free Flight office [3]. It should be noted that because the time to complete the flight plan review component of the initial contact task was not included in the human times for initial contact, we removed it from the model time. Three comparisons were made. Two of these tasks required the controller to use the voice channel to issue the relevant clearance, and in the third the controller used data communications rather than the voice channel. The results are shown in Figure 7.

Figure 7. Time in Seconds to Complete Specified Task

The greatest discrepancy between the model prediction and the observed average was found for the Transfer of Communication Data Comm task, at 400 ms. This result is both within the standard deviation (depicted by the error bars) and, from a standpoint of practical application, well within the range of acceptable results. The model prediction for Initial Contact Voice performed slightly better, within 200 ms of the observed data and, again, well within the standard deviation. The average task time for Transfer of Communication Voice was identical to the model prediction.

It is very important that Edgar provide accurate temporal predictions for the Transfer of Communication and Initial Contact tasks because they both occur frequently (once for each aircraft in the sector). These task times are driven by data entry and vocal utterances, both of which are known to be much easier to provide accurate temporal predictions for than other aspects of human performance. Providing accurate task times for more nuanced decision making tasks, which demand considerably more of the cognitive system, is a more difficult challenge. As a result, we chose one component of a task which fits this description for another validation exercise. The conflict resolution task includes, at a high level, conflict identification, generation of a conflict resolution, and issuance of that resolution.
While issuance of the resolution falls into the bin of tasks for which we would expect to provide quite accurate task times, both conflict identification and resolution are expected to be more difficult to predict temporally. Further, conflict identification is implemented in Edgar in such a way that task time varies based on the conflict geometry.

In a paper on the conflict identification process, Rantanen and Nunes [20] provide a series of conflict identification times which, as with Edgar, vary based on the geometry of the conflict. According to Rantanen and Nunes, the length of time required to identify the conflict will depend on the differential speed, heading, and altitude of the aircraft involved in the conflict. The shortest conflict identification times should be for aircraft whose altitudes will exceed separation minima at the conflict intersection, as altitude will be evaluated first by the controller. If the delta in altitude will be within the separation minima, the controller must further evaluate the conflict according to heading. If the aircraft paths are deemed to intersect, the controller must finally evaluate the conflict according to speed. Clearly, the earlier this decision hierarchy is exited, the quicker the conflict is identified.

Edgar, informed by our task analysis and controller interviews, operates in much the same way. However, because heading is identified as an initial pattern recognition in the conflict identification process, our hierarchy begins with heading and is subsequently followed by altitude and speed, rather than beginning with altitude. Likewise, although the magnitude of the conflict angle had an impact on the time to identify the conflict in Rantanen and Nunes, it has no effect in Edgar. As such, the results shown in Figure 8 compare only the results for two aircraft that have a heading difference of zero degrees (that is, two aircraft on the same flight path).

Figure 8. Time in Seconds to Identify Conflict

When the model performance is compared to that of the participants in Rantanen and Nunes, we again find all results within the standard deviation for the human data. However, it is worth noting that where subjects in the Rantanen and Nunes experiment showed higher identification times when a conflict actually existed, Edgar does not replicate this result. This is because the point at which Edgar exits the conflict identification loop determines whether or not the conflict exists, whereas a human may further perform a mental validation of the result.
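The exit-early hierarchy described above might be sketched as follows (Python; the separation tests are grossly simplified, and the names, thresholds, and interface are illustrative assumptions rather than Edgar's implementation):

def conflict_identified(pair):
    # Exit-early hierarchy in Edgar's assumed order: heading first (an
    # initial pattern recognition), then altitude, then speed. The earlier
    # the function returns, the shorter the simulated identification time.
    # `pair` is a hypothetical object exposing the relevant geometry.
    if not pair.paths_intersect():            # heading: geometry rules it out
        return False                          # fastest exit
    if pair.altitude_delta_ft >= 1000:        # vertical separation will hold
        return False                          # (threshold is illustrative)
    return pair.loses_separation_on_speed()  # slowest exit: full speed check

Because the decision of whether a conflict exists coincides with the point at which this hierarchy is exited, such a model has no analogue of the human's additional mental validation step, which is consistent with the discrepancy noted above.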
Thus far, for the tasks analyzed, Edgar has performed quite well. Further research needs to be done on task time for the conflict resolution task, as well as for additional tasks to be added to Edgar's repertoire in the future, like the ability to handle point-outs and execute merging. It will also be necessary to evaluate the impact of multitasking, which Edgar can do, on total performance time. Most evaluations of controller task time consider the task in isolation. However, controllers often accomplish several tasks at once. For example, while issuing a voice clearance to an aircraft, the controller can scan for conflicts or aircraft in position for handoff. So if an observer were to add up the task times for all of the tasks accomplished and consider that total execution time, it would often exceed the actual total execution time of the controller, who is able to accomplish some tasks (or portions of some tasks) simultaneously.

VI. SUMMARY

As Edgar becomes more capable, we will continue to test its predictive power. This will include not only its ability to estimate task times, but also mental workload and emergent behavior. In the meantime, we have a good initial en route controller model for use in CITLs, which gives us the capability to evaluate NextGen concepts on a much larger scale than would be possible with live controllers and pilots.

Cognitive models are a long way from being able to replicate human behavior so accurately that real controllers and pilots are no longer needed in human-in-the-loop evaluations. And there is no intention of using Edgar to stop the steady stream of pilots and controllers participating in evaluations at MITRE. As long as there are controllers and pilots in the system, we will always consider their input on the impacts of new procedures and technologies above that of a cognitive model. After all, there really is no replacement for the real thing. But, if we can use models early in the design process and in large CITL evaluations, they will be able to supplement controller feedback with insights into how a new technology will affect not just a sector, but the entire NAS. In the long term, our hope is that Edgar, a rough carbon copy of human behavior, can help to get tools into the hands of pilots and controllers that will help them improve safety and efficiency in the NAS.

ACKNOWLEDGMENTS

The authors would like to thank the various managers, computer scientists, and air traffic controllers who aided in the development of Edgar, including: Charlie Bailey, Bill Bateman, Bob Bearer, Ed Brestle, Gary Bogle, Chris DeSenti, Alfreda Gipson, Urmila Hiremath, Dr. David Kieras, Keith Knudsen, Chris Magrin, Scott Mayer, Dien Nguyen, Elliott Simons, Mike Tran, Zach Eyler-Walker, Scott Wood, and Soar Tech.

REFERENCES

[1] Altmann, E. M., & Gray, W. D. (2002). Forgetting to remember: The functional relationship of decay and interference. Psychological Science, 13(1), 27-33.
[2] Anderson, J. R., & Lebiere, C. (1998). The Atomic Components of Thought. Mahwah, NJ: Erlbaum.
[3] Bennett, M. (2003). CPDLC Benefits Story. FAA Free Flight Office, unpublished.
[4] Bisseret, A. (1970). Mémoire opérationnelle et structure du travail. Bulletin de Psychologie, XXIV, 280-294. English summary in Ergonomics, 1971, 14, 565-570.
[5] Calder, A. (1967). Fables of Aesop, According to Sir Roger L'Estrange. New York: Dover Publications.
[6] Card, S., Moran, T. P., & Newell, A. (1983). The Psychology of Human Computer Interaction. Lawrence Erlbaum Associates. ISBN 0-89859-859-1.
[7] Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102(2), 211-245.
[8] Estes, S. L. (2005). Arriving from Delphi at O'Hare: Predictive cognitive engineering in the O'Hare Modernization Project and beyond. In Proceedings of the 49th Annual Meeting of the Human Factors and Ergonomics Society, Orlando, FL.
[9] Estes, S. L., & Masalonis, A. J. (2003). I see what you're thinking: Using cognitive models to refine traffic flow management decision support prototypes. In Proceedings of the 47th Annual Meeting of the Human Factors and Ergonomics Society, Denver, CO.
[10] Fitts, P. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47(6), 381-391.
[11] Freed, M., Matessa, M., Remington, R., & Vera, A. (2003).
How Apex Automates CPM-GOMS. In Proceedings of the 5th International Conference on Cognitive Modeling, Bamberg, Germany.
[13] Gronlund, S. D., Dougherty, M. R. P., Ohrt, D. D., Thompson, G. L., & Bleckley, K. M. (1997). The Role of Memory in Air Traffic Control (Report No. DOT/FAA/AM-97/22). Washington, DC: Federal Aviation Administration.
[14] Kieras, D. (1988). Towards a practical GOMS model methodology for user interface design. In M. Helander (Ed.), Handbook of Human-Computer Interaction (pp. 135-157). Amsterdam, The Netherlands: Elsevier Science Publishers. ISBN 0-444-88673-7.
[15] Kieras, D. (1996). A Guide to GOMS Model Usability Evaluation using GOMSL and GLEAN4. University of Michigan.
[16] Kirwan, B., & Ainsworth, L. K. (Eds.) (1992). A Guide to Task Analysis. London, UK: Taylor & Francis.
[17] Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.
[18] Newell, A. (1973). You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. In W. G. Chase (Ed.), Visual Information Processing. New York: Academic Press.
[19] Norman, D. (2002). The Design of Everyday Things. New York: Basic Books.
[20] Rantanen, E. M., & Nunes, A. (2005). Hierarchical conflict detection in air traffic control. International Journal of Aviation Psychology, 15(4), 339-362.
[21] Rosenbloom, P. S., Laird, J. E., & Newell, A. (1993). The Soar Papers: Research on Integrated Intelligence. Cambridge, MA: MIT Press.
[22] Tollinger, I., Lewis, R. L., McCurdy, M., Tollinger, P., Vera, A., Howes, A., & Pelton, L. (2005). Supporting efficient development of cognitive models at multiple skill levels: Exploring recent advances in constraint-based modeling. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 2-7, 2005, Portland, Oregon, USA.

AUTHOR BIOGRAPHY

Steven L. Estes lives in Savannah, Georgia. He holds a BA in History (1996) from the University of Georgia in Athens, Georgia and an MA in Human Factors and Applied Cognition (2002) from George Mason University in Fairfax, Virginia. He is currently a Lead Human Factors Engineer at the MITRE Corporation's Center for Advanced Aviation System Development in McLean, Virginia. Prior to working for MITRE, he was employed as a human factors engineer at Gulfstream Aerospace. Publications include the book chapter "Macrocognition in systems engineering: supporting changes in the air traffic control tower," published in Naturalistic Decision Making and Macrocognition (Burlington, VT: Ashgate Publishing Company, 2008). Research interests include: cognitive engineering, human computer interface design, human decision making, and human factors in the aviation domain. Mr. Estes is a member of the Cognitive Science Society and the Human Factors and Ergonomics Society.

Craig A. Bonaceto lives in Chelmsford, Massachusetts. He holds a dual BS degree in computer science and psychology (2001) from Rensselaer Polytechnic Institute in Troy, New York. He is currently a Senior Information System Engineer at the MITRE Corporation's Command and Control Center in Bedford, Massachusetts.
Prior publications include the book chapter "A survey of the methods and uses of cognitive engineering," published in Expertise Out of Context (New York, NY: Lawrence Erlbaum Associates, 2007), and the book chapter "Macrocognition in systems engineering: supporting changes in the air traffic control tower," published in Naturalistic Decision Making and Macrocognition (Burlington, VT: Ashgate Publishing Company, 2008). He applies cognitive engineering methods to improve the design of enterprise systems in air traffic control and military command and control. He has also performed research on human "mental models" to improve human-system integration. Mr. Bonaceto is a member of the Cognitive Science Society and the International Council on Systems Engineering (INCOSE).

Kevin Long currently lives in Great Falls, Virginia. He holds a BA in Sociology (2006) from The Catholic University of America in Washington, D.C. and is currently pursuing an MA in Human Factors and Applied Cognition from George Mason University in Fairfax, Virginia. He is currently employed as a Human Factors Engineer at the MITRE Corporation's Center for Advanced Aviation System Development in McLean, Virginia. Research interests include human factors in the aviation domain, specifically situation awareness and workload, and human computer interface design. Mr. Long is a member of the Human Factors and Ergonomics Society (HFES).

Scott H. Mills lives in Arlington, Virginia. He holds a BA degree in psychology (1987), an MS in experimental psychology (1991), and a PhD in experimental psychology (1995) from the University of Oklahoma in Norman, Oklahoma. He is currently a Lead Multi-Discipline Systems Engineer at the MITRE Corporation's Center for Advanced Aviation System Development in McLean, Virginia. Prior to working for MITRE, he was a Senior Human Factors Engineer at SBC Technology Resources in Austin, Texas and an Engineering Psychologist at the FAA Civil Aerospace Medical Institute in Oklahoma City, Oklahoma. Research interests include human factors in aviation and human computer interface design. Dr. Mills is a member of the Human Factors and Ergonomics Society.

Frank M. Sogandares lives in the Northern Virginia suburbs of Washington, DC. He holds a BA in physics (1983) from Hendrix College in Conway, Arkansas and a PhD in physics (1991) from Texas A&M University in College Station, Texas. He is currently a Principal Engineer with the MITRE Corporation's Center for Advanced Aviation System Development in McLean, Virginia. His interests are distributed computation, languages, and the application of these to real-world problems.