Example – ACMREQ Processing
For Internal GILA Kick-Off Meeting, May 30-31, 2006
Martin Hofmann, Ph.D.
Manager, AI Programs
Advanced Technology Laboratories

Starting from the Main Goal – Ace the Evaluation: From the PIP
The metrics that will be used for evaluation are:
• Coverage percentile – how much was it able to correctly learn? For instance, if it is shown 40 steps in a plan, was it able to learn 39 correct steps? Sample instances of the coverage metric include (others might include testing for iteration, conditionals, etc.):
  (1) step coverage percentile: SCP = # steps learned correctly / # steps shown;
  (2) choice coverage percentile: CCP = # choices learned correctly / # choices shown.
• Error rate – how many errors occurred and what was learned? Here we tabulate errors. Sample instances include:
  (1) step error percentile: SEP = # steps learned incorrectly / # steps shown;
  (2) choice error percentile: CEP = # choices learned incorrectly / # choices shown.
• Goal achievement – can the learner achieve the desired goal? If so, how well? Goal achievement cannot be directly determined by the above two metrics. A learner might correctly learn 40 out of 40 steps but then follow those 40 steps with 12 errors in which the goal is undone. There are also degrees of goal achievement, e.g., in a physical domain one can easily imagine putting together an object and having a few parts left over or having the object be functional but wobbly. The same ideas apply in computational domains, e.g., data analysis can be less complete than desired. To measure how well a goal is achieved, we will categorize goal achievement levels and score 0-10.
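The four sample metrics above are simple ratios over counts from one evaluation case. A minimal Python sketch of how they might be tallied follows; the `LearningOutcome` record and its field names are hypothetical, and steps or choices that were never learned at all count toward neither the "correct" nor the "incorrect" numerator.

```python
from dataclasses import dataclass


@dataclass
class LearningOutcome:
    """Counts tallied for one evaluation case (hypothetical field names)."""
    steps_shown: int
    steps_correct: int
    steps_incorrect: int
    choices_shown: int
    choices_correct: int
    choices_incorrect: int


def percentile(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage; 0.0 if nothing was shown."""
    if denominator == 0:
        return 0.0
    return 100.0 * numerator / denominator


def evaluation_metrics(o: LearningOutcome) -> dict[str, float]:
    """SCP, CCP, SEP and CEP exactly as defined in the PIP excerpt."""
    return {
        "SCP": percentile(o.steps_correct, o.steps_shown),
        "CCP": percentile(o.choices_correct, o.choices_shown),
        "SEP": percentile(o.steps_incorrect, o.steps_shown),
        "CEP": percentile(o.choices_incorrect, o.choices_shown),
    }


if __name__ == "__main__":
    # 40 steps shown, 39 learned correctly (the PIP's example); choice counts are made up.
    outcome = LearningOutcome(steps_shown=40, steps_correct=39, steps_incorrect=1,
                              choices_shown=10, choices_correct=8, choices_incorrect=2)
    print(evaluation_metrics(outcome))
```

Because these are pure counts, a learner can reach SCP = 100 and SEP = 0 and still undo the goal afterwards, which is why goal achievement is scored separately on the 0-10 scale.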
GILA needs to learn correct steps in a process

Evaluation Plans and Metrics
Program of System Evaluation and Research Experimentation
System Evaluation: Measure system against human performance
• Evaluations by Lockheed Martin, supported by research partners
• Specific ATO domain problems
• Results used as:
  – Go/no-go criteria
  – Feedback to support definition of research goals
  – Support for transition planning and implementation
Go/no-go testing is done in this military-relevant domain.
Research Experimentation: Conducted using the Wargus strategy application to provide a platform for controlled, intensive experimentation.
• Experiments performed by individual researchers and the integrators, with the results used to:
  – Support ongoing research
  – Serve as regression testing
  – Validate individual technology components
Domain easily shared with other IL teams for cross-project comparisons
Lockheed Martin Proprietary Information

System Evaluation (Metric Type; Specific Metric: Specific Evaluation in ACO Problem Domain)
• Coverage Percentile
  – Step coverage percentile: Steps in ACO development process correctly learned vs. those identified by an expert.
  – Choice coverage percentile: Choices in ACO development process correctly learned vs. those identified by an expert.
• Error Rate
  – Step error rate: Steps in ACO development process incorrectly learned vs. those identified by an expert.
  – Choice error rate: Choices in ACO development process incorrectly learned vs. those identified by an expert.
• Goal Achievement
  – Quality of solution: Quality of ACO produced, measured using both quantitative metrics (degree of deconfliction) and subjective metrics (human expert assessment).
  – Time to generate solution: Time to generate ACO given a set of inputs.
• Generalization
  – Model generality: How much difference between the learned model and a new input case can the model tolerate?
(Slide annotation: "Tom", "That's what we told")
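The table names "degree of deconfliction" as the quantitative quality measure but does not define it. One possible reading, sketched here in Python purely as an assumption, is the fraction of ACM pairs whose 4D volumes (horizontal extent, altitude block, and activation time) do not overlap. Real ACMs are not axis-aligned boxes; `Interval`, `Volume4D`, and `degree_of_deconfliction` are hypothetical simplifications.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def overlaps(self, other: "Interval") -> bool:
        # Intervals that merely touch at an endpoint are not counted as overlapping.
        return self.lo < other.hi and other.lo < self.hi


@dataclass(frozen=True)
class Volume4D:
    """Hypothetical axis-aligned stand-in for an ACM's 4D volume (units assumed)."""
    name: str
    x: Interval      # east-west extent
    y: Interval      # north-south extent
    alt: Interval    # altitude block
    time: Interval   # activation window

    def conflicts_with(self, other: "Volume4D") -> bool:
        # Two volumes conflict only if they overlap on every dimension.
        return all(getattr(self, d).overlaps(getattr(other, d))
                   for d in ("x", "y", "alt", "time"))


def degree_of_deconfliction(acms: list[Volume4D]) -> float:
    """Fraction of ACM pairs that do not conflict (1.0 = fully deconflicted)."""
    pairs = list(combinations(acms, 2))
    if not pairs:
        return 1.0
    clear = sum(1 for a, b in pairs if not a.conflicts_with(b))
    return clear / len(pairs)
```

For example, if two ACMs overlap in space and time and a third shares their airspace but is active at a different time, one of the three pairs conflicts and the score is 2/3. Counting pairs is only one option; weighting by overlap volume would be another.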
Metric Type: Example Metrics
• Model Robustness
  – Degree to which the actual task success measure declines given N errors in input data
  – Increase in task completion time given N errors in input data
  – Degree to which the actual task success measure declines when N bits of data are missing
  – Increase in task completion time when N bits of data are missing
• Task Success (using learned models)
  – Perceived Success (rating by independent observer)
  – Actual Success (objective measure calculated according to TBD formula)
• Efficiency Measures
  – Time to generate a model
  – Time to solve problem based upon guidance from model
  – Time and number of user interventions
  – Amount of ambiguous/incorrect guidance
  – Number/complexity of interactions with user needed to generate a model
  – Number/complexity of interactions with user needed to succeed on a task
• ILR Contribution
  – Percent degradation in task performance when a component is disabled
• Qualitative Measures
  – User satisfaction ratings
  – Ratings of model completeness, accuracy, and level of abstraction
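Several of these measures (the robustness declines and the ILR contribution) reduce to a percent drop in a task success score relative to a baseline run. A minimal sketch, assuming a numeric success score where higher is better; the function names and example component names are hypothetical, and the source leaves the actual success formula TBD.

```python
def percent_degradation(baseline: float, degraded: float) -> float:
    """Percent drop in a task success score relative to a baseline run.

    Positive values mean performance got worse. Assumes higher scores are
    better and that the baseline score is positive.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - degraded) / baseline


def ilr_contributions(full_system: float, ablated: dict[str, float]) -> dict[str, float]:
    """Percent degradation when each ILR component is disabled in turn."""
    return {name: percent_degradation(full_system, score) for name, score in ablated.items()}


def robustness_curve(scores_by_errors: dict[int, float]) -> dict[int, float]:
    """Decline in task success as N injected input errors grows (key 0 is the baseline)."""
    baseline = scores_by_errors[0]
    return {n: percent_degradation(baseline, s) for n, s in sorted(scores_by_errors.items())}


if __name__ == "__main__":
    # Illustrative scores only; "ILR-A" and "ILR-B" are placeholder component names.
    print(ilr_contributions(8.0, {"ILR-A": 6.0, "ILR-B": 7.0}))
    print(robustness_curve({0: 10.0, 1: 9.0, 5: 6.0}))
```

The same function serves both rows because each compares a perturbed run (component disabled, or N errors injected) against an unperturbed one; the "increase in task completion time" rows would use the same ratio with the sign flipped.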
Readable from the back…

Initial Process Description
• In the WEBAD tool, check if the new ACMREQ airspace appears in the “Imported” group (displayed as a list) of airspaces
• In the WEBAD tool, run the conflict checking function (via button push)
• Iterate over the listed conflicts with existing airspaces (airspace control measures, ACMs)
• Select a conflicting ACM to review its detailed description
• If the new requested ACM is an exclusion ACM, compare its priority (derived from usage) to the conflicting ACM(s)
  – If the requested ACM has higher priority, request (via email, chat, or phone call; the POC is listed in the ACMREQ) to replan the lower-priority ACM and any sorties already planned in the ATO that use this ACM
• If the conflicting ACM is only a procedural boundary, then
  – In the WEBAD tool, approve the ACMREQ
• Check the time and space separation of the ACMs and any planned missions
  – Decide if the overlap can be removed simply by shrinking an ACM:
    • In TBMCS, retrieve elements of the ATO that fall into the 4D volume of the conflicting ACM
    • Decide if there is
      – Time/space in the ACM that is not used by any mission
      – Sufficient separation within the missions in each ACM
    • If so, decide which ACM to alter, and specifically, which part to reduce, then notify the requester POC.
  – If the airspace is only “on standby” during the conflict space-time:
    • On the GCCS Common Operational Picture (COP), list any aircraft currently occupying the conflicting ACM space
    • If the “standby airspace” (e.g., a missile engagement zone) becomes active (this is driven by enemy actions, thus not predictable), then change any mission that conflicts (airspaces and missions)
  – If none, approve the ACMREQ
  – Otherwise, contact the ACMREQ POC to revise the request to eliminate the conflict, using chat, email, or phone.
Note: May do this in multiple levels of detail:
  – Start with a “super-box”; if this is OK, don't worry about finer-detail airspace deconfliction.
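The branching in this process description can be sketched as a decision function. This is a hypothetical Python rendering, not an interface to WEBAD, TBMCS, or GCCS: each `Conflict` carries pre-computed flags that an analyst would normally establish with those tools, and the checks run in the order the slide lists them, which is itself an assumption about precedence.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    """Possible outcomes for one conflict (labels paraphrase the process steps)."""
    APPROVE = auto()
    REQUEST_REPLAN_OF_CONFLICTING_ACM = auto()
    SHRINK_ACM_AND_NOTIFY_POC = auto()
    WATCH_STANDBY_AIRSPACE = auto()
    ASK_REQUESTER_TO_REVISE = auto()


@dataclass(frozen=True)
class Conflict:
    """One conflict from the conflict check; all fields are hypothetical inputs."""
    conflicting_acm: str
    requested_is_exclusion: bool = False
    requested_priority: int = 0       # derived from usage; higher = more important (assumed)
    conflicting_priority: int = 0
    is_procedural_boundary: bool = False
    removable_by_shrinking: bool = False   # outcome of the 4D-volume review against the ATO
    conflicting_on_standby: bool = False


def decide(c: Conflict) -> Action:
    # Exclusion ACM with higher priority: ask for the lower-priority ACM to be replanned.
    if c.requested_is_exclusion and c.requested_priority > c.conflicting_priority:
        return Action.REQUEST_REPLAN_OF_CONFLICTING_ACM
    # A conflict with a procedural boundary only does not block approval.
    if c.is_procedural_boundary:
        return Action.APPROVE
    # Unused time/space or sufficient separation: shrink one ACM and notify the POC.
    if c.removable_by_shrinking:
        return Action.SHRINK_ACM_AND_NOTIFY_POC
    # Standby airspace: watch the COP and change missions if it becomes active.
    if c.conflicting_on_standby:
        return Action.WATCH_STANDBY_AIRSPACE
    # Otherwise the requester must revise the ACMREQ.
    return Action.ASK_REQUESTER_TO_REVISE


def process_acmreq(conflicts: list[Conflict]) -> tuple[bool, list[tuple[str, Action]]]:
    """Return (approve_now, per-conflict actions); an empty conflict list means approve."""
    actions = [(c.conflicting_acm, decide(c)) for c in conflicts]
    approve_now = all(action is Action.APPROVE for _, action in actions)
    return approve_now, actions


if __name__ == "__main__":
    approve, plan = process_acmreq([
        Conflict("ACM-1", is_procedural_boundary=True),
        Conflict("ACM-2", conflicting_on_standby=True),
    ])
    print(approve, plan)
```

Writing it this way makes the open questions explicit: each boolean field is something GILA would have to observe or learn, which is the point the next slide raises.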
Could you process an ACMREQ correctly based on this alone?

For example (1):
• If the conflicting ACM is only a procedural boundary, then
  – In the WEBAD tool, approve the ACMREQ
Example 1 – Options
1. We do not specify this condition in the basic process
   1. GILA observes the expert – the expert approves the ACMREQ without further checking
   2. GILA figures out the condition – one of the airspaces is a procedural boundary
2. We specify this condition
   1. GILA observes the expert approving the ACMREQ
   2. GILA learns the kinds of airspace types that are procedural boundaries
Using myself as an example novice:
• I have learned this already
• I could not tell you this right now – I'd have to learn this from the expert!

For example (2):
• Check the time and space separation of the ACMs and any planned missions
  – Decide if the overlap can be removed simply by shrinking an ACM.
    • In TBMCS, retrieve elements of the ATO that fall into the 4D volume of the conflicting ACM
    • Decide if there is
      – Time/space in the ACM that is not used by any mission
      – Sufficient separation within the missions in each ACM
    • If so, decide which ACM to alter, and specifically, which part to reduce, then notify the requester POC.
How does GILA know this?
Maybe learn what sufficient separation is for various missions?
I later learned that the ATO is not available until late afternoon – can't do this step until it is published! Instead, might ask the requester.
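Option 2.2 (learning which airspace types are procedural boundaries) and the margin question about sufficient separation could both be approached by generalizing from observed expert decisions. The sketch below is hypothetical: the `Observation` record, the `min_support` threshold, and the rule of taking the smallest separation the expert accepted are illustrative assumptions, and it presumes such observations exist at all (the slide notes the ATO is not published until late afternoon).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One observed expert decision on a conflict (hypothetical record)."""
    conflicting_airspace_type: str
    approved_without_further_checking: bool


def learn_procedural_boundary_types(observations: list[Observation],
                                    min_support: int = 2) -> set[str]:
    """Airspace types the expert always approved without further checking.

    A type is kept only if it was observed at least `min_support` times and
    every observation of it was an immediate approval (assumed threshold).
    """
    seen = Counter(o.conflicting_airspace_type for o in observations)
    immediate = Counter(o.conflicting_airspace_type for o in observations
                        if o.approved_without_further_checking)
    return {t for t, n in seen.items() if n >= min_support and immediate[t] == n}


def learn_min_separation(accepted: list[tuple[str, float]]) -> dict[str, float]:
    """Smallest separation the expert accepted, per mission type.

    A naive stand-in for "sufficient separation": it only records what the
    expert has tolerated so far, in whatever units the observations use.
    """
    smallest: dict[str, float] = {}
    for mission_type, separation in accepted:
        smallest[mission_type] = min(separation, smallest.get(mission_type, separation))
    return smallest
```

Requiring every observation of a type to be an immediate approval keeps one counterexample from being ignored; a real learner would also have to cope with the expert's own occasional inconsistency.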