ACO Example

Example – ACMREQ Processing
For Internal GILA Kick-Off Meeting
May 30-31, 2006
Martin Hofmann, Ph.D.
Manager, AI Programs
Advanced Technology Laboratories
Starting from the Main Goal – Ace the Evaluation: From the PIP
The metrics that will be used for evaluation are
• Coverage percentile – how much was it able to correctly learn? For
instance, if it is shown 40 steps in a plan, was it able to learn 39 correct
steps? Sample instances of the coverage metric include (others might
include testing for iteration, conditionals, etc.): (1) step coverage percentile:
SCP = # steps learned correctly / # steps shown; (2) choice coverage
percentile: CCP = # choices learned correctly / # choices shown.
• Error rate – how many errors occurred and what was learned? Here we
tabulate errors. Sample instances include (1) step error percentile: SEP = #
steps learned incorrectly / # steps shown; (2) choice error percentile: CEP
= # choices learned incorrectly / # choices shown.
• Goal achievement – can the learner achieve the desired goal? If so, how
well? Goal achievement cannot be directly determined by the above two
metrics. A learner might correctly learn 40 out of 40 steps but then follow
those 40 steps with 12 errors in which the goal is undone. There are also
degrees of goal achievement, e.g., in a physical domain one can easily
imagine putting together an object and having a few parts left over or having
the object be functional but wobbly. The same ideas apply in computational
domains, e.g., data analysis can be less complete than desired. To measure
how well a goal is achieved, we will categorize goal achievement levels and
score them 0-10.
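The four ratios are simple enough to check by hand; here is a minimal sketch in Python, using the 40-step example above (the assumption that the one unlearned step was learned incorrectly is ours, for illustration):

```python
# Sketch: the coverage/error percentiles defined above, for one plan.
def percentile(count: int, shown: int) -> float:
    return count / shown

# 40 steps shown, 39 learned correctly; assume (for illustration) the
# remaining step was learned incorrectly.
SCP = percentile(39, 40)   # step coverage percentile -> 0.975
SEP = percentile(1, 40)    # step error percentile    -> 0.025
print(SCP, SEP)

# CCP and CEP are the same ratios computed over choices instead of steps.
# Note the two ratios cannot capture everything: a learner can reproduce
# all 40 shown steps and still append extra erroneous steps (the "12
# errors" case above), which is why goal achievement is scored separately.
```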
GILA needs to learn correct steps in a process
Evaluation Plans and Metrics
Program of System Evaluation and Research Experimentation
System Evaluation: Measure system against human performance
• Evaluations by Lockheed Martin, supported by research partners
• Specific ATO domain problems
• Results used as:
  — Go/no-go criteria
  — Feedback to support definition of research goals
  — Support for transition planning and implementation
Go/No-Go testing is done in this military-relevant domain
Research Experimentation: Conducted using the Wargus strategy application to provide a platform for controlled, intensive experimentation.
• Experiments performed by individual researchers and the integrators, with the results used to:
  — Support ongoing research
  — Serve as regression tests
  — Validate individual technology components
Domain easily shared with other IL teams for cross-project comparisons
Metric Type / Specific Metrics / Specific Evaluation in ACO Problem Domain

Coverage Percentile
• Step coverage percentile: Steps in ACO development process correctly learned vs. those identified by an expert.
• Choice coverage percentile: Choices in ACO development process correctly learned vs. those identified by an expert.
Error rate
• Step error rate: Steps in ACO development process incorrectly learned vs. those identified by an expert.
• Choice error rate: Choices in ACO development process incorrectly learned vs. those identified by an expert.
Goal Achievement
• Quality of Solution: Quality of ACO produced, measured using both quantitative metrics (degree of deconfliction) and subjective metrics (human expert assessment).
• Time to Generate Solution: Time to generate ACO given a set of inputs.
Generalization
• Model Generality: How much difference between the learned model and a new input case can the model tolerate?

Callout: "That's what we told Tom."
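The "degree of deconfliction" could be operationalized in several ways; one plausible reading, sketched below, is the fraction of ACM pairs in the produced ACO that are free of 4D conflict. Both the formula and the conflicts predicate are assumptions for illustration, not the program's defined metric:

```python
from itertools import combinations

# Sketch: fraction of ACM pairs with no 4D (space + time) conflict.
# conflicts(a, b) is a hypothetical overlap predicate.
def degree_of_deconfliction(acms, conflicts) -> float:
    pairs = list(combinations(acms, 2))
    if not pairs:
        return 1.0
    clashing = sum(1 for a, b in pairs if conflicts(a, b))
    return 1.0 - clashing / len(pairs)   # 1.0 = fully deconflicted
```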
Metric Type / Example Metrics

Model Robustness
• Degree to which actual task success measure declines given N errors in input data
• Increase in task completion time given N errors in input data
• Degree to which actual task success measure declines when N bits of data missing
• Increase in task completion time given N bits of data missing
Task Success (using learned models)
• Perceived Success (rating by independent observer)
• Actual Success (objective measure calculated according to TBD formula)
Efficiency Measures
• Time to generate a model
• Time to solve problem based upon guidance from model
• Time and number of user interventions
• Amount of ambiguous/incorrect guidance
• Number/complexity of interactions with user needed to generate a model
• Number/complexity of interactions with user needed to succeed on a task
ILR Contribution
• Percent degradation in task performance when component disabled
Qualitative Measures
• User satisfaction ratings
• Ratings of model completeness, accuracy, level of abstraction
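A test harness for the robustness rows might sweep N and record the decline in the task-success measure; a minimal sketch, where inject_errors() and task_success() are hypothetical stand-ins for the real fixtures:

```python
# Sketch: how far does the actual task success measure decline as N
# errors are injected? The same loop serves the missing-data rows by
# swapping inject_errors() for a deletion function.
def robustness_curve(data, inject_errors, task_success, max_n=10):
    baseline = task_success(data)
    return {n: baseline - task_success(inject_errors(data, n))
            for n in range(1, max_n + 1)}   # decline at each N
```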
System Evaluation
(Same metric table as above.)
Readable from the back…
Initial Process Description
• In WEBAD tool, check if the new ACMREQ airspace appears in the "Imported" group (displayed as a list) of airspaces
• In WEBAD tool, run the conflict checking function (via button push)
• Iterate over the listed conflicts with existing airspaces (airspace control measures (ACMs))
• Select the conflicting ACM to review its detailed description
• If the new requested ACM is an exclusion ACM, compare its priority (derived from usage) to the conflicting ACM(s)
  – If the requested ACM has higher priority, request (via email, chat, or phone call – the POC is listed in the ACMREQ) to replan the lower-priority ACM and any sorties already planned in the ATO that use this ACM
• If the conflicting ACM is only a procedural boundary, then
  – In WEBAD tool, approve the ACMREQ
• Check the time and space separation of the ACMs and any planned missions
  – Decide if the overlap can be removed, simply by shrinking an ACM:
    • In TBMCS, retrieve elements of the ATO that fall into the 4D volume of the conflicting ACM
    • Decide if there is
      – Time/space in the ACM that is not used by any mission
      – Sufficient separation within the missions in each ACM
    • If so, decide which ACM to alter, and specifically, which part to reduce, then notify the requester POC
  – If the airspace is only "on standby" during the conflict space-time:
    • On the GCCS Common Operational Picture (COP), list any aircraft currently occupying the conflicting ACM space
    • If the "standby airspace" (e.g. a missile engagement zone) becomes active (this is driven by enemy actions, thus not predictable), then change any mission that conflicts (airspaces and missions)
• If no conflicts remain, approve the ACMREQ
• Otherwise, contact the ACMREQ POC to revise the request to eliminate the conflict, using chat, email, or phone
Note: May do this at multiple levels of detail – start with the "super-box"; if this is OK, don't worry about finer-detail airspace deconfliction. (A pseudocode sketch of this flow follows.)
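To make the control flow explicit, the same process is compressed below into pseudocode-style Python. Every helper and adapter method (webad.*, tbmcs.*, gccs.*, notify_poc, overlap_removable) is a hypothetical stand-in for a manual step, not a real WEBAD/TBMCS/GCCS API:

```python
# Pseudocode sketch of the ACMREQ flow above; all helpers are stubs.

def notify_poc(req, message):                 # the POC is listed in the ACMREQ
    print(f"contact {req['poc']} via chat/email/phone: {message}")

def overlap_removable(acm, ato_elements):     # unused time/space, or sufficient
    return False                              # separation? (stubbed out here)

def process_acmreq(req, webad, tbmcs, gccs):
    if req["airspace"] not in webad.imported_airspaces():
        return "wait for import"                    # not yet in "Imported" group
    for acm in webad.run_conflict_check(req["airspace"]):   # button push in WEBAD
        if acm["is_procedural_boundary"]:
            continue                                # not a blocking conflict
        if req["is_exclusion"] and req["priority"] > acm["priority"]:
            notify_poc(req, f"replan {acm['id']} and its planned ATO sorties")
            continue
        ato = tbmcs.ato_elements_in_4d_volume(acm)  # needs the published ATO
        if overlap_removable(acm, ato):
            notify_poc(req, f"shrink an ACM to remove overlap with {acm['id']}")
            continue
        if acm["standby_during_conflict"]:
            gccs.list_occupying_aircraft(acm)       # check the COP
            continue                                # replan only if it goes active
        notify_poc(req, "revise the request to eliminate the conflict")
        return "revision requested"
    webad.approve(req)                              # no blocking conflicts remain
    return "approved"
```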
Could you process an ACMREQ correctly based on this alone?
For example 1:
• If the conflicting ACM is only a procedural boundary, then
  – In WEBAD tool, approve the ACMREQ

Example 1 - Options
1. We do not specify this condition in the basic process
   1. GILA observes the expert – the expert approves the ACMREQ without further checking
   2. GILA figures out the condition – one of the airspaces is a procedural boundary
2. We specify this condition
   1. GILA observes the expert approving the ACMREQ
   2. GILA learns the kinds of airspace types that are procedural boundaries

Callouts (using myself as an example novice):
• "I have learned this already."
• "I could not tell you this right now – I'd have to learn this from the expert!"
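Option 1.2 ("GILA figures out the condition") amounts to inducing a predicate from observed expert behavior. A toy sketch of that induction; the airspace-type names and the one-line rule are invented for illustration:

```python
# Toy sketch: which airspace types does the expert always approve without
# further checking? Those are candidate "procedural boundary" types.
def induce_boundary_types(observations):
    """observations: (airspace_type, approved_without_further_checks) pairs."""
    waved_through = {t for t, ok in observations if ok}
    ever_checked = {t for t, ok in observations if not ok}
    return waved_through - ever_checked

demos = [("FIR_BOUNDARY", True), ("ROZ", False), ("FIR_BOUNDARY", True)]
print(induce_boundary_types(demos))   # {'FIR_BOUNDARY'}
```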
For example 2:
• Check the time and space separation of the ACMs and any planned missions
  – Decide if the overlap can be removed, simply by shrinking an ACM:
    • In TBMCS, retrieve elements of the ATO that fall into the 4D volume of the conflicting ACM
    • Decide if there is
      – Time/space in the ACM that is not used by any mission
      – Sufficient separation within the missions in each ACM
    • If so, decide which ACM to alter, and specifically, which part to reduce, then notify the requester POC.

Callouts:
• "How does GILA know this?"
• "Maybe learn what is sufficient separation for various missions?"
• "I later learned that the ATO is not available until late afternoon – can't do this step until it is published! Instead – might ask the requester."
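The second callout suggests "sufficient separation" is itself a learnable quantity. A minimal sketch, assuming a mission reduces to a time window plus a distance to the ACM volume, with per-mission-type thresholds a learner could estimate from expert decisions (all names and numbers hypothetical):

```python
# Sketch: is a mission "sufficiently separated" from a conflicting ACM?
MIN_SEPARATION_NM = {"CAP": 5.0, "AAR": 10.0}    # invented thresholds

def windows_overlap(a, b):                        # (start, end) time windows
    return a[0] < b[1] and b[0] < a[1]

def sufficiently_separated(kind, window, dist_nm, acm_window):
    if not windows_overlap(window, acm_window):
        return True                               # no temporal overlap, no conflict
    return dist_nm >= MIN_SEPARATION_NM.get(kind, 5.0)

print(sufficiently_separated("CAP", (10, 12), 3.0, (11, 14)))   # False
```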