Evolutionary Testing of Autonomous Software Agents

Cu D. Nguyen, Anna Perini, and Paolo Tonella
Simon Miles, Mark Harman, and Michael Luck
Fondazione Bruno Kessler
Via Sommarive, 18 38050 Trento, Italy
Department of Computer Science
Kingʼs College London
Strand, London WC2R 2LS, UK
AAMAS, Budapest 2009
Outline
• Introduction
• Approaches
• Experiment and result discussion
• Conclusion
Introduction
Autonomous software agents are increasingly used.
Testing to build confidence in their operation is crucial!
Introduction
Agent autonomy makes testing harder
• Agents make decisions for themselves based on their goals, intentions, and beliefs
• Can behave differently in response to the same input
Autonomous agents operate in an open environment with a high variety of situations
Testing requires:
• adequate output evaluation
• techniques that produce a wide range of contexts and can search for the most demanding test cases
Background
• The purpose of testing is to find faults
• We focus on the agent level
• We evaluate the exhibited performance of autonomous agents, not the underlying autonomy mechanism
Our approach (1)
Use stakeholders' requirements related to quality (e.g., efficiency) to judge autonomous agents.
Represent these requirements as quality functions, used to:
• assess the agents under test
• drive the evolutionary generation
[Figure: goal model in which the Cleaner Agent depends on the stakeholder for the goal "Keep the airport clean" and for the softgoals Efficiency and Robustness; quality functions over time (e.g., keeping the obstacle distance d > ε) are attached to the softgoals]
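A quality function of this kind can be sketched as a predicate over the monitored execution trace. The trace format (a sequence of obstacle distances) and the threshold value are illustrative assumptions, not the paper's actual encoding:

```python
def safety_quality(trace, epsilon):
    """Quality function for a safety softgoal: at every monitored time
    step the agent's distance d to the nearest obstacle must stay above
    the threshold epsilon (trace format is an illustrative assumption)."""
    return all(d > epsilon for d in trace)

# A trace that dips below the threshold violates the softgoal:
print(safety_quality([0.5, 0.3, 0.1], epsilon=0.2))  # prints False
```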
Our approach (2)
Use quality functions in fitness measures to drive the evolutionary generation
• The fitness of a test case tells how good the test case is
• Evolutionary testing searches for test cases with the best fitness values
Fitness example: the distance to a crash, i.e., how close the agent comes to the safety threshold ε (the agent is unsafe when the obstacle distance d over time no longer satisfies d > ε)
Our approach (3)
Use statistical methods to measure test case fitness
• The outputs of a test case can differ from one execution to another
• A test case execution is therefore repeated a number of times (or in parallel)
• Statistical output data are used to calculate the fitness
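A minimal sketch of this idea, assuming a hypothetical run_agent function that executes one test case and returns a measured output (e.g., a distance), and an illustrative repetition count:

```python
import statistics

def measure_fitness_statistics(test_case, run_agent, repetitions=20):
    """Run the same test case repeatedly (the outputs vary because the
    agent is autonomous) and summarize them statistically; run_agent
    and the repetition count are illustrative assumptions."""
    outputs = [run_agent(test_case) for _ in range(repetitions)]
    q1, median, q3 = statistics.quantiles(outputs, n=4)
    return {
        "min": min(outputs),
        "quartile1": q1,
        "median": median,
        "quartile3": q3,
        "max": max(outputs),
    }
```

The returned summary (min, quartiles, max) is exactly the kind of data a box-plot displays, which is what the fitness calculation later builds on.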
Evolutionary procedure
1. Generation & Evolution: produce test inputs, starting from initial test cases (random, or existing)
2. Test execution & Monitoring: run the agent on the inputs and record its outputs
3. Evaluation: compute fitness from the outputs and feed it back to step 1; when the search stops, report the final results
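The loop above can be sketched as follows. The population size, truncation selection, jitter mutation, and the encoding of a test case as a list of numbers are all illustrative assumptions, not the paper's actual configuration:

```python
import random

def evolve(initial_tests, execute, fitness, generations=100, minimize=True):
    """Generic evolutionary-testing loop: execute each test case, score
    it, keep the better half, and refill the population with mutated
    copies (all parameters illustrative)."""
    population = list(initial_tests)
    for _ in range(generations):
        scored = [(fitness(execute(t)), t) for t in population]
        scored.sort(key=lambda pair: pair[0], reverse=not minimize)
        survivors = [t for _, t in scored[: len(scored) // 2]]
        children = [mutate(t) for t in survivors]
        population = survivors + children
    return min(population, key=lambda t: fitness(execute(t)))

def mutate(test_case):
    # Illustrative mutation: jitter each numeric parameter slightly.
    return [x + random.uniform(-0.1, 0.1) for x in test_case]
```

For example, minimizing |x| over one-parameter test cases drives the population toward 0 after a few dozen generations.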
Experiments
Autonomous cleaning agent
‣ explore the locations of important objects
‣ look for waste and bring it to the closest bin
‣ maintain battery charge
‣ avoid obstacles by changing course when necessary
‣ find the shortest path to reach a specific location
‣ stop when no movement is possible or the battery is running out
Fitness measurement
Same input environment, different outputs
The cumulative box-plots of the distances across executions converge
[Plot: "Convergence of cumulative boxplot" — cumulative min, quartile 1 (25%), quartile 3 (75%), and max of the fitness, plotted against the number of executions; the statistics stabilize as more executions are accumulated]
Fitness measurement
[Figure 6.9: cumulative box-plots for two test cases; panel (b) shows the cumulative box-plots (min, quartile 1 (25%), quartile 3 (75%)) for TC2, with the threshold ε marked]
Then, the fitness function is defined as follows:

    f = min(D) + w1 * quartile1(D) + w3 * quartile3(D)   if min(D) > ε
    f = min(D) - ε                                       if min(D) ≤ ε
    f = +∞                                               if the agent cannot move and suspend safely

Search objective: bringing the box down to the threshold ε, i.e., leading the agent to hit obstacles
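The piecewise fitness above can be sketched in Python. Here D is the set of minimal obstacle distances observed over repeated executions of one test case; the weight values w1 and w3 and the safe-suspension flag are illustrative assumptions:

```python
import math
import statistics

def fitness(D, epsilon, w1=0.5, w3=0.5, agent_safe=True):
    """Piecewise safety fitness over the distances D measured in repeated
    executions of one test case (weights and the agent_safe flag are
    illustrative assumptions)."""
    if not agent_safe:
        # The agent cannot move and suspend safely: worst possible fitness.
        return math.inf
    d_min = min(D)
    if d_min <= epsilon:
        # The agent came within the safety threshold: fitness <= 0.
        return d_min - epsilon
    # All executions stayed clear of obstacles: combine min and quartiles.
    q1, _, q3 = statistics.quantiles(D, n=4)
    return d_min + w1 * q1 + w3 * q3

# A test case whose executions all stayed clear of obstacles:
print(fitness([0.08, 0.10, 0.12, 0.09], epsilon=0.02))
```

Since the search minimizes f, test cases whose executions come closest to the threshold ε score best, steering the evolution toward obstacle collisions.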
Result & discussion
[Plot: Safety Fitness (0–0.11) against Generation (10–110) for evolutionary testing]
Result & discussion
[Plots: Safety Fitness against Generation (10–110) for evolutionary testing, and average fitness against Time for random testing, on the same fitness scale (0–0.11)]
• Evolutionary testing found better test cases than random testing
• It is also more effective in detecting faults
Conclusion
• Autonomous agent testing is hard
  • Non-deterministic outputs
  • Variability of the world setting
• Evolutionary testing
  • Uses quality requirements as evaluation criteria
  • Uses them to guide the evolutionary generation of test inputs
  • Is more effective than random testing
  • Is cost-effective, requiring almost no additional cost