Automatic Identification of Human Strategies by Cognitive Agents

Felix Steffenhagen, Rebecca Albrecht, and Marco Ragni
Center of Cognitive Science
University of Freiburg, Germany
{steffenhagen, ragni}@cognition.uni-freiburg.de
[email protected]
Abstract. Most cognitive modeling approaches so far have concentrated on
modeling and predicting the actions of an “average user” – a user profile
that often does not exist in reality. User performance depends strongly on
psychological factors that differ between users, such as working memory
capacity, planning depth, and search strategy. We therefore propose a combination
of several AI methods to automatically identify user profiles. The
proposed method assigns to each user a set of cognitive agents which are
controlled by several psychological factors. Finally, this method is evaluated
in a case study on preliminary user data for the PSPACE-complete
planning problem Rush Hour.
Keywords: Identification of Heuristics; Cognitive Modeling; Spatial Planning; Data Analysis in Planning Domains; Strategy Analysis
1 Introduction
Identifying common move and strategy patterns of agents in planning problems
can be difficult, especially if the state space is large. If human agents are also
considered, these patterns (or planning profiles) additionally depend on psychological
factors that are known to restrict the performance of human agents, e.g., working
memory capacity [5]. In this work, we combine different methods to identify such
cognitive planning patterns and evaluate the resulting method on preliminary results
from a case study.
The proposed method consists of three steps. Firstly, for a given planning
domain, so-called strategy graphs for single problem instances are introduced.
A strategy graph represents all strategies used by a group of agents. We define
a strategy of an agent in a problem instance as a path from a given start
state to one of possibly many given goal states. Secondly, a group of artificial agents
programmed to comply with restricting psychological factors (controlled by
parameters) is introduced. These factors include, for example, working memory
capacity and planning depth. Thirdly, a set of best replicating artificial agents
is assigned to each human agent. We define the notion of a best replicating agent
based on the maximal path similarity between a human agent and a set of artificial
agents. The parameter values which control the best replicating
agents’ planning behavior are identified as the planning profile of the assigned
human agent.
The presented method is evaluated in a case study in the Rush Hour planning
domain. We introduce a group of human agents and a group of artificial agents,
both of which solved selected Rush Hour instances. In order to identify the best
replicating artificial agents for each human agent, we use the Smith-Waterman
algorithm [6] as a similarity measure. The quality of the presented approach is
evaluated based on the mean similarities of human agents with their best
replicating artificial agents.
2 Methodology
Strategy Graphs. In this paragraph we describe the data structure used to represent
all strategies used by a group of agents A in a problem instance p of a
planning domain D. We define a strategy s for a problem instance p as a path
from a start state to a goal state in the problem space induced by the planning
domain D.
A strategy graph with respect to a problem instance p is a directed, labelled
multigraph G_p = (V_p, E_p, S_p, G_p). The set of vertices V_p represents all states
traversed by any agent in A in the solution of problem p. The set of edges
E_p ⊂ V_p × V_p × ℕ × ℕ represents the application of legal actions in the planning
domain D, including information about an agent’s id ∈ ℕ and a step number
t ∈ ℕ in the solution process. Additionally, a strategy graph includes a set of initial
states S_p ⊂ V_p and a set of goal states G_p ⊂ V_p. Note that the strategy graph
may include multiple edges between two states, each labelled with an agent id
and a step number.
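The multigraph above can be sketched as a small data structure. The following is an illustrative sketch under our own naming (StrategyGraph, add_move), not the authors' implementation:

```python
class StrategyGraph:
    """Directed labelled multigraph for one problem instance p:
    vertices V_p, parallel edges E_p of the form (s, s2, agent_id, step),
    initial states S_p and goal states G_p."""

    def __init__(self, initial_states, goal_states):
        self.initial_states = set(initial_states)   # S_p
        self.goal_states = set(goal_states)         # G_p
        self.vertices = set(initial_states) | set(goal_states)  # V_p
        self.edges = []                             # E_p, parallel edges allowed

    def add_move(self, s, s2, agent_id, step):
        """Record that agent agent_id moved from state s to s2 at step `step`."""
        self.vertices.update((s, s2))
        self.edges.append((s, s2, agent_id, step))
```

Recording one edge per observed move keeps the agent id and step number, so several agents traversing the same state pair produce parallel edges, exactly as the multigraph definition requires.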
As we evaluate human agents’ strategies, we have to account for several factors
which do not contribute to identifying human planning profiles. Therefore,
we propose several graph reduction mechanisms. Firstly, we automatically identify
and remove cycles up to a certain length. This is important to exclude moves
which are immediately retracted. We define a cycle in the strategy graph as a
path s → s′ → … → s″ → s, where for the edges (s, s′, id, t) and (s″, s, id, t′) it
holds that t < t′.
Secondly, we remove outlier strategies for each problem instance p with respect
to a threshold value τ_p. An outlier is a strategy which is used by only a
very small number of agents. In order to detect these outlier strategies we use a
significance measure Sig_p : V_p × V_p → ℕ, which for two given states returns the
number of edges between these states. With respect to the threshold τ_p, we remove
all edges (s, s′, id, t) where Sig_p(s, s′) < τ_p.
Thirdly, we merge partial strategies which we assume to be equivalent. The
basic idea is to merge strategies which correspond to subsequent moves where
it does not matter in which sequence they are played (cf. move transpositions
in chess). The term strategy equivalence is defined with respect to a parameter ε
(also called epsilon equivalence). We call two strategies s1_0 → s1_1 → … → s1_n
and s2_0 → s2_1 → … → s2_m ε-equivalent iff n ≤ m, s1_0 = s2_0, s1_n = s2_m,
s1_n, s2_m ∉ {s1_1, …, s1_(n−1), s2_1, …, s2_(m−1)}, and m − n ≤ ε. For every detected
ε-equivalence we exclude all involved edges and include a new edge given by the
start and the goal state, e.g., (s1_0, s2_m, id, t).
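The first two reduction steps can be sketched on the edge-list representation. This is a minimal illustration under our own naming, restricted to cycles of length two (immediately retracted moves); the ε-merging step is omitted:

```python
def remove_two_cycles(edges):
    """Drop edge pairs (s, s2, id, t) and (s2, s, id, t+1): moves an agent
    immediately retracted, i.e. cycles of length two."""
    dropped = set()
    for i, (s, s2, aid, t) in enumerate(edges):
        for j, (u, v, aid2, t2) in enumerate(edges):
            if aid2 == aid and (u, v) == (s2, s) and t2 == t + 1:
                dropped.update((i, j))
    return [e for k, e in enumerate(edges) if k not in dropped]


def remove_outliers(edges, tau):
    """Drop all edges (s, s2, id, t) with Sig_p(s, s2) < tau, where Sig_p
    counts the parallel edges between the two states."""
    sig = {}
    for (s, s2, _, _) in edges:
        sig[(s, s2)] = sig.get((s, s2), 0) + 1
    return [e for e in edges if sig[(e[0], e[1])] >= tau]
```

Both functions are pure filters over the edge list, so they can be applied per problem instance and composed in either order.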
Agent Similarity. With this method we will be able to identify the best replicating
artificial agents for a human agent. As a first step, we introduce this
notion generally for two arbitrary disjoint sets of agents A1 and A2. We denote
the similarity between two agents a1 and a2 for one problem instance p as
Sim_p(a1, a2) ∈ ℝ, and the similarity of two agents a1 and a2 over all instances
in the problem domain D as Sim_D(a1, a2) = (Σ_{p∈D} Sim_p(a1, a2)) / |D|.
Furthermore, we denote the set of similarity values for one agent a1 ∈ A1 and
a group of agents A2 as Sim(a1, A2) = {Sim(a1, a2) | a2 ∈ A2}, where Sim is
either Sim_p or Sim_D.
– For each agent a1 ∈ A1 we identify the best replicating agents for agent a1
in one problem instance p ∈ D as
bra_p(a1) = {a2 ∈ A2 | Sim_p(a1, a2) = max(Sim_p(a1, A2))}.
– For each agent a1 ∈ A1 we identify the best replicating agents for agent a1
over all problem instances p ∈ D as
bra_D(a1) = {a2 ∈ A2 | Sim_D(a1, a2) = max(Sim_D(a1, A2))}.
We denote the set of best replicating agents from group A2 for all agents in
group A1 for one problem instance p as bra_p(A1) = ∪_{a1∈A1} bra_p(a1), and over
all problem instances in D as bra_D(A1) = ∪_{a1∈A1} bra_D(a1), respectively.
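Under the definitions above, the best replicating agents can be computed directly. The following is a sketch with hypothetical names; sim stands for either Sim_p (for a fixed p) or Sim_D, passed in as a function:

```python
def sim_D(sim_p, a1, a2, problems):
    """Sim_D(a1, a2): mean of Sim_p(a1, a2) over all instances p in D."""
    return sum(sim_p(p, a1, a2) for p in problems) / len(problems)


def bra(sim, a1, agents2):
    """bra(a1): agents from agents2 reaching the maximal similarity to a1."""
    best = max(sim(a1, a2) for a2 in agents2)
    return {a2 for a2 in agents2 if sim(a1, a2) == best}


def bra_group(sim, agents1, agents2):
    """bra(A1): union of bra(a1) over all a1 in A1."""
    result = set()
    for a1 in agents1:
        result |= bra(sim, a1, agents2)
    return result
```

Note that bra returns a set: several artificial agents may tie on the maximal similarity, which is why the definitions speak of sets of best replicating agents.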
3 Case Study
Planning Domain. As planning domain D we consider the planning problem
Rush Hour, developed by Nob Yoshigahara¹. Rush Hour is a two-dimensional
puzzle game in which a specified object (the exit car) has to be moved out of the
grid. The (generalized) computational complexity of this game is known to be
PSPACE-complete [2]. Other characteristics are that it is well-defined, solvable,
decomposable, not dynamic, and has only one goal (to free the red car) to be
reached [3].
Agent Groups. In the following, we consider two groups of agents, one consisting
of human agents A_H and one consisting of artificial agents A_A.
Both groups were tested on the same problems selected from the “Junior
Edition” problem set of the Rush Hour game². The problem selection was based
on different problem attributes: (1) the existing classification of the tasks (beginner,
intermediate, and advanced), (2) the optimal solution length, and (3) the number of
moves of the exit car. This selection comprises 22 problem instances.

¹ A description of Rush Hour can be found at http://www.thinkfun.com/instructions
² https://portal.uni-freiburg.de/cognition/alte-seite/research/projects/cspace/rushhour
The human group A_H, consisting of 20 participants (or agents), was tested in
a psychological experiment. The experiment was conducted using a computer-based
version of Rush Hour³ that recorded selected actions and response times
during the solution process. The problems were presented in randomized order.
Human agents had 3 minutes to solve each trial.
The group of artificial agents A_A is programmed to use Means-End-Analysis
[4] tailored to the Rush Hour planning domain. We identified seven
parameters that control an agent’s local planning behavior based on psychological
factors; the values of these parameters characterize an artificial agent’s planning
profile. Most importantly, these parameters include the move distance for game
objects, the goal stack capacity (corresponding to human working memory
capacity), and several parameters controlling the greedy selection of subgoals.
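A planning profile can then be represented as a plain parameter record. The sketch below lists only the parameters named in the text; the paper uses seven in total, and the field names (including the greedy_subgoal_bias stand-in) are our own assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanningProfile:
    """Parameter values controlling an artificial agent's local planning.
    Only the parameters named in the text are listed; the remaining
    greedy sub-goal parameters are summarized by a hypothetical field."""
    move_distance: int          # maximal move distance for game objects
    goal_stack_capacity: int    # corresponds to human working memory capacity
    greedy_subgoal_bias: float  # hypothetical stand-in for the greedy sub-goal parameters
```

Making the record immutable (frozen) lets profiles serve as dictionary keys, e.g. for grouping human agents by their assigned profile.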
Rush Hour Strategy Graphs. In order to construct Rush Hour strategy graphs,
we use the graph reduction mechanisms described in Section 2. Cycle detection
is restricted to cycles of length two. For human agents, this corresponds to moves
which are immediately retracted. For outlier detection, the thresholds τ_p were
determined based on a statistical evaluation of strategy frequencies for each problem
instance p. We merge equivalent strategies with respect to the introduced
ε-measure with ε = 2. The choice of these parameters is highly specific to Rush
Hour and to the used problem instances in particular. These decisions were made
based on a detailed analysis of the planning domain and problem instances.
Similarity Measure. In Section 2 we defined the notion of best replicating agents
based on a similarity measure for two agents a1, a2 in one problem instance p,
denoted by Sim_p(a1, a2) ∈ ℝ. In this paragraph, we define this value. For measuring
the similarity between the two agent groups we use the Smith-Waterman (SW)
algorithm for local sequence alignment used in bioinformatics
[6]. In this approach we compare sequences of states to find the optimal local
alignment of the two sequences, i.e., the longest state sequence occurring in both
strategies. The SW algorithm computes a scoring matrix H based on weights
for sequence matches (w_m), insertions (w_i) and deletions (w_d). In this approach,
we use the weights w_m = 1, w_i = −1, and w_d = −1, as we do not have insertions
and deletions. The maximum value of H is the similarity score of the local
alignment with the highest similarity. With respect to the chosen weights, this
score reflects the length of the longest local alignment. We define the similarity
of two agents a1 and a2 for problem instance p as
Sim_p(a1, a2) = SW(a1, a2) / max(|a1|, |a2|), where SW(a1, a2) denotes the
maximum value of H and |a| the length of agent a’s state sequence.
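The normalized SW similarity can be sketched as follows. This is a minimal implementation assuming states compare by equality; the mismatch weight wmis is our own assumption, since the text only fixes w_m, w_i, and w_d:

```python
def smith_waterman(seq1, seq2, wm=1, wi=-1, wd=-1, wmis=-1):
    """Maximum entry of the SW scoring matrix H for two state sequences.
    wm, wi, wd follow the text; the mismatch weight wmis is an assumption."""
    n, m = len(seq1), len(seq2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = wm if seq1[i - 1] == seq2[j - 1] else wmis
            H[i][j] = max(0,
                          H[i - 1][j - 1] + match,  # match/mismatch
                          H[i - 1][j] + wd,         # deletion
                          H[i][j - 1] + wi)         # insertion
            best = max(best, H[i][j])
    return best


def sim_p(seq1, seq2):
    """Sim_p: longest local alignment normalized by the longer strategy."""
    return smith_waterman(seq1, seq2) / max(len(seq1), len(seq2))
```

With all gap and mismatch weights at −1, the best-scoring local alignment is a gap-free common run of states, so the score equals the length of the longest such run and sim_p is 1.0 exactly for identical strategies.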
Results. We evaluated the average similarity for each of the 22 tested problem
instances, both for the best replicating artificial agents for human agents over
all problem instances (bra_D(A_H)) and for the best replicating artificial agents with
respect to single problem instances (bra_p(A_H)). The best replicating agents in
bra_D(A_H) are assigned based on the maximum mean similarity of human and
artificial agents over all 22 tasks. Therefore, the best replicating agents in
this group correspond to planning profiles for a human agent which are constant
over all problem instances. The mean similarity over all best replicating agents in
bra_D(A_H) and all problem instances is 44%.

The best replicating agents in bra_p(A_H) are assigned based on the maximum
mean similarity of human and artificial agents for each task separately.
Therefore, the best replicating agents in this group correspond to a different
planning profile for a human agent in every problem instance. The mean similarity
over all best replicating agents in bra_p(A_H) and all problem instances is 76%.
Figure 1 shows the average similarities for each problem instance for artificial
agents in bra_p(A_H).

³ By courtesy of the Dept. of Theoretical Psychology, University of Heidelberg.
Fig. 1. Mean similarities of all human agents A_H and best replicating artificial agents
in bra_p(A_H) for each of the 22 tested Rush Hour problem instances. The numbers
are the original problem numbers in the Junior Edition set.
4 Discussion
In this work, we present a method to automatically identify psychological planning
profiles of human agents. A planning profile is given by a set of parameters
used to control the planning behavior of artificial agents. The planning profile of
a human agent is identified as the parameter values of the artificial agents
which best replicate the human planning strategies.
We report preliminary results of a case study based on this method in the
Rush Hour planning domain. The results show that best replicating agents which
are assigned to human agents constantly over all problem instances can be
identified as planning profiles for only half of the human agents. Best replicating
agents assigned to human agents for each problem instance separately can be
identified as planning profiles for three quarters of the agents. These results
indicate that human planning profiles not only differ between users but also
between tasks for the same user.
Another possible explanation for the results, especially for the constantly
assigned planning profiles, is the conservative similarity measure used.
This measure only considers the longest local alignment of states in two
different strategies. If two strategies deviate in only one state, the shorter
matching sequences are not considered.
Furthermore, it is possible that the parameters used to control the local planning
of artificial agents do not ideally capture the psychological factors necessary
to identify planning profiles of human agents over a set of problem instances.
The parameters used to control planning in artificial agents only consider
planning aspects of the domain. Other factors, for example the impact of
visually observable task characteristics [1], are not considered.
Further extensions of the presented method include, for example, the automatic
identification of branching points, i.e., states where agents choose different
successor states, to further classify strategies. This can be used to
identify and analyze preferred user strategies. Another possible extension is a
measure describing deviations from optimal strategies as a
measure of success.
To conclude, we believe that the presented preliminary methods are a first
step towards automatically analyzing large and heterogeneous data sets generated by
human planners. In particular, the strategy graphs are useful as an abstraction of
large data sets. To analyze this method further, other similarity measures,
for example based on all local sequence alignments or on the number
of deviating states, and additional psychological factors should be considered.
References
1. Rebecca Albrecht and Marco Ragni. An ACT-R based analysis of the Tower of
London Task. In Accepted for Spatial Cognition Conference 2014, 2014.
2. Gary William Flake and Eric B. Baum. Rush hour is PSPACE-complete, or ”why
you should generously tip parking lot attendants”. Theoretical Computer Science,
270:895–911, 2002.
3. Malte Helmert. Understanding Planning Tasks: Domain Complexity and Heuristic
Decomposition, volume 4929 of Lecture Notes in Computer Science. Springer, Berlin
Heidelberg, 2008.
4. Allen Newell and Herbert Alexander Simon. Computer simulation of human thinking. Rand Corporation, 1961.
5. Adrian M. Owen, John J. Downes, Barbara J. Sahakian, Charles E. Polkey, and
Trevor W. Robbins. Planning and spatial working memory following frontal lobe
lesions in man. Neuropsychologia, 28(10):1021–1034, 1990.
6. Temple F. Smith and Michael S. Waterman. Identification of common molecular
subsequences. Journal of Molecular Biology, 147:195–197, 1981.