Intent Inference and Action Prediction using a
Computational Cognitive Model
Ji Hua Brian Ang, Loo Nin Teow, Gee Wah Ng
Cognition and Fusion Laboratory, DSO National Laboratories
20 Science Park Drive, Singapore 118230
{ajihua, tloonin, ngeewah}@dso.org.sg
Abstract - This paper proposes a Computational Cognitive Model (CCM) inspired by biological mirror neurons and the theory of mind reading, for high-level information fusion, in particular intent inference and action prediction. Existing computational prediction models in the literature usually build the other person's mental model from scratch; however, this does not work when there is no initial knowledge about the other person. Instead, this paper uses one's own mental model as the starting point for predicting the other person and performs a perspective change (or mirroring), i.e., putting oneself in the other person's shoes, to infer the other's thoughts and actions. A model analysis using simulated data is carried out, and the results show that by using mirroring principles, convergence to the other person's mental model is faster than with a baseline method (prediction with an equal probability distribution), and prediction accuracy is also higher. In addition, feedback and updating mechanisms in the proposed model help to ensure convergence towards the other person's mental model.
Keywords: high level information fusion, inference,
decision support, mirror neurons, mind reading, Bayesian
networks.
1 Introduction
Neurons in the rostral part of the inferior premotor cortex
region of the macaque monkeys’ brain are found to
discharge when the monkeys do a particular action as well
as when they observe others doing a similar action. These
neurons are termed Mirror Neurons (MN) since they exhibit
mirroring properties [4], and these neurons are proposed to
have action understanding and recognition capabilities and
play a role in intent inference mechanism [12]. The
equivalent of these mirror neurons were also found to exist
in humans through neurophysiological and brain imaging
experiments [12]. In experiments by Iacoboni et al. [6], measured activity of the human inferior frontal cortex differed significantly when the same actions were observed in different contexts, suggesting that these neurons play a role in intention coding in humans.
From a psychological perspective, the ability of humans
to read the mind of others, as proposed by the simulation
theory, is achieved when one adopts others’ perspectives or
when one does a matching of others’ mental states. In other
words, one would put oneself in the other person’s shoes
and try to imagine how the other person thinks and feels.
Mental state processes may include desires, preferences,
goals, beliefs, etc. Mimicking others' mental states requires a significant amount of effort, and predicting well requires the pretended mental states to be sufficiently similar to the genuine mental states [5]. Many
researchers have attributed the psychological theory of mind
reading ability to the matching systems of the biological
mirror neurons. The mirroring properties of the mirror
neurons could be part of, or a precursor to general mind
reading capabilities [3][5].
Inspired by the mirror neurons and the simulation theory
of mind reading, this paper studies how the working
principles of these theories and findings can be applied to
create a computational framework for intent inference and
action prediction. The proposed framework differs from existing work in the literature: current prediction models do not use one's own model as the basis and do not use mirroring principles for prediction (i.e., they do not predict by changing their perspective to that of the other person). The motivation for this work is that when there is no information about the other person to be predicted, the best practice is to use one's own model, which is what humans do. In addition, feedback and updating mechanisms are included in the proposed framework to ensure improved prediction accuracy.
2 Existing Computational Models of Mirror
Neurons and Simulation Theory
Most existing computational models that draw parallels with the mirror neurons are largely for imitation tasks in robotics [11]. The general framework of these models is to
feed a demonstrator’s current state into an observer and
based on the observer’s module, the next likely state is
generated. The difference between the generated next likely
state and the actual demonstrator’s state is used as feedback
for correction. Ito and Tani [7] modelled the mirror neurons
using recurrent neural networks with parametric bias.
During the learning phase, the predicted demonstrator’s
states for imitation are compared with the actual states and
the error is used to update the network weights and the parametric bias values. During the interaction phase, the visually perceived demonstrator's states are fed into the network. The network then predicts the next value, and the resulting errors are used to update the parametric bias values while keeping the weights fixed.

Figure 1: Computational Cognitive Model framework
From the psychological view point of mind reading
simulation theory, Berlin et al. [1] presented an integrated
architecture wherein the robot’s cognitive functionality is
able to understand the environment from the teacher’s
perspective, with emphasis on the concept of constrained
attention, and the robot focuses on the subset of problem
space that is important to the teacher. Buchsbaum et al. [2] proposed that instead of having another set of mechanisms to understand what the other agent is thinking, one can use one's own cognitive mechanism to predict the mental state of the other agent and infer the rationale behind it. Agents are able to use their own motor and action representations to identify the goals and motivations behind the behaviour of other agents. Using the concept of action identification and movement-action tuple correspondence, when an agent sees another agent performing a particular movement, it relates the movement to the most probable movement-action tuple and evaluates the subset of trigger contexts.
A commonly faced problem arises when the same
observed movement can lead to different actions and goals.
Kilner et al. [8] proposed modelling the MNs based on
predictive coding from a Bayesian inference perspective
which allows different action probabilities.
This paper aims to provide alternatives and new insights for computationally modelling a mirror neuron system, and more importantly to apply it to intent inference and action prediction tasks rather than imitation. A computational framework that is generic enough for both cooperative and competitive situations is proposed. A cooperative situation is one in which predicting a team mate's actions is useful for better teaming and rapport, and a competitive situation is one in which predicting the enemy's actions is useful for pre-empting them. The framework uses Bayesian Networks (BN) as the inference mechanism, with added capabilities that provide a way to correct the BN via conditional probability updating, allowing improved inference accuracy and adaptability. This correction mechanism is analogous to adaptation and learning in humans. This intent inference and action prediction framework would be useful as a decision support tool to complement and assist the military decision-making process.
3 Proposed Computational Cognitive Model
for Intent Inference and Action Prediction
A framework for the proposed computational cognitive
model is shown in Figure 1. The design of this framework
aims to show how inference of others' intents and actions can be achieved by applying the simulation theory concept of perspective changing, and how one's own decision-making mechanism can be used to mirror others' intents and actions.
For clarity, the other person to be inferred is consistently
considered as the enemy throughout the paper. However, the
enemy can be equivalently replaced by an ally for
cooperative cases.
The framework contains a Self and an Enemy. Self refers
to the entity making the inference. Within the Self, there
exists an Own model and a mirrored model of the Enemy.
The Own model represents the mechanism for actual Self
decision making and behaviour selection, and may include
beliefs, desires, preferences, etc. The mirrored model is a
simulated model used to represent the Enemy within the
Self, i.e., for inference of the Enemy’s actions. Enemy’s
model is the model used by the Enemy for his decision
making.
The models receive perception inputs from the
environment. Before the inputs are used for inference in the
mirrored model, reference changing or perspective changing
is required. This maps the Enemy’s states as own and vice
versa, i.e., putting oneself in the other person’s shoes.
Initially, the mirrored model of the Enemy is mapped
from the Own model (assuming no prior knowledge or little
information about the Other). Over time, as the actions of the Enemy are observed, updates are made to the mirrored model, and hence the mirrored model should converge towards the Enemy's model given enough updates.
The framework is based on the assumption that both the
Self and the Enemy are rational individuals and thus there
should be some consistency in the actions. In addition, this
computational model does not take into account deceptive
methods employed by the enemy and multiple level
predictions. These two issues are non-trivial and require substantial research beyond the scope of this paper.
4 Implementation of the Framework
As noted in [9], Bayesian Networks (BN) are widely used
as a knowledge-based approach to solve intent inference
problems. BN for intent inference is able to provide a formal
probabilistic semantics capable of capturing knowledge
structures of humans. This facilitates the encoding and interpretation of knowledge in terms of a probability distribution and allows for inference and optimal decision making. Thus, the main inference process of the CCM framework uses a BN implementation.
In the implementation, the system would repeatedly read
in data from the output of an intended application. The
current intended application is for Computer Generated
Forces (CGF), e.g., a first-person shooter game. This
incoming data from the system provides information about
the current contextual factors and all the available states of
the Self and the Enemy. The Self model is duplicated to create the mirrored model; both the Self model and the mirrored model are implemented as Bayesian networks.
The data would then go through reference changing (an
example is given in Table 1 and greater details in Section
4.2), a process where the Enemy’s states are mirrored as
Self states, i.e., analogous to putting oneself in the other’s
shoes. The mirrored data would then be fed as evidence
into the mirrored model BN for instantiation (Figure 2),
with the posterior probabilities providing inference on the
Enemy’s intents and actions. When the actual actions are
viewed, correction of the mirrored model by updating the
Conditional Probability Table (CPT) of the BN would be
carried out. The updated CPT serves as a revised
knowledgebase of the Enemy in Self.
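The prediction step of this pipeline can be sketched without committing to any particular BN library. In the toy fragment below (an illustrative sketch, not the paper's implementation; the feature and action names are hypothetical), the mirrored model is reduced to a single conditional table mapping an evidence tuple to a distribution over the Enemy's actions, and the prediction is the highest-posterior state:

```python
# Toy mirrored model: P(action | weapon, health) as a conditional table.
# Hypothetical states for illustration; a real model would use a full BN.
mirrored_cpt = {
    ("MGun", "high"):   {"attack": 0.7, "defend": 0.2, "retreat": 0.1},
    ("MGun", "low"):    {"attack": 0.3, "defend": 0.3, "retreat": 0.4},
    ("Pistol", "high"): {"attack": 0.4, "defend": 0.4, "retreat": 0.2},
    ("Pistol", "low"):  {"attack": 0.1, "defend": 0.3, "retreat": 0.6},
}

def predict_action(cpt, evidence):
    """Instantiate the mirrored model with evidence and return the
    most probable action (the model's prediction)."""
    posterior = cpt[evidence]
    return max(posterior, key=posterior.get)

print(predict_action(mirrored_cpt, ("Pistol", "low")))  # retreat
```

In the full framework the evidence tuple would first pass through reference changing before instantiation, and the posterior would come from BN inference rather than a direct table lookup.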
4.1 Conditional Probability Updating
After the BN input nodes have been instantiated, the state
with the highest probability in the Intent and Actions CPT is
taken to be the prediction of the CCM. When the actual
action is executed, data through the CGF system
representing the real actions of the Enemy would be used to
update the mirrored model of the Enemy (even when the
prediction made is correct, as a larger sample size allows
better convergence towards the Enemy’s model). The
equation used to update the CPT is

    P_j^N = (N · P_j^(N-1) + X) / (N + 1)

where P_j^N is the updated probability for the j-th state, N is the number of times the event has happened, X = 1 if j is the state in which the actual action happens and X = 0 otherwise. When N is 1, P_j^0 is the initial probability as given by the Self CPT in the BN.
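As a minimal sketch of this update rule (in Python, not from the paper; the example numbers are hypothetical), the updated row remains a valid probability distribution:

```python
def update_cpt_entry(probs, observed_state, n):
    """Update one CPT row after observing the actual action.

    probs          : list of probabilities P_j^(N-1) for each state j
    observed_state : index j of the state in which the action occurred
    n              : number of times this event has happened (N >= 1)
    Returns the updated row P_j^N = (N * P_j^(N-1) + X) / (N + 1).
    """
    return [(n * p + (1 if j == observed_state else 0)) / (n + 1)
            for j, p in enumerate(probs)]

row = [0.5, 0.3, 0.2]          # prior row from the Self CPT (P_j^0)
row = update_cpt_entry(row, observed_state=0, n=1)
# the observed state's probability rises; the row still sums to 1
print(row)                      # [0.75, 0.15, 0.1]
```

Repeated application with a running count N amounts to blending the initial Self probabilities with the observed frequencies, which is what drives the convergence reported in Section 6.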
4.2 Studied Scenario
The simulated scenario studied is the inference of
enemy’s intents and actions in a CGF environment. The
simulated CGF scenario concerns an agent (Self) meeting
the Enemy in a built-up area, and inferring what the
Enemy’s intents and actions are. The Self agent perceives
information such as weapon types, adversary distance,
contextual factors, etc. Using this information, the Self
would want to predict the Enemy’s intents and actions.
Figure 3 shows the BN used for the example. The input
nodes are the Self and the Enemy’s current states, the
contextual information and held knowledge. Table 1 shows
an example of the simulated inputs, the states when
mirroring is used for instantiation. Mirrored contextual
information gives the relationship of the Enemy with respect
to the environment. Some information or states of the Enemy derived from the environment are independent of reference changing, e.g., the absolute distance between the Enemy and the Self is the same from either perspective.
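Reference changing amounts to swapping the Self- and Enemy-referenced feature values while leaving reference-independent features (such as the mutual distance) untouched. A possible sketch, with the feature names following Table 1 and the split into swapped and shared features assumed for illustration:

```python
def mirror_states(self_states, enemy_states, shared=("Distance to enemy",)):
    """Swap Self and Enemy feature values (perspective change),
    keeping reference-independent features as they are."""
    mirrored_self, mirrored_enemy = {}, {}
    for feature in self_states:
        if feature in shared:
            # reference-independent: same value from either perspective
            mirrored_self[feature] = self_states[feature]
            mirrored_enemy[feature] = enemy_states[feature]
        else:
            # perspective change: the Enemy's value becomes "own"
            mirrored_self[feature] = enemy_states[feature]
            mirrored_enemy[feature] = self_states[feature]
    return mirrored_self, mirrored_enemy

self_states  = {"Weapon": "MGun",   "Health level": 75,
                "Distance to enemy": 113.94}
enemy_states = {"Weapon": "Pistol", "Health level": 40,
                "Distance to enemy": 113.94}
m_self, m_enemy = mirror_states(self_states, enemy_states)
print(m_self["Weapon"], m_enemy["Weapon"])  # Pistol MGun
```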
Figure 2: The prediction mechanism
Figure 3: Example Bayesian networks used
Table 1: Switching of the feature values for mirroring

                        Without Mirroring        With Mirroring
Features                Self       Enemy         Self       Enemy
Weapon                  MGun       Pistol        Pistol     MGun
Health level            75         40            40         75
Armor level             66         20            20         66
Distance to exit        170.45     63.08         63.08      170.45
Distance to pillar      151.89     41.91         41.91      151.89
Distance to enemy       113.94     113.94        113.94     113.94

5 Simulation Setup

5.1 Simulation Objectives

The simulations are aimed at investigating:

A. Importance of Mirroring Principles
The Computational Cognitive Model inspired by mirror neurons and mind-reading principles, i.e., the Mirror model with Updating (MU), is compared with an Equal probabilities/naive distribution model with Updating (EU). The EU has equal probabilities for all states in the CPT of the BN and reflects a situation in which no information about the Enemy is available and no mirroring is used for inference.

B. Importance of Updating the Mirror Model
The Mirror model with Updating (MU) is compared with the Mirror model without Updating (MwoU) to show the effect of updating the CPT for convergence towards the Enemy's model.

C. Performance of the Mirror Model in the Short, Mid and Long Term
Simulations of the MU are studied under different numbers of encounters (the number of times the Self meets and makes an inference on the Enemy).

D. Mirror Model Performance under Various Degrees of Similarity between the Self and Enemy's Models (via variation in the BN CPT)
The Self CPT is created by random modifications to the Enemy's CPT. A different percentage of difference is used each time, giving a quantitative gauge of how different the Self CPT is from the Enemy's CPT.

E. Adaptability of the Mirror Model to Changes in the Enemy's Model (via variation in the Enemy's BN CPT)
Changes to the Enemy's BN model are made with different percentages and at different encounter intervals.

For all the different simulation parameters and settings, 10 runs are carried out.

5.2 Evaluation Metrics

The evaluation metrics used are:

A. Sum of Squared Error (SSE)
SSE is used to measure the difference between the BN CPTs. A low error between the probabilities of the models translates to better prediction.

    SSE = sum_{i=1..r} sum_{j=1..c} (d_ij - y_ij)^2

where d_ij is the desired probability, y_ij is the probability given by the prediction model, and r and c are the number of rows and columns, respectively, of the CPT.

B. Prediction Accuracy
Prediction accuracy measures the percentage of encounters between the Self and the Enemy that are correctly classified:

    Accuracy = (n / N) × 100%

where n is the number of encounters correctly predicted and N is the total number of encounters.

6 Results and Discussion

6.1 Sum of Squared Error

For the first investigation, we study the case where the Self model is exactly the same as the Enemy model. Figure 4 shows the results when the Self CPT is the same as the Enemy's CPT. Updating causes the SSE to increase during the initial stage, because any updating of the already accurate model initially creates a bias towards the observed states. The increase in SSE saturates at a certain level and starts to decline thereafter.

The EU curve shows the SSE starting from a high value and decreasing during the initial encounters; this is due to updating of the model. From the figure, the mirror models (MU and MwoU) start with much lower error. After the 1000-encounter mark (Figure 4b), the decrease in error saturates and the EU curve meets the MU SSE curve.

Figure 4: No difference between Self and Enemy's CPTs ((a) short term; (b) long term).

In Figure 5, when the Self CPT and the Enemy's CPT differ by 20%, both the MU and MwoU curves start from a lower error than the EU. At the initial stage, MU tends towards the Enemy's CPT faster than the EU. Looking at the predictions for intents and actions, at about 300 encounters (Figure 5a), the EU SSE curve crosses the MwoU curve, and it converges with the MU curve at around 1000 instances.

Thus, from these observations, it can be seen that by using mirroring principles, convergence towards the Enemy's CPT is faster than relying on naive guesses. Updating of the CPT is important to decrease the error between the models.

Figure 5: 20% difference between Self and Enemy's CPTs ((a) short term; (b) long term).

6.2 Prediction Accuracy

Table 2: Prediction Accuracy (Intents and Actions)

                                  Encounters
Model   CPT % difference   100             500             4000
MU      0                  83.80 ± 4.42    84.92 ± 2.09    85.52 ± 0.54
MU      20                 76.70 ± 4.35    81.72 ± 2.01    85.61 ± 0.64
MwoU    0                  85.80 ± 2.97    87.24 ± 1.20    86.45 ± 0.47
MwoU    20                 75.80 ± 5.27    77.66 ± 2.41    77.26 ± 1.97
EU      -                  40.40 ± 6.36    68.54 ± 1.75    83.27 ± 0.62

Table 2 shows that the prediction accuracy using MU is high when the Self CPT is the same as the Enemy's CPT (CPT % difference is 0). At 100 instances, the accuracy for MU and MwoU is comparable, with MwoU being slightly better (also reflected by the slightly higher SSE of MU compared to MwoU in Section 6.1). At this stage, EU is only able to obtain around 40% accuracy. During the mid-stage, MU maintains its high prediction accuracy, while the EU prediction improves due to updating.

When the Self CPT is different from the Enemy's CPT (by 20%), prediction accuracy drops compared to the previous case; however, the mirror models are still much more accurate than EU. At the mid-stage, the MU results show improvement, but the MwoU accuracy remains stagnant because there is no updating. With continued updating, the MU results improve and reach the level obtained when the Self CPT is the same as the Enemy's CPT. At the end of the 4000 instances, the standard deviation for models with updating is much smaller than at the initial stage.

The results show that prediction of the Enemy's intents and actions using the Computational Cognitive Model is important when information about the Enemy is not available, especially during the initial stage. Updating the probabilities as events are perceived improves the prediction, except when there is no difference between the Self CPT and the Enemy's CPT. Updating is essential to keep the standard deviation of the prediction low, giving a more robust and consistent prediction.
6.3 Changes in Enemy
An Enemy might learn new strategies, change its mentality, etc., and thus the Enemy's intents and actions are subject to change. To simulate this, the CPT of the Enemy is randomly changed by a percentage (Pc) during the encounters. Figure 6 presents the SSE for the simulated scenarios.
From Figure 6a, when there is a change in the Enemy, it
causes the SSE to rise; however, the updating process
manages to correct the error quickly. As the number of
encounters increases, the error converges. When the Enemy
makes changes at a larger interval (Figure 6b), there is also
an increase in error. It then drops thereafter when updating
is done. Hence updating is important for adapting to the
Enemy's model. Figure 6c shows an interesting finding: the SSE of the MwoU saturates towards the end of the encounters, and there are instances where the SSE decreases when the Enemy changes. This could be due to the Enemy's
model changing towards the direction of the mirrored
model. In addition, as the size of the CPT is fixed, there is a
maximum error that can be attained. Thus, when the error
between the CPTs is near the maximum error, it starts to
fluctuate around the level. The graphs (Figure 6c and 6d)
with the Enemy changing at 30% show a larger increase in
the SSE. This is also followed by a sharper decrease in the
error compared to the Enemy changing at a smaller rate.
From Table 3, with a higher percentage of change and a higher rate of change, there is a significant decrease in accuracy for all the models. When the Enemy does not change drastically, the models are still able to attain a reasonable accuracy of around 77%. The MU consistently achieves better results than MwoU. A larger difference in the results is observed at a higher rate of change. Therefore,
updating is important when the Enemy is subject to changes.
The standard deviations for all results are small, showing
consistency of the models.
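The role of updating in tracking the Enemy can be illustrated with a small self-contained simulation (illustrative only; the "CPT" here is a single distribution rather than a full BN, and the Enemy's distributions are made up): the mirrored distribution starts as the Self's copy, and repeated application of the Section 4.1 update rule on observed encounters drives the SSE down.

```python
import random

def update(probs, observed, n):
    """Section 4.1 update: P_j^N = (N * P_j^(N-1) + X) / (N + 1)."""
    return [(n * p + (1 if j == observed else 0)) / (n + 1)
            for j, p in enumerate(probs)]

def sse(p, q):
    """Sum of squared error between two distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

random.seed(0)
enemy = [0.1, 0.2, 0.7]        # the Enemy's true action distribution
mirrored = [0.6, 0.3, 0.1]     # mirrored model, initially the Self's copy
initial_error = sse(mirrored, enemy)

for n in range(1, 2001):       # 2000 observed encounters
    observed = random.choices(range(3), weights=enemy)[0]
    mirrored = update(mirrored, observed, n)

# With enough encounters the mirrored model tracks the Enemy's distribution.
print(sse(mirrored, enemy) < initial_error)   # True
```

In the same spirit as Figure 6, a mid-run change to `enemy` would make the error rise and then decay again as further updates accumulate.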
Figure 6: 20% difference between Self and Enemy's CPT ((a) 10% change at 200-encounter interval; (b) 10% change at 1000-encounter interval; (c) 30% change at 200-encounter interval; (d) 30% change at 1000-encounter interval).

Table 3: Prediction Accuracy with Changes in the Other

                    Encounters Interval
Model   Change %    200             1000
MU      10          52.84 ± 1.12    77.13 ± 0.77
MU      30          39.04 ± 0.94    63.49 ± 1.36
MwoU    10          44.98 ± 1.45    70.19 ± 2.07
MwoU    30          32.68 ± 1.23    56.78 ± 2.03
EU      10          51.94 ± 1.86    74.50 ± 1.52
EU      30          37.84 ± 0.53    61.89 ± 1.99

7 Conclusions

A Computational Cognitive Model (CCM) inspired by the biological mirror neurons and the simulation theory of mind reading has been proposed and implemented using Bayesian networks. Using mirroring principles to infer others' intents and actions is crucial for giving high prediction accuracy, especially in the early phase when no information about the other is known. The proposed method is able to converge to the Enemy's model via model updating. When the Enemy's model is subjected to changes, the updating process is critical in lowering the error rate.

Current work in progress, not presented in this paper, has incorporated the CCM into a Cognitive Architecture (CA) [10], in which it resides as part of the executive function module. Applications under research include using the CA with the CCM in Unreal Tournament (a first-person shooter game) and Map Aware Non-uniform Automata (MANA), both for inference of the enemy's intents and actions, at the tactical and strategic levels, respectively.

References

[1] Berlin, M., Gray, J., Thomaz, A. L., & Breazeal, C. (2006). Perspective taking: An organizing principle for learning in human-robot interaction. Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), pp. 1444-1450.

[2] Buchsbaum, D., Blumberg, B., Breazeal, C., & Meltzoff, A. N. (2005). A simulation-theory inspired social learning system for interactive characters. Proceedings of the IEEE International Workshop on Robots and Human Interactive Communication, pp. 85-90.

[3] Gallese, V. (2007). Embodied simulation: from mirror neuron systems to interpersonal relations. Proceedings of the Novartis Foundation Symposium, 278, pp. 3-19.

[4] Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, pp. 593-609.

[5] Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, vol. 2, no. 12, pp. 493-501.

[6] Iacoboni, M., Molnar-Szakacs, I., Gallese, V., Buccino, G., Mazziotta, J. C., & Rizzolatti, G. (2005). Grasping the intentions of others with one's own mirror neuron system. PLoS Biology, vol. 3, issue 3, pp. 529-535.

[7] Ito, M., & Tani, J. (2004). On-line imitative interaction with a humanoid robot using a mirror neuron model. Proceedings of the IEEE International Conference on Robotics and Automation, LA, pp. 1071-1076.

[8] Kilner, J. M., Friston, K. J., & Frith, C. D. (2007). The mirror-neuron system: a Bayesian perspective. NeuroReport, vol. 18, no. 6, pp. 619-623.

[9] Ng, G. W., Ng, K. H., Tan, K. H., & Goh, C. H. K. (2006). The ultimate challenge of commander's decision aids: The cognitive based dynamic reasoning machine. Proceedings of the Twenty-Fifth Army Science Conference, Orlando, Florida.

[10] Ng, G. W., Tan, Y. S., Teow, L. N., Ng, K. H., Tan, K. H., & Chan, R. Z. (2010). A Cognitive Architecture for Knowledge Exploitation. Proceedings of the Third Conference on Artificial General Intelligence, pp. 103-108.

[11] Oztop, E., Kawato, M., & Arbib, M. (2006). Mirror neurons and imitation: A computationally guided review. Neural Networks, vol. 19, issue 3, pp. 254-271.

[12] Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, pp. 169-192.