INTEGRATED HUMAN DECISION BEHAVIOR MODELING UNDER
AN EXTENDED BELIEF-DESIRE-INTENTION FRAMEWORK
by
Seung Ho Lee
A Dissertation Submitted to the Faculty of the
DEPARTMENT OF SYSTEMS AND INDUSTRIAL ENGINEERING
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2009
THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE
As members of the Dissertation Committee, we certify that we have read the dissertation
prepared by Seung Ho Lee
entitled Integrated Human Decision Behavior Modeling under an Extended Belief-Desire-Intention Framework
and recommend that it be accepted as fulfilling the dissertation requirement for the
Degree of Doctor of Philosophy
_______________________________________________________________________
Date: June 12th 2009
Young-Jun Son
_______________________________________________________________________
Date: June 12th 2009
Terry A. Bahill
_______________________________________________________________________
Date: June 12th 2009
Ferenc Szidarovszky
_______________________________________________________________________
Date: June 12th 2009
Daniel Zeng
Final approval and acceptance of this dissertation is contingent upon the candidate’s
submission of the final copies of the dissertation to the Graduate College.
I hereby certify that I have read this dissertation prepared under my direction and
recommend that it be accepted as fulfilling the dissertation requirement.
________________________________________________ Date: June 12th 2009
Dissertation Director: Young-Jun Son
STATEMENT BY AUTHOR
This dissertation has been submitted in partial fulfillment of requirements for an
advanced degree at The University of Arizona and is deposited in the University Library
to be made available to borrowers under rules of the Library.
Brief quotations from this dissertation are allowable without special permission, provided
that accurate acknowledgment of source is made. Requests for permission for extended
quotation from or reproduction of this manuscript in whole or in part may be granted by
the head of the major department or the Dean of the Graduate College when in his or her
judgment the proposed use of the material is in the interests of scholarship. In all other
instances, however, permission must be obtained from the author.
SIGNED: Seung Ho Lee
ACKNOWLEDGEMENTS
I would like to express my gratitude to all my family, especially my lovely daughter Jiann, who has shown me the joy of parenthood. She is the most precious gift I have ever received. I also extend my thanks to my wife Soohyun, who supported and trusted me during this long period of study.
I express my sincere thanks to those who made my foray into the world of
graduate studies possible. I am grateful to Drs. Young-Jun Son, Terry A. Bahill, Ferenc
Szidarovszky, and Daniel Zeng for serving on the committee. I would especially like to thank my great advisor, Dr. Young-Jun Son, for his guidance, advice, and encouragement during this research. The knowledge and passion he has shared with me go beyond what I could have gained from any other source.
I extend my thanks to all former and current members of the CIM lab - Xiaobing Zhao, Karthik Vasudevan, Nurcin Celik, and Esfand Mazhari - for their untiring enthusiasm and patient help in the course of all my research work. I am glad to have had them as my colleagues.
Finally, I am forever indebted to my parents, Jongpil Lee and Changboon Choi.
They bore me, raised me, taught me, supported me and loved me. I know I can never
reciprocate their love in full. But, I hope I have done them proud. To them I dedicate
this dissertation.
DEDICATION
“Lord Jesus Christ, take all my freedom, my memory, my understanding, and my will. All
that I have and cherish you have given me. I surrender it all to be guided by your will.
Your grace and your love and wealth are enough for me. Give me these, Lord Jesus, and
I ask for nothing more. Amen.”
TABLE OF CONTENTS
LIST OF ILLUSTRATIONS ......................................................................................................10
LIST OF TABLES ..................................................................................................................14
ABSTRACT ..........................................................................................................................16
CHAPTER 1
INTRODUCTION .............................................................................................18
1.1 Problem Statement and Objectives ......................................................................20
1.2 Background and Motivation ................................................................................21
1.2.1 Background on Agent-based Modeling .....................................................21
1.2.2 Background on Machine Learning.............................................................22
1.2.3 Motivation ..................................................................................................24
1.3 Justification of Selected Methods and Techniques ............................................24
1.4 Organization of the Remainder of the Dissertation .............................................29
CHAPTER 2
LITERATURE REVIEW AND BACKGROUND ....................................................31
2.1 Intelligent Agent ...................................................................................................31
2.1.1 Human Decision Making Models ..............................................................32
2.1.2 Machine Learning ......................................................................................37
2.1.3 Agent-based Simulation .............................................................................39
2.1.3.1 Benefits of Agent-based Simulation ..................................................41
2.1.3.2 Drawbacks of Agent-based Simulation .............................................42
2.2 Emergency Evacuation ........................................................................................43
CHAPTER 3
SCENARIO, PROPOSED ARCHITECTURE AND METHODOLOGY .......................46
3.1 Overview of Emergency Evacuation Scenario .....................................................46
3.2 Extended BDI Framework ...................................................................................47
CHAPTER 4
PROPOSED TECHNIQUES FOR SUBMODULES IN EXTENDED BDI FRAMEWORK ..............51
4.1 Overview of Simulation Development Workflow ................................................51
4.2 Extended Decision Field Theory .........................................................................52
4.2.1 Decision Field Theory................................................................................52
4.2.2 Bayesian Belief Network-based Decision Field Theory Extension ...........55
4.2.2.1 DFT Extension for Dynamic Changes of Evaluation on Options .....55
4.2.2.2 DFT Extension for Dynamic Changes of Attention Weights ............57
4.2.2.3 Bayesian Belief Network-based Extension .......................................57
4.2.3 Significance of the Proposed Extensions ...................................................60
4.2.3.1 Effect of Change in Value Matrix (M(t)) ..........................................61
4.2.3.2 Effect of Change in Weight Vector (W(t)) .......................................64
4.2.3.3 Combined Effect of Changes in M(t) and W(t) ................................65
4.2.4 Four Theorems Regarding Expected Preference Values ...........................66
4.2.5 Validation via Human-in-the-loop Experiment .........................................75
4.2.5.1 Human-in-the-loop Experiment Details .............................................76
4.2.5.2 Experimental Results and Analyses ...................................................78
4.2.5.3 Validation and Comparison of EDFT with DFT and Human
Decisions ...............................................................................................80
4.3 Bayesian Belief Network (BBN) .........................................................................87
4.4 Probabilistic Depth First Search (PDFS) for Real-time Planner .........................90
4.5 Confidence Index.................................................................................................92
CHAPTER 5
DETAILED REAL-TIME DECISION PLANNING ALGORITHM.............................94
5.1 Algorithm in Pseudo Code....................................................................................94
5.2 Multi-Horizon Planning Algorithm for Commuter Agent ..................................95
5.3 Single-Horizon Planning Algorithm for Novice Agent.....................................101
5.4 Meta-Model of Choice Probability for Commuter Agent .................................101
CHAPTER 6
PROPOSED DYNAMIC LEARNING ALGORITHM ............................................104
6.1 Overview of Learning Algorithm .......................................................................104
6.2 Taxonomy of Learning Algorithms and Frameworks ........................................105
6.3 Learning in the Context of BDI Framework.......................................................108
6.4 Proposed Hybrid Learning Model ......................................................................109
6.4.1 Bayesian Belief Network for Belief Module ...........................................109
6.4.2 Reinforcement Learning (Q-Learning) for Emotional Module ...............110
6.4.3 Proposed BBN-RL Hybrid Learning Model ............................................111
6.4.4 Illustration of the Proposed Q-Learning for Effect of CI ........................113
6.5 Experiments under Emergency Evacuation Scenario .........................................116
6.5.1 Simulation Model of Emergency Evacuation ..........................................116
6.5.2 Experimental Results ...............................................................................118
CHAPTER 7
HUMAN-IN-THE-LOOP EXPERIMENT AND VALIDATION ...............................144
7.1 Simulation Model Development .........................................................................144
7.2 Human Experiments in Virtual Reality Environment .......................................148
7.2.1 VR Model Development ..........................................................................149
7.2.2 Human-in-the-loop Experiment and Validation ......................................152
7.3 Emergency Evacuation Simulation Results.......................................................165
CHAPTER 8
DISTRIBUTED COMPUTING TECHNIQUES FOR LARGE SCALE SIMULATION ..169
8.1 Distributed Simulation Infrastructure .................................................................169
8.2 Web Services Technology for Distributed Simulations ....................................170
CHAPTER 9
EXTENSION OF PROPOSED APPROACHES TO OTHER APPLICATIONS ............175
9.1 Background .........................................................................................................176
9.2 Community-based Software Development Process: Case of Kuali ..................178
9.2.1 Kuali Foundation and its Organization Structure ....................................178
9.2.2 Enhancement Request Process .................................................................180
9.3 Integrated Simulation Framework involving Multi-Paradigm Simulations ......181
9.3.1 Evaluation Aid for Development Manager ..............................................183
9.3.2 Functional Council (FC) Decision Simulator ..........................................185
9.3.3 Scheduling Aid for Functional Council ...................................................188
9.3.4 Simulation of Entire Enhancement Request Process ...............................196
9.4 Implementation and Experimental Results ........................................................201
9.4.1 Experimental Results involving Evaluation Aid for Development
Manager .......................................................................................................201
9.4.2 Experimental Results involving FC Decision Simulator .........................203
9.4.3 Experimental Results involving Scheduling Aid for Functional
Council ........................................................................................................206
9.4.4 Experimental Results involving Simulation of Entire Enhancement
Process .........................................................................................................209
CHAPTER 10
SUMMARY AND CONCLUSIONS..................................................................212
10.1 Summary of the Research Work .........................................................................212
10.1.1 Contributions in DFT ...............................................................................213
10.1.2 Contributions in Human Decisions in Organizational Social Network ...214
10.1.3 Contributions in BBN-RL Hybrid Learning Model ................................214
10.1.4 Contributions in Distributed Simulation Infrastructure ...........................215
10.2 Firsts in the Research ........................................................................................216
10.3 Future Directions of Research ...........................................................................216
APPENDICES ......................................................................................................................218
A. CAVE 3D MODEL DEVELOPMENT .................................................................218
B. MATLAB CODE FOR DFT SIMULATION........................................................230
C. HUMAN SUBJECTS PROTECTION PROGRAM APPROVAL LETTER ....................234
REFERENCES .....................................................................................................................235
LIST OF ILLUSTRATIONS
FIGURE 2.1: Notions of realism including i) strong realism, ii) realism, and iii) weak
realism (Fasli, 2003) ..............................................................................................33
FIGURE 2.2: The CLARION architecture (Sun, 2007) ....................................................36
FIGURE 2.3: SCREAM system architecture (Prendinger and Ishizuka, 2002) ................36
FIGURE 2.4: Intersections of the research areas of social science, agent-based computing,
and computer simulation (Davidsson, 2002) .........................................................41
FIGURE 3.1: Washington, D.C. Mall area considered in the scenario .............................46
FIGURE 3.2: Components of the extended BDI framework .............................................50
FIGURE 4.1: Sequence diagram of components (corresponding techniques) of the
proposed human behavior model ...........................................................................52
FIGURE 4.2: BBN-based EDFT for dynamically changing environment ........................58
FIGURE 4.3: Bayesian belief network for stock investment ............................................60
FIGURE 4.4: A graphical depiction of two options in the stock market example ............61
FIGURE 4.5: Comparison of choice probabilities between static M(t) and dynamic M(t) ....64
FIGURE 4.6: Comparison of choice probabilities between static W(t) and dynamic W(t) ....65
FIGURE 4.7: Comparison of choice probabilities between static M(t) and W(t) and
dynamic M(t) and W(t) ...........................................................................................66
FIGURE 4.8: Steady choice probability and time steps to the convergence of the expected
preference values (simulation results) ...................................................................71
FIGURE 4.9: Steady choice probability and time steps to the convergence of the expected
preference values at dynamically-changing environment (simulation results)......75
FIGURE 4.10: Screen capture of the virtual stock trading software used in the experiment
................................................................................................................................78
FIGURE 4.11: Relationship between ‘Return’ weight increment and ‘Index increment’ 79
FIGURE 4.12: Bayesian belief network constructed from the stock trading experiment .81
FIGURE 4.13: Stabilization of choice probability over n time steps (1st decision in EDFT)
................................................................................................................................84
FIGURE 4.14: Probability of predicting the correct option for DFT and EDFT in the
considered 10,000 replications of simulations .......................................................87
FIGURE 4.15: BBN used for the perceptual processor of BDI agent under emergency
evacuation scenario ................................................................................................88
FIGURE 5.1: Pseudo code of the proposed planning algorithm .......................................95
FIGURE 5.2: Satellite image of an evacuation area and its graphical representation .......96
FIGURE 5.3: Illustration of planning algorithm..............................................................100
FIGURE 6.1: Q-Learning algorithm pseudo code (training/learning phase) ..................114
FIGURE 6.2: Q-Learning algorithm pseudo code (operation phase) ..............................115
FIGURE 6.3: Emergency evacuation simulation in AnyLogic .......................................118
FIGURE 6.4: Normalized action selection probability distributions (Q matrix) under
different states using α = 0.5, αt = 0.7, and γ = 0.5 ..............................................121
FIGURE 6.5: Evolution of CI using softmax selection policy for different γ for each α 125
FIGURE 6.6: Evolution of CI using greedy selection policy for different γ for each α ..127
FIGURE 6.7: Evolution of CI using greedy selection policy for different α for each γ ..130
FIGURE 6.8: Evolution of CI using softmax selection policy for different γ for each αt ....133
FIGURE 6.9: Evolution of CI using softmax selection policy for different αt for each γ
..............................................................................................................................136
FIGURE 6.10: Evolution of CI using greedy selection policy for different γ for each αt ....139
FIGURE 6.11: Evolution of CI using greedy selection policy for different αt for each γ ....142
FIGURE 6.12: Evolution of CI without applying Q learning for each α.........................143
FIGURE 7.1: An exemplary rule written in Tcl in Soar ..................................................145
FIGURE 7.2: State charts for representing agent behaviors ............................................147
FIGURE 7.3: Emergency evacuation simulation in AnyLogic interacting with BBN, DFT,
and Soar ...............................................................................................................148
FIGURE 7.4: Human-in-the-loop experiment in the CAVE system ...............................150
FIGURE 7.5: CAVE system having four screens used in the human experiment ..........151
FIGURE 7.6: An exemplary VR model developed using Google SketchUp ..................151
FIGURE 7.7: Impact of the number of police officers on the average evacuation time and 95%
confidence interval ...............................................................................................166
FIGURE 7.8: Impact of the number of leaders on the average evacuation time and 95%
confidence interval ...............................................................................................167
FIGURE 7.9: Impact of Q-Learning on the average evacuation time and 95% confidence
interval .................................................................................................................168
FIGURE 8.1: Architecture for distributed simulation .....................................................170
FIGURE 8.2: WSDL snippet for initialise(fedName) ......................................................173
FIGURE 8.3: WSDL snippet for advanceTime(reqFedName, timeVal) ........................173
FIGURE 8.4: WSDL snippet for sendMessage(fedName, msg) .....................................174
FIGURE 8.5: WSDL snippet for getMessage(requestingFedName) returns message ..174
FIGURE 9.1: Kuali organization chart (Source: http://www.kuali.org/).........................179
FIGURE 9.2: Sequence diagram for the enhancement request process ..........................181
FIGURE 9.3: Sequence diagram of the Evaluation Aid process .....................................184
FIGURE 9.4: BBN inferring DM’s evaluation on the required effort and impact of an
enhancement request ............................................................................................185
FIGURE 9.5: Sequence diagram of the use of FC Decision Simulator ...........................186
FIGURE 9.6: BBN inferring FC’s evaluation of the enhancement request ....................187
FIGURE 9.7: Sequence diagram of the Schedule Aid process ........................................189
FIGURE 9.8: Behavior of f(x) and g(y) in Equations (2) and (3) ....................................193
FIGURE 9.9: Exemplary iterations (negotiation process) between FC (real FC in iteration
1 and simulated FC in iterations 2, 3, 4, 5) and simulated PM ...........................196
FIGURE 9.10: Causal Loop diagram for the enhancement request process ..................200
FIGURE 9.11: Stock-Flow diagram for the enhancement request process .....................201
FIGURE 9.12: Simulation results: evaluation of DM on an enhancement request .........203
FIGURE 9.13: Estimated FC’s evaluation of the enhancement request ..........................204
FIGURE 9.14: Evolution of preference on acceptance/rejection of an enhancement
request .................................................................................................................206
FIGURE 9.15: Exemplary iterations (negotiation process) involving different behaviors
of FC (real FC in iteration 1 and simulated FC in the other iterations) and PM
(simulated) against the conflicts ..........................................................................209
FIGURE 9.16: System dynamic simulation results for different flow rates ....................211
LIST OF TABLES
TABLE 4.1: Hypothetical subjective values of options depending on time......................56
TABLE 4.2: The value matrix M(t) used in the simulation ...............................................62
TABLE 4.3: Conditional probability P(Investment safety|Investment history, Index
increment) ..............................................................................................................82
TABLE 4.4: Choice probabilities of each model in 10 simulated experiments ................85
TABLE 6.1: Taxonomy of learning algorithms, models, and frameworks .....................107
TABLE 6.2: Application of learning algorithms under BDI framework.........................109
TABLE 6.3: Normalized action selection probability distributions (Q matrix) under
different states (1 ~ 4) using α = 0.5, αt = 0.7, and γ = 0.5 (see Section 4.4 for
details of considered states and actions) ..............................................................120
TABLE 7.1: The conditional distribution table for ‘Risk’ node in BBN collected from 6
subjects .................................................................................................................155
TABLE 7.2: The conditional distribution table for ‘Time’ node in BBN collected from 6
subjects .................................................................................................................157
TABLE 7.3: The conditional distribution table for ‘RiskWeight’ node in BBN collected
from 6 subjects .....................................................................................................160
TABLE 7.4: Comparison of decisions made by each subject and EDFT model .............161
TABLE 7.5: Weighted mean value of Risk (Table 7.1) and RiskWeight (Table 7.3) for
each subject ..........................................................................................................163
TABLE 7.6: Comparison of decisions accumulating subjects 1, 3, and 4 (risk averse) and
EDFT model using accumulated BBN ................................................................163
TABLE 7.7: Comparison of decisions accumulating subjects 2, 5, and 6 (risk prone) and
EDFT model using accumulated BBN ................................................................164
TABLE 7.8: Comparison of decisions accumulating all 6 subjects and EDFT model using
accumulated BBN ................................................................................................165
TABLE 9.1: Four different types of simulations considered in this work for the case of
Kuali .....................................................................................................................182
TABLE 9.2: Impact of a stakeholder’s decision on other stakeholders ..........................197
TABLE 9.3: Evaluation matrix M obtained from BBN in Figure 9.13 ...........................204
ABSTRACT
Modeling comprehensive human decision behaviors in a unified and extensible
framework is quite challenging. In this research, an integrated Belief-Desire-Intention
(BDI) modeling framework is proposed to represent human decision behavior, whose submodules (Belief, Desire, Decision-Making, and Emotion modules) are based on a Bayesian belief network (BBN), Decision Field Theory (DFT), a probabilistic depth-first search (PDFS) technique, and a BBN-reinforcement learning (Q-Learning) hybrid learning
algorithm. A key novelty of the proposed model is its ability to represent various human
decision behaviors such as decision-making, decision-planning, and learning in a unified
framework.
To this end, first, we extend DFT (a widely known psychological model for
preference evolution) to cope with dynamic environments. The extended DFT (EDFT)
updates the subjective evaluations of the alternatives and the attention weights on the attributes via a BBN under the dynamic environment.
To illustrate and validate the
proposed EDFT, a human-in-the-loop experiment is conducted for a virtual stock market.
Second, a new approach to represent learning (a dynamic evolution process of underlying
modules) in human decision behavior is proposed in the context of the BDI
framework.
Our research focuses on how a human dynamically adjusts his perception process (involving a BBN) based on his performance (depicted via a confidence index) in predicting the environment, as part of his decision-planning. To this end, Q-learning is
employed and further developed.
To mimic realistic human behaviors, attributes of the BDI framework are reverse-engineered from human-in-the-loop experiments conducted in the Cave Automatic
Virtual Environment (CAVE). The proposed modeling framework is demonstrated for a
human’s evacuation behaviors in response to a terrorist bomb attack. The constructed
simulation has been used to test the impact of several factors (e.g., demographics, number
of police officers, information sharing via speakers) on evacuation performance (e.g.,
average evacuation time, percentage of casualties).
In addition, the proposed human decision behavior model is extended to the decisions of many stakeholders that form a complex social network in the community-based development of software systems.
To the best of our knowledge, the proposed human decision behavior modeling
framework is one of the first efforts to represent various human decision behaviors (e.g.,
decision-making, decision-planning, dynamic learning) in a unified BDI framework.
CHAPTER 1
INTRODUCTION
Human decision behaviors have been studied by various research communities
such as artificial intelligence, psychology, cognitive science, and decision science (Lee et
al. 2008). As a result of those efforts, several models have been developed to mimic
human decision behaviors. Lee et al. (2008) classified these into three major categories
based upon their theoretical approach: 1) an economics-based approach, 2) a psychology-based approach, and 3) a synthetic engineering-based approach. Each approach exhibits
strengths and limitations. First, models employing the economics-based approach have a
concrete foundation, based largely on the assumption that decision makers are rational
(Mosteller and Nogee 1951, Simon 1955, Opaluch and Segerson 1989, Gibson et al.
1997).
However, one limitation is their inability to represent the nature of human
cognition (e.g., stress, fatigue, and memory). To overcome this limitation, models using
a psychology-based approach (second category) have been proposed (Edwards 1954,
Einhorn 1970, Payne 1982, Busemeyer and Diederich 2002).
While these models
explicitly account for human cognition, they generally address human behaviors only
under simplified and controlled laboratory conditions. As people are seldom confined to
the conditions of static laboratory decision problems, those models may not be directly
applicable to human behaviors in a more complex environment (Rothrock and Yin 2008).
Finally, the synthetic engineering-based models, which complement economics- and
psychology-based models, employ a number of engineering methodologies and
technologies to help reverse-engineer and represent human behaviors in complex and
realistic environments (Laird et al. 1987, Newell 1990, Rao and Georgeff 1998, Konar
and Chakraborty 2005, Sirbiladze and Gachechiladze 2005, Zhao and Son 2007,
Rothrock and Yin 2008, Lee et al. 2008). The human decision-making models in this
category consist of engineering techniques used to implement submodules. However,
given all of the possible interactions between submodules, the complexity of such
comprehensive models makes them difficult to validate against real human decisions.
More recently, a growing body of interdisciplinary work has been conducted to
complement each of the above-mentioned categories (Shizgal 1997, Sanfey et al. 2006,
Glimcher 2003, Sen et al. 2008, Gao and Lee 2006). In this work, we propose a novel,
comprehensive model of human decision-making behavior, effectively integrating
engineering-, psychology-, and economics-based models.
Another novelty of the
proposed model is its ability to represent both the human decision-making and decision-planning functions in a unified framework.
In this work, the proposed human decision model is illustrated using scenarios of
emergency evacuation from a terrorist bombing attack in a large city. Effective crowd
management requires accurate prediction of the impact of such incidents on the crowd as
well as on the environment (which will affect the crowd’s behavior). Furthermore, the
human lives at stake require that such predictions be highly accurate. For these purposes,
high-fidelity simulation is an ideal technique, as it enables experiments not feasible
during real incidents. In this work, we construct a model of an individual with unique
characteristics (i.e., situation awareness) based on information extracted from human-in-
the-loop experiments. Those characteristics are instantiated as entities with different
attribute values to create a crowd, which will act in accordance with the proposed, highly
detailed human decision model.
1.1 PROBLEM STATEMENT AND OBJECTIVES
The purpose of this dissertation is to build a human behavior model that mimics
various aspects of human decision behavior (e.g., preference, learning, confidence, perception, decision-making, and decision-planning), validate it, and demonstrate and apply it to a realistic, complex scenario. This purpose is divided into the following detailed
objectives. The first objective is to develop a comprehensive and extensible human
behavior architecture that can represent various human behaviors under dynamic
situations.
In this work, an extended Belief-Desire-Intention (BDI) framework is
employed and tailored to provide an appropriate architecture for managing general human behaviors. Unlike simple human behavior models (e.g., the social force model) that represent behavioral phenomena, BDI can mimic not only the phenomena but also the decision processes that lead to them. The second objective is to identify and
further develop techniques that can accomplish the functionality of the developed human
behavior architecture.
The extended BDI consists of many independent functional
modules that characterize each aspect of the human decision-making process. In this dissertation, various engineering, economic, and psychological models and techniques
are identified and further developed for each submodule of the BDI framework. The
third objective is to create and conduct human-in-the-loop experiments in a highly
realistic virtual reality environment to collect real human behavior data for model
development and model validation. In this way, the agent exhibits quasi-realistic human behaviors. The fourth objective is to build a simulation model that can be used to
demonstrate, test, and validate the proposed human decision-making architecture and
techniques in the context of emergency evacuation.
In the emergency evacuation
simulation, each agent interacts with the environment as well as with other agents, and the accumulation of these interactions results in the emergent behavior of the entire system. The fifth objective is
to extend the proposed flexible architecture and approaches to various applications. In
this work, various applications are considered to illustrate and demonstrate different parts of the proposed framework, such as emergency evacuation (see Section 3.1), a community-based software development process (see Chapter 9), and a virtual stock market (see Section 4.2.5).
1.2 BACKGROUND AND MOTIVATION
1.2.1 Background on Agent-based Modeling
Intelligent agent-based modeling has been applied in numerous research fields
including social science, cognitive science, economics, ecology, and engineering due to
its ability to model and analyze the complex world where the traditional modeling tools
are no longer as applicable as they once were. Furthermore, by incorporating a cognitive architecture into an intelligent agent, such models embody generic descriptions of cognition in computer algorithms and programs and provide a realistic basis for modeling individual agents.
This cognitive agent is designed to act like a real human so that it can be used to simulate human behavior instead of placing a real human in the situation.
However, until recently, studies on cognitive agents have focused on individual aspects of human behavior, with the intention of integrating them in the end. This approach benefits model validation, but the integration of the individual models introduces additional issues. Thus, in this dissertation, we propose a comprehensive cognitive agent that mimics complex human behavior in an everyday, dynamic environment.
1.2.2 Background on Machine Learning
Extensive research has been conducted on applying various machine learning
algorithms and models drawn from statistics, neural networks, and control theory to understand and mimic human learning. For example, statisticians have introduced Bayesian models as a way to understand how humans deal with uncertainty. Learning a Bayesian belief network (BBN), a widely studied topic in the field of machine learning (Jensen 1996), generally implies finding an optimal network structure (structural learning) as well as the conditional probability distributions of the connected variables
(parametric learning). Although many researchers have developed various methods to
construct a BBN model such as Bayesian methods (Buntine 1991, Heckerman et al.
1994), quasi-Bayesian methods (Lam and Bacchus 1993, Suzuki 1993), and non-Bayesian methods (Pearl and Verma 1991, Spirtes et al. 1993), a major obstacle to the practical implementation of a BBN is the difficulty of constructing an accurate model, especially when the training data are limited. To tackle this problem, Niculescu et al.
(2006) introduced a framework for incorporating general parameter constraints into
estimators for the parameters of a BBN. Similarly, Djan-Sampson and Sahin (2004)
utilized a scatter search heuristic algorithm to identify the best structure of a BBN. In spite of all these efforts, construction of a BBN structure is still considered a difficult task compared with other learning techniques. Also, as discussed earlier, there is a gap between BBN learning models and actual human learning, as most of the existing
models still focus more on finding the best solution (optimal behavior).
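To make the parametric-learning part concrete, the sketch below estimates a maximum-likelihood conditional probability table P(child | parent) for a single node from observed counts, assuming the network structure is already given. This is a minimal, hypothetical illustration written in Java (the language of the AnyLogic-based implementation used later in this work); it does not reflect the constrained estimators of Niculescu et al. (2006) or the interface of any particular BBN library, and the uniform fallback for unobserved parent states simply highlights the limited-training-data problem noted above.

    /** Minimal maximum-likelihood CPT estimation for one node with a single discrete parent.
     *  Hypothetical sketch of BBN parametric learning; the structure is assumed to be given. */
    public class CptEstimator {
        /** counts[parentState][childState] are tallied from the training data. */
        public static double[][] estimate(int[][] counts) {
            double[][] cpt = new double[counts.length][];
            for (int p = 0; p < counts.length; p++) {
                double total = 0.0;
                for (int c : counts[p]) total += c;
                cpt[p] = new double[counts[p].length];
                for (int c = 0; c < counts[p].length; c++) {
                    // P(child = c | parent = p); fall back to uniform if this parent state was never observed
                    cpt[p][c] = (total > 0) ? counts[p][c] / total : 1.0 / counts[p].length;
                }
            }
            return cpt;
        }
    }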
As another attempt to develop a human-like learning machine, reinforcement learning (RL) was initially adopted in the psychology of animal learning, which concerns learning by trial and error. Later, in the 1980s, RL was implemented in
some of the earliest work in the field of artificial intelligence, where it was used in 1)
cognitive models that simulate human performance during problem solving and/or skill
acquisition (Sun et al. 2001, Sun et al. 2005, Gray et al. 2006, Fu and Anderson 2006)
and 2) the human error-processing system (Holroyd and Coles 2002). Furthermore,
Hoffmann et al. (2008) investigated human reinforcement learning of movement skills using a behavioral paradigm mimicking a ball-hitting task. As such, the RL technique has been successfully demonstrated to mimic human behavior in some simple problem-solving situations, especially when prior knowledge is limited. Also, while BBN structure learning is an NP-hard problem, training in RL is performed relatively easily based on a recursive update formula.
However, a major drawback of RL is the difficulty of applying it to complex problems, as the states (which can be exhaustive for complex problems) and actions need to be clearly defined beforehand. Thus, if the environmental factors (e.g., states and actions) change, they need to be redefined accordingly. In addition, the RL method is more limited than the BBN method in its ability to incorporate prior knowledge.
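To make the recursive nature of RL training concrete, the sketch below shows a minimal tabular Q-learning update of the form Q(s,a) <- Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)]. It is an illustrative sketch only, written in Java (the language of the AnyLogic-based implementation used later in this work); the state and action encodings and the values of the learning rate α and discount factor γ are hypothetical placeholders, not the settings used in the experiments of Chapter 6.

    /** Minimal tabular Q-learning sketch (illustrative only; not the implementation used in this work). */
    public class QLearningSketch {
        private final double[][] q;   // Q[state][action], initialized to zero
        private final double alpha;   // learning rate (hypothetical value)
        private final double gamma;   // discount factor (hypothetical value)

        public QLearningSketch(int numStates, int numActions, double alpha, double gamma) {
            this.q = new double[numStates][numActions];
            this.alpha = alpha;
            this.gamma = gamma;
        }

        /** One recursive update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
        public void update(int s, int a, double reward, int sNext) {
            double maxNext = Double.NEGATIVE_INFINITY;
            for (double v : q[sNext]) {
                maxNext = Math.max(maxNext, v);
            }
            q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
        }

        /** Greedy action selection for a given state (a softmax policy could be substituted). */
        public int greedyAction(int s) {
            int best = 0;
            for (int a = 1; a < q[s].length; a++) {
                if (q[s][a] > q[s][best]) best = a;
            }
            return best;
        }
    }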
1.2.3 Motivation
On February 2nd 2004, in Saudi Arabia, 251 people died and many more were
hurt when people panicked during a crowded religious gathering. Similarly, on March
25th 2000, 13 people died and 44 were injured in Durban, South Africa, following mass
panic when someone released a can of tear gas in a disco. During the one hour and 42 minutes between the first airplane strike on the World Trade Center (WTC) on 11 September 2001 and the second one, more than 2,000 people failed to escape; roughly 500 occupants are believed to have died immediately upon impact, and more than 1,500 were trapped and died in the upper floors in the aftermath. As such, being well-prepared for such emergency situations is critical and may save many human lives.
The goal of this research is to develop a comprehensive human behavior simulation
model, which will be used to evaluate various evacuation management strategies as well
as training tools. Cognitive agent-based simulation, which gives us quasi-realistic results for emergency situations, is an ideal tool for studying emergency evacuation plans without putting humans in life-threatening situations.
1.3 Justification of Selected Methods and Techniques
• Why have we chosen a BDI agent model to mimic human decision-making?
First, the core concepts of the BDI paradigm, originally rooted in folk psychology, allow the use of a programming language to describe human reasoning and actions in everyday life (Norling 2004). Because of this straightforward representation, the BDI paradigm can easily map extracted human knowledge into its framework. This characteristic enables a BDI paradigm-based system to imitate the human reasoning and decision-making process, and also makes the system easy for a real human to understand. Accordingly, the BDI paradigm has been applied successfully in many medium-to-large scale software systems, including an air-traffic management system (Kinny et al. 1996). For this reason, we have adopted the BDI
framework as a core modeling and integration tool in our research. For comparison, Soar, Act-R, and Belief-Desire-Intention (BDI) are three popular synthetic engineering-based models from which we could develop a more comprehensive, modular, and computational human decision model.
Soar and Act-R have their
theoretical bases in the unified theories of cognition (Newell 1990), an effort to
integrate research from various disciplines to describe a single human cognition.
Thus, Soar and Act-R concentrate on the actual mechanisms of the brain during
information processing, including tasks such as reasoning, planning, problem-solving,
and learning.
Consequently, these models become complex and difficult to
understand. Second, the BDI paradigm is a relatively mature framework and has been successfully used in a number of medium- to large-scale software systems. Also, several packages, such as AgentSpeak, Jack, and Jadex, exist to support the modeling process. It is noted that in our work we used AnyLogic® (a Java-based general-purpose simulator) to implement the proposed BDI framework, which contains several non-traditional (extended) modules. Finally, the BDI agent can be easily integrated with other agent-based systems as well as with other complex systems.
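Before contrasting Soar with rule-based systems, the sketch below outlines a bare-bones BDI deliberation cycle (perceive, revise beliefs, generate desires, commit to an intention, plan, act). It is a minimal, hypothetical sketch in Java; the type and method names are illustrative placeholders and do not correspond to the APIs of AnyLogic, AgentSpeak, Jack, or Jadex.

    import java.util.List;

    /** Bare-bones BDI deliberation cycle (hypothetical sketch; names are illustrative only). */
    public abstract class SimpleBdiAgent {
        protected Object beliefs;        // the agent's current view of the world
        protected Object intention;      // the desire (goal) currently committed to
        protected List<Runnable> plan;   // steps intended to achieve the intention

        /** One deliberation step, driven by a new percept from the environment. */
        public void step(Object percept) {
            beliefs = reviseBeliefs(beliefs, percept);        // belief revision
            List<Object> desires = generateDesires(beliefs);  // options the agent might pursue
            intention = filter(desires, beliefs, intention);  // commit to one desire as an intention
            plan = buildPlan(intention, beliefs);             // means-ends reasoning
            if (plan != null && !plan.isEmpty()) {
                plan.remove(0).run();                         // execute the next plan step
            }
        }

        protected abstract Object reviseBeliefs(Object beliefs, Object percept);
        protected abstract List<Object> generateDesires(Object beliefs);
        protected abstract Object filter(List<Object> desires, Object beliefs, Object currentIntention);
        protected abstract List<Runnable> buildPlan(Object intention, Object beliefs);
    }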
• Why have we chosen Soar as opposed to a rule-based system?
In Soar (Laird et al. 1987), knowledge is represented in the form of production rules due to the representational simplicity and modularity of productions, as well as the existence of efficient production-matching algorithms. However, the mechanisms that Soar uses in conjunction with these rules are quite different from those of typical rule-based systems (RBS). The key features (mechanisms) of Soar that differentiate it from traditional RBS are discussed below:
a. Parallel, associative memory
In Soar, all knowledge relevant to the current situation is activated, leading to further elaboration of agent beliefs, proposals to perform different tasks, actions in the world, and so on. On the other hand, a typical RBS chooses one rule among the matching rules rather than activating all of them. To this end, RBS use conflict resolution methods (first applicable, random, most specific, least recently used, best rule) when deciding between matching rules. Soar needs no conflict resolution for rule selection. Instead, all relevant knowledge is brought to bear in parallel.
b. Preference-based deliberation
Deliberation in Soar is mediated by preferences, which allow agents to bring
selection knowledge to bear for decisions. Using preferences, the agent can
express knowledge about which options it prefers in the current situation.
Through deliberation, Soar selects an operator, which represents a procedure with a precondition and an action. In an RBS, individual rules are the operators. In Soar, the precondition and action components of an operator are implemented as separate rules, which leads to fewer total rules because a single precondition rule can be matched with any number of action rules. In contrast, an RBS agent needs rules for every possible combination of precondition and action, leading to a potential combinatorial explosion in the number of rules.
c. Automatic subgoaling
Soar recognizes conflict in selection knowledge and automatically creates a
subgoal to resolve it.
This automatic subgoaling gives Soar a meta-level reasoning capability, the ability to reason about its own reasoning.
d. Decomposition via problem space
Automatic subgoaling enables task decomposition in Soar. At each step in the
decomposition, the agent is able to focus its knowledge on the particular options
at just that level and ignore considerations at other levels.
The process of
decomposition narrows a potentially exponential number of considerations into a
much smaller set of choices. Recognizing knowledge appropriate to the current
problem is the essence of the problem space hypothesis. Automatic subgoaling
leads to a hierarchy of distinct states. Thus, agents can use multiple problem
spaces simultaneously, and knowledge can be created that is specific to the unique
states. This makes knowledge search simpler (leading to faster rule matching).
Moreover, the knowledge base is naturally compartmentalized, providing a
scalable infrastructure with which to build very large knowledge bases.
e. Adaptation via generalization of experience
Once an agent comes to a decision that resolves an impasse, it summarizes and
generalizes the reasoning during the impasse.
This process results in new
knowledge that will allow the agent to avoid an impasse when encountering a
similar situation. Because the learning algorithm is an integral part of the overall
system, Soar also provides a structure that addresses when learning occurs (when
impasses are resolved), what is learned (a summarization of impasse processing),
and why learning occurs. The drawback of Soar's learning mechanism is that all higher-level learning styles must be realized within the constraints imposed by Soar's basic learning mechanism. Thus, while Soar provides considerable structure for integrating multiple learning methods with behavior, realizing any individual learning style is often more straightforward in a typical RBS.
In summary, although Soar uses rules as its lowest-level representation language, the way Soar uses those rules differs significantly from traditional rule-based systems. Soar also adds architectural support for operators and problem spaces, which are higher-level, abstract representations not directly supported in other rule-based systems. In the end, Soar systems scale much better (in both performance and the manageability of the knowledge base) and require fewer rules for sufficiently complex applications relative to typical rule-based systems; thus, Soar allows us to represent more detailed knowledge.
1.4 Organization of the Remainder of the Dissertation
The remainder of the dissertation is organized as follows. Chapter 2 provides an
introduction to intelligent agents and summarizes the literature on decision-making models, machine learning, agent-based modeling, and emergency evacuation.
Chapter 3 presents a detailed description of an emergency
evacuation scenario and the proposed extended BDI framework. In Chapter 4, techniques
for each submodule of the proposed BDI framework and the overall interaction among
those techniques (therefore submodules) are described. In Chapter 5, details of the
proposed real-time planning algorithm are discussed. Various learning algorithms are
then identified and categorized for the context of the BDI framework in Chapter 6. Also,
the proposed BBN-RL hybrid learning model is described and examined in Chapter 6. In
Chapter 7, we discuss CAVE-based human-in-the-loop experiments for the emergency
evacuation situation and the development of a simulation model based on the human behavior data from the experiments. In Chapter 8, we discuss a distributed computing infrastructure
based on web services technology, which is used to integrate various submodules
(implemented in separate software applications) of each agent.
As an exemplary extension, the application of the proposed human behavior model to a community-based software development process is presented in Chapter 9. Chapter 10 includes a summary of the findings and the conclusions drawn from this research, and also indicates directions for future research.
CHAPTER 2
LITERATURE REVIEW AND BACKGROUND
In this chapter, the extensive literature review conducted for this research is summarized. First, background on intelligent agent technology is provided, followed by research in related topics such as human decision-making, machine learning, and agent-based simulation. Second, previous research on emergency evacuation is discussed.
2.1 Intelligent Agent
While there is no universal agreement on a precise definition of the term ‘agent’,
most of the existing definitions tend to agree on many points. Some researchers such as
Bonabeau (2002) defined an agent as any type of independent component (including software, models, and individuals), whose behavior can range from primitive reactive decision rules to complex adaptive artificial intelligence (AI) techniques. Thus, the component may execute various behaviors appropriate for the system to which it belongs.
Furthermore, Casti (1997) argued that a component's behavior must be adaptive in order for it to be considered an agent, containing both base-level rules for behavior and higher-level rules for changing the base rules. From a computer science perspective, the definition of an agent emphasizes its autonomous behavior, that is, its ability to make independent decisions. For example, Jennings (2000) defined the term "agent" as an encapsulated computer system that is situated in some environment and that is capable of flexible, autonomous action in that environment in order to meet its design objectives. Based on this definition, Macal and North (2006) define and characterize an agent as follows:
• An agent is identifiable, a discrete individual with a set of characteristics and rules governing its behaviors and decision-making capability.
• An agent is situated, living in an environment with which it interacts along with other agents.
• An agent may be goal oriented, having goals to achieve (not necessarily objectives to maximize) with respect to its behaviors.
• An agent is autonomous and self-directed.
• An agent is flexible, having the ability to learn and adapt its behaviors based on experiences.
Samuelson and Macal (2006) pinpointed the capability to make independent decisions as the fundamental feature of an agent. Based on the various definitions of an agent mentioned above, an agent can be defined as an independent component that makes intelligent and adaptive decisions autonomously.
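As a minimal illustration of these characteristics (identifiable, situated, goal-oriented, autonomous, and flexible), the following Java interface sketch mirrors the properties listed above; the names are hypothetical and are not drawn from any particular agent toolkit.

    /** Illustrative agent interface mirroring the characteristics listed above (hypothetical names). */
    public interface Agent<Observation, Action> {
        String id();                                      // identifiable: a discrete individual
        void perceive(Observation observation);           // situated: interacts with its environment
        boolean goalSatisfied();                          // goal-oriented: has goals to achieve
        Action decide();                                  // autonomous: makes its own decisions
        void learn(Observation outcome, double reward);   // flexible: adapts behavior from experience
    }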
2.1.1 Human Decision Making Models
The Belief-Desire-Intention (BDI) paradigm has been developed by Rao and
Georgeff (1998) based on Bratman’s (1987) argument that intentions play a prominent
role in an agent’s decision-making. The BDI paradigm provides us with the means of
describing different types of agents by adopting a set of constraints that describe how the
three attitudes (belief, desire, and intention) are related to each other.
This set of
constraints is called a notion of realism. As shown in Figure 2.1, Rao and Georgeff (1998) define three types of notions, strong realism, realism, and weak realism, characterizing a cautious, an enthusiastic, and a balanced agent, respectively.
Figure 2.1: Notions of realism: i) strong realism, ii) realism, and iii) weak realism (Fasli
2003)
Due to this diversity in the type of agents, the BDI paradigm has been successfully used
in various applications. For example, it has been widely applied to medium to large scale
software systems such as an air-traffic management system by Kinny et al. (1996). Fasli
(2003) extended the notions of realism by considering combinations of relations between
the three attitudes and their dynamics. Using this concept, Fasli (2003) mainly focuses
on modeling heterogeneous agents. For example, depending on the relationships between
an agent’s beliefs and intentions, Fasli (2003) distinguishes between two broad categories
of agents – circumspect agents and bold agents. Although Fasli (2003) offers insight into different relations between the main attitudes as well as a systematic categorization of agents, it does not offer any criteria for choosing among the types of agents, nor does it fill the gap between theory and application.
Later, Zhao and Son (2007) have further developed the original BDI framework
to include detailed, conceptual submodules. The intention module in the traditional
framework is expanded to include 1) deliberator, 2) planner, and 3) decision executor
submodules. Furthermore, a confidence state (or confidence index) is considered in the
model, which affects as well as is affected by three other mental modules. Using this
extended BDI framework, Zhao and Son (2007) aim to develop a human decision
behavior model, which is capable of 1) generating a plan in real-time as opposed to
selecting a plan based on a static algorithm and predefined/static plan templates that have
been generated off-line, 2) supporting both the reactive as well as proactive decisionmaking, 3) maintaining situation awareness in human language like logic to facilitate real
human decision-making (in the case the agent cannot handle the situation), and 4)
changing the commitment strategy adaptive to historical performance. They develop a
human operator model which is responsible for error detection and recovery in a complex
automated shop floor control system by employing LORA (Wooldrige 2000) logic to
represent beliefs, desires, intentions, and plans. However, the actual example used to
illustrate the proposed method did not provide a truly complex environment.
The Connectionist Learning with Adaptive Rule Induction ON-line (CLARION)
is a cognitive architecture that incorporates the distinction between implicit and explicit
processes and focuses on capturing the interaction between these two types of processes
(Sun, 2007).
CLARION (see Figure 2.2) consists of several distinct subsystems
including the action centered subsystem (ACS), the non-action centered subsystem
(NACS), the motivational subsystem (MS), and the metacognitive subsystem (MCS).
The ACS controls actions, whether for external physical movements or internal mental
operation.
The NACS maintains implicit or explicit general knowledge.
The MS
provides underlying motivations for perception, action, and cognition in terms of impetus
and feedback. Finally, the MCS monitors, directs, and modifies the operations of the
ACS dynamically, as well as the operations of all the other subsystems.
The SCRipting Emotion-based Agents Minds (SCREAM) (see Figure 2.3) is an architecture for emotion-based agents. It was developed by Prendinger and Ishizuka (2002) as a plug-in to content- and task-specific agent systems, such as interactive tutoring or entertainment systems, that provide possible verbal utterances for a character. It is designed as a scripting tool in which content authors state the mental make-up of an agent by declaring a variety of parameters and behaviors relevant to affective communication and obtain qualified emotional reactions, which are then input to an animation engine that visualizes the agent as a 2D animation. SCREAM decides the kind of emotional expression, as well as its intensity, based on a multitude of parameters derived from the character's mental state, the peculiarities of the social setting in which the interaction takes place, and the features of the character's interlocutor.
Figure 2.2: The CLARION architecture (Sun 2007)
Figure 2.3: SCREAM system architecture (Prendinger and Ishizuka 2002)
In addition to agent paradigms and frameworks, some research efforts have focused on agent perception. Herrero and Antonio (2003) introduced a mathematical model for human-like hearing perception. They decomposed the agent's perception into three blocks – Sensitive Perception, Attenuation, and Internal Filtering. They implemented their hearing perception model in a war scenario simulation and verified the model. While this kind of perception model can be easily verified and validated, more research needs to be done on the interactions between different kinds of perceptions and how those perceptions can be utilized in the agent decision-making
process.
Kaminka and Fridman (2007) proposed a model of crowd behavior based on the
Social Comparison Theory (SCT, Festinger 1954), a well-known social psychology theory introduced in the 1950s and expanded since. In order to implement SCT in agents, Newell transformed it into a set of axioms: 1) when agents lack objective means for evaluation, they compare their state features to those of others; 2) agents compare themselves to those who are more similar to themselves; 3) agents take steps to reduce the gap between themselves and the objects of comparison. Later, Kaminka and Fridman (2007) took another step toward the modeling of SCT by transforming it into more detailed algorithms.
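As an illustration of how these axioms can be operationalized, the sketch below implements one social-comparison step: when an agent lacks an objective evaluation (axiom 1), it finds its most similar neighbor within a similarity bound (axiom 2) and takes a small step to reduce the gap in its state features (axiom 3). This is a simplified, hypothetical rendering of the idea in Java, not the algorithm of Kaminka and Fridman (2007).

    import java.util.List;

    /** One simplified social-comparison step (hypothetical sketch, not Kaminka and Fridman's algorithm). */
    public class SocialComparisonStep {
        /** Each agent is represented here only by a vector of state features. */
        public static void compareAndAdjust(double[] self, List<double[]> neighbors,
                                            double similarityBound, double stepSize) {
            double[] target = null;
            double bestDistance = Double.MAX_VALUE;
            for (double[] other : neighbors) {
                double d = distance(self, other);
                // Axiom 2: compare only to agents that are sufficiently similar.
                if (d < similarityBound && d < bestDistance) {
                    bestDistance = d;
                    target = other;
                }
            }
            if (target == null) return; // no comparable agent found
            // Axiom 3: take a step to reduce the gap to the object of comparison.
            for (int i = 0; i < self.length; i++) {
                self[i] += stepSize * (target[i] - self[i]);
            }
        }

        private static double distance(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double diff = a[i] - b[i];
                sum += diff * diff;
            }
            return Math.sqrt(sum);
        }
    }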
2.1.2 Machine Learning
A major goal of research in the field of machine learning is the design and development of algorithms and techniques that allow computers to learn. Machine
learning has a wide spectrum of applications including natural language processing,
syntactic pattern recognition, search engines, medical diagnosis, bioinformatics and
cheminformatics, detecting credit card fraud, stock market analysis, classifying DNA
sequences, speech and handwriting recognition, object recognition in computer vision,
game playing, and robot locomotion.
Gonzalez et al. (2003) present a learning theory for dynamic decision-making (DDM) called instance-based learning theory (IBLT). IBLT proposes five learning mechanisms in the context of a decision-making process: instance-based knowledge, recognition-based retrieval, adaptive strategies, the necessity-making process, and feedback updates. They implemented IBLT’s learning mechanisms in an ACT-R cognitive model, CogIBLT, and compared the results from real humans (16 subjects) with those from CogIBLT. A more exhaustive human experiment may be appropriate to justify and demonstrate the proposed model more strongly.
Later, Gonzalez and Quesada (2003) examine the change in individuals’
recognition ability, as measured by the change in the similarity of decisions they make
when confronted repeatedly with consistent dynamic situations of varying degrees of
similarity. To this end, they test the hypothesis that dynamic decision-making (DDM) performance is closely related to the ability to recognize similar stimuli through a human-in-the-loop experiment. The experiment was designed to evaluate 1) whether decision makers in DDM systems reuse past decisions, 2) whether increased similarity between current and past situations leads to performance improvement, 3) whether similarity is a reliable predictor of future performance, and 4) whether features of the task influence recognition and fluctuate during task learning. From the human experiment involving 64 students, they found that human decisions become increasingly similar with task practice. They also noticed that the similarity was determined by the interaction of many
task features rather than individual task features. However, their work did not include a
model that incorporates these human decision-making characteristics.
2.1.3 Agent-based Simulation
The agent-based model (ABM) is a computational model for simulating the
actions and interactions of autonomous individuals in a network, with a view to assessing
their effects on the system as a whole. An essential idea of agent-based modeling and
simulation (ABMS) is that many phenomena, even very complex ones, can best be
understood as systems of autonomous agents that are relatively simple and follow
relatively simple rules of interaction (Samuelson and Macal 2006).
Repetitive,
competitive interactions between agents are a major feature of agent-based modeling,
which relies on the power of computers to explore dynamics out of the reach of pure
mathematical methods. In traditional discrete-event simulation, entities follow sequences of processes, which are defined from a top-down system perspective. In contrast, ABM defines the behavior of each entity (a bottom-up perspective), so that the simulation reveals emergent behaviors of the system as the aggregated behavior of its entities. The main roots of agent-based simulation are in modeling human social and organizational behavior and individual decision-making (Bonabeau 2002). Accordingly, it is required to represent social interaction, collaboration, group behavior, and the emergence of higher-order social structure (Macal and North 2006).
Social computing is a system that supports gathering, representation, processing,
use, and dissemination of information that is distributed across social collectivities such
as teams, communities, organizations, and markets (from Wikipedia). Some examples of
social computing include Web 2.0, Enterprise social software, Electronic negotiation and
markets, and Collaborative filtering. The goal of social computing is to support the
tendency of humans to interact with computers as if they were veritable social actors
(Prendinger and Ishizuka 2002).
Combining agent-based modeling with social computing, agent-based social
simulation (ABSS) that models social phenomena on the basis of autonomous agents has
grown recently (Sun 2007). As shown in Figure 2.4, Davidsson (2002) classifies research
areas in this field into 1) Agent-Based Social Simulation (ABSS), 2) Social Aspects of
Agent Systems (SAAS), 3) Multi Agent Based Simulation (MABS), and 4) Social
Simulation (SocSim) depending on different combinations of agent-based computing,
computer simulation, and social science. Other than ABSS, which is defined above, SAAS consists of social science and agent-based computing and includes the study of norms, institutions, organizations, co-operation, and competition, among others. The research at the intersection between computer simulation and agent-based computing is named MABS and uses agent technology for simulating phenomena other than social phenomena on a computer. Finally, SocSim lies at the intersection between the social sciences and computer simulation and corresponds to the simulation of social phenomena on a computer, typically using simple models of the simulated social entities such as cellular automata.
Figure 2.4: Intersections of the research areas of social science, agent-based computing, and computer simulation (Davidsson 2002)
2.1.3.1 Benefits of Agent-based Simulation
Agent-based modeling (ABM) allows us to model and analyze the complex world that traditional modeling tools can no longer support. Advances in database technology (allowing a finer level of granularity) and computational power allow us to compute large-scale micro-simulation models that would not have been plausible just a couple of years ago (Macal and North 2006). This feature of ABM has contributed to the field of computer simulation by providing a new paradigm for the simulation of complex systems with much interaction between the entities of the system (Davidsson 2002). In micro simulations, the structure is viewed as emergent from the interactions between the individuals, whereas in macro simulations, the set of individuals is viewed as a structure
that can be characterized by a number of variables. Bonabeau (2002) claimed that the
benefits of ABM over other modeling techniques are 1) ABM captures emergent
phenomena; 2) ABM provides a natural description of a system; and 3) ABM is flexible.
ABM is best used in the following situations:
• When the interactions between agents are complex, nonlinear, discontinuous, or discrete.
• When space is crucial and agents’ positions are not fixed.
• When the population is heterogeneous and each individual is different.
• When the topology of the interactions is heterogeneous and complex.
• When the agents exhibit complex behavior, including learning and adaptation.
Sun (2007) claimed that agent-based social simulation can overcome the
limitations of traditional (equation-based) approaches, which represent the relationships
between entities by a set of mathematical equations. Furthermore, he incorporated a
cognitive architecture into ABSS (Agent-Based Social Simulation) that embodies generic
descriptions of cognition in computer algorithms and programs and provides a realistic
basis for modeling individual agents.
2.1.3.2 Drawbacks of Agent-based Simulation
Samuelson (2005) pointed out that many complex ABMs deal with issues sensitive enough that validation becomes problematic, and the problem worsens as models become more complex. Simulating the behavior of all of the units can also be extremely computation-intensive and therefore time consuming (Bonabeau 2002). Although computing power is still increasing at an impressive pace, the high computational requirements of ABM remain a problem when it comes to modeling
extremely large systems. Similarly, Jennings (2000) identifies two major drawbacks
associated with the very essence of the agent-based approach: 1) the patterns and the
outcomes of the interactions are inherently unpredictable and 2) predicting the behavior
of the overall system based on its constituent components is extremely difficult
(sometimes impossible) because of the strong possibility of an emergent behavior.
Another issue of ABM in the social science field is that it most often involves human
agents with potentially irrational behavior, subjective choices, and complex psychology,
all of which are difficult to quantify, calibrate, and sometimes justify (Bonabeau 2002).
2.2 Emergency Evacuation
Bohannon (2005) reviews research efforts on the emergency evacuation
application.
Many scientists try to capture the behavior of crowds using computer
simulations. A diverse effort is under way to refine these models with real-world data.
Learning to predict and control these behaviors may save lives.
Aube and Shield (2004) provide a tool for modeling crowd dynamics that factors in the presence of “leader” individuals who can influence the behavior of the crowd. The model has been built on the social force model (Helbing and Molnar 1995) to analyze situations that are most critical when 1) crowd members have little to no knowledge of the optimal exit strategy for leaving their current environment, and 2) they move around in a state of disorganized panic. They built a plug-in to an existing rendering and steering framework, OpenSteer, developed by Sony Computer Entertainment America. In their approach, while physical interactions between agents are based on the social force model, agents make their decisions using a rule-based system. Thus the model may have difficulty representing complex situations.
Helbing et al. (2000) build a model of pedestrian behavior to investigate the
mechanisms of (and preconditions for) panic and jamming by uncoordinated motion in
crowds. From the simulation results, they suggest practical ways to prevent dangerous crowd pressure and optimal strategies for escape from a smoke-filled room, involving a mixture of individualistic behavior and collective ‘herding’ instinct. Their simulation models of pedestrians are based on a generalized force model, which is particularly suited to describing the fatal build-up of pressure observed during panics. However, this model has limited ability to represent complex human behavior because the behavior in the model is driven by a mathematical formula.
Similarly, Song et al. (2006) and Yuan and Tan (2007) introduced spatial evacuation simulation models based on mathematical behavior models. Yuan and Tan (2007) use a two-dimensional basic cellular automata model considering two factors, spatial distance and occupant density. Using the concept of the social force model, Song et al. (2006) build a multi-grid model in which a pedestrian occupies multiple grids. Both research works have the same limitation in handling complex human behaviors, as the agents’ behaviors are based on mathematical formulas.
CHAPTER 3
SCENARIO, PROPOSED ARCHITECTURE AND METHODOLOGY
3.1 Overview of Emergency Evacuation Scenario
In this chapter, the proposed human decision behavior model (see Section 3.2) is
illustrated in the context of crowd evacuation behaviors in response to a terrorist bomb
attack in Washington, D.C., National Mall area (see Figure 3.1 for map (satellite image)
of the area).
Figure 3.1: Washington, D.C., Mall area considered in the scenario
Given the scenario, we have characterized different types of agents (models of humans)
based on 1) the familiarity with the area (which will entail different evacuation planning),
2) risk-taking behavior, 3) the confidence index (affecting the moving speed of an agent
and leader/follower behavior), and 4) the guidance by police. In the considered scenario,
initially people with various goals (e.g., business, shopping, and tourist) are distributed
throughout the area. The scenario begins when an explosion occurs, the police are
informed of it via radio transmission, and the police ask people (agents) around them to
evacuate the area.
Agents’ evacuation behaviors will vary according to their
characteristics. For example, those who are familiar with the area (commuters) invoke
the multihorizon planning algorithm (see Figure 5.1) to develop their evacuation plan.
On the other hand, those who are not familiar with the area (novice agents) move from
intersection to intersection (i.e., invoking the single-horizon planning algorithm) and may be guided by a police officer or a commuter agent to the nearest exit. One example of an exit point is a Metro station located well beyond the radius of the explosion (see Figure 6.3). The confidence index of an agent also determines its leader or follower behavior (see Section 4.5). Once agents reach an exit point, they are removed from the simulation.
3.2 Extended BDI Framework
In this section, BDI is discussed in terms of its submodules and the techniques that we have employed and further developed in this research for those submodules. While this section provides an overview of the submodules and techniques, Chapters 4, 5, and 6 discuss them in greater detail. As discussed in Section 2.1.1, BDI is a model of the human reasoning process, where a person’s mental state is characterized by three major
components: beliefs, desires, and intentions (Rao and Georgeff 1998).
Beliefs are
information that a human possesses about a situation, and beliefs may be incomplete or
incorrect due to the nature of human perception. Desires are the states of affairs that a
human would wish to see manifested. Intentions are desires that a human is committed to
achieve. Zhao and Son (2007) extended the decision-making module (corresponding to
the intention component) of the original BDI model to include three detailed submodules:
1) a deliberator, 2) a real-time planner, and 3) a decision executor in the decision-making
(intention) module (see Figure 3.2). This extension was necessary to accommodate both
the decision-making and decision-planning functions in a unified framework. Later, Lee
et al. (2008) further extended the model, appending an emotion module containing a confidence index and an instinct index to represent more of the psychological nature of humans. Lee et al. (2008) employed novel techniques from various disciplines to realize each component of the extended BDI, such as a Bayesian belief network (BBN) and decision field theory (DFT). The emotion module affects and is affected by each component throughout the decision-making process. While the achievement of Zhao and Son (2007) was to provide a conceptual extension of the BDI model, this work discusses the actual algorithms and techniques that we have employed and further developed to realize the
submodules for the extended model. While decision-making behavior in the extended
BDI model is an ongoing and iterative process, we can start from the belief module for
the purpose of illustration (see Figure 3.2). The perceptual processor, a subjective
information filter, in the belief module translates information about the environment and
the agent (human model) itself into its beliefs. As a result, the agent has only partial and
possibly biased information about the environment and itself. This information therefore
is labeled belief, not knowledge. Then, based on the current beliefs (short-term memory),
the agent updates its instinct index. If the instinct index is below a threshold (normal mode), the agent evaluates potential states of affairs and finds desirable states (desires) through the desire generator. The agent selects one desire and generates intentions to achieve that desire via the deliberator. The agent then generates alternative plans, based on its current beliefs, directed toward achieving its intention, where a plan is a sequence of actions. Once an optimal or satisfactory plan is identified, the decision executor in the decision-making module executes the series of tasks specified in the plan. On the other hand, if the instinct index exceeds the threshold (instinct mode), the decision executor executes tasks based on the agent’s instincts retrieved from its beliefs (long-term memory)
without involving planning. In this work, a confidence index is an exponential smoothing
function of the deviation between what is predicted about the environment during the
planning stage and the actual environment during the execution stage. If the confidence
index exceeds a threshold (so that the model is operating in the so-called “confident”
mode), then the decision executor executes all the tasks in the plan. Otherwise the model
is operating in the so-called “suspicious” mode, and replanning is performed before
executing each task.
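The control flow described above can be summarized by the following minimal Matlab-style sketch. The threshold values, the smoothing constant, the mapping of the prediction deviation into the confidence index, and the stub tasks are illustrative assumptions rather than the actual parameter values or routines used in this research.

    % Minimal sketch of one decision cycle in the extended BDI model (illustrative
    % thresholds, stub tasks, and an assumed smoothing form for the confidence index).
    instinctThreshold   = 0.7;
    confidenceThreshold = 0.5;
    alpha = 0.3;                        % exponential smoothing constant (assumed)
    instinctIndex   = 0.2;              % would be updated from current beliefs
    confidenceIndex = 0.8;              % carried over from earlier executions
    plan = {'task1', 'task2', 'task3'}; % placeholder plan from the real-time planner

    if instinctIndex > instinctThreshold
        disp('Instinct mode: execute an instinctive task from long-term memory');
    else
        for k = 1:numel(plan)
            if confidenceIndex <= confidenceThreshold
                disp('Suspicious mode: replan before executing the next task');
            end
            fprintf('Executing %s\n', plan{k});
            deviation = abs(randn * 0.1);   % |predicted - actual| environment (stubbed)
            confidenceIndex = alpha * exp(-deviation) + (1 - alpha) * confidenceIndex;
        end
    end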
Figure 3.2: Components of the extended BDI framework
CHAPTER 4
PROPOSED TECHNIQUES FOR SUBMODULES IN EXTENDED BDI
FRAMEWORK
4.1 Overview of Simulation Development Workflow
In this research, we have employed and further developed novel techniques from
various disciplines to realize and implement each component of extended BDI (see
Chapter 3), including the Bayesian belief network (BBN), decision field theory (DFT),
and probabilistic depth first search (PDFS). Each of these techniques is explained in the
following subsections. Figure 4.1 depicts a sequence diagram of the overall decision
planning process, displaying the sequential interactions between components (and
corresponding techniques) of the extended BDI. Whenever an agent needs to make a
decision, it performs planning (single horizon or multi horizon) via PDFS, which in turn
accesses DFT and BBN to obtain preferences and assess the environment, respectively.
Once DFT obtains an assessment of the environment from BBN, it calculates the
preference value of each option, which will be used to calculate the choice probability of
each option. Then PDFS selects an option and makes a plan based on the calculated
choice probability. Since the decision is made based on the preference value of each option, as predicted by DFT, which has been successfully applied to many cognitive tasks (Busemeyer and Diederich 2002), the model can mimic the cognitive nature of human decision behavior. However, for applications in which a decision depends solely on rational reasoning, DFT can be substituted with other techniques such as rule-based decision-making.
Figure 4.1: Sequence diagram of components (corresponding techniques) of the proposed human behavior model
4.2 Extended Decision Field Theory
4.2.1 Decision Field Theory
DFT is a human decision-making model based on the principles of psychology
rather than economics (Busemeyer and Diederich 2002). It provides a mathematical framework for understanding the cognitive mechanism of the human deliberation process in making decisions under uncertainty (Busemeyer and Townsend 1993). DFT is distinguished from previous mathematical approaches in that it is probabilistic and dynamic (Townsend and Busemeyer 1995). “Dynamic” here denotes that DFT considers
“time” as a factor affecting the decision. It is noted, however, that “dynamic” has two
meanings in our research in this dissertation. In addition to the above meaning, it also
refers to multiple and interdependent decisions that are made in an autonomously
changing environment (Gibson et al. 1997). DFT has been successfully applied across a
broad range of cognitive tasks including sensory detection, perceptual discrimination,
memory recognition, conceptual categorization, and preferential choice (Busemeyer and
Diederich 2002). The original DFT is briefly described in the following section.
DFT describes the dynamic evolution of preferences among options during the
deliberation time using the linear system formulation (see Equation (4.2.1)).
P(t + h) = SP(t) + CMW(t + h)    (4.2.1)
In Equation (4.2.1), P(t)^T = [P_1(t), P_2(t), P_3(t)] represents the preference state, where P_i(t) represents the strength of preference corresponding to option i at time t < T_D, and T_D is the time at which the final decision is made. The preference state is updated at every time step h.
Each element is explained below.
• The stability matrix S provides the effect of the preference at the previous state (the memory effect) and the effect of the interactions among the options. In detail, the diagonal elements of S are the memory for the previous state preferences, and the off-diagonal elements are the inhibitory interactions among competing options (an exemplary S is shown in Section 4.2.3). Matrix S is assumed to be symmetric, and the diagonal elements are assumed to have the same value. These assumptions ensure that each option has the same amount of memory and interaction effects. Furthermore, for the stability of this linear system, the eigenvalues λ_i of S are assumed to be less than one in magnitude (|λ_i| < 1).
• The value matrix M (an m×n matrix, where m is the number of options and n is the number of attributes) represents the subjective evaluations (perceptions) of a decision-maker for each option on each attribute. For example, product brochures or magazines provide consumers with objective facts. Given this objective information, readers obtain their own subjective evaluations, which constitute the M matrix. If the evaluation values change according to the environment, the matrix M is constituted with multiple states.
• The weight vector W(t) (an n×1 vector, where n is the number of attributes) allocates the weights of attention corresponding to each column (attribute) of M. In the case that M is constituted with multiple states, each weight w_j(t) corresponds to the joint effect of the importance of an attribute and the probability of a state. An important assumption of DFT is that the weight vector W(t) changes over time according to a stationary stochastic process. This assumption allows us to derive four important theorems regarding the expected preference values (see Section 4.2.4).
• The matrix C is the contrast matrix comparing the weighted evaluations of the options, MW(t). If each option is evaluated independently, then C is I (the identity matrix). In this case, the preferences of all options may increase simultaneously (see Equation (4.2.1)). Alternatively, the elements of C may be defined as c_ii = 1 and c_ij = -1/(n-1) for i ≠ j, where n is the number of options. For example, the contrast matrix C for the case with two options is
C = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
In this case (which is used in this research), a preference increase for one option lowers the preference of the alternative option, and the sum of the elements of CMW(t) (an m×1 vector, where m is the number of options) is always zero.
As discussed above, the only component of DFT that changes dynamically is the
weight vector W(t) (via random sampling), which we believe is not enough to represent
the preferences in the dynamically changing environment (see Section 4.2.3 for more
details). Therefore, we propose an extension to DFT to represent the preferences in a
dynamically changing environment.
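To make the notation of Equation (4.2.1) concrete, the following Matlab sketch evolves the preference state for a two-option, two-attribute case, using the S, M, and attention probabilities of the stock example discussed later in this chapter (Sections 4.2.3 and 4.2.4); it is an illustrative sketch, not the simulation code of Appendix B.

    % Illustrative evolution of the original DFT preference state, Equation (4.2.1).
    S = [0.9 -0.01; -0.01 0.9];   % stability matrix (memory and inhibition)
    C = [1 -1; -1 1];             % contrast matrix for two options
    M = [3.5 1.3; 1.3 3.5];       % subjective evaluations (safety, return)
    pSafety = 0.53;               % Pr(W_Investment-safety = 1); Pr(W_Return = 1) = 0.47
    nSteps  = 100;
    P = zeros(2, nSteps + 1);     % preference states; P(:,1) corresponds to P(0) = 0
    for t = 1:nSteps
        if rand < pSafety         % all-or-none attention: one attribute per step
            W = [1; 0];
        else
            W = [0; 1];
        end
        P(:, t + 1) = S * P(:, t) + C * M * W;
    end
    plot(0:nSteps, P.');          % preference trajectories of options A and B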
4.2.2 Bayesian Belief Network-based Decision Field Theory Extension
This section discusses two extensions to the original DFT that we propose based
on the assumptions of the human behavior to cope with the dynamically changing
environment. First, we assume that the subjective evaluations about each option (the
value matrix M) may change during the decision deliberation. Second, the stochastic
process of the attention weight may change according to the dynamically changing
environment. Thus, the weight vector W(t) may involve different stochastic processes.
4.2.2.1 DFT Extension for Dynamic Changes of Evaluation on Options
In Equation (4.2.1), the matrix M – the hypothetical subjective values of each option – is assumed to remain the same or to switch between predefined sets of values with some probability over the deliberation time. However, the values of matrix M may change more dynamically during the decision deliberation depending on the environment. In this case, the value matrix M is also dynamic over time. Thus, we propose the following extended model.
P(t + h) = SP(t) + CM(t + h)W(t + h)    (4.2.2)
For illustration purposes, the following exemplary decisions in stock investment will be used throughout this chapter. In the exemplary stock investment, it is assumed that we (the decision-makers) consider only two attributes – ‘Investment safety’ and ‘Return’. It is noted that these are the decision-maker’s perceptions, not objective values. As explained in Section 4.2.1, a decision maker evaluates the options based on the given information and composes the value matrix M. Table 4.1 shows hypothetical subjective values of each stock option in the stock market example. Since the values of these attributes are subjective, they may change easily over time depending on the environmental condition. For instance, Table 4.1 depicts that the initial evaluation of option A on the ‘Investment safety’ attribute at time t, S_A(t), changes to S_A(t + h) at time t + h.
Table 4.1: Hypothetical subjective values of options depending on time

Option | Investment safety (time t) | Return (time t) | Investment safety (time t + h) | Return (time t + h)
A      | S_A(t)                     | R_A(t)          | S_A(t + h)                     | R_A(t + h)
B      | S_B(t)                     | R_B(t)          | S_B(t + h)                     | R_B(t + h)
Thus, the value matrix M has the dynamic representation shown in Equation (4.2.3) (see Equation (4.2.1) for comparison), and it changes during the deliberation time.
M(t) = \begin{bmatrix} S_A(t) & R_A(t) \\ S_B(t) & R_B(t) \end{bmatrix}    (4.2.3)
4.2.2.2 DFT Extension for Dynamic Changes of Attention Weights
As the environment changes dynamically, the attention weight of a decision
maker may also change. This is reflected in the change of the weight vector W(t) over time in DFT. Regarding W(t), Roe et al. (2001) assumed that the weights are identically
and independently distributed (iid) over time, where the dynamic change of the
environment is not considered in the weight vector. Similarly, Diederich (1997) used a
Markov process to represent the switch between sub-processes which are individually iid.
In the Markov process, the moment at which the sub-processes switch over can be
considered as the moment when the environment related to the decision-making changes.
However, the changes of the sub-process in the Markov process are driven by probability alone, not by the current environment. For this reason, the Markov process is
insufficient to cope with the dynamic environment. Therefore, it is necessary to employ a
richer technique accounting for the dynamic environment.
4.2.2.3 Bayesian Belief Network-based Extension
In this research, a Bayesian belief network (BBN) is employed to incorporate the extensions discussed in Sections 4.2.2.1 and 4.2.2.2. In other words, the BBN enables the extended DFT (EDFT) to model 1) the change of evaluation on the options and 2) the change of human attention along with the dynamically changing environment (see Figure 4.2). Based on the given information and the previous history, the BBN infers the distribution of the value matrix M(t) and the weight vector W(t). More details about BBN are discussed below.
Figure 4.2: BBN-based EDFT for dynamically changing environment
BBN (see Figure 4.3) is a cause and effect network that captures the probabilistic
relationship, as well as historical information. More details on BBN are discussed in
Section 4.3. Figure 4.3 depicts an instance of BBN for the stock market situation. The
directed links represent the cause and effect relationship.
For example, a decision
maker’s ‘Investment history’ (the return on investment in the past) affects his/her
attention on ‘Investment safety’ in Figure 4.3. In the real stock market, the attention on
the ‘Investment safety’ attribute of a decision maker may be affected by numerous factors.
In this research, however, we consider a limited set of affecting factors, including the past
investment history (Investment history) and the increment of stock index (Index
increment). It is noted that more factors can be considered in a similar manner. In Figure 4.3, each node represents an event that can occur. The distribution of a node can be either discrete or continuous. In this research, we used discretely distributed nodes for ease of implementation; thus each node has a countable set of states. If there is a link from
node A to B, it is said that B is a child of A and A is a parent of B. In Figure 4.3,
‘Investment history’ and ‘Index increment’ are the parents and ‘Investment safety’ is the
child. In order to build a complete BBN, the probability distribution of each parent and
the conditional probability between the states of each parent and child are necessary.
These prior probabilities in BBN are attained (trained) through the human-in-the-loop
experiment (see Section 4.2.5 for more details) so that the distribution change of the
weight vector (W(t)) and the change of M(t) matrix can mimic the real human’s behavior.
Figure 4.3: Bayesian belief network for stock investment
4.2.3 Significance of the Proposed Extensions
This section demonstrates the significance of the proposed extensions using a conceptual simulation (stock investment) in Matlab®, comparing results between DFT and EDFT. In the considered simulation, the time unit is ‘day’ and the time step h is defined as one day. The deliberation time is fixed at 100 days, and the initial preference P(0) is set to the zero vector. Figure 4.4 shows the two options of the stock market example, where each option is characterized by ‘Investment safety’ and ‘Return’. As mentioned before, it is noted that ‘Investment safety’ and ‘Return’ are the perceptions of each individual, not the true, objective values. In Figure 4.4, the X axis represents investment safety, and a high value of ‘Investment safety’ means low investment risk. Similarly, the Y axis represents the return from the investment. The ranges of both attribute values are set from 0 to 5.
Figure 4.4: A graphical depiction of two options in the stock market example
In this example, options A and B are assumed not to be close to each other, and therefore they do not have a large interaction between them. Thus, the values of the off-diagonal elements in the S matrix are relatively small. The memory effect is set to decay slowly by assigning a high value to the diagonal elements. Considering these, we can define S as follows:
S = \begin{bmatrix} 0.9 & -0.01 \\ -0.01 & 0.9 \end{bmatrix}
4.2.3.1 Effect of Change in Value Matrix (M(t))
In this section, we investigate the impact of a change of M(t) on the preference states in DFT. Table 4.2 shows that the values of M(t+1) change at day 51, and Equations (4.2.4) and (4.2.5) depict the corresponding M(t+1) and the preference model, respectively. For the weight vector W(t), we used a simple Bernoulli process: it is assumed that the weight changes in an all-or-none manner from one attribute to another with some probabilities. The probabilities considered in our simulation are Pr(W_Investment-safety = 1) = 0.45 and Pr(W_Return = 1) = 0.43; with probability 0.12, neither of the two attributes receives the weight.
Table 4.2: The value matrix M(t+1) used in the simulation

Option | Investment safety (Day 0) | Return (Day 0) | Investment safety (Day 51) | Return (Day 51)
A      | 3.5                       | 1.3            | 3.4                        | 1.3
B      | 1.3                       | 3.5            | 1.3                        | 3.5
M(t + 1) = \begin{cases} \begin{bmatrix} 3.5 & 1.3 \\ 1.3 & 3.5 \end{bmatrix} & \text{if } t \le 50 \\ \begin{bmatrix} 3.4 & 1.3 \\ 1.3 & 3.5 \end{bmatrix} & \text{if } 50 < t \end{cases}    (4.2.4)

P(t + 1) = \begin{bmatrix} p_1(t+1) \\ p_2(t+1) \end{bmatrix} = \begin{cases} \begin{bmatrix} 0.9 & -0.01 \\ -0.01 & 0.9 \end{bmatrix}\begin{bmatrix} p_1(t) \\ p_2(t) \end{bmatrix} + \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 3.5 & 1.3 \\ 1.3 & 3.5 \end{bmatrix}\begin{bmatrix} w_1(t+1) \\ w_2(t+1) \end{bmatrix} & \text{if } t \le 50 \\ \begin{bmatrix} 0.9 & -0.01 \\ -0.01 & 0.9 \end{bmatrix}\begin{bmatrix} p_1(t) \\ p_2(t) \end{bmatrix} + \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 3.4 & 1.3 \\ 1.3 & 3.5 \end{bmatrix}\begin{bmatrix} w_1(t+1) \\ w_2(t+1) \end{bmatrix} & \text{if } 50 < t \end{cases}    (4.2.5)
The preference model shown in Equation (4.2.5) has been simulated until the 100th day for 2000 independent replications. At each time t, the option that has the higher preference is considered to be chosen. Then, we can calculate the choice probability of each option by counting the frequency with which it is selected (see Figure 4.5). In Figure 4.5, the choice probabilities over time are compared between the case with dynamic M(t) (see Equation (4.2.5)) and the case with static M(t) (M(t) does not change at day 51). In the figure, the dotted lines, indicated with “+” and “o” symbols, represent the choice probabilities for options A and B for the static M(t), respectively. Similarly, the solid lines represent the choice probabilities for options A and B for the dynamic M(t), respectively. As shown in Figure 4.5, the outcomes at day 100 (when the decision is made) differ between the two cases. While the probability for option A is higher than that of option B in the static case, the result is the opposite in the dynamic case. Therefore, it is found that even a slight change in the value of M (m_11 changes from 3.5 to 3.4) can produce different results, and the choice probability is very sensitive to the values of matrix M. This supports the motivation and importance of the proposed research on considering the dynamic case.
Figure 4.5: Comparison of choice probabilities between static M(t) and dynamic M(t)
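The experiment behind Figure 4.5 can be reproduced in outline with the following Matlab sketch of Equation (4.2.5): the value matrix switches at day 51, the attention weight follows the stated process (0.45 / 0.43 / 0.12 neither), and the choice probability is estimated by counting over independent replications. The plotting details are illustrative.

    % Sketch of the choice-probability experiment of Section 4.2.3.1 (Equation (4.2.5)).
    S  = [0.9 -0.01; -0.01 0.9];   C  = [1 -1; -1 1];
    M1 = [3.5 1.3; 1.3 3.5];       M2 = [3.4 1.3; 1.3 3.5];   % before / after day 51
    nDays = 100;  nReps = 2000;
    chooseA = zeros(1, nDays);     % replications in which option A is preferred
    for r = 1:nReps
        P = [0; 0];
        for t = 1:nDays
            u = rand;              % attention: 0.45 safety, 0.43 return, 0.12 neither
            if u < 0.45
                W = [1; 0];
            elseif u < 0.88
                W = [0; 1];
            else
                W = [0; 0];
            end
            if t <= 50
                M = M1;
            else
                M = M2;
            end
            P = S * P + C * M * W;
            chooseA(t) = chooseA(t) + (P(1) > P(2));
        end
    end
    probA = chooseA / nReps;       % choice probability of option A over time
    plot(1:nDays, probA, 1:nDays, 1 - probA);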
4.2.3.2 Effect of Change in Weight Vector (W(t))
In the simulation considered in this section, the M(t) matrix is static, but W(t)
probabilities change (see Equation (4.2.6)) during the decision deliberation. Figure 4.6
depicts the effect of the W(t) change on the choice probability. The same notations used
in Figure 4.5 are applicable to Figure 4.6. As shown in Figure 4.6, the effect of the
change of W(t) is even bigger than the effect of the change of M(t) (see Figure 4.5).
\Pr(W_{\text{Investment safety}} = 1) = \begin{cases} 0.45 & \text{if } t \le 50 \\ 0.43 & \text{if } 50 < t \end{cases}, \qquad \Pr(W_{\text{Return}} = 1) = \begin{cases} 0.43 & \text{if } t \le 50 \\ 0.45 & \text{if } 50 < t \end{cases}    (4.2.6)
Figure 4.6: Comparison of choice probabilities between static W(t) and dynamic W(t)
4.2.3.3 Combined Effect of Changes in M(t) and W(t)
In the simulation considered in this section, the M(t) matrix is dynamic (see Equation (4.2.5)) and the W(t) probabilities change (see Equation (4.2.6)) during the decision deliberation. Figure 4.7 depicts the choice probabilities for the considered simulation, where the combined effect both inverts and amplifies the change in the choice probabilities. Again, the significance of changes in M(t) and W(t) has been shown in Sections 4.2.3.1, 4.2.3.2, and 4.2.3.3, and these results support the motivation and importance of
the proposed research on considering the dynamic case.
Figure 4.7: Comparison of choice probabilities between static M(t) and W(t) and dynamic
M(t) and W(t)
4.2.4 Four Theorems Regarding Expected Preference Values
The expected preference values (over multiple replications) for each option at a given time give us an idea of which option will be chosen more frequently, before actually deploying DFT, provided the difference between the expected values of the options is large. In this section, we introduce four important theorems about the expected preference values for the two-option decision-making problem. These theorems greatly enhance the usability of DFT because they provide the minimum number of time steps needed for the preference values to stabilize before we actually run the simulation (evolve the DFT).
Theorem 1. In the two-option decision-making problem of the original DFT, the expected value of the preference is
E(P(nh)) = \frac{1 - D^n}{1 - D} E(v_1(h)) \begin{bmatrix} 1 \\ -1 \end{bmatrix},
where D = s_{11} - s_{12} and E(v_1(h)) = E(w_1(h))(m_{11} - m_{21}) + E(w_2(h))(m_{12} - m_{22}).
Furthermore, E(P(nh)) = \frac{1}{1 - D} E(v_1(h)) \begin{bmatrix} 1 \\ -1 \end{bmatrix} as n → ∞.
Proof: Suppose the valence is V(t) = CMW(t). Then P(t + h) = SP(t) + V(t + h). By the assumption of DFT, we get S = \begin{bmatrix} s_1 & s_2 \\ s_2 & s_1 \end{bmatrix} and C = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} in the two-option case. In most applications, the initial preference state is P(0) = 0 (Busemeyer and Diederich 2002). Thus we get p_1(h) = s_1 p_1(0) + s_2 p_2(0) + v_1(h) = v_1(h). So, the expected value of p_1(h) is E(p_1(h)) = E(v_1(h)). From the definition of V(t), E(v_1(h)) = E(w_1(h)(m_{11} - m_{21}) + w_2(h)(m_{12} - m_{22})) = E(w_1(h))(m_{11} - m_{21}) + E(w_2(h))(m_{12} - m_{22}). Since the weight vector W(t) changes over time according to a stationary stochastic process (see Section 4.2.1),
E(v_1(h)) = E(v_1(ih)) = E(p_1(h)) for all i > 1.    (4.2.7)
Now E(p_1(2h)) = E(s_1 p_1(h) + s_2 p_2(h) + v_1(2h)) = (s_1 - s_2) E(p_1(h)) + E(v_1(2h)). Let D = s_1 - s_2; then from Equation (4.2.7), E(p_1(2h)) = (s_1 - s_2 + 1) E(p_1(h)) = (D + 1) E(p_1(h)). Similarly, E(p_1(3h)) = E(s_1 p_1(2h) + s_2 p_2(2h) + v_1(3h)) = (s_1 - s_2) E(p_1(2h)) + E(v_1(3h)) = D E(p_1(2h)) + E(p_1(h)) = D(D + 1) E(p_1(h)) + E(p_1(h)) = (D^2 + D + 1) E(p_1(h)). By induction,
E(p_1((n + 1)h)) = D E(p_1(nh)) + E(p_1(h)).    (4.2.8)
Thus E(p_1(nh)) = \sum_{i=0}^{n-1} D^i E(p_1(h)). Since 0 < D < 1 from the assumption on S,
E(p_1(nh)) = \frac{1 - D^n}{1 - D} E(p_1(h)) \rightarrow \frac{1}{1 - D} E(p_1(h)) as n → ∞.    (4.2.9)
From Equation (4.2.7) and p_1(nh) + p_2(nh) = 0, the expected preference of DFT is
E(P(nh)) = \frac{1 - D^n}{1 - D} E(v_1(h)) \begin{bmatrix} 1 \\ -1 \end{bmatrix}. ∎
Theorem 1 tells us the following. In the two-choice problem, the sign of E(v_1(h)) alone determines which option has the larger expected preference value. For example, if E(v_1(h)) > 0, then E(p_1(t)) > E(p_2(t)); and if E(v_1(h)) < 0, then E(p_1(t)) < E(p_2(t)). However, this does not necessarily mean that the option with the higher expected preference has the higher choice probability, which is calculated by counting the number of cases in which it holds the higher preference. The option with the higher choice probability is determined by the median of the sampled preferences: the option having the higher median will have the higher choice probability. We can also observe that the expected preference value converges as n increases. This property leads us to the following Theorem 2.
Theorem 2. The difference between the expected preference value E(P(nh)) and its converging value becomes less than ε > 0 after n time steps, where n = \frac{\log k}{\log D} and k = \frac{1 - D}{E(v_1(h))} ε.
Proof: Suppose n is large enough so that the difference is less than ε. Then from Equation (4.2.9) we obtain
\frac{1}{1 - D} E(p_1(h)) - \frac{1 - D^n}{1 - D} E(p_1(h)) \le ε.
Thus \frac{D^n}{1 - D} E(p_1(h)) \le ε. Finally we have D^n \le \frac{1 - D}{E(p_1(h))} ε. Let k = \frac{1 - D}{E(v_1(h))} ε. By taking the logarithm on both sides we get n \ge \frac{\log k}{\log D}. ∎
From Theorems 1 and 2, we can see that after some time step n the expected preference values become steady. This also implies that the choice probability of each option will not change after some time step n. Thus we can attain a steady preference value after n steps of evolution. To illustrate the above discussion, we consider the stock example, where
S = \begin{bmatrix} 0.9 & -0.01 \\ -0.01 & 0.9 \end{bmatrix}, \quad M = \begin{bmatrix} 3.5 & 1.3 \\ 1.3 & 3.5 \end{bmatrix},
Pr(W_Investment-safety = 1) = 0.53, and Pr(W_Return = 1) = 0.47. From Theorem 1, D = 0.9 − (−0.01) = 0.91 and E(v_1(h)) = 0.53(3.5 − 1.3) + 0.47(1.3 − 3.5) = 0.132. Thus, the expected preference of stock A converges to 1.4667. In Theorem 2, if we set ε = 0.01, then k = 0.0068 and n = 53. To validate these results, we executed the Matlab® simulation (see Appendix B) for 10,000 replications, and the results are depicted in Figure 4.8. As shown in Figure 4.8, after time step 53 the expected value reaches the ε-neighborhood of its convergence value and the choice probabilities are stabilized. Therefore, the results from the theorems are valid.
Figure 4.8: Steady choice probability and time steps to the convergence of the expected
preference values (simulation results)
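The closed-form quantities in this example can be recomputed directly; the short Matlab sketch below evaluates D, E(v_1(h)), the limiting expected preference from Theorem 1, and the stabilization step n from Theorem 2 for ε = 0.01, reproducing the values 0.91, 0.132, 1.4667, and 53 quoted above.

    % Recomputing the Theorem 1 / Theorem 2 quantities for the stock example.
    S = [0.9 -0.01; -0.01 0.9];
    M = [3.5 1.3; 1.3 3.5];
    w = [0.53; 0.47];                                    % attention probabilities E(W)
    D   = S(1,1) - S(1,2);                               % D = s11 - s12 = 0.91
    Ev1 = w(1)*(M(1,1)-M(2,1)) + w(2)*(M(1,2)-M(2,2));   % E(v1(h)) = 0.132
    limitPref = Ev1 / (1 - D);                           % Theorem 1 limit, about 1.4667
    eps0 = 0.01;
    k = (1 - D) * eps0 / Ev1;                            % Theorem 2: k = 0.0068
    n = ceil(log(k) / log(D));                           % n = 53 steps to the eps-neighborhood
    fprintf('D = %.2f, E(v1) = %.3f, limit = %.4f, n = %d\n', D, Ev1, limitPref, n);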
Now let us consider EDFT. In EDFT both the weight vector W(t) and the value matrix M(t) change over time. Let n_i be the number of time steps after the (i−1)th change of V(t), and let V^i(t) be the ith changed valence. Then the following Theorem 3 can be derived from Theorem 1.
Theorem 3. The expected preference of EDFT is
E\left(P\left(\sum_{i=1}^{q} n_i h\right)\right) = \left[ \sum_{i=1}^{q-1} D^{\sum_{j=i+1}^{q} n_j} \frac{1 - D^{n_i}}{1 - D} E(v_1^i(t)) + \frac{1 - D^{n_q}}{1 - D} E(v_1^q(t)) \right] \begin{bmatrix} 1 \\ -1 \end{bmatrix},
where q is the total number of changes of V(t), D = s_1 − s_2, n_i is the number of time steps after the (i−1)th change of V(t), and V^i(t) is the ith valence for i > 1.
Furthermore, E\left(P\left(\sum_{i=1}^{q} n_i h\right)\right) = \frac{1}{1 - D} E(v_1^q(t)) \begin{bmatrix} 1 \\ -1 \end{bmatrix} as n_q → ∞.
Proof: From Theorem 1 we get E(p_1(n_1 h)) = \frac{1 - D^{n_1}}{1 - D} E(v_1^1(t)). However, since V(t) changes at n_1 h, from Equation (4.2.8) we get E(p_1((n_1 + 1)h)) = D \sum_{i=0}^{n_1 - 1} D^i E(v_1^1(t)) + E(v_1^2(t)) and E(p_1((n_1 + 2)h)) = D^2 \sum_{i=0}^{n_1 - 1} D^i E(v_1^1(t)) + \sum_{i=0}^{1} D^i E(v_1^2(t)).
Thus E(p_1((n_1 + n_2)h)) = D^{n_2} \sum_{i=0}^{n_1 - 1} D^i E(v_1^1(t)) + \sum_{i=0}^{n_2 - 1} D^i E(v_1^2(t)) = D^{n_2} \frac{1 - D^{n_1}}{1 - D} E(v_1^1(t)) + \frac{1 - D^{n_2}}{1 - D} E(v_1^2(t)). By induction,
E(p_1((n_1 + n_2 + ⋯ + n_q)h)) = \sum_{i=1}^{q-1} D^{\sum_{j=i+1}^{q} n_j} \frac{1 - D^{n_i}}{1 - D} E(v_1^i(t)) + \frac{1 - D^{n_q}}{1 - D} E(v_1^q(t)).
Thus the expected preference of EDFT is
E\left(P\left(\sum_{i=1}^{q} n_i h\right)\right) = \left[ \sum_{i=1}^{q-1} D^{\sum_{j=i+1}^{q} n_j} \frac{1 - D^{n_i}}{1 - D} E(v_1^i(t)) + \frac{1 - D^{n_q}}{1 - D} E(v_1^q(t)) \right] \begin{bmatrix} 1 \\ -1 \end{bmatrix},
where D = s_1 − s_2, n_i is the number of time steps after the (i−1)th change of V(t), and V^i(t) is the ith valence for i > 1. ∎
Theorem 3 induces the following Theorem 4.
Theorem 4. If EDFT evolves long enough to converge whenever the weight vector W(t) or the value matrix M(t) changes, then the difference between the expected preference value E\left(P\left(\sum_{i=1}^{q} n_i h\right)\right) of EDFT and its converging value becomes less than ε > 0 after n_q time steps since the last change of the weight vector W(t) or the value matrix M(t), where
n_q = \frac{\log k}{\log D} \quad \text{and} \quad k = \frac{1 - D}{\left| E(v_1^q(h)) - E(v_1^{q-1}(h)) \right|} ε.
Proof: Suppose n_q is large enough so that the difference is less than ε. Then from Theorem 3 we have
ε \ge \left| \frac{1}{1 - D} E(v_1^q(h)) - \left[ \sum_{i=1}^{q-1} D^{\sum_{j=i+1}^{q} n_j} \frac{1 - D^{n_i}}{1 - D} E(v_1^i(t)) + \frac{1 - D^{n_q}}{1 - D} E(v_1^q(t)) \right] \right| = \left| \frac{D^{n_q}}{1 - D} E(v_1^q(h)) - \sum_{i=1}^{q-1} D^{\sum_{j=i+1}^{q} n_j} \frac{1 - D^{n_i}}{1 - D} E(v_1^i(t)) \right|.
Since we assume that EDFT iterates enough to converge with each given V(t), D^{n_i} → 0 for i < q. Thus
ε \ge \frac{D^{n_q}}{1 - D} \left| E(v_1^q(t)) - E(v_1^{q-1}(t)) \right|.
Finally we have D^{n_q} \le \frac{1 - D}{\left| E(v_1^q(t)) - E(v_1^{q-1}(t)) \right|} ε. Let k = \frac{1 - D}{\left| E(v_1^q(t)) - E(v_1^{q-1}(t)) \right|} ε. By taking the logarithm on both sides we get n_q \ge \frac{\log k}{\log D}. ∎
Theorems 3 and 4 show that after some number of time steps n_i the expected preference values are also stabilized in EDFT, and the choice probabilities of each option remain the same after those time steps. Here, we consider the stock example used earlier in this section (see Figure 4.8), where the value matrix changes from
M(t) = \begin{bmatrix} 3.5 & 1.3 \\ 1.3 & 3.5 \end{bmatrix} \text{ to } M(t) = \begin{bmatrix} 3.4 & 1.3 \\ 1.3 & 3.5 \end{bmatrix}
and the probabilities of the weight vector change from Pr(W_Investment-safety = 1) = 0.53 and Pr(W_Return = 1) = 0.47 to Pr(W_Investment-safety = 1) = 0.47 and Pr(W_Return = 1) = 0.53. Then we can calculate E(v_1^1(h)) = 0.132 and E(v_1^2(h)) = 0.47(3.4 − 1.3) + 0.53(1.3 − 3.5) = −0.179. By Theorem 3, the expected preference of stock A converges to −1.988. If we set ε = 0.01 in Theorem 4, then k = 0.0029 and n_2 = 62. To validate these results, we executed the Matlab® simulation for 10,000 replications, and the results are depicted in Figure 4.9. As shown in Figure 4.9, at time step 53 the expected preference value E(P(A)) reaches the ε-neighborhood of its first convergence value (1.4667). Then, it continues to evolve with the changed M(t) and W(t) for the next 62 time steps until time step 115, when the expected preference value E(P(A)) reaches the ε-neighborhood of its second convergence value (−1.988) and the choice probabilities are stabilized.
Figure 4.9: Steady choice probability and time steps to the convergence of the expected
preference values at dynamically-changing environment (simulation results)
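The second convergence in Figure 4.9 can be checked in the same way: assuming the changed M(t) and W(t) described above, the Matlab sketch below recomputes E(v_1^2(h)), the new limiting value from Theorem 3, and the additional steps n_2 from Theorem 4, reproducing −0.179, −1.988, and 62.

    % Recomputing the Theorem 3 / Theorem 4 quantities after the change of M(t) and W(t).
    D    = 0.91;
    Ev1a = 0.53*(3.5 - 1.3) + 0.47*(1.3 - 3.5);   % E(v1^1(h)) =  0.132 (before the change)
    Ev1b = 0.47*(3.4 - 1.3) + 0.53*(1.3 - 3.5);   % E(v1^2(h)) = -0.179 (after the change)
    limit2 = Ev1b / (1 - D);                      % Theorem 3 limit, about -1.988
    eps0 = 0.01;
    k2 = (1 - D) * eps0 / abs(Ev1b - Ev1a);       % Theorem 4: k = 0.0029
    n2 = ceil(log(k2) / log(D));                  % 62 additional time steps
    fprintf('E(v1^2) = %.3f, limit = %.3f, n2 = %d\n', Ev1b, limit2, n2);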
4.2.5 Validation via Human-in-the-loop Experiment
In Section 4.2.2, we assumed that both the value matrix M(t) and the attention weight vector W(t) may change during the deliberation time in response to the dynamic environment. To test these assumptions, we designed and conducted a human-in-the-loop experiment. More specifically, the goal of the experiment is to find 1) how the evaluations of each option (M(t)) change, and 2) how the attention weights of the decision maker (W(t)) change. The experimental results are also used to characterize the BBN (conditional probabilities between the nodes).
4.2.5.1 Human-in-the-loop Experiment Details
Software has been developed to allow a human-in-the-loop experiment involving
a virtual stock market, where a daily stock index and current and historical price of each
stock option are considered. The daily stock index and price are generated randomly
from normal distributions. The stock index and price have been arranged so that they are
positively correlated. This is the same example used in the previous sections. Figure
4.10 is the screen capture of the virtual stock market program used in the experiment.
Only two stock options (A and B) are considered in the experiment. Each experiment is
continued until the subject makes 10 decisions. At each decision in the experiment, the
subject is asked to choose a stock option within the decision deliberation time (10 time
units). In this work, we adopted the fixed stopping time rule which forces the decision
maker to make a decision within the given time. To this end, at the beginning of each
time unit the subject can choose one from following three options: 1) ‘buy stock A’ now,
2) ‘buy stock B’ now or 3) ‘buy later’ (postpone the decision to the next time unit). At
each time unit, the subject is also asked to express his/her perception on 1) the evaluation
of each option on two attributes - ‘Investment safety’ and ‘Return’ (M(t)) and 2) the
weights of attention on each attribute (W(t)). Because the other weight value is decreased whenever one weight value is increased, the sum of the attention weights over the attributes is kept at 1 at all times. The collected data are used to attain the conditional probabilities and the distributions in the BBN (see Section 4.2.2.3). Based on the evaluation and weight values received from the subject, the score of each stock option is calculated as the product M(t)⋅W(t) and is shown to the subject. It is believed that showing these scores helps draw the subject’s attention to the experiment and also helps them choose an option. If the subject chooses alternative 1) or 2), the experiment continues with the next decision in a different environment (stock market index value, prices of
stocks A and B). If the subject chooses alternative 3), the experiment proceeds to the
next time period and the market environment evolves.
At the end of the given deliberation time (the beginning of the 10th time unit), the subject is forced to make a decision. In this case the decision maker has only two alternatives – buy stock A or B (he/she cannot postpone the decision anymore). When the subject makes a decision (e.g., at the 5th time unit), the stock market continues to evolve until the 10th time unit, when the stock is sold and the monetary margin between the buy price (at the 5th time unit) and the sell price (at the 10th time unit) of the option is recorded as the reward for the subject. Each experiment continues until the subject makes 10 decisions (the 100th time unit), during which the rewards are accumulated as ‘Investment history’. We repeat this experiment 10 times.
Figure 4.10: Screen capture of the virtual stock trading software used in the experiment
4.2.5.2 Experimental Results and Analyses
From the human-in-the-loop experiment involving one subject with 10
experiments (see Section 4.2.5.1), we observed that the value matrix M(t) changed in the
dynamically changing environment. For example, when the stock price increased significantly, the evaluation (perception) of that stock (option) on the ‘Return’ attribute increased. In the experiment, the stock price increased by around 17% at the moment when the evaluation value of ‘Return’ increased; note that, on average, the stock price changed within 0.1% of its value throughout the experiment. We also observed that the attention on the attributes ‘Investment safety’ and ‘Return’ (W(t)) was affected by the ‘Index increment’, which is the difference between the index values of the previous and current stock markets. Figure 4.11 depicts an approximately linear relationship between the weight increment on the ‘Return’ attribute and the ‘Index increment’. The correlation between them is 0.68. Since the sum of the ‘Investment safety’ and ‘Return’ weights is set to 1, there is also an approximately negative linear relationship between the weight increment on ‘Investment safety’ and the ‘Index increment’.
Figure 4.11: Relationship between ‘Return’ weight increment and ‘Index increment’
4.2.5.3 Validation and Comparison of EDFT with DFT and Human Decisions
As discussed in Section 4.2.2.3 and Figure 4.3, the proposed EDFT model uses
BBN to infer the changes of the value matrix M(t) and the weight vector W(t). This
section discusses BBN which has been built from the human-in-the-loop experiment data
(see Section 4.2.5.1). As discussed in Section 4.2.5.2, we observed the relationships
among several factors, including ‘Index increment’, ‘Investment history’, ‘Return’, and
‘Investment Safety’. Based on this analysis, we have constructed a BBN representing the
decision behavior of the subject who participated in the experiment (see Figure 4.12).
For each event (node) of BBN, we divided the range of collected data into 3 discrete
states – high, medium, and low. Then, we calculated the conditional distribution of each
descendant node given the state of the parent nodes. Table 4.3 depicts the conditional
probability between nodes ‘Index increment’, ‘Investment history’ and ‘Investment
safety’.
It is the conditional probability distribution of three discrete states in the
‘Investment safety’ node given the states of the parent nodes (‘Index increment’ and
‘Investment history’).
Other conditional probabilities are calculated in a similar manner. As mentioned before, the BBN shown in Figure 4.12 was built based on the
experiment in Section 4.2.5.1. Thus, it can be used to infer the value matrix M(t) and the
weight vector W(t) only for the subject who had participated in the experiment. Different
BBNs will need to be built and used to infer M(t) and W(t) for different individuals.
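Conditional probability tables such as Table 4.3 are obtained by discretizing each recorded variable into three states and counting co-occurrences. The Matlab sketch below illustrates that counting step on randomly generated stand-in data, since the actual experiment logs are not reproduced here; the variable names mirror the BBN nodes of Figure 4.12.

    % Sketch of estimating P(child | parent1, parent2) from discretized observations
    % (1 = Low, 2 = Medium, 3 = High). The data are random stand-ins for the logs.
    nObs    = 500;
    history = randi(3, nObs, 1);     % discretized 'Investment history'
    index   = randi(3, nObs, 1);     % discretized 'Index increment'
    safety  = randi(3, nObs, 1);     % discretized 'Investment safety' (child node)
    counts = zeros(3, 3, 3);         % counts(history state, index state, safety state)
    for i = 1:nObs
        counts(history(i), index(i), safety(i)) = counts(history(i), index(i), safety(i)) + 1;
    end
    cpt = counts ./ max(sum(counts, 3), 1);   % normalize over the child states
    squeeze(cpt(3, 2, :))                     % e.g., P(safety | history = High, index = Medium)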
Figure 4.12: Bayesian belief network constructed from the stock trading experiment
Table 4.3: Conditional probability P(Investment safety | Investment history, Index increment)

Investment history | Index increment | Investment safety: High | Medium | Low
High   | High   | 0.2   | 0.4   | 0.4
High   | Medium | 0.813 | 0.187 | 0.0
High   | Low    | 1.0   | 0.0   | 0.0
Medium | High   | 0.125 | 0.375 | 0.5
Medium | Medium | 0.306 | 0.625 | 0.069
Medium | Low    | 0.383 | 0.51  | 0.107
Low    | High   | 0.08  | 0.32  | 0.6
Low    | Medium | 0.131 | 0.481 | 0.388
Low    | Low    | 0.167 | 0.533 | 0.3
In order to validate the constructed BBN (which is used as part of the EDFT), we
conducted one additional human-in-the-loop experiment involving 10 decisions (100 time
units) after constructing the model. In this experiment, at each time unit the same
environment (stock market index value, prices of stocks A and B) is presented to the
human subject, the DFT simulation model, and the EDFT simulation model. All three of them are asked to choose a stock option (make a decision) within the decision deliberation time (10 time units). In the simulations of DFT and EDFT, the initial preference P(0) is set to 0 and S is set to
S = \begin{bmatrix} 0.2 & -0.01 \\ -0.01 & 0.2 \end{bmatrix},
since this S gives the choices most similar to those of the human subject. Also, based on Theorems 2 and 4 in Section 4.2.4, we used ε = 0.01 to attain the steady choice probability. Simulations of the DFT and EDFT models were replicated 10,000 times, based on which the choice probability
has been calculated. It is noted that the time at which the human subject makes a
decision may be different from those of the DFT or EDFT models. For the DFT and
EDFT simulations, the decision-making time has been obtained from Theorems 3 and 4
(in Section 4.2.4) as they provide us with the number of estimated time steps (n) required
to obtain the stabilized choice probabilities. For example, Theorems 3 and 4 suggested 1
and 12 as the required time steps for the DFT and EDFT in the first decision, respectively.
These required time steps are calculated by Theorems 3 and 4 based on the weight vector
W(t) or the value matrix M(t) inferred from the BBN. In the 1st decision of the additional
experiment, the human decision maker changed the weight vector W(t) or the value
matrix M(t) 4 times. Then, based on the same information about the stock index and
prices when the decision maker changes W(t) or M(t), the BBN inferred the weight vector
W(t) or the value matrix M(t). Using these dynamically changed W(t) and M(t), EDFT
evolved until its choice probabilities were stabilized. Figure 4.13 depicts the choice
probability over n time steps for the first decision of the actual EDFT simulation, where
we can observe that the choice probability is indeed stabilized at 12 time steps (as suggested by Theorem 4).
Figure 4.13: Stabilization of choice probability over n time steps (1st decision in EDFT)
Table 4.4 shows the performance of DFT and EDFT in terms of the probability of choosing the same option as the actual choice of the human subject for the 10 decisions considered in the experiment. For example, in decisions 4, 6, 8, 9, and 10, DFT predicts the opposite option from the one selected by the subject, whereas EDFT predicts the same option. In decision 7, however, the EDFT model changed neither the matrix M(t) nor the vector W(t) during the deliberation, so both DFT and EDFT predicted an incorrect option. Overall, DFT predicts the opposite (incorrect) option in 6 cases out of 10, whereas EDFT does so in only one case. Therefore, we can see that the performance of EDFT is better than that of DFT. These results demonstrate the effectiveness of the proposed EDFT under the dynamically changing environment. Figure 4.14 depicts the probability of predicting the correct option (the same option as the human subject) for DFT (solid line) and EDFT (dotted line) over the considered 10,000 replications of the simulations. As shown in Figure 4.14, EDFT gives a higher probability of selecting the correct option in most of the decisions in the experiment; in 9 out of 10 cases, it gives a probability greater than 0.5. Again, the simulation results validate the accuracy of the proposed EDFT model in the dynamically changing environment.
Table 4.4: Choice probabilities of each model in 10 simulated experiments

Decision | Model | Pr(A)  | Pr(B)  | Actual choice
1        | DFT   | 0.3934 | 0.6066 | B
1        | EDFT  | 0.3724 | 0.6276 | B
2        | DFT   | 0.4547 | 0.5453 | B
2        | EDFT  | 0.2471 | 0.7529 | B
3        | DFT   | 0.3995 | 0.6005 | B
3        | EDFT  | 0.4635 | 0.5365 | B
4        | DFT   | 0.5984 | 0.4016 | B
4        | EDFT  | 0.4386 | 0.5614 | B
5        | DFT   | 0.6124 | 0.3876 | A
5        | EDFT  | 0.8739 | 0.1261 | A
6        | DFT   | 0.6534 | 0.3466 | B
6        | EDFT  | 0.208  | 0.792  | B
7        | DFT   | 0.5344 | 0.4656 | B
7        | EDFT  | 0.5344 | 0.4656 | B
8        | DFT   | 0.5015 | 0.4985 | B
8        | EDFT  | 0.2629 | 0.7371 | B
9        | DFT   | 0.4984 | 0.5016 | A
9        | EDFT  | 0.7298 | 0.2702 | A
10       | DFT   | 0.4991 | 0.5009 | A
10       | EDFT  | 0.7801 | 0.2199 | A
Figure 4.14: Probability of predicting the correct option for DFT and EDFT in the considered 10,000 replications of simulations
4.3 Bayesian Belief Network (BBN)
As mentioned previously, in this research, a BBN is employed to represent the
perceptual processor (see Figure 3.2) of the proposed human behavior model in the
dynamically changing environment. The BBN is a cause-and-effect, directed acyclic network, where nodes represent the variables to be considered and arc directions encode the conditional dependencies and cause-effect relationships between variables. By using a BBN, we can capture probabilistic relationships between variables, as well as historical information, by incorporating prior and conditional probabilities that can be used to infer posterior probabilities using Bayes’ theorem. The major advantage of a BBN as a perceptual processor is its ability and flexibility to handle uncertain and dynamic environments. In other words, even if not all the information considered by the BBN is currently available, the BBN can still give a convincing probabilistic answer based on historical data. Furthermore, the Bayesian framework for probabilistic inference provides a way to understand inductive problem solving, and perhaps problem solving in the human mind more generally; indeed, Bayesian models have become prominent over a broad spectrum of cognitive science (Griffiths et al. 2008). For this reason, we have adopted the BBN for the perceptual processor in the Belief module of the BDI framework (Lee et al. 2008).
Figure 4.15: BBN used for the perceptual processor of BDI agent under emergency
evacuation scenario
Figure 4.15 depicts the BBN used to infer the beliefs of an agent in response to the
evacuation scenario (described in Section 3.1). The beliefs that the BBN infers from
environmental information (e.g., smoke, fire, police, crowd, and distance) consist of 1)
the perceived values of the attributes (risk and evacuation time) for the option under
consideration (i.e., a path from an intersection) and 2) the weights accorded to each
attribute (risk vs. evacuation time). The weights associated with each attribute at time t,

W(t) = \begin{bmatrix} w_{risk}(t) \\ w_{time}(t) \end{bmatrix},

can be obtained from the 'RiskWeight' node of the BBN in Figure 4.15 by defining
w_{risk}(t) = 'RiskWeight' and w_{time}(t) = 1 - 'RiskWeight'. Similarly, the perceived
value of each attribute of each available option at time t,

M(t) = \begin{bmatrix} m_{risk}^{1}(t) & m_{time}^{1}(t) \\ m_{risk}^{2}(t) & m_{time}^{2}(t) \\ \vdots & \vdots \end{bmatrix},

can be obtained from the 'Risk' and 'EvacuationTime' nodes of the BBN by assigning
m_{risk}^{i}(t) = 'Risk' and m_{time}^{i}(t) = 'EvacuationTime' for option i. These inferred
W(t) and M(t) are then fed into Equation (4.2.1) to derive the choice probability via DFT.
Although only two attributes (risk and time) are considered in this research, it is noted that
additional attributes (e.g., fear of dying, potential for escape, desire to help others) can be
incorporated easily. The belief inferred from the BBN is intended to be similar to that of a
real human. In this research, this similarity is obtained by constructing the BBN based on
data from human-in-the-loop experiments. Details of the experiments are described in
Section 4.2.5.3 (for the virtual stock market scenario) and Section 7 (for the emergency
evacuation scenario).
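The mapping just described can be sketched in a few lines of Java. This is a minimal
sketch: it only assembles W(t) and M(t) from belief values that are assumed to have already
been read from the BBN, and the class and method names (BeliefAssembler, OptionBelief)
are illustrative placeholders rather than part of the actual Netica-based implementation.

// Minimal sketch of assembling W(t) and M(t) from inferred BBN beliefs.
// The inference itself is assumed to have happened elsewhere (in this work,
// via the Netica Java API reading the 'RiskWeight', 'Risk', and
// 'EvacuationTime' nodes).
public class BeliefAssembler {

    /** Hypothetical container for the beliefs inferred for one option (path). */
    public static class OptionBelief {
        public final double risk;            // inferred 'Risk' for option i
        public final double evacuationTime;  // inferred 'EvacuationTime' for option i
        public OptionBelief(double risk, double evacuationTime) {
            this.risk = risk;
            this.evacuationTime = evacuationTime;
        }
    }

    /** W(t) = [w_risk(t), w_time(t)]^T with w_time(t) = 1 - 'RiskWeight'. */
    public static double[] weightVector(double riskWeight) {
        return new double[] { riskWeight, 1.0 - riskWeight };
    }

    /** M(t): one row per option, columns = (risk, evacuation time). */
    public static double[][] evaluationMatrix(java.util.List<OptionBelief> options) {
        double[][] m = new double[options.size()][2];
        for (int i = 0; i < options.size(); i++) {
            m[i][0] = options.get(i).risk;
            m[i][1] = options.get(i).evacuationTime;
        }
        return m;
    }
}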
4.4 Probabilistic Depth First Search (PDFS) for Real-time Planner
In this research, we have employed the probabilistic depth first search (PDFS)
method (in conjunction with DFT) to implement the real-time planner in the extended
BDI framework (see Figure 3.2). To this end, we have employed Soar software (Laird et
al. 1987) to implement the PDFS. While Soar is also known as a general cognitive
architecture, the aspect of Soar employed in this research is the computer programming
tool, which provides built-in data structures and operators for depth first search. Soar
searches the problem space in a depth-first manner, where a particular branch is selected
based on the choice probability of each branch. In order to evaluate a series of decisions
in a plan, Soar first proposes all available options and then selects one of them as the next
task based on their preference values. Soar has eleven types of preferences including
acceptable, require, prohibit, reject, better, worse, best, worst, unary indifferent, binary
indifferent, and numeric indifferent. Except for the numeric indifferent preference, all of
the preference types are deterministic.
For example, an option with the require
preference must be selected, and an option with the prohibit preference must not be
selected. The preferences better and worse enable the comparison of two options, and the
superior option is selected. When the preference is of the numeric indifferent type, the
numeric values of each option are interpreted as choice probabilities.
As discussed in Section 4.2, DFT provides a method of calculating preference
values for each option based on the current environment. Preferences evolve over a
series of time steps as the agent’s attention shifts between attributes. At the end of this
process (known as DFT evolution), the agent selects the option with the highest
preference value. For each decision, we performed 1000 replications of DFT evolution.
We calculated the choice probability for each option by counting the proportion of the
1000 replications in which that option ended up with the highest preference value. For
the binary choice problem, Theorem 1 in Section 4.2.4 proved that there is a finite time t
(a duration of DFT evolution) when the choice probability converges as shown below.
The difference between the expected preference value E(P(nh)) and its converging value
in DFT becomes less than ε > 0 after n time steps, where

n = \frac{\log k}{\log D}, \quad D = s_{11} - s_{12}, \quad k = \frac{(1 - D)\,\varepsilon}{E(v_1(h))}, \quad
E(v_1(h)) = E(w_1(h))(m_{11} - m_{21}) + E(w_2(h))(m_{12} - m_{22}).
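As a small numerical illustration of this bound (a sketch only; the values chosen below
for D, ε, and E(v1(h)) are arbitrary), n can be computed directly:

// Illustrative computation of the convergence bound n = log(k) / log(D),
// with k = (1 - D) * epsilon / E(v1(h)). The numeric values are arbitrary.
public class ConvergenceBound {
    public static void main(String[] args) {
        double D = 0.9;        // D = s11 - s12 (must satisfy 0 < D < 1)
        double epsilon = 0.01; // tolerance on the expected preference
        double ev1 = 0.5;      // E(v1(h)), expected valence difference
        double k = (1.0 - D) * epsilon / ev1;
        double n = Math.log(k) / Math.log(D); // steps needed before the gap falls below epsilon
        System.out.printf("n >= %.1f time steps%n", n);
    }
}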
To obtain the converged choice probability, we must perform DFT evolution for a
sufficient number of time periods. If we fail to evolve the preferences for enough time
periods, the choice probability that we calculate will be inconsistent across DFT
deployments. Once we obtain a converged choice probability for each option, we feed it
into Soar as the value of a numeric indifferent preference. For example, based on the
environmental information (I_smoke, I_fire, I_police, I_crowd, and I_distance in our scenario of Section
3.1) and the available options (O_right, O_left, O_forward, O_backward, which denote going right, left,
forward, and backward, respectively, in our scenario of Section 3.1), DFT evolves the
preference values for each option until they converge in each of the 1,000 replications.
Let p_right, p_left, p_forward, p_backward denote the number of replications in which each option ends
up with the highest converged preference value, and let p = \sum_{i \in Options} p_i. Suppose
further that in each replication we select the option with the highest preference value as the
final decision. Then, we can calculate the choice probability of each option (Pr(O_right) =
p_right/p, Pr(O_left) = p_left/p, Pr(O_forward) = p_forward/p, Pr(O_backward) = p_backward/p) and feed these
values into Soar as numeric indifferent preferences. Soar then selects an option based on
the given preference (the choice probability in this case). The above procedure is repeated
for the given length of the planning horizon, which differs for each individual human
(agent). For example, the planning horizon for a novice agent (who has no knowledge
about the area) would be one step (i.e., the next path), whereas a commuter who is familiar
with the environment may have a multihorizon plan. In this work, the planning horizon of
a commuter agent is set randomly to a value between 2 and 10. More details about the
multihorizon planning algorithm are discussed in Section 5.
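The replication procedure described above can be sketched as follows using the JAMA
matrix package (the package used for DFT in this work). This is a sketch only: it assumes
the standard DFT preference update P(t+h) = S P(t) + C M W(t+h), with attention switching
between the two attributes according to W(t); the feedback matrix S and contrast matrix C
are supplied by the caller, and the class name DftChoiceProbability is illustrative.

import Jama.Matrix;
import java.util.Random;

// Sketch of deriving choice probabilities from repeated DFT evolutions and the
// converged preferences. The matrices S (feedback) and C (contrast) are placeholders.
public class DftChoiceProbability {

    private static final Random RNG = new Random();

    /** One DFT evolution; returns the index of the option with the highest final preference. */
    static int evolveOnce(Matrix S, Matrix C, Matrix M, double[] w, int steps) {
        int n = M.getRowDimension();
        Matrix P = new Matrix(n, 1); // preferences start at zero
        for (int t = 0; t < steps; t++) {
            // Attention picks one attribute per step with probability w_risk / w_time.
            double[][] wCol = RNG.nextDouble() < w[0]
                    ? new double[][] {{1}, {0}} : new double[][] {{0}, {1}};
            Matrix W = new Matrix(wCol);
            P = S.times(P).plus(C.times(M).times(W));
        }
        int best = 0;
        for (int i = 1; i < n; i++) if (P.get(i, 0) > P.get(best, 0)) best = i;
        return best;
    }

    /** Choice probabilities Pr(O_i) = p_i / p over many replications. */
    static double[] choiceProbabilities(Matrix S, Matrix C, Matrix M, double[] w,
                                        int steps, int replications) {
        int n = M.getRowDimension();
        int[] wins = new int[n];
        for (int r = 0; r < replications; r++) wins[evolveOnce(S, C, M, w, steps)]++;
        double[] pr = new double[n];
        for (int i = 0; i < n; i++) pr[i] = (double) wins[i] / replications;
        return pr; // these values are fed into Soar as numeric indifferent preferences
    }
}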
4.5 Confidence Index
As mentioned before, the confidence index determines 1) the execution mode
(confident or suspicious mode) in the BDI framework, 2) the type of agent (leader or
follower), 3) the speed of movement, and 4) the length of planning horizon. Equation
(4.5.1) depicts the confidence index proposed in this research.
CI_t = \alpha \cdot e^{-d_t} + (1 - \alpha)\, CI_{t-1}     (4.5.1)
where dt > 0, 0 ≤ α ≤ 1, and 0 ≤ CI0 =β ≤ 1. In Equation (4.5.1), dt denotes the deviation
between what is predicted about the environment during the planning stage and the actual
environment during the execution stage. Thus, the agent updates its confidence index at
each street intersection, where it can compare its prediction with an actual observation.
Equation (4.5.2) depicts the d_t used in our research, where m_{risk}^{i}(t) is the evaluation of the
risk associated with option i, and m_{time}^{i}(t) is the evaluation of the evacuation time
associated with option i:

d_t = (m_{risk}^{i}(t) - m_{risk}^{i}(t-1)) + (m_{time}^{i}(t) - m_{time}^{i}(t-1)).     (4.5.2)
The parameter α represents the effect of the agent’s previous level of confidence on its
current level of confidence, which varies depending on the individual human. The initial
confidence value (β) is given and depends upon the agent’s type. By definition, the range
of the confidence index is between 0 and 1. In this work, the initial confidence indices
assigned to commuter agents and novice agents are drawn from unif(0.5, 1) and
unif(0.2, 0.7), respectively.
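A minimal Java sketch of this update is given below; the class name and the way the
perceived attribute values are passed in are illustrative, not the actual implementation.

// Minimal sketch of the confidence index update of Equations (4.5.1) and (4.5.2).
// The perceived attribute values are passed in directly; in the full model they
// come from the BBN inference for the executed option i.
public class ConfidenceIndex {
    private final double alpha; // weight of the most recent prediction error, 0 <= alpha <= 1
    private double ci;          // current confidence index, CI_0 = beta

    public ConfidenceIndex(double alpha, double beta) {
        this.alpha = alpha;
        this.ci = beta;
    }

    /** d_t from Equation (4.5.2): change in the perceived risk and evacuation time of option i. */
    public static double deviation(double riskNow, double riskPrev,
                                   double timeNow, double timePrev) {
        return (riskNow - riskPrev) + (timeNow - timePrev);
    }

    /** CI_t = alpha * exp(-d_t) + (1 - alpha) * CI_{t-1}. */
    public double update(double dt) {
        ci = alpha * Math.exp(-dt) + (1.0 - alpha) * ci;
        return ci;
    }
}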
CHAPTER 5
DETAILED REAL-TIME DECISION PLANNING ALGORITHM
This section discusses in greater detail the proposed decision-planning algorithm
(both multihorizon and single-horizon) (see Sections 5.2 and 5.3) for part of the
Decision-Making submodule of the extended BDI framework (see Section 3.2), which is
implemented in Soar.
5.1 Algorithm in Pseudo Code
As mentioned earlier, Soar is a computer programming tool which can be used to
materialize theories and concepts in various fields of cognitive science such as
psychology, linguistics, anthropology, and artificial intelligence. In order to make an
architecture (Soar) produce cognitive behaviors, we need to provide content such as
knowledge and rules to it. Thus, we have developed our planning algorithms within the
Soar architecture to let Soar devise a plan accordingly.
Figure 5.1 depicts pseudocode for the proposed planning algorithm for the
evacuation application (see Section 3.1), which is implemented in Soar. Using this
algorithm, an agent (human) develops his evacuation plan (route) dynamically (involving
from 1 to 10 planning steps) until he reaches his destination. The first line of Figure 5.1
shows that the application uses the BBN and DFT to obtain preference values for the
paths directly accessible from the current position. Then, the algorithm works differently
depending on the type of agent – novice or commuter. As noted in Section 3.2, the agent
revises its plan when its confidence index falls below a threshold value. The following
subsections will discuss both cases in detail.
1:  CALL BBN and DFT to get the preferences of PATHs from the current position
2:  IF Agent has knowledge of local paths THEN
3:      REPEAT
4:          SELECT a PATH that is directly connected to the current position based on
            the probability distributed according to preference
5:          SET the preference value of the selected PATH to 'worst'
6:          ADD PATH to list of PATHs
7:          IF the selected PATH forms a cycle, THEN DELETE the PATHs in the cycle
8:          UPDATE current position to the destination end of the PATH
9:          GET the preference for all PATHs that are connected to the current position
            based on current knowledge
10:     UNTIL Agent reaches the destination or has selected a series of n PATHs
        (where n = number of edges)
11: ELSE
12:     SELECT a PATH that is directly connected to the current position based on
        the probability distributed according to preference
13:     SET the preference value of the selected PATH to 'worst'
14: ENDIF
15: RETURN the list of PATHs
Figure 5.1: Pseudo code of the proposed planning algorithm
5.2 Multi-Horizon Planning Algorithm for Commuter Agent
As mentioned earlier, commuter agents represent people with enough knowledge
about the area to plan beyond the current decision point (selecting a path from the current
intersection). To illustrate the algorithm for various situations, an exemplary evacuation
area (in Washington, D.C.) is used (see Figure 5.2 for satellite image and its
corresponding graph). The graph (G) used here is defined formally as G = (V, E), where
V and E represent a set of nodes and edges pertaining to the graph, respectively. The
graph (G) in Figure 5.2 has nodes V(G) = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o} and edges
E(G) = {ab, bc, be, de, ef, ej, fc, fg, gh, gk, hl, ij, jk, jm, kl, kn, lo, mn}.
Figure 5.2: Satellite image of an evacuation area and its graphical representation
In this example, it is supposed that an agent in node e is searching for a route (series of
paths, R(G)) to the destination node o. Figure 5.3 depicts a series of selection processes
(of a path), which can be described as following:
1. At node e (see Figure 5.3(a)), the agent evaluates each path (be, de, ef, ej) in terms
of smoke, fire, police, crowd, and the updated distance to the destination (distance
to node o from nodes b, d, f, j).
2. Based on his observations, the agent infers the evaluation matrix M(t),

   M(t) = \begin{bmatrix} m_{risk}^{be}(t) & m_{time}^{be}(t) \\ m_{risk}^{de}(t) & m_{time}^{de}(t) \\ m_{risk}^{ef}(t) & m_{time}^{ef}(t) \\ m_{risk}^{ej}(t) & m_{time}^{ej}(t) \end{bmatrix},

   and the weight vector W(t),

   W(t) = \begin{bmatrix} w_{risk}(t) \\ w_{time}(t) \end{bmatrix},

   via the BBN (see Figure 4.15), where m_{risk}^{be}(t) represents the evaluation of the risk
   attribute for edge be at time t, and w_{risk}(t) is the weight on the risk attribute at
   time t.
3. M(t) and W(t) obtained in Step 2 are provided to Equation (4.2.2) (DFT), whose
   multiple replications generate the choice probabilities Pr(be), Pr(de), Pr(ef), Pr(ej)
   for each path be, de, ef, ej. For each replication, the DFT evolution runs for enough
   time periods to allow convergence of the choice probability (see Section 4.4).
4. Now, the choice probabilities are fed into Soar, and Soar selects one edge
   randomly based on the probabilities (see Figure 5.3(b)). Suppose edge ej is
   selected; then, R(G) is updated to {ej}. The same process (see Steps 1, 2, 3,
   and 4) is then used to pick a second path from intersection j. The final step of the
   current iteration before starting the second iteration is to set the preference of ej
   (i.e., p_ej) to worst so that path ej (coming back to intersection e again) will be
   selected only if there is no other choice in the second iteration. It is noted that
   while the second iteration of the planning algorithm starts from intersection j, the
   agent is still located at intersection e.
5. At node j (see Figure 5.3(c)), the agent repeats Step 1 to evaluate each path (ej, ij,
jk, jm). However, this evaluation addresses only the updated distance to the
destination (distance to node o from nodes e, i, k, m) as other environmental
variables at intersection j (smoke, fire, police, and crowd) are not visible from the
current location (intersection e) of the agent. Then, the agent repeats Step 2 to
infer evaluation matrix M(t) and weight vector W(t) via BBN (see Figure 4.15),
where BBN uses the updated distance to the destination and expected values for
smoke, fire, police, and crowd. Then, the agent repeats Step 3 to obtain the
choice probabilities Pr(ej), Pr(ij), Pr(jk), Pr(jm), where the value (worst) of pej (see
Step 4) is not updated. Then, the agent repeats Step 4, selecting edge ij and
updating R(G) = {ej, ij} and pij = worst (see Figure 5.3(d)). Note again that the
agent is still planning the route without actually moving.
6. At node i (see Figure 5.3(e)), the agent repeats Steps 1, 2, 3, 4. However, as
shown in Figure 5.3(e), the only available path is edge ij, whose pij was assigned
as worst in Step 5. In this case, although pij = worst (implying very small
probability instead of zero probability), edge ij is selected and R(G) is updated to
{ej, ij, ij} (see Figure 5.3(f)). Then, since edge ij has been taken twice, a cycle
has been formed and the edges in the cycle are deleted from R(G) (= {ej})
according to our planning algorithm. Then, Soar selects an edge from intersection
j again based on the choice probabilities Pr(ej), Pr(ij), Pr(jk), Pr(jm), where both
pej and pij are worst (see Figure 5.3(g)). This way, path ij is hardly selected again.
7. The agent repeats the above process until its plan reaches the destination node o
or the limit of planning horizon (n) is reached (see Figure 5.3(h)).
In this
multihorizon planning process, the limit of the planning horizon is the same as the
number of elements (edges) in R(G). At this point, the multihorizon planning
process is complete, and the agent executes a decision based on the plan (R(G)).
Figure 5.3: Illustration of the planning algorithm. (a) Calculate the choice probability at e;
(b) PDFS selects ej; (c) calculate the choice probability at j using only distance; (d) PDFS
selects ij; (e) ij is the only option, with p_ij = worst; (f) PDFS selects ij and deletes the cycle
by removing ij; (g) calculate the choice probability at j using only distance; (h) repeat the
above procedure.
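For concreteness, the planning loop of Figure 5.1 (edge selection by choice probability,
demotion of selected edges to 'worst', and cycle deletion) can be sketched as below. This
is a minimal sketch: the graph representation, the ChoiceModel callback standing in for the
BBN/DFT (or metamodel) step, and the use of a small constant for the 'worst' preference
(a very small rather than zero probability, as described in Step 6) are illustrative assumptions.

import java.util.*;

// Sketch of the multihorizon planning loop (commuter agent).
public class MultiHorizonPlanner {

    public interface ChoiceModel {
        /** Pr(edge) for each edge leaving 'node', e.g., from BBN + DFT replications. */
        Map<String, Double> choiceProbabilities(String node);
    }

    private static final double WORST = 1e-6;
    private final Random rng = new Random();

    public List<String> plan(String start, String dest, int horizon,
                             Map<String, Map<String, String>> edges, // node -> (edge id -> next node)
                             ChoiceModel model) {
        List<String> route = new ArrayList<>();   // R(G)
        List<String> visited = new ArrayList<>(); // nodes along the planned route
        Set<String> worst = new HashSet<>();      // edges demoted to 'worst'
        String current = start;
        visited.add(current);
        while (!current.equals(dest) && route.size() < horizon) {
            Map<String, Double> pr = new HashMap<>(model.choiceProbabilities(current));
            for (String e : pr.keySet()) if (worst.contains(e)) pr.put(e, WORST);
            String edge = sample(pr);
            worst.add(edge);
            route.add(edge);
            current = edges.get(current).get(edge);
            int seen = visited.indexOf(current);
            if (seen >= 0) { // cycle: drop the edges of the cycle from R(G)
                route.subList(seen, route.size()).clear();
                visited.subList(seen + 1, visited.size()).clear();
            } else {
                visited.add(current);
            }
        }
        return route;
    }

    private String sample(Map<String, Double> pr) {
        double total = pr.values().stream().mapToDouble(Double::doubleValue).sum();
        double u = rng.nextDouble() * total;
        for (Map.Entry<String, Double> e : pr.entrySet()) {
            u -= e.getValue();
            if (u <= 0) return e.getKey();
        }
        return pr.keySet().iterator().next();
    }
}

Starting at node e with destination o, this loop reproduces the behavior illustrated in
Figure 5.3: after ej and ij are planned, revisiting node j removes the cycle and leaves
R(G) = {ej} with both p_ej and p_ij demoted.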
5.3 Single-Horizon Planning Algorithm for Novice Agent
Novice agents (e.g., tourists) are those people who do not have knowledge about
the area. Novice agents cannot develop a multihorizon plan because they do not have any
information (e.g., updated distance to the destination) other than what they see for the
adjacent paths. Therefore, their planning horizon is one (see line 10 of Figure 5.1). It is
noted that the planning procedure for the novice agent is exactly the same as that of the
commuter agent (Steps 1 to 4 in Section 5.2), but with a planning horizon (n) of 1.
5.4 Meta-model of Choice Probability for Commuter Agent
During the multihorizon planning of the commuter agent (see Section 5.2), the
choice probabilities for the paths (even beyond the current intersection) are calculated
repeatedly via BBN and DFT (see line 6 of Figure 5.1), which requires intensive
computational power especially when the simulation involves numerous agents. Thus,
this section discusses an aggregated metamodel that allows us to obtain the choice
probabilities in a significantly shorter time. It is noted that both the original approach
(BBN-DFT method) and the metamodel can be used adaptively according to
computational availability.
For planning from the current intersection to an adjacent one (where all considered
environmental variables are available to the agent), both the original approach (BBN-DFT
method) and the metamodel obtain choice probabilities using BBN and DFT in
exactly the same way (see Steps 1, 2, 3, 4 in Section 5.2). However, the original
approach and the metamodel work differently for planning beyond the current
intersection. The original (BBN-DFT) method uses only knowledge of the distance
(distance from the considered node to the destination) to infer M(t) (evacuation time and
risk) via BBN and obtain the choice probabilities. On the other hand, the proposed
metamodel utilizes knowledge about the number of connected paths from an intersection
(which is related to risk) in addition to knowledge of distance (which is related to
evacuation time). For example, considering four nodes b, d, f, j connected to node e in
Figure 5.2, nodes b and f have three paths away from each of them, node d has one path
away from it, and node j has four paths away from it. Here, we consider the number of
connected edges emanating from a node because going to an intersection connected to
more paths may be safer in an emergency evacuation situation. Based on knowledge
about the distance from a connected node to the destination and the number of connected
paths emanating from that node, its preference value is calculated using Equation (5.4.1).
p_i = k_i + \left( I_{[x_{curr}, x_{dest}]}(x_i) + I_{[y_{curr}, y_{dest}]}(y_i) \right) \cdot w_{dist}     (5.4.1)

where p_i = preference of path i; k_i = number of paths connected from the node connected
to path i; I is the indicator function; (x_i, y_i) = x, y coordinates of the node connected to
path i; (x_curr, y_curr) = x, y coordinates of the current node; (x_dest, y_dest) = x, y coordinates of
the destination node; and w_dist = weight associated with the distance factor. The proposed
metamodel accords a higher preference as the number of connected paths increases (less
risk) or the distance to the destination decreases (shorter evacuation time). The weight
associated with the distance factor (wdist) allows us to adjust the impact of the distance
and the number of connected paths. Once the calculated preference values are fed into
Soar, Soar selects one path using the choice probabilities calculated from the preference
values as described below.
To illustrate the proposed metamodel, we consider the same example used in
Section 5.2 (searching for a route from node e to node o). For the decision from the
current intersection (e) to an adjacent intersection, the agent uses the original (BBN-DFT)
method. It is assumed that path ej is selected (R(G) = {ej}). The numbers of connected
paths are ke = 4, ki = 1, kk = 4, and km = 2, respectively, and the coordinates of the nodes
are (xe, ye) = (5, 7), (xi, yi) = (3, 8), (xk, yk) = (3, 4), and (xm, ym) = (1.5, 7). It is assumed
that the current node ((xcurr, ycurr) = (xj, yj)) is located at (3, 7), the destination node ((xdest,
ydest) = (xo, yo)) is located at (0, 0), and wdist is set to 2. Preference values for each node
can be calculated using Equation (5.4.1): 1) pe = 4+3/5⋅2 = 5.2, 2) pi = 1+7/8⋅2 = 2.75, 3)
pk = 4+7/4⋅2 = 6.8, and 4) pm = 2+3/1.5⋅2 = 6. Here, as path ej already has been selected,
pe is set to worst. Next, Soar uses these preference values to calculate the choice
probabilities: 1) pi = 2.75 / (2.75 + 6.8 + 6) = 0.18, 2) pk = 6.8 / (2.75 + 6.8 + 6) = 0.44,
and 3) pm = 6 / (2.75 + 6.8 + 6) = 0.38, and selects a path randomly based on those choice
probabilities. This planning procedure is repeated until it reaches the destination node o
or the planning horizon limit is reached. Comparisons between the original method and
the metamodel (in terms of required computation and quality of generated plans) are left
as a future research task; the experimental results discussed in the next section are based
on the metamodel.
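The normalization step of the metamodel can be sketched as follows, reproducing the
numbers of the example above (the edge labels and class name are illustrative):

import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of turning metamodel preference values (Equation (5.4.1)) into choice
// probabilities. The preferences below are the ones from the worked example
// (p_i = 2.75, p_k = 6.8, p_m = 6), with p_e already demoted to 'worst' because
// edge ej is on the plan.
public class MetamodelProbabilities {
    public static Map<String, Double> normalize(Map<String, Double> preferences) {
        double sum = preferences.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> pr = new LinkedHashMap<>();
        preferences.forEach((edge, p) -> pr.put(edge, p / sum));
        return pr;
    }

    public static void main(String[] args) {
        Map<String, Double> prefs = new LinkedHashMap<>();
        prefs.put("ji", 2.75); // toward node i
        prefs.put("jk", 6.80); // toward node k
        prefs.put("jm", 6.00); // toward node m
        // Prints approximately 0.18, 0.44, 0.39 (the text rounds the last value to 0.38).
        normalize(prefs).forEach((e, p) -> System.out.printf("Pr(%s) = %.2f%n", e, p));
    }
}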
CHAPTER 6
PROPOSED DYNAMIC LEARNING ALGORITHM
6.1 Overview of Learning Algorithm
Learning is one of the most important aspects among intelligent creatures’
behaviors. In the Webster’s dictionary, learning is defined as “knowledge acquired by
systematic study; the modification of behavior through practice, training, or experience”.
In other words, learning is a unique and essential characteristic of a living intellectual
creature adopting itself into the environment. Thus, machine learning, which is defined
broadly as changing its structure, program, or data in a way that its expected future
performance improves (Nilsson 1990), is also a central function in the artificial
intelligence (AI) model which aims to create an intelligent machine.
Traditionally,
machine learning techniques have been used to solve problems that are hard or
impossible to be formally described, where 1) there is no human expert, 2) it is unable to
explain the human expertise, 3) phenomena change rapidly, and 4) it needs to be
customized for each computer user (Dietterich 2003). By attaining a learning ability, a
machine could deal with the above mentioned obstacles via 1) learning the rules from
data, 2) learning by examples, 3) constant modification based on situations, and 4)
learning from personalized data. As such, the main stream of the machine learning
research has focused on developing techniques that intend to yield the best solution under
a given environment (or data). As a result, the characteristics of the widely used learning
algorithms targeted for optimal behaviors have become quite distant from those of the
105
real human, which are not always optimal. To resolve this problem, the goal of this
research is to employ and further develop a mature machine learning model to mimic the
human’s learning process.
In this chapter, we propose an innovative learning model for humans in a
dynamically changing, complex environment (a terrorist bombing scenario in a public
area is considered in this research), combining a BBN and a reinforcement learning (RL)
technique and compensating for the deficiencies of each method. In the proposed work,
our view of learning is the dynamic evolution process of the underlying modules that
constitute the human decision behavior process as the considered human evolves from a
novice to an expert in a certain aspect. It is believed that this view embraces the
definition of learning discussed earlier in this research. To this end, the first objective of
the work is to survey and classify machine learning algorithms based on various criteria.
The second objective is to demonstrate the proposed learning model (combining BBN
and RL) in the context of the extended Belief-Desire-Intention (BDI) human
decision-making framework (see Section 3.2).
6.2 Taxonomy of Learning Algorithms and Frameworks
In this section, we discuss various machine learning algorithms and models
available in the literature and categorize them. Table 6.1 depicts a taxonomy of learning
algorithms, models, and frameworks that we have developed based on four criteria
(which are not completely mutually exclusive): 1) type of required training data, 2)
type of core algorithm, 3) characteristics of evolution in learning, and 4) type of acquired
knowledge. First, with respect to the "type of required training data", we have classified
the algorithms and models into three categories: supervised learning, unsupervised
learning, and reinforcement learning. While supervised learning algorithms/models use
pairs of previous input and output data as training data, unsupervised learning
algorithms/models use only input training data (observations). Reinforcement learning
algorithms use pairs of an action and a reward (or punishment); they differ from
supervised learning in that correct input/output pairs are never presented, and from
unsupervised learning in that rewards (or punishments) are considered for the selected
actions. Second, considering the "type of core algorithm" (the second criterion), we have
classified learning algorithms/models into four categories: statistical model, rule-based
model, exemplar-based model, and neural network. The third criterion, "characteristics of
evolution in learning", concerns the aspects of the underlying models that change during
the learning process. Considering this criterion, we have classified the learning
algorithms/models into three categories: structural learning, parametric learning, and both.
For example, structural learning algorithms/models modify the major structure of the
model (knowledge) during the learning process, whereas parametric learning
algorithms/models adjust numeric parameters only, without changing the major structure
of the model (knowledge). Finally, the "type of acquired knowledge" criterion concerns
the nature (characteristics) of the knowledge obtained during the learning process. Based
on this fourth criterion, we have classified the learning algorithms/models into two
categories: declarative learning and procedural learning. While knowledge (facts) that one
can speak about is acquired in declarative learning, skills and procedures ("how to"
knowledge) are acquired in procedural learning.
Table 6.1: Taxonomy of learning algorithms, models, and frameworks

Criterion (A): Type of required training data
  (1) Supervised learning: Bayesian belief network (BBN) (Pearl 1985), Neural network
      (Hopfield 1982), Support vector machine (SVM) (Cortes 1995), Regression, ID3,
      C4.5, Classification and Regression Tree (CART), Chunking (Soar, ACT-R)
  (2) Unsupervised learning: Clustering, Self organizing maps (SOM), Adaptive resonance
      theory (ART)
  (3) Reinforcement learning: Temporal Difference (TD) learning, Q-learning,
      State-Action-Reward-State-Action (SARSA)

Criterion (B): Type of core algorithm
  (1) Statistical model: BBN, SVM, Regression, TD learning, Q-learning, SARSA
  (2) Rule-based model: Chunking
  (3) Exemplar-based model: ID3, C4.5, CART, Clustering
  (4) Neural network: Neural network, SOM, ART

Criterion (C): Characteristic of evolution in learning
  (1) Structural learning: Clustering, SOM, Chunking
  (2) Parametric learning: Neural network, SVM, Regression, ID3, C4.5, CART, ART,
      TD learning, Q-learning, SARSA
  (3) Both: BBN

Criterion (D): Type of acquired knowledge
  (1) Declarative learning: BBN, SVM, Regression, TD learning, Q-learning, SARSA,
      ID3, C4.5, CART, Clustering, Neural network, SOM, ART
  (2) Procedural learning: Chunking
6.3 Learning in the Context of BDI Framework
As discussed in Section 6.1, learning in this work is defined as the evolution
process of underlying modules which constitute the human decision behavior process
when the considered human evolves from a novice to an expert in a certain aspect. As
such, discussions of learning depend on a particular human decision model and its
components. In this section, we discuss the modules of the extended BDI framework that
are deemed relevant to human learning (i.e., evolution of the underlying modules).
Table 6.2 depicts the identified modules (and corresponding elements) and the applicable
learning categories (see Table 6.1) for each of them. For example, we can apply
supervised learning (A-(1)) algorithms to represent learning in the Beliefs element of
the belief module. While several modules that are relevant to human learning have
been identified in this section, we focus on two of them (the belief module and the
confidence index (CI) in the emotional module) for a more detailed analysis (see Section
6.4).
Table 6.2: Application of learning algorithms under BDI framework

Module                    Element             Applicable learning categories
Belief module             Beliefs             A-(1), C-(3), D-(1)
Desire module             Desires             A-(2), C-(1), D-(2)
Decision making module    Intention           A-(3), C-(1), D-(2)
                          Plans               A-(2), C-(1), D-(2)
Emotional module          Confidence Index    A-(1), A-(3), C-(2), D-(1)
                          Instinct Index      A-(1), A-(3), C-(2), D-(1)
6.4 Proposed Hybrid Learning Model
In this section, we propose a novel hybrid learning algorithm involving a BBN
and RL for the belief module and the confidence index (CI) in the emotional module of the
extended BDI framework.
6.4.1 Bayesian Belief Network for Belief Module
As mentioned in Section 4.3, we adopted a BBN for the perceptual processor in the
belief module because of 1) its ability and flexibility to handle uncertain and dynamic
environments and 2) the way it mirrors problem solving in the human mind. However, in
terms of learning, most of the research in this area has focused on finding the best
model-fitting techniques that can accurately represent given input/output data, as opposed
to mimicking the dynamic learning process of a human (the goal of this research). In this
research, we propose an approach to update the BBN (the human perceptual processor in
this framework) dynamically (see Section 6.4.3).
6.4.2 Reinforcement Learning (Q-Learning) for Emotional Module
The intent of reinforcement learning is to obtain (learn) an action-value function
that gives an expected utility of taking an action in a given state and following a fixed
policy (e.g., taking an action whose expected utility is maximized) thereafter. Q-learning
(Watkins 1989) is one of the most actively investigated reinforcement learning
techniques. A number of proofs are available in the literature to show that Q-learning is
guaranteed to converge with a probability of 1 in the cases where the state space is small
enough so that lookup table representations can be used (Watkins and Dayan 1992).
Furthermore, in the cases with large state spaces where lookup table representations are
infeasible, reinforcement learning methods can be combined with function approximators
(e.g., fuzzy logic) to ensure promising practical performance despite the lack of
theoretical guarantees of convergence to optimal policies.
In the extended BDI framework (see Section 3.2), a confidence index (CI) of the
emotional module affects as well as is affected by all the other modules (e.g., the belief
module, the desire module, and the decision-making module). For example, the higher
the CI, the longer the human's planning horizon is in his decision-planning process.
Also, the better the human's performance in predicting the environment in his
decision-planning process, the higher the CI is. These inter-effects between the CI and
the other modules evolve as part of the decision maker's dynamic learning process. In this
research, we investigate such a learning process using a Q-learning technique (see
Equation (6.4.1)), in particular for the effect of the CI on the perceptual processor in the
belief module.
6.4.3 Proposed BBN-RL Hybrid Learning Model
As mentioned in Section 6.4.1, the perceptual processor in the considered BDI
framework is implemented by BBN. As a perceptual processor, BBN takes observed
information as inputs and delivers an inferred perception of a decision maker as outputs.
The inferred outputs are represented as a set of probability distribution functions f(x) for
each of the result factors (child nodes in the BBN). Here, as the CI is believed (see
Section 6.4.2) to affect the perceptual processor (inference in BBN), it must be an input
node in the BBN. However, its true value for human is extremely difficult to observe and
measure, and therefore cannot be included to train (construct) a BBN. Therefore, in the
proposed approach, in addition to the inference of BBN (without considering the CI), the
effect of the CI is considered as an additional step, where the probability distribution for
each of the output factors is modified in a way that the probability that positive
(optimistic) factors will infer higher values (states) is increased. The positive factor is a
relative concept depending on the situation faced by the decision maker. For example,
time can be a positive factor when people want to take a rest whereas it can be a negative
factor when people travel to a destination. Thus, positive factors need to be decided
depending on the context. In this work, we propose that the above mentioned effect of
the CI on the probability distribution obtained from the BBN is determined and improved
(resulting in a better decision-making performance) via Q-learning. For example, let us
suppose that a node X is a positive output factor (e.g., safety measure under an
evacuation situation) in the BBN with three discrete states (High, Medium, and Low).
Then, the BBN infers the probability distribution of X (i.e., p(High), p(Medium), p(Low))
based on an observation.
In the proposed model, the CI changes the probability
distribution by subtracting δ from p(Low) and adding it to p(High). The altered amount
(δ) is determined based on the current CI value. In our work, the relationship between δ
and the CI is trained via the Q-learning algorithm. Equation (6.4.1) depicts a general
Q-learning update, where Q(s_t, a_t) is the discounted (action-value) reward, R(s_t, a_t) is the
observed immediate reward, s_t and a_t are the state and action at time t, α_t (0 ≤ α_t < 1) is
the learning rate, and γ (0 ≤ γ < 1) is the discount factor:

Q(s_t, a_t) = (1 - α_t)\, Q(s_{t-1}, a_{t-1}) + α_t \left[ R(s_t, a_t) + γ \cdot \max_a Q(s_{t+1}, a) \right]     (6.4.1)
In the considered BDI framework, the CI and δ values correspond to the state and
action terms in Equation (6.4.1), respectively. Once the beliefs are updated via the BBN
along with the CI, the CI itself is updated using the true information that is observed a
short while later. The definition of the CI (0 ≤ CI_0 ≤ 1) is shown in Equation (6.4.2) (see
Section 4.5), where d_t (> 0) denotes the deviation between what is predicted about the
environment during the planning stage and the actual observed environment during the
execution stage. In this work, d_t is defined as

d_t = \sum_i | m_i(t) - m_i(t-1) |,

where m_i(t) is the inferred prediction of child node i at time t using the BBN.

CI_t = α \cdot e^{-d_t} + (1 - α)\, CI_{t-1}     (6.4.2)

In other words, m_i(t) is a numeric interpretation of child node i at time t inferred based
on the environmental information. For example, we can use the expected value method
as follows. If child node i has the inferred state distribution p(High) = 0.3,
p(Medium) = 0.4, p(Low) = 0.3, then m_i(t) can be calculated as m_i(t) = 0.3 × 5 +
0.4 × 3 + 0.3 × 1 = 3. In Equation (6.4.2), α (0 ≤ α ≤ 1) adjusts the effect of the previous
confidence on the current confidence, which varies depending on the individual. The
initial confidence value (CI_0) has to be given and will differ according to individual
characteristics.
The change in the CI indicates how accurate the previous inference was. In other
words, the change in the CI can serve as the immediate reward in Q-learning, represented
as R in Equation (6.4.1). In this way, we can find the best δ value (the one that increases
the CI the most) for a given CI value.
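A minimal sketch of this CI-driven adjustment is shown below. It assumes the three-state
(High, Medium, Low) positive factor and the numeric state values 5, 3, and 1 used in the
expected-value example above; the class and method names are illustrative, and the
specific numbers in the main method anticipate the worked example of Section 6.4.4.

// Sketch of the belief adjustment of Section 6.4.3 for a positive factor with
// states (High, Medium, Low). 'delta' is the fraction of probability moved
// between the lowest and highest states (negative values move mass from High
// to Low, as in actions I-IV of Section 6.4.4; positive values the reverse, as in VI-IX).
public class BeliefAdjustment {

    /** Returns the adjusted distribution {p(High), p(Medium), p(Low)}. */
    public static double[] shift(double pHigh, double pMedium, double pLow, double delta) {
        double moved = delta >= 0 ? delta * pLow : delta * pHigh; // amount moved to High
        return new double[] { pHigh + moved, pMedium, pLow - moved };
    }

    /** Numeric interpretation m_i(t) = 5*p(High) + 3*p(Medium) + 1*p(Low). */
    public static double numericInterpretation(double[] p) {
        return 5.0 * p[0] + 3.0 * p[1] + 1.0 * p[2];
    }

    public static void main(String[] args) {
        // Action III of Section 6.4.4: move 50% of p(High) down to p(Low).
        double[] adjusted = shift(0.67, 0.18, 0.15, -0.5);
        System.out.printf("p = (%.3f, %.3f, %.3f), m(t) = %.2f%n",
                adjusted[0], adjusted[1], adjusted[2], numericInterpretation(adjusted));
        // Prints p = (0.335, 0.180, 0.485) and m(t) = 2.70, as in the worked example.
    }
}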
6.4.4 Illustration of the Proposed Q-Learning for Effect of CI
In this example, in order to deal with a finite number of states, the continuous
CI value is divided into four discrete intervals: 0 ~ 0.25, 0.25 ~ 0.5, 0.5 ~ 0.75, 0.75 ~ 1.
For example, if the current CI value lies between 0 and 0.25, the current state is 1;
similarly, if the current CI value lies between 0.25 and 0.5, the current state is 2. Here, 9
actions are defined, which alter the probability distribution of positive factors.
Actions I, II, III, and IV subtract 90%, 70%, 50%, and 30% of the inferred probability
(δ) from the highest state and add it to the probability of the lowest state, respectively.
Action V denotes no alteration. Similarly, Actions VI, VII, VIII, and IX subtract 30%,
50%, 70%, and 90% of the inferred probability (δ) from the lowest state and add it to the
probability of the highest state, respectively. We set R (the immediate reward) as the
amount of change in the CI value. In other words, the immediate reward R is defined as
R(s_t (i.e., CI_t), a_t (i.e., δ_t)) = CI_t - CI_{t-1}. Let us suppose that α and CI_0 are set to
α = 0.5 and CI_0 = 0.5, respectively, for the CI (see Equation (6.4.2)). Moreover, γ is
assumed to be 0.1, which means we tend to neglect future rewards. Then, we can
establish a 4 × 9 state/action Q matrix (rows corresponding to states 1-4 and columns to
actions I-IX), with all entries initialized to 0. Figures 6.1 and 6.2 depict the training
(learning) phase and the operational phase of the Q-learning algorithm, respectively.
1: Set α, γ, and CI0
2: REPEAT
3:     Select an action (δt) randomly
4:     On next BBN inference with new observation, update CIt+1 and calculate R(CIt, δt) and Q(CIt, δt)
Figure 6.1: Q-Learning algorithm pseudo code (training/learning phase)
1: Set CI0
2: REPEAT
3:     From current state (CIt), find the action that maximizes the Q value according to the policy
4:     On next BBN inference with new observation, update CIt+1
Figure 6.2: Q-Learning algorithm pseudo code (operation phase)
Exemplary calculations during learning are described here. We set α = 0.5, α_t = 0.7,
γ = 0.1, and CI_0 = 0.5. Then, the current state is 2. Suppose that action III is randomly
selected at state 2 with a positive factor's p(High) = 0.67, p(Medium) = 0.18, and p(Low)
= 0.15 in the BBN. Then, the modified inference has p_1(High) = 0.67 - 0.67 × 0.5 = 0.335
and p_1(Low) = 0.15 + 0.67 × 0.5 = 0.485 in the positive factor's probability distribution.
Then, using the expected value method discussed in Section 6.4.3, m(t) of this factor is
m(1) = 0.335 × 5 + 0.18 × 3 + 0.485 × 1 = 2.7. Suppose further that we obtain m(2) = 2.74
from the next BBN inference. Then d_1 = |m(1) - m(2)| = |2.7 - 2.74| = 0.04. Using
Equation (6.4.2), we can calculate CI_1 as CI_1 = α·e^{-d_1} + (1 - α)·CI_0 =
0.5·e^{-0.04} + 0.5·0.5 ≈ 0.7. Thus R(CI_1, δ_1) = 0.7 - 0.5 = 0.2. Then Q(CI_1, δ_1) =
(1 - α_t) Q(CI_0, δ_0) + α_t[R(CI_1, δ_1) + γ·Max(Q(CI_2, all actions))] =
(1 - 0.7)·0 + 0.7·[0.2 + 0.1·Max(Q(0.75, -0.2), Q(0.75, 0), Q(0.75, +0.2))] = 0.14.

Then, the Q matrix is updated accordingly: the entry for state 2 and action III becomes
0.14, while all other entries remain 0. We repeat the same process until the Q matrix
converges. Suppose that we have repeated the above training and obtained a converged
Q matrix whose first three columns (actions I-III) are as follows:

\begin{bmatrix} 22 & 9 & 8 & \cdots \\ 17 & 19 & 12 & \cdots \\ 9 & 12 & 18 & \cdots \\ 3 & 11 & 23 & \cdots \end{bmatrix}

Via normalization, we can obtain a revised Q matrix,

\begin{bmatrix} 0.32 & 0.2 & 0.12 & \cdots \\ 0.24 & 0.29 & 0.14 & \cdots \\ 0.09 & 0.12 & 0.28 & \cdots \\ 0.03 & 0.11 & 0.23 & \cdots \end{bmatrix},

which is then used for the operation phase. For example, the first row of the Q matrix
defines the probability distribution of actions in state 1. Thus, if we are in state 1, the
probabilities of selecting actions I, II, and III are 0.32, 0.2, and 0.12, respectively. It is
noted that the sum of the elements in each row of the normalized Q matrix is 1.
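The training loop of Figures 6.1 and 6.2 can be sketched as follows. This is a sketch only:
it follows the standard Q-learning bookkeeping of updating the entry for the state in which
the action was taken (Equation (6.4.1) as printed indexes the previous state-action pair for
the old value), it uses the CI discretization and the reward R = CI_t - CI_{t-1} of Section
6.4.4, and the class name is illustrative.

// Sketch of Q-learning over the 4 discretized CI states and 9 probability-shifting
// actions, plus the row normalization that turns the converged Q matrix into
// action selection probabilities. Parameters mirror the worked example (alpha_t = 0.7, gamma = 0.1).
public class CiQLearning {
    static final int STATES = 4, ACTIONS = 9;
    final double[][] q = new double[STATES][ACTIONS]; // initialized to zeros
    final double learningRate;  // alpha_t
    final double discount;      // gamma

    CiQLearning(double learningRate, double discount) {
        this.learningRate = learningRate;
        this.discount = discount;
    }

    /** CI mapped to states 0..3 for the intervals (0,0.25], (0.25,0.5], (0.5,0.75], (0.75,1]. */
    static int state(double ci) {
        return Math.max(0, Math.min(3, (int) Math.ceil(ci / 0.25) - 1));
    }

    /** Training step: reward R = CI_t - CI_{t-1}, as defined in Section 6.4.4. */
    void update(double ciPrev, int action, double ciNew) {
        int sPrev = state(ciPrev), sNew = state(ciNew);
        double reward = ciNew - ciPrev;
        double maxNext = q[sNew][0];
        for (double v : q[sNew]) maxNext = Math.max(maxNext, v);
        q[sPrev][action] = (1 - learningRate) * q[sPrev][action]
                + learningRate * (reward + discount * maxNext);
    }

    /** Row-normalized Q matrix used as action selection probabilities. */
    double[][] normalized() {
        double[][] p = new double[STATES][ACTIONS];
        for (int s = 0; s < STATES; s++) {
            double sum = 0;
            for (double v : q[s]) sum += v;
            for (int a = 0; a < ACTIONS; a++) p[s][a] = sum > 0 ? q[s][a] / sum : 1.0 / ACTIONS;
        }
        return p;
    }
}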
6.5 Experiments under Emergency Evacuation Scenario
In this section, the proposed hybrid learning model is illustrated in a simulated
environment for emergency evacuation (bombing attack) scenarios (see Section 3.1). In
particular, evacuation performance measures (e.g., average evacuation time, percentage of
casualties) of learned agents and novice agents are compared, where learned agents
update their Q matrix under various emergency scenarios. Also, the effects of the various
parameters considered in the proposed hybrid model on learning are discussed.
6.5.1 Simulation Model of Emergency Evacuation
In this work, the proposed hybrid learning model is tested and illustrated using
agent-based simulation (see Section 7.1) developed for emergency evacuation scenarios
(see Section 3.1). In particular, the crowd behaviors under a terrorist bomb attack are
simulated in the Washington, D.C. National Mall area. In order to incorporate the
Q-learning algorithm, the model described in Section 7.1 is further developed to include a
Q matrix and a reward function. Figure 6.3 depicts a snapshot of our simulation in
AnyLogic®, where a bomb explosion is shown in the middle of the map and agents are
evacuating from the area. When agents notice the explosion, they change their behaviors,
starting to move faster and heading to one of the four exits placed in the area. As shown
in Figure 4.15, agents consider several environmental factors, such as fire, smoke, police,
crowd, and distance to exit.
Then, via the BBN, the environmental information is
translated into the agent’s belief about the risk and evacuation time for each of the
alternative paths. This simulation has allowed us to observe agents’ behaviors and
evaluate various evacuation policies beforehand.
Figure 6.3: Emergency evacuation simulation in AnyLogic®
6.5.2 Experimental Results
In this section, we discuss various simulation results obtained from different sets
of parameters considered in the proposed hybrid learning model such as α (see Equation
(6.4.2)), αt, and γ (see Equation (6.4.1)). In particular, we compare the results obtained
from the BBN-RL hybrid method with those from the BBN method only. In order to
construct a Q matrix, we first create 500 instances of normal agents and one special type
of agent that updates the Q matrix. Each agent adjusts its CI according to Equation
(6.4.2), and only the special type of agent updates the Q matrix using Equation (6.4.1)
and the algorithm in Figure 6.1. The CI values have a high impact on the evacuation
performances (e.g., average evacuation time, the best evacuation path) as an agent’s
speed of movement and his planning horizon are dependent on them. Thus in this
research, we have selected CI as a performance index of an agent, which we intend to
maximize via Q-learning (see Section 6.4.4). The agents stay at intermediate destinations
for a random duration (1 ~ 30 seconds) and then move to another destination. Thus, the
number of agents on the street decreases from 500 (initial) and stabilizes at 200 by
simulation time 25. It is important to stabilize the number of agents on the street because
the prior distribution of the "Crowd" node in the BBN (see Figure 4.15) is based on the
normal situation. For this reason, we create the agent of the special type at time 25, when
the number of agents on the street has stabilized. As mentioned in Section 6.5.1, the
terrorist bomb attack also occurs at time 25. Thus, the special type of agent trains the Q
matrix under an emergency situation. To obtain a converged Q matrix, we had the special
type of agent update the Q matrix under various emergency evacuation situations during
200 replications of simulation runs. In each replication, we ran the simulation for 150
seconds, which is the time by which most of the agents have evacuated the area. It is
noted that the movement speed of an agent is set higher than the speed of a real human to
accelerate the simulation. Then, we normalized the matrix so that the values in each row (state)
sum to 1. Table 6.3 and Figure 6.4 depict the normalized action selection probability
distributions (normalized Q matrix) under different states (1 ~ 4) using α = 0.5, α_t = 0.7,
and γ = 0.5. In this case (α = 0.5, α_t = 0.7, and γ = 0.5), action V (the 'Do nothing' action)
has the lowest value for all four states, which means action V is expected to return the
least reward (increment of the CI). We have repeated this process of Q matrix training
with varying values of the parameters α, α_t, and γ between 0.3 and 0.9 with an increment
of 0.2. The effects of each parameter are analyzed below by comparing the simulation
results (i.e., CI) obtained with the different Q matrices.
Table 6.3: Normalized action selection probability distributions (Q matrix) under
different states (1 ~ 4) using α = 0.5, αt = 0.7, and γ = 0.5 (see Section 6.4.4 for details of
considered states and actions)

State    I      II     III    IV     V      VI     VII    VIII   IX
1        0.16   0.11   0.06   0.04   0.00   0.05   0.09   0.18   0.31
2        0.16   0.13   0.05   0.02   0.01   0.05   0.11   0.14   0.32
3        0.16   0.11   0.06   0.04   0.02   0.05   0.09   0.20   0.27
4        0.17   0.12   0.05   0.04   0.03   0.06   0.11   0.17   0.26
Figure 6.4: Normalized action selection probability distributions (Q matrix) under
different states using α = 0.5, αt = 0.7, and γ = 0.5
Once we obtained each Q matrix, we ran the simulation model letting the agents use the
Q matrix for the BBN adjustment (see Section 6.4.3). Each simulation with a different Q
matrix is replicated 100 times, and the evolution of the CI value is recorded every second.
To make the starting conditions equivalent, the initial CI value of the agent was set to 0.5.
In other words, CI_0 is set to 0.5, and CI_t is updated according to Equation (6.4.2)
thereafter. In addition to the effects of each parameter, we also compared the CI results
obtained from two different action selection policies: 1) the greedy policy and 2) the
softmax policy. The greedy policy is a special case of the ε-greedy policy with ε = 0;
thus, the agent selects the action that has the largest utility value at each
state. On the other hand, the softmax policy selects an action a_i in state s_j randomly
based on the following probability, where the numerator is the element in the jth row and
ith column of the considered Q matrix:

Pr(a_i \mid s_j) = \frac{Q(s_j, a_i)}{\sum_k Q(s_j, a_k)}.

Figures 6.6 to 6.13 depict the comparison of the effects of different parameters
and action selection policies. In particular, Figures 6.6, 6.7, and 6.8 depict the evolution
of the CI value over simulation time t for different γ and α values under the softmax and
greedy action selection policies, respectively. In these figures, we can notice that as α
increases, the variation of the CI also increases. Recall that α is the parameter used in
Equation (6.4.2), CI_t = α·e^{-d_t} + (1 - α)·CI_{t-1}, where d_t = \sum_i |m_i(t) - m_i(t-1)|.
Since α determines how strongly the prediction performance (d_t) is reflected in the CI, it
represents the human characteristic of how much a person is affected by his near-term
performance. For example, if one has a small α, he does not care much about his current
prediction performance and his confidence is not affected much by it. The variance of the
CI is much larger in the cases using the softmax action selection policy than in those using
the greedy selection policy, since the softmax policy itself introduces variance in the CI by
selecting actions at random.
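The two action selection policies compared here can be sketched as follows; both operate
on one row of the normalized Q matrix (the row for the current CI state), the class name is
illustrative, and the 'softmax' rule implemented is the proportional rule given by the
formula above.

import java.util.Random;

// Sketch of the greedy and softmax action selection policies over one row of the
// (normalized, non-negative) Q matrix.
public class ActionSelection {
    private static final Random RNG = new Random();

    /** Greedy (epsilon = 0): pick the action with the largest Q value in this state. */
    public static int greedy(double[] qRow) {
        int best = 0;
        for (int a = 1; a < qRow.length; a++) if (qRow[a] > qRow[best]) best = a;
        return best;
    }

    /** Softmax as used here: Pr(a_i | s_j) = Q(s_j, a_i) / sum_k Q(s_j, a_k). */
    public static int softmax(double[] qRow) {
        double sum = 0;
        for (double v : qRow) sum += v;
        double u = RNG.nextDouble() * sum;
        for (int a = 0; a < qRow.length; a++) {
            u -= qRow[a];
            if (u <= 0) return a;
        }
        return qRow.length - 1;
    }
}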
Figure 6.5: Evolution of CI using softmax selection policy for different γ for each α
(panels (a)-(e): α = 0.1, 0.3, 0.5, 0.7, 0.9)
Figure 6.6: Evolution of CI using greedy selection policy for different γ for each α
(panels (a)-(e): α = 0.1, 0.3, 0.5, 0.7, 0.9)
Figure 6.7: Evolution of CI using greedy selection policy for different α for each γ
(panels (a)-(e): γ = 0.1, 0.3, 0.5, 0.7, 0.9)
Figures 6.9 and 6.10 depict the evolution of CI values over simulation time t for different
α_t and γ values under the softmax action selection policy. Recall that α_t and γ are the
parameters used in Equation (6.4.1), Q(s_t, a_t) = (1 - α_t) Q(s_{t-1}, a_{t-1}) + α_t[R(s_t, a_t) +
γ·Max_a(Q(s_{t+1}, a))], where α_t and γ denote the effect of the immediate reward (R(s_t, a_t))
and the contribution (importance) of a future reward to the Q matrix, respectively. In the softmax action
selection policy, CI tends to increase as αt increases and γ decreases (see Figures 6.9 and
6.10). In other words, we can obtain the best CI value when we focus more on the instant
reward. This result reveals that it is best to focus more on the immediate reward when
the decision is made randomly (softmax action selection policy) under an inexperienced
situation.
Figure 6.8: Evolution of CI using softmax selection policy for different γ for each αt
(panels (a)-(e): αt = 0.1, 0.3, 0.5, 0.7, 0.9)
Figure 6.9: Evolution of CI using softmax selection policy for different αt for each γ
(panels (a)-(e): γ = 0.1, 0.3, 0.5, 0.7, 0.9)
Similarly, Figures 6.11 and 6.12 depict the evolution of CI over simulation time t for
different αt and γ values under the greedy action selection policy. In this case, no
significant trend of CI value was observed. For each αt value, the γ value that produces
the best CI value was different. For example, Figure 6.10 reveals that either γ=0.1 or 0.9
gives the best CI value when αt = 0.1, 0.5, or 0.7. However, when αt = 0.3 or 0.9, γ=0.3
produces the best CI value.
We can conclude that there is no dominating set of
parameters in the greedy selection policy.
Figure 6.10: Evolution of CI using greedy selection policy for different γ for each αt
(panels (a)-(e): αt = 0.1, 0.3, 0.5, 0.7, 0.9)
Figure 6.11 plots the evolution of the CI for different αt values for each γ value and
illustrates the effect of the γ value. It was observed that the difference between the CIs of
the different αt values is minimized when γ = 0.5 or 0.7. However, the CI was observed to
be very sensitive to the αt and γ values under the greedy action selection policy.
Figure 6.11: Evolution of CI using greedy selection policy for different αt for each γ
(panels (a)-(e): γ = 0.1, 0.3, 0.5, 0.7, 0.9)
Figure 6.12 depicts the CI values for different α when we did not adjust the BBN using
the proposed Q-Learning algorithm. As mentioned earlier, α determines the reflection of
a current prediction performance into the CI. Thus, as α decreases the variance of CI
decreases and CI value itself decreases slowly. As shown in Figure 6.12, the CI value
becomes about 2.5 for all the considered α values at time 125, whereas the CI values are
greater than 2.5 in most cases when the Q-Learning algorithm is implemented (see
Figures 6.6 ~ 6.12). It demonstrates that in an inexperienced situation, the proposed
BBN-RL hybrid model outperforms the BBN only model.
Figure 6.12: Evolution of CI without applying Q learning for each α
CHAPTER 7
HUMAN-IN-THE-LOOP EXPERIMENT AND VALIDATION
This section describes a crowd simulation model that mimics the above-described
emergency evacuation scenario (see Section 3.1), utilizes human-in-the-loop experiments
for human behavioral data collection, and facilitates testing the impact of several factors
(e.g., demographics of agents, number of police officers, information sharing via speakers)
on evacuation performance (e.g., average evacuation time, percentage of casualties). The
crowd simulation model has been developed based on the two-layer modeling principles
proposed by Hamgami and Hirata (2003): 1) modeling the agent and 2) modeling the
environment that agents interact with, such as paths and intersections. By employing
these two conceptual layers, we isolated the models of the environment and the agent,
which facilitated the modeling process. The interaction between the layers is analogous to
the interaction between humans and their surroundings in the real world. The agent makes
decisions based on perceptions of the environment and executes decisions to achieve its
intention in the environment.
7.1 Simulation Model Development
The environment (GIS information such as paths and intersections representing
the National Mall in Washington, D.C.) has been implemented in the AnyLogic® 6.0
agent-based simulation software. As discussed in Section 3.2, an agent plans and makes
decisions via the BBN, DFT, and PDFS techniques. For implementation purposes, we
employed various software packages: Netica for the BBN, JAMA (a Java matrix package)
for DFT, and Soar for PDFS. The fact that all of these software packages are Java-based
or have a Java interface facilitated their integration. Figure 7.1 depicts an exemplary rule
written in Tcl (a scripting language) for Soar.
sp {next-intersection*propose*intersectionanywhere
   (state <s> ^intersection <i> ^active.name <ain>)
   (<i> ^name <in>)
   (<i> ^near <ain>)
   (<i> ^pref <pf>)
   (<i> ^pref <ipf>)
   (<s> ^prefPremium.dist <fc> ^max <max>)
-->
   (<s> ^operator <o> = (+ <ipf>)+ )
   (<o> ^name <in>)
}
Figure 7.1: An exemplary rule written in Tcl in Soar
In our simulation, three types of agents are considered: 1) commuter, 2) novice,
and 3) police agents. Each type of agent behaves differently. Furthermore, commuter
and novice agents can be characterized as leader or follower agents. Some commuter
agents are defined as leaders, who lead the follower agents to the exits. When agents
with a low confidence index (follower agents) meet a leader agent, they start to follow
him. Figure 7.2 depicts the behaviors of each agent using state charts. A state chart
diagram is a generic way of representing the behavior of an agent in response to the
external events (e.g., an explosion) or internal events (e.g., achieving an intention). The
intention of an agent is modeled in the state chart diagram through a sequence of
transitions from one state to another. Every commuter and novice agent follows the state
chart on the left, and every police agent follows the state chart on the right. Agents’
individual behaviors are differentiated by parameters such as the confidence index and the
planning horizon and by randomness within the BBN and DFT subroutines. The number
of agents of each type in the simulation can be adjusted. As shown in Figure 7.2, when
the explosion occurs the commuter and novice agent’s state transitions from ‘Normal’ to
‘Abnormal’ and the police agent’s state changes to ‘Evacuate’. When the commuter and
novice agents transition into ‘Abnormal’ state, their underlying state becomes ‘Evacuate’,
‘Wounded’, or ‘Dead’ depending on their distance from the explosion. If a follower
agent who is in the ‘Evacuate’ state meets a leader agent, its state becomes ‘Follow’.
Figure 7.2: State charts for representing agent behaviors
Figure 7.3 depicts a snapshot of the AnyLogic® simulation, in which a bomb
explosion is shown in the middle of the map and agents are evacuating from the area.
When the simulation begins, it generates the requested number of agents for each type
and places them randomly within the simulating area heading towards their everyday
destinations (in the ‘Normal’ state). 15 seconds after the simulation starts, an explosion
occurs in the middle of the area. Based on distance from the explosion, agents within
‘fatal range’ (as marked with a circle (smaller circle at the center) in Figure 7.3) of the
explosion at that moment become dead. Similarly, the agents within ‘wound range’
become wounded, and agents within ‘notice range’ will notice the explosion and start to
evacuate. Smoke goes up and diffuses (as indicated by a gray circle in Figure 7.3) from
the explosion. In addition to the perception of sound and smoke, an agent can notice the
explosion via communication with other agents and police. When two agents reach a
certain proximity to each other, they can communicate and exchange information about
the explosion. When agents notice the explosion, they start to move faster and head
toward one of the four exits placed in the area. The constructed simulation allowed us to
observe agents' behaviors that mimic humans in the given scenario and, using it, to
evaluate various evacuation policies.
Figure 7.3: Emergency evacuation simulation in AnyLogic® interacting with BBN, DFT,
and Soar
7.2 Human Experiments in Virtual Reality Environment
As discussed in Sections 4.2 and 4.3, BBN infers M(t) and W(t), and DFT
calculates preference values of the options based on those matrices of evaluations and
weights. Thus, constructing an accurate BBN for a human is a critical task in the
development of a simulation that accurately mimics human behavior. To this end, we
conducted human experiments to extract the subjects' behaviors and translate those behaviors
into a BBN. In this research, we used the Cave Automatic Virtual Environment (CAVE)
to conduct human experiments in a 3D environment.
7.2.1 VR Model Development
"Immersiveness” means that the user’s point of view or some part of the user’s
body is contained within the computer-generated space of the VR environment.
Immersiveness allows us to observe quasi-real human response data in a very practical
way for a potentially life-threatening situation without actually putting humans at risk,
whereas 2D based experiments do not completely immerse the human subjects and would
result in relatively unrepresentative participation from them (Shendarkar et al. 2006).
However, it is noted that other methods allowing subjects to image the scene such as text
description, picture, or movie clips may substitute the CAVE system. The hardware
system used is FakeSpace Inc. CAVE simulator. Figures 7.4 and 7.5 depict a human-inthe-loop experiment in CAVE and the CAVE system, respectively. The 3D model
projected within the CAVE system is developed using Google SketchUp 3D modeling
software. The individual 3D images were collected from Google SketchUp component
library and Google 3D Warehouse. Figure 7.6 depicts snapshot of a virtual cityscape of
an intersection developed with the Google SketchUp 3D modeling software. The 3D model
from Google SketchUp is in the .skp file format, but it needs to be converted into the .vi
format that can be projected in the CAVE. In order to add dynamism to the static 3D model
from Google SketchUp, we adopted the OpenSceneGraph (OSG) C++ API. OSG is
middleware built on top of low-level APIs (e.g., OpenGL) to provide spatial organization
capabilities and other features typically required by high-performance 3D applications.
Using OSG, we could add smoke, fire effects, and moving objects such as crowds and
vehicles (see Appendix A). In addition to OSG, we also adopted the DIVERSE C++ API
to control input/output devices such as the keyboard and wand in the CAVE. These devices
are used for 1) receiving responses from subjects during the human-in-the-loop experiment
and 2) setting up and initializing the experiment (see Appendix A).
Figure 7.4: Human-in-the-loop experiment in the CAVE system
Figure 7.5: CAVE system having four screens used in the human experiment
Figure 7.6: An exemplary VR model developed using Google SketchUp
7.2.2 Human-in-the-loop Experiment and Validation
In the human-in-the-loop experiment, each subject is asked to evaluate the risk
and the evacuation time of 3 available paths (i.e., right, forward, and left) under various
environmental observations (i.e., fire, smoke, police, and crowd). They are also asked to
select one of the 3 given paths. As shown in Figure 7.6, the intersection has four paths
that could be chosen by the subject, but we present only three paths in the experiment
since the CAVE has only three wall displays. In this study, 6 subjects participated in the
experiment voluntarily. For each subject, 3 sets of experiments were conducted, where
each set of experiments involved 18 different intersections (situations). In each
experiment, the subjects are asked to answer four questions: 1) which attribute, evacuation
time or safety, they weight more heavily under the current situation, 2) their evaluation of
risk on each path (three paths in total), 3) their evaluation of evacuation time on each path,
and 4) the path that they choose to evacuate. Each set of experiments took about 10 to 15
minutes depending on the subject. The data collected on the relationship between the
situation and the subject's evaluation were used to construct a BBN (see Figure 4.15) in
the form of conditional probability distributions. Tables 7.1, 7.2, and 7.3 show the
conditional distribution tables of each subject for each of the three nodes in the BBN
('Risk', 'Time', and 'RiskWeight'), respectively. In other words, we built 6 BBNs, each
representing the perceptual (evaluation) behavior of one subject. It is noted that the
structure of the BBN is the same for all 6 subjects.
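The translation from recorded answers to a conditional distribution table is a simple frequency count; a minimal sketch of this step is shown below, with a few hypothetical response records standing in for the actual experimental data.

from collections import Counter, defaultdict

def build_cpt(records):
    """Estimate P(evaluation | situation) by counting a subject's recorded answers.

    records: iterable of (situation, evaluation) pairs, where situation is a tuple
    such as (smoke, fire, police, crowd) and evaluation is the subject's answer
    ('High', 'Medium', or 'Low').
    """
    counts = defaultdict(Counter)
    for situation, evaluation in records:
        counts[situation][evaluation] += 1
    return {situation: {level: n / sum(counter.values()) for level, n in counter.items()}
            for situation, counter in counts.items()}

# Hypothetical answers of one subject for the 'Risk' node
records = [
    (("Yes", "Yes", "Yes", "High"), "High"),
    (("Yes", "Yes", "Yes", "High"), "High"),
    (("Yes", "Yes", "Yes", "High"), "Medium"),
]
print(build_cpt(records))
# {('Yes', 'Yes', 'Yes', 'High'): {'High': 0.666..., 'Medium': 0.333...}}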
The EDFT decision-making model is validated by comparing the decisions made
by the EDFT model against the actual decisions of the human subjects. As described in
Section 4.2.5, EDFT provides the choice probability for each decision in the given
situation. To this end, we gave EDFT the same situations that were given to the subjects
and calculated the choice probabilities (see Table 7.4). In Table 7.4, the actual decisions of
each subject are shown in the left column, while the decisions predicted by the EDFT
(based on the BBN of each subject) are shown in the right column. From these comparisons,
we counted the number of experiments where the same path (decision) has the highest
probability in both columns. For subject 1, as an example, path 2 has a selection
probability of 0.6 from the actual human decisions, and it also has the highest probability
(1.0) in the simulation. In Table 7.4, we have marked those identical cases in gray.
Depending on the subject, the number of experiments that did not match between the
actual and model decisions varied from 1 to 6 (out of a total of 18 experiments). In other
words, the model predicted the actual decisions correctly about 67% to 94% of the time.
In Tables 7.1 and 7.3, we can observe different risk behaviors for different
subjects. For example, we can infer from Table 7.3 that subject 1 is risk-averse, as the
subject puts relatively more weight on risk under the same circumstances. Similarly,
subject 5 is risk-prone. Based on these observations, we examined whether the accuracy
of the EDFT model can be increased by categorizing the subjects based on their
risk-taking behaviors. To this end, we measured the weight given to risk by each subject
using the weighted mean values in Tables 7.1 and 7.3. The weighted
sum value in Table 7.1 can be calculated in the following manner. In each row, we
obtain a row summation (total value) in which the high, medium, and low columns are
multiplied by 5, 3, and 1, respectively. Then we sum all the row summations for each
subject (subject column). Similarly, we calculated weighted sum values for each row of
Table 7.3. In this case, however, we applied an additional weight when summing the rows,
multiplying the first, second, third, and fourth rows by 7, 5, 3, and 1, respectively, since
the first row places the subjects in the riskiest environment. Table 7.5 shows the weighted
sum values calculated from Tables 7.1 and 7.3 for each subject. Based on the total values
in Table 7.5, we separated the subjects into two groups: a risk-averse group (subjects 1, 3,
and 4) and a risk-prone group (subjects 2, 5, and 6). Then, we built two different EDFT
models by constructing a BBN for each group and compared the actual decisions of the
subjects with the decisions from the categorized EDFT models (see Tables 7.6 and 7.7).
In both categories, the simulations generated decisions different from the actual decisions
in 4 out of 18 cases. This is not better than the result in Table 7.8, where the simulation
model (based on the EDFT model considering all 6 subjects) generated 3 different
decisions out of 18. In order to compare them at the more detailed level of probability
distributions, we calculated $\sum_{i}\sum_{j}\left(f_{ij}^{Actual}-f_{ij}^{Simul}\right)^{2}$,
where $f_{ij}^{Actual}$ and $f_{ij}^{Simul}$ are the choice probabilities of path $j$ in
experiment $i$ in the actual responses and the simulated responses, respectively. This
value captures the difference between the probability distributions of the actual responses
and the simulated responses. The calculated values for the risk-averse group (subjects 1,
3, 4), the risk-prone group (subjects 2, 5, 6), and the entire group (all 6 subjects) are 3.77,
3.19, and 3.68, respectively. Once again, we observed no significant difference between
the categorized models and the combined model. Thus, according to the experiments
conducted in this research for the considered setting, it appears that the combined EDFT
model of all 6 subjects represents the decisions of the subjects well, and we have used it
in our emergency evacuation simulation.
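The two summary statistics used above can be computed directly from the tables; the sketch below shows the weighted-sum scores (column weights 5/3/1 and row weights 7/5/3/1) and the squared-difference measure between actual and simulated choice distributions. The small arrays are placeholders, not the experimental data.

def weighted_risk_sum(rows):
    """Table 7.1-style score: each (high, medium, low) row weighted 5, 3, 1 and summed."""
    return sum(5 * h + 3 * m + 1 * l for h, m, l in rows)

def weighted_riskweight_sum(rows, row_weights=(7, 5, 3, 1)):
    """Table 7.3-style score: rows additionally weighted 7, 5, 3, 1 (riskiest row first)."""
    return sum(w * (5 * h + 3 * m + 1 * l) for w, (h, m, l) in zip(row_weights, rows))

def distribution_difference(actual, simulated):
    """Sum over experiments i and paths j of (f_ij_actual - f_ij_simulated)^2."""
    return sum((a - s) ** 2
               for row_a, row_s in zip(actual, simulated)
               for a, s in zip(row_a, row_s))

# Placeholder choice probabilities for 2 experiments x 3 paths
actual = [[0.6, 0.4, 0.0], [0.0, 1.0, 0.0]]
simulated = [[1.0, 0.0, 0.0], [0.0, 0.9, 0.1]]
print(distribution_difference(actual, simulated))   # 0.34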
Table 7.1: The conditional distribution table for the 'Risk' node in the BBN collected from the 6 subjects. For each observed environment (Smoke: Yes/No; Fire: Yes/No; Police: Yes/No; Crowd: High/Medium/Low), the table lists each subject's conditional probabilities of evaluating the risk as High, Medium, or Low.
Table 7.2: The conditional distribution table for the 'Time' node in the BBN collected from the 6 subjects. For each observed environment (Police: Yes/No; Crowd: High/Medium/Low; Distance: Increase/Decrease), the table lists each subject's conditional probabilities of evaluating the evacuation time as Short, Medium, or Long.
Table 7.3: The conditional distribution table for the 'RiskWeight' node in the BBN collected from the 6 subjects. For each observed environment (Fire: Yes/No; Smoke: Yes/No), the table lists each subject's conditional probabilities of placing a High, Medium, or Low weight on risk.
Table 7.4: Comparison of decisions made by each subject and the EDFT model. For each subject and each of the 18 experimental situations, the table lists the actual choice probabilities of Path 1, Path 2, and Path 3 observed from the subject and the choice probabilities predicted by the EDFT model based on that subject's BBN.
Table 7.5: Weighted mean value of Risk (Table 7.1) and RiskWeight (Table 7.3) for each subject

                          Subject 1   Subject 2   Subject 3   Subject 4   Subject 5   Subject 6
Risk (Table 7.1)              99.25       76.22       80.00       84.56       83.67       73.11
RiskWeight (Table 7.3)        70.97       55.74       64.81       59.30       41.61       56.93
Total                        170.22      131.96      144.81      143.85      125.28      130.04
Table 7.6: Comparison of decisions accumulating subjects 1, 3, and 4 (risk-averse) and the EDFT model using the accumulated BBN. For each of the 18 experimental situations, the table lists the accumulated actual choice probabilities of Path 1, Path 2, and Path 3 and the choice probabilities predicted by the EDFT model.
Table 7.7: Comparison of decisions accumulating subjects 2, 5, and 6 (risk-prone) and the EDFT model using the accumulated BBN. For each of the 18 experimental situations, the table lists the accumulated actual choice probabilities of Path 1, Path 2, and Path 3 and the choice probabilities predicted by the EDFT model.
Table 7.8: Comparison of decisions accumulating all 6 subjects and the EDFT model using the accumulated BBN. For each of the 18 experimental situations, the table lists the accumulated actual choice probabilities of Path 1, Path 2, and Path 3 and the choice probabilities predicted by the EDFT model.
7.3 Emergency Evacuation Simulation Results
Using the crowd simulation model that we constructed in AnyLogic®, we
conducted various experiments to test the impacts of several factors (e.g., demographics,
number of police officers, number of leader agents) on evacuation performance (e.g.,
average evacuation time). Figures 7.7 and 7.8 depict the impacts of the number of police
officers and the number of leaders, respectively. In Figure 7.7, the number of police
officers increases from 10 to 100 with a step size of 10. In each simulation, 20
replications were conducted with 400 commuter agents and 100 novice agents. The
average evacuation time of the novice agents (travelers) was observed to decrease as the
number of police officers increases. Reducing the average evacuation time by 1 minute in
an emergency situation can save many lives. The average evacuation time of the commuter
agents, however, does not decrease as much as that of the novice agents. This is because
information from the police officers provides the travelers with new knowledge that they
did not have before, whereas the commuters use it only to complement or correct their
existing knowledge and judgment.
Figure 7.7: Impact of the number of police officers on the average evacuation time and 95% confidence interval
Similarly, Figure 7.8 depicts the impact of the number of leaders (increased from
10 to 200 with a step size of 10) on the average evacuation time. Each simulation,
involving 400 commuter agents and 100 novice agents, was replicated 20 times. In order
to eliminate the police effect in this experiment, agents are configured to notice the
explosion right after it occurs and to start evacuating. This results in shorter evacuation
times compared with the results in Figure 7.7. The results in Figure 7.8 reveal that an
increase in the number of leaders from 10 to 200 reduces the agents' evacuation time by
an average of about 30 seconds, which is critical in an emergency situation. In addition,
the results are consistent with our intuition that the commuters depend less on the leaders
than the travelers do, since commuters usually have a higher confidence index than the
travelers and agents act as followers only when their confidence index is low. It is noted
that the simulation is flexible, so it can be used to test the impacts of other factors (e.g.,
information-sharing via speakers or text messaging) on various other security metrics
(e.g., percentage of casualties).
Figure 7.8: Impact of the number of leaders on the average evacuation time and 95% confidence interval
Figure 7.9 shows the impact of the Q-learning adjustment on the evacuation time
of the agents. In the figure, we compare the average evacuation times of the agents with
the Q-learning adjustment and the agents without it. As shown in the graph, the average
evacuation time of the agents with the Q-learning adjustment is always shorter than that
of the agents without the adjustment, regardless of the agent population.
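As a reminder of the mechanism behind this adjustment, the sketch below shows a standard tabular Q-learning update applied to an agent's exit choice; the state representation, the reward (negative travel time), and the learning parameters are assumptions for illustration, not the exact formulation used in this work.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1     # assumed learning rate, discount, exploration
EXITS = ["exit_1", "exit_2", "exit_3", "exit_4"]

q_table = defaultdict(float)              # maps (state, exit) -> estimated value

def choose_exit(state):
    """Epsilon-greedy selection among the four exits."""
    if random.random() < EPSILON:
        return random.choice(EXITS)
    return max(EXITS, key=lambda e: q_table[(state, e)])

def q_update(state, exit_chosen, reward, next_state):
    """Standard Q-learning update; here the reward could be the negative travel time."""
    best_next = max(q_table[(next_state, e)] for e in EXITS)
    q_table[(state, exit_chosen)] += ALPHA * (reward + GAMMA * best_next
                                              - q_table[(state, exit_chosen)])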
Figure 7.9: Impact of Q-Learning on the average evacuation time and 95% confidence
interval
CHAPTER 8
DISTRIBUTED COMPUTING TECHNIQUES FOR LARGE SCALE SIMULATION
8.1 Distributed Simulation Infrastructure
In this work, web services technology is used for the integration of BDI modules
operating in the distributed computing environment (see Figure 8.1). The components of
the system are (1) the BDI modules, (2) their corresponding client proxies, and (3) the
transaction coordinator. The BDI modules communicate with each other using XML-based
messages via services provided by the transaction coordinator (a web service). A client
proxy provides the interface that a BDI module uses to communicate with the transaction
coordinator. The transaction coordinator is responsible for performing the time and data
management required for distributed simulation. In this work, we use web services
technology (see Figure 8.1) instead of the standard HLA/RTI (High Level
Architecture/Run Time Infrastructure, IEEE Standard 1516-2000). While commercial
HLA/RTI software systems are reliable, the proposed transaction coordinator offers a
simplified set of services that makes it much easier to use than the HLA/RTI, and it
requires less communication overhead.
Figure 8.1: Architecture for distributed simulation
8.2 Web Services Technology for Distributed Simulations
The Transaction Coordinator is the main component of the software infrastructure
(see Figure 8.1). Its essential function is to facilitate time and data management. The
Transaction Coordinator has been implemented using web services technology to enable
standard communication between heterogeneous distributed systems via W3C
(http://www.w3c.org) standard protocols, including XML, WSDL, and SOAP. Federates
use the web services developed in this work (initialise, advanceTime, cons_advanceTime,
sendMessage, getMessage, terminate, and cleanup) in order to achieve the desired
simulation outcome. These services, along with their parameters, are described below.
• initialise(fedName)
A federation execution consists of multiple federates communicating with each other. Any federate that wants to become part of a federation execution can do so by calling this method. Here, fedName is the identifier of the federate joining the federation execution. The WSDL file for this service is shown in Figure 8.2.
• advanceTime(reqFedName, timeVal)
Since the Transaction Coordinator is the central authority for managing the synchronization between the various federates, if a federate wants to move to a specific time, it must request the required time advance from the Transaction Coordinator. In this method, the Transaction Coordinator follows the FDE algorithm and grants the time advance accordingly. Here, reqFedName is the identifier of the federate (the one with which it joined the federation execution), and timeVal is the time to which the requesting federate wants to jump. The WSDL file for this service is shown in Figure 8.3.
• cons_advanceTime(reqFedName, timeVal)
This method is used when a federate wants to follow the conservative algorithm for time advancement. This method also enables us to compare the results of the conservative method against the FDE approach. The WSDL description for this method is the same as that for the advanceTime method.
• sendMessage(fedName, msg)
This method is used when a federate wants to send a message to all federates participating in the federation execution. The contents of the message sent by a federate are not interpreted by the Transaction Coordinator; doing so would limit the Transaction Coordinator to a particular type of simulation. Hence, it is up to the participating federates to interpret a message and take the appropriate action. Here, fedName is the identifier of the federate sending the message, and msg is the actual message body to be delivered to the other federates. The WSDL file for this service is shown in Figure 8.4.
• getMessage(requestingFedName) returns message
Each federate uses this method to retrieve the messages queued for it. If there are no messages in the queue maintained for this federate, the requesting federate is informed accordingly. If there is more than one message for the requesting federate, all the messages in the queue are delivered. Here, requestingFedName is the identifier of the federate interested in receiving the messages queued for it. The WSDL file for this service is shown in Figure 8.5.
• terminate(requestingFedName)
When a federate no longer wants to be part of the federation execution, it can request so by calling this method. All the resources allocated for the requesting federate are de-allocated. Here, requestingFedName is the identifier of the federate that wants to withdraw from the federation execution.
• cleanup()
This method is used to destroy the federation execution and to perform cleanup activities. It also sets up and initializes the data structures required for the next execution of the Transaction Coordinator.
Figure 8.2: WSDL snippet for initialise(fedName)
Figure 8.3: WSDL snippet for advanceTime(reqFedName , timeVal)
Figure 8.4: WSDL snippet for sendMessage(fedName , msg)
<s:element name="getMessage">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1"
name="requestingFedName" type="s:string" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="getMessageResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1"
name="getMessageResult" type="s:string" />
</s:sequence>
</s:complexType>
</s:element>
Figure 8.5: WSDL snippet for getMessage( requestingFedName ) returns message
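To illustrate how a federate's client proxy might exercise these services, the sketch below calls the operations through a generic SOAP client. The use of the Python zeep library, the placeholder endpoint URL, and the example message format are assumptions made for this illustration; the actual client proxies in this work are the ones generated for the simulation federates.

from zeep import Client   # generic SOAP client used here only for illustration

WSDL_URL = "http://localhost:8080/TransactionCoordinator?WSDL"   # placeholder endpoint
FED_NAME = "BDI_Federate_1"                                      # hypothetical federate name

coordinator = Client(WSDL_URL).service

coordinator.initialise(FED_NAME)                            # join the federation execution
coordinator.sendMessage(FED_NAME, "<belief>...</belief>")   # broadcast an XML message
coordinator.advanceTime(FED_NAME, 15.0)                     # request a time advance (FDE algorithm)

queued = coordinator.getMessage(FED_NAME)                   # retrieve all messages queued for us
if queued:
    pass   # interpretation of the message contents is left to the federate

coordinator.terminate(FED_NAME)                             # withdraw from the federation execution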
CHAPTER 9
EXTENSION OF PROPOSED APPROACHES TO OTHER APPLICATIONS
The goal of this chapter is to extend the proposed human decision-behavior
modeling approaches to other applications, in particular the community-based software
development process (Lee et al. 2009). A community-based development of software
systems requires the collaboration of many stakeholders that form a complex social
network. In this application, we propose a novel simulation modeling framework that
allows the stakeholders to perform what-if analyses before making their decisions in the
community-based software development process. The proposed framework involves four
different modeling paradigms: 1) Bayesian belief network, 2) Belief-Desire-Intention
(BDI) model to mimic human decision behavior, where submodules are realized using a
Bayesian belief network and decision-field theory, 3) game theory to mimic the
interactions of decision-makers with conflicting goals, and 4) system dynamics to simulate
the overall software enhancement request process. The proposed simulation framework is
illustrated with the software enhancement request process in Kuali, an open source project
currently under development by a consortium of nine major universities. In particular,
four different simulation models (targeted at different stakeholders) are developed based
on the proposed modeling framework, where some data have been collected via surveys
from the Kuali participants and from historical enhancement requests, and other data are
based on our assumptions. Each of the constructed simulations is executed with varying
parameters to demonstrate its use and benefit in the considered software enhancement
process.
9.1 Background
A community-based development of software systems requires the collaboration
of many stakeholders (e.g., upper level managers, functional team, development team,
and testing team) that form a complex social network. Here, decisions made by a
stakeholder upstream directly affect other stakeholders downstream or vice versa. For
example, when the functional team decides to add a new module into an existing system,
the development team will need to implement it after evaluating its impact on the system
reliability, required efforts, and cost. If the change is deemed inappropriate due to lack of
resources or highly negative impact on reliability of the existing system, the development
team will go back to the functional team for discussions or negotiations. Usually, this
backward interaction is very costly and is caused by ineffective and insufficient
communications among the stakeholders. This communication problem becomes even
more apparent for the case of open source software development as the stakeholders are
geographically dispersed across the globe.
Another common problem in complex
software development is lack of understanding of the impacts of a prior decision by one
stakeholder on the other stakeholders. To overcome these problems, the goal of this
research is to propose a novel integrated simulation modeling framework, which will
allow stakeholders to perform what-if analyses before making their decisions in the
community-based software development process.
By evaluating the impact of a stakeholder's decision on the other stakeholders
downstream before making his/her decision, the chances of backward interactions can be
significantly reduced, thereby reducing the development cost of a software system.
The proposed framework involves four different modeling paradigms, including 1)
Bayesian belief network, 2) Belief-Desire-Intention (BDI) model to mimic a human
decision behavior (Lee et al. 2008), where submodules are realized using Bayesian belief
network and decision-field theory (Busemeyer and Townsend 1993), 3) game theory to
mimic the interactions of decision-makers with conflicting goals, and 4) system dynamics
to simulate the overall software enhancement request process. Multiple modeling
paradigms are employed in this research because we selected the most appropriate
modeling paradigm for each considered problem (e.g., type of data, availability of relevant
knowledge in the literature). The first three paradigms of the proposed framework
represent human cognitive functions such as situation assessment, decision-making, and
negotiation, respectively. The last paradigm (system dynamics) mimics the entire
software enhancement request process. To demonstrate the validity and enhance the
credibility of our work, the proposed simulation framework is illustrated with a software
enhancement request process in Kuali, which is an open source project currently under
development by a consortium of nine major universities (http://www.kuali.org). In
particular, four different simulation models (targeted at different stakeholders and
purposes) are developed based on the proposed modeling framework, including 1) an
evaluation aid for the Development Manager (see Section 9.3.1), 2) a Functional Council
decision simulator (see Section 9.3.2), 3) a scheduling aid for the Functional Council (see
Section 9.3.3), and 4) a simulation of the entire enhancement process (see Section 9.3.4).
Some of the data used to construct these simulators have been collected via surveys from
the Kuali participants and from historical enhancement requests, and other data are based
on our assumptions. While the proposed framework is illustrated and demonstrated in this
work for the enhancement request process of Kuali, it is believed to be directly applicable
to other processes in Kuali as well as to other community-based software development
organizations.
9.2 Enterprise Software Development Process: Case of Kuali
9.2.1 Kuali Foundation and its Organization Structure
The Kuali Foundation (see Figure 9.1 for Kuali's organizational structure) is a
non-profit organization involving nine major universities, including Indiana University
and The University of Arizona, and is responsible for developing and maintaining a
collection of open-source software systems that meet the needs of all Carnegie Class
institutions (www.kuali.org). It began with a financial systems module based on a
conversion of the Financial Information System, an application used by Indiana
University over the last decade. Since then, Kuali has diversified its software into other
areas such as research administration, endowment management, student systems, and
Rice (a suite of integrated products that allows for the rapid development of Kuali and
non-Kuali applications). In this research, we focus on the financial module of Kuali, the
Kuali Financial System (KFS).
Figure 9.1: Kuali organization chart (Source: http://www.kuali.org/)
KFS is a modular and flexible system that was developed using the community
source development model to provide and maintain a richly featured financial system for
use by its member institutions.
The financial system modules meet Governmental
Accounting Standards Board (GASB) and Financial Accounting Standards Board (FASB)
standards and may be adopted by institutions without any licensing fees
(http://www.kuali.org/resources/kfs10.shtml).
9.2.2 Enhancement Request Process
Within KFS, the proposed simulation framework in this research focuses
specifically on the enhancement request process. While the proposed framework is
illustrated and demonstrated for the enhancement request process in this research, it is
directly applicable to other processes in community-based software development.
Figure 9.2 depicts the detailed enhancement request process, through which partner
institutions request to have changes implemented in their modules. The enhancement
request process starts when a participating institution's Subject Matter Expert (SME) fills
out the required information on the enhancement request form, such as 1) the primary
component involved (a KFS module in this case), 2) the priority (high, medium, or low),
3) the business need, 4) the impact if not implemented, and 5) questions to be answered by
the Development Manager, and submits it to the Development Manager. This is often done
with the input of SMEs at other institutions and a Business Analyst of a Functional
Subcommittee (which is typically made up of a Business Analyst, a Lead Subject Matter
Expert, and a group of other advising subject matter experts). The Development Manager
then fills out the rest of the information on the enhancement request form, such as 1) the
estimated effort required, 2) the technical impact (required technical challenge), and 3)
the system impact (expected functional improvement and impact on other functional
modules in the system), and submits it to the Functional Council (FC), which then makes
the final decision regarding the request. If the FC approves the request, the FC meets with
the Project Manager in order to schedule it, and the Project Manager then takes this
decision to the Development Manager, who adds the request to the processing queue. If
the FC rejects it, the SME from the partner institution may choose to raise it to the board
representative. In this case, after reviewing the appeal, the board may affirm or override
the FC's decision.
Figure 9.2: Sequence diagram for the enhancement request process
9.3 Integrated Simulation Framework involving Multi-Paradigm Simulations
This section discusses our integrated simulation framework, which allows the
stakeholders to perform what-if analyses before making their decisions in the software
enhancement request process. As mentioned earlier, the proposed framework involves four
different modeling paradigms: 1) Bayesian belief network (BBN), 2) Belief-Desire-Intention
(BDI) model to mimic human decision behavior, 3) game theory to mimic the interactions
of decision-makers with conflicting goals, and 4) system dynamics to simulate the overall
software enhancement request process. Table 9.1 summarizes the four different types of
simulations considered in this study for the enhancement request process in Kuali in terms
of 1) user, 2) role, 3) results, and 4) simulation modeling methodology. The BBN, the first
modeling paradigm, mimics the perception of stakeholders. The second paradigm (the BDI
human decision-making model) mimics the perception, decision deliberation, and
decision-making functions of stakeholders. Third, game theory allows us to mimic the
interactions between two conflicting stakeholders. Finally, the system dynamics model
allows us to model the overall software enhancement request process. Each of these
simulations is discussed in detail in the following subsections.
Table 9.1: Four different types of simulations considered in this work for the case of Kuali

Evaluation aid for DM
   User of simulation: DM / SME
   Purpose of simulation: Help the DM evaluate the effort and impact of a request in a consistent manner
   Result of simulation: Estimated effort and impact of a request
   Simulation modeling methodology: Bayesian belief network

FC decision simulator
   User of simulation: SME
   Purpose of simulation: Help the SME evaluate the chance of acceptance of his enhancement request
   Result of simulation: Acceptance or rejection of the request
   Simulation modeling methodology: BDI human decision-making model

Scheduling aid for FC
   User of simulation: FC
   Purpose of simulation: Help the FC devise an enhancement schedule considering resource availability
   Result of simulation: Enhancement schedule
   Simulation modeling methodology: Game theory

Simulation of entire enhancement process
   User of simulation: Board
   Purpose of simulation: Help the board analyze system-level performance (e.g., evaluating alternative communication networks, mid-term or long-term policies)
   Result of simulation: Various system performance measures in the enhancement request process (quality of software, period, efforts, and cost)
   Simulation modeling methodology: System dynamics model
9.3.1 Evaluation Aid for Development Manager
When the Development Manager (DM) evaluates different enhancement requests
(in terms of their required effort, technical impact, and system impact) submitted by the
SME and Functional Subcommittee (FS), his evaluations can easily become inconsistent,
even for similar requests. The purpose of the 'Evaluation Aid for DM' is to help the DM
evaluate the required effort and impacts of the enhancement requests in a consistent
manner. Figure 9.3 depicts the sequence diagram of the Evaluation Aid process. In this
work, we propose to use a BBN to represent the historical evaluation behaviors of the DM.
Figure 9.4 depicts the BBN constructed for the 'Evaluation Aid for DM' in this work. In
order to construct a BBN, we first need to identify the causal factors (parent nodes) and
effects (child nodes) involved in the evaluation process. Then, we need to collect data that
can define the relationships (conditional dependencies) between the identified causes and
effects. In the area of human perception modeling, interviews and surveys are frequently
used for data collection. Given environmental information such as 1) priority, 2) code
lines, 3) similar process, and 4) related module, the BBN infers the DM's evaluation of
the required effort, technical impact, and system impact of the considered enhancement
request. It is noted that the BBN in Figure 9.4 has been constructed based on our
assumptions, but it will be updated when real data become available in the future. As
indicated in Table 9.1, a similar BBN can be constructed for each DM to support
consistent evaluations across different requests.
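For illustration, a reduced version of this lookup is sketched below. It conditions 'Effort' on only two of the parent nodes and uses hypothetical probabilities (the first row reuses the values reported in Section 9.4.1); the real tables would be learned from the DM's past evaluations, and unobserved parents would be marginalized out by the BBN engine.

# Hypothetical CPT for the 'Effort' node, reduced to two parents for brevity.
EFFORT_CPT = {
    ("Medium", "Yes"): {"High": 0.41, "Medium": 0.309, "Low": 0.281},
    ("High", "No"):    {"High": 0.70, "Medium": 0.20,  "Low": 0.10},   # placeholder numbers
}

def infer_effort(priority, similar_process):
    """With all parents observed, BBN inference for a child node reduces to a CPT lookup."""
    return EFFORT_CPT[(priority, similar_process)]

print(infer_effort("Medium", "Yes"))
# {'High': 0.41, 'Medium': 0.309, 'Low': 0.281}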
Figure 9.3: Sequence diagram of the Evaluation Aid process
Figure 9.4: BBN inferring DM’s evaluation on the required effort and impact of an
enhancement request
9.3.2 Functional Council (FC) Decision Simulator
As mentioned in Section 9.2, if an enhancement request is rejected by the FC, the
SME and FS may raise it to the board or modify and re-submit it. These processes,
however, usually cause additional workload for many stakeholders, such as the DM, FC,
and board. Thus, we propose to develop the 'FC Decision Simulator', which predicts the
decision of the FC for a given enhancement request. Its purpose is to allow the SME and
FS to modify their enhancement request in a way that maximizes the chance of its
acceptance. Figure 9.5 depicts a sequence diagram of the use of the FC Decision Simulator.
In this work, we have employed the BDI-based human decision-making model (Lee and
Son 2008, Lee et al. 2008) to construct the 'FC Decision Simulator'. Its technical details
are discussed in the following subsections.
Figure 9.5: Sequence diagram of the use of the FC Decision Simulator
A BBN is used to represent the perceptual processor of the employed human
behavior model (see Figure 3.2). Figure 9.6 depicts the BBN used to mimic the perceptual
processor of an FC member. Given environmental information such as 1) the required
effort estimated by the DM, 2) the primary component, 3) the system impact estimated by
the DM, 4) the statement, and 5) the business needs, the BBN infers the perception of the
FC member in terms of the effort, the impact of acceptance, and the impact of rejection of
the enhancement request. It is noted that in Figure 9.6 the nodes 'Effort_in_Request',
'PrimaryComponent', 'SystemImpact', 'Statement', 'BusinessNeeds', and 'ImpactOfReject'
have been selected from the items in the enhancement request form based on surveys of
the FC members of Kuali (which asked them to select the important factors they consider
when they accept/reject requests), whereas 'Subjective_Effort' and 'Subj_ImpactOfAccept'
have been constructed based on our assumptions.
Figure 9.6: BBN inferring FC’s evaluation of the enhancement request
Once the BBN infers 1) the effort, 2) the impact of acceptance, and 3) the impact of
rejection of the enhancement request, Decision Field Theory (DFT) predicts the FC's
acceptance/rejection decision. The value matrix M(t) of DFT (an m×n matrix, where m is
the number of options and n is the number of attributes) represents the subjective
evaluations (perceptions) of a decision-maker for each option on each attribute at time t.
For example, given objective information (e.g., the effort estimated by the DM, the
primary component, the impact of acceptance estimated by the DM, the statement, and the
business needs), the evaluators (Functional Council members in our case) obtain their own
subjective evaluations (e.g., subjective effort, subjective impact of acceptance, and impact
of rejection) for each option (i.e., accepting or declining an enhancement request), which
constitute the M(t) matrix.
9.3.3 Scheduling Aid for Functional Council
As shown in Figure 9.2, the FC and the Project Manager (PM) collaborate to devise
mutually agreed schedules for the accepted enhancement requests, where conflicting
positions may arise from the FC's urge to maximize the number of accepted enhancements
(and the enhancement throughput) and the PM's more conservative approach aimed at
staying within the available resources and budget. In other words, the FC intends to finish
an enhancement request quickly (minimizing the completion time of a request), while the
PM tries to secure enough time until the completion of the request. This conflict between
the FC and PM is considered in the 'Scheduling Aid Simulation for FC' and modeled in a
game theoretic setting. Figure 9.7 depicts a sequence diagram of the Schedule Aid process,
where the simulation provides an FC member with a 95% confidence interval of the
completion dates of a request on the PM's schedule. The details of the game theoretic
model are explained below.
Figure 9.7: Sequence diagram of the Schedule Aid process
The proposed game theoretic setting consists of two players: FC and PM. Each
player has his own payoff that depends on the other player’s suggestion in terms of the
duration of implementing the enhancement request. The payoff functions of FC and PM,
which we have developed in our research based on our assumptions about their general
decision behaviors, are shown in Equations (9.3.1) and (9.3.2), respectively. These
payoff functions can mimic the decision conflicts between FC and PM, which were
discussed in the previous paragraph.
$$\varphi_1 = \text{revenue} + \text{reputation (gain or cost)} - \text{impactConstant} \times \text{penaltyCost} = A + f(x) - \frac{B\alpha\,(y - \ln x)}{y} \qquad (9.3.1)$$

$$\varphi_2 = \text{currentBudget} - \text{realizationCost} - \text{dailyFixedCost} - \text{impactConstant} \times \text{penaltyCost} = C - g(y) - Dy - \frac{B\beta\,(x - \ln y)}{x} \qquad (9.3.2)$$
In Equations (9.3.1) and (9.3.2), x and y are the durations of implementing an
enhancement request (in days) suggested by the FC and the PM, respectively, and A, B, C,
and D are the revenue, impact constant, current budget, and daily fixed cost, respectively.
The impact of accepting a request on the existing system is incorporated into
impactConstant (i.e., B). The value of impactConstant indicates how important the request
is and increases as the importance of the request increases. Thus, as shown in Equations
(9.3.1) and (9.3.2), the payoffs of both the FC and the PM decrease faster for requests with
a higher impact. In Equations (9.3.1) and (9.3.2), the penalty costs of the FC and the PM
represent the additional effort caused by a deviation between their suggested durations.
This deviation is considered a penalty since it would require extra meetings and time to
resolve the conflict and would delay the implementation of the request. The penalty
constants (α and β) characterize the individual behavior of a player toward the conflict
with the other player: as they increase, the players take the conflict more into account in
their decisions. Thus, by adjusting the penalty constants, we can represent various
relationships between the players, such as competitive and collaborative relationships. In
Section 9.4.3, we will demonstrate the effect of the penalty constants using an example.
In Equation (9.3.1), the revenue (i.e., A) is assumed to be zero in this study since Kuali is
a non-profit organization. The reputation (gain or cost) of the FC is captured in the
function f(x), whose overall behavior for both the linear and nonlinear cases is depicted in
Figure 9.8(a). As shown in Figure 9.8(a), if an enhancement request is accomplished
within a certain amount of time, it helps the FC build a positive reputation (gain). On the
other hand, if the enhancement request is accomplished after that point, it results in a
negative reputation (cost) for the FC.
Figure 9.8: Behavior of f(x) and g(y) in Equations (9.3.1) and (9.3.2): (a) non-linear vs. linear reputation (gain or cost) f(x); (b) realization cost g(y)
In Equation (9.3.2), the realization cost of an enhancement request is encapsulated
in the function g(y), whose overall behavior for both the linear and nonlinear cases is
depicted in Figure 9.8(b). As shown in Figure 9.8(b), if an enhancement request is required
to be fulfilled within a shorter time period, it may result in overtime work or outsourcing,
and therefore the corresponding realization cost increases substantially. On the other hand,
as the PM pushes the completion date of the job further out, the cost becomes lower. In
addition to the variable realization cost of enhancement requests, there is a fixed cost
associated with each request regardless of its completion time. This cost is captured in the
term Dy in Equation (9.3.2), where D is the dailyFixedCost.
Given the two payoff functions discussed above (see Equations (9.3.1) and (9.3.2)),
a multi-iteration dynamic leader-follower game is employed in this work. In the game,
each player tries to find a solution (the duration of the request implementation: x for the
FC and y for the PM) that maximizes his payoff function given the duration suggested by
the other player. Initially, a user of the simulation (a real FC member) determines the
impactConstant (i.e., B) of the request, the penaltyConstant representing his behavior
toward the conflict with the PM (i.e., α), and the most recent currentBudget that the PM
has reported to the FC. Then the behaviors of the PM toward the conflict with the FC
(involving varying β values) are simulated in this game theoretic setting. Each iteration
(round of play) of the game is comprised of two stages. The goal of stage 1 is to simulate
the most probable response of the PM given a duration suggested by the FC. To this end,
the FC becomes the leader and a simulated PM becomes the follower, where the game
starts with an input (i.e., x) suggested by the FC. Given the initial x value, an optimum y
value (maximizing φ2) is found. Only at the beginning of the very first iteration does a
real FC submit an x value; otherwise, the x value is obtained from stage 2 of the previous
iteration. At stage 2, the follower of stage 1 (the PM) becomes the leader, and the
simulated FC obtains his response (i.e., x) based on the PM's decision (i.e., y) in stage 1.
Here, the goal is to find the most beneficial duration x that the simulated FC should
suggest to the PM in the next iteration (round of play) to maximize φ1. At the end of each
iteration (two stages), the simulation obtains 1) how the PM would react to the simulated
FC's previous suggestion and 2) the most beneficial duration x to suggest in the next
iteration according to Equation (9.3.1) (the simulation of the FC). The output of the
previous iteration becomes the input of the very next iteration: the new iteration starts with
the most preferable duration x that should be suggested by the FC (determined in the
previous iteration given the PM's reaction to the FC's earlier suggestion). These sequential
iterations continue, with sacrifices made by each party (simulated FC vs. simulated PM),
until an equilibrium point is reached. This equilibrium point of the game can be either an
agreement point, where x becomes equal or close to y, or a point where both x and y
become stationary. In this study, it is assumed that an enhancement request is valid for
only six months (i.e., it should be completed within this time period). In addition, both
payoff functions are defined as continuous concave functions whose solution sets for the
decision variables x and y are closed and bounded. Therefore, at each stage there is always
an optimal solution according to the Weierstrass theorem (Jones 2000). The simulation of
the game runs for 100 replications, where the β value varies across replications. As a
result, the simulation results are provided to the real FC as a 95% confidence interval of
the completion dates of a request on the PM's schedule.
Figure 9.9 depicts exemplary interactions (negotiations) between an FC (a real FC in
iteration 1 and a simulated FC in iterations 2-5) and a simulated PM over multiple
iterations, where the agreed duration of implementation converges to 22 days.
Figure 9.9: Exemplary iterations (negotiation process) between the FC (real FC in iteration 1 and simulated FC in iterations 2-5) and the simulated PM
9.3.4 Simulation of Entire Enhancement Request Process
While the previous three simulators (the evaluation aid for the DM, the FC decision
simulator, and the scheduling aid for the FC) focus on specific parts of the enhancement
request process shown in Figure 9.2, the 'Simulation of Entire Enhancement Process'
covers the entire process. It allows the board to perform system-level analyses, such as
evaluating alternative communication networks and evaluating mid-term or long-term
policies. As discussed earlier, the entire enhancement request process requires the
collaboration of many stakeholders, whose decisions affect one another. A comprehensive
list of possible impacts of different stakeholders' decisions is given in Table 9.2.
Table 9.2: Impact of a stakeholder's decision on other stakeholders. For each stakeholder (member institution, Functional Subcommittee, Development Manager, Functional Council, Project Manager, and board) and decision (e.g., raise a request to the board, reject a request, add estimates, accept, reject, or set a scheduled date), the table lists the affected stakeholders and the possible impacts (e.g., conflict between the Functional Council and the board, limited or improved system functionality, increased developer workload or overtime, effects on enhancement scheduling and resource needs, and reduced confidence in decision-making).
In this work, we propose to employ the system dynamics modeling paradigm to
construct the 'Simulation of Entire Enhancement Process'. Figure 9.10 is a causal loop
diagram that we have developed for the enhancement request process based on the
analysis summarized in Table 9.2, where each node is a variable and each link denotes a
relationship between nodes. A solid line represents a positive relationship (e.g., an increase
in node A results in an increase in node B), while a dotted line refers to an inverse
relationship (an increase in A decreases B). As shown in Figure 9.10, the higher the
member institution's priority is, the greater the pressure on the FC is (if the request is
accepted). The greater the pressure on the FC, the greater the effort it will put into reducing
the scheduled duration for the request. At the same time, an increased scheduled duration
increases the total amount of work in the queue, which in turn affects the scheduled
duration itself (the greater the amount of work in the queue, the longer the scheduled
duration will be for the next request). The same relationship exists with the developers'
workload: the greater the workload, the greater the amount of work in the queue, and vice
versa. We can also observe an inverse relationship between the number of available
resources and both the initial estimated effort and the developers' workload; more
available resources decrease the estimated effort and the workload. In addition, the lower
the initial estimate of effort, the greater the chance that the FC accepts the request, and
more acceptances are expected to increase the functionality of the system. Figure 9.11
depicts a stock-flow diagram, which we have developed based on the causal loop diagram
(see Figure 9.10). The rectangular nodes and sandglass-shaped nodes represent stocks and
flows, respectively. Red (dark) sandglasses represent negative flows, and blue (light)
sandglasses represent positive flows. Thin arrows from a stock to a flow denote that the
rate of the flow depends on the stock value.
Figure 9.10: Causal Loop diagram for the enhancement request process
Figure 9.11: Stock-Flow diagram for the enhancement request process
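Once the stocks and flows are specified, the model can be advanced by simple Euler integration. The sketch below uses a drastically reduced two-stock version (work in the queue and developers' workload) with assumed rate equations; it is meant only to show the simulation mechanics, not the full Figure 9.11 model.

DT = 1.0              # time step in days
ARRIVAL_RATE = 2.0    # assumed rate of accepted enhancement requests per day
RESOURCES = 10.0      # assumed number of available developers

def simulate(days=120):
    work_in_queue, workload = 0.0, 0.0     # the two stocks
    history = []
    for _ in range(int(days / DT)):
        # Assumed flows: requests move from the queue into the workload, then complete.
        completion_rate = 0.5 * min(workload, RESOURCES)
        start_rate = min(work_in_queue / DT, max(RESOURCES - workload, 0.0))
        work_in_queue += DT * (ARRIVAL_RATE - start_rate)
        workload += DT * (start_rate - completion_rate)
        history.append((work_in_queue, workload))
    return history

print(simulate(days=30)[-1])   # final (queue, workload) levels under these assumptions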
9.4 Implementation and Experimental Results
In this section, we demonstrate the numerical results of four different types of
simulations that have been discussed in Section 9.3.
9.4.1 Experimental Results involving Evaluation Aid for Development Manager
As mentioned in Section 9.3.1, the BBN in Figure 9.4 mimics the DM's evaluation
of the requested enhancements. Figure 9.12 depicts the estimated results based on the
information provided by the enhancement request form. Figure 9.12(a) depicts the
probability distributions of 'Effort', 'TechnicalImpact', and 'SystemImpact' being high,
medium, and low when the environmental information is observed (indicated in dark
color) as 'Priority' = Medium and 'SimilarProcess' = Yes. For example, the probabilities
of the DM's evaluation of 'Effort' being high, medium, and low are 0.41, 0.309, and 0.281,
respectively. Similarly, Figure 9.12(b) depicts the probability distributions of 'Effort',
'TechnicalImpact', and 'SystemImpact' being high, medium, and low when 'Priority' =
High and 'SimilarProcess' = No. When the relationships between the nodes of the BBN in
Figure 9.12 are collected from the DM's past evaluation data, the probability distributions
of 'Effort', 'TechnicalImpact', and 'SystemImpact' can mimic the DM's evaluation of the
submitted enhancement request. It is noted that the probability distributions of 'Effort',
'TechnicalImpact', and 'SystemImpact' change for different environmental information
according to Bayes' theorem.
(a) Priority = Medium and SimilarProcess = Yes
(b) Priority = High and SimilarProcess = No
Figure 9.12: Simulation results: evaluation of DM on an enhancement request
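To make this conditioning concrete, the MATLAB sketch below evaluates the 'Effort' node for the two evidence settings of Figure 9.12. The conditional probability table is illustrative only (the actual tables behind Figure 9.4 are estimated from the DM's past evaluation data), and the prior used in the last line is likewise an assumption.

% Illustrative CPT (not the one in Figure 9.4): P(Effort = e | Priority = p, SimilarProcess = s),
% stored as CPT(e, p, s); Effort and Priority states are ordered High, Medium, Low,
% and SimilarProcess states are ordered Yes, No.
CPT(:,:,1) = [0.30 0.41 0.55;     % SimilarProcess = Yes
              0.40 0.31 0.30;
              0.30 0.28 0.15];
CPT(:,:,2) = [0.60 0.50 0.40;     % SimilarProcess = No
              0.25 0.30 0.35;
              0.15 0.20 0.25];
% Evidence of Figure 9.12(a): Priority = Medium, SimilarProcess = Yes
pEffort_a = CPT(:, 2, 1);         % P(Effort | evidence), a direct table lookup
% Evidence of Figure 9.12(b): Priority = High, SimilarProcess = No
pEffort_b = CPT(:, 1, 2);
% If SimilarProcess were unobserved, the BBN would marginalize over it,
% e.g., with an assumed prior P(Yes) = 0.7:
pEffort_marg = 0.7*CPT(:, 2, 1) + 0.3*CPT(:, 2, 2);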
9.4.2 Experimental Results involving FC Decision Simulator
As mentioned in Section 9.3.2, the FC Decision Simulator mimics the FC's evaluation of and decision on the submitted request using a BBN and DFT, respectively. Figure 9.13 depicts the BBN that mimics the FC's evaluation of a request, assuming that the submitted enhancement document has a medium required effort, a core primary component, a high impact of acceptance, a well-described statement, and a medium level of business need. Information for these nodes in Figure 9.13 can be provided by the enhancement request form submitted by a member institution and by the estimates from the DM. Once we infer the discrete probability distributions of the 'Subjective_Effort', 'Subj_ImpactOfAccept', and 'ImpactOfReject' nodes in the BBN, we generate a random variate between 0 and 6 based on each underlying distribution, and these values are fed to the DFT formula. Table 9.3 depicts the quantified values of effort and impact for the two options – accept and reject the
request. Thus, we obtained the evaluation matrix \(M = \begin{bmatrix} 4.4 & 4.7 \\ 0 & 5.1 \end{bmatrix}\) used in the DFT formula.
Figure 9.13: Estimated FC’s evaluation of the enhancement request
Table 9.3: Evaluation matrix M obtained from BBN in Figure 9.13
          Effort    Impact
Accept    4.4       4.7
Reject    0         5.1
In this example, we assume that the weight vector follows a Bernoulli distribution in which the FC focuses on 'Effort' 34% of the time and on 'Impact' 63% of the time; the remaining 0.03 is the probability that the FC focuses on other factors. The complete DFT formula is then given by Equation (9.4.1),
\[
\begin{bmatrix} p_{accept}(t+h) \\ p_{reject}(t+h) \end{bmatrix}
=
\begin{bmatrix} 0.8 & -0.1 \\ -0.1 & 0.8 \end{bmatrix}
\begin{bmatrix} p_{accept}(t) \\ p_{reject}(t) \end{bmatrix}
+
\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 4.4 & 4.7 \\ 0 & 5.1 \end{bmatrix}
\begin{bmatrix} w_{1}(t+h) \\ w_{2}(t+h) \end{bmatrix},
\qquad (9.4.1)
\]
where \(\Pr_{W(t)}(W(t)) = 0.34\) if \(W(t) = [1 \;\; 0]^{T}\), \(0.63\) if \(W(t) = [0 \;\; 1]^{T}\), and \(0.03\) if \(W(t) = [0 \;\; 0]^{T}\).
Figure 9.14 depicts the evolution of the preferences for the two options based on Equation (9.4.1). The solid line and the dotted line represent the preferences of accepting and rejecting the request, respectively. As shown in Figure 9.14, the preference of accepting the request at time 100 (the decision-making time) is higher than the preference of rejecting it. Thus, the FC accepts the enhancement request in this simulated trial. Due to the stochastic nature of DFT (see Equation (9.4.1)), we executed it 2,000 times, and the results revealed that the FC would accept the request with a probability of 0.998. To demonstrate the sensitivity of our DFT model, we repeated the same procedure using a different evaluation matrix, \(M = \begin{bmatrix} 4.4 & 0.7 \\ 0 & 5.1 \end{bmatrix}\), obtained from the BBN. In this case, the results revealed that the FC would accept the request with a probability of 0.093. Since the impact of acceptance decreased from 4.7 to 0.7, the probability of acceptance decreased accordingly.
[Figure 9.14 plot: preference (vertical axis, approximately -25 to 25) of the Accept and Reject options over deliberation time (horizontal axis, 0 to 100).]
Figure 9.14: Evolution of preference on acceptance/rejection of an enhancement request
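The acceptance probability reported above can be estimated by replicating the deliberation process of Equation (9.4.1), as in the MATLAB sketch below; the zero initial preferences and the 100-step deliberation horizon are assumptions made for illustration, so the estimate is only expected to be close to the reported 0.998.

% Sketch: Monte Carlo estimate of the FC's acceptance probability from Equation (9.4.1).
S = [0.8 -0.1; -0.1 0.8];          % feedback matrix
C = [1 -1; -1 1];                  % contrast matrix
M = [4.4 4.7; 0 5.1];              % evaluation matrix from Table 9.3
nRep = 2000; nStep = 100;          % replications and deliberation time (assumed horizon)
nAccept = 0;
for r = 1:nRep
    P = [0; 0];                    % initial preferences (assumed to be zero)
    for t = 1:nStep
        u = rand(1);
        if u <= 0.34
            W = [1; 0];            % FC attends to Effort (34% of the time)
        elseif u <= 0.34 + 0.63
            W = [0; 1];            % FC attends to Impact (63% of the time)
        else
            W = [0; 0];            % FC attends to other factors (3% of the time)
        end
        P = S*P + C*M*W;           % Equation (9.4.1)
    end
    if P(1) > P(2)                 % acceptance preferred at the decision time
        nAccept = nAccept + 1;
    end
end
probAccept = nAccept / nRep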
9.4.3 Experimental Results involving Scheduling Aid for Functional Council
This section demonstrates the proposed game theoretic model (see Section 9.3.3)
used for the "Scheduling Aid for Functional Council". In this example, we assumed the following parameter values for the payoff functions: revenue (i.e., A) = 0, dailyFixedCost (i.e., D) = 1.212, currentBudget = 100,000, impactConstant (i.e., B) = 2.8 for low, 4.0 for medium, and 4.8 for high, and the penalty constant for the FC (i.e., α) = 50 for low, 70 for medium, and 90 for high (see Equations (9.4.4) and (9.4.5)), where the functions f(x) (reputation) and g(y) (realization cost) are given in Equations (9.4.2) and (9.4.3).
\[ f(x) = 100 - 1.515x \qquad (9.4.2) \]
\[ g(y) = 100 - 0.606y \qquad (9.4.3) \]
\[ \varphi_{1} = \mathrm{revenue} + \mathrm{reputationalGain/Cost} - \mathrm{impactConstant}\times\mathrm{penaltyCost} = (100 - 1.515x) - \frac{B\alpha\,(y - \ln x)}{y} \qquad (9.4.4) \]
\[ \varphi_{2} = \mathrm{currentBudget} - \mathrm{realizationCost} - \mathrm{fixedCost} - \mathrm{impactConstant}\times\mathrm{penaltyCost} = 100000 - (100 - 0.606y) - 1.212y - \frac{B\beta\,(x - \ln y)}{x} \qquad (9.4.5) \]
As we mentioned in Section 9.3.3, the penalty parameters α and β characterize each player's individual behavior toward the conflict with the other player. Figure 9.15 depicts a series of iterations (the negotiation process) involving different penalty parameters (and therefore different characteristics of the FC and the PM). Figure 9.15(a) shows how the conflict evolves when the penalty values of both parties are low (α is low, β is low, and B is low), i.e., neither the FC nor the PM is willing to cooperate. Here, the FC starts the game by suggesting to complete the work within the same day (x = 1). Then, the PM counters with 143 days (y = 143). At each iteration, the FC increases the duration in response to the PM's offer. The simulated FC suggests x = 7, and the PM responds with y = 51. In this particular case, the FC and the PM cannot reach an agreement, as the simulated FC keeps his offer firm at x = 16, whereas the PM maintains his final response at y = 36 after the sixth iteration. Therefore, we increased α (reflecting a more cooperative FC) and β (reflecting a more cooperative PM) and simulated the game again, expecting that the gap would become smaller. Figure 9.15(b) shows the result of the game with the increased penalty values (α is medium, β is medium, and B is low). Here, the FC again starts the game by suggesting to complete the work within the same day (x = 1). Then, the PM counters with 117 days (y = 117). As both parties are willing to cooperate more (increased penalty constants), they reach an agreement (x = y = 25) in this case. This agrees with our expectation that an increase in α and β corresponds to more cooperative behavior of the players toward their conflicts. It is noted that each of Figures 9.15(a) and (b) depicts a series of iterations involving the FC (the real FC in iteration 1 and the simulated FC in the other iterations) and the simulated PM to demonstrate the impact of α and β within a single replication. Once the real FC obtains an agreement from the game-theoretic simulation, he can suggest the agreed schedule learned from the simulation to the real PM. This schedule is expected to reduce the number of real, costly iterations with the real PM.
[Plot: schedule length in days (vertical axis, 0 to 160) versus negotiation iteration (horizontal axis, 0 to 10) for the FC's schedule and the PM's schedule.]
(a) α is low, β is low and B is low
[Plot: schedule length in days (vertical axis, 0 to 140) versus negotiation iteration (horizontal axis, 0 to 6) for the FC's schedule and the PM's schedule.]
(b) α is medium, β is medium and B is low
Figure 9.15: Exemplary iterations (negotiation process) involving different behaviors of
FC (real FC in iteration 1 and simulated FC in the other iterations) and PM (simulated)
against the conflicts
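To see why larger penalty constants push the parties toward agreement, the MATLAB sketch below evaluates the two payoff functions at the stalled offers of Figure 9.15(a) under the low and medium penalty settings; the β values are assumptions (only the α values are listed above), so the numbers are illustrative rather than those of the actual experiments.

% Sketch: effect of the penalty constants on the payoffs at a disagreement point.
B = 2.8;                                   % impactConstant (low)
phi1 = @(x, y, alpha) (100 - 1.515*x) - B*alpha*(y - log(x))./y;                    % Eq. (9.4.4)
phi2 = @(x, y, beta) 100000 - (100 - 0.606*y) - 1.212*y - B*beta*(x - log(y))./x;   % Eq. (9.4.5)
x = 16; y = 36;                            % final, non-agreeing offers of Figure 9.15(a)
FC_low = phi1(x, y, 50);  FC_med = phi1(x, y, 70);   % FC payoff, low vs. medium alpha
PM_low = phi2(x, y, 50);  PM_med = phi2(x, y, 70);   % PM payoff, low vs. medium beta (assumed)
% Larger alpha and beta make the same disagreement more costly to both players,
% which is why the parties close the gap in Figure 9.15(b).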
9.4.4 Experimental Results involving Simulation of Entire Enhancement Process
An exemplary system dynamics simulation model (Stock-Flow diagram) has been developed based on the causal loop diagram shown in Figure 9.10 using AnyLogic® (see Figure 9.11). Figure 9.16 depicts the simulation results with different flow rates. The initial stock values are set to 1,000 for all the stocks, including AvailableResource, InitialEstimatedEffort, and DeveloperWorkload. As shown in Figure 9.16(a) (reflecting the current situation), the values of the stocks AvailableResource and InitialEstimatedEffort decrease to 529 and 874, respectively, and the value of the stock DeveloperWorkload increases to 1,590 at the end of the simulation run (6 months). Suppose that the board is interested in simulating the effect of hiring more developers on the overall system performance. To reflect the hiring of more developers in our model, the rate of the flow from AvailableResource to DeveloperWorkload is decreased to (initial stock value)/AvailableResource*0.001. Then, as shown in Figure 9.16(b) (reflecting the future situation after hiring), AvailableResource and InitialEstimatedEffort decrease to 799 and 838, while DeveloperWorkload increases to 1,372. From these simulation results, the board can predict how much the developers' workload can be decreased by hiring more developers. Similarly, the board members can simulate various other scenarios (e.g., evaluating alternative communication networks, or mid-term or long-term policies) in the enhancement process using the system dynamics model. While the stock and flow values used in our current simulation are based on our assumptions, they will be updated when real data become available in the future.
(a) ScheduledTime to WorkInQueue flow rate = 0.01
(b) ScheduledTime to WorkInQueue flow rate = 0.05
Figure 9.16: System dynamic simulation results for different flow rates
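The stock-and-flow mechanics behind Figure 9.16 can be sketched with simple Euler integration, as below; the rate expressions and coefficients are placeholders rather than the actual rate equations of the AnyLogic® model, so the resulting numbers differ from those reported above.

% Sketch: Euler integration of a three-stock model (placeholder rate equations).
dt = 1; T = 180;                              % daily time step, about 6 months
AvailableResource = 1000;                     % initial stock values (Section 9.4.4)
InitialEstimatedEffort = 1000;
DeveloperWorkload = 1000;
rateAssign = 0.005;                           % hypothetical flow-rate coefficients
rateEffort = 0.0007;
rateDone   = 0.0005;
for t = 1:T
    assignFlow = rateAssign * AvailableResource;       % resources assigned to work
    effortFlow = rateEffort * InitialEstimatedEffort;  % estimated effort consumed
    doneFlow   = rateDone   * DeveloperWorkload;       % completed work leaving the workload
    AvailableResource      = AvailableResource      - assignFlow*dt;
    InitialEstimatedEffort = InitialEstimatedEffort - effortFlow*dt;
    DeveloperWorkload      = DeveloperWorkload      + (assignFlow - doneFlow)*dt;
end
[AvailableResource, InitialEstimatedEffort, DeveloperWorkload]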
CHAPTER 10
CONCLUSIONS AND FUTURE RESEARCH
10.1 Summary of Research Work
In this research, we have proposed promising techniques to realize each submodule of an extended BDI architecture. The techniques employed in this research were selected to represent the characteristics of the corresponding steps of the human decision-planning, decision-making, and dynamic learning processes under emergency evacuation scenarios. Successful implementation of these techniques allowed the extended BDI architecture to be used to mimic human behaviors in the considered, intricate situations. Furthermore, the proposed techniques and the extended BDI framework have been demonstrated using an agent-based simulation technique. The simulation we developed allowed us to simulate and observe crowd behaviors under evacuation scenarios with various conditions. The proposed simulation has the potential to allow responsible governmental and law-enforcement agencies to evaluate different evacuation and damage control policies beforehand, which in turn would help them execute the most effective crowd evacuation scheme during an actual emergency. As part of this research, we conducted human-in-the-loop experiments using a Virtual Reality system to collect data on more realistic human behaviors. Through the proposed hybrid learning process, we have successfully demonstrated how the behaviors of a novice agent become close to those of the commuter agents.
In addition to the contribution of this research to the development of an architecture that mimics human behaviors, this research has made significant contributions on several other topics, including (1) extended Decision Field Theory, (2) decisions in an organizational social network, (3) a BBN-RL hybrid learning model, and (4) a distributed computing infrastructure enabling integration of distributed models. They are summarized in the following sections.
10.1.1 Contributions in DFT
In this research, we proposed an extension to Decision Field Theory (DFT) and presented four corresponding, important theorems. To the best of our knowledge, this is the first effort to extend DFT to cope with a dynamically changing environment. From the Matlab® simulations, we demonstrated that the proposed EDFT (extended DFT) can cope with a dynamically changing environment efficiently. In addition, results from the human-in-the-loop experiment showed that EDFT can mimic human decisions in a dynamically changing environment. Furthermore, the theorems presented in this research increase the usability of DFT. By using these theorems, we can obtain the expected preference values without actually deploying the DFT formula. We can also obtain the number of time steps required to attain the converged value of the expected preference or the steady choice probability. Since DFT is built on psychological principles, DFT and EDFT can potentially be applied to many applications involving human behaviors.
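For illustration, consider the standard linear DFT recursion \(P(t+h) = S\,P(t) + C M\,W(t+h)\) with an i.i.d. attention-weight vector \(W\), \(\bar{w} = E[W]\), and \(P(0) = 0\); the derivation below is a sketch under these assumptions rather than a restatement of the theorems themselves. Taking expectations on both sides gives

\[ E[P(nh)] = \sum_{k=0}^{n-1} S^{k} C M \bar{w}, \]

which, when all eigenvalues of \(S\) lie strictly inside the unit circle, converges to

\[ \lim_{n \to \infty} E[P(nh)] = (I - S)^{-1} C M \bar{w}. \]

This expectation can thus be evaluated directly, without simulating the stochastic recursion.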
10.1.2 Contributions in Human Decisions in Organizational Social Network
In this research, we have proposed an integrated simulation modeling framework (based on the proposed human decision behavior model) for decision aids in a community-based software development process, which involves a Bayesian belief network, decision field theory, game theory, and system dynamics simulation. The proposed simulation framework has been illustrated with the software enhancement request process in Kuali, which is an open source project by a consortium of nine universities. To demonstrate the utility of the proposed simulation framework, four simulations have been developed for different stakeholders in the Kuali organization based on the proposed framework, where the simulation data have been obtained from a survey of the Kuali members as well as from our assumptions. Experiments were also conducted using the constructed simulations with varying parameters, which allowed us to mimic the decision behaviors of different stakeholders under various conditions. While the proposed framework has been demonstrated for the enhancement request process of Kuali in this work, it is believed to be directly applicable to other processes in Kuali as well as to other community-based software development organizations.
10.1.3 Contributions in BBN-RL Hybrid Learning Model
In this research, we have proposed a promising hybrid model integrating a BBN with RL techniques to enhance the performance of human decision-making. The proposed hybrid model has been implemented in the emergency evacuation simulation. The developed simulation allowed us to simulate and observe the effect of various learning behaviors, defined by different parameters, under various conditions. The simulation results demonstrated that the proposed BBN-RL model effectively adjusted itself to an inexperienced situation without any prior knowledge. Inexperienced agents, which infer the environmental situation via the BBN only, evolved into expert agents that adjust the inferred perception via RL. Moreover, the proposed model represented learning of a psychological effect, namely the effect of emotion (CI) on situation perception (belief), in the human decision process. Thus, the proposed model demonstrated the potential to represent the complex human learning process more thoroughly than existing models. The proposed hybrid learning model is believed to significantly reduce the gap between the behaviors of an agent and those of a real human.
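The reinforcement learning side of this hybrid can be summarized by the standard Q-learning update; the MATLAB sketch below is generic, and the state/action sizes, reward, and transition shown are placeholders (the coupling with the BBN-inferred belief is not reproduced here).

% Generic Q-learning update (sketch; the state/action encoding is application-specific).
nStates = 10; nActions = 4;
Q = zeros(nStates, nActions);     % action-value table
alpha = 0.1; gamma = 0.9;         % learning rate and discount factor
s = 1; a = 2;                     % example current state and selected action
r = 1.0; sNext = 3;               % example observed reward and next state
Q(s,a) = Q(s,a) + alpha*(r + gamma*max(Q(sNext,:)) - Q(s,a));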
10.1.4 Contributions in Distributed Simulation Infrastructure
A generic infrastructure using web service techniques has been developed under the HLA/RTI standards to integrate and jointly simulate the distributed BDI modules. The transaction coordinator developed using web services technology offers a simplified set of services compared with the HLA/RTI. Reusable interface modules have been implemented, which can be used to quickly create distributed simulations with multiple simulation packages and/or multiple software modules. The only requirement for the modeling packages is that they must be able to embed a web service client module, which is available in most state-of-the-art software applications.
10.2 Firsts in the Research
To the best of our knowledge, the following were achieved for the first time in this research.
• Proposed a human decision-behavior modeling framework (based on extended BDI) that allowed us to represent various human decision behaviors such as decision-making, decision-planning, and dynamic learning in a unified framework.
• Extended DFT to cope with a dynamic environment by including a BBN to infer the evaluation matrix and the weight vector.
• Proposed and demonstrated the BBN-RL hybrid learning algorithm, in which the two techniques complement each other.
• Proposed a real-time planning algorithm that can adjust the number of planning steps dynamically.
• Developed a web service infrastructure, which allowed time synchronization and message exchanges between various simulation software systems in a language-independent and platform-independent distributed computing environment.
• Developed a realistic CAVE environment immersing the subjects into a quasi-real emergency evacuation situation (used for human data collection for model development as well as validation).
10.3 Future Directions of Research
While this dissertation has presented significant initial efforts towards highly
realistic human behavior modeling, there is still a great deal of work to be done.
Extensions are possible in the methodological aspects, technological aspects and the
applications described in this research.
In this work, the belief module (perceptual processor) in the extended BDI has been implemented using a BBN. However, the structure and parameters of the BBN need to be trained based on actual human behavior data. To this end, we conducted CAVE-based human-in-the-loop experiments. The drawback of this method is that the BBN should be re-trained (involving expensive experimentation) whenever the scenarios change. This challenge creates important future research opportunities, such as 1) how to develop a more robust BBN and 2) investigating which portions of a BBN can be reused under varying but similar situations.

Extension of the proposed approaches into various applications is already underway. The usage of the proposed framework for a community-based software development process is being explored using the Kuali organization, where the decisions of stakeholders are critical for the success of the software development. Collecting more operational and historical data from Kuali to refine and improve the validity of the proposed simulation models is a next step. Also, the design and development of a workforce assignment method involving the proposed simulation model as well as considering the characteristics (e.g., tightly coupled vs. loosely coupled) of a social network is of great interest.
APPENDIX A
CAVE 3D MODEL DEVELOPMENT
In this appendix, we demonstrate how to develop a CAVE project, which will be helpful for extending the work in this dissertation. Since this work is based on open source projects such as OpenSceneGraph and DIVERSE, detailed descriptions of each development step are provided to expedite any extension of the work.
3D Application (Google SketchUp, 3D Studio)
Scene graph middleware (OpenSceneGraph)
Low-level rendering API (OpenGL)
Figure A.1: The 3D application stack: Rather than interface directly with the low-level
rendering API, many 3D applications require additional functionality from a middleware
library, such as OpenSceneGraph.
A.1 Background
A.1.1 OpenSceneGraph (OSG)
The OpenSceneGraph (http://www.openscenegraph.org/projects/osg) is an open source, high-performance 3D graphics toolkit used by application developers in fields such as visual simulation, games, virtual reality, scientific visualization, and modelling. Written entirely in Standard C++ and OpenGL, it runs on all Windows platforms, OS X, GNU/Linux, IRIX, Solaris, HP-UX, AIX, and FreeBSD operating systems. The OpenSceneGraph is now well established as the world's leading scene graph technology, used widely in the vis-sim, space, scientific, oil-gas, games, and virtual reality industries.
OSG is middleware built on top of a low-level API (OpenGL) to provide spatial organization capabilities and other features typically required by high-performance 3D applications. OSG is a set of open source libraries that primarily provide scene management and graphics rendering optimization functionality to applications. It is written in portable ANSI C++ and uses the industry-standard OpenGL low-level graphics API. OSG is open source and is available under a modified GNU Lesser General Public License, or Library GPL (LGPL), software license.
Data file format: The default file formats for OSG are .osg and .ive, but OSG supports common 2D and 3D file formats, including 2D formats (.bmp, .dds, .gif, .jpeg, .pic, .png, .rgb, .tga, and .tiff) and 3D formats (.3ds [3D Studio Max], .obj [Alias Wavefront], .geo [Carbon Graphics], .dae [COLLADA], .shp [ESRI Shapefile], .lwo [NewTek LightWave], .flt [OpenFlight], .md2 [Quake], and .txp [Terrex TerraPage]).
A.1.2 OpenGL
OpenGL (Open Graphics Library) is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics. OpenGL was developed by Silicon Graphics Inc. (SGI) in 1992 and is widely used in CAD, virtual reality, scientific visualization, information visualization, and flight simulation. It is also used in video games, where it competes with Direct3D on Microsoft Windows platforms.
A.1.3 Device Independent Virtual Environment - Reconfigurable, Scalable, Extensible (DIVERSE)
DIVERSE (http://diverse.sourceforge.net/diverse/) is a cross-platform, open source API for developing user interfaces of virtual reality applications that can run on Linux, IRIX, Windows XP, and Mac OS X. Using DIVERSE, the same program can be run on CAVE™, ImmersaDesk™, Head Mounted Display (HMD), desktop, and laptop systems without modification. DIVERSE provides a common API to virtual-environment-oriented hardware such as trackers, wands, joysticks, and motion bases.
A.2 Installation of Packages
The latest version of OSG can be downloaded at http://www.openscenegraph.org/projects/osg. However, since DIVERSE is developed based on OSG version 1.2, we recommend installing OSG version 1.2.
A.2.1 OSG Installation
OSG has a Windows installation package that can be installed easily. The Windows installation package file for OSG version 1.2 is "osg1.2_setup_2006-09-13.exe". Running this file installs all the necessary files. Note that the installation package contains only binary files; in order to use debugging, it is required to download the source code and compile it with the debugging option. For more detail about the installation, refer to the OSG web site (http://www.openscenegraph.org/projects/osg).
A.2.2 DIVERSE Installation
The DIVERSE web site (http://diverse.sourceforge.net/diverse/) has the latest package. DIVERSE also has a Windows installation package, which contains binary files only. In order to modify any of the DIVERSE libraries, it is necessary to deploy the source code. Since DIVERSE distributes the source code in a platform-independent format, we need to use CMake to build the project files. If the projects are built correctly, we get 131 projects. For more detail about the installation, refer to the DIVERSE web site. In order to compile DIVERSE, two additional dependencies are required – OpenThreads (http://openthreads.sourceforge.net/) and Producer (http://www.andesengineering.com/Producer/). If the error "'ERANGE': undeclared identifier" comes from client.cpp during DIVERSE compilation, add "#include <errno.h>" to the client.cpp file.
A.3 Building 3D Model
In this dissertation, we used the Google SketchUp 3D modeling freeware (http://sketchup.google.com/). The 3D buildings can be downloaded from the Google 3D Warehouse, and each 3D model is then placed at its proper location within the current project. Most buildings in major cities are already available as 3D models. Google SketchUp supports only the .skp file format, but Google SketchUp Pro also supports other formats and allows conversion between them. The manager of the AZLIVE facility (Marvin Landis, [email protected], (520) 621-8258) converts the 3D model into the .vi file format that can be projected in the CAVE.
A.4 OSG Programming
An OSG application is implemented to represent the dynamic movement of various 3D objects such as smoke, fire, and crowds. The moving 3D objects (.ive files) are imported into the application, whereas smoke and fire are represented using the osgParticle class library. In this section, the major programming issues are discussed.
Example A.1: Importing 3D Object and Place in Proper Position
osg::Node* intersection =
osgDB::readNodeFile("BDI/BDI_nohumans_nocars.ive");
osg::PositionAttitudeTransform* Xform = new
osg::PositionAttitudeTransform();
Xform->setPosition( osg::Vec3(0.0f,0.0f,0.0f) );
Xform->addChild(intersection);
root->addChild(Xform);
In Example A.1, the readNodeFile function of the osgDB library is used to load the 3D model file (BDI_nohumans_nocars.ive) from $OSG_FILE_PATH/BDI, and the loaded model is assigned to a variable of the Node type defined in the osg library. $OSG_FILE_PATH is an environment variable defined in the operating system. In order to define the position of the imported 3D object, a PositionAttitudeTransform object is created. The coordinates in the CAVE are defined using a Vec3 variable of the osg library and assigned to the PositionAttitudeTransform object using the setPosition function. Then the 3D object is attached to this transform as a child.
Example A.2: Adding Smoke and Fire
osgParticle::SmokeEffect* smoke1 = new osgParticle::SmokeEffect
(osg::Vec3(15.0f,12.0f,0.0f), 8.0f, 8.0f);
smoke1->setWind(osg::Vec3(1.0f,-2.0f,0.0f));
smoke1->setStartTime(0.0f);
smoke1->setEmitterDuration(3600);
root->addChild(smoke1);
osgParticle::FireEffect* fire1 = new
osgParticle::FireEffect(osg::Vec3(15.0f,12.0f,0.0f), 8.0f,
8.0f);
fire1->setWind(osg::Vec3(1.0f,-2.0f,0.0f));
fire1->setStartTime(0.0f);
fire1->setEmitterDuration(3600);
root->addChild(fire1);
As mentioned earlier, smoke and fire are created using the SmokeEffect and FireEffect classes of the osgParticle library, respectively. As shown in Example A.2, each class requires a position, intensity, and scale as parameters. We then set the wind direction (diffusion), the start time, and the emitter duration.
Example A.3: Add Movement to 3D Objects
osg::AnimationPath* animationPath = new osg::AnimationPath;
animationPath->insert(starttime,
osg::AnimationPath::ControlPoint(osg::Vec3 (10.5, 14,
0.0)));
animationPath->insert(starttime+3,
osg::AnimationPath::ControlPoint(osg::Vec3(3.5, 13, 0.0)));
animationPath->insert(starttime+3.5,
osg::AnimationPath::ControlPoint(osg::Vec3(3, 14, 0.0),
osg::Quat(osg::inDegrees(-90.0f),
osg::Vec3(0.0,0.0,1.0)) ));
animationPath->insert(starttime+7,
osg::AnimationPath::ControlPoint(osg::Vec3(3, 30, 0.0),
osg::Quat(osg::inDegrees(-90.0f),osg::Vec3(0.0,0.0,1.0))));
animationPath->setLoopMode(osg::AnimationPath::LOOP);
osg::PositionAttitudeTransform* Xform = new
osg::PositionAttitudeTransform();
Xform->setUpdateCallback(new
osg::AnimationPathCallback(animationPath,0.0,1.0));
Xform->addChild(obj3D);  // obj3D: a previously loaded 3D object node
root->addChild(Xform);
Example A.3 depicts how to assign a movement to a 3D object (obj3D, a placeholder for a previously loaded node) using the AnimationPath class. A linear movement is created by defining the start and end positions together with the times at which the object is planned to be at them; the speed is then adjusted accordingly. In Example A.3, we define three linear movements connecting four positions defined using the ControlPoint class. When adding the third position, we define a rotation in the ControlPoint parameters using the Quat class to rotate the 3D object toward its moving direction.
A.5 DIVERSE Programming
In this dissertation, DIVERSE is used for controlling input/output devices such as the keyboard and the wand. These devices are used for 1) receiving responses from the subject during the human-in-the-loop experiment and 2) setting up and initializing the experiment. As we mentioned in Section A.2.2, the distributed version of the DIVERSE source has 131 projects that can be compiled into dynamic libraries or executable files. In addition, more projects can be defined by users to add functions as needed. We can then use the dynamic libraries selectively, according to the functions required, by specifying them via the set command; loading a dynamic library is discussed in more detail in Example A.6. Since the dynamic libraries have a common structure, knowing the structure is helpful for understanding the mechanism of a library. Two important member functions are postConfig and postFrame. The postConfig member function is called when the dynamic library is loaded; thus most of the initialization procedures are placed in this function. The postFrame member function is called after each update of the environment; thus any dynamic interfaces during the experiment need to be placed in this function. In Examples A.4 and A.5, we describe the interfaces with the wand and the keyboard.
Example A.4: Interface with wand (caveWandInputCIMLAB)
#include <dtk.h>
#include <dtk/dtkDSO_loader.h>
#include <dgl.h>
#define CHANGEHP_BUTTON 3
#define TRIGGER_BUTTON 5
class caveWandInputCIMLAB : public dtkAugment
{
public:
caveWandInputCIMLAB (dtkManager *);
int postFrame(void);
int postConfig(void) ;
private:
dtkManager *manager;
//wand variables
dtkInLocator *wand_loc;
dtkInButton *wand_but;
dtkDequeuer *dequeuer;
};
caveWandInputCIMLAB::caveWandInputCIMLAB(dtkManager *m) :
dtkAugment("caveWandInputCIMLAB")
{
setDescription("reads the wand and sets dtk shared memory segments "
"to control the simulator");
manager = (dtkManager*) DGL::getApp();
if (manager == NULL || manager->isInvalid()) {
printf("caveWandInputCIMLAB::caveWandInputCIMLAB: Bad manager :(\n");
return;
}
// initialize members; they are obtained in postConfig()
dequeuer = NULL;
wand_but = NULL;
wand_loc = NULL;
validate();
}
int caveWandInputCIMLAB::postConfig(void)
{
dequeuer = new dtkDequeuer(manager->record());
if(!dequeuer || dequeuer->isInvalid()) return ERROR_;
wand_loc = (dtkInLocator *) manager->get("wand",
DTKINLOCATOR_TYPE);
wand_but = (dtkInButton *) manager->get("buttons",
DTKINBUTTON_TYPE);
if (!wand_loc || !wand_but) {
dtkMsg.add(DTKMSG_ERROR, "caveWandInputCIMLAB: couldn't get "
"valid wand or button.\n");
invalidate();
return ERROR_;
}
wand_but->queue();
return CONTINUE;
}
int caveWandInputCIMLAB::postFrame(void)
{
dtkRecord_event *e;
while((e = dequeuer->getNextEvent(wand_but))) {
int whichButton;
int buttonState = wand_but->read(&whichButton, e);
if (whichButton == CHANGEHP_BUTTON) {
if (buttonState) {
printf("CHANGEHP_BUTTON pressed.\n ");
}
}
}
return CONTINUE;
}
Example A.4 is the DIVERSE dynamic library code (caveWandInputCIMLAB) detecting button presses on the wand. In this example, "CHANGEHP_BUTTON pressed" is printed whenever the corresponding button (the blue button on the wand in the University of Arizona CAVE) is pressed. As shown in the example, the wand_but variable, which is obtained in postConfig, is used to detect button presses. In postFrame, the printf function is called if the pressed button is CHANGEHP_BUTTON.
Example A.5: Interface with keyboard (caveKeyboardInputCIMLAB)
#include <dtk.h>
#include <dtk/dtkDSO_loader.h>
#include <dgl.h>
class caveKeyboardInput : public dtkAugment
{
public:
caveKeyboardInput(dtkManager *);
int postFrame(void);
int postConfig(void) ;
private:
dtkManager *manager;
};
caveKeyboardInput::caveKeyboardInput(dtkManager *m) :
dtkAugment("caveKeyboardInput")
{
setDescription("reads the keyboard and sets dtk shared memory "
" segments to control the simulator") ;
manager = (dtkManager*) DGL::getApp();
if (manager == NULL || manager->isInvalid()) {
printf("caveKeyboardInput::caveKeyboardInput: Bad
manager :(\n");
return;
}
validate() ;
}
int caveKeyboardInput::postConfig(void)
{
return CONTINUE;
}
int caveKeyboardInput::postFrame(void)
{
bool newScene = (DGLKeyboard::getState(KeyChar_s)==
DGLKeyboard::PRESSED) ;
if (newScene) {
printf("s key is pressed.\n");
}
return CONTINUE;
}
Similarly, Example A.5 depicts the way of detecting key presses on the keyboard. In this example, we detect the 's' key press and print out 's key is pressed.' All of this is coded in the postFrame function.
Example A.6: Loading the dynamic library using .bat file
set DGL_DSO_FILES=caveWandInputCIMLAB;desktopCaveEmulateGroup
::set DGL_DSO_FILES= caveKeyboardInputCIMLAB
osg_cave.exe
As shown in Example A.6, we load the DIVERSE dynamic libraries using the set command. Multiple libraries can be loaded at the same time by separating them with a semicolon. After the libraries are loaded, the OSG executable is called to start the CAVE application. We can make a batch file (.bat) to perform all of these steps easily. When we alternate between different sets of libraries, we can comment out a set command by putting a double colon (::) at the beginning of the line.
APPENDIX B
MATLAB CODE FOR DFT SIMULATION
B.1 MATLAB Code for Expected Preference Values
function Exp_Pref(iS, iM, iW, iColor)
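% Plots the expected preference values of options A and B over deliberation time.
% Inputs (inferred from the code below): iM = 1 switches the evaluation matrix
% from M1 to M2 after nMTransTIme; iW = 1 swaps the attention weights after
% nWTransTIme; iS is a scenario flag used only to select the plot style here;
% iColor is the line color passed to line().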
nSubj=10000;
nDeliberTime=150;
nMTransTIme=53;
nWTransTIme=53;
nInterval = 5;
S=[0.9 -0.01;-0.01 0.9];
C=[1 -1;-1 1];
M1=[3.5 1.3;1.3 3.5];
M2=[3.4 1.3;1.3 3.5];
wRisk=0.53;
wReturn=0.47;
P=zeros(2, nSubj);
for m= 1:nDeliberTime
for n = 1:nSubj
y=rand(1);
if (m > nWTransTIme) & (iW == 1)
wRisk=0.47;
wReturn=0.53;
end
if y <= wRisk
W=[1;0];
elseif y <= wRisk+wReturn
W=[0;1];
else
W=[0;0];
end
if (m > nMTransTIme) & (iM == 1)
P(:,n)=S*P(:,n) + C*M2*W;
else
P(:,n)=S*P(:,n) + C*M1*W;
end
end
if rem(m, nInterval) == 0
Pa=mean(P(1,:));   % expected preference of option A at step m
Pb=mean(P(2,:));   % expected preference of option B at step m
if (iM == 0) & (iW == 0) & (iS == 0)
if m == nInterval
line([0 m], [0 Pa], 'Marker', '+', ...
'LineStyle', '--', 'Color', iColor);
line([0 m], [0 Pb], 'Marker', 'o', ...
'LineStyle', '--', 'Color', iColor);
else
line([m-nInterval m], [PaO Pa], 'Marker', '+', ...
'LineStyle', '--', 'Color', iColor);
line([m-nInterval m], [PbO Pb], 'Marker', 'o', ...
'LineStyle', '--', 'Color', iColor);
end
else
if m == nInterval
line([0 m], [0 Pa], 'Marker', 'x', ...
'LineStyle', '-', 'Color', iColor);
else
line([m-nInterval m], [PaO Pa], 'Marker', 'x', ...
'LineStyle', '-', 'Color', iColor);
end
end
PaO=Pa;
PbO=Pb;
end
end
B.2 MATLAB Code for Choice Probability
function Prob(iS, iM, iW, iColor)
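% Plots the choice probabilities of options A and B over deliberation time.
% Inputs (inferred from the code below): iS = 1 replaces S with a smaller
% self-feedback matrix; iM = 1 switches the evaluation matrix from M1 to M2
% after nMTransTIme; iW = 1 swaps the attention weights after nWTransTIme;
% iColor is the line color passed to line().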
nSubj=2000;
nDeliberTime=200;
nMTransTIme=100;
nWTransTIme=100;
S=[0.9 -0.01;-0.01 0.9];
C=[1 -1;-1 1];
M1=[3.5 1.3;1.3 3.5];
M2=[3.4 1.3;1.3 3.5];
wRisk=0.45;
wReturn=0.43;
P=zeros(2, nSubj);
if iS == 1
S=[0.5 -0.01;-0.01 0.5];
end
for m= 1:nDeliberTime
for n = 1:nSubj
y=rand(1);
if (m > nWTransTIme) & (iW == 1)
wRisk=0.43;
wReturn=0.45;
end
if y <= wRisk
W=[1;0];
elseif y <= wRisk+wReturn
W=[0;1];
else
W=[0;0];
end
if (m > nMTransTIme) & (iM == 1)
P(:,n)=S*P(:,n) + C*M2*W;
else
P(:,n)=S*P(:,n) + C*M1*W;
end
end
if rem(m, 10) == 0
Pa=0;
Pb=0;
for n = 1:nSubj
if P(1,n) > P(2,n)
Pa = Pa + 1;
elseif P(1,n) < P(2,n)
Pb = Pb +1;
end
end
Pa = Pa / nSubj;
Pb = Pb / nSubj;
if (iM == 0) & (iW == 0) & (iS == 0)
if m == 10
line([0 m], [0.5 Pa], 'Marker', '+', ...
'LineStyle', '--', 'Color', iColor);
line([0 m], [0.5 Pb], 'Marker', 'o', ...
'LineStyle', '--', 'Color', iColor);
else
line([m-10 m], [PaO Pa], 'Marker', '+', ...
'LineStyle', '--', 'Color', iColor);
line([m-10 m], [PbO Pb], 'Marker', 'o', ...
'LineStyle', '--', 'Color', iColor);
end
else
if m == 10
line([0 m], [0.5 Pa], 'Marker', 'x', ...
'LineStyle', '-', 'Color', iColor);
line([0 m], [0.5 Pb], 'Marker', 's', ...
'LineStyle', '-', 'Color', iColor);
else
line([m-10 m], [PaO Pa], 'Marker', 'x', ...
'LineStyle', '-', 'Color', iColor);
line([m-10 m], [PbO Pb], 'Marker', 's', ...
'LineStyle', '-', 'Color', iColor);
end
end
PaO=Pa;
PbO=Pb;
end
end
APPENDIX C
HUMAN SUBJECTS PROTECTION PROGRAM APPROVAL LETTER
REFERENCES
ANDERSON, J.R., BOTHELL, D., BYRNE, M.D., DOUGLASS, S., LEBIERE, C., AND QIN, Y.,
2004, An integrated theory of the mind. Psychological Review, 111(4), 1036-1060.
AUBE, F. and SHIELD, R., 2004, Modeling the Effect of Leadership on Crowd Flow
Dynamics, in: CHOPARD, B. and HOEKSTRA, A.G. (Eds.), Cellular Automata, Springer,
Berlin, Heidelberg, 601-611.
BOHANNON, J., 2005, Directing the Herd: Crowds and The Science of Evacuation.
Science, 310, 219-221.
BONABEAU, E., 2002, Agent-based modeling: Methods and techniques for simulating
human systems. PNAS, 99, 7280-7287.
BRATMAN, M.E., 1987, Intention, Plans and Practical Reason. Harvard University Press,
Cambridge, MA.
BUNTINE, W., 1991, Theory refinement on Bayesian networks. In Proceedings of 7th
Conference on Uncertainty in Artificial Intelligence, Los Angeles, CA, 52-60, Morgan
Kaufmann.
BUSEMEYER, J.R. and DIEDERICH, A., 2002, Survey of Decision Field Theory.
Mathematical Social Science, 43, 345-370.
BUSEMEYER, J.R. and TOWNSEND, J.T., 1993, Decision Field Theory: A Dynamic-Cognitive Approach to Decision Making in an Uncertain Environment. Psychological
Review, 100, 432-459.
CASTI, J., 1997, Would-Be Worlds: How Simulation Is Changing the World of Science.
Wiley, New York.
CORTES, C. and VAPNIK, V., 1995, Support-Vector Networks. Machine Learning, 20(3),
279-297.
DAVIDSSON, P., 2002, Agent based Social Simulation: A Computer Science View.
Journal of Artificial Societies and Social Simulation, 5(1).
DIEDERICH, A., 1997, Dynamic Stochastic Models for Decision Making under Time
Constraints. Journal of Mathematical Psychology, 41, 260-274.
DIETTERICH, T.G., 2003, Machine Learning. Nature Encyclopedia of Cognitive Science,
London: Macmillan.
DJAN-SAMPSON, P.O. and SAHIN, F., 2004, Structural Learning of Bayesian Networks
from Complete Data using the Scatter Search Documents. IEEE International Conference
on Systems, Man and Cybernetics, 4, 3619-3624.
EARNSHAW R.A., VINCE J.A, and JONES H., 1995, Virtual reality Applications. Academic
Press Ltd., London, UK.
EDWARDS, W., 1954, The theory of decision-making. Psychological Bulletin, 51(4), 380-417.
EDWARDS, W., 1962, Subjective Probabilities Inferred from Decisions. Psychological
Review, 69, 109-135.
EINHORN, H.J., 1970, The Use of Nonlinear, Noncompensatory Models in Decision
Making. Psychological Bulletin, 73, 221-230.
FASLI, M., 2003, Interrelations between the BDI primitives: Towards heterogeneous
agents. Cognitive Systems Research, 4, 1-22.
FU, W. and ANDERSON, J.R., 2006, From Recurrent Choice to Skill Learning: A
Reinforcement-Learning Model. Journal of Experimental Psychology: General, 135 (2),
184 –206.
GAO, J., and LEE, J.D., 2006, Extending the decision field theory to model operator’s
reliance on automation in supervisory control situations. IEEE Transactions on Systems,
Man, and Cybernetics, Part A, 36(5), 943-959.
GIBSON, F.P., FICHMAN, M., and PLAUT, D.C., 1997, Learning in Dynamic Decision
Tasks: Computational Model and Empirical Evidence. Organizational Behavior and
Human Decision Processes, 71, 1-35.
GLIMCHER, P.W., 2003, Decision, Uncertainty, and the Brain, The Science of
Neuroeconomics. MIT Press, Cambridge, MA.
GONZALEZ, C. and QUESADA, J., 2003, Learning in Dynamic Decision Making: The
Recognition Process. Computational & Mathematical Organization Theory, 9, 287-304.
GONZALEZ, C., LERCH, J.F., and LEBIERE, C., 2003, Instance-based Learning in Dynamic
Decision Making. Cognitive Science, 27, 591-631.
GRAY, W.D., SIMS C.R., FU, W., and SCHOELLES, M.J., 2006, The Soft Constraints
Hypothesis: A Rational Analysis Approach to Resource Allocation for Interactive
Behavior. Psychological Review, 113(3), 461–482.
GRIFFITHS, T.L., KEMP, C., and TENENBAUM, J.B., 2008, Bayesian models of cognition,
In RON, S. (ed.), Cambridge Handbook of Computational Cognitive Modeling,
Cambridge University Press.
HAMAGAMI T. and HIRATA H., 2003, Method of crowd simulation by using multiagent on
cellular automata. In Proceedings of IEEE/WIC International Conference on Intelligent
Agent Technology (IAT’03), Halifax, Canada, 46-52.
HECKERMAN, D., GEIGER, D., and CHICKERING, D., 1994, Learning Bayesian networks:
The combination of knowledge and statistical data. In Proceedings of 10th Conference on
Uncertainty in Artificial Intelligence, Seattle, WA, 293-301, Morgan Kaufmann.
HELBING, D. and MOLNAR, P., 1995, Social Force Model for Pedestrian Dynamics.
Physical Review E, 51(5), 4282-4286.
HELBING, D., FARKAS, I., and VICSEK, T., 2000, Simulating dynamical features of escape
panic. Nature, 407, 487-490.
HERRERO, P. and ANTONIO, A., 2003, Introducing Human-like Hearing Perception in
Intelligent Virtual Agents, AAMAS.
HOFFMANN, H., THEODOROU, E., and SCHAAL, S., 2008, Optimization Strategies in
Human Reinforcement Learning. In Proceedings of the Annual Symposium Advances in
Computational Motor Control, Washington DC.
HOLROYD, C.B. and COLES, M.G.H., 2002, The neural basis of human error processing:
Reinforcement learning, dopamine, and the error-related negativity. Psychological
Review, 109, 679-709.
HOPFIELD, J.J., 1982, Neural networks and physical systems with emergent collective
computational abilities. Proceedings of the National Academy of Sciences of the USA,
79(8), 2554-2558.
JANIS, I.L., and MANN, L., 1977, Decision Making: A Psychological Analysis of Conflict,
Choice, and Commitment. Free Press, New York, NY.
JENNINGS, N.R., 2000, On agent-based software engineering. Artificial Intelligence,
117(2), 277-296.
JENSEN, F.V., 1996, Introduction to Bayesian Networks. Springer-Verlag New York, Inc.
KAMINKA, G.A. and FRIDMAN, N., 2007, Social Comparison in Crowds: A Short Report.
AAMAS.
KINNY, D., GEORGEFF, M., and RAO, A., 1996, A methodology and modeling technique
for systems of BDI agents. In Proceedings of the 7th European Workshop on Modeling
Autonomous Agents in a Multi-Agent World, MAAMAW’96, Eindhoven, The Netherlands,
VAN DER VELDE W. and PERRAM, J.W. Eds. Springer Verlag, 56-71.
KONAR, A., and CHAKRABORTY, U.K., 2005, Reasoning and unsupervised learning in a
fuzzy cognitive map. Information Sciences, 170, 419-441.
LAIRD, J.E., NEWELL, A., and ROSENBLOOM, P.S., 1987, SOAR: An Architecture for
General Intelligence. Artificial Intelligence, 33, 1-64.
LAM, W. and BACCHUS, F., 1993, Using causal information and local measures to learn
Bayesian networks. In Proceedings of 9th Conference on Uncertainty in Artificial
Intelligence, Washington, DC, 243-250, Morgan Kaufmann.
LEE, S., CELIK, N., AND SON, Y., 2009, An Integrated Simulation Modeling Framework
for Decision Aids in Enterprise Software Development Process. International Journal of
Simulation and Process Modeling, 5(1), 62-76.
LEE, S., SON, Y., and JIN, J., 2008, Decision Field Theory Extensions for Behavior
Modeling in Dynamic Environment using Bayesian Belief Network. Information
Sciences, 178(10), 2297-2314.
LEE, S., SON, Y., and JIN, J., 2008, An Integrated Human Decision Making and Planning
Model for Evacuation Scenarios under a BDI Framework. ACM Transactions on
Modeling and Computer Simulation (TOMACS) (submitted).
MACAL, C.M. and NORTH, M.J., 2006, Tutorial on agent-based modeling and simulation
part 2: how to model with agents. Winter Simulation Conference.
MOSTELLER, F., and NOGEE, P., 1951, An Experimental Measurement of Utility. The
Journal of Political Economy, 59, 371-404.
NEWELL, A., 1990, Unified Theories of Cognition. Harvard University Press, Cambridge,
MA.
NEUMANN, J.V. and MORGENSTERN, O., 1944, Theory of Games and Economic Behavior.
Princeton University Press, Princeton, NJ.
NICULESCU, R.S., MITCHELL, T.M., and RAO, R.B., 2006, Bayesian Network Learning
with Parameter Constraints. Journal of Machine Learning Research, 7, 1357-1383.
NILSSON, N.J., 1990, The Mathematical Foundations of Learning Machines. San
Francisco: Morgan Kaufmann.
NORLING, E., 2004, Folk psychology for human modeling: extending the BDI paradigm.
In Proceedings of International Conference on Autonomous Agents and Multi-Agent
System, New York, 202-209.
OPALUCH, J.J., and SEGERSON, K., 1989, Rational Roots of Irrational Behavior: New
Theories of Economic Decision-Making. Northeastern Journal of Agricultural and
Resource Economics, 18(2), 81-95.
PAYNE, J.W., 1982, Contingent Decision Behavior. Psychological Bulletin, 92, 382-402.
PEARL, J., 1985, Bayesian Networks: A Model of Self-Activated Memory for Evidential
Reasoning. In Proceedings of the 7th Conference of the Cognitive Science Society,
University of California, Irvine, CA, 329-334.
PEARL, J. and VERMA, T., 1991, A Theory of Inferred Causation. In ALLEN, J., FIKES, R.,
and SANDEWALL, E. (Eds.), Knowledge Representation and Reasoning: Proceedings of
the 2nd International Conference, 441-452, Morgan Kaufmann.
PRENDINGER, H. and ISHIZUKA, M., 2002, Social Computing. Life-like Characters as
Social Actors, Proc. 1st Salzburg Workshop on Paradigms of Cognition.
QUINLAN, J.R., 1993, C4.5: Programs for Machine Learning. Morgan Kaufmann
Publishers.
RAO, A.S., and GEORGEFF, M.P., 1998, Decision procedures for BDI logics. Journal of
logic and computation, 8(3), 293-343.
ROTHROCK, L., and YIN, J., 2008, Integrating Compensatory and Noncompensatory
Decision Making Strategies in Dynamic Task Environments. In Decision Modeling and
Behavior in Uncertain and Complex Environments, KUGLER, T., SMITH, C., CONNOLLY,
T., and SON, Y. (Eds.) Springer, 123-138.
ROE, R., BUSEMEYER, J.R., and TOWNSEND, J.T., 2001, Multialternative Decision Field
Theory: A Dynamic Connectionist Model of Decision Making. Psychological Review,
108, 370-392.
SAMUELSON, D.A., 2005, Agent of Change: How agent-based modeling may transform
social science. OR/MS Today, 32(1).
SAMUELSON, D.A., and MACAL, C.M., 2006, Agent-based simulation comes of age:
software opens up many new areas of application. OR/MS Today, 33(4).
SANFEY, A.G., LOEWENSTEIN, G., MCCLURE, S.M., and COHEN, J.D., 2006,
Neuroeconomics: cross-currents in research on decision-making. TRENDS in Cognitive
Sciences, 10(3), 108-116.
SEN, S., ASKIN, R., BAHILL, T., JIN, J., SMITH, C., SON, Y., and SZIDAROVSZKY, F., 2008,
Predicting and Prescribing Human Decision Making Under Uncertain and Complex
Scenarios. MURI (Award Number: F49620-03-1-0377) 2007 Annual Report.
SHENDARKAR, A., VASUDEVAN, K., LEE, S., and SON, Y., 2006, Crowd Simulation for
Emergency Response using BDI Agents Based on Immersive Virtual Reality. Simulation
Modelling Practice and Theory, 16, 1415-1429.
SHIZGAL, P., 1997, Neural basis of utility estimation. Current Opinion in Neurobiology, 7,
198-208.
SIMON, H.A., 1955, A Behavioral Model of Rational Choice. The Quarterly Journal of
Economics, 69, 99-118.
SIRBILADZE, G., and GACHECHILADZE, T., 2005, Restored fuzzy measures in expert
decision-making. Information Sciences, 169, 71-95.
SONG, W., XU, X., WANG, B., and NI, S., 2006, Simulation of evacuation processes using
a multi-grid model for pedestrian dynamics, Physica A, 363, 492-500.
SPIRTES, P., GLYMOUR, C., and SCHEINES, R., 1993, Causation, Prediction, and Search.
Springer-Verlag, New York.
SUN, R., MERRILL, E., and PETERSON, T., 2001, From implicit skills to explicit
knowledge: A bottom-up model of skill learning. Cognitive Science, 25(2), 203-244.
SUN, R., SLUSARZ, P., and TERRY, C., 2005, The interaction of the explicit and the
implicit in skill learning: A dual-process approach. Psychological Review, 112(1), 159-192.
SUN, R., 2007, Cognitive Social Simulation Incorporating Cognitive Architectures. IEEE
Intelligent Systems, 22(5), 33-39.
SUZUKI, J., 1993, A construction of Bayesian networks with discrete variables from data.
In Proceedings of 9th Conference on Uncertainty in Artificial Intelligence, Washington,
DC, 266-273, Morgan Kaufmann.
SVENSON, O., 1992, Differentiation and Consolidation Theory of Human Decision
Making: A Frame of Reference for the Study of Pre- and Post- Decision Processes. Acta
Psychologica, 80, 143-168.
TOWNSEND, J.T. and BUSEMEYER, J.R., 1995, Dynamic Representation of Decision-Making. In: PORT, R.F. and GELDER, T.V. (Eds.), Mind as Motion, MIT Press, Cambridge,
MA, 101-120.
WATKINS, C.J.C.H., 1989, Learning from delayed rewards. Ph. D. thesis, Cambridge
University.
WATKINS, C.J.C.H. and DAYAN, P., 1992, Q-learning. Machine Learning, 8, 279-292.
WOOLDRIDGE, M., 2000, Reasoning about Rational Agents, MIT press, London, England.
YUAN, W. and TAN, K.H., 2007, An evacuation model using cellular automata. Physica A,
384, 549-566.
ZHAO, X., 2006, A penalty Function-based Dynamic Hybrid Shop Floor Control System.
Unpublished doctoral dissertation, University of Arizona, Tucson, AZ.
ZHAO, X., and SON, Y., 2008, BDI-based Human Decision-Making Model in Automated
Manufacturing Systems. International Journal of Modeling and Simulation, 28(3), 347-356.