poster - University of Notre Dame

P-SAM:
A Post-Simulation Analysis Module for Agent-Based Models
S. M. Niaz Arifin¹, Ryan C. Kennedy¹, Kelly E. Lane², Agustín Fuentes³, Hope Hollocher², Gregory R. Madey¹
¹Department of Computer Science and Engineering, ²Department of Biological Sciences, ³Department of Anthropology
University of Notre Dame
Agent-based models (ABMs) can produce large volumes
of textual output, which often contain inherent logical
structures that can be naturally expressed in terms of
abstract mathematical notions such as graphs and relations.
It is crucial to analyze this voluminous textual output
effectively and to produce the desired visualizations.
Analysis and visualization also play important roles in the
verification and validation (V&V) of ABMs. P-SAM (Post-Simulation
Analysis Module) is designed to analyze and
visualize the post-simulation output of ABMs, with special
emphasis on biological simulation models.
As a case study, P-SAM is applied to a biological
simulation model named LiNK [1]. LiNK analyzes the spread
of pathogens among macaque monkeys on the Indonesian
island of Bali. Results indicate the importance of using P-SAM
to perform V&V of LiNK, because it allows internal validity
checking and tracing of the model entities.
Analysis and Visualization
Infection Statistics
•  Allows interactive probing
•  An infection event occurs when a macaque transmits
the pathogen to another macaque inside a temple;
self-transmission is also allowed
•  Shows all initially infected macaques, the macaques that
did not infect any other macaque, and all macaques that took
part in infection events, with the details of each event
(timestep, infected macaque, and temple)
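The Infection Statistics panel reduces to a few set operations over the infection events. Here is a minimal Python sketch. It assumes a hypothetical in-memory record (`InfectionEvent`, with timestep, source, target, and temple fields) in place of P-SAM's Perl data structures or LiNK's real file layout. "Did not infect any other" is read here as infected macaques that never transmitted the pathogen to a different macaque.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfectionEvent:
    """One in-temple transmission (hypothetical record layout)."""
    timestep: int
    source: str   # id of the transmitting macaque, e.g. "27.2969.0"
    target: str   # id of the newly infected macaque
    temple: int   # temple where the transmission happened


def infection_statistics(events, initially_infected):
    """Summarize in-temple infection events with plain set operations."""
    # Macaques that infected some *other* macaque (self-transmission excluded).
    spreaders = {e.source for e in events if e.source != e.target}
    sources = {e.source for e in events}
    targets = {e.target for e in events}
    infected = set(initially_infected) | targets
    return {
        "initially_infected": sorted(initially_infected),
        # Infected macaques that never passed the pathogen to another macaque.
        "non_spreaders": sorted(infected - spreaders),
        # Every macaque appearing in at least one event, as source or target.
        "participants": sorted(sources | targets),
        # Self-transmission is allowed, so report it explicitly.
        "self_infections": sorted(e.source for e in events if e.source == e.target),
    }


if __name__ == "__main__":
    # Illustrative data: the first event mirrors the poster's example;
    # the timestep of the second (a self-transmission) is made up.
    events = [
        InfectionEvent(1, "27.2969.0", "27.2775.0", 27),
        InfectionEvent(3, "27.2863.0", "27.2863.0", 27),
    ]
    stats = infection_statistics(events, {"27.2969.0", "27.2863.0"})
    for key, value in stats.items():
        print(f"{key}: {value}")
```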
Birth and Death Statistics
•  In a birth event, a mother gives birth to an infant
•  A death event records the time, place, and cause of
death (aging, dispersal, or pathogen)
•  Lists all birth events with their timesteps, and all death
events with their timesteps, causes of death, and temples
Summary Statistics
•  Input/Output Statistics: the original parameters from the
LiNK file (acquired immunity, virulence, infectivity, etc.)
•  Line Count and Event Statistics: counts of initially
infected macaques, infections, and roaming infections, the
total number of infected macaques, etc.
•  Temple Statistics: the number of unique infections
occurring at each temple
•  Runtime and Memory Statistics: the memory and runtime
consumed by P-SAM
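Temple Statistics and Runtime and Memory Statistics also fit in a few lines. This is an illustrative Python version, not P-SAM's Perl code. It makes three assumptions. First, events are hypothetical `(timestep, source, target, temple)` tuples. Second, a "unique infection" means a distinct infected macaque per temple. Third, memory is measured with Python's `tracemalloc`, which tracks only allocations made by the Python interpreter.

```python
import time
import tracemalloc
from collections import defaultdict


def temple_statistics(events):
    """Map each temple to its number of unique infections.

    Assumes hypothetical (timestep, source, target, temple) tuples and counts
    each infected macaque at most once per temple.
    """
    infected_at = defaultdict(set)
    for _timestep, _source, target, temple in events:
        infected_at[temple].add(target)
    return {temple: len(targets) for temple, targets in sorted(infected_at.items())}


def measure(func, *args):
    """Run func(*args); return its result, wall-clock seconds, and peak traced bytes."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Made-up events for illustration only.
    events = [
        (1, "27.1001.0", "27.1002.1", 27),
        (2, "27.1002.1", "27.1003.0", 27),
        (4, "30.2001.0", "30.2002.1", 30),
        (6, "30.2003.0", "30.2002.1", 30),  # same target again: still one unique infection
    ]
    stats, seconds, peak_bytes = measure(temple_statistics, events)
    print(stats)  # {27: 2, 30: 1}
    print(f"runtime: {seconds:.6f} s, peak memory: {peak_bytes} bytes")
```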
Design
[Figure: P-SAM design. LiNK → LiNK Output File → Writer (Analysis & Serialization) → Infection Data (DOT file), Roaming Infection Data (DOT file), Summary Data (text file) → Reader → Visualization (Graphical User Interface)]
•  The core P-SAM architecture consists of two Perl
programs: the writer and the reader
•  The writer analyzes and serializes the simulation output:
it takes the LiNK output file as input and serializes it into
three files (Infection Data, Roaming Infection Data, and
Summary Data), using the DOT graph-description
language for hierarchical drawings of directed graphs
•  The reader builds the interactive visualization structures:
it reads in the serialized DOT files and projects the
information into the GUI
We utilize the following Perl modules [2]:
Roaming Infection Statistics
•  Allows interactive probing
•  A roaming infection event occurs when a macaque
transmits the pathogen to another macaque outside a temple
•  Each event is accompanied by location information:
the latitude, longitude, and landscape type where the
infection took place (City, Forest, Rice Field, River, Road,
or Coast)
Pathogen Transmission Graphs
•  Visually track the pathogen transmission record, which
is particularly helpful for V&V
•  Nodes represent macaques; edges represent infection events
•  Node 27.2969.0 is a female macaque with temple 27 as its
natal temple and 2969 as its id
•  Events are listed with the timestep and location where
each infection occurred
•  For example, macaque 27.2969.0 infected macaque
27.2775.0 at timestep 1 in temple 27
•  Autoinfection: macaques 27.2863.0 and 27.2805.1
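The writer's serialization step and the node naming used in the Pathogen Transmission Graphs can be illustrated with a small sketch that emits DOT text directly. The three-field id parsing follows the example node 27.2969.0. Treating the third field as a sex code (0 for female) is an assumption drawn only from that example. `parse_macaque_id`, `to_dot`, and the edge-label format are hypothetical names for this sketch, not part of P-SAM.

```python
from typing import NamedTuple


class MacaqueId(NamedTuple):
    natal_temple: int
    number: int
    code: int  # third field; 0 appears to mark a female in the poster's example


def parse_macaque_id(node: str) -> MacaqueId:
    """Split a node label such as "27.2969.0" into its three integer fields."""
    temple, number, code = (int(part) for part in node.split("."))
    return MacaqueId(temple, number, code)


def to_dot(events, graph_name: str = "infections") -> str:
    """Serialize (timestep, source, target, temple) events as a DOT digraph.

    Each node is a macaque and each edge is one infection event; an
    autoinfection (source == target) becomes a self-loop.
    """
    lines = [f"digraph {graph_name} {{"]
    for timestep, source, target, temple in events:
        lines.append(f'  "{source}" -> "{target}" [label="t={timestep}, temple {temple}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(parse_macaque_id("27.2969.0"))
    # prints MacaqueId(natal_temple=27, number=2969, code=0)
    events = [
        (1, "27.2969.0", "27.2775.0", 27),  # the poster's example event
        (2, "27.2863.0", "27.2863.0", 27),  # autoinfection; timestep is made up
    ]
    print(to_dot(events))
```

The output is plain DOT, so it can be rendered with the Graphviz command-line tool (for example, `dot -Tpng infections.dot -o infections.png`) or opened in any other DOT-aware viewer.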
More information, including some results for LiNK and P-SAM, can be found in [3].
Future Work
•  Performance: We are working in collaboration with the
CRC [4] to improve P-SAM's runtime. Short-term plans
include preprocessing of input, code profiling, and code
optimization
•  Generalization: We envision P-SAM being useful for other
types of biological ABMs that produce large volumes of
textual output, which may include different types of agents
(e.g., humans, mosquitoes, and monkeys)
References
1.  Kennedy, R.C., et al., “A GIS Aware Agent-Based Model
of Pathogen Transmission,” International Journal of
Intelligent Control and Systems, 14(1): 51-61, March 2009.
2.  CPAN: http://www.cpan.org
3.  LiNK: http://www.nd.edu/~macaque/
Acknowledgements
4.  Paul Brenner, CRC
Perl module: Purpose of use
•  Tk/Tcl: To access the Tk library and build the GUI
•  Graph: To create graph data structures and use the corresponding graph algorithms
•  GraphViz: To visualize the graphs
P-SAM: A Post-Simulation Analysis Module
for Agent-Based Models
S. M. Niaz Arifin∗, Ryan C. Kennedy†, and Gregory R. Madey‡
Department of Computer Science and Engineering, University of Notre Dame
Abstract
Agent-based models (ABMs) can produce large volumes of textual output, potentially in the range of hundreds
of gigabytes. In most cases, this output contains inherent logical structures that can be naturally expressed in
terms of abstract mathematical notions such as graphs and relations. It is crucial to be able to analyze this
voluminous textual output effectively and to produce the desired visualizations with ease. Appropriate analysis and
visualization also play important roles in the verification and validation (V&V) of ABMs.
We have developed a software module, called P-SAM (Post-Simulation Analysis Module), to analyze and visualize
the post-simulation output of ABMs, with special emphasis on biological simulation models. P-SAM differs from
conventional statistical software tools by emphasizing the visualization of interactions between the abstract
entities present in the textual output, with the goal of automating post-simulation analysis tasks for ABMs.
Although still at an early stage of development, P-SAM has been designed to handle large data files distributed
over a high-performance network. To achieve this, it must be able to take full advantage of the network's computing
system, data storage system, data repositories, and visualization environments.
As a case study, this poster describes the application of P-SAM to a biological simulation model named 'LiNK' that
analyzes the spread of pathogens among long-tailed macaque monkeys on the Indonesian island of Bali. Reported
results indicate the importance of using P-SAM to perform V&V of the LiNK model.¹
The core P-SAM architecture consists of two programs, written in Perl, called the writer and the reader. The
writer takes the LiNK file as input, analyzes it, and serializes it into separate files, some of which are written in
the DOT format, which allows hierarchical drawings of directed graphs. Once analysis and serialization are complete,
the reader enables visualization by building the graphical user interface (GUI): it 'reads' in the serialized DOT files
and projects the information into the GUI.
P-SAM works on relations involving relevant entities (e.g., agents) defined by the user. The visualization process
enables the user to visually analyze Infection Statistics, Roaming Infection Statistics, Birth and Death Statistics,
Pathogen Transmission Graphs, and Summary Statistics.
P-SAM still faces the challenges of efficiently processing gigabytes of data and minimizing the response time for
large data sets processed over the campus network. In collaboration with the Center for Research Computing (CRC) at
the University of Notre Dame, we plan the following improvements:
Preprocessing: Decomposing the large LiNK file into smaller files grouped by similar events, then sorting and
analyzing them
Profiling: Using profilers (Devel::Profile and Devel::NYTProf) to measure the frequency and duration of function
calls and to collect performance data
Code Optimization: Applying optimization techniques once profiling has identified the hotspots
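One way to implement the Preprocessing step is to stream the large file once and route each line to a per-event-type file, so that each smaller file can then be sorted and analyzed on its own. The sketch below assumes a hypothetical line format in which the first token names the event type (e.g., INFECTION or BIRTH) and the second is an integer timestep. LiNK's actual output format is not described here, and `split_by_event_type` and `sort_by_timestep` are illustrative names.

```python
import os
from collections import defaultdict


def split_by_event_type(input_path, output_dir):
    """Stream a large output file and write one smaller file per event type.

    Assumption (hypothetical format): each non-blank line starts with an
    event-type keyword. Lines are read one at a time, so the whole input
    file is never held in memory. Returns the number of lines per type.
    """
    os.makedirs(output_dir, exist_ok=True)
    handles = {}
    counts = defaultdict(int)
    try:
        with open(input_path, encoding="utf-8") as source:
            for line in source:
                fields = line.split()
                if not fields:
                    continue  # skip blank lines
                event_type = fields[0].lower()
                if event_type not in handles:
                    path = os.path.join(output_dir, f"{event_type}.txt")
                    handles[event_type] = open(path, "w", encoding="utf-8")
                handles[event_type].write(line if line.endswith("\n") else line + "\n")
                counts[event_type] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)


def sort_by_timestep(path):
    """Sort one (already small) split file in place by its integer timestep field."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    lines.sort(key=lambda line: int(line.split()[1]))
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)
```

Streaming keeps memory use proportional to the longest line rather than to the file size. Only the already-split files are loaded whole for sorting.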
Once these improvements are implemented within the cyberinfrastructure setting offered by the CRC, we expect P-SAM to
be able to perform more complex, multi-run analyses involving large data sets with improved runtime, and to efficiently
handle the storage, management, integration, and visualization of data produced by the LiNK model. Linked by the CRC
network, these capabilities would create opportunities for scholarly innovation and discovery by LiNK users. This, in
turn, would allow the biologists to further validate and refine their model. We also envision P-SAM being useful for
other types of ABMs.
∗ [email protected]   † [email protected]   ‡ [email protected]
¹ See http://www.nd.edu/~macaque/ for more information.