Eurographics/ACM SIGGRAPH Symposium on Computer Animation (2009)
E. Grinspun and J. Hodgins (Editors)
Fitting Behaviors to Pedestrian Simulations
Alon Lerner1, Eitan Fitusi1, Yiorgos Chrysanthou2 and Daniel Cohen-Or1

1 School of Computer Science, Tel Aviv University, Israel
2 Computer Science Department, University of Cyprus
Abstract
In this paper we present a data-driven approach for fitting behaviors to simulated pedestrian crowds. Our method
annotates agent trajectories, generated by any crowd simulator, with action-tags. The aggregate effect of animating
the agents according to the tagged trajectories enhances the impression that the agents are interacting with one
another and with the environment. In a preprocessing stage, the stimuli which motivated a person to perform
an action, as observed in a crowd video, are encoded into examples. Using the examples, non-linear, action-specific influence functions are encoded into two-dimensional maps which evaluate, for each action, the relative importance of a stimulus within a configuration. At run time, given an agent's stimuli configuration, the importance
of each stimulus is determined and compared to the examples. Thus, the probability of performing each action is
approximated and an action-tag is chosen accordingly. We fit behaviors to pedestrian crowds, thereby enhancing
their natural appearance.
Categories and Subject Descriptors (according to ACM CCS): Computer Graphics [I.3.7]: Three-Dimensional
Graphics and Realism—Animation
1. Introduction
In recent years, the quality of computer-generated crowds has risen to a degree where such crowds are commonly used in virtual environment applications. Hordes of fighting soldiers,
people fleeing a monster or cheering their favorite team can
be found in many films and games. However, pedestrian
crowds are not as common. Although the quality of animation, rendering and simulation has improved tremendously,
the generation of a believable group of people occupying a
city street remains a challenge. Since we are used to seeing
such crowds in our day-to-day lives any peculiar behavior
decreases the realism of the entire crowd.
Usually, crowd simulation techniques focus on generating realistic crowds at the trajectory level. They direct people along believable, collision-free paths. However, people
do more than just walk. They talk to one another, they look
around, they scratch their heads or perform various other actions, such as the ones seen in Figure 1. The absence of these
mundane actions diminishes the credibility of the simulated
crowd. However, should the agents perform such actions at
inappropriate times, it may seem odd. In this work we show
how the observed conditions that motivate secondary actions
can be learned from captured data and assigned to simulated
agents. We fit behaviors to agents one action at a time and
animate them accordingly. The aggregate effect of the animated actions enhances the impression that the agents are
interacting with one another and with the environment.
Behavior can be defined as a person’s aggregate actions
and reactions to internal and external stimuli. The internal
stimulus emanates, among other things, from one's personality and state of mind, while the external stimuli emanate from the
surrounding people and objects. Cognitive modeling methods can take into account both internal and external stimuli
and have been shown to perform well on individual agents.
However, these methods are quite complex and computationally intensive, and are therefore not suitable for fitting behaviors
to a crowd.
When looking at a real pedestrian crowd, all that we know
about the people we see is derived from our observations
during the few moments they are in our view. We may guess
their internal stimuli, however, the interactions between the
people, their movements within the environment and consistency of actions are the factors by which we judge them.
Therefore, in principle, a system based only on the observed
stimuli of an agent should be able to recreate a seemingly realistic crowd. Although reactive rule-based systems have been proposed for adding agent reactions to environmental stimuli, the interactions within a crowd are so rich and varied that great skill and effort are required to define and tune a set of rules that will faithfully capture them.

Actions and trajectories have a mutual effect on each other. Some actions are more likely to occur along certain trajectories, while some trajectories are more likely to be taken when certain actions are performed. Despite the link between the two, numerous methods exist for simulating convincing trajectories without taking actions into account. Our technique enhances the realism of crowds simulated by these methods. Since the trajectories are simulated separately and the behaviors are fitted to them, the assigned actions are affected by the trajectories; however, they do not affect them.
Our technique is data-driven. From a video of a crowd,
example configurations of observed stimuli and the actions
motivated by them are defined. These examples form a training set according to which we estimate the likelihood of performing each action given a stimuli configuration. We store
the examples on the edges of a graph, in which nodes represent actions and directed edges transitions between actions.
At run time, the stimuli configuration of a simulated agent
is compared to example configurations from the training set,
thus approximating the probability of performing an action.
The similarity between the configurations is computed and
validated by using stimuli and validity-maps. These maps,
along with the global distribution of examples, capture the
essence of crowd behavior. Agents are individually assigned
action-tags; however, the fitted behaviors admit both local
and global characteristics. Therefore, each agent seems to
act naturally, while at the same time, the crowd as a whole
presents consistent pedestrian behaviors.
During a simulation each agent is asked three questions: when should it perform an action? Which action should it perform? And how long should the action last? Our method
answers these questions based on the similarity between the
stimuli surrounding the agent and stimuli which motivated
an action in the real world.
We employ our method to annotate the trajectories of
agents from various sources: real-data, an example-based
simulation, a rule-based simulation and a flocking simulation. In all cases the crowds behave in a believable manner,
thereby enhancing their natural appearance.
Figure 1: Assigning behaviors to simulated crowds enhances the impression that the agents are interacting with each other and the environment.
2. Related Work
Crowd simulation research is an active theme in a number
of fields, such as computer graphics, sociology and robotics.
There are several approaches for simulating crowds. Some
derive ideas from fluid mechanics while others use particles
or social forces. However, the most popular approach is the
rule-based approach.
In recent years, data-driven methods, which are common
in many fields, have been applied to crowd simulation. For
example, Metoyer and Hodgins [MH03] allow the user to
define specific examples of behaviors, while Musse et al.
[MJBJ06] define the navigational model based on observed
trajectories. In recent works, [LCHL07,LCL07], trajectories
learned from videos of crowds are stored along with some
representation of the stimuli that affected them. During a
simulation, agents match their stimuli to the ones stored in
the database, and navigate accordingly. Most of the crowd
simulation literature focuses on the navigational aspect. Our
work focuses on the behavioral aspect and assumes that
some system exists to provide us with agent trajectories.

Figure 2: The trajectories of people in a video are annotated with action-tags (letters represent actions). Based on the annotations, examples of stimuli configurations that motivated an action are defined and used to construct, for each action, stimuli and validity-maps.
A number of works look beyond the navigational aspect.
A cognitive decision making mechanism was defined by Terzopoulos et al. [TTG94], where a range of individual traits,
such as hunger and fear, stimulate appropriate behaviors in
simulated fish. Funge et al. [FTT99] simulated agents that
perceive the environment and learn from it which is the most
suitable behavior to choose out of a predefined set. These
methods are rather inefficient and therefore not suited for
crowds. Furthermore, when striving to generate realistic behaviors, they are, like most rule-based systems, complicated
to define.
In some works, such as that of Farenc et al. [FBT99], information stored within the environment triggers agents to
perform various actions. One can regard the triggers as stimuli; however, they are prescribed and associated with specific
objects. In our approach the stimuli are not attached to specific objects. In the film industry crowds are simulated using
proprietary systems, such as Massive Software™. These are
rule-based systems which also have rules for performing actions that mimic interactions between agents.
Some works do not define rules explicitly. Sung et al.
[SGC04] represent a set of behaviors as a finite state machine
with probabilities associated with the edges. The probabilities are updated in real-time based on user-defined behavior functions. Yu et al. [YT07] define a decision network framework for generating behaviors which can account for uncertainties in behavior, and affect both trajectories and actions.
Some methods are data-driven. Subconscious upper body
actions are modelled by Ashida et al. [ALA*01], based on a
statistical distribution, extracted from a video. This model
is used in conjunction with a representation of an agent's
internal stimulus. Lee et al. [LCHL07] simulate believable
crowd behaviors by learning navigation and interaction behavior models from a video. At any given moment an agent
either navigates through the environment or interacts with
other agents; however, it cannot do both. In our work the
agents can walk and talk at the same time.
Data-driven approaches are used extensively in character
animation, where characters are animated using motion captured data. To facilitate the composition of long animated sequences from a set of short clips, motion graph approaches,
such as [KGP02], are used. A motion graph is a finite state
machine where each node represents a set of animation clips
and edges smooth transitions between them. It is a convenient method for automatically animating characters, and is
often used to animate crowds. The graph representation of
transitions between actions which appears in this paper, is
a means for assigning action-tags regardless of the method
used to animate them.
3. Overview
The focus of this work is a scheme for fitting behaviors to
simulated pedestrian crowds, based on real world examples.
We assume that two separate dedicated systems exist and run
in parallel to ours, one for simulating crowd trajectories and
another for animating them.
Our method runs in two stages. In a preprocessing stage
annotated trajectories, extracted from a video of a real
crowd, are analyzed and examples of observed stimuli configurations, which motivated a person to perform an action,
are defined, Figure 2. At run-time, an agent approximates the
probability of performing different actions and stochastically
selects one. The probability of each action is approximated
according to the similarity between the agent’s stimuli configuration and the stimuli stored in the examples, Figure 3.
Figure 3: For each simulated agent a query stimuli configuration is defined. Using the maps, examples and a similarity function,
the probability of performing each action is approximated and an action is chosen accordingly.
Preprocessing: The input for the method is a set of trajectories extracted from a video [LCHL07,LCL07,MJBJ06].
We manually annotate the trajectories with action-tags that
represent the actions that the people performed. Objects of
certain interest are considered as stimuli as well and are annotated with the no-action tag.
The stimuli surrounding a person at a certain time motivate an action to be performed shortly after or, more precisely, a transition between actions to occur. From the annotated trajectories we define examples of such stimuli configurations, see Figure 4.
Based on these examples, stimuli and validity-maps are
constructed for each action separately. Depending on the action, some stimuli might be more important than others. This
importance is captured by the stimuli-maps, which are density based influence functions, Figure 5. The validity-maps
impose constraints over the stimuli required for performing
an action, as observed in the input video. For instance, a
validity-map would assure the presence of a person to the
left of the subject person when performing the talk-left action, as shown in Figure 6.
The aforementioned information is encoded into an
action-graph. In the graph, nodes represent actions and directed edges observed transitions between actions. The examples are stored on the corresponding edges, and the maps
on the appropriate nodes.
Run-time: During a simulation each agent decides which
action it should perform. From its current graph node, neighboring nodes represent potential actions. The validity of an
action is determined by testing the agent's stimuli against
the action’s validity-maps. For each valid action, the most
similar examples are collected using the stimuli-maps and
a similarity function. The probability of an action is determined according to the number of examples collected and
their degree of similarity to the agent's stimuli. An action is
chosen accordingly and the associated action-tag is assigned,
see Figure 3.
Figure 4: An example representing the stimuli that motivated subject person p to stop performing action ’B’ and
start action ’D’. It consists of the subject person, the surrounding people (ei ) and objects, and their annotated trajectories over the past several frames.
4. Preprocessing

4.1. Examples
An example E represents the configuration of observed stimuli that motivated an action, Ak, to be performed shortly after. Therefore, it accounts for a transition from the current action to action Ak. An example is defined with reference to a person, p, denoted as the subject person, in the video at frame t. The transition is represented by a pair of actions (Aj, Ak), where Aj is the action p performed at frame t and Ak the action at frame (t + ∆). Note that action Ak can be no-action or the same action as Aj. The example stores the
observed configuration of stimuli surrounding the subject
person, which consists of both internal and external stimuli.
We assume that the internal stimulus can be inferred from
p’s recent annotated trajectory, and consider the people and
objects that fall within a region surrounding p as external
stimuli. For each stimulus, internal or external, the example
stores its annotated trajectory, over the frames [t − δ,t], in
the local coordinate system of the subject person, as seen in
Figure 4.
Figure 5: Stimuli-maps are density based influence functions surrounding the subject person (red arrow). Areas of high influence are marked in red and of low influence in white.
Examples are generated from every frame along the trajectory of each person that appears in the input video. The
examples are stored such that the subject person’s local coordinate system is aligned with a global coordinate system. In
our experiments the constants ∆ and δ were set to 25 frames
(1 second) and 12 frames respectively.
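To make the data layout concrete, an example can be stored as a small record. The following is a minimal sketch (Python is used for all sketches here for brevity; the paper's own implementation is in C#, and all field names are hypothetical):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class StimulusTrack:
    """Trajectory of one stimulus over the frames [t - delta, t], expressed
    in the subject person's local coordinate system."""
    positions: List[Tuple[float, float]]  # one (x, y) per frame
    actions: List[str]                    # one action-tag per frame

@dataclass
class Example:
    """The stimuli configuration that preceded one observed transition."""
    action_from: str              # Aj, the action performed at frame t
    action_to: str                # Ak, the action performed at frame t + DELTA
    subject: StimulusTrack        # internal stimulus: the subject's own track
    stimuli: List[StimulusTrack]  # external stimuli: nearby people and POI's

DELTA = 25    # frames (~1 second), the transition look-ahead used in the paper
HISTORY = 12  # frames of trajectory history (the delta window) per stimulus
```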
4.2. Stimuli-Map
During a simulation we evaluate the similarity between a
stimuli configuration of a simulated agent and the configuration stored in an example. In order to do that we need to find
the relative importance of each stimulus in the configuration.
A stimulus’s importance depends not only on the other stimuli in the configuration, but also on the action being evaluated. This importance is captured by the stimuli-maps.
A stimuli-map acts as an influence function for a given
action. In many works, influence is merely distance-based,
where the closer the stimulus is to the subject person the
more influence it has on his actions. Stimuli-maps allow the
influence to be arbitrarily involved, as seen in Figure 5. For
example, talking to someone on your left side should be influenced more by the presence of a person on your left, than
the proximity of the people to your right.
A stimuli-map is a two dimensional regular subdivision
of the region of influence. It is constructed for action Ak according to the examples which motivate it, i.e. examples representing transitions from any action Aj to action Ak. From
each one of these examples, the last frame of the stimuli configuration (the configuration at frame t) is overlaid on the map and each map cell accumulates the stimuli that fall within it. A Gaussian filter is applied so that each stimulus contributes not only to its own cell, but also to the surrounding ones.
Given an axis-aligned configuration of stimuli, Q, and a potential action, Ak, the amount of influence, wi, associated with an external stimulus, ei, is determined by overlaying Q on top of Ak's stimuli-map. The influence, wi, equals the value stored in the cell to which the stimulus ei belongs. The
internal stimulus, represented by the subject person's annotated trajectory, is assigned the value wp, which equals the average influence value of the external stimuli. The influence values are then normalized such that $w_p + \sum_i w_i = 1$.
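The lookup and normalization just described could then read as follows (continuing the sketch above):

```python
def stimuli_weights(query, stim_map, to_cell):
    """Return (w_p, [w_i ...]) for a configuration overlaid on a stimuli-map."""
    raw = []
    for stim in query.stimuli:
        x, y = to_cell(stim.positions[-1])
        raw.append(float(stim_map[y, x]))      # influence w_i of stimulus e_i
    w_p = sum(raw) / len(raw) if raw else 1.0  # internal stimulus: average influence
    total = w_p + sum(raw)
    return w_p / total, [w / total for w in raw]  # now w_p + sum(w_i) == 1
```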
4.3. Validity-Map
Some actions require the presence of appropriate stimuli in
order to be performed. The validity-map of action Ak is used
to confirm that the stimuli required to perform the action exists in a given configuration. To keep the method general,
applicable to any type of behavior, we deduce these requirements directly from the examples. If the vast majority of examples leading up to action Ak have at least one stimulus
within some region, we conclude that the action can be performed only if there exists at least one stimulus within that
region.
As with the stimuli-maps, a validity-map for action Ak is constructed by overlaying, on top of the map, the examples representing transitions from any action Aj to action Ak.
However, in this construction the cells accumulate example
id’s. After the relevant examples have been processed, a circular region is grown from each cell of the map until it encapsulates most of the example id’s. In our experiments, a
95% threshold was used. The regions with the smallest radii
define the required regions and the rest are discarded. An additional constraint is imposed over the regions, which is that
the subject person cannot be included within them. The reason for this is that actions which have such requirements tend
to be directional. Since circular regions are grown, there are
instances where several overlapping circles of similar radii
cover the required region and some of them spill over to the
opposite side of the subject person, even though the required
stimuli are concentrated only on one side of it.
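A sketch of this region-growing step (the 95% coverage threshold is the one reported in the paper; the brute-force search over candidate centers is an assumption about a detail the paper does not spell out):

```python
import math

def grow_validity_region(cells_with_ids, subject_cell, coverage=0.95):
    """Smallest circle, grown from some cell, covering `coverage` of the
    example IDs while excluding the subject person's cell.

    `cells_with_ids` maps grid cell (x, y) -> set of example IDs whose
    stimuli fell in that cell. Returns (center, radius) or None.
    """
    all_ids = set().union(*cells_with_ids.values())
    best = None
    for center in cells_with_ids:
        by_dist = sorted(cells_with_ids.items(),
                         key=lambda kv: math.dist(center, kv[0]))
        covered, radius = set(), 0.0
        for cell, ids in by_dist:           # grow the circle outward
            covered |= ids
            radius = math.dist(center, cell)
            if len(covered) >= coverage * len(all_ids):
                break
        if math.dist(center, subject_cell) <= radius:
            continue                        # the subject may not be inside
        if best is None or radius < best[1]:
            best = (center, radius)
    return best
```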
Figure 6: Validity-maps represent regions where a stimulus must be present prior to performing the corresponding action. Crowds of different natures might produce different regions for the same actions.

The validity-maps shown in Figure 6 were created from
two different input videos. The first is a video of a dense
crowd at a student fair and the second of a sparse crowd
walking in front of a department store. The differences between
the required regions are a direct result of the number of times
each action was performed and the natures of the crowds.
The point-to actions appear only a limited number of times
in the sparse video, as seen in Table 1, which results in a hard
constraint that can probably be relaxed by using additional
input data.
4.4. Action-Graph
The action-graph is a probabilistic finite automaton that provides a convenient means for fitting action-tags to simulated agents. In the graph an action is represented by a node, which stores the action's stimuli and validity-maps. A transition between actions is represented by a directed edge, to which the examples featuring the transition are assigned, see Figure 7. During a simulation, each agent traverses the graph from one node to another, assigning the corresponding action-tags to its trajectory. The next step in the traversal is chosen according to an approximated probability function over the actions represented by the neighboring graph nodes.
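As a data structure, the action-graph is plain adjacency information; a sketch, continuing the Python examples (method and field names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ActionNode:
    name: str                    # e.g. "talk-left" or "no-action"
    stimuli_map: object = None   # the action's influence map (Section 4.2)
    validity_map: object = None  # required-stimulus regions, if any (Section 4.3)

@dataclass
class ActionGraph:
    nodes: Dict[str, ActionNode] = field(default_factory=dict)
    # directed edge (Aj -> Ak) -> the examples observed for that transition
    edges: Dict[Tuple[str, str], List[object]] = field(default_factory=dict)

    def potential_actions(self, current: str) -> List[str]:
        """Actions reachable by a directed edge from the current node."""
        return [dst for (src, dst) in self.edges if src == current]
```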
Figure 7: A sample from the action-graph. A node represents an action and stores the corresponding stimuli and validity-maps. A directed edge represents a transition between actions. It stores examples leading from the action of its source node to the action of its destination node.

5. Fitting Behaviors

5.1. Run-Time
At run-time, our system runs in parallel to a crowd simulator,
whose output trajectories are redirected as input for our system. For each simulated agent that requires a new action-tag,
we find its stimuli configuration, Q, and current action, Aj,
in the same manner as for an example in the video. Our goal
is to estimate the likelihood that a real person would perform each potential action Ak given the configuration Q. In terms of the
action-graph, Ak is a potential action if a directed edge exists between the node of action Aj and that of action Ak and
Q passes Ak's validity test. To pass the test, Q is checked against Ak's validity-map. If it does not have the necessary
stimuli for performing the action then the action receives a
zero probability. If Q passes the validity test, then the probability depends on the similarity between Q and the example
configurations representing this transition. These examples,
which are stored on the directed edge between Aj's node and
Ak ’s node, are searched for the ones most similar to Q, and
their similarity values are summed up. The similarity values
for the different actions are normalized and thus the probability of choosing an edge and performing the corresponding
action is approximated.
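Putting the pieces together, the selection step might look as follows (a sketch on top of the structures above; `k_nearest` and `similarity` are hypothetical wrappers around the Section 5.2 machinery, and `passes` is an assumed validity-map test):

```python
import random

def choose_action(query, current_action, graph, k_nearest, similarity):
    """Approximate P(Ak | Q) from example similarity and sample an action."""
    scores = {}
    for ak in graph.potential_actions(current_action):
        node = graph.nodes[ak]
        # Hard constraint: the required stimuli must be present.
        if node.validity_map is not None and not node.validity_map.passes(query):
            continue  # zero probability
        examples = graph.edges[(current_action, ak)]
        nearest = k_nearest(query, examples)  # most similar examples on this edge
        scores[ak] = sum(similarity(query, ex, node.stimuli_map) for ex in nearest)
    total = sum(scores.values())
    if total == 0:
        return current_action  # nothing valid or similar enough
    actions = list(scores)
    weights = [scores[a] / total for a in actions]  # normalized probabilities
    return random.choices(actions, weights=weights, k=1)[0]
```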
An agent performs the same action until it traverses to a
different node in the graph. Since the environment is constantly changing there is a need to periodically validate the
assigned actions. An action is allowed to continue as long
as it passes the validation test and an example similar to
the agent's current stimuli configuration exists on the self-referencing edge of the action's graph node. The reasoning
behind this is that although the configuration changes over
time, a real person that performed the action under similar
circumstances can still be found in the input video. Note that
there is a natural minimal length for an action, defined by the
shortest animation cycle for it.
5.2. Similarity Function
The similarity function, Sim(Q, E), quantifies the similarity between two stimuli configurations: a query configuration, Q, originating from a simulated agent, and an example configuration, E, leading to action Ak. Generally, Q will not match exactly any one of the examples in the training set; there are always going to be unmatched or displaced stimuli. Therefore, the similarity between Q and E is a weighted sum
of the similarities that do exist. Each stimulus qi ∈ Q is assigned a weight, wi, according to the stimuli-map of action Ak. The stimulus qi is matched to the stimulus ej ∈ E most similar to it according to the similarity function S(qi, ej).

$$\mathrm{Sim}(Q, E) = w_p\, S(q_p, e_p) + \sum_{q_i \in Q} w_i \max_{e_j \in E} S(q_i, e_j)$$

where qp and ep are the subject people of the configurations, their positions and actions over the previous δ frames representing the internal stimuli, and wp is the weight assigned to qp as described in Section 4.2.
The function S(qi, ej) computes the similarity between two stimuli by taking into account the difference in position, dP(qi, ej), and action, dB(qi, ej), along their trajectories.

$$S(q_i, e_j) = \sum_t c_t\, S_t(q_i, e_j)$$

$$S_t(q_i, e_j) = 1 - \alpha\, dP(q_i, e_j) - \beta\, dB(q_i, e_j)$$
where ct is the relative weight of time t along the trajectory and $\sum_t c_t = 1$. α and β are predefined constant weights whose sum equals 1; in our experiments they were set to 2/3 and 1/3 respectively.
The difference between actions, dB(qi, ej), is the topological distance in the graph between the action-tags of qi and ej at time t, divided by the maximal topological distance in the graph.
The difference between positions, dP(qi, ej), is computed using the Euclidean distance, dist(qi, ej), between the positions of the stimuli at time t. If the distance is over a user-defined upper bound, rmax, then the difference is 1. If it is under the lower bound, rmin, then the difference is 0. Anywhere in between, it equals the squared ratio between the difference of dist(qi, ej) and rmin and the difference of rmax and rmin.

$$dP(q_i, e_j) = \begin{cases} 0 & \mathrm{dist}(q_i, e_j) < r_{min} \\ \left(\dfrac{\mathrm{dist}(q_i, e_j) - r_{min}}{r_{max} - r_{min}}\right)^{2} & r_{min} \le \mathrm{dist}(q_i, e_j) \le r_{max} \\ 1 & r_{max} < \mathrm{dist}(q_i, e_j) \end{cases}$$

The rmin lower bound is proportional to the distance of qi to the subject person qp at time t.
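Written out, the similarity machinery of this section looks roughly as follows (a sketch; `topo_dist` stands in for the normalized action-graph distance, and fixed rmin/rmax values are used here even though the paper makes rmin proportional to the stimulus's distance from the subject):

```python
import math

def dP(q_pos, e_pos, r_min, r_max):
    """Position difference: 0 below r_min, 1 above r_max, squared ramp between."""
    d = math.dist(q_pos, e_pos)
    if d < r_min:
        return 0.0
    if d > r_max:
        return 1.0
    return ((d - r_min) / (r_max - r_min)) ** 2

def S(q, e, c, topo_dist, alpha=2/3, beta=1/3, r_min=0.5, r_max=3.5):
    """Per-stimulus similarity accumulated over the delta-frame window."""
    total = 0.0
    for t, c_t in enumerate(c):  # c holds the per-frame weights, sum(c) == 1
        dB = topo_dist(q.actions[t], e.actions[t])
        total += c_t * (1.0
                        - alpha * dP(q.positions[t], e.positions[t], r_min, r_max)
                        - beta * dB)
    return total

def Sim(Q, E, w_p, w, c, topo_dist):
    """Configuration similarity: subject term plus best-matched external stimuli."""
    total = w_p * S(Q.subject, E.subject, c, topo_dist)
    for w_i, q_i in zip(w, Q.stimuli):
        total += w_i * max(S(q_i, e_j, c, topo_dist) for e_j in E.stimuli)
    return total
```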
5.3. Points of Influence
A person interacts with the people and objects in his surrounding environment. In the input video we mark objects
that attract people's attention, such as a dummy in a shop
window or a garbage can, as Points of Influence, or POI’s
for short. A POI is an external stimulus whose position is
considered as a trajectory annotated with a no-action tag.
During the matching process people should not be matched to POI's, otherwise unwanted behaviors, such as people talking to inanimate objects, might occur. To accommodate the separation between people and POI's, two stimuli and validity-maps are defined for each action: one for the people and another for the POI's. During the matching process, the appropriate map is used depending on the type of stimulus. Note that the actions motivated by a POI are not limited to looking at it.

6. Results and Analysis

The method was implemented in C# and the results were measured on an AMD Athlon 64 X2 5200+ Dual Core Processor with 2 GB of RAM. The simulated crowds that appear in the accompanying video were animated automatically using a motion-graph.
6.1. Data Preparation
The results presented here are based only on one of the analyzed videos. The specific video was five minutes long and
captured unaware pedestrians walking in front of a department store. The maximal number of pedestrians per frame
is 18, with an average of 5-6 people per frame. They either walk on their own or in small groups of 2-4 people. We
employed a user-friendly interface in order to rapidly annotate their trajectories. Additionally, four dummies in a shop
window, which attracted people’s attention, were marked as
points of influence. For defining the examples, an oval region of influence contained in a rectangle of size 200 × 400
pixels was used. This translated roughly into a 3.5 by 7 meter
region surrounding the subject person. Twelve actions were
used, the eleven presented in Table 1 and an additional action
representing no action. The method can easily accommodate
a wider selection of actions with no significant degradation
in performance. The number of annotated actions in the input video is 253, which results in 49,269 examples.
An action-graph of 12 nodes and 63 edges was constructed. For each node a stimuli-map was created. Validity-maps were created for six of the twelve actions, which were
found to have distinct validity regions, see Figure 6 (bottom).
All maps have a resolution of 200 × 400 pixels. A square
9 × 9 Gaussian filter was used in their creation. The memory
required to store the complete data-structure is about 80 MB.
Generally speaking, the number of potential stimuli configurations motivating the performance of an action is immense and therefore the amount of input data used has a
direct influence on the quality of the probability approximation. However, even a short video of a typical crowd
shows enough variety of actions and configurations, so that
the overall behavior seen in the video can be captured and
fitted to a simulated crowd.
6.2. Run-time Performance
Here we measure the cost of fitting behaviors to simulated agents. The run-time performance of the method depends mostly on the number of examples tested for each
agent. To improve the performance of the method and reduce
the memory requirements we performed two optimizations.
First, the number of examples was reduced by clustering
similar examples on each edge. For each cluster only a single
representative is kept and is assigned a weight value equal to
the size of the cluster. Clustering was done using the similarity function defined in Section 5.2. The second optimization involves
the no-action node. Over two thirds of the examples lead to
this node. Even after clustering, the number of remaining
representatives is large. We found that for most query stimuli configurations a fairly similar one exists in the no-action
example set. We make the assumption that no-action can always be performed. Therefore, we create a single cluster for
all the examples leading to the no-action node and give them
a single constant weight.
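The first optimization can be sketched as a greedy pass over an edge's examples (the merge threshold is an assumption; `similarity` is the Section 5.2 function applied between two examples):

```python
def cluster_examples(examples, similarity, threshold=0.9):
    """Greedily merge similar examples into weighted representatives."""
    clusters = []  # list of (representative_example, weight)
    for ex in examples:
        for i, (rep, weight) in enumerate(clusters):
            if similarity(ex, rep) >= threshold:
                clusters[i] = (rep, weight + 1)  # absorb into existing cluster
                break
        else:
            clusters.append((ex, 1))             # start a new cluster
    return clusters
```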
The main expense of the method lies in the comparison between configurations. The number of comparisons depends on the number of examples and the cost of a single
comparison depends on the density of the stimuli (people
per square meter). To test the scalability of the method we
ran several tests.
We assigned behaviors to the same 3000-frame long (2
minutes) simulation three times. The number of examples
according to which the action-graph was constructed varied between roughly 5000, 25000 and 50000 examples. The
number of clusters created was 1855, 7439 and 13035 respectively. The number of clusters increases at a slower rate
than the number of examples since new examples can be
added to existing clusters. The time required to assign the
behaviors was 35 sec, 65 sec and 78 sec respectively. Obviously, there is an increase, although sub-linear, in the required computational time.
Next, we varied the density of the crowd in a small region.
We constructed a single action-graph and used it to assign
behaviors to four 3000-frame long (2 minutes) simulations.
The simulations varied in the number of people, and had 6,
11, 20 and over 40 simulated people. The required computational times were 18 sec, 35 sec, 78 sec and 226 sec. A
clear dependence between the computational cost and the
density can be seen. However, this dependence has an upper bound as there is a limit to the number of people that fit
in a square meter. In our opinion, the simulation which has
over 40 people in a small region is close to the upper limit
for a pedestrian crowd. If real-time performance is required
for a dense crowd, then an agent’s region of influence can be
shrunk. This will reduce the number of influencing stimuli
and the per example comparison cost.
Figure 8: Different types of simulated crowds. The rule-based crowd contains only individuals, the flock moves as
a group and the Example-based crowd is a mix of both.
Even though the same set of examples and action-graph were
used, the fitted behaviors match the nature of the simulated
agents.
6.3. Evaluation of Fitted Behaviors
To show the generality of the method we present results for
four different types of crowd trajectories:
• Real captured data.
• A flocking simulation [Rey87], Figure 8.
• A rule-based simulation, Figure 8.
• An example-based simulation [LCL07], Figure 8.
Two different experiments were run on real-data. We
checked the frequency of the fitted actions and their average length, see Table 1. The distribution of actions produced
by the action-graph method (Column 3) corresponds quite
closely to the real-data (Column 2). Obviously, our method
is stochastic in nature, and therefore, cannot be expected
to reproduce the exact same frequencies. A random assignment of actions, based on their frequencies in the input data,
yields, as expected, a distribution similar to the input data
(Column 4). However, there are several significant problems
with random selection. First, the assigned actions do not look
natural. People talking to thin air are common in these simulations. One might argue that these behaviors can be easily eliminated using a simple set of rules.
Action Type          |  Real Data   | Action Graph |    Random    | Random + VM
                     |  %   avg len |  %   avg len |  %   avg len |  %   avg len
A - Talk right       | 18.4    57   | 11.7    65   | 17.5    18   | 10.6    19
B - Talk left        | 17.4    61   |  9.3    64   | 15.5    19   | 11.2    21
C - Look right       | 21.3    67   | 27.5    58   | 20.8    47   | 28.9    45
D - Look left        | 14.0    54   | 12.4    71   | 11.3    17   | 12.8    19
E - Look down        |  0.7    38   |  2.0    32   |  0.8    20   |  1.3    12
F - Look back        |  4.5    50   |  5.3    49   |  3.2    19   |  4.5    16
G - Point left       |  0.2    40   |  0.0    21   |  0.1    12   |  0.0    40
H - Point right      |  0.7    38   |  0.6    18   |  0.5    13   |  0.2    18
I - Talk on cellular | 14.0   212   | 13.4   182   | 21.6    35   | 21.6    32
J - Comb hair        |  7.0    78   |  9.7    73   |  6.9    27   |  6.9    16
K - Look at watch    |  1.8    43   |  8.1    44   |  1.7    16   |  2.1    16

Table 1: A comparison, against the input data, of the frequency and length of the behaviors selected by three different methods: our method, random selection and random selection filtered with the validity-maps.
Instead of manually defined rules, we used the validity-maps and filtered
out these actions (Column 5). Although this resolved one
problem, it did not resolve un-natural behaviors caused by
the length of the assigned actions. The average length of
an action produced by the action-graph is similar to the average length in the real-data. The same cannot be said for
both random assignments, where very short actions are frequently found. Again, one might argue that this can be easily
resolved by assigning a fixed or variable length to the chosen
action. However, in our method there is no need to determine
the length or the frequency of the actions in advance. Rather,
they are determined according to the stimuli surrounding the
agent during the simulation and their similarity to stimuli
which motivated the same actions in the real world.
To emphasize the last point, we applied our method using the same set of input examples, onto three very different
situations: a flocking, a rule-based and an example-based
simulation. The flock consists of a large group of agents
walking together. The rule-based simulation mostly generates individual trajectories, while the example-based simulation is a mix of both, as can be seen in the accompanying
video. The experiment showed that indeed our method accounts for the specific circumstances of each character. The
resulting fitted behaviors reflect the different natures of the
simulations. For instance, talking accounts for 52.8% of the
actions fitted to the flock, for 20.4% of the example-based
actions and only 3.5% of the rule-based ones. The average
length of a talking action is also significantly different. For
the flock and example-based simulations the average length
is approximately 64 frames while for the rule-based simulation it is only 31 frames long. Looking in any direction
is an action that is as likely to be performed when walking
alone, as when one is part of a group. However, talking on
the phone, looking down and looking at the watch are actions that were performed more by individuals in our input
data. These actions account for 23.6% of the total actions in
the rule-based simulation, but only 12.3% and 10.4%
of the example-based and flock actions. A random assignment has a specific frequency of actions assigned to it. A
rule-based assignment of behaviors would have to be modified and tweaked to fit the nature of the crowd. On the other
hand, our method is more general and fits behaviors based on
the surrounding stimuli and their similarity to the examples.
Our system provides the user with control over the frequency of behavior in two ways: (a) Scaling the cluster
weight of the no-action node, which affects the overall frequency of actions. By scaling the weight, either up or down,
the user changes the probability of performing no-action,
therefore, increasing or decreasing the frequency of the other
actions, see Table 2. (b) Scaling the weight of an arbitrary
node in the graph, which affects the frequency of a specific action. These controls can be applied globally to all the
agents or just to specific ones. For example, if some individual is known to frequently speak on the cellular phone,
then we can scale the corresponding weight for this specific
agent. Note that this control does not violate the validity of
the action selection algorithm. By scaling up the weight of
a certain action we reduce the importance of the stimuli. So,
conceivably, an improper behavior might be selected. However, the presence of the validity-map assures that hard constraints are always satisfied.

Table 2: The user can control the frequency of the actions by changing the weight of the no-action node. As the weight decreases, so does the probability of selecting to do no action, and thus other actions are chosen more often.

Action Type       | Weight of the no-action node
                  |    0     1    10    40   100
Talk right        |   60    46    68    38    12
Talk left         |   51    60    53    33     9
Look right        |  121   122    83    31    21
Look left         |   69    97    77    43    19
Look down         |   11    16    25     9     7
Look back         |   69    82    59    30     9
Point left        |    0     0     0     0     0
Point right       |    7     8     5     2     3
Talk on cellular  |  156    70    51    14    17
Comb hair         |   80    72    59    26    12
Look at watch     |   18    52    50    19    15
Total             |  642   625   530   245   124
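In terms of the earlier selection sketch, both controls amount to a per-agent multiplier applied to an action's summed similarity score before normalization (a minimal, hypothetical sketch):

```python
def scaled_score(action, raw_score, agent_scales, global_scales):
    """Scale an action's score before the probabilities are normalized.

    Scaling the 'no-action' score down makes every other action
    proportionally more likely, and vice versa (see Table 2).
    """
    s = global_scales.get(action, 1.0) * agent_scales.get(action, 1.0)
    return s * raw_score
```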
7. Conclusions
In this paper we presented a data-driven method for fitting
behaviors to a pedestrian crowd thus increasing its realism.
The example-based nature of the approach promotes behaviors that are closer to those observed in a real crowd. The
method runs in real-time for crowds of up to several dozen
agents in size. It can be used in real-time applications, such
as games, to enhance the ambient crowds, or in off-line productions to lighten the load of the animator. Additionally, the
output of the fitted behaviors can be redirected and used as
additional input for the crowd simulator, thus allowing it to
generate trajectories that consider the actions that the people
perform.
The system has its limitations. The circles used for defining the validity regions can cover a larger area than required. Therefore, in rare cases a person standing on the edge of a region might cause a less than natural behavior. However, this and the restriction which is applied during the regions' construction can be alleviated by using a different clustering technique. Another limitation, stemming mainly from the limited animation clips that we had at our disposal, is that actions are not directed at specific targets. For example, while talking, there are occasions when the person being talked to leaves and another passer-by takes his place.
The action is valid but unrealistic. This could possibly be
solved by having targeted animations.
Conceptually, an important contribution of this work is the
introduction of the stimuli and validity-maps. They provide
a detailed, non-linear method for assessing the importance
of each feature when evaluating a complex situation. This
information is derived directly from the example data, avoiding the need to define weights for all features/situations. We
believe that by studying stimuli and validity-maps one can
extract information regarding interactions among people in
a crowd that can be used to enhance rule-based systems or
for quantitative behavior analysis of people in a crowd.
References

[ALA*01] ASHIDA K., LEE S., ALLBECK J., SUN H., BADLER N., METAXAS D.: Pedestrians: creating agent behaviors through statistical analysis of observation data. The Fourteenth Conference on Computer Animation, Proceedings (2001), 84–92.

[FBT99] FARENC N., BOULIC R., THALMANN D.: An informed environment dedicated to the simulation of virtual humans in urban context. Computer Graphics Forum 18, 3 (Sept. 1999), 309–318.

[FTT99] FUNGE J., TU X., TERZOPOULOS D.: Cognitive modeling: Knowledge, reasoning and planning for intelligent characters. In SIGGRAPH 1999, Computer Graphics Proceedings (1999), pp. 29–38.

[KGP02] KOVAR L., GLEICHER M., PIGHIN F.: Motion graphs. Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (2002), 473–482.

[LCHL07] LEE K., CHOI M., HONG Q., LEE J.: Group behavior from video: a data-driven approach to crowd simulation. Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2007), 109–118.

[LCL07] LERNER A., CHRYSANTHOU Y., LISCHINSKI D.: Crowds by example. Computer Graphics Forum 26, 3 (2007), 655–664.

[MH03] METOYER R. A., HODGINS J. K.: Reactive pedestrian path following from examples. In Proceedings of the 16th International Conference on Computer Animation and Social Agents (2003).

[MJBJ06] MUSSE S. R., JUNG C. R., BRAUN A., JUNIOR J. J.: Simulating the motion of virtual agents based on examples. In ACM/EG Symposium on Computer Animation, Short Papers (Vienna, Austria, 2006).

[Rey87] REYNOLDS C. W.: Flocks, herds, and schools: A distributed behavioral model. Computer Graphics 21, 4 (1987), 25–34.

[SGC04] SUNG M., GLEICHER M., CHENNEY S.: Scalable behaviors for crowd simulation. Computer Graphics Forum 23, 3 (2004), 519–528.

[TTG94] TERZOPOULOS D., TU X., GRZESZCZUK R.: Artificial fishes: autonomous locomotion, perception, behavior, and learning in a simulated physical world. Artificial Life 1, 4 (1994), 327–351.

[YT07] YU Q., TERZOPOULOS D.: A decision network framework for the behavioral animation of virtual humans. Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2007), 119–128.