Eurographics/ACM SIGGRAPH Symposium on Computer Animation (2009)
E. Grinspun and J. Hodgins (Editors)

Fitting Behaviors to Pedestrian Simulations

Alon Lerner1, Eitan Fitusi1, Yiorgos Chrysanthou2 and Daniel Cohen-Or1
1 School of Computer Science, Tel Aviv University, Israel
2 Computer Science Department, University of Cyprus

Abstract
In this paper we present a data-driven approach for fitting behaviors to simulated pedestrian crowds. Our method annotates agent trajectories, generated by any crowd simulator, with action-tags. The aggregate effect of animating the agents according to the tagged trajectories enhances the impression that the agents are interacting with one another and with the environment. In a preprocessing stage, the stimuli which motivated a person to perform an action, as observed in a crowd video, are encoded into examples. Using the examples, non-linear, action-specific influence functions are encoded into two-dimensional maps which evaluate, for each action, the relative importance of a stimulus within a configuration. At run time, given an agent's stimuli configuration, the importance of each stimulus is determined and compared to the examples. Thus, the probability of performing each action is approximated and an action-tag is chosen accordingly. We fit behaviors to pedestrian crowds, thereby enhancing their natural appearance.

Categories and Subject Descriptors (according to ACM CCS): Computer Graphics [I.3.7]: Three-Dimensional Graphics and Realism—Animation

1. Introduction
In recent years, the quality of computer generated crowds has risen to a degree where they are commonly used in virtual environment applications. Hordes of fighting soldiers, people fleeing a monster or cheering their favorite team can be found in many films and games. However, pedestrian crowds are not as common.
Although the quality of animation, rendering and simulation has improved tremendously, the generation of a believable group of people occupying a city street remains a challenge. Since we are used to seeing such crowds in our day-to-day lives, any peculiar behavior decreases the realism of the entire crowd. Usually, crowd simulation techniques focus on generating realistic crowds at the trajectory level. They direct people along believable, collision-free paths. However, people do more than just walk. They talk to one another, they look around, they scratch their heads or perform various other actions, such as the ones seen in Figure 1. The absence of these mundane actions diminishes the credibility of the simulated crowd. However, should the agents perform such actions at inappropriate times, it may seem odd. In this work we show how the observed conditions that motivate secondary actions can be learned from captured data and assigned to simulated agents. We fit behaviors to agents one action at a time and animate them accordingly. The aggregate effect of the animated actions enhances the impression that the agents are interacting with one another and with the environment.

© The Eurographics Association 2009.

Behavior can be defined as a person's aggregate actions and reactions to internal and external stimuli. The internal stimulus emanates, among other things, from one's personality and state of mind, while the external stimuli come from the surrounding people and objects. Cognitive modeling methods can take into account both internal and external stimuli and have been shown to perform well on individual agents. However, these methods are quite complex and computationally intensive, and therefore not suitable for fitting behaviors to a crowd. When looking at a real pedestrian crowd, all that we know about the people we see is derived from our observations during the few moments they are in our view.
We may guess their internal stimuli; however, the interactions between the people, their movements within the environment and the consistency of their actions are the factors by which we judge them. Therefore, in principle, a system based only on the observed stimuli of an agent should be able to recreate a seemingly realistic crowd. Although reactive rule-based systems have been proposed for adding agent reactions to environmental stimuli, the interactions within a crowd are so rich and varied that great skill and effort are required to define and tune a set of rules that will faithfully capture them.

Actions and trajectories have a mutual effect on each other. Some actions are more likely to occur along certain trajectories, while some trajectories are more likely to be taken when certain actions are performed. Despite the link between the two, numerous methods exist for simulating convincing trajectories without taking actions into account. Our technique enhances the realism of crowds simulated by these methods. Since the trajectories are simulated separately and the behaviors are fitted to them, the assigned actions are affected by the trajectories; however, they do not affect them.

Our technique is data-driven. From a video of a crowd, example configurations of observed stimuli and the actions motivated by them are defined. These examples form a training set according to which we estimate the likelihood of performing each action given a stimuli configuration. We store the examples on the edges of a graph, in which nodes represent actions and directed edges transitions between actions. At run time, the stimuli configuration of a simulated agent is compared to example configurations from the training set, thus approximating the probability of performing an action. The similarity between the configurations is computed and validated using stimuli and validity-maps. These maps, along with the global distribution of examples, capture the essence of crowd behavior. Agents are individually assigned action-tags, yet the fitted behaviors admit both local and global characteristics. Therefore, each agent seems to act naturally, while at the same time the crowd as a whole presents consistent pedestrian behaviors.

During a simulation each agent is asked three questions: When should it perform an action? Which action should it perform? And how long should the action last? Our method answers these questions based on the similarity between the stimuli surrounding the agent and stimuli which motivated an action in the real world. We employ our method to annotate the trajectories of agents from various sources: real data, an example-based simulation, a rule-based simulation and a flocking simulation. In all cases the crowds behave in a believable manner, thereby enhancing their natural appearance.

Figure 1: Assigning behaviors to simulated crowds enhances the impression that the agents are interacting with each other and the environment.

2. Related Work
Crowd simulation research is an active theme in a number of fields, such as computer graphics, sociology and robotics. There are several approaches for simulating crowds. Some derive ideas from fluid mechanics while others use particles or social forces. However, the most popular approach is the rule-based approach. In recent years, data-driven methods, which are common in many fields, have been applied to crowd simulation. For example, Metoyer and Hodgins [MH03] allow the user to define specific examples of behaviors, while Musse et al. [MJBJ06] define the navigational model based on observed trajectories. In recent works [LCHL07, LCL07], trajectories learned from videos of crowds are stored along with some representation of the stimuli that affected them.
During a simulation, agents match their stimuli to the ones stored in the database, and navigate accordingly. Most of the crowd simulation literature focuses on the navigational aspect. Our work focuses on the behavioral aspect and assumes that some system exists to provide us with agent trajectories.

Figure 2: The trajectories of people in a video are annotated with action-tags (letters represent actions). Based on the annotations, examples of stimuli configurations that motivated an action are defined and used to construct, for each action, stimuli and validity-maps.

A number of works look beyond the navigational aspect. A cognitive decision making mechanism was defined by Terzopoulos et al. [TTG94], where a range of individual traits, such as hunger and fear, stimulate appropriate behaviors in simulated fish. Funge et al. [FTT99] simulated agents that perceive the environment and learn from it which is the most suitable behavior to choose out of a predefined set. These methods are rather inefficient and therefore not suited for crowds. Furthermore, when striving to generate realistic behaviors, they are, like most rule-based systems, complicated to define. In some works, such as that of Farenc et al. [FBT99], information stored within the environment triggers agents to perform various actions. One can regard the triggers as stimuli; however, they are prescribed and associated with specific objects. In our approach the stimuli are not attached to specific objects. In the film industry crowds are simulated using proprietary systems, such as Massive Software™. These are rule-based systems which also have rules for performing actions that mimic interactions between agents. Some works do not define rules explicitly. Sung et al.
[SGC04] represent a set of behaviors as a finite state machine with probabilities associated with the edges. The probabilities are updated in real time based on user-defined behavior functions. Yu et al. [YT07] define a decision network framework for generating behaviors which can account for uncertainties in behavior, and affect both trajectories and actions. Some methods are data-driven. Subconscious upper body actions are modelled by Ashida et al. [ALA∗01], based on a statistical distribution extracted from a video. This model is used in conjunction with a representation of an agent's internal stimulus. Lee et al. [LCHL07] simulate believable crowd behaviors by learning navigation and interaction behavior models from a video. At any given moment an agent either navigates through the environment or interacts with other agents; however, it cannot do both. In our work the agents can walk and talk at the same time.

Data-driven approaches are used extensively in character animation, where characters are animated using motion captured data. To facilitate the composition of long animated sequences from a set of short clips, motion graph approaches, such as [KGP02], are used. A motion graph is a finite state machine where each node represents a set of animation clips and edges smooth transitions between them. It is a convenient method for automatically animating characters, and is often used to animate crowds. The graph representation of transitions between actions which appears in this paper is a means for assigning action-tags regardless of the method used to animate them.

3. Overview
The focus of this work is a scheme for fitting behaviors to simulated pedestrian crowds, based on real world examples. We assume that two separate dedicated systems exist and run in parallel to ours, one for simulating crowd trajectories and another for animating them. Our method runs in two stages.
In a preprocessing stage, annotated trajectories, extracted from a video of a real crowd, are analyzed, and examples of observed stimuli configurations which motivated a person to perform an action are defined (Figure 2). At run time, an agent approximates the probability of performing different actions and stochastically selects one. The probability of each action is approximated according to the similarity between the agent's stimuli configuration and the stimuli stored in the examples (Figure 3).

Figure 3: For each simulated agent a query stimuli configuration is defined. Using the maps, examples and a similarity function, the probability of performing each action is approximated and an action is chosen accordingly.

Preprocessing: The input for the method is a set of trajectories extracted from a video [LCHL07, LCL07, MJBJ06]. We manually annotate the trajectories with action-tags that represent the actions that the people performed. Objects of certain interest are considered as stimuli as well and are annotated with the no-action tag.

Run-time: During a simulation each agent decides which action it should perform. From its current graph node, neighboring nodes represent potential actions. The validity of an action is determined by testing the agent's stimuli against the action's validity-maps. For each valid action, the most similar examples are collected using the stimuli-maps and a similarity function. The probability of an action is determined according to the number of examples collected and their degree of similarity to the agent's stimuli. An action is chosen accordingly and the associated action-tag is assigned, see Figure 3.

4. Preprocessing
The stimuli surrounding a person at a certain time motivate an action to be performed shortly after, or more precisely, a transition between actions to occur. From the annotated trajectories we define examples of such stimuli configurations, see Figure 4. Based on these examples, stimuli and validity-maps are constructed for each action separately. Depending on the action, some stimuli might be more important than others. This importance is captured by the stimuli-maps, which are density-based influence functions (Figure 5). The validity-maps impose constraints over the stimuli required for performing an action, as observed in the input video. For instance, a validity-map would assure the presence of a person to the left of the subject person when performing the talk-left action, as shown in Figure 6. The aforementioned information is encoded into an action-graph. In the graph, nodes represent actions and directed edges observed transitions between actions. The examples are stored on the corresponding edges, and the maps on the appropriate nodes.

Figure 4: An example representing the stimuli that motivated subject person p to stop performing action 'B' and start action 'D'. It consists of the subject person, the surrounding people (e_i) and objects, and their annotated trajectories over the past several frames.

4.1. Examples
An example E represents the configuration of observed stimuli that motivated an action, A_k, to be performed shortly after. Therefore, it accounts for a transition from the current action to action A_k. An example is defined with reference to a person, p, denoted the subject person, in the video at frame t. The transition is represented by a pair of actions (A_j, A_k), where A_j is the action p performed at frame t and A_k the action at frame (t + ∆). Note that action A_k can be no-action or the same action as A_j. The example stores the observed configuration of stimuli surrounding the subject person, which consists of both internal and external stimuli. We assume that the internal stimulus can be inferred from p's recent annotated trajectory, and consider the people and objects that fall within a region surrounding p as external stimuli.
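An example record of this kind can be sketched as a small data structure. The following Python sketch is purely illustrative (the paper's implementation is in C#), and all names are our own assumptions:

```python
from dataclasses import dataclass, field

# A stimulus stores, per frame of the observation window, a position in the
# subject person's local coordinate system and the action-tag annotated there.
@dataclass
class Stimulus:
    positions: list    # [(x, y), ...] over the recent frames
    action_tags: list  # one action-tag per frame ("no-action" for objects)

# One example: the stimuli configuration observed around subject person p at
# frame t, motivating the transition from action a_j (performed at frame t)
# to action a_k (performed at frame t + DELTA). a_k may equal a_j or be
# "no-action".
@dataclass
class Example:
    a_j: str
    a_k: str
    subject: Stimulus  # internal stimulus: p's own recent annotated trajectory
    external: list = field(default_factory=list)  # people and objects near p
```

In a full system, such records would be attached to the action-graph edge for the transition (a_j, a_k).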
For each stimulus, internal or external, the example stores its annotated trajectory over the frames [t − δ, t], in the local coordinate system of the subject person, as seen in Figure 4.

Figure 5: Stimuli-maps are density-based influence functions surrounding the subject person (red arrow). Areas of high influence are marked in red and of low influence in white.

Examples are generated from every frame along the trajectory of each person that appears in the input video. The examples are stored such that the subject person's local coordinate system is aligned with a global coordinate system. In our experiments the constants ∆ and δ were set to 25 frames (1 second) and 12 frames respectively.

4.2. Stimuli-Map
During a simulation we evaluate the similarity between a stimuli configuration of a simulated agent and the configuration stored in an example. In order to do that we need to find the relative importance of each stimulus in the configuration. A stimulus's importance depends not only on the other stimuli in the configuration, but also on the action being evaluated. This importance is captured by the stimuli-maps.

A stimuli-map acts as an influence function for a given action. In many works, influence is merely distance-based: the closer the stimulus is to the subject person, the more influence it has on his actions. Stimuli-maps allow the influence to be arbitrarily involved, as seen in Figure 5. For example, talking to someone on your left side should be influenced more by the presence of a person on your left than by the proximity of the people to your right. A stimuli-map is a two-dimensional regular subdivision of the region of influence. It is constructed for action A_k according to the examples which motivate it, i.e. examples representing transitions from any action A_j to action A_k.
From each one of these examples, the last frame of the stimuli configuration (the configuration at frame t) is overlayed on the map, and each map cell accumulates the stimuli that fall within it. A Gaussian filter is applied so that each stimulus contributes not only to its own cell, but also to the surrounding ones.

Given an axis-aligned configuration of stimuli, Q, and a potential action, A_k, the amount of influence, w_i, associated with an external stimulus, e_i, is determined by overlaying Q on top of A_k's stimuli-map. The influence, w_i, equals the value stored in the cell to which the stimulus e_i belongs. The internal stimulus, represented by the subject person's annotated trajectory, is assigned the value w_p, which equals the average influence value of the external stimuli. The influence values are then normalized such that w_p + \sum_i w_i = 1.

Figure 6: Validity-maps represent regions where a stimulus must be present prior to performing the corresponding action. Crowds of different natures might produce different regions for the same actions.

4.3. Validity-Map
Some actions require the presence of appropriate stimuli in order to be performed. The validity-map of action A_k is used to confirm that the stimuli required to perform the action exist in a given configuration. To keep the method general, applicable to any type of behavior, we deduce these requirements directly from the examples. If the vast majority of examples leading up to action A_k have at least one stimulus within some region, we conclude that the action can be performed only if there exists at least one stimulus within that region.

As for the stimuli-maps, a validity-map for action A_k is constructed by overlaying on top of the map the examples representing transitions from any action A_j to action A_k. However, in this construction the cells accumulate example id's. After the relevant examples have been processed, a circular region is grown from each cell of the map until it encapsulates most of the example id's. In our experiments, a 95% threshold was used. The regions with the smallest radii define the required regions and the rest are discarded. An additional constraint is imposed over the regions, which is that the subject person cannot be included within them. The reason for this is that actions which have such requirements tend to be directional. Since circular regions are grown, there are instances where several overlapping circles of similar radii cover the required region and some of them spill over to the opposite side of the subject person, even though the required stimuli are concentrated only on one side of it.

The validity-maps shown in Figure 6 were created from two different input videos. The first is a video of a dense crowd at a student fair and the second of a sparse crowd walking in front of a department store. The differences between the required regions are a direct result of the number of times each action was performed and the natures of the crowds. The point-to actions appear only a limited number of times in the sparse video, as seen in Table 1, which results in a hard constraint that can, probably, be relaxed by using additional input data.

4.4. Action-Graph
The action-graph is a probabilistic finite automaton that provides a convenient means for fitting action-tags to simulated agents. In the graph an action is represented by a node which stores the action's stimuli and validity-maps. A transition between actions is represented by a directed edge, to which the examples featuring the transition are assigned, see Figure 7. During a simulation, each agent traverses the graph from one node to another, assigning the corresponding action-tags to its trajectory. The next step in the traversal is chosen according to an approximated probability function over the actions represented by the neighboring graph nodes.

Figure 7: A sample from the action-graph. A node represents an action and stores the corresponding stimuli and validity-maps. A directed edge represents a transition between actions. It stores examples leading from the action of its source node to the action of its destination node.

5. Fitting Behaviors

5.1. Run-Time
At run time, our system runs in parallel to a crowd simulator, whose output trajectories are redirected as input for our system. For each simulated agent that requires a new action-tag, we find its stimuli configuration, Q, and current action, A_j, in the same manner as for an example in the video. Our goal is to estimate the likelihood of a real person performing each potential action A_k given the configuration Q. In terms of the action-graph, A_k is a potential action if a directed edge exists between the node of action A_j and that of action A_k, and Q passes A_k's validity test. To pass the test, Q is checked against A_k's validity-map. If it does not have the necessary stimuli for performing the action, then the action receives a zero probability. If Q passes the validity test, then the probability depends on the similarity between Q and the example configurations representing this transition. These examples, which are stored on the directed edge between A_j's node and A_k's node, are searched for the ones most similar to Q, and their similarity values are summed up. The similarity values for the different actions are normalized, and thus the probability of choosing an edge and performing the corresponding action is approximated.

An agent performs the same action until it traverses to a different node in the graph. Since the environment is constantly changing there is a need to periodically validate the assigned actions.
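The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' C# implementation; the validity test and similarity function are passed in as callables, and every identifier here is our own assumption:

```python
import random

def choose_action(query, current_action, graph, is_valid, similarity, k=10):
    """Pick the next action for one agent.

    graph: dict mapping (a_j, a_k) -> list of stored examples.
    is_valid(query, a_k): validity-map test for action a_k.
    similarity(query, example): similarity value in [0, 1].
    k: number of most-similar examples whose scores are summed (assumed).
    """
    scores = {}
    for (a_j, a_k), examples in graph.items():
        if a_j != current_action:
            continue                    # only edges leaving the current node
        if not is_valid(query, a_k):
            continue                    # fails validity test -> probability 0
        sims = sorted((similarity(query, e) for e in examples), reverse=True)
        scores[a_k] = sum(sims[:k])     # sum over the most similar examples
    total = sum(scores.values())
    if total == 0:
        return current_action           # nothing similar enough: keep going
    actions = list(scores)
    # Normalized scores approximate the transition probabilities.
    weights = [scores[a] / total for a in actions]
    return random.choices(actions, weights=weights)[0]
```

A self-referencing edge (a_j, a_j) naturally lets the agent continue its current action, matching the graph traversal described in the text.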
An action is allowed to continue as long as it passes the validation test and an example similar to the agent's current stimuli configuration exists on the self-referencing edge of the action's graph node. The reasoning behind this is that although the configuration changes over time, a real person that performed the action under similar circumstances can still be found in the input video. Note that there is a natural minimal length for an action, defined by the shortest animation cycle for it.

5.2. Similarity Function
The similarity function, Sim(Q, E), quantifies the similarity between two stimuli configurations: a query configuration, Q, originating from a simulated agent, and an example configuration, E, leading to action A_k. Generally, Q will not exactly match any one of the examples in the training set; there are always going to be unmatched or displaced stimuli. Therefore, the similarity between Q and E is a weighted sum of the similarities that do exist. Each stimulus q_i ∈ Q is assigned a weight, w_i, according to the stimuli-map of action A_k. The stimulus q_i is matched to the stimulus e_j ∈ E most similar to it according to the similarity function S(q_i, e_j):

Sim(Q, E) = w_p S(q_p, e_p) + \sum_{q_i \in Q} w_i \max_{e_j \in E} S(q_i, e_j)

where q_p and e_p are the subject people of the configurations, their positions and actions over the previous δ frames representing the internal stimuli, and w_p is the weight assigned to q_p as described in Section 4.2.

The function S(q_i, e_j) computes the similarity between two stimuli by taking into account the difference in position, dP(q_i, e_j), and action, dB(q_i, e_j), along their trajectories:

S(q_i, e_j) = \sum_t c_t S_t(q_i, e_j)

S_t(q_i, e_j) = 1 − α dP(q_i, e_j) − β dB(q_i, e_j)

where c_t is the relative weight of time t along the trajectory and \sum_t c_t = 1. α and β are predefined constant weights whose sum equals 1; in our experiments they were equal to 2/3 and 1/3 respectively. The difference between actions, dB(q_i, e_j), is the topological distance in the graph between the action-tags of q_i and e_j at time t, divided by the maximal topological distance in the graph. The difference between positions, dP(q_i, e_j), is computed using the Euclidean distance, dist(q_i, e_j), between the positions of the stimuli at time t. If the distance is over a user-defined upper bound, r_max, then the difference is 1. If it is under the lower bound, r_min, then the difference is 0. Anywhere in between, it equals the squared ratio between the difference of dist(q_i, e_j) and r_min and the difference of r_max and r_min:

dP(q_i, e_j) =
  0                                                 if dist(q_i, e_j) < r_min
  ((dist(q_i, e_j) − r_min) / (r_max − r_min))^2    if r_min < dist(q_i, e_j) < r_max
  1                                                 if r_max < dist(q_i, e_j)

The r_min lower bound is proportional to the distance of q_i to the subject person q_s at time t.

5.3. Points of Influence
A person interacts with the people and objects in his surrounding environment. In the input video we mark objects that attract people's attention, such as a dummy in a shop window or a garbage can, as Points of Influence, or POI's for short. A POI is an external stimulus whose position is considered as a trajectory annotated with a no-action tag. During the matching process people should not be matched to POI's, otherwise unwanted behaviors, such as people talking to inanimate objects, might occur. To accommodate the separability between people and POI's, for each action two stimuli and validity-maps are defined: one for the people and another for the POI's. During the matching process, the appropriate map is used depending on the type of stimulus. Note that the actions motivated by a POI are not limited to looking at it.

6. Results and Analysis
The method was implemented in C# and the results measured on an AMD Athlon 64 X2 5200+ Dual Core Processor with 2 GB of RAM. The simulated crowds that appear in the accompanying video were animated automatically using a motion-graph.

6.1. Data Preparation
The results presented here are based on only one of the analyzed videos. The specific video was five minutes long and captured unaware pedestrians walking in front of a department store. The maximal number of pedestrians per frame is 18, with an average of 5-6 people per frame. They either walk on their own or in small groups of 2-4 people. We employed a user-friendly interface in order to rapidly annotate their trajectories. Additionally, four dummies in a shop window, which attracted people's attention, were marked as points of influence.

For defining the examples, an oval region of influence contained in a rectangle of size 200 × 400 pixels was used. This translated roughly into a 3.5 by 7 meter region surrounding the subject person. Twelve actions were used: the eleven presented in Table 1 and an additional action representing no action. The method can easily accommodate a wider selection of actions with no significant degradation in performance. The number of annotated actions in the input video is 253, which results in 49,269 examples. An action-graph of 12 nodes and 63 edges was constructed. For each node a stimuli-map was created. Validity-maps were created for six of the twelve actions, which were found to have distinct validity regions, see Figure 6 (bottom). All maps have a resolution of 200 × 400 pixels. A square 9 × 9 Gaussian filter was used in their creation. The memory required to store the complete data-structure is about 80 MB.

Generally speaking, the number of potential stimuli configurations motivating the performance of an action is immense, and therefore the amount of input data used has a direct influence on the quality of the probability approximation.
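The similarity measure of Section 5.2, which drives the run-time matching between query configurations and examples, can be sketched as below. This is a simplified illustration under our own assumptions: trajectories are plain lists of (x, y) positions, the time weights c_t are taken as uniform, and the per-frame action differences dB are passed in precomputed rather than derived from the action-graph:

```python
ALPHA, BETA = 2 / 3, 1 / 3  # position vs. action weights; sum to 1 (Sec. 5.2)

def dP(dist, r_min, r_max):
    # Piecewise position difference: 0 below r_min, 1 above r_max,
    # squared interpolation in between.
    if dist < r_min:
        return 0.0
    if dist > r_max:
        return 1.0
    return ((dist - r_min) / (r_max - r_min)) ** 2

def S(qi, ej, dB_frames, r_min, r_max):
    """Per-stimulus similarity: time-weighted sum of S_t = 1 - a*dP - b*dB.

    qi, ej: equal-length trajectories, lists of (x, y) positions.
    dB_frames: per-frame action differences in [0, 1] (in the paper, the
    normalized topological distance between action-tags in the graph).
    """
    T = len(qi)
    total = 0.0
    for (xq, yq), (xe, ye), db in zip(qi, ej, dB_frames):
        dist = ((xq - xe) ** 2 + (yq - ye) ** 2) ** 0.5
        total += (1.0 - ALPHA * dP(dist, r_min, r_max) - BETA * db) / T
    return total

def Sim(Q, E, weights, w_p, dB_frames, r_min, r_max):
    # Weighted sum: subject-person term plus, for every query stimulus,
    # its best match among the example's stimuli.
    score = w_p * S(Q["subject"], E["subject"], dB_frames, r_min, r_max)
    for qi, wi in zip(Q["stimuli"], weights):
        score += wi * max(S(qi, ej, dB_frames, r_min, r_max)
                          for ej in E["stimuli"])
    return score
```

Because the weights are normalized so that w_p plus the stimuli-map weights sum to one, a query that exactly matches an example scores 1.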
However, even a short video of a typical crowd shows enough variety of actions and configurations that the overall behavior seen in the video can be captured and fitted to a simulated crowd.

6.2. Run-time Performance
Here we measure the cost of fitting behaviors to simulated agents. The run-time performance of the method depends mostly on the number of examples tested for each agent. To improve the performance of the method and reduce the memory requirements we performed two optimizations. First, the number of examples was reduced by clustering similar examples on each edge. For each cluster only a single representative is kept, and it is assigned a weight value equal to the size of the cluster. Clustering was done using the similarity function defined in Section 5.2. The second optimization involves the no-action node. Over two thirds of the examples lead to this node. Even after clustering, the number of remaining representatives is large. We found that for most query stimuli configurations a fairly similar one exists in the no-action example set. We therefore make the assumption that no-action can always be performed, create a single cluster for all the examples leading to the no-action node, and give them a single constant weight.

The main expense of the method lies in the comparison between configurations. The number of comparisons depends on the number of examples, and the cost of a single comparison depends on the density of the stimuli (people per square meter). To test the scalability of the method we ran several tests. We assigned behaviors to the same 3000-frame long (2 minutes) simulation three times. The number of examples according to which the action-graph was constructed varied between roughly 5000, 25000 and 50000 examples. The number of clusters created was 1855, 7439 and 13035 respectively.
The number of clusters increases at a slower rate than the number of examples, since new examples can be added to existing clusters. The time required to assign the behaviors was 35 sec, 65 sec and 78 sec respectively. Obviously, there is an increase, although sub-linear, in the required computational time.

Next, we varied the density of the crowd in a small region. We constructed a single action-graph and used it to assign behaviors to four 3000-frame long (2 minutes) simulations. The simulations varied in the number of people, and had 6, 11, 20 and over 40 simulated people. The required computational times were 18 sec, 35 sec, 78 sec and 226 sec. A clear dependence between the computational cost and the density can be seen. However, this dependence has an upper bound, as there is a limit to the number of people that fit in a square meter. In our opinion, the simulation which has over 40 people in a small region is close to the upper limit for a pedestrian crowd. If real-time performance is required for a dense crowd, then an agent's region of influence can be shrunk. This will reduce the number of influencing stimuli and the per-example comparison cost.

Figure 8: Different types of simulated crowds. The rule-based crowd contains only individuals, the flock moves as a group and the example-based crowd is a mix of both. Even though the same set of examples and action-graph were used, the fitted behaviors match the nature of the simulated agents.

6.3. Evaluation of Fitted Behaviors
To show the generality of the method we present results for four different types of crowd trajectories:

• Real captured data.
• A flocking simulation [Rey87], Figure 8.
• A rule-based simulation, Figure 8.
• An example-based simulation [LCL07], Figure 8.

Two different experiments were run on real data. We checked the frequency of the fitted actions and their average length, see Table 1.
The distribution of actions produced by the action-graph method (Column 3) corresponds quite closely to the real data (Column 2). Our method is stochastic in nature, and therefore cannot be expected to reproduce the exact same frequencies. A random assignment of actions, based on their frequencies in the input data, yields, as expected, a distribution similar to the input data (Column 4). However, there are several significant problems with random selection. First, the assigned actions do not look natural; people talking to thin air are common in these simulations. One might argue that these behaviors can be easily eliminated using a simple set of rules. Instead of manually defined rules, we used the validity-maps and filtered out these actions (Column 5). Although this resolved one problem, it did not resolve the unnatural behaviors caused by the length of the assigned actions. The average length of an action produced by the action-graph is similar to the average length in the real data.

                        Real Data     Action Graph  Random        Random + VM
Action Type             %   avg len   %   avg len   %   avg len   %   avg len
A - Talk right        18.4    57    11.7    65    17.5    18    10.6    19
B - Talk left         17.4    61     9.3    64    15.5    19    11.2    21
C - Look right        21.3    67    27.5    58    20.8    47    28.9    45
D - Look left         14.0    54    12.4    71    11.3    17    12.8    19
E - Look down          0.7    38     2.0    32     0.8    20     1.3    12
F - Look back          4.5    50     5.3    49     3.2    19     4.5    16
G - Point left         0.2    40     0.0    21     0.1    12     0.0    40
H - Point right        0.7    38     0.6    18     0.5    13     0.2    18
I - Talk on cellular  14.0   212    13.4   182    21.6    35    21.6    32
J - Comb hair          7.0    78     9.7    73     6.9    27     6.9    16
K - Look at watch      1.8    43     8.1    44     1.7    16     2.1    16

Table 1: A comparison, against the input data, of the frequency and length of the behaviors selected by three different methods: our method, random selection and random selection filtered with the validity-maps.
The same cannot be said for either random assignment, where very short actions are frequently found. Again, one might argue that this can be easily resolved by assigning a fixed or variable length to the chosen action. However, in our method there is no need to determine the length or the frequency of the actions in advance. Rather, they are determined by the stimuli surrounding the agent during the simulation and their similarity to stimuli which motivated the same actions in the real world.

To emphasize the last point, we applied our method, using the same set of input examples, to three very different situations: a flocking, a rule-based and an example-based simulation. The flock consists of a large group of agents walking together. The rule-based simulation mostly generates individual trajectories, while the example-based simulation is a mix of both, as can be seen in the accompanying video. The experiment showed that our method indeed accounts for the specific circumstances of each character, and the resulting fitted behaviors reflect the different natures of the simulations. For instance, talking accounts for 52.8% of the actions fitted to the flock, for 20.4% of the example-based actions and for only 3.5% of the rule-based ones. The average length of a talking action is also significantly different: for the flock and example-based simulations the average length is approximately 64 frames, while for the rule-based simulation it is only 31 frames. Looking in any direction is an action that is as likely to be performed when walking alone as when one is part of a group. However, talking on the phone, looking down and looking at the watch are actions that were performed more by individuals in our input data. These actions account for 23.6% of the total actions in the rule-based simulation, compared to only 12.3% and 10.4% of the example-based and flock actions.
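The selection mechanism that produces these distributions can be sketched as similarity-weighted sampling over the clustered examples. This is a hedged illustration, not the paper's implementation: the representative tuple format `(config, action, weight)`, the function names, and the `no_action_weight` parameter (standing in for the constant weight of the no-action cluster) are all assumptions, and a real implementation would also consult the validity-maps before sampling.

```python
import random


def choose_action(query, representatives, sim, no_action_weight=1.0,
                  rng=random):
    """Sketch: score each example cluster by its similarity to the agent's
    current stimuli configuration, then sample an action with probability
    proportional to (cluster weight * similarity). The no-action
    alternative is always available with a constant, user-scalable weight;
    scaling it up or down shifts the overall frequency of all actions."""
    actions = ["no-action"]
    scores = [no_action_weight]
    for config, action, weight in representatives:
        actions.append(action)
        scores.append(weight * sim(query, config))
    total = sum(scores)
    if total <= 0:
        return "no-action"
    # Roulette-wheel sampling proportional to score.
    r = rng.random() * total
    cum = 0.0
    for action, score in zip(actions, scores):
        cum += score
        if r < cum:
            return action
    return actions[-1]
```

For instance, with a single representative `((0.0,), "talk", 2.0)`: if the similarity to the query is zero everywhere, only no-action can be chosen; if the similarity is high and the no-action weight is scaled to zero, "talk" is always chosen. Scaling a specific action's cluster weight changes that action's frequency in the same way.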
A random assignment has a fixed frequency of actions, and a rule-based assignment of behaviors would have to be modified and tweaked to fit the nature of each crowd. Our method, on the other hand, is more general, fitting behaviors based on the surrounding stimuli and their similarity to the examples.

Our system provides the user with control over the frequency of behavior in two ways: (a) scaling the cluster weight of the no-action node, which affects the overall frequency of actions. By scaling the weight, either up or down, the user changes the probability of performing no-action, thereby increasing or decreasing the frequency of the other actions, see Table 2. (b) Scaling the weight of an arbitrary node in the graph, which affects the frequency of a specific action. These controls can be applied globally to all the agents or just to specific ones. For example, if some individual is known to frequently speak on the cellular phone, we can scale the corresponding weight for this specific agent. Note that this control does not violate the validity of the action selection algorithm. By scaling up the weight of a certain action we reduce the importance of the stimuli, so, conceivably, an improper behavior might be selected. However, the validity-map assures that hard constraints are always satisfied.

7. Conclusions

In this paper we presented a data-driven method for fitting behaviors to a pedestrian crowd, thus increasing its realism. The example-based nature of the approach promotes behaviors that are closer to those observed in a real crowd. The method runs in real-time for crowds of up to several dozen agents. It can be used in real-time applications, such as games, to enhance ambient crowds, or in off-line productions to lighten the load on the animator. Additionally, the output of the fitted behaviors can be redirected and used as
additional input for the crowd simulator, thus allowing it to generate trajectories that consider the actions that the people perform.

                      Weight of the no-action node
Action Type             0     1    10    40   100
Talk right             60    46    68    38    12
Talk left              51    60    53    33     9
Look right            121   122    83    31    21
Look left              69    97    77    43    19
Look down              11    16    25     9     7
Look back              69    82    59    30     9
Point left              0     0     0     0     0
Point right             7     8     5     2     3
Talk on cellular      156    70    51    14    17
Comb hair              80    72    59    26    12
Look at watch          18    52    50    19    15
Total                 642   625   530   245   124

Table 2: The user can control the frequency of the actions by changing the weight of the no-action node. As the weight decreases, so does the probability of selecting no-action, and thus other actions are chosen more often.

The system has its limitations. The circles used for defining the validity regions can cover a larger area than required. Therefore, in rare cases a person standing on the edge of a region might cause a less than natural behavior. However, this, and the restriction applied during the construction of the regions, can be alleviated by using a different clustering technique. Another limitation, stemming mainly from the limited animation clips that we had at our disposal, is that actions are not directed at specific targets. For example, while talking, there are occasions when the person being talked to leaves and another passer-by takes his place. The action is valid but unrealistic. This could possibly be solved by having targeted animations.

Conceptually, an important contribution of this work is the introduction of the stimuli and validity-maps. They provide a detailed, non-linear method for assessing the importance of each feature when evaluating a complex situation. This information is derived directly from the example data, avoiding the need to define weights for all features and situations. We believe that by studying stimuli and validity-maps one can extract information regarding interactions among people in a crowd that can be used to enhance rule-based systems or for quantitative behavior analysis of people in a crowd.

References

[ALA∗01] ASHIDA K., LEE S., ALLBECK J., SUN H., BADLER N., METAXAS D.: Pedestrians: creating agent behaviors through statistical analysis of observation data. In The Fourteenth Conference on Computer Animation, Proceedings (2001), pp. 84–92.
[FBT99] FARENC N., BOULIC R., THALMANN D.: An informed environment dedicated to the simulation of virtual humans in urban context. Computer Graphics Forum 18, 3 (Sept. 1999), 309–318.
[FTT99] FUNGE J., TU X., TERZOPOULOS D.: Cognitive modeling: Knowledge, reasoning and planning for intelligent characters. In SIGGRAPH 1999, Computer Graphics Proceedings (1999), pp. 29–38.
[KGP02] KOVAR L., GLEICHER M., PIGHIN F.: Motion graphs. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (2002), pp. 473–482.
[LCHL07] LEE K., CHOI M., HONG Q., LEE J.: Group behavior from video: a data-driven approach to crowd simulation. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2007), pp. 109–118.
[LCL07] LERNER A., CHRYSANTHOU Y., LISCHINSKI D.: Crowds by example. Computer Graphics Forum 26, 3 (2007), 655–664.
[MH03] METOYER R. A., HODGINS J. K.: Reactive pedestrian path following from examples. In Proceedings of the 16th International Conference on Computer Animation and Social Agents (2003).
[MJBJ06] MUSSE S. R., JUNG C. R., BRAUN A., JUNIOR J. J.: Simulating the motion of virtual agents based on examples. In ACM/EG Symposium on Computer Animation, Short Papers (Vienna, Austria, 2006).
[Rey87] REYNOLDS C. W.: Flocks, herds, and schools: A distributed behavioral model. Computer Graphics 21, 4 (1987), 25–34.
[SGC04] SUNG M., GLEICHER M., CHENNEY S.: Scalable behaviors for crowd simulation. Computer Graphics Forum 23, 3 (2004), 519–528.
[TTG94] TERZOPOULOS D., TU X., GRZESZCZUK R.: Artificial fishes: autonomous locomotion, perception, behavior, and learning in a simulated physical world. Artificial Life 1, 4 (1994), 327–351.
[YT07] YU Q., TERZOPOULOS D.: A decision network framework for the behavioral animation of virtual humans. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2007), pp. 119–128.