Petri-Nets for Video Event Understanding
Gal Lavee and Ehud Rivlin
Technion – Israel Institute of Technology

Motivation
• Huge volume of ‘interesting’ video data to process
• Growing visual surveillance demands
  – Control of security-sensitive areas such as stores, airports, parking lots, banks and other public places
• Video database analysis demands
  – Offline surveillance data analysis (e.g. abnormal behaviour detection, specific event querying)
• Well-understood supporting work
  – Feature extraction
  – Object detection
  – Object tracking
  – Object recognition
  – “Low-level” event recognition
  – Event domain knowledge specification using ontology languages
It's an important problem!

Video Events
• Activities (Bobick)
• Composite events (Bremond)

Known Formalisms
• FSMs (sequence)
• Bayesian networks (uncertainty, factorized state space)
• HMMs (sequence, uncertainty)
• CRFs (sequence, uncertainty, relaxed independence assumptions)
• Grammars (hierarchy, partial ordering)
What about?
• Long-term temporal dependencies?
• Concurrency?
• Temporal, spatial, logical relations?
• “Incomplete” events?
• What about semantics?

Petri Net Solution for Event Understanding
• Petri-Net formalism (defined shortly)
• Represents the dynamic evolution of the video sequence
• Encodes semantic knowledge of the domain
• Formalism naturally captures inherent properties of video events
  – Logical, temporal and spatial composition
  – Concurrency and partial ordering
  – Long-term temporal dependence
• Recursive reuse of fragments (hierarchy)
• Recognizes events and…
• Snapshot of the state of the system at any time before or after an event has occurred
• How far away is the event of interest?

Petri Nets – Defined
• A graphical tool for formal description of system dynamics.
• What graph components does a Petri net comprise?
  – Place nodes: describe possible local system states.
  – Transition nodes: describe events that may modify the state.
  – Arcs: specify the relation between system states and events.
  – Tokens: markers that reside in place nodes and are used to specify the PN state.

Shared Resource Example:
[Figure: Petri net with places P_user_active, P_requesting, P_user_access, P_resource_idle, P_resource_busy and transitions T_user_request, T_start, T_end]

Petri Nets ctd.
• The instantaneous location of the tokens (called the marking) defines the current state of the model
• Enabling rule
  – A transition is enabled if all of its input places have tokens.
  – Additional requirements may be added (conditional transitions)
• When a transition fires:
  – Tokens are deleted from its input places
  – Tokens are placed in its output places
Nothing to do with the Petri dish.

Petri Net Properties
• The Petri Net formalism is useful for modeling
  – Logical relations
  – Temporal relations
  – Concurrency and partial ordering

Reachability Set and Reachability Graph
• Given a PN model and an initial marking
  – Compute the reachability set (the set of reachable markings)
  – The reachability graph visualizes the path taken to each particular state
[Figure: reachability graph over markings M0, M1, M2, and reachability set table giving the token count of P_user_active, P_requesting, P_user_access, P_resource_busy and P_resource_idle in each marking]

Petri Net Extensions
• Priority transitions
  – Each transition is associated with a natural number
  – Allows resolution of conflicts (e.g. multiple transitions which share an input)
• Timed transitions
  – Extend the priority concept
  – Associate a real interval of time with each transition
  – The interval must elapse between enabling and firing
  – Model real-world phenomena

Petri Net Extensions ctd.
• Stochastic Petri Nets
  – Stochastic timed transition delays (discrete-state stochastic process)
  – Exponential distribution for transition delays
    • Negative exponential probability density function (PDF)
      [Equation: negative exponential PDF D_n of the transition delay]
• A Generalized Stochastic Petri Net (GSPN) is a PN where:
  – Immediate transitions coexist with stochastic timed transitions

PN Video Event Models
• PNs in the literature have not been built in an agreed-upon fashion
• However, two distinct classes of PN event models have emerged
  – Plan Petri-Nets
  – Object Petri-Nets

PN Video Event Models ctd.
                 Object PN                             Plan PN
Tokens           Objects                               Plan progress
Places           Object states                         Plan states
Transitions      Object state change (events)          Plan advancement
Enabling rules   Conditioned on object properties      Conditioned on scene properties
Event            Firing of transition                  End of plan
                 (not all events are interesting)      (firing of “sink” transitions)
                 One PN for multiple objects/events    One PN per event

Plan PNs
• Plan Petri-Nets
  – Castel (1996)
  – Natural extension of PN plans in other domains
  – In this work a number of “plans” are tracked
  – At each observation (knowledge received from the “numerical layer”) the plans in progress are checked for consistency with the observation.

Parking Lot Example
Plan prototypes: Pedestrian Movement, Arsonist Action, Vehicle Arrival, Vehicle Departure
Explain observations using plans:

Observation: Cars Parked
  Current plans: 1. Vehicle Departure
  – Consistent with Vehicle Departure: new plans created
  – Not consistent with Vehicle Arrival, Pedestrian Movement, Arsonist Action

Observation: Pedestrian Appears
  Current plans: 1. Vehicle Departure  2. Pedestrian Movement  3. Arsonist Action
  – Consistent with Vehicle Departure, Pedestrian Movement, Arsonist Action: new plans created
  – Not consistent with Vehicle Arrival

Observation: Pedestrian Disappears
  Current plans: 1. Vehicle Departure  2. Pedestrian Movement (terminated)
  – Consistent with existing plans Vehicle Departure and Pedestrian Movement: plans maintained
  – Pedestrian Movement reaches the end of its plan: the pedestrian movement event can be said to have occurred
  – Not consistent with Arsonist Action: plan rejected

Observation: Vehicle Starts Moving
  Current plans: 1. Vehicle Departure  2. Pedestrian Movement (terminated)
  – Consistent with existing plan Vehicle Departure

Parking Lot Example ctd.
• After the observation sequence…
• The only consistent event is vehicle departure… even though it hasn’t concluded
• Using this system we can assert that a certain event is possible or not possible before having seen the complete event.
• Combined with information on the duration of the various sub-events, we can conjecture when a possible event might occur, expressed as an offset from the current state
• Feedback loop with observation is possible

Object PNs
• Ghanem 2004 & 2007
• Tokens are objects
• Multiple objects can be represented within the same net
• Snapshot of the state of the system at any time before or after an event has occurred
• Feedback loop

Traffic Monitoring Domain Example

Negative Event: Security Guard Has Not Returned to Post After 15 Minutes

Video Event Modeling with GSPN
• Object PNs
• Borzin et al. 2007
• Capture multiple events of interest in a domain within a single net
• Stochastic timed transitions (GSPN)
• The parameters of these distributions may be learned from training data
• Also allows estimates of the reachability of certain events (using marking analysis, discussed shortly)

Video Event Modeling with GSPN ctd.
• Basic representation concepts
  – Each token represents a detected object in a specific state
  – A place represents a possible state of one or more objects
  – Transitions represent an event or a satisfied relation
• Logical and temporal relations are enforced by appropriate PN fragments
• Spatial relations
  – Topological, directional or distance
  – Defined by the enabling rules attached to transitions
• Enabling rules
  – Define conditions on tokens (objects) that must be met for the associated transition to become enabled and fire

Surveillance System
[Figure: block diagram with Video Input, Synthesized Dataset (optional), Intermediate Video Processing Unit, Abstracted Video Representation, Behavior Modeling Unit, GSPN Based Behavior Model, Video Event Recognition Unit, User Interface, Results]
• Intermediate video processing unit
  – Motion detection
  – Object detection and classification
  – Object tracking
  – Can be replaced with format-compliant datasets
  – Supports the CAVIAR ground truth format
• Behavior modeling unit
  – Provides a graphical interface for creating GSPN models
  – Allows splitting an entire graph into small fragments
  – Supports various templates that can be edited or extended by the user.
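
A minimal Python sketch of the representation concepts and enabling rules described above: tokens are detected objects, places hold tokens, and a transition fires only when its enabling rule holds for the tokens it would consume. The `Transition` class, the `near` rule, the place names and the coordinates are all hypothetical, chosen for illustration; this is not the surveillance system's implementation, and the one-to-one mapping from input places to output places is a simplifying assumption.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional

Token = dict  # hypothetical record for a detected object, e.g. {"id": 2, "x": 5.5, "y": 4.5}
Marking = Dict[str, List[Token]]  # place name -> tokens currently in that place


@dataclass
class Transition:
    name: str
    inputs: List[str]   # input places; one token is taken from each
    outputs: List[str]  # outputs[i] receives the token taken from inputs[i] (assumption)
    guard: Callable[..., bool] = lambda *tokens: True  # enabling rule on candidate tokens


def find_binding(marking: Marking, t: Transition) -> Optional[tuple]:
    """Return one token per input place that satisfies the enabling rule, or None."""
    for combo in product(*(marking[p] for p in t.inputs)):
        if t.guard(*combo):
            return combo
    return None  # an input place is empty, or no token combination passes the rule


def fire(marking: Marking, t: Transition) -> Optional[List[int]]:
    """Fire t if enabled: remove tokens from input places, add them to output places."""
    binding = find_binding(marking, t)
    if binding is None:
        return None
    for place, token in zip(t.inputs, binding):
        marking[place].remove(token)
    for place, token in zip(t.outputs, binding):
        marking[place].append(token)
    return [token["id"] for token in binding]


def near(a: Token, b: Token, radius: float = 1.5) -> bool:
    """Hypothetical distance-based spatial relation used as an enabling rule."""
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"]) <= radius


marking: Marking = {
    "guard_waiting": [{"id": 0, "x": 5.0, "y": 5.0}],
    "visitor_walking": [{"id": 2, "x": 5.5, "y": 4.5}, {"id": 6, "x": 20.0, "y": 1.0}],
    "guard_checking": [],
    "visitor_in_check": [],
}
met = Transition(
    name="Guard_Met_Visitor",
    inputs=["guard_waiting", "visitor_walking"],
    outputs=["guard_checking", "visitor_in_check"],
    guard=near,
)
fired_on = fire(marking, met)
print(f"'{met.name}' fired on objects: {fired_on}")  # 'Guard_Met_Visitor' fired on objects: [0, 2]
```

Visitor 6 stays in `visitor_walking` because it is not near the guard, and a second call to `fire` returns `None` because `guard_waiting` is now empty. Keeping the enabling rule as a plain function on tokens is one way to express spatial relations without changing the net structure.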
Surveillance System ctd.
• Video event recognition unit
  – Receives the intermediate video processing results and the model
  – Provides a textual description of the detected events

Surveillance System Interface
• Behavior Modeling Interface
• Video Event Interpretation Interface

The Data
• Assume good tracking, detection and recognition
• CAVIAR annotation format
• Synthetic video
• Corresponds to real video
• Generation of similar video for training

Synthetic Video Animator
• Single scene editor
• User interface
• Scene series editor

Video Event Analysis
• Scenario 1: ‘Security check in a public place’
  – Every visitor must pass a security check before entering the place
  – The following cases are considered abnormal and must be reported:
    • A visitor enters the place without being checked
    • The security check is abnormally long
  – Example of event of interest:

Video Event Analysis
• GSPN model for ‘Security check in a public place’:

Video Event Analysis
• Interpretation results for ‘Security check in a public place’:
  Frame  Message
  1      'Object_Appeared' fired on objects: 0
  20     'Object_Appeared' fired on objects: 2
  25     'Object_Appeared' fired on objects: 6
  56     'Visitor_Entered_the_Hall' fired on objects: 2
  61     'Guard_Met_Visitor' fired on objects: 0, 2
  61     'Visitor_Entered_the_Hall' fired on objects: 6
  68     'Visitor_Was_Not_Checked' fired on objects: 6
  71     'Security_Check_Is_Too_Long_Detected' fired on objects: 0, 2
  86     'Meeting_Is_Over' fired on objects: 0, 2

Marking Analysis
• The PN structure can be translated into a reachability graph
• A training set provides statistics about transitions between adjacent markings
• Using these, the probability of future transitions can be calculated
• Example:
  [Figure: reachability graph over markings M0 to M5 with edges labelled λ0,1, λ0,2, λ1,3, λ1,4, λ1,5, λ2,5, λ3,4, λ4,0, λ5,0]
  – Where the probability to move to marking Mk from marking Mn is:
    $\lambda_{n,k} = N_{n,k} / N_n$

Video Event Analysis
• Marking analysis for ‘Security check in a public place’:
  – The marking data was collected during the training process
  – The reachability graph was constructed from the possible states observed during the training process
  – Statistical information was used to predict the most probable next system state
• The marking graph is used for a Discrete Time Markov Chain description
Example:
[Figure: marking graph with states Empty, Guard_In_Hall, Guard_Waiting, One_Visitor_Appeared, Visitor_Walking_Towards_Guard, One_Visitor_Stopped_Near_Guard, One_Visitor_Passed_Near_Guard, Guard_Met_One_Visitor, Guard_Met_Two_Visitors, Guard_Checked_One_Visitor, One_Visitor_has_Evaded_the_Check; edges labelled with transition probabilities]
  – The probability to get to ‘Guard_Checked_One_Visitor’ from ‘Visitor_Walking_Towards_Guard’ is (the red path):
    $P = 0.72 \cdot 1 \cdot (0.63 + 0.37 \cdot 0.78) \cdot 0.66$

Video Event Analysis
• Scenario 2: ‘Traffic junction control’
  – Assume a junction without any traffic lights or traffic signs
  – The law: a car may enter the junction unless there is a car on its right side
  – Example event of interest:

Video Event Analysis
• GSPN model for ‘Traffic junction control’:

Video Event Analysis
• Interpretation results for ‘Traffic junction control’:
  Frame  Message
  0      'Car_Appeared' fired on the objects: 2
  1      'Car_Appeared' fired on the objects: 0
  10     'Car_Appeared' fired on the objects: 1
  18     'Car_Entered_Z1' fired on the objects: 0
  23     'Car_Entered_Z2' fired on the objects: 2
  34     'Car_Entered_Z1' fired on the objects: 1
  52     'Car_In_Z1_Breaks_the_Law' fired on the objects: 0
  77     'Car_Entered_Z3' fired on the objects: 0
  80     'Car_Appeared' fired on the objects: 3

Building PNs from Ontologies
• We have seen that Petri Nets are a formalism for describing the video events of a particular event domain
• Other formalisms exist (e.g. Hidden Markov Models, grammar models, etc.)
• A knowledge engineer can describe the semantic content of a particular event domain in a standard way.
• Such an expert also has to have expert knowledge of the modeling formalism.
• To bridge this gap we propose a process for translating domain knowledge into an event modeling formalism.

Ontology Languages
• To formalize our knowledge of an event domain we require a standard method of knowledge specification: an ontology
• Competing ontology specification standards for video events exist
  – VERL (Nevatia et al. 2004)
  – CASEE (Hakeem et al. 2004)

Ontology Languages – VERL
• Defines constructs such as Sequence, Change
• Allows definition of predicates and entities
• Captures temporal and logical relationships

  SINGLE-THREAD(tailgate(ent x, ent y, facility f),
    AND (portal-of(door, f))
    Sequence(
      approach(y, door),
      unlock(y, door),
      open(y, door),
      AND(enter(y, f), near(x, y)),
      NOT(unlock(x, door)),
      enter(x, f)))

Ontology Languages – CASEE
• Extends the concept of case frames hierarchically

  [ PRED: Moves, AG: Train, D: Signals, LOC: Zone1, FAC: Towards,
    AFTER: [ PRED: Switches, AG: Signals, FAC: On,
      AFTER: [ PRED: Moves, AG: Gate, FAC: Down, AFTER: Switches,
        SUB: [ PRED: Stops, AG: Vehicle, LOC: Zone2, FAC: Outside, AFTER: Moves ] ] ] ]

Translating the Ontology to PN
• Methodology for constructing a PN from ontology descriptions (not automatic or optimal)
• Similar models for similar events
• Each sub-event is represented by a PN fragment
• Simple sub-event fragment (no temporal relations)
• Temporal sub-event fragment

Translating the Ontology to PN
• Next we connect the fragments representing the various sub-events in a manner that corresponds to their relationship in the ontology event description.
• Logical relations (AND)
• Temporal relations (OVERLAPS)

Temporal Relations

Translating the Ontology to PN
• It may be possible to represent a temporal relation in more than one way.
• Having a consistent representation avoids arbitrary model structures
• Nodes shared between fragments may be fused.

Building a Plan PN
• In a plan PN, transitions can represent “primitive” events as given by the ontology descriptions
• These primitive events can be projected into our event structure fragment
• Example: “Safe Crossing Event”
  – An approaching train causes the signal to change, the gate to lower, and the approaching car to stop.
  – Specified using CASEE
  – Create temporal sub-event fragments for each of the sub-events
  – Connect fragments according to temporal relations

Building a Plan PN
CASEE ontology representation:

  [ PRED: Moves, AG: Train, D: Signals, LOC: Zone1, FAC: Towards,
    AFTER: [ PRED: Switches, AG: Signals, FAC: On,
      AFTER: [ PRED: Moves, AG: Gate, FAC: Down, AFTER: Switches,
        SUB: [ PRED: Stops, AG: Vehicle, LOC: Zone2, FAC: Outside, AFTER: Moves ] ] ] ]

Resulting Plan PN:

Simplifying the Structure
• Redundant nodes are merged to give a simpler structure
• The label SE indicates each of the sub-events in the ontology specification
• Each transition is now associated with the appropriate “primitive” sub-event

Building an Object PN
• In an Object PN model, tokens represent objects in the system
• We define PN fragments that include places for each possible state of an object
• These fragments are then connected, as appropriate, to the special fragments representing the structure of the event (made up of sub-event fragments).
• Our objects of interest for the “safe crossing” event are train, car, signal and gate
• Object states are not known (they may be implicit in the event specification, or made explicit with a small extension to the ontology)

Building an Object PN
• The possible states of the car can be “inscene”, “stopped”, “inzone2”, “stoppedinzone2”.
• Each of these states is represented as a place in the car state transition fragment.
• Conditional transition nodes allow changes of state based on token properties.
• Similar fragments are constructed for train, gate and signal
• States are connected appropriately to the event structure fragment

Building an Object PN

Petri Nets – Summary
• Inference is the propagation of tokens through the PN
• Relies on semantic structure
• ‘Snapshot’ of the system
• ‘How far away?’, quantified by probabilities or absolute time.
• Naturally capture inherent video event properties such as concurrency and partial ordering
• May be specified subjectively
• A consistent process for specification needs to be implemented

What About Uncertainty?
• Video events ARE uncertain
• The PN model presented is deterministic
• Uncertainty resides at lower levels
• The higher level becomes more qualitative than quantitative
• Propagation Nets (Shi and Bobick 2004) assign a probability to each state transition
• Other work assigns a probability to tokens as they propagate through the network
• It is not critical to assign a probability to each event
  – Prof. Bobick’s .7 vs .00000000000001 example
• If two events are feasible they should both be considered
• We have seen that multiple event explanations can exist

Thank You
Acknowledgements to: Michael Rudzsky, Artyom Borzin
Questions?

Publications
• Surveillance Event Interpretation Using Generalized Stochastic Petri Nets. Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. WIAMIS 2007, The 8th International Workshop on Image Analysis for Multimedia Interactive Services, 6-8 June 2007, Santorini, Greece.
• Representation and Recognition of Multiagent Interactions by Generalized Stochastic Petri Nets. Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. CBMI-2007, Fifth International Workshop on Content-Based Multimedia Indexing, June 25-27, 2007, Bordeaux, France.
• Building Petri Nets from Video Event Ontologies. Gal Lavee, Artyom Borzin, Ehud Rivlin, and Michael Rudzsky. Advances in Visual Computing, Third International Symposium, ISVC 2007, Lake Tahoe, NV, USA, November 26-28, 2007. LNCS 4841, pp. 442-451.
• Recognition of Human Behavior from Surveillance Video using Marking Analysis in Generalized Stochastic Petri Nets. Artyom Borzin. MSc thesis.

References
• Nagia M. Ghanem, Petri Net Models for Event Recognition in Surveillance Videos, PhD thesis, 2007.
• Nagia Ghanem, Daniel DeMenthon, David Doermann, Larry Davis, "Representation and Recognition of Events in Surveillance Video Using Petri Nets," 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 7, p. 112, 2004.
• C. Castel, L. Chaudron, and C. Tessier, "What is going on? A high level interpretation of sequences of images," in Proceedings of the Workshop on Conceptual Descriptions from Images, ECCV, 1996, pp. 13-27.