An Interactive Framework for Retrieving Incidents in Transportation Surveillance Video Databases

Xin Chen
Advisor: Dr. Chengcui Zhang
Department of Computer and Information Sciences, University of Alabama at Birmingham

1. Introduction and Motivation

Goal: Learn and retrieve semantic events in video databases, such as accidents in transportation surveillance videos.

Main Challenges:
- "Semantic Gap": a gap between high-level semantic concepts and low-level video features.
- In Information Retrieval, there is no prior knowledge to construct the training set for learning.

Solution: Relevance Feedback (RF) [6]

The Proposed Learning Framework:
- Based on the Neural Network for Time Series Data.
- Relevance Feedback is incorporated.

Merits of the Proposed Framework:
- An interactive semantic video retrieval framework is proposed.
- The user guides the learning and retrieval process through Relevance Feedback.
- The Neural Network for Time Series data is the learning algorithm.
- The proposed framework is especially useful in mining and retrieving data from large multimedia databases.
- This framework can be tailored to apply to many fields.
- Experimental results show the effectiveness.

2. Related Work & Merits of the Proposed Work

Related Work:
- Three Major Learning Algorithms for Event Detection from Videos: Hidden Markov Model (HMM) [5], Belief Networks [4], Self-Organizing Map (SOM) [7].
- Relevance Feedback: Rui et al. [6] proposed to use RF in Content-Based Image Retrieval.
- Neural Network is Used: mostly in forecasting trends in Time Series Data [2]; rarely in detecting spatiotemporal patterns [3].

Summary:
- Relevance Feedback is well-known in Content-Based Image Retrieval. We apply it to semantic video retrieval.
- A mapping between spatiotemporal trajectories and the Neural Network input nodes is designed.
- The Neural Network model for time series forecasting is adjusted for spatiotemporal semantic event detection.

3. Object Tracking & Trajectory Modeling

Vehicle Segmentation and Tracking:
- Segmentation -- Simultaneous Partition and Class Parameter Estimation (SPCPE) [1] algorithm.
- Tracking -- Distinguish the static objects from mobile objects in the frame.

[Figure: Tracked Vehicles and Their Centroids. Each vehicle segment is marked with its centroid $(x_{centroid}, y_{centroid})$.]

Trajectory Modeling:
- The Least Square Curve Fitting method is used to model the trajectories of vehicles.
- A trajectory is represented by a kth-degree polynomial: $y = a_0 + a_1 x + \dots + a_k x^k$
- Advantage: The trajectory can be described by only a few coefficients. Derivatives on the curve are velocities.

[Figure: Centroids and Fitted Curve, plotted on X and Y axes.]

4. Event Modeling

Traffic Accidents:

Accident Features:
- Sudden Change of the Velocity ($\mathit{vdiff}$)
- Sudden Change of the Direction ($\theta$)
- Minimum Distance from the nearest vehicle ($\mathit{mdist}$)

A Sample Point: $\alpha_i = [1/\mathit{mdist}_i,\ \mathit{vdiff}_i,\ \theta_i]$
Trajectory: $\alpha = [\alpha_1, \alpha_2, \dots, \alpha_n]$

5. Learning and Retrieval (1)

Data Preparation:
- Sampling: Sample centroids along a trajectory at the rate of 5 frames per sampling point.
- Window Sliding: Extract trajectory segments by sliding window -- a way to partition time series data. Continuity of the data is kept.
- Window Size = # frames / Sampling Rate, where # frames is the typical sequence length of an event.

[Figure: Window sliding over the series $x_1, x_2, \dots, x_n$ along the time axis, with Window Size = 6 and Step Size = 1.]

6. Learning and Retrieval (2)

Neural Network Design:
- Input Nodes: $x_t = \alpha_i = [1/\mathit{mdist}_i,\ \mathit{vdiff}_i,\ \theta_i]$
- Hidden Layers: One hidden layer with sigmoid transfer; the number of nodes equals that in the input layer.
- Output Layer: One output layer with linear transfer; there is one output node that scores the likelihood of an event in the sequence.
- Initial Weights: First layer -- random weights. Second layer -- multiple linear regression weight initialization.
- Search Algorithm: Conjugate Gradient

[Figure: Neural network with an Input Layer ($x_{t-m}, \dots, x_{t-3}, x_{t-2}, x_{t-1}$), a Hidden Layer, and an Output Layer, with weights $w_{ij}$ and $v_i$; labeled Prediction and Detection.]

Learning and Retrieval Process:

Initial Iteration:
- The user specifies an event of interest (e.g., traffic accidents).
- There is no relevance feedback.
- Top sequences are returned to the user by heuristic:
  $\max_i \left( \frac{1}{\mathit{mdist}_i^2} + \mathit{vdiff}_i^2 + \theta_i^2 \right)$

Subsequent Iterations:
- The user gives feedback to the retrieval results.
- Training data = [x_{t-2}, x_{t-1}, x_t, fdk, opt]
- The learning algorithm refines and returns the retrieval results to the user.
- The whole process goes through several iterations until a satisfactory result is obtained.

[Figure: System Overview. Components: Raw Video, Object Tracking, Trajectory Modeling, Event Modeling, Learning and Retrieval, Interface. Data exchanged: Trajectories, Models, Metadata, Initial Query, Query, Retrieval Results, Feedback, Refined Results.]

7. Experiments

Experiment Setup:
- Tested on two video clips: one taken in a tunnel, featuring single-vehicle accidents (2504 frames, 109 trajectory sequences); the other taken at an intersection in Taiwan, featuring multiple-vehicle accidents (592 frames, 168 sequences).
- Sampling rate is 5 frames per sampling point; window size is 3.
- Five iterations of user relevance feedback are performed: Initial (no feedback), First, Second, Third, and Fourth.
- The top 20 video sequences are returned to the user.
- The percentage of relevant sequences (accuracy) within the top 5, 10, 15, and 20 is calculated.
- Compared with the Weighted Relevance Feedback method [6].

Experimental Results:

[Figure: Fourth Iteration Retrieval Results of the 1st Clip (tunnel). Accuracy per iteration (Initial, 1st, 2nd, 3rd, 4th) for Weighted_RF and the Proposed Framework.]

[Figure: Fourth Iteration Retrieval Results of the 2nd Clip (intersection). Accuracy per iteration (Initial, 1st, 2nd, 3rd, 4th) for Weighted_RF and the Proposed Framework.]

8. Conclusions and Future Work

Conclusions:
- A semantic video retrieval framework is proposed.
- The neural network is applied to event detection from video sequences, a special type of time series data.
- A mapping between spatiotemporal trajectories and network input nodes is developed.
- The proposed work incorporates Relevance Feedback in interactive video retrieval.

Future Work:
- The event models for other general events will be constructed and tested.
- More video data will be collected with the associated metadata for normalizing all the videos before storage and retrieval.
- The framework will be extended to include query by example, query by sketches, and the customized combination of query types.

References:
[1] Chen, S.-C., Shyu, M.-L., Peeta, S., and Zhang, C. 2003. Learning-Based Spatio-Temporal Vehicle Tracking and Indexing for Transportation Multimedia Database Systems. IEEE Transactions on Intelligent Transportation Systems, Vol. 4, No. 3, pp. 154-167.
[2] Davey, N., Hunt, S.P., and Frank, R.J. 2000. Time Series Prediction and Neural Networks. Journal of Intelligent and Robotic Systems, Vol. 31.
[3] Gao, D., Kinouchi, Y., Ito, K., and Zhao, X. 2005. Neural Networks for Event Extraction from Time Series: a Backpropagation Algorithm Approach. Future Generation Computer Systems, Vol. 21, pp. 1096-1105.
[4] Huang, T., Koller, D., Malik, J., Ogasawara, G., Rao, B., Russell, S., and Weber, J. 1994. Automatic Symbolic Traffic Scene Analysis Using Belief Networks. In Proceedings of the National Conference on Artificial Intelligence.
[5] Kamijo, S., Matsushita, Y., and Ikeuchi, K. 2000. Traffic Monitoring and Accident Detection at Intersections. IEEE Transactions on Intelligent Transportation Systems, Vol. 1, No. 2, pp. 108-118.
[6] Rui, Y., Huang, T.S., and Mehrotra, S. 1997. Content-based Image Retrieval with Relevance Feedback in MARS. In Proceedings of the International Conference on Image Processing, pp. 815-818.
[7] Xie, D., Hu, W., Tan, T., and Peng, J. 2004. Semantic-based Traffic Video Retrieval Using Activity Pattern Analysis. In IEEE International Conference on Image Processing (ICIP).
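
The trajectory modeling step in Section 3 (least-squares fitting of a kth-degree polynomial to vehicle centroids) can be illustrated with a minimal sketch. It assumes the fit is obtained from the normal equations solved by Gaussian elimination; the poster does not say which solver was used, and the names `fit_polynomial` and `derivative_at` are hypothetical.

```python
from typing import List, Sequence


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Return [a0, ..., ak] minimizing sum((y - sum_j a_j * x**j)**2).

    Assumption: solved via the normal equations (V^T V) a = V^T y,
    where V[i][j] = xs[i] ** j. Fine for small degrees and few points.
    """
    n = degree + 1
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("not enough distinct points for this degree")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - known) / aug[r][r]
    return coeffs


def derivative_at(coeffs: Sequence[float], x: float) -> float:
    """Evaluate dy/dx of the fitted polynomial at x."""
    return sum(j * a * x ** (j - 1) for j, a in enumerate(coeffs) if j > 0)


# Hypothetical centroids lying exactly on y = 1 + 2x + 0.5x^2.
centroid_x = [0.0, 1.0, 2.0, 3.0, 4.0]
centroid_y = [1.0, 3.5, 7.0, 11.5, 17.0]
coeffs = fit_polynomial(centroid_x, centroid_y, degree=2)
slope = derivative_at(coeffs, 2.0)
```

Only the `degree + 1` coefficients need to be stored per trajectory, and the derivative of the fitted curve is available in closed form, which is the advantage the poster points to.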
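
The data preparation step in Section 5 (window size derived from the typical event length, then window sliding with a fixed step) might look like the sketch below. The function names, the integer division, and the 15-frame event length are illustrative assumptions, not values taken from the poster.

```python
from typing import List, Sequence, Tuple

# One sample point: (1/mdist, vdiff, theta), following the poster's definition.
SamplePoint = Tuple[float, float, float]


def window_size(event_frames: int, sampling_rate: int) -> int:
    """Window Size = # frames / Sampling Rate.

    Assumption: integer division, with a minimum window of one sample point.
    """
    return max(1, event_frames // sampling_rate)


def sliding_windows(trajectory: Sequence[SamplePoint], size: int,
                    step: int = 1) -> List[List[SamplePoint]]:
    """Partition a trajectory into overlapping segments of `size` points.

    Consecutive windows overlap when step < size, so the continuity of the
    time series is kept. A trajectory shorter than `size` yields no windows.
    """
    return [list(trajectory[start:start + size])
            for start in range(0, len(trajectory) - size + 1, step)]


# Hypothetical trajectory of 8 sample points, matching the x1..x8 figure.
trajectory = [(0.1 * i, 0.0, 0.0) for i in range(1, 9)]
segments = sliding_windows(trajectory, size=6, step=1)

# Hypothetical 15-frame event sampled every 5 frames.
size_for_event = window_size(event_frames=15, sampling_rate=5)
```

With 8 sample points, a window of 6, and a step of 1, the trajectory yields three overlapping segments (x1..x6, x2..x7, x3..x8).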
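
For the initial iteration in Section 6, where no feedback is available, a ranking sketch is given below. It assumes the heuristic scores each sequence by the largest squared magnitude of its sample points, that is, the maximum over i of (1/mdist_i)^2 + vdiff_i^2 + theta_i^2. The formula is garbled in the extracted poster text, so this reading is an assumption, and `heuristic_score` and `rank_sequences` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

# One sample point: (1/mdist, vdiff, theta).
SamplePoint = Tuple[float, float, float]


def heuristic_score(sequence: Sequence[SamplePoint]) -> float:
    """Largest squared magnitude among the sample points of a sequence."""
    return max(inv_mdist ** 2 + vdiff ** 2 + theta ** 2
               for inv_mdist, vdiff, theta in sequence)


def rank_sequences(sequences: Dict[str, Sequence[SamplePoint]],
                   top_k: int = 20) -> List[str]:
    """Return the ids of the top_k sequences, highest score first."""
    ordered = sorted(sequences, key=lambda sid: heuristic_score(sequences[sid]),
                     reverse=True)
    return ordered[:top_k]


# Hypothetical sequences; the ids and values are made up for illustration.
candidates = {
    "seq_a": [(0.1, 0.2, 0.0), (0.5, 1.0, 0.3)],  # abrupt speed change
    "seq_b": [(0.1, 0.1, 0.1)],                   # uneventful
    "seq_c": [(2.0, 0.0, 0.0)],                   # very close to another vehicle
}
ranking = rank_sequences(candidates, top_k=20)
```

Taking the maximum rather than the mean lets a single abrupt sample point surface a sequence, which suits events that are brief relative to the window.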
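
The network described in Section 6 (one sigmoid hidden layer with as many nodes as inputs, one linear output node) can be sketched as a forward pass. This is only a scoring sketch under stated assumptions: both layers start from random weights, whereas the poster initializes the second layer by multiple linear regression and trains with Conjugate Gradient, neither of which is shown. The class name `EventScorer`, the bias terms, and the weight range are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class EventScorer:
    """Forward pass only: inputs -> sigmoid hidden layer -> linear output."""

    def __init__(self, n_inputs: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # w[i][j]: weight from input j to hidden node i.
        # The hidden layer has the same number of nodes as the input layer.
        self.w: List[List[float]] = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_inputs)
        ]
        self.hidden_bias: List[float] = [0.0] * n_inputs
        # v[i]: weight from hidden node i to the single output node.
        # Assumption: random here; the poster uses regression-based initialization.
        self.v: List[float] = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.output_bias: float = 0.0

    def score(self, x: Sequence[float]) -> float:
        """Return the event-likelihood score for one flattened window."""
        if len(x) != len(self.v):
            raise ValueError("input length must match the number of input nodes")
        hidden = [
            sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + bias)
            for row, bias in zip(self.w, self.hidden_bias)
        ]
        return sum(v_i * h_i for v_i, h_i in zip(self.v, hidden)) + self.output_bias


# A window of 3 sample points, each [1/mdist, vdiff, theta], flattened to 9 inputs.
window = [(0.2, 0.1, 0.0), (0.4, 0.9, 0.3), (1.5, 1.2, 0.8)]
flat_input = [value for point in window for value in point]
scorer = EventScorer(n_inputs=len(flat_input))
likelihood = scorer.score(flat_input)
```

Flattening the window is one possible mapping from a trajectory segment to input nodes; the linear output leaves the score unbounded, so it is suited to ranking sequences rather than being read as a probability.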