Optimized Learning Rate for Energy Waste Minimization in a Background Subtraction Based Surveillance System

Muhammad Umar Karim Khan, Smart Sensor Architecture Lab, KAIST, Daejeon, South Korea
Chong Min Kyung, Smart Sensor Architecture Lab, KAIST, Daejeon, South Korea
Khawaja M. Yahya, CSE Deptt, UET Peshawar, Pakistan

978-1-4673-5762-3/13/$31.00 ©2013 IEEE

Abstract: In this paper, a surveillance system employing a background subtraction scheme is discussed. The aim of the work is to minimize the energy wasted by the overall system due to false positives. Pixels in the foreground of a motion detection system remain non-zero even after a moving object has stopped, because of the settling time associated with an adaptive background subtraction scheme such as Mixture of Gaussians. Temporal variance in a visually static pixel region also triggers false positives. The optimal learning rate for different parameters, such as the threshold value, the ROI size and the total number of frames in the scene, is derived in this paper.

I. INTRODUCTION

Growing interest in recent surveillance systems is associated with the availability of cheap sensors and processors [1]. Computer vision algorithms and video compression schemes have also evolved to meet the requirements of modern surveillance systems. Remote surveillance for safety and security has received significant research attention recently [2], [3]. Future surveillance systems are predicted to be composed of a distributed multisensor network, based on real-time computer vision algorithms, which requires minimal manual reconfiguration for different applications [2].

Visual security is generally based on CCTV which is continuously monitored by security personnel. Although still widely used, this scheme is prone to human inefficiencies. Intelligent surveillance systems are expected to act in a timely and proper manner as soon as any suspicious activity is detected. This reduces the dependency on security personnel and allows efficient monitoring of the environment [4]. Visual surveillance systems can use sensors besides camera nodes, such as microphones, to enhance the efficiency of the system [3]. Military surveillance demands have also evoked a rapid interest in intelligent surveillance systems [5]. Complete and exact information needs to be provided to the concerned personnel rapidly.

A chronological classification of surveillance systems is provided in [6]. The First Generation Surveillance Systems (1GSS) are completely based on analogue data, with a CCTV providing information to a human monitor. The Second Generation Surveillance Systems (2GSS) enable real-time automated analysis of incoming information by using digital devices at the back end. Alarms are triggered when a critical event occurs. In Third Generation Surveillance Systems (3GSS) the digital transformation is completed, as the scene information is converted to digital video format at the camera node and forwarded over a computer network, such as an ad-hoc network.

Surveillance using wireless sensor networks is governed by constraints imposed by battery power, channel bandwidth and memory [8]. In [7] the authors propose a video encoding scheme for battery-constrained environments employing Dynamic Voltage Scaling (DVS). Allocation of battery resources between compression and transmission for minimum distortion is dealt with in [8], where the authors present a mathematical Power Rate Distortion (P-R-D) model for a Wireless Video Sensor Network (WVSN). A network coding-based WVSN is discussed in [9], where the authors aim to maximize the lifetime of the overall network.

In this paper we aim to minimize the wasted power at every Wireless Video Sensor Node by using an event detector. The event detector is used to turn on the system when an event of interest occurs.
Energy consumption is minimized by reducing the number of false positives in the system.

The rest of the paper is organized as follows. Section II presents the proposed overall surveillance system configuration, including the definition of a motion-based event. Section III briefly describes the Mixture of Gaussians-based background subtraction. Reasons for false positives in the surveillance system are analysed in Sections IV and V. A mathematical model that gives the total number of false positives in the system is presented and graphically depicted in Section VI. Section VII concludes the paper.

II. SYSTEM CONFIGURATION

The surveillance system is composed of a front end event detector with a back end video encoder and a transmitter. In order to preserve battery power, the event detector continuously monitors the environment and turns on the rest of the system only if an event of considerable criticality has occurred. The event detector block thus acts as a controller for the overall system. Without the event detector, the system would operate in full throttle mode all the time and the battery would be consumed quickly. To ensure efficient operation of the proposed system, the event detector should consume less power than the rest of the back end system.

In a motion-based surveillance system, an event is defined as motion within a region of interest (ROI). A common example is an artefact placed in a museum which is monitored at night time. Any movement near the artefact is an event and should trigger the rest of the system. In such a scenario, the video information about the environment should either be transmitted or, if transmission is not feasible, stored in a black box. In other words, the whole back end system should be turned on by the event detector block whenever an event occurs.
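The controller role of the event detector can be illustrated with a minimal sketch. It assumes that an event is declared when the count of non-zero foreground pixels inside the ROI exceeds a threshold; the function names `count_nonzero_in_roi` and `event_detected`, the `(top, left, height, width)` ROI layout and the toy mask are hypothetical choices made for illustration only and are not taken from the paper.

```python
def count_nonzero_in_roi(foreground, roi):
    """Count non-zero foreground pixels inside roi = (top, left, height, width)."""
    top, left, height, width = roi
    return sum(
        1
        for row in foreground[top:top + height]
        for value in row[left:left + width]
        if value != 0
    )


def event_detected(foreground, roi, threshold):
    """Return True when the ROI holds more non-zero pixels than the threshold."""
    return count_nonzero_in_roi(foreground, roi) > threshold


# Toy 4x4 foreground mask (hypothetical data); the ROI is the central 2x2 block.
mask = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 9],
]
roi = (1, 1, 2, 2)

# The back end would be switched on only while this flag is True.
backend_on = event_detected(mask, roi, threshold=2)
```

In this sketch the non-zero pixel outside the ROI is ignored, so only activity inside the monitored region can wake the back end.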
In order to detect motion, we have used background subtraction to determine the object flow in the scene. A foreground frame is obtained by subtracting the background from the scene. Motion is indicated by the presence of non-zero pixels in the foreground. However, as will be discussed in the subsequent sections, the foreground contains non-zero pixels even in the absence of any moving objects in the scene. Therefore, it becomes necessary to indicate an event only if the number of non-zero pixels in the ROI is greater than a certain threshold value.

Figure 1: Block diagram of the overall system.

III. MIXTURE OF GAUSSIANS BASED BACKGROUND SUBTRACTION

There are a number of background subtraction schemes presented in the literature [11], but the three most commonly used are N-frame differencing-based, Kalman filter-based and MoG (Mixture of Gaussians)-based background subtraction. The first two are simpler in terms of computational complexity but lack adaptivity, while the MoG scheme can efficiently handle illumination changes as well as multiple background layers. Unlike the other methods mentioned, the MoG-based scheme does not use a buffer; it is a recursive scheme with low memory requirements that updates the background information with every input frame.

The MoG-based background subtraction scheme is described as follows. A set of Gaussian distributions is associated with every pixel. If a pixel in the scene has a value that lies within a certain range of the mean value of the distribution, the pixel is considered to be part of the background; otherwise it is included in the foreground. The range depends on the variance of the distribution. The variance and the mean of every pixel location in the background model are updated with every new input frame from the scene. Moreover, more than one distribution is normally associated with each pixel to adapt to different layers of the background. The probability of observation of the current pixel is given by

$$P(X_t) = \sum_{i=1}^{K} \omega_{i,t}\, \eta(X_t, \mu_{i,t}, \sigma_{i,t}) \qquad (1)$$

Here $X_t$ is the current pixel value, $K$ is the total number of Gaussian components, $\eta$ is the i-th Gaussian component density, $\mu_{i,t}$ is the mean value of the pixel intensity for the i-th component, $\sigma_{i,t}$ is the variance in the pixel intensity for the i-th component, $\omega_{i,t}$ is the weight associated with the i-th component and $t$ is the time index.

The performance of MoG-based background subtraction is controlled by a number of parameters, which are
1) Background component weight threshold
2) Standard deviation scaling factor
3) Learning rate
4) Total number of Gaussian components
5) Maximum number of components allowed in the background

The details of these parameters are given in [10]. In our work we have used the values of these parameters as suggested in [10], i.e.,
• Background component weight threshold: 0.25
• Standard deviation scaling factor: 2.5
• Total number of Gaussian components: 4
• Maximum number of components allowed in the background: 3

IV. FALSE POSITIVES DUE TO OBJECTS IN THE SCENE

As discussed in Section II, an event in our work is defined as a moving object in the ROI. A static object represents a lack of activity, and the system should not be triggered in this case. Whenever a moving object stops in the ROI, the system should switch to stand-by mode, in which the event detector monitors the environment and the back end system is turned off.

Consider a surveillance system used to monitor the movements of micro-organisms. The user is not interested in static micro-organisms; the event is defined as the motion of micro-organisms. Thus, if the micro-organisms stop moving, the overall system should switch to stand-by mode as soon as possible. The time taken by the system to switch to stand-by mode after the motion has ceased is tantamount to energy wastage.

MoG-based background subtraction is an adaptive scheme. The background of a scene is generated based on the history of objects. Objects which enter and then stay in the scene are included as background objects. In MoG-based background subtraction, the system learns about the new objects which enter a scene. For example, if a car enters the ROI in a frame and then stops, the car will initially be part of the foreground, but with the passage of time it will be included in the background. The time taken by the system to learn that the car has stopped moving and should be included in the background depends on the learning rate of the system. In MoG-based background subtraction, larger learning rates translate to static objects being included in the background quickly, and vice versa.

Static objects should not trigger the system. However, non-zero pixels are present in the ROI even after an object stops. If the number of non-zero pixels in the ROI is greater than an a priori threshold value, a false event is indicated by the event detector and the back end system is turned on falsely. Energy is wasted by such false positives, reducing the battery lifetime of a wireless video sensor node. For as long as the number of non-zero pixels stays above the threshold value, the event detector keeps alerting the system to wake up; the number of such frames (the duration of the false positive) is controlled by the learning rate of the system.

Figure 2 shows the number of non-zero pixels in the ROI of the foreground against the frame number. The ROI considered here is a 441-pixel square region (21x21 pixels). A controlled video is used in which the pixel values in the ROI are kept constant. Ideally, the number of non-zero pixels in the foreground should be zero. In practice, the number of non-zero pixels in the foreground decays to zero with a time constant controlled by α, the learning rate.

Figure 2: Number of non-zero foreground pixels in the window against the frame number for different learning rates (0.01, 0.03, 0.05, 0.07, 0.09 and 0.11).

Using curve-fitting, the number of false positives due to a moving object stopping in the ROI is given by

$$F(\alpha, N, M) = -\frac{\ln(N/M)}{1.04\,\alpha} \qquad (2)$$

Here F is the number of frames for which the system is turned on falsely due to the stopping of a moving object, N is the threshold value for the number of pixels, M is the total number of pixels in the ROI and α is the learning rate of the system.

Figure 3 shows how the relationship between the number of false positives and the learning rate changes with the number of pixels in the ROI. The threshold for this set of curves is set at 20. The effect of changing the threshold value on the relationship between the learning rate and the total number of false positives is shown in the set of curves in figure 4, where the value of M is fixed at 441. From these curves it is clear that the difference in the number of false positives is larger at smaller values of the learning rate.

Figure 3: Number of FP against learning rate for different ROI sizes (M = 100, 144 and 196).

Figure 4: Number of FP against learning rate for different threshold values (th = 10 to 70).

V. FALSE POSITIVES DUE TO ILLUMINATION CHANGES

In a visually static scene, the pixel values still change continuously because of distortions such as noise and camera distortion. The effect of these pixel changes in visually static regions on the foreground is greatly influenced by the learning rate of the system. A larger learning rate means that the variance is updated with a magnitude that depends heavily on the most recent change in pixel values, so that small changes in pixel values can cause a pixel to become part of the foreground. Similarly, in a system with a large learning rate the mean also shifts swiftly towards new pixel values in the scene. The rapid changes in the mean and variance values increase the probability of the pixel being in the foreground.

Two foreground images from the hall monitor video sequence are presented in figure 5. The first one is from a foreground sequence with a learning rate of 0.01, whereas the second one has a learning rate of 0.10. The number of non-zero pixels in the foreground is greater in the second case, showing that more non-zero pixels are produced in the foreground for larger values of the learning rate. As the event is based on the number of non-zero pixels in the ROI in the foreground (Section II), a larger learning rate increases the probability of false positives.

Figure 5: Foreground images; a learning rate of 0.01 is used in the first frame and 0.10 in the second.

Experimental results for the average number of false positives for five different surveillance videos with 300 frames are shown in figure 6. The ROI was chosen to be a region with no visual activity. The total number of frames for which a false positive is indicated is plotted against the learning rate for different values of the threshold.

Figure 6: Experimental results for the number of FP against the learning rate for different values of threshold (th = 10 to 70).

Mathematically, the number of frames for which a false positive is indicated is given as

[Equation (3): garbled in the source and not recoverable; the surviving coefficients are 0.00208, 58.29, 0.7964, 6.131 × 10^…, 8.929 × 10^… and 3.393 × 10^….]

Here F is the number of false positives due to the effect of illumination changes and T is the total number of frames in the video. A comparison between the developed model and the experimental results is shown in figure 7.

Figure 7: Comparison of the developed model and experimental results (false positives against learning rate).

VI. TOTAL NUMBER OF FALSE POSITIVES

Experimental results show that the false positives occurring due to illumination changes are uniformly distributed in time. With the assumption that the moving objects that stop in the ROI (the number of which can be estimated based on the scene or through a learning method) are also uniformly distributed in time, the model for the overall false positives is given by

[Equations (4) and (5): garbled in the source and not recoverable; the residue combines the terms of equations (2) and (3).]

The last term in the equation excludes false positives overlapping in time. Using this model, the number of false positives with respect to the learning rate is plotted in figure 8 for a threshold value of 20 pixels and an ROI of 441 pixels. The optimal learning rate read from this curve is 0.0325.

Figure 8: Number of FP against learning rate for threshold = 20.

The minimum number of false positives is related to the value of the threshold as per the above mathematical model. This exponentially decaying relation is shown in figure 9. As indicated, the minimum number of false positives decreases as the threshold value increases, but a larger threshold also increases the probability of false negatives in the system.

Figure 9: Minimum number of FP against different values of threshold for a 3000-frame video.

VII. CONCLUSION

A surveillance system based on Mixture of Gaussians is analysed in this work. The system is optimized for energy by minimizing the total number of false positives. The reasons for false positives in the system were discussed, and an optimization strategy was introduced based on a number of parameters associated with background modelling and the ROI. A mathematical model was also developed to determine the total number of false positives in the system.

REFERENCES

[1] M. Reiter and P. Rohatgi, “Homeland security guest editor’s introduction,” IEEE Internet Comput., vol. 8, no. 6, pp. 16-17, Nov./Dec. 2004, doi: 10.1109/MIC.2004.62.
[2] M. Valera and S. A. Velastin, “Intelligent distributed surveillance systems: A review,” IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 192-204, Apr. 2005, doi: 10.1049/ip-vis:20041147.
[3] C. S. Regazzoni, V. Ramesh, and G. L. Foresti, “Scanning the issue/technology special issue on video communications, processing, and understanding for third generation surveillance systems,” Proc. IEEE, vol. 89, no. 10, pp. 1355-1367, Oct. 2001, doi: 10.1109/5.959335.
[4] A. C. M. Fong and S. C. Hui, “Web-based intelligent surveillance system for detection of criminal activities,” Comput. Control Eng. J., vol. 12, no. 6, pp. 263-270, Dec. 2001.
[5] H. A. Nye, “The problem of combat surveillance,” IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 551-555, Oct. 1960, doi: 10.1109/IRETMIL.1960.5008289.
[6] M. Bramberger, A. Doblander, A. Maier, B. Rinner, and H. Schwabach, “Distributed embedded smart cameras for surveillance applications,” Computer, vol. 39, no. 2, pp. 68-75, Feb. 2006, doi: 10.1109/MC.2006.55.
[7] Zhihai He, Yongfang Liang, Lulin Chen, Ishfaq Ahmad, Dapeng Wu, “Power-Rate-Distortion Analysis for Wireless Communication Under Energy Constraints,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 5, May 2005.
[8] Zhihai He, Dapeng Wu, “Resource Allocation and Performance Analysis of Wireless Video Sensors,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, no. 5, May 2006.
[9] Junni Zou, Hongkai Xiong, Chenglin Li, Ruifeng Zhang, Zhihai He, “Lifetime and Distortion Optimization With Joint Source/Channel Rate Adaptation and Network Coding-Based Error Control in Wireless Video Sensor Networks,” IEEE Trans. on Vehicular Technology, vol. 60, no. 3, March 2010.
[10] Chris Stauffer, W. E. L. Grimson, “Adaptive Background Mixture Models for Real-Time Tracking.”
[11] Massimo Piccardi, “Background Subtraction Techniques: A Review,” IEEE International Conference on Systems, Man and Cybernetics, 2004.
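As a numerical illustration of the settling-time model of Section IV, the following is a minimal sketch. It assumes that equation (2) reads F = -ln(N/M) / (1.04 α), which corresponds to the non-zero pixel count decaying as M·exp(-1.04 α t) and crossing the threshold N after F frames; this reading of the garbled equation, the function name `false_positive_frames` and its parameter names are assumptions made for illustration.

```python
import math


def false_positive_frames(alpha, n_threshold, m_roi):
    """Frames for which the non-zero pixel count stays above the threshold.

    Assumed model: n(t) = M * exp(-1.04 * alpha * t), so n(t) > N holds
    for t < -ln(N / M) / (1.04 * alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0 < n_threshold < m_roi:
        raise ValueError("need 0 < N < M")
    return -math.log(n_threshold / m_roi) / (1.04 * alpha)


# Sweep a few learning rates for the 441-pixel ROI and threshold of 20
# used in the text; larger learning rates give shorter false positives.
for alpha in (0.01, 0.05, 0.10):
    frames = false_positive_frames(alpha, n_threshold=20, m_roi=441)
    print(f"alpha={alpha:.2f}: about {frames:.0f} frames")
```

Under this assumed form the duration is inversely proportional to the learning rate, which matches the observation in Section IV that the curves are furthest apart at small learning rates.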