Optimized Learning Rate for Energy Waste Minimization in a Background Subtraction based Surveillance System

Muhammad Umar Karim Khan
Smart Sensor Architecture Lab, KAIST
Daejeon, South Korea
[email protected]

Chong Min Kyung
Smart Sensor Architecture Lab, KAIST
Daejeon, South Korea
[email protected]

Khawaja M. Yahya
CSE Department, UET Peshawar
Pakistan
[email protected]
Abstract— In this paper, a surveillance system employing a background subtraction scheme is discussed. The aim of the work is to minimize the energy wasted by the overall system due to false positives. Pixels in the foreground of a motion detection system remain non-zero even after a moving object has stopped, due to the settling time associated with an adaptive background subtraction scheme such as Mixture of Gaussians. Temporal variance in a visually static pixel region also triggers false positives. The optimal learning rate is derived in this paper for different parameter settings, such as the threshold value, the ROI size and the total number of frames in the scene.
I. INTRODUCTION
Growing interest in recent surveillance systems is associated with the availability of cheap sensors and processors [1]. Computer vision algorithms and video compression schemes have also evolved to meet the requirements of modern surveillance systems. Remote surveillance for safety and security has recently received significant attention in research [2], [3]. Future surveillance systems are predicted to be composed of a distributed multi-sensor network, based on real-time computer vision algorithms, which requires minimal manual reconfiguration for different applications [2].
Visual security is generally based on CCTV that is continuously monitored by security personnel. Although still widely used, this scheme is prone to human inefficiencies. Intelligent surveillance systems are expected to act in a timely and proper manner when any suspicious activity is detected. This reduces the dependency on security personnel and allows efficient monitoring of the environment [4]. Visual surveillance systems can use sensors besides camera nodes, such as microphones, to enhance the efficiency of the system [3].
Military surveillance demands have also evoked rapid interest in intelligent surveillance systems [5]. Complete and exact information needs to be provided rapidly to the concerned personnel.
A chronological classification of surveillance systems is
provided in [6]. The First Generation Surveillance Systems
(1GSS) are completely based on analogue data, with a CCTV providing information to a human monitor. Second Generation Surveillance Systems (2GSS) enable real-time automated analysis of incoming information by using digital devices at the back end; alarms are triggered in case of the occurrence of a critical event. In Third Generation Surveillance Systems (3GSS) the digital transformation is completed, as the scene information is converted to a digital video format at the camera node and forwarded over a computer network, such as an ad-hoc network.
Surveillance using wireless sensor networks is governed by constraints imposed by battery power, channel bandwidth and memory [8]. In [7] the authors propose a video encoding scheme for battery-constrained environments employing Dynamic Voltage Scaling (DVS). The allocation of battery resources between compression and transmission for minimum distortion is dealt with in [8], where the authors present a mathematical Power-Rate-Distortion (P-R-D) model for a Wireless Video Sensor Network. Network coding-based WVSNs are discussed in [9], where the authors aim to maximize the lifetime of the overall network.
In this paper we aim to minimize the wasted power at every wireless video sensor node by using an event detector. The event detector turns on the rest of the system when an event of interest occurs, and energy consumption is minimized by reducing the number of false positives in the system.
The rest of the paper is organized as follows. Section II presents the proposed overall surveillance system configuration, including the definition of a motion-based event. Section III briefly describes Mixture of Gaussians-based background subtraction. The reasons for false positives in the surveillance system are analysed in Sections IV and V. A mathematical model that gives the total number of false positives in the system is presented and graphically depicted in Section VI. Section VII concludes the paper.
II. SYSTEM CONFIGURATION
The surveillance system is composed of a front-end event detector with a back-end video encoder and a transmitter, as shown in Figure 1. In order to preserve battery power, the event detector continuously monitors the environment and triggers the rest of the system only if an event of considerable criticality has occurred; the event detector block thus acts as a controller. Without the event detector, the system would operate at full throttle all the time and the battery would be consumed quickly. To ensure efficient operation of the proposed system, the event detector should consume less power than the rest of the back-end system.

Figure 1: Block diagram of the overall system
In a motion-based surveillance system, an event is defined as motion within a region of interest (ROI). A common example is an artefact placed in a museum which is monitored at night. Any movement near the artefact is an event and should trigger the rest of the system. In such a scenario, the video information about the environment should either be transmitted or stored in a black box if transmission is not feasible. In other words, the whole back-end system should be triggered on by the event detector block in case of an event. Thus the event detector acts as a controller for the overall system.
In order to detect motion, we use background subtraction to determine the object flow in the scene. A foreground frame is obtained by subtracting the background from the scene, and motion is indicated by the presence of non-zero pixels in the foreground. However, as discussed in the subsequent sections, the foreground contains non-zero pixels even in the absence of any moving objects in the scene. Therefore, an event is indicated only if the number of non-zero pixels in the ROI is greater than a certain threshold value.
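As a concrete illustration of this test, the sketch below counts the non-zero foreground pixels inside the ROI and compares the count with a threshold. It is a minimal example; the ROI coordinates are illustrative, while the 21x21 window and the pixel-count threshold of 20 are chosen to match the ROI and threshold used in the experiments later in the paper.

```python
import numpy as np

# Hypothetical ROI position; the 21x21 size (M = 441 pixels) and the
# threshold of 20 non-zero pixels follow the settings used later in the paper.
ROI_X, ROI_Y, ROI_W, ROI_H = 100, 80, 21, 21
N_TH = 20

def event_detected(fg: np.ndarray) -> bool:
    """Return True if the ROI of the binary foreground mask `fg`
    contains more than N_TH non-zero pixels."""
    roi = fg[ROI_Y:ROI_Y + ROI_H, ROI_X:ROI_X + ROI_W]
    return int(np.count_nonzero(roi)) > N_TH

# The event detector calls this once per frame and wakes the back end
# (encoder and transmitter) only while it returns True.
```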
III. MIXTURE OF GAUSSIANS BASED BACKGROUND SUBTRACTION

There are a number of background subtraction schemes presented in the literature [11], but the three most commonly used are N-frame differencing-based, Kalman filter-based and MoG (Mixture of Gaussians)-based background subtraction. The first two are simpler in terms of computational complexity but lack adaptivity, while the MoG scheme can efficiently handle illumination changes as well as multiple background layers. The MoG-based scheme does not use a buffer like the other methods mentioned; it updates the background information with every input frame, as it is a recursive scheme with low memory requirements.

The MoG-based background subtraction scheme is described as follows. A set of Gaussian distributions is associated with every pixel. If a pixel in the scene has a value that lies within a certain range of the mean value of a distribution, the pixel is considered to be part of the background; otherwise it is included in the foreground. The range depends on the variance of the distribution. The variance and the mean of every pixel location in the background model are updated with every new input frame from the scene. Moreover, more than one distribution is normally associated with each pixel to adapt to different layers of the background. The probability of observing the current pixel is given by

P(X_t) = Σ_{i=1}^{K} ω_{i,t} · η(X_t, µ_{i,t}, σ_{i,t})     (1)

Here η is the i-th Gaussian component density, µ_{i,t} is the mean pixel intensity of the i-th component, σ_{i,t} is the variance of the pixel intensity for the i-th component, ω_{i,t} is the weight associated with the i-th component, K is the number of components and t is the time index.

The performance of MoG-based background subtraction is controlled by a number of parameters, which are
1) Background component weight threshold
2) Standard deviation scaling factor
3) Learning rate
4) Total number of Gaussian components
5) Maximum number of components allowed in the background

The details of these parameters are given in [10]. In our work we have used the values of these parameters as suggested in [10], i.e.,
• Background component weight threshold: 0.25
• Standard deviation scaling factor: 2.5
• Total number of Gaussian components: 4
• Maximum number of components allowed in the background: 3
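As a point of reference, such a pipeline can be prototyped with OpenCV's Gaussian-mixture background subtractor. This is a minimal sketch, not the exact model of [10]: MOG2 is a closely related implementation, and the history length, variance threshold and video path used here are illustrative assumptions. The learning rate α is passed per frame through the learningRate argument.

```python
import cv2

ALPHA = 0.0325                     # learning rate under study (illustrative value)
VIDEO_PATH = "surveillance.avi"    # hypothetical input clip

# MOG2 is OpenCV's Gaussian-mixture background subtractor; the settings
# below are illustrative and not the exact parameters of [10].
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                                detectShadows=False)

cap = cv2.VideoCapture(VIDEO_PATH)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # learningRate overrides the history-based default; larger values make
    # stopped objects sink into the background model faster.
    fg_mask = subtractor.apply(frame, learningRate=ALPHA)
cap.release()
```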
IV. FALSE POSITIVES DUE TO OBJECTS IN THE SCENE
As discussed in Section II, an event in our work is defined as a moving object in the ROI. If the object is static, there is a lack of activity and the system should not be triggered. Whenever a moving object stops in the ROI, the system should switch to stand-by mode, in which the event detector monitors the environment and the back-end system is turned off. Consider a surveillance system used to monitor the movements of micro-organisms. The user is not interested in static micro-organisms; the event is defined as the motion of micro-organisms. Thus, if the micro-organisms stop moving, the overall system should switch to stand-by mode as soon as possible. The time taken by the system to switch to stand-by mode after the motion has ceased is tantamount to energy wastage.
MoG-based background subtraction is an adaptive scheme. The background of a scene is generated based on the history of objects; objects which enter and then stay in the scene are included as background objects. In MoG-based background subtraction, the system keeps learning about the new objects which enter the scene. For example, if a car enters the ROI in a frame and then stops, the car will initially be part of the foreground, but with the passage of time it will be included in the background. The time taken by the system to learn that the car has stopped moving and should be included in the background depends on the learning rate of the system. In MoG-based background subtraction, larger values of the learning rate translate to static objects being included in the background more quickly, and vice versa.
Static objects should not trigger the system. However, non-zero pixels are present in the ROI even after an object stops. If the number of non-zero pixels in the ROI is greater than an a priori threshold value, a false event is indicated by the event detector and the back-end system is turned on falsely. Energy is wasted by such false positives, reducing the battery lifetime of a wireless video sensor node. The number of frames for which the non-zero pixels exceed the threshold value (the duration of the false positive) is controlled by the learning rate of the system, and for this duration the event detector keeps the system awake.
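The energy wasted is therefore proportional to the number of frames for which the back end is falsely kept on. The sketch below makes this relation explicit; the back-end power draw and the frame rate are purely illustrative figures, not measurements from this work.

```python
# Hypothetical figures relating false-positive frames to wasted energy:
# back-end (encoder + transmitter) power draw and the capture frame rate.
BACKEND_POWER_W = 1.5      # illustrative back-end power in watts
FRAME_RATE_FPS = 30.0      # illustrative frame rate

def wasted_energy_joules(false_positive_frames: int) -> float:
    """Energy spent while the back end is falsely kept on."""
    wasted_seconds = false_positive_frames / FRAME_RATE_FPS
    return BACKEND_POWER_W * wasted_seconds

# e.g. 200 falsely-on frames at 30 fps and 1.5 W -> 10 J of wasted energy
print(wasted_energy_joules(200))
```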
Figure 2 shows the number of non-zero pixels in the ROI of the foreground against the frame number. The ROI considered here is a 441-pixel square region (21x21 pixels). A controlled video is used in which the pixel values in the ROI are kept constant, so ideally the number of non-zero pixels in the foreground should be zero. The number of non-zero pixels in the foreground decays to zero with a time constant controlled by α, the learning rate. Using curve fitting, the number of false positives due to a moving object stopping in the ROI is given by
F_o(α, N_th, M) = −ln(N_th / M) / (1.04 α)     (2)

Here F_o is the number of frames for which the system is turned on falsely due to the stopping of a moving object, N_th is the threshold value for the number of pixels, M is the total number of pixels in the ROI and α is the learning rate of the system.

Figure 2: Number of non-zero foreground pixels in the window against the frame number for different learning rates

Figure 3 shows how the number of false positives with respect to the learning rate changes with the number of pixels in the ROI. The value of the threshold for this set of curves is set at 20.

Figure 3: Number of FP against learning rate for different ROI sizes

The effect of changing the threshold value on the relationship between the learning rate and the total number of false positives is shown in the set of curves in Figure 4, where the value of M is fixed at 441.

Figure 4: Number of FP against learning rate for different threshold values

From the given sets of curves it is clear that the difference in the number of false positives is larger at smaller values of the learning rate.
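Assuming the reconstructed form of (2), the fitted model can be evaluated directly. The sketch below tabulates the false-positive duration for the ROI size and threshold used above (M = 441, N_th = 20); the grid of learning rates mirrors the values plotted in Figure 2.

```python
import math

def fp_frames_object(alpha: float, n_th: int, m: int) -> float:
    """Frames for which the system stays falsely on after an object stops,
    following the exponential-decay fit of (2)."""
    return -math.log(n_th / m) / (1.04 * alpha)

M, N_TH = 441, 20
for alpha in (0.01, 0.03, 0.05, 0.07, 0.09, 0.11):
    print(f"alpha={alpha:.2f}  F_o={fp_frames_object(alpha, N_TH, M):.1f} frames")
```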
V. FALSE POSITIVES DUE TO ILLUMINATION CHANGES

In a visually static scene, the pixels are continuously changing. This is due to the presence of different types of distortions such as noise, camera distortion, etc.
The effect of these pixel changes in visually static regions on the foreground is greatly influenced by the learning rate of the system. A larger learning rate means that the variance is varied more strongly, and the magnitude of the variation heavily depends on the recent change in pixel values; small changes in pixel values then cause the pixel to become part of the foreground. Similarly, in a system with a large learning rate the mean also shifts swiftly towards new pixel values in the scene. These rapid changes in the mean and variance values increase the probability of the pixel being in the foreground. Two foreground images from the hall monitor video sequence are presented in Figure 5. The first one is from a foreground sequence with a learning rate of 0.01, whereas the second one uses a learning rate of 0.10. The number of non-zero pixels in the foreground is greater in the second case, showing that more non-zero pixels are produced in the foreground for larger values of the learning rate. Since the event is based on the number of non-zero pixels in the ROI of the foreground (Section II), a larger learning rate increases the probability of false positives.
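To make the role of the learning rate explicit, the sketch below applies the per-pixel mean and variance updates of [10] to a single matched Gaussian. It is a simplified illustration in which the second learning factor ρ is approximated by α, and the weight update and multi-component bookkeeping are omitted.

```python
def update_matched_component(mu: float, var: float, x: float, alpha: float):
    """Single-Gaussian illustration of the MoG update of [10].
    A larger alpha pulls the mean and variance toward recent pixel values,
    so small fluctuations are tracked (and leak into the foreground) faster.
    The second learning factor rho is approximated here by alpha."""
    rho = alpha
    mu = (1.0 - rho) * mu + rho * x
    var = (1.0 - rho) * var + rho * (x - mu) ** 2
    return mu, var

# Example: a static pixel around intensity 120 with small noise,
# tracked at a slow and a fast learning rate.
mu_slow, var_slow = 120.0, 25.0
mu_fast, var_fast = 120.0, 25.0
for x in (121, 119, 123, 118, 122):
    mu_slow, var_slow = update_matched_component(mu_slow, var_slow, x, alpha=0.01)
    mu_fast, var_fast = update_matched_component(mu_fast, var_fast, x, alpha=0.10)
```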
Figure 5: Learning rate of 0.01 used in the first frame, 0.10 in the second

Experimental results for the average number of false positives for five different surveillance videos with 300 frames each are shown in Figure 6. The ROI was chosen as a region with no visual activity. The total number of frames for which a false positive is indicated is plotted against the learning rate for different values of the threshold.

Figure 6: Experimental results for the number of FP with different values of threshold against the learning rate
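A sketch of this measurement procedure is given below: for a clip whose ROI contains no visual activity, every frame in which the non-zero pixel count inside the ROI exceeds the threshold is counted as a false positive. The clip path and ROI coordinates are illustrative, and OpenCV's MOG2 subtractor again stands in for the MoG model of [10].

```python
import cv2
import numpy as np

VIDEO_PATH = "static_roi_clip.avi"          # hypothetical clip with a static ROI
ROI = (slice(80, 101), slice(100, 121))     # 21x21 region, M = 441
N_TH, ALPHA = 20, 0.0325                    # threshold and learning rate under test

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
cap = cv2.VideoCapture(VIDEO_PATH)

false_positive_frames = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = subtractor.apply(frame, learningRate=ALPHA)
    # Any frame where the visually static ROI still "moves" is a false positive.
    if np.count_nonzero(fg[ROI]) > N_TH:
        false_positive_frames += 1
cap.release()
print("false-positive frames:", false_positive_frames)
```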
Mathematically, the number of frames for which a false positive is indicated is given by an empirically fitted function

F_i = f(α, T)     (3)

where f is a polynomial in the learning rate α obtained through curve fitting. Here F_i is the number of false positives due to the effect of illumination changes and T is the total number of frames in the video. A comparison between the developed model and the experimental results is shown in Figure 7.
Figure 7: Comparison of the developed model and experimental results
VI. TOTAL NUMBER OF FALSE POSITIVES

Experimental results show that the false positives occurring due to illumination changes are uniformly distributed in time. With the assumption that the moving objects that stop in the ROI (the number of which can be estimated based on the scene or through a learning method) are also uniformly distributed in time, the model for the overall number of false positives combines the contribution of stopping objects from (2) with the contribution of illumination changes from (3), minus a correction term; the last term in the equation excludes false positives that overlap in time.

Using this model, the number of false positives with respect to the learning rate is obtained for a threshold value of 20 pixels and an ROI of 441 pixels. The optimal learning rate from the resulting curve is approximately 0.0325, as shown in Figure 8.

Figure 8: Number of FP against learning rate for threshold = 20

The minimum number of false positives is related to the value of the threshold as per the above mathematical model. This exponentially decaying relation is shown in Figure 9. As indicated, the minimum number of false positives decreases as the threshold value is increased, but a larger threshold also increases the probability of false negatives in the system.

Figure 9: Minimum number of FP against different values of threshold for a 3000-frame video
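The optimal learning rate can also be located numerically once both terms of the model are available. The sketch below performs a simple grid search over α; the object term follows the reconstructed form of (2), while fp_illumination is a hypothetical stand-in for the fitted function of (3) (whose coefficients are not reproduced here), the overlap correction is omitted, and n_objects is an illustrative count of stopping objects.

```python
import math

def fp_object(alpha: float, n_th: int, m: int) -> float:
    """False-positive frames per stopping object, reconstructed form of (2)."""
    return -math.log(n_th / m) / (1.04 * alpha)

def fp_illumination(alpha: float, total_frames: int) -> float:
    """Hypothetical stand-in for the empirical fit of (3): false positives due
    to illumination changes grow with the learning rate and the clip length."""
    return total_frames * 0.1 * alpha          # placeholder, not the paper's fit

def total_fp(alpha, n_th=20, m=441, total_frames=3000, n_objects=5):
    f_o = n_objects * fp_object(alpha, n_th, m)
    f_i = fp_illumination(alpha, total_frames)
    return f_o + f_i                            # overlap correction omitted

# Grid search for the learning rate minimizing the modelled false positives.
alphas = [0.01 + 0.0005 * k for k in range(220)]   # roughly 0.01 .. 0.12
best = min(alphas, key=total_fp)
print(f"optimal learning rate = {best:.4f}")
```

Sweeping α in this way reproduces the trade-off behind Figure 8: too small a learning rate inflates the object term, while too large a rate inflates the illumination term.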
VII. CONCLUSION
A surveillance system based on Mixture of Gaussians background subtraction has been analysed in this work. The system is optimized for energy by minimizing the total number of false positives. The reasons for false positives in the system were discussed, and an optimization strategy based on a number of parameters associated with background modelling and the ROI was introduced. A mathematical model was also developed to determine the total number of false positives in the system.
REFERENCES
[1] M. Reiter and P. Rohatgi, "Homeland security guest editor's introduction," IEEE Internet Comput., vol. 8, no. 6, pp. 16–17, Nov./Dec. 2004, doi: 10.1109/MIC.2004.62.
[2] M. Valera and S. A. Velastin, "Intelligent distributed surveillance systems: A review," IEE Proc.-Vis. Image Signal Process., vol. 152, no. 2, pp. 192–204, Apr. 2005, doi: 10.1049/ip-vis:20041147.
[3] C. S. Regazzoni, V. Ramesh, and G. L. Foresti, "Scanning the issue/technology special issue on video communications, processing, and understanding for third generation surveillance systems," Proc. IEEE, vol. 89, no. 10, pp. 1355–1367, Oct. 2001, doi: 10.1109/5.959335.
[4] A. C. M. Fong and S. C. Hui, "Web-based intelligent surveillance system for detection of criminal activities," Comput. Control Eng. J., vol. 12, no. 6, pp. 263–270, Dec. 2001.
[5] H. A. Nye, "The problem of combat surveillance," IRE Trans. Mil. Electron., vol. MIL-4, no. 4, pp. 551–555, Oct. 1960, doi: 10.1109/IRETMIL.1960.5008289.
[6] M. Bramberger, A. Doblander, A. Maier, B. Rinner, and H. Schwabach, "Distributed embedded smart cameras for surveillance applications," Computer, vol. 39, no. 2, pp. 68–75, Feb. 2006, doi: 10.1109/MC.2006.55.
[7] Zhihai He, Yongfang Liang, Lulin Chen, Ishfaq Ahmad, and Dapeng Wu, "Power-Rate-Distortion Analysis for Wireless Video Communication Under Energy Constraints," IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 5, May 2005.
[8] Zhihai He and Dapeng Wu, "Resource Allocation and Performance Analysis of Wireless Video Sensors," IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, no. 5, May 2006.
[9] Junni Zou, Hongkai Xiong, Chenglin Li, Ruifeng Zhang, and Zhihai He, "Lifetime and Distortion Optimization With Joint Source/Channel Rate Adaptation and Network Coding-Based Error Control in Wireless Video Sensor Networks," IEEE Trans. on Vehicular Technology, vol. 60, no. 3, Mar. 2011.
[10] Chris Stauffer and W. E. L. Grimson, "Adaptive Background Mixture Models for Real-Time Tracking," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1999.
[11] Massimo Piccardi, "Background Subtraction Techniques: A Review," in Proc. IEEE International Conference on Systems, Man and Cybernetics, 2004.