Human Detection Based on Covariance Features and SVM

Recent Researches in Electrical Engineering
Human Detection Based on Covariance Features and SVM
YAHIA SAID, MOHAMED ATRI
Laboratory of Electronics and Microelectronics (EμE) -LAB 99 ES 30
Faculty of Sciences of Monastir
University of Monastir, 5000
TUNISIA
[email protected]
Abstract: - Detecting pedestrians is a challenging problem owing to the motion of the subjects, the camera and
the background and to variations in pose, appearance, clothing, illumination and background clutter.
The Region Covariance Matrix (RCM) descriptors show experimentally significantly out-performs existing
feature sets for pedestrian detection. In this paper, we present an efficient features extraction scheme: the
Integral CovReg, inspired from Region Covariance Matrix (RCM) descriptors, combined with SVM classifier
for pedestrian detection.
Key-Words: - Pedestrian detection; Image descriptor; Integral CovReg; SVM.
normalization in overlapping descriptor blocks are
all important for good results.
In [2], Tuzel et al. introduce covariance features
as fast region descriptors for performing pedestrian
detection task and it has achieved promising results.
Human appearance is modeled by a dense set of
features variances extracted inside a detection
window, their correlations with each other and their
spatial layout. It can fuse different types of features
(image intensities, derivatives, etc.), while
producing a compact representation. The
performance of the covariance descriptor is found to
be superior to other methods, as rotation and
illumination changes are absorbed by the covariance
matrix [3],[4].
In this paper, we propose a single frame human
detection system which follows the path of Dalal et
al.[1] and Tuzel et al.[2] This detection system is
based on Integral Covariance Region descriptor
combined with Support Vector Machines [5] for the
recognition stage. SVM is a classification method,
which has proved to be very efficient in such case of
high dimensional data. It has been developed to
detect a human centered in a 128 × 64 single image.
The implementation of our system was based on MS
VC++ and OpenCv [6] which is an Open source
computer vision library, originally developed by
Intel, specialized in the real time image processing.
This paper is organized as follows: section 2
describes related works in human detection systems.
In section 3, we detail the single frame detector and
we give more precision about the Integral RegCov
1 Introduction
Human detection is one of the most challenging
problems in computer vision due to large variations
caused by body articulation, illumination conditions,
occlusions and appearance. At the same time
detecting and tracking people has a wide range of
applications including automotive safety, video
surveillance, assisted living and robotics.
Consequently, there is an increasing interest on this
research field in the last years and a vast literature.
The pattern recognition field applied to the
human detection is faced with nature even of the
human. Various variabilities enter in account:
variability on the scale, of posture and appearance.
Indeed, the human can be positioned with various
sites in the image, under various aspects and point
of view. The first need is a robust feature set that
allows the human form to be discriminated cleanly,
even in cluttered backgrounds under difficult
illumination.
Dalal and al. [1] presented a human detection
algorithm with excellent detection results. Their
algorithm uses a dense grid of Histograms of
Oriented Gradients (HOG) to represent a detection
window. The Histogram of Oriented Gradient
(HOG) [1] descriptors provided better performance
than other existing feature sets. Each stage of the
HOG computation has much influence on
performance, concluding that fine-scale gradients,
fine orientation binning, relatively coarse spatial
binning,
and
high-quality
local
contrast
ISBN: 978-960-474-392-6
311
Recent Researches in Electrical Engineering
descriptor and its parameters. Results are discussed
in paragraph 4 and, finally, section 5 ends the paper
with some final remarks.
results show that their approach method was
effective.
Region covariance matrix (RCM) [2] is among
the state-of-the-art features for human detection.
RCM is a matrix of covariance of some image
statistics computed inside the region of an image. It
shows good discriminative power and elegant
robustness. For example, Wu et al. [3] and Hussein
et al. [4] show that the covariance matrix has the
best discriminative power for human detection task
compared with other descriptors like HOG [1] and
the Edgelet features [8].
2 Related Works
Many different approaches for human detection
have been proposed in the literature. Lowe [7] uses
a description with orientation of the gradient around
interest points. The research of the interest points is
carried out of multi-resolution, which makes it
possible to define a variable size for the vicinity
interest points. The size depends on the scale to
which the interest point is detected. The image is
then characterized by a whole of local histograms.
By regarding these data as "bags of words", we lose
then the data space component, which reduces the
method relevance.
Wu et al. [8] introduced edgelet features, which
are a new type of silhouette oriented features. Part
detectors, based on these features, are learned by a
boosting method. Responses of part detectors are
combined to form a joint likelihood model that
includes cases of multiple, possibly inter-occluded
humns.
The most successful and popular descriptor is
histograms of oriented gradients (HOG) proposed
by Dalal et al. [1]. The method is founded on a
description of the image using histograms of
gradient orientation. These histograms are
calculated on the whole image, but are defined on a
local area. The image description is a vector
containing all the calculated histograms. This vector
brings, on the one hand, local information for each
area and makes it possible, on the other hand, to
preserve a space fitting, the vectors being calculated
systematically by the same process.
A complete system for pedestrian detection
based on HOG descriptors was introduced by Suard
and al. [9]. Their system deals with stereo infrared
images. The system first locates positions of interest
within the larger view field where humans could
possibly be located. Then normal Support Vector
Machine classifiers operate on the HOG descriptors
taken from these smaller positions of interest to
formulate a decision regarding the presence of a
pedestrian. This approach gives good results for
window classification, and a preliminary test
applied on a video sequence proves that this
approach is very promising.
Zhang et al. [10] optimized the HOG features to
achieve an accurate human detection system. They
don’t normalize the input detection windows but
resize the cell and block by same ratio. Simulation
ISBN: 978-960-474-392-6
3 Overview of the Proposed Method
3.1 Integral CovReg Features
The proposed detection algorithm operates under a
sliding window philosophy, where a window is a
rectangular region of size 128×64 pixels that scans
the entire image with a horizontal and vertical stride
of 8 pixels. In a similar way to the HOG descriptor
[1], each detection window is divided into Cells of
size 8×8 pixels and each group of 2×2 cells is
integrated into a Block in a sliding fashion. For
every Block, a covariance matrix of image features
is calculated.
We give, below, followed steps to calculate the
Integral CovReg descriptor:

Features Extraction:
On the detection window we perform features
extraction, sequentially for each pixel, in a row-wise
fashion. For each pixel we extract:
(1)
where Gx, Gxx, ... are first- and second-order
derivatives of the image intensity, and the last term
is the edge orientation. With this definition, the
detection window is mapped to a d = 6-dimensional
feature image. The covariance descriptor of a Block
is a 6×6 matrix.
 Integral Representation:
Covariance matrices can be computed in a very fast
way by adopting integral representations under the
form of tensors [2]. We have to compute the firstorder integral tensor P which is a d-dimensional
vector encoding the sum of each feature
of all
pixels in the Block:
312
Recent Researches in Electrical Engineering
The basic kernels are:
• Linear:
(2)
and then Q, the d×d second-order integral tensor
defined by
(6)
•
Polynomial:
(3)
(7)
 Covariances Computation:
The covariance matrix of Block B is
•
Radial Basis Function (RBF):
(4)
(8)
where S = 16×16 is the number of pixels inside the
Block B, PB is the d-dimensional vector of tensors,
and QB is the d×d dimensional matrix of the secondorder tensors.
The covariance descriptor of a Block is a
symmetric positive definite 6×6 matrix and due to
symmetry only the upper triangular or lower
triangular part is stored, which has only 21 different
values.
An image of 128×64 pixels contains 7×15 blocks
with multiple overlapping. We assemble the
vectorized covariance matrices obtained for all
blocks in a single 1-D vector of 2205 components.
This is our final Integral RegCov descriptor.
•
Sigmoid:
(9)
In our experiments, we use Linear kernel to classify
the feature data of train set.
4 Experiments and Results
We evaluated our system with two different static
human date sets:
• The MIT pedestrian database [11]: contains
709 standing human images taken in city
streets. The set only contains images
featuring the front or back of human figures
and contains little variety in human pose.
3.2 SVM Classifier
Vladimir Vapnik proposed the Support Vector
Machine (SVM) in 1995 [5]. Since this time, the
SVM have been largely used for the pattern
recognition, the regression and the density
estimation.
Given a training data set (
,
where are the training examples Integral RegCov
feature vector and
the class label, the SVM
require the solution of the following optimization
problem :
“Fig. 1,” shows some sample images from the MIT
pedestrian data set.
•
The INRIA database [12]: proposes
examples of human in varied postures, with
different appearances and under very
diverse scenes and luminosity conditions.
“Fig. 2,” shows some sample images from the
INRIA static person data set.
Subject to
(5)
We selected 2416 images from INRIA data set as
positive training examples, and 4120 images as
negative training examples. For the test, we use
1178 images as positive examples, and 452 images
as negatives examples from INRIA data set and 709
images from MIT data set.
The method consists in mapping
in a high
dimensional space owing to the function . Then
SVM looks for an optimal decision function of the
.
form
The kernel function is
.
ISBN: 978-960-474-392-6
313
Recent Researches in Electrical Engineering
Integral RegCov+SVM are performing better than
the Integral HOG+SVM.
“Fig. 4,” shows sample detections of the proposed
Integral RegCov detector.
TABLE I. TEST RESULT TABLE
MIT data
set
Model
INRIA
positive test
samples
INRIA
negative test
samples
1.1), (1.2),…, (2.1), (2.2),… depending on your
100%
85.9%
99.1%
NormalSections.
HOG [1]
various
Optimized HOG [10]
Fig. 1. Some sample images from the MIT pedestrian data set
Integral HOG [13]
Our method
(709/709)
(1012/1178)
(448/452)
100%
99.07%
98.89%
(709/709)
(1167/1178)
(447/452)
100%
94.48%
97.56%
(709/709)
(1113/1178)
(441/452)
100%
98.38%
98.45%
(709/709)
(1159/1178)
(445/452)
Fig. 2. Some sample images from the INRIA static person data set
This system is executed in Visual Studio 2010 with
OpenCV library and synchronized on a 2.0GHz
Dual-core Laptop. The test result on test set,
compared to [1], [10] and [13], is shown in table 1.
From the table 1, we can say that our results are
comparable in terms of accuracy with Normal HOG
[1] and Optimized HOG [10] features, and
outperforms the Integral HOG [13] features. To
quantify the performance of the detectors we plot
the Receiver Operating Characteristic (ROC) curve.
Each point on the ROC curve represents a
sensitivity/specificity pair corresponding to a
particular decision threshold. We tested the Integral
HOG and Integral RegCov detectors with the
INRIA person database.
The true positive rate (Sensitivity) is plotted in
function of the false positive rate (100-Specificity)
for different cut-off points of a parameter.
The ROC curves (Fig. 3) show the performances
achieved by the two methods. The features with
ISBN: 978-960-474-392-6
Fig. 3 ROC curve of various detectors
4 Conclusion
In this paper, we present a human detection system
using Integral Region Covariance features combined
with SVM classifier.
To evaluate the proposed method, we have used two
different static human date sets: MIT and INRIA
pedestrian databases. Experimental results have
314
Recent Researches in Electrical Engineering
demonstrated the effectiveness and efficiency of the
proposed algorithm.
Future works include the use the proposed
scheme for detecting other objects (cars, animals...)
and/or tracking of the detected objects.
[6] http://sourceforge.net/projets/opencvlibrary/
[7] David Lowe. “Distinctive image features from
scale invariant keypoints”. International Journal
of ComputerVision, 60(2) :91–110, 2004.
[8] B. Wu and R. Nevatia. “Detection of multiple,
partially occluded humans in a single image by
bayesian combination of edgelet part
detectors”. IEEE International Conference on
Computer Vision, pages 90-97, 2005.
[9] F. Suard, A. Rakotomamonjy, and A.
Bensrhair. “Pedestrian Detection using Infrared
images and Histograms of Oriented
Gradients”.In Intelligent Vehicles Symposium
2006, June 13-15, 2006, Tokyo, Japan
[10] G.Zhang, F.Gao, C.Liu, W.Liu, H.Yuan. “A
Pedestrian Detection Method Based on SVM
Classifier and Optimized Histograms of
Oriented Gradients Feature”. In 2010 Sixth
International
Conference
on
Natural
Computation (ICNC 2010)
[11] http://cbcl.mit.edu/cbcl/software
datasets/PedestrianData.html
[12] http://pascal.inrialpes.fr/data/human/
[13] Yahia Said, Mohamed Atri and Rached
Tourki. “Human Detection Based on Integral
Histograms of Oriented Gradients and
SVM”. International
IEEE
Conference
(CCCA’11), Hammamet, Tunisia, March 0305, 2011.
References:
[1] N. Dalal and B. Triggs. “Histograms of
oriented gradients for human detection”. IEEE
Conference on Computer Vision and Pattern
Recognition, pages 886-893, 2005
[2] Oncel Tuzel, Fatih Porikli, Peter Meer,
Pedestrian detection via classification on
riemannian manifolds, IEEE Transactions on
Pattern Analysis and Machine Intelligence 30
(10) (2008) 1713–1727
[3] B. Wu and R. Nevatia. Optimizing
discrimination efficiency tradeoff in integrating
heterogeneous local features for object
detection. In CVPR, 2008.
[4] Mohamed Hussein, Fatih Porikli and Larry
Davis.“A
Comprehensive
Evaluation
Framework and a Comparative Study for
Human Detectors”. IEEE TRANSACTIONS
ON INTELLIGENT TRANSPORTATION
SYSTEMS, VOL. 10, NO. 3, SEPTEMBER
2009.
[5] V.N.Vapnik. "The Nature of Statistical
Learning Theory". Springer-Verlag, 1995.
Fig. 4. Examples of detection results using Integral RegCov. Detected humans are enclosed in rectangles.
ISBN: 978-960-474-392-6
315