Fedotov N., Moiseev A., Romanov S., Kolchugin A., Smolkin O.

FEATURE SPACE MINIMIZATION IN TRIPLE FEATURES
COMPUTERIZED GENERATION PROCEDURE1
N. Fedotov2, A. Moiseev3, S. Romanov2, A. Kolchugin2, O. Smolkin2
2 Penza State University,
Krasnaya 40, Penza, 440026 Russia, e-mail: [email protected]
3 All-Russian Distance Institute of Finance and Economics,
Kalinina 33-B, Penza, 440052 Russia, email: [email protected]
We consider an approach to generating a large number of pattern recognition features based on stochastic geometry methods. The resulting problem of minimizing the feature space and selecting the most informative features is solved using the Karhunen-Loeve decomposition.
Introduction
Theoretical work in pattern recognition mainly focuses on methods for building decision procedures and on the corresponding mathematical tools. The features used to classify patterns are most often assumed to be known or measured. In practical pattern recognition problems, however, particularly the recognition of patterns presented as graphical images, extracting informative quantitative features is no easier than building the decision procedure. Extracting a feature from an image can be regarded as extreme information compression: a single scalar value is assigned to an image consisting of a vast number of pixels.
Traditionally, feature forming is regarded as a purely empirical task. The apparatus of stochastic geometry does more than provide a theoretical description of this stage of recognition. On its basis we propose a universal method for forming a large number of new constructive features for patterns presented as images.
The prominent characteristic of the features formed by our method is their structure as a composition of three functionals [1–3]. We therefore call them triple features. The three-functional structure makes it possible to generate a large number of features automatically, which enhances our ability to solve recognition problems, including those with a large number of classes, such as hieroglyph recognition, nano-object recognition, and technical flaw detection.
_______________________________________________________________________
1 This work is supported by RFBR, Project No. 05-01-00991
At present, more than 200 functionals from different fields of mathematics have been found suitable for forming recognition features. This makes it possible to obtain thousands of features in computerized generation mode.
Since the features are generated automatically and in abundance, some of them duplicate each other. This raises the important task of selecting a subset of features that is sufficient to separate objects into the given classes.
It is significant that the features have no a priori meaning but contain the information necessary for recognition.
In this work we show that the feature space minimization problem can be solved using the Karhunen-Loeve decomposition.
Triple feature forming
The key element of the theory of triple features is the so-called trace transformation, which scans an image along given trajectories. The theory of the trace transformation is discussed in detail in the authors' previous articles [2, 3, 4, 5]. The most useful version for practical applications is the discrete trace transformation, performed on a discrete scanning lattice.
Let $F(x, y)$ be an image on the plane $(x, y)$. We place a scanning line $l(\theta, p, t)$ on the plane, specified by the normal coordinates $\theta$ and $p$:
$$x \cos\theta + y \sin\theta = p,$$
where $t$ specifies a point on the line. Consider the intersection of $F(x, y)$ with the scanning line $l(\theta, p, t)$. We define the function $g(\theta, p) = \mathrm{T}(F \cap l(\theta, p, t))$ as the result of applying the functional $\mathrm{T}$ to the intersection of the image with the scanning line for fixed $\theta$ and $p$. In the discrete case, the parameters of the scanning line form two discrete sets $\{\theta_1, \theta_2, \ldots, \theta_n\}$ and $\{p_1, p_2, \ldots, p_n\}$. As a result, the functional $\mathrm{T}$ gives a matrix with elements $t_{ij} = \mathrm{T}(F \cap l(\theta_j, p_i, t))$, so that each column corresponds to one direction $\theta_j$. Deterministic scanning provides an unambiguous value for each matrix element. We call this matrix the trace-transform.
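The discrete scan described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `trace_transform`, the choice of the image centre as the origin, the unit step along $t$, and nearest-pixel sampling are all assumptions made for illustration.

```python
import math

def trace_transform(image, thetas, ps, functional):
    """Return the matrix t[i][j] = T(F ∩ l(theta_j, p_i, t)).

    `image` is a 2-D list of pixel values and `functional` plays the role
    of the trace functional T, applied to the samples along one line.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0           # origin at the image centre (assumption)
    t_max = int(math.ceil(math.hypot(w, h) / 2.0))  # long enough to cross the whole image
    matrix = []
    for p in ps:
        row = []
        for theta in thetas:
            c, s = math.cos(theta), math.sin(theta)
            samples = []
            for t in range(-t_max, t_max + 1):
                # Point on the line x*cos(theta) + y*sin(theta) = p at offset t.
                x = p * c - t * s + cx
                y = p * s + t * c + cy
                col, r = int(round(x)), int(round(y))  # nearest pixel (crude for oblique lines)
                if 0 <= r < h and 0 <= col < w:
                    samples.append(image[r][col])
            row.append(functional(samples))            # may receive an empty list
        matrix.append(row)
    return matrix

# A 3x3 block of ones in a 5x5 image, scanned in two directions.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
trace = trace_transform(image, [0.0, math.pi / 2], [-2, -1, 0, 1, 2], sum)
```

Here `sum` stands in for one possible trace functional (the total intensity along the line); any callable that maps a list of samples to a number could be passed instead, provided it accepts an empty list for lines that miss the image.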
Note that the trace functional need not be defined only by properties of the section of the image by the scanning line (the number of intersections, the sum of the intersection lengths, etc.). Information about the neighborhood of this section can also be involved in computing the trace functional. This is especially relevant for scanning grayscale images.
The trace transformation is the first stage of triple feature forming. The computation continues with the application of the diametrical functional $\mathrm{P}$ to each column of the matrix. As a result we obtain a $2\pi$-periodic curve (or a vector in the discrete case). Further information compression is performed by the circus functional $\Theta$, which yields a single number: the feature of the image.
Thus, a new triple feature is computed as a consecutive composition of three functionals:
$$\Pi(F) = \Theta \circ \mathrm{P} \circ \mathrm{T}(F \cap l(\theta, p, t)),$$
where each functional ($\Theta$, $\mathrm{P}$ and $\mathrm{T}$) acts on a function of one variable ($\theta$, $p$ and $t$, respectively).
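The last two stages of this composition can be sketched in a few lines of Python. The sketch assumes the trace matrix has already been computed and is stored with one column per direction $\theta_j$; the function name `triple_feature` and the example functionals are illustrative choices, not taken from the authors' library.

```python
def triple_feature(trace_matrix, diametric, circus):
    """Apply P to every column of the trace matrix, then Theta to the result."""
    n_cols = len(trace_matrix[0])
    # Column j holds the values for a fixed direction theta_j and all offsets p_i.
    columns = [[row[j] for row in trace_matrix] for j in range(n_cols)]
    h = [diametric(col) for col in columns]  # samples of the 2*pi-periodic curve
    return circus(h)                         # a single number: the feature

# A small trace matrix: 5 offsets p_i (rows) by 2 directions theta_j (columns).
trace_matrix = [[0, 0], [3, 3], [3, 3], [3, 3], [0, 0]]

def mean(values):
    return sum(values) / len(values)

feature_a = triple_feature(trace_matrix, max, mean)  # P = max, Theta = mean
feature_b = triple_feature(trace_matrix, sum, max)   # P = sum, Theta = max
```

Swapping the two callables changes the feature without touching the scanning stage, which is what makes automatic generation from a library of functionals straightforward.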
Triple features are generated formally from a collected library of functionals on the learning sample, without taking into account the geometrical meaning or other a priori characteristics of the features. We then select a small number of the most informative features according to certain criteria. Feature selection is often called feature space minimization and relies on mathematical statistics and information theory. The main advantage of this approach is its universality: it can be used when it is difficult to specify the concrete geometrical characteristics important for classification (which we believe is typical of the majority of applications). Its main disadvantage is the high computational cost of training the recognition system, since thousands of features must be generated in order to select a small number of the most informative ones.
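To show how the number of features grows with the size of the library, the following Python sketch enumerates every (T, P, Θ) combination from three small dictionaries of functionals. The libraries, the function name `generate_triple_features`, and the input layout (`line_samples[i][j]` holding the pixel values met along the line with parameters $\theta_j$, $p_i$) are hypothetical and serve only as an illustration.

```python
from itertools import product

def generate_triple_features(line_samples, t_lib, p_lib, theta_lib):
    """Compute one feature per (T, P, Theta) triple drawn from the libraries."""
    features = {}
    for (t_name, T), (p_name, P), (c_name, C) in product(
            t_lib.items(), p_lib.items(), theta_lib.items()):
        trace = [[T(samples) for samples in row] for row in line_samples]
        h = [P(list(col)) for col in zip(*trace)]  # P acts on each column (fixed theta)
        features[(t_name, p_name, c_name)] = C(h)  # Theta compresses the curve to a number
    return features

def mean(values):
    return sum(values) / len(values)

# Hypothetical miniature libraries; the paper reports more than 200 functionals.
t_lib = {"sum": sum, "max": max}
p_lib = {"max": max, "sum": sum}
theta_lib = {"mean": mean, "min": min}

# Two offsets (rows) by two directions (columns), each cell a list of samples.
line_samples = [[[1, 1], [0]],
                [[1], [1, 1, 1]]]

features = generate_triple_features(line_samples, t_lib, p_lib, theta_lib)
```

With libraries of sizes a, b and c the loop produces a·b·c features (8 in this toy case), which is why the count quickly reaches the thousands and a minimization step becomes necessary.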
Feature space minimization
To minimize the feature space after generation, we developed a procedure that searches for the minimal set of the most effective features, based on the coefficients of the Karhunen-Loeve decomposition.
The discrete form of the Karhunen-Loeve decomposition is used because it has the following optimum properties:
- it minimizes the root-mean-square error when only a finite number of basis functions is used in the decomposition;
- it minimizes the entropy function expressed through the variances of the decomposition coefficients.
Suppose the patterns are to be classified into $k$ classes $\omega_1, \omega_2, \ldots, \omega_k$. We denote the sample of values of the $s$ features belonging to one of the classes $\omega_i$, $i = 1, \ldots, k$, as
$$x_i = \begin{pmatrix} x_i(t_1) \\ x_i(t_2) \\ \vdots \\ x_i(t_s) \end{pmatrix}.$$
The discrete form of the generalized Karhunen-Loeve decomposition can be expressed as $x_i = \sum_{j=1}^{s} c_{ij} \varphi_j$, or in matrix form $x_i = \Phi c_i$, where the coefficients $c_{ij}$ are expected to satisfy $E\{c_{ij}\} = 0$. The expectation operator is taken over all values $c_{ij}$. The correlation matrix is defined by
$$R = \sum_{i=1}^{k} p(\omega_i)\, E\{x_i x_i^{T}\}, \qquad (1)$$
where $p(\omega_i)$ is the estimated probability of occurrence of the $i$-th class, $i = 1, 2, \ldots, k$.
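Formula (1) can be estimated from a learning sample as in the following Python sketch. It assumes that the expectation for each class is replaced by the average over that class's sample vectors; the function name `correlation_matrix` and the data layout are illustrative.

```python
def correlation_matrix(class_samples, priors):
    """Estimate R = sum_i p(w_i) * E{x_i x_i^T} from per-class sample vectors.

    class_samples[i] is a list of feature vectors of class i (each of length s);
    priors[i] is the estimated probability of occurrence of class i.
    """
    s = len(class_samples[0][0])
    R = [[0.0] * s for _ in range(s)]
    for samples, prior in zip(class_samples, priors):
        n = len(samples)
        for x in samples:
            for a in range(s):
                for b in range(s):
                    # Sample average of the outer product x x^T, weighted by the prior.
                    R[a][b] += prior * x[a] * x[b] / n
    return R

# Two classes, two features; equal class probabilities.
class_samples = [
    [[1.0, 0.0], [1.0, 0.0]],  # class 1
    [[0.0, 2.0]],              # class 2
]
R = correlation_matrix(class_samples, [0.5, 0.5])
```

The result is a symmetric s-by-s matrix, so its eigenvalues are real and can be obtained with any standard symmetric eigenvalue routine.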
The decomposition coefficients are given by
$$\Phi c_i = x_i \;\Rightarrow\; \Phi^{T}\Phi c_i = \Phi^{T} x_i \;\Rightarrow\; c_i = \Phi^{T} x_i,$$
since $\Phi^{T}\Phi = I$ owing to the orthonormality of the coordinate vectors that form the matrix $\Phi$. The theoretical justification of the Karhunen-Loeve decomposition is given in [6]; we therefore turn to the algorithm that searches for the minimal set of informative recognition features based on the Karhunen-Loeve decomposition coefficients.
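The selection rule that this section arrives at (keep the components with the largest variances $D_j$ until their sum reaches the fraction $v$ of the total $S$) can be sketched in Python as follows. The function name `select_informative` and the example variances are hypothetical; the variances are assumed to have been obtained already by diagonalizing $R$.

```python
def select_informative(variances, v=0.9):
    """Return indices of the selected components, in order of decreasing variance.

    Components are added until the accumulated variance reaches v * S,
    where S is the sum of all variances.
    """
    order = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    threshold = v * sum(variances)
    selected, accumulated = [], 0.0
    for j in order:
        if accumulated >= threshold:
            break                      # the selected variances already reach v * S
        selected.append(j)
        accumulated += variances[j]
    return selected

variances = [5.0, 0.5, 3.0, 1.5]       # hypothetical eigenvalues D_j, S = 10
kept = select_informative(variances, v=0.9)
```

A larger $v$ keeps more components and discards less variance, while a smaller $v$ gives a more compact feature set.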
We denote the sample of values of feature $j$ ($j = 1, \ldots, s$) for objects of class $\omega_i$, $i = 1, \ldots, k$, as $x_{ji}$. We form the matrix of expectations as follows:
$$\begin{pmatrix} E[x_{11}] & E[x_{12}] & \cdots & E[x_{1k}] \\ E[x_{21}] & E[x_{22}] & \cdots & E[x_{2k}] \\ \vdots & \vdots & \ddots & \vdots \\ E[x_{s1}] & E[x_{s2}] & \cdots & E[x_{sk}] \end{pmatrix},$$
where $E[x_{ji}]$ is the average value of the $j$-th feature for the $i$-th class.
For a system of $s$ features we compute the correlation matrix according to (1). Diagonalizing the matrix $R$, we obtain the eigenvalues $D_j$ ($j = 1, 2, \ldots, s$). These values are simply the variances of the new feature system $\varphi_j$. The values $D_j$ ($j = 1, \ldots, s$) are ordered so that
$$D_1 \ge D_2 \ge \ldots \ge D_{p-1} \ge D_p \ge \ldots$$
When the coordinate functions $\varphi_j$ are arranged in descending order of their eigenvalues $D_j$ ($j = 1, \ldots, s$), the decomposition coefficients are also arranged in order of decreasing separating power. The first one carries the largest amount of information. That is, if the functions $\varphi_r$ and $\varphi_l$ correspond to the variances $D_r$ and $D_l$, and $D_r > D_l$, then feature $x_r$ has better separating power than feature $x_l$: it carries more information.
To exclude the least informative features, we find the sum of all variances $S = \sum_{j=1}^{s} D_j$. We include features in the informative set in order of decreasing variance until the sum of the selected variances reaches $vS$. Our experiments show that the optimum value of $v$ lies in the range $0.8 \le v \le 0.95$, depending on the required classification precision.
Conclusion
The theory of triple features makes it possible to obtain a large number of features through a computerized generation procedure. A minimization procedure is then used to select the most informative features from the generated set.
Both generation and minimization are performed automatically, which is an undoubted advantage of the theory of triple features.
References
1. Fedotov N.G. Stochastic geometry methods in pattern recognition. – Moscow: Radio i Svyaz, 1990
(in Russian).
2. Fedotov N.G. The Theory of Image Recognition
Features Based on Stochastic Geometry // Pattern
Recognition and Image Analysis. – 1998. – Vol. 8.
– No. 2. – pp. 264–267.
3. Fedotov N.G., Kadyrov A. A. Image Scanning in
Machine Vision Leads to New Understanding of Image // Proc. of 5th Int. Workshop of Digital Image
Processing and Computer Graphics. – Samara, Russia: Held by the Int. Society for Optical Engineering
(DIP’94), SPIE, 1994. – Vol. 2363.
4. Fedotov N.G., Shulga L.A., Moiseev A.V., Kolchugin A.S. New geometrical dual trace-transformation and its application to nonlinear image
filtration // Artificial intelligence, 2006. – No. 2. – pp.
117–120 (in Russian).
5. Fedotov N.G., Shulga L.A., Moiseev A.V., Kolchugin A.S. Pattern Recognition Feature and Image
Processing Theory on the Basis of Stochastic Geometry // Proc. of the 2nd Int. Conf. on Informatics in
Control, Automation and Robotics, ICINCO 2005,
Barcelona, Spain, September 2005. — Vol. III, p.
187—192.
6. Tou J., Gonzalez R. Pattern recognition principles.
— Addison-Wesley, 1974.