Extracting Recurring Patterns from an Image
Pengyu Hong, Thomas Huang
Beckman Institute
405 N. Mathews
University of Illinois at Urbana Champaign
Urbana, IL 61802, USA
{hong, huang}@ifp.uiuc.edu
ABSTRACT

This paper presents preliminary research on extracting recurring patterns from an image. In practice, recurring patterns correlate strongly with the objects that appear repeatedly in the image. We assume that the patterns undergo only shift transformations. The main idea of our method is to use local context information to detect 2D distinguishable sub-patterns (DSPs), which provide the spatial information of the patterns in the image with a high confidence level. The 2D DSPs are the distinctive features of the patterns. The computation then focuses on the signals around those DSPs and eventually grows the DSPs into whole patterns. To handle noisy images, we develop a dynamic local K-means algorithm to quantize the image. The shapes and positions of the DSPs are found in the quantized image, and the pattern-growing procedure is performed on the original noisy image based on the information provided by the DSPs.

Keywords: Recurring pattern, Distinguishable sub-pattern, Dynamic local K-means.
1. INTRODUCTION
In this paper, we present a novel approach for extracting recurring patterns from an image that might be formed by placing several small images side by side. We assume that the patterns undergo only shift transformations. Previous pattern learning approaches represent each example as a linear combination of component vectors; these include Principal Component Analysis (PCA) [1], Factor Analysis (FA) [2], Independent Component Analysis (ICA) [4, 5], and Transformed Component Analysis (TCA) [3]. Given a set of arbitrary images without prior knowledge about the sizes, positions, and numbers of the patterns in the images, it is very difficult to decide the number and the size of the component vectors.
In the 1D case, Hong et al. [6] proposed an iterative coarse-to-fine technique to extract multi-temporal sequence patterns from a signal sequence consisting of pattern signals and non-pattern signals, where both kinds of signals share the same value range. An important observation is that a pattern can be decomposed into ambiguous sub-patterns (ASPs) and distinguishable sub-patterns (DSPs) (Figure 1). The DSPs represent the distinctive features of a pattern, which distinguish it from non-pattern signals and from other patterns. The ASPs represent the context information that makes the DSPs the distinctive features of the patterns. Based on this observation, an iterative coarse-to-fine search is performed to extract the patterns. The algorithm first identifies the DSPs by learning the local context information. It then focuses on the signals around those DSPs and uses a maximum likelihood criterion to grow them into whole patterns.
In this paper, we extend the above method to handle 2D signals (images). A method similar to [6] is used to find the 2D DSPs. To handle noisy images, we assume the noise distribution is Gaussian. The noisy image is quantized by applying dynamic local K-means based on the pixels' local context information, and the DSP candidates are found in the quantized image. Those DSP candidates are classified into classes based on their areas and color moments. For each class, we use a template-matching technique to decide the transformations that best align the DSP candidates in the class. The true shape of the DSP is calculated from the aligned DSP candidates, and the transformations give us the spatial information of the patterns in the image. The computation then focuses on the corresponding local areas around the DSPs in the original image to extract the whole patterns.
The most significant advantage of our approach is that the search space of the shift transformations is not predefined. Instead, it is determined by the positions of the DSPs. Thus the computational complexity of searching for the transformations is low, and the chance of falling into an unreasonable local minimum is reduced dramatically. The experimental results show that this approach is promising.
2. PATTERN EXTRACTION

In this section, we briefly review the 1D pattern extraction technique and then extend it to noise-free 2D images.

Figure 1. A pattern consists of several ASPs (A1, A2, …, An) and DSPs (D1, D2, …, Dn).
2.1. Extracting Patterns from 1D Signals

In [6], a method is proposed to blindly extract multi-temporal sequence patterns from a signal sequence. A pattern is defined as a segment of signals that recurs at least a certain number of times PT (the population threshold) and satisfies a minimum length requirement. The population threshold PT is chosen by the user: if PT is set too high, some or even all of the patterns will be missed; if it is set too low, some non-pattern segments will be extracted.
First, an Elman network (a simple recurrent neural network) [7] is trained as a one-step predictor to learn the local context information of the signals. The trained Elman network is used to re-estimate the training signals. By examining the output of the trained network, a set of DSPs is selected from the training signal sequence. These DSPs are the segments of the signals that are well predicted by the trained Elman network and whose populations are larger than PT. The algorithm then focuses on the signals around the DSPs and uses a maximum likelihood criterion to expand them into whole patterns.
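As a rough illustration of this selection step (a sketch under our own assumptions, not the authors' implementation), the code below marks maximal runs of well-predicted samples and keeps those whose content recurs at least PT times; the tolerance `tol` and minimum length `min_len` are hypothetical parameters.

```python
import numpy as np

def find_1d_dsps(signal, predicted, tol=0.5, min_len=3, pt=2):
    """Collect well-predicted runs whose content recurs at least `pt` times."""
    well = np.abs(signal - predicted) <= tol       # samples the predictor got right
    runs, start = [], None
    for i, ok in enumerate(well):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(well)))
    dsps = []
    for s, e in runs:
        if e - s < min_len:
            continue
        seg = signal[s:e]
        # Population: how often this segment occurs anywhere in the signal.
        windows = np.lib.stride_tricks.sliding_window_view(signal, e - s)
        population = int(np.sum(np.all(windows == seg, axis=1)))
        if population >= pt:
            dsps.append((s, e, population))
    return dsps
```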
A simple example illustrates how the technique works. We sampled a piece of music for its pitch level at every quarter note. The music has 11 pitch levels; we represent the silence and the 11 pitch levels as values from 0 to 11. Figure 2(a) shows the music signal (solid line) and the corresponding output of the trained Elman network (dashed line). Four DSPs (Figure 2(b)) are found by examining the output of the trained Elman network; each DSP repeats twice in the signal sequence. After growing, some DSPs merge together, and we obtain the single pattern shown in Figure 2(c). The pattern recurs twice in the sequence.
Figure 2. Extracting patterns from music: (a) the music signal and the predictor output; (b) the four DSPs found; (c) the extracted pattern.

2.2. Extracting Patterns from 2D Signals

We first extend the above method to noise-free images; noisy images are handled later. In our experiments, the value range of the image is 0-255. It turns out to be difficult to train the Elman network on real images because (1) the number of hidden units required becomes very large, and (2) the data are usually not evenly distributed. We therefore replace the Elman network by estimating the value of a pixel $X$ as $x = \arg\max_{x_i} P(x_i \mid X_N)$, where $X_N$ stands for the neighbors of the pixel $X$ and $P(x_i \mid X_N)$ is the probability of $x_i$ given $X_N$. We expect that only the values of the pattern signals can be well estimated, because a pattern pixel, given its context information, recurs more frequently than non-pattern signals. Hence, we set $\arg\max_{x_i} P(x_i \mid X_N) = -1$ if the number of times $x_i$ recurs given $X_N$ is smaller than the population threshold.

Similar to the 1D case, we predict the value of each pixel from its neighboring pixels and collect the segments of the image that are correctly estimated. Segments whose populations are smaller than the predefined population threshold are discarded; the remaining segments are selected as the 2D DSPs. To grow the 2D DSPs, the non-DSP pixels adjacent to the borders of the DSPs are examined. We first select the DSP with the maximum number of recurrences. The conditional probability $P(x_i \mid \mathrm{DSP}, u)$ and the population of each pixel $X$ are calculated, where the parameter $u$ denotes the relative position of $X$ with respect to the DSP. Pixels that have the same relative position to the same DSP in the image are compared, and the one with the maximum conditional probability is added to the DSP if its population exceeds the population threshold. This procedure is repeated until no more pixels satisfy the condition. The pixels of the newly found pattern are then removed from the image, the next DSP is selected, and the same procedure is repeated. Since some patterns may share DSPs, this mechanism extracts the pattern with the larger size first.
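A minimal sketch of this counting-based predictor, assuming an integer-valued image and an 8-neighborhood context (the function and its parameters are illustrative):

```python
import numpy as np
from collections import Counter, defaultdict

def predict_by_context(img, pt=4):
    """Predict each pixel from its 8 neighbors; -1 where the context is too rare."""
    h, w = img.shape
    stats = defaultdict(Counter)                   # context tuple -> value counts

    def context(r, c):
        patch = img[r - 1:r + 2, c - 1:c + 2].ravel()
        return tuple(np.delete(patch, 4))          # the 8 neighbors, center removed

    for r in range(1, h - 1):
        for c in range(1, w - 1):
            stats[context(r, c)][int(img[r, c])] += 1

    pred = -np.ones_like(img, dtype=int)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            val, cnt = stats[context(r, c)].most_common(1)[0]
            if cnt >= pt:                          # population threshold check
                pred[r, c] = val
    return pred                                    # pred == img marks well-predicted pixels
```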
Figure 3. 2D patterns: (a) the synthesized image; (b) the pattern shapes and positions; (c) the DSP positions; (d) the DSPs, zoomed in; (e) the positions of the extracted patterns; (f) the extracted patterns.
A simple example, shown in Figure 3, illustrates how the method works. There are 4 patterns in the image (240 x 200), and PT is set to 4. We pre-decide the shapes of the patterns; their positions in the image are randomly generated. The number of times each pattern recurs is randomly drawn from 4 to 7. The values of the pattern pixels and the non-pattern pixels are randomly generated, uniformly distributed over [0, 255]. Figure 3(a) is the synthesized image, and Figure 3(b) shows the shapes of the patterns and their positions in the image. Figure 3(c) shows the positions of the DSPs found in the image: five DSPs are found, which are zoomed in and shown in Figure 3(d). All the patterns are extracted, and their positions are shown in Figure 3(e). Figure 3(f) illustrates the extracted patterns.
We intentionally made Pattern 3 and Pattern 4 share a subpart, so that they share a DSP. Pattern 4 is extracted first because it has the larger size. A close look reveals that a few non-pattern pixels along the boundary of the pattern are also extracted; this is because those pixels happen to recur several times when the signals are randomly generated. Up to now we have been working with noise-free signals. In the following section, we extend the method to handle noisy images.
3. DYNAMIC LOCAL K-MEANS FOR IMAGE QUANTIZATION
To handle noisy images, we quantize the image and find the DSPs in the quantized image. After identifying the shapes and positions of the DSPs in the quantized image, we go back to the original image and extract the patterns. Image quantization has been widely investigated in image processing [9, 11]. The criterion for a "good" quantization method here is not necessarily a better compression rate or less distortion. Instead, we want the quantization method to assign the same value to pattern signals that are supposed to be the same given their local context information, despite the noise. The quantization method should exploit the fact that pattern signals, given their context information, recur more frequently than non-pattern signals.
Assuming the distribution of the pattern signals is unknown and the noise is Gaussian with zero mean and variance $\sigma^2$, we develop a dynamic local K-means algorithm to quantize the data. The input data vector $p_i$ of the quantization algorithm is the intensity vector formed by taking the gray values of a pixel $x$ and its neighbors $x_h$, $h = 1, 2, \dots, H$, and putting them sequentially into the vector.
3.1. Local K-means

The quantization problem can be formulated as finding a mapping from the intensity vectors of the original image $P = \{p_i,\ i = 1, 2, \dots, N\}$ to a set of intensity vectors $Q = \{q_j,\ j = 1, 2, \dots, M\}$. By the nearest-neighbor principle, the mapping is

$$m(p_i) = \arg\min_{q_j} \lVert p_i - q_j \rVert \qquad (1)$$

The mapping function minimizes the error:

$$\mathrm{Error} = \frac{1}{N} \sum_{i=1}^{N} \lVert p_i - m(p_i) \rVert \qquad (2)$$
A widely used technique for finding a locally optimal solution to Eq. (2) is the K-means algorithm [12]. One starts with a random initial configuration $Q^0 = \{q_j^0\}$. At each iteration $t$, a new $Q$ is computed as

$$q_j^{t+1} = \frac{1}{L} \sum_{i=1}^{L} \left( p_i \mid m(p_i) = q_j^t \right) \qquad (3)$$

where $L$ is the number of vectors assigned to $q_j^t$. However, its high computational cost makes K-means time consuming for image quantization. In [10], a local K-means technique is used to quantize images; local K-means can be considered a special case of the Kohonen map [9]. Each presented data vector updates $Q$ by

$$q_j^{t+1} = \begin{cases} q_j^t + \alpha_t (p_i - q_j^t), & \text{if } m(p_i) = q_j^t \\ q_j^t, & \text{otherwise} \end{cases} \qquad (4)$$

where $\alpha_t$ is a decreasing sequence that ensures convergence to a local minimum.
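For concreteness, here is a minimal sketch of the online update of Eq. (4); the step-size schedule $\alpha_t = \alpha_0 / (1 + t)$ and the function names are our illustrative choices, not specified above.

```python
import numpy as np

def local_kmeans(data, q, n_iter=10, alpha0=0.1):
    """Online local K-means: only the winning codeword moves (Eq. 4).
    `data` is (N, D); `q` is the (M, D) initial codebook."""
    q = q.astype(float).copy()
    t = 0
    for _ in range(n_iter):
        for p in data:
            j = int(np.argmin(np.linalg.norm(q - p, axis=1)))   # m(p_i), Eq. (1)
            q[j] += (alpha0 / (1.0 + t)) * (p - q[j])           # Eq. (4)
            t += 1
    return q
```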
3.2. Dynamic Local K-means

It is difficult to decide how many classes are necessary to make the quantization method "good". A natural approach is to start with a small number of classes and add new classes gradually. Every time the local K-means procedure stops, the whole data set is partitioned into local data sets by

$$S_k = \{ p_i \mid m(p_i) = q_k \}, \quad k = 1, 2, \dots, M \qquad (5)$$

The mean of each local data set is used to update $q_k$. The error of each local set is then simply calculated as

$$E_{S_k} = \frac{1}{N_{S_k}} \sum_{i=1}^{N_{S_k}} \lVert p_i - q_k \rVert^2, \quad p_i \in S_k, \quad |S_k| = N_{S_k} \qquad (6)$$
If some classes have an error larger than $(H+1)\sigma^2$, the one with the maximum variance is split. The covariance matrix of the class $S_k$ to be split is calculated, and the old class center $q_k$ is replaced by two new vectors:

$$q_{k1} = q_k + \frac{\lambda_1}{2} u_1, \qquad q_{k2} = q_k - \frac{\lambda_1}{2} u_1 \qquad (7)$$

where $\lambda_1$ is the largest eigenvalue of the covariance matrix and $u_1$ is the corresponding eigenvector.
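The sketch below performs one split step under these rules (Eqs. 5-7); the data layout follows the `local_kmeans` sketch above, classes are assumed to have at least two members when their covariance is taken, and in practice the split would alternate with local K-means updates until no class's error exceeds $(H+1)\sigma^2$.

```python
import numpy as np

def split_worst_class(data, q, sigma2, H):
    """Split the class with maximum error along its principal eigenvector."""
    assign = np.argmin(
        np.linalg.norm(data[:, None, :] - q[None, :, :], axis=2), axis=1)
    errors = np.array([
        np.mean(np.sum((data[assign == k] - q[k]) ** 2, axis=1))
        if np.any(assign == k) else 0.0
        for k in range(len(q))])                                # Eq. (6)
    if errors.max() <= (H + 1) * sigma2:
        return q, False                                         # quantizer is "good"
    k = int(np.argmax(errors))
    cov = np.cov(data[assign == k], rowvar=False)
    lam, vecs = np.linalg.eigh(cov)                             # ascending order
    l1, u1 = lam[-1], vecs[:, -1]                               # largest eigenpair
    q_new = np.vstack([np.delete(q, k, axis=0),
                       q[k] + (l1 / 2) * u1,                    # Eq. (7)
                       q[k] - (l1 / 2) * u1])
    return q_new, True
```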
3.3. Quantizing the Image

The quantizer trained by dynamic local K-means is then used to quantize the image. The data vector formed by the intensity value of an image pixel $x$ and its neighbors is classified into a class, say $S_k$, and the quantized value of $x$ is assigned the value of the corresponding element of $q_k$.
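A minimal sketch of this lookup; we assume (our convention, not stated above) that the center pixel is the first element of each context vector, so `q[j][0]` is its quantized value.

```python
import numpy as np

def quantize_image(img, q, offsets):
    """Replace each pixel by the matching element of its winning codeword.
    `offsets` lists (dr, dc) neighbor offsets with (0, 0) first."""
    h, w = img.shape
    out = img.astype(float).copy()
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            p = np.array([img[r + dr, c + dc] for dr, dc in offsets], dtype=float)
            j = int(np.argmin(np.linalg.norm(q - p, axis=1)))   # nearest codeword
            out[r, c] = q[j][0]                                 # element for pixel x
    return out
```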
4. EXTRACTING PATTERNS FROM A NOISY IMAGE
4.1. Finding the DSP Candidates

Given a noisy image, the dynamic local K-means described in Section 3 is used to quantize it, and the quantized image is treated as a noise-free image. The prediction method of Section 2.2 is applied to the quantized image, and the large segments of the image that are correctly predicted are collected as DSP candidates. Since the quantization results may not be ideal, one cannot expect a DSP to appear exactly the same in different places of the image.
An example is shown in Figure 4. The original gray image (340 x 340) has two patterns. A zero-mean Gaussian noise with a variance of 6 is superimposed on the original clean image; the noisy image is shown in Figure 4(a). Figure 4(b) is the quantization result of the dynamic local K-means; essentially, the edges are blurred out in the quantized image. The value of each pixel in the quantized image is predicted from its neighbors. The correctly predicted pixels are shown in Figure 4(c), and the large well-predicted segments are shown in Figure 4(d) at their positions in the original image. No two segments have exactly the same shape, although the shapes of some DSP candidates are similar.

Figure 4. (a) The noisy image; (b) the quantization result; (c) the correctly predicted pixels; (d) the large well-predicted segments.
4.2. Verifying DSPs

The area and color moments [8] of each DSP candidate are calculated:

$$M_{00} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x, y)\, dx\, dy$$

$$M_{10} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x f(x, y)\, dx\, dy, \qquad M_{01} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y f(x, y)\, dx\, dy$$

$$c_x = \frac{M_{10}}{M_{00}}, \qquad c_y = \frac{M_{01}}{M_{00}}$$

$$\mu_{jk} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x - c_x)^j (y - c_y)^k f(x, y)\, dx\, dy \qquad (8)$$

Each DSP candidate has a feature vector $f = \langle a, m, h \rangle^T$, where $a$ is the area, $m = M_{00}/a$, and $h = \mu_{20} + \mu_{02}$.
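On a discrete segment, the integrals of Eq. (8) become sums over the segment's pixels. A hedged sketch of the feature computation (the function name and input conventions are ours):

```python
import numpy as np

def dsp_features(mask, gray):
    """Compute f = <a, m, h> for one DSP candidate.
    `mask` is a boolean segment mask; `gray` is the gray-level image."""
    ys, xs = np.nonzero(mask)
    vals = gray[ys, xs].astype(float)              # f(x, y) sampled on the segment
    a = float(len(xs))                             # area
    m00 = vals.sum()
    cx, cy = (xs * vals).sum() / m00, (ys * vals).sum() / m00
    mu20 = ((xs - cx) ** 2 * vals).sum()
    mu02 = ((ys - cy) ** 2 * vals).sum()
    return np.array([a, m00 / a, mu20 + mu02])     # <area, mean intensity, spread>
```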
The DSP candidates are then classified into classes; the candidates in the same class are considered to be the same DSP at different positions in the image. As with the quantization problem, we do not know how many DSP classes there should be, so we again use dynamic K-means. The similarity of a DSP candidate to a class $C$ is defined by the Mahalanobis distance

$$d^2 = (f - f_c)^T \Sigma^{-1} (f - f_c) \qquad (9)$$

where $\Sigma = E[(f - f_c)(f - f_c)^T]$ and $f_c$ is the center of the class $C$. This distance measures how similar the DSPs in the same class are. Beginning with one class, if the classification results show that the similarity between the members of a class is low, a new class is added in the same way as illustrated by Eq. (7). If the number of members in a class is too small, those members are not considered DSPs. After verification, three DSP sets are selected from Figure 4(d); their positions are shown in Figure 5(a).
For every DSP class, we calculate the shift transformation of each DSP in the class that gives the best alignment of the DSPs by minimizing the error:

$$\mathrm{error} = \sum_{i=1}^{N} \left\lVert T_i(p_i) - \frac{1}{N}\sum_{k=1}^{N} T_k(p_k) \right\rVert^2, \quad T_i, T_k \in \Psi \qquad (10)$$

$T_i$ and $T_k$ are the transformations applied to the DSPs $p_i$ and $p_k$, respectively, and $\Psi$ is the set of transformations from which $T_i$ and $T_k$ can be chosen. The idea of the above equation is to find the transformations that give the best template-matching results. Obviously, the sizes (height and width) of the DSPs and their positions in the image determine the family $\Psi$. The expectation maximization algorithm [14] can be used to solve the above minimization problem. In practice, to further reduce the computational complexity, we can simply fix one DSP and transform the other DSPs to minimize the error:

$$\mathrm{error} = \sum_{\substack{i=1 \\ i \neq k}}^{N} \left\lVert T_i(p_i) - p_k \right\rVert^2, \quad p_k \text{ is the fixed DSP} \qquad (11)$$
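As an illustration of the simplified criterion of Eq. (11), the sketch below exhaustively tries integer shifts of one candidate against the fixed DSP; the wrap-around of `np.roll` is a simplification accepted for small shifts, and all names are hypothetical.

```python
import numpy as np

def best_shift(fixed, candidate, max_shift=5):
    """Find the integer shift minimizing || T_i(p_i) - p_k ||^2 (Eq. 11).
    Both patches are assumed padded to the same shape."""
    best_err, best_uv = np.inf, (0, 0)
    for du in range(-max_shift, max_shift + 1):
        for dv in range(-max_shift, max_shift + 1):
            shifted = np.roll(candidate, (du, dv), axis=(0, 1))
            err = float(np.sum((shifted - fixed) ** 2))
            if err < best_err:
                best_err, best_uv = err, (du, dv)
    return best_uv, best_err
```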
After aligning the DSP candidates in the same DSP class, the shape of the DSP is recalculated by overlaying all of them: a pixel belongs to the DSP if and only if it is shared by all of the DSP candidates. Figure 5(b) shows the verification result of the DSPs in Figure 5(a); there are 3 DSPs. The verification procedure may break a DSP into several parts, which we consider together as a whole DSP.

Figure 5. (a) The positions of the verified DSP sets; (b) the verification result.
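The intersection rule is straightforward in code; a minimal sketch, assuming the candidate masks have already been aligned to a common frame:

```python
import numpy as np

def dsp_shape(aligned_masks):
    """A pixel belongs to the DSP iff every aligned candidate contains it."""
    shape = np.asarray(aligned_masks[0], dtype=bool)
    for m in aligned_masks[1:]:
        shape &= np.asarray(m, dtype=bool)         # intersection of all masks
    return shape
```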
4.3. Growing the DSPs in the Noisy Image

Once the DSPs are verified, a method similar to that of Section 2.2 is used to grow the DSPs into patterns. The pixels along their boundaries are examined: the intensity values of the pixels that have the same relative position to the DSPs of a DSP class are clustered via dynamic K-means, where the variance of each class should be smaller than the noise variance. If there is a class with more members than the population threshold PT, the pixel is attached to the DSPs, and its value is set to the mean of the class. The procedure continues until no more pixels can be added to the DSP; then we say a new pattern has been found. All the pixels belonging to the pattern are removed from the image, a new DSP is selected, and the whole procedure is repeated until all DSPs have been checked.
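A hedged sketch of the acceptance test for a single candidate border position follows; a greedy 1-D grouping of the sorted intensities stands in for dynamic K-means here, and every name is illustrative.

```python
import numpy as np

def try_grow_pixel(values, pt, sigma2):
    """`values`: intensities at the same position relative to each DSP occurrence.
    Attach the position if some tight group (variance < sigma^2) has >= pt members."""
    values = np.sort(np.asarray(values, dtype=float))
    start = 0
    for end in range(1, len(values) + 1):
        group = values[start:end]
        if group.var() > sigma2:                   # group spread exceeds the noise
            start = end - 1                        # start a new group here
            continue
        if len(group) >= pt:
            return True, float(group.mean())       # attach; use the class mean
    return False, None
```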
Figure 6(a) shows the patterns extracted from the image in Figure 4(a), and Figure 6(b) is the ground truth. Some pixels are missed; this is due both to the superimposed noise and to the fact that dynamic K-means can only guarantee a locally optimal solution.

Figure 6. (a) The extracted patterns; (b) the ground truth.
5. CONCLUSION AND DISCUSSION
We have proposed a technique for extracting recurring patterns from an image. The main idea is to detect the 2D distinguishable sub-patterns, which provide the spatial information of the patterns in the image with a high confidence level. The computation is then focused on the signals around those DSPs, which are eventually grown into whole patterns.

However, real-world images suffer not only from shift transformations but also from other transformations (e.g., scaling, rotation, warping, lighting changes). An efficient and feasible approach that handles these more general cases and scales to large image databases is being pursued.
6. REFERENCES
[1] I. T. Jollife, Principal Component Analysis, Springer Series in Statistics, 1986.
[2] B. S. Everitt, An Introduction to Latent Variable Models, Chapman and Hall, NY, 1984.
[3] B. J. Frey and N. Jojic, "Transformed Component Analysis: Joint Estimation of Spatial Transformations and Image Components," IEEE International Conference on Computer Vision 1999, Kerkyra, Greece.
[4] A. J. Bell and T. J. Sejnowski, "An information maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, pp. 1129-1159, 1995.
[5] M. Girolami, A. Cichocki and S. Amari, "A Common Neural-Network Model for Unsupervised Exploratory Data Analysis and Independent Component Analysis," IEEE Trans. on Neural Networks, vol. 9, no. 6, pp. 1495-1501, Nov. 1998.
[6] Pengyu Hong, Sylvian R. Ray, and Thomas S. Huang, "A New Scheme for Extracting Multi-Temporal Sequence Pattern," International Joint Conference on Neural Networks, Washington, DC, July 1999.
[7] J. L. Elman, "Finding Structure in Time," Cognitive Science, 14, 179-211, 1990.
[8] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Info. Theory, pp. 179-187, Feb. 1962.
[9] J. A. Kangas, T. Kohonen, and J. Laaksonen, "Variants of self-organizing maps," IEEE Transactions on Neural Networks, 1(1):93-99, 1990.
[10] O. Verevka and J. Buchanan, "Local K-means algorithm for color image quantization," Proceedings of Graphics Interface '95, Quebec City, Canada, 1995.
[11] A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression, Plenum, New York, 1988.
[12] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Transactions on Communication, 28(4), pp. 84-95, Jan. 1980.
[13] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification and Scene Analysis, Wiley, New York, NY, 1998.
[14] T. M. Mitchell, Machine Learning, WCB/McGraw-Hill, NY, 1997.
7. ACKNOWLEDGMENTS

This work was supported in part by National Science Foundation Grant CDA 96-24386.