
Communications in Information Science and Management Engineering, Sept. 2012, Vol. 2, Iss. 9, pp. 30-34
Influence on ART2 Clustering Algorithm with Different Adjusting Learning Rate*
Shujie Du
Computer Center of Ocean University of China
238, Songling Road, Qingdao, Shandong Province, China
[email protected]
Abstract- ART2 can recognize learned patterns quickly and adapt to new objects rapidly. It carries out clustering with a hierarchical structure by using competitive learning and a self-stabilizing mechanism, without supervision, in dynamic environments with noise. This paper first discusses the commonly used learning rules. A way to adjust the learning rate is then suggested, and the assimilation effect is verified by a shape-learning trial. The categorization results are also compared to illustrate the effects of different learning rates. To some extent, the improved algorithm solves the pattern drifting problem.

Keywords- ART2; Assimilation Effect; Data Clustering; Learning Rate; Adaptation Process
I. INTRODUCTION
The Adaptive Resonance Theory (ART), proposed by S. Grossberg and G. A. Carpenter in 1976, is a self-organizing neural network [1]. When the neural network interacts with its environment, the coding of environmental information arises spontaneously in the network and the self-organizing system is activated. ART grows out of the competitive network interaction model, which is composed of two cooperation-competition components. The typical adaptive resonance neural networks mainly include ART1 and ART2. ART1 handles binary input vectors, while ART2 handles arbitrary analog input vectors and therefore has broader usage.
Fig. 1 ART2 neural network
ART2 is an unsupervised learning neural network proposed in 1987. It is based on a competitive learning mechanism and is heavily influenced by models of biological memory. Its memory capacity increases as more patterns are learned. ART2 can not only learn offline but can also be applied while learning online, meaning that learning and application cannot be separated absolutely. Unlike many other neural networks, ART2 has a fast learning capability: once sample vectors have been input, all their important features are memorized in the long-term memory. Its most important feature is that its memory strikes a balance between stability and plasticity: the weight coefficients stay largely unchanged, keeping the memory system stable, while still adapting to gradually changing vectors. The ART2 system is therefore suited to various kinds of stable or dynamic environments, and it is one of the ideal clustering algorithms.
II. THE WORKING PRINCIPLE OF ART2

The structure of ART2 is shown in Fig. 1, and the topology of unit i is shown in Fig. 2. ART2 is composed of two layers: the feature representation field F1 and the category representation field F2. F1 is similar to the comparison layer in ART1; it includes several calculating tiers and a gain controller. F2 is similar to the recognition layer in ART1 and is responsible for competitive matching against the current input modes [2]. Suppose F1 and F2 together have N neurons, where F1 contains M neurons and F2 contains N−M neurons; these constitute the N-dimensional state vectors that represent the short-term memory of the network. The bottom-up and top-down connection weight vectors between F1 and F2 constitute an adaptive long-term memory; they are denoted $z_{ij}$ and $z_{ji}$ respectively.

The M neurons in F1 receive the input mode X from outside. After feature enhancement and noise suppression in F1, the signals are transferred to F2 through the bottom-up weights $z_{ij}$. The N−M neurons in F2 receive the signals from F1 and determine the winner by competition; the winning neuron is activated while the others are inhibited. The in and out weight vectors connected to the active neuron are then adjusted. The gain controller takes charge of comparing the similarity between the input mode X and the top-down weight vector of the active neuron in F2. If the similarity is lower than the threshold, the reset subsystem (Fig. 2) sends a signal to deactivate the winning neuron in F2, and another winner is chosen, until the similarity meets the demand. A new pattern can always be assigned to some neuron if the number of F2 neurons N−M is larger than the number of all possible input modes.

III. LEARNING ALGORITHMS OF ART2

A. Mathematical Model of Feature Representation Field F1

There exist M processing units in F1, each of which is composed of three layers: upper, middle and lower.
There are two kinds of neurons in each layer, drawn in Fig. 2 as empty or solid circles. Neurons represented by empty circles receive two kinds of input excitation; they compare the two input vectors and either activate or restrain the excitation. Neurons represented by solid circles compute the modulus of the input vectors [3], [4].
Fig. 2 Topological graph of ART2

The bottom and middle layers in F1 constitute a closed positive feedback loop, which includes two normalized calculations and one nonlinear transformation. The input equation and the normalization of the bottom layer are:

$z_i = x_i + a u_i$   (1)

$q_i = z_i / (e + \|Z\|)$   (2)

The input equation and the normalization of the middle layer are:

$v_i = f(q_i) + b f(s_i)$   (3)

$u_i = v_i / (e + \|V\|)$   (4)

The tiny positive real number e in the equations above can be ignored in comparison with $\|V\|$ and $\|Z\|$. The nonlinear transformation function f(x), applied between the bottom and middle layers and between the middle and upper layers, is usually taken as:

$f(x) = \begin{cases} 0, & 0 \le x < \theta \\ x, & x \ge \theta \end{cases}$   (5)

The middle and upper layers in F1 also constitute a closed positive feedback loop, which includes the calculations below:

$p_i = u_i + \sum_{j=M+1}^{N} g(y_j) z_{ji}$   (6)

$s_i = p_i / (e + \|P\|)$   (7)

B. Mathematical Model of Category Representation Field F2

The function of F2 is to determine by competition the maximally activated node, whose weight vector has the maximum similarity to the input vector. Suppose the input of node j in F2 is:

$T_j = \sum_{i=1}^{M} p_i z_{ij}, \quad j = M+1, \ldots, N$   (8)

The winner in F2 is chosen as:

$T_{j^*} = \max\{T_j\}, \quad j = M+1, \ldots, N$   (9)

While node j* is the maximally activated node, the other nodes are restrained:

$g(y_j) = \begin{cases} d, & j = j^* \\ 0, & j \ne j^* \end{cases}$   (10)

where d is the top-down (F2→F1) feedback parameter, 0 < d < 1. According to (10), Equation (6) can be simplified to:

$p_i = \begin{cases} u_i + d z_{j^*i}, & \text{F2 active} \\ u_i, & \text{F2 inactive} \end{cases}$   (11)
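To make the two feedback loops and the competition concrete, here is a minimal NumPy sketch of Equations (1)-(11). The iteration count, array shapes and the way the top-down term is passed in are illustrative assumptions of this sketch, not prescriptions from the paper.

    # Sketch of the F1 feedback loops, Equations (1)-(7), and the F2
    # competition, Equations (8)-(9). Loop count and shapes are assumptions;
    # a full implementation would iterate F1 to a true fixed point.
    import numpy as np

    E = 1e-4          # the small constant e
    THETA = 0.1       # threshold of the nonlinearity f

    def f(x):
        """Equation (5): suppress activity below THETA."""
        return np.where(x >= THETA, x, 0.0)

    def norm(v):
        """Normalization used in Equations (2), (4), (7)."""
        return v / (E + np.linalg.norm(v))

    def f1_field(x, top_down, a=10.0, b=10.0, iters=50):
        """Relax F1 given input x and the top-down term of Equations (6)/(11)
        (d * z_{j*i} for the winner, or zeros when F2 is inactive)."""
        u = np.zeros_like(x)
        s = np.zeros_like(x)
        for _ in range(iters):
            z = x + a * u          # Equation (1), bottom layer
            q = norm(z)            # Equation (2)
            v = f(q) + b * f(s)    # Equation (3), middle layer
            u = norm(v)            # Equation (4)
            p = u + top_down       # Equations (6)/(11), upper layer
            s = norm(p)            # Equation (7)
        return u, p

    def f2_winner(p, z_up):
        """Equations (8)-(9): T_j = sum_i p_i * z_ij; the largest T_j wins."""
        return int(np.argmax(p @ z_up))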
C. Weight Regulation Rules

Weights are adjusted according to the following equations:

$\dfrac{dz_{ji}}{dt} = g(y_j)(p_i - z_{ji})$   (F2→F1)   (12)

$\dfrac{dz_{ij}}{dt} = g(y_j)(p_i - z_{ij})$   (F1→F2)   (13)

Once F2 has determined the competition winner j*, then for $j \ne j^*$:

$\dfrac{dz_{ji}}{dt} = 0, \quad \dfrac{dz_{ij}}{dt} = 0$

and for $j = j^*$ we obtain from Equations (12), (13):

$\dfrac{dz_{j^*i}}{dt} = d(p_i - z_{j^*i}) = d(1-d)\left(\dfrac{u_i}{1-d} - z_{j^*i}\right)$   (14a)

$\dfrac{dz_{ij^*}}{dt} = d(p_i - z_{ij^*}) = d(1-d)\left(\dfrac{u_i}{1-d} - z_{ij^*}\right)$   (14b)

After one iteration, the adjusted results given by (14a), (14b) become:
$z_{j^*i}(t+\Delta t) = \Delta t\, d\, u_i(t) + \big(1 - \Delta t\, d(1-d)\big)\, z_{j^*i}(t)$   (15a)

$z_{ij^*}(t+\Delta t) = \Delta t\, d\, u_i'(t) + \big(1 - \Delta t\, d(1-d)\big)\, z_{ij^*}(t)$   (15b)
Weight vectors are initialized as $z_{ji} = 0$, $z_{ij} \le \dfrac{1}{(1-d)\sqrt{M}}$, with $i = 1, 2, \ldots, M$; $j = M+1, M+2, \ldots, N$; $d = 0.9$.
The function of the orientation subsystem is to determine whether or not to reset F2, according to similarity matching. The matching degree is defined as:

$r_i = \dfrac{u_i + c p_i}{e + \|U\| + c\|P\|}, \quad i = 1, 2, \ldots, M$

Set a threshold $\rho$, $0 < \rho < 1$. The output of solid circle A represents the modulus of similarity, expressed as $\|R\|$; if $\|R\| < \rho$, the orientation subsystem resets F2.

IV. DIFFERENT EFFECTS PRODUCED BY SLOW AND FAST LEARNING RATE

Equations (15a), (15b) give the learning rule for the vectors $z_{ji}$ and $z_{ij}$, which correspond to the long-term memory (LTM) of the winner neuron j* in F2. The effective exciting duration of the input pattern X is represented by $\Delta t$, which in effect acts as the learning rate; it is equivalent to the time for which an input sample is learned in ART2. Some literature [5]-[7] takes $\Delta t = 1$ in the learning rules, and there is in fact no restriction that $\Delta t < 1$; however, too large a $\Delta t$ easily causes oscillation or divergence. Compared with other neural networks, a significant feature of ART2 is that it is a continuous network [8].
If Equations (14a), (14b) are set equal to 0, i.e., $t \to \infty$, they simplify to:

$z_{j^*i} = \dfrac{u_i}{1-d}, \quad z_{ij^*} = \dfrac{u_i'}{1-d}$   (16)

This gives the fast learning rule of the most common ART2 [9]. With a linear transformation function, if the fast learning rule is adopted, the LTM vector $z_{j^*}$ has the same direction as the middle-layer vector U in F1 produced by the input vector X. The significant feature of fast learning is that the amplitude of the LTM vector reaches $1/(1-d)$ in a single step, which is called fast commitment [10].

In fast learning there are no adjustable parameters except d (d is generally close to 1, which makes the vector p take $z_{j^*i}$ as its principal component), so the learning rate cannot be adjusted directly. By Fig. 2 and Equations (1), (3), after the winner in F2 feeds $z_{j^*i}$ back, it contributes to U through two routes: the middle-upper loop p→s→v→u and the bottom-middle loop u→z→q→v→u. So the parameters a and b are critical for adjusting the learning rate in ART2.
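The following small sketch, with illustrative numbers chosen by us, shows how $\Delta t$ controls how far the LTM vector of Equation (15a) moves toward the fast-learning target $u/(1-d)$ of Equation (16) within 20 rounds:

    # Illustrative comparison (values chosen by us): repeated slow updates of
    # Equation (15a) versus the one-step fast commitment of Equation (16).
    import numpy as np

    d = 0.9
    u = np.array([0.6, 0.8])            # a normalized middle-layer vector (illustrative)
    z_fast = u / (1 - d)                # fast learning target, Equation (16)

    for dt in (10, 5, 1, 0.1):
        z = np.zeros_like(u)            # LTM vector of a freshly committed node
        for _ in range(20):             # 20 rounds, as in the experiment below
            z = dt * d * u + (1 - dt * d * (1 - d)) * z   # Equation (15a)
        print(f"dt={dt}: z={np.round(z, 3)}  target={np.round(z_fast, 3)}")

With $\Delta t = 10$ the LTM vector essentially reaches the fast-learning target, while with $\Delta t = 0.1$ it covers only a small fraction of the distance in 20 rounds, which is exactly the slow-versus-fast contrast examined next.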
Suppose a robot is supervised to recognize three different shapes, circle (angles = 0), triangle and quadrilateral, and that the sensing system has been established with computer vision technology. Teacher supervision is adopted during learning, and the weight adjusting algorithm is based on the slow updating rules described in Equations (15a), (15b). Learning is carried out for 20 rounds in each of the cases $\Delta t = 10$, $\Delta t = 5$ and $\Delta t = 1$. The results are shown in Table I. "-?" means the system raises a question and needs an answer from the teacher; "□→△" represents mistaking a quadrilateral for a triangle, and so on.

TABLE I  SUPERVISED LEARNING AND TEST RESULTS (a = 10, b = 10, as used in Equations (1), (3))

∆t = 10:
Round  Cluster Result  Neuron Activated  Succeed  Vigilance
1      ○-?             1                 √        0.985
2      △→○            1                 ×        0.9925
3      ○               1                 √        0.9826
4      △→○            1                 ×        0.9913
5      △→○            1                 ×        0.9956
6      △-?             2                 √        0.9857
7      ○               1                 √        0.9758
8      □→○            1                 ×        0.9879
9      □→○            1                 ×        0.9940
10     □→△            2                 ×        0.9970
11     □-?             3                 √        0.9870
12     ○               1                 √        0.9771
13     △→○            1                 ×        0.9886
14     △→○            1                 ×        0.9943
15     △               2                 √        0.9843
16     □→△            2                 ×        0.9922
17     □→△            2                 ×        0.9961
18     □→△            2                 ×        0.9980
19     □               3                 √        0.9881
20     ○               1                 √        0.9782

∆t = 5:
Round  Cluster Result  Neuron Activated  Succeed  Vigilance
1      ○-?             1                 √        0.985
2      △→○            1                 ×        0.9925
3      ○               1                 √        0.9826
4      △→○            1                 ×        0.9913
5      △→○            1                 ×        0.9956
6      △-?             2                 √        0.9857
7      ○               1                 √        0.9758
8      □→○            1                 ×        0.9879
9      □→△            2                 ×        0.9940
10     □→△            2                 ×        0.9970
11     □→△            2                 ×        0.9985
12     □-?             3                 √        0.9885
13     △→○            1                 ×        0.9943
14     △               2                 √        0.9843
15     □→△            2                 ×        0.9922
16     □→△            2                 ×        0.9961
17     □→△            2                 ×        0.9980
18     □               3                 √        0.9881
19     ○               1                 √        0.9782
20     △→○            1                 ×        0.9891

∆t = 1:
Round  Cluster Result  Neuron Activated  Succeed  Vigilance
1      ○-?             1                 √        0.985
2      △→○            1                 ×        0.9925
3      ○               1                 √        0.9826
4      △→○            1                 ×        0.9913
5      △→○            1                 ×        0.9956
6      △→○            1                 ×        0.9978
7      △→○            1                 ×        0.9989
8      △-?             2                 √        0.9889
9      ○               1                 √        0.9790
10     □→○            1                 ×        0.9895
11     □→○            1                 ×        0.9948
12     □→△            2                 ×        0.9974
13     □→△            2                 ×        0.9987
14     □→△            2                 ×        0.9993
15     □→△            2                 ×        0.9997
16     □-?             3                 √        0.9897
17     △→○            1                 ×        0.9948
18     △→○            1                 ×        0.9974
19     △               2                 √        0.9874
20     □→○            1                 ×        0.9937
From the table above, we conclude that assimilation effects and the adaptation process are distinguishing features of ART2. Take $\Delta t = 10$ for example. In Round 6, once the vigilance $\rho$ has risen to 0.9956, the system no longer mistakes the triangle for the previously learned circle; a new neuron is activated and $\rho$ drops to 0.9857. At this point the input shape circle is recognized correctly. In Round 10, with $\rho$ risen to 0.9940, the system becomes aware of the difference between the feature vectors of the quadrilateral and the memorized circle. It no longer mistakes the quadrilateral for a circle, but still mistakes it for a triangle until $\rho$ rises to the higher value 0.9970, whereupon the third neuron is activated after teacher supervision. In Round 16 the system again mistakes the quadrilateral for a triangle. From Rounds 16 to 18, with $\rho$ rising to 0.9980, the system recognizes the quadrilateral again; under the slow learning mechanism it analyzes the characteristics of the quadrilateral once more, adjusts the LTM weights, and settles into a new balance. In the last two rounds, because the LTM weights in ART2 have been adjusted properly, the system still recognizes the shapes correctly even though $\rho$ drops to 0.9782. As for $\Delta t = 5$ and $\Delta t = 1$, misidentifications increase as the interval decreases: assimilation effects become more obvious, and the adaptation process persists even when $\rho$ is nearly 1, because the short stimulus duration means the LTM has not approached equilibrium.

In practical applications, an appropriate learning rate is important for eliminating noise effectively and keeping the system stable; at the same time, it can also eliminate the incorrect effect that different sample input sequences have on the classification results. Experiments show that ultra-fast learning is unsuitable for data with high-level noise [11]. On the other hand, although slow learning can eliminate noise preferably, it produces a meaningless search process, which leads to slow computation.
An efficient solution for balancing the learning rate is to build a model which combines the features of fast commitment and slow recoding [12]. Fast commitment is used when an uncommitted node has been activated, to avoid the meaningless search over committed nodes produced by slow learning. Slow recoding is used when a committed node has been activated; it uses learning rules similar to slow learning in order to overcome noise and the influence of different sample input sequences.

V. CORRESPONDENT EXPERIMENTS ANALYSIS

In order to verify the different classification results related to weight modification rates, sample data [13] are adopted as the research object: isosceles-triangle patterns with gradually changing base angle. Each of these 89 bivectors is composed of base length $\alpha$ and height $\beta$. Suppose base angle $\theta$, $\theta = 1°, 2°, \ldots, 89°$, and $\alpha = 2$, so $\beta = \alpha \tan\theta / 2 = \tan\theta$; the coordinate $(\alpha, \beta) = (2, \tan\theta)$ then represents an isosceles-triangle pattern.

These triangle data are tested with different learning rates and input sequences (sequential and inverse); the parameters of the ART2 model are set as follows:

a = 8, b = 8, c = 0.15, d = 0.9, e = 0.0001, $\theta$ = 0.1, $\rho$ = 0.999, $z_{ji}$ = 0.08 − 0.001·j, j = 0, 1, …, M; $\Delta t$ = 0.1.
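For reproducibility, the 89 test vectors and the two presentation orders can be generated as in the sketch below; the classification step itself is left out, since it depends on the full ART2 implementation.

    # Sketch: generate the 89 isosceles-triangle test vectors
    # (alpha, beta) = (2, tan(theta)) and the two presentation orders
    # used for Tables II and III.
    import numpy as np

    thetas = np.arange(1, 90)                             # base angles 1..89 degrees
    data = np.column_stack((np.full(thetas.shape, 2.0),   # alpha = 2 (base length)
                            np.tan(np.radians(thetas))))  # beta = tan(theta) (height)

    sequential = data          # ascending base angle, as in Table II
    inverse = data[::-1]       # descending base angle, as in Table III
    # Each row would then be presented to the ART2 network configured with the
    # parameters above (rho = 0.999, dt = 0.1) and the resulting category recorded.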
For the base angles input sequentially (in ascending order), the comparison results are shown in Table II.

TABLE II  COMPARISON OF TWO ALGORITHMS IN SEQUENTIAL INPUT ORDER

Category   Fast Rate (t→∞)       Slow Rate (∆t = 0.1)
           Range      Scale      Range      Scale
1          1~89       89         1~60       60
2          None       0          61~80      20
3          None       0          81~89      9
On the other hand, for the base angles input inversely (in descending order), the comparison results are shown in Table III.

TABLE III  COMPARISON OF TWO ALGORITHMS IN INVERSE INPUT ORDER

Category   Fast Rate (t→∞)       Slow Rate (∆t = 0.1)
           Range      Scale      Range      Scale
1          89~1       89         89~81      9
2          None       0          80~61      20
3          None       0          60~1       60
As shown above, ART2 with a slow learning rate can track the process by which patterns change gradually; this improved sensitivity to gradual change achieves better classification results and robustness [14], [15].
VI. CONCLUSION

Compared with the traditional fast learning rate, modifying the weight update to use a slower learning rate is shown by the demonstrations above to reduce the speed of pattern drifting. It should be noted that the pattern drifting seen in ART2 classification applications is an essential feature rather than a shortcoming; it is merely unsuitable for some specific samples. In other fields, e.g. face recognition [16], the traditional ART2 model may be more suitable, since a person growing older fits the condition that a pattern changes gradually over time.
ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (Nos. 40176014, 40067013). We are grateful to Prof. Liu Z. S. at Ocean University of China for comments and suggestions, and wish to thank Dr. Chen Z., who is well versed in the ART2 algorithm. We also wish to express our appreciation to the Remote Sensing Institute of Ocean University of China for powerful analytical support.
REFERENCES

[1] Carpenter G. A., and Grossberg S., "ART2: Self-organization of stable category recognition codes for analog input patterns," Applied Optics, vol. 26, pp. 4919-4930, 1987.
[2] Carpenter G. A., and Grossberg S., "ART2-A: An adaptive resonance algorithm for rapid category learning and recognition," Neural Networks, vol. 4, pp. 493-504, Apr. 1991.
[3] Frank T., Kraiss K. F., and Kuhlen T., "Comparative analysis of fuzzy ART and ART2A network clustering performance," IEEE Trans. on Neural Networks, vol. 9, pp. 544-549, Mar. 1998.
[4] Shi D., Ong Y. S., and Tan E. C., "Handwritten Chinese character recognition using kernel active handwriting model," Systems, Man and Cybernetics, pp. 251-255, Jan. 2009.
[5] Davenport M. P., and Titus A., "Multilevel category structure in the ART2 network," Neural Networks, vol. 15, pp. 145-158, Jan. 2004.
[6] Klotz G. A., and Stacey D. A., "ART2 based classification of sparse high dimensional parameter sets for a simulation parameter selection assistant," Neural Networks, vol. 31, pp. 1081-1085, Feb. 2005.
[7] Alahakoon D., Halgamuge S. K., and Srinivasan B., "Dynamic self-organizing maps with controlled growth for knowledge discovery," Neural Networks, vol. 3, pp. 601-614, Nov. 2000.
[8] Martin T., et al., Neural Network Design, PWS Pub. Co., 2006.
[9] Ardavan A., and Seyed S. M., "Application of modified ART2 artificial neural network in classification of structural members," 15th ASCE Engineering Mechanics Conference, Columbia University, New York, Jun. 2-5, 2002.
[10] Seungdoo P., John M. V., and Raymond J. G., "Direct oxidation of hydrocarbons in a solid-oxide fuel cell," Nature, vol. 404, pp. 265-267, 2004.
[11] Chen Z. G., and Chen D. Z., "Integrated strategy of pattern classification and its application," Journal of Zhejiang University: Engineering Science, vol. 36, pp. 601-602, Jun. 2010.
[12] Cao Y. Q., and Wu J. H., "Projective ART for clustering data sets in high dimensional spaces," Neural Networks, vol. 15, pp. 105-120, 2002.
[13] Liu L., Hu B., and Shi L. F., "Systematic review of ART2 neural network," Journal of Central South University, vol. 8, pp. 21-26, Aug. 2007.
[14] Pham D. T., and Sukkar M. F., "A predictor based on adaptive resonance theory," Artificial Intelligence in Engineering, vol. 12, pp. 219-228, Dec. 2009.
[15] Houshmand G. P. B., "An efficient neural classification chain of SAR and optical urban images," International Journal of Remote Sensing, vol. 22, pp. 1535-1553, Aug. 2001.
[16] Pham D. T., and Chan A. B., "Unsupervised adaptive resonance theory neural networks for control chart pattern recognition," Proceedings of the Institution of Mechanical Engineers, vol. 215, pp. 59-67, Jan. 2001.