Multiagent game model of events synchronization
Petro Kravets
Computer Science Department, Lviv Polytechnic National University, S. Bandery Str., 12, Lviv, 79013, UKRAINE, E-mail:
[email protected]
Abstract – An adaptive game method for event synchronization in multiagent systems under uncertainty is developed. The essence of the method is to align the delays of event occurrence on the basis of observing the actions of neighbouring players. A stochastic game is formulated and a game algorithm for solving it is developed. The influence of the parameters on the convergence of the game method is investigated by means of a computer experiment.
Key words – multiagent systems, events synchronization,
stochastic game, self-learning recurrent method, uncertainty
conditions.
I. Synchronization of systems. Introduction
Synchronization is the acquisition of a common rhythm of operation by the objects of a distributed system. It can be caused by weak interaction between the objects or by the action of an external force. The effect of synchronization was discovered by Ch. Huygens in the 17th century while observing the coordinated motion of two pendulum clocks. The clocks were synchronized by the weak influence of the oscillations of their common support.
Synchronization phenomena occur widely in nature, in technology and in society [1–5]. Under certain conditions they can be caused by simple physical or chemical laws, or by the collective interaction of active objects (agents) with complex behavioural dynamics. Here active objects are understood as autonomous objects capable of observing the states of the environment and of generating and executing control signals.
The rhythmic flashing of a swarm of fireflies, the chirping of crickets, the movement of shoals of fish, the flight of flocks of birds and others are examples of synchronization in nature. The coordination of the biological rhythms of living organisms with external natural factors and the formation of heart rhythms and of the electrical rhythms of the brain are manifestations of synchronization in biology and medicine. Synchronization is the central mechanism of information processing in different areas of the brain and of information transfer between these areas.
In physics, examples of synchronization are the coordinated oscillation of several pendulums, the induced sounding of organ pipes and the oscillation of electric signals.
In radio physics, radio engineering, radar, radio measurement and radio communication, synchronization is used for frequency stabilisation of electric, electromagnetic and quantum generators, frequency synthesis, signal demodulation, control of phased antenna arrays, in radio-Doppler systems, in precise time systems, and in various methods of information transfer.
In mechanics the synchronization phenomenon is used in the construction of various vibration machinery.
In power engineering synchronization is used to ensure the exact coincidence of the frequencies of several alternating-current generators operating in parallel on a common load, and for the spatial orientation of distributed planes of photoelectric cells that collect solar light and convert it into electric energy.
In astronomy synchronization manifests itself in the establishment of ratios between the average angular velocities of rotation of celestial bodies about their axes and the velocities of their orbital motion.
Synchronization of code and data streams is necessary for the operation of modern distributed software and data warehouses with a network implementation, an example of which is multiagent systems.
In social systems synchronization phenomena are observed in the movement of a formation of soldiers, in dances, choral singing, clapping of hands in a concert hall, mass prayers of believers, the harmonious work of legislative and executive authorities, etc.
In multiagent systems synchronization is necessary for the coordinated operation of their components, for message transfer between agents, and for maintaining the conditions of self-organisation, when the distributed system behaves as an integral, artificially created organism owing to internal factors, without corresponding external influence [4–6].
Synchronization processes in systems of different nature have much in common and can be studied using common mathematical and computing tools.
Oscillator synchronization networks are, as a rule, taken as electrotechnical models of the synchronization of a system of distributed objects. Synchronization is studied using the methods of control theory, oscillation theory, phase dynamics, mapping theory, nonlinear media and networks, chaos, fractals, cellular automata, etc.
Synchronization theory is divided into two basic parts:
• the classical theory of synchronization, which studies phenomena in coupled periodic self-oscillatory systems;
• the theory of chaotic synchronization, which studies the cooperative behaviour of chaotic systems.
Three main types of chaotic synchronization are distinguished:
• full (or identical) synchronization, in which the states of the coupled objects coincide completely;
• generalised synchronization, in which the outputs of the objects are related through some function;
• phase synchronization, i.e. the establishment of certain relations between the phases of interacting objects, which results in the coincidence of their characteristic frequencies or characteristic time scales.
“COMPUTER SCIENCE & INFORMATION TECHNOLOGIES” (CSIT’2014), 18-22 NOVEMBER 2014, LVIV, UKRAINE
The following kinds of synchronization of objects are distinguished:
• forced synchronization of objects by means of an external source of signals, for example, a clock generator;
• free spatially distributed synchronization of objects among themselves.
Taking into account the factors of competition, interaction, cooperation and learning inherent in a collective of agents, we use the model of a stochastic game [7–9] to study the processes of coordination and synchronization of multiagent systems under uncertainty.
Game synchronization of events in multiagent systems is a topical scientific and practical problem that has not yet been studied sufficiently. Unlike the synchronization of oscillator networks, which are described by systems of differential equations, stochastic games of multiagent systems investigate the complex behaviour of networks of intelligent agents with various decision-making models under uncertainty on the basis of artificial intelligence methods. In the course of a stochastic game the agents learn by themselves to choose pure strategies (actions) that are optimal on average, adjusting their own vectors of dynamic mixed strategies (conditional probabilities of the action variants). Under certain conditions, which are defined by the parameters of the environment, the parameters of the game method and the criteria for choosing decision variants, self-learning in the stochastic game leads to synchronization of the agents' strategies.
The purpose of this work is to construct a stochastic game method for the identical spatially distributed synchronization of processes in multiagent systems. To achieve this purpose, a stochastic game problem is formulated, a method and an algorithm for solving it are developed, and the results of computer modelling of the stochastic game are analysed.
II. Statement of a game problem
Let us consider a set of agents $D$ that can observe each other's states. Let each agent $i \in D$ signal the occurrence of some event at random moments of time $t_1^i, t_2^i, \ldots$. This event can be observed by the neighbouring agents from a subset $D_i \subseteq D$ $\forall i \in D$ ($D = \bigcup_{i \in D} D_i$) on the time interval $[t_{n-1}^i, t_n^i)$, where $n = 1, 2, \ldots$ denotes the serial number of the moment of time. In the limiting case $D_i = D$ $\forall i \in D$ each agent observes the events of all other agents. By regulating the size of the time interval $\Delta t_n^i = t_n^i - t_{n-1}^i$ between their own events, the agents strive to synchronize events within the local subsets $D_i$. The essence of synchronization is to align the time of occurrence of each agent's event with the time of occurrence of its neighbours' events. For this purpose the agents randomly and independently choose one of the pure strategies $U^i = (u^i(1), u^i(2), \ldots, u^i(N))$, which is the current size of the time interval counted from the moment $t_{n-1}^i$ of occurrence of the previous event:
$$u^i(k) = k\, t_{\max} / N, \quad k = 1..N,$$
where $t_{\max}$ is the maximum value of the time interval between events.
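As an illustration, the grid of pure strategies can be sketched in Python as follows. The function name and the sample values are assumptions made for this sketch and are not taken from the original experiment.

```python
def pure_strategies(n_strategies: int, t_max: float) -> list[float]:
    """Return the interval lengths u(k) = k * t_max / N for k = 1..N."""
    return [k * t_max / n_strategies for k in range(1, n_strategies + 1)]

print(pure_strategies(4, 4.0))  # [1.0, 2.0, 3.0, 4.0]
```

With $t_{\max} = N$, the setting used later in the experiments, the strategies are simply the integers from 1 to N.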
The way in which the moments of time $t_n^i$ are formed defines the kind of stochastic game: with a synchronous or an asynchronous choice of pure strategies. In a synchronous game all agents choose pure strategies simultaneously and independently at each iteration $n = 1, 2, \ldots$. In an asynchronous game pure strategies are chosen only by the agents for which $t_n^i = \min_{j \in D}(t_n^j)$.
At the moment of time $t_n^i$ the agent with number $i$ calculates the average deviation of the time fronts of event occurrence in the subset of neighbouring agents $D_i$ from the time of occurrence of its own event, which serves as the current loss of the player:
$$\delta_n^i = |D_i|^{-1} \sum_{j \in D_i} |t_n^i - t_n^j| + \mu_n^i \quad \forall i \in D,$$
where $\mu_n^i$ is white noise that simulates the inaccuracy of measuring the time of event occurrence.
The calculated deviation is used to form the current loss $\xi_n^i \in R^1$:
$$\xi_n^i = \delta_n^i.$$
The binary loss $\xi_n^i \in \{0, 1\}$ is defined by the sign of the difference of the current deviation $\delta_n^i$:
$$\xi_n^i = \begin{cases} 0, & \text{if } \delta_n^i - \delta_{n-1}^i < 0; \\ 1, & \text{if } \delta_n^i - \delta_{n-1}^i \ge 0. \end{cases}$$
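The two loss variants can be illustrated with a minimal Python sketch. The function names are hypothetical, and treating an unchanged deviation as a loss of 1 is an assumption of this sketch.

```python
def average_deviation(t_own, t_neighbours, noise=0.0):
    """Mean absolute gap between own event time and neighbours' event times, plus noise."""
    gaps = [abs(t_own - t_j) for t_j in t_neighbours]
    return sum(gaps) / len(gaps) + noise

def binary_loss(dev_now, dev_prev):
    """0 if the deviation decreased since the previous step, otherwise 1 (tie rule assumed)."""
    return 0 if dev_now - dev_prev < 0 else 1

prev = average_deviation(5.0, [4.0, 7.0])  # (1 + 2) / 2 = 1.5
now = average_deviation(5.0, [5.0, 6.0])   # (0 + 1) / 2 = 0.5
print(binary_loss(now, prev))  # 0
```

The non-binary loss is the deviation itself, whereas the binary loss only records whether the agent has moved closer to its neighbours since the previous step.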
The current losses $\xi_n^i = \xi_n^i(u_n^{D_i})$ are functions of the collective strategies $u^{D_i} \in U^{D_i} = \prod_{j \in D_i} U^j$ of the players that form the local subsets $D_i \subseteq D$, $D_i \ne \emptyset$ $\forall i \in D$.
The stochastic characteristics of the random losses are not known to the agents a priori.
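The efficiency of event synchronization is defined by the functions of average losses:
$$\Xi_n^i = \frac{1}{n} \sum_{\tau=1}^{n} \xi_\tau^i \quad \forall i \in D. \qquad (1)$$
The purpose of the game is to minimise the functions of average losses:
$$\lim_{n \to \infty} \Xi_n^i \to \min \quad \forall i \in D. \qquad (2)$$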
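The average loss (1) does not require the whole history of losses to be stored, because it can be updated recursively. The following Python sketch shows this under the assumption of a hypothetical helper name and a made-up sequence of losses.

```python
def update_average_loss(avg_prev: float, loss: float, n: int) -> float:
    """Recursive form of (1): average of the first n losses from the average of n - 1."""
    return avg_prev + (loss - avg_prev) / n

avg = 0.0
for n, loss in enumerate([1, 0, 0, 1], start=1):
    avg = update_average_loss(avg, loss, n)
print(avg)  # the arithmetic mean of the four losses, up to rounding
```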
Thus, on the basis of the random current losses $\{\xi_n^i\}$ the players should choose pure strategies $\{u_n^i\}$ so that, as time $n = 1, 2, \ldots$ passes, the system of objectives (2) is fulfilled. Depending on the way in which the sequences $\{u_n^i \mid i \in D, n = 1, 2, \ldots\}$ are formed, the multicriteria problem (2) has solutions that satisfy one of the conditions of collective optimality: Nash, Pareto, etc. [7–9].
To solve the game it is necessary to construct a method of generating the sequences of pure strategies $\{u_n^i\}$ $\forall i \in D$ such that synchronization of the agents' events within the local subsets $D_i \subseteq D$ $\forall i \in D$ leads to global synchronization of the events of all agents from the set $D$.
III. Method of the stochastic game solving
The time intervals between events $\{u_n^i \mid i \in D, n = 1, 2, \ldots\}$ are generated on the basis of discrete dynamic distributions $p_n^i = (p_n^i(u_1^i), p_n^i(u_2^i), \ldots, p_n^i(u_N^i))$ $\forall i \in D$, which in game theory are called mixed strategies. The vectors $p_n^i \in S^N$ take values on the unit $\varepsilon$-simplex [8].
Let us define the function of average losses for the corresponding matrix game:
$$V^i(p^{D_i}) = \sum_{u^{D_i} \in U^{D_i}} v^i(u^{D_i}) \prod_{j \in D_i;\, u^j \in u^{D_i}} p^j(u^j),$$
where $p^{D_i} \in S^{D_i}$; $S^{D_i} = \prod_{j \in D_i} S^N$; $u^{D_i} \in U^{D_i}$.
Considering the value of the gradient
$$\nabla_{p^i} V^i = M\left\{ \frac{\xi_n^i\, e(u_n^i)}{e^T(u_n^i)\, p_n^i} \,\middle|\, p_n^i = p^i \right\},$$
on the basis of the stochastic approximation method [10] we obtain the following self-learning recurrent method for changing the mixed strategy vectors:
$$p_{n+1}^i = \pi_{\varepsilon_{n+1}}^N \left\{ p_n^i - \gamma_n \frac{\xi_n^i\, e(u_n^i)}{e^T(u_n^i)\, p_n^i} \right\}, \qquad (3)$$
where $\pi_{\varepsilon_{n+1}}^N$ is the operator of projection onto the unit $\varepsilon$-simplex; $\varepsilon_n > 0$ is the parameter of $\varepsilon$-simplex expansion; $\gamma_n > 0$ is the learning step parameter; $e(u_n^i)$ is the unit indicator vector of the choice of the pure strategy $u_n^i \in U^i$; $e^T(u_n^i)$ is its transpose, so that $e^T(u_n^i)\, p_n^i$ is the probability of the chosen strategy.
The parameters $\gamma_n$ and $\varepsilon_n$ in (3) define the speed of learning of the agents. To ensure convergence of the learning process of the stochastic game, these parameters are set as positive, monotonically decreasing quantities:
$$\gamma_n = \gamma_0 / n^{\alpha}, \quad \varepsilon_n = \varepsilon_0 / n^{\beta},$$
where $\gamma_0, \alpha > 0$; $\varepsilon_0, \beta > 0$.
The conditions for convergence of the gradient stochastic game method (3) to a Nash equilibrium point with probability 1 and in the mean-square sense are given in [8].
The mixed strategy vectors are used for the random choice of pure strategies $\forall i \in D$:
$$u_n^i = \left\{ u^i[k] \,\middle|\, k = \arg\min_k \left( \sum_{j=1}^{k} p_n^i(u^i[j]) > \omega \right),\ k = 1..N \right\},$$
where $\omega \in [0, 1]$ is a random real number with uniform distribution.
The efficiency of event synchronization is estimated by:
1) the function of average losses:
$$\Xi_n = |D|^{-1} \sum_{i \in D} \Xi_n^i,$$
where $\Xi_n^i$ is calculated according to (1);
2) the synchronization level, i.e. the average number of agents with synchronized events:
$$\Theta_n = \frac{1}{n} \sum_{\tau=1}^{n} \sum_{i \in D} \chi(t_\tau^i = t_{\max}),$$
where $\chi(\cdot) \in \{0, 1\}$ is the indicator function of an event and $t_{\max} = \max(t^i)$ is the maximum value of the time fronts of the agents.
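One learning step of the recurrent method (3), together with the random choice of a pure strategy, can be sketched in Python for a single agent as shown below. This is a minimal sketch under stated assumptions rather than the implementation used in the experiments. All function names are hypothetical, the projection is a standard sort-based Euclidean projection chosen for this sketch (the source refers to [8] for the operator), and the toy loss in the demonstration loop is invented for illustration.

```python
import random

def schedule(n, gamma0=1.0, eps0=0.4995, alpha=0.5, beta=1.0):
    """Step size gamma_n and simplex parameter eps_n (eps0 = 0.999 / N with N = 2)."""
    return gamma0 / n ** alpha, eps0 / n ** beta

def project_to_eps_simplex(p, eps):
    """Euclidean projection of p onto {q : sum(q) = 1, q_j >= eps}; needs eps < 1/N."""
    n = len(p)
    z = 1.0 - n * eps                      # mass left after reserving eps per component
    v = [x - eps for x in p]
    theta, cumulative = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        candidate = (cumulative - z) / k
        if u_k - candidate > 0:            # component k stays positive after the shift
            theta = candidate
    return [max(x - theta, 0.0) + eps for x in v]

def update_mixed_strategy(p, k, loss, gamma, eps_next):
    """One step of (3): only the chosen component k is moved, by gamma * loss / p[k]."""
    q = list(p)
    q[k] -= gamma * loss / p[k]
    return project_to_eps_simplex(q, eps_next)

def choose_strategy(p, rng):
    """Index of the first strategy whose cumulative probability exceeds a uniform draw."""
    omega = rng.random()
    cumulative = 0.0
    for k, p_k in enumerate(p):
        cumulative += p_k
        if cumulative > omega:
            return k
    return len(p) - 1                      # guard against rounding in the cumulative sum

# Toy run (hypothetical loss): strategy 0 is always penalised, strategy 1 never.
rng = random.Random(0)
p = [0.5, 0.5]
for n in range(1, 201):
    gamma, _ = schedule(n)
    _, eps_next = schedule(n + 1)
    k = choose_strategy(p, rng)
    loss = 1.0 if k == 0 else 0.0
    p = update_mixed_strategy(p, k, loss, gamma, eps_next)
print(p)  # probability mass shifts towards strategy 1
```

Keeping every component at or above $\varepsilon_n$ prevents division by zero in the update and guarantees that each strategy continues to be tried occasionally.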
IV. Results of computer modelling
For the study we construct a multiagent model of event synchronization in biological systems, for example the spontaneously arising rhythmic flashing of a swarm of fireflies of the Lampyridae family. In this model the agents synchronize their changes of state (the lighting up and fading of the fireflies).
Let $L$ agents be placed on a straight line. The distance between the agents can be arbitrary within the range at which signals from the neighbouring agents can be observed. Each agent $i \in D$ records the moments of time $t_n^j$ $\forall j \in D_i$ at which signals from the neighbouring agents are received and calculates their current deviation $\delta_n^i$ from the moment of generation of its own signal. As an example, we choose a difference method of forming the current deviation of the moments of time:
$$\delta_n^i = \begin{cases} |t_n^{i-1} + t_n^{i+1} - 2 t_n^i|, & \text{if } i \ne 1 \text{ and } i \ne L; \\ |t_n^{i+1} - t_n^i|, & \text{if } i = 1; \\ |t_n^{i-1} - t_n^i|, & \text{if } i = L. \end{cases}$$
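The difference method above can be illustrated by a short Python sketch. The function name and the sample event times are assumptions for illustration, indices are zero-based, and at least two agents are assumed.

```python
def difference_deviation(t):
    """Current deviation of each agent on a line, given the event times t[0..L-1]."""
    last = len(t) - 1
    dev = []
    for i in range(len(t)):
        if i == 0:                      # first agent: only a right neighbour
            dev.append(abs(t[1] - t[0]))
        elif i == last:                 # last agent: only a left neighbour
            dev.append(abs(t[last - 1] - t[last]))
        else:                           # interior agent: second difference
            dev.append(abs(t[i - 1] + t[i + 1] - 2 * t[i]))
    return dev

print(difference_deviation([0.0, 1.0, 3.0, 3.0]))  # [1.0, 1.0, 2.0, 0.0]
```

When all event times coincide, every deviation is zero, which corresponds to full synchronization before noise is added.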
The current deviations of the moments of time are subject to white noise, which models the measurement error, and are used as the current losses of the players:
$$\xi_n^i = \delta_n^i + \mu_n^i.$$
The value of the white noise is calculated as a Gaussian random variable with zero mathematical expectation and a dispersion determined by the parameter $d$:
$$\mu_n^i = d \left( \sum_{j=1}^{12} \omega_{j,n} - 6 \right),$$
where $\omega \in [0, 1]$ is a real random number with the uniform distribution law.
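The noise generator above is an approximation. The sum of twelve independent uniform numbers on [0, 1] has mean 6 and variance 1, so after centring and scaling the result is approximately Gaussian with zero mean and standard deviation $d$. A minimal Python sketch, with a hypothetical function name and an arbitrary seed, is:

```python
import random

def white_noise(d, rng):
    """Approximately Gaussian sample: d * (sum of 12 uniform draws - 6)."""
    return d * (sum(rng.random() for _ in range(12)) - 6.0)

rng = random.Random(1)
samples = [white_noise(0.01, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(mean, std)  # mean near 0, standard deviation near d = 0.01
```

Unlike a true Gaussian variable, the generated values are bounded: they always lie within $[-6d, 6d]$.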
The initial moments of time of event occurrence take random values $\forall i = 1..L$. The game begins with untrained mixed strategies: $p_0^i[j] = 1/N$, $j = 1..N$, $i = 1..L$. The maximum time interval between events equals $t_{\max} = N$ in all experiments. The stochastic game is modelled for $n_{\max} = 10^4$ iterations or until the given level of agent synchronization $\Theta_n \ge K$ is reached.
Fig. 1 shows, on a logarithmic scale, the graphs of the average loss function and of the average number of synchronized players. The results were obtained for the following parameters of the game method: $L = 100$, $N = 2$, $\gamma_0 = 1$, $\varepsilon_0 = 0.999 / N$, $\alpha = 0.5$, $\beta = 1$, $d = 0.01$.
The decrease of the average losses $\Xi_n$ and the growth of the number of synchronized agents $\Theta_n$ testify to the convergence of the game method owing to the fulfilment of the system of criteria (2). As the number of strategies $N$ grows, the number of synchronized agents decreases and the value of the current loss function increases.
Fig. 2 shows the time fronts of event occurrence in the multiagent system for the trained stochastic game of $L = 50$ agents with $N = 4$ strategies. The agents begin the game with different delays $t_0^i$, $i = 1..L$. Despite this, within a small number (~100) of steps they enter the event synchronization mode.
Fig. 1. Characteristics of synchronization of the stochastic game
Fig. 2. The time diagram of event synchronization
As can be seen from Fig. 2, the alignment of the time fronts of the agents' events is the result of learning in the stochastic game. Growth of the number of pure strategies leads to the appearance of front noise caused by the random choice of time intervals between events.
Conclusion
The considered game method, with appropriate adjustment of its parameters, provides synchronization of events in a multiagent system. Theoretically, the values of these parameters should satisfy the basic conditions of stochastic approximation. In practice, the values of the parameters that provide convergence of the game method to one of the points of collective optimality can be determined by computer modelling.
The research carried out allows us to assert that the recurrent game method (3) provides synchronization of events for both synchronous and asynchronous choice of pure strategies and for both non-binary and binary losses.
The results obtained in this article allow optimum values of the parameters of the game method to be established for solving the problem of event synchronization in a multiagent system in the shortest time.
The results of the work can be used for coordinating the work of the components of multiagent systems of different purpose, for message transfer between agents, for constructing communication protocols, and for maintaining the conditions of self-organisation of multiagent systems.
In the proposed model of synchronization of multiagent systems the intellectual level of the agents is limited by the possibilities of the theory of stochastic automata with variable structure. The intelligence of the agents can be increased by drawing on artificial intelligence methods, for example artificial neural networks, Bayesian decision networks, fuzzy logic, reinforcement learning, etc.
References
[1] Lindsey W.C. Synchronization Systems in Communication and Control / W.C. Lindsey. – Englewood Cliffs, New Jersey: Prentice-Hall, 1972.
[2] Блехман И.И. Синхронизация в природе и технике / И.И. Блехман. – М.: Наука, Главная редакция физико-математической литературы, 1981. – 352 с.
[3] Устойчивость, структуры и хаос в нелинейных сетях синхронизации / В.С. Афраймович, В.И. Некоркин, Г.В. Осипов, В.Д. Шалфеев / Под ред. А.В. Гапонова-Грехова, М.И. Рабиновича. – Горький: ИПФ АН СССР, 1989. – 256 с.
[4] Пригожин И. Самоорганизация в неравновесных системах: От диссипативных структур к упорядоченности через флуктуации / И. Пригожин, Г. Николис. – М.: Мир, 1979. – 512 с.
[5] Хакен Г. Информация и самоорганизация. Макроскопический подход к сложным системам: Пер. с англ. / Г. Хакен. – М.: КомКнига, 2005. – 248 с.
[6] Wooldridge M. An Introduction to Multiagent Systems / M. Wooldridge. – John Wiley & Sons, 2002. – 366 p.
[7] Fudenberg D. The Theory of Learning in Games / D. Fudenberg, D.K. Levine. – Cambridge, MA: MIT Press, 1998. – 292 p.
[8] Назин А.В. Адаптивный выбор вариантов: Рекуррентные алгоритмы / А.В. Назин, А.С. Позняк. – М.: Наука, 1986. – 288 с.
[9] Мулен Э. Теория игр с примерами из математической экономики / Э. Мулен. – М.: Мир, 1985. – 200 с.
[10] Граничин О.Н. Введение в методы стохастической аппроксимации и оценивания: Учеб. пособие / О.Н. Граничин. – СПб.: Издательство С.-Пб. университета, 2003. – 131 с.