An Abstract Model of a Coordination Protocol using the UPPAAL model checker
Colm Bhandal, Mélanie Bouroche, Arthur Hughes
School of Computer Science and Statistics
Trinity College
Dublin, Ireland
Email: [email protected], [email protected], [email protected]
Abstract—Comhordú is a coordination model for autonomous mobile wireless real time systems. A formalisation
of Comhordú is developed using the UPPAAL framework. The
formal model is then analysed and simulated, with desirable
properties such as system safety machine checked. The formalisation gives us a better understanding of Comhordú and the
verification of the LTL formulae supports previous claims as
to the correctness of Comhordú.
Keywords-Wireless Networks; Mobile Computing; Agent;
Real Time; Mobile Reactive Systems; Coordination Protocol;
Safety Properties; UPPAAL model checker;
I. INTRODUCTION
The developing paradigm of pervasive systems continues
to pose a number of challenges to researchers across its
many sub disciplines. Mobile reactive systems, composed
of autonomous entities capable of spatial movement, comprise a class of pervasive systems central to which is the
problem of coordination. This class of systems includes,
but is not limited to, systems of robots [1], [2] (e.g. for
space exploration), fleets of autonomous cars [?], and swarms of
unmanned aerial vehicles [?]. Coordination in this setting is
informally defined as the cooperation of entities within the
system towards the production of a result [3]. A coordination
model over some class of systems is a theoretical framework
which embodies the abstract structure of such a system and
provides a strategy for coordination therein. Coordination
models are useful frameworks upon which multi agent
systems (MASs) may be based. However, establishing
the correctness of a model, particularly the protocol
therein, is an essential step to be taken before developers
can trust it and integrate it into the design process.
Furthermore, if a model is to be used at all, it must be clear,
i.e. well defined in some precise language.
© 2011 IEEE. Personal use of this material is permitted. Permission from
IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution
to servers or lists, or reuse of any copyrighted component of this work in
other works.
This work is funded by the Irish Research Council for Science, Engineering and Technology (IRCSET).
For these reasons, we have chosen to formalise and verify
the pre-existing Comhordú model, which was originally
specified using a mixture of English descriptions, some set
notation and UML diagrams [3], [?]. This paper presents
a preliminary step towards formalising Comhordú and verifying some key system properties. The formal model is an
abstraction of the original, and is hence incomplete. Despite
this, the formalisation provides us with some valuable contributions:
• Through the process of formalising the model, a number of ambiguities therein have been discovered and
resolved.
• Some aspects of the model have been reconsidered and
redesigned in light of the formalisation.
• The claim of system safety in [?] has been approximated in this formal setting and verified for a simple
system.
Section II briefly outlines the Comhordú coordination
model. Section III describes the main features of the UPPAAL model of Comhordú. Section IV covers the analysis
of an instance of the UPPAAL model in terms of the
verification of relevant safety and liveness LTL formulae.
Also covered in that section is an example simulation of the
system. In Section V, the achievements and limitations of
this formalisation are discussed and current and future work
on formalising Comhordú is mentioned.
II. A BRIEF OVERVIEW OF COMHORDÚ
Autonomous mobile entities, for example autonomous
cars, evolving in the same environment need to coordinate
their behaviour to ensure some system-wide safety constraints (e.g., that no more than one car crosses a junction at
a given time). These entities are assumed to make decisions
independently, and to be able to communicate with each other
over wireless networks. In traditional consensus-based solutions to this problem (e.g. [?], [?], [?]), each entity awaits
confirmation from its peers before taking any potentially
non-safe action. This approach relies on the assumptions
that entities have access to reliable communication and that
the number of entities in the system is known a priori. In
practice, however, wireless communication is highly unreliable and the achievable performance varies greatly over time
and location [?]. Furthermore, the number of entities in the
system might be unknown and unbounded (e.g., number of
cars around an intersection).
Comhordú [3], [?] provides a responsibility-based alternative for real-time coordination of autonomous mobile
entities. This approach exploits two typical characteristics
of this type of system. Firstly, entities have mode(s) of
operation that ensure that they will not violate safety (for
example, by not entering an intersection). In Comhordú,
such a mode is called a fail-safe mode. Note that progress
must require entities to enter non-fail-safe modes, otherwise
ensuring the safety constraint is trivial. Secondly, entities
that are far enough apart will not violate the safety constraints. This allows for a bound on the scope of interactions
required to ensure safety.
In Comhordú, each entity is initially responsible for
ensuring that its behaviour does not lead to a violation of the
safety constraint, by remaining in a fail-safe mode. Entities
can enter a non-fail-safe mode by periodically sending
messages in their vicinity to entities whose behaviour might
conflict with theirs and lead to a violation of the safety
constraint. Upon reception of such a message, an entity must
adapt its behaviour to guarantee safety, in effect becoming
responsible for ensuring that its behaviour does not conflict
with that of the sender.
To palliate the unreliability of communication, Comhordú
exploits the feedback on the current state of communication provided by the space-elastic adaptive routing (SEAR)
protocol [?]. This protocol provides, within a given time
bound, the area over which a given entity can currently
communicate, called the coverage. Comhordú specifies the
area over which an entity must be able to communicate
to ensure safety, and when the coverage falls below this
threshold, the entity must revert to a fail-safe mode, as it
cannot ensure that those whose behaviour might conflict with
its own will have received its message.
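The coverage rule described above can be illustrated with a minimal sketch. It is not part of Comhordú or SEAR: the class name `CoverageGuard`, the field `required_coverage`, and the numeric values are assumptions made for illustration, and the waiting and reaction times of the real protocol are ignored.

```python
from dataclasses import dataclass


@dataclass
class CoverageGuard:
    """Illustrative only: decides whether an entity may leave fail-safe mode."""

    required_coverage: float  # hypothetical: area needed to ensure safety
    mode: str = "fail_safe"   # every entity starts in a fail-safe mode

    def request_action(self, coverage: float) -> str:
        # An entity may enter a non-fail-safe mode only while its messages
        # can reach every entity whose behaviour might conflict with its own.
        if coverage >= self.required_coverage:
            self.mode = "acting"
        return self.mode

    def on_coverage_update(self, coverage: float) -> str:
        # Coverage below the threshold forces a return to fail-safe mode.
        if coverage < self.required_coverage:
            self.mode = "fail_safe"
        return self.mode


guard = CoverageGuard(required_coverage=100.0)
guard.request_action(coverage=120.0)      # coverage sufficient: may act
guard.on_coverage_update(coverage=80.0)   # coverage degraded: fail-safe again
```

The point of the sketch is that the decision depends only on locally available feedback (the reported coverage), not on acknowledgements from peers.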
The following section abstracts these concepts and formalises them using timed automata.
III. THE ABSTRACT UPPAAL MODEL
UPPAAL [?] is a model checking tool which allows timed
automata to be built, simulated, and analysed. There are
numerous introductory articles and tutorials e.g. [?] available
on UPPAAL. The tool may be downloaded for academic use
from the official UPPAAL webpage: http://www.uppaal.com.
The abstract model of Comhordú consists of UPPAAL
templates. An automaton template is a class of automata
which may be instantiated with parameters, much like a
class in OO programming may be instantiated, yielding
an object. With each template, there will be a number of
abstractions made from the details of the original component
it models. The limitations of these abstractions are discussed
in Section V. Due to space limitations, only the Entity
template is discussed here in detail. It is followed by a
section on the overall working of the model, which provides
an overview of the other components.
A. The Entity Template
Instances of the Entity template will correspond to entities
of the system. As a first abstraction, all entities are assumed
to have the same type, e.g. trucks and cars both fall under the
same banner “entity”. In support of this approximation, note
that the Comhordú protocol abstracts away from entity types:
all entities use the same protocols for sending and receiving
messages. Hence the Entity template describes an entity on
a level of abstraction just detailed enough to capture its
behaviour in terms of the protocol. Figure 1 depicts the
Entity template.
Figure 1. The Entity Template.
The integer parameters to this template are t_react, t_wait,
and period. The parameter t_react is the maximum time it
will take an entity to reach fail safe mode, while t_wait is the
maximum time an entity must wait before entering a critical
mode. The length of time between successive broadcasts
from an entity is denoted by period.
1) Locations: Locations in UPPAAL are discrete symbolic states. A full state of a system of UPPAAL automata
consists of the locations of all the component automata as
well as the values of all the clocks, the latter of which
are real numbers. Focusing now on the entity template, the
locations represent equivalence classes on its modes. Stable
modes are coarsely divided into three classes: fail safe, good
and acting. The first two of these represent all fail safe
modes, the former representing those in which the coverage
is degraded. The third represents an entity in a non fail safe
mode, which will sometimes be referred to here as a critical
mode, performing some task. The remainder of the modes
are transition modes, i.e. those in which the entity is in the
process of transitioning from one stable mode to another.
The locations of Entity are as follows.
• failSafe: An entity is assumed to begin in a fail safe
mode. If the coverage happens to be good to begin with,
this is equivalent to a coverage upgrade happening at
t = 0, hence no loss of generality is incurred by this
assumption.
• good: An entity in this location has good coverage but
does not need to act in a critical mode. The mode
here is fail safe.
• preparing: An entity in this location is beginning to
send messages as per the protocol so that it can begin
to act. While preparing to take action, an entity remains
in fail safe mode.
• acting: An entity in this location is acting in some
critical mode and continually sending messages as per
the protocol in order to ensure other entities remain
in fail safe mode and do not violate the safety of the
system.
• reactingBad & reactingGood: In both cases, the entity is
reacting to a message it has received by transitioning
to fail safe mode. This transition is not instantaneous,
and hence is modelled as one of these two wait states,
which include time guards on their outward edges. An
entity in reactingBad, once it has waited long enough,
will transition to failSafe, the location in which the coverage is degraded. Similarly, an entity in reactingGood
transitions to good, which is also safe, but with the
option of progress due to upgraded coverage.
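To make the roles of these locations concrete, the following untimed sketch encodes them as a small state machine. The edge set is an assumption inferred from the location and channel descriptions in this paper, not a transcription of Figure 1, and the event names `wantToAct`, `waitElapsed` and `reactElapsed` are hypothetical labels standing in for internal or time-guarded edges.

```python
from enum import Enum


class Loc(Enum):
    FAIL_SAFE = "failSafe"
    GOOD = "good"
    PREPARING = "preparing"
    ACTING = "acting"
    REACTING_BAD = "reactingBad"
    REACTING_GOOD = "reactingGood"


# (location, event) -> next location. Assumed edges, for illustration only.
EDGES = {
    (Loc.FAIL_SAFE, "adaptNotif"): Loc.GOOD,            # coverage upgraded
    (Loc.GOOD, "adaptNotif"): Loc.FAIL_SAFE,            # coverage degraded
    (Loc.GOOD, "wantToAct"): Loc.PREPARING,             # hypothetical event
    (Loc.PREPARING, "adaptNotif"): Loc.FAIL_SAFE,       # degraded before acting
    (Loc.PREPARING, "waitElapsed"): Loc.ACTING,         # stands for the t_wait guard
    (Loc.ACTING, "msgIn"): Loc.REACTING_GOOD,           # transfer message received
    (Loc.ACTING, "adaptNotif"): Loc.REACTING_BAD,       # coverage degraded
    (Loc.REACTING_GOOD, "reactElapsed"): Loc.GOOD,      # stands for the t_react guard
    (Loc.REACTING_BAD, "reactElapsed"): Loc.FAIL_SAFE,
}


def step(loc: Loc, event: str) -> Loc:
    """Take the edge for this event, or stay put if no edge is defined."""
    return EDGES.get((loc, event), loc)


def run(events, start: Loc = Loc.FAIL_SAFE) -> Loc:
    """Replay a sequence of events from the initial location."""
    loc = start
    for event in events:
        loc = step(loc, event)
    return loc
```

For example, `run(["adaptNotif", "wantToAct", "adaptNotif"])` ends in `Loc.FAIL_SAFE`: the entity gains coverage, starts preparing, and falls back when the coverage degrades before it begins to act.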
2) Channels: The channels of Entity are as follows:
• adaptNotif: adaptNotif is used by the entity to receive
messages about its state of communication from the
commState process, which represents the underlying
space elastic model. Every message received indicates
that the coverage has been toggled, i.e. from bad to good
or vice versa.
• msgIn: This channel is used to receive transfer messages from other entities. Once a message is received on
this channel, an entity, if it is acting, will immediately
initiate a transition to a fail safe state, i.e. good or
failSafe.
• msgOut: This is used to communicate to other entities
that this entity needs them to transition/remain in fail
safe modes in order for it to be able to begin/continue
acting.
B. The Model as a Whole
The model as a whole involves a set of entities, a set
of communication states, one per entity, and a variety of
buffers to model messages in transit. The exact system of
components that was analysed as part of this work will
be presented not here, but in Section IV. This section will
instead focus on the two main dynamics of the system: the
sending of adaptNotif messages and the sending of transfer
messages.
1) Semantics of Sending a Transfer Message: Figure 2
illustrates the semantics of a transfer message in this UPPAAL system. This is an informal diagram illustrating the
main components which feature in the typical journey of a
transfer message. The dynamics of such a journey will be
explained via these components. Note that this is a typical
situation, i.e. it is one of m possible situations, each situation
being one of the m entities sending a message. It has been
convenient to choose e_m as the sender in this case and, since
all entities in this system are identical, there is no loss of
generality in this choice. Let us now explore the journey
of a transfer message, beginning at the sending entity and
ending at the receiving entities.
Figure 2. Diagram illustrating the logical connection between components
involved in the sending of a transfer message.
• e_m: This is the sending entity. When this entity sends a
message, it is received by commState_m and continues
its route from there.
• commState_m: If the state is Bad, then the message
from e_m is received and then discarded instantly. This
is modelled as a self-looping edge in the commState
template, which could not be included here for space reasons. If the state is Good, then this process
forwards the message on to its next stage: to one
of the bufferMsgControl_{m,i} processes. commState_m
may be viewed as a filter, only allowing a message
into transit if the state of communication is good and
otherwise effectively discarding the message.
• bufferMsgControl_{m,i}: When this receives a message
from commState_m, it forwards the message across a
broadcast channel to the m − 1 buffers under its control,
i.e. each bufferMsg_{m,i,j}, and enters a state of waiting
until these buffers finish delivering the message. This
process is essentially a broadcasting hub.
• bufferMsg_{m,i,j}: On receipt of a broadcast message
from the corresponding hub, bufferMsgControl_{m,i},
this process holds the message for msgLatency units
of time and then delivers it on to e_j. If this process
receives a cancel broadcast directly from the commState, as per the scenario described in Section III-B2,
then it may lose the message pending delivery, i.e.
return to a waiting state without forwarding the message
to its final destination. This process incorporates both
message delay and partial message loss. Note that other
solutions based on broadcast would not accommodate
this model of partial message loss. Rather, they would
yield a less realistic “all or nothing” model.
• e_j: This is one of the m − 1 receiving entities.
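The journey described above can be summarised as a filter followed by a fan-out with per-receiver delay and loss. The sketch below is a plain function, not the UPPAAL templates: `route_transfer` and its parameters are hypothetical names, and the nondeterministic loss of a cancelled message is replaced by an explicit `cancelled` set chosen by the caller.

```python
def route_transfer(sender, entities, coverage_good, sent_at, msg_latency,
                   cancelled=frozenset()):
    """Return {receiver: delivery_time} for one transfer message.

    coverage_good models the commState filter, the loop models the
    broadcasting hub, and each dictionary entry models one message buffer.
    """
    if not coverage_good:
        # Bad communication state: the message is discarded immediately.
        return {}
    deliveries = {}
    for receiver in entities:
        if receiver == sender:
            continue  # the hub only feeds the m - 1 other entities
        if receiver in cancelled:
            continue  # this buffer lost its pending message
        deliveries[receiver] = sent_at + msg_latency
    return deliveries
```

Because loss is decided per buffer, a single message may reach some receivers and not others, which is the partial message loss that a pure broadcast would not capture.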
2) Semantics of Sending an AdaptNotif Message: The
journey of an adaptNotif message is similar to that of a
transfer message in that a number of components play a
part in it and buffering is involved. Figure 3 depicts the setup
involving the passage of such a message through the system.
Again, this setup will be explained starting with the first
component in the journey of the message, the commState
whose coverage has been toggled, and ending with the last
components, the corresponding entity and buffers.
Figure 3. Diagram illustrating the logical connection between components
involved in the sending of an adaptNotif message.
• commState_m: This is the commState process associated with the entity e_m. When this process takes an
edge from good to bad or vice versa, the coverage state
has been toggled, and an adaptNotif message is sent
to one of the buffers associated with commState_m,
namely some bufferAN_{m,i}. If the transition of coverage is from good to bad, then another message is
also sent, a cancel message to every bufferMsg_{m,k,i},
i.e. all messages from the entity e_m are potentially
cancelled.
• bufferAN_{m,i}: Each one of these buffers is available
to receive an adaptNotif message from commState_m
and effectively delay the delivery of this message on to
e_m by adaptNotif units of time.
• e_m: This is the receiving entity of the adaptNotif
message.
• bufferMsg_{m,k,i}: When a cancel message is received
by this process, while a message is pending delivery
from it, it may potentially transition back to the wait
state without sending on the message. This simulates
possible message loss.
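A minimal sketch of the coverage toggle follows, assuming a single adaptNotif buffer per entity. The class `CommState`, the parameter `adapt_latency` (standing in for the adaptNotif delay) and the textual buffer labels are illustrative assumptions and do not appear in the UPPAAL model.

```python
from dataclasses import dataclass


@dataclass
class CommState:
    owner: str
    good: bool = False  # coverage is assumed degraded initially

    def toggle(self, now, adapt_latency, peers):
        """Flip the coverage and return the messages this emits."""
        self.good = not self.good
        # The owner always learns of the change, after a buffering delay.
        events = [("adaptNotif", self.owner, now + adapt_latency)]
        if not self.good:
            # good -> bad: pending transfer messages from the owner may be lost,
            # so every message buffer fed by the owner receives a cancel.
            events += [("cancel", f"bufferMsg[{self.owner}->{p}]", now)
                       for p in peers]
        return events
```

Calling `toggle` twice on a fresh `CommState("e1")` first yields only the delayed adaptNotif (bad to good), then the adaptNotif together with one cancel per peer (good to bad).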
IV. MODEL ANALYSIS AND SIMULATION
This section introduces the components making up the
exact UPPAAL system which instantiates the templates discussed thus far. An example simulation through this system
is then given, demonstrating part of its behaviour. Finally, a
collection of LTL formulae are examined. These formulae
have been verified by the analysis engine of UPPAAL.
A. The System: A Network of Automata
The system in question consists of two entities, entity1
and entity2, along with two commStates, commState1
and commState2, each matched to its respective entity.
The adaptation notification buffers for entity1 and entity2
are bufferAN1 and bufferAN2 respectively. Due to
adjustments made to the constants, only one adaptation
notification message will ever be in transit at any one time,
hence one buffer per entity suffices. Similar adjustments
lead to the necessity of only two buffer message controllers, bufferMsgControl1 and bufferMsgControl2,
one for each entity. Since there are only two entities, it
follows that only one message buffer is present for each
of these controllers. These are called bufferMsg1_2 and
bufferMsg2_1 respectively. These automata composed together in parallel comprise the overall system, which is a
network of concurrently operating automata.
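The composition of this network can be enumerated mechanically. The sketch below assumes, as in the instance above, one adaptation notification buffer and one message controller per entity. The function name `system_components` and the underscore in the message buffer names are assumptions, and this is not UPPAAL system declaration syntax.

```python
def system_components(m):
    """List the automata in a network of m entities (illustrative naming)."""
    names = []
    for i in range(1, m + 1):
        names += [f"entity{i}", f"commState{i}",
                  f"bufferAN{i}", f"bufferMsgControl{i}"]
        # One message buffer per (sender, receiver) pair with distinct ends.
        names += [f"bufferMsg{i}_{j}" for j in range(1, m + 1) if j != i]
    return names
```

Under these assumptions the network has m(m + 3) automata: 10 for the two-entity instance analysed here, and 18 for three entities, which hints at how quickly the state space grows.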
1) An Example Simulation: Figure 4 shows an example
simulation of the system as a message sequence chart
(MSC). The MSC was generated with the aid of the simulator tool in UPPAAL. This is just one of an infinite number
of possible finite simulations (i.e. simulations whose traces are finite) of the system. An MSC is a
two dimensional depiction of a simulation of some network
of components. Along the horizontal axis are laid out the
components. Along the vertical axis runs logical time, i.e. the
chronological order of events is maintained vertically from
top to bottom. If some component changes state at some
time, then the new state is shown at the point on the MSC
corresponding to that component and that time. Similarly, if
a message is sent from one component to another at some
time, then an arrow is drawn in the chart at the time in
question connecting the lifelines of these two components.
The extrusion of a component through time is referred to here as its
lifeline.
Figure 4. An example simulation as a message sequence chart (MSC).
Due to space limitations, we zone in on only some
components. The trace begins with a coverage enhancement
for entity1, i.e. the communication state of entity1 enters
the location Good. As per the space elastic model, an
adaptNotif message is sent to update entity1, informing
it of the new coverage. The message is not sent directly to
entity1. Rather, it is sent via bufferAN1, which models
the message in transit by entering the location Pending.
After some time elapses, the buffered message is relayed
to entity1, which enters the good location. Simultaneously,
the buffer is emptied. Now, the coverage degrades with
commState1 entering the location Bad and bufferAN1
once again begins buffering an adaptNotif message to
update entity1. However, before entity1 gets this message,
it sends a transfer message msgEC1 as per the protocol.
This message is discarded by the communication state,
which is in a bad state, i.e. a state modelling message loss.
Finally, entity1 is notified of the degradation in coverage
before it starts acting and hence immediately returns to the
failSafe location.
B. Properties Analysed within the System
Four safety properties were specified with respect to this
UPPAAL system of components. Each of these properties
was analysed via the UPPAAL analysis engine and verified
for this example model. These properties will now be
discussed:
• A[] not deadlock: A desirable property of most reactive
systems such as this one is that it is without deadlock.
Deadlock is a property satisfied by any system state
which has no outgoing transitions. Transitions here can
mean either an enabled edge, possibly accompanied by
an action, or a transition in time. In the latter case, the
location remains the same but time passes on, changing
the overall state. The entire property stated here uses
universal quantification over all states of the system
with A[]. Hence the property in full says that no state
of the system is ever deadlocked.
• A[] not (entity1.acting && entity2.acting): This is
the first of three mutual exclusivity properties. These
properties are highly important in that their satisfaction
guarantees the safety constraint in an abstract sense.
This particular property states that in all states of the
system, both entities are not acting at the same time.
• A[] not (entity1.acting && entity2.reactingGood): This
property states that there is never a state in the system
such that entity1 is acting and entity2 is reacting with
its coverage good. The only state entity2 should be in
if entity1 is acting is one of the two fail safe states.
Notice that, since the system is symmetric with respect
to the entities, there is no need to check this property
with the order of entity1 and entity2 reversed.
• A[] not (entity1.acting && entity2.reactingBad): This
is similar to the previous property, only entity2 is
reacting with its coverage bad.
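To illustrate what a property of the form A[] p asserts, the following sketch explores every reachable state of a small, untimed toy system and checks an invariant and deadlock freedom by breadth-first search. The toy transition relation is a hypothetical stand-in (an entity may start acting only when both are in failSafe). It is not the UPPAAL model, and it ignores clocks, messages and coverage entirely.

```python
from collections import deque


def reachable(initial, successors):
    """Yield every state reachable from `initial` (breadth-first)."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        yield state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


def holds_globally(initial, successors, predicate):
    """Untimed analogue of A[] predicate: true in all reachable states."""
    return all(predicate(s) for s in reachable(initial, successors))


def has_deadlock(initial, successors):
    """True if some reachable state has no outgoing transition."""
    return any(not list(successors(s)) for s in reachable(initial, successors))


def with_location(state, index, location):
    """Copy of `state` with the entity at `index` moved to `location`."""
    return tuple(location if k == index else s for k, s in enumerate(state))


def toy_successors(state):
    # Hypothetical rule: an entity may act only if both are in failSafe,
    # and an acting entity may return to failSafe at any time.
    for i in (0, 1):
        if state[i] == "failSafe" and state[1 - i] == "failSafe":
            yield with_location(state, i, "acting")
        elif state[i] == "acting":
            yield with_location(state, i, "failSafe")


INITIAL = ("failSafe", "failSafe")
mutual_exclusion = holds_globally(
    INITIAL, toy_successors, lambda s: s != ("acting", "acting"))
deadlock_free = not has_deadlock(INITIAL, toy_successors)
```

In this toy system both checks succeed. UPPAAL performs the corresponding exploration symbolically over clock valuations, which this sketch does not attempt.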
The final three of these properties embody a rudimentary
notion of safety, which is essentially the notion that no
two entities should be in non fail safe modes at the same
time. That the protocol guarantees mutual exclusion in this
simplified context is evidence towards the stronger guarantee
hypothesised in [?], that a more general safety constraint
holds. Though in this model there is no spatial information,
and non fail safe modes are indistinguishable, the mutually
exclusive essence of the protocol is captured and proved
correct.
This model does not enforce liveness, i.e. the property
that every entity eventually gets to act. A property that does
hold is E<> entity1.acting, stating that it is possible for
an entity to reach a critical state. However, the property
A<> entity1.acting, which asserts that entities will always
eventually act, is not satisfied. This comes as no surprise,
particularly when one considers that entities have the possibility of delaying indefinitely in wait states, perhaps due to
degraded coverage. The refutation of this property should not
be construed as a problem; our aim here is to demonstrate
the safety of the Comhordú protocol, not the fairness or
liveness of some particular scheduling strategy, which would
be implementation dependent.
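The difference between the two path quantifiers can be seen on a small untimed graph. In the sketch below, `possibly` corresponds to E<> (some path reaches the goal) and `inevitably` to A<> (every maximal path reaches the goal). The graph `TOY` is a hypothetical two-entity abstraction introduced only for illustration, and the functions ignore the timing aspects that UPPAAL takes into account.

```python
def possibly(initial, successors, goal):
    """Untimed analogue of E<> goal: some reachable state satisfies goal."""
    seen, stack = {initial}, [initial]
    while stack:
        state = stack.pop()
        if goal(state):
            return True
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def inevitably(initial, successors, goal):
    """Untimed analogue of A<> goal on a finite graph."""
    on_path, finished = set(), set()

    def avoidable(state):
        # True if some maximal path from `state` never satisfies goal.
        if goal(state):
            return False
        if state in on_path:
            return True   # a cycle made only of non-goal states
        if state in finished:
            return False  # already shown to lead to goal on every path
        on_path.add(state)
        nxt = list(successors(state))
        result = (not nxt) or any(avoidable(s) for s in nxt)
        on_path.discard(state)
        finished.add(state)
        return result

    return not avoidable(initial)


# Hypothetical toy graph: either entity may act while the other is failSafe.
TOY = {
    ("failSafe", "failSafe"): [("acting", "failSafe"), ("failSafe", "acting")],
    ("acting", "failSafe"): [("failSafe", "failSafe")],
    ("failSafe", "acting"): [("failSafe", "failSafe")],
}


def entity1_acting(state):
    return state[0] == "acting"


start = ("failSafe", "failSafe")
can_act = possibly(start, TOY.__getitem__, entity1_acting)
must_act = inevitably(start, TOY.__getitem__, entity1_acting)
```

Here `can_act` is true while `must_act` is false, because the second entity can alternate between acting and failSafe forever without the first ever acting. This mirrors the situation described above, where indefinite delay makes the A<> property fail without affecting safety.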
V. CONCLUSIONS AND ONGOING WORK
This paper presents a formal abstraction of the Comhordú
model in UPPAAL, the correctness of which is machine verified by the UPPAAL automatic verification tool. What we can
infer from such an abstraction is limited, since spatial
details are neglected, modes are grouped under a coarse
equivalence, and the model instance checked consists of no
more than two entities. However, the results are promising,
particularly in light of the small scope hypothesis of [?],
in that they agree with previous intuitions regarding the
correctness of the protocol in terms of its safety guarantee.
Limitations of this work include, but are not limited to,
the following:
• Model Size: while we could have built larger models,
we decided to focus our attention elsewhere first, on a
more detailed formalisation of Comhordú, before more
work on model checking and analysis was undertaken.
Furthermore, checking larger and larger models quickly
becomes infeasible due to state space explosion.
• Admittedly, model checking itself has its limits in
that we are only ever checking instances of a class
of systems and not proving properties over the whole
class. For the latter, we may in future turn to theorem
proving.
• Abstractions, such as removing spatial information,
have provided us with a tractable but incomplete model.
More consideration is needed to justify omissions or
approximations of spatial data.
Currently, work is underway on a full process algebraic
description of Comhordú. The language of the new model is
TCBS’, a derivative of the TCBS developed in [?], [?], [?].
The focus is now on establishing the correct foundations,
i.e. a proper formal model with no major losses of detail,
rather than on model checking. Once that model has been
thoroughly analysed and deemed precise, we may in future
begin building and machine checking another UPPAAL
model, or perhaps a model using a different tool.
We would like to acknowledge and thank the anonymous
reviewers whose feedback has been helpful in steering this
work.
REFERENCES
[1] N. Kubota, Y. Nojima, N. Baba, F. Kojima, and T. Fukuda,
“Evolving pet robot with emotional model,” in Proceedings of
the 2000 Congress on Evolutionary Computation. IEEE, 2000,
vol. 2, pp. 1231–1237.
[2] D. J. Klein, V. Gupta, and K. A. Morgansen, “Coordinated
control of robotic fish using an underwater wireless network,”
in Wireless Networking Based Control. Springer, 2011, pp.
323–339.
[3] M. Bouroche, “Real-time coordination of mobile
autonomous entities,” Ph.D. dissertation, School of
Computer Science and Statistics, Trinity College Dublin,
2007. [Online]. Available: https://www.cs.tcd.ie/publications/
tech-reports/reports.08/TCD-CS-2008-25.pdf