Chronicles Construction Starting from the Fault Model
of the System to Diagnose
Bruno Guerraz¹ and Christophe Dousson²
Abstract. This article addresses the diagnosis of distributed systems such as telecommunication networks. Among the various techniques used for on-line diagnosis, we focus on chronicle recognition. A chronicle is a set of observable event patterns subject to temporal constraints. In the framework of diagnosis, the set of chronicles gives the signatures of the system failures and constitutes the expertise needed for diagnosis. As for rule-based systems, the acquisition of this expertise is problematic. We propose a method based on Petri net unfolding to generate the chronicles necessary for diagnosis, starting from the fault model of the system to be diagnosed.
1 Introduction
This paper deals with the monitoring of dynamic systems such as telecommunication networks: more precisely, we address the problem of following the dynamic changes of the system in order to detect (and identify) faulty states. Our approach relies on the fault model of the system. Modeling is based on a tile representation introduced in [1]: a tile is a transition of a partial state of the system. The advantage of such tile-based modeling with respect to more classical Petri net-based approaches is that knowledge of the global behavior of the system is not required. Moreover, tiles are generic: the same tile can be used several times in the same model.
Telecommunication networks are characterized, on the one hand, by the large size of the model and, on the other hand, by the fact that most events are unobservable; moreover, several transitions may have the same label (an alarm generated by the network). These characteristics lead to ambiguity when diagnosing on-line directly with the model, and thus to a combinatorial explosion of explanations. Several methods have been proposed to solve this problem. [5] proposes a stochastic approach: the diagnosis problem is defined as the computation of the most likely history of a partially stochastic Petri net given a sequence of observed alarms. Another approach is to compile the model off-line in order to minimise the on-line work. [11] proposes such an approach: a structure called a “diagnoser”, built from the global model of the system, is used for on-line diagnosis.
In this work, we take time information into account in the model and we propose a way to “compile” the fault model of the system into chronicles. These chronicles then allow the system to be diagnosed by on-line recognition.
A chronicle represents a piece of evolution of the observed system. A chronicle is a partial order of observable events constrained by time and labeled by a faulty situation. Figure 1 shows a chronicle which contains five events. Its interpretation is the following: the event “B” must occur between 3 and 4 units of time after “A”, the event “C” must occur between 10 and 15 units of time after “B” and between 2 and 5 units of time after “D”, and so on.
¹ France Telecom R&D, 2 avenue Pierre Marzin, 22307 Lannion cedex, France. e-mail: [email protected]
² Id., e-mail: [email protected]
Figure 1. A chronicle: a partial order of observable events with some time constraints between them.
More recently, [7] proposed the notion of template languages, which are similar to chronicles (events constrained by time), with the difference that templates model the correct behavior of the system and are used to confirm that the system behaves correctly. By contrast, in the “chronicle-based diagnosis” proposed in [2], chronicles model the faulty behavior of a system, and diagnosis consists in tracking on-line the occurrences of these “abnormal” chronicles. The advantages of this approach are, on the one hand, the simplicity and expressive power of the formalism and, on the other hand, the efficiency of the on-line recognition tools.
In the context of telecommunication network monitoring, each abnormal situation can be described by one or more chronicles, the events correspond to time-stamped alarms and the constraints relate to their occurrence dates. “Chronicle-based diagnosis” consists in tracking on-line the occurrences of these chronicles in the flood of alarms raised by telecommunication equipment.
In section 2, we describe an application example that we use to illustrate most of the concepts presented later. In section 3, we introduce the initial tile model and show how it is refined by adding time to define temporal tiles. Then, in section 4, we present an algorithm which builds a structure containing all the possible behaviors of a temporal tiles system, starting from the initial temporal tiles: the time branching process. This algorithm is inspired by Petri net unfolding, a well-known partial order semantics of Petri nets introduced in [10] and described in more detail in [3]. In section 5, we explain how we extract chronicles from the maximal time occurrence net and how we build all the chronicles necessary for the diagnosis of the system.
2 Application example
We consider an example taken from our application context, which is the monitoring of a Synchronous Digital Hierarchy (SDH) telecommunication network. The whole SDH model describes the behavior of the network when a fault occurs and how its effects are propagated across all the equipment. In the general case, faults are propagated through the hierarchy of SDH layers of a given piece of equipment (vertical propagation) and also among pieces of equipment (horizontal propagation) through the links and connections.
Figure 2 gives an example of a laser breakdown on the physical layer (optical fiber): (1) the emitter detects the problem and generates a Transmission Failed (TF) alarm and (2) the right-hand receiver of the peer detects the absence of signal and generates a Loss Of Signal (LOS) alarm. This behavior corresponds to the ITU-T Recommendations on SDH networks³. At this stage, a safety mechanism (3) called “Automatic Laser Switch” (ALS) stops the emitter in order to avoid damage caused by a laser emitting on a cut fiber. Then, as before, the emitter in turn generates a TF, and finally (4) the left-hand receiver generates a LOS. All these steps are also accompanied by vertical propagations, which are not described in this paper but can be found in [1].
Figure 2. Example of the laser emitter breakdown: (1) the transmitter breaks down and generates a TF alarm; (2) the receiver notices an absence of signal and generates a LOS alarm; (3) the transmitter stops automatically and generates a TF alarm; and (4) the receiver in turn generates a LOS alarm.
Figure 3. A tile: if the system is in the partial state {R:failed, E:ok}, then the dumb label ε is emitted, the value of variable “R” remains unchanged and the value of variable “E” becomes “failed”.
3.2 Time
We consider time as a linearly ordered discrete set of instants whose resolution is sufficient for modeling the dynamics of the environment (i.e. any change can be adequately represented as taking place at some instant of the set). Our time-map manager relies on time points as elementary primitives, and we handle time constraints as binary constraints between them.
We define a time constraint graph T as a set of time points with time constraints between them. These constraints are numerical and are expressed as pairs of real numbers $C_T(t, t') = [I^-, I^+]$ corresponding to the lower and upper bounds on the temporal distance from t to t'. The special values +∞ and −∞ can also be used (so the symbolic constraint “after” can be expressed as [0, +∞]). Notice also that if t or t' is not defined in the time graph T, then we can always define $C_T(t, t') = [-\infty, +\infty]$ (in other words, if a time point is not constrained by a graph, then the default constraint is [−∞, +∞]).
We use the following operators for constraint propagation (⊕) and constraint conjunction (∩):
$$I \oplus J = [I^- + J^-,\; I^+ + J^+] \quad (1)$$
$$I \cap J = [\max(I^-, J^-),\; \min(I^+, J^+)] \quad (2)$$
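To make equations (1) and (2) concrete, here is a minimal Python sketch, assuming an interval is a pair (lo, hi) and using float('inf') for the infinite bounds; the function names `propagate`, `conjoin` and `is_empty` are ours, not the paper's. The example reuses the constraints of figure 1.

```python
INF = float("inf")


def propagate(i, j):
    """I ⊕ J (eq. 1): from C(t, t') = I and C(t', t'') = J, bound C(t, t'')."""
    return (i[0] + j[0], i[1] + j[1])


def conjoin(i, j):
    """I ∩ J (eq. 2): both constraints hold on the same pair of time points."""
    return (max(i[0], j[0]), min(i[1], j[1]))


def is_empty(i):
    """An empty interval means the constraints cannot be satisfied together."""
    return i[0] > i[1]


# Figure 1: B is 3 to 4 units after A, C is 10 to 15 units after B.
a_to_c = propagate((3, 4), (10, 15))
print(a_to_c)                              # (13, 19)
print(conjoin(a_to_c, (0, 14)))            # (13, 14)
print(is_empty(conjoin((1, 2), (3, 5))))   # True
```

Lower bounds are never +∞ and upper bounds never −∞, so the sums in `propagate` never mix opposite infinities.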
We also define a partial order relation (denoted by ⪯) between two time graphs as follows:
$$T \preceq T' \;\equiv_{def}\; \forall (t, t') \in T^2,\; C_T(t, t') \subseteq C_{T'}(t, t') \quad (3)$$
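Relation (3) can be checked mechanically. The sketch below is illustrative only: it assumes a time graph is a dictionary {(t, t'): (lo, hi)} in which each pair is stored in one direction, derives the reverse direction as [−hi, −lo], and falls back on the default [−∞, +∞]; `interval` and `preceq` are hypothetical names.

```python
INF = float("inf")


def interval(graph, a, b):
    """C_T(a, b), assuming each unordered pair is stored in one direction only."""
    if (a, b) in graph:
        return graph[(a, b)]
    if (b, a) in graph:
        lo, hi = graph[(b, a)]
        return (-hi, -lo)          # C(b, a) = [lo, hi]  <=>  C(a, b) = [-hi, -lo]
    return (-INF, INF)             # default constraint for unconstrained points


def preceq(t, t_prime):
    """T ⪯ T' (eq. 3): every constraint of T is included in the one of T'."""
    points = {p for pair in t for p in pair}
    for a in points:
        for b in points:
            lo, hi = interval(t, a, b)
            lo2, hi2 = interval(t_prime, a, b)
            if not (lo2 <= lo and hi <= hi2):
                return False
    return True


tight = {("tTF", "tLOS"): (1, 2)}
loose = {("tLOS", "tTF"): (-5, -1)}    # i.e. C(tTF, tLOS) = [1, 5]
print(preceq(tight, loose), preceq(loose, tight))  # True False
```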
3 Representation

3.1 Tiles
The tile formalism was introduced in [6]. Let $\mathcal{V}$ be a finite set of variables v. Each variable v takes its value in a finite domain denoted by $D_v$. The set of states of the system is defined by $X_{\mathcal{V}} = \prod_{v \in \mathcal{V}} D_v$. Thus, for $V \subseteq \mathcal{V}$, we can define the set of local states by $X_V = \prod_{v \in V} D_v$. Elements of $X_V$ are denoted by $x_V$ and, for $v \in V$, we denote by $v(x_V)$ the value of the variable v in the state $x_V$. In the sequel, transitions between two local states will be referred to as tiles. Formally, a tile is a quadruple $\theta = \langle V, x_V^-, \alpha, x_V^+ \rangle$, where V is a subset of variables and $(x_V^-, \alpha, x_V^+)$ is a transition between the local states $x_V^-$ and $x_V^+$. α labels the transition and ranges over a set of possible event labels which contains the dumb label, denoted by ε. Figure 3 shows a tile modeling the ALS mechanism of section 2. Variables are “E” for Emission and “R” for Reception. Variables are represented by circles and the transition by a box. We also denote the pre-condition and the post-condition of a tile θ by $pre(\theta) = (V, x_V^-)$ and $post(\theta) = (V, x_V^+)$.
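A minimal sketch of an (atemporal) tile, assuming global states are Python dictionaries from variable names to values; the class `Tile` and its methods `enabled` and `fire` are illustrative names, not part of the paper. The instance below is the ALS tile of figure 3.

```python
from dataclasses import dataclass

EPSILON = "ε"  # the dumb (unobservable) label


@dataclass
class Tile:
    """θ = <V, x_V^-, α, x_V^+>; V is given by the keys of pre and post."""
    pre: dict    # local state x_V^- : variable -> value
    label: str   # event label α
    post: dict   # local state x_V^+ over the same variables

    def __post_init__(self):
        if self.pre.keys() != self.post.keys():
            raise ValueError("pre- and post-condition must range over the same V")

    def enabled(self, state):
        """The tile applies when the global state agrees with x_V^- on V."""
        return all(state[v] == x for v, x in self.pre.items())

    def fire(self, state):
        """Successor state: variables of V take their x_V^+ value, others are kept."""
        if not self.enabled(state):
            raise ValueError("tile not enabled in this state")
        return {**state, **self.post}


als = Tile(pre={"R": "failed", "E": "ok"}, label=EPSILON,
           post={"R": "failed", "E": "failed"})
print(als.fire({"R": "failed", "E": "ok", "O": "enabled"}))
# {'R': 'failed', 'E': 'failed', 'O': 'enabled'}
```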
³ ITU-T Recommendation G.774: “Synchronous digital hierarchy (SDH) Management information model for the network element view”.
Due to constraint propagation, a constraint graph may have many equivalent representations (two constraint graphs are equivalent if they admit exactly the same set of solutions for all the $t_i$). But we can prove that there exists a unique equivalent constraint graph which is minimal (in the sense of the relation defined by equation 3). It is computed by the path-consistency algorithm PC-2 with a complexity of $O(n^3)$ [8]. In the following, we denote this minimal graph by $\widehat{T}$; it is the canonical representative of its equivalence class.
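The paper computes the minimal graph with PC-2 [8]. As a swapped-in technique, the sketch below uses the Floyd-Warshall all-pairs shortest-path algorithm on the distance graph associated with the constraints, which yields the same minimal graph when each pair of time points carries a single interval, also in O(n³); a negative cycle signals an inconsistent graph. The dictionary representation and the name `minimal_graph` are assumptions.

```python
INF = float("inf")


def minimal_graph(points, constraints):
    """Minimal equivalent graph of {(a, b): (lo, hi)}, or None if inconsistent.

    C(a, b) = [lo, hi] means lo <= t_b - t_a <= hi; it becomes the two edges
    a -> b (weight hi) and b -> a (weight -lo) of a distance graph.
    """
    n = len(points)
    idx = {p: i for i, p in enumerate(points)}
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (a, b), (lo, hi) in constraints.items():
        i, j = idx[a], idx[b]
        d[i][j] = min(d[i][j], hi)
        d[j][i] = min(d[j][i], -lo)
    for k in range(n):                       # Floyd-Warshall, O(n^3)
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(d[i][i] < 0 for i in range(n)):   # negative cycle: no solution
        return None
    return {(a, b): (-d[idx[b]][idx[a]], d[idx[a]][idx[b]])
            for a in points for b in points if a != b}


g = minimal_graph(["a", "b", "c"],
                  {("a", "b"): (1, 5), ("b", "c"): (1, 5), ("a", "c"): (0, 3)})
print(g[("a", "b")], g[("a", "c")])          # (1, 2) (2, 3)
print(minimal_graph(["a", "b", "c"],
                    {("a", "b"): (3, 5), ("b", "c"): (3, 5), ("a", "c"): (0, 4)}))  # None
```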
In order to merge the constraints of two time graphs, we also define the union (∪) of time graphs. The resulting graph $T = T' \cup T''$ contains all the time points of T' and T'', and its constraints are defined as follows⁴:
$$\forall (t, t') \in T^2,\; C_T(t, t') = C_{T'}(t, t') \cap C_{T''}(t, t') \quad (4)$$
Figure 4 shows the result of the union of two time constraint graphs, T = T' ∪ T''. Of course, a graph resulting from the union of two graphs can be inconsistent. For instance, if the constraint $C_{T'}(t_2, t_3) = [5, 13]$ of the figure is replaced by [0, 0], then T is not consistent.
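Equation (4) in Python, as a sketch under the same dictionary representation (an assumption, as is the name `union`); the numbers are illustrative, not those of figure 4. An empty interval (lo > hi) reveals an inconsistency at once, but in general the union must also be propagated (for example by computing its minimal graph) before it can be declared consistent.

```python
INF = float("inf")


def union(t1, t2):
    """T = T' ∪ T'' (eq. 4): keep every time point, intersect shared constraints.

    A pair missing from one graph gets the default [-inf, +inf], so it simply
    keeps the interval of the other graph. Pairs are assumed to be stored in
    the same direction in both graphs.
    """
    out = dict(t1)
    for pair, (lo, hi) in t2.items():
        lo1, hi1 = out.get(pair, (-INF, INF))
        out[pair] = (max(lo, lo1), min(hi, hi1))
    return out


t_prime = {("t1", "t2"): (10, 20), ("t2", "t4"): (1, 2)}
t_second = {("t1", "t2"): (6, 15), ("t2", "t3"): (5, 10)}
t = union(t_prime, t_second)
print(t)  # {('t1', 't2'): (10, 15), ('t2', 't4'): (1, 2), ('t2', 't3'): (5, 10)}
```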
⁴ We have the following relation: $(T' \cup T'') \preceq T'$.
Figure 4. The union of two time constraint graphs: T = T' ∪ T''.

3.3 Temporal tiles
A temporal tile is a couple $\tau = \langle \theta, T \rangle$ where θ is the atemporal part of the tile, defined as in section 3.1, and T is the temporal part of the tile, defined as a time graph according to section 3.2.
The links between the pre- and post-conditions of θ and the instants of the time graph are ensured through the changes of value of the variables.
We denote by $t_{v(x_V)}$ the instant when the variable v passes from a value different from $v(x_V)$ to $v(x_V)$, and by $t_\alpha$ the date when event α occurs. All these instants are in the time graph associated with the tile τ; therefore, any kind of constraint can be set between any couple of such instants. Some constraints on a temporal tile are due to a domain axiom: a variable cannot change value twice at the same instant. So, for a tile $\tau = \langle \theta, T \rangle$ with $\theta = \langle V, x_V^-, \alpha, x_V^+ \rangle$, for all $v \in V$ such that $v(x_V^-) \neq v(x_V^+)$, the following constraints are added:
$$C_T\big(t_{v(x_V^-)},\, t_{v(x_V^+)}\big) = [1, +\infty] \quad (5)$$
A system of temporal tiles is a triple $\Sigma = \langle \mathcal{V}, X_0, \mathcal{T} \rangle$ where $\mathcal{V}$ is a finite set of variables, $X_0$ is a set of initial states and $\mathcal{T}$ is a finite set of temporal tiles with $\mathcal{V} = \bigcup_{\tau \in \mathcal{T}} V_\tau$ (we assume that in the initial state all the variables have just changed value). The example of section 2 can be modeled with a system of 17 temporal tiles. Some of these temporal tiles are extracted from the ITU-T standards and the others (such as the Automatic Laser Switch procedure) from expertise. There are three variables for each piece of equipment: “Emission” (E), “Reception” (R) and “Operational state” (O). Variables “Reception” and “Emission” take their value in {ok, failed}, “Operational state” in {enabled, disabled}. Some synchronization variables between pieces of equipment are also added to the model. In the initial state, all the equipment is operational (i.e. “Emission” and “Reception” are “ok” and “Operational state” is “enabled”). Figure 5 shows some of these temporal tiles. The left temporal tile models the emission breakdown and the right temporal tile models the emission of the alarm “Loss Of Signal” (LOS). The instant notation is as described above; for instance, $t_{R(failed)}$ corresponds to the instant when the value of the variable “R” passes from “ok” to “failed”.

Figure 5. The left tile models the laser breakdown of section 2 (variable Emission becomes failed); the right tile states that when Reception becomes failed, the LOS alarm is generated 1 to 4 units of time later and Sync2 becomes ok.

4 Behavior of the system
We present in the following sections a way to describe the whole behavior of a temporal tiles system and an algorithm to build it.

4.1 Temporal run
Definition 1 (Run) The interleaved sequence of states and events $x_0, a_1, x_1, a_2, \ldots$ is a run R of the system Σ if $x_0 \in X_0$ and, for each $k > 0$, there exists a tile $\theta_k = \langle V_k, x_{V_k}^-, \alpha, x_{V_k}^+ \rangle$ such that:
$$\begin{cases} a_k = \alpha & \\ v(x_{k-1}) = v(x_k) & \text{if } v \notin V_k \\ v(x_{k-1}) = v(x_{V_k}^-),\; v(x_k) = v(x_{V_k}^+) & \text{if } v \in V_k \end{cases} \quad (6)$$
As the state changes are described in the tiles, a run is also completely described by an initial state and a sequence of tiles. From now on, a run is described as $x_0, \theta_1, \theta_2, \ldots$, and each state $x_k$ can be computed from $x_0, \theta_1, \ldots, \theta_k$.
Moreover, as tiles define local transitions on partial states, two successive tiles can be based on two disjoint sets of variables. In this case, these tiles are said to be “concurrent”, and by exchanging the order of the two tiles we obtain two equivalent runs. Henceforth, we will not define a run as a sequence of tiles but as an equivalence class of sequences of tiles under this concurrency relation. This brings us to consider a run as a partial order of tiles.
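Definition 1 and the concurrency argument can be illustrated with a short Python sketch, assuming a tile is a (pre, label, post) triple of dictionaries; `replay` and `concurrent`, as well as the variables E_A and E_B (the emitters of two hypothetical pieces of equipment), are our own names.

```python
def replay(x0, tiles):
    """States x_0, x_1, ... of a run given by an initial state and a tile sequence.

    Following eq. (6): variables outside V_k keep their value, those in V_k
    go from their pre-condition value to their post-condition value.
    """
    states = [dict(x0)]
    for pre, label, post in tiles:
        x = states[-1]
        if any(x[v] != val for v, val in pre.items()):
            raise ValueError(f"tile {label!r} is not enabled")
        states.append({**x, **post})
    return states


def concurrent(tile_a, tile_b):
    """Two tiles are concurrent when their variable sets are disjoint."""
    return not (tile_a[0].keys() & tile_b[0].keys())


# Hypothetical example: the emitters of two different pieces of equipment fail.
fail_a = ({"E_A": "ok"}, "ε", {"E_A": "failed"})
fail_b = ({"E_B": "ok"}, "ε", {"E_B": "failed"})
x0 = {"E_A": "ok", "E_B": "ok"}
print(concurrent(fail_a, fail_b))                                            # True
print(replay(x0, [fail_a, fail_b])[-1] == replay(x0, [fail_b, fail_a])[-1])  # True
```

Both orders reach the same final state, which is why a run can be treated as a partial order rather than a sequence.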
We extend the notion of run to the framework of temporal tiles. A temporal run is a set of temporal tiles and an initial state such that the sequence of the atemporal parts of the tiles is a run and its associated time constraint graph is consistent.
Definition 2 (Temporal run) The pair $\langle x_0, (\tau_1, \tau_2, \ldots) \rangle$ with $\tau_i = \langle \theta_i, T_{\tau_i} \rangle$ is a temporal run of Σ iff:
$$\begin{cases} R = \langle x_0, (\theta_1, \theta_2, \ldots) \rangle \text{ is a run of } \Sigma \text{ and} \\ T = \big(\bigcup_{\tau \in R} T_\tau\big) \cup T_{caus} \text{ is consistent} \end{cases} \quad (7)$$
where $T_{caus}$ is the time constraint graph resulting from causality: $\forall (\tau_i, \tau_j) \in R^2$, if $\tau_i$ is a predecessor of $\tau_j$ in the partial order of tiles, then $t_{\alpha_i}$ and $t_{\alpha_j}$ are time points of $T_{caus}$ and $C_{T_{caus}}(t_{\alpha_i}, t_{\alpha_j}) = [1, +\infty]$.
The tricky point of this definition is the construction of the time constraint graph, since the time points correspond to changes of value of the variables. If, in a tile, the value of a variable is the same in the pre-set and the post-set, then, when this tile is appended to the run, the value of this variable does not change. Figure 6 shows the beginning of a temporal run (top) and its time constraint graph (bottom). When the tile labeled by LOS is appended to the run, the value of the variable R does not change. Thus, when the tile labeled by α is appended, the time constraint [1, 2] is added between $t_{R(failed)}$ and $t_\alpha$: indeed, the instant when the variable R changed value is $t_{R(failed)}$. The constraint [1, +∞] is added between $t_{LOS}$ and $t_\alpha$ in order to ensure causality between the first tile and the second: the cause precedes the consequence.
To summarize, a temporal run can be defined by an initial state and a partial order of temporal tiles or, equivalently according to definition 2, by an (atemporal) run and a (consistent) time constraint graph.

Figure 6. A temporal run: a run with its associated time constraint graph.
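The construction of Definition 2 can be sketched as follows, assuming each tile's time graph is a dictionary {(a, b): (lo, hi)} over named time points and testing consistency with Floyd-Warshall (a stand-in for path consistency on such graphs). Only the constraints of figure 6 involving tR(failed), tLOS and tα are reproduced; `temporal_run_graph` and `consistent` are hypothetical names.

```python
from itertools import product

INF = float("inf")


def consistent(graph):
    """True iff the constraints {(a, b): (lo, hi)} admit a solution (no negative cycle)."""
    pts = list({p for pair in graph for p in pair})
    d = {(a, b): 0 if a == b else INF for a, b in product(pts, pts)}
    for (a, b), (lo, hi) in graph.items():
        d[a, b] = min(d[a, b], hi)      # t_b - t_a <= hi
        d[b, a] = min(d[b, a], -lo)     # t_a - t_b <= -lo
    for k, i, j in product(pts, pts, pts):  # Floyd-Warshall, k varies slowest
        d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    return all(d[p, p] >= 0 for p in pts)


def temporal_run_graph(tile_graphs, causal_pairs):
    """T = (union of the tile graphs) ∪ T_caus, as in eq. (7).

    causal_pairs lists (t_alpha_i, t_alpha_j) for each tile i preceding tile j.
    """
    graph = {}
    caus = [{pair: (1, INF)} for pair in causal_pairs]
    for g in list(tile_graphs) + caus:
        for pair, (lo, hi) in g.items():
            lo0, hi0 = graph.get(pair, (-INF, INF))
            graph[pair] = (max(lo, lo0), min(hi, hi0))
    return graph


# Figure 6: LOS comes 1 to 4 units after R fails; since the LOS tile leaves R
# unchanged, the [1, 2] of the α tile is also measured from tR(failed).
t_los = {("tR(failed)", "tLOS"): (1, 4)}
t_alpha = {("tR(failed)", "tα"): (1, 2)}
run = temporal_run_graph([t_los, t_alpha], [("tLOS", "tα")])
print(consistent(run))     # True
strict = temporal_run_graph([t_los, {("tR(failed)", "tα"): (1, 1)}], [("tLOS", "tα")])
print(consistent(strict))  # False: α cannot be 1 unit after R and also after LOS
```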
4.2 Petri net and occurrence net
A run of a system described with temporal tiles is one of the possible behaviors of this system. The notion of occurrence net, introduced in the framework of Petri nets, is a formalization of a run including nondeterministic choices (or conflicts). Thus, an occurrence net represents several runs of a system together. We give below the definitions of a Petri net and then of an occurrence net.
Definition 3 (Petri net) A Petri net is a tuple N = (Pl, Tr, L, In) where Pl is the set of places, Tr is the set of transitions, L is the set of links between places and transitions (i.e. L ⊆ (Pl × Tr) ∪ (Tr × Pl)) and In ⊆ Pl is the initial marking.
For each transition t ∈ Tr, the preset •t is the set of places connected upstream to t, i.e. •t = {p ∈ Pl | (p, t) ∈ L}. The postset of t, t•, is the set of places connected downstream to t, i.e. t• = {p ∈ Pl | (t, p) ∈ L}. The preset and the postset are also defined for each place p ∈ Pl: •p = {t ∈ Tr | (t, p) ∈ L} and p• = {t ∈ Tr | (p, t) ∈ L}.
Definition 4 (Causality, Conflict, Concurrency) Given two nodes n and n' of a Petri net (places or transitions), we say that n causes n' (written n ≤ n') if either n = n' or there is a path from n to n'. We say that n and n' are in conflict (written n#n') if there is a place p, different from n and n', from which one can reach n and n' by two paths leaving p through different transitions. And we say that n and n' are concurrent if neither n ≤ n', nor n' ≤ n, nor n#n'.
Definition 5 (co-set) A set of concurrent nodes is called a co-set. More formally, CO is a co-set iff for all (n, n') ∈ CO² with n ≠ n', n and n' are concurrent.
Definition 6 (Occurrence net) An occurrence net N = (Pl, Tr, L, In) is defined as a finite acyclic Petri net such that each place has at most one predecessor (|•p| ≤ 1), no transition is in self-conflict (t#t never holds) and the initial marking is In = {p ∈ Pl | •p = ∅}.
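Definitions 3 to 6 translate into a few graph functions. This Python sketch is illustrative: it represents a net by sets of places, transitions and (source, target) links, and implements conflict in its usual form for occurrence nets (two distinct transitions sharing an input place lead respectively to n and n'); the function names and the place names of the figure 7 example are our own.

```python
from itertools import combinations


def preset(n, links):
    return {a for a, b in links if b == n}


def postset(n, links):
    return {b for a, b in links if a == n}


def causes(n, m, links):
    """n <= m: n == m or there is a directed path from n to m."""
    seen, stack = set(), [n]
    while stack:
        x = stack.pop()
        if x == m:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(postset(x, links))
    return False


def in_conflict(n, m, transitions, links):
    """n # m: two distinct transitions sharing an input place lead to n and m."""
    return any(t1 != t2 and preset(t1, links) & preset(t2, links)
               and causes(t1, n, links) and causes(t2, m, links)
               for t1 in transitions for t2 in transitions)


def is_coset(nodes, transitions, links):
    """Pairwise concurrent: neither causally related nor in conflict."""
    return all(not (causes(a, b, links) or causes(b, a, links)
                    or in_conflict(a, b, transitions, links))
               for a, b in combinations(nodes, 2))


def is_occurrence_net(places, transitions, links):
    """Definition 6: acyclic, |•p| <= 1, and no transition in self-conflict."""
    acyclic = not any(causes(b, a, links) for a, b in links)
    return (acyclic and all(len(preset(p, links)) <= 1 for p in places)
            and not any(in_conflict(t, t, transitions, links) for t in transitions))


# The net N of figure 7: t2 (LOS) and t3 (ALS) both consume the place R:failed.
places = {"R:ok", "E:ok", "R:failed", "R:failed'", "R:failed''", "E:failed"}
transitions = {"t1", "t2", "t3"}
links = {("R:ok", "t1"), ("t1", "R:failed"),
         ("R:failed", "t2"), ("t2", "R:failed'"),
         ("R:failed", "t3"), ("E:ok", "t3"), ("t3", "R:failed''"), ("t3", "E:failed")}
print(in_conflict("t2", "t3", transitions, links))         # True
print(is_coset({"R:failed", "E:ok"}, transitions, links))   # True
print(is_occurrence_net(places, transitions, links))        # True
```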
Usually, occurrence nets are used for describing several behaviors of a Petri net [3] by means of the branching process of a Petri net, which is an occurrence net with a homomorphism onto the Petri net. It represents several runs of a Petri net, but it is not necessarily “maximal”: it does not necessarily contain full runs and, moreover, it may not contain all possible runs of the system. The complete branching process of a Petri net is called the unfolding and, in most cases, it is infinite.
4.3 Time branching process
We extend the notion of occurrence net by adding time information, and we define the time branching process as a time occurrence net associated with a temporal tiles system. Then, we give an algorithm to obtain the maximal time branching process.
A time occurrence net is an occurrence net in which a time point is associated with each node (place or transition), together with some time constraints between these time points. It is important to notice that, because of conflicts, the set of all these time constraints does not constitute a time constraint graph. Indeed, two nodes in conflict do not belong to the same run; thus, there cannot be any time constraint between their two time points.
Definition 7 (Time occurrence net) A time occurrence net $N_t = \langle N, T \rangle$ is an occurrence net N = (Pl, Tr, L, In) for which each node n ∈ Pl ∪ Tr is associated with one time point in T, and such that, for all (n, n') ∈ (Pl ∪ Tr)² for which n#n' does not hold, there is a constraint in T between the time points associated with n and n'.
As the branching process is an occurrence net associated with a Petri net, we want to associate a time occurrence net with a set of tiles in order to define a time branching process. A time branching process of a tiles system is a time occurrence net in which each transition, together with its preset and postset, corresponds to one temporal tile of the system.
Definition 8 (Time branching process) A time branching process of Σ is a time occurrence net $N_t$ for which each transition t of $N_t$, together with its preset and postset, is associated with a tile τ of Σ. The time points associated with •t ∪ {t} ∪ t• must satisfy the time constraints of the tile τ.
This defines a map µ such that, for each transition t ∈ $N_t$, µ(t) = τ, µ(•t) = pre(τ) and µ(t•) = post(τ).
We give below the algorithm to construct the maximal time branching process of a temporal tiles system. This construction works like a jigsaw puzzle. The algorithm starts with the branching process containing only the places corresponding to the initial marking of the system. Temporal tiles are then added one at a time. To manage conflicts and time constraints, we associate a time constraint graph with each transition t of the branching process. This graph, denoted by T(t), is the union of the time constraints of the temporal tile τ such that µ(t) = τ, of the time constraint graphs of the transitions which cause t, $\bigcup_{t' \mid t'^{\bullet} \cap {}^{\bullet}t \neq \emptyset} T(t')$, and of the causality time constraint graph $T_{caus}$ constructed as in section 4.1. Notice that T(t) is the time constraint graph of the temporal run defined by the tile τ = µ(t) and all its predecessors in the puzzle. This graph is built and attached to the transition when the temporal tile is added to the “puzzle”.
Figure 7 shows the time branching process N of a temporal tiles system of three tiles, τ1 (reception failure), τ2 (emission of the LOS alarm) and τ3 (ALS mechanism), with an initial state X0 given by R:ok and E:ok. The time constraint graph T(t1) associated with the transition t1 is the time constraint graph of the tile τ1 = µ(t1). The transitions t2 and t3 are in conflict; thus, their associated time constraint graphs share the same prefix T(t1), to which we add (by union) the time constraint graph of τ2 for t2 and of τ3 for t3. Notice that T(t2) is the time constraint graph of the temporal run ⟨X0, (τ1, τ2)⟩ and T(t3) the time constraint graph associated with the temporal run ⟨X0, (τ1, τ3)⟩.

Figure 7. A system of three temporal tiles τ1, τ2 and τ3, its time branching process N and the time constraint graphs associated with each transition of N: t1, t2 and t3.
In order to build the maximal time branching process, we need the notion of “(atemporal) tiles that can be added to a given occurrence net”. For a given occurrence net N, the possible extensions of N, denoted by pe(N), are the pairs (τ, X), where X is a co-set of places of N and τ is a tile such that:
• µ(X) = pre(τ)
• ∀t ∈ N, µ(t) ≠ τ or •t ≠ X (the pair is not already in N)
For a tile system Σ = ⟨𝒱, X0, 𝒯⟩, the construction of its maximal time occurrence net is described by the algorithm below, where Rejected collects the pairs whose time constraints are inconsistent, so that each pair is examined only once:
N ← X0
Rejected ← ∅
while pe(N) \ Rejected ≠ ∅ do
    choose a pair (τ, X) from pe(N) \ Rejected
    if (⋃_{t | t• ∩ X ≠ ∅} T(t)) ∪ Tτ ∪ Tcaus is consistent then
        append τ to N (a new transition tr(τ) with preset X)
        T(tr(τ)) ← Tτ ∪ (⋃_{t | t• ∩ X ≠ ∅} T(t)) ∪ Tcaus
    else
        Rejected ← Rejected ∪ {(τ, X)}
    end if
end while
This algorithm thus makes it possible to obtain the complete behavior of a system of temporal tiles (i.e. all temporal runs in the same structure). These temporal runs are maximal and, in most cases, they are not finite. Since our objective is to detect the failures of the system as soon as possible, we can limit the time length of the puzzle by adding a time constraint [0, ω] between the initial state and each transition of the puzzle, ω being the maximum length of the puzzle. The value of ω is given by expertise.
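The puzzle construction can be sketched in Python as follows. This is a simplified sketch under our own assumptions, not the authors' implementation: the temporal part of a tile is reduced to one interval per pre-condition variable, measured from that variable's last change to the tile's event (a changed variable is taken to change exactly at the event, as with the [0, 0] constraints of figure 5); all initial variables change at a common instant t0; [0, ω] is imposed between t0 and every event; consistency is tested with Floyd-Warshall; and co-sets are checked pairwise through the causal past of the places. The names `TimeBranchingProcess`, `try_append`, `unfold` and the variable `S` are hypothetical.

```python
import itertools

INF = float("inf")

def merge(*graphs):
    """Union of time graphs (eq. 4): intersect the intervals of shared pairs."""
    out = {}
    for g in graphs:
        for pair, (lo, hi) in g.items():
            lo0, hi0 = out.get(pair, (-INF, INF))
            out[pair] = (max(lo, lo0), min(hi, hi0))
    return out

def consistent(g):
    """Floyd-Warshall on the distance graph: consistent iff no negative cycle."""
    pts = list({p for pair in g for p in pair})
    d = {(a, b): 0 if a == b else INF for a in pts for b in pts}
    for (a, b), (lo, hi) in g.items():
        d[a, b], d[b, a] = min(d[a, b], hi), min(d[b, a], -lo)
    for k, i, j in itertools.product(pts, pts, pts):  # k varies slowest
        d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    return all(d[p, p] >= 0 for p in pts)

class TimeBranchingProcess:
    def __init__(self, init, omega):
        # place = (variable, value, time point of its last change, producer transition)
        self.places = [(v, x, "t0", None) for v, x in init.items()]
        self.trans = []  # transition = (tile name, preset, postset, time graph T(t))
        self.omega = omega

    def past(self, p):
        """Transitions in the causal past of place p."""
        seen, stack = set(), [self.places[p][3]]
        while stack:
            t = stack.pop()
            if t is not None and t not in seen:
                seen.add(t)
                stack.extend(self.places[q][3] for q in self.trans[t][1])
        return seen

    def concurrent(self, p, q):
        pp, pq = self.past(p), self.past(q)
        if any(p in self.trans[t][1] for t in pq) or any(q in self.trans[t][1] for t in pp):
            return False  # causally related
        return not any(t1 != t2 and set(self.trans[t1][1]) & set(self.trans[t2][1])
                       for t1 in pp for t2 in pq)  # in conflict otherwise

    def try_append(self, tile, X):
        t, ev = len(self.trans), ("ev", len(self.trans))
        producers = {self.places[p][3] for p in X} - {None}
        graph = merge(*(self.trans[u][3] for u in producers),            # ∪ T(t')
                      {("t0", ev): (0, self.omega)},                       # horizon
                      *({(self.places[p][2], ev): tile["delay"][self.places[p][0]]}
                        for p in X),                                       # T_τ
                      *({(("ev", u), ev): (1, INF)} for u in producers))   # T_caus
        if not consistent(graph):
            return False
        self.trans.append((tile["name"], tuple(X), [], graph))
        for p in X:
            var, old, tp, _ = self.places[p]
            new = tile["post"][var]
            # a variable that keeps its value keeps the instant of its last change
            self.places.append((var, new, ev if new != old else tp, t))
            self.trans[t][2].append(len(self.places) - 1)
        return True

def unfold(init, tiles, omega):
    """Puzzle loop: append consistent possible extensions until none is left."""
    bp, rejected, progress = TimeBranchingProcess(init, omega), set(), True
    while progress:
        progress = False
        for tile in tiles:
            choices = [[i for i, pl in enumerate(bp.places) if pl[:2] == (v, x)]
                       for v, x in tile["pre"].items()]
            for X in itertools.product(*choices):
                key = (tile["name"], frozenset(X))
                if key in rejected or any(tr[0] == tile["name"] and set(tr[1]) == set(X)
                                          for tr in bp.trans):
                    continue  # already in the net, or known to be impossible
                if (all(bp.concurrent(p, q) for p, q in itertools.combinations(X, 2))
                        and bp.try_append(tile, X)):
                    progress = True
                else:
                    rejected.add(key)
    return bp

TILES = [  # τ1, τ2, τ3 of figure 7; S is a hypothetical variable so LOS fires once
    {"name": "tau1", "pre": {"R": "ok"}, "post": {"R": "failed"}, "delay": {"R": (0, INF)}},
    {"name": "tau2", "pre": {"R": "failed", "S": "wait"},
     "post": {"R": "failed", "S": "done"}, "delay": {"R": (1, 4), "S": (0, INF)}},
    {"name": "tau3", "pre": {"R": "failed", "E": "ok"},
     "post": {"R": "failed", "E": "failed"}, "delay": {"R": (1, 2), "E": (0, INF)}},
]
INIT = {"R": "ok", "E": "ok", "S": "wait"}
print([tr[0] for tr in unfold(INIT, TILES, omega=10).trans])
# ['tau1', 'tau2', 'tau3', 'tau3', 'tau2']
print([tr[0] for tr in unfold(INIT, TILES, omega=1).trans])  # ['tau1', 'tau2', 'tau3']
```

With ω = 10 the sketch finds, after τ1, the conflicting branches τ2 (LOS) and τ3 (ALS) as well as both interleavings of the two; with ω = 1, only the first step of each branch fits in the horizon, since causality forces every later event at least one unit further.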
5 Extraction of the chronicles
The previous sections show how we can build all the temporal runs of the system. If we project each run on its observable part, we obtain a set of observable events (transitions with a label different from ε) constrained by time. This is a chronicle, which is the signature of this run. Thus, if we extract from the branching process all the runs relevant to a fault propagation, we can project them in order to define all the relevant signatures of this fault. This is the intuitive idea of our approach.
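The projection can be sketched as follows, assuming the run's time graph is consistent and stored as {(a, b): (lo, hi)}; the name `project` and the example run are hypothetical. The constraints are first propagated (Floyd-Warshall) and only then restricted to observable time points, so that a bound passing through an unobservable ε event is not lost.

```python
INF = float("inf")


def project(points, constraints, observable):
    """Chronicle of a temporal run: propagate all constraints, then keep observable points."""
    d = {(a, b): 0 if a == b else INF for a in points for b in points}
    for (a, b), (lo, hi) in constraints.items():
        d[a, b] = min(d[a, b], hi)
        d[b, a] = min(d[b, a], -lo)
    for k in points:                       # Floyd-Warshall (minimal graph)
        for i in points:
            for j in points:
                d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    return {(a, b): (-d[b, a], d[a, b])
            for a in observable for b in observable if a != b}


# Hypothetical run: TF, an unobservable ε 1 to 2 units later, then LOS 0 to 3 units after ε.
run = {("tTF", "tε"): (1, 2), ("tε", "tLOS"): (0, 3)}
chronicle = project(["tTF", "tε", "tLOS"], run, ["tTF", "tLOS"])
print(chronicle[("tTF", "tLOS")])  # (1, 5)
```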
In order to build all the chronicles necessary for diagnosis (i.e. all the fault signatures), we build a maximal time branching process for each failure of the system. Notice that, in a tiles system, each failure is modeled by a tile (the left tile of figure 5 models the laser emission breakdown). In a tiles system $\Sigma = \langle \mathcal{V}, X_0, \mathcal{T} \rangle$, we distinguish the tiles modeling a primary breakdown (denoted by $\mathcal{T}_c$) from the other tiles (denoted by $\mathcal{T}_o$): $\mathcal{T} = \mathcal{T}_c \cup \mathcal{T}_o$ and $\mathcal{T}_c \cap \mathcal{T}_o = \emptyset$.
For each tile τ of $\mathcal{T}_c$, we build the maximal time branching process of the system $\langle \mathcal{V}, X_0, \mathcal{T}_o \cup \{\tau\} \rangle$ with the algorithm of section 4.3. This maximal branching process describes all the complete behaviors of the system when the failure modeled by τ occurs.
Then we are interested in the maximal temporal runs of this fault, in order to obtain the chronicles which are the complete signatures of this fault. We denote a temporal run by ⟨R, T⟩, where R is the run without the time constraints (as defined in 4.1) and T the time graph of the run. We define a partial order relation between two temporal runs (for the time constraints, this comparison relies on the minimal propagated graphs) as follows:
$$\langle R, T \rangle \le \langle R', T' \rangle \;\equiv_{def}\; R \subseteq R' \text{ and } \widehat{T'} \preceq \widehat{T} \quad (8)$$
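Relation (8) can be checked as in this sketch, which assumes a run is given as a set of tile occurrences together with its (already computed, here hand-written) minimal graph; `run_leq`, `included` and the example values are illustrative.

```python
INF = float("inf")


def included(c1, c2):
    """Interval inclusion [lo1, hi1] ⊆ [lo2, hi2]."""
    return c2[0] <= c1[0] and c1[1] <= c2[1]


def run_leq(run_a, run_b):
    """<R, T> <= <R', T'> of eq. (8): R ⊆ R' and the minimal graph of T' is
    included, pair by pair, in the minimal graph of T.

    A run is (set of tile occurrences, minimal graph {(a, b): (lo, hi)}); a pair
    missing from a graph gets the default [-inf, +inf].
    """
    (r, t), (r2, t2) = run_a, run_b
    return r <= r2 and all(included(c, t.get(pair, (-INF, INF))) for pair, c in t2.items())


short = (frozenset({"TF", "LOS"}),
         {("tTF", "tLOS"): (1, 5), ("tLOS", "tTF"): (-5, -1)})
longer = (frozenset({"TF", "LOS", "TF2"}),
          {("tTF", "tLOS"): (1, 2), ("tLOS", "tTF"): (-2, -1),
           ("tLOS", "tTF2"): (0, 0), ("tTF2", "tLOS"): (0, 0),
           ("tTF", "tTF2"): (1, 2), ("tTF2", "tTF"): (-2, -1)})
print(run_leq(short, longer), run_leq(longer, short))  # True False
```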
This partial order allows us to define the maximal temporal runs from a maximal run. The main difficulty, however, is to extract the maximal temporal runs (the chronicles) directly. Indeed, consider figure 8: on the left side is the time constraint graph of a maximal run R. To simplify the example, all the events of this run are observable. The maximal chronicles we want to extract (the chronicles corresponding to the maximal temporal runs) are C1 and C2. C1 is the minimal time constraint graph of R. This graph is maximal: there is no larger graph. The chronicle C2 is maximal too: there is no larger, less constrained graph. But C2 is not a temporal run of the system. Indeed, as we see on the graph of C1, if the alarm “LOS” occurs between one and two units of time after the alarm “TF”, then the alarm “TF” must occur again. The real maximal temporal run is the graph of C2 with the constraint [3, 5] instead of [1, 5] ([3, 5] is the complement of [1, 2] in [1, 5]), but the computation of all the exclusive constraint graphs is NP-complete and may lead to a high number of chronicles.
Figure 8. A maximal run R and the two chronicles extracted from R: C1 and C2.
As our goal is to obtain chronicles which will be recognised on-line, it is sufficient to extract these chronicles as above, provided we add exclusion links between some of them: the recognition of a chronicle must exclude (or must have priority over) the recognition of the “smaller” chronicles extracted from the same maximal run. A chronicle C' is smaller than a chronicle C if the time points of C' are in C. In our example, C2 is smaller than C1; therefore, the recognition of C1 has priority over the recognition of C2.
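A minimal sketch of the priority rule, assuming each chronicle extracted from one maximal run is summarised by the set of its time points; we use strict inclusion so that a chronicle never excludes itself, and the names `smaller` and `filter_by_priority` are our own. An on-line engine would apply such a filter only after waiting long enough for the larger chronicles to complete.

```python
def smaller(c_small, c_big):
    """C' is smaller than C when the time points of C' are strictly among those of C."""
    return c_small < c_big


def filter_by_priority(recognised, chronicles):
    """Drop every recognised chronicle dominated by a larger recognised one
    (all chronicles are assumed to come from the same maximal run)."""
    return {name for name in recognised
            if not any(smaller(chronicles[name], chronicles[other])
                       for other in recognised if other != name)}


chronicles = {"C1": frozenset({"tTF", "tLOS", "tTF2"}),
              "C2": frozenset({"tTF", "tLOS"})}
print(smaller(chronicles["C2"], chronicles["C1"]))      # True
print(filter_by_priority({"C1", "C2"}, chronicles))     # {'C1'}
```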
6 Future work
Some open problems raised in the previous sections are listed hereafter, together with directions for improving our approach. The last part gives another point of view on this work, which suggests extensions other than the addition of time.
First of all, we saw that maximal runs can be infinite, and we cut them with an upper bound on their duration; this is acceptable when one roughly knows the dynamics of the modeled system (which is often the case). Nevertheless, within the framework of Petri nets, [9] showed that there exists a finite structure which contains all the information related to the behavior of a safe Petri net: this structure is called the branching prefix. The algorithm to construct the branching prefix was refined in [4]. We plan to study how to adapt this algorithm to deal with time; this work will probably be connected to timed Petri nets.
Secondly, the extraction outputs a set of chronicles with exclusion links between some of them. For the time being, we rely on the on-line recognition engine to deal with these links (the engine must delay some recognitions in order to ensure, for each recognised chronicle, that a chronicle with a higher priority will not be recognised later), but we think that these links could be enforced by off-line processing of the extracted chronicle models.
Finally, if we consider the time graph as a set of parameters (of any kind) and a set of constraints (of any kind, too), our approach consists in adding a set of applicability rules to each tile and extending the construction of the occurrence net so that these constraints are satisfied. The properties we need are that we must be able to verify the satisfiability of any conjunctive set of constraints (in order to know whether a tile is a possible extension or not) and that a partial order relation between the extracted patterns is defined (in order to extract the maximal ones).
7 Conclusion
In the framework of chronicle recognition, we presented a way to build the set of chronicles needed to diagnose a system in which fault propagation is modeled by a temporal tiles system. We first refined the notion of tile by adding time: the temporal tile. Then, we defined the notions of temporal run and time branching process of a temporal tiles system. Just as the branching process of a Petri net represents several runs of a Petri net in the same structure, the time branching process represents several temporal runs of a temporal tiles system. We presented an algorithm to build the maximal time branching process (i.e. a structure containing all the temporal runs). Finally, we showed how to extract chronicles from the time branching process and how to build all the chronicles necessary to diagnose the system.

REFERENCES
[1] A. Aghasaryan, C. Dousson, E. Fabre, A. Osmani, and Y. Pencolé, ‘Modeling fault propagation in telecommunications networks for diagnosis purposes’, in 18th World Telecommunications Congress (WTC’02), Paris, France, (September 2002).
[2] C. Dousson, ‘Extending and unifying chronicle representation with event counters’, in Proc. of the 15th ECAI, pp. 257–261, Lyon, France, (July 2002). F. van Harmelen (ed.), IOS Press.
[3] J. Engelfriet, ‘Branching processes of Petri nets’, Acta Informatica, 28, 575–591, (1991).
[4] J. Esparza, S. Römer, and W. Vogler, ‘An improvement of McMillan’s unfolding algorithm’, Tools and Algorithms for the Construction and Analysis of Systems, 87–106, (1996).
[5] E. Fabre, A. Aghasaryan, A. Benveniste, R. Boubour, and C. Jard, ‘Fault detection and diagnosis in distributed systems: an approach by partially stochastic Petri nets’, Journal of Discrete Event Dynamic Systems, Kluwer Academic Publishers, Boston, (June 1998).
[6] C. Jard, ‘Synthesis of distributed testers from true-concurrency models of reactive systems’, International Journal of Information and Software Technology, Elsevier, (2002).
[7] L. Holloway and N. Pandalai, ‘Template languages for fault monitoring of timed discrete event processes’, IEEE Transactions on Automatic Control, 45(5), (May 2000).
[8] A. K. Mackworth and E. C. Freuder, ‘The complexity of constraint satisfaction revisited’, Artificial Intelligence, 59, 57–62, Elsevier Science Publishers, (1993).
[9] K. McMillan, Symbolic model checking: an approach to the state explosion problem, Computer Science, Carnegie Mellon University, (1993).
[10] M. Nielsen, G. Plotkin, and G. Winskel, ‘Petri nets, event structures and domains’, Theoretical Computer Science, 13(1), 85–108, Elsevier, (1980).
[11] M. Sampath, S. Lafortune, and D. Teneketzis, ‘Active diagnosis of discrete-event systems’, IEEE Transactions on Automatic Control, 908–929, (July 1998).