Appendix
Proof of Proposition III.2: We have to check whether there exists a schema $\mathit{WS}_j \in \mathit{WS}^\vee$
such that $s \models \mathit{WS}_j$. To this aim, for each schema $\mathit{WS}_j = \langle A, E, a_0, A_F, \mathit{Join}, \mathit{Fork} \rangle$ with
$1 \le j \le m$, we can first check that $s[1] = a_0$ and $s[\mathit{length}(s)] \in A_F$. Then, we have to
look for an instance $I = \langle A_I, E_I \rangle$ of $\mathit{WS}_j$ such that $s$ is a topological sort of $I$. We next
show how such an instance can be constructed. Let $G = \langle A_G, E_G \rangle$ be the graph such that
$A_G = \mathit{tasks}(s)$, and $E_G = \{(a, b) \mid (a, b) \in E \wedge a, b \in A_G \wedge \mathit{Fork}(a) \neq \mathrm{XOR}\} \cup \{(a, b) \mid b \in A_G \wedge (a, b) \in E \wedge \mathit{Join}(b) = \mathrm{AND}\}$. Notice that $G$ is built by taking the edges in
$\langle A, E \rangle$ that do not originate in xor-fork nodes, as well as the ones that end in and-join
nodes; moreover, $G$ contains all the tasks in $s$, by construction. Then, we can immediately
conclude that $s$ is not compliant with $\mathit{WS}_j$ if any of the following conditions holds:
(c1) $\exists (a, b) \in E_G$ such that $a \notin A_G$;
(c2) $\exists (a, b), (a, c) \in E_G$ such that $b \neq c$ and $\mathit{Fork}(a) = \mathrm{XOR}$;
(c3) $\exists b \in A_G$ such that for each $(a, b) \in E$, $a \notin A_G$;
(c4) $\exists a = s[i], b = s[j]$ with $i < j$ such that there is a directed path from $b$ to $a$ in $\mathit{WS}_j$.
The above conditions check for violations of local constraints or precedence relationships
of $\mathit{WS}_j$ in $G$. If none of them holds, it remains to show that each activity in $G$ is in fact
reachable from $a_0$. To this aim, recall that $G$ does not take into account edges originating in
xor-fork nodes, which are, in fact, the only nodes that could cause problems in connectivity.
Therefore, we have to try to add arcs of this kind to $G$ so that all the xor-fork constraints
are satisfied and connectivity is guaranteed.
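The construction of $G$ and the tests (c1)–(c4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the schema is passed as a set of edges plus two dictionaries `fork` and `join`, the function names are hypothetical, the initial activity is exempted from (c3) because it has no predecessors, and no attempt is made to reach the linear-time bound claimed for the checks.

```python
from collections import defaultdict

def build_g(trace, E, fork, join):
    """Return (A_G, E_G) following the definition in the proof."""
    A_G = set(trace)
    E_G = set()
    for a, b in E:
        # edges between executed tasks that do not leave a xor-fork
        if a in A_G and b in A_G and fork[a] != "XOR":
            E_G.add((a, b))
        # edges entering an executed and-join (the source may be missing from s)
        if b in A_G and join[b] == "AND":
            E_G.add((a, b))
    return A_G, E_G

def has_path(E, src, dst):
    """True if dst can be reached from src through at least one edge of E."""
    succ = defaultdict(list)
    for a, b in E:
        succ[a].append(b)
    stack, seen = list(succ[src]), set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ[node])
    return False

def violations(trace, E, fork, join, a0):
    """Return the truth values of (c1), (c2), (c3), (c4) for the trace."""
    A_G, E_G = build_g(trace, E, fork, join)
    c1 = any(a not in A_G for a, _ in E_G)
    out = defaultdict(set)
    for a, b in E_G:
        out[a].add(b)
    c2 = any(fork[a] == "XOR" and len(bs) > 1 for a, bs in out.items())
    preds = defaultdict(set)
    for a, b in E:
        preds[b].add(a)
    # assumption: a0 has no predecessors and is exempt from (c3)
    c3 = any(b != a0 and not (preds[b] & A_G) for b in A_G)
    c4 = any(has_path(E, trace[j], trace[i])
             for i in range(len(trace)) for j in range(i + 1, len(trace)))
    return c1, c2, c3, c4

# Hypothetical schema: a0 forks to b and c, which synchronise in the and-join d.
E = {("a0", "b"), ("a0", "c"), ("b", "d"), ("c", "d")}
fork = {"a0": "AND", "b": "AND", "c": "AND", "d": "AND"}
join = {"a0": "OR", "b": "OR", "c": "OR", "d": "AND"}
```

On this toy schema the trace `a0, b, c, d` triggers none of the conditions, whereas `a0, b, d` triggers (c1): the and-join `d` has the incoming edge `(c, d)` in $E_G$ although `c` was never executed.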
Let $S = \{a_s = \mathit{label}_S(a) \mid a \in A_G \wedge \mathit{Fork}(a) = \mathrm{XOR} \wedge \nexists (a, b) \in E_G\}$ be the set of
(relabelled) xor-fork nodes that have no outgoing edges in $E_G$, and let $T = \{b_t =
\mathit{label}_T(b) \mid b \in A_G \wedge \nexists (a, b) \in E_G \wedge \exists (a', b) \in E \text{ s.t. } a' \in S\}$ be the set of (relabelled)
nodes in $G$ that have no incoming edge in $E_G$ but can be reached by using edges that
go out from xor-fork nodes in $S$. Functions $\mathit{label}_S$ and $\mathit{label}_T$ are bijective functions with
disjoint codomains, which simply rename the nodes in $S$ and $T$ and guarantee that
$S \cap T = \emptyset$. Consider, now, a flow network $F_G$ with nodes $S \cup T \cup \{\mathit{source}, \mathit{target}\}$ and
edges $\{(\mathit{source}, a_s) \mid a_s \in S\} \cup \{(b_t, \mathit{target}) \mid b_t \in T\} \cup \{(a_s, b_t) \mid \exists (a, b) \in E \text{ s.t. } a_s =
\mathit{label}_S(a) \wedge b_t = \mathit{label}_T(b)\}$. Assuming that all edges have unit capacity, it is easy to
see that the maximum flow from $\mathit{source}$ to $\mathit{target}$ is $|T|$ iff $s$ is compliant with $\mathit{WS}_j$.
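The flow test can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the relabelling functions are modelled by tagging nodes as `("S", a)` and `("T", b)`, the helper names are hypothetical, and the maximum flow is computed with a plain Edmonds-Karp search, which is not necessarily the algorithm referenced as [44].

```python
from collections import defaultdict, deque

def build_network(A_G, E_G, E, fork):
    """Capacity map of the network F_G, plus the (unlabelled) sets S and T."""
    has_out = {a for a, _ in E_G}
    has_in = {b for _, b in E_G}
    S = {a for a in A_G if fork[a] == "XOR" and a not in has_out}
    T = {b for b in A_G
         if b not in has_in and any(a in S for a, b2 in E if b2 == b)}
    cap = defaultdict(dict)
    for a in S:
        cap["source"][("S", a)] = 1
    for b in T:
        cap[("T", b)]["target"] = 1
    for a, b in E:
        if a in S and b in T:
            cap[("S", a)][("T", b)] = 1
    return cap, S, T

def max_flow(cap, source, target):
    """Edmonds-Karp on unit capacities: each augmenting path adds one unit."""
    res = defaultdict(lambda: defaultdict(int))
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            res[v][u] += 0  # make the reverse arc visible in the residual graph
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return flow
        v = target
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= 1
            res[v][u] += 1
            v = u
        flow += 1

def flow_compliant(A_G, E_G, E, fork):
    """True iff the maximum flow saturates every node of T."""
    cap, S, T = build_network(A_G, E_G, E, fork)
    return max_flow(cap, "source", "target") == len(T)

# Hypothetical schema: a0 is a xor-fork towards b and c; d is an or-join.
E = {("a0", "b"), ("a0", "c"), ("b", "d"), ("c", "d")}
fork = {"a0": "XOR", "b": "AND", "c": "AND", "d": "AND"}
# E_G computed by hand from the definition (all joins are OR here).
one_branch = flow_compliant({"a0", "b", "d"}, {("b", "d")}, E, fork)
both_branches = flow_compliant({"a0", "b", "c", "d"},
                               {("b", "d"), ("c", "d")}, E, fork)
```

In the second call $T = \{b, c\}$ but the single xor-fork `a0` can feed only one unit of flow, so the maximum flow is 1 and the trace executing both branches is rejected.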
Fig. 14. Reduction in proof of Theorem III.6.
To conclude the proof, let us calculate the complexity of the procedure illustrated above.
The checks of (c1)–(c4) can be done in time linear in the number of edges in $E$. The
computation of the maximum flow in a network with integer capacities bounded by $C$ can
be carried out by standard algorithms (e.g., [44]) in time $O(C \times |V| \times |E|)$, where $V$ and
$E$ denote the sets of nodes and edges in the network, respectively. Therefore, the result
follows since $C = 1$, $|V| = O(n_f)$ (indeed, $|S| = n_f$ and $|T| \le n_f$), $|E| = O(e_f)$, and the
whole procedure is repeated at most $m$ times, i.e., once for each schema in $\mathit{WS}^\vee$.
□
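For reference, the quantities used in the complexity argument for Proposition III.2 can be assembled into a single expression. This is a reconstruction from the proof text only, with $|E|$ denoting the edges of a schema and $n_f$, $e_f$ used as in the proof; the exact bound stated in the proposition is not reproduced in this appendix.

```latex
\[
  \underbrace{m}_{\text{schemas in } \mathit{WS}^\vee}
  \times
  \Bigl(
    \underbrace{O(|E|)}_{\text{checks (c1)--(c4)}}
    +
    \underbrace{O(1 \times n_f \times e_f)}_{\text{maximum flow, } C = 1}
  \Bigr)
  = O\bigl(m \times (|E| + n_f \times e_f)\bigr)
\]
```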
Proof of Theorem III.6: The case $m \ge |L_P|$ is trivial since we can accommodate each
trace in a separate schema, obtaining a disjunctive schema that, obviously, is both 1-complete and 1-sound. Instead, without fixing any bound on the size of $L_P$, the decision
problem is feasible in $\Sigma_2^P = \mathrm{NP}^{\mathrm{NP}}$. Indeed, we can guess in NP a disjunctive schema $\mathit{WS}^\vee$
such that $|\mathit{WS}^\vee| \le m$, and then check whether it is both 1-complete and 1-sound. By
Proposition III.5, 1-completeness can be checked in polynomial time and 1-soundness is
co-NP-complete.
Let us now prove that the decision problem is NP-hard. Recall that deciding whether
a graph $G = \langle V, E \rangle$ whose vertices have degree at most $t$ can be colored with 3 colors is an
NP-complete problem [44], for any fixed constant $t \ge 4$. Let $C = \{r, b, y\}$ be the set of
available colors. We associate a process $P_G$ and a log $L_{P_G}$ with $G$ as follows. Let $v_1, \ldots, v_n$
be any ordering of the vertices in $V$. Then, for each vertex $v_i$ in $V$, $P_G$ contains all the
activities $x_i, e_i, r_i, b_i, y_i, fr_i, fb_i, fy_i$; moreover, for each edge $(v_i, v_j)$ in $E$ with $i < j$, $P_G$
contains the activity $e_{i,j}$. Finally, $P_G$ contains the activities $\mathit{start}$ and $\mathit{end}$, and no other
activity is in $P_G$. For each vertex $v_i$ in $V$, let $\{v_{j_1}, \ldots, v_{j_k}\}$ be the subset of its neighbors for
which $j_h > i$, $1 \le h \le k$. Then, for each color $C$ in $\{b, r, y\}$, let $I_i(C)$ be the graph shown
in Figure 14.(a), consisting of the following arcs: $(\mathit{start}, x_i)$, $(x_i, C_i)$, $(C_i, e_i)$, $(e_i, fC_i)$,
$(fC_i, \mathit{end})$, $(C_i, C_{j_1})$, $(C_{j_1}, e_{i,j_1})$, $(e_{i,j_1}, \mathit{end})$, $\ldots$, $(C_i, C_{j_k})$, $(C_{j_k}, e_{i,j_k})$, $(e_{i,j_k}, \mathit{end})$. Let $T_i(C)$
be the set of all the topological sorts of the graph $I_i(C)$; then $L_{P_G} = \bigcup_{v_i \in V,\, C \in \{r, b, y\}} T_i(C)$.
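To make the reduction concrete, the following Python sketch builds the arcs of $I_i(C)$ and enumerates its topological sorts by brute force. The string encoding of activities (for instance `"e1,2"` for $e_{i,j}$) and all function names are assumptions introduced for illustration; edges are assumed to be given as pairs `(i, j)` with `i < j`, and the enumeration is exponential, so it is only meant for tiny graphs.

```python
def instance_arcs(i, color, later_neighbors):
    """Arcs of I_i(C) for vertex v_i, color C and neighbors v_j with j > i."""
    c = lambda v: f"{color}{v}"          # activity C_v, e.g. "r1"
    arcs = [("start", f"x{i}"), (f"x{i}", c(i)), (c(i), f"e{i}"),
            (f"e{i}", f"f{color}{i}"), (f"f{color}{i}", "end")]
    for j in later_neighbors:
        arcs += [(c(i), c(j)), (c(j), f"e{i},{j}"), (f"e{i},{j}", "end")]
    return arcs

def topological_sorts(arcs):
    """All topological sorts of the graph given by arcs (brute force)."""
    nodes = {n for arc in arcs for n in arc}
    preds = {n: {a for a, b in arcs if b == n} for n in nodes}

    def extend(prefix, placed):
        if len(prefix) == len(nodes):
            yield tuple(prefix)
            return
        for n in sorted(nodes - placed):
            if preds[n] <= placed:       # every predecessor already executed
                yield from extend(prefix + [n], placed | {n})

    return list(extend([], set()))

def log_for(n, edges):
    """The log L_{P_G}: union of T_i(C) over all vertices and colors."""
    log = set()
    for i in range(1, n + 1):
        later = sorted(j for a, j in edges if a == i)
        for color in "rby":
            log.update(topological_sorts(instance_arcs(i, color, later)))
    return log

# Graph with two vertices and the single edge (v1, v2).
t1_r = topological_sorts(instance_arcs(1, "r", [2]))
log = log_for(2, {(1, 2)})
```

For this two-vertex graph, $I_1(r)$ has two independent chains after $r_1$ (namely $e_1, fr_1$ and $r_2, e_{1,2}$), so it admits six topological sorts, while each $I_2(C)$ is a single chain.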
We show that $G$ is 3-colorable $\Leftrightarrow$ $\mathrm{EPD}(L_{P_G}, 1, 1, 3)$ over the log $L_{P_G}$ admits a solution.
($\Rightarrow$) Consider a 3-coloring of the graph $G$, i.e., a function $\lambda : V \to \{r, b, y\}$ such that for
any pair of distinct vertices $v_i$ and $v_j$ with $(v_i, v_j) \in E$ it is the case that $\lambda(v_i) \neq \lambda(v_j)$.
Given $\lambda$, we preliminarily build two other 3-colorings, say $\lambda'$ and $\lambda''$, by circularly shifting
the assignments of colors in $\lambda$. Then, we partition the log $L_{P_G}$ into three sets, say $L$, $L'$ and
$L''$, such that $L$ (resp. $L'$, $L''$) contains exactly all the traces in $T_i(\lambda(v_i))$ (resp. $T_i(\lambda'(v_i))$,
$T_i(\lambda''(v_i))$), for each vertex $v_i \in V$. Now, we claim that $\mathrm{EPD}(L_{P_G}, 1, 1, 3)$ has a solution of
the form $\mathit{WS}^\vee = \{\mathit{WS}, \mathit{WS}', \mathit{WS}''\}$, where $\mathit{WS}$ (resp. $\mathit{WS}'$, $\mathit{WS}''$) is the result of solving
$\mathrm{EPD}(L, 1, 1, 1)$ (resp. $\mathrm{EPD}(L', 1, 1, 1)$, $\mathrm{EPD}(L'', 1, 1, 1)$). We next consider the construction of
$\mathit{WS}$ only, since similar arguments apply to $\mathit{WS}'$ and $\mathit{WS}''$ as well. Nodes and edges in
$\mathit{WS}$ are obtained by merging all the instances in the set $\{I_i(\lambda(v_i)) \mid v_i \in V\}$. The local
constraints of $\mathit{WS}$ are such that: $\mathit{start}$ is a xor-fork, and all other activities are and-forks;
all the activities are or-joins; $\mathit{start}$ is the initial activity and $\mathit{end}$ is the only final one. Let
us show that $\mathit{WS}$ is, in fact, a solution for $\mathrm{EPD}(L, 1, 1, 1)$. First, $\mathit{WS}$ is 1-complete since it
covers by construction all the traces in $L$. Moreover, it is also 1-sound, since it does not
introduce any 'spurious' trace that is not in $L$. To see why this is the case, consider two
instances $I_i(C)$ and $I_j(\bar{C})$ such that $i < j$ and $C, \bar{C} \in \{r, b, y\}$. If $v_i$ and $v_j$ are not adjacent
in $E$, then the traces in $T_i(C)$ and $T_j(\bar{C})$ do not share any activity besides $\mathit{start}$ and $\mathit{end}$;
hence, no spurious trace can be introduced by merging their associated instances. Assume
now that $v_i$ and $v_j$ are such that $(v_i, v_j) \in E$. Then, the traces in $T_i(C)$ and $T_j(\bar{C})$ might
only share the activity $\bar{C}_j$ corresponding to the coloring of $v_j$. But this is impossible, given
that $\lambda$ is a 3-coloring and hence $C \neq \bar{C}$ must hold.
As an example, Figure 14.(b) shows the workflow schema built for the log containing only
the traces in $T_1(r)$ and $T_2(y)$, for a graph with $E = \{(x_1, x_2)\}$, whereas Figure 14.(c) shows
the workflow built for $T_1(r)$ and $T_2(r)$, witnessing a spurious trace (dashed edges).
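The partition used in the ($\Rightarrow$) direction can be illustrated with a short Python sketch. The order `r, b, y` chosen for the circular shift, the representation of each set $T_i(C)$ by the pair `(i, C)`, and the function names are assumptions made for illustration only.

```python
COLORS = ["r", "b", "y"]

def shift(coloring, k):
    """Circularly shift every color of the coloring by k positions."""
    return {v: COLORS[(COLORS.index(c) + k) % 3] for v, c in coloring.items()}

def is_proper(coloring, edges):
    """True if adjacent vertices always receive different colors."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def partition_blocks(coloring):
    """Blocks L, L', L'' as sets of (vertex, color) pairs standing for T_i(C)."""
    return [set(shift(coloring, k).items()) for k in range(3)]

# A triangle with a proper 3-coloring.
edges = [(1, 2), (2, 3), (1, 3)]
coloring = {1: "r", 2: "b", 3: "y"}
blocks = partition_blocks(coloring)
```

Shifting all colors by the same amount preserves inequality between adjacent vertices, so each block corresponds to a proper coloring; moreover every pair `(i, C)` falls into exactly one block, which is why the three blocks partition the whole log.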
($\Leftarrow$) Assume that $\mathit{WS}^\vee = \{\mathit{WS}, \mathit{WS}', \mathit{WS}''\}$ is a solution for $\mathrm{EPD}(L_{P_G}, 1, 1, 3)$. We show
that $G$ can in fact be colored with 3 colors. To this aim, consider the log $L$ (resp. $L'$, $L''$)
formed by all the traces that are topological sorts of instances of $\mathit{WS}$ (resp. $\mathit{WS}'$, $\mathit{WS}''$).
The following properties hold on the traces in $L$ (similarly for $L'$ and $L''$):
P1: $L$ does not contain any pair of traces $t_i \in T_i(C)$ and $t_j \in T_j(\bar{C})$, with $i < j$ and
$C, \bar{C} \in \{r, b, y\}$, such that (i) $(v_i, v_j) \in E$ and (ii) $C = \bar{C}$. Indeed, if such traces exist,
then $\mathit{WS}$ must admit some trace that is not in $L$ because of the arguments used in the
($\Rightarrow$)-part. But this contradicts the 1-soundness of $\mathit{WS}^\vee$.
P2: $L$ does not contain any pair of traces $t_i \in T_i(C)$ and $\bar{t}_i \in T_i(\bar{C})$ with $C \neq \bar{C}$. Indeed,
these traces would share the activities $\mathit{start}$, $x_i$, $e_i$ and $\mathit{end}$. Hence, again, some spurious
trace may be introduced (see Figure 14.(d), showing $T_1(r)$ and $T_1(y)$).
Armed with the properties above, we can conclude that traces in $L$ are associated with
a subset $\bar{V}$ of nodes in $V$ that are correctly colored, i.e., each node in $\bar{V}$ is assigned exactly
one color (P2) and any two adjacent vertices are colored with different colors (P1). To
conclude the proof it is sufficient to note that all the nodes in $G$ are, in fact, correctly
colored because $\mathit{WS}^\vee$ is a 1-complete schema.
□
Proof of Theorem IV.3: It is easy to see that the algorithm can be implemented with three
scans over $L_P$, used for building the dependency graph, updating the edges, and constructing
the local constraints, respectively. These are, in fact, the dominant operations. We now
concentrate on the two conditions.
(Maximum Completeness Condition) Let $\mathit{WS}$ be $\langle A, E, a_0, A_F, \mathit{Fork}, \mathit{Join} \rangle$. We show
that for each trace $s \in L_P$ there is an instance $I$ of $\mathit{WS}$ such that $s$ is a topological
sort of $I$. We first focus on the control flow only, and prove that: $\forall s \in L_P$, $s$ is a
topological sort of some subgraph $G_s$ of $\langle A, E \rangle$. To this aim, it is sufficient to show that
$\forall s[i] \in \mathit{tasks}(s)$, there exists a path in $\langle A, E \rangle$ of the form $a_0 = s[j_1], s[j_2], \ldots, s[j_k]$ such
that $1 = j_1 < j_2 < \ldots < j_k = i$.
We prove the claim by induction on the index $1 \le j \le i$. Base: For $j = 1$,
the claim trivially holds, since $s[1] = a_0$, for each trace $s$. Induction: Assume that the
claim holds for each activity $s[j]$ with $j \le h$. We show that it holds for $s[h + 1]$ as well.
Clearly, if $(s[h], s[h + 1])$ is in $\langle A, E \rangle$, then we are done. Therefore, assume that such
an edge has been removed in Step 3. In this case, because of Steps 4–9, there exists an
activity $s[k]$ with $k < h$ such that both $(s[k], s[h])$ and $(s[k], s[h + 1])$ have been added to
$E$. Hence, the result follows because of the inductive hypothesis on $s[k]$.
To conclude the proof, we have to take care of the fact that each activity in s may be
correctly executed by satisfying all the local constraints. But this is straightforward, by
definition of Fork and Join in Steps 12-18.
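The control-flow claim used in the completeness argument (every task of a trace is reached from $a_0$ by a path whose positions in the trace strictly increase) can be checked mechanically. The sketch below is an illustration only: the function name is hypothetical, positions are 0-based (so `trace[0]` plays the role of $s[1] = a_0$), and each task is assumed to occur at most once in a trace.

```python
def has_increasing_path(trace, edges, i):
    """True if some path trace[0] = s[j1], ..., s[jk] = trace[i] with
    j1 < ... < jk uses only edges of the mined control flow."""
    reachable = [False] * len(trace)
    reachable[0] = True  # base case: the initial activity
    for h in range(1, i + 1):
        # inductive step: extend a path that ends at an earlier position
        reachable[h] = any(reachable[k] and (trace[k], trace[h]) in edges
                           for k in range(h))
    return reachable[i]

# Hypothetical mined control flow: b and c are parallel, so the edge (b, c)
# was dropped and both tasks hang off their common predecessor a0.
edges = {("a0", "b"), ("a0", "c"), ("b", "d"), ("c", "d")}
trace = ["a0", "b", "c", "d"]
covered = all(has_increasing_path(trace, edges, i) for i in range(len(trace)))
```

If the edge `("a0", "c")` were missing, position 2 of the trace would not be reachable, which is exactly the situation that the edges added for parallel activities are meant to rule out.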
(Monotonicity Condition) Let $\mathit{WS}$ be $\langle A, E, a_0, A_F, \mathit{Fork}, \mathit{Join} \rangle$. Let $L_P'$ be a log such
that $L_P' \subseteq L_P$, and let $\mathit{WS}' = \langle A', E', a_0', A_F', \mathit{Fork}', \mathit{Join}' \rangle$ be the output of MineWorkflow
on input $L_P'$. We have to show that for each instance $I'$ of $\mathit{WS}'$, and for each topological
sort $s$ of $I'$, there exists an instance $I$ of $\mathit{WS}$ of which $s$ is a topological sort as well. We
first focus again on the control flow only, and we show that $\forall s[i] \in \mathit{tasks}(s)$, there exists
a path in $\langle A, E \rangle$ of the form $a_0 = s[j_1], s[j_2], \ldots, s[j_k]$ such that $1 = j_1 < j_2 < \ldots < j_k = i$.
The proof is again by induction on $1 \le j \le i$. For $j = 1$ it trivially holds. Let us show
that it holds for $s[h + 1]$, by assuming that it holds for each $s[j]$ with $j \le h$. To this aim,
consider a task $s[p]$ with $p \le h$ such that $(s[p], s[h + 1])$ is in $E'$. Notice that such a task
must exist, otherwise $s$ would not be a topological sort of an instance of $\mathit{WS}'$. If $(s[p], s[h + 1])$ is also in
$E$, then we are done. Otherwise, we observe that $(s[p], s[h + 1]) \in E'$ entails that
there exists a trace $\tilde{s} \in L_P'$ such that $\tilde{s}[f] = s[p]$ and $\tilde{s}[f + 1] = s[h + 1]$, for some suitable $f$.
Since $\tilde{s}$ is in $L_P$ as well, it is the case that $(s[p], s[h + 1])$ was removed from $E$ in Step 3, for
$s[p]$ and $s[h + 1]$ were recognized as parallel activities. Let $\tilde{s}[l] = s[k]$, for some suitable
$l < f$ and $k < h$, be the "closest" task (singled out in Step 5) that precedes both $s[p]$ and
$s[h + 1]$ (in the worst case, the initial activity). In this case, $(\tilde{s}[l], \tilde{s}[f + 1]) = (s[k], s[h + 1])$
is added to $E$ in Steps 4–9, and the result follows by the inductive hypothesis on $s[k]$.
To conclude the proof, we have to take care of the fact that each activity in $s$ may be
correctly executed by satisfying all the local constraints. Again by induction, let $s[i]$ be
an activity and assume that all the activities of the form $s[j]$, with $j < i$, can be correctly
activated. If $\mathit{Join}(s[i]) = \mathrm{OR}$, it is sufficient to note that $s[i]$ can in any case be activated by
the arcs added in Step 5 (of the form $(\mathit{pre}, s[i])$), since $\mathit{pre}$ is activated by induction and
$\mathit{Fork}(\mathit{pre}) = \mathrm{OR}$ must hold. Otherwise, i.e., if $\mathit{Join}(s[i]) = \mathrm{AND}$, each instance executing
$s[i]$ executes all of its predecessors, by construction of the constraints.
□