
Data Conformance Checking
using Optimal Alignments
Felix Mannhardt, Massimiliano de Leoni,
Hajo A. Reijers
Problem (Adapted from Massimiliano de Leoni)
Log trace:
(a; {A = 3000;R = Michael; E = Pete});
(b; {V = OK;E = Sue});
(c; {I = 530;D = OK;E = Sue});
(f; {E = Pete});
Deviations in this trace:
• Activity d should have occurred, since amount < 5000
• Activity h hasn't been executed: D cannot be «OK»
• «Sue» is not authorized to perform b: she is not an Assistant
The same trace as shown on the slide with A = 5001, b performed by Pete, and D = NOK:
(a; {A = 5001;R = Michael; E = Pete});
(b; {V = OK;E = Pete});
(c; {I = 530;D = NOK;E = Sue});
(f; {E = Pete});
Department of Mathematics and Computer Science
How Does Data Alignment Work?
• Petri Nets with Data:
  [Figure: Petri net with nodes labelled A, B, C, D, a variable X, and the guards X < 5000 and X ≥ 5000]
• Two new “Moves” with associated “Costs”:
  • Move with incorrect write operation
  • Move with missing write operation
• Formulation of an MILP problem for CF Alignment:
  + cost(missing) = κ_DA
  • min x₁
  • x − M·x₁ ≤ 2000 ∧ −x − M·x₁ ≤ −2000 ∧ x ≥ 5000
  • x₁ ∈ {0, 1}, x ∈ ℤ
[1] M. de Leoni, W. M. P. van der Aalst (2013). Aligning Event Logs and Process Models for Multi-Perspective
Conformance Checking: An Approach Based on Integer Linear Programming.
[2] A. Adriansyah, B. F. van Dongen, W. M. P. van der Aalst (2011). Conformance checking using cost-based fitness
analysis.
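The small MILP on this slide can be checked by hand. The sketch below is not the ProM implementation and does not call lpsolve; it enumerates the binary variable x₁ and tests whether an integer x satisfying the big-M constraints exists. Reading the constant 2000 as the value written in the log, and the choice of `big_m`, are assumptions made for illustration; the function name is hypothetical.

```python
def min_incorrect_writes(logged_value, guard_lower_bound, big_m=10**6):
    """Return (x1, x) with minimal x1 for the single-variable big-M model.

    Constraints (as on the slide):
        x - M*x1 <= logged_value
       -x - M*x1 <= -logged_value
        x >= guard_lower_bound
    """
    for x1 in (0, 1):  # try the cheaper option (no incorrect write) first
        # The first two constraints together mean |x - logged_value| <= M*x1.
        low = max(guard_lower_bound, logged_value - big_m * x1)
        high = logged_value + big_m * x1
        if low <= high:  # a feasible integer x exists in [low, high]
            return x1, low
    return None  # infeasible even with x1 = 1 (big_m chosen too small)


print(min_incorrect_writes(logged_value=2000, guard_lower_bound=5000))  # (1, 5000)
```

With x₁ = 0 the constraints force x = 2000, which violates x ≥ 5000, so the minimum is x₁ = 1: one incorrect write operation has to be accepted.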
Current Data Conformance Checker in ProM
[Diagram: Input (Petri Net with Data, Event Log) → Data Conformance Checking (Cost; Control Flow Alignment → Data Alignment) → Output (Data Alignment)]
Shortcomings of the Current Solution
Trace:
(a; {A = 3000;R = Michael});
(b; {V = NOK});
(c; {I = 530;D = OK});
(f);
Perfect CF Alignment (κ_CA = 0)
L: a b c f
P: a b c f
Resulting DF Alignment (κ_DA = 3)
(a; {A = 5001; R = Michael});
(b; {V = OK});
(c; {I = 530;D = NOK});
(f);
Better DF Alignment (κ_CA = 1)
(a; {A = 3000;R = Michael});
(b; {V = NOK});
(c; {I = 530;D = OK});
(f);
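The data cost of 3 can be reproduced by counting the attribute values that the resulting alignment changes relative to the logged trace. This is a minimal sketch that assumes a unit cost per incorrect write operation and handles only moves in both; the helper name and the dictionary encoding of events are hypothetical.

```python
def count_changed_writes(log_trace, aligned_trace):
    """Count attribute values that differ between logged and aligned events."""
    changed = 0
    for (log_act, log_data), (al_act, al_data) in zip(log_trace, aligned_trace):
        if log_act != al_act:
            raise ValueError("this sketch only handles moves in both")
        changed += sum(1 for var, val in log_data.items() if al_data.get(var) != val)
    return changed


LOG = [("a", {"A": 3000, "R": "Michael"}), ("b", {"V": "NOK"}),
       ("c", {"I": 530, "D": "OK"}), ("f", {})]
RESULTING = [("a", {"A": 5001, "R": "Michael"}), ("b", {"V": "OK"}),
             ("c", {"I": 530, "D": "NOK"}), ("f", {})]

print(count_changed_writes(LOG, RESULTING))  # 3 (A, V and D were changed)
print(count_changed_writes(LOG, LOG))        # 0 (events left as logged)
```

Counting only the four listed events, the better alignment leaves every logged value untouched, so under these assumptions its overall cost is the control-flow cost of 1 rather than 0 + 3.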
First Idea (Multi-Alignment Approach)
[Diagram: Input (Petri Net with Data, Event Log) → Data Conformance Checking: Cost → Control Flow Alignment → Data Alignment → Cache → Optimal? (κ_CA ≥ κ_BestSoFar OR κ_DA = 0). No: continue with another Control Flow Alignment. Yes: Output (Optimal Data Alignment).]
Image source: http://commons.wikimedia.org/wiki/File:Pictofigo_-_Idea.png
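The loop in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the ProM code: control-flow alignments are assumed to arrive sorted by ascending κ_CA, `data_cost` stands in for the MILP that computes κ_DA, and all names are hypothetical.

```python
def multi_alignment(cf_alignments, data_cost):
    """Return (total, alignment, k_ca, k_da) with minimal k_ca + k_da.

    cf_alignments: iterable of (k_ca, alignment), sorted by ascending k_ca.
    data_cost:     callable returning k_da for one control-flow alignment.
    """
    best = None
    for k_ca, alignment in cf_alignments:
        # Stop: no later alignment can beat the best total found so far.
        if best is not None and k_ca >= best[0]:
            break
        k_da = data_cost(alignment)
        total = k_ca + k_da
        if best is None or total < best[0]:
            best = (total, alignment, k_ca, k_da)
        # Stop: with k_da = 0, every later alignment costs at least as much.
        if k_da == 0:
            break
    return best


# Toy input: three control-flow alignments with made-up data costs.
toy_costs = {"A0": 3, "A1": 0, "A2": 0}
result = multi_alignment([(0, "A0"), (1, "A1"), (2, "A2")], toy_costs.get)
print(result)  # (1, 'A1', 1, 0)
```

Both stopping conditions rely on the ascending order of κ_CA, which is why the control-flow alignments have to be produced in sorted order.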
Second Idea (Single-Alignment Approach)
Trace:
(a; {A = 3000;R = Michael});
(b; {V = NOK});
(c; {I = 530;D = OK});
(f);
[Search-tree figure: nodes are alignment prefixes <>, <a>, <a,b>, <a,b,c>, <a,b,c,f>, connected by edges such as "Move in Both"; nodes are annotated with cost pairs (κ_CA, κ_DA), for example (0,0), (1,0), and (0,2); further branches are elided with "…"]
Single-Alignment Approach I
• For each node in the search space
  • Compute a Data Alignment (MILP) for the prefix
  • Remember κ_DA and the variable assignment
  • Use the variable assignment to check whether an MILP is needed
• A best-first search on the overall cost (κ_DA + κ_CA) returns one optimal Data Alignment
  • Use of the ILP heuristic for A* [2] is still possible
  • κ_DA + κ_CA never gets better along a path (no negative edges!)
• But our search space is bigger!
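The best-first search on the combined cost can be sketched with a priority queue. The code below is a toy illustration, not the ProM implementation: the graph `GRAPH`, its state names, and the per-edge cost increments are hypothetical, and a real search would obtain each κ_DA from an MILP on the prefix instead of a lookup table.

```python
import heapq
from itertools import count


def best_first(start, successors, is_goal):
    """Return (goal_state, k_ca, k_da) minimising k_ca + k_da, or None."""
    tie = count()  # tie-breaker so the heap never compares states
    queue = [(0, next(tie), start, 0, 0)]
    closed = set()
    while queue:
        total, _, state, k_ca, k_da = heapq.heappop(queue)
        if is_goal(state):
            return state, k_ca, k_da
        if state in closed:
            continue
        closed.add(state)
        for nxt, d_ca, d_da in successors(state):
            if nxt not in closed:
                heapq.heappush(
                    queue,
                    (total + d_ca + d_da, next(tie), nxt, k_ca + d_ca, k_da + d_da),
                )
    return None


# Hypothetical search space: one branch has no control-flow cost but three
# data deviations, the other has one control-flow deviation and none in data.
GRAPH = {
    "start": [("sync_a", 0, 1), ("dev_a", 1, 0)],
    "sync_a": [("sync_b", 0, 1)],
    "sync_b": [("sync_c", 0, 1)],
    "sync_c": [("end_sync", 0, 0)],
    "dev_a": [("end_dev", 0, 0)],
    "end_sync": [],
    "end_dev": [],
}

result = best_first("start", GRAPH.__getitem__, lambda s: s.startswith("end"))
print(result)  # ('end_dev', 1, 0)
```

Because all cost increments are non-negative, the first goal state popped from the queue has minimal overall cost, which is the property the "no negative edges" remark relies on.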
Single-Alignment Approach: Search Space
• Two states are equivalent iff
  • Same marking of “Event Net” & Process Model as in [2]
  • Same variable assignment wrt. all guards
[Example on the slide: two traces compared side by side ("vs."). Both start with (a; {A = 5001;R = Michael}); (b; {V = OK}); and end with (f); their c events carry either {I = 530;D = OK} or {I = 530;D = NOK}.]
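One way to implement this equivalence check is to map every state to a hashable key and compare keys. The sketch below reads "same variable assignment wrt. all guards" as "same values for all variables that occur in some guard", which is an assumption; the function name, the marking encoding, and the set of guard variables are hypothetical.

```python
def state_key(marking, assignment, guard_variables):
    """Build a hashable key; states with equal keys are treated as equivalent.

    marking:         dict mapping place name -> token count
    assignment:      dict mapping variable name -> current value
    guard_variables: names of the variables that occur in any guard
    """
    relevant = tuple(sorted((v, assignment.get(v)) for v in guard_variables))
    return (frozenset(marking.items()), relevant)


guard_vars = {"A", "D"}  # assumed: only A and D are read by guards
s1 = state_key({"p3": 1}, {"A": 5001, "D": "NOK", "I": 530}, guard_vars)
s2 = state_key({"p3": 1}, {"A": 5001, "D": "NOK", "I": 999}, guard_vars)
s3 = state_key({"p3": 1}, {"A": 5001, "D": "OK", "I": 530}, guard_vars)

print(s1 == s2)  # True: they differ only in I, which no guard reads
print(s1 == s3)  # False: D differs and is read by a guard
```

Keys like these can be stored in the closed set of the search, so that equivalent states are expanded only once.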
Comparison: Improvement of Fitness
[Bar chart: Change in Fitness (axis 0% to 60%), Average and Max, for Insurance Institute (12,000 Traces) and Synthetic Example (1,200 Traces, Length 4-15, 10% Noise); data labels: 254 Traces, 2 Traces]
Comparison: Dutch Insurance Institute
[Bar chart: Running Time in Seconds (axis 0 to 120) for Old, Multi, and Single]
Comparison: Dutch Insurance Institute [1]
[Two bar charts of # MILP for Multi and Single: Average (axis 0 to 4.5) and Max (axis 0 to 120)]
Comparison: Dutch Insurance Institute
[Two bar charts of # Queued States for Old, Multi, and Single: Average (axis 0 to 50) and Max (axis 0 to 4,000); data label: 64]
Comparison: Synthetic Model (10% Noise)
[Bar chart: Running Time in Seconds (axis 0 to 80) for Old, Multi, and Single]
Comparison: Synthetic Model (10% Noise)
[Two bar charts of # MILP for Multi and Single: Average (axis 0 to 16) and Max (axis 0 to 1,600)]
Comparison: Synthetic Model (10% Noise)
[Two bar charts of # Queued States for Old, Multi, and Single: Average (axis 0 to 800) and Max (axis 0 to 180,000); data labels: 66, 26]
Comparison Wrap-up
• Multi-Alignment Approach
  • Building CF Alignments (sorted) up to a certain κ_CA is not feasible for certain models/traces
  • Though faster in some cases (Good Fitness)
• Single-Alignment Approach
  • Again, solving many (smaller) MILPs
  • Integrated Optimizations:
    − Check if guards already fulfilled → No MILP
    − Check if only write operations missing → No MILP
    − Re-use calculated Data Alignments → 1 x MILP
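Two of these optimizations can be sketched in a few lines: skipping the solver when all guards already hold, and caching solved data alignments. This is an illustration only; `solve_milp` is a stand-in for the real MILP call, the guard and cache representations are assumptions, and the "only write operations missing" shortcut is not covered.

```python
def data_cost(assignment, guards, solve_milp, cache):
    """Return the data-alignment cost, calling the solver only when needed."""
    if all(guard(assignment) for guard in guards):
        return 0  # guards already fulfilled: no MILP
    key = tuple(sorted(assignment.items()))
    if key not in cache:
        cache[key] = solve_milp(assignment)  # solved once, re-used afterwards
    return cache[key]


solver_calls = []


def fake_solver(assignment):
    """Hypothetical solver stub that records calls and returns a cost of 1."""
    solver_calls.append(dict(assignment))
    return 1


guards = [lambda a: a["A"] >= 5000]
cache = {}
print(data_cost({"A": 6000}, guards, fake_solver, cache))  # 0
print(data_cost({"A": 3000}, guards, fake_solver, cache))  # 1
print(data_cost({"A": 3000}, guards, fake_solver, cache))  # 1
print(len(solver_calls))  # 1
```

In this toy run the solver stub is invoked once for three cost queries: the first query passes the guard check and the third hits the cache.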
What Next?
• Improve the Implementation
  • Faster MILP solving by re-using the lpsolve instance?
  • Reduce memory footprint of both approaches?
  • Will a Decomposition of the process model help?
• Case study with Event Log from Italian local police
  • Event Log about the management of road-traffic fines
  • Process with multiple decision points
  • Process with non-trivial guards
  • Event Log contains data attributes
• Submit Paper to FASE 2014
Summary
• Current Data Alignment could be sub-optimal
• Two approaches for an optimal Data Alignment
  • Multi-Alignment Approach
    [Diagram: CF Alignment → MILP, CF Alignment → MILP, CF Alignment → … → Optimal? → Data Alignment]
  • Single-Alignment Approach
    [Diagram: Best First Search with MILP calls → Data Alignment]
• Both implemented in ProM
  • Soon to be integrated in Data Aware Replayer
• Which one to use depends on the case