Conformance Checking using Cost

Decomposing Data-aware Conformance Checking
Massimiliano de Leoni, Jorge Munoz-Gama, Josep Carmona, Wil van der Aalst
Example: A Credit Institute
(a; {A = 3000;R = Michael; E = Pete});
(b; {V = OK;E = Sue});
(c; {I = 530;D = OK;E = Sue});
(f; {E = Pete});
Deviations flagged on the slide:
• Activity d should have occurred, since amount < 5000
• Activity h hasn’t been executed: D cannot be OK
• For such a credit amount, interest should be < 450
• «Sue» is not authorized to perform b: is not Assistant
Events shown alongside the annotations:
(a; {A = 3000;R = Michael; E = Pete});
(b; {V = OK;E = Pete});
(c; {I = 530;D = OK;E = Sue});
(d, {I = 599; D = NOK; E = Sue});
(f; {E = Pete});
(a; {A = 5001;R = Michael; E = Pete});
(b; {V = OK;E = Pete});
(c; {I = 530;D = NOK;E = Sue});
(f; {E = Pete});
Petri Net with Data : Variables and
Read/Write Operations
[Figure: Petri net with data for the credit process. Transitions: Credit Request (a), Verify (b), Assessment (c), Register Negative Verification (d), Inform Customers (e), Renegotiate (f), Register Loan Rejection (g), Open Credit Loan (h). Places: n1 to n6. Variables: Amount, Verification, Interests, Decision. Legend: write operations, read operations, variables.]
Binding
• A binding is a triplet (t,r,w) where
• t is the transition that fires
• r: V ⇸ U (a partial function) gives the variables that are read along with their values
− dom(r) is the set of read variables
− r(v) is the value read for variable v
• w: V ⇸ U (a partial function) gives the variables that are written along with their values
− dom(w) is the set of written variables
− w(v) is the value written for variable v
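As an illustration of the triplet (t,r,w), a binding could be represented as a small record holding the transition and two dictionaries. This is only a sketch: the class name `Binding`, its field names, and the reading of the example event as a pure write are assumptions, not part of the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Binding:
    """A binding (t, r, w): a transition plus partial read/write assignments."""
    transition: str                                      # t
    read: Dict[str, Any] = field(default_factory=dict)   # r: variable -> value read
    write: Dict[str, Any] = field(default_factory=dict)  # w: variable -> value written

    def dom_r(self) -> Set[str]:
        """dom(r): the set of read variables."""
        return set(self.read)

    def dom_w(self) -> Set[str]:
        """dom(w): the set of written variables."""
        return set(self.write)


# Assumption: the event (a; {A = 3000; R = Michael; E = Pete}) is read as a
# binding of transition a that writes A, R and E and reads nothing.
example = Binding("a", write={"A": 3000, "R": "Michael", "E": "Pete"})
```

Using plain dictionaries keeps r and w partial: a variable that is not read or written is simply absent from the mapping.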
A Sequence of bindings
[Figure: the credit-process Petri net with data (transitions a to h, places n1 to n6), used to illustrate a sequence of bindings.]
Necessary condition for a binding (t,r,w): dom(r) and dom(w) coincide with the expected read and write operations of t.
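The necessary condition can be checked mechanically once the expected operations of each transition are known. The sketch below assumes a hypothetical table `EXPECTED`; the actual read and write arcs are drawn in the figure and cannot be recovered from the text, so the two entries are invented for illustration.

```python
# Hypothetical read/write declarations for two transitions (not from the slides).
EXPECTED = {
    "a": {"read": set(), "write": {"A"}},
    "h": {"read": {"D"}, "write": set()},
}


def has_expected_operations(t, r, w, expected=EXPECTED):
    """Necessary condition: dom(r) and dom(w) match the declared operations of t."""
    return set(r) == expected[t]["read"] and set(w) == expected[t]["write"]
```

A binding that passes this check is not necessarily valid yet; it still has to satisfy the guard of its transition.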
Each transition is associated with all valid bindings
[Figure: the credit-process Petri net with data (transitions a to h, places n1 to n6).]
Transition | Guard
Credit Request | --
Verify | 0.1 * r(A) < w(I) < 0.2 * r(A)
Assessment | r(V) = true
Register Negative Verification | r(V) = false AND w(D) = false
Inform Requester | --
Register Loan Rejection | r(D) = false
Open Credit | r(D) = true
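The guard table can be read as one predicate per transition over the read assignment r and the write assignment w. The following sketch encodes it that way; it assumes that "--" means "no guard" (always true) and that V and D hold booleans, and the names `GUARDS` and `satisfies_guard` are hypothetical.

```python
# Guards from the table, as predicates over r and w (both plain dicts).
# Assumptions: "--" is treated as always true; V and D are booleans.
GUARDS = {
    "Credit Request": lambda r, w: True,
    "Verify": lambda r, w: 0.1 * r["A"] < w["I"] < 0.2 * r["A"],
    "Assessment": lambda r, w: r["V"] is True,
    "Register Negative Verification": lambda r, w: r["V"] is False and w["D"] is False,
    "Inform Requester": lambda r, w: True,
    "Register Loan Rejection": lambda r, w: r["D"] is False,
    "Open Credit": lambda r, w: r["D"] is True,
}


def satisfies_guard(transition, r, w):
    """Return True if the binding (transition, r, w) satisfies the guard."""
    return GUARDS[transition](r, w)
```

For instance, `satisfies_guard("Verify", {"A": 3000}, {"I": 530})` holds because 530 lies between 10% and 20% of 3000.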
Alignments
• Move in both without incorrect write operations
• Move in both with incorrect write operations
• Move in log
• Move in process
Cost of alignments
• Each move is associated with a cost
• Cost of alignment is the sum of the costs of its moves
• <x>: Cost of “move on model”
• <y>: Cost of “move on log”
• <w>: Cost of reading/writing a wrong value
• <z>: Cost of not writing or reading a variable
[Figure: the credit-process Petri net with data, annotated with cost values between 1 and 3 on its transitions and data operations.]
Cost of alignments: some examples
[Figure: two example alignments, with costs 10 and 8.]
An optimal alignment: an alignment with the lowest cost
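Since the cost of an alignment is the sum of the costs of its moves, an optimal alignment is simply the cheapest candidate. The sketch below uses one hypothetical cost per move type; on the slides the costs are attached to individual transitions and variables, so `MOVE_COST` and its values are a simplifying assumption.

```python
# Hypothetical cost per move type (the slides assign costs per transition
# and per variable; this table only illustrates the summation).
MOVE_COST = {
    "sync": 0,              # move in both without incorrect write operations
    "model": 1,             # move on model
    "log": 1,               # move on log
    "wrong_value": 2,       # reading/writing a wrong value
    "missing_variable": 3,  # not writing or reading a variable
}


def alignment_cost(moves):
    """Cost of an alignment: the sum of the costs of its moves."""
    return sum(MOVE_COST[move] for move in moves)


def optimal_alignment(alignments):
    """An optimal alignment is one with the lowest cost."""
    return min(alignments, key=alignment_cost)


candidate_1 = ["sync", "log", "wrong_value"]
candidate_2 = ["sync", "sync", "model"]
```

Enumerating candidates like this is only feasible for toy cases; the approaches on the following slides search the space of alignments instead.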
Finding optimal alignments: Approach 1
1. Computing the control-flow alignment using existing
techniques (the «Arya» technique)
2. Enriching the alignment with the data operations.
• The alignment is enriched so as to minimize the cost of the alignment
• Naturally formulated as a Mixed Integer Linear Program
Log:
S{z=10,y=0} – A{x=1} – C{y=11} – E – A{x=3} – B{y=13} –
Process:
S{z=1,y=0} – A{x=10} – C{y=11} – E – A{x=3} – B{y=13} – F
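The second step of this approach (enriching a fixed control-flow alignment with data values at minimal cost) can be illustrated with a brute-force search. This is a stand-in for the Mixed Integer Linear Program, not the authors' formulation: the function `enrich`, the guard, and the value domain are all hypothetical, and each variable is assumed to be written only once.

```python
from itertools import product


def enrich(writes, domain, guards_hold, wrong_value_cost=1):
    """Pick one value per logged write so that the guards hold and as few
    values as possible differ from the log (exhaustive search)."""
    best, best_cost = None, None
    for candidate in product(domain, repeat=len(writes)):
        values = {var: val for (var, _), val in zip(writes, candidate)}
        if not guards_hold(values):
            continue
        # Every value that differs from the logged one counts as a deviation.
        cost = sum(wrong_value_cost
                   for (_, logged), val in zip(writes, candidate)
                   if logged != val)
        if best_cost is None or cost < best_cost:
            best, best_cost = values, cost
    return best, best_cost


# Hypothetical input: the log wrote x=1 and y=11; the invented guard
# requires 5 <= x < y.
logged_writes = [("x", 1), ("y", 11)]
guard = lambda v: 5 <= v["x"] < v["y"]
best, cost = enrich(logged_writes, range(15), guard)
```

Here only x has to change, so the cheapest enrichment keeps y as logged. An exhaustive search grows exponentially with the number of writes, which is why a MILP solver is the natural tool for real instances.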
M. de Leoni, W.M.P. van der Aalst: Aligning event logs and process models
for multi-perspective conformance checking: An approach based on integer linear
programming. Proceedings of BPM 2013
Finding optimal alignments: Approach 2
F. Mannhardt, M. de Leoni, H. Reijers, W.M.P. van der
Aalst: Balanced Multi-Perspective Checking of Process
Conformance. Computing Journal, Springer
(under review)
Log:
(a; {A = 3000;R = Michael; E = Pete}); – (b; {V = NOK; E = Sue});
Process:
(a; {A = 3000;R = Michael; E = Pete}); – (b; {V = NOK; E = Sue});
Finding an optimal alignment: complexity
• Finding an optimal alignment is exponential in the size of the model, i.e. the number of activities and data variables.
• IDEA: Divide-and-conquer approach
• Petri Net with Data is decomposed into smaller fragments that are checked separately.
• If the decomposition is valid
− Any trace fits the entire model if and only if it fits all smaller fragments.
[Figure: a model with transitions t1 to t6 and its fragments t1 t2 t3, t3 t4 and t5 t6.]
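The divide-and-conquer idea amounts to projecting each trace onto the activities of a fragment and checking the projections independently. A minimal sketch follows, assuming events are (activity, data) pairs and that each fragment comes with its own conformance check; the `fits` callables below are hypothetical placeholders that accept exactly one ordering.

```python
def project(trace, activities):
    """Keep only the events whose activity belongs to the fragment."""
    return [event for event in trace if event[0] in activities]


def fits_all_fragments(trace, fragments):
    """fragments: list of (activities, fits) pairs, where fits is a
    per-fragment conformance check supplied by the caller."""
    return all(fits(project(trace, activities)) for activities, fits in fragments)


# Hypothetical fragments over t1..t6; each check accepts one exact order.
fragments = [
    ({"t1", "t2", "t3"}, lambda p: [e[0] for e in p] == ["t1", "t2", "t3"]),
    ({"t3", "t4"}, lambda p: [e[0] for e in p] == ["t3", "t4"]),
    ({"t5", "t6"}, lambda p: [e[0] for e in p] == ["t5", "t6"]),
]
good = [(t, {}) for t in ["t1", "t2", "t3", "t4", "t5", "t6"]]
bad = [(t, {}) for t in ["t1", "t3", "t2", "t4", "t5", "t6"]]
```

With a valid decomposition, `fits_all_fragments` agrees with checking the trace against the entire model, while each individual check works on a much smaller net.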
Valid decomposition without data
• The following can only appear in precisely one fragment:
1. Places
2. Invisible transitions
3. Visible transitions that share their label (name) with another transition
4. Arcs
• Visible transitions with unique label may appear in multiple fragments
• The union of the fragments is the entire model
W.M.P. van der Aalst: Decomposing petri nets for process mining: A generic
approach. Distributed and Parallel Databases 31(4) (2013)
Valid decomposition with data
• The following can only appear in precisely one fragment:
1. Places
2. Invisible transitions
3. Visible transitions that share their label (name) with another transition
4. Arcs
• Visible transitions with unique label may appear in multiple fragments
• Each variable appears in precisely one fragment
• Each transition shared among fragments may read/write different variables in each fragment
• The union of the fragments is the entire model
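The conditions listed above can be checked with simple set bookkeeping. The sketch below assumes the model and each fragment are given as dictionaries of sets keyed by element kind; the key names (`places`, `invisible`, `duplicate_label`, `unique_label`, `arcs`, `variables`) are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Element kinds that must appear in precisely one fragment.
EXCLUSIVE = ("places", "invisible", "duplicate_label", "arcs", "variables")
# Element kinds that fragments are allowed to share.
SHAREABLE = ("unique_label",)


def is_valid_decomposition(model, fragments):
    for kind in EXCLUSIVE:
        # Count in how many fragments each element of this kind occurs.
        counts = Counter(x for f in fragments for x in f.get(kind, set()))
        if set(counts) != model.get(kind, set()):
            return False          # the union is not the entire model
        if any(c != 1 for c in counts.values()):
            return False          # an exclusive element is duplicated
    for kind in SHAREABLE:
        union = set().union(*(f.get(kind, set()) for f in fragments))
        if union != model.get(kind, set()):
            return False
    return True


model = {
    "places": {"n1", "n2"},
    "unique_label": {"a", "b", "c"},
    "arcs": {("a", "n1"), ("n1", "b"), ("b", "n2"), ("n2", "c")},
    "variables": {"A"},
}
frag1 = {"places": {"n1"}, "unique_label": {"a", "b"},
         "arcs": {("a", "n1"), ("n1", "b")}, "variables": {"A"}}
frag2 = {"places": {"n2"}, "unique_label": {"b", "c"},
         "arcs": {("b", "n2"), ("n2", "c")}}
```

In this toy net, transition b is shared by both fragments, which is allowed because its label is unique, while variable A lives in the first fragment only.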
Instantiation of Valid Decompositions
• Different strategies are possible.
• We propose two strategies extending what exists for the
data-unaware case:
• Maximal Decomposition
• SESE-based decomposition
[Figure: the credit-process Petri net with data (places n1 to n6; variables Amount, Verification, Interests, Decision).]
Maximal Decomposition
• Construct the smallest components that satisfy the Valid Decomposition definition
• Variables and places are mutually exclusive
[Figure: the fragments of the maximal decomposition of the credit-process net, covering places n1 to n6 and variables Amount, Verification, Interests and Decision.]
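One way to obtain the smallest components is to group arcs that touch the same non-shareable node (a place, a variable, an invisible transition, or a transition with a duplicate label), since such a node may only live in one fragment. The union-find sketch below follows that idea; it is an illustration under the assumption that every arc has at least one non-shareable endpoint, not necessarily the construction used in the package, and the function name is hypothetical.

```python
from collections import defaultdict


def maximal_decomposition(arcs, shareable):
    """Group arcs into minimal fragments.

    arcs: iterable of (source, target) pairs, including read/write arcs
          between transitions and variables.
    shareable: visible transitions with a unique label (may be duplicated).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Two non-shareable nodes joined by an arc must share a fragment.
    for src, dst in arcs:
        if src not in shareable and dst not in shareable:
            parent[find(src)] = find(dst)

    # Each arc goes to the fragment of its non-shareable endpoint.
    fragments = defaultdict(set)
    for src, dst in arcs:
        anchor = src if src not in shareable else dst
        fragments[find(anchor)].add((src, dst))
    return [sorted(f) for f in fragments.values()]


arcs = [("a", "n1"), ("n1", "b"), ("b", "n2"), ("n2", "c"),
        ("a", "Amount"), ("Amount", "b")]
parts = maximal_decomposition(arcs, shareable={"a", "b", "c"})
```

In the toy input every place and the variable Amount end up in a fragment of their own, with the surrounding transitions duplicated across fragments.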
SESE-based Algorithm
[Figure: a) Petri Net; b) Workflow graph and SESEs (S1 to S10); c) RPST.]
Example of the SESE-based Algorithm (k = 2)
[Figure: the fragments obtained for the credit-process net with the SESE-based decomposition, k = 2.]
Implementation
Available in the package DataConformanceChecker
Experiments
• Generating different event logs of 5000 traces each, with different average trace lengths
• This is ensured by enforcing a larger number of credit renegotiations
• 20% of the transition firings are generated so as not to satisfy the guards
[Figure: the credit-process Petri net with data used to generate the event logs.]
Results: an exponential reduction of the computation time
[Figure: Computation Time (in seconds; axis ticks 10, 100, 1000, 10000) against the average number of events per event-log trace (5 to 30), comparing No Decomposition with SESE-based decomposition (k=2).]
Projection on the model
• #correct(t,DPN) = number of moves in both without incorrect write operations for t in the alignments between each log trace and the DPN
• #total(t,DPN) = number of moves for t in the alignments of each log trace and DPN
• For each transition t:
− n = number of fragments in which t occurs
− DPN_i^t is the i-th fragment in which t occurs.
$$fitTransition(t) = 1 - \frac{\sum_{i=1}^{n} \frac{\#correct(t, DPN_i^t)}{\#total(t, DPN_i^t)}}{n}$$
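The per-transition measure can be computed directly from the #correct and #total counts of each fragment containing t. The sketch below follows the formula as reconstructed from the slide, so a value of 0 means that every move for t was correct in every fragment; the function name and the input layout (a list of (correct, total) pairs) are assumptions.

```python
def fit_transition(counts):
    """counts: one (correct, total) pair per fragment in which t occurs.

    Returns 1 minus the average, over the n fragments, of #correct / #total.
    """
    n = len(counts)
    if n == 0 or any(total == 0 for _, total in counts):
        raise ValueError("t must occur, with at least one move, in every listed fragment")
    return 1 - sum(correct / total for correct, total in counts) / n


# Hypothetical counts: t occurs in two fragments; in the first, 8 of its
# 10 moves are correct, in the second all 10 are.
value = fit_transition([(8, 10), (10, 10)])
```

Averaging over fragments is what makes the decomposed projection an approximation: the same firing of t can be judged differently in different fragments.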
Projection on the model based on decomposition is an approximation!
No decomposition | Decomposition
Move in both without incorrect write operations for t | Move in both without incorrect write operations for t in all fragments containing t
Move in both with incorrect write operations for t; move in log; move in process | The same move for t in at least one of the fragments containing t
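The correspondence in the table can be expressed as a rule for merging the move types that one firing of t receives in the different fragments. The sketch below is a minimal illustration; the string labels are hypothetical, and returning the first deviating move is an assumption for the case where fragments disagree on the kind of deviation.

```python
CORRECT = "both_correct"  # move in both without incorrect write operations


def merge_moves(moves_per_fragment):
    """Combine the move types observed for t in each fragment containing t."""
    # Correct only if t is a correct move in ALL fragments containing it.
    if all(move == CORRECT for move in moves_per_fragment):
        return CORRECT
    # Otherwise report a deviation seen in at least one fragment
    # (assumption: the first one found).
    return next(move for move in moves_per_fragment if move != CORRECT)
```

For example, `merge_moves([CORRECT, "log"])` yields `"log"`, because a single deviating fragment is enough to flag the move.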
[Figure: a model with transitions t1 to t6 and its fragments t1 t2 t3, t3 t4 and t5 t6.]
Projection on the model (without decomposition)
[Figure: projection on the model, shown with decomposition and without decomposition.]
Conclusion
• Finding an optimal alignment is exponential in the model size
• To speed up the computation:
1. Decompose the model into submodels
2. Align each trace with each submodel
• The decomposition needs to be valid:
Any trace fits the entire model if and only if it fits all smaller fragments.
• A more extensive evaluation is needed
− Using real processes
− Synthetic data referring to models with dozens of transitions