faculteit technologie management

Genetic Process Mining
Wil van der Aalst
Ana Karla Medeiros
Ton Weijters
Eindhoven University of Technology
Department of Information Systems
[email protected]
Outline
• Process Mining
• Genetic Algorithms
• Genetic Process Mining
– Internal Representation
– Fitness measure
– Genetic Operators
• Experiments and Results
• Conclusion and Future Work
Process Mining
X = apply for license
A = motorbike classes
B = car classes
C = theoretical exam
D = practical motorbike exam
E = practical car exam
Y = get result
Process Mining (cont.)
• Most current techniques cannot handle
– Structural constructs: non-free choice, duplicate tasks and invisible tasks
– Noisy logs
• Reason: they take a local approach
Genetic Algorithms
– Global approach
[Figure labels: local optimum, global optimum]
Genetic Process Mining (GPM)
Aim: Use a genetic algorithm to tackle noise, duplicate activities, non-free choice and invisible tasks
Internal Representation
Fitness Measure
Genetic Operators
GPM – Internal Representation
• Causal Matrix

Input    True    X       X       A \/ B  A /\ C  B /\ C  D \/ E
         X       A       B       C       D       E       Y       Output
X        0       1       1       0       0       0       0       A \/ B
A        0       0       0       1       1       0       0       C /\ D
B        0       0       0       1       0       1       0       C /\ E
C        0       0       0       0       1       1       0       D \/ E
D        0       0       0       0       0       0       1       Y
E        0       0       0       0       0       0       1       Y
Y        0       0       0       0       0       0       0       True
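As a rough illustration of how the 0/1 part of the causal matrix can be read, the sketch below stores the driving-licence matrix as a list of rows and derives, for a given task, the tasks it causes (its row) and the tasks that cause it (its column). The names `TASKS`, `MATRIX`, `output_tasks` and `input_tasks` are illustrative assumptions, not part of the slides or of ProM.

```python
TASKS = ["X", "A", "B", "C", "D", "E", "Y"]

# Rows are source tasks, columns are target tasks; 1 means "row task causes column task".
MATRIX = [
    [0, 1, 1, 0, 0, 0, 0],  # X
    [0, 0, 0, 1, 1, 0, 0],  # A
    [0, 0, 0, 1, 0, 1, 0],  # B
    [0, 0, 0, 0, 1, 1, 0],  # C
    [0, 0, 0, 0, 0, 0, 1],  # D
    [0, 0, 0, 0, 0, 0, 1],  # E
    [0, 0, 0, 0, 0, 0, 0],  # Y
]


def output_tasks(task):
    """Tasks that `task` causes: the 1-entries in its row."""
    row = MATRIX[TASKS.index(task)]
    return {t for t, bit in zip(TASKS, row) if bit}


def input_tasks(task):
    """Tasks that cause `task`: the 1-entries in its column."""
    col = TASKS.index(task)
    return {TASKS[r] for r, row in enumerate(MATRIX) if row[col]}


if __name__ == "__main__":
    for t in TASKS:
        print(t, "inputs:", sorted(input_tasks(t)), "outputs:", sorted(output_tasks(t)))
```

The 0/1 entries alone do not say whether several inputs or outputs are combined with AND or OR; that information sits in the Input and Output conditions.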
GPM – Internal Representation
• Causal Matrix
– Compact representation

Task    Input        Output
X       {}           {{A,B}}
A       {{X}}        {{C},{D}}
B       {{X}}        {{C},{E}}
C       {{A,B}}      {{D,E}}
D       {{A},{C}}    {{Y}}
E       {{B},{C}}    {{Y}}
Y       {{D,E}}      {}
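To make the link between the compact representation and the boolean conditions concrete, here is a minimal sketch that stores each task's input and output as a list of subsets and renders it back as an expression: tasks inside one subset are joined with OR, subsets are joined with AND, and an empty list becomes True. The dictionary layout and the function name `to_expression` are assumptions made for this illustration.

```python
# Compact representation of the driving-licence example:
# each condition is a list of subsets (OR inside a subset, AND between subsets).
CAUSAL_MATRIX = {
    "X": {"input": [], "output": [{"A", "B"}]},
    "A": {"input": [{"X"}], "output": [{"C"}, {"D"}]},
    "B": {"input": [{"X"}], "output": [{"C"}, {"E"}]},
    "C": {"input": [{"A", "B"}], "output": [{"D", "E"}]},
    "D": {"input": [{"A"}, {"C"}], "output": [{"Y"}]},
    "E": {"input": [{"B"}, {"C"}], "output": [{"Y"}]},
    "Y": {"input": [{"D", "E"}], "output": []},
}


def to_expression(subsets):
    """Render a list of subsets as a boolean expression in the slides' notation."""
    if not subsets:
        return "True"
    parts = []
    for subset in subsets:
        joined = " \\/ ".join(sorted(subset))
        # Parenthesise an OR group only when it is combined with other groups.
        if len(subset) > 1 and len(subsets) > 1:
            joined = "(" + joined + ")"
        parts.append(joined)
    return " /\\ ".join(parts)


if __name__ == "__main__":
    for task, cond in CAUSAL_MATRIX.items():
        print(task, "| input:", to_expression(cond["input"]),
              "| output:", to_expression(cond["output"]))
```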
GPM – Internal Representation
• Causal Matrix
– Semantics
Task    Input        Output
A       {}           {{B},{C,D}}
B       {{A}}        {{E,F}}
C       {{A}}        {{E}}
D       {{A}}        {{F}}
E       {{B},{C}}    {{G}}
F       {{B},{D}}    {{G}}
G       {{E},{F}}    {}

Deadlock!
Invisible tasks only fire to enable visible tasks!
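One way to experiment with these semantics is a small token game, sketched below under explicit assumptions: a fired task puts one token on each of its output subsets, and a task is enabled when every one of its input subsets can claim such a token from some producer. This bookkeeping and the helpers `find_token`, `enabled` and `fire` are choices made for this sketch, not necessarily how the authors implement it; tasks without input are simply treated as always enabled.

```python
from collections import Counter

# task -> (input subsets, output subsets), copied from the example table
CM = {
    "A": ([], [{"B"}, {"C", "D"}]),
    "B": ([{"A"}], [{"E", "F"}]),
    "C": ([{"A"}], [{"E"}]),
    "D": ([{"A"}], [{"F"}]),
    "E": ([{"B"}, {"C"}], [{"G"}]),
    "F": ([{"B"}, {"D"}], [{"G"}]),
    "G": ([{"E"}, {"F"}], []),
}


def find_token(tokens, task, subset):
    """Return a (producer, output subset) key with a token that `task` may use, else None."""
    for producer in sorted(subset):
        for out in CM[producer][1]:
            key = (producer, frozenset(out))
            if task in out and tokens[key] > 0:
                return key
    return None


def enabled(tokens, task):
    """A task is enabled when each of its input subsets can be served by a token."""
    return all(find_token(tokens, task, s) is not None for s in CM[task][0])


def fire(tokens, task):
    """Consume one token per input subset, then mark every output subset."""
    if not enabled(tokens, task):
        raise ValueError(task + " is not enabled")
    for subset in CM[task][0]:
        tokens[find_token(tokens, task, subset)] -= 1
    for out in CM[task][1]:
        tokens[(task, frozenset(out))] += 1


if __name__ == "__main__":
    tokens = Counter()
    for t in ["A", "B", "C", "E"]:
        fire(tokens, t)
    # G still waits for F, and F can no longer fire in this run.
    print({t: enabled(tokens, t) for t in ["D", "F", "G"]})
```

In this sketch the run A, B, C, E gets stuck: G needs both E and F, but the token produced by B has already been used by E, so F never becomes enabled.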
GPM – Internal Representation
• Causal Matrix
– Mappings
Task    Input        Output
A       {}           {{B},{C,D}}
B       {{A}}        {{E,F}}
C       {{A}}        {{E}}
D       {{A}}        {{F}}
E       {{B},{C}}    {{G}}
F       {{B},{D}}    {{G}}
G       {{E},{F}}    {}

GPM – Internal Representation
• Causal Matrix
– Mappings
Task    Input        Output
A       {}           {{C,D}}
B       {}           {{D}}
C       {{A}}        {}
D       {{A,B}}      {}
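The slides do not spell out the mapping rules, so the following is only one possible mapping from a causal matrix to a Petri net, written as a sketch: every input subset and every output subset becomes a place, and every causal pair (t1, t2) gets an invisible transition that moves a token from the matching output place of t1 to the matching input place of t2. The function `to_petri_net` and the tuple-based naming of places and invisible transitions are hypothetical; source and sink places are left out for brevity.

```python
# task -> (input subsets, output subsets), the small example from the Mappings slide
CM = {
    "A": ([], [{"C", "D"}]),
    "B": ([], [{"D"}]),
    "C": ([{"A"}], []),
    "D": ([{"A", "B"}], []),
}


def to_petri_net(cm):
    """Return (places, transitions, arcs) for one assumed causal-matrix-to-net mapping."""
    places, transitions, arcs = set(), set(cm), set()
    # One place per input subset and per output subset of every task.
    for task, (inputs, outputs) in cm.items():
        for subset in inputs:
            place = ("in", task, frozenset(subset))
            places.add(place)
            arcs.add((place, task))
        for subset in outputs:
            place = ("out", task, frozenset(subset))
            places.add(place)
            arcs.add((task, place))
    # One invisible transition per causal pair, linking output place to input place.
    for t1, (_, outputs) in cm.items():
        for out in outputs:
            for t2 in out:
                for inp in cm[t2][0]:
                    if t1 in inp:
                        invisible = ("invisible", t1, t2)
                        transitions.add(invisible)
                        arcs.add((("out", t1, frozenset(out)), invisible))
                        arcs.add((invisible, ("in", t2, frozenset(inp))))
    return places, transitions, arcs


if __name__ == "__main__":
    places, transitions, arcs = to_petri_net(CM)
    print(len(places), "places,", len(transitions), "transitions,", len(arcs), "arcs")
```

Keeping the invisible transitions explicit avoids merging the output place of A with the input place of D, which would otherwise let B lead to C even though the causal matrix has no such relation.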

GPM – Fitness Measure
• Main idea
– Reward the individuals that can parse the more frequent behavior in the log
• Challenges
– How to assess an individual’s fitness?
– How to punish individuals that allow for undesired
extra behavior?
Fitness - How to assess an individual’s fitness?
- Use a continuous semantics parser and register problems
L = log and CM = causal matrix
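A continuous semantics parser keeps replaying a trace even when a task is not enabled, and records what went wrong. The sketch below does this for the driving-licence causal matrix and returns three counts per trace: tasks parsed without problems, missing tokens, and tokens left behind. The token bookkeeping (one token per output subset of a fired task) and the function names are assumptions of this sketch; how the counts are combined into a fitness value is not recoverable from the slide and is left open.

```python
from collections import Counter

# task -> (input subsets, output subsets) for the driving-licence example
CM = {
    "X": ([], [{"A", "B"}]),
    "A": ([{"X"}], [{"C"}, {"D"}]),
    "B": ([{"X"}], [{"C"}, {"E"}]),
    "C": ([{"A", "B"}], [{"D", "E"}]),
    "D": ([{"A"}, {"C"}], [{"Y"}]),
    "E": ([{"B"}, {"C"}], [{"Y"}]),
    "Y": ([{"D", "E"}], []),
}


def find_token(tokens, task, subset):
    """Return a (producer, output subset) key with a token that `task` may use, else None."""
    for producer in sorted(subset):
        for out in CM[producer][1]:
            key = (producer, frozenset(out))
            if task in out and tokens[key] > 0:
                return key
    return None


def replay(trace):
    """Replay one trace without stopping; return (parsed, missing, left_behind)."""
    tokens = Counter()
    parsed = missing = 0
    for task in trace:
        ok = True
        for subset in CM[task][0]:
            key = find_token(tokens, task, subset)
            if key is None:
                missing += 1      # register the problem and keep going
                ok = False
            else:
                tokens[key] -= 1
        if ok:
            parsed += 1
        for out in CM[task][1]:
            tokens[(task, frozenset(out))] += 1
    left_behind = sum(tokens.values())
    return parsed, missing, left_behind


if __name__ == "__main__":
    print(replay(["X", "A", "C", "D", "Y"]))  # a trace the model allows
    print(replay(["X", "A", "C", "E", "Y"]))  # E fires without the token from B
```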
[Figure: original net and individual, both over the tasks SS, A, B, C, D, E, EE]
Trace: SS,A,B,C,D,EE
For noise-free logs, the fitness punishes:
OR-split → AND-split
AND-join → OR-join
[Figure: original net and individual, both over the tasks SS, A, B, C, D, E, EE]
Trace: SS,A,B,C,D,EE
For noise-free logs, the fitness punishes:
OR-join → AND-join
AND-split → OR-split
Fitness - How to punish individuals that allow for undesired extra behavior?
Fitness = 1
- Count the number of enabled tasks at every reachable marking
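The idea of counting enabled tasks can be tried out with a small sketch. It replays a trace and, after every step, counts how many tasks are enabled; an individual that allows more behavior accumulates a higher count. Two simplifications are assumptions here: only the markings visited during replay are inspected (not every reachable marking), and tasks without input are never counted as enabled. `LOOSE` is a hypothetical variant of the driving-licence model in which the AND-joins of D and E are turned into OR-joins.

```python
from collections import Counter


def make_model(d_input, e_input):
    """Driving-licence causal matrix with configurable input conditions for D and E."""
    return {
        "X": ([], [{"A", "B"}]),
        "A": ([{"X"}], [{"C"}, {"D"}]),
        "B": ([{"X"}], [{"C"}, {"E"}]),
        "C": ([{"A", "B"}], [{"D", "E"}]),
        "D": (d_input, [{"Y"}]),
        "E": (e_input, [{"Y"}]),
        "Y": ([{"D", "E"}], []),
    }


PRECISE = make_model([{"A"}, {"C"}], [{"B"}, {"C"}])
LOOSE = make_model([{"A", "C"}], [{"B", "C"}])  # hypothetical over-general individual


def find_token(cm, tokens, task, subset):
    """Return a (producer, output subset) key with a token that `task` may use, else None."""
    for producer in sorted(subset):
        for out in cm[producer][1]:
            key = (producer, frozenset(out))
            if task in out and tokens[key] > 0:
                return key
    return None


def enabled(cm, tokens, task):
    """Enabled when every input subset can be served; start tasks are not counted."""
    inputs = cm[task][0]
    return bool(inputs) and all(find_token(cm, tokens, task, s) for s in inputs)


def count_enabled(cm, trace):
    """Sum the number of enabled tasks over the markings visited while replaying `trace`."""
    tokens, total = Counter(), 0
    for task in trace:
        for subset in cm[task][0]:
            key = find_token(cm, tokens, task, subset)
            if key is not None:
                tokens[key] -= 1
        for out in cm[task][1]:
            tokens[(task, frozenset(out))] += 1
        total += sum(enabled(cm, tokens, t) for t in cm)
    return total


if __name__ == "__main__":
    trace = ["X", "A", "C", "D", "Y"]
    print("precise:", count_enabled(PRECISE, trace))
    print("loose:  ", count_enabled(LOOSE, trace))
```

Both models can replay the trace without missing tokens, but the looser one keeps more tasks enabled along the way, which is the signal a penalty for extra behavior can use.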
Fitness Measure
[Formula: fitness of an individual, defined over L, CM and CM[]]
where L = log, CM = causal matrix and CM[] = population
Genetic Operators
• Crossover
– Recombines existing material in the population
– Crossover probability
– Crossover point = task
– Subsets are swapped
• Mutation
– Introduces new material in the population
– Mutation probability
– Every task of an individual can be mutated
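The two operators can be sketched on the compact representation as follows. This is an illustration under assumptions, not the plug-in's implementation: crossover picks one task as crossover point and swaps the tails of its input subsets between two parents, while mutation either merges two input subsets (an AND becomes an OR) or moves one task into a subset of its own (an OR becomes an AND). Only input sets are handled, the specific mutation moves are hypothetical, and any repair step needed to keep input and output sets consistent is omitted.

```python
import copy
import random


def crossover(parent1, parent2, rng):
    """Swap input subsets of one randomly chosen task between copies of two parents."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    task = rng.choice(sorted(child1))            # crossover point = task
    in1, in2 = child1[task]["input"], child2[task]["input"]
    if in1 and in2:
        cut1, cut2 = rng.randrange(len(in1)), rng.randrange(len(in2))
        child1[task]["input"] = in1[:cut1] + in2[cut2:]
        child2[task]["input"] = in2[:cut2] + in1[cut1:]
    return child1, child2, task


def mutate(individual, rate, rng):
    """Return a mutated copy; each task is mutated with probability `rate`."""
    mutant = copy.deepcopy(individual)
    for task in sorted(mutant):
        subsets = mutant[task]["input"]
        if not subsets or rng.random() >= rate:
            continue
        if len(subsets) > 1 and rng.random() < 0.5:
            # Merge two subsets: the AND between them becomes an OR.
            first = subsets.pop(rng.randrange(len(subsets)))
            second = subsets.pop(rng.randrange(len(subsets)))
            subsets.append(first | second)
        else:
            # Split: move one task into its own subset (an OR becomes an AND).
            i = rng.randrange(len(subsets))
            if len(subsets[i]) > 1:
                moved = rng.choice(sorted(subsets[i]))
                subsets[i] = subsets[i] - {moved}
                subsets.append({moved})
    return mutant


# Two hypothetical parents over the same tasks, differing in the join at D.
P1 = {"A": {"input": []}, "B": {"input": []},
      "D": {"input": [{"A"}, {"B"}]}}
P2 = {"A": {"input": []}, "B": {"input": []},
      "D": {"input": [{"A", "B"}]}}

if __name__ == "__main__":
    rng = random.Random(0)
    print(crossover(P1, P2, rng))
    print(mutate(P1, 1.0, rng))
```

Passing a seeded `random.Random` keeps runs reproducible, which is convenient when comparing crossover and mutation probabilities.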
Experiments and Results
• Experiments
– ProM framework
• Genetic Algorithm Plug-in
• http://www.processmining.org
– Simulated data
• Results
– The genetic algorithm found models that could parse all the traces in the log
ProM framework – Genetic Algorithm Plug-in
Conclusion and Future Work
• Conclusion
– Genetic algorithms can be used to mine process
models
• Future Work
– Tackle duplicate tasks
– Apply genetic process mining to "real-life" logs
http://www.processmining.org