Association Rule Mining

Profiling based on unstructured process
logs
Peter Khisa Wakholi
Supervisors: Prof Wil Van der Aalst – Eindhoven University
Prof Ddembe Williams – Makerere University
7/31/2017
Section 1
INTRODUCTION
Process Mining Overview
Construction of models from
event logs:
• Process Model
• Social networks
• Organisational Model
Compare model with the
event log and analyse
discrepancies.
• Audit and security
Extend model with a new
aspect or perspective
• Performance
Characteristics
Unstructured Processes
• Browsing a website is an
example of an unstructured
process. Other examples:
– Patient flow in hospital
– Customer care processes
– Use of a machine
• Unstructured processes lack a
definite structure or organization
and are not formally organized or
systematized during their
execution.
• The execution path depends on:
– a set of factors that control the flow
– attributes and interests of the actors
The Problem
• Construction of models from unstructured
event logs is possible but interpretation is
difficult.
• There is a need to develop a better
understanding for unstructured processes.
• This understanding would help in:
– Behavioural analysis to gain new insights on
processes and actors
– Predicting the execution path of incomplete
processes
Motivation
• Profiling can help develop a better
understanding of the underlying process
models by:
– Extracting meaningful process models from
logs
– Determining the rules that define the
control flow for each case
– Determining the attributes of the actors that
influence observed behaviour.
Section 2
CURRENT RESEARCH
Research Questions
• How can complete and highly accurate
profiles be developed from unstructured
event log data?
– What techniques can be used to extract
process related profiles based on event log data?
– How can these techniques be deployed to develop
a complete profile for unstructured processes?
– What interpretation or meaning can be attributed
to observed behavior in the profiles?
Research Approach
• Experimentation
– Develop a concept
– Experiment on model-generated
event logs
– Experiment on real logs
– Develop a model, method, guidelines or
framework
Hypothesis of DFD for Profiling Process Event Logs
[Figure: data flow diagram. Labels: Event Log File, Clustering & Filtering, Filtered Log File, Process Mining, High level Petri net, Association Rule Mining, Association rules, PROM Analysis, Intra Profile, Domain Knowledge, Filtering, Profiling Data]
Profiling Hypothesis
• Event Log File – This is a log of events for an
unstructured business process. It is assumed that it
contains process related data for extracting the model
and case related data for developing profiles.
• Clustering and Filtering – Real life logs contain a lot
of noise. In addition, the underlying process models
could be complex. The purpose of this stage will be to
refine the logs through filtering and clustering based
on some attributes. The current PROM plug-ins will be
assessed for their appropriateness in profile
generation.
• Process Mining – The refined log is mined to
discover the underlying process model, which is used
as the basis for profile generation. This research will
seek to identify appropriate Plug-ins for this task.
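The filtering step can be illustrated with a minimal sketch. It assumes one simple notion of noise, namely that rare trace variants are noise; the log, the activity names, and the `min_share` threshold are hypothetical and do not come from any PROM plug-in.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity) pairs in time order.
events = [
    ("c1", "Home"), ("c1", "Search"), ("c1", "Product"),
    ("c2", "Home"), ("c2", "Search"), ("c2", "Product"),
    ("c3", "Home"), ("c3", "Product"),
    ("c4", "Home"), ("c4", "Search"), ("c4", "Product"),
]

def traces_by_case(events):
    """Group events into one activity sequence (trace) per case."""
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)
    return {case_id: tuple(acts) for case_id, acts in traces.items()}

def filter_rare_variants(traces, min_share):
    """Keep cases whose trace variant covers at least min_share of all cases.

    Assumption: infrequent variants are treated as noise.
    """
    counts = Counter(traces.values())
    total = len(traces)
    return {c: t for c, t in traces.items() if counts[t] / total >= min_share}

traces = traces_by_case(events)
filtered = filter_rare_variants(traces, min_share=0.5)
print(sorted(filtered))
```

Here case c3 is dropped because its variant occurs in only one of four cases. A real study would probably filter on case attributes as well, as the slide suggests.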
Profiling Hypothesis (continued)
• Association rule mining – This will play a major part
in generating profiles. The idea is to map every path
in the process model with characteristics that define
its users based on association rules. The first part of
this study will focus on this.
• PROM Analysis – We recognise that there are many
PROM plug-ins that can be used to provide some
profile related information. They will be analysed in
order to determine how appropriate they are and to
develop some guidelines for profiling.
• Intra Profile – Association rules generated are only
useful if they do not contradict themselves. This
stage will seek to develop a mechanism to refine the
rules by removing any contradictions.
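One possible reading of the Intra Profile stage is sketched below. It rests on an assumption not stated in the slides: two rules contradict each other when they share the same antecedent and the same source activity but predict different next activities. The rule format and the confidence values are hypothetical.

```python
# Hypothetical rules: (antecedent attributes, (source, target) arc, confidence).
rules = [
    (frozenset({("age", "young")}), ("Home", "Search"), 0.80),
    (frozenset({("age", "young")}), ("Home", "Product"), 0.20),
    (frozenset({("age", "young")}), ("Search", "Product"), 0.90),
]

def remove_contradictions(rules):
    """Keep the most confident rule per (antecedent, source activity).

    Assumption: rules with the same antecedent and the same source
    activity but different targets contradict each other.
    """
    best = {}
    for antecedent, (source, target), confidence in rules:
        key = (antecedent, source)
        if key not in best or confidence > best[key][2]:
            best[key] = (antecedent, (source, target), confidence)
    return list(best.values())

consistent = remove_contradictions(rules)
for antecedent, arc, confidence in consistent:
    print(sorted(antecedent), arc, confidence)
```

Keeping the most confident rule is only one way to resolve a conflict; dropping both rules or merging antecedents are alternatives the study may prefer.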
Profiling Hypothesis (continued)
• Filtering – This stage draws on knowledge of the domain under
study. This knowledge should be used to
ensure that the profile generated clearly
reflects the expected behaviour patterns. A
specific domain will be identified in order to
illustrate the concept.
• Profile – The expected output of all the
processes explained above is a complete
profile. The study will explore how this can be
achieved.
Section 3
ASSOCIATION RULE MINING
FOR PROFILE GENERATION
The Idea
• For every path (arc between two places)
of an unstructured process model
– Develop a list of characteristics that
defines attributes of actors that follow the
path.
– The profile of an actor is the list of
attributes defined by the path followed.
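The idea can be sketched roughly as follows. As a simplification, the sketch uses directly-follows pairs of activities in place of arcs between places in a Petri net, and it defines the characteristics of an arc as the attribute values shared by every actor who followed it. Both choices, and all names and data, are assumptions made for illustration.

```python
# Hypothetical cases: a trace plus the attributes of the actor.
cases = [
    {"trace": ["Home", "Search", "Product"],
     "attrs": {"age": "young", "member": "yes"}},
    {"trace": ["Home", "Product"],
     "attrs": {"age": "old", "member": "yes"}},
    {"trace": ["Home", "Search", "Product"],
     "attrs": {"age": "young", "member": "no"}},
]

def arc_profiles(cases):
    """Map each arc to the attribute values shared by all actors on it."""
    profiles = {}
    for case in cases:
        items = set(case["attrs"].items())
        trace = case["trace"]
        # Each distinct directly-follows pair counts once per case.
        for arc in set(zip(trace, trace[1:])):
            if arc in profiles:
                profiles[arc] &= items
            else:
                profiles[arc] = set(items)
    return profiles

def actor_profile(trace, profiles):
    """Union of the characteristics attached to the arcs of one trace."""
    result = set()
    for arc in zip(trace, trace[1:]):
        result |= profiles.get(arc, set())
    return result

profiles = arc_profiles(cases)
print(profiles[("Home", "Search")])
print(actor_profile(["Home", "Search", "Product"], profiles))
```

Requiring an attribute value to hold for every actor on an arc is strict; the association rules discussed later relax this with support and confidence thresholds.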
Approach
• Develop an algorithm to generate association
rules.
• Implement the algorithm in PROM.
• Develop a model using CPN tools.
• Analyse the results using the plug-in.
• Refine the algorithm and idea till the results
are satisfactory
• Test the plug-in using real life logs
• Refine the idea based on the results
obtained.
• Write a paper on the findings
The Website Browse Model
Discovered Process Model
Association Rule Mining
• Goal: Given an unstructured event log in which
each case contains events and data
attributes from a given collection:
– Develop a process model that defines control flow
– For every segment in the model generate
association rules
• Express the segment as a sequence
• Find the attributes of the actors that are associated with
the segment
– Develop a set of rules that govern the
entire path for each case.
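A minimal sketch of the per-segment step follows, using the standard support and confidence measures for a rule of the form "attribute = value implies segment". It is limited to single-attribute antecedents, and the cases, thresholds, and function names are hypothetical; the definitive algorithm is still to be developed.

```python
from collections import Counter

# Hypothetical cases: (trace, actor attributes).
cases = [
    (("Home", "Search", "Product"), {"age": "young", "member": "yes"}),
    (("Home", "Product"), {"age": "old", "member": "yes"}),
    (("Home", "Search", "Product"), {"age": "young", "member": "no"}),
    (("Home", "Search"), {"age": "old", "member": "no"}),
]

def contains(trace, segment):
    """True if segment occurs as a contiguous subsequence of trace."""
    n = len(segment)
    return any(trace[i:i + n] == segment for i in range(len(trace) - n + 1))

def mine_rules(cases, segment, min_support, min_confidence):
    """Return rules (item, segment, support, confidence) above both thresholds.

    support    = cases with item and segment / all cases
    confidence = cases with item and segment / cases with item
    """
    segment = tuple(segment)
    total = len(cases)
    item_count, joint_count = Counter(), Counter()
    for trace, attrs in cases:
        hit = contains(tuple(trace), segment)
        for item in attrs.items():
            item_count[item] += 1
            if hit:
                joint_count[item] += 1
    rules = []
    for item, joint in joint_count.items():
        support = joint / total
        confidence = joint / item_count[item]
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, segment, support, confidence))
    return rules

rules = mine_rules(cases, ("Home", "Search"),
                   min_support=0.5, min_confidence=0.9)
for item, segment, support, confidence in rules:
    print(item, "=>", segment, support, confidence)
```

Repeating this for every segment of the discovered model would give the rule set from which the rules governing the entire path of a case can be assembled.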
Next Steps
• Develop a definitive and detailed
algorithm
• Develop a plug-in in PROM to test the
algorithm