Using process mining to measure the expected costs of business processes

Association for Information Systems
AIS Electronic Library (AISeL)
PACIS 2016 Proceedings
Pacific Asia Conference on Information Systems
(PACIS)
Summer 6-27-2016
USING PROCESS MINING TO MEASURE
THE EXPECTED COSTS OF BUSINESS
PROCESSES
Bo Hsiao
Chang Jung Christian University, [email protected]
LihChyun Shu
National Cheng Kung University, [email protected]
Michal Young
University of Oregon, [email protected]
Hsueh-Ching Yang
National Cheng Kung University, [email protected]
Recommended Citation
Hsiao, Bo; Shu, LihChyun; Young, Michal; and Yang, Hsueh-Ching, "USING PROCESS MINING TO MEASURE THE
EXPECTED COSTS OF BUSINESS PROCESSES" (2016). PACIS 2016 Proceedings. 264.
http://aisel.aisnet.org/pacis2016/264
This material is brought to you by the Pacific Asia Conference on Information Systems (PACIS) at AIS Electronic Library (AISeL). It has been
accepted for inclusion in PACIS 2016 Proceedings by an authorized administrator of AIS Electronic Library (AISeL). For more information, please
contact [email protected].
USING PROCESS MINING TO MEASURE THE EXPECTED
COSTS OF BUSINESS PROCESSES
Bo Hsiao, Department of Information Management, Chang Jung Christian University, Tainan
City, Taiwan, [email protected]
LihChyun Shu, Department of Accountancy, National Cheng Kung University, Tainan City,
Taiwan, [email protected]
Michal Young, Department of Computer and Information Sciences, University of Oregon,
USA, [email protected]
Hsueh-Ching Yang, Department of Accountancy, National Cheng Kung University, Tainan
City, Taiwan, [email protected]
Abstract
Process mining is a means to extract hidden information from the event logs accumulated by the
information systems that drive increasingly complex business processes, potentially improving their
manageability. Traditional process mining is used mainly to extract a useful description of how an
enterprise’s processes actually work (as opposed to how they are documented to work), usually presented
in the form of a flowchart. To the best of our knowledge, previous process mining techniques have
not used available information on the duration and frequency of portions of a business process. We
propose a method that combines process mining with Bayes’ theorem to augment a mined model with
probabilistic information. This additional information increases the value of the mined model and can
be used not only in making predictions, but also in making decisions. Together with activity-based
costing that assigns some cost to each activity in a process, our process mining technique can measure
the expected costs of different stages in the process to support improvement of the underlying
processes. We apply our approach to a chip probing process of a semiconductor firm in Taiwan. Our
results confirm that the proposed approach could improve company decisions regarding their internal
supply chain management.
Keywords: Process mining, Bayes' theorem, Activity-based costing, Semiconductor
1 INTRODUCTION
The main purpose of process mining is to give an accurate picture of the underlying business processes
as they are actually carried out, and so provide decision makers with valuable, objective information
hidden in the event logs. Most enterprises focus on recovering flow charts of their real processes.
In practice, however, extracting valuable information is difficult when the workflow and pattern alone
do not provide what an enterprise needs to make decisions. In the semiconductor industry, for example,
a discovered process model offers no means of detecting changes in the production process, yet managers
need to know about such changes in order to decide how to adjust the process. In other words, existing
process identification is not detailed enough to support decisions on improving and revising the
process. Furthermore, process changes that go unidentified can harm delivery performance and thereby
undermine commitments to customers. Moreover, the expected cost of a process varies constantly as
production progresses, which affects financial judgement. Our study therefore focuses on these three
problems: unidentified process change, delivery performance, and expected process cost.
Van der Aalst (2011) used event logs to identify enterprise processes and found that real-world
processes have a spaghetti-like structure. That work provided a way to detect process change, and the
study of van der Aalst et al. (2011) addressed the time duration issue. Neither study, however,
recovers the expected cost of a process. Decision theory holds that people make decisions based on the
expected value of their options, and the probability of each option is a core factor in calculating
that expected value. We can therefore enhance process improvement analysis by incorporating
probabilistic information into the process model. This probabilistic information should be derived by
a sound method, yet in practice it is usually based on human experience. Rule-of-thumb probabilities
may be in error, so probabilities calculated from event logs could be more reliable. Additional
comparative information (e.g., cost and time) is also necessary to help managers decide on process
improvements.
This study proposes an approach that combines existing process mining techniques with Bayes’ theorem
to bring probabilistic information into the field of process mining. To allow easy reference to the
study of van der Aalst et al. (2011), we use the term activity in place of work station. Activity
cost information can be incorporated into the approach, so that the expected cost at each activity can
be determined. Activity costs are based on activity-based costing (Becker et al., 2009), which is widely
used by companies in decision making. We apply the proposed approach to the chip probing process of a
semiconductor firm. Our contribution is twofold. First, applied to the chip probing process, our
approach provides additional information that supports process improvement and delivery performance.
Second, with our approach, process deviations can be discovered early, and the probabilistic
information can be used to predict future activities.
2 BACKGROUND
To motivate our research and describe our approach, we use an airline compensation process taken
from the process mining study of van der Aalst (2011). The event log of this compensation process
contains six cases, eight activities, and six resources, together with activity costs, and covers the
period from 2009/12/30 to 2010/12/24. Details can be found in van der Aalst (2011). The timestamp
records the completion time of an activity, and the resource field identifies the employee responsible
for the activity. The accounting department calculates the cost of each activity using activity-based
costing.
Figure 1 shows the process pattern discovered by the alpha algorithm mining plug-in in ProM¹. The
process begins with the “Register request” activity. The examination stage that follows runs “Check
ticket” in parallel with either “Examine casually” or “Examine thoroughly”. After this stage, the
airline company decides the outcome of the request in the “Decide” activity, which is followed by
“Reject request”, “Pay compensation”, or “Reinitiate request”. After “Reinitiate request” the process
restarts the examination stage, so “Reject request” and “Pay compensation” are the only final
activities.
¹ ProM is an open source framework for process mining algorithms. ProM provides a platform to users
and developers of process mining algorithms that is easy to use and easy to extend.
Figure 1. The process pattern discovered by the alpha algorithm.
Figure 2 illustrates the transition systems of patterns (a) and (b), which can be extended with
information to predict the completion time of running instances. The transition systems classify
process instances by their traces, and the difference between patterns (a) and (b) lies in whether the
order of activities is taken into account. For example, the traces {A, B, C} and {A, C, B} are grouped
into the same class in pattern (a). In the study of van der Aalst et al. (2009), the order of
activities is not important for time, so pattern (a) is chosen to predict the completion time. In other
situations the order of activities carries implicit information, so pattern (b) is chosen as the base
model for determining the probabilities.
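The difference between the two abstractions can be sketched in a few lines of Python. This is only an illustration under our own naming: `state_of` and `build_states` are hypothetical helpers, not part of ProM or of the cited studies. A state is the set of activities seen so far for pattern (a) and the ordered sequence of activities for pattern (b).

```python
def state_of(prefix, abstraction):
    """Map a trace prefix to a state under the chosen abstraction."""
    if abstraction == "set":
        return frozenset(prefix)   # order ignored, as in pattern (a)
    if abstraction == "sequence":
        return tuple(prefix)       # order kept, as in pattern (b)
    raise ValueError(f"unknown abstraction: {abstraction}")


def build_states(traces, abstraction):
    """Collect every state reached by any prefix of any trace."""
    states = set()
    for trace in traces:
        for i in range(len(trace) + 1):
            states.add(state_of(trace[:i], abstraction))
    return states


traces = ["ABC", "ACB"]
set_states = build_states(traces, "set")
seq_states = build_states(traces, "sequence")

# Both traces end in the same state under the set abstraction,
# but in two different states under the sequence abstraction.
print(len(set_states), len(seq_states))
```

Because the sequence abstraction keeps one state per distinct prefix, it yields more states, which is what allows a probability to be attached to each ordered continuation.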
Figure 2. Two example transition systems extracted from the same log.
Event logs are converted to the traces shown in Figure 3(B), with the activities replaced by the
symbols a, b, c, d, e, f, g, and h, which represent register request, examine thoroughly, examine
casually, check ticket, decide, reinitiate request, pay compensation, and reject request, respectively.
Case 1 in Figure 3(B), for instance, consists of five of these activities, and its trace sequence is
“abdeh”. In this example, the branches in Figure 3(A) correspond to the six cases: the first branch is
Case 1, the second is Case 5, and so on.
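As a minimal sketch of this conversion, the following Python groups events by case, orders them by timestamp, and maps each activity to its symbol. The tuple layout `(case_id, timestamp, activity)` and the timestamps themselves are illustrative assumptions, not the format of the original log; only the symbol mapping and the two resulting traces come from the text.

```python
from collections import defaultdict

SYMBOLS = {
    "register request": "a",
    "examine thoroughly": "b",
    "examine casually": "c",
    "check ticket": "d",
    "decide": "e",
    "reinitiate request": "f",
    "pay compensation": "g",
    "reject request": "h",
}


def to_traces(events):
    """Turn (case_id, timestamp, activity) tuples into one trace string per case."""
    by_case = defaultdict(list)
    for case_id, timestamp, activity in events:
        by_case[case_id].append((timestamp, activity))
    return {
        case_id: "".join(SYMBOLS[activity] for _, activity in sorted(rows))
        for case_id, rows in by_case.items()
    }


# Hypothetical timestamps (ISO strings sort chronologically); events may arrive unordered.
events = [
    (1, "2010-01-02T10:00", "examine thoroughly"),
    (1, "2010-01-01T09:00", "register request"),
    (2, "2010-01-01T09:30", "register request"),
    (1, "2010-01-03T11:00", "check ticket"),
    (2, "2010-01-02T14:00", "check ticket"),
    (1, "2010-01-04T12:00", "decide"),
    (2, "2010-01-03T15:00", "examine casually"),
    (1, "2010-01-05T13:00", "reject request"),
    (2, "2010-01-04T16:00", "decide"),
    (2, "2010-01-05T17:00", "pay compensation"),
]

traces = to_traces(events)
print(traces)  # {1: 'abdeh', 2: 'adceg'}
```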
Bayes’ theorem is illustrated with the compensation example, with Ω denoting the cases extracted from
the event logs. The process contains eight activities, α = {a, b, c, d, e, f, g, h}, and there are six
cases. The trace sequence of case 1 is “abdeh” and that of case 2 is “adceg”, as shown in Figure 3(B).
For each activity, #(activity) is the number of cases in Ω in which the activity occurs. For example,
Register request occurs in all six cases, and hence #(a) is 6. Activity (a) is the starting point of
every case, so its probability, denoted P(a), is equal to 1. Activity (b) occurs in 3 cases, so P(b) is
3/6. The probability of an activity is given by $P(X) = \#(X) / \mathit{TotalNumCases}$. Bayes’ theorem
describes conditional probability. For example, we may want to know the probability of activity (b)
given that activity (a) has occurred, denoted $P(b \mid a)$. #(ab) is the number of cases in which
activity (a) is directly followed by activity (b); here #(ab) is equal to 1, so the joint probability
is 1/6 and, since P(a) = 1, $P(b \mid a)$ is 1/6. The conditional probability of activity Y given
activity X is $P(Y \mid X) = P(X \cap Y) / P(X)$.
In the product design process, cost information should be combined with product structure and business
process (Tornberg et al., 2002). Cost information includes both total direct cost and indirect cost.
By using activity-based costing to aggregate activity costs into a total process cost, the cost
information can be adequately represented (Becker et al., 2009).
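The counts and probabilities above can be reproduced with a short Python sketch. Two assumptions are made here. First, only the traces of cases 1 and 2 are quoted in this text; the traces for cases 3 to 6 are taken to be those of the van der Aalst (2011) example log, and they reproduce the counts stated above. Second, the conditional probability of a next activity is computed over trace prefixes, i.e., #(ab) counts the cases whose trace starts with “ab”. The function names are our own.

```python
from fractions import Fraction

# Cases 1 and 2 are quoted in the text; cases 3 to 6 are an assumption (see lead-in).
LOG = {
    1: "abdeh",
    2: "adceg",
    3: "acdefbdeg",
    4: "adbeh",
    5: "acdefdcefcdeh",
    6: "acdeg",
}


def count_cases(log, activity):
    """#(X): number of cases in which the activity occurs at least once."""
    return sum(1 for trace in log.values() if activity in trace)


def count_prefix(log, prefix):
    """Number of cases whose trace starts with the given prefix."""
    return sum(1 for trace in log.values() if trace.startswith(prefix))


def p_activity(log, activity):
    """P(X) = #(X) / TotalNumCases."""
    return Fraction(count_cases(log, activity), len(log))


def p_next(log, prefix, activity):
    """P(activity | prefix) = #(prefix + activity) / #(prefix).

    Raises ZeroDivisionError if no case starts with the prefix.
    """
    return Fraction(count_prefix(log, prefix + activity), count_prefix(log, prefix))


print(p_activity(LOG, "a"))   # 1
print(p_activity(LOG, "b"))   # 1/2, i.e. 3/6
print(p_next(LOG, "a", "b"))  # 1/6
```

Exact fractions are used instead of floats so that the results can be compared directly with the ratios quoted in the text.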
Figure 3. The transition systems in accordance with Figure 2(b).
3 METHOD
Following the compensation example, probabilities are obtained as shown in Figure 4. After “Register
request” (a), one of three paths can be taken: “Examine thoroughly” (b), “Examine casually” (c), or
“Check ticket” (d). The probability of “Examine thoroughly” (b) is 1/6, calculated as #(ab) / #(a).
That is, one of the six cases proceeds directly from “Register request” to “Examine thoroughly”, and
it is rejected. Half of the cases proceed directly to “Examine casually”; of these, one third are
accepted and two thirds are reinitiated. Half of the reinitiated cases are rejected and half are
accepted. Note, however, that the longer processes involve the “Examine casually” activity. One third
of the cases proceed directly to “Check ticket”, and half of these are rejected and half accepted.
Overall, the rate of paying compensation is 50%, and hence the handling of requests is fair.
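A transition system annotated with branch probabilities, in the spirit of Figure 4, can be sketched as a mapping from each observed trace prefix to the distribution of the next activity. This is an illustrative sketch rather than the authors’ implementation: the `END` marker and the function name are hypothetical, and the traces for cases 3 to 6 are assumed to be those of the van der Aalst (2011) example log, since only cases 1 and 2 are quoted in this text.

```python
from collections import Counter, defaultdict
from fractions import Fraction

END = "<end>"  # hypothetical marker meaning "the case stops here"

# Cases 1 and 2 are quoted in the text; cases 3 to 6 are an assumption (see lead-in).
TRACES = ["abdeh", "adceg", "acdefbdeg", "adbeh", "acdefdcefcdeh", "acdeg"]


def branch_probabilities(traces):
    """Map each observed prefix to {next activity: probability}."""
    counts = defaultdict(Counter)
    for trace in traces:
        for i in range(1, len(trace)):
            counts[trace[:i]][trace[i]] += 1
        counts[trace][END] += 1  # a complete trace ends in its final state
    return {
        prefix: {act: Fraction(n, sum(nxt.values())) for act, n in nxt.items()}
        for prefix, nxt in counts.items()
    }


probs = branch_probabilities(TRACES)

# Decision points are the states with more than one possible continuation.
decision_points = sorted(p for p, nxt in probs.items() if len(nxt) > 1)

print(probs["a"])        # b: 1/6, c: 1/2, d: 1/3
print(decision_points)   # ['a', 'acde', 'acdef', 'ad']
```

The share of cases ending in “Pay compensation” (g) under these assumed traces is 3 of 6, which matches the 50% rate stated above.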
A fork in the transition system is a decision point at which managers can choose the preferred path.
Deciding which path is better requires additional information. Table 1 lists the cost of each activity,
which can be calculated by activity-based costing.
Figure 5 shows the expected cost of the process (i.e., the average cost) for the compensation example,
together with the expected cost at each decision point. The calculation differs from that of the
probabilities: it starts from the cost of each possible complete path, i.e., each end point of the
transition system. For example, the first possible path is “abdeh”, with a cost of 950. The cost of
each possible path is the red number (furthest right) shown in Figure 5. Consider the decision point
“acdef”, which has two outgoing paths: with probability 1/2 the cost is 2750, and with probability 1/2
the cost is 1850. The expected cost at “acdef” is therefore (1/2) × 2750 + (1/2) × 1850 = 2300.
Likewise, the expected cost at the start point “a” is (1/6) × 950 + (3/6) × 1850 + (2/6) × 950 = 1400.
Cost can affect decision making. According to Figure 4, the “Examine casually” activity is the better
candidate because it takes the shortest time. However, “Examine casually” has the highest expected cost
in this case, 1850. Combining the two results, the best path following the “Register request” activity
is the “Check ticket” activity.
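The expected-cost figures above can be checked with a short recursive sketch in Python. The activity costs are those of Table 1. As an assumption, the traces for cases 3 to 6 are taken from the van der Aalst (2011) example log, since only cases 1 and 2 are quoted in this text; the function names are our own. The expected cost at a state is the probability-weighted sum of the expected costs of its successor states, and a complete path contributes the sum of its activity costs.

```python
from fractions import Fraction

# Activity costs from Table 1.
COST = {"a": 50, "b": 400, "c": 400, "d": 100, "e": 200, "f": 200, "g": 200, "h": 200}

# Cases 1 and 2 are quoted in the text; cases 3 to 6 are an assumption (see lead-in).
TRACES = ["abdeh", "adceg", "acdefbdeg", "adbeh", "acdefdcefcdeh", "acdeg"]


def trace_cost(trace):
    """Total cost of a complete path: the sum of its activity costs."""
    return sum(COST[symbol] for symbol in trace)


def expected_cost(prefix, traces):
    """Expected total cost of a case, given that it has followed `prefix`."""
    matching = [t for t in traces if t.startswith(prefix)]
    if not matching:
        raise ValueError(f"no case follows prefix {prefix!r}")
    # Cases that end exactly at this state contribute their full path cost.
    ended = matching.count(prefix)
    total = Fraction(ended, len(matching)) * trace_cost(prefix)
    # The remaining cases branch on their next activity.
    next_acts = {t[len(prefix)] for t in matching if len(t) > len(prefix)}
    for act in sorted(next_acts):
        n = sum(1 for t in matching if t.startswith(prefix + act))
        total += Fraction(n, len(matching)) * expected_cost(prefix + act, traces)
    return total


print(trace_cost("abdeh"))             # 950
print(expected_cost("acdef", TRACES))  # 2300
print(expected_cost("ac", TRACES))     # 1850
print(expected_cost("a", TRACES))      # 1400
```

Comparing `expected_cost("ab", TRACES)`, `expected_cost("ac", TRACES)`, and `expected_cost("ad", TRACES)` gives the cost side of the choice among the three paths that follow “Register request”.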
Figure 4. The compensation example with probability.
Symbol  Activity            Cost
a       Register Request      50
b       Examine Thoroughly   400
c       Examine Casually     400
d       Check Ticket         100
e       Decide               200
f       Reinitiate Request   200
g       Pay Compensation     200
h       Reject Request       200
Table 1. Cost of each activity in compensation example
Figure 5. The expected cost in compensation example
4 EMPIRICAL RESULT
In this section, we use the case of a semiconductor company to test the proposed approach to assessing
activity probabilities. The company is one of the largest publicly held companies in Taiwan and a
leading global provider of semiconductor back-end services, offering innovative test platforms that
support a full range of applications. We carried out three analyses. First, we predicted the next
activity using probabilistic information. Second, we compared the probabilistic information obtained
from different periods. Finally, we analyzed the expected cost using completion time. The event log
covers the period from 2005/1 to 2005/9. The process begins with IQC and ends at OQC. A total of 425
cases and 23 activities were logged. There are 872 possible traces, which lead to a graph of 872
nodes with 77 branching points. To simplify the transition system, only the last activity of each trace
is shown in the nodes in Figures 6 to 8. For example, in the node {IQC CP1} the first activity is IQC
and the following activity is CP1, but only CP1 is shown in the node. Figure 6 shows part of the
result. After IQC, 61% of cases branch off to the CP1 activity and 24% branch off to the UV activity.
Proceeding from CP1, 86% branch off to the CP1DT activity, while 12% branch off to CP1 again. A case
that enters CP1 twice is certain (100%) to enter CP1DT next. The second CP1 thus leads to a different
result from the first, because of the previous activities in the same trace.
Figure 6. Segment result of the CP process from 2005/1 to 2005/9.
To validate this approach, the event log is divided into two parts. The first part runs from 2005/1/1
to 2005/8/31, and the second from 2005/9/1 to 2005/9/30. Although the first part covers a longer period
than the second, it contains only 121 cases, whereas the second part contains 246. The probabilities
obtained from the second part are therefore compared with those obtained from the first. The result is
shown in Figure 7, where the first number is for 2005/1 to 2005/8 and the second number is for 2005/9.
Most of the predicted probabilities are similar. However, when the current trace is IQC, CP1, CP1DT,
the predictions for the following activity differ considerably. The first part has only two outcomes:
96% of cases go to the CP1DT activity again, and the remainder go to the BAKE1 activity. In September,
by contrast, only 45% of cases go to the CP1DT activity, 35% go to the BAKE1 activity, and 10% go to 5
other activities. This difference could be due to a change in the process or to a learning effect in
the CP1DT activity. The number of possible traces is 412 for 2005/1 to 2005/8 but 542 for 2005/9; the
additional 130 traces may signal a change in the process. The company did change its process in 2005,
and with our method the change could have been found earlier.
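One way such a comparison between two periods could be automated is sketched below in Python. This is not the procedure used in the case study: the toy traces, the 0.2 threshold, and the function names are illustrative assumptions, and the largest absolute difference in branch probability is just one simple choice of distance. The sketch computes the next-activity distribution for every trace prefix in each period and flags the prefixes whose distributions differ by more than the threshold.

```python
from collections import Counter, defaultdict


def next_activity_distribution(traces):
    """Map each trace prefix (a tuple) to {next activity: relative frequency}.

    Frequencies are conditional on the case continuing past the prefix.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        for i in range(1, len(trace)):
            counts[tuple(trace[:i])][trace[i]] += 1
    return {
        prefix: {act: n / sum(nxt.values()) for act, n in nxt.items()}
        for prefix, nxt in counts.items()
    }


def drifted_states(old_traces, new_traces, threshold=0.2):
    """Return prefixes whose largest probability gap between periods exceeds threshold."""
    old = next_activity_distribution(old_traces)
    new = next_activity_distribution(new_traces)
    flagged = {}
    for prefix in old.keys() & new.keys():
        activities = old[prefix].keys() | new[prefix].keys()
        gap = max(
            abs(old[prefix].get(a, 0.0) - new[prefix].get(a, 0.0)) for a in activities
        )
        if gap > threshold:
            flagged[prefix] = gap
    return flagged


# Hypothetical toy logs, not data from the case study.
old_period = [["IQC", "CP1", "CP1DT", "CP1DT"]] * 3 + [["IQC", "CP1", "CP1DT", "BAKE1"]]
new_period = [["IQC", "CP1", "CP1DT", "CP1DT"]] * 2 + [["IQC", "CP1", "CP1DT", "BAKE1"]] * 2

drifted = drifted_states(old_period, new_period)
print(drifted)  # {('IQC', 'CP1', 'CP1DT'): 0.25}
```

With few cases per prefix, large gaps can arise by chance, so in practice a flagged prefix would be a prompt for inspection rather than proof of a process change.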
Figure 7. Part result of the CP process from 2005/1 to 2005/8 and 2005/9.
Timeliness is an important factor in the semiconductor industry because it affects delivery
performance. The average time of each activity is the mean duration between check-in time and check-out
time, based on data collected in 2005. The longest is the CP3DT activity, which needs 80.71 days,
followed by IQC at 46.83 days and CP3 at 43.97 days. Given this result, managers may focus on the CP3
and CP3DT activities and try to avoid entering the CP3 activity. IQC, by contrast, is the first
activity and cannot be avoided, so managers may instead improve the efficiency of the IQC activity.
Figure 8 shows the result of our approach: the expected time of the total process is 126.93 days from
the moment a wafer enters the process.
Figure 8. Part expected time result at IQC of the CP process from 2005/1 to 2005/9.
5 CONCLUSION
Our approach augments a transition system model with probabilistic information and activity-based cost
information to support management decision making. It differs from the approach of van der Aalst et
al. (2011) in the following respects. (a) The model discovered for the same process differs from one
period to another, and the difference between these discovered models reveals the deviation of the
process. In our empirical result, a process deviation in 2005 indicated a change in production
processes. The CP process became more complex in 2005/10; this deviation is attributed to new business
and the purchase of new equipment. With the proposed approach, the differences in probabilities could
have been noticed and the deviation found earlier. (b) In past research, process information is
obtained by interviewing employees, which may lead to misunderstanding of the process. Our empirical
results show that this approach can improve delivery performance: when a wafer moves to the CP3
activity, it could not be delivered within the time guaranteed to the customer. Expected cost
information in process mining can support decision making, as illustrated by our reanalysis of the
compensation example from the study of van der Aalst et al. (2011). Considering only time information,
as in the original study, the path through the “examine thoroughly” activity requires the longest time
and has the highest cost. Re-analyzed by our method, the path through “examine casually” has the
highest cost, suggesting a different decision.
Although the result cannot present the full picture, transition systems can provide key information to
some users. Whether the right information reaches the right person depends on role setting. Finding
suitable indices for judging the quality of discovered patterns is also suggested for future work.
References
Becker, J., Bergener, P., and Räckers, M. (2009). Process-Based Governance in Public
Administrations Using Activity-Based Costing. In Electronic Government (pp. 176-187). Springer
Berlin Heidelberg.
Tornberg, K., Jämsen, M., and Paranko, J. (2002). Activity-based costing and process modeling for
cost-conscious product design: A case study in a manufacturing company. International Journal of
Production Economics, 79(1), 75-82.
van der Aalst, W.M.P., van Hee, K.M., van der Werf, J.M., and Verdonk, M. (2010). Auditing 2.0: Using
Process Mining to Support Tomorrow’s Auditor. IEEE Computer, 43(3), 90-93.
van der Aalst, W.M.P. (2011). Process Mining: Discovery, Conformance and Enhancement of
Business Processes. 1st Edition. Springer.
van der Aalst, W.M.P., Schonenberg, M.H., and Song, M. (2011). Time prediction based on process
mining. Information Systems, 36(2), 450-475.