Association for Information Systems, AIS Electronic Library (AISeL)
PACIS 2016 Proceedings, Pacific Asia Conference on Information Systems (PACIS), Summer 6-27-2016

Recommended Citation: Hsiao, Bo; Shu, LihChyun; Young, Michal; and Yang, Hsueh-Ching, "USING PROCESS MINING TO MEASURE THE EXPECTED COSTS OF BUSINESS PROCESSES" (2016). PACIS 2016 Proceedings. 264. http://aisel.aisnet.org/pacis2016/264

USING PROCESS MINING TO MEASURE THE EXPECTED COSTS OF BUSINESS PROCESSES

Bo Hsiao, Department of Information Management, Chang Jung Christian University, Tainan City, Taiwan
LihChyun Shu, Department of Accountancy, National Cheng Kung University, Tainan City, Taiwan
Michal Young, Department of Computer and Information Sciences, University of Oregon, USA
Hsueh-Ching Yang, Department of Accountancy, National Cheng Kung University, Tainan City, Taiwan

Abstract

Process mining is a means to extract hidden information from the event logs accumulated by the information systems that drive increasingly complex business processes, potentially improving their manageability.
Traditional process mining is used mainly to extract a useful description of how an enterprise’s processes actually work (as opposed to how they are documented to work), usually presented in the form of a flowchart. To the best of our knowledge, previous process mining techniques have not used available information on the duration and frequency of portions of a business process. We propose a method that combines process mining with Bayes’ theorem to augment a mined model with probability information. This additional information increases the value of the mined model and can be used not only in making predictions, but also in making decisions. Together with activity-based costing, which assigns a cost to each activity in a process, our process mining technique can measure the expected costs of different stages in the process to support improvement of the underlying processes. We apply our approach to a chip probing process of a semiconductor firm in Taiwan. Our results confirm that the proposed approach could improve company decisions regarding internal supply chain management.

Keywords: Process mining, Bayes' theorem, Activity-based costing, Semiconductor

1 INTRODUCTION

The main purpose of process mining is to give an accurate picture of underlying business processes as they are actually carried out, and so provide decision makers with valuable, objective information hidden in the event logs. Most enterprises focus on finding the flowcharts of their real processes. In practice, however, extracting valuable information is difficult when the discovered workflow and patterns do not provide the information the enterprise needs to make decisions. In the semiconductor industry, for example, existing process discovery lacks a means of detecting changes in the production process, which managers need in order to decide on process adjustments.
In other words, prior process identification does not supply enough detail to improve and revise the process for decision making. Furthermore, when process changes cannot be identified, they can harm delivery performance and, with it, commitments to customers. Moreover, the expected cost of the process varies constantly as production progresses, which affects financial judgement. Our study therefore focuses on these three problems.

Addressing these problems, van der Aalst (2011) used event logs to identify enterprise processes and found that real-world processes have a spaghetti-like structure. Van der Aalst (2011) provided a solution for detecting process change, and the study of van der Aalst et al. (2011) also addressed the time duration issue. However, the expected process cost was not recovered.

Decision theory holds that people make decisions based on the expected value of their options, and the probability of each option is a core factor in calculating that expected value. We can therefore enhance process improvement analysis by incorporating probabilistic information into the process. This probabilistic information should be determined by a sound method, but in practice it is often estimated from human experience. Probabilities estimated by rule of thumb may contain errors, so probabilities calculated from event logs could be more reliable. Additional comparative information (e.g., cost, time) is also necessary to help managers decide on process improvements. This study proposes an approach that combines prior process mining techniques with Bayes’ theorem to bring probabilistic information into the field of process mining. To allow easy reference to the study of van der Aalst et al. (2011), this study uses the term “activity” in place of “work station”.
Activity cost information can be included in this approach, so that the expected cost of each activity can be determined. The activity cost is based on activity-based costing (Becker et al., 2009), which is widely used in companies for decision making. The proposed approach is applied to the chip probing process of a semiconductor firm. Our contribution is twofold. First, applying our approach to the chip probing process provides additional information that supports process improvement and helps delivery performance. Second, with our approach, process deviations can be discovered in advance and the probabilistic information can be used to predict future activities.

2 BACKGROUND

To motivate our research and describe our approach, we use an airline compensation process taken from the process mining study of van der Aalst (2011). The example shows the event log of this compensation process, which consists of six cases, eight activities, six resources, and cost information, from 2009/12/30 to 2010/12/24. Details can be found in van der Aalst (2011). The timestamp records the completion time of the activity, and the resource field records the employee responsible for the activity. The accounting department calculates the cost of each activity by activity-based costing. Figure 1 shows the process pattern discovered by the alpha algorithm mining plug-in in ProM, an open source framework for process mining algorithms that provides users and developers with a platform that is easy to use and easy to extend. The first activity is “Register request”. In the examination stage that follows, “Check ticket” runs in parallel with either “Examine casually” or “Examine thoroughly”. After this stage, the airline company decides the outcome of the request, “Reject request” or “Pay compensation”; “Reinitiate request” may also follow the “Decide” activity.
The process restarts the examination stage after “Reinitiate request”; thus “Reject request” and “Pay compensation” are the final activities.

Figure 1. The process pattern discovered by the alpha algorithm.

Figure 2 illustrates the transition systems of patterns (a) and (b), which can be extended with information to predict the completion time of running instances. The transition systems classify process instances by their traces, and the difference between patterns (a) and (b) is whether order matters. For example, the traces {A, B, C} and {A, C, B} are grouped into the same class in pattern (a). In the study of van der Aalst et al. (2011), the order of activities is not important for time, so pattern (a) is chosen to predict the completion time. In other situations the order of activities carries implicit information, so pattern (b) is chosen here as the base model for determining the probabilities.

Figure 2. Two example transition systems extracted from the same log.

Event logs are converted to the traces shown in Figure 3(b), with the activities replaced by the symbols a, b, c, d, e, f, g, h, which represent register request, examine thoroughly, examine casually, check ticket, decide, reinitiate request, pay compensation, and reject request, respectively. Case 1 in Figure 3(b), for example, contains five activities, and its trace sequence is “abdeh”. Each branch in Figure 3(a) matches one of the six cases: the first branch is Case 1, the second is Case 5, and so on.

Bayes’ theorem is illustrated with the compensation example, with Ω extracted from the event logs. The process contains eight activities, α = {a, b, c, d, e, f, g, h}, and there are six cases. The trace sequence of case 1 is “abdeh” and that of case 2 is “adceg”, as shown in Figure 3(b). For each activity, #(activity) is the number of cases in Ω that contain the activity. For example, “Register request” occurs in six cases, and hence #(a) is 6.
Activity (a) is the start point of every case, so the probability of activity (a), denoted P(a), equals 1. Activity (b) occurs in 3 cases, so P(b) is 3/6. The probability of an activity X is given by $P(X) = \#(X) / \mathit{TotalNumCases}$. Bayes’ theorem describes conditional probability. For example, we want to know the probability of activity (b) given that activity (a) has occurred, denoted $P(b \mid a)$. Here #(ab) is the number of cases in which activity (b) directly follows activity (a). Since #(ab) equals 1, the joint probability is 1/6, and with P(a) = 1, $P(b \mid a)$ is 1/6. The conditional probability of activity Y given activity X is given by $P(Y \mid X) = P(X \cap Y) / P(X)$.

In the product design process, cost information should be better combined with product structure and business process (Tornberg et al., 2002). Cost information includes both total direct cost and indirect cost. By using activity-based costing to aggregate activity costs into the total process cost, the cost information can be adequately represented (Becker et al., 2009).

Figure 3. The transition systems in accordance with Figure 2(b).

3 METHOD

Following the compensation example, the probabilities are obtained as shown in Figure 4. After “Register request” (a), three paths can be chosen: “Examine thoroughly” (b), “Examine casually” (c), and “Check ticket” (d). The probability of “Examine thoroughly” (b) is 1/6, calculated as #(ab) / #(a). After “Register request”, one of six cases continues directly with “Examine thoroughly”, and it is rejected. One of two cases continues directly with “Examine casually”; of these, one of three is accepted and two of three are reinitiated. Half of the reinitiated cases are rejected and half are accepted. The longer processes, however, involve the “Examine casually” activity. One of three cases continues directly with “Check ticket”, and half of these cases are rejected and half accepted. Thus, the rate of paying compensation is 50%, and hence the handling of requests can be regarded as fair.
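The calculation described in this section can be sketched in a few lines of Python. The sketch is illustrative only and is not the authors' implementation: the function names are hypothetical, the traces for cases 3 to 6 are assumed from the event log in van der Aalst (2011) because only cases 1 and 2 are spelled out in the text, and the activity costs are those listed in Table 1. Each trace prefix is treated as a state of the transition system of pattern (b). The probability of a branch is the share of cases at a state that continue along it, and the expected cost of a state is the probability-weighted cost of the complete paths reachable from it.

```python
from collections import Counter
from fractions import Fraction

# Traces of the six compensation cases. Cases 1 and 2 are quoted in the text;
# cases 3-6 are ASSUMED from the event log in van der Aalst (2011).
TRACES = ["abdeh", "adceg", "acdefbdeg", "adbeh", "acdefdcefcdeh", "acdeg"]

# Activity costs as listed in Table 1.
COST = {"a": 50, "b": 400, "c": 400, "d": 100,
        "e": 200, "f": 200, "g": 200, "h": 200}


def prefix_counts(traces):
    """Count how many cases pass through each trace prefix (one state each)."""
    counts = Counter()
    for trace in traces:
        for i in range(1, len(trace) + 1):
            counts[trace[:i]] += 1
    return counts


def branch_probabilities(counts, prefix):
    """P(next activity | prefix) = #(prefix + next) / #(prefix)."""
    return {
        p[-1]: Fraction(n, counts[prefix])
        for p, n in counts.items()
        if len(p) == len(prefix) + 1 and p.startswith(prefix)
    }


def expected_cost(counts, prefix):
    """Expected total cost of a case, given that it has reached `prefix`."""
    own_cost = sum(COST[x] for x in prefix)
    branches = branch_probabilities(counts, prefix)
    # Share of cases that end exactly at this state (1 at a leaf).
    stop = 1 - sum(branches.values())
    return stop * own_cost + sum(
        p * expected_cost(counts, prefix + nxt) for nxt, p in branches.items()
    )


counts = prefix_counts(TRACES)

if __name__ == "__main__":
    print(branch_probabilities(counts, "a"))
    for state in ("acdef", "ac", "a"):
        print(state, expected_cost(counts, state))
```

Under these assumptions the sketch gives P(b | a) = 1/6 and expected costs of 2300 at “acdef”, 1850 after “Examine casually”, and 1400 at the start point “a”, which agrees with the values reported for Figure 5. Exact fractions are used so that the results are not blurred by floating-point rounding.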
A fork in the transition system is a decision point where managers can choose the preferred path. Deciding which path is better requires additional information. Table 1 lists the cost of each activity, which can be calculated by activity-based costing. Figure 5 shows, for the compensation example, the expected cost of the process (equal to the average cost) and the expected cost at each decision point. The calculation starts from the cost of each possible path, shown at the end of the transition system. For example, the first possible path is “abdeh”, with a cost of 950. The cost of each possible path is the red number (furthest right) in Figure 5. Consider decision point “acdef”, which has two paths: with probability 1/2 the cost is 2750, and with probability 1/2 the cost is 1850. Therefore, the expected cost at “acdef” is (1/2)2750 + (1/2)1850 = 2300. Likewise, the expected cost at the start point “a” is (1/6)950 + (3/6)1850 + (2/6)950 = 1400.

Cost can affect decision making. According to Figure 4, the “Examine casually” activity is the better candidate because it has the shortest time. However, “Examine casually” has the highest expected cost in this case, 1850. Combining the two results, the best path following the “Register request” activity is the “Check ticket” activity.

Figure 4. The compensation example with probability.

Symbol  Activity            Cost
a       Register Request    50
b       Examine Thoroughly  400
c       Examine Casually    400
d       Check Ticket        100
e       Decide              200
f       Reinitiate Request  200
g       Pay Compensation    200
h       Reject Request      200

Table 1. Cost of each activity in compensation example

Figure 5. The expected cost in compensation example

4 EMPIRICAL RESULT

In this section, a semiconductor company is used as a case to test our proposed approach for assessing activity probabilities. This company is one of the largest publicly held companies in Taiwan.
It is also a leading global provider of semiconductor back-end services and offers innovative test platforms that support a full range of applications. We carried out the following analyses. First, we predict the next activity using probabilistic information. Second, we compare the probabilistic information across different periods. Finally, we analyze the expected cost using completion time. The event log covers the period from 2005/1 to 2005/9. The process begins with IQC and ends at OQC. A total of 425 cases and 23 activities were logged. There are 872 possible traces, which lead to a graph of 872 nodes with 77 branching points. To simplify the transition system, only the last activity is shown in each node in Figures 6 to 8. For example, in node {IQC CP1} the first activity is IQC and the following activity is CP1, but only CP1 is shown in the node. Figure 6 shows part of the result. After IQC, 61% of cases branch off to the CP1 activity and 24% branch off to the UV activity. From CP1, 86% branch off to the CP1DT activity while 12% branch off to the CP1 activity again. A case that enters CP1 twice is certain (100%) to enter the CP1DT activity next. The second CP1 thus has a different outcome from the first because of the previous activities on the same trace.

Figure 6. Segment result of the CP process from 2005/1 to 2005/9 (nodes shown: IQC, UV, CP1, CP1DT, BAKE1, Other).

To validate this approach, the event log is divided into two parts. The first part runs from 2005/1/1 to 2005/8/31, and the second part runs from 2005/9/1 to 2005/9/30. Although the first part covers a longer period than the second, it contains 121 cases, while the second part contains 246 cases. Consequently, the probabilities from the event log of the second part are compared with those from the event log of the first part.
The result is shown in Figure 7, where the first number is for 2005/1 to 2005/8 and the second number is for 2005/9. Most of the predicted probabilities are similar. However, when the current trace is IQC, CP1, CP1DT, the predictions for the following activity differ considerably. The first part has only two outcomes: 96% of cases go to the CP1DT activity again, and the rest go to the BAKE1 activity. In September, however, only 45% of cases go to the CP1DT activity, 35% go to the BAKE1 activity, and 10% go to 5 other activities. This difference could be due to a change in the process or to a learning effect in the CP1DT activity. There are 412 possible traces from 2005/1 to 2005/8 but 542 in 2005/9; the additional 130 traces may signal a change in the process. The company changed its process in 2005, and with our method the change can be found earlier.

Figure 7. Part result of the CP process from 2005/1 to 2005/8 and 2005/9 (nodes shown: IQC, UV, CP1, CP1DT, BAKE1, Other).

Timeliness is an important factor in the semiconductor industry because it affects delivery performance. The average time of each activity is the mean duration between check-in time and check-out time, based on data collected in 2005. The longest time is at the CP3DT work station, which needs 80.71 days; next is IQC, with 46.83 days, followed by CP3, with 43.97 days. Based on this result, managers may focus on the CP3 and CP3DT activities and try to avoid the occurrence of the CP3 activity. IQC is the first activity and cannot be avoided, so managers may instead improve its efficiency. Figure 8 shows the result of our approach: the expected time of the total process is 126.93 days when a wafer enters the process.

Figure 8. Part expected time result at IQC of the CP process from 2005/1 to 2005/9.
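The comparison between the two periods described earlier in this section (Figure 7) can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the function name and the alert threshold are hypothetical, and the input uses only the shares reported in the text for the trace IQC, CP1, CP1DT. The 4% share for BAKE1 in the first period is inferred as the remainder after the reported 96%, and the shares of the other activities in September are left out because the text does not break them down.

```python
def largest_shift(before, after):
    """Return (activity, |difference|) for the next-activity share that moved most.

    `before` and `after` map a next activity to its observed share after the
    same trace prefix; an activity missing from one period counts as 0.
    """
    activities = set(before) | set(after)
    return max(
        ((act, abs(after.get(act, 0.0) - before.get(act, 0.0)))
         for act in activities),
        key=lambda item: item[1],
    )


# Shares reported in the text for the prefix IQC, CP1, CP1DT.
before = {"CP1DT": 0.96, "BAKE1": 0.04}  # 2005/1-2005/8; BAKE1 inferred as remainder
after = {"CP1DT": 0.45, "BAKE1": 0.35}   # 2005/9; other activities omitted

THRESHOLD = 0.20  # hypothetical alert level, not taken from the study

activity, shift = largest_shift(before, after)
flagged = shift > THRESHOLD

if __name__ == "__main__":
    print(activity, round(shift, 2), "possible process change" if flagged else "stable")
```

With these inputs the largest shift is on CP1DT, whose share falls by about 51 percentage points, so the branch would be flagged for review. A production check would more plausibly work from case counts and a statistical test, since a small number of cases can produce large swings by chance.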
5 CONCLUSION

Our approach augments a transition system model with probabilistic information and activity-based cost information to support management decision making. The differences between the approach of van der Aalst et al. (2011) and our approach are presented below. (a) The discovered model for the same process differs between periods, and the difference between these discovered models is the deviation of the process. In our empirical result, a process deviation in 2005 indicated a change in production processes. The CP process became more complex in 2005/10; this deviation is attributed to new business and the purchase of new equipment. Using the proposed approach, differences in probabilities could have been noted to find the deviation earlier. (b) In past research, process information was obtained by interviewing employees, which may lead to some misunderstanding of the process. Our empirical results show that this approach can improve delivery performance. When a wafer moves to the CP3 activity, it cannot be delivered within the time guaranteed to the customer.

Expected cost information in process mining can support decision making, as illustrated by the reanalysis of the compensation example from the study of van der Aalst et al. (2011). Considering only time information, as in the original study, the path through the “examine thoroughly” activity requires the longest time and has the highest cost. Re-analyzed by our method, the path to “examine casually” has the highest cost, suggesting a different decision. Although the result cannot present the full picture, transition systems can provide key information to some users. Whether the right information reaches the right person depends on role setting. Finding suitable indices for judging the quality of the discovered patterns is also suggested as future work.

References

Becker, J., Bergener, P., and Räckers, M. (2009).
Process-Based Governance in Public Administrations Using Activity-Based Costing. In Electronic Government (pp. 176-187). Springer Berlin Heidelberg.
Tornberg, K., Jämsen, M., and Paranko, J. (2002). Activity-based costing and process modeling for cost-conscious product design: A case study in a manufacturing company. International Journal of Production Economics, 79(1), 75-82.
van der Aalst, W.M.P., van Hee, K.M., van der Werf, J.M., and Verdonk, M. (2010). Auditing 2.0: Using Process Mining to Support Tomorrow’s Auditor. IEEE Computer, 43(3), 90-93.
van der Aalst, W.M.P. (2011). Process Mining: Discovery, Conformance and Enhancement of Business Processes. 1st Edition. Springer.
van der Aalst, W.M.P., Schonenberg, M.H., and Song, M. (2011). Time prediction based on process mining. Information Systems, 36(2), 450-475.