Appl. Math. Inf. Sci. 8, No. 6, 3085-3093 (2014) 3085 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.12785/amis/080649 Breadth First Search Sequence based Method for Efficient Process Retrieval Ye Yanming1,2,∗ , Yin Yuyu3 , Xu Yueshen1 , Cao Bin1 and Yin Jianwei1 1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China of Information Engineering, Hangzhou Dianzi University, Hangzhou 310018, China 3 College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China 2 College Received: 19 Nov. 2013, Revised: 17 Feb. 2014, Accepted: 18 Feb. 2014 Published online: 1 Nov. 2014 Abstract: With increasing improvement of the business process management (BPM) technology, large-scale business process repositories have been adapted widely. However, due to the explosion of the number of business processes, a large part of enterprises are confronting with the challenge on effective management of those massive processes. Usually, each business process is modeled as a process graph, and therefore most existing approaches are based on graph mining algorithms. This paper puts forward a new method, which first utilizes the breadth first search (BFS) algorithm to label the process model, and then calculates the similarity based on the matching distance. The experimental results show that our method is efficient enough for practical use, especially suitable for fuzzy retrieval. Keywords: process retrieval, breadth first search, BFS code, process matching 1 Introduction As a widely employed approach in enterprises to regulate business logic and handle business processes, workflow techniques gain increasing development with continuous enterprises information construction. There are various business processes in a company, which cooperate with each other to accelerate the work efficiency in an organization. In the meantime, most companies have to modify some processes to keep pace with the frequent requirement changes. Therefore, a company usually has a large number of candidate business processes. In order to manage these business processes efficiently, many enterprises have built their own business process model repositories, and regard them as important knowledge bases in business process management within organizations. There are often thousands of process models in a single repository [1, 2], and enterprises tend to accumulate large numbers of process models along with time. For example, Suncorp, one of the largest Australian insurers, maintains a repository containing more than 6000 process models [3], and Chinese Railway Corporation holds more than 200000 models [4]. It is an intractable challenge to manage such enormous business process model repositories. In ∗ Corresponding practice, it is a common way to retrieve existing process models to reuse the whole process or process fragments, when a new process model is created. Therefore, the ability to retrieve efficiently from a business process model repository is critical for enterprises. There is another technology called process recommendation that uses similar basic definitions. Generally, the process retrieval may be the preceding step when process recommendation is conducted, but there is no inclusion relation between the two techniques. The main differences of the two technologies are explicated as follows. 1.The granularity of conduct is different. Process recommendation requires precise retrieval, and a certain process along with its process fragments may be the intermediate result (in most proposed methods, the problem is to find all the subgraphs of the corresponding process graph). While the common process retrieval only needs to return the whole process or some certain process fragments. 2.Process retrieval is the basic technique in a process repository. It may be useless when a process repository contains numerous process models, so that we cannot find the certain process effectively that author e-mail: [email protected] c 2014 NSP Natural Sciences Publishing Cor. 3086 Y. Yanming et. al. : Breadth First Search Sequence based Method for... matches the search condition. However, process recommendation is not the requisite technology for a process repository except for intelligent modeling purpose. 3.In process recommendation and process retrieval techniques, their core process sets are different. Process recommendation technologies use candidate process sets that are a subset of the process repository and all frequent subgraphs of the candidate processes as the basic analysis data. While process retrieval technologies generally use all processes in a repository as the compared data. 4.The usage intentions are different. Process recommendation is mainly used in intelligent modeling, while process retrieval is mainly used in maintenance of the process repository. Next, the research background will be introduced for deeper comprehension of the paper. A process model can be depicted with a graphical model, which describes the way that a certain process is composed. Usually, a process consists of different tasks, in which resources are involved in carrying out these tasks, and objects are being manipulated [5]. Therefore, the core task of most of the existing process retrieval methods is to find out all process models in a repository that contains the given process fragment as a subgraph. For example, a versatile graph matching algorithm based on fix-point computation was proposed in [6], which was announced to be usable across various scenarios. Paper [7] focused on the application of graph matching algorithms to the similarity search problem and studied four kinds of graph matching algorithms. Paper [8] proposed a DFS Code-SED method for process retrieval that adopted depth first search (DFS) code to label the process model and calculated DFS codes based on Levenshtein distance to get the similarity values, which improved the computation efficiency of measuring the similarity between two graphs. Nevertheless, to find all subgraph isomorphisms is a NP-complete problem [9]. Different approaches have been proposed to solve this problem, for instance, paper [10] proposed a two-stage approach that reduced the number of models that needed to be checked for subgraph isomorphism. However, all the methods are either complex or inefficient, and most of them are not suitable for process retrieval with loop structures. This paper presents a BFS sequence based process retrieval method. Firstly, we transforms each process in the process repository into the corresponding BFS sequences, based on the matching distance between the targeted process and reference process in the repository. Then, we calculate the largest process matching distance as the matching degree. Finally, this approach will output the retrieval results that reach or exceed the predefined matching degree threshold. Our approach does not need to handle subgraph isomorphism problem, which greatly reduces the complexity. Though the process isomorphism is still c 2014 NSP Natural Sciences Publishing Cor. needed to be determined for filtering the process repository, it is not a NP-complete problem. Especially, our approach is much appropriate for fuzzy retrieval, in all cases that the processes contain loop structures or not, the problem of which has not been wholly solved in other methods. The presented paper is organized as follows. After the above introduction, Section 2 gives some definitions with related basic instructions, and at the end of this section, highlights how to calculate process matching degree. In Section 3, we discuss the implementation of BFS based process retrieval algorithm. Furthermore, in Section 4, the method is evaluated through sufficient experiments. Finally, Section 5 discusses the research perspective. 2 Preliminary Business process is a series of activities that are performed by different performers to specific target respectively, and the order of activities represents collaboration of these performers. In workflow systems, the process reflects the actual business process, and the activity node represents the business operation in enterprises. Information or operations will flow or conduct in turn according to the nodes sequence. Therefore, the business process model can be abstracted as a directed graph, in which the nodes represent activities, with a corresponding label (implicating activity type, content, and the serial number etc.) and the arcs indicate nodes orders. 2.1 Business Process Graph There are many different formalized definitions for business process, which are used in different occasions to describe the different features of the real business process. In these definitions, activities and arcs are widely used as standard terms. For simplicity, this paper gives the definition as follows. Definition 1(Business Process Graph): Let T be the activity type, and a business process is a sextet P = (A, R, f , s, e) where: 1. A is the finite set of activities, which is represented by nodes in a process graph. 2. R ∈ A × A is the relationship between activities, which is represented by arcs or edges in a process graph. 3. f : A → T is the activity type function and T = {and join − andsplit, and join − orsplit, or join − andsplit, or join − orsplit}. 4. s ∈ A is the start activity. 5. e ∈ A is the end activity. Activity x ∈ A is the input of activity y ∈ A if and only if there exists a directed arc connecting x with y (that is, (x, y) ∈ R). Node x ∈ A is the output of node y ∈ A if and Appl. Math. Inf. Sci. 8, No. 6, 3085-3093 (2014) / www.naturalspublishing.com/Journals.asp only if there exists a directed arc connecting y with x (that is, (y, x) ∈ R). The activity that has no input node is called start activity and the activity that has no output activity is called end activity. In a workflow repository, any business process can be modeled as a process graph containing only one start activity s and one end activity e. So the business process in Definition 1 can be simplified as P = (A, R, f ). Definition 2 (Business Process Isomorphism): Let P = (A, R, f ) and P′ = (A′ , R′ , f ′ ) be two business processes. The business isomorphism between P and P′ (denoted as P ∼ = P′ ) is a mapping g : A → A′ such that: 1. ∀u ∈ A, ( f (u) = f ′ (g(u))) 2. ∀u, v ∈ A, ((u, v) ∈ R ⇒ (g(u), g(v)) ∈ R′ ) The presented method in this paper only adopts the entire business process as the process pattern, and does not need to handle subgraph isomorphism. However, for efficiency, before a new process is created or added into the process repository, the method still needs to determine whether the process is isomorphism with one of the processes in the repository. Canonical label is widely used to solve the graph isomorphism problem. The canonical label for a graph (denoted as cl(G)) is a unique code which is a sequence of bytes, characters or numbers. It is irrelative with the order of nodes and arcs of the graph and fully depends on the graph topology. If the canonical labels of two graphs are the same , then these graphs are isomorphic to each other. This paper adopts BFS (Breadth first search) standard sequence to construct canonical labels. Related definitions of BFS standard sequence will be discussed in the following. 3087 In Definition 1, the activity and its type are defined separately and the type is the output of function f with certain activity input. For simplicity, we can define an activity label mapping function L : (A, f (A)) → N that combines the activity A and its type f (A) into a unique activity label N. Then the business process in Definition 1 can be simplified as P = (N, R). Next, we give some definitions on BFS sequence. Definition 3 (BFS Sequence) : The breadth first traversal order on a process P is a linear order. As Definition 1 shows, the directed arc of P can be labeled by ordered activities. Therefore, the BFS sequence of P can be represented as follows: {s}, {sni : sni ∈ R}, {ni n j : ni n j ∈ R}, . . . , {nt e : nt e ∈ R}, {e} Then the BFS sequence of P is: BFSsequence(P) = s♯{sni }♯{ni n j }♯ . . . {nt e}♯{e} Where, the symbol ♯ is the separator that divides the different traversal hierarchies. For example, as shown in b s a d e c Fig. 1: Process Sample P Fig. 1, the BFS sequence of the process sample P is 2.2 Process Canonical Label DFS (Depth First Search) and BFS (Breadth First Search) are widely used in graph mining. Compared with BFS, DFS consumes less memory, but runs slower due to the stack utilization. On the other hand, BFS consumes more memory, but runs faster than DFS. The fact in construc -tion practice of process resource library and process patterns shows that the vast majority of business processes are of simple structure and less nodes. In the selected process library used in this paper, there are 86% processes including less than 20 nodes, and nearly 71% processes have less than 10 nodes. Therefore, the problem of memory usage is not a matter. On the contrary, because the number of processes may be huge, the time performance is more important. Therefore, this paper uses BFS to realize the canonical label of the process graph. Breadth first search by different nodes orders of the same hierarchy may lead to different BFS sequences. Therefore, the paper presents standard BFS sequence to ensure that the BFS sequence of the same process is unique. And we can reconstruct unique process with the standard BFS sequence. Next, we give some definitions on the BFS sequence. BFSsequence(P) = s♯sa♯ab, ac♯bd, cd♯de♯e (1) Definition 4 (Standard BFS Sequence): A BFS sequence of process P is called standard BFS sequence if the labels of the same hierarchies are by lexicographic order, which is denoted as BFSsequence(P). Except for Eq. (1), the BFS sequence of process shown in Fig. 1 can also be s♯sa♯ac, ab♯cd, bd♯de♯e or others. But only Eq. (1) follows the Definition 4, so only Eq. (1) can be called standard BFS sequence of the process. Generally, if there are continuous repeated fragments in BFS sequence of a process, the process may contain loop structures. However, as shown in Fig. 2(a)(b), the fragments ab, bc, ca in the given processes are continuously repeated in both BFS sequences and the Fig. 2(b) apparently does not have loop structures. Therefore, we can not determine whether a process has loop structures or not only based on the existence of continuous repeated fragments of its BFS sequence. It is easy to see that for a process with N activities and no loops, its biggest BFS sequence hierarchy is not more than N + 1. Therefore, if the number of BFS sequence hierarchies is more than the number of nodes in a process, the process will certainly contain loop structures. c 2014 NSP Natural Sciences Publishing Cor. 3088 Y. Yanming et. al. : Breadth First Search Sequence based Method for... s a b e c D s a b c a b c a e E Fig. 2: Process samples with similar BFS sequence Definition 5 (Extensional Standard BFS Sequence): The extensional standard BFS sequence of the process that contains N nodes has at most N + 2 hierarchies, and the first N + 1 hierarchies are the same as standard BFS sequence and the (N + 2)th hierarchy consists of all nodes hierarchies numbers that the arcs in (N + 1)th hierarchy may be connected to. The extensional standard BFS sequence is denoted as EBFSsequence(P). For example, the extensional standard BFS sequence of process that is shown in Fig. 2(a) is: s♯sa♯ab♯bc♯ca, ce♯ab, e♯4 The set expression of the extensional BFS standard sequence is EBFSSsequence(P) = p1 , p2 , . . . pm , where pi is the arcs set of the ith hierarchy and can be denoted as {pi1 , pi2 . . .}. For example, the set expression of the extensional BFS standard sequence of process that is shown in Fig.2(a) is: S EBFSSsequence(P) ={p1 , p2 , p3 , p4 , p5 , p6 , p7 } ={s, {sa}, {ab}, {bc}, {ca, ce}, {ab, e}, {4}} Definition 6 (BFS Determination of Process Isomorphism): Given process P and process P′ , if there is isomorphism between P and P′ , their extensional BFS standard sequences of P and P′ must be the same. That is, P∼ = P′ ⇔ EBFSSsequence(P) = EBFSSsequence(P′ ) ⇔ S EBFSSequence(P) = S EBFSSsequence(P′ ) Using extensional standard BFS sequence to build process repository every process can be uniquely identified and whether a new process can be added into repository depends on whether there is not isomorphism between it and any other processes of the repository. 2.3 Process Matching Degree As mentioned above, extensional standard BFS sequence can help us build process repository. This part mainly discusses how to calculate the process matching degree, which is the basis of process retrieval. Definition 7 (Process Matching Matrix) : Let P = (N, R) and Q = (N ′ , R′ ) be two process graphs. c 2014 NSP Natural Sciences Publishing Cor. S EBFSSsequence(P) = {p1 , p2 , . . . , pM } and S EBFS Ssequence(Q) = {q1 , q2 , . . . , qM } are their extensional standard BFS sequences respectively. The process matching matrix can be expressed as: ψ (pM , q1 ) . . . ψ (pM , qN ) ... ... ... Mat(P, Q) = (2) ψ (p1 , q1 ) . . . ψ (p1 , qN ) where ψ (pi , q j ) is the comparison function which is defined as: ψ (pi , q j ) = SED(pi , q j ) − |length(pi ) − length(q j )| (3) min(length(pi ), length(q j )) where, SED(x, y) is the string edit distance of x and y , which is the minimum number of insertions, deletions and substitutions to transform x into y. Length(x) is the number of characters that string x contains. Note that it has been mentioned in Sect. 2.1 that any business process can be modeled as a process graph containing only one start node s and only one end node e. The process matching path can be defined as follows. Definition 8 (Process Matching Path): For process P and Q in Definition 7, let Mat(P, Q) be the process matching matrix. A process matching path W , is a contiguous set of matrix elements that define a mapping between P and Q. The vth element of W is defined as wv = (i, j)v , so we get: W = w1 , w2 , , wV , where max(M, N) ≤ V < M +N −1 (4) The process matching path is typically subject to several constraints. Boundary conditions : w1 = (1, 1) and wV = (M, N), this requires the path to start and end in diagonally opposite corner cells of the matrix. Continuity: Given wv = (a, b) then wv−1 = (a′ , b′ ) where a − a′ ≤ 1 and b − b′ ≤ 1. This restricts the allowable steps in the path to adjacent cells (including diagonally adjacent cells). Monotonicity : Given wv = (a, b) then wv−1 = (a′ , b′ ) where a − a′ ≥ 0 and b − b′ ≥ 0. This forces the points in W to be monotonically spaced in turn. There are various matching paths that satisfy the above conditions, but we are only interested in the path that minimizes the distance, and the minimized distance is also called process matching distance. The path distance can be gained using dynamic programming to evaluate the following recurrence which defines the cumulative distance d(i, j) as the value of current cell (ψ (i, j)) and the minimum of the cumulative distances of the adjacent elements: d(i, j) = ψ (i, j)+min{d(i−1, j −1), d(i−1, j), d(i, j −1)} (5) For simplicity, the process matching path is referred to the path with minimized distance in the latter. Appl. Math. Inf. Sci. 8, No. 6, 3085-3093 (2014) / www.naturalspublishing.com/Journals.asp Definition 9 (Process Matching Degree) : For process P and Q in Definition 7 , let Mat(P, Q) be process matching matrix and W = w1 , w2 , , wV is the matching path, then the process matching degree can be calculated by the Eq. (4): MatchDegree(P, Q) = 1 − d(M, N) max(M, N) (6) where M and N are the BFS sequence hierarchy of process P and Q, and d(M, N) is the cumulative distance of cell Mat[M, N] in the top right corner of the process matrix. When a querying process or process fragment is retrieved, the matching degree between the querying one and each one in the repository will be calculated and all the processes that make the matching degree larger than the given threshold will be returned as results. Next, the implementation of the method will be discussed in detail. 3 Implementation Let θ be the threshold of process matching, the BFS sequence based process retrieval method can be divided into four modules as shown in Fig. 3. Step 1. This step transforms the process into BFS sequence form and abandons the isomorphism process. When a new process is being added into the standard process repository, it must be transformed into BFS sequence form if it is not in this form and then process isomorphism will be determined between the new process and each one in repository. Only if there is no any isomorphism, the new process can be added into the repository. Step 2. In this step, a user can use a graphical interface to input the parameter θ that stands for the precision demand that the user expects and the use the transforming tool to transform the query process into BFS sequence. Step 3. In this step, the standard process repository will be reduced to form the process candidate set depending whether the process contains the nodes that the input process does. Step 4. This step is the core part that calculates matching degree between querying process and each one in the repository and output those that meet the user’s precision demand. 3.1 Build the Process Candidate Set Building process candidate set is actually to cut the standard process repository. Obviously, a process that has no intersection or rarely intersects intersected with the querying process on the nodes can not match the querying process well. Therefore, we can just remove the processes whose nodes sets have less intersection with that of querying process. Next, we give the pseudo code (shown in Algorithm 1). 3089 Algorithm 1 Pseudo code of building process candidate set Input: Process: p, Standard process repository: p rep, Parameter:θ Output: Candidate process set: CPS 1: Initialize candidate process set: CPS 2: H p ← get length of BFS sequence of p 3: N p ← get number of nodes in p 4: NodeSetof P ← get nodes set of p 5: for each q in p rep do 6: Hq ← get length of BFS sequence of q 7: if Hq < H p then 8: continue 9: Nq ← get number of nodes in q 10: else if Nq < N p then 11: continue 12: NodeSetofQ ← get nodes set of q 13: diffSetofPQ ← NodeSetofP − NodeSetofQ 14: else if diffSetofPQ.length > Hp ×(1 − θ ) then 15: continue 16: else 17: CPS.add(p) 18: end if 19: end for 20: return CPS 3.2 Matching Degree Calculation Process retrieval is to get the processes that satisfy the given process matching degree from the process candidate set. Next, we give the pseudo code to show how to calculate the matching degree (shown in Algorithm 2). Algorithm 2 Pseudo code of building process candidate set Input: Process: p, q Output: Matching degree: md 1: Hp ← get length of BFS sequence of p 2: Hq ← get length of BFS sequence of p 3: d[ ] ← new [H p, Hq] 4: for i = i to H p do 5: for j = i to Hq do 6: d[i, j] = Ψ (i, j) + min{Ψ (i − 1, j), Ψ (i, j − 1), Ψ (i − 1, j − 1)} 7: end for 8: end for 9: md = 1 − d[H p, Hq] max(H p, Hq) 10: return md c 2014 NSP Natural Sciences Publishing Cor. 3090 Y. Yanming et. al. : Breadth First Search Sequence based Method for... Preprocessing Query Interface Process Matching Matrix Retrieval Result Process Candidate Set Standard Process Repository Matching Degree Calculation Fig. 3: The process of BFS sequence based process retrieval method 4 Experimental Evaluation In this section, experiments on the retrieval accuracy and performance of the method and comparison between our BFS sequence based method and DFSCode-SED based method [8] (the latter method was claimed to be more efficient) are carried out on the dataset that is collected from administrative examination and approval processes from the administrative department of a local government (China), which contains 221 business processes involving in totally 52 activities. All experiments are done on a 2.4GHz Intel Core 2 Duo P8600 PC with 4GB main memory, running on Windows 7. In order to facilitate the discussion, the experiments are grouped into full retrieval and fragment retrieval. The full retrieval is to judge whether the repository contains the querying process and output it if it does. The fragment retrieval is to find and output all processes that contain the querying process fragments to a certain degree. The fragment retrieval is also divided into exact retrieval and fuzzy retrieval for better analysis. longer than that of complex structure if their nodes number is the same. Similarly, the retrieval time of process ’C5’ suffers from a sudden increase and this may be because the BFS sequence length of process with loop is even longer than that of simply structure with the same nodes number. The comparison experiment between our BFS sequence method and DFSCode-SED is also carried out in this section and the result is shown in Fig. 4(b). As it can been seen in Fig. 4(b), the average retrieval time of the two methods is approximately the same for simple structure . For complex structure process retrieval, the BFS method seems need more time than DFSCode-SED from the figure alone. That is because the DFSCode-SED method can not retrieve the process with loop structure and the output average time only employs the values of the first four processes. In conclusion, the retrieval time depends on both the nodes number and the structure of the querying process in our BFS sequence based method and the loop structure has more impact on the efficiency. Our method is similar in efficiency with the DFSCode-SED method, but our method supports the querying process containing loop structure, while the DFSCode-SED method can not. 4.1 Full Retrieval Evaluation In our BFS sequence based method, one to full retrieve only needs to set the parameter θ = 1. We randomly select five processes with simple structure (no branch and no loop) and five processes with complex structure (4 contain branch structure and 1 contains loop structure) from the collected 221 processes as the testing querying processes. These selected processes are divided into two groups and labeled as ’S1, S2, S3, S4, S5’ and ’C1, C2, C3, C4, C5’ by their nodes number order in each group respectively. The test results are shown in Fig. 4. In Fig. 4(a), the retrieval time rises when the nodes number of the processes rises in turn although varied little. And the first four processes of simply structure consumes little more retrieval time than the first processes of complex structure respectively. It may be because the BFS sequence length of a process with simply structure is c 2014 NSP Natural Sciences Publishing Cor. 4.2 Fragment Retrieval Evaluation In fragment retrieval experiment, we cut out 5 process fragments (labeled as F1, F2, F3, F4, F5) from the collected processes and build 5 process fragements (labeled as B1, B2, B3, B4, B5) based on the 52 activities randomly as the testing querying processes to evaluate the exact retrieval and fuzzy retrieval respectively. By setting the parameter =0.9, 0.75, 0.6 respectively, our BFS sequence based method is experimented in turn. Some results are shown in Table 1 and Fig. 5. As we can see from Table 1 and Fig. 5, when the matching degree threshold θ decreases, the number of retrieved processes and the total retrieval time both rise. On the one hand, the matching degree threshold is used to control the return of retrieval results and is irrelevant to Appl. Math. Inf. Sci. 8, No. 6, 3085-3093 (2014) / www.naturalspublishing.com/Journals.asp (a) Querying Process 3091 (b) Simple Structure and Complex Structure Fig. 4: Full Retrieval Evaluation Fig. 5: Fuzzy Retrieval Evaluation Table 1: Fuzzy Retrieval Result Parameterθ 0.9 0.75 0.6 Total Time(ms) Exact Retrieval Fuzzy Retrieval 173 221 191 249 237 286 the executing time of matching degree calculation algorithm. On the other hand, it has been discussed in full retrieval that the executing time of matching degree calculation is only based on the the nodes number and process structure. However, in the experiment results, the total time changes so much for different threshold θ under the same querying process. In Sect. 3.1, building candidate process set must input the matching degree threshold θ , and obviously, the number of the processes that the candidate set contains will rise by the θ being larger. The size of candidate set increases along with the threshold θ decreasing and thus, although with the same querying process, the retrieval time will increase, as a result that the retrieval efficiency is reduced. The Number of Retrieved Processes Exact Retrieval Fuzzy Retrieval 6 14 13 29 32 47 Both the total time and the number of retrieved processes in fuzzy retrieval are more than in exact retrieval and it may be also because the size of candidate process set in fuzzy retrieval is larger than that in exact retrieval. It is important to note that the BFS sequence based method can not control the switch to exact retrieval or fuzzy retrieval, and in the experiment, it is determined by the querying process fragments. 5 Conclusion and Future Work Process retrieval is an important technology in management of large business process model repositories and c 2014 NSP Natural Sciences Publishing Cor. 3092 Y. Yanming et. al. : Breadth First Search Sequence based Method for... most of the existing process retrieval methods use graphical notations to find matching processes that contain given process fragments as subgraphs. However, the complexity of finding all subgraph isomorphisms are known to be NP-complete, so most methods adopt canonical labels to solve the problem. Unfortunately, these methods are inadequate or limited in some circumstances, such as the direct retrieval of process with branches or loops, fuzzy retrieval and so on. This paper proposes a BFS sequence based method that can solve these problems, especially support fuzzy retrieval and the retrieval of process with loop structure. The most contribution of this paper is that it is the first time to propose BFS sequence to label process, while other methods mainly adopt DFS code to represent process and can not solve the loop problem. Meanwhile, our method can avoid the subgraph isomorphism determination that is known to be NP-complete. Still, much work has to be done in the future. The BFS sequence automatic construction is important to improve the performance of the method and the matching degree calculation algorithm can be improved to achieve better performance. Furthermore, the application of BFS sequence is another issue for future work. [6] Melnik S, Garcia-Molina H, Rahm E. Similarity flooding: A versatile graph matching algorithm and its application to schema matching. Proceedings of 18th IEEE International Conference on Data Engineering, 117-128 (2002) [7] Dijkman R, Dumas M, Garcła-Bauelos L. Graph matching algorithms for business process model similarity search[M]//Business Process Management. Springer Berlin Heidelberg, 48-63 (2009). [8] Cao B,Yin J W,Chen H X. A Levenshtein Distance Based Method for Process Retrieval. Computer Intergrated Manufacturing Systems, 18, 1766-1773 (2012). [9] Shasha D, Wang J T L, Giugno R. Algorithmics and applications of tree and graph searching. Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM, 39-52 (2002). [10] Jin T, Wang J, La Rosa M, et al. Efficient querying of large process model repositories. Computers in Industry, (2012). Yanming Ye is a Ph.D candidate at the College of Computer Science and Technology, Zhejiang University, Hangzhou, China. His research interests include workflow & business process management, cloud computing and social computing. Acknowledgement This work is supported by the National Natural Science Foundation of China under Grant (No.61272129), National High-Tech Research Program of China (NO. 2013AA01A213), New-Century Excellent Talents Program by Ministry of Education of China (No. NCET-12-0491), Zhejiang Provincial Natural Science Foundation of China (LR13F020002). Yuyu Yin received the Doctor?s degree in computer science from Zhejiang University, Hangzhou, China, in 2010. He is currently an assistant professor in Hangzhou Dianzi University. His research interests include service computing, cloud computing and middleware References [1] Fahland D, Favre C, Jobstmann B, et al. Instantaneous soundness checking of industrial business process models. Business Process Management. Springer Berlin Heidelberg, 278-293 (2009). [2] Dijkman R M, La Rosa M, Reijers H A. Managing large collections of business process models-current techniques and challenges. Computers in Industry, 63, 91-97 (2012). [3] La Rosa M, Dumas M, Uba R, et al. Business process model merging: an approach to business process consolidation. ACM Transactions on Software Engineering and Methodology (TOSEM), 22, 11 (2013) [4] Ekanayake C C, La Rosa M, Ter Hofstede A H M, et al. Fragment-based version management for repositories of business process models. On the Move to Meaningful Internet Systems: OTM 2011. Springer Berlin Heidelberg, 20-37 (2011) [5] Dijkman R, Dumas M, Garcła-Bauelos L. Graph matching algorithms for business process model similarity search. Business Process Management. Springer Berlin Heidelberg, 48-63 (2009). c 2014 NSP Natural Sciences Publishing Cor. techniques. Yueshen Xu is a Ph.D candidate at the College of Computer Science and Technology, Zhejiang University, Hangzhou, China. His research interests include service computing, recommendation system, business process management and applied machine learning. Appl. Math. Inf. Sci. 8, No. 6, 3085-3093 (2014) / www.naturalspublishing.com/Journals.asp Bin Cao received the Doctor?s degree in computer science from Zhejiang University, Hangzhou, China, in 2013. His research interests include workflow management, event processing and spatial database. 3093 Jianwei Yin is currently a professor in the College of Computer Science, Zhejiang University (China). He received his Ph.D. in Computer Science from Zhejiang University in 2001. He is the visiting scholar of Georgia Institute of Technology, America in 2008. His research interests include distributed network middleware, software architecture and information integration. c 2014 NSP Natural Sciences Publishing Cor.
© Copyright 2026 Paperzz