A Genetic Programming Algorithm To Derive Complex Event Processing Rules For A Given Event Type

Norman Offel

Master Thesis in the Course of Applied Computer Science
August 28, 2016

Author: Norman Offel, Student number: 1313528, E-mail: [email protected]
Adviser: Prof. Dr. Ralf Bruns, Department of Computer Science, Faculty IV, University of Applied Sciences and Arts of Hanover, Germany, E-mail: [email protected]
Co-Adviser: Prof. Dr. Jürgen Dunkel, Department of Computer Science, Faculty IV, University of Applied Sciences and Arts of Hanover, Germany, E-mail: [email protected]

Selbstständigkeitserklärung (Declaration of Authorship)

I hereby declare that I have written the submitted master thesis independently and without outside help, that I have not used any sources or aids other than those stated, and that I have marked as such all passages taken literally or in content from the works used.

Hanover, August 28, 2016
Signature

Contents

1 Introduction 1
  1.1 Motivation 1
  1.2 Structure of the Thesis 3
2 Background 4
  2.1 Evolutionary Computation 4
    2.1.1 Evolution in Biology 5
    2.1.2 Evolution for Problem Solving 6
    2.1.3 The Evolutionary Computation Family 9
  2.2 Complex Event Processing 11
    2.2.1 Terminology 12
    2.2.2 Language 13
  2.3 Rule Learning 15
3 Related Work 18
  3.1 Evolutionary Computation in Rule Learning 18
  3.2 Genetic Programming in Rule Learning 19
  3.3 Optimization and Rule Learning in Complex Event Processing 21
    3.3.1 Improving CEP performance 21
    3.3.2 Learning CEP rules 21
4 General Approach 25
  4.1 The scenario 25
  4.2 Applying Genetic Programming to Rule Learning 28
  4.3 Constraints in Genetic Programming 30
  4.4 Evolutionary Operations 31
    4.4.1 Selection 31
    4.4.2 Crossover 31
    4.4.3 Mutation 32
  4.5 Summary 34
5 CepGP – The Genetic Programming Algorithm 35
  5.1 Rule Components 36
    5.1.1 Window 37
    5.1.2 Condition 38
    5.1.3 Action 42
  5.2 Preparation Phase 43
  5.3 Initial Population Creation 44
    5.3.1 Window Creation 44
    5.3.2 Event Condition Tree Creation 45
    5.3.3 Attribute Condition Tree Creation 46
  5.4 Evolutionary Operators 50
    5.4.1 Selection 50
    5.4.2 Crossover 51
    5.4.3 Mutation 63
  5.5 Fitness Calculation 67
    5.5.1 Condition 69
    5.5.2 Window 73
    5.5.3 Rule Complexity 76
    5.5.4 Total Fitness 78
  5.6 Summary 79
6 Implementation 82
  6.1 Requirements 82
  6.2 Input and Output Specification 83
    6.2.1 Input 83
    6.2.2 Output 84
  6.3 The Rule Engine 85
  6.4 CepGP 91
    6.4.1 Preparation Phase 93
    6.4.2 Population Initialization 94
    6.4.3 Evolutionary Process 94
    6.4.4 Evaluation 95
    6.4.5 Parameters 96
  6.5 Summary 98
7 Evaluation 100
  7.1 Test Data 100
  7.2 Testing 101
    7.2.1 Convergence 102
    7.2.2 Parameter influence 104
    7.2.3 CepGP vs. Random Walk 108
    7.2.4 Noise influence 112
  7.3 Result Discussion 112
8 Conclusion 113
  8.1 Contributions 113
  8.2 Future Work 114
Bibliography 116

List of Figures

2.1 Basic Concept of Biological Evolution 5
2.2 General Evolutionary Computation Algorithm 8
2.3 History of Evolutionary Computation 10
2.4 Abstraction of Events into Higher Level Complex Events 11
2.5 Cycle of Event-Driven Systems 12
2.6 Origin of Learning Classifier Systems 16
3.1 autoCEP 23
3.2 iCEP 24
4.1 Scenario of the Thesis 27
4.2 Applications of Genetic Programming in classification tasks 28
4.3 Model extraction with Genetic Programming 29
4.4 Subtree Crossover 33
4.5 Point Mutation 33
5.1 General process of CepGP 35
5.2 General Rule Components 37
5.3 Window Types 37
5.4 Refined Rule Components 38
5.5 Rule example with Event Condition Tree 39
5.6 More Complex Rule Example with Event Condition Tree 40
5.7 Rule example with Event Condition Tree and Attribute Condition Tree 40
5.8 Rule example with an Event Condition Tree and a more complex Attribute Condition Tree 41
5.9 Processing pipeline of a rule in CepGP 42
5.10 Preparation Phase 43
5.11 Window Creation Process 45
5.12 Event Condition Tree Creation with the Full Method 46
5.13 Event Condition Tree Creation with the Grow Method 47
5.14 Attribute Condition Tree Creation 48
5.15 Comparison Operator Initialization 49
5.16 General Crossover of CepGP with Elitism 52
5.17 Crossover Point Indexing 53
5.18 Calculation of the Crossover Component and Crossover Point within 54
5.19 Subtree Crossover of Event Condition Trees 56
5.20 Subtree Crossover With Two Attribute Condition Trees 58
5.21 Subtree Crossover With One Attribute Condition Tree 59
5.22 Broken Attribute Condition Tree after Crossover of Attribute Condition Trees 60
5.23 Broken Attribute Condition Tree after Crossover of Event Condition Trees 61
5.24 General Algorithm to Repair an Attribute Condition Tree 63
5.25 Algorithm to Repair Broken Aliases 64
5.26 General Mutation Algorithm 64
5.27 Example Mutation of the ECT 66
5.28 Example Mutation of the ACT 68
5.29 Relation between TP, FP, FN and TN 70
5.30 Illustration of ROC Analysis 73
5.31 Idea of the Window Fitness Function 74
5.32 The Window Problem 75
5.33 The Window Fitness Function 76
6.1 Process of the Rule Evaluation 86
6.2 CepGP Class Diagram 92
7.1 CepGP Converging in Small Data Set 102
7.2 CepGP converging in medium data set 103
7.3 CepGP converging in large data set 103
7.4 CepGP Result on Small Data Set With Initial Parameters 105
7.5 CepGP Result on Small Data Set With Varying Population Sizes 105
7.6 CepGP Result on Small Data Set With Varying Amounts of Generations 106
7.7 CepGP Result on Small Data Set With Different Crossover Rates 107
7.8 CepGP Result on Small Data Set With Different Mutation Rates 108
7.9 CepGP vs. Random Walk Comparing the Bests 109

List of Tables

2.1 Terms and definitions in Evolutionary Computation 7
2.2 Event Pattern Specification 13
2.3 Context Condition Specification 14
2.4 Aggregate functions 15
5.1 Summary of constraints to ACT nodes 42
5.2 Support of Language Constructs in CepGP 81
6.1 Support of the Language Constructs in the Rule Engine 88
6.2 Decision Matrix for ACT-Comparison-Operator Evaluation 89
6.3 Decision Matrix for ∧-Operator 89
6.4 Decision Matrix for ∨-Operator 90
6.5 Decision Matrix for ¬-Operator 90
7.1 Parameter Suggestions by Poli et al. [39] 104
7.2 Comparison of the Original Hidden Rule and the results of CepGP and Random 111
7.3 Final Default Parameters for CepGP 112

1 Introduction

Contemporary business, especially on the Internet, depends heavily on up-to-date information about the current status. The vast amount of data that needs to be processed to provide all parts of the business with the needed information just in time is challenging for modern IT systems, and different strategies and technologies have emerged to help overcome these problems. One of these technologies is the so-called Complex Event Processing (CEP). Instead of saving each piece of information into a database for later analysis and querying it for more abstract information, CEP treats the passing pieces of information as events. Events are emitted by agents observing the environment and contain low-level information about an observation at a particular moment. They are sent to a processing system, the CEP engine, that uses knowledge in the form of rules representing cause-and-effect relations to extract information on a higher level of abstraction, providing domain experts with more insight into the current status of the business or, more generally, the environment.

“A key to understanding events is knowing what caused them – and having that causal knowledge at the time the events happen. The ability to track event causality is an essential step toward managing communication spaghetti.” (Luckham in [26] p. 10)

So far, this cause-and-effect relation needs to be modeled by the domain expert.
However, there are happenings, whether recorded events or unmonitored occurrences within the environment, for which the causes are yet unknown to the domain expert. Tracking down the causes of specific historical happenings would help domain experts gain a better understanding of their environment and ultimately lead to a better knowledge base, allowing the CEP engine to extract more valuable information for the business.

1.1 Motivation

The complexity of systems, events and interrelations between events can overwhelm a domain expert.

““Being in control” requires humans to have understandable, personalized views of enterprise activity at every level of activity. [Manually] monitoring log files of rule engine activity, as provided by many of today’s process automation tools, is not acceptable.” ([26] p. 39)

Automatic identification of these interrelations enables a better understanding of the environment and opens up the potential for better rules and, thus, better systems that react to even the most complex situations.

“However, there are some distinguishing problems in dealing with exceptions.
• We must be made aware of their presence in real-time - that is, the process is not behaving as specified.
• We must be able to find out what causes them.
The first issue should be solved by the same real-time, levelwise personalized viewing that is needed to support process evolution in the face of marketplace changes. But the second issue, finding the causes, requires new diagnostic capabilities. We need to know which subprocesses are involved in creating events that have led to an exception. [...] This kind of capability is called runtime drill-down diagnostics.” ([26] p. 41)

Even though Luckham proposes other means in his work to find the causes of such exceptions, or happenings, this work provides a different kind of tool to help domain experts identify cause-and-effect relations concerning a specific event type. Using Genetic Programming and a tree representation of a CEP rule, this thesis strives to elaborate an automatic approach that, with minimal input, derives a most appropriate rule for the given event type from historically recorded event stream data.

Several challenges arise from this goal. CEP has no standardized language specification, and even the core language constructs shared between different implementations use operators that are more complex than those of conventional rule languages or paradigms. Therefore, CEP rules impose constraints on the structure and components of the rule representation. Genetic Programming, on the other hand, needs the freedom to alter the rule representation for its operations to work and to find a most appropriate rule among all possible rules. In addition to finding a rule which most accurately describes the cause of a given event, there are further objectives to achieve within the search process of Genetic Programming. Rules should not only be accurate about the cause, but they should also be comprehensible for humans, the domain experts, to understand the cause and potentially find more valuable information in the event stream. CEP systems only cache events as long as they need them to evaluate their rule base. So-called windows dictate for every rule how far back into the past the evaluation needs to look to tell whether the rule fires or not.
To provide more insight and the most accurate form of the rule, the Genetic Programming algorithm also needs to find the smallest window for the cause-and-effect relation.

The algorithm proposed in this work addresses all of the aforementioned challenges and provides a concept for a search that satisfies all of these objectives, using Genetic Programming at its core. An implementation helps to evaluate the proposed concept and gives hints for parameter settings and further research fields.

1.2 Structure of the Thesis

The thesis starts by laying the foundation in chapter 2 on the following page and explaining the background of the three pillars this work is based on: Evolutionary Computation is the family of algorithms which use biological evolution as a model to find the best solutions to problems, Complex Event Processing is a technology for real-time processing of vast amounts of pieces of information, and Rule Learning is a field of research that applies algorithms, like Genetic Programming, to find the most appropriate rules for expert systems, like CEP. Following in chapter 3 on page 18, related works are presented and a road map is drawn that eventually leads to this thesis by combining parts of the described approaches. In chapter 4 on page 25, this thesis explains in detail the scenario and the overall approach that is followed throughout the proposed algorithm. The main part of this thesis, the Genetic Programming algorithm called CepGP, is elaborated in detail in chapter 5 on page 35. It covers every operation of Genetic Programming and also describes how CEP rules are encoded so that Genetic Programming can make the most use of them and still create only valid rules. Afterwards, this thesis presents an implementation of the proposed algorithm in chapter 6 on page 82 to enable a practical analysis of CepGP. The evaluation follows in chapter 7 on page 100. After discussing the findings, this work concludes in chapter 8 on page 113 by summarizing the contributions of this thesis and hinting at potential future research.

2 Background

Using Genetic Programming to derive a Complex Event Processing rule is mainly based on three topics which this chapter elaborates to provide the foundation for the rest of this thesis. The first topic is Evolutionary Computation as the group of algorithms to which Genetic Programming belongs. The second topic is Complex Event Processing (CEP) as a way to process massive amounts of data from different sources in real time using rules. The third topic is Rule Learning as the field which unites the worlds of search or optimization and rule-based systems like CEP.

2.1 Evolutionary Computation

The processing power of computers has long been used to solve problems which a human could not feasibly solve because of their excessive number of calculation steps and the long numbers involved. But even that power reaches its limits when confronted with even bigger problems. Optimization is about finding the best solution for a given problem. In that sense, optimization is a search for the best solution within the space of all solutions to a problem. Most commonly, this involves trying and evaluating each and every solution to the problem there is. This works only if the processing power can calculate all of that within a reasonable amount of time.
However, there are problems where this approach is not appropriate because the calculation of all solutions would take too long, even for computers. For these cases, there are algorithms which seek to find near optimal solutions within a short amount of time without processing all solutions but an efficiently and sophisticatedly selected subset of solutions. One of them is the group of the so-called Evolutionary Computation algorithms which is inspired by the ideas of the evolution theory of Charles Darwin to search through the solution space. They build generations of solutions that base on the information of the previous generation and close in to the optimal solutions with each generation. This section first briefly describes the biological principle behind Evolutionary Computation, moves on to converting this knowledge into computer scientific problem solving and concludes with a presentation of the algorithms belonging to the group of Evolutionary Computation. 4 2.1 Evolutionary Computation 2.1.1 Evolution in Biology Evolution traditionally is about adapting to changes in an environment in order to survive. These adaptations result in changes in the genetic structure of the individuals. The genetic structure of an individual includes all the information for structure, organisation, functionality and appearance. The need to adapt arises from the so-called “rule of nature” which states that only the most suitable to the environment, or in other words “fittest”, survive and are allowed to pass on their genes to the next generation to form even fitter individuals.[54] Mutation, Crossover and Selection are the evolutionary operations which drive the evolution. They are responsible for information interchange between individuals and forming the next generation. Each of them have their respective domain to contribute to the evolution. Figure 2.1 illustrates the basic concept of the biological evolution again. Figure 2.1: Basic concept of biological evolution (adapted from [54]); From a generation which needs to adapt to environmental influences (depicted as yellow arrows), a population of parents is selected based on their conformity or fitness; Mating pairs find each other and through recombination and mutation form the next population where the cycle begins anew Mutation Mutations can be seen as failures during reproduction. What at first sounds negative is indeed a necessary process to introduce new information into a population. However, since failures in reproduction mean that the information of the parents are not combined in a way that might take the best of both of them, it disrupts the process of optimization to the current environment. That is the reason why mutations usually have a very low probability and only make minor changes in the genotype – the genetic structure – of the individuals. August 28, 2016 5 2 Background Bigger changes in the population arise from adding the small changes within the individuals. If a mutation has a high impact on the genotype, it is typically supplanted because mostly it results in negative properties.([53] p. 9) Crossover Crossover is the result of combining two or more individuals. In the real world, this is the process of mating during which the sexual partners exchange parts of their genotype what results in one or more offsprings. In traditional evolution theory, crossover is not seen as an evolutionary factor because it does not introduce new information into the population. 
However, contemporary research regards the complex and close interrelation of information in the genotype. It now acknowledges that crossover can lead to new structures in the genotypes and in this way introduce new information into the population. This elevates crossover to an important evolutionary factor, way more influential to evolution than mutation.([53] p. 10) Selection The selection in a population describes the change in the frequency of specific information through differing numbers of offsprings of these information. It can be measured via the fitness and influences mainly two aspects in evolution: • Survivability, decided by their conformity to the environment (environmental selection) • Ability to find mating partners, also called sexual or mating selection In nature, there are also factors like the general ability to reproduce and mating frequencies. The fitness is an implicit measure for the quality of an individual because good individuals are more likely to reproduce more offsprings. The selection depends on the phenotype – the manifestation of the genes – and its performance regarding the challenges in the environment which is quantified by the fitness.([53] p. 10 f.) 2.1.2 Evolution for Problem Solving The history of research into evolution in biology is long reaching and filled with a lot of famous breakthroughs about life on earth. Computer science, however, is a rather young science but its list of contributions to our lives is not short, either. In the 1930s, Alan Turing invented the model of a universal computing machine - the Turing machine - and claimed that every algorithmically solvable problem can be computed with his model. From this moment on, computers began to evolve from machines meant to solve a specific problem to universal problem solvers. At the same time, there are problems which are not algorithmically solvable, like the Halting Problem, or where there is yet to find an efficient algorithm for, like the group of NP-hard problems. In these cases, instead of giving up on computing solutions 6 Norman Offel 2.1 Evolutionary Computation to these problems, there are algorithms that search through the problem space for the best solution without calculating every solution there is. Although they are not guaranteed to find the best solution, they still find at least near optimal solutions which are often suitable for solving the problem at hand. Evolutionary Computation is a family of algorithms that uses the concept of evolution, as a strategy to find the best solution for the given situation to survive, for solving problems. Evolutionary algorithms as part of the Evolutionary Computation family are simulating and simplifying the natural evolution to suit their purpose of solving problems. Biologists introduced terms for the components of evolution which are also used in computer science to describe Evolutionary Computations. [53] lists these terms and explains how they are used in Evolutionary Computations. Table 2.1 presents a subset and at some points an adapted definition of the terms and their meaning. Of course, some of these definitions vary from their biological counterparts. This is grounded in the simplifications which are needed to transfer the biological concept of evolution into the world of problem solving in computer science. 
Population: a collection of individuals
Individual: a solution candidate for a problem
Genotype: the sum of information within an individual which is evolutionarily alterable
Phenotype: the representation of the individual from the point of view of the problem domain
Mutation: a minor change in the genotype
Recombination: an operation which combines two or more individuals into a new one
Crossover: a synonym for recombination
Selection: an operation which determines the individuals contributing to the evolution into the next generation
Fitness: the quantification of the quality of an individual
Genetic code: a direct mapping (decoding) from the genotype to the phenotype

Table 2.1: Terms and definitions in Evolutionary Computation (adapted from [53] p. 15)

In natural evolution, the goal is to enable the species to survive within the given environment. In Evolutionary Computation, besides simplifications in the terms, there are also differences in the execution. For example, in biological evolution there is no verifiable property to evaluate the gain of the results of the evolutionary operations from generation to generation. The simple fact that the species is still alive is the only examinable property of the population with respect to its fitness. For problems in computer science, however, there are clearly definable goals and objectives with which an algorithm can verify the fitness of an individual.([53] p. 24)

Furthermore, the general process of an Evolutionary Computation is sequential since it is a computer program, whereas in nature individuals have varying ages and produce offspring at different times, and multiple generations of a species are present at the same time and may participate in reproduction. Another difference lies in the start of both evolutions. Arguably, the starting point of natural evolution was the Big Bang, whereas the initial population of an Evolutionary Computation needs to be built explicitly, usually at random. The starting population can already incorporate knowledge about the problem and about what information good individuals might carry.

After evaluating each individual of the starting population to determine its fitness, a cycle of steps begins, starting with the check for the termination criterion. Usually it validates whether
• there are individuals that satisfy a minimal fitness level,
• a maximum number of cycle runs has been performed, or
• there is no significant improvement over the last generations.
Otherwise, the mating selection determines for each individual how many offspring it will produce according to its fitness in comparison to the other individuals of the population. New offspring are generated via crossover and mutation. Then, they are evaluated and integrated into the parent population via the environmental selection. Mostly, the population size is limited so that either some or all parent individuals are supplanted by offspring individuals. With this, the cycle begins anew with the check for the termination criterion. Figure 2.2 illustrates this general adaptation of natural evolution by Evolutionary Computations.([53] p. 25)

Figure 2.2: General Evolutionary Computation algorithm (adapted from [53] p. 25)

The main challenges in creating an Evolutionary Computation are to
1. abstract the problem into a representation suitable for an Evolutionary Computation and to
2. define a suitable fitness function which can guide the search process.

That no further information is necessary is one of the reasons Evolutionary Computations enjoy broad usage across various research fields for finding solutions to hard problems.

2.1.3 The Evolutionary Computation Family

Evolutionary Computation consists of mainly four schools of thought: evolutionary strategies, evolutionary programming, genetic algorithms and genetic programming. All of them build on evolution as their inspiration and share a lot of similarities. However, especially at the start of this field of research, they had different purposes and goals. These goals became wider, and since the schools adopted ideas from one another they also became more and more alike. Still, there are differences, mostly in their choice of genotype and their choice of operations and parameters. [53] (p. 44) created a time line of the history and the most important scientific conferences, which can be seen in adapted form in figure 2.3 on the following page.

Evolutionary Strategies: Bienert and Rechenberg and later Schwefel founded the Evolutionary Strategies as part of Evolutionary Computation.([53] p. 44) “Evolution Strategies (ES) imitate, in contrast to the genetic algorithms, the effects of genetic procedures on the phenotype.”[43] Evolution strategies conventionally use real numbers to represent the problem space. Furthermore, evolutionary strategies use random selection of parents and deterministic selection of the n fittest individuals during the environmental selection.[54]

Evolutionary Programming: Founded by Lawrence J. Fogel et al. (1965), evolutionary programming was introduced with finite automata to predict time series. Later, in the 1980s, David B. Fogel replaced the finite automata with neural networks.([53] p. 45) Evolutionary Programming does not use crossover and introduced the nowadays widely used tournament selection.[54]

Genetic Algorithms: The main contributors to Genetic Algorithms are Holland, De Jong and Goldberg.[54]([53] p. 45) Traditionally, problems are represented as binary bitstrings. Mutation flips single bits and crossover mixes the bits of the parents with varying strategies.[54]

Genetic Programming: Genetic Programming, founded by Koza[23] and derived from the genetic algorithms, uses dynamic representations like trees and is often used to create computer programs and similar phenotypes. Chapter 4 on page 25 explains Genetic Programming and its techniques in more detail.

Figure 2.3: History of Evolutionary Computation (adapted from [53] p. 44); The scientific conferences in their respective fields of research are the International Conference on Genetic Algorithms (ICGA), the Parallel Problem Solving from Nature (PPSN) and the Evolutionary Programming (EP); The conferences consolidating the fields are the Genetic and Evolutionary Computation Conference (GECCO) and the Congress on Evolutionary Computation (CEC)

New Concepts: Especially around the millennium, a lot of new concepts emerged which used analogies from nature as problem solving algorithms. They are called ant colony optimization, particle swarm optimization, differential evolution and many more.([53] p. 45)
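All of these schools instantiate the same generic loop from figure 2.2: evaluate the population, select parents according to fitness, create offspring via crossover and mutation, and repeat until a termination criterion is met. The following minimal Python sketch is added here purely for illustration; the helper functions random_individual, fitness, tournament_select, crossover and mutate stand in for the problem-specific parts and are assumptions of this example, not components described in this thesis.

```python
import random

def evolve(pop_size, generations, crossover_rate, mutation_rate):
    # Initialization: build a random starting population (problem-specific).
    population = [random_individual() for _ in range(pop_size)]
    best_fitness, best = float("-inf"), None
    for _ in range(generations):
        # Evaluation: quantify the quality of every individual.
        scored = [(fitness(ind), ind) for ind in population]
        generation_best = max(scored, key=lambda pair: pair[0])
        if generation_best[0] > best_fitness:
            best_fitness, best = generation_best
        # Termination criterion: stop early once a fitness threshold is reached.
        if best_fitness >= 1.0:
            break
        offspring = []
        while len(offspring) < pop_size:
            # Mating selection: fitter individuals are more likely to reproduce.
            parent_a = tournament_select(scored)
            parent_b = tournament_select(scored)
            # Crossover and mutation create the next generation.
            child = crossover(parent_a, parent_b) if random.random() < crossover_rate else parent_a
            if random.random() < mutation_rate:
                child = mutate(child)
            offspring.append(child)
        # Environmental selection: here the offspring simply replace the parents.
        population = offspring
    return best
```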
2.2 Complex Event Processing

Complex Event Processing (CEP) is based on the processing of events, which are “anything that happens, or is contemplated as happening.”[27] These events represent all kinds of measurable or at least observable things in the monitored environment and are often emitted by agents acting as observers of the environment. In contrast to information already present and stored in databases, CEP seeks to process the information carried by events as soon as they come in, enabling real-time processing. CEP uses these usually low-level, primitive events to detect more complex events by correlating them according to rules. These rules are used to infer new information by combining the information of primitive events into complex events. The complex event carries higher-level information and can itself be used to infer new information at an even higher level.([4] p. 3 f.) Figure 2.4 illustrates the combination of primitive events forming a certain pattern (represented in CEP as a rule) into a complex event in a higher abstraction layer. The pattern defines a causal relation between the primitive and the complex event.

Figure 2.4: Abstraction of events into higher level complex events (adapted from [4] p. 5)

Since events represent anything that might happen in an environment, processing and reacting to these events in real time can represent any kind of real-world process. CEP is therefore an event-driven system which works cyclically in three basic steps, shown in figure 2.5 on the next page. During the detection step, the event-driven system detects events in the environment in real time. These events are then processed by aggregating events from different sources and by matching patterns. If those patterns are matched, the last step reacts to them by calling distributed services in real time.[26]

Figure 2.5: Cycle of event-driven systems (detect, process, react)

2.2.1 Terminology

To provide a common understanding of the terms used throughout this thesis, this section gives definitions taken from [27].

Complex-event processing (CEP): Computing that performs operations on complex events, including reading, creating, transforming, abstracting, or discarding them.

Event: Anything that happens, or is contemplated as happening. Events are defined by a type which also defines the information they carry, represented as attributes.

Simple event: An event that is not viewed as summarizing, representing, or denoting a set of other events.

Derived event: An event that is generated as a result of applying a method or process to one or more other events.

Rule (in event processing): A prescribed method for processing events. Bruns and Dunkel add in [4] (p. 12) that rules describe the actions that are to be executed after a pattern was matched. The condition part of the rule is the pattern to be matched and the action part defines the reaction to these event patterns.

Window (in event processing): A bounded segment of an event stream. Since events occur as unbounded streams, it is important to be able to define the segment of the stream that is cached for later investigation.

Relationships between events: Events are related by time, causality, abstraction and other relationships. Time and causality impose partial orderings upon events.
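As a small illustration of the window concept defined above, the following Python sketch keeps a sliding time window over a stream of timestamped events. It is a minimal sketch for this section only; the class name, its interface and the seconds-based timestamps are assumptions made for the example and do not come from this thesis or any particular CEP engine.

```python
from collections import deque

class TimeWindow:
    """Caches only the events whose timestamps lie within the last `size` time units."""

    def __init__(self, size):
        self.size = size          # window length in seconds, e.g. 120 for a two-minute window
        self.events = deque()     # (timestamp, event) pairs, oldest first

    def insert(self, timestamp, event):
        self.events.append((timestamp, event))
        # Evict everything that has fallen out of the bounded stream segment.
        while self.events and timestamp - self.events[0][0] > self.size:
            self.events.popleft()

    def contents(self):
        return [event for _, event in self.events]

# A length window would instead keep only the x most recent events regardless of
# their timestamps, e.g. by using deque(maxlen=x); both window types are described
# in the language section that follows.
```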
2.2.2 Language

According to [26] (p. 146 f.), the requirements of a language which can represent these causalities in the form of rules are: power of expression, notational simplicity, precise semantics, and scalable pattern matching. This thesis chooses the syntax and event pattern specification from the work of Bruns and Dunkel [4] (p. 21 ff.) as represented in table 2.2.

Sequence (→): Represents a timely order of its operands. Example: A → B means that the pattern matches if an event of type A is followed by an event of type B. Other events of different types are allowed to occur in between. The operands can also be other patterns. In this case, the sequence operator defines that the first operand needs to be ultimately matched before the second one (decisive is the time of final matching).

Boolean (∧, ∨, ¬): Defines the occurrence (or absence) of events of specific types within the considered stream segment, where their order of occurrence does not matter. The ∧-operator demands both operands to be present, whereas the ¬-operator only matches if the event type is absent. The ∨-operator matches as long as at least one of the operands is present. In the simplest form, the operands are boolean values stating whether the defined event types are present (or absent). When they combine other patterns, they perform their respective comparison according to the boolean results of their operands.

Excluding sequence (A → ¬B → C): Defines a special kind of sequence where an event type is prohibited from occurring in between two other event types. This is semantically different from a combination of two sequences with a ¬-operator as the middle operand: A → ((¬B) → C). There it is sufficient to have at least one event of a type other than B between the occurrences of an A followed by a C in the stream. If the operands are also patterns instead of event types, this operator works similar to the sequence operator. The middle pattern is prohibited from matching between the matching of the first and the last pattern.

Table 2.2: Event pattern specification taken from [4] (p. 21 ff.)

Bruns and Dunkel also propose context conditions which take into account the attributes of the events that matched the event patterns, displayed in table 2.3 on the next page. In theory, it is possible to consider every event that has occurred since the CEP system started running. However, this would mean that the system needs to cache a possibly unlimited number of events to evaluate its rules. That is why CEP systems allow rules to specify the maximum segment of events that needs to be cached for them via windows.

Alias: In order to link the context conditions with the event pattern, each event type in the event pattern (also called event condition) is assigned an alias, which is a unique reference used in the context conditions (also called attribute conditions) to access attributes of the event referenced by that alias. For assigning an alias the keyword as is used. Example: (A as aliasA) → (B as aliasB)

Access-operator (.): Within the context conditions (attribute conditions), attributes of the referenced events can be accessed via the alias and the access-operator (.). Example: (A as aliasA → B as aliasB) ∧ (aliasA.attribute1 = 0). This example already shows how the event and attribute conditions are connected.
Both are linked with an ∧-operator and use references to show the relations between the events in the event condition and the attribute comparison in the attribute condition.

Operators (+, −, /, ∗, <, >, ≤, ≥, =, ≠, . . . ): Context or attribute conditions need operators that combine the values they access via aliases and the access-operator. Some values may form new values (for example through addition or multiplication) or they are compared (for example whether one is less than or equal to the other). These operators depend on the kind of values that are processed (numeric or categorical types, for example) and their meaning depends on their domain (relations between categories, for example).

Table 2.3: Context condition specification according to [4] (p. 22 f.)

Sliding windows adapt the events within this window specification every time a new event is received. CEP systems know two types of windows that differ in how they define their segments. The length window always considers a static number of past events, no matter their time differences. The time window uses the time differences from the newest event to the past events to determine which events are part of the segment. Bruns and Dunkel use the syntax win:length:x for length windows and win:time:yz for time windows, where y is a number and z is a time unit. The window is attached to the rule within [ ]-brackets: (A → B)[win:time:2min].([4] p. 23 f.)

Another important concept in CEP systems is aggregation. In combination with windows, aggregate functions can conglomerate attribute information from the events within that window.([4] p. 24 f.) Bruns and Dunkel name four general aggregate functions in table 2.4.

sum: calculates the sum of an attribute over all events within the window
avg: builds the average value of an attribute over all events within the window
min, max: finds the minimum and maximum of an attribute value over all events in the window

Table 2.4: Aggregate functions according to [4] p. 25

An example with an aggregate function is ItemSoldEvent.max(price)[win:length:100] as highestPricedEvent. These aggregate functions as well as the aforementioned attribute operators are focused on numeric attribute values, which will also be the target attribute value type of this thesis. Nevertheless, CEP systems in general offer more functions and value types than are represented in this section.
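Taken together, these constructs already allow compact rules. As a purely illustrative example in the notation of Bruns and Dunkel (the event types and attribute names are invented here and do not refer to any specific system), a pattern combining a sequence, an alias-based attribute condition and a time window could look like this:

((OrderCreatedEvent as o) → (PaymentFailedEvent as p)) ∧ (o.amount ≥ 100) [win:time:5min]

It reads as: within the last five minutes, an OrderCreatedEvent with an amount of at least 100 was followed by a PaymentFailedEvent. The event condition is the sequence of the two aliased event types, the attribute condition is the comparison on o.amount, and the window bounds the stream segment over which the rule is evaluated.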
2.3 Rule Learning

Rule Learning is a form of classification where solutions to a problem are categorized, or classified, according to whether they lead to the desired outcome or not. This section gives a basic background to this field of research and shows how the goal of this thesis fits into it. Urbanowicz and Moore give an introduction to learning classifier systems (LCS) in [51], as do Sigaud and Wilson in [46]. Learning classifier systems seek to find a rule set (population of classifiers), instead of a single rule, that best describes the system. “The desired outcome of running the LCS algorithm is for those classifiers to collectively model an intelligent decision maker.”([51] p. 2) Figure 2.6 on the next page shows the origin of these systems, coming from the already presented Evolutionary Computation and from machine learning, where the goal is to learn via an improvement in the performance or solutions obtained.

Figure 2.6: Origin of LCS (adapted from [51])

In contrast to this thesis, not only learning classifier systems but Rule Learning in general is often concerned with finding rule sets instead of one rule. Nevertheless, in its simplest form, rule learning covers this special case as well. Holland already stated in the eighties that Rule Learning and expert systems like CEP go well together:

“Expert systems are one of Artificial Intelligence’s real successes in the exploration of intelligence. In the longer view, the most significant part of this success may be the bright illumination it throws upon questions of system versatility. [. . . ] Classifier systems are general-purpose programming systems designed as an attempt to meet the criteria [combination, parallelism, declarative and procedural information, categorization, synchronic and diachronic pointing, gracefulness, and confirmation]. Classifiers have many affinities to the rule-based (production) systems underpinning the usual approach to Expert Systems.”[18]

In expert systems, rules are also often used to infer new rules and thereby create new knowledge as links between existing pieces of knowledge. This process is called reasoning and can be defined as follows:

“Reasoning, which has a long tradition that springs from philosophy and logic, places emphasis on the process of drawing inferences (conclusions) from some initial information (premises). In standard logic, an inference is deductive if the truth of the premises guarantees the truth of the conclusion by virtue of the argument form. If the truth of the premises renders the truth of the conclusion more credible but does not bestow certainty, the inference is called inductive.”([19] p. 2)

Even though reasoning can be performed to find new insights into the domain the expert system is working in, it is not performed by the algorithm used in this thesis. In general, besides reasoning, which builds on existing knowledge and rules, there are two ways of learning rules:

Supervised learning is done by comparing the results of an algorithm with the known correct answers. There is a so-called training set which includes input and desired output. The algorithm shall learn the rule from the input that leads to the correct output, to give hints about yet unknown knowledge. The results from the training set are then tested on the test set to evaluate how well they do with other data. This is often done to see if the result was specialized too much on the training set, which is also called over-fitting.[9]

Unsupervised learning is done without knowing beforehand what the correct outcome should be. The algorithm is given minimal information for learning new knowledge from the input. Examples are learning a given number of groups (clusters) of pieces of information that belong together by some relation or, given a few known instances, finding similar pieces of information in the input.[9]

This thesis uses a supervised learning algorithm which learns rules that correspond to a known outcome from the recorded input. Unlike LCS, it does not strive to find a rule set explaining the domain but rather a single rule explaining the cause of a certain event type. Reasoning cannot always help in these situations, and therefore this thesis builds upon Genetic Programming as a tool for efficiently finding a near optimal rule.
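To make the supervised setting concrete for the rule-learning case pursued here, the following Python sketch evaluates a single candidate rule against labeled historical data and compares training and test accuracy. It is an illustration under stated assumptions only: rule_fires, the structure of the labeled traces and the splitting ratio are placeholders introduced for this example and are not part of the algorithm or implementation described later in this thesis.

```python
def evaluate_candidate(rule_fires, labeled_traces, train_ratio=0.7):
    """Supervised check of one candidate rule.

    labeled_traces: list of (trace, expected) pairs, where `expected` is True
    if the target complex event should be raised for that trace.
    rule_fires:     function deciding whether the candidate rule fires on a trace.
    """
    split = int(len(labeled_traces) * train_ratio)
    training, test = labeled_traces[:split], labeled_traces[split:]

    def accuracy(examples):
        correct = sum(1 for trace, expected in examples
                      if rule_fires(trace) == expected)
        return correct / len(examples) if examples else 0.0

    train_acc, test_acc = accuracy(training), accuracy(test)
    # A large gap between the two scores hints at over-fitting: the rule is
    # specialized to the training set instead of capturing the hidden cause.
    return train_acc, test_acc
```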
August 28, 2016 17 3 Related Work This work concentrates on applying Genetic Programming on Complex Event Processing to derive meaningful rules for a given complex event. In this regard, this chapter creates the links between the presented background fields and gives an overview over scientific works closely related to the application as done in this thesis. It starts with an introduction to Evolutionary Computation applied in the field of Rule Learning, examines the application of Genetic Programming to Rule Learning and afterwards looks into proposed Rule Learning algorithms within the context of Complex Event Processing. 3.1 Evolutionary Computation in Rule Learning As presented in section 2.3 on page 15, Rule Learning has a long history with varying algorithms and techniques. Evolutionary Computation started by optimizing parameter sets and models in engineering. But soon after, pioneers like David E. Goldberg started to apply this concept on Rule Learning [15] with remarkable results. This lead to more research in the combination of Evolutionary Computation and Rule Learning in the following decades. This is a selection of a vast amount of research done in the combination of Evolutionary Computation in Rule Learning with a focus on the developments over the years and specific aspects which this thesis uses or considers at some points. Johnson and Feyock provide a general algorithm for the acquisition of expert system rule bases using a genetic algorithm.[22] Their main task is to abstract the binary encoding of genetic algorithms to incorporate rules in a rule grammar. Instead of using rules as inviolable parts of the rule base, Johnson and Feyock proposed to alter rules in evolutionary operations to construct new rules and build the rule base. Grefenstette et al. [17] use a genetic algorithm to learn a tactical plan consisting of a set of decision rules based on flight simulator data. Instead of the binary string representation of a rule which Goldberg uses, they base their optimization on a high level representation of a rule in the form of if-then-constructs where the operators are also high level in the form of and c1 . . . cn . Additionally, they have different kinds of measurements with different representations where one of them also uses a tree structure. Although, the tree structure is not used to represent the rule itself but to decide a subset of possible values which is valid for a rule. Similar to this is the application of genetic algorithms to rule learning on unmanned aerial vehicles in [30] by Marin et al. The rules shall guide the vehicle to follow enemy activities on 18 3.2 Genetic Programming in Rule Learning the terrain. It is not a route planning problem, but the rules are used so the vehicle can react and change its course based on the current information of the enemy movement. It even modifies the rules based on previous information. In that sense, they improved the previous approaches with the adaptation of rules to recent information. Huang uses a genetic algorithm in [20] to learn control actions for combustion control which are rules that are applied to certain situations. In this work, the author attempts to only learn prototype rules with the genetic algorithm and does not attempt to learn a rule for every situation there can be. In order to apply control actions, a nearest neighbor approach is applied to find the rule most appropriate to the current situation. 
The rule encoding used is a classical binary string as is most common in genetic algorithms which encodes the control action and a percental increment as to how strong the action is to be performed in the context of the example used. The advantages of the approach used in this work are that prototype rules which can be optimized through the genetic algorithm without an expert and it could deal with noise in the data. A well-known application of genetic algorithms to rule learning is GARP (Genetic Algorithm for Rule-set Production)[48] which applies it to the problem of species distribution modelling while taking various information about other species, climate, geographical profiles and so on into consideration. GARP uses a type-safe approach and allows crossover only on values or ranges of values of variables. Mutation is used to introduce new values to variables in the population. The implementation of Janikow in [21] proposes a task-specific genetic algorithm for supervised learning. It uses a grammar representation of rules and specialised operators to alter rules or rule sets which, until then, was still rarely done. Their research also considered shifting from transferring the problem into the domain of the genetic algorithm – usually binary strings – to adapting the evolutionary operators to fit the problem domain. This allows for a faster convergence and better results but also needs a great deal of conceptual work to use task or domain specific knowledge within the evolutionary operators. Janikow also considered invalid or empty rules which are rules that do not produce positive outcomes at all. Spears and De Jong [47] proposed to keep both for the sake of future fitter rules since even the empty or invalid rules inherited information of the best individuals. Janikow, however, argues that there is a conflict between retaining these rules for future possibly better outcomes and the predictive accuracy of the system. Therefore, Janikow decides to remove these rules from the solutions. 3.2 Genetic Programming in Rule Learning So far, Evolutionary Computation in Rule Learning always involved genetic algorithms. That is because a lot of the presented work was done before Genetic Programming was proposed. Since then, some researchers took on the challenge of applying Genetic Programming to Rule Learning. August 28, 2016 19 3 Related Work The presented ways of applying Genetic Programming to Rule Learning have found a number of applications in varying fields. Most of the time, finding rules for given data sets was not the only goal which leads to this section briefly describing works which enhanced the Genetic Programming to do more. Bojarczuk et al. use Genetic Programming to discover rules in the medical domain for diagnosing pathologies.[3] Despite finding accurate rules, their focus is on finding comprehensible knowledge while applying data mining algorithms. Bojarczuk et al. preferred Genetic Programming over genetic algorithms because it allows “a more open-ended, autonomous search for logical combinations of predicting attribute values” and used that advantage to form tree structured individuals to symbolize first-order logic. Despite a small data set, they achieved promising results with their approach. Tay and Ho propose a genetic programming algorithm to solve the flexible job-shop problem with multiple objectives. [49] The flexible job-shop problem is a more complex variant of the well-known job scheduling problem (JSP) as a NP-hard problem. 
In their work, the authors describe three simultaneous optimization goals for the makespan, the mean tardiness and the mean flow time, each with a different fitness function. They conglomerate the fitness values by calculating the average fitness value of all of them. Genetic Programming is often used in the stock market domain to find trading rules.[40, 7, 36, 38, 52, 37, 5, 28, 56, 25] With the later works, the proposed algorithms achieved an overall better result than traditional algorithms and eventually, the “statistical results confirm that the GP based trading rules generate a positive return for the trader under all market conditions (whether rising or falling).”[28] The rule representations in these works include relational and logical operators where types are used to validate the trees.[56] Automatically Defined Functions (ADFs) are used to distinguish between operator types and to simplify the rules.[56, 52]. In [56], Yu et. al allowed crossover only between modules of the same kind to enable valid only individuals in the populations. They also introduced length windows to determine aggregate functions like average, minimum or maximum. In [11], Freitas developed a Genetic Programming framework as a data mining system and chooses SQL queries as the rule representation for evaluation purposes whereas a tree structure is chosen to represent the query in the Genetic Programm algorithm. He adapted the algorithm so that the goal-attribute for the optimization is user-defined and fixed during the process for the classification problem. De Falco et al. propose a similar Genetic Programming framework which concerns itself with finding automatic class discoveries in comprehensible rules.[6] They use a tree structure to represent the rules and combine logical with relational operators. During the construction, restrictions are imposed on the operators. The logical operators are allowed until a relational operator is build. The relational operator’s first operand is always an attribute whereas the second operand can either be another attribute or a constant with equal probability. During the evolutionary process, a depth limit is enforced where offsprings exceeding this limit are replaced by one of the parents. When mutating a rule, their algorithm uses a point mutation for leaf nodes and a tree mutation for intermediate nodes in the tree. Here, the depth limit 20 Norman Offel 3.3 Optimization and Rule Learning in Complex Event Processing is also enforced and when the new rule exceeds the limit, the original subtree is used again. De Falco et al. determine the fitness by combining the result with the simpleness of the rule. It seems that, after research on applying Evolutionary Computation on Rule Learning lead to promising results, researchers used Genetic Programming to have more influence on the rule structure during the evolutionary process. Therefore, combining the robust search for accurate rules with the objective to result in the most simple versions of them, was the next logical step. It is a necessity because the rules need to be understood by human users mostly and therefore the rules need to be as intuitive as possible. 3.3 Optimization and Rule Learning in Complex Event Processing Optimization in the context of Complex Event Processing is most often used to improve query execution. However, recently more and more research is done in automatically deriving CEP rules. 3.3.1 Improving CEP performance Ding et al. 
propose in [8] an optimization for Complex Event Processing over large volumes of business transaction streams which evaluates whether the rule is likely to fire and stop the execution if that is not the case. Gao et al. address in [14] the use of Complex Event Services which capsule single CEP-instances and introduce quality of service awareness with a genetic algorithm. In [24], Liu et al. use a B+-tree-based approach to optimize the processing performance of RFID-events with the time window constraint and by using pruning in intermediate query results. Rabinovich et al. optimize in [42] the processing of events with rewriting techniques for complex pattern types. Zhang et al. [57] also proposed strategies for query execution to optimize performance and resource consumption. These are only a brief selection of works that use optimization to improve the overall performance of event stream processing as done in CEP. In their book ([2] p. 38ff.), Atzmüller et al. describe some other works that use optimization techniques like Bayesian networks to predict event streams, Artificial Neural Networks to detect network breaches via two expert systems, and other optimization techniques to analyze the event stream. Although, none of them is related to Rule Learning. 3.3.2 Learning CEP rules There are only few works which are concerned with learning rules in CEP. With their proposed framework Fossa, Frömmgen et al. [13] apply Genetic Programming to learn Event Condition August 28, 2016 21 3 Related Work Action (ECA) rules for a given utility function in an adaptive distributed setting. The utility function quantifies developer-defined performance metrics, such as average throughput, maximum latency and so on. The learned rule expresses how the distributed systems need to adapt depending on the situation to reach the goal defined by the utility function. This work uses event processing at the core to process events from different systems, but does not rely on Complex Event Processing specifically. Timeweaver, proposed by Weiss and Hirsh in [55], predicts rare events from event sequences with categorical (non-numerical) features. The authors use a genetic algorithm to find rules which are represented by a self-defined grammar. As in this thesis, timeweaver aims to find a rule for a given event according to recorded data. But in contrast to this thesis, Weiss and Hirsh use a different approach with a genetic algorithm and a grammar representation. Although timeweaver also optimizes the window for events, it does not use the operators from Complex Event Processing, but their self-defined ones with concepts not found in general Complex Event Processing languages. Additionally, the authors leave out multiple attributes per event and that these attributes may be numerical instead of categorical. Turchin et al. propose in [50] a tuning of rule parameters where the domain expert writes a general rule and their algorithm chooses the suitable parameter values. This varies from the approach of this thesis, since the goal of this thesis is to derive the whole rule for a given event type in a recorded stream. Sen et al. propose in [45] “a recommendation based pattern generation”. In their proposal they use existing rules and domain expert input to further derive rules for recommendation. 
The difference in their work to this thesis is that they rely on user input for a part of the condition and to derive possibly interesting rules with this condition part for the user whereas this thesis strives to find all the parts of a rule on its own that lead to a specific event. In [35], Mutschler and Philippsen present their apporach on automated CEP Rule Learning with a noise Hidden Markov Model (nHMM). Their approach also focuses on a single complex event for which the algorithm shall find an appropriate rule based on historical recorded streams, like this thesis. Unlike this thesis, they use a different base algorithm with the nHMM. Mehdiyev et al. propose in [31] “a machine learning model to replace the manual identification of rule patterns.” They use a preprocessing stage for feature selection and construction and afterwards apply “various rule-based machine learning algorithms to detect complex events”. They also see a lack of research in automatically detecting CEP rules in the event streams and they analyze the suitability of different rule-based classifiers for this problem: One-R, RIPPER, PART, DTNB, Ridor and NNGE. All classifieres except One-R exceeded an accuracy of 90% in their tests where One-R still managed to have about 80% accuracy. This shows that the application of rule-based classifiers to Complex Event Processing is promising and worth further investigation and research. The proposed algorithms classify all events in the streams into classes. However, this thesis presents an Genetic Programming algorithm to find a possible cause for a given event in the event stream. 22 Norman Offel 3.3 Optimization and Rule Learning in Complex Event Processing Mousheimish et al. present in [33, 34] autoCEP, “a data mining-based approach that automatically learns predictive CEP rules from historical traces”. The ultimate goal of their work is to shift the focus of CEP from detection to prediction of upcoming situations with as few human needed interactions as possible. Instead of complete rules, their goal is to learn so-called shapelets which are pattern of minimum possible length that can classify the data strikingly. After their algorithm attained these shapelets, they transform them into CEP rules in a second stage (see figure 3.1). Mousheimish et al. also introduce time series pattern mining techniques. Until now, their implementation of autoCEP is limited to one attribute per event only, but the authors are currently working on further extending their implementation. For their shapelet learning, which results in CEP rules later, they currently use a brute-force shapelet extraction algorithm ([34]) and no optimizing algorithm is used. Unlike this thesis, Mousheimish et al. strive to learn a rule set for all kinds of classes to predict their future presence. Figure 3.1: Run-Time prediction with autoCEP from [33] Margara et al. defined their proposal iCEP as “a novel framework that learns from historical traces, the hidden causality between the received events and the situations to detect, and uses them to automatically generate CEP rules”.[29] The architecture as depicted in figure 3.2 on the following page consists of several subsystems, each learning different parts of a CEP rule. Their approach aims at learning one rule at a time. They use positive and negative traces during the evaluation to signalize when the specific complex event occurred and when it did not occur in the historical traces. Their goal then is to find a rule which results in these exact same traces. 
The algorithm works as follows: 1. Starting with the postive traces, the event/attribute learner identifies the relevant event types and attributes. 2. The window learner finds the minimal window that includes all relevant events. The August 28, 2016 23 3 Related Work results of 1. and 2. are sent back and forth until no further improvement could be made. 3. The result of 1. and 2. is handed over to the constraint learner that selects the concrete events according to their attribute values. 4. Concurrently to 3., the aggregate learner finds out whether there is an aggregate constraint and how it needs to be specified according the information from 1. and 2. 5. After 3. and 4., the parameter learner finds parameters which bind the values of attributes from the identified events together. 6. Concurrently, the sequence learner discovers ordering constraints within the events. 7. Finally, the negation learner uses the negative traces to find negation constraints about which specific information needs to be prohibited. As can be seen in figure 3.2, the algorithm is executed in a pipeline where some parts can run concurrently whereas other parts depend on the termination of previously executed parts in the pipeline. Figure 3.2: Architecture of iCEP from [29] The algorithm also includes all general language concepts of Complex Event Processing. To obtain the most general rule, the proposed algorithm intersects every correct rule it could find and keeps only the common properties of the rules. In their work, Margara et al. propose a machine learning algorithm as a future work for further research. This thesis is focused on a Genetic Programming algorithm as a machine learning algorithm to find a most appropriate rule for a given event in a given stream, just like they proposed for future work. Naturally, this algorithm works differently from iCEP but seeks to achieve the same goals. 24 Norman Offel 4 General Approach In Complex Event Processing (CEP), the rules are usually written by domain experts with knowledge about the field in which the Complex Event Processing system is operating. Although there are proposals for rule-based expert systems, like CEP, to learn the rules and thus the domain knowledge on their own or at least with minimal aid of a domain expert, sometimes there also are happenings, meaning events in a broader concept, like failures, errors or unusual behaviours, which are rare and for which the domain expert or a self-learning system does not know the cause. The purpose of this thesis is to provide an algorithm to derive a rule out of all the recorded data during the time of these rare happenings which may give hints to the cause of these interesting but hard to analyze happenings. However, it is important to note that the rule provided by the algorithm may lead to the happening taking place, but that does not mean that the happening truly has its origin in that rule. It may be entirely possible that there are totally different reasons for it to happen. Nevertheless, the rule may still provide hints for the cause and in that sense help the domain expert to find the rule she is looking for. This chapter will first provide an example of the idea pursued in this work. Afterwards, it explains how Genetic Programming is applied to Rule Learning and concludes by presenting the foundations of the evolutionary operations in Genetic Programming which are later adapted to be used for CEP rules. 
4.1 The scenario Assuming CEP is installed and operating in the environment of interest where the happening takes place and a domain expert is watching over the results of the CEP system and the environment. Occasionally, something happens that piques the interest of the domain expert. She can point out when it happened but does not know why. However, the recordings of the events in that environment may uncover what lead to this happening. The recordings of the events are gathered in one log where each event consists, typical for CEP, of • a timestamp of its occurrence • an event type 25 4 General Approach • optional attributes as a key-value pair where the attribute name is the key mapping to its respective value Every event of the same event type has the same amount of attributes with the same names and types. For the proposed algorithm, it does not matter whether the happening can be related to a specific event type or not, as long as its occurrences can be injected into the log as accurate as possible. Only a unique event type in the record is necessary and each time the happening takes place, an event of this type is placed into the record. No timestamp, no attributes, no further information is needed for these specific events, whether manually inserted or automatically captured. After the domain expert obtained the enriched log, she passes it to the algorithm. CepGP will use Genetic Programming to search for a rule within the events which most accurately describes a pattern that leads to the happening marked by the special event type. Since the purpose of the algorithm is to provide hints to the cause of the happening based on the events, it should not only produce accurate but also simple and easily understandable rules. It should also adapt all of the concepts from CEP, related to event types and their attributes, windows, and other constraints like ordering or combinations according to the operators presented in 2.2 on page 11. Another goal of the algorithm should be to produce rules with the minimal window to provide the domain expert with the most narrowed down area in the events which enables her to better analyze the results and the event records and possibly grasp the cause of the happening. Figure 4.1 on the next page illustrates the scenario again. The only requirement of the algorithm shall be a record (or log) of the occurred events within the environment including the occurrences of the happening marked by a unique event type. The objectives of the algorithm are foremost a most accurate and also appropriate rule, meaning that not only should it pinpoint the necessary conditions for the happening with all the means of general CEP languages, but it also shall provide a simple output which is easily interpretable by the domain expert and which minimizes the regions within the log that are of interest to understand the cause of the happening. At best, the algorithm shall also show how much each of these goals could be achieved. As presented in the related works in chapter 3 on page 18, there are already some approaches to learning rules in Complex Event Processing. However, most of them try to classify all events or seek to improve existing template events. The approach of this thesis strives to help to identify the origin of a given special and normally rare event in the stream of Complex Event Processing with no other information than the name of the special event and a record of the stream around the time the special events occur. 
The information that need to be processed to get such a rule, and the possible rules in general, construct a large problem space that cannot be solved via brute-force searches over all possible solutions. Therefore, the proposed algorithm, CepGP, uses a Genetic Programming algorithm to search the optimal rule that leads to the marked special event from the information in the recorded stream. The 26 Norman Offel 4.1 The scenario Figure 4.1: Scenario of this Thesis; Sensors emit Events with the values they detected during their observation of the environment to the CEP-engine; The CEP-engine writes these Events with the needed information into a Log; If the Happening could not be observed by the sensors but by the domain expert alone, then the Log needs to be enriched with the events representing the Happening via a unique event type (here HappeningEvent) at the respective positions in the captured stream, the unique name is sufficient; The Log containing the HappeningEvents is now used to start CepGP to find a rule which might give hints to the cause of the Happening from the recorded event stream (here: the found rule expresses the occurrence of an alarm for very high temperature in a room and within 2min after that a following smoke alarm of a sensor in the same room, therefore, the Happening might be a fire) August 28, 2016 27 4 General Approach next section generally describes how Genetic Programming can be applied to Rule Learning and afterwards this chapter presents the general idea behind Genetic Programming. 4.2 Applying Genetic Programming to Rule Learning As described in 2.3 on page 15, Rule Learning is a classification problem at the core and Pedro, Ventura, and Herrera give an overview over the field of Genetic Programming in Classification problems in [10]. Figure 4.2: Applications of GP in classification tasks (from [10]) They summarize the applications of Genetic Programming in this field of research in figure 4.2 with the three groups of • Preprocessing which concerns itself with transforming the original data to enhance its utility for the Genetic Programming algorithm. Feature selection obtains the relevant attributes and optionally weighs them according to their importance. Feature construction creates new predicting attributes as a combination of present ones. • Model extraction is the actual classifier induction where the most suitable classifier for a given outcome is searched. The model is the representation of the classifiers and ranges from decision trees over classification rules, discriminant functions and more. • Ensemble classification which is used to find a group of classifiers to deal “with different patterns or aspects of a pattern embedded in the whole range of data, and then through ensembling, these different patterns or aspects are incorporated into a final prediction.”[10] Out of these groups, this thesis is most concerned with the model extraction which is depicted in figure 4.3 on the next page. As mentioned before, there is a variety of models to choose from to represent the structure of the classifiers, each of these models shown in the figure are more suitable to the shown classification goals. 
This thesis aims to derive rules in the context of Complex Event Processing and therefore chooses to further investigate the rule classification.[10] 28 Norman Offel 4.2 Applying Genetic Programming to Rule Learning Figure 4.3: Model extraction with GP according to [10] In rule classification, the algorithms are generally distinguished by whether they are more suitable to differ between two classes or more than two. The binary classification is done by encoding a rule as an individual of a population. One rule is sufficient to decide both classes, the data that fulfills the rule is one and the other class consists of the individuals failing the condition of the rule. The best individual is the final result of the Genetic Programming algorithm.[10] The binary classification is the underlying concept of CepGP, the algorithm proposed by this thesis. There are two properties a Genetic Programming algorithm has to have to work properly: Sufficiency and Closure. Sufficiency is the property of having all means to fully represent all possible solutions to the problem via the given functions and terminals. Closure is the property of having only functions that are able to process all possible inputs they might receive.[10] This property consists of two sub-properties: type consistency and evaluation safety.([39] p. 21) Since Genetic Programming is based on the evolutionary operations, like crossover which combines arbitrary parts of the individuals to form new solutions, type consistency ensures that the new solution can be evaluated. The operators within the solution (or individual) need to be able to process the result of their operands, whatever it is. Closure can be hard to fulfill when there are different types and operators requiring specific subsets of all types as their input and otherwise will not work properly. In these cases, one solution is the use of Strong-typed Genetic Programs (STGP).([12] p. 146ff.)[32]([23] p. 479ff.) Here, the population initialization and the evolutionary operations like crossover and mutation are altered in a way that they only produce valid rules in a sense that these constraints for the operators are taken care of. Another possible solution is the so-called Booleanization [44, 12] where all terminals are only allowed to return Boolean values and functions only process and return Boolean values. The third approach to the closure problem is using grammars to describe valid rules and enforce them. If there is a violation of the constraints, remedying that fact can be done via the fitness function which maps a low fitness value to these invalid individuals. There is also the way of repairing the individual by removing the invalid parts and optionally replacing them with newly randomly generated but valid parts. ([12] p. 146) Complex Event Processing rules have additional features as a result from their main goal to process stream data. These features, like unique operators, attributes, windows and func- August 28, 2016 29 4 General Approach tions, add more complexity and constraints to the rule representation and the evolutionary operations in the Genetic Programming algorithm which need to use one of the proposed strategies to fulfill the closure property. 4.3 Constraints in Genetic Programming In section 2.1 on page 4, this work laid out the background of Evolutionary Computation and briefly presented the main members. These members distinguish themselves mainly by their choice of problem representation, the genotype. 
The genotype defines how a solution to the problem is encoded in the Evolutionary Computation algorithm. The phenotype, on the other side is the representation of the problem in its own domain. In this thesis, the phenotype is a Complex Event Processing rule with all the concepts of the general language presented in section 2.2 on page 11. The genotype is the transformed version of this rule into a tree structure that is used in the domain of the Genetic Programming algorithm. As already mentioned before, CEP rules require constraints to treat their complexity and interrelation of the rule components. There are mainly three approaches to induce such constraints as domain knowledge into the genotype and thus into Genetic Programming: simple structure enforcement, strongly-typed GP and grammar-based constraints.([39] p. 51ff.) Simple structure enforcement as the name implies, already lays out a basic structure of the solution and allows the algorithm to evolve components within the structure freely. This thesis uses this approach to ensure the basic structure of the rule and that certain functions and terminals are restricted to a specific component (separation of window and condition for example). The initial population can be created with individuals always following these constraints. But crossover and mutation need to be adapted to not mix these components as well. Another way to do this, is to separately evolve these components.([39] p. 52) Strongly-typed GP is another way of inducing constraints when solutions to the problem already impose types in the phenotype. Terminals are typed and functions have types for their parameters they accept and an output type. Every part of the evolutionary process needs to be adapted to this type system to ensure no violation: from the initial population creation, to crossover and mutation.([39] p. 52f.) After each evolution of a solution, every function needs to have parameters of their expected types. This thesis heavily uses this approach within the condition of a rule to best guide the search for the best rule and to focus on the actual problem space of all possible rules instead of allowing invalid rules which cannot be evaluated. Grammar-based constraints are mainly used in the form of rewrite and production rules. The initial population is created in a way that each individual can be produced using the given grammar. Crossover and mutation need to consider the grammar to also follow the imposed constraints. The subtrees under the specific variable of the grammar can only be substituted during crossover or mutation by another subtree of the same 30 Norman Offel 4.4 Evolutionary Operations variable. Another way is to use Grammatical Evolution where numbers decide the option of the variable production for this specific individual.([39] p. 53ff.) Both ways are alternatives for the proposed way in this thesis and could be investigated in future works. 4.4 Evolutionary Operations The representation of a solution in the domain of the Genetic Programming algorithm, also known as genotype, is a tree. Therefore, the evolutionary operations crossover and mutation have to be able to transform trees in a way that they can fulfill their purpose. But first, this section starts by presenting the selection algorithm to choose the individuals that perform crossover, proceeds by explaining the crossover operation and concludes with the mutation operation. 
4.4.1 Selection Although there are two selection stages in Evolutionary Computation: mating selection and environmental selection (see 2.1 on page 4), in Genetic Programming there is only the mating selection. The purpose of the environmental selection of conglomerating the new and previous generation into the next one is done by a crossover rate which allows some individuals to survive into the next generation if they are not crossed at that time and by elitism that let the absolute best individuals pass from one to the next generation. In GP, as in every other Evolutionary Computation algorithm, choosing individuals which are allowed to contribute to the next generation is probabilistically based on fitness. The fitter an individual the higher are its chances to be selected and perform crossover. CepGP employs the widely used tournament selection. Tournament selection holds a tournament between a defined number of randomly selected individuals of the population (the tournament size). The fittest of these individuals is the winner of the tournament and is granted the privilege of reproduction. Each tournament selection produces one selected individual, so that crossover always needs two tournaments to be held to decide the parents. Goldberg proposes additional selection methods to be used in [16] (p. 121f.): roulette wheel selection (or stochastic sampling), deterministic sampling, expected value model and more which can also be applied in Genetic Programming. 4.4.2 Crossover The crossover operation in Evolutionary Computation combines multiple individuals of a population into a new individual. The crossover operation in Genetic Programming differs a lot from the biological prototype because in Genetic Programming the parts each individual August 28, 2016 31 4 General Approach exchanges can be at very different places than they were in their original individual. However, preserving the place of the exchanged information can be achieved by homologous crossover, namely the one-point crossover that chooses a common node of both parent individuals and preserves the original places of the exchanged subtrees.([39] p. 44) Uniform crossover works by going through common regions of the parent individuals and randomly chooses at each node whether this node is taken from one or the other parent. This allows a better mix in nodes closer to the root. ([39] p. 44f.) Poli et al. present more crossover strategies in [39] (p. 45f.). None of them is used in CepGP but they may be further investigated in future work. CepGP uses the most popular crossover strategey: subtree crossover as shown in figure 4.4 on the next page. As for every crossover strategy, subtree crossover needs two parent individuals which are chosen by a selection algorithm. Within each parent, a random node is selected to be the crossover point. The offspring of the crossover operation is constructed by replacing the subtree under the selected node of the first parent with the subtree under the selected node of the second parent. To enable both parents to produce more than one offspring in their original form, crossover operations are performed on copies of the parents. It is also possible to produce two offsprings out of one crossover operation by also replacing the selected subtree of the second parent with the selected subtree of the first one.([39] p. 29f.) 4.4.3 Mutation Mutation is a small random change in the genotype of the individual and is used to introduce new information into a population. Poli et al. 
present the most popular mutation strategies in [39] (p. 42ff.). The most common mutation strategy in Genetic Programming is the subtree mutation which replaces a random subtree of the individual with a new randomly created tree. It is easy to implement because it uses the same mechanism as subtree crossover, except that the subtree from a second individual is replaced by the new randomly created tree. However, this mutation strategy potentially has a big impact on the individual. That is why this thesis uses another approach, called node replacement mutation or point mutation illustrated in figure 4.5 on the facing page. Usually, mutation is probabilistic for every individual in the population at each generation. For each individual there is a chance it undergoes mutation. If it does, a random node of the individual is chosen to be the mutation point. Subtree mutation would replace the subtree under the chosen node with a completely new subtree. Point mutation, on the other hand, creates only a new node that is able to replace the selected node in the original while preserving the subtree under the replaced node. This is a much smaller alteration of the original individual and is closer to the original idea of mutation as a minor change (see 2.1 on page 4). Other interesting mutation strategies are 32 Norman Offel 4.4 Evolutionary Operations Figure 4.4: Subtree Crossover; The selected node in the first parent is the multiplication operator in the right subtree; The selected node in the second parent is the subtraction operator in the left subtree; during crossover the subtree under the subtraction operator (highlighted in blue) replaces the subtree under the multiplication operator (highlighted in yellow) Figure 4.5: Point Mutation; the randomly selected multiplication operator (highlighted in yellow) in the right subtree is replaced by a randomly created division operator (highlighted in blue) while preserving the original operands (3 and 4) August 28, 2016 33 4 General Approach • Hoist mutation which creates a new offspring as a random subtree of the original individual resulting in a shorter solution. This can counteract bloating of solutions (increasing number of larger solutions with ongoing evolution). ([39] p. 43) • Shrink mutation as a special subtree mutation where the replacing subtree consists of just a terminal to shorten the solution and to counteract bloat.([39] p. 43) Both of these are not used in CepGP but may be of interest in future work. It is possible and often beneficial to have different mutation strategies within a single Genetic Programming algorithm. But it is desirable to apply only one at a time.([39] p. 42) 4.5 Summary When installing and running a Complex Event Processing system, domain experts are sometimes confronted with strange or unusual meterings of the sensors that provide the CEPsystem with primitive events. Domain experts also could discover occasionally occurring happenings outside of the meterings but in the same environment that is monitored. CepGP is an algorithm that supports domain experts with their search for the cause of these happenings. All it needs is a record of the captured events of the environment that also contains the occurrences of the happenings. These occurrences are marked by a unique event type that does not need any more information than the event type name itself. 
In the best case, sensors already captured these happenings, otherwise the domain expert needs to manually insert the happening events at the right places to the best of her knowledge. CepGP is a type-safe and structure enforcing Genetic Programming algorithm, that uses the information from the record (or log) of the event stream including the happening events to derive the most appropriate rule that implies the happening. The condition of the rule may produce the happening but it does not say that the happening always only occurs when these conditions are met. However, the goal of this rule is to provide the domain expert with hints leading to the cause of the happening. This chapter provided the scenario of this thesis and basic information about Genetic Programming in the field of Rule Learning and about the application of evolutionary operations on trees. These valuable information will help to follow the CepGP algorithm in the next chapter. 34 Norman Offel 5 CepGP – The Genetic Programming Algorithm After exploring the principles of Complex Event Processing (CEP) and Evolutionary Computation, it is time to combine both worlds to achieve the goal of automated rule discovery for a given event. Figure 5.1: General process of CepGP The overall process of this work, called CepGP, can be seen in figure 5.1. It is based on historical recorded temporal event stream data. This file needs to contain the complex events of the type for which CepGP shall find a most appropriate rule. If these events have not been recorded automatically, it is possible to include them manually into the data. Only a unique name for the event type is necessary, no time or attributes are considered for this specific event type. The more accurate the positions of these complex events in the stream, the better is the result of the algorithm. CepGP reads this file of events and begins the preparation phase which pre-processes the data and extracts valuable and needed general information about the event types, their attributes, value ranges and so on for the actual search process which begins by building the initial population. This step takes the general information into account to build individual rules as trees with conditions, a window and an action. The result is the first generation of the evolutionary process during which only valid individuals are created and evaluated. Each individual is graded according to the blended fitness of three optimization objectives: Condition fitness quantifies the quality of the condition part of the rule. The more it fires at the wanted places the better. Window fitness rates the size of the window independently of the window type. The smaller the window size the more efficient the rule and the better the window. 35 5 CepGP – The Genetic Programming Algorithm Complexity fitness grades the structure of the rule. The simpler the rule the better. Since the only structural dynamic part of the rule is the condition part, it effectively grades the structure of the condition subtree. The blended total fitness uses weights to largely prioritize the condition fitness over the other two. The complexity fitness only has a very minor impact on the overall fitness while the window fitness is more important. These objectives are not disjunctive. The evaluation of the condition is also dependent on the events it gets through the window to decide whether to fire or not. But if the condition fitness is almost the same, the rule with the fitter window is preferred. 
If even after that the total fitness is basically identical then the simpler rule is fitter and more preferable. After each evolution, the new generation will be graded like this and each evolution consists of three basic steps: Selection with Elitism: An elitism rate determines the number of the fittest individuals that survive the evolution step into the next generation. The other individuals are generated via crossover during which the mating pairs are chosen via a selection algorithm. Crossover: A crossover rate determines how often individuals mate. The first individual is chosen through the selection algorithm. If it is destined to mate then another individual is selected and an offspring is generated via the crossover method and inserted into the next generation. If it is destined not to cross, no second individual is selected and the first selected individual moves over to the next generation unchanged. Mutation: After the next generation is built via the previous steps, each individual undergoes a minimal invasive change by a probability determined through a mutation rate. The evolution takes place for a given number of times and the best individual of the last generation is the result of the search process. Genetic Programming was once invented to optimize programs which are represented as a tree of operators and operands that can be altered by evolutionary operations like crossover or mutation and evaluated according to their result to determine their fitness. CepGP uses this approach by applying the operator tree optimization concept to a tree representation of a CEP rule as can be seen in figure 5.2 on the next page. This chapter first presents the representation of the components of a rule, their individual needs and requirements and which parts are to be optimized. Afterwards, it explains each step of CepGP in more depth in the order of the process. 5.1 Rule Components The composition of a rule in Complex Event Processing (CEP) was already discussed in section 2.2 on page 11. Each component can be seen as a node within a tree where the rule 36 Norman Offel 5.1 Rule Components is the root. Direct children of the rule root node would be the condition, the window and the action. The general tree representation of a CEP rule is shown in figure 5.2. Figure 5.2: Rule components; components which are part of the optimization are highlighted in red; affected components are colored in yellow; static parts are gray The most important part of a rule is its condition on which it will execute the action. The window defines how many and which recent events will be considered in the evaluation of the condition. In the following sections, each component is discussed in the context of the Genetic Programming approach. It describes the encoding of the components to a part of the individual which can be processed by the algorithm. The order in which the components are discussed represents the order of execution during the evaluation of a rule to enable a better insight into the reasons for the design decisions. 5.1.1 Window The first step in rule evaluation is to determine the events which will be given to the condition which in turn decides whether it fires or not. In section 2.2 on page 11, this work already presented ways for CEP to decide which and how many events are processed during rule evaluation. In this work a window is a mandatory part of the rule. Windows consist of two features: type and value. 
The type of a window specifies how it will determine which events are within its boundaries and the boundaries are given by the most recent event and the value. It can either be given by the count of events from the most recent one backwards up to the value, e.g., a window of type length with the value specifying the number of the most recent events to take into account for rule evaluation, or it can be given by a time span from the most recent event backwards, e.g. a window of type time with the value specifying the maximum of time unit steps between the most recent event and the older ones. Figure 5.3 summarizes these differences. Figure 5.3: Windows have a type and a value. The interpretation of the value depends on its type. August 28, 2016 37 5 CepGP – The Genetic Programming Algorithm As presented in figure 5.2 on the preceding page, the window is one of the nodes in the rule tree that is part of the optimization process. As such, it is necessary to examine ways for the Genetic Programming Algorithm to combine windows in crossover or mutation. Windows are different from nodes of the condition and cannot be meaningfully combined with those nodes. Thus, the algorithm has to ensure that only windows can be combined with other windows. This is called type-safety in Genetic Programming as already explained in chapter 4 on page 25. Enabling reasonable evolutionary operations that alter the individual in the window in all cases except if the other window is the same, means that windows of different types should be combinable. More on how windows are processed during evolution can be found in section 5.4 on page 50. 5.1.2 Condition Besides being the most important part of a rule, the condition is also the most complex one. In its simplest form, it checks whether the most recent event equals a certain event type and fires in that case. However, in most scenarios the rule will need to incorporate several event types and combinations of them. This can be exploited as an operator subtree under the rule with the root being the condition. The combinations of event types are the operators and the event types are the leaves. The formed event condition tree (ECT) will determine whether the attributes of the event instances which contributed to the successful evaluation will be further examined to decide if the rule as a whole fires or not. This leads to another very similar operator tree. But instead of event types it combines attributes and therefore constitutes the attribute condition tree (ACT). Although, it is often useful to inspect the attribute values of the events within the evaluation, in contrast to the ECT, this subtree of the condition part of a rule is optional and may be omitted. The total picture of the rule after extending the condition component with the subtrees for the event conditions and attribute conditions can be seen in the figure 5.4. Both, the ECT and the ACT are now discussed in more detail in the following sections. Figure 5.4: Rule Components; components which are part of the optimization are highlighted in red; affected components are colored in yellow; static parts are gray 38 Norman Offel 5.1 Rule Components Event Conditions Event conditions represent the actual rule. After the window determines the events under test, the rule will hand over the events to the Event Condition Tree (ECT) to decide whether to fire or not. The ECT is the representation of the event conditions as a subtree within the tree representation of a rule. 
An example is displayed in figure 5.5 where the root of the ECT is the logical and(∧) operator which has two operands as children in the tree, represented as an event of type A and an event of type B. Figure 5.5: Example of a rule with an Event Condition Tree (ECT) with the logical andoperator (∧) as its root and the operands being the events with types A and B This figure shows the overall composition of the ECT. There are two kinds of nodes: operators and event types. The root of the ECT can be either of those. Event types are always leaf nodes and vice versa whereas operators are intermediate nodes which can be a root but never leaf nodes. While event types as leaf nodes never have children, operators need to allow to have other operators or event types as children to build more complex rules as visualized in figure 5.6 on the next page. The figure shows an example where the root is the logical and-operator and its children are the event type A and a sequence-operator(→). Furthermore, the sequence-operator also has two children: the event types B and C. This flexibility of the nodes in the ECT allows complex and simple event conditions to be easily representable as an ECT. The nodes and its children are very loosely coupled. Operators accept any possible ECT node, e.g. event type or operator, as a child. This is a necessary property of the ECT to enable easy and consistent crossover and mutation, as this work will further discuss in section 5.4 on page 50. Attribute Conditions First, the rule determines via the window which events participate in the evaluation and then hands them over to the event conditions, represented as the ECT. If the ECT was successfully August 28, 2016 39 5 CepGP – The Genetic Programming Algorithm Figure 5.6: Example of a more complex event condition represented as an ECT; the displayed rule in in-order output: (A ∧ (B → C)) evaluated and if the rule also imposes attribute conditions on the events to decide whether to fire or not, then the rule will afterwards pass the actual events that contributed to the successful evaluation of the ECT over to the attribute conditions to analyze their attribute instances and to decide whether they meet the specification of the attribute conditions. Thus, in a string representation of the rule both, the event conditions and attribute conditions, are connected via a logical and-operator (∧). The attribute conditions are very similar in structure to the event conditions and can therefore also be represented as a subtree of the rule, called the Attribute Condition Tree (ACT). Since the attribute conditions examine the state of event instances which lead to the positive outcome of the ECT evaluation, there needs to be some kind of referencing between the event types in the ECT and the attributes in the ACT. This can be done with a naming scheme that uses the event type and an enumeration to build unambiguous references such as A0 for the first appearance of an A and A1 for the second in figure 5.7. These unique identifiers can then be used within the ACT to access the correct attributes by dereferencing the identifiers to the event instance and read the value of the named attribute. However, this building of references only needs to take place when there is an ACT in the rule. 
Figure 5.7: Example of a rule with an ECT and an ACT (highlighted); the displayed rule in in-order output: (A as A0 → A as A1) ∧ (A0.attribute = A1.attribute) 40 Norman Offel 5.1 Rule Components Although the operators to compare attributes distinguish from operators in the ECT, the ACT still resembles the ECT in many ways. The root of the ACT can either be a logical operator or an attribute comparison operator. Operators cannot be leaves, too, and attributes are always leaf nodes. Another common and important property of both trees is the loose coupling of logical operators nodes and their children. In the ACT, as well as in the ECT, the logical operators are not concerned about their children all being logical operators, comparison operators or a combination of both. Nevertheless, it is not meaningful to mix event conditions with attribute conditions in either tree. Therefore, the algorithm needs to take care of using only event types or event condition operators in the ECT and only attributes or attribute condition operators in the ACT. Another difference between ECT and ACT exists in the leaf nodes. Within the ECT, only event types are allowed to be leaves, whereas in the ACT there are more than one kind of attribute. In figure 5.7 on the facing page the ACT uses event attributes, which are the values of the named attributes of the referenced event instances. However, attributes can also be compared to constant values which do not need to be referenced. From a validity point of view, there is no harm in comparing two constants. But from the perspective of meaningful conditions, this should be avoided since it does not add information to the rule and in worst case even results in the whole rule not firing at all. The latter case might happen when for example two unequal constant values are compared using the equals-operator. Hence, the algorithm might only allow constant values as the second operand of any attribute condition operator. The first operand shall always be an event attribute. As with ECTs, ACTs can also become more complex by adding logical operators like ∧, ∨ and ¬ to the numerical comparison operators. Figure 5.8 illustrates an example with such a more complex ACT and constants as the second operands of the comparison operators. Figure 5.8: Example of a rule with a more complex ACT containing logical and numerical comparison operators; the attribute attribute1 of the event instance referenced by the alias A0 has to be less than 5 or greater than 10 Although this extension adds more possibilities to represent a broader range of rules, it also adds constraints to the ACT that the algorithm needs to consider during evolution. Now, there are three kinds of nodes within an ACT, all with different requirements concerning August 28, 2016 41 5 CepGP – The Genetic Programming Algorithm their combinations. What remains is attributes, no matter if constant or event attributes, are always the only valid leaf nodes. Numerical comparison operators can be root or intermediate nodes, but never leaf nodes, and they only are allowed to have attributes as children to be meaningful. The result of a comparison operator is boolean (true or false), hence, the parent of such a comparison operator cannot be another comparison operator but only one of the logical operators (∧, ∨ and ¬). Logical operators, on the other hand, can only have comparison operators or further logical operators as children within the ACT but never attributes. 
Table 5.1 summarizes the constraints the ACT nodes have to abide to. Node Attributes (constants or event attributes) Comparison operators (<, >, ≤, ≥, =, 6=) Logical operators (∧, ∨ and ¬) Can be ACT-root × D D Children None Attributes Comparison or logical operators Table 5.1: Summary of constraints to ACT nodes In general, attribute conditions should be able to assert the truthfulness of different types of attributes. This would allow for an even broader range of rules such an algorithm could cover. On the other side, this would impose even more constraints and different operators the algorithm would need to take care of. Thus, to inspect a broad range of attribute comparisons and arguably reflect the majority of attribute comparisons in Complex Event Processing, and at the same time keep the constraints for the algorithm within reasonable boundaries, this work limits the attributes to be numerical. 5.1.3 Action The action of a rule is the effect when the evaluation of the event conditions and the optional attribute conditions result in a positive outcome. Regarding the concept of the optimization algorithm to find a rule that might explain the circumstances of an event to appear, the action can be some placeholder to complete the necessary components of a rule. The action is not part of the optimization process and as such does not need to fulfil any requirements except to be there. However, this does not mean that the action could not be used for some extensions to the algorithm, e.g. post-processing tasks. The action concludes the processing pipeline as shown in figure 5.9 and this work continues to look into ideas for evolutionary operations on the chosen representation of the rule, in particular the window, the ECT and the ACT. Figure 5.9: Processing pipeline of a rule in CepGP 42 Norman Offel 5.2 Preparation Phase 5.2 Preparation Phase This work presents a Genetic Programming algorithm as a general concept for Complex Event Processing rules that does not need a lot of configuration to work. All it needs is recorded temporal event data in the order they appeared including the complex events for which the algorithm is supposed to find a rule. The event data has to have a timestamp, a type and optionally attributes. As already mentioned, this work relies on numerical attribute values, although other types may be considered in future works. Each event of the same type has to have the same number of attributes with the same names. The mentioned complex events only need the same special name, every other information is not necessary for those events. If the complex events were not recorded then they can be inserted into the historical data. Since this may lead to slightly inaccurate data, the result of the algorithm needs to be interpreted more carefully. At the start, the algorithm needs the file of the event data stream and the name of the complex event for which the algorithm is supposed to find a rule. While parsing the file, it should keep track of the set of event types, their attributes with their names and range of observed minimum and maximum values. It also should remember the minimum time interval between two consecutive events as well as the time interval between the first and last event in the stream and the total amount of events. This is all valuable and needed domain knowledge which will be exploited during the optimization process which can be collected during the parsing of the data to minimize computational overhead. 
Figure 5.10 summarizes the steps of the preparation phase. Figure 5.10: Given a file with historical temporal data which contains the complex events for which the algorithm will try to find a rule, extract domain knowledge during the data parsing which will be exploited via the Genetic Programming algorithm The collected information about the time intervals will prove to be helpful for creating windows for the rules and for grading the window of a rule. The same goes for the number of events which will also be necessary to grade the condition of a rule. The event types with their attributes are much needed information to build rules at all and the value range of the attributes can be used to choose good constant values within the attribute conditions of a rule. August 28, 2016 43 5 CepGP – The Genetic Programming Algorithm 5.3 Initial Population Creation Although Genetic Programming is all about the evolution of trees by combining individual trees to new and hopefully better solutions, it has to start somewhere. The starting position is already very important and can prove to be more or less beneficial for the evolutions to come. The quality of the end result drastically depends on the first individuals that largely influence the coming generations of the evolutionary algorithm. The goal of the initial population creation is to build only valid individuals which can be evaluated and graded and to produce individuals from very different regions of the problem space. It will profit from the preparation phase and use the information gathered during that time. A rule consists at least of the window, the event condition tree and the action. Some rules should also consider the attributes within an attribute condition tree. Since the action is just a placeholder and does not contribute to the evaluation of a rule, this makes it two or three components to be created for each rule. There are two major strategies and a combination of both which is widely used in Genetic Programming. The Ramped half-and-half initialization, as the name suggests, uses the full initialization to create one half and the grow initialization to create the other half of the individuals of the first generation randomly. The full initialization creates fully filled trees which always have the predefined maximum depth. The grow initialization brings forth partially filled trees with a depth less than or equal to the predefined maximum. Both methods together result in a population with very different individuals what will lead to more promising results during the evolution process. All of these strategies cannot be applied to the rule as a whole because of the constraints of the individual components. Thus, different initialization methods are used to generate each component individually and randomly while adhering to the specific requirements. 5.3.1 Window Creation A window consists of two properties: type and value. There are two types of windows in Complex Event Processing: length and time windows. The length window considers a certain amount of previously encountered events, where the value represents the number of events. The time window considers the events within the time interval of the current event back to the amount of a given time unit represented by the value (see figure 5.3 on page 37). To build a random window, the algorithm first randomly selects a type while length and time are equally probabilistic. Depending on the type, it then randomly chooses a value between the minimum and maximum of the type. 
These boundaries come from the preparation phase. For the length windows the minimum value is 1 and maximum value is the total amount of events within the source file. The minimum time value is the encountered minimal time interval between two consecutive events in the source file and the maximum value is the 44 Norman Offel 5.3 Initial Population Creation time interval between the first and the last event. The time unit is either equal for all time windows or differs randomly while also keeping the time boundaries. This concludes the creation of a window which is random but still follows the constraints to form a valid and reasonable window in the given scenario by using the information available from the historical event data. Figure 5.11 shows the creation process as a program flowchart. Figure 5.11: Program flowchart of the window creation process; the boundaries are given by the minimum and maximum values depending of the type available from the preparation phase; rand() returns a random value within [0, 1) 5.3.2 Event Condition Tree Creation While the window is not a tree, the earlier mentioned initialization methods full, grow and Ramped half-and-half can be applied to the event condition tree (ECT). In this work, the ECT is created via the Ramped half-and-half method. The ECTs of the first half of the initial population are produced by using the full method where the resulting ECT is a full tree with the predefined depth. When it has reached the maximum depth the algorithm only allows a random event type as a terminal. The set of event types is known from the preparation phase. If the depth of the current node has not reached the maximum depth then the algorithm chooses randomly in equal distribution from the function set which consists of the event condition operators of Complex Event Processing (sequence (→), and (∧), or (∨), not (¬) and the excluding sequence (a → ¬b → c)) and continues to produce the operands as subtrees until the maximum depth is reached. Before the maximum depth, only operators are allowed. At maximum depth, only event types are allowed. The full method for ECTs is illustrated in figure 5.12 on the next page The other half of ECTs of the population is created using the grow method which results in partially filled trees. It allows event types as terminals before the maximum depth is reached. Originally, the grow method draws a specific primitive equally from the primitive set (functions and terminals, or in this case operators and event types). However, to be less August 28, 2016 45 5 CepGP – The Genetic Programming Algorithm Figure 5.12: Program flowchart of the full initialization method for the ECT sensitive to the size of the terminal set – since the function set is fixed –, this work uses a slightly different approach. At the root, the chance to choose a random event type is as probabilistic as to choose one of the operators. But the chance to select an event type grows linearly with the depth and is 1 when the maximum depth is reached to ensure that the tree will not exceed it. This grow method is shown in figure 5.13 on the facing page 5.3.3 Attribute Condition Tree Creation The attribute condition tree (ACT) is an optional part of the rule and may be omitted during the creation process. CepGP uses an ACT-rate to determine how many individuals are being equipped with an ACT in this phase. 
Whether ACTs prevail from generation to generation is decided by the fitness of rules with and without an ACT and is therefore guided by the optimization process itself. There are no guarantees of the portion of individuals with an ACT within the generations from the initial population onwards. Depending on the given data and the problem at hand, this rate can be adjusted to facilitate a better outcome of the optimization process. To find a faint lead to the best rule when there are a lot of event types, a total absence of ACTs in the optimization can be favourable whereas in scenarios with few event types but with a lot of attributes, missing out ACTs may not lead to good results. Since the ACT is also a tree, basically the same initialization methods as with the ECT can be applied. However, CepGP only uses the grow method for ACTs because it generally is uncertain whether ACTs are helpful in smaller or bigger sizes, if at all. If bigger and more 46 Norman Offel 5.3 Initial Population Creation Figure 5.13: Program flowchart of the grow initialization method for the ECT complex ACTs prove to be beneficial then the fitness of the individuals with an ACT is better. Thus, from generation to generation more and more individuals have ACTs and the probability of individuals interchanging parts of their ACTs during crossover rises to create bigger ACTs. The growing of trees in Genetic Programming is a known problem called bloat which in this case can even be exploited to lead to better results or at least steered via the ACT-rate for the initial population. The function set of an ACT consists of the logical operators. The terminal set contains the numerical comparison operators in CepGP, although other types than numbers may be compared in future works. ACTs may also have a different maximum depth from the ECT depending on the problem at hand. The creation of an ACT is shown in figure 5.14 on the next page. Although a comparison operator is a terminal within the ACT, it still is an operator requesting operands to be complete. Comparison operators are always binary and the operands are either constant or event attributes which are tightly coupled to the event types of the ECT of the same rule. To compare the values of events, the ACT needs to refer to attributes of events that correspond to the event types within the ECT. That is why, apart from constant attributes, only event attributes of the event types of the ECT of the same rule are allowed inside the ACT. To enable valid and useful ACTs, CepGP differs between the first and the second operand of the comparison operators. The first operand is always an event attribute. August 28, 2016 47 5 CepGP – The Genetic Programming Algorithm Figure 5.14: Program flowchart of the initialization of an ACT using the grow method 48 Norman Offel 5.3 Initial Population Creation The second operand can either be an event attribute or a constant within the observed interval during the preparation phase of the first operand’s event attribute. This premise for comparison operators improves the initial rules by preventing comparisons of two constant values which are not valuable for the rule and only would allow a lot of never firing rules. Another advantage of this approach is that the constant value can be chosen in a way that fits into the value range of the event attribute of the first operand that it is compared against. The whole initialization of a comparison operator is shown in figure 5.15. 
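A possible sketch of this comparison operator initialization, with hypothetical helper types and names; the attribute list and the value ranges are assumed to be available from the ECT of the same rule and from the preparation phase (the equidistant constant scheme is detailed further below):

    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    class ComparisonFactory {
        private static final List<String> OPS = List.of("<", ">", "=", "<=", ">=", "!=");
        private final Random rand = new Random();

        // attributes: event attributes of aliases used in the ECT of the same rule, e.g. "A0.temp"
        // ranges: observed [min, max] per attribute, collected during the preparation phase
        String createComparison(List<String> attributes, Map<String, double[]> ranges) {
            String op = OPS.get(rand.nextInt(OPS.size()));
            String first = attributes.get(rand.nextInt(attributes.size())); // always an event attribute
            String second;
            if (rand.nextBoolean()) {
                // second operand: another event attribute
                second = attributes.get(rand.nextInt(attributes.size()));
            } else {
                // or a constant within the value range of the first operand
                double[] r = ranges.get(first);
                int i = rand.nextInt(5); // equidistant steps, i in [0, 4]
                second = Double.toString(r[0] + i / 4.0 * (r[1] - r[0]));
            }
            return first + " " + op + " " + second;
        }
    }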
Figure 5.15: Initialization of a comparison operator of an ACT; the first operand is always an event attribute; the second is randomly either a constant or another event attribute Generally, it is possible to choose a constant value randomly between the minimum and maximum value of the first operand. Since this approach does not limit the constant values, theoretically, it covers all possible ACTs. But depending on the amount of attributes, their value’s ranges and actual value distribution, it might take a lot of generations and individuals to bring forth good ACTs. To reduce the number of possible constant values but still cover a reasonable amount of possible ACTs, CepGP uses equidistant steps within the given value range. Including the minimum and maximum value of the event attribute, it randomly can choose three more values: min + i/4 · (max − min) where i is a random integer within [0, 4] which results in the minimum value for i = 0, maximum value for i = 4 and three additional possible constant values equidistantly in between. These chosen values can reasonably be used in context of the comparison operators and enhance comprehensibility of the ACT from a users point of view. August 28, 2016 49 5 CepGP – The Genetic Programming Algorithm 5.4 Evolutionary Operators After describing the chosen representation of a rule and its components as a tree and showing the conditions that have to be met to eventually evaluate the rule and determine its quality, this section concerns itself about the evolutionary operations already presented in section 2.1 on page 4 and their application on the rule as a tree. It follows the processing order of the evolutionary operations during evolution in CepGP and therefore starts to discuss the selection method, continues to look into the crossover method and finishes with the mutation method which all operate on a generation and hence need the population initialization presented in section 5.3 on page 44 to take place before the evolution can begin. The speciality of the evolutionary operators in this work lies within the constraints they have to adhere to. Each component has different needs to be taken care of, hence, the operators are explained for each component individually but still work on the rule as a whole. 5.4.1 Selection Selection is used to determine the individuals which participate in producing the next generation. The Tournament Selection is widely used, simple and allows configuration to be adapted to different problems. The amount of individuals to be compared to decide the winner of selection process is determined by the tournament size. For a size of two it results in the following operations: 1. Select a random individual of the current population 2. Select another random individual of the current population 3. The fitter of the individuals is the winner of the tournament and propagates its genes on to the next generation However, since CepGP usually uses a large population for each generation, it is better to bring over the absolute best individuals for sure and not leave it up to pure chance. The mechanism to achieve this is called Elitism. It determines the best n individuals of the current generation and allows them to survive into the next generation. The best n individuals in CepGP is determined by an elitism rate of the population size. This rate should be very small and applied to the population, it rescues the absolute bests while allowing the crossover to produce enough other new individuals and keep up diversity. 
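As an illustration, tournament selection with a tournament size of two and the elitism step might be sketched as follows; the Rule type and all method names are assumptions for this sketch, not the thesis implementation:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    class Selection {
        private final Random rand = new Random();

        // tournament of size two: pick two random individuals, the fitter one wins
        Rule tournament(List<Rule> population) {
            Rule a = population.get(rand.nextInt(population.size()));
            Rule b = population.get(rand.nextInt(population.size()));
            return a.fitness() >= b.fitness() ? a : b;
        }

        // elitism: copy the best (elitismRate * populationSize) individuals into the next generation
        List<Rule> elite(List<Rule> population, double elitismRate) {
            int n = (int) Math.round(elitismRate * population.size());
            List<Rule> sorted = new ArrayList<>(population);
            sorted.sort(Comparator.comparingDouble(Rule::fitness).reversed());
            return new ArrayList<>(sorted.subList(0, n));
        }
    }

    interface Rule { double fitness(); }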
Other individuals than the elite are produced in the crossover process which uses the selection mechanism to choose the parents of the offsprings as individuals of the next generation. There are a lot of selection algorithms available which can prove to be a better choice for determining the individuals that contribute to the next generation. It remains a task for future works to analyze the benefits of them. 50 Norman Offel 5.4 Evolutionary Operators 5.4.2 Crossover The crossover operation combines individuals to produce the next generation out of the information of the current one. As with selection, there are also a lot of algorithms for crossover of which some are presented in chapter 4 on page 25. However, this evolutionary operation needs to consider the representation of the individuals and therefore has to work well with trees to be a viable option for CepGP. For this reason, CepGP adapts the TreeCrossover and introduces mechanisms to handle the constraints of each component of a rule by ensuring type-safety. During any crossover, only one component is effected and the others are, if no need arises, left untouched. The general crossover process can be seen in figure 5.16 on the following page. As long as there are less new individuals than the population size of a generation, CepGP produces new ones by crossover. The next new individual is produced by first selecting an individual with the selection method aforementioned. A crossover rate determines whether this individual mates with another individual or if it survives as is into the next generation. If it is destined to mate, first, the other mating partner individual is chosen with the selection method as well and after that, both individuals give parts of their information material on to a newly formed offspring. This offspring is then part of the new generation. When the number of individuals in the new generation is equal to or greater than the population size, the next generation is build by: • Accepting the absolute best from the previous generation. The number of individuals is computed with the elitism rate discussed in the selection section 5.4.1 on the preceding page. • The remaining individuals to fill up the population of the next generation are taken in order of their creation from the new individuals just generated Another point to consider here is the crossover rate that determines how often individuals are crossed with other individuals. If the selected first individual is not to be crossed then it survives the generation as is and is part of the next generation. This means, that even without the elitism in CepGP very fit individuals, which are likely to be selected as first individuals more often, are already likely to survive the current crossover round when the crossover rate is comparably low to usual crossover rates in other evolutionary algorithms. However, to enable a stronger convergence a high crossover rate is necessary which in turn reduces the probability of individuals to survive through the crossover process alone. This can be dealt with by elitism as is done in CepGP which also allows the crossover rate to be even higher than normal because the fittest individuals of each generation survive anyways. As mentioned earlier, the speciality of this algorithm arises from the needs of each rule component which are to be considered in the mating process. When two individuals are selected to mate, the crossover algorithm of CepGP selects a random crossover point of the first chosen individual. 
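Before turning to how that crossover point is chosen, the generation-building loop just described might be sketched roughly like this, reusing the illustrative Selection and Rule types from above; the actual crossover is only stubbed here:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    class Evolution {
        private final Random rand = new Random();
        private final Selection selection = new Selection();

        List<Rule> nextGeneration(List<Rule> population, double crossoverRate, double elitismRate) {
            List<Rule> offspring = new ArrayList<>();
            while (offspring.size() < population.size()) {
                Rule first = selection.tournament(population);
                if (rand.nextDouble() < crossoverRate) {
                    // mate with a second selected individual
                    Rule second = selection.tournament(population);
                    offspring.add(crossover(first, second));
                } else {
                    // the selected individual survives into the next generation as is
                    offspring.add(first);
                }
            }
            // next generation = elite of the old generation + offspring in order of creation
            List<Rule> next = selection.elite(population, elitismRate);
            for (Rule r : offspring) {
                if (next.size() >= population.size()) break;
                next.add(r);
            }
            return next;
        }

        Rule crossover(Rule first, Rule second) { /* component-wise crossover, described below */ return first; }
    }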
The available crossover points are indexed in figure 5.17 on page 53. August 28, 2016 51 5 CepGP – The Genetic Programming Algorithm Figure 5.16: General Crossover of CepGP with Elitism 52 Norman Offel 5.4 Evolutionary Operators Figure 5.17: CepGP crossover point indexing of an individual with an ECT and an ACT; the red numbers are the indices of the nodes that can serve as a crossover point; the yellow numbers are the indices of nodes in their component By randomly choosing an index, not only the node at which to be crossed (crossover point) is determined, but so is the component, too. This allows to specifically draw a random node of the same component of the second individual. In this way, the type-safety is ensured and only nodes which are compatible for crossover are selected and in this manner only produce valid offsprings. When in this example the first random index is 3, this means that the crossover point is an event type and part of the ECT. The crossover point of the second individual can only be a valid node of an ECT (either an event type or an event operator), now. The algorithm can deduct the component from the index by building the index like this: 1. Compute the number of nodes in the ECT and ACT 2. Select a random integer between 0 (including) and the total number of nodes (excluding) which is calculated as 1 for the window node + number of ECT nodes + number of ACT nodes with 0 if there is no ACT. 3. Calculate the component and the index within that component according to figure 5.18 on the next page 4. Cross the component of both individuals by choosing a random node of the same component of the second individual After selecting the crossover point of the first individual, the crossover point of the second individual is drawn accordingly from the component of the first crossover point. To enable the parents to produce more than one offspring, the algorithm copies them before crossover. Only the component of the index is changed. Every other component of the rule is taken from the first individual to preserve their validity. August 28, 2016 53 5 CepGP – The Genetic Programming Algorithm Figure 5.18: Deducting the component from the crossover point; the crossover point of the first individual is either the window, the ectIndex or the actIndex Crossing a Window If the component is the window, the algorithm needs to ensure the validity of the crossover result. This implies the window value range of each window type and, if wanted, the window attributes of each type. In CepGP, there are two cases to consider: Equal types of the windows to be crossed. In this case the algorithm keeps the common type for the offspring and takes the value of the second individual. If the window has additional information like the time unit, it is taken from the second individual to ensure that the value is still within boundaries of the type. Unequal types of the windows to be crossed. Now, the type of the window of the offspring is the one from the first individual. The value shall be acquired from the second individual. However, different types have different boundaries and ranges. So, even if the value is within the boundaries, it can have a different meaning when the ranges differ. To enable this crossover and to not loose information, a conversion from one range into the other is needed. CepGP takes the position of the value within the range of the original type and translates it linearly to the rounded equivalent position of the target type. 
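This linear translation between the two value ranges might be sketched as follows; the boundary values of both types are assumed to come from the preparation phase:

    class WindowCrossover {
        // translate a value from the range [fromMin, fromMax] of the original window type
        // to the equivalent (rounded) position in the range [toMin, toMax] of the target type
        static long convertValue(long value, long fromMin, long fromMax, long toMin, long toMax) {
            double relativePosition = (double) (value - fromMin) / (fromMax - fromMin);
            return Math.round(toMin + relativePosition * (toMax - toMin));
        }
    }

For example, a time value lying a quarter of the way into the time range is mapped to the length value a quarter of the way into the length range, rounded to a whole number of events.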
Even though, the position in both ranges is equal, it still might hold different results since the meaning of the type is different. Depending on the temporal distribution of the events in the event stream, a time window with a value equivalent to the amount of time a certain amount of events are encountered, yields a different result than a length window with the certain amount of events as value. Assuming an event stream where events arrive irregularly in time and the average time for five events to arrive is 3min. Applying a length window of value five is bound to yield other results then a time window of value three and time unit minutes. That is not to say that the 3min time interval is equivalent in its position in the range of possible time intervals to the length five in the range of possible length windows. 54 Norman Offel 5.4 Evolutionary Operators Both cases are equally probable and the resulting offspring of this kind of crossover has the ECT and ACT of the first selected individual and the newly created window. Crossing an ECT The ECT consists of event types as leafs and event operators. Since the ECT is a tree, the basic algorithm CepGP follows is the subtree crossover algorithm. In general, it chooses a random node of the ECT of the first individual and replaces it and its underlying nodes with a random subtree of the second individual. Figure 5.19 on the following page illustrates the process. If two ECTs are to be crossed, first, the crossover points within the ECTs of both individuals have to be chosen. The aforementioned calculation of the crossover point within the whole rule allows to trace the crossover point of the ECT of the first individual, meaning the index of the node within the ECT and in this example the event with type B. The algorithm now selects a random node from the ECT of the second individual to be the crossover point of this individual. In this example, the index is 0 which represents the root of the ECT, the →-operator. The algorithm now combines both ECTs by replacing the subtree under the crossover point (B) of the first individual with the subtree under the crossover point of the second individual (the tree with the →-operator as its root), both highlighted in violet boxes. The algorithm uses copies of the first and second individuals selected to not modify the originals and allowing them to be part of multiple crossovers unaltered. This version of subtree crossover discards the rest and proceeds with the general crossover algorithm as explained in figure 5.16 on page 52. However, it is generally possible to produce a second offspring the other way around: Replacing the subtree under the crossover point of the second individual with the subtree under the crossover point of the first individual. This way, only half the iterations for crossovers are necessary since this approach always generates two new individuals. But the chance for a higher diversity within the population is lower than by producing only one individual per crossover operation. If a tree is fit enough to be selected more often than eventually it might happen that the roles between first and second individual are switched in another crossover to produce the other yet discarded offspring. The resulting ECT is always valid and can be evaluated, even if an operator is replaced by an event type. But the rule validity depends on more than just the ECT. ACTs use references to the event types present in the ECT. 
Therefore, if there are changes in the ECT, it is generally necessary to check if the belonging ACT shows inconsistencies by referencing any event types that are not used in the ECT anymore. How CepGP handles this situation is explained in section 5.4.2 on page 60. Crossing an ACT The ACT consists of comparison operators as leafs and logical operators as intermediate nodes. The attributes as operands of the comparison operators are no additional nodes August 28, 2016 55 5 CepGP – The Genetic Programming Algorithm Figure 5.19: Subtree crossover of ECTs; the violet boxes represent the selected subtrees; the numbers are the indices of the nodes within their ECT; the bright yellow number is the crossover point of the individuals (B in the first individual with index 2 and the sequence operator (→) in the second individual with index 0); the subtree under the sequence-operator replaces the subtree at the crossover point of the first individual; the offspring consists of the ∧-operator and the A of the first individual, the second operand of the ∧-operator is the inserted subtree of the second individual; the nodes taken of the first individual are highlighted in orange and the nodes of the second individual are highlighted in red 56 Norman Offel 5.4 Evolutionary Operators within the crossover but part of the comparison operator as one node. The attributes themselves can thus not be chosen as a crossover point and do not have their own index for that matter. As a tree, the ACT uses a very similar subtree crossover algorithm compared to the ECT. A random node of the ACT is chosen to be the crossover point of the first individual and replaced by a random subtree of the ACT of the second individual. However, in contrast to the ECT, the ACT is optional and may not exist for either one. By using the algorithm presented in figure 5.18 on page 54 to deduct the crossover point, the first individual has to have an ACT to cross it. Hence, only the second individual may not have an ACT. In this case, CepGP uses the subtree under the crossover point of the first individual as the new ACT of the offspring. Figure 5.20 on the following page illustrates the case with both individuals having an ACT and figure 5.21 on page 59 shows an example with a second individual without an ACT. In the first figure 5.20 on the following page both examples have an ACT which are valid because it references existing aliases (B0 or A0, for example, reference the event types A and B respectively). Refer to section 5.1.2 on page 39 for a detailed explanation of the structure and references within the ACT. When the algorithm for choosing a crossover point of the first individual selects a node of the ACT, it also calculates the index of the node within the ACT which are displayed in the boxes in the top right corner of the ACT nodes. In this example, the chosen node is the one with index 2 which is the less-than-operator highlighted via a violet box. The attributes as operands belong to the operator and make up one node for the ACT which is why the attributes are also inside that box. The ACT of the second individual consists of a negation-operator and a greater-than-operator with index 1 which is chosen to be the crossover point. When both individuals are crossed at their respective ACT crossover points, the first individual passes down its complete ECT to the offspring and the parts of the ACT that are not part of the selected subtree. In this example, these are the or-operator with index 0 and the equals-operator with index 1. 
The subtree under the crossover point of the second individual then replaces the subtree under the crossover point of the first individual. The resulting ACT of the offspring now consists of the passed down parts from the first individual highlighted in orange and the parts from the second individual highlighted in red. This ACT is still valid because it references events present in the ECT. The second figure 5.21 on page 59 presents an example of a second individual without an ACT. The selected crossover point of the first individual is the same as in the previous example. However, there is no crossover point of the second individual because it does not have the optional ACT component in its rule. CepGP handles this case by choosing the subtree under the crossover point of the first individual as the ACT of the offspring. The example shows the crossover point as a bright yellow number of the less-than-operator and the subtree via a violet box surrounding this operator and its attributes. The offspring now gets this subtree as its ACT. As with the previous example, the first individual also passes down its ECT to the offspring. This way ensures a valid ACT where all used aliases reference an existing event type within the ECT Since the whole ECT is used, there can be no inconsistencies in the subtree of the original ACT. August 28, 2016 57 5 CepGP – The Genetic Programming Algorithm Figure 5.20: Subtree crossover of ACTs, both individuals having an ACT; the indices of the nodes of the ACTs are presented in boxes in the top right corner of the nodes; crossover points have a bright yellow background; the subtrees for crossover are highlighted via a violet box in both individuals; the nodes of the first individual are colored orange and the nodes of the second individual are colored red 58 Norman Offel 5.4 Evolutionary Operators Figure 5.21: Subtree crossover of the ACT, without ACT in second individual; the indices of the nodes of the ACTs are presented in boxes in the top right corner of the nodes; the crossover point has a bright yellow background color at its index; the corresponding subtree is framed with a violet box; the ACT of the offspring equals the selected subtree of the first individual August 28, 2016 59 5 CepGP – The Genetic Programming Algorithm Repairing the ACT The examples of the previous section explaining the crossover of ACTs where chosen to only display valid resulting ACTs to show the principle of crossover in CepGP. Although these examples work out, there are offsprings with inconsistent ACTs which comprise references to aliases of event types that were once there but are no more. Figure 5.22 shows an example when crossing ACTs and an example when crossing ECTs can be seen in figure 5.23 on the facing page. Figure 5.22: Example of a broken ACT after crossover of ACTs; The offspring references a non-existing alias (C0) after the crossover The ACTs of the first and second individuals are valid because they only use aliases to events which are present in their respective ECTs. In figure 5.22 the crossover points are both ACT roots, the less-than-operator of the first individual and the equals-operator of the second individual. According to the description of the ACT crossover algorithm earlier presented, the equals operator replaces the less-than-operator which results in the depicted offspring. Although the original ACTs are valid in respect to their belonging ECT, the result ACT of the offspring is inconsistent with the ECT in this example. 
There is no alias with name C0 within the ECT of the offspring and thus this event attribute can never be resolved. 60 Norman Offel 5.4 Evolutionary Operators Figure 5.23: Example of a broken ACT after crossover of ECTs; if the ECT is to be crossed, the first individual passes down every other component to the offspring including the ACT; after replacing the event type A with alias A0 the ACT shows inconsistencies in the aliases A0 now referring to a non-existent event type in the ECT August 28, 2016 61 5 CepGP – The Genetic Programming Algorithm Figure 5.23 on the preceding page presents an example of an inconsistent ACT after crossover of the ECTs. Again, both individuals, the first and the second, comprise valid ACTs within themselves. However, after the event type A with the alias A0 of the first individual is replaced with the event type C with the alias C0 of the second individual, the event attributes using the alias A0 in the ACT of the offspring are now invalid because this alias cannot be resolved to an actual event. In both cases, the whole rule is invalid. The question now is, how to deal with possible inconsistencies: • Leave the ACT inconsistent and deal with the rule by means of the fitness function. This implies some kind of punishment function or factor and in general a specific handling of faulty individuals. • Discard individuals with inconsistent ACTs, meaning that they will not be added to the new population. • Generate a completely new ACT each time there is an inconsistent one. • Repair the ACT by leaving valid parts as they are and generate new attributes where the existing ones are faulty. CepGP seeks to produce valid individuals only which are easier to handle and allow to generate more meaningful results. The strong coupling between ECT and ACT demands a test of the ACT for validity after each modification of the ECT. If the ACT shows such flaws CepGP repairs the ACT with as few changes to the original as necessary. This allows more individuals to be processed and is more reliable concerning processing time than discarding faulty rules entirely or generating new ACTs every time the original ACT is broken. The ECT, on the other side, is not dependent on the ACT and does not need to be repaired. Coming back to the example in figure 5.22 on page 60, the inconsistency lies within the alias C0 in the ACT which needs to be repaired. CepGP detects these inconsistencies by calculating the disjunctive set of the used aliases within the ACT and the available aliases present in the ECT. If there are aliases in the first but not in the second set then there is an inconsistency. However, it is entirely possible and valid to have aliases in the ECT which are not used in the ACT. It is actually common to not use all of the available aliases. The only alias used in the ACT of the offspring is C0 and the available aliases from the ECT are A0 and B0. Since C0 ∈ / {A0, B0}, there is an inconsistency. CepGP now searches every attribute comparison operator which uses a broken alias. This general repairing algorithm is outlined in figure 5.24 on the facing page. Repairing a comparison operator in CepGP follows the algorithm depicted in figure 5.25 on page 64. If the first operand uses one of the aliases previously identified as invalid according to the algorithm from figure 5.24 on the facing page, then a new random and valid attribute is generated. 
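The detection step that precedes this repair, comparing the aliases used in the ACT with the aliases available from the ECT, might be sketched like this (illustrative names only):

    import java.util.HashSet;
    import java.util.Set;

    class ActRepair {
        // usedAliases: aliases referenced by event attributes in the ACT, e.g. {"A0", "C0"}
        // availableAliases: aliases of the event types present in the ECT, e.g. {"A0", "B0"}
        static Set<String> brokenAliases(Set<String> usedAliases, Set<String> availableAliases) {
            Set<String> broken = new HashSet<>(usedAliases);
            broken.removeAll(availableAliases); // aliases used in the ACT but missing in the ECT
            return broken;                      // non-empty means the ACT is inconsistent and must be repaired
        }
    }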
CepGP generates this operand the same way it generated the first operand of an operator when an ACT is build (see section 5.3.3 on page 46). Now, CepGP checks whether the second attribute is a constant or another event attribute which uses one of the broken aliases. If it is neither then it is a valid attribute and can be left as is. In the first 62 Norman Offel 5.4 Evolutionary Operators Figure 5.24: General algorithm to repair an ACT case, CepGP additionally checks whether the value of the constant is still within the value range of the first attribute which might have changed. If it is not, then a new constant value is calculated in the same way a constant value is selected in the ACT creation process (see section 5.3.3 on page 46). The constant value is updated to ensure a meaningful rule which might be able to fire. Constant values outside of the range of the first attribute may lead to conditions in the ACT that prevents the whole rule from firing. If the second attribute is an event attribute which uses a broken alias, independently of whether the first attribute also does, CepGP creates another random second operand as explained in the ACT creation section ( 5.3.3 on page 46). This repairing algorithm presents a minimal intrusive way that changes as little as necessary to enable all crossover results, whether they are already valid or not, to participate in the optimization process without turning them into other individuals entirely. 5.4.3 Mutation After the next generation is constructed via the crossover process depicted in figure 5.16 on page 52, CepGP randomly selects individuals out of the next generation which undergo a minor alteration process, called mutation. The general mutation process is illustrated in figure 5.26 on the next page. August 28, 2016 63 5 CepGP – The Genetic Programming Algorithm Figure 5.25: Algorithm to repair the broken aliases within a comparison operator of an ACT Figure 5.26: General mutation algorithm 64 Norman Offel 5.4 Evolutionary Operators The algorithm visits each individual of the next generation and decides at random by a given mutation rate whether the currently visited individual gets mutated. There are several mutation algorithms which are suitable to tree representations which are presented in chapter 4 on page 25. CepGP follows the original idea of mutations as very minor changes to an individual and thus uses the point-tree-mutation. In this kind of mutation, the algorithm chooses one random node of a tree and changes this node only, without altering the subtree or parent nodes. Since CepGP always seeks to ensure type-safe and valid trees, this mutation algorithm can lead to inconsistencies and thus may need repairing steps which are not displayed in the figure of the general mutation algorithm. CepGP chooses the node to mutate in the same way it chooses crossover points (see 5.4.2 on page 51). In general, mutation is the only mechanism in CepGP to introduce entirely new or lost information into the next generation. This helps the process of exploration of the problem space, in this case the exploration of all possible rules with the given information from the preparation phase. Each component uses mutation similarly to alter individuals and add more diversity to the population in their respective components. Mutating a Window When the window of a rule is chosen to be mutated, CepGP creates either a random new length or time window for the rule which are equally probable. With this, CepGP incorporates new windows into the population. 
Otherwise, only the windows within the initial population can be exchanged during the evolutions from generation to generation. Since the window determines the events on which a rule is applied to, it can play an important part in the overall fitness of the rule, apart from the actual window fitness. Thus, the mutation rate should not be too low to enable more windows to be evaluated but in the spirit of general evolutionary algorithms, the mutation rate should also not be too high to not disrupt the inherent optimization within the process. Mutating an ECT Since the ECT is a tree itself, the same point-tree-mutation is applied to it. The mutation point within the ECT is determined in the same way as the crossover point. During the mutation the node that corresponds to the mutation point is replaced by either a random event type , where the original subtree under the replaced node is discarded, or a random event operator , where CepGP uses the children of the original node as children of the new one if present. Otherwise it generates a new random event type for every additionally needed child. Another approach can generate new subtrees as children with the grow or full method and a maximum depth that is equivalent to the initial maximum depth. But since CepGP interprets mutation as a minor change to the original, it uses only event types as new children. August 28, 2016 65 5 CepGP – The Genetic Programming Algorithm Every other component of the rule or node of the ECT is left unaltered. However, as with every change in the ECT, CepGP needs to investigate the ACT for inconsistencies in the used aliases and whether their references still exist. If it encounters inconsistencies, CepGP applies the repairing algorithm presented in 5.4.2 on page 60. Figure 5.27 shows an example of a mutation of the ECT. The chosen index within the ECT is 0 which represents the root node, the ∧-operator. The mutation algorithm now generates a random ECT-node, either another ECT-operator or an event type. In this case, it is a ∨-operator which replaces the chosen ∧-node in the original while preserving the original operands of the tree node, the event types A and B. If an event type was to replace the original ∧-node, then the operands would have been discarded since an event node is always a leaf node. If an event type was to be replaced by an operator for example, missing operands would have been generated as random event types only. Operators are not generated as new operands to keep the mutation as a minor change. Figure 5.27: Example Mutation of the ECT; The selected ECT node is the root with index 0 (highlighted in brigh yellow) represented by a ∧-operator (colored in red); the randomly generated node to replace th chosen node is a ∨-operator highlighted in blue; the mutated individual consists of all the information from the original, but the chosen node was replaced by the newly generated one while preserving the original subtree (here the nodes A and B) Mutating an ACT The mutation of the ACT basically follows the point-tree-mutation as described in the ECT but extends the process due to its higher complexity and constraints. The first difference exists in the presence of an ACT since it is optional. If there is no ACT in the individual at hand, CepGP adds an additional artificial index to the mutation point calculation. 
If the result of the calculation equals this artificial node, CepGP creates a random comparison operator with random but valid attributes as described in the Attribute Condition Tree creation section 5.3.3 on page 46 and makes it the new Attribute Condition Tree root of the individual where there was none before. 66 Norman Offel 5.5 Fitness Calculation But even if there is an ACT in the individual, CepGP differs between logical and comparison operators as mutation points. If the mutation point represents a logical operator, CepGP works the same as with the ECT and replaces this node by another random logical operator and uses the original children as the respective children of the new operator or CepGP replaces the original logical operator with a new random comparison operator. Because comparison operators are not only leafs of the ACTs but also small trees with a height of 1 themselves, CepGP uses another round of point-tree-mutation for mutating the comparison operator. First, CepGP selects a random node within the comparison operator as the actual mutation point, which can either be the operation itself or one of the attributes. Either of these nodes is then replaced by a new random node of the same type (operator or attribute) by still producing valid ACTs as a result. The comparison operators are all binary. Hence, they have the same amount of operands what obviates the need to produce potentially needed additional attributes. Concerning the attributes, CepGP distinguishes between the first and the second operand when building new ACTs (see 5.3.3 on page 46) and also during mutation for the same reasons. The first operand is never a constant value but only an event attribute to prevent meaningless conditions in the ACT which means mutating the first operand of a comparison operator also produces another event attribute. The second operand can be both, an event attribute or a constant value which is dependent on the first operand and its value boundaries. Through this rather complex mutation algorithm of the ACT, CepGP ensures type-safety and meaningful conditions. It is the only way to introduce new or lost attributes into the ACTs of a population. Therefore, the mutation rate should be adjusted according to the number and variety of attributes in the inspected event stream to allow a decent exploration of the problem space during the optimization process. Figure 5.28 on the following page shows an example of a mutation of a comparison operator. This figure displays the second stage of the comparison operator mutation, where the index of the components of this operator is shown and either the comparison operator itself or one of its operands are chosen to mutate. In this case the highlighted first operand A0.a0 is to be mutated. A new valid attribute is generated. The A0.a1 is another attribute of the same alias A0 and replaces the original attribute in the mutated individual. 5.5 Fitness Calculation Quantifying the quality or fitness of an individual is one of the most important and deciding tasks in Evolutionary Computation. The fitness determines whether an individual is selected and, thus, able to pass over its own information in crossover to the next generation to ultimately participate in the search of the optimal solution. The fittest individual is the one closest to the optimal solution. Because the fitness is of such importance and a core part of the optimization algorithm, CepGP uses a more sophisticated way of determining this value. 
Fitness in CepGP is a blended value build from three different values, each quantifying the fitness of a part of the rule and each of different importance to the overall fitness. Hence, It is the weighted sum out of the condition fitness, the window fitness and complexity fitness, August 28, 2016 67 5 CepGP – The Genetic Programming Algorithm Figure 5.28: Example Mutation of the ACT; Here the comparison operator was chosen to be mutated and the second round of point mutation is shown; within the comparison operator tree, the first operand A0.a0 (highlighted in red) is to be mutated and replaced by a newly generated valid attribute A0.a1 (colored blue) in the mutated individual 68 Norman Offel 5.5 Fitness Calculation whereas the condition fitness is by far the most important measure, followed by the window fitness. The complexity fitness has a minor impact to distinguish rules which otherwise would have equal fitness. There are more ways to achieve this so-called multi-objective optimization which remain for investigation in future research.([39] p. 75ff.) The following sections describe the fitness functions to quantify the fitness values and how they are blended together to form the total fitness of an individual. 5.5.1 Condition The quality of the rule is based on the so-called binary classification, meaning that the events in the event stream either fulfill the requirements of the rule or they do not. The goal for each rule is to classify the marked complex event, meaning the special event type that marks the happenings within the stream, as the only event that fulfills the requirements of the rule into one class. The other class consists of the other events that are not the marked complex event. This is the most important objective of the optimization by far and it depends on the condition part of the rule and the window which decides which events are considered while computing the condition. CepGP analyzes this classification by removing the marked complex event from the stream for the rule evaluations and remembering the indices of the original positions. The result of the rule on the altered event stream is the indices of the positions where it fired. The rule is expected to insert complex events only at the places where the original marked complex event is in the original event stream. Thus, CepGP compares the resulting indices of the rule with the indices of the marked complex event in the original event stream. In the so-called Receiver Operating Characteristics (ROC) analysis, there are four basic measures that can be derived from the classification result: True positives (TP) is the number of times the rule fired at the right positions. True negatives (TN) is the number of times the rule does not fire when it is not supposed to fire. False positives (FP) is the number of times the rule fires when it is not supposed to fire. False negatives (FN) is the number of times the rule does not fire when it should fire. Just like TP, TN are correct classifications and oppose FP and FN which are false classifications. There are some additional information about these numbers: 1. The four numbers add up to the total number of events in the stream: T P + T N + F P + F N = #events 2. There can only be as many TP or FN as there are instances of the marked complex events and their sum is equal to the number of marked complex events (CE): T P + F N = #CE August 28, 2016 69 5 CepGP – The Genetic Programming Algorithm 3. 
Conversely, TN and FP add up to the number of events that are not the marked complex event, which can also be inferred from 1. and 2.: TN + FP = #events − #CE
4. By comparing the resulting indices of the rule with the original indices of the marked complex event, CepGP can infer TP, FP and FN directly (see figure 5.29):
• TP are the correct indices
• FP are the indices that are in the result of the rule but not in the set of indices of the marked complex event
• FN are the indices that are in the set of indices of the marked complex event but not in the result of the rule
TN can be calculated from 1. as: TN = #events − TP − FP − FN
Figure 5.29: Relation between TP, FP, FN and TN; the outer circle represents the total number of events, the yellow circle the indices resulting from the rule and the red circle the original indices of the marked complex event; any index in neither the yellow nor the red circle is a member of TN; any index in both the yellow and the red circle is a member of TP; any index in the original indices but not in the rule result is an FN; any index in the rule result but not in the original indices is an FP
Since there usually are only a few marked complex events in the event stream compared to the overall number of events, the difference #events − #CE is extremely high. Therefore, it is important to value both measures, the number of times the rule correctly fires at positions where there is a marked complex event (TP) and the number of times it correctly does not fire at places where there is no marked complex event (TN), equally and without biasing towards one of them. Otherwise, CepGP would value rules that cover all of the right positions but fire far too often higher than rules which produce far fewer FP but do not cover all of the right positions.
There are many measures based on the presented TP, TN, FP and FN, but not all of them are unbiased and therefore ideal for CepGP. The most prominent measures are (see [41]):
• The True Positive Rate (TPR), also called Recall or Sensitivity, is the quotient of the identified right places and the overall number of marked complex events: TPR = TP / (TP + FN)
• The Precision, also called Confidence, determines the part of the identified positives that are correct: Precision = TP / (TP + FP)
• The True Negative Rate (TNR), also called Inverse Recall or Specificity, is the number of correctly identified negatives in relation to the total number of events which are not marked complex events: TNR = TN / (TN + FP)
• The False Positive Rate (FPR), also called Fallout, is the rate of events that got mistakenly classified as positive: FPR = FP / (FP + TN)
Although these measures are widely used, they focus either on the positive or on the negative classification while ignoring the other. This bias is disadvantageous because the quality of the outcome of algorithms using these measures depends on the sizes of the classes.[41] However, there are other measures that combine both classifications (see [41]):
• The Accuracy compares the number of correctly classified events with the overall number of events. In contrast to the aforementioned measures, the accuracy takes both positives and negatives into account, but it is sensitive to bias and to the prevalence of one of them. This is problematic since the negatives are usually prevalent in the event stream.
Accuracy = (TP + TN) / (TP + FP + TN + FN)
• The F1-score does not take TN into account, although TN is in most cases prevalent over the other three values for fit individuals and small for unfit ones. Since it shows such an important characteristic, TN should not be ignored in CepGP. F1 = TP / (TP + (FP + FN)/2)
• Jaccard, also called Tanimoto, is biased because TN is ignored as well. Jaccard = TP / (TP + FP + FN) = F1 / (2 − F1)
• The Weighted Relative Accuracy (WRacc) is an unbiased measure and therefore suitable for CepGP. Its function value range is [−1, 1], where individuals without a single correct classification get a condition fitness of −1, whereas average individuals have a fitness of 0 and individuals which classify every event correctly deserve a fitness of 1. The following equation ignores the optional weight. WRacc = TPR − FPR = TP / (TP + FN) − FP / (FP + TN)
CepGP uses the unbiased Informedness[41] measure to blend the four values for True Positives, True Negatives, False Positives and False Negatives together into the interval [−1, 1], where −1 means that the rule categorized each event in the event stream falsely, either into the positive or the negative class. A value of 0 equals the fitness of the average random rule and a value of 1 equals the optimal result where each event in the event stream is put into the correct class by the rule. In the initial population, the average informedness is expected to be around 0. In the generations to come, the average informedness is expected to rise because the fitter individuals prevail and contribute to new individuals while supplanting unfit individuals more and more, leading to even fitter individuals overall. The minimum fitness of a population cannot be predicted because it is often very easy to create individuals which miss the optimum by far, through both crossover and mutation.
a(x) = Informedness = TPR + TNR − 1 = TP / (TP + FN) + TN / (TN + FP) − 1
"Informedness quantifies how informed a predictor is for the specified condition, and specifies the probability that a prediction is informed in relation to the condition (versus chance)."[41]
Figure 5.30 on the facing page explains how ROC analysis works. It compares the True Positive Rate (TPR) with the False Positive Rate (FPR). The diagonal represents random individuals, since they are as often correct as they are mistaken about the classification of the events in the stream. Any individual with a higher TPR than FPR is a good one, since it is better than the random individuals. Individuals with a higher FPR than TPR are considered bad because they are worse than the average random individual. The goal for any individual is to maximize the area under the curve.[41] Both WRacc and Informedness are measures that use this principle in their own way.
Figure 5.30: Illustration of ROC Analysis (adapted from [41]); individuals try to maximize their True Positive Rate (TPR) while minimizing their False Positive Rate (FPR) at the same time and in this way maximize the area under the curve, colored in yellow for the good individual; good individuals have a higher TPR than FPR; every individual with a higher FPR than TPR is worse than the average random rule
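As an illustration, the condition fitness could be computed from the firing positions of a rule roughly as follows; the index sets and all names are assumptions for this sketch:

    import java.util.HashSet;
    import java.util.Set;

    class ConditionFitness {
        // ruleIndices: positions where the rule fired on the stream without the marked complex events
        // markedIndices: original positions of the marked complex events
        // totalEvents: number of events in the stream
        static double informedness(Set<Integer> ruleIndices, Set<Integer> markedIndices, int totalEvents) {
            Set<Integer> tpSet = new HashSet<>(ruleIndices);
            tpSet.retainAll(markedIndices);        // fired where it should fire
            int tp = tpSet.size();
            int fp = ruleIndices.size() - tp;      // fired where it should not fire
            int fn = markedIndices.size() - tp;    // missed a marked complex event
            int tn = totalEvents - tp - fp - fn;   // correctly silent everywhere else
            double tpr = tp / (double) (tp + fn);
            double tnr = tn / (double) (tn + fp);
            return tpr + tnr - 1.0;                // a(x) = Informedness in [-1, 1]
        }
    }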
5.5.2 Window
The condition fitness outweighs the window fitness by far even though they are tightly interlinked. A fit window
1. covers as many events as needed for the rule to potentially fire at the right places
2. is as small in size as possible
To put it simply: the window should cover as many events as necessary, but not any more. The quality of a window with respect to the first point is already integrated into the condition fitness. The specific window fitness concerns itself with the second point. A rule has a high window fitness when the window is very small. This quality objective arises from the way Complex Event Processing works. A bigger window means that more events have to be cached to allow the rule engine to investigate them during rule evaluation.
• More events need more memory
• More events need more processing power and time for rule evaluation
That is the reason why the window is also a valuable optimization objective in CepGP. To emphasize really small windows over large ones more strongly, CepGP uses a logarithmic fitness function instead of a linear one, as depicted in figure 5.31.
Figure 5.31: Idea of the window fitness function: b(x) = 1 − log_{1 + max. Size − min. Size}(1 + x − min. Size)
The general idea of the fitness function is of the form 1 − log(x). It calculates a penalty value between 0 and 1 which grows logarithmically with the size of the window and subtracts this value from 1. Doing so yields optimal values for minimally sized windows and the worst fitness values for windows near the maximum size. To optimally use the domain of window values, CepGP stretches the function to be exactly 1 only for the minimally sized window and exactly 0 only for the maximally sized window. This allows CepGP to differentiate best between the fitness of individuals.
This fitness function would work well if the fitnesses of the window types were not interrelated. However, there is a problem in the stretching of the function values over the window value range. Since there are two types of windows, length and time, each with a different value range, it is problematic to calculate the fitness of one of the types without potentially discriminating against the other. For example, there can be a small number of events whose timestamps lead to a wide time window value range. Assume the events are equidistant in time, which means that a length window with value 2 should have the same fitness as a time window whose value is the time distance between two events. Calculating the fitness with the function whose domain is the interval between the minimum and maximum values of the type at hand alone will yield a different result than expected. The time windows which should have the same fitness are less fit than the equivalent length window, because the function values are stretched according to the value ranges of the types in order to map the possible window values to the wanted function values of [0, 1] in an optimal way. This problem is pictured in figure 5.32 on the facing page.
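For a single window type, the stretched logarithmic penalty of figure 5.31 might be implemented as below (a sketch; min and max are the boundaries of the respective type from the preparation phase); the cross-type conversion that resolves the problem just described follows in the text:

    class WindowFitness {
        // b(x) = 1 - log_{1 + max - min}(1 + x - min), stretched so that
        // the minimal window gets fitness 1 and the maximal window gets fitness 0
        static double fitness(double value, double min, double max) {
            return 1.0 - Math.log(1.0 + value - min) / Math.log(1.0 + max - min);
        }
    }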
Figure 5.32: The window problem using timely equidistant events; the fitness of a time window is lower than the fitness of the corresponding length window even though they should have an equal fitness
One solution to this problem is to first linearly convert the value of the window at hand into a common range for all windows. After that, the function can be applied to the common range with the converted window value. This yields the wanted result where windows which are equivalent in meaning are assigned the same fitness value. The common range for all windows in CepGP is always the length window range, because it usually already provides a reasonable range starting at 1 and, in this way, only the time windows need to be converted, which can reduce the number of conversions by half on average. The linear conversion function into the range of the length window type is
l(x) = (x − β) / (ψ − β) · (ω − α) + α
where
• x is the actual window value
• β is the minimal size of the original type
• ψ is the maximal size of the original type
• ω is the maximal size of the length type
• α is the minimal size of the length type.
The window fitness function b applied to the length range [α, ω] with the parameter λ = l(x) is
b(λ) = 1 − log_{1 + ω − α}(1 + λ − α) = 1 − log(1 + λ − α) / log(1 + ω − α)
resulting in figure 5.33.
Figure 5.33: The correct window fitness function which applies the conversion and then calculates the fitness as described; the functions overlap completely, which shows that the fitness values are computed correctly
5.5.3 Rule Complexity
After the condition fitness and the window fitness, the rule complexity fitness is the least important fitness measurement. If there are individuals with almost identical condition and window fitness values, then the rule complexity fitness separates them by quantifying the structure of the rule. It follows the principle "the simpler the better". The action is not considered in CepGP. From the point of view of CepGP, the window is always one node, no matter what type it is or what additional information it may hold, and therefore does not affect the structure or rule complexity. What this fitness rates is the condition tree; thus, it uses meta information about the Event Condition Tree and the Attribute Condition Tree, like height or number of nodes, to calculate the rule complexity fitness value. To blend the height and number of nodes together, CepGP uses the higher tree of either the ACT or the ECT and puts it into relation to the total number of nodes from both trees:
z(x) = (1 + ρ(x)) / τ(x)
where
• ρ is the maximum of the height of the ECT and the height of the ACT of the rule x and
• τ is the sum of the number of nodes from both the ECT and the ACT of the rule x.
The result is within the interval (0, 1]. It is 1 when there is only one event type as the ECT root and no ACT in the rule. It cannot be 0 since the numerator is at least 1. Furthermore, it is always defined because there is always at least one node in the ECT. If the same event type in the ECT or the same alias in the ACT is part of unnecessarily many conditions, then the rule is more complex than a rule with basically the same condition and window fitness. CepGP therefore also considers the distinctiveness of nodes in the ECT and the ACT.
This measurement describes how often the same event type in the ECT or the same alias in the ACT is present in relation to the overall number of nodes within their respective trees and is calculated as
d(x) = υ(x) / σ(x)
where
• υ is the number of distinct event types or aliases in the ECT or ACT of the rule x and
• σ is the overall number of nodes in the ECT or ACT respectively of the rule x.
d(x) produces values in the interval (0, 1] and is applied separately to both the ECT and the ACT. It cannot be 0 for the ECT since there is always at least one event type in the ECT. It is optimal with value 1 if every event type is used just once in the ECT. The same goes for the ACT, with the addition that, if there is no ACT in the rule, CepGP assigns an attribute condition tree distinctiveness value of 1.
The overall rule complexity fitness now consists of three single fitnesses: z(x) and d(x) from the ECT and the ACT. CepGP calculates the rule complexity fitness as the average of those single fitnesses:
c(x) = (z(x) + d_ECT(x) + d_ACT(x)) / 3.
Just like the single fitnesses, the result of this function is within the interval (0, 1]. The simplest rule, consisting of just a single event type in the ECT with no ACT, has a rule complexity fitness of 1. The more nodes and the more repetitions of the same event type or alias, the lower the rule complexity fitness.
5.5.4 Total Fitness
The total fitness is the result of the presented condition fitness, window fitness and complexity fitness. All of them grade one of the optimization objectives of CepGP and need to be combined into one value which defines the fitness of the whole rule compared to other rules. CepGP uses a weighted sum to build the total fitness, since the importance of each objective can easily be determined a priori:
f(x) = α·a(x) + β·b(x) + γ·c(x)
with
• α being the weight for the condition fitness calculated by the function a(x) (see 5.5.1 on page 72)
• β being the weight for the window fitness calculated by the function b(x) (see 5.5.2 on page 76)
• γ being the weight for the complexity fitness calculated by the function c(x) (see 5.5.3 on the preceding page).
The most important partial fitness is the condition fitness. It should be weighted a lot heavier than the other two. In this way, it contributes much more, and CepGP first and foremost will find suitable conditions for the rules. As discussed earlier in section 5.5.2 on page 73, the window is an integral part of the condition fitness. However, the window fitness grades the size of the window, which is also an objective of the optimization, but it should be weighted much lighter than the condition fitness. The rule complexity is a minor part of the overall fitness of the rule and, hence, should also take a minor role in the overall fitness. The weight for the complexity fitness should be chosen so that only rules which are basically identical according to the condition and window fitnesses are affected by that rating. Ideally, CepGP should only consider the other, less important objectives when the rule is very fit regarding its condition. CepGP accomplishes that by introducing a threshold which represents the minimum condition fitness value for the window and complexity fitness to contribute to the total fitness of the rule. CepGP also normalizes the total fitness back to the interval [0, 1] for rules with a fitness value greater than 0 to enable a better interpretation of the results by the user.
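One possible way to combine these three values, including the threshold and the normalization just mentioned, is sketched below; the weights, the threshold and the concrete normalization formula are illustrative assumptions, not values prescribed by the thesis:

    class TotalFitness {
        // a: condition fitness in [-1, 1], b: window fitness in [0, 1], c: complexity fitness in (0, 1]
        static double total(double a, double b, double c,
                            double alpha, double beta, double gamma, double threshold) {
            double f = alpha * a;
            if (a >= threshold) {        // window and complexity only count for sufficiently fit conditions
                f += beta * b + gamma * c;
            }
            if (f > 0) {                 // one possible normalization of positive values back to [0, 1]
                f = f / (alpha + beta + gamma);
            }
            return f;
        }
    }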
The normalization is only done for values greater than 0 because these rules are better than an average random try and rules with a fitness value less than 0 would benefit from the normalization even though they should not. 78 Norman Offel 5.6 Summary 5.6 Summary CepGP is a type-safe Genetic Programming algorithm which needs minimal manual information and a recorded event stream to derive a Complex Event Processing rule which is the closest to producing the wanted marked complex event which represent the happenings. In a preparation phase which pre-processes the historical event stream, meta-information are inferred that help CepGP to identify the problem space. This also allows CepGP to only produce valid CEP rules that are evaluable as well. A tree based representation was chosen for the rules which enables Genetic Programming to use well-known practices for its evolutionary operations crossover and mutation. At each general level, the tree structure also divides the components from another providing the algorithm with means to ensure type-safety. CepGP ignores the action part and focuses on the condition, which is divided into the subtree for event conditions and for attribute conditions, and the window that can take the form of a length window or a time window. During the population initialization, a specific number of rules, called individuals, is produced by • choosing a random but valid window, either of type length or time and with a value within the respective boundaries • creating an Event Condition Tree (ECT) with the ramped half-and-half algorithm and the information gathered during the preparation phase to produce only valid trees • optionally creating an Attribute Condition Tree (ACT) with the grow algorithm and the information available from the preparation phase to attain only valid trees The algorithm proceeds with asserting the fitness of every individual by calculating three sub fitness values for the condition, the window and the structure of the rule. Each sub fitness value is conglomerated into one value while emphasizing the importance of the condition fitness over every other one. The window size is also factored in to distinguish fit rules also according to their expected resource consumption. If rules are almost identically fit considering the condition and the window, then CepGP attempts to favor the simpler rule over the more complex one. After the initialization, CepGP continues by applying the evolutionary operations crossover and mutation to each generation of individuals for a given amount of times. Crossover uses a selection algorithm to draw a random individual from the population while considering its fitness so that the fit individuals prevail while unfit rules become extinct. With a certain chance, CepGP chooses another individual in the same way and mates them both to produce an offspring for the following generation. During the mating process, CepGP selects a random node of the first individual from the set of changeable nodes of the rule tree which are the window as one node and all the nodes of the ECT and the ACT. The crossing point of the second individual is now drawn from the same component as the first crossing point to ensure type-safety and valid individuals as results of crossover operations. August 28, 2016 79 5 CepGP – The Genetic Programming Algorithm After producing the new generation with crossover, CepGP uses elitism that transfers the absolute best of the previous generation to the next generation. 
The remaining number of individuals for the next generation is filled up with the individuals newly formed by crossover, in order of their creation. Then, CepGP applies mutation to the population by inspecting each individual and altering it with a given probability. This alteration is rare compared to the crossover and affects only a minimal part of the rule, in keeping with the original idea of the mutation process. If the resulting individual of either crossover or mutation is not valid, CepGP attempts to repair the rule with as few changes as possible to allow each rule to contribute to the optimization process.

This algorithm's strong points are its type-safety and the combination of Complex Event Processing and Genetic Programming: it encompasses a large share of the features of CEP rules and translates them into the problem domain of Genetic Programming in order to derive an optimal rule which is closest to producing the marked complex event from the given recorded event stream. However, there are still language features of CEP which are not covered in CepGP yet. Table 5.2 compares the language specification of CEP with the supported constructs in CepGP. It remains a future task to introduce the missing features into CepGP. In section 6.5 on page 98 a few ideas are presented which give hints into possible directions for adding arithmetical operations and aggregation functions to the CepGP algorithm.

Construct (Supported?)
Event Condition Components:
  Sequence (→): ✓
  And (∧): ✓
  Or (∨): ✓
  Not (¬): ✓
  Excluding sequence (A → ¬B → C): ✓
Attribute Condition Components:
  Logical operators And: ✓
  Logical operators Or: ✓
  Logical operators Not: ✓
  Referencing, Alias: ✓
  Referencing, Access-operator (.): ✓
  Arithmetical operations +, −, /, ∗: ×
  Comparison operators <, >, =: ✓
  Comparison operators ≤, ≥, ≠: ✓
  Aggregation function sum: ×
  Aggregation function avg: ×
  Aggregation functions min, max: ×
Table 5.2: Support of Language Constructs in CepGP

6 Implementation

To verify the conceptual algorithm presented in chapter 5 on page 35, CepGP was implemented in Java during this thesis as a proof of concept. The goal was to implement the system independently of any existing platform or framework, with as few dependencies as possible, to prove the potential of CepGP. This chapter begins with the requirements emerging from the concept, proceeds with the input and output specification, and describes the programmed rule engine and the implementation of CepGP. Afterwards, the parameters are presented and the limitations of the implementation are described. The chapter concludes with a summary.

6.1 Requirements

For the sake of completeness and to verify the approach presented before, the implementation should use the process depicted in figure 5.1 on page 35:
• Reading and parsing the events recorded in a file (preparation phase)
• Building the initial population according to the presented algorithm
• Using elitism and tournament selection during evolution
• Implementing the crossover and mutation operations as described

Additionally, the fitness functions need to be implemented to quantify the fitnesses for the three objectives condition, window and complexity. A very important property of the algorithm is the type-safety of the operations on the components of the individuals, which the implementation needs to take into account as well; the sketch below illustrates one way such type constraints can be expressed directly in Java.
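The following sketch is only an illustration of the idea and does not claim to match the class design shown later in figure 6.2: by giving every rule component its own Java type, only compatible nodes can be combined, so a large class of invalid rules is already ruled out by the compiler.

// Illustrative sketch of type-safe rule components; all names are hypothetical.
interface EctNode { }   // nodes of the Event Condition Tree
interface ActNode { }   // nodes of the Attribute Condition Tree

final class EventTypeLeaf implements EctNode {
    final String type;   // e.g. "A"
    final String alias;  // e.g. "A0"
    EventTypeLeaf(String type, String alias) { this.type = type; this.alias = alias; }
}

final class SequenceNode implements EctNode {
    final EctNode left, right;   // an ECT operator can only take ECT nodes as operands
    SequenceNode(EctNode left, EctNode right) { this.left = left; this.right = right; }
}

final class ComparisonNode implements ActNode {
    final String leftOperand, operator, rightOperand;   // e.g. "A0.ID", "=", "E0.ID"
    ComparisonNode(String l, String op, String r) {
        this.leftOperand = l; this.operator = op; this.rightOperand = r;
    }
}

final class WindowSpec {
    enum Kind { LENGTH, TIME }
    final Kind kind; final long value;   // the window is treated as a single node
    WindowSpec(Kind kind, long value) { this.kind = kind; this.value = value; }
}

final class RuleSketch {
    final EctNode ect;        // mandatory event condition
    final ActNode act;        // optional attribute condition, may be null
    final WindowSpec window;
    RuleSketch(EctNode ect, ActNode act, WindowSpec window) {
        this.ect = ect; this.act = act; this.window = window;
    }
}

With such a representation, an evolutionary operator that accidentally tried to insert an ACT subtree into the ECT would not even compile, which mirrors the constraint that crossover points of both parents must come from the same component.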
The most important goal of the implementation is to verify the algorithm, identify strengths and drawbacks, and enable a fond evaluation of the approach by providing flexibility in the choice of the algorithms used for selection, crossover, mutation, population building and so on. It also should draw strength from Genetic Programming to allow promising results with only a few manual set options, in this case only the input file and the complex event for which the program shall derive a rule. 82 6.2 Input and Output Specification 6.2 Input and Output Specification The algorithm starts by reading and parsing the events from the given input file which includes the complex event for which the algorithm shall derive a rule. During the search process, the user is supplied with outputs to indicate the status of it. At the end of the program run, outputs are generated to provide further insight into the work done by the program. 6.2.1 Input Events in the input file are organized in one line each and have the following structure: <yyyy-MM-dd HH:mm:ss>; <event type>; [attributes in the form <name: value> and each attribute separated by semicolon] • <yyyy-MM-dd HH:mm:ss>: An event was recorded on a day and time. • <event type>: An event always has a type. • [attributes in the form <name: value> and each attribute separated by semicolon]: Optionally, there are also attributes (with a name mapped to a numerical value) separated by a semicolon. The name and the value of an attribute are separated by a colon. However, each event of the same type has to have the exact same number of attributes with the same names. But the order of the attributes does not matter. An example of an event recorded at 10:55:30 AM on the 29th of June 2016, of type “B” with the attributes: “b2” with value -7.0, “b3” with value -74.0, “ID” with value 2.0 and “b1” with value 19.0: 2016-06-29 10:55:30; B; b2: -7.0; b3: -74.0; ID: 2.0; b1: 19.0 As mentioned before, the marked or special complex events which are the targets of the rule only need a unique, but among these special events common, type name without any other information like date and time or attributes. This enables a simpler usage of the program and the domain expert has an easier time when she needs to add these events manually into the captured stream. August 28, 2016 83 6 Implementation 6.2.2 Output The program generates two output files and an output on the prompt: • Prompt output: Before the actual execution of the algorithm, it will show a summary of the parameter settings it uses for this run. During the run, the program will display the generation number that is currently processed. After the run, the program will display the best found rule in the prompt including the overall fitness and the partial fitnesses of condition, window, and complexity. • generations <date and time of file creation>: It contains the population of every generation of the evolutions of the Genetic Programming algorithm. The number of the generation is in one line and the following lines represent the individuals of the generation in descending order of their overall fitness. Each individual also lists the partial fitnesses for condition, window and complexity plus the rule representation as described in section 2.2.2 on page 13. 
An excerpt of an example file looks like this (generation 0 indicates the initial population): 0 0.78949 (condition: 0.80000 window: 0.68731 complexity: ((A as A0 → A as A1) ∧ ((A0.a2 = A1.ID) ∨ (A1.a1 > A0.a2)))[win:time:986Seconds] =⇒ HIT C C 0.50000) 0.67870 (condition: 0.68889 window: 0.57745 complexity: 0.61111) ((A as A0 → A as A1) ∧ (A0.ID > 1.0))[win:length:7] =⇒ HIT ... C • GPA GENERATIONS <date and time of file creation>: This file contains a summary of all the generations evolved for the specific run of the program. It provides information about the best and worst individual, plus average and mean condition fitness of the generation. Generation: 0 best individual: 0.78949 (condition: 0.80000 window: 0.68731 complexity: 0.50000) ((A as A0 → A as A1) ∧ ((A0.a2 = A1.ID) ∨ (A1.a1 > A0.a2)))[win:time:986Seconds] =⇒ HIT C C C worst individual: -0.53000 (condition: -0.53000 window: 0.10734 complexity: 0.50000) ((B as B0 ∨ A as A0) ∧ ((A0.ID > 1.75) ∧ (A0.a1 < -49.25)))[win:length:61] =⇒ HIT mean conditionFitness: 0.00000, avg conditionFitness: 0.02029 ... C 84 Norman Offel 6.3 The Rule Engine 6.3 The Rule Engine To enable an independent implementation, this work uses its own rule engine to evaluate the condition of the rules by also building on top of the tree representation of the individual for the Genetic Programming algorithm. This provides the use of the same representation in the rule engine and the Genetic Programming implementation which obviates the need for a conversion between genotype (rule representation in Genetic Programming) and phenotype (rule representation in the rule engine). The other approach would have been to build a conversion between the genotype and the phenotype which converts the rule to suit the representation for the external CEP-system or vice-versa into the representation used in the Genetic Programming algorithm. The result from the evaluation of the rule on the training data via the CEP-system would need to be converted as the tuple (TP, TN, FP, FN) as a feedback to the Genetic Programming implementation so the Genetic Programming implementation can calculate the condition fitness. The chosen approach is superior in the sense that it does not need this conversion procedure at all, whereas the other approach would need it for every rule evaluation on the test data. The use of an external CEPsystem, in most cases, would also need reading and parsing of the training data for every rule evaluation as well which is another benefit of the chosen approach where it is done once in the preparation phase. However, the rule engine still needs to be implemented, too. It needs to be robust, so that every possible rule can be evaluated, no matter its complexity. Every edge-case needs to be carefully considered and handled because especially with randomly generated and probabilistically combined rules, there is a high chance that the outcome may contain confusing conditions for the human eye that also might trouble the rule engine. Since the genotype and the phenotype are the same, the same tree can be passed from the Genetic Program to the rule engine for determining the hits and misses and the result can be fed back to the Genetic Program to calculate the fitness. The general process is illustrated in figure 6.1 on the next page. As already mentioned in section 5.2 on page 43, the preparation phase derives metainformation needed for the evolutionary process, but also the original indices of the special event for which the algorithm shall find a rule. 
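Since reading and parsing the input file happens exactly once, in this preparation phase, the following minimal sketch shows how a single event line in the format of section 6.2.1 could be parsed. The Event class and all names here are hypothetical stand-ins and not the actual EventDataParser of the implementation.

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of parsing one line of the input format from section 6.2.1.
final class EventLineParser {

    static final class Event {
        final LocalDateTime timestamp;
        final String type;
        final Map<String, Double> attributes;
        Event(LocalDateTime timestamp, String type, Map<String, Double> attributes) {
            this.timestamp = timestamp; this.type = type; this.attributes = attributes;
        }
    }

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static Event parse(String line) {
        String[] parts = line.split(";");
        LocalDateTime timestamp = LocalDateTime.parse(parts[0].trim(), FORMAT);
        String type = parts[1].trim();
        Map<String, Double> attributes = new LinkedHashMap<>();
        for (int i = 2; i < parts.length; i++) {
            String[] nameValue = parts[i].split(":", 2);   // e.g. "b2: -7.0"
            attributes.put(nameValue[0].trim(), Double.parseDouble(nameValue[1].trim()));
        }
        return new Event(timestamp, type, attributes);
    }

    public static void main(String[] args) {
        Event e = parse("2016-06-29 10:55:30; B; b2: -7.0; b3: -74.0; ID: 2.0; b1: 19.0");
        System.out.println(e.type + " " + e.attributes);   // B {b2=-7.0, b3=-74.0, ID=2.0, b1=19.0}
    }
}

A real parser additionally has to accept the manually inserted special events, which, as described in section 6.2.1, only carry a type name without date, time or attributes.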
Furthermore, the preparation phase prepares a version of the event stream from the input without the special event. This version is used to evaluate the individuals of the Genetic Programming algorithm. The Genetic Programming algorithm creates new rules during the evolutionary process. To evaluate the condition fitness of the rule, CepGP hands this rule over to the rule engine. The rule engine executes the rule on the training data (the event stream without the special event) as follows: 1. Use the events from the input file (read and parsed once during the preparation phase) but without the special event for which the algorithm shall find a rule 2. Start with the first event from the input and execute the rule August 28, 2016 85 6 Implementation Figure 6.1: Process of the Rule Evaluation; The first step is the extraction of the indices of the special event and the event stream without this special event during the preparation phase; The Genetic Programming algorithm creates new individuals during the evolutions and to evaluate the condition fitness, it passes the rule to the rule engine; the rule engine executes the rule on the event stream which does not contain the special event and remembers the indices where the rule would have inserted a complex event; after the execution of the rule, the rule engine compares the indices as the result of the rule execution with the original indices of the special event; The outcome of this comparison is the tuple of the four key numbers for True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) which are handed over to the Genetic Programming algorithm again; The Genetic Programming algorithm uses the condition fitness function to calculate the condition fitness out of these four numbers 86 Norman Offel 6.3 The Rule Engine 3. Iteratively, add the next events in order, one after the other, and execute the rule every time the next event is added after pruning older events according to the window of the rule. 4. Remember the indices of the places after the event that leads to the firing of the rule: index = 1 + index of event + #already encountered positives Or in other words: remember the places where the rule would have inserted a complex event. 5. After the last event from the input was added and the rule was executed, compare the remembered indices of this rule with the original indices of the special event in the input and compute the True Positives, True Negatives, False Positives and False Negatives. 6. Hand over these four key numbers to CepGP, the implementation of the Genetic Programming algorithm Afterwards, CepGP calculates the condition fitness from these four key numbers according to the condition fitness function used in CepGP (Informedness, see section 5.5.1 on page 69). The rule engine has to implement and support the language specification presented in section 2.2.2 on page 13 to determine when a rule fires. Table 6.1 on the next page compares this language specification with the implemented constructs. The most difficult part of the rule engine is the close coupling of the Event Condition Tree (ECT) and the Attribute Condition Tree (ACT). During the execution of a rule, the ECT is applied to the events within the window first. The ACT, if present, is only applied if the ECT fires. To enable the ACT to compute its result, the aliases need to be build to offer references to actual event instances with attribute values that are accessed and compared in the ACT. 
These aliases are built during the application of the ECT to the events. A map, with the alias as key and the object reference to the event instance as value, is passed down with each recursive invocation of the operands of each operator in the ECT. If the current node is an event of the expected type according to the rule, it adds itself under the correct alias to the map. After a successful execution of the ECT, this map is handed over to the ACT, which is also executed recursively. If the ACT evaluates successfully, too, then the rule fires and the rule engine saves this occurrence as an index, as previously described in figure 6.1 on the facing page. With this approach, the sequence (→) and and (∧) operators can be implemented as expected because, if these ECT operators fire, the aliases are definitely updated and refer to an event instance in the map. The or (∨) operator, however, can fire even if not every operand evaluates to true. This means that not every alias is set and, thus, the ACT may not be evaluable.

Construct (Supported?)
Event Condition Components:
  Sequence (→): ✓
  And (∧): ✓
  Or (∨): ✓
  Not (¬): ×
  Excluding sequence (A → ¬B → C): ×
Attribute Condition Components:
  Logical operators And: ✓
  Logical operators Or: ✓
  Logical operators Not: ✓
  Referencing, Alias: ✓
  Referencing, Access-operator (.): ✓
  Arithmetical operations +, −, /, ∗: ×
  Comparison operators <, >, =: ✓
  Comparison operators ≤, ≥, ≠: ×
  Aggregation function sum: ×
  Aggregation function avg: ×
  Aggregation functions min, max: ×
Table 6.1: Support of the language constructs in the rule engine

The rule engine remedies this problem by following these steps in the evaluation of the ACT comparison operators (<, >, and =; potentially also ≤, ≥, and ≠):

• Evaluate normally when both operands are dereferencable, meaning they are either an actual event attribute or a constant value.
• Ignore when one operand is not dereferencable and the other is also not dereferencable or a constant. This means that the outcome of the ACT evaluation does not depend on the result of this ACT operator.
• Evaluate to false when one operand is dereferencable and the other is not, because this comparison cannot be evaluated. Since one operand is dereferencable, the user would expect the evaluation of the ACT to depend on this attribute; but when the other operand of the operation cannot be determined, the outcome has to be false.

1st operand \ 2nd operand | Dereferencable | Not dereferencable | Constant
Dereferencable | Evaluate normally | False | Evaluate normally
Not dereferencable | False | Ignore | Ignore
Table 6.2: Decision Matrix for ACT-Comparison-Operator Evaluation; only the second operand can ever be a constant value according to the presented algorithm

Table 6.2 summarizes the decision matrix the rule engine uses. The consequences of ignoring the evaluation of a comparison operator in the ACT depend on the intention of the logical ACT operator.

• The ∧-operator returns true as long as there is no operand evaluating to false. Comparison operators that evaluated to ignore are therefore treated like true results when compared against a non-ignore (true or false) input value (see table 6.3).

Operand 1 | Operand 2 | Output
True | True | True
True | False | False
True | Ignore | True
False | True | False
False | False | False
False | Ignore | False
Ignore | True | True
Ignore | False | False
Ignore | Ignore | Ignore
Table 6.3: Decision Matrix for ∧-Operator

• The ∨-operator returns an ignore if both operands also return an ignore.
Otherwise, it evaluates to true as long as at least one operand is true, and to false if both operands are false (see table 6.4).

Operand 1 | Operand 2 | Output
True | True | True
True | False | True
True | Ignore | True
False | True | True
False | False | False
False | Ignore | False
Ignore | True | True
Ignore | False | False
Ignore | Ignore | Ignore
Table 6.4: Decision Matrix for ∨-Operator

• The ¬-operator also returns an ignore if its operand evaluates to ignore. Otherwise, it negates the result of the operand (see table 6.5).

Input | Output
True | False
False | True
Ignore | Ignore
Table 6.5: Decision Matrix for ¬-Operator

The missing ACT comparison operators are not implemented due to time constraints of the thesis. However, they can already be expressed by combining the existing comparison operators, although this is unlikely to occur during the search because the operands would need to be the same and in the same order in both correctly combined comparison operators. For example, ≤ can be simulated by combining the < and the = operators with a logical ∨-operator, where both < and = have to have the same attributes in the same order. Another possibility would be to negate a > operator, which yields a ≤ operator with only one comparison instead of two. Nevertheless, it is advisable to implement these missing operators and thereby enable the Genetic Programming algorithm to create more complex relations between attributes more easily.

Coming back to the Event Condition Components and the currently unsupported operators with negation, ¬ and the excluding sequence: in combination with the ACT, the negation in the ECT can be used to further specify the requirements under which an event of a certain type is not allowed. For example, (¬(A as A0)) ∧ (A0.a = 0) defines an ECT which fires whenever an event of a different type than A is encountered, together with an ACT which specifies that the overall rule should also be true when the event is of type A but its attribute a is not equal to 0. This evaluation of the ACT currently would not take place because the ECT has already evaluated to false as soon as an A was encountered. The problem lies within the iterative evaluation of the ECT and the ACT. Without an ACT, the negation works as intended. As soon as the ACT uses aliases of events under a subtree with a negation, the ECT would need to postpone its final evaluation result and first pass the alias map to the ACT for further investigation. If the referenced events under negation subtrees fulfill the requirements within the ACT, then the rule should not fire. It gets even more complicated because the implementation needs to consider a tree with multiple negations in its subtrees. This contradiction is a remaining problem in the current implementation of the rule engine and could not be solved due to time constraints. One possible solution could be to evaluate the ACT and the ECT synchronously instead of iteratively: whenever, during the evaluation of the ECT, an operator of the ACT can be resolved, it should be resolved and these partial results remembered. In this case, the evaluation of the ECT already includes the results of the ACT. However, this is a complex approach which needs further research.

Arithmetical operators are currently not implemented in the rule engine and so far are also not part of the conceptual work of the CepGP Genetic Programming algorithm. This remains a field for future research and could be implemented as a new leaf node type for the ACT.
This adds more complexity to the type-safe property of the proposed algorithm and needs a thorough understanding of the evolutionary processes involved and the evaluation of rules with these components. Crossover and mutation would also need to be adapted to meet the type-safe constraint. The aggregation functions are neither considered in the CepGP Genetic Programming algorithm nor in the rule engine implementation due to timely constraints. One idea to integrate them into the algorithm is to add a new leaf node to the ECT which represents the aggregation function. This is a new type and therefore needs to be integrated into the type-system as well which is a challenging task. Every time the type-system is altered, the operations based on it like the evolutionary operations, need to be adapted as well. 6.4 CepGP This section starts by describing the structure of the implementation of the algorithm presented in chapter 5 on page 35. Afterwards, this section explains the implementation of the preparation phase, the population initialization and the evolutionary operators selection, crossover and mutation. It then proceeds with the evaluation procedure and concludes with a parameter description and states the limitations of the current version and their remediation in the summary. The composition of the CepGP program, including the rule engine (cep-package), is depicted in figure 6.2 on the following page. The util-package includes modules and classes that are used to capsule convenient features mainly for data parsing, initialization, traversing and altering the trees for events and attributes and classes to handle the meta-information from the preparation phase. The gp-package contains the classes and packages needed for the Genetic Programming algorithm. The components and the shown interrelations are explained during the following sections. August 28, 2016 91 6 Implementation Figure 6.2: CepGP Class Diagram; showing the most important packages, their interrelation and the most important classes where for the main class CepGP and the class GeneticProgrammingAlgorithm the methods are also displayed 92 Norman Offel 6.4 CepGP 6.4.1 Preparation Phase The starting point of the program is the class CepGP and its main-method. This method controls the program flow and initializes the parameters which are presented in section 6.4.5 on page 96. It proceeds with the preparation phase and uses the EventDataParser to read and parse the input file while also extracting the meta-information. The result of this step is an instance of the EventHandler which manages the extracted information. Another product of the preparation phase is a WindowBuilder-instance which contains the extracted information about the windows from the input file to generate valid windows during population initialization and mutation. Afterwards, it creates an instance of the class GeneticProgrammingAlgorithm with the needed parameters: • EventHandler contains information about the parsed events from the input file like the used event types, the attributes, number of events, boundaries for windows and attributes and so on. As described several times before, these are much needed information for the evolutionary processes and the population initialization. • ConditionFitnessFunction, WindowFitnessFunction and ComplexityFitnessFunction are instances of the fitness functions which quantify the fitness of the individuals (rules) according to their respective responsibilities. 
• Crossover is an instance of the Crossover-interface in the crossover-package. As described in section 5.4.2 on page 51, CepGP uses subtree crossover. • PointMutation is an instance of the Mutation-interface in the mutation-package that implements the point mutation as described in section 5.4.3 on page 63. • elitismRate is a value of type double between [0, 1] describing the portion of the population that will survive according to their overall fitness. For example, an elitismRate of 0.1 means that the best 10% of each generation definitely survive into the following generation. • attributeConditionTreeRate indicates the portion of rules which are initially generated with an ACT as a double value between [0, 1]. This only effects the initial population. Whether the following generations also include an ACT is up to the evolutionary process to decide. If the individuals with ACTs come out to be fitter then ACTs will eventually find their way into more rules. • maxAttributeConditionTreeHeight defines the maximum height of the ACTs in the initial population. Again, whether higher or smaller ACTs are better for the ultimate rule is decided according to the fitness. August 28, 2016 93 6 Implementation 6.4.2 Population Initialization After the instantiation of the GeneticProgrammingAlgorithm, the program starts the initialization process via the buildInitialPopulation-method of the GeneticProgrammingAlgorithm to randomly generate the first population of the Genetic Programming algorithm while producing only valid individuals. This method requires an instance of the PopulationInitializer-interface which uses instances of the RuleBuilder abstract class (for full, grow or half-and-half initialization), the size of the population and the maximum ECT height for the initial population. As described in 5.3 on page 44, CepGP uses the ramped half-and-half initialization method. While doing so, it ensures that • the ECT height does not exceed the specified maximum. • ACTs are added to individuals according to the probability specified by the attributeConditionTreeRate. • and that the ACT height also does not exceed the specified maximum while being build via the grow-method. The initialization of the windows is done with the help of the WindowBuilder-instance. After this process of generating the first population, the fitness of each individual is measured according to section 6.4.4 on the next page and the population is sorted by the fitness in descending order. 6.4.3 Evolutionary Process Following the preparation phase and the population initialization, the program executes the loop of evolutionary operations until the number of maximum generations is reached. Each run of the loop is done by the evolve-method of the GeneticProgrammingAlgorithminstance which follows these steps: 1. Calculate the number of elites that survive this generation into the next one. 2. Build a new generation via the Crossover-instance. 3. Copy the elites of the previous into the next generation. 4. Fill the remaining individuals from the new generation created by the crossover-process in order of their creation. 5. Execute the mutation operation of the PointMutation-instance on the newly build generation. 6. Evaluate the fitness of the individuals. 7. Sort the population according to the fitness of each individual in descending order. The crossover-operation is done as described in 5.4.2 on page 51: 94 Norman Offel 6.4 CepGP 1. A TournamentSelection-instance from the selection-package determins the participating individual. 
2. With a given probability (the crossover rate), the individual undergoes crossover.
3. The ConditionTreeTraverser helps to select the crossover points and to insert the subtree of the copy of the second selected individual into the copy of the first. The crossover point is identified following the process proposed in 5.4.2 on page 51.
4. The AttributeConditionTreeTraverser helps to repair the ACT after the crossover operation according to section 5.4.2 on page 60.

The mutation operation follows the steps described in 5.4.3 on page 63:
1. Iterate over the population and decide for each individual whether mutation takes place according to the given mutation rate.
2. If an individual is chosen, mutate it by means of the PointMutation-instance, which implements the proposed point mutation algorithm.
3. Repair the ACT after the mutation with the help of the AttributeConditionTreeTraverser as described in section 5.4.2 on page 60.

6.4.4 Evaluation

CepGP is implemented in Java 8 to exploit the streaming-API (https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html) and the time-API (https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html) introduced with this version. The streaming-API enables an efficient and reliable concurrent evaluation of the individuals in a population, which is illustrated in the following listing:

Consumer<RuleWithFitness> calcFitness = individual -> {
    individual.conditionFitness = condFf.fitnessOf(individual,
            eh.getWithoutComplexEvent(), eh.getIndicesOfComplexEvent());
    individual.windowFitness = winFf.fitnessOf(individual.getWindow());
    individual.complexityFitness = complFf.fitnessOf(individual);
};
Arrays.stream(population).parallel().forEach(calcFitness);

The population is stored in an array whose length equals the number of individuals. Exploiting the streaming-API on the population means that the individuals are processed concurrently and the defined Consumer is invoked for each individual. The consumer calls the fitness functions for the three objectives condition (condFf), window (winFf) and complexity (complFf) respectively and updates the fitness values within the individual. Since, in CEP, rules do not have a fitness attribute, the only conversion needed between the rule representation in CEP and in the Genetic Programming algorithm is the RuleWithFitness-class in the wrappers-package. It inherits all methods and attributes from the CEP rule representation and adds

• Attributes for the three partial fitnesses.
• A condition threshold attribute, i.e. the limit the condition fitness of the individual needs to exceed so that its window fitness and complexity fitness are also factored in.
• Attributes for the weights of the window fitness and the complexity fitness (in relation to the condition fitness) as described in section 5.5 on page 67 and section 6.4.5.
• An implementation of the Comparable-interface to enable easier sorting of the population.
• A method to calculate the total fitness.
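Since the population is later sorted in descending order of total fitness, the Comparable implementation can be put to work as in the following sketch; the class name, fields and the simplified total fitness shown here are illustrative and do not claim to match the actual RuleWithFitness class.

import java.util.Arrays;
import java.util.Collections;

// Illustrative sketch: natural ordering by total fitness plus a descending sort.
class RankedRule implements Comparable<RankedRule> {
    double conditionFitness, windowFitness, complexityFitness;

    double totalFitness() {
        // weighted combination as described in section 5.5 (details omitted here)
        return conditionFitness + 0.1 * windowFitness + 0.001 * complexityFitness;
    }

    @Override
    public int compareTo(RankedRule other) {
        return Double.compare(this.totalFitness(), other.totalFitness());
    }
}

class SortDemo {
    static void sortDescending(RankedRule[] population) {
        // The fittest individual ends up at index 0 of the population array.
        Arrays.sort(population, Collections.reverseOrder());
    }
}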
As already described in the section 6.3 on page 85 about the rule engine, the evaluation of the condition fitness includes the individual and the necessary information about the event stream without the complex event and the original indices of that specific event. The window fitness function only needs the window of the individual, whereas the complexity fitness function determines the tree complexity of the ECT and the ACT of the individual. 6.4.5 Parameters CepGP is designed to work with the input file and the name of the special event type as the minimal parameters needed to start and find the most appropriate rule. However, to tweak the quality and performance of the algorithm and to enable a better evaluation of the algorithm, CepGP as a program provides several additional optional parameters to be set via the user. The minimum parameters are in this order: 1. input file 2. complex event type 3. “true” or “false” if the algorithm shall consider ACTs at all or not If the user wants to set more parameters, she has to give all of the following ones in order: 4. population size (default: 5, 000) 5. number of generations (default: 30) 6. crossover rate (default: 0.8) 7. mutation rate (default: 0.05) Currently only alterable in the source code of the program in the class CepGP are the following further options: 3 https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html 96 Norman Offel 6.4 CepGP • MAX EVENT CONDITION TREE HEIGHT is set to 1 for the initial population. Thus, in the first population, individuals have an ECT consisting of either an event type or one ECT-operator with event types as operands. During the evolutionary process, the ECTs will inevitably grow by combining their subtrees in the crossover operations, what is known as bloat. During the 30 default generations, the individuals usually still have a reasonable length to be interpretable by a domain expert. Furthermore, unnecessarily long and complex rules are graded with a lower fitness value because of the complexity fitness function. • MAX ATTRIBUTE CONDITION TREE HEIGHT is also set to 1 for the initial population for the same reason as the MAX EVENT CONDITION TREE HEIGHT. • WINDOW TIME UNIT is set to be ChronoUnit.SECONDS4 , meaning that time windows of CepGP will always use seconds as the unit for its values since this is the minimum time unit covered in the proposed input format for events presented in section 6.2.1 on page 83. • TOURNAMENT SELECTION SIZE is set to 2 which is the minimum and most common value for the tournament selection size. This value usually is not greater than 5. • ELITISM RATE is set to 0.1 which means that the best 10% of 5, 000 = 500 individuals survive at each generation to the next. • ConditionFitnessFunction is set to the described Informedness fitness function. However, there are already several more fitness functions implemented in the module EvaluationMeasures in the evaluation-package. • WindowFitnessFunction is set to the proposed logarithmic fitness function. • ComplexityFitnessFunction uses the the proposed fitness function to determine the simpleness of the rules. • CONDITION THRESHOLD is set to 0.5 for rules whose condition fitness exceeds this value to include the window and complexity fitness into the total fitness calculation. • WINDOW FACTOR is the weight of the window fitness compared to the condition fitness. A value of 1 means equal weight, but this value is set to 0.1. 
• COMPLEXITY FACTOR is the weight of the complexity fitness and is set to 0.001 so it does not disrupt the overall fitness of the whole population but is decisive between rules of otherwise almost equal fitness. The amount of changeable values and options shows how much there is to consider when designing the implementation and tweaking the algorithm to fit a specific or the general case. The presented default values and options have been found reasonable during the evaluation in different scenarios. That is not to say, that these are the best options, especially considering a specific case. Chapter 7 on page 100 goes into more detail of the evaluation to give more insight of the here called reasonable choice of parameters. 4 https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoUnit.html August 28, 2016 97 6 Implementation 6.5 Summary The concept of CepGP has been implemented in all of its aspects: • Within the preparation phase, the program reads and parses the events from the input file and extracts all the meta-information. • The population initialization is implemented with the ramped half-and-half initialization while ensuring valid individuals. It provides options to switch ACTs on and off and options to tweak the quality of the first population by varying maximum heights of the ECT and ACT, and the population size. • The tournament selection algorithm was implemented with the option to adapt the tournament size. The program also enables other selection implementations via the Selection-interface. Elitism is implemented as well and the program allows to manipulate the amount of elites in proportion to the population size. • The subtree crossover was implemented and the program is flexible enough to enable other crossover implementations as long as they adhere to the Crossover-interface. It encapsulates the type-safe realizations of the crossover-operations of the different components ECT, ACT and the window. • The point mutation was implemented as described in this thesis. A different implementation for mutation can be used which has to fulfill the requirements of the Mutation-interface and may encapsulate the execution of different mutation algorithms as proposed for future work. • The repairing algorithm for the ACT after evolutionary operations has been implemented as described in this thesis. • The proposed fitness functions are implemented as described while providing an interface for each objective to enable other fitness functions to be implemented in future works. • It is easy to use because the program only needs three minimal inputs: the input file name, the name of the special complex event and a boolean value indicating whether attributes shall be considered during the search. But it also can be used to define a number of more advanced settings to individualize the search according to the problem by the user. All of this is used to evaluate the performance and quality of the concept and the implementation. The main goal of providing the foundation to have a better insight into the details of the concept and enable a systematic evaluation has been successfully achieved. Although the implementation of the concept was successful, the evaluation is limited due to the incompleteness of the self-implemented rule engine. 
The missing features are: 98 Norman Offel 6.5 Summary • Event comparison operators with a negation (the ¬ and the excluding sequence operator): The Genetic Programming algorithm can handle these operators, but problems in the implementation of the evaluation of rules with these operators prevented their usage in the evaluation of the algorithm. • Arithmetical Operations: Combining attribute values via arithmetical operations like addition, subtraction, multiplication or division are currently also not part of the CepGP algorithm and may be subject to future research. They may be added as new leaf nodes in the ACT. But thorough analysis has to show how type-safe mechanism have to be implemented into the evolutionary operators as well. • Additional Comparison Operators (≤, ≥, 6=): Their underlying operators (<, >, =) are implemented and can lead to the logical equivalent forms already. Combining < with an ∨-operator and an = leads to the equivalent of ≤ for example. The ≥ can be done analogously. The 6= is a combination of the = and an ¬-operator. However, the attributes have to be the same and in the same order within the underlying comparison operators to truly form the equivalent form. Since this is a rather unlikely event during the Genetic Programming search, it is a reasonable task for future improvements of the rule engine. The CepGP algorithm already supports these comparison operators. • Aggregation Functions (sum, avg, min, and max): The integration of the aggregation functions into the rule engine and the CepGP algorithm can be a challenging task. One option might be to add the aggregation function as an additional leaf node to the ECT and derive it from the Event class. Every time the encountered event is of the correct type, recalculate the value of the function and save it as an attribute which can be accessed via the ACT. But this and other ideas need further research and analysis to completely understand its consequences. Although the missing features prevent a fully scaled analysis of the CepGP algorithm, the successfully implemented features still provide a good foundation to build the evaluation on and make meaningful statements about the CepGP algorithm based on the implementation. August 28, 2016 99 7 Evaluation The CepGP algorithm offers a way to search through an event stream for a rule which leads to a specific event type in the stream by using Genetic Programming at its core. The implementation just presented incorporates most of its features and enables an evaluation of the proposed approach with several options that can be manipulated. This chapter presents the findings in practical use of the proposed algorithm in combination with the provided implementation. It starts by elaborating the test data and afterwards guides through the evaluation process and the results while also explaining the consequences. The chapter concludes with a summary of the overall findings. 7.1 Test Data The evaluation of such optimization or search algorithms normally is based on a common data set that allows comparison of different approaches in various aspects. The UCI provides a repository of real data sets for various research fields including time series data sets which is close to CEP data sets.[1] However, this work uses artificially created data sets to evaluate the algorithm. The reasons are mainly timely constraints of the thesis. 
A benefit of this approach is the flexibility to create data with varying numbers of attributes, event types and overall tailor the data set in a way to evaluate specific characteristics of the algorithm. It also eliminates pitfalls of real data like noise or missing attributes and so on. To protect the artificial data from accusations that they may have been created in a way that would skew the result in a positive way for the algorithm, the DataCreator-class of the util/data-package (see figure 6.2 on page 92) uses the SecureRandom-class1 which provides a cryptographically strong random number generator for its data creation whenever random input is needed. The DataCreator creates data sets that comply with the input specification described in section 6.2.1 on page 83. It can create up to 26 event types (number of alphabet letters) and at least one attribute. That is because CEP data sets normally have events with at least a sensor ID or similar attributes. The creation process creates the same attributes for each event with the same types but uses random integer numbers between [-100, 100] for each 1 https://docs.oracle.com/javase/8/docs/api/java/security/SecureRandom.html 100 7.2 Testing attribute. After the amount of events have been created, the creation algorithm applies a user defined rule on it and inserts the special events into the data before writing the result to a file. For the evaluation, there are three data sets created: small uses 500 events with three event types with the following amount of attributes: {A=1, B=1, C=1}. The applied rule is: C ((A as A0 ∨ (C as C0 ∧ B as B0)) ∧ (B0.ID = C0.ID))[win:time:180Seconds] =⇒ HIT and it fired 42 times. medium uses 1000 events with five event types with the following amount of attributes: {A=2, B=2, C=2, D=2, E=2}. The applied rule is: C ((A as A0 → E as E0) ∧ (A0.ID = E0.ID))[win:time:600Seconds] =⇒ HIT and it fired 10 times. large uses 2500 events with eight event types with the following amount of attributes: {A=2, B=2, C=3, D=4, E=2, F=4, G=5, H=5}. The applied rule is: C ((C as C0 ∧ (B as B0 → (A as A0 ∨ D as D0))) ∧ (C0.ID = B0.ID))[win:length:15] =⇒ HIT and it fired 158 times. During the Testing phase, the algorithm is supposed to find the applied rule from the data creation. 7.2 Testing The tests evaluate some properties of the algorithm to validate assumptions and identify drawbacks and strengths. First, the overall convergence to optimal solutions is tested. Afterwards follows the determination of the default parameters for the algorithm. The section proceeds with a comparison of CepGP and a random walk through the problem space approach and it concludes with a consideration of noise in the data and a discussion of the findings. August 28, 2016 101 7 Evaluation 7.2.1 Convergence To see if the algorithm works, it should converge from the first rather poor results to better and better solutions from generation to generation. Figure 7.1 to 7.3 show one run each of the program implementing CepGP on the data sets small, medium, and large respectively. 1 0.9 0.8 Fitness 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 25 30 Generations best average Figure 7.1: CepGP Converging in Small Data Set Figure 7.1 illustrates that the algorithm was able to slowly but steadily converge to better and better solutions. Although, it was not able to find really good solutions. There can be different reasons for this. 
One is that the algorithm is not suited for this problem, the problem space is too big, or there are only very few really good solutions at all within the problem space while the rest is almost equally mediocre. The following tests suggest that the last of the mentioned reasons might be true in this case. It is hard for every search algorithm to converge to the best results when the fitness of all individuals in the problem space is almost equal while there are few peaks with neighboring solution that do not offer better than average fitness as well. It ends up to be the search for a needle in the haystack. Nevertheless, as described later, the algorithm still is useful in these rather uncommon scenarios. The average fitness of the population slowly but steadily increased from generation to generation and stagnated at fitness value of about 0.45 from generation 23 onwards. The difference in best to average fitness indicates that even after 25 and more generations, there is still moderate diversity in the population which can lead to even better solutions in the long run. Figure 7.2 on the facing page shows an already good starting point for the algorithm on which it steadily improved during the evolutions to come on the medium data set. The average fitness expectedly started with a fitness value of about 0 and rapidly increased. In the long run it stabilized at the fitness value of about 0.7. Here, the algorithm was not only able to find a good but also the optimal solution. Figure 7.3 on the next page provides the convergence of the CepGP implementation on the large data set. As expected of this complex data set and the more complicated rule involved, 102 Norman Offel 7.2 Testing Fitness 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -0.1 0 5 10 15 20 25 30 Generations best average Figure 7.2: CepGP converging in medium data set 1 0.9 0.8 Fitness 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 25 30 Generations best average Figure 7.3: CepGP converging in large data set August 28, 2016 103 7 Evaluation although the algorithm had a good starting point, it could only manage to reach a good but not optimal solution. The average fitness of the population, on the other hand, could catch up to the best individuals midway through the search process. At the end of the run, the best and average fitness was very close, what indicates that the population’s diversity is rather low and a lot of very similar individuals reside in it. If no measures are taken at that point, further optimization is very unlikely to acquire better results later on. The overall findings on convergence of the algorithm show that on all three data sets the implementation manages to improve from the early to the late generations through. This indicates that the algorithm itself works. The section continues by evaluating different parameter settings to find the most suitable default parameters. 7.2.2 Parameter influence The parameters for population size, generations, crossover rate, and mutation rate are different from every Genetic Programming algorithm and strongly depend on the problem domain. In their field guide ([39] p. 26f. and p. 30f.), Poli et al. found the parameter guidelines in table 7.1 to be widely used and a good starting point: Population size Generations Crossover rate Mutation rate several thousands and at least 500 10 to 50 0.9 0.01 Table 7.1: Parameter Suggestions by Poli et al. [39] The limiter for the population size is the computation time for the fitness evaluation. 
Whatever amount can be evaluated reasonably fast is good, but it should be at least 500 individuals. Poli et al. also give advice for the amount of generations: “[. . . ] the most productive search is usually performed in [the] early generations, and if a solution hasn’t been found then, it’s unlikely to be found in a reasonable amount of time.”([39] p. 27) This work begins by using these guidelines as a starting point to find good parameters for the problem domain of CEP rule search. The result is shown in figure 7.4 on the next page with 500 individuals, 30 generations (as the middle of the suggested amount of generations) and the crossover rate and mutation rate from the table. Even though the overall result is mediocre, the figure already shows that the algorithm itself is converging towards better results and, thus, works in general. Population Size To improve the performance, the first parameter to adapt is the population size. Figure 7.5 on the facing page displays the effect with different values for population size while setting 104 Norman Offel 7.2 Testing 1 0.9 0.8 0.7 Fitness 0.6 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 25 30 Generations best average Figure 7.4: CepGP Result on Small Data Set With Initial Parameters the other values as they are suggested (generations: 30, crossover rate: 0.9, mutation rate: 0.01). 1 0.9 0.8 Fitness 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 25 30 Generations 500 1000 2000 3000 5000 10000 Figure 7.5: CepGP Result on Small Data Set With Varying Population Sizes The more individuals within a population the higher the chances that better solutions are created. Even though that is not surprising, the figure is not unambiguous in this matter. That may be an indicator that in the small data set, there is only a tiny fraction of solutions that provide really good fitnesses. But even then, the trend that is displayed still remains and proves the assumption of the more individuals within a population the better the results. To be able to process data sets with more information than in the small data set, the parameter for population size is set to 5000 by default which also achieves the overall best results in this example. August 28, 2016 105 7 Evaluation Generations Fitness The next parameter that may lead to better results by increasing is the number of generations of the algorithm. There are different schools of thought in that matter. Some argue that higher population sizes can reduce the generations whereas others say, that smaller population sizes but much more generations yield better results.([39] p. 27) Figure 7.6 depicts the effects of different amounts of generations with a population size of 5000, crossover rate of 0.9, and a mutation rate of 0.01. 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 25 30 35 40 Generations small medium large Figure 7.6: CepGP Result on Small Data Set With Varying Amounts of Generations The figure illustrates that more than 30 generations seem to get stuck at a fitness level without significant improvements. Therefore, this works suggests 30 generations to be the best choice for the general case. The more generations the more crossover and the longer the rules which in turn causes longer evaluation time and less intuitive rules. This is also why only up to 40 generations have been evaluated. The rules in later generations turned out to be too long and took too much time (several hours for the large data set) to be processed. 
Thus, it is advisable to stop the evolutionary process before the evaluation time gets unreasonably long. This can be remedied on many levels. The stopping criterion can have multiple conditions such as the lack of significant improvements over some generations or a maximum amount of processing time. To stop the bloat and overly long rules, the algorithm could also use a hard limit for the tree heights and sizes. The limits for ECT and ACT should be chosen carefully because they limit the search space to find rules, too. To find more suitable stopping criteria is a remaining task for future work. Crossover Rate The crossover rate embodies the amount of individuals for the next generation that resulted from crossover of two parents. The other individuals are probabilistically chosen from the current generation and survive as they are into the next generation. This already allows good individuals to be present in future generations with a higher chance for the absolute best. In 106 Norman Offel 7.2 Testing that sense, the crossover rate controls two properties of the algorithm at once. CepGP uses Elitism to reduce the risk of the best individuals to get lost in the process and to enable a higher crossover rate without worrying about losing the best. A given percentage is chosen to definitely survive the crossover stage while the rest is filled up with crossover offsprings. The crossover rate can now be set to a very high value when the elitism rate is chosen to cover enough of the best individuals to keep them alive. The elitism rate is set to 0.1, meaning that the best 10% of the current generation always are available in the mutation phase. During the mutation every individual can be afflicted again. Figure 7.7 illustrates the impact of varying crossover rates with population size of 5000, 30 generations and a mutation rate of 0.01. 1 0.9 0.8 Fitness 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 25 30 Generations 0.5 0.6 0.7 0.8 0.9 1 Figure 7.7: CepGP Result on Small Data Set With Different Crossover Rates All of the tested crossover rates did not have a big impact on the overall results. However, it seems that 0.8 is the best choice for the crossover rate which also is close to the proposed 0.9 from the beginning of this section. Mutation Rate The mutation rate determines the ratio of the result population of elitism and crossover that undergoes random but minor changes in their structure (ECT, ACT or window). It is the only way new information can be added to the population after the initialization. Crossover combines the information of fit individuals into new structures, but the information within the individuals, the information the crossover operations can work on, is fixed or even decreases if not for the mutation operation. Hence, it is an important contributor to the search process. However, it alters potentially valuable information of individuals by replacing them with information that may be almost useless, too. This may lead to a disruption of the search process and its convergence to the optimal solutions if it happens too often. The August 28, 2016 107 7 Evaluation mutation rate very much depends on the amount of overall information in the data set. If there are a lot of event types and/or attributes which cannot be reasonably covered by the initial population and a moderate mutation rate then it should be slightly increased. If it still does not produce satisfying results, then the problem space might be too big to handle for this algorithm as it is. 
Figure 7.8 shows the different convergence behaviours for varying mutation rates with the parameters already discussed: 5000 individuals per population, 30 generations, and a crossover rate of 0.8.

Figure 7.8: CepGP Result on Small Data Set With Different Mutation Rates (0.01, 0.03, 0.05, 0.08, 0.1, 0.2)

Even though figure 7.8 is not entirely clear in this matter, the best results are achieved with a mutation rate below 0.1. Mutation rates greater than 0.1 occasionally show decreasing best fitness values during the search process, because the probability of altering the best individuals of the previous generation is higher. The trade-off is between adding more new information in each generation and disrupting the search process to a greater or lesser degree, depending on the amount of overall change. The best quality was achieved with a mutation rate of 0.05.

7.2.3 CepGP vs. Random Walk

The comparison of CepGP against a random walk through the problem space verifies that the algorithm indeed searches the space of possible rules in a more sophisticated way than a random guess-and-try algorithm would. The random algorithm works by randomly generating the same number of rules CepGP would generate throughout the whole process. The parameters during the process are:

• 30 generations × 5000 individuals = 150,000 rules
• maximal event and attribute condition tree height: 4
• attribute condition tree rate: 0.8
• Half-and-Half initialization for generating the rules

Figure 7.9 compares the best individuals for the three sample data sets: small, medium, and large. The most noticeable aspect is the superiority of CepGP over the random guess-and-try algorithm: in every sample data set, CepGP achieves better results.

Figure 7.9: CepGP vs. Random Walk Comparing the Bests (best fitness per data set)

The best rules found are displayed in table 7.2. The first column contains the data set and the rule that was used to insert the complex events for which the algorithms should find the most appropriate rule. The second and third columns contain the best rules found by CepGP and the random approach, respectively. The first thing to notice is that the rules found by CepGP are not only fitter overall but also in every single objective: they are shorter and more concise, use the smaller window, and are more accurate. Better results can generally be achieved with more generations, which leads to longer runs because the rules become more specialized and longer. A faster convergence towards the optimal solution can also be achieved by altering the crossover and mutation parameters and by adding domain knowledge for a better starting point, for example via adjustments of the attribute condition rate. Apart from the difference in quality, CepGP also processed the same number of individuals much faster. This is grounded in the way CepGP works: from generation to generation, the population gets fitter on average, which also leads to rules that are concise and closer to the optimum than the average random rule.
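The random baseline itself can be pictured as the following sketch: it generates the same number of rules CepGP would produce (30 × 5000) with a simple half-and-half tree builder bounded by the same maximum height and simply keeps the best one. The event types, operators, and the placeholder fitness function are illustrative assumptions; in the actual comparison the individuals are full CEP rules evaluated against the recorded event stream.

```python
import random

EVENT_TYPES = ["A", "B", "C", "D", "E"]   # terminal symbols (event types)
OPERATORS = ["AND", "OR", "SEQ"]          # inner nodes of the condition tree
MAX_HEIGHT = 4                            # same limit as in the comparison
TOTAL_RULES = 30 * 5000                   # 150,000 randomly generated rules

def random_tree(depth, full):
    """Grow or full method for one tree; 'half-and-half' uses both."""
    if depth == 0 or (not full and random.random() < 0.3):
        return random.choice(EVENT_TYPES)
    op = random.choice(OPERATORS)
    return (op, random_tree(depth - 1, full), random_tree(depth - 1, full))

def half_and_half_rule():
    """Half of the rules use the full method, half the grow method."""
    use_full = random.random() < 0.5
    return random_tree(random.randint(1, MAX_HEIGHT), use_full)

def evaluate_rule(rule):
    """Placeholder fitness. The real comparison deploys the rule into the
    rule engine and scores condition, window, and complexity."""
    return random.random()

best_rule, best_fitness = None, float("-inf")
for _ in range(TOTAL_RULES):
    rule = half_and_half_rule()
    score = evaluate_rule(rule)
    if score > best_fitness:
        best_rule, best_fitness = rule, score

print(best_fitness, best_rule)
```

With the real fitness evaluation in place of the placeholder, this corresponds to the guess-and-try baseline compared in figure 7.9.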
Table 7.2: Comparison of the Original Hidden Rule and the Results of CepGP and Random

Small data set
  Hidden rule: ((A as A0 ∨ (C as C0 ∧ B as B0)) ∧ (B0.ID = C0.ID)) [win:time:180Seconds] =⇒ HIT
  CepGP, fitness 0.59515 (condition: 0.58533, window: 0.69401, complexity: 0.52222):
    ((B as B0 ∧ C as C0) ∧ ((C0.ID < 8.0) ∧ ((B0.ID < 6.25) ∧ (B0.ID > 2.75)))) [win:time:93Seconds] =⇒ HIT
  Random, fitness 0.52464 (condition: 0.51781, window: 0.59496, complexity: 0.32492):
    ((((A as A0 ∨ A as A1) → B as B0) ∧ (((C as C0 ∧ A as A2) ∨ (B as B1 → A as A3)) ∨ ((A as A4 → A as A5) ∧ (B as B2 ∨ B as B3)))) ∧ ((((¬ (A3.ID < 8.0)) ∧ (A1.ID > 2.75)) ∧ (C0.ID = B0.ID)) ∧ (B2.ID < 8.0))) [win:time:184Seconds] =⇒ HIT

Medium data set
  Hidden rule: ((A as A0 → E as E0) ∧ (A0.ID = E0.ID)) [win:time:600Seconds] =⇒ HIT
  CepGP, fitness 0.98569 (condition: 1.00000, window: 0.84194, complexity: 1.05556):
    ((A as A0 → E as E0) ∧ (A0.ID = E0.ID)) [win:time:590Seconds] =⇒ HIT
  Random, fitness 0.83857 (condition: 0.84600, window: 0.76770, complexity: 0.49048):
    (E as E0 ∧ ((E0.ID < 3.0) ∨ ((((E0.e1 > 50.25) ∧ (E0.ID > 1.0)) ∨ ((E0.ID > 1.0) ∧ (E0.e1 > E0.ID))) ∨ ((¬ (E0.ID = E0.ID)) ∨ (E0.e1 > -49.25))))) [win:time:1183Seconds] =⇒ HIT

Large data set
  Hidden rule: ((C as C0 ∧ (B as B0 → (A as A0 ∨ D as D0))) ∧ (C0.ID = B0.ID)) [win:length:15] =⇒ HIT
  CepGP, fitness 0.74394 (condition: 0.75400, window: 0.64563, complexity: 0.51515):
    (((B → (A ∨ D)) ∧ C) ∧ ((C ∧ (((((C ∧ ((B → ((D ∨ (A ∨ A)) ∧ C)) ∨ C)) ∧ C) ∨ A) → C) ∨ D)) ∧ C)) [win:length:16] =⇒ HIT
  Random, fitness 0.65795 (condition: 0.67200, window: 0.51928, complexity: 0.47312):
    ((((F → E) ∨ (E → G)) ∨ ((F → F) ∨ (E ∧ H))) → (((A ∧ B) ∨ (D → C)) → ((C ∧ C) ∧ (A ∨ D)))) [win:length:43] =⇒ HIT

7.2.4 Noise influence

All of these tests have been conducted under ideal circumstances: the events have been artificially created with no missing values or measuring errors, and their order is always correct. In real-world environments, however, there are multiple sources of error that introduce slight shifts in the order of events, missing or erroneously read values, transmission errors, and so on. These errors are also called noise, and for an algorithm to work they need to be treated before the algorithm can use the data. Due to time constraints, this thesis could not elaborate on remediations for CepGP. The current state of the algorithm and the implementation does not take these errors into account and is therefore very susceptible to them. Especially inserting the special events manually into the recorded stream can have a huge impact on the results. Future work will have to address this matter.

7.3 Result Discussion

CepGP proved to be a working approach on three different artificially generated data sets. In all tested circumstances, it converged towards the optimal result and always achieved better outcomes than randomly guessing the same number of individuals. Table 7.3 summarizes the default values for the algorithm that have been found to work best for the small data set. This particular data set seems to pose a challenging task because good results are hard to find; even in this scenario, CepGP acquired better rules.

Table 7.3: Final Default Parameters for CepGP
  Population Size: 5000
  Generations:     30
  Crossover Rate:  0.8
  Mutation Rate:   0.05
  Elitism Rate:    0.1

The runs depicted in the figures throughout this section have been chosen as representatives of the majority of outcomes tested during this thesis. Even though there have been worse results, they were always still better than the randomly acquired rules.
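For reference, the defaults from table 7.3 can be bundled into a single parameter record. The following sketch is purely illustrative; the names do not mirror the actual CepGP implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CepGPParameters:
    """Default parameter set found to work best in this evaluation (table 7.3)."""
    population_size: int = 5000
    generations: int = 30
    crossover_rate: float = 0.8
    mutation_rate: float = 0.05
    elitism_rate: float = 0.1

DEFAULTS = CepGPParameters()
```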
This evaluation provides a basis for further research into this approach and for future improvements. It demonstrates the overall suitability of the proposed algorithm and suggests parameter values for its application. Future work also needs to run the algorithm against real-world problems to see whether it can live up to the results achieved in this laboratory environment.

8 Conclusion

This thesis proposes a new algorithm to derive a Complex Event Processing (CEP) rule for a given event type within a recorded historical event stream. Such rule extraction algorithms have been proposed in different contexts, but only a few address the field of Complex Event Processing. Even among those, the approach of using Evolutionary Computation has not yet been explored. Complex Event Processing is based on rules to represent domain knowledge and to enhance the meaning of the processed information to yield higher-level abstractions. An Evolutionary Computation algorithm therefore needs to map the processes of crossover, mutation, and selection onto a rule representation that also corresponds to the real-world CEP rule domain. Genetic Programming is an Evolutionary Computation technique that is specifically designed to work on trees, which in turn are often used to represent rules. This thesis proposed an algorithm that respects the specifics of CEP rules while giving the underlying Genetic Programming search the freedom to apply its evolutionary operations to the tree representation, with the goal of finding the most appropriate rule for the occurrences of the specified event type.

This work first described the backgrounds of Evolutionary Computation, Complex Event Processing, and Rule Learning as the three main research fields that are bound together in this thesis. Because the related work suggests promising results from applying Genetic Programming to CEP rule derivation, the general approach elaborated the scenario of this work, narrowed the focus, and established the basis for the CepGP algorithm. The algorithm was then introduced by laying out the general process and analyzing the parts of CEP rules and how they can be represented within a tree. Each part of a general Genetic Programming algorithm was presented in a way that best suits the specifics of CEP rules while adhering to strong-typing and structure constraints, so that only valid rules occur during the evolutionary process. After the concept of the algorithm was explained, the thesis presented an implementation including a rule engine, which showed the overall applicability of CepGP and demonstrated that the algorithm is not bound to specific frameworks or other software but can work as implemented. Even though the proposed algorithm indeed finds good results in various data sets, there is still a lot of room for improvement and enhancement.

8.1 Contributions

CepGP is the first Evolutionary Computation algorithm used to derive a CEP rule for a given event type within a given data set. Genetic Programming algorithms need flexibility in the solutions they generate and in their operations to make use of their advanced way of searching through a problem space. Complex Event Processing rules, on the other hand, impose rather strong constraints on the rule structure and its components.
This thesis accomplishes the challenging task of transferring CEP rules into the world of Genetic Programming by using strong-typing and structure constraints and by incorporating the idea of repairing invalid rules at every stage of the evolutionary process. CepGP manages to include the most important parts of the CEP rule specification in its proposed search algorithm. This work was also able to show that the algorithm produces good results on artificially created data and hints at promising parameters for practical use.

8.2 Future Work

The presented algorithm and implementation of CepGP are only a beginning. They build the foundation for future research in applying Genetic Programming specifically, and Evolutionary Computation in general, to automatic rule discovery in Complex Event Processing. Although this work could show that such an approach may lead to promising results, there are still aspects left unconsidered or in need of improvement.

The algorithm includes most parts of the specification for CEP rules. However, some important functions are still missing. Arithmetical operations and, most importantly, aggregation functions are the next steps to enhance the capabilities of the CepGP algorithm. The step after that would need to consider value types other than numbers. This poses new and challenging problems for the type system, which may need to be loosened while other ways are found to ensure that the resulting rules remain valid. So far, the action part of the CEP rule is a mere placeholder and could become useful for future work, too; multi-stage search processes or Learning Classifier Systems (LCS) may profit from this function.

The implementation of CepGP currently does not cover all the aspects the algorithm is capable of, even though this would be crucial to evaluate potential enhancements and updates of the algorithm itself. Of course, the rule engine could also be enhanced and developed in parallel to the algorithm. Both the algorithm and the implementation, however, can work separately. This leaves room to validate some yet untested properties of the CepGP algorithm by using available fully fledged CEP engines for the condition fitness evaluation. Exchanging the rule engine also requires new functionality in the implementation: converting from the tree representation used within the algorithm to the rule representation of the chosen CEP engine, and converting from the evaluation result of the CEP engine back to the evaluation result expected by the algorithm.

In [26], Luckham also describes problems regarding causality between events. "Causality is dynamic" ([26] p. 241), and therefore the relation of one event being the cause of another depends on the circumstances and may hold in some but not necessarily all cases. A rule found by the algorithm may thus be less fit because it holds only most of the time or only sometimes, but not always, since the occurrences of the event may have different causes at different times. This fact has to be kept in mind at all times while working on the algorithm and interpreting the results it yields: in that sense, the algorithm can only reliably find static causes. With growing dynamics between causes and effects, the outcome becomes less and less dependable. Adding to this are real-world scenarios in distributed CEP systems, where timestamps do not always reflect the actual order of events because of the lack of a global clock and unsynchronized local clocks. ([26] p. 242f.)
Luckham, however, also provides a way to remedy the inability to know the cause-and-effect relationship between events: to find real causes in causally dynamic or distributed systems, one can use causal models. ([26] p. 242f.) A causal map adds a causal vector to each event which describes the causality attribute of that event and references the actual events leading to this specific event. This not only allows causes and effects between events to be identified reliably, but also provides an even better understanding, because the cause-and-effect relation is processed for every event, not only for a single one, enabling a clear view of concurrency and synchronization of processes in the system. ([26] p. 243) Causal maps would need to be recorded as well to be used within the proposed algorithm in future work. If this information is available, however, CepGP could be improved by using it, finding information of higher abstraction within the event streams, and aiding the domain expert in the pursuit of a CEP system perfectly adapted to the specific environment.

Bibliography

[1] Center for Machine Learning and Intelligent Systems – Machine Learning Repository. Website, August 2016. https://archive.ics.uci.edu/ml/datasets.html; Accessed: 2016-08-22.
[2] M. Atzmueller. Enterprise Big Data Engineering, Analytics, and Management. Advances in Business Information Systems and Analytics. IGI Global, 2016.
[3] C. C. Bojarczuk, H. S. Lopes, and A. A. Freitas. Discovering comprehensible classification rules using genetic programming: a case study in a medical domain. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation, Volume 2, pages 953–958. Morgan Kaufmann Publishers Inc., 1999.
[4] R. Bruns and J. Dunkel. Complex Event Processing: Komplexe Analyse von massiven Datenströmen mit CEP. Springer-Verlag, 2015.
[5] S.-H. Chen. Genetic algorithms and genetic programming in computational finance. Springer Science & Business Media, 2012.
[6] I. De Falco, A. Della Cioppa, and E. Tarantino. Discovering interesting classification rules with genetic programming. Applied Soft Computing, 1(4):257–269, 2002.
[7] M. Dempster and C. Jones. A real-time adaptive trading system using genetic programming. Quantitative Finance, 1(4):397–413, 2001.
[8] L. Ding, S. Chen, E. A. Rundensteiner, J. Tatemura, W.-P. Hsiung, and K. S. Candan. Runtime semantic query optimization for event stream processing. In 2008 IEEE 24th International Conference on Data Engineering, pages 676–685. IEEE, 2008.
[9] C. Donalek. Supervised and unsupervised learning. Website, April 2011. http://www.astro.caltech.edu/~george/aybi199/Donalek_classif1.pdf; Accessed: 2016-08-10.
[10] P. G. Espejo, S. Ventura, and F. Herrera. A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 40(2):121–144, 2010.
[11] A. A. Freitas. A genetic programming framework for two data mining tasks: classification and generalized rule induction. Genetic Programming, pages 96–101, 1997.
[12] A. A. Freitas. Data mining and knowledge discovery with evolutionary algorithms. Springer Science & Business Media, 2002.
[13] A. Frömmgen, R. Rehner, M. Lehn, and A. Buchmann. Fossa: Learning ECA rules for adaptive distributed systems. In Autonomic Computing (ICAC), 2015 IEEE International Conference on, pages 207–210. IEEE, 2015.
[14] F. Gao, E. Curry, M. I. Ali, S. Bhiri, and A. Mileo. QoS-aware complex event service composition and optimization using genetic algorithms. In International Conference on Service-Oriented Computing, pages 386–393. Springer, 2014.
[15] D. E. Goldberg. Dynamic system control using rule learning and genetic algorithms. In IJCAI, volume 85, pages 588–592, 1985.
[16] D. E. Goldberg. Genetic algorithms in search, optimization, and machine learning. Addison Wesley Longman, 30th edition, 2012.
[17] J. J. Grefenstette, C. L. Ramsey, and A. C. Schultz. Learning sequential decision rules using simulation models and competition. Machine Learning, 5(4):355–381, 1990.
[18] J. H. Holland. Escaping brittleness. In Proceedings of the Second International Workshop on Machine Learning, pages 92–95. Citeseer, 1983.
[19] K. J. Holyoak and R. G. Morrison. The Cambridge Handbook of Thinking and Reasoning. Cambridge University Press, 2005.
[20] R. Huang. Evolving prototype rules and genetic algorithm in a combustion control. In Industrial Automation and Control, 1995 (I A & C '95), IEEE/IAS International Conference on (Cat. No. 95TH8005), pages 243–248, Jan 1995.
[21] C. Z. Janikow. A knowledge-intensive genetic algorithm for supervised learning. Machine Learning, 13(2):189–228, 1993.
[22] C. M. Johnson and S. Feyock. A genetics-based technique for the automated acquisition of expert system rule bases. In Developing and Managing Expert System Programs, 1991, Proceedings of the IEEE/ACM International Conference on, pages 78–82, Sep 1991.
[23] J. R. Koza. Genetic programming: on the programming of computers by means of natural selection, volume 1. MIT Press, 1992.
[24] H.-L. Liu, Q. Chen, and Z.-H. Li. Optimization techniques for RFID complex event processing. Journal of Computer Science and Technology, 24(4):723–733, 2009.
[25] D. Lohpetch and D. Corne. Discovering effective technical trading rules with genetic programming: Towards robustly outperforming buy-and-hold. In Nature & Biologically Inspired Computing, 2009 (NaBIC 2009), World Congress on, pages 439–444. IEEE, 2009.
[26] D. Luckham. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley, Boston, Mass. [u.a.], 3rd print edition, 2005.
[27] D. Luckham and W. R. Schulte. Event Processing Technical Society – Event Processing Glossary Version 2.0. Event Processing Technical Society, July 2011.
[28] D. Mallick, V. C. S. Lee, and Y. S. Ong. An empirical study of genetic programming generated trading rules in computerized stock trading service system. In 2008 International Conference on Service Systems and Service Management, pages 1–6, June 2008.
[29] A. Margara, G. Cugola, and G. Tamburrelli. Learning from the past: automated rule generation for complex event processing. In Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems, pages 47–58. ACM, 2014.
[30] J. A. Marin, R. Radtke, D. Innis, D. R. Barr, and A. C. Schultz. Using a genetic algorithm to develop rules to guide unmanned aerial vehicles. In Systems, Man, and Cybernetics, 1999 (IEEE SMC '99) Conference Proceedings, 1999 IEEE International Conference on, volume 1, pages 1055–1060, 1999.
[31] N. Mehdiyev, J. Krumeich, D. Enke, D. Werth, and P. Loos. Determination of rule patterns in complex event processing using machine learning techniques. Procedia Computer Science, 61:395–401, 2015.
[32] D. J. Montana. Strongly typed genetic programming. Evolutionary Computation, 3(2):199–230, 1995.
[33] R. Mousheimish, Y. Taher, and K. Zeitouni. Automatic learning of predictive rules for complex event processing: Doctoral symposium. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, DEBS '16, pages 414–417, New York, NY, USA, 2016. ACM.
[34] R. Mousheimish, Y. Taher, and K. Zeitouni. Complex event processing for the non-expert with autoCEP: Demo. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, DEBS '16, pages 340–343, New York, NY, USA, 2016. ACM.
[35] C. Mutschler and M. Philippsen. Learning event detection rules with noise hidden Markov models. In Adaptive Hardware and Systems (AHS), 2012 NASA/ESA Conference on, pages 159–166. IEEE, 2012.
[36] C. Neely, P. Weller, and R. Dittmar. Is technical analysis in the foreign exchange market profitable? A genetic programming approach. Journal of Financial and Quantitative Analysis, 32(04):405–426, 1997.
[37] C. J. Neely and P. A. Weller. Technical trading rules in the European Monetary System. Journal of International Money and Finance, 18(3):429–458, 1999.
[38] M. Oussaidene, B. Chopard, O. V. Pictet, and M. Tomassini. Parallel genetic programming and its application to trading model induction. Parallel Computing, 23(8):1183–1198, 1997.
[39] R. Poli, W. B. Langdon, N. F. McPhee, and J. R. Koza. A Field Guide to Genetic Programming, March 2008. Published via http://lulu.com and freely available at http://www.gp-field-guide.org.uk.
[40] J.-Y. Potvin, P. Soriano, and M. Vallée. Generating trading rules on the stock markets with genetic programming. Computers & Operations Research, 31(7):1033–1047, 2004.
[41] D. M. Powers. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies, 2011.
[42] E. Rabinovich, O. Etzion, and A. Gal. Pattern rewriting framework for event processing optimization. In Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems, pages 101–112. ACM, 2011.
[43] I. Rechenberg. Evolution strategies. Website, 2016. http://www.bionik.tu-berlin.de/institut/xs2evost.html; Accessed: 2016-07-30.
[44] S. Sakprasat and M. C. Sinclair. Classification rule mining for automatic credit approval using genetic programming. In 2007 IEEE Congress on Evolutionary Computation, pages 548–555, Sept 2007.
[45] S. Sen, N. Stojanovic, and L. Stojanovic. An approach for iterative event pattern recommendation. In Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems, DEBS '10, pages 196–205, New York, NY, USA, 2010. ACM.
[46] O. Sigaud and S. W. Wilson. Learning classifier systems: a survey. Soft Computing, 11(11):1065–1078, 2007.
[47] W. M. Spears and K. A. De Jong. Using genetic algorithms for supervised concept learning. In Tools for Artificial Intelligence, 1990, Proceedings of the 2nd International IEEE Conference on, pages 335–341. IEEE, 1990.
[48] D. R. B. Stockwell. Genetic Algorithms II, pages 123–144. Springer US, Boston, MA, 1999.
[49] J. C. Tay and N. B. Ho. Evolving dispatching rules using genetic programming for solving multi-objective flexible job-shop problems. Computers & Industrial Engineering, 54(3):453–473, 2008.
[50] Y. Turchin, A. Gal, and S. Wasserkrug. Tuning complex event processing rules using the prediction-correction paradigm. In Proceedings of the Third ACM International Conference on Distributed Event-Based Systems, DEBS '09, pages 10:1–10:12, New York, NY, USA, 2009. ACM.
[51] R. J. Urbanowicz and J. H. Moore. Learning classifier systems: a complete introduction, review, and roadmap. Journal of Artificial Evolution and Applications, 2009:1, 2009.
[52] J. Wang. Trading and hedging in S&P 500 spot and futures markets using genetic programming. Journal of Futures Markets, 20(10):911–942, 2000.
[53] K. Weicker. Evolutionäre Algorithmen. Springer Vieweg, 2015.
[54] K. Weicker. Evolutionäre Algorithmen. Website, 2016. http://www.imn.htwk-leipzig.de/~weicker/publications/sctreff_ea.pdf; Accessed: 2016-04-19.
[55] G. M. Weiss and H. Hirsh. Learning to predict rare events in event sequences. In KDD, pages 359–363, 1998.
[56] T. Yu, S.-H. Chen, and T.-W. Kuo. Discovering financial technical trading rules using genetic programming with lambda abstraction. In Genetic Programming Theory and Practice II, pages 11–30. Springer, 2005.
[57] H. Zhang, Y. Diao, and N. Immerman. On complexity and optimization of expensive queries in complex event processing. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pages 217–228. ACM, 2014.