
A Genetic Programming Algorithm To
Derive Complex Event Processing Rules
For A Given Event Type
Norman Offel
Master's Thesis in the Course of Applied Computer Science
August 28, 2016
Author
Norman Offel
Student number: 1313528
E-mail: [email protected]
Adviser:
Prof. Dr. Ralf Bruns
Department of Computer Science, Faculty IV
University of Applied Sciences and Arts of Hanover, Germany
E-mail: [email protected]
Co-Adviser: Prof. Dr. Jürgen Dunkel
Department of Computer Science, Faculty IV
University of Applied Sciences and Arts of Hanover, Germany
E-mail: [email protected]
Selbstständigkeitserklärung (Declaration of Independent Work)
I hereby declare that I have written the submitted Master's thesis independently and without outside help, that I have not used any sources or aids other than those indicated, and that I have marked as such all passages taken verbatim or in substance from the works used.
Hanover, August 28, 2016
Signature
Contents

1 Introduction                                                            1
  1.1 Motivation                                                          1
  1.2 Structure of the Thesis                                             3

2 Background                                                              4
  2.1 Evolutionary Computation                                            4
      2.1.1 Evolution in Biology                                          5
      2.1.2 Evolution for Problem Solving                                 6
      2.1.3 The Evolutionary Computation Family                           9
  2.2 Complex Event Processing                                           11
      2.2.1 Terminology                                                  12
      2.2.2 Language                                                     13
  2.3 Rule Learning                                                      15

3 Related Work                                                           18
  3.1 Evolutionary Computation in Rule Learning                          18
  3.2 Genetic Programming in Rule Learning                               19
  3.3 Optimization and Rule Learning in Complex Event Processing         21
      3.3.1 Improving CEP performance                                    21
      3.3.2 Learning CEP rules                                           21

4 General Approach                                                       25
  4.1 The scenario                                                       25
  4.2 Applying Genetic Programming to Rule Learning                      28
  4.3 Constraints in Genetic Programming                                 30
  4.4 Evolutionary Operations                                            31
      4.4.1 Selection                                                    31
      4.4.2 Crossover                                                    31
      4.4.3 Mutation                                                     32
  4.5 Summary                                                            34

5 CepGP – The Genetic Programming Algorithm                              35
  5.1 Rule Components                                                    36
      5.1.1 Window                                                       37
      5.1.2 Condition                                                    38
      5.1.3 Action                                                       42
  5.2 Preparation Phase                                                  43
  5.3 Initial Population Creation                                        44
      5.3.1 Window Creation                                              44
      5.3.2 Event Condition Tree Creation                                45
      5.3.3 Attribute Condition Tree Creation                            46
  5.4 Evolutionary Operators                                             50
      5.4.1 Selection                                                    50
      5.4.2 Crossover                                                    51
      5.4.3 Mutation                                                     63
  5.5 Fitness Calculation                                                67
      5.5.1 Condition                                                    69
      5.5.2 Window                                                       73
      5.5.3 Rule Complexity                                              76
      5.5.4 Total Fitness                                                78
  5.6 Summary                                                            79

6 Implementation                                                         82
  6.1 Requirements                                                       82
  6.2 Input and Output Specification                                     83
      6.2.1 Input                                                        83
      6.2.2 Output                                                       84
  6.3 The Rule Engine                                                    85
  6.4 CepGP                                                              91
      6.4.1 Preparation Phase                                            93
      6.4.2 Population Initialization                                    94
      6.4.3 Evolutionary Process                                         94
      6.4.4 Evaluation                                                   95
      6.4.5 Parameters                                                   96
  6.5 Summary                                                            98

7 Evaluation                                                            100
  7.1 Test Data                                                         100
  7.2 Testing                                                           101
      7.2.1 Convergence                                                 102
      7.2.2 Parameter influence                                         104
      7.2.3 CepGP vs. Random Walk                                       108
      7.2.4 Noise influence                                             112
  7.3 Result Discussion                                                 112

8 Conclusion                                                            113
  8.1 Contributions                                                     113
  8.2 Future Work                                                       114

Bibliography                                                            116
List of Figures

2.1 Basic Concept of Biological Evolution                                 5
2.2 General Evolutionary Computation Algorithm                            8
2.3 History of Evolutionary Computation                                  10
2.4 Abstraction of Events into Higher Level Complex Events               11
2.5 Cycle of Event-Driven Systems                                        12
2.6 Origin of Learning Classifier Systems                                16

3.1 autoCEP                                                              23
3.2 iCEP                                                                 24

4.1 Scenario of the Thesis                                               27
4.2 Applications of Genetic Programming in classification tasks          28
4.3 Model extraction with Genetic Programming                            29
4.4 Subtree Crossover                                                    33
4.5 Point Mutation                                                       33

5.1 General process of CepGP                                             35
5.2 General Rule Components                                              37
5.3 Window Types                                                         37
5.4 Refined Rule Components                                              38
5.5 Rule example with Event Condition Tree                               39
5.6 More Complex Rule Example with Event Condition Tree                  40
5.7 Rule example with Event Condition Tree and Attribute Condition Tree  40
5.8 Rule example with an Event Condition Tree and a more complex Attribute Condition Tree  41
5.9 Processing pipeline of a rule in CepGP                               42
5.10 Preparation Phase                                                   43
5.11 Window Creation Process                                             45
5.12 Event Condition Tree Creation with the Full Method                  46
5.13 Event Condition Tree Creation with the Grow Method                  47
5.14 Attribute Condition Tree Creation                                   48
5.15 Comparison Operator Initialization                                  49
5.16 General Crossover of CepGP with Elitism                             52
5.17 Crossover Point Indexing                                            53
5.18 Calculation of the Crossover Component and Crossover Point within   54
5.19 Subtree Crossover of Event Condition Trees                          56
5.20 Subtree Crossover With Two Attribute Condition Trees                58
5.21 Subtree Crossover With One Attribute Condition Tree                 59
5.22 Broken Attribute Condition Tree after Crossover of Attribute Condition Trees  60
5.23 Broken Attribute Condition Tree after Crossover of Event Condition Trees  61
5.24 General Algorithm to Repair an Attribute Condition Tree             63
5.25 Algorithm to Repair Broken Aliases                                  64
5.26 General Mutation Algorithm                                          64
5.27 Example Mutation of the ECT                                         66
5.28 Example Mutation of the ACT                                         68
5.29 Relation between TP, FP, FN and TN                                  70
5.30 Illustration of ROC Analysis                                        73
5.31 Idea of the Window Fitness Function                                 74
5.32 The Window Problem                                                  75
5.33 The Window Fitness Function                                         76

6.1 Process of the Rule Evaluation                                       86
6.2 CepGP Class Diagram                                                  92

7.1 CepGP Converging in Small Data Set                                  102
7.2 CepGP Converging in Medium Data Set                                 103
7.3 CepGP Converging in Large Data Set                                  103
7.4 CepGP Result on Small Data Set With Initial Parameters              105
7.5 CepGP Result on Small Data Set With Varying Population Sizes        105
7.6 CepGP Result on Small Data Set With Varying Amounts of Generations  106
7.7 CepGP Result on Small Data Set With Different Crossover Rates       107
7.8 CepGP Result on Small Data Set With Different Mutation Rates        108
7.9 CepGP vs. Random Walk Comparing the Bests                           109
List of Tables

2.1 Terms and definitions in Evolutionary Computation                     7
2.2 Event Pattern Specification                                          13
2.3 Context Condition Specification                                      14
2.4 Aggregate functions                                                  15

5.1 Summary of constraints to ACT nodes                                  42
5.2 Support of Language Constructs in CepGP                              81

6.1 Support of the Language Constructs in the Rule Engine                88
6.2 Decision Matrix for ACT-Comparison-Operator Evaluation               89
6.3 Decision Matrix for ∧-Operator                                       89
6.4 Decision Matrix for ∨-Operator                                       90
6.5 Decision Matrix for ¬-Operator                                       90

7.1 Parameter Suggestions by Poli et al. [39]                           104
7.2 Comparison of the Original Hidden Rule and the Results of CepGP and Random Walk  111
7.3 Final Default Parameters for CepGP                                  112
1 Introduction
Contemporary business, especially on the Internet, is heavily dependent on up-to-date
information about the current status. The vast amount of data that needs to be processed
to provide all parts of the business with the needed information just in time is challenging
for modern IT systems, and different strategies and technologies have emerged to help
overcome these problems. One of these technologies is the so-called Complex Event
Processing (CEP). Instead of saving each piece of information into a database for further
analysis and querying for information at a higher level of abstraction, CEP treats the passing pieces of
information as events. Events are emitted by agents observing the environment and contain
low-level information about the observation at a particular moment. They are sent to a
processing system, the CEP engine, that uses knowledge in the form of rules representing
cause and effect relations to extract information on a higher level of abstraction and to provide
domain experts with more insight into the current status of the business or, more generally,
the environment.
“A key to understanding events is knowing what caused them – and having that causal knowledge at the time the events happen. The ability to
track event causality is an essential step toward managing communication
spaghetti.”(Luckham in [26] p. 10)
So far, these cause and effect relations need to be modeled by the domain expert. However,
there are happenings, whether recorded events or things that happen within the environment
but are not monitored, for which the causes are yet unknown to the domain expert. Tracking
down the causes for specific historical happenings would help domain experts gain a better
understanding of their environment and ultimately lead to a better knowledge base for the
CEP engine to extract more valuable information for the business.
1.1 Motivation
The complexity of systems, events and interrelations between events can overwhelm a domain
expert. ““Being in control” requires humans to have understandable, personalized views of
enterprise activity at every level of activity. [Manually] monitoring log files of rule engine
activity, as provided by many of today’s process automation tools, is not acceptable.” ([26]
p. 39) Automatic identification of these interrelations enables better understanding of the
environment and opens up potential for better rules and, thus, better systems to react to
even the most complex situations. “However, there are some distinguishing problems in
dealing with exceptions.
• We must be made aware of their presence in real-time - that is, the process is not
behaving as specified.
• We must be able to find out what causes them.
The first issue should be solved by the same real-time, levelwise personalized viewing that
is needed to support process evolution in the face of marketplace changes. But the second
issue, finding the causes, requires new diagnostic capabilities. We need to know which
subprocesses are involved in creating events that have led to an exception. [...] This kind
of capability is called runtime drill-down diagnostics.” ([26] p. 41) Even though Luckham
proposes other means in his work to find the cause for the exceptions, or happenings, this
work provides a different kind of tool to help domain experts identify cause and effect
relations concerning a specific event type.
By using Genetic Programming and a tree representation of a CEP rule, this thesis strives to
elaborate an automatic approach with minimal input that derives a most appropriate rule for
the given event type from historically recorded event stream data. Several challenges arise
from this goal. CEP has no standardized language specification, and even the core language
constructs shared between different implementations use more complex operators than
conventional rule languages or paradigms. Therefore, CEP rules impose constraints on
the structure and components within the rule representation. Genetic Programming, on the
other hand, needs the freedom to alter the rule representation for its operations to work
and to find a most appropriate rule among all possible rules.
In addition to finding a rule which most accurately describes the cause for a given event,
there are further objectives to achieve within the search process of Genetic Programming.
Rules should not only be accurate about the cause, but they also should be comprehensible
for humans, the domain experts, to understand the cause and potentially find more valuable
information in the event stream.
CEP systems only cache events as long as they need them to evaluate their rule base. So-called windows dictate for every rule how far into the past the evaluation needs to
look to tell whether the rule fires or not. To provide more insight and the most
accurate form of the rule, the Genetic Programming algorithm also needs to find
the smallest window for the cause and effect relation.
The algorithm proposed in this work addresses all of the aforementioned challenges and
provides a concept for a search, with Genetic Programming at its core, for a rule that
satisfies all the objectives. An implementation helps to evaluate the proposed
concept and gives hints for parameter settings and further research.
1.2 Structure of the Thesis
The thesis starts by laying the foundation in chapter 2 on the following page and explaining
the background of the three pillars this work is based on: Evolutionary Computation is the
family of algorithms which use biological evolution as a model to find the best solutions to
problems, Complex Event Processing is a technology for real-time processing of vast amounts
of pieces of information, and Rule Learning is a field of research that applies algorithms, like
Genetic Programming, to find most appropriate rules for expert systems, like CEP. Following
in chapter 3 on page 18, related works are presented and a road map is drawn that eventually
leads to this thesis by combining parts of the described approaches. In chapter 4 on
page 25, this thesis explains in detail the scenario and the overall approach that is followed
throughout the proposed algorithm. The main part of this thesis, the Genetic Programming
algorithm called CepGP, is elaborated in much detail in chapter 5 on page 35. It covers every
operation of Genetic Programming and also describes how CEP rules are encoded so that
Genetic Programming can make the most use of them and still create only valid rules.
Afterwards, this thesis continues by presenting an implementation of the proposed algorithm
in chapter 6 on page 82 to enable a practical analysis of CepGP. The evaluation follows in
chapter 7 on page 100. After discussing the findings, this work concludes in chapter 8 on
page 113 by summarizing the contributions of this thesis and hinting at potential
future research.
2 Background
Using Genetic Programming to derive a Complex Event Processing rule is mainly based on
three topics, which this chapter elaborates on to provide the foundation for following the rest of
this thesis. The first topic is concerned with Evolutionary Computation as the group of
algorithms to which Genetic Programming belongs. The second topic is Complex Event
Processing (CEP) as a way to process massive data in real time from different sources using
rules. The third topic is Rule Learning as the field which unites the worlds of search or
optimization and rule-based systems like CEP.
2.1 Evolutionary Computation
The processing power of computers has long been used to solve problems which a
human could not feasibly solve because of the excessive number of calculation steps and
the large numbers involved. But even that power reaches its limits when confronted with
ever bigger problems.
Optimization is about finding the best solution for a given problem. In that sense, optimization is a search for the best solution within the space of all solutions to a problem. Most
commonly, this involves trying and evaluating each and every solution to the problem there
is. This works only if the processing power can calculate all that within a reasonable amount
of time. However, there are problems where this approach is not appropriate because the
calculation of all solutions would take too long, even for computers.
For these cases, there are algorithms which seek to find near-optimal solutions within a short
amount of time by processing not all solutions but only an efficiently and deliberately selected
subset. One of them is the group of so-called Evolutionary Computation
algorithms, which is inspired by Charles Darwin's theory of evolution to search
through the solution space. They build generations of solutions that are based on the information
of the previous generation and close in on the optimal solutions with each generation.
This section first briefly describes the biological principle behind Evolutionary Computation,
moves on to converting this knowledge into computer scientific problem solving and concludes
with a presentation of the algorithms belonging to the group of Evolutionary Computation.
2.1.1 Evolution in Biology
Evolution traditionally is about adapting to changes in an environment in order to survive.
These adaptations result in changes in the genetic structure of the individuals. The genetic
structure of an individual includes all the information for structure, organisation, functionality
and appearance. The need to adapt arises from the so-called “rule of nature” which states
that only the most suitable to the environment, or in other words “fittest”, survive and are
allowed to pass on their genes to the next generation to form even fitter individuals.[54]
Mutation, Crossover and Selection are the evolutionary operations which drive the evolution.
They are responsible for information interchange between individuals and forming the next
generation. Each of them has its respective domain in which it contributes to the evolution.
Figure 2.1 illustrates the basic concept of biological evolution.
Figure 2.1: Basic concept of biological evolution (adapted from [54]); From a generation
which needs to adapt to environmental influences (depicted as yellow arrows),
a population of parents is selected based on their conformity or fitness; Mating
pairs find each other and through recombination and mutation form the next
population where the cycle begins anew
Mutation
Mutations can be seen as failures during reproduction. What at first sounds negative is
indeed a necessary process to introduce new information into a population. However, since
failures in reproduction mean that the information of the parents is not combined in a way
that might take the best of both of them, mutation disrupts the process of optimization to the
current environment. That is the reason why mutations usually have a very low probability
and only make minor changes in the genotype – the genetic structure – of the individuals.
Bigger changes in the population arise from accumulating the small changes within the individuals.
If a mutation has a high impact on the genotype, it is typically supplanted because it
mostly results in negative properties.([53] p. 9)
Crossover
Crossover is the result of combining two or more individuals. In the real world, this is the
process of mating, during which the sexual partners exchange parts of their genotype, which
results in one or more offspring. In traditional evolution theory, crossover is not seen as
an evolutionary factor because it does not introduce new information into the population.
However, contemporary research takes into account the complex and close interrelation of information in
the genotype. It now acknowledges that crossover can lead to new structures in the genotypes
and in this way introduce new information into the population. This elevates crossover to an
important evolutionary factor, far more influential than mutation.([53] p. 10)
Selection
The selection in a population describes the change in the frequency of specific information
through differing numbers of offspring carrying that information. It can be measured via the
fitness and influences mainly two aspects in evolution:
• Survivability, decided by their conformity to the environment (environmental selection)
• Ability to find mating partners, also called sexual or mating selection
In nature, there are also factors like the general ability to reproduce and mating frequencies.
The fitness is an implicit measure for the quality of an individual because good individuals
are more likely to produce more offspring. The selection depends on the phenotype – the
manifestation of the genes – and its performance regarding the challenges in the environment
which is quantified by the fitness.([53] p. 10 f.)
2.1.2 Evolution for Problem Solving
The history of research into evolution in biology is long reaching and filled with a lot of
famous breakthroughs about life on earth. Computer science, however, is a rather young
science but its list of contributions to our lives is not short, either. In the 1930s, Alan Turing
invented the model of a universal computing machine - the Turing machine - and claimed
that every algorithmically solvable problem can be computed with his model. From this
moment on, computers began to evolve from machines meant to solve a specific problem to
universal problem solvers. At the same time, there are problems which are not algorithmically
solvable, like the Halting Problem, or for which an efficient algorithm has yet to be found, like
the group of NP-hard problems. In these cases, instead of giving up on computing solutions
to these problems, there are algorithms that search through the problem space for the best
solution without calculating every solution there is. Although they are not guaranteed to
find the best solution, they still find at least near optimal solutions which are often suitable
for solving the problem at hand. Evolutionary Computation is a family of algorithms that
uses the concept of evolution (as a strategy to find the solution best suited to the given
situation in order to survive) for solving problems.
Evolutionary algorithms as part of the Evolutionary Computation family simulate and
simplify natural evolution to suit their purpose of solving problems. Biologists introduced terms for the components of evolution which are also used in computer science to
describe Evolutionary Computations. [53] lists these terms and explains how they are used
in Evolutionary Computations. Table 2.1 presents a subset of the terms, in some cases with adapted
definitions, and their meaning. Of course, some of these definitions vary from
their biological counterparts. This is grounded in the simplifications which are needed to
transfer the biological concept of evolution into the world of problem solving in computer
science.
Term             Meaning in Evolutionary Computation
Population       is a collection of individuals
Individual       is a solution candidate for a problem
Genotype         is the sum of information within an individual which is evolutionarily alterable
Phenotype        is the representation of the individual from the point of view of the problem domain
Mutation         is a minor change in the genotype
Recombination    is an operation which combines two or more individuals into a new one
Crossover        is a synonym for recombination
Selection        is an operation which determines the individuals that contribute to the evolution of the next generation
Fitness          is the quantification of the quality of an individual
Genetic code     is a direct mapping (decoding) from the genotype to the phenotype

Table 2.1: Terms and definitions in Evolutionary Computation (adapted from [53] p. 15)
In natural evolution, the goal of evolution is to enable the species to survive within the given
environment. In Evolutionary Computation, besides simplifications in the terms, there are
also differences in its execution. For example, in biological evolution there is no verifiable
property to evaluate the gain of the results of the evolutionary operations from generation to
generation. The simple fact that the species is still alive is the only examinable property of the
population to consider its fitness. However, for problems in computer science there are clearly
definable goals and objectives for an algorithm to verify the fitness of an individual.([53] p.
24) Furthermore, the general process of Evolutionary Computations is a sequential process
since it is a computer program, whereas in nature individuals have varying ages and produce
offspring at different times, and, at a time, multiple generations of a species are present and
may participate in reproduction. Another difference exists in the start of both evolutions.
Arguably, the starting point of natural evolution was the Big Bang, whereas the initial population
of an Evolutionary Computation needs to be built, usually at random. The starting population can,
however, already incorporate knowledge about the problem and about what information good individuals might
have.
After evaluating each individual of the starting population to determine their fitness, a cycle
of steps begins, starting with the check for the termination criterion. Usually it validates
whether
• there are individuals that satisfy a minimal fitness level,
• a maximum number of cycle runs has been performed, or
• there is no significant improvement over the last generations.
Otherwise, the mating selection determines for each individual how many offspring it will
produce according to its fitness in comparison to the other individuals of the population.
New offspring are generated via crossover and mutation. Then, they are evaluated and
integrated into the parent population via the environmental selection. Mostly, the population size is limited so that either some or all parent individuals are supplanted by offspring
individuals. With this, the cycle begins anew with the check for the termination criterion.
Figure 2.2 illustrates the general adaptation of natural evolution by the Evolutionary
Computations.([53] p. 25)
[Flowchart: Initialization, Evaluation, Termination Criterion check (if met: Output; otherwise: Mating Selection, Crossover, Mutation, Evaluation, Environmental Selection, then back to the criterion check)]
Figure 2.2: General Evolutionary Computation algorithm (adapted from [53] p. 25)
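As a rough illustration of this cycle, the following Python sketch outlines the generic loop. It is only a sketch: the callables init_population, evaluate, mating_selection, crossover, mutate and environmental_selection are placeholders for problem-specific implementations and are not taken from any particular framework.

    def evolve(init_population, evaluate, mating_selection, crossover, mutate,
               environmental_selection, max_generations=100, target_fitness=None):
        """Generic Evolutionary Computation cycle (cf. figure 2.2): initialize and
        evaluate a population, then loop over selection, crossover, mutation,
        evaluation and replacement until a termination criterion is met."""
        # Initialization and first evaluation: pairs of (individual, fitness)
        population = [(ind, evaluate(ind)) for ind in init_population()]

        for generation in range(max_generations):        # criterion: maximum number of cycle runs
            best_individual, best_fitness = max(population, key=lambda p: p[1])
            if target_fitness is not None and best_fitness >= target_fitness:
                break                                     # criterion: minimal fitness level reached

            # Mating selection: fitter individuals contribute more offspring
            parents = mating_selection(population)

            # Crossover and mutation create offspring, which are then evaluated
            offspring = []
            for mother, father in zip(parents[0::2], parents[1::2]):
                child = mutate(crossover(mother, father))
                offspring.append((child, evaluate(child)))

            # Environmental selection: offspring supplant some or all parents
            population = environmental_selection(population, offspring)

        return max(population, key=lambda p: p[1])        # fittest individual and its fitness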
The main challenges in creating an Evolutionary Computation are to
1. abstract the problem into a suitable representation for an Evolutionary Computation
and to
2. define a suitable fitness function which allows guiding the search process.
The absence of further necessary information is one of the reasons why Evolutionary Computations
are experiencing broad usage across various research fields for finding solutions to hard
problems.
2.1.3 The Evolutionary Computation Family
Evolutionary Computation consists mainly of four schools of thought: the evolutionary
strategies, the evolutionary programming, the genetic algorithms and the genetic programming. All of them are based on evolution as their inspiration, and all of them share a lot of
similarities. However, especially during the early days of this field of research they had different
purposes and goals. These goals broadened, and since the schools adopted ideas from one
another, they also became more and more alike. Nevertheless, differences remain, mostly in their choice of genotype and their choice of operations and parameters.
[53] (p. 44) created a timeline of the history and the most important scientific conferences
which can be seen in adapted form in figure 2.3 on the following page.
Evolutionary Strategies: Bienert and Rechenberg and later Schwefel founded the Evolutionary Strategies as part of Evolutionary Computation.([53] p. 44) "Evolution Strategies (ES) imitate, in contrast to the genetic algorithms, the effects of genetic
procedures on the phenotype."[43] Evolution strategies conventionally use real numbers to represent the problem space. Furthermore, evolutionary strategies use random
selection of parents and deterministic selection of the n fittest individuals during the
environmental selection.[54]
Evolutionary Programming: Founded by Lawrence J. Fogel et al. (1965), evolutionary
programming was originally based on finite automata used to predict time series. Later, in the
1980s, David B. Fogel replaced the finite automata with neural networks.([53] p. 45)
Evolutionary Programming does not use crossover and introduced the now
widely used tournament selection.[54]
Genetic Algorithms: The main contributors to Genetic Algorithms are Holland, De Jong
and Goldberg.[54]([53] p. 45) Traditionally, problems are represented as binary bitstrings. Mutation flips single bits and crossover mixes the bits of the parents with
varying strategies.[54]
Genetic Programming: Genetic Programming, founded by Koza[23] and derived from the
genetic algorithms, uses dynamic representations like trees and it is often used to create
computer programs and similar phenotypes. Chapter 4 on page 25 explains Genetic
Programming and its techniques in more detail.
Figure 2.3: History of Evolutionary Computation (adapted from [53] p. 44); The scientific
conferences in their respective fields of research are the International Conference
on Genetic Algorithms (ICGA), the Parallel Problem Solving from Nature (PPSN)
and the Evolutionary Programming (EP); The conferences consolidating the fields
are the Genetic and Evolutionary Computation Conference (GECCO) and the
Congress on Evolutionary Computation (CEC)
New Concepts: Especially around the millennium, a lot of new concepts emerged which
used analogies from nature as problem-solving algorithms. Among them are ant colony
optimization, particle swarm optimization, differential evolution and many more.([53]
p. 45)
2.2 Complex Event Processing
Complex Event Processing (CEP) is based on the processing of events which are “anything
that happens, or is contemplated as happening.”[27] These events represent all kinds of
measurable or at least observable things in the monitored environment and are often emitted by agents as observers of the environment. In contrast to already present and stored
information in databases, CEP seeks to process the information carried by events as soon as
they come in to enable real-time processing. CEP uses these usually low-level and primitive
events to detect more complex events by correlating them according to rules. These rules
are used to infer new information by combining information of primitive events into complex
events. The complex event carries higher-level information and can itself be used to infer
new information at an even higher level.([4] p. 3 f.) Figure 2.4 illustrates the combination
of primitive events forming a certain pattern (represented in CEP as a rule) into a complex event in a higher abstraction layer. The pattern defines a causal relation between the
primitive and the complex event.
Figure 2.4: Abstraction of events into higher level complex events (adapted from [4] p. 5)
Since events represent anything that might happen in an environment, processing and reacting to these events in real time can represent any kind of real-world process. CEP is therefore
an event-driven system which works cyclically in three basic steps, which are shown in figure 2.5
on the next page.
During the detection step, the event-driven system CEP detects events in the environment
in real time. These events are then processed by aggregating events from different sources
and by matching patterns. If those patterns are matched, the last step reacts to them
by calling distributed services in real time.[26]
Figure 2.5: Cycle of event-driven systems
2.2.1 Terminology
To provide a common understanding of the terms used throughout this thesis, this section
gives definitions taken from [27].
Complex-event processing (CEP): Computing that performs operations on complex events,
including reading, creating, transforming, abstracting, or discarding them.
Event: Anything that happens, or is contemplated as happening. Events are defined by a
type which also defines the information they carry, represented as attributes.
Simple event: An event that is not viewed as summarizing, representing, or denoting a set
of other events.
Derived event: An event that is generated as a result of applying a method or process to
one or more other events.
Rule (in event processing): A prescribed method for processing events. Bruns and Dunkel
add in [4] (p. 12) that rules describe the actions that are to be executed after a pattern
was matched. The condition part of the rule is the pattern to be matched and the
action part defines the reaction to these event patterns.
Window (in event processing): A bounded segment of an event stream. Since events
occur in unbounded streams, it is important to be able to define the segment of the
stream that is cached for later investigation.
Relationships between events: Events are related by time, causality, abstraction and
other relationships. Time and causality impose partial orderings upon events.
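Purely for illustration, an event in the sense of these definitions can be thought of as a typed record with a timestamp and a set of attributes. The following Python sketch is hypothetical; the class and field names are not taken from any particular CEP engine.

    from dataclasses import dataclass, field
    from typing import Any, Dict

    @dataclass
    class Event:
        """A simple event: its type defines which attributes it carries."""
        event_type: str                        # e.g. "ItemSoldEvent"
        timestamp: float                       # occurrence time; imposes a partial order on events
        attributes: Dict[str, Any] = field(default_factory=dict)

    # A simple, low-level event emitted by an observing agent
    sold = Event("ItemSoldEvent", timestamp=1472342400.0, attributes={"price": 19.99})

    # A derived (complex) event generated by applying a rule to one or more other events
    alert = Event("HighPriceSaleEvent", timestamp=sold.timestamp,
                  attributes={"cause": "pattern over ItemSoldEvent matched"})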
2.2.2 Language
According to [26] (p. 146f.), the requirements of a language which can represent these
causalities in the form of rules are: power of expression, notational simplicity, precise semantics, and scalable pattern matching. This thesis chooses the syntax and event pattern
specification from the work of Bruns and Dunkel [4] (p. 21 ff.) as represented in table 2.2.
Sequence (→): Represents a timely order of its operands. Example: A → B means that the pattern matches if an event of type A is followed by an event of type B. Other events of different types are allowed to occur in between. The operands can also be other patterns. In this case, the sequence operator defines that the first operand needs to be ultimately matched before the second one (decisive is the time of final matching).

Boolean (∧, ∨, ¬): Defines the occurrence (or absence) of events of specific types within the considered stream segment, where their order of occurrence does not matter. The ∧-operator demands both operands to be there, whereas the ¬-operator only matches if the event type is absent. The ∨-operator matches as long as at least one of the operands is present. In the simplest form, the operands are boolean values indicating whether the defined event types are present (or absent). When they combine other patterns, then they perform their respective comparison according to the boolean results of their operands.

Excluding sequence (A → ¬B → C): Defines a special kind of sequence where an event type is prohibited from occurring in between two other event types. This is semantically different from a combination of two sequences and a ¬-operator as the middle operand: A → ((¬B) → C). There it is sufficient to have at least one event of a type other than B between the occurrences of an A followed by a C in the stream. If the operands are also patterns instead of event types, then this operator works similarly to the sequence operator. The middle pattern is prohibited from matching between the matching of the first and the last pattern.

Table 2.2: Event pattern specification taken from [4] (p. 21 ff.)
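To make the operator semantics concrete, here are a few illustrative patterns in the notation defined above, over a hypothetical stream with event types A, B and C; the intended readings follow directly from the descriptions in table 2.2:

    A → B          matches once an event of type A is followed (not necessarily immediately) by a B
    A ∧ ¬C         matches if an A occurs and no C occurs within the considered stream segment
    (A → B) ∨ C    matches if either an A is followed by a B, or a C occurs
    A → ¬B → C     matches an A followed by a C with no B occurring in between (excluding sequence)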
Bruns and Dunkel also propose context conditions which take into account the attributes of
the events that matched the event patterns displayed in table 2.3 on the next page.
In theory, it is possible to consider every event that has occurred since the CEP
system started running. However, this means that the system would need to cache a possibly
unlimited number of events to evaluate its rules. That is why CEP systems enable rules
to specify, via windows, the maximum number of events that need to be cached for them.
Alias: In order to link the context conditions with the event pattern, each event type in the event pattern (also called event condition) is assigned an alias, which is a unique reference used in the context conditions (also called attribute conditions) to access attributes of the event referenced by that alias. For assigning an alias the keyword as is used. Example: (A as aliasA) → (B as aliasB)

Access-operator (.): Within the context conditions (attribute conditions), attributes of the referenced events can be accessed via the alias and the access-operator (.). Example: (A as aliasA → B as aliasB) ∧ (aliasA.attribute1 = 0). This example already shows how the event and attribute conditions are connected. Both are linked with an ∧-operator and use references to show the relations between the events in the event condition and the attribute comparison in the attribute condition.

Operators (+, −, /, ∗, <, >, ≤, ≥, =, ≠, . . . ): Context or attribute conditions need operators that combine the values they access via aliases and the access-operator. Some values may form new values (for example through addition or multiplication) or they are compared (for example whether one is less than or equal to the other). These operators depend on the kind of values that are processed (numeric or categorical types, for example) and their meaning depends on their domain (relations between categories, for example).

Table 2.3: Context condition specification according to [4] (p. 22 f.)
Sliding windows adapt the events within this window specification every time a new event
is received. CEP systems know two types of windows that differ in how they define their
segments. The length window always considers a fixed number of past events, no matter
their time differences. The time window uses the time differences from the newest event
to the past events to determine which events are part of the segment. Bruns and Dunkel
use the syntax win:length:x for length windows and win:time:yz for time windows, where y
is a number and z is a time unit. The window is attached to the rule within [ ]-brackets:
(A → B)[win:time:2min].([4] p. 23 f.)
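As a purely illustrative sketch of these two window types (not the mechanism of any particular CEP engine), a length window and a time window over an incoming stream could be maintained as follows; the Event class with a timestamp field is the hypothetical one sketched in section 2.2.1.

    from collections import deque

    class LengthWindow:
        """win:length:x – always keeps the last x events, regardless of their timestamps."""
        def __init__(self, length):
            self.events = deque(maxlen=length)

        def insert(self, event):
            self.events.append(event)          # the oldest event is dropped automatically
            return list(self.events)

    class TimeWindow:
        """win:time:yz – keeps only events no older than `seconds` relative to the newest event."""
        def __init__(self, seconds):
            self.seconds = seconds
            self.events = deque()

        def insert(self, event):
            self.events.append(event)
            newest = event.timestamp
            while self.events and newest - self.events[0].timestamp > self.seconds:
                self.events.popleft()          # expire events that fell out of the time window
            return list(self.events)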
Aggregations are another important concept in CEP systems. In combination with windows,
they can aggregate attribute information from the events within that window.([4] p. 24
f.) Bruns and Dunkel name four general aggregate functions in table 2.4.
Function     Description
sum          calculates the sum of an attribute over all events within the window
avg          builds the average value of an attribute over all events within the window
min, max     finds the minimum and maximum of an attribute value over all events in the window

Table 2.4: Aggregate functions according to [4] p. 25
An example with an aggregate function is ItemSoldEvent.max(price)[win:length:100] as highestPricedEvent. These aggregate functions as well as the aforementioned attribute operators
are focused on numeric attribute values which will also be the target attribute value type of
this thesis. Nevertheless, CEP systems in general offer more functions and value types than
are represented in this section.
2.3 Rule Learning
Rule Learning is a form of classification where solutions to a problem are categorized, or
classified, according to whether they lead to the desired outcome or not. This section shall
give a basic background to this field of research and how the goal of this thesis fits into this
field.
Urbanowicz and Moore give an introduction to learning classifier systems (LCS) in [51] and so
do Sigaud and Wilson in [46]. Learning classifier systems seek to find a rule-set (population of
classifiers), instead of a single rule, that best describes the system. “The desired outcome of
running the LCS algorithm is for those classifiers to collectively model an intelligent decision
maker.”([51] p. 2) Figure 2.6 on the next page shows the origin of these systems, coming
from the already presented Evolutionary Computation and from machine learning where the
goal is to learn via an improvement in the performance or solutions obtained.
Figure 2.6: Origin of LCS (adapted from [51])
In contrast to this thesis, not only learning classifier systems but Rule Learning in general is
often concerned with finding rule sets instead of one rule. Nevertheless, in its simplest form,
rule learning covers this special case as well.
Holland already stated in the eighties that Rule Learning and expert systems like CEP go well
together: “Expert systems are one of Artificial Intelligence’s real successes in the exploration
of intelligence. In the longer view, the most significant part of this success may be the
bright illumination it throws upon questions of system versatility. [. . . ] Classifier systems
are general-purpose programming systems designed as an attempt to meet the criteria [combination, parallelism, declarative and procedural information, categorization, synchronic and
diachronic pointing, gracefulness, and confirmation]. Classifiers have many affinities to the
rule-based (production) systems underpinning the usual approach to Expert Systems.”[18]
In expert systems, rules are also often used to infer new rules and therefore create new
knowledge as links between existing pieces of knowledge. This process is called reasoning
and can be defined as follows: “Reasoning, which has a long tradition that springs from
philosophy and logic, places emphasis on the process of drawing inferences (conclusions)
from some initial information (premises). In standard logic, an inference is deductive if the
truth of the premises guarantees the truth of the conclusion by virtue of the argument form.
If the truth of the premises renders the truth of the conclusion more credible but does not
bestow certainty, the inference is called inductive.”([19] p. 2) Even though reasoning can
be performed to find new insights into the domain the expert system is working in, it is not
performed by the algorithm used in this thesis.
In general, besides reasoning which builds on existing knowledge and rules, there are two
ways of learning rules:
Supervised learning is done by comparing the results of an algorithm with the known
correct answers. There is a so-called training set which includes input and desired
output. The algorithm shall learn the rule from the input that leads to the correct
output to give hints to yet unknown knowledge. The results from the training set are
then tested on the test set to evaluate how well they do with other data. This is often
done to see if the result was specialized too much to the training set, which
is also called over-fitting.[9]
Unsupervised learning is done without knowing beforehand what the correct outcome
should be. The algorithm is given minimal information for learning new knowledge
from the input. Examples are learning a given number of groups (clusters) of pieces
of information that belong together by some relation or, given a few known instances,
finding similar pieces of information in the input.[9]
This thesis uses a supervised learning algorithm which learns rules that correspond to a
known outcome from the recorded input. Unlike LCS, it does not strive to find a rule set
explaining the domain but rather a single rule explaining the cause of a certain event type.
Reasoning cannot always help in these situations and therefore this thesis builds upon Genetic
Programming as a tool for efficiently finding a near-optimal rule.
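To make the supervised setting concrete: a candidate rule can be scored by replaying the recorded event stream and comparing the situations in which the rule fires with the known occurrences of the target event type. The sketch below only illustrates this idea of counting true and false positives; the actual fitness function of CepGP is defined in chapter 5, and the helper name confusion_counts is hypothetical.

    def confusion_counts(rule_fired, target_occurred):
        """Compare, per evaluated situation, whether the candidate rule fired
        with whether the target event actually occurred (the supervised labels)."""
        tp = fp = fn = tn = 0
        for fired, occurred in zip(rule_fired, target_occurred):
            if fired and occurred:
                tp += 1          # rule correctly predicts the target event
            elif fired and not occurred:
                fp += 1          # rule fires although no target event occurred
            elif not fired and occurred:
                fn += 1          # rule misses an actual target event
            else:
                tn += 1
        return tp, fp, fn, tn

    # Example: precision and recall as possible ingredients of a fitness measure
    tp, fp, fn, tn = confusion_counts([True, False, True, True], [True, False, False, True])
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # 2/3
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # 2/2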
3 Related Work
This work concentrates on applying Genetic Programming to Complex Event Processing to
derive meaningful rules for a given complex event. In this regard, this chapter creates the
links between the presented background fields and gives an overview of scientific works
closely related to the application pursued in this thesis. It starts with an introduction to
Evolutionary Computation applied in the field of Rule Learning, examines the application of
Genetic Programming to Rule Learning and afterwards looks into proposed Rule Learning
algorithms within the context of Complex Event Processing.
3.1 Evolutionary Computation in Rule Learning
As presented in section 2.3 on page 15, Rule Learning has a long history with varying
algorithms and techniques. Evolutionary Computation started by optimizing parameter sets
and models in engineering. But soon after, pioneers like David E. Goldberg started to apply
this concept to Rule Learning [15] with remarkable results. This led to more research into
the combination of Evolutionary Computation and Rule Learning in the following decades.
The following is a selection from the vast amount of research on combining Evolutionary
Computation and Rule Learning, with a focus on the developments over the years and on specific
aspects which this thesis uses or considers at some points.
Johnson and Feyock provide a general algorithm for the acquisition of expert system rule
bases using a genetic algorithm.[22] Their main task is to abstract the binary encoding of
genetic algorithms to incorporate rules in a rule grammar. Instead of using rules as inviolable
parts of the rule base, Johnson and Feyock proposed to alter rules in evolutionary operations
to construct new rules and build the rule base.
Grefenstette et al. [17] use a genetic algorithm to learn a tactical plan consisting of a set of
decision rules based on flight simulator data. Instead of the binary string representation of a
rule which Goldberg uses, they base their optimization on a high level representation of a rule
in the form of if-then constructs where the operators are also high level, in the form of
and c1 . . . cn. Additionally, they have different kinds of measurements with different representations
where one of them also uses a tree structure. However, the tree structure is not used to
represent the rule itself but to determine a subset of possible values that is valid for a rule.
Similar to this is the application of genetic algorithms to rule learning on unmanned aerial
vehicles in [30] by Marin et al. The rules shall guide the vehicle to follow enemy activities on
the terrain. It is not a route planning problem, but the rules are used so the vehicle can react
and change its course based on the current information of the enemy movement. It even
modifies the rules based on previous information. In that sense, they improved the previous
approaches with the adaptation of rules to recent information.
Huang uses a genetic algorithm in [20] to learn control actions for combustion control which
are rules that are applied to certain situations. In this work, the author attempts to only
learn prototype rules with the genetic algorithm and does not attempt to learn a rule for
every situation there can be. In order to apply control actions, a nearest neighbor approach
is applied to find the rule most appropriate to the current situation. The rule encoding
used is a classical binary string, as is most common in genetic algorithms, which encodes the
control action and a percentage increment indicating how strongly the action is to be performed in
the context of the example used. The advantages of this approach are that
prototype rules can be optimized through the genetic algorithm without an expert and that
it can deal with noise in the data.
A well-known application of genetic algorithms to rule learning is GARP (Genetic Algorithm
for Rule-set Production)[48] which applies it to the problem of species distribution modelling
while taking various information about other species, climate, geographical profiles and so
on into consideration. GARP uses a type-safe approach and allows crossover only on values
or ranges of values of variables. Mutation is used to introduce new values to variables in the
population.
The implementation of Janikow in [21] proposes a task-specific genetic algorithm for supervised learning. It uses a grammar representation of rules and specialised operators to alter
rules or rule sets, which, until then, was still rarely done. His research also considered shifting from transferring the problem into the domain of the genetic algorithm – usually binary
strings – to adapting the evolutionary operators to fit the problem domain. This allows for a
faster convergence and better results but also needs a great deal of conceptual work to use
task- or domain-specific knowledge within the evolutionary operators. Janikow also considered
invalid or empty rules which are rules that do not produce positive outcomes at all. Spears
and De Jong [47] proposed to keep both for the sake of future fitter rules since even the
empty or invalid rules inherited information of the best individuals. Janikow, however, argues
that there is a conflict between retaining these rules for future possibly better outcomes and
the predictive accuracy of the system. Therefore, Janikow decides to remove these rules
from the solutions.
3.2 Genetic Programming in Rule Learning
So far, Evolutionary Computation in Rule Learning always involved genetic algorithms. That
is because a lot of the presented work was done before Genetic Programming was proposed.
Since then, some researchers took on the challenge of applying Genetic Programming to
Rule Learning.
The presented ways of applying Genetic Programming to Rule Learning have found a number
of applications in varying fields. Most of the time, finding rules for given data sets was not
the only goal, which is why this section briefly describes works that extended Genetic
Programming to do more.
Bojarczuk et al. use Genetic Programming to discover rules in the medical domain for diagnosing pathologies.[3] Besides finding accurate rules, their focus is on finding comprehensible
knowledge while applying data mining algorithms. Bojarczuk et al. preferred Genetic Programming over genetic algorithms because it allows "a more open-ended, autonomous search
for logical combinations of predicting attribute values" and used that advantage to form
tree-structured individuals symbolizing first-order logic. Despite a small data set, they achieved
promising results with their approach.
Tay and Ho propose a genetic programming algorithm to solve the flexible job-shop problem
with multiple objectives.[49] The flexible job-shop problem is a more complex variant of the
well-known job scheduling problem (JSP), an NP-hard problem. In their work, the authors
describe three simultaneous optimization goals for the makespan, the mean tardiness and
the mean flow time, each with a different fitness function. They combine the fitness
values by calculating the average of all of them.
Genetic Programming is often used in the stock market domain to find trading rules.[40, 7,
36, 38, 52, 37, 5, 28, 56, 25] With the later works, the proposed algorithms achieved an
overall better result than traditional algorithms and eventually, the “statistical results confirm
that the GP based trading rules generate a positive return for the trader under all market
conditions (whether rising or falling).”[28] The rule representations in these works include
relational and logical operators where types are used to validate the trees.[56] Automatically
Defined Functions (ADFs) are used to distinguish between operator types and to simplify the
rules.[56, 52] In [56], Yu et al. allowed crossover only between modules of the same kind
so that only valid individuals occur in the populations. They also introduced length windows to
compute aggregate functions like average, minimum or maximum.
In [11], Freitas developed a Genetic Programming framework as a data mining system and
chose SQL queries as the rule representation for evaluation purposes, whereas a tree structure is chosen to represent the query in the Genetic Programming algorithm. He adapted the
algorithm so that the goal-attribute for the optimization is user-defined and fixed during the
process for the classification problem.
De Falco et al. propose a similar Genetic Programming framework, which is concerned with
automatic class discovery in comprehensible rules.[6] They use a tree structure to
represent the rules and combine logical with relational operators. During the construction,
restrictions are imposed on the operators. Logical operators are allowed until a relational
operator is built. The relational operator's first operand is always an attribute, whereas the
second operand can be either another attribute or a constant with equal probability. During
the evolutionary process, a depth limit is enforced, where offspring exceeding this limit are
replaced by one of the parents. When mutating a rule, their algorithm uses a point mutation
for leaf nodes and a tree mutation for intermediate nodes in the tree. Here, the depth limit
is also enforced: when the new rule exceeds the limit, the original subtree is kept. De Falco et al. determine the fitness by combining the accuracy of the result with the simplicity of the rule.
It seems that, after research on applying Evolutionary Computation to Rule Learning led to promising results, researchers used Genetic Programming to gain more influence on the rule structure during the evolutionary process. Combining the robust search for accurate rules with the objective of producing the simplest possible versions of them was therefore the next logical step. It is a necessity because the rules mostly need to be understood by human users and should therefore be as intuitive as possible.
3.3 Optimization and Rule Learning in Complex Event
Processing
Optimization in the context of Complex Event Processing is most often used to improve query execution. Recently, however, more and more research is being done on automatically deriving CEP rules.
3.3.1 Improving CEP performance
Ding et al. propose in [8] an optimization for Complex Event Processing over large volumes of business transaction streams which evaluates whether a rule is likely to fire and stops the execution if that is not the case. Gao et al. address in [14] the use of Complex Event Services, which encapsulate single CEP instances, and introduce quality-of-service awareness with a genetic algorithm. In [24], Liu et al. use a B+-tree-based approach to optimize the processing performance of RFID events with the time window constraint and by pruning intermediate query results. Rabinovich et al. optimize in [42] the processing of events with rewriting techniques for complex pattern types. Zhang et al. [57] also proposed strategies for query execution to optimize performance and resource consumption. This is only a brief selection of works that use optimization to improve the overall performance of event stream processing as done in CEP. In their book ([2] p. 38ff.), Atzmüller et al. describe some other works that use optimization techniques like Bayesian networks to predict event streams, Artificial Neural Networks to detect network breaches via two expert systems, and other optimization techniques to analyze the event stream. However, none of them is related to Rule Learning.
3.3.2 Learning CEP rules
There are only a few works concerned with learning rules in CEP. With their proposed framework Fossa, Frömmgen et al. [13] apply Genetic Programming to learn Event Condition
Action (ECA) rules for a given utility function in an adaptive distributed setting. The utility function quantifies developer-defined performance metrics, such as average throughput,
maximum latency and so on. The learned rule expresses how the distributed systems need
to adapt depending on the situation to reach the goal defined by the utility function. This
work uses event processing at the core to process events from different systems, but does
not rely on Complex Event Processing specifically.
Timeweaver, proposed by Weiss and Hirsh in [55], predicts rare events from event sequences with categorical (non-numerical) features. The authors use a genetic algorithm to find rules which are represented by a self-defined grammar. Like this thesis, timeweaver aims to find a rule for a given event based on recorded data. In contrast to this thesis, however, Weiss and Hirsh use a genetic algorithm and a grammar representation. Although timeweaver also optimizes the window for events, it does not use the operators of Complex Event Processing but self-defined ones with concepts not found in general Complex Event Processing languages. Additionally, the authors do not consider multiple attributes per event, nor attributes that are numerical instead of categorical.
Turchin et al. propose in [50] a tuning of rule parameters where the domain expert writes a general rule and their algorithm chooses suitable parameter values. This differs from the approach of this thesis, since the goal here is to derive the whole rule for a given event type in a recorded stream.
Sen et al. propose in [45] “a recommendation based pattern generation”. In their proposal they use existing rules and domain expert input to derive further rules for recommendation. The difference to this thesis is that they rely on user input for a part of the condition and derive possibly interesting rules from this condition part for the user, whereas this thesis strives to find all parts of a rule leading to a specific event on its own.
In [35], Mutschler and Philippsen present their approach to automated CEP Rule Learning with a noise Hidden Markov Model (nHMM). Like this thesis, their approach focuses on a single complex event for which the algorithm shall find an appropriate rule based on historically recorded streams. Unlike this thesis, they use a different base algorithm with the nHMM.
Mehdiyev et al. propose in [31] “a machine learning model to replace the manual identification of rule patterns.” They use a preprocessing stage for feature selection and construction and afterwards apply “various rule-based machine learning algorithms to detect complex events”. They also see a lack of research in automatically detecting CEP rules in event streams and analyze the suitability of different rule-based classifiers for this problem: One-R, RIPPER, PART, DTNB, Ridor and NNGE. All classifiers except One-R exceeded an accuracy of 90% in their tests, while One-R still reached about 80% accuracy. This shows that the application of rule-based classifiers to Complex Event Processing is promising and worth further investigation and research. The proposed algorithms classify all events in the streams into classes. This thesis, however, presents a Genetic Programming algorithm to find a possible cause for a given event in the event stream.
Mousheimish et al. present in [33, 34] autoCEP, “a data mining-based approach that automatically learns predictive CEP rules from historical traces”. The ultimate goal of their work is to shift the focus of CEP from detection to prediction of upcoming situations with as little human interaction as possible. Instead of complete rules, their goal is to learn so-called shapelets, which are patterns of minimum possible length that can classify the data distinctively. After their algorithm has obtained these shapelets, they are transformed into CEP rules in a second stage (see figure 3.1). Mousheimish et al. also introduce time series pattern mining techniques. Until now, their implementation of autoCEP is limited to one attribute per event, but the authors are working on further extending their implementation. For their shapelet learning, which later results in CEP rules, they currently use a brute-force shapelet extraction algorithm ([34]) and no optimizing algorithm. Unlike this thesis, Mousheimish et al. strive to learn a rule set for all kinds of classes to predict their future presence.
Figure 3.1: Run-Time prediction with autoCEP from [33]
Margara et al. define their proposal iCEP as “a novel framework that learns from historical traces, the hidden causality between the received events and the situations to detect, and uses them to automatically generate CEP rules”.[29] The architecture as depicted in figure 3.2 on the following page consists of several subsystems, each learning different parts of a CEP rule. Their approach aims at learning one rule at a time. They use positive and negative traces during the evaluation to signal when the specific complex event occurred in the historical traces and when it did not. Their goal then is to find a rule which results in these exact same traces. The algorithm works as follows:
1. Starting with the positive traces, the event/attribute learner identifies the relevant event types and attributes.
2. The window learner finds the minimal window that includes all relevant events. The
results of 1. and 2. are sent back and forth until no further improvement could be
made.
3. The result of 1. and 2. is handed over to the constraint learner that selects the
concrete events according to their attribute values.
4. Concurrently with 3., the aggregate learner finds out whether there is an aggregate constraint and how it needs to be specified according to the information from 1. and 2.
5. After 3. and 4., the parameter learner finds parameters which bind the values of
attributes from the identified events together.
6. Concurrently, the sequence learner discovers ordering constraints within the events.
7. Finally, the negation learner uses the negative traces to find negation constraints, i.e. which events must not occur.
As can be seen in figure 3.2, the algorithm is executed in a pipeline where some parts can
run concurrently whereas other parts depend on the termination of previously executed parts
in the pipeline.
Figure 3.2: Architecture of iCEP from [29]
The algorithm also includes all general language concepts of Complex Event Processing. To obtain the most general rule, the proposed algorithm intersects every correct rule it could find and keeps only the common properties of the rules. In their work, Margara et al. propose a machine learning algorithm for future research. This thesis focuses on a Genetic Programming algorithm as such a machine learning algorithm to find the most appropriate rule for a given event in a given stream, just as they proposed for future work. Naturally, this algorithm works differently from iCEP but seeks to achieve the same goals.
4 General Approach
In Complex Event Processing (CEP), the rules are usually written by domain experts with knowledge about the field in which the Complex Event Processing system is operating. Although there are proposals for rule-based expert systems such as CEP to learn the rules, and thus the domain knowledge, on their own or at least with minimal aid of a domain expert, there are sometimes also happenings, meaning events in a broader sense such as failures, errors or unusual behaviours, which are rare and for which neither the domain expert nor a self-learning system knows the cause.
The purpose of this thesis is to provide an algorithm that derives a rule out of all the data recorded during the time of these rare happenings, which may give hints to the cause of these interesting but hard to analyze happenings. However, it is important to note that the conditions described by the derived rule may precede the happening, but that does not mean that the happening truly has its origin in them. It is entirely possible that there are completely different reasons for it to happen. Nevertheless, the rule may still provide hints for the cause and in that sense help the domain expert find what she is looking for.
This chapter will first provide an example of the idea pursued in this work. Afterwards, it
explains how Genetic Programming is applied to Rule Learning and concludes by presenting the foundations of the evolutionary operations in Genetic Programming which are later
adapted to be used for CEP rules.
4.1 The scenario
Assume CEP is installed and operating in the environment of interest where the happening takes place, and a domain expert is watching over the results of the CEP system and the environment. Occasionally, something happens that piques the interest of the domain expert. She can point out when it happened but does not know why. However, the recordings of the events in that environment may uncover what led to this happening.
The recordings of the events are gathered in one log where each event consists, as is typical for CEP, of
• a timestamp of its occurrence
• an event type
• optional attributes as a key-value pair where the attribute name is the key mapping to
its respective value
Every event of the same event type has the same number of attributes with the same names and types.
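To make this log format concrete, the following minimal sketch shows one possible in-memory representation of such an event record; the class and field names (Event, timestamp, event_type, attributes) and the example values are illustrative and not prescribed by this work.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Event:
    """One entry of the recorded log: occurrence time, event type name and
    optional numerical attributes keyed by their names."""
    timestamp: float                                   # time of occurrence
    event_type: str                                    # e.g. "SmokeAlarm"
    attributes: Dict[str, float] = field(default_factory=dict)

# A tiny example log; the happening is marked by a unique event type whose
# name alone is sufficient, placed at the right position in the stream.
log: List[Event] = [
    Event(0.0, "TemperatureReading", {"roomId": 1, "value": 72.5}),
    Event(1.5, "SmokeAlarm",         {"roomId": 1, "value": 1.0}),
    Event(1.6, "HappeningEvent"),                      # no attributes needed
]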
For the proposed algorithm, it does not matter whether the happening can be related to a specific event type or not, as long as its occurrences can be injected into the log as accurately as possible. Only a unique event type in the record is necessary, and each time the happening takes place, an event of this type is placed into the record. No timestamp, no attributes, no further information is needed for these specific events, whether they are inserted manually or captured automatically.
After the domain expert has obtained the enriched log, she passes it to the algorithm. CepGP will use Genetic Programming to search for a rule within the events which most accurately describes a pattern that leads to the happening marked by the special event type. Since the purpose of the algorithm is to provide hints to the cause of the happening based on the events, it should not only produce accurate but also simple and easily understandable rules. It should also adopt all of the concepts from CEP related to event types and their attributes, windows, and other constraints like ordering or combinations according to the operators presented in 2.2 on page 11. Another goal of the algorithm is to produce rules with the minimal window, to provide the domain expert with the most narrowed-down area in the events, which enables her to better analyze the results and the event records and possibly grasp the cause of the happening. Figure 4.1 on the next page illustrates the scenario again.
The only requirement of the algorithm shall be a record (or log) of the events that occurred within the environment, including the occurrences of the happening marked by a unique event type. The objective of the algorithm is foremost a most accurate and appropriate rule: it should not only pinpoint the necessary conditions for the happening with all the means of general CEP languages, but it shall also provide a simple output which is easily interpretable by the domain expert and which minimizes the regions within the log that are of interest for understanding the cause of the happening. At best, the algorithm shall also show how far each of these goals could be achieved.
As presented in the related works in chapter 3 on page 18, there are already some approaches to learning rules in Complex Event Processing. However, most of them try to classify all events or seek to improve existing template rules. The approach of this thesis strives to help identify the origin of a given special and normally rare event in the stream of Complex Event Processing with no other information than the name of the special event and a record of the stream around the times the special events occur. The information that needs to be processed to obtain such a rule, and the possible rules in general, constitute a large problem space that cannot be searched via brute force over all possible solutions. Therefore, the proposed algorithm, CepGP, uses Genetic Programming to search for the optimal rule leading to the marked special event from the information in the recorded stream.
Figure 4.1: Scenario of this Thesis; Sensors emit Events with the values they detected during
their observation of the environment to the CEP-engine; The CEP-engine writes
these Events with the needed information into a Log; If the Happening could not
be observed by the sensors but by the domain expert alone, then the Log needs
to be enriched with the events representing the Happening via a unique event
type (here HappeningEvent) at the respective positions in the captured stream,
the unique name is sufficient; The Log containing the HappeningEvents is now
used to start CepGP to find a rule which might give hints to the cause of the
Happening from the recorded event stream (here: the found rule expresses the
occurrence of an alarm for very high temperature in a room and within 2min
after that a following smoke alarm of a sensor in the same room, therefore, the
Happening might be a fire)
The next section describes in general how Genetic Programming can be applied to Rule Learning; afterwards, this chapter presents the constraints and evolutionary operations of Genetic Programming.
4.2 Applying Genetic Programming to Rule Learning
As described in 2.3 on page 15, Rule Learning is at its core a classification problem, and Espejo, Ventura, and Herrera give an overview of the field of Genetic Programming in classification problems in [10].
Figure 4.2: Applications of GP in classification tasks (from [10])
They summarize the applications of Genetic Programming in this field of research in figure 4.2
with the three groups of
• Preprocessing, which concerns itself with transforming the original data to enhance its utility for the Genetic Programming algorithm. Feature selection obtains the relevant attributes and optionally weighs them according to their importance. Feature construction creates new predicting attributes as combinations of existing ones.
• Model extraction, which is the actual classifier induction where the most suitable classifier for a given outcome is searched for. The model is the representation of the classifier and ranges from decision trees to classification rules, discriminant functions and more.
• Ensemble classification which is used to find a group of classifiers to deal “with different
patterns or aspects of a pattern embedded in the whole range of data, and then
through ensembling, these different patterns or aspects are incorporated into a final
prediction.”[10]
Out of these groups, this thesis is most concerned with model extraction, which is depicted in figure 4.3 on the next page. As mentioned before, there is a variety of models to choose from to represent the structure of the classifiers; each of the models shown in the figure is more suitable for certain classification goals. This thesis aims to derive rules in the context of Complex Event Processing and therefore chooses to further investigate rule classification.[10]
Figure 4.3: Model extraction with GP according to [10]
In rule classification, the algorithms are generally distinguished by whether they are more suitable for separating two classes or more than two. Binary classification is done by encoding a rule as an individual of a population. One rule is sufficient to decide both classes: the data that fulfills the rule forms one class, and the other class consists of the data failing the condition of the rule. The best individual is the final result of the Genetic Programming algorithm.[10] Binary classification is the underlying concept of CepGP, the algorithm proposed by this thesis.
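As a small illustration of this binary view, the following sketch splits a set of samples into the two classes induced by a single rule condition; the function and parameter names are hypothetical.

from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")

def classify(samples: Iterable[T],
             condition: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Binary classification with one rule: samples fulfilling the condition
    form one class, all remaining samples form the other."""
    positives, negatives = [], []
    for sample in samples:
        (positives if condition(sample) else negatives).append(sample)
    return positives, negatives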
There are two properties a Genetic Programming algorithm has to have to work properly:
Sufficiency and Closure. Sufficiency is the property of having all means to fully represent all
possible solutions to the problem via the given functions and terminals. Closure is the property
of having only functions that are able to process all possible inputs they might receive.[10]
This property consists of two sub-properties: type consistency and evaluation safety.([39] p.
21) Since Genetic Programming is based on the evolutionary operations, like crossover which
combines arbitrary parts of the individuals to form new solutions, type consistency ensures
that the new solution can be evaluated. The operators within the solution (or individual)
need to be able to process the result of their operands, whatever it is.
Closure can be hard to fulfill when there are different types and operators that require specific subsets of all types as their input and otherwise will not work properly. In these cases, one solution is the use of Strongly-typed Genetic Programming (STGP).([12] p. 146ff.)[32]([23] p. 479ff.) Here, the population initialization and the evolutionary operations like crossover and mutation are altered in such a way that they only produce valid rules, in the sense that these constraints for the operators are taken care of. Another possible solution is the so-called Booleanization [44, 12], where terminals are only allowed to return Boolean values and functions only process and return Boolean values. A third approach to the closure problem is using grammars to describe valid rules and enforce them. If constraints are violated, this can also be remedied via the fitness function, which maps a low fitness value to invalid individuals. There is also the option of repairing the individual by removing the invalid parts and optionally replacing them with newly, randomly generated but valid parts.([12] p. 146)
Complex Event Processing rules have additional features as a result of their main goal to process stream data. These features, like unique operators, attributes, windows and functions, add more complexity and constraints to the rule representation and the evolutionary operations in the Genetic Programming algorithm, which need to use one of the proposed strategies to fulfill the closure property.
4.3 Constraints in Genetic Programming
In section 2.1 on page 4, this work laid out the background of Evolutionary Computation and briefly presented its main members. These members distinguish themselves mainly by their choice of problem representation, the genotype. The genotype defines how a solution to the problem is encoded in the Evolutionary Computation algorithm. The phenotype, on the other hand, is the representation of the solution in its own problem domain. In this thesis, the phenotype is a Complex Event Processing rule with all the concepts of the general language presented in section 2.2 on page 11. The genotype is this rule transformed into a tree structure that is used in the domain of the Genetic Programming algorithm. As mentioned before, CEP rules require constraints to handle their complexity and the interrelation of the rule components. There are mainly three approaches to introduce such constraints as domain knowledge into the genotype and thus into Genetic Programming: simple structure enforcement, strongly-typed GP and grammar-based constraints.([39] p. 51ff.)
Simple structure enforcement, as the name implies, already lays out a basic structure of the solution and allows the algorithm to evolve the components within that structure freely. This thesis uses this approach to ensure the basic structure of the rule and that certain functions and terminals are restricted to a specific component (separation of window and condition, for example). The initial population can be created with individuals that always follow these constraints, but crossover and mutation need to be adapted so that they do not mix these components either. Another way to achieve this is to evolve these components separately.([39] p. 52)
Strongly-typed GP is another way of introducing constraints when solutions to the problem already impose types in the phenotype. Terminals are typed, and functions have types for the parameters they accept as well as an output type. Every part of the evolutionary process needs to be adapted to this type system to ensure no violation, from the initial population creation to crossover and mutation.([39] p. 52f.) After each evolution of a solution, every function needs to have parameters of its expected types (see the sketch after this list). This thesis heavily uses this approach within the condition of a rule to best guide the search for the best rule and to focus on the actual problem space of all possible rules instead of allowing invalid rules which cannot be evaluated.
Grammar-based constraints are mainly used in the form of rewrite and production rules.
The initial population is created in a way that each individual can be produced using the
given grammar. Crossover and mutation need to consider the grammar to also follow
the imposed constraints. The subtrees under the specific variable of the grammar
can only be substituted during crossover or mutation by another subtree of the same
variable. Another way is to use Grammatical Evolution, where numbers decide which production of a variable is chosen for this specific individual.([39] p. 53ff.) Both ways are alternatives to the approach proposed in this thesis and could be investigated in future work.
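The following minimal sketch illustrates the strongly-typed idea referenced above: every node carries a return type and the types it accepts for its children, and replacements are only drawn from candidates of the required type. The class names and the type label "event_condition" are assumptions for illustration only, not the exact implementation of CepGP.

import random
from typing import List

class Node:
    """Every node advertises the type it returns and the types it accepts for
    its children, so initialization, crossover and mutation can check them."""
    return_type = "unspecified"
    child_types: List[str] = []

class EventTypeLeaf(Node):                 # terminal, e.g. the event type "A"
    return_type = "event_condition"
    def __init__(self, name: str):
        self.name = name

class SequenceOperator(Node):              # function, e.g. the CEP sequence operator
    return_type = "event_condition"
    child_types = ["event_condition", "event_condition"]
    def __init__(self, left: Node, right: Node):
        self.children = [left, right]

def typed_choice(candidates: List[Node], required_type: str) -> Node:
    """Core check of strongly-typed GP: only nodes whose return type matches
    the slot being filled are eligible as a replacement."""
    eligible = [c for c in candidates if c.return_type == required_type]
    return random.choice(eligible)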
4.4 Evolutionary Operations
The representation of a solution in the domain of the Genetic Programming algorithm, also known as the genotype, is a tree. Therefore, the evolutionary operations crossover and mutation have to be able to transform trees in a way that fulfills their purpose. This section starts by presenting the selection algorithm used to choose the individuals that perform crossover, proceeds by explaining the crossover operation and concludes with the mutation operation.
4.4.1 Selection
Although there are two selection stages in Evolutionary Computation, mating selection and environmental selection (see 2.1 on page 4), in Genetic Programming there is only the mating selection. The purpose of the environmental selection, combining the new and the previous generation into the next one, is fulfilled by a crossover rate, which allows some individuals to survive unchanged into the next generation if they are not crossed at that time, and by elitism, which lets the absolute best individuals pass from one generation to the next.
In GP, as in every other Evolutionary Computation algorithm, choosing the individuals which are allowed to contribute to the next generation is probabilistic and based on fitness. The fitter an individual, the higher its chances to be selected and perform crossover. CepGP employs the widely used tournament selection. Tournament selection holds a tournament between a defined number of randomly selected individuals of the population (the tournament size). The fittest of these individuals wins the tournament and is granted the privilege of reproduction. Each tournament produces one selected individual, so crossover always needs two tournaments to be held to decide the parents.
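A minimal sketch of tournament selection as just described follows; the tournament size of 4 in the usage comment is an arbitrary example value.

import random
from typing import Callable, Sequence, TypeVar

I = TypeVar("I")   # an individual, e.g. a rule tree

def tournament_select(population: Sequence[I],
                      fitness: Callable[[I], float],
                      tournament_size: int) -> I:
    """Draw `tournament_size` individuals at random and return the fittest one
    (higher fitness is assumed to be better)."""
    competitors = random.sample(population, tournament_size)
    return max(competitors, key=fitness)

# Crossover needs two parents, so two tournaments are held:
# parent_a = tournament_select(population, fitness, tournament_size=4)
# parent_b = tournament_select(population, fitness, tournament_size=4)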
Goldberg describes additional selection methods in [16] (p. 121f.): roulette wheel selection (or stochastic sampling), deterministic sampling, the expected value model and more, which can also be applied in Genetic Programming.
4.4.2 Crossover
The crossover operation in Evolutionary Computation combines multiple individuals of a population into a new individual. The crossover operation in Genetic Programming differs a lot from its biological prototype because in Genetic Programming the parts each individual
exchanges can end up in very different places than they were in the original individual. However, preserving the place of the exchanged information can be achieved by homologous crossover, namely the one-point crossover that chooses a common node of both parent individuals and preserves the original places of the exchanged subtrees.([39] p. 44) Uniform crossover works by going through the common region of the parent individuals and randomly choosing at each node whether this node is taken from one or the other parent. This allows a better mix at nodes closer to the root.([39] p. 44f.) Poli et al. present more crossover strategies in [39] (p. 45f.). None of them is used in CepGP, but they may be further investigated in future work.
CepGP uses the most popular crossover strategy: subtree crossover, as shown in figure 4.4 on the next page. As with every crossover strategy, subtree crossover needs two parent individuals which are chosen by a selection algorithm. Within each parent, a random node is selected to be the crossover point. The offspring of the crossover operation is constructed by replacing the subtree under the selected node of the first parent with the subtree under the selected node of the second parent. To enable both parents to produce more than one offspring in their original form, crossover operations are performed on copies of the parents. It is also possible to produce two offspring out of one crossover operation by also replacing the selected subtree of the second parent with the selected subtree of the first one.([39] p. 29f.)
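The following sketch shows subtree crossover on a generic tree, operating on copies of the parents as described above; the TreeNode class is an illustrative stand-in for the rule trees used later.

import copy
import random
from typing import List, Optional

class TreeNode:
    """Minimal GP tree node: a label (operator or terminal) and child nodes."""
    def __init__(self, label: str, children: Optional[List["TreeNode"]] = None):
        self.label = label
        self.children = children or []

def all_nodes(root: TreeNode) -> List[TreeNode]:
    """Collect every node of the tree, root included."""
    nodes = [root]
    for child in root.children:
        nodes.extend(all_nodes(child))
    return nodes

def subtree_crossover(parent_a: TreeNode, parent_b: TreeNode) -> TreeNode:
    """Copy the parents, pick a random crossover point in each copy and graft
    the subtree taken from the second parent into the first one."""
    offspring = copy.deepcopy(parent_a)
    donor = copy.deepcopy(parent_b)
    target = random.choice(all_nodes(offspring))
    source = random.choice(all_nodes(donor))
    # Overwrite the target node in place with the donated subtree.
    target.label, target.children = source.label, source.children
    return offspring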
4.4.3 Mutation
Mutation is a small random change in the genotype of an individual and is used to introduce new information into a population. Poli et al. present the most popular mutation strategies in [39] (p. 42ff.). The most common mutation strategy in Genetic Programming is subtree mutation, which replaces a random subtree of the individual with a new, randomly created tree. It is easy to implement because it uses the same mechanism as subtree crossover, except that the subtree from a second individual is replaced by the new, randomly created tree. However, this mutation strategy potentially has a big impact on the individual. That is why this thesis uses another approach, called node replacement mutation or point mutation, illustrated in figure 4.5 on the facing page.
Usually, mutation is applied probabilistically to every individual in the population at each generation. For each individual there is a chance it undergoes mutation. If it does, a random node of the individual is chosen as the mutation point. Subtree mutation would replace the subtree under the chosen node with a completely new subtree. Point mutation, on the other hand, only creates a new node that is able to replace the selected node while preserving the subtree under the replaced node. This is a much smaller alteration of the original individual and is closer to the original idea of mutation as a minor change (see 2.1 on page 4). Other interesting mutation strategies are
Figure 4.4: Subtree Crossover; The selected node in the first parent is the multiplication
operator in the right subtree; The selected node in the second parent is the
subtraction operator in the left subtree; during crossover the subtree under the
subtraction operator (highlighted in blue) replaces the subtree under the multiplication operator (highlighted in yellow)
Figure 4.5: Point Mutation; the randomly selected multiplication operator (highlighted in
yellow) in the right subtree is replaced by a randomly created division operator
(highlighted in blue) while preserving the original operands (3 and 4)
• Hoist mutation, which creates a new offspring as a random subtree of the original individual, resulting in a shorter solution. This can counteract the bloating of solutions (the tendency of solutions to grow larger with ongoing evolution).([39] p. 43)
• Shrink mutation, a special subtree mutation where the replacing subtree consists of just a terminal, to shorten the solution and to counteract bloat.([39] p. 43)
Neither of these is used in CepGP, but they may be of interest in future work. It is possible and often beneficial to have different mutation strategies within a single Genetic Programming algorithm, but it is desirable to apply only one at a time.([39] p. 42)
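A minimal sketch of point mutation follows, reusing the TreeNode and all_nodes helpers from the crossover sketch above; new_label_for is a hypothetical callback that must return a label of a compatible node kind so that the preserved children stay valid.

import random

def point_mutation(root, new_label_for, mutation_rate: float):
    """With probability `mutation_rate`, pick one random node of the tree and
    replace only its label by a compatible one; the subtree below the node
    (its children) is preserved, unlike in subtree mutation."""
    if random.random() >= mutation_rate:
        return root                            # this individual is not mutated
    victim = random.choice(all_nodes(root))
    victim.label = new_label_for(victim)       # e.g. swap '*' for '/' as in figure 4.5
    return root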
4.5 Summary
When installing and running a Complex Event Processing system, domain experts are sometimes confronted with strange or unusual measurements of the sensors that provide the CEP system with primitive events. Domain experts could also occasionally discover happenings outside of the measurements but within the same monitored environment. CepGP is an algorithm that supports domain experts in their search for the cause of these happenings. All it needs is a record of the captured events of the environment that also contains the occurrences of the happenings. These occurrences are marked by a unique event type that does not need any more information than the event type name itself. In the best case, sensors already captured these happenings; otherwise, the domain expert needs to manually insert the happening events at the right places to the best of her knowledge.
CepGP is a type-safe and structure-enforcing Genetic Programming algorithm that uses the information from the record (or log) of the event stream, including the happening events, to derive the most appropriate rule that implies the happening. The condition of the rule may precede the happening, but that does not mean that the happening only ever occurs when these conditions are met. However, the goal of this rule is to provide the domain expert with hints leading to the cause of the happening.
This chapter provided the scenario of this thesis, basic information about Genetic Programming in the field of Rule Learning and about the application of evolutionary operations on trees. This valuable information will help to follow the CepGP algorithm in the next chapter.
5 CepGP – The Genetic Programming
Algorithm
After exploring the principles of Complex Event Processing (CEP) and Evolutionary Computation, it is time to combine both worlds to achieve the goal of automated rule discovery for
a given event.
Figure 5.1: General process of CepGP
The overall process of this work, called CepGP, can be seen in figure 5.1. It is based on historically recorded temporal event stream data. This file needs to contain the complex events of the type for which CepGP shall find the most appropriate rule. If these events have not been recorded automatically, it is possible to include them manually in the data. Only a unique name for the event type is necessary; no time or attributes are considered for this specific event type. The more accurate the positions of these complex events in the stream, the better the result of the algorithm. CepGP reads this file of events and begins the preparation phase, which pre-processes the data and extracts valuable and needed general information about the event types, their attributes, value ranges and so on for the actual search process, which begins by building the initial population. This step takes the general information into account to build individual rules as trees with conditions, a window and an action. The result is the first generation of the evolutionary process, during which only valid individuals are created and evaluated.
Each individual is graded according to the blended fitness of three optimization objectives:
Condition fitness quantifies the quality of the condition part of the rule. The more often it fires at the wanted places, the better.
Window fitness rates the size of the window independently of the window type. The smaller the window size, the more efficient the rule and the better the window.
Complexity fitness grades the structure of the rule. The simpler the rule, the better. Since the only structurally dynamic part of the rule is the condition, this objective effectively grades the structure of the condition subtree.
The blended total fitness uses weights to largely prioritize the condition fitness over the other two. The complexity fitness has only a very minor impact on the overall fitness, while the window fitness is more important. These objectives are not disjoint: the evaluation of the condition also depends on the events it receives through the window to decide whether to fire or not. But if the condition fitness is almost the same, the rule with the fitter window is preferred. If even then the total fitness is basically identical, the simpler rule is fitter and more preferable.
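A sketch of how such a blended fitness could be computed follows; the concrete weights are not specified at this point of the thesis, so the values below are purely illustrative and only mirror the stated priority of condition over window over complexity.

def blended_fitness(condition_fitness: float,
                    window_fitness: float,
                    complexity_fitness: float,
                    weights=(0.8, 0.15, 0.05)) -> float:
    """Blend the three objectives into one scalar; the default weights are
    illustrative example values, not those used by CepGP."""
    w_condition, w_window, w_complexity = weights
    return (w_condition * condition_fitness
            + w_window * window_fitness
            + w_complexity * complexity_fitness)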
After each evolution, the new generation will be graded like this and each evolution consists
of three basic steps:
Selection with Elitism: An elitism rate determines the number of the fittest individuals
that survive the evolution step into the next generation. The other individuals are
generated via crossover during which the mating pairs are chosen via a selection algorithm.
Crossover: A crossover rate determines how often individuals mate. The first individual is
chosen through the selection algorithm. If it is destined to mate then another individual
is selected and an offspring is generated via the crossover method and inserted into
the next generation. If it is destined not to cross, no second individual is selected and
the first selected individual moves over to the next generation unchanged.
Mutation: After the next generation has been built via the previous steps, each individual undergoes a minimally invasive change with a probability determined by a mutation rate.
The evolution takes place for a given number of generations and the best individual of the last generation is the result of the search process.
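The following sketch outlines this generational loop with elitism, a crossover rate and mutation; the names, the example rates and the assumption that mutate applies the mutation rate internally are illustrative, not the exact CepGP implementation.

import random
from typing import Callable, List

def evolve(population: List["Rule"],
           fitness: Callable[["Rule"], float],
           crossover: Callable[["Rule", "Rule"], "Rule"],
           mutate: Callable[["Rule"], "Rule"],
           select: Callable[[List["Rule"]], "Rule"],
           generations: int,
           elitism_rate: float = 0.05,       # example value
           crossover_rate: float = 0.9) -> "Rule":
    """Generational loop: elitism, probabilistic crossover and mutation,
    repeated for a fixed number of generations."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite_count = max(1, int(elitism_rate * len(population)))
        next_generation = ranked[:elite_count]           # elitism
        while len(next_generation) < len(population):
            first = select(population)
            if random.random() < crossover_rate:         # destined to mate
                child = crossover(first, select(population))
            else:                                        # survives unchanged
                child = first
            next_generation.append(mutate(child))        # mutate applies its rate
        population = next_generation
    return max(population, key=fitness)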
Genetic Programming was originally conceived to optimize programs which are represented as trees of operators and operands that can be altered by evolutionary operations like crossover or mutation and evaluated according to their result to determine their fitness. CepGP uses this approach by applying the operator tree optimization concept to a tree representation of a CEP rule, as can be seen in figure 5.2 on the next page.
This chapter first presents the representation of the components of a rule, their individual
needs and requirements and which parts are to be optimized. Afterwards, it explains each
step of CepGP in more depth in the order of the process.
5.1 Rule Components
The composition of a rule in Complex Event Processing (CEP) was already discussed in
section 2.2 on page 11. Each component can be seen as a node within a tree where the rule
is the root. Direct children of the rule root node would be the condition, the window and
the action. The general tree representation of a CEP rule is shown in figure 5.2.
Figure 5.2: Rule components; components which are part of the optimization are highlighted
in red; affected components are colored in yellow; static parts are gray
The most important part of a rule is its condition on which it will execute the action. The
window defines how many and which recent events will be considered in the evaluation of
the condition. In the following sections, each component is discussed in the context of the Genetic Programming approach, describing how the component is encoded as a part of the individual that can be processed by the algorithm. The order in which the components are discussed follows the order of execution during the evaluation of a rule, which gives better insight into the reasons for the design decisions.
5.1.1 Window
The first step in rule evaluation is to determine the events which will be given to the condition, which in turn decides whether the rule fires or not. In section 2.2 on page 11, this work already presented ways for CEP to decide which and how many events are processed during rule evaluation. In this work, a window is a mandatory part of the rule. Windows consist of two features: type and value. The type of a window specifies how it determines which events are within its boundaries; the boundaries are given by the most recent event and the value. The boundary can either be given by a count of events from the most recent one backwards up to the value, e.g. a window of type length with the value specifying the number of the most recent events to take into account for rule evaluation, or it can be given by a time span from the most recent event backwards, e.g. a window of type time with the value specifying the maximum number of time unit steps between the most recent event and the older ones. Figure 5.3 summarizes these differences.
Figure 5.3: Windows have a type and a value. The interpretation of the value depends on
its type.
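A minimal sketch of this window representation follows, reusing the Event structure sketched in section 4.1; the method name and the simplified handling of the time unit are assumptions.

from dataclasses import dataclass
from typing import List

@dataclass
class Window:
    """A window has a type ('length' or 'time') and a value whose meaning
    depends on that type."""
    window_type: str     # 'length' or 'time'
    value: float         # number of events, or time span in a fixed time unit

    def events_in_scope(self, events: List["Event"]) -> List["Event"]:
        """Return the events the condition will see, counted or measured
        backwards from the most recent event."""
        if not events:
            return []
        if self.window_type == "length":
            return events[-int(self.value):]
        newest = events[-1].timestamp
        return [e for e in events if newest - e.timestamp <= self.value]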
As presented in figure 5.2 on the preceding page, the window is one of the nodes in the rule tree that is part of the optimization process. As such, it is necessary to examine ways for the Genetic Programming algorithm to combine windows in crossover or mutation. Windows are different from nodes of the condition and cannot be meaningfully combined with those nodes. Thus, the algorithm has to ensure that only windows are combined with other windows. This is called type-safety in Genetic Programming, as already explained in chapter 4 on page 25. To enable evolutionary operations that meaningfully alter the window of an individual in all cases except when the other window is identical, windows of different types should be combinable. More on how windows are processed during evolution can be found in section 5.4 on page 50.
5.1.2 Condition
Besides being the most important part of a rule, the condition is also the most complex one. In its simplest form, it checks whether the most recent event equals a certain event type and fires in that case. However, in most scenarios the rule will need to incorporate several event types and combinations of them. This can be expressed as an operator subtree under the rule whose root is the condition: the combinations of event types are the operators and the event types are the leaves. The formed event condition tree (ECT) determines whether the attributes of the event instances which contributed to a successful evaluation will be further examined to decide if the rule as a whole fires or not. This leads to another, very similar operator tree; instead of event types it combines attributes and therefore constitutes the attribute condition tree (ACT). Although it is often useful to inspect the attribute values of the events during the evaluation, in contrast to the ECT this subtree of the condition part of a rule is optional and may be omitted. The complete picture of the rule after extending the condition component with the subtrees for the event conditions and attribute conditions can be seen in figure 5.4. Both the ECT and the ACT are now discussed in more detail in the following sections.
Figure 5.4: Rule Components; components which are part of the optimization are highlighted
in red; affected components are colored in yellow; static parts are gray
Event Conditions
Event conditions represent the actual rule. After the window has determined the events under test, the rule hands these events over to the Event Condition Tree (ECT) to decide whether to fire or not. The ECT is the representation of the event conditions as a subtree within the tree representation of a rule. An example is displayed in figure 5.5, where the root of the ECT is the logical and-operator (∧) which has two operands as children in the tree, represented as an event of type A and an event of type B.
Figure 5.5: Example of a rule with an Event Condition Tree (ECT) with the logical and-operator (∧) as its root and the operands being the events with types A and B
This figure shows the overall composition of the ECT. There are two kinds of nodes: operators and event types. The root of the ECT can be either of those. Event types are always leaf nodes and vice versa, whereas operators are intermediate nodes which can be the root but never leaf nodes. While event types as leaf nodes never have children, operators need to be allowed to have other operators or event types as children to build more complex rules, as visualized in figure 5.6 on the next page.
The figure shows an example where the root is the logical and-operator and its children are the event type A and a sequence operator (→). Furthermore, the sequence operator also has two children: the event types B and C. This flexibility of the nodes in the ECT allows complex and simple event conditions alike to be easily represented as an ECT. The nodes and their children are very loosely coupled: operators accept any possible ECT node, i.e. event type or operator, as a child. This is a necessary property of the ECT to enable easy and consistent crossover and mutation, as this work will further discuss in section 5.4 on page 50.
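The loose coupling described above can be captured with two illustrative node classes, sketched below; the class and symbol names are assumptions, and the example reproduces the ECT of figure 5.6.

class ECTNode:
    """Base of the Event Condition Tree: operators are intermediate nodes,
    event types are the leaves."""

class EventType(ECTNode):                   # leaf, never has children
    def __init__(self, name: str):
        self.name = name

class Operator(ECTNode):                    # intermediate node, never a leaf
    def __init__(self, symbol: str, children):
        self.symbol = symbol                # e.g. '->' (sequence), 'and', 'or', 'not'
        self.children = list(children)      # any mix of operators and event types

# The ECT of figure 5.6: (A and (B -> C))
ect = Operator("and", [EventType("A"),
                       Operator("->", [EventType("B"), EventType("C")])])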
Attribute Conditions
First, the rule determines via the window which events participate in the evaluation and then
hands them over to the event conditions, represented as the ECT. If the ECT was successfully
Figure 5.6: Example of a more complex event condition represented as an ECT; the displayed
rule in in-order output: (A ∧ (B → C))
evaluated and if the rule also imposes attribute conditions on the events to decide whether to fire or not, then the rule afterwards passes the actual events that contributed to the successful evaluation of the ECT over to the attribute conditions to analyze their attribute instances and to decide whether they meet the specification of the attribute conditions. Thus, in a string representation of the rule, the event conditions and the attribute conditions are connected via a logical and-operator (∧). The attribute conditions are very similar in structure to the event conditions and can therefore also be represented as a subtree of the rule, called the Attribute Condition Tree (ACT). Since the attribute conditions examine the state of the event instances which led to the positive outcome of the ECT evaluation, there needs to be some kind of referencing between the event types in the ECT and the attributes in the ACT. This can be done with a naming scheme that uses the event type and an enumeration to build unambiguous references, such as A0 for the first appearance of an A and A1 for the second in figure 5.7. These unique identifiers can then be used within the ACT to access the correct attributes by dereferencing the identifiers to the event instance and reading the value of the named attribute. However, this building of references only needs to take place when there is an ACT in the rule.
Figure 5.7: Example of a rule with an ECT and an ACT (highlighted); the displayed rule in
in-order output: (A as A0 → A as A1) ∧ (A0.attribute = A1.attribute)
Although the operators that compare attributes differ from the operators in the ECT, the ACT still resembles the ECT in many ways. The root of the ACT can either be a logical operator or an attribute comparison operator. Operators cannot be leaves here either, and attributes are always leaf nodes. Another common and important property of both trees is the loose coupling of logical operator nodes and their children. In the ACT, as well as in the ECT, the logical operators are not concerned about whether their children are all logical operators, all comparison operators or a combination of both. Nevertheless, it is not meaningful to mix event conditions with attribute conditions in either tree. Therefore, the algorithm needs to take care of using only event types or event condition operators in the ECT and only attributes or attribute condition operators in the ACT. Another difference between ECT and ACT exists in the leaf nodes. Within the ECT, only event types are allowed to be leaves, whereas in the ACT there is more than one kind of attribute. In figure 5.7 on the facing page the ACT uses event attributes, which are the values of the named attributes of the referenced event instances. However, attributes can also be compared to constant values, which do not need to be referenced. From a validity point of view, there is no harm in comparing two constants. But from the perspective of meaningful conditions, this should be avoided since it does not add information to the rule and in the worst case even results in the whole rule not firing at all. The latter case might happen when, for example, two unequal constant values are compared using the equals operator. Hence, the algorithm restricts constant values to the second operand of any attribute condition operator; the first operand shall always be an event attribute.
As with ECTs, ACTs can also become more complex by adding logical operators like ∧, ∨
and ¬ to the numerical comparison operators. Figure 5.8 illustrates an example with such a
more complex ACT and constants as the second operands of the comparison operators.
Figure 5.8: Example of a rule with a more complex ACT containing logical and numerical
comparison operators; the attribute attribute1 of the event instance referenced
by the alias A0 has to be less than 5 or greater than 10
Although this extension adds more possibilities to represent a broader range of rules, it also adds constraints to the ACT that the algorithm needs to consider during evolution. Now there are three kinds of nodes within an ACT, all with different requirements concerning their combinations. What remains unchanged is that attributes, no matter whether constants or event attributes, are always the only valid leaf nodes. Numerical comparison operators can be root or intermediate nodes, but never leaf nodes, and to be meaningful they are only allowed to have attributes as children. The result of a comparison operator is Boolean (true or false); hence, the parent of such a comparison operator cannot be another comparison operator but only one of the logical operators (∧, ∨ and ¬). Logical operators, on the other hand, can only have comparison operators or further logical operators as children within the ACT, but never attributes. Table 5.1 summarizes the constraints the ACT nodes have to abide by.
attributes. Table 5.1 summarizes the constraints the ACT nodes have to abide to.
Node
Attributes (constants or event attributes)
Comparison operators (<, >, ≤, ≥, =, 6=)
Logical operators (∧, ∨ and ¬)
Can be ACT-root
×
D
D
Children
None
Attributes
Comparison or logical operators
Table 5.1: Summary of constraints to ACT nodes
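The constraints of table 5.1 can be checked per node as in the following sketch; encoding the operators as plain strings is purely illustrative, and arity is not checked here.

COMPARISON_OPS = {"<", ">", "<=", ">=", "==", "!="}
LOGICAL_OPS = {"and", "or", "not"}

def valid_act_node(symbol: str, child_symbols: list) -> bool:
    """Check one ACT node against the constraints of table 5.1; anything that
    is not an operator symbol is treated as an attribute."""
    operators = COMPARISON_OPS | LOGICAL_OPS
    if symbol not in operators:                      # attribute: always a leaf
        return len(child_symbols) == 0
    if symbol in COMPARISON_OPS:                     # children: attributes only
        return all(c not in operators for c in child_symbols)
    # logical operator: children must be comparison or logical operators
    return all(c in operators for c in child_symbols)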
In general, attribute conditions should be able to assert conditions on different types of attributes. This would allow an even broader range of rules that such an algorithm could cover. On the other hand, it would impose even more constraints and different operators the algorithm would need to take care of. Thus, to cover a broad range of attribute comparisons, which arguably reflect the majority of attribute comparisons in Complex Event Processing, and at the same time keep the constraints for the algorithm within reasonable boundaries, this work limits the attributes to be numerical.
5.1.3 Action
The action of a rule is the effect that takes place when the evaluation of the event conditions and the optional attribute conditions results in a positive outcome. Regarding the concept of the optimization algorithm, which is to find a rule that might explain the circumstances under which an event appears, the action can be a placeholder that completes the necessary components of a rule. The action is not part of the optimization process and as such does not need to fulfil any requirements except being present. However, this does not mean that the action could not be used for extensions to the algorithm, e.g. post-processing tasks. The action concludes the processing pipeline as shown in figure 5.9, and this work continues by looking into ideas for evolutionary operations on the chosen representation of the rule, in particular the window, the ECT and the ACT.
Figure 5.9: Processing pipeline of a rule in CepGP
5.2 Preparation Phase
This work presents a Genetic Programming algorithm as a general concept for Complex Event Processing rules that does not need a lot of configuration to work. All it needs is recorded temporal event data in the order in which the events appeared, including the complex events for which the algorithm is supposed to find a rule. The event data has to have a timestamp, a type and optionally attributes. As already mentioned, this work relies on numerical attribute values, although other types may be considered in future works. Each event of the same type has to have the same number of attributes with the same names. The mentioned complex events only need the same special name; every other piece of information is unnecessary for those events. If the complex events were not recorded, they can be inserted into the historical data. Since this may lead to slightly inaccurate data, the result of the algorithm then needs to be interpreted more carefully.
At the start, the algorithm needs the file of the event data stream and the name of the complex event for which it is supposed to find a rule. While parsing the file, it should keep track of the set of event types and their attributes with their names and the range of observed minimum and maximum values. It should also remember the minimum time interval between two consecutive events, the time interval between the first and the last event in the stream and the total number of events. This is all valuable and needed domain knowledge which will be exploited during the optimization process and which can be collected during the parsing of the data to minimize computational overhead. Figure 5.10 summarizes the steps of the preparation phase.
Figure 5.10: Given a file with historical temporal data which contains the complex events for
which the algorithm will try to find a rule, extract domain knowledge during the
data parsing which will be exploited via the Genetic Programming algorithm
The collected information about the time intervals will prove helpful for creating windows for the rules and for grading the window of a rule. The same goes for the number of events, which is also necessary to grade the condition of a rule. The event types with their attributes are much-needed information to build rules at all, and the value ranges of the attributes can be used to choose good constant values within the attribute conditions of a rule.
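A sketch of such a single preparation pass over the parsed log follows, reusing the Event structure from section 4.1; the dictionary keys returned here are illustrative names for the collected domain knowledge.

from typing import Dict, List, Tuple

def prepare(log: List["Event"]) -> dict:
    """One pass over the parsed log collecting the domain knowledge the search
    needs: event types, attribute value ranges, minimal gap between
    consecutive events, total time span and total number of events."""
    if not log:
        raise ValueError("the recorded event stream must not be empty")
    event_types = set()
    attribute_ranges: Dict[Tuple[str, str], Tuple[float, float]] = {}
    min_gap = float("inf")
    for i, event in enumerate(log):
        event_types.add(event.event_type)
        for name, value in event.attributes.items():
            lo, hi = attribute_ranges.get((event.event_type, name), (value, value))
            attribute_ranges[(event.event_type, name)] = (min(lo, value), max(hi, value))
        if i > 0:
            min_gap = min(min_gap, event.timestamp - log[i - 1].timestamp)
    return {
        "event_types": event_types,
        "attribute_ranges": attribute_ranges,
        "min_event_gap": min_gap,
        "total_time_span": log[-1].timestamp - log[0].timestamp,
        "total_events": len(log),
    }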
5.3 Initial Population Creation
Although Genetic Programming is all about the evolution of trees by combining individual trees into new and hopefully better solutions, it has to start somewhere. The starting position is already very important and can prove to be more or less beneficial for the evolutions to come. The quality of the end result depends drastically on the first individuals, which largely influence the coming generations of the evolutionary algorithm.
The goal of the initial population creation is to build only valid individuals which can be evaluated and graded and to produce individuals from very different regions of the problem space. It profits from the preparation phase and uses the information gathered during that time.
A rule consists at least of the window, the event condition tree and the action. Some rules should also consider the attributes within an attribute condition tree. Since the action is just a placeholder and does not contribute to the evaluation of a rule, this leaves two or three components to be created for each rule.
There are two major strategies and a combination of both which is widely used in Genetic Programming. The Ramped half-and-half initialization, as the name suggests, randomly uses the full initialization to create one half and the grow initialization to create the other half of the individuals of the first generation. The full initialization creates completely filled trees which always have the predefined maximum depth. The grow initialization brings forth partially filled trees with a depth less than or equal to the predefined maximum. Both methods together result in a population of very different individuals, which leads to more promising results during the evolution process. None of these strategies can be applied to the rule as a whole because of the constraints of the individual components. Thus, different initialization methods are used to generate each component individually and randomly while adhering to the specific requirements.
5.3.1 Window Creation
A window consists of two properties: type and value. There are two types of windows in Complex Event Processing: length and time windows. The length window considers a certain number of previously encountered events, where the value represents that number of events. The time window considers the events within the time interval reaching from the current event backwards by a given amount of a time unit represented by the value (see figure 5.3 on page 37).
To build a random window, the algorithm first randomly selects a type, with length and time being equally probable. Depending on the type, it then randomly chooses a value between the minimum and maximum of that type. These boundaries come from the preparation phase. For length windows, the minimum value is 1 and the maximum value is the total number of events within the source file. The minimum time value is the minimal time interval encountered between two consecutive events in the source file and the maximum value is the
time interval between the first and the last event. The time unit is either equal for all time
windows or differs randomly while also keeping the time boundaries. This concludes the
creation of a window which is random but still follows the constraints to form a valid and
reasonable window in the given scenario by using the information available from the historical
event data. Figure 5.11 shows the creation process as a program flowchart.
Figure 5.11: Program flowchart of the window creation process; the boundaries are given by
the minimum and maximum values depending of the type available from the
preparation phase; rand() returns a random value within [0, 1)
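The following Python sketch illustrates this window creation step. The helper name and the boundary parameters are hypothetical, and the handling of the time unit is omitted for brevity:

import random

def create_random_window(total_events, min_time_interval, max_time_interval):
    # Length and time windows are equally probable.
    if random.random() < 0.5:
        # Length window: between 1 and the total number of events in the source file.
        return ("length", random.randint(1, total_events))
    # Time window: between the minimal interval of two consecutive events
    # and the interval between the first and the last event.
    return ("time", random.uniform(min_time_interval, max_time_interval))

# Example: 500 events, smallest gap 2 s, observed span of 3600 s.
print(create_random_window(500, 2.0, 3600.0))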
5.3.2 Event Condition Tree Creation
While the window is not a tree, the earlier mentioned initialization methods full, grow and
Ramped half-and-half can be applied to the event condition tree (ECT). In this work, the
ECT is created via the Ramped half-and-half method.
The ECTs of the first half of the initial population are produced by using the full method
where the resulting ECT is a full tree with the predefined depth. When it has reached the
maximum depth the algorithm only allows a random event type as a terminal. The set of
event types is known from the preparation phase. If the depth of the current node has not
reached the maximum depth then the algorithm chooses randomly in equal distribution from
the function set which consists of the event condition operators of Complex Event Processing
(sequence (→), and (∧), or (∨), not (¬) and the excluding sequence (a → ¬b → c)) and
continues to produce the operands as subtrees until the maximum depth is reached. Before
the maximum depth, only operators are allowed. At maximum depth, only event types are
allowed. The full method for ECTs is illustrated in figure 5.12 on the next page
The other half of ECTs of the population is created using the grow method which results
in partially filled trees. It allows event types as terminals before the maximum depth is
reached. Originally, the grow method draws a specific primitive equally from the primitive
set (functions and terminals, or in this case operators and event types).

Figure 5.12: Program flowchart of the full initialization method for the ECT

However, to be less sensitive to the size of the terminal set – since the function set is fixed – this work uses
a slightly different approach. At the root, the chance to choose a random event type is as high as the chance to choose one of the operators. But the chance to select an event type grows
linearly with the depth and is 1 when the maximum depth is reached to ensure that the tree
will not exceed it. This grow method is shown in figure 5.13 on the facing page
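As a rough illustration of both methods, the sketch below encodes an ECT as nested tuples; the operator names, their arities (the excluding sequence is assumed to be ternary, a → ¬b → c) and the helper names are assumptions, not the thesis' data model:

import random

ARITY = {"SEQ": 2, "AND": 2, "OR": 2, "NOT": 1, "EXCL_SEQ": 3}

def full_ect(event_types, depth, max_depth):
    # Full method: operators everywhere above the maximum depth, event types only at it.
    if depth == max_depth:
        return random.choice(event_types)
    op = random.choice(list(ARITY))
    return (op, [full_ect(event_types, depth + 1, max_depth) for _ in range(ARITY[op])])

def grow_ect(event_types, depth, max_depth):
    # Grow method: at the root, terminal and operator are equally likely; the chance
    # for a terminal rises linearly with the depth and reaches 1 at the maximum depth.
    p_terminal = 1.0 if max_depth == 0 else 0.5 + 0.5 * depth / max_depth
    if depth >= max_depth or random.random() < p_terminal:
        return random.choice(event_types)
    op = random.choice(list(ARITY))
    return (op, [grow_ect(event_types, depth + 1, max_depth) for _ in range(ARITY[op])])

def ramped_half_and_half(event_types, population_size, max_depth):
    half = population_size // 2
    return ([full_ect(event_types, 0, max_depth) for _ in range(half)] +
            [grow_ect(event_types, 0, max_depth) for _ in range(population_size - half)])

print(ramped_half_and_half(["A", "B", "C"], 4, 2))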
5.3.3 Attribute Condition Tree Creation
The attribute condition tree (ACT) is an optional part of the rule and may be omitted during
the creation process. CepGP uses an ACT-rate to determine how many individuals are being
equipped with an ACT in this phase. Whether ACTs prevail from generation to generation
is decided by the fitness of rules with and without an ACT and is therefore guided by the
optimization process itself. There are no guarantees of the portion of individuals with an
ACT within the generations from the initial population onwards. Depending on the given
data and the problem at hand, this rate can be adjusted to facilitate a better outcome of
the optimization process. To find even a faint lead to the best rule when there are a lot of event types, a total absence of ACTs in the optimization can be favourable, whereas in scenarios with few event types but with a lot of attributes, omitting ACTs may not lead to good results.
Since the ACT is also a tree, basically the same initialization methods as with the ECT can
be applied. However, CepGP only uses the grow method for ACTs because it generally is
uncertain whether ACTs are helpful in smaller or bigger sizes, if at all.

Figure 5.13: Program flowchart of the grow initialization method for the ECT

If bigger and more complex ACTs prove to be beneficial then the fitness of the individuals with an ACT is
better. Thus, from generation to generation more and more individuals have ACTs and the
probability of individuals interchanging parts of their ACTs during crossover rises to create
bigger ACTs. The growing of trees in Genetic Programming is a known problem called bloat
which in this case can even be exploited to lead to better results or at least steered via the
ACT-rate for the initial population.
The function set of an ACT consists of the logical operators. The terminal set contains
the numerical comparison operators in CepGP, although other types than numbers may be
compared in future works. ACTs may also have a different maximum depth from the ECT
depending on the problem at hand. The creation of an ACT is shown in figure 5.14 on the
next page.
Although a comparison operator is a terminal within the ACT, it still is an operator requesting
operands to be complete. Comparison operators are always binary and the operands are
either constant or event attributes which are tightly coupled to the event types of the ECT
of the same rule. To compare the values of events, the ACT needs to refer to attributes of
events that correspond to the event types within the ECT. That is why, apart from constant
attributes, only event attributes of the event types of the ECT of the same rule are allowed
inside the ACT. To enable valid and useful ACTs, CepGP distinguishes between the first and the second operand of the comparison operators. The first operand is always an event attribute.
Figure 5.14: Program flowchart of the initialization of an ACT using the grow method
The second operand can either be an event attribute or a constant within the interval observed for the first operand’s event attribute during the preparation phase. This premise for
comparison operators improves the initial rules by preventing comparisons of two constant
values which are not valuable for the rule and would only allow a lot of never-firing rules.
Another advantage of this approach is that the constant value can be chosen in a way that
fits into the value range of the event attribute of the first operand that it is compared against.
The whole initialization of a comparison operator is shown in figure 5.15.
Figure 5.15: Initialization of a comparison operator of an ACT; the first operand is always
an event attribute; the second is randomly either a constant or another event
attribute
Generally, it is possible to choose a constant value randomly between the minimum and
maximum value of the first operand. Since this approach does not limit the constant values,
theoretically, it covers all possible ACTs. But depending on the number of attributes, their value ranges and the actual value distribution, it might take a lot of generations and individuals
to bring forth good ACTs. To reduce the number of possible constant values but still cover
a reasonable amount of possible ACTs, CepGP uses equidistant steps within the given value
range. In addition to the minimum and maximum value of the event attribute, it can randomly choose three more values: min + i/4 · (max − min), where i is a random integer within [0, 4], which results in the minimum value for i = 0, the maximum value for i = 4 and three additional possible constant values equidistantly in between. These chosen values can reasonably be used in the context of the comparison operators and enhance the comprehensibility of the ACT from a user’s point of view.
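A small sketch of this constant selection; the function name is hypothetical:

import random

def random_constant(attr_min, attr_max):
    # Equidistant candidates: the minimum, the maximum and three values in between.
    i = random.randint(0, 4)
    return attr_min + i / 4 * (attr_max - attr_min)

# An attribute observed between 10.0 and 30.0 yields 10.0, 15.0, 20.0, 25.0 or 30.0.
print(random_constant(10.0, 30.0))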
5.4 Evolutionary Operators
After describing the chosen representation of a rule and its components as a tree and showing the conditions that have to be met to eventually evaluate the rule and determine its quality, this section concerns itself with the evolutionary operations already presented in section 2.1 on page 4 and their application to the rule as a tree. It follows the processing order of the evolutionary operations during evolution in CepGP: it starts with the selection method, continues with the crossover method and finishes with the mutation method. All of these operate on a generation and hence require the population initialization presented in section 5.3 on page 44 to take place before the evolution can begin. The speciality of the evolutionary operators in this work lies within the constraints they have to adhere to. Each component has different needs to be taken care of; hence, the operators are explained for each component individually but still work on the rule as a whole.
5.4.1 Selection
Selection is used to determine the individuals which participate in producing the next generation. The Tournament Selection is widely used, simple and can be configured to adapt to different problems. The number of individuals compared to decide the winner of the selection process is determined by the tournament size. For a size of two it results in the following operations:
1. Select a random individual of the current population
2. Select another random individual of the current population
3. The fitter of the individuals is the winner of the tournament and propagates its genes
on to the next generation
However, since CepGP usually uses a large population for each generation, it is better to carry over the absolute best individuals for sure and not leave it up to pure chance. The mechanism to achieve this is called Elitism. It determines the best n individuals of the current generation and allows them to survive into the next generation. In CepGP, n is determined by an elitism rate applied to the population size. This rate should be very small: it rescues the absolute best individuals while still allowing the crossover to produce enough other new individuals and keep up diversity. Individuals other than the elite are produced in the crossover process, which uses the selection mechanism to choose the parents of the offspring as individuals of the next generation.
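A minimal sketch of tournament selection and elitism; the fitness callable, the toy individuals and the default rates are assumptions for illustration only:

import random

def tournament_select(population, fitness, tournament_size=2):
    # Pick tournament_size random individuals and return the fittest of them.
    contestants = [random.choice(population) for _ in range(tournament_size)]
    return max(contestants, key=fitness)

def elite(population, fitness, elitism_rate=0.02):
    # The best n individuals survive into the next generation unchanged.
    n = max(1, int(elitism_rate * len(population)))
    return sorted(population, key=fitness, reverse=True)[:n]

# Toy usage with "rules" that only carry a fitness value.
rules = [{"fitness": random.random()} for _ in range(100)]
f = lambda rule: rule["fitness"]
print(tournament_select(rules, f)["fitness"], [r["fitness"] for r in elite(rules, f)])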
There are a lot of selection algorithms available which could prove to be a better choice for determining the individuals that contribute to the next generation. It remains a task for future work to analyze their benefits.
5.4.2 Crossover
The crossover operation combines individuals to produce the next generation out of the
information of the current one. As with selection, there are also a lot of algorithms for
crossover of which some are presented in chapter 4 on page 25. However, this evolutionary
operation needs to consider the representation of the individuals and therefore has to work
well with trees to be a viable option for CepGP. For this reason, CepGP adapts the Tree Crossover and introduces mechanisms to handle the constraints of each component of a rule by ensuring type-safety. During any crossover, only one component is affected and the
others are, if no need arises, left untouched. The general crossover process can be seen in
figure 5.16 on the following page.
As long as there are fewer new individuals than the population size of a generation, CepGP
produces new ones by crossover. The next new individual is produced by first selecting an
individual with the selection method aforementioned. A crossover rate determines whether
this individual mates with another individual or if it survives as is into the next generation. If
it is destined to mate, first, the other mating partner individual is chosen with the selection
method as well and after that, both individuals give parts of their information material on
to a newly formed offspring. This offspring is then part of the new generation. When the
number of individuals in the new generation is equal to or greater than the population size,
the next generation is built by:
• Accepting the absolute best from the previous generation. The number of individuals is
computed with the elitism rate discussed in the selection section 5.4.1 on the preceding
page.
• The remaining individuals to fill up the population of the next generation are taken in
order of their creation from the new individuals just generated
Another point to consider here is the crossover rate that determines how often individuals
are crossed with other individuals. If the selected first individual is not to be crossed then
it survives the generation as is and is part of the next generation. This means that even without elitism, very fit individuals in CepGP, which are likely to be selected as first individuals more often, already have a good chance to survive the current crossover round when the crossover rate is low compared to usual crossover rates in other evolutionary algorithms. However, to enable a stronger convergence a high crossover rate is necessary, which in turn reduces the probability of individuals surviving the crossover process alone. This can be dealt with by elitism, as is done in CepGP, which also allows the crossover rate to be even higher than normal because the fittest individuals of each generation survive anyway.
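A self-contained sketch of this next-generation loop; the selection and crossover callables, the toy individuals and the concrete rates are assumptions:

import random

def next_generation(population, fitness, select, crossover,
                    crossover_rate=0.9, elitism_rate=0.02):
    offspring = []
    while len(offspring) < len(population):
        first = select(population, fitness)
        if random.random() < crossover_rate:
            second = select(population, fitness)
            offspring.append(crossover(first, second))   # one new individual per crossover
        else:
            offspring.append(first)                      # survives unchanged
    n_elite = max(1, int(elitism_rate * len(population)))
    best = sorted(population, key=fitness, reverse=True)[:n_elite]
    # The elite first, then the newly created individuals in order of their creation.
    return (best + offspring)[:len(population)]

# Toy usage: individuals are plain numbers, "crossover" averages them.
pop = [random.random() for _ in range(20)]
new_pop = next_generation(pop, fitness=lambda x: x,
                          select=lambda p, f: max(random.sample(p, 2), key=f),
                          crossover=lambda a, b: (a + b) / 2)
print(len(new_pop), max(new_pop))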
As mentioned earlier, the speciality of this algorithm arises from the needs of each rule
component which are to be considered in the mating process. When two individuals are
selected to mate, the crossover algorithm of CepGP selects a random crossover point of the
first chosen individual. The available crossover points are indexed in figure 5.17 on page 53.
Figure 5.16: General Crossover of CepGP with Elitism
Figure 5.17: CepGP crossover point indexing of an individual with an ECT and an ACT; the
red numbers are the indices of the nodes that can serve as a crossover point;
the yellow numbers are the indices of nodes in their component
By randomly choosing an index, not only the node at which to cross (the crossover point) is determined, but so is the component. This allows the algorithm to specifically draw a random node of the same component of the second individual. In this way, type-safety is ensured, only nodes which are compatible for crossover are selected, and only valid offspring are produced. When in this example the first random index is 3, this means that the
crossover point is an event type and part of the ECT. The crossover point of the second
individual can only be a valid node of an ECT (either an event type or an event operator),
now. The algorithm can deduct the component from the index by building the index like
this:
1. Compute the number of nodes in the ECT and ACT
2. Select a random integer between 0 (inclusive) and the total number of nodes (exclusive), which is calculated as 1 for the window node + number of ECT nodes + number of ACT nodes (0 if there is no ACT).
3. Calculate the component and the index within that component according to figure 5.18
on the next page
4. Cross the component of both individuals by choosing a random node of the same
component of the second individual
After selecting the crossover point of the first individual, the crossover point of the second
individual is drawn accordingly from the component of the first crossover point. To enable
the parents to produce more than one offspring, the algorithm copies them before crossover.
Only the component of the index is changed. Every other component of the rule is taken
from the first individual to preserve their validity.
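A sketch of this index deduction; the node-count parameters are hypothetical helpers:

import random

def pick_crossover_component(ect_size, act_size):
    # Index 0 is the window, the next ect_size indices belong to the ECT,
    # the remaining act_size indices belong to the ACT (act_size is 0 without an ACT).
    index = random.randrange(1 + ect_size + act_size)
    if index == 0:
        return ("window", 0)
    if index <= ect_size:
        return ("ect", index - 1)            # local index within the ECT
    return ("act", index - 1 - ect_size)     # local index within the ACT

# Example: a rule with 5 ECT nodes and 3 ACT nodes.
print(pick_crossover_component(5, 3))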
Figure 5.18: Deducing the component from the crossover point; the crossover point of the first individual is either the window, the ectIndex or the actIndex
Crossing a Window
If the component is the window, the algorithm needs to ensure the validity of the crossover result. This concerns the window value range of each window type and, if applicable, the additional window attributes of each type such as the time unit. In CepGP, there are two cases to consider:
Equal types of the windows to be crossed. In this case the algorithm keeps the common
type for the offspring and takes the value of the second individual. If the window
has additional information like the time unit, it is taken from the second individual to
ensure that the value is still within boundaries of the type.
Unequal types of the windows to be crossed. Now, the type of the window of the offspring is the one from the first individual, while the value is acquired from the second individual. However, different types have different boundaries and ranges. So, even if the value is within the boundaries, it can have a different meaning when the ranges differ. To enable this crossover and to not lose information, a conversion from one range into the other is needed. CepGP takes the position of the value within the range of the original type and translates it linearly to the rounded equivalent position in the target type. Even though the position in both ranges is equal, it still might yield different results since the meaning of the type is different. Depending on the temporal distribution of the events in the event stream, a time window with a value equivalent to the amount of time in which a certain number of events is encountered yields a different result than a length window with that number of events as its value. Assume an event stream where events arrive irregularly in time and the average time for five events to arrive is 3 min. Applying a length window of value five is bound to yield other results than a time window of value three with time unit minutes. That is not to say that the 3 min time interval is equivalent in its position in the range of possible time intervals to the length five in the range of possible length windows.
Both cases are equally probable and the resulting offspring of this kind of crossover has the
ECT and ACT of the first selected individual and the newly created window.
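The unequal-type case can be sketched as a linear range conversion; the (type, value) pair representation and the boundary dictionary are assumptions:

def convert_linearly(value, src_min, src_max, dst_min, dst_max):
    # Translate the relative position of value in [src_min, src_max]
    # to the equivalent position in [dst_min, dst_max].
    position = (value - src_min) / (src_max - src_min)
    return dst_min + position * (dst_max - dst_min)

def cross_windows(first, second, bounds):
    # first, second: (type, value); bounds maps each type to its (min, max).
    t1, _ = first
    t2, v2 = second
    if t1 == t2:
        return (t1, v2)                      # keep the common type, take the second value
    converted = convert_linearly(v2, *bounds[t2], *bounds[t1])
    return (t1, round(converted) if t1 == "length" else converted)

bounds = {"length": (1, 500), "time": (2.0, 3600.0)}
print(cross_windows(("length", 10), ("time", 1800.0), bounds))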
Crossing an ECT
The ECT consists of event types as leaves and event operators. Since the ECT is a tree, the
basic algorithm CepGP follows is the subtree crossover algorithm. In general, it chooses a
random node of the ECT of the first individual and replaces it and its underlying nodes with
a random subtree of the second individual. Figure 5.19 on the following page illustrates the
process.
If two ECTs are to be crossed, first, the crossover points within the ECTs of both individuals
have to be chosen. The aforementioned calculation of the crossover point within the whole
rule allows to trace the crossover point of the ECT of the first individual, meaning the index
of the node within the ECT and in this example the event with type B. The algorithm
now selects a random node from the ECT of the second individual to be the crossover point
of this individual. In this example, the index is 0 which represents the root of the ECT,
the →-operator. The algorithm now combines both ECTs by replacing the subtree under
the crossover point (B) of the first individual with the subtree under the crossover point of
the second individual (the tree with the →-operator as its root), both highlighted in violet
boxes. The algorithm uses copies of the selected first and second individuals so as not to modify the originals, allowing them to be part of multiple crossovers unaltered. This version
of subtree crossover discards the rest and proceeds with the general crossover algorithm as
explained in figure 5.16 on page 52. However, it is generally possible to produce a second
offspring the other way around: Replacing the subtree under the crossover point of the
second individual with the subtree under the crossover point of the first individual. This
way, only half the iterations for crossovers are necessary since this approach always generates
two new individuals. But the chance for a higher diversity within the population is lower
than by producing only one individual per crossover operation. If a tree is fit enough to be selected more often, then eventually it might happen that the roles of first and second individual are switched in another crossover, producing the offspring that was previously discarded.
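A compact sketch of this subtree crossover on the tuple-encoded trees used in the earlier initialization sketch; the encoding and the helper names are assumptions:

import copy
import random

# A node is either a string (event type) or a tuple (operator, [children]).

def all_nodes(tree, path=()):
    # Enumerate every node together with its path from the root.
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1]):
            yield from all_nodes(child, path + (i,))

def replace_at(tree, path, subtree):
    if not path:
        return subtree
    op, children = tree
    children = list(children)
    children[path[0]] = replace_at(children[path[0]], path[1:], subtree)
    return (op, children)

def subtree_crossover(first, second):
    # Graft a random subtree of the second parent onto a random crossover
    # point of a copy of the first parent; the parents stay unaltered.
    first = copy.deepcopy(first)
    path, _ = random.choice(list(all_nodes(first)))
    _, donor = random.choice(list(all_nodes(second)))
    return replace_at(first, path, copy.deepcopy(donor))

print(subtree_crossover(("AND", ["A", ("SEQ", ["B", "C"])]), ("SEQ", ["D", "E"])))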
The resulting ECT is always valid and can be evaluated, even if an operator is replaced by an
event type. But the rule validity depends on more than just the ECT. ACTs use references
to the event types present in the ECT. Therefore, if there are changes in the ECT, it is
generally necessary to check if the belonging ACT shows inconsistencies by referencing any
event types that are not used in the ECT anymore. How CepGP handles this situation is
explained in section 5.4.2 on page 60.
Figure 5.19: Subtree crossover of ECTs; the violet boxes represent the selected subtrees; the numbers are the indices of the nodes within their ECT; the bright yellow number is the crossover point of the individuals (B in the first individual with index 2 and the sequence operator (→) in the second individual with index 0); the subtree under the sequence operator replaces the subtree at the crossover point of the first individual; the offspring consists of the ∧-operator and the A of the first individual, the second operand of the ∧-operator is the inserted subtree of the second individual; the nodes taken from the first individual are highlighted in orange and the nodes of the second individual are highlighted in red

Crossing an ACT

The ACT consists of comparison operators as leaves and logical operators as intermediate nodes. The attributes as operands of the comparison operators are not additional nodes within the crossover but part of the comparison operator as one node. The attributes
themselves can thus not be chosen as a crossover point and do not have their own index for
that matter. As a tree, the ACT uses a very similar subtree crossover algorithm compared
to the ECT. A random node of the ACT is chosen to be the crossover point of the first
individual and replaced by a random subtree of the ACT of the second individual. However,
in contrast to the ECT, the ACT is optional and may not exist for either one. By using
the algorithm presented in figure 5.18 on page 54 to deduce the crossover point, the first individual has to have an ACT for it to be crossed. Hence, only the second individual may lack
an ACT. In this case, CepGP uses the subtree under the crossover point of the first individual
as the new ACT of the offspring. Figure 5.20 on the following page illustrates the case with
both individuals having an ACT and figure 5.21 on page 59 shows an example with a second
individual without an ACT.
In the first figure 5.20 on the following page both individuals have an ACT, each of which is valid because it only references existing aliases (B0 and A0, for example, reference the event types B and A respectively). Refer to section 5.1.2 on page 39 for a detailed explanation of the structure
and references within the ACT. When the algorithm for choosing a crossover point of the
first individual selects a node of the ACT, it also calculates the index of the node within
the ACT which are displayed in the boxes in the top right corner of the ACT nodes. In this
example, the chosen node is the one with index 2 which is the less-than-operator highlighted
via a violet box. The attributes as operands belong to the operator and make up one node
for the ACT which is why the attributes are also inside that box. The ACT of the second
individual consists of a negation-operator and a greater-than-operator with index 1 which
is chosen to be the crossover point. When both individuals are crossed at their respective
ACT crossover points, the first individual passes down its complete ECT to the offspring and
the parts of the ACT that are not part of the selected subtree. In this example, these are
the or-operator with index 0 and the equals-operator with index 1. The subtree under the
crossover point of the second individual then replaces the subtree under the crossover point
of the first individual. The resulting ACT of the offspring now consists of the passed down
parts from the first individual highlighted in orange and the parts from the second individual
highlighted in red. This ACT is still valid because it references events present in the ECT.
The second figure 5.21 on page 59 presents an example of a second individual without an
ACT. The selected crossover point of the first individual is the same as in the previous
example. However, there is no crossover point of the second individual because it does
not have the optional ACT component in its rule. CepGP handles this case by choosing
the subtree under the crossover point of the first individual as the ACT of the offspring.
The example shows the crossover point as a bright yellow number of the less-than-operator
and the subtree via a violet box surrounding this operator and its attributes. The offspring
now gets this subtree as its ACT. As with the previous example, the first individual also
passes down its ECT to the offspring. This ensures a valid ACT where all used aliases reference an existing event type within the ECT. Since the whole ECT is used, there can be no inconsistencies in the subtree of the original ACT.
Figure 5.20: Subtree crossover of ACTs, both individuals having an ACT; the indices of the
nodes of the ACTs are presented in boxes in the top right corner of the nodes;
crossover points have a bright yellow background; the subtrees for crossover are
highlighted via a violet box in both individuals; the nodes of the first individual
are colored orange and the nodes of the second individual are colored red
Figure 5.21: Subtree crossover of the ACT, without ACT in second individual; the indices
of the nodes of the ACTs are presented in boxes in the top right corner of the
nodes; the crossover point has a bright yellow background color at its index;
the corresponding subtree is framed with a violet box; the ACT of the offspring
equals the selected subtree of the first individual
Repairing the ACT
The examples of the previous section explaining the crossover of ACTs were chosen to only display valid resulting ACTs, to show the principle of crossover in CepGP. Although these examples work out, there can be offspring with inconsistent ACTs which contain references to
aliases of event types that were once there but are no more. Figure 5.22 shows an example
when crossing ACTs and an example when crossing ECTs can be seen in figure 5.23 on the
facing page.
Figure 5.22: Example of a broken ACT after crossover of ACTs; The offspring references a
non-existing alias (C0) after the crossover
The ACTs of the first and second individuals are valid because they only use aliases to events
which are present in their respective ECTs. In figure 5.22 the crossover points are both ACT
roots, the less-than-operator of the first individual and the equals-operator of the second
individual. According to the description of the ACT crossover algorithm earlier presented,
the equals operator replaces the less-than-operator which results in the depicted offspring.
Although the original ACTs are valid in respect to their belonging ECT, the result ACT of
the offspring is inconsistent with the ECT in this example. There is no alias with name C0
within the ECT of the offspring and thus this event attribute can never be resolved.
Figure 5.23: Example of a broken ACT after crossover of ECTs; if the ECT is to be crossed, the first individual passes down every other component to the offspring, including the ACT; after replacing the event type A with alias A0, the ACT shows an inconsistency in the alias A0, which now refers to a non-existent event type in the ECT
Figure 5.23 on the preceding page presents an example of an inconsistent ACT after crossover
of the ECTs. Again, both individuals, the first and the second, comprise valid ACTs within
themselves. However, after the event type A with the alias A0 of the first individual is
replaced with the event type C with the alias C0 of the second individual, the event attributes
using the alias A0 in the ACT of the offspring are now invalid because this alias cannot be
resolved to an actual event.
In both cases, the whole rule is invalid. The question now is, how to deal with possible
inconsistencies:
• Leave the ACT inconsistent and deal with the rule by means of the fitness function.
This implies some kind of punishment function or factor and in general a specific
handling of faulty individuals.
• Discard individuals with inconsistent ACTs, meaning that they will not be added to
the new population.
• Generate a completely new ACT each time there is an inconsistent one.
• Repair the ACT by leaving valid parts as they are and generate new attributes where
the existing ones are faulty.
CepGP seeks to produce only valid individuals, which are easier to handle and allow more meaningful results to be generated. The strong coupling between ECT and ACT demands a test of the ACT for validity after each modification of the ECT. If the ACT shows such flaws, CepGP repairs the ACT with as few changes to the original as necessary. This allows more individuals to be processed and is more reliable concerning processing time than discarding faulty rules entirely or generating new ACTs every time the original ACT is broken. The ECT, on the other hand, is not dependent on the ACT and does not need to be repaired.
Coming back to the example in figure 5.22 on page 60, the inconsistency lies within the
alias C0 in the ACT which needs to be repaired. CepGP detects these inconsistencies by
calculating the set difference between the aliases used within the ACT and the aliases available in the ECT. If there are aliases in the first but not in the second set then there is an inconsistency. However, it is entirely possible and valid to have aliases in the ECT which are not used in the ACT; it is actually common to not use all of the available aliases. The only alias used in the ACT of the offspring is C0 and the available aliases from the ECT are A0 and B0. Since C0 ∉ {A0, B0}, there is an inconsistency. CepGP now searches every
attribute comparison operator which uses a broken alias. This general repairing algorithm is
outlined in figure 5.24 on the facing page.
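A sketch of the inconsistency check and of a simplified comparison-operator repair; the dictionary shape of a comparison operator and the two helper callables are hypothetical:

import random

def broken_aliases(act_aliases, ect_aliases):
    # Aliases used in the ACT but no longer present in the ECT are broken.
    return set(act_aliases) - set(ect_aliases)

def repair_comparison(op, broken, random_attribute, random_constant):
    # op example: {"first": ("A0", "a0"), "second": ("C0", "a1")}; "second" may be a number.
    if op["first"][0] in broken:
        op["first"] = random_attribute()
    second = op["second"]
    if isinstance(second, tuple) and second[0] in broken:
        # Broken event attribute: draw a new second operand (attribute or constant).
        op["second"] = random_attribute() if random.random() < 0.5 else random_constant(op["first"])
    return op

print(broken_aliases({"C0"}, {"A0", "B0"}))   # -> {'C0'}, as in figure 5.22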
Repairing a comparison operator in CepGP follows the algorithm depicted in figure 5.25 on
page 64. If the first operand uses one of the aliases previously identified as invalid according
to the algorithm from figure 5.24 on the facing page, then a new random and valid attribute
is generated. CepGP generates this operand the same way it generated the first operand
of an operator when an ACT is built (see section 5.3.3 on page 46). Now, CepGP checks
whether the second attribute is a constant or another event attribute which uses one of the
broken aliases. If it is neither, then it is a valid attribute and can be left as is.

Figure 5.24: General algorithm to repair an ACT

In the first
case, CepGP additionally checks whether the value of the constant is still within the value
range of the first attribute which might have changed. If it is not, then a new constant value
is calculated in the same way a constant value is selected in the ACT creation process (see
section 5.3.3 on page 46). The constant value is updated to ensure a meaningful rule which
might be able to fire. Constant values outside of the range of the first attribute may lead
to conditions in the ACT that prevent the whole rule from firing. If the second attribute
is an event attribute which uses a broken alias, independently of whether the first attribute
also does, CepGP creates another random second operand as explained in the ACT creation
section ( 5.3.3 on page 46).
This repairing algorithm presents a minimally intrusive way that changes as little as necessary to enable all crossover results, whether they are already valid or not, to participate in the optimization process without turning them into entirely different individuals.
5.4.3 Mutation
After the next generation is constructed via the crossover process depicted in figure 5.16
on page 52, CepGP randomly selects individuals out of the next generation which undergo
a minor alteration process, called mutation. The general mutation process is illustrated in
figure 5.26 on the next page.
Figure 5.25: Algorithm to repair the broken aliases within a comparison operator of an ACT
Figure 5.26: General mutation algorithm
The algorithm visits each individual of the next generation and decides at random by a given
mutation rate whether the currently visited individual gets mutated. There are several mutation algorithms suitable for tree representations, which are presented in chapter 4 on page 25. CepGP follows the original idea of mutations as very minor changes to an individual
and thus uses the point-tree-mutation. In this kind of mutation, the algorithm chooses one
random node of a tree and changes this node only, without altering the subtree or parent
nodes. Since CepGP always seeks to ensure type-safe and valid trees, this mutation algorithm can lead to inconsistencies and thus may need repairing steps which are not displayed
in the figure of the general mutation algorithm.
CepGP chooses the node to mutate in the same way it chooses crossover points (see 5.4.2
on page 51). In general, mutation is the only mechanism in CepGP to introduce entirely
new or lost information into the next generation. This helps the process of exploration of the
problem space, in this case the exploration of all possible rules with the given information
from the preparation phase. Each component uses mutation similarly to alter individuals and
add more diversity to the population in their respective components.
Mutating a Window
When the window of a rule is chosen to be mutated, CepGP creates either a random new
length or time window for the rule which are equally probable. With this, CepGP incorporates
new windows into the population. Otherwise, only the windows within the initial population
can be exchanged during the evolutions from generation to generation. Since the window
determines the events on which a rule is applied to, it can play an important part in the
overall fitness of the rule, apart from the actual window fitness. Thus, the mutation rate
should not be too low to enable more windows to be evaluated but in the spirit of general
evolutionary algorithms, the mutation rate should also not be too high to not disrupt the
inherent optimization within the process.
Mutating an ECT
Since the ECT is a tree itself, the same point-tree-mutation is applied to it. The mutation
point within the ECT is determined in the same way as the crossover point. During the
mutation the node that corresponds to the mutation point is replaced by either

• a random event type, where the original subtree under the replaced node is discarded, or

• a random event operator, where CepGP uses the children of the original node as children of the new one if present. Otherwise it generates a new random event type for every additionally needed child. Another approach could generate new subtrees as children with the grow or full method and a maximum depth equivalent to the initial maximum depth. But since CepGP interprets mutation as a minor change to the original, it uses only event types as new children.
Every other component of the rule or node of the ECT is left unaltered. However, as with
every change in the ECT, CepGP needs to investigate the ACT for inconsistencies in the
used aliases and whether their references still exist. If it encounters inconsistencies, CepGP
applies the repairing algorithm presented in 5.4.2 on page 60.
Figure 5.27 shows an example of a mutation of the ECT. The chosen index within the ECT
is 0 which represents the root node, the ∧-operator. The mutation algorithm now generates
a random ECT-node, either another ECT-operator or an event type. In this case, it is a
∨-operator which replaces the chosen ∧-node in the original while preserving the original
operands of the tree node, the event types A and B. If an event type was to replace the
original ∧-node, then the operands would have been discarded since an event node is always
a leaf node. If an event type was to be replaced by an operator for example, missing operands
would have been generated as random event types only. Operators are not generated as new
operands to keep the mutation as a minor change.
Figure 5.27: Example mutation of the ECT; the selected ECT node is the root with index 0 (highlighted in bright yellow) represented by a ∧-operator (colored in red); the randomly generated node to replace the chosen node is a ∨-operator highlighted in blue; the mutated individual consists of all the information from the original, but the chosen node was replaced by the newly generated one while preserving the original subtree (here the nodes A and B)
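A sketch of this point mutation on the tuple-encoded ECT used in the earlier crossover sketch; the event types and operator arities are assumptions:

import random

EVENT_TYPES = ["A", "B", "C"]
ARITY = {"SEQ": 2, "AND": 2, "OR": 2, "NOT": 1}

def point_mutate_node(node):
    # Replace exactly one node; parents are untouched, children are kept where possible.
    if random.random() < 0.5:
        # New node is an event type: the original subtree is discarded.
        return random.choice(EVENT_TYPES)
    op = random.choice(list(ARITY))
    old_children = list(node[1]) if isinstance(node, tuple) else []
    children = old_children[:ARITY[op]]
    while len(children) < ARITY[op]:
        children.append(random.choice(EVENT_TYPES))   # only event types as new children
    return (op, children)

print(point_mutate_node(("AND", ["A", "B"])))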
Mutating an ACT
The mutation of the ACT basically follows the point-tree-mutation as described in the ECT
but extends the process due to its higher complexity and constraints.
The first difference exists in the presence of an ACT since it is optional. If there is no
ACT in the individual at hand, CepGP adds an additional artificial index to the mutation
point calculation. If the result of the calculation equals this artificial node, CepGP creates a
random comparison operator with random but valid attributes as described in the Attribute
Condition Tree creation section 5.3.3 on page 46 and makes it the new Attribute Condition
Tree root of the individual where there was none before.
But even if there is an ACT in the individual, CepGP distinguishes between logical and comparison operators as mutation points. If the mutation point represents a logical operator, CepGP works the same way as with the ECT: it either replaces this node with another random logical operator and uses the original children as the respective children of the new operator, or it replaces the original logical operator with a new random comparison operator.
Because comparison operators are not only leaves of the ACTs but also small trees with a
height of 1 themselves, CepGP uses another round of point-tree-mutation for mutating the
comparison operator. First, CepGP selects a random node within the comparison operator as
the actual mutation point, which can either be the operation itself or one of the attributes.
Either of these nodes is then replaced by a new random node of the same type (operator or
attribute) while still producing valid ACTs as a result. The comparison operators are all binary. Hence, they have the same number of operands, which obviates the need to produce potentially needed additional attributes. Concerning the attributes, CepGP distinguishes between the
first and the second operand when building new ACTs (see 5.3.3 on page 46) and also
during mutation for the same reasons. The first operand is never a constant value but only
an event attribute to prevent meaningless conditions in the ACT which means mutating the
first operand of a comparison operator also produces another event attribute. The second
operand can be both, an event attribute or a constant value which is dependent on the first
operand and its value boundaries. Through this rather complex mutation algorithm of the
ACT, CepGP ensures type-safety and meaningful conditions. It is the only way to introduce
new or lost attributes into the ACTs of a population. Therefore, the mutation rate should
be adjusted according to the number and variety of attributes in the inspected event stream
to allow a decent exploration of the problem space during the optimization process.
Figure 5.28 on the following page shows an example of a mutation of a comparison operator.
This figure displays the second stage of the comparison operator mutation, where the index
of the components of this operator is shown and either the comparison operator itself or one of its operands is chosen to mutate. In this case the highlighted first operand A0.a0 is to be mutated and a new valid attribute is generated. A0.a1 is another attribute of the same alias A0 and replaces the original attribute in the mutated individual.
5.5 Fitness Calculation
Quantifying the quality or fitness of an individual is one of the most important and deciding
tasks in Evolutionary Computation. The fitness determines whether an individual is selected
and, thus, able to pass over its own information in crossover to the next generation to
ultimately participate in the search of the optimal solution. The fittest individual is the one
closest to the optimal solution. Because the fitness is of such importance and a core part of
the optimization algorithm, CepGP uses a more sophisticated way of determining this value.
Fitness in CepGP is a blended value built from three different values, each quantifying the fitness of a part of the rule and each of different importance to the overall fitness. Hence, it is the weighted sum of the condition fitness, the window fitness and the complexity fitness,
Figure 5.28: Example Mutation of the ACT; Here the comparison operator was chosen to be
mutated and the second round of point mutation is shown; within the comparison operator tree, the first operand A0.a0 (highlighted in red) is to be mutated
and replaced by a newly generated valid attribute A0.a1 (colored blue) in the
mutated individual
whereas the condition fitness is by far the most important measure, followed by the window
fitness. The complexity fitness has a minor impact to distinguish rules which otherwise would
have equal fitness.
There are more ways to achieve this so-called multi-objective optimization, which remain to be investigated in future research ([39] p. 75ff.). The following sections describe the fitness
functions to quantify the fitness values and how they are blended together to form the total
fitness of an individual.
5.5.1 Condition
The quality of the rule is based on the so-called binary classification, meaning that the events
in the event stream either fulfill the requirements of the rule or they do not. The goal for
each rule is to classify the marked complex event, meaning the special event type that marks
the happenings within the stream, as the only event that fulfills the requirements of the rule
into one class. The other class consists of the other events that are not the marked complex
event. This is the most important objective of the optimization by far and it depends on the
condition part of the rule and the window which decides which events are considered while
computing the condition.
CepGP analyzes this classification by removing the marked complex event from the stream
for the rule evaluations and remembering the indices of the original positions. The result of
the rule on the altered event stream is the indices of the positions where it fired. The rule
is expected to insert complex events only at the places where the original marked complex
event is in the original event stream. Thus, CepGP compares the resulting indices of the rule
with the indices of the marked complex event in the original event stream.
In the so-called Receiver Operating Characteristics (ROC) analysis, there are four basic
measures that can be derived from the classification result:
True positives (TP) is the number of times the rule fired at the right positions.
True negatives (TN) is the number of times the rule does not fire when it is not supposed
to fire.
False positives (FP) is the number of times the rule fires when it is not supposed to fire.
False negatives (FN) is the number of times the rule does not fire when it should fire.
Just like TP, TN are correct classifications and oppose FP and FN, which are false classifications. There is some additional information about these numbers:
1. The four numbers add up to the total number of events in the stream:
TP + TN + FP + FN = #events
2. There can only be as many TP or FN as there are instances of the marked complex
events and their sum is equal to the number of marked complex events (CE):
TP + FN = #CE
3. Conversely, TN and FP add up to the number of events that are not the marked
complex event, which can also be inferred from 1. and 2.:
TN + FP = #events − #CE
4. By comparing the resulting indices of the rule with the original indices of the marked
complex event, CepGP can infer TP, FP and FN directly (see figure 5.29):
• TP are the correct indices
• FP are the indices that are in the result of the rule but not in the set of indices
of the marked complex event
• FN are the indices that are in the set of indices of the marked complex event but
not in the result of the rule
TN can be calculated from 1. as: TN = #events − TP − FP − FN
Figure 5.29: Relation between TP, FP, FN and TN; The outer circle represents the total
number of events, the yellow circle the indices as the result of the rule and the
red circle the original indices of the marked complex event; any index not in the
yellow or red circle is a member of TN; any index in both, the yellow and the
red circle, is a member of TP; any index in the original indices but not in the
rule result is a FN; any index in the rule result but not in the original indices is
a FP
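A sketch of how the four numbers can be derived from the two index sets with plain set arithmetic; the function and parameter names are hypothetical:

def classification_counts(rule_indices, ce_indices, total_events):
    rule_set, ce_set = set(rule_indices), set(ce_indices)
    tp = len(rule_set & ce_set)          # fired at the right positions
    fp = len(rule_set - ce_set)          # fired where no complex event was marked
    fn = len(ce_set - rule_set)          # marked complex events that were missed
    tn = total_events - tp - fp - fn     # everything else was correctly ignored
    return tp, tn, fp, fn

# Example: the rule fired at 10, 50 and 80; the marked complex event sits at 10 and 90.
print(classification_counts({10, 50, 80}, {10, 90}, total_events=500))   # (1, 496, 2, 1)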
Since there usually are only a few marked complex events in the event stream compared to
the overall number of events, the difference #events − #CE is extremely large. Therefore, it is important to value both measures equally, without biasing towards one of them: the number of times the rule correctly fires at positions where there is a marked complex event (TP) and the number of times it correctly does not fire at places where there is no marked complex event (TN). Otherwise, CepGP would rank rules that cover all of the right positions but also many wrong ones higher than rules which have far fewer FP but do not cover all of the right positions.
There are a lot of measures based on the presented TP, TN, FP and FN. But not all of them are unbiased, and they are therefore not all ideal for CepGP. The most prominent measures are (see [41]):
• The True Positive Rate (TPR), also called Recall or Sensitivity, is the quotient of
the identified right places and the overall number of marked complex events.
TPR = TP / (TP + FN)
• The Precision, also called Confidence, determines the part of the identified positives
that are correct.
Precision = TP / (TP + FP)
• The True Negative Rate (TNR), also called Inverse Recall or Specificity, is the
number of correctly identified negatives in relation to the total number of events
which are not marked complex events.
TNR = TN / (TN + FP)
• The False Positive Rate (FPR), also called Fallout, is the rate of events that got
mistakenly classified as positive.
FPR = FP / (FP + TN)
Although these measures are widely used, they focus either on the positive or the negative
classification while ignoring the other classification. This bias is disadvantageous because
the quality of the outcome of the algorithms using these measures depends on the sizes of
the classes.[41] However, there are other measures that combine both classifications (see
[41]):
• The Accuracy compares the number of correctly classified events with the overall
number of events. In contrast to the aforementioned measures, the accuracy takes
both, positives and negatives, into account but it is sensitive to bias and prevalence of
one of them. This is problematic since the negatives are usually prevalent in the event
stream.
Accuracy = (TP + TN) / (TP + FP + TN + FN)
• The F1-score does not take TN into account, although it is in most cases prevalent
over the other three values for fit individuals and it is small for unfit ones. Since it
shows such an important characteristic, TN should not be ignored in CepGP.
F1 = TP / (TP + (FP + FN) / 2)
• Jaccard, also called Tanimoto, is biased because TN is ignored as well.
Jaccard = TP / (TP + FP + FN) = F1 / (2 − F1)
• The Weighted Relative Accuracy (WRacc) is an unbiased measure and therefore
suitable for CepGP. Its function value range is [−1, 1], where individuals without a single correct classification get a condition fitness of −1, whereas average individuals have a fitness of 0 and individuals which classify every event correctly deserve a fitness of 1. The following equation ignores the optional weight.
WRacc = TPR − FPR = TP / (TP + FN) − FP / (FP + TN)
CepGP uses the unbiased Informedness[41] measure to blend the four values for True
Positives, True Negatives, False Positives and False Negatives together into the interval
[−1, 1] where -1 means that the rule categorized each event in the event stream falsely
either into the positive or negative class. A value of 0 equals the fitness of the average
random rule and a value of 1 equals the optimal result where each event in the event stream
is put into the correct class by the rule.
In the initial population, the average informedness is expected to be around 0. In the generations to come, the average informedness is expected to rise because the fitter individuals
prevail and contribute to new individuals while supplanting unfit individuals more and more,
leading to even fitter individuals overall. The minimum fitness of a population cannot be predicted because it is very easy to create individuals which miss the optimum by far, through both crossover and mutation.
a(x) = Informedness = TPR + TNR − 1 = TP / (TP + FN) + TN / (TN + FP) − 1
“Informedness quantifies how informed a predictor is for the specified condition,
and specifies the probability that a prediction is informed in relation to the
condition (versus chance).”[41]
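Building on such counts, the condition fitness can be sketched as follows; the guards against empty classes are an assumption, not part of the thesis:

def informedness(tp, tn, fp, fn):
    # a(x) = TPR + TNR - 1, within [-1, 1]; 0 corresponds to an average random rule.
    tpr = tp / (tp + fn) if tp + fn > 0 else 0.0
    tnr = tn / (tn + fp) if tn + fp > 0 else 0.0
    return tpr + tnr - 1.0

print(informedness(tp=1, tn=496, fp=2, fn=1))   # roughly 0.496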
Figure 5.30 on the facing page explains how ROC analysis works. It compares the True
Positive Rate (TPR) with the False Positive Rate (FPR). The diagonal represents random individuals, since they are as often correct as they are mistaken about the classification of the events in the stream. Any individual with a higher TPR than FPR is a good one, since it is better than the random individuals. Individuals with a higher FPR than TPR are considered bad because they are worse than the average random individual. The goal for any individual is to maximize the area under the curve.[41] Both WRacc and Informedness are measures that use this principle in their own way.
Figure 5.30: Illustration of ROC Analysis (adapted from [41]); individuals try to maximize their True Positive Rate (TPR) while minimizing their False Positive Rate (FPR) at the same time and in this way maximize the area under the curve, colored in yellow for the good individual; good individuals have a higher TPR than FPR; every individual with a higher FPR than TPR is worse than the average random rule
5.5.2 Window
The condition fitness outweighs the window fitness by far even though they are tightly
interlinked. A fit window
1. covers as many events as needed for the rule to potentially fire at the right places
2. is as small in size as possible
To put it simply: the window should cover as many events as necessary but no more. The quality of a window as part of the rule according to the first point is already integrated into the condition fitness. The specific window fitness concerns itself with the second point.
A rule has a high window fitness when the window is very small. This quality objective arises
from the way Complex Event Processing works. A bigger window means that more events
have to be cached to allow the rule engine to investigate them in the future during rule
evaluation.
• More events need more memory
• More events need more processing power and time for rule evaluation
That is the reason why the window is also a valuable optimization objective in CepGP. To more strongly emphasize really small windows over large ones, CepGP uses a logarithmic fitness function instead of a linear one, as depicted in figure 5.31.
Figure 5.31: Idea of the window fitness function: b(x) = 1 − log_{1 + max. Size − min. Size}(1 + x − min. Size)
The general idea of the fitness function is of the form 1 − log(x). It calculates a penalty
value between 0 and 1 which grows logarithmically with the size of the window and subtracts
this value from 1. Doing so yields the best fitness for minimally sized windows and the worst fitness values for windows near the maximum size. To optimally use the domain of window range values, CepGP stretches the function to be exactly 1 only for the minimally sized window and exactly 0 only for the maximally sized window. This allows CepGP to best differentiate between the fitness of individuals.
This fitness function would work well if the fitnesses of the window types were not interrelated. However, there is a problem in the stretching of the function values over the window
value range. Since there are two types of windows, length and time, each with a different value range, it is problematic to calculate the fitness of one of the types without potentially discriminating against the other. For example, there can be a small number of events whose timestamps lead to a wide time window value range. Assume the events are equidistant in time, which means that a length window with value 2 should have the same fitness as a time window whose value is the time distance between two events. Calculating the fitness with the function whose domain is the interval between the minimum and maximum values of the type at hand alone will yield a different result than what is expected. The time windows which should have the same fitness are less
fit than the equivalent length window because the function values are stretched according
to the value ranges of the types to map the possible window values to the wanted function
values of [0, 1] in an optimal way. This problem is pictured in figure 5.32 on the facing page.
One solution to this problem is to first linearly convert the value of the window at hand into
Figure 5.32: The window problem using events equidistant in time; the fitness of a time window is lower than the fitness of the corresponding length window even though they should have an equal fitness
a common range for all windows. After that, the function can be applied to the common
range with the converted window value. This way obtains the wanted result where windows which are equivalent in meaning are assigned the same fitness value. The common range for all windows in CepGP is always the length window range because it usually already provides a reasonable range starting at 1; in this way only the time windows need to be converted, which can reduce the number of conversions by half on average.
The linear conversion function into the range of the window of type length is
l(x) = (x − β) / (ψ − β) · (ω − α) + α
where
• x is the actual window value
• β is the minimal size of the original type
• ψ is the maximal size of the original type
• ω is the maximal size of the length type
• α is the minimal size of the length type.
The window fitness function b applied to the length range [α, ω] with the parameter λ = l(x)
is
b(λ) = 1 − log_{1+ω−α}(1 + λ − α) = 1 − log(1 + λ − α) / log(1 + ω − α)
resulting in figure 5.33.
Figure 5.33: The correct window fitness function which applies the conversion and then calculates the fitness as described; the curves for Length Fitness and Correct Time Fitness overlap completely, which shows that the fitness values are computed correctly
5.5.3 Rule Complexity
After the condition fitness and the window fitness, the rule complexity fitness is the least
important fitness measurement. If there are individuals with almost identical fitness values in
condition and window fitness then the rule complexity fitness separates them by quantifying
the structure of the rule. It follows the principle “the simpler the better”.
The action is not considered in CepGP. From the point of view of CepGP, the window is always a single node, no matter what type it is or what additional information it holds, and therefore does not affect the structure or rule complexity. This fitness therefore rates only the condition tree and uses meta-information about the Event Condition Tree and the Attribute Condition Tree, such as height or number of nodes, to calculate the rule complexity fitness value.
To blend the height and number of nodes together, CepGP uses the higher tree of either the
ACT or the ECT and puts it into relation to the total number of nodes from both trees:
z(x) = (1 + ρ(x)) / τ(x)
where
• ρ is the maximum of the height of the ECT and the height of the ACT of the rule x
and
• τ is the sum of the number of nodes from both, the ECT and the ACT of the rule x.
The result is within the interval (0, 1]. It is 1 when there is only one event type as the ECT
root and no ACT in the rule. It cannot be 0 since the numerator is at least 1. Furthermore,
it is always defined because there is always at least one node in the ECT.
If the same event type in the ECT or the same alias in the ACT is part of unnecessarily many conditions, then the rule is more complex than a rule with basically the same condition and window fitness. CepGP therefore also considers the distinctiveness of nodes in the ECT and the ACT. This measurement describes how often the same event type in the ECT or the same alias in the ACT is present in relation to the overall number of nodes within their respective trees and is calculated as
d(x) = υ(x) / σ(x)
where
• υ is the number of distinct event types or aliases in the ECT or ACT of the rule x and
• σ is the overall number of nodes in the ECT or ACT respectively in the rule x.
d(x) produces values in the interval (0, 1] and is applied separately to both, the ECT and
the ACT. It cannot be 0 for the ECT since there is always at least one event type in the
ECT. It is optimal with value 1 if every event type is used just once in the ECT. The same
goes for the ACT with the addition that, if there is no ACT in the rule, CepGP assigns an
attribute condition tree distinctiveness value of 1.
The overall rule complexity fitness now consists of three single fitnesses: z(x) and d(x) from
the ECT and the ACT. CepGP calculates the rule complexity fitness as the average of these single fitnesses:
c(x) = (z(x) + d_ECT(x) + d_ACT(x)) / 3.
Just like the single fitnesses, the result of this function is within the interval (0, 1]. The
simplest rule consisting just of a single event type in the ECT with no ACT has a rule
complexity fitness of 1. The more nodes and the more repetitions of the same event type or
alias, the lower the rule complexity fitness.
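The following minimal Java sketch recomputes c(x) from the heights, node counts and distinct type or alias counts of the two trees. The names are illustrative, and the convention that a single-node tree has height 0 is an assumption consistent with the optimum described above.

/**
 * Sketch of the rule complexity fitness c(x). The parameters are assumed to be
 * extracted from a rule elsewhere; names and conventions are illustrative only.
 */
final class ComplexityFitnessSketch {

    /** z(x) = (1 + max(ECT height, ACT height)) / (total nodes of ECT and ACT). */
    static double structure(int ectHeight, int actHeight, int totalNodes) {
        return (1.0 + Math.max(ectHeight, actHeight)) / totalNodes;
    }

    /** d(x) = distinct event types or aliases / nodes of the respective tree; 1 if there is no ACT. */
    static double distinctiveness(int distinct, int nodes) {
        return nodes == 0 ? 1.0 : (double) distinct / nodes;
    }

    /** c(x) = average of z(x), d_ECT(x) and d_ACT(x). */
    static double complexityFitness(int ectHeight, int ectNodes, int ectDistinct,
                                    int actHeight, int actNodes, int actDistinct) {
        double z = structure(ectHeight, actHeight, ectNodes + actNodes);
        return (z + distinctiveness(ectDistinct, ectNodes)
                  + distinctiveness(actDistinct, actNodes)) / 3.0;
    }

    public static void main(String[] args) {
        // Simplest rule: a single event type in the ECT, no ACT => c(x) = 1.
        System.out.println(complexityFitness(0, 1, 1, 0, 0, 0)); // 1.0
    }
}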
5.5.4 Total Fitness
The total fitness is the result of the presented condition fitness, window fitness and complexity
fitness. All of them grade one of the optimization objectives for CepGP and need to be
combined into one value which defines the fitness of the whole rule compared to other rules.
CepGP uses a weighted sum to build the total fitness, since the importance of each objective
can easily be determined a priori:
f(x) = α·a(x) + β·b(x) + γ·c(x)
with
• α being the weight for the condition fitness calculated by the function a(x) (see 5.5.1
on page 72)
• β being the weight for the window fitness calculated by the function b(x) (see 5.5.2
on page 76)
• γ being the weight for the complexity fitness calculated by the function c(x) (see 5.5.3
on the preceding page).
The most important partial fitness is the condition fitness. It should be weighted much more heavily than the other two; in this way it contributes much more, and CepGP first and foremost will find suitable conditions for the rules. As discussed earlier in section 5.5.2 on page 73, the window is an integral part of the condition fitness. However, the window fitness grades the size of the window, which is also an objective of the optimization, but it should be weighted much more lightly than the condition fitness. The rule complexity is a minor part of the overall fitness of the rule and, hence, should also take a minor role in it. The weight for the complexity fitness should be chosen so that only rules which are basically identical according to the condition and window fitnesses are affected by that rating.
Ideally, CepGP should only consider the other less important objectives when the rule is
very fit regarding its condition. CepGP accomplishes that by introducing a threshold which
represents the minimum condition fitness value for the window and complexity fitness to
contribute to the total fitness of the rule.
CepGP also normalizes the total fitness back to the interval [0, 1] for the rules with a fitness
value greater than 0 to enable a better interpretation of the results by the user. The
normalization is only done for values greater than 0 because these rules are better than
an average random try and rules with a fitness value less than 0 would benefit from the
normalization even though they should not.
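A minimal sketch of how the weighted sum and the condition threshold could interact is shown below. The weights and the threshold are example values, the condition weight is implicitly 1, and the normalization back to [0, 1] is omitted; the actual CepGP implementation may differ.

/**
 * Sketch of the total fitness: a weighted sum of the three partial fitnesses,
 * where window and complexity only contribute once the condition fitness
 * exceeds a threshold. All numbers below are example values.
 */
final class TotalFitnessSketch {

    static double totalFitness(double condition, double window, double complexity,
                               double windowWeight, double complexityWeight,
                               double conditionThreshold) {
        double total = condition;
        if (condition > conditionThreshold) {
            total += windowWeight * window + complexityWeight * complexity;
        }
        return total;
    }

    public static void main(String[] args) {
        // A rule whose condition fitness exceeds the threshold also gets credit
        // for a small window and a simple structure.
        System.out.println(totalFitness(0.8, 0.7, 0.5, 0.1, 0.001, 0.5));
    }
}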
5.6 Summary
CepGP is a type-safe Genetic Programming algorithm which needs only minimal manual information and a recorded event stream to derive the Complex Event Processing rule that comes closest to producing the wanted marked complex event representing the happenings of interest.
In a preparation phase which pre-processes the historical event stream, meta-information is inferred that helps CepGP to identify the problem space. This also allows CepGP to produce only valid CEP rules that are evaluable as well.
A tree-based representation was chosen for the rules, which enables Genetic Programming to use well-known practices for its evolutionary operations crossover and mutation. The tree structure also separates the components from one another at each level, providing the algorithm with means to ensure type-safety. CepGP ignores the action part and focuses
on the condition, which is divided into the subtree for event conditions and for attribute
conditions, and the window that can take the form of a length window or a time window.
During the population initialization, a specific number of rules, called individuals, is produced
by
• choosing a random but valid window, either of type length or time and with a value
within the respective boundaries
• creating an Event Condition Tree (ECT) with the ramped half-and-half algorithm and
the information gathered during the preparation phase to produce only valid trees
• optionally creating an Attribute Condition Tree (ACT) with the grow algorithm and
the information available from the preparation phase to attain only valid trees
The algorithm proceeds with assessing the fitness of every individual by calculating three sub-fitness values for the condition, the window and the structure of the rule. The sub-fitness values are combined into one value while emphasizing the importance of the condition fitness over every other one. The window size is also factored in to distinguish fit rules according to their expected resource consumption. If rules are almost identically fit considering the condition and the window, then CepGP attempts to favor the simpler rule over the more complex one.
After the initialization, CepGP continues by applying the evolutionary operations crossover and mutation to each generation of individuals for a given number of generations. Crossover uses
a selection algorithm to draw a random individual from the population while considering its
fitness so that the fit individuals prevail while unfit rules become extinct. With a certain
chance, CepGP chooses another individual in the same way and mates them both to produce
an offspring for the following generation. During the mating process, CepGP selects a random
node of the first individual from the set of changeable nodes of the rule tree which are the
window as one node and all the nodes of the ECT and the ACT. The crossing point of
the second individual is now drawn from the same component as the first crossing point to
ensure type-safety and valid individuals as results of crossover operations.
After producing the new generation with crossover, CepGP uses elitism that transfers the
absolute best of the previous generation to the next generation. The remaining number
of individuals for the next generation is filled up with the individuals of the newly formed
generation from crossover in order of their creation. Then, CepGP applies mutation to the
population by inspecting each individual and altering them by a given chance. This alteration
is rare compared to the crossover and affects only a minimal part of the rule to abide by the
original idea of the mutation process.
If the resulting individual of either crossover or mutation is not valid, CepGP attempts to
repair the rule with as few changes as possible to allow each rule to contribute to the
optimization process.
This algorithm's strong points are its type-safety and the combination of Complex Event Processing and Genetic Programming: it encompasses many features of CEP rules and translates them into the problem domain of Genetic Programming in order to derive an optimal rule that comes closest to producing the marked complex event from the given recorded event stream. However, there are still language features in CEP which are not covered in CepGP yet. Table 5.2 compares the language specification
covered in CepGP yet. Table 5.2 on the facing page compares the language specification
of CEP and the supported constructs in CepGP. It remains a future task to introduce the
missing features into CepGP. In section 6.5 on page 98 a few ideas are presented to give
hints into possible directions to add arithmetical operations and aggregation functions to the
CepGP algorithm.
Construct                               Supported
Event Condition Components
  Sequence (→)                          ✓
  And (∧)                               ✓
  Or (∨)                                ✓
  Not (¬)                               ✓
  Excluding sequence (A → ¬B → C)       ✓
Attribute Condition Components
  Logical Operators
    And                                 ✓
    Or                                  ✓
    Not                                 ✓
  Referencing
    Alias                               ✓
    Access-operator (.)                 ✓
  Arithmetical Operations
    +, −, /, ∗                          ×
  Comparison Operators
    <, >, =                             ✓
    ≤, ≥, ≠                             ✓
  Aggregation Functions
    sum                                 ×
    avg                                 ×
    min, max                            ×

Table 5.2: Support of Language Constructs in CepGP
6 Implementation
To verify the conceptual algorithm presented in chapter 5 on page 35, CepGP was implemented in Java during this thesis as a proof of concept. The goal was to implement the system independently from any existing platform or framework, with the fewest dependencies possible, to prove the potential of CepGP.
This chapter begins with the requirements emerging from the concept, proceeds with the
input and output specification, describes the programmed rule engine and the implementation
of CepGP. Afterwards, the parameters are presented and the limitations of the implementation
are described. The chapter concludes with a summary.
6.1 Requirements
For the sake of completeness and to verify the approach presented before, the implementation
should use the process depicted in figure 5.1 on page 35:
• Reading and parsing the events recorded in a file (preparation phase)
• Building the initial population according to the presented algorithm
• Using elitism and tournament selection during evolution
• Implementing the crossover and mutation operations as described
Additionally, the fitness functions need to be implemented to quantify the fitnesses for the
three objectives for the condition, the window and the complexity. A very important property
of the algorithm is the type-safeness in operations and components of the individual which
the implementation needs to take into account as well.
The most important goal of the implementation is to verify the algorithm, identify strengths and drawbacks, and enable a well-founded evaluation of the approach by providing flexibility in the choice of the algorithms used for selection, crossover, mutation, population building and so on. It should also draw strength from Genetic Programming to allow promising results with only a few manually set options, in this case only the input file and the complex event for which the program shall derive a rule.
6.2 Input and Output Specification
The algorithm starts by reading and parsing the events from the given input file which
includes the complex event for which the algorithm shall derive a rule. During the search
process, the user is supplied with outputs indicating its status. At the end of the program run, outputs are generated to provide further insight into the work done by the program.
6.2.1 Input
Events in the input file are organized in one line each and have the following structure:
<yyyy-MM-dd HH:mm:ss>; <event type>; [attributes in the form <name:
value> and each attribute separated by semicolon]
• <yyyy-MM-dd HH:mm:ss>: An event was recorded on a day and time.
• <event type>: An event always has a type.
• [attributes in the form <name: value> and each attribute separated by
semicolon]: Optionally, there are also attributes (with a name mapped to a numerical
value) separated by a semicolon. The name and the value of an attribute are separated
by a colon.
However, each event of the same type has to have exactly the same number of attributes with the same names; the order of the attributes does not matter.
An example of an event recorded at 10:55:30 AM on the 29th of June 2016, of type “B”
with the attributes: “b2” with value -7.0, “b3” with value -74.0, “ID” with value 2.0 and
“b1” with value 19.0:
2016-06-29 10:55:30; B; b2: -7.0; b3: -74.0; ID: 2.0; b1: 19.0
As mentioned before, the marked or special complex events which are the targets of the rule
only need a unique, but among these special events common, type name without any other
information like date and time or attributes. This enables a simpler usage of the program
and the domain expert has an easier time when she needs to add these events manually into
the captured stream.
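For illustration, the following self-contained Java snippet parses one event line of the format just described. The class name and the chosen data structures are illustrative, not the classes of the actual implementation.

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for one event line of the described input format. */
public class EventLineParserSketch {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    public static void main(String[] args) {
        String line = "2016-06-29 10:55:30; B; b2: -7.0; b3: -74.0; ID: 2.0; b1: 19.0";
        String[] parts = line.split(";");

        LocalDateTime timestamp = LocalDateTime.parse(parts[0].trim(), FORMAT);
        String eventType = parts[1].trim();

        Map<String, Double> attributes = new LinkedHashMap<>();
        for (int i = 2; i < parts.length; i++) {           // optional "name: value" pairs
            String[] nameValue = parts[i].split(":");
            attributes.put(nameValue[0].trim(), Double.parseDouble(nameValue[1].trim()));
        }

        System.out.println(timestamp + " " + eventType + " " + attributes);
    }
}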
6.2.2 Output
The program generates two output files and an output on the prompt:
• Prompt output: Before the actual execution of the algorithm, it will show a summary
of the parameter settings it uses for this run. During the run, the program will display
the generation number that is currently processed. After the run, the program will
display the best found rule in the prompt including the overall fitness and the partial
fitnesses of condition, window, and complexity.
• generations <date and time of file creation>: It contains the population of
every generation of the evolutions of the Genetic Programming algorithm. The number
of the generation is in one line and the following lines represent the individuals of the
generation in descending order of their overall fitness. Each individual also lists the
partial fitnesses for condition, window and complexity plus the rule representation as
described in section 2.2.2 on page 13. An excerpt of an example file looks like this
(generation 0 indicates the initial population):
0
0.78949 (condition: 0.80000 window: 0.68731 complexity:
((A as A0 → A as A1) ∧ ((A0.a2 = A1.ID) ∨ (A1.a1 >
A0.a2)))[win:time:986Seconds] =⇒ HIT
C
C
0.50000)
0.67870 (condition: 0.68889 window: 0.57745 complexity: 0.61111)
((A as A0 → A as A1) ∧ (A0.ID > 1.0))[win:length:7] =⇒ HIT
...
C
• GPA GENERATIONS <date and time of file creation>: This file contains a summary of all the generations evolved for the specific run of the program. It provides
information about the best and worst individual, plus average and mean condition
fitness of the generation.
Generation: 0
best individual: 0.78949 (condition: 0.80000 window: 0.68731 complexity: 0.50000)
((A as A0 → A as A1) ∧ ((A0.a2 = A1.ID) ∨ (A1.a1 > A0.a2)))[win:time:986Seconds] =⇒ HIT
worst individual: -0.53000 (condition: -0.53000 window: 0.10734 complexity: 0.50000)
((B as B0 ∨ A as A0) ∧ ((A0.ID > 1.75) ∧ (A0.a1 < -49.25)))[win:length:61] =⇒ HIT
mean conditionFitness: 0.00000, avg conditionFitness: 0.02029
...
6.3 The Rule Engine
To enable an independent implementation, this work uses its own rule engine to evaluate
the condition of the rules by also building on top of the tree representation of the individual
for the Genetic Programming algorithm. This allows the use of the same representation in the rule engine and the Genetic Programming implementation, which obviates the need for a
conversion between genotype (rule representation in Genetic Programming) and phenotype
(rule representation in the rule engine). The other approach would have been to build a
conversion between the genotype and the phenotype which converts the rule to suit the
representation for the external CEP-system or vice-versa into the representation used in
the Genetic Programming algorithm. The result from the evaluation of the rule on the
training data via the CEP-system would need to be converted as the tuple (TP, TN, FP,
FN) as a feedback to the Genetic Programming implementation so the Genetic Programming
implementation can calculate the condition fitness. The chosen approach is superior in the
sense that it does not need this conversion procedure at all, whereas the other approach
would need it for every rule evaluation on the test data. The use of an external CEP-system would, in most cases, also require reading and parsing the training data for every rule evaluation, whereas the chosen approach does this only once in the preparation phase, which is another benefit.
However, the rule engine still needs to be implemented, too. It needs to be robust, so that
every possible rule can be evaluated, no matter its complexity. Every edge-case needs to
be carefully considered and handled because especially with randomly generated and probabilistically combined rules, there is a high chance that the outcome may contain confusing
conditions for the human eye that also might trouble the rule engine.
Since the genotype and the phenotype are the same, the same tree can be passed from the
Genetic Program to the rule engine for determining the hits and misses and the result can be
fed back to the Genetic Program to calculate the fitness. The general process is illustrated
in figure 6.1 on the next page.
As already mentioned in section 5.2 on page 43, the preparation phase derives meta-information needed for the evolutionary process, but also the original indices of the special
event for which the algorithm shall find a rule. Furthermore, the preparation phase prepares
a version of the event stream from the input without the special event. This version is used
to evaluate the individuals of the Genetic Programming algorithm.
The Genetic Programming algorithm creates new rules during the evolutionary process. To
evaluate the condition fitness of the rule, CepGP hands this rule over to the rule engine.
The rule engine executes the rule on the training data (the event stream without the special
event) as follows:
1. Use the events from the input file (read and parsed once during the preparation phase)
but without the special event for which the algorithm shall find a rule
2. Start with the first event from the input and execute the rule
Figure 6.1: Process of the Rule Evaluation; The first step is the extraction of the indices
of the special event and the event stream without this special event during the
preparation phase; The Genetic Programming algorithm creates new individuals
during the evolutions and to evaluate the condition fitness, it passes the rule to
the rule engine; the rule engine executes the rule on the event stream which does
not contain the special event and remembers the indices where the rule would
have inserted a complex event; after the execution of the rule, the rule engine
compares the indices as the result of the rule execution with the original indices
of the special event; The outcome of this comparison is the tuple of the four
key numbers for True Positive (TP), True Negative (TN), False Positive (FP)
and False Negative (FN) which are handed over to the Genetic Programming
algorithm again; The Genetic Programming algorithm uses the condition fitness
function to calculate the condition fitness out of these four numbers
3. Iteratively, add the next events in order, one after the other, and execute the rule every
time the next event is added after pruning older events according to the window of the
rule.
4. Remember the indices of the places after the event that leads to the firing of the rule:
index = 1 + index of event + #already encountered positives
Or in other words: remember the places where the rule would have inserted a complex
event.
5. After the last event from the input was added and the rule was executed, compare
the remembered indices of this rule with the original indices of the special event in
the input and compute the True Positives, True Negatives, False Positives and False
Negatives.
6. Hand over these four key numbers to CepGP, the implementation of the Genetic
Programming algorithm
Afterwards, CepGP calculates the condition fitness from these four key numbers according to
the condition fitness function used in CepGP (Informedness, see section 5.5.1 on page 69).
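The evaluation loop just described could be sketched as follows. Rule and Event are placeholder interfaces standing in for the actual CepGP classes, and the window pruning is only indicated by a comment.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch of the rule evaluation loop and the comparison of the remembered
 * indices with the original indices of the special event.
 */
class RuleEvaluationSketch {

    interface Event { }

    interface Rule {
        /** True if the rule fires on the events currently inside its window. */
        boolean fires(List<Event> window);
    }

    /** Executes the rule on the stream (without the special event) and returns
     *  the indices where the rule would have inserted a complex event. */
    static Set<Integer> execute(Rule rule, List<Event> stream) {
        Set<Integer> firedIndices = new TreeSet<>();
        List<Event> window = new ArrayList<>();
        for (int i = 0; i < stream.size(); i++) {
            window.add(stream.get(i));
            // pruning of events that fall out of the length/time window omitted here
            if (rule.fires(window)) {
                // index = 1 + index of event + number of already encountered positives
                firedIndices.add(1 + i + firedIndices.size());
            }
        }
        return firedIndices;
    }

    /** Compares fired indices with the original indices of the special event. */
    static int[] confusion(Set<Integer> fired, Set<Integer> original, int totalPositions) {
        int tp = 0, fp = 0;
        for (int index : fired) {
            if (original.contains(index)) tp++; else fp++;
        }
        int fn = original.size() - tp;
        int tn = totalPositions - tp - fp - fn;
        return new int[] { tp, tn, fp, fn };
    }
}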
The rule engine has to implement and support the language specification presented in section 2.2.2 on page 13 to determine when a rule fires. Table 6.1 on the next page compares
this language specification with the implemented constructs.
The most difficult part of the rule engine is the close coupling of the Event Condition Tree
(ECT) and the Attribute Condition Tree (ACT). During the execution of a rule, the ECT
is applied to the events within the window first. The ACT, if present, is only applied if the
ECT fires. To enable the ACT to compute its result, the aliases need to be built to offer references to actual event instances whose attribute values are accessed and compared in the ACT. These aliases are built during the application of the ECT on the events. A
map, with the alias being the key and the object-reference to the event instance being the
value, is passed down with each recursive invocation of the operands of each operator in the
ECT. If the current node is an event with the expected type according to the rule, then it
adds itself under the correct alias to the map. After a successful execution of the ECT, this
map is handed over to the ACT which also is executed recursively. If the ACT is evaluated
successfully, too, then the rule fires and the rule engine saves this occurrence as an index as
previously described in the figure 6.1 on the facing page.
With this approach the sequence(→)- and the and(∧)-operator can be implemented as
expected because if these ECT-operators fire, then the aliases are also definitely updated
and refer to an event instance in the map. The or(∨)-operator, however, can fire even if not
every operand evaluates to true. This means that not every alias is set and, thus, the ACT may not be evaluable. The rule engine remedies this problem by following these steps in the evaluation of the ACT-comparison-operators (<, >, and =; potentially also ≤, ≥, and ≠):
Construct                               Supported
Event Condition Components
  Sequence (→)                          ✓
  And (∧)                               ✓
  Or (∨)                                ✓
  Not (¬)                               ×
  Excluding sequence (A → ¬B → C)       ×
Attribute Condition Components
  Logical Operators
    And                                 ✓
    Or                                  ✓
    Not                                 ✓
  Referencing
    Alias                               ✓
    Access-operator (.)                 ✓
  Arithmetical Operations
    +, −, /, ∗                          ×
  Comparison Operators
    <, >, =                             ✓
    ≤, ≥, ≠                             ×
  Aggregation Functions
    sum                                 ×
    avg                                 ×
    min, max                            ×

Table 6.1: Support of the language constructs in the rule engine
• Evaluate normally when both operands are dereferencable, meaning they are either
an actual event attribute or a constant value.
• Ignore when one operand is not dereferencable and the other is also not dereferencable
or a constant. This means, that the outcome of the ACT-evaluation does not depend
on the result of this ACT-operator.
• Evaluate to false when one operand is dereferencable and the other is not because
this comparison cannot be evaluated. Since one operand is dereferencable, the user
would expect that the evaluation of the ACT also depends on this attribute. But when
the other operand of the operation cannot be determined, then the outcome has to be false.
1st Operand \ 2nd Operand   Dereferencable       Not dereferencable   Constant
Dereferencable              Evaluate normally    False                Evaluate normally
Not dereferencable          False                Ignore               Ignore

Table 6.2: Decision Matrix for ACT-Comparison-Operator Evaluation; only the second operand can ever be a constant value according to the presented algorithm
Table 6.2 summarizes the decision matrix the rule engine uses. The consequences of ignoring
the evaluation of a comparison-operator in the ACT depend on the logical ACT-operator’s
intention.
• The ∧-operator returns true as long as there is no operand evaluating to false. Comparison operators that evaluated to ignore are therefore treated like true results when
compared against a non-ignore (true or false) input value (see table 6.3).
Input              Output
True    True       True
True    False      False
True    Ignore     True
False   True       False
False   False      False
False   Ignore     False
Ignore  True       True
Ignore  False      False
Ignore  Ignore     Ignore

Table 6.3: Decision Matrix for ∧-Operator
• The ∨-operator returns an ignore if both operands also return an ignore. Otherwise,
it evaluates to true as long as one operand is true and false if both operands are false
(see table 6.4 on the next page).
Input              Output
True    True       True
True    False      True
True    Ignore     True
False   True       True
False   False      False
False   Ignore     False
Ignore  True       True
Ignore  False      False
Ignore  Ignore     Ignore

Table 6.4: Decision Matrix for ∨-Operator
• The ¬-operator also returns an ignore if the operand evaluates to ignore. Else, it
negates the result of the operand (see table 6.5).
Input     Output
True      False
False     True
Ignore    Ignore

Table 6.5: Decision Matrix for ¬-Operator
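The three decision matrices describe a small three-valued logic. The following Java enum is an illustrative sketch of that logic; it is not the actual rule engine code, but it reproduces the behavior of tables 6.3 to 6.5.

/**
 * Sketch of the three-valued logic from tables 6.3 to 6.5 as a Java enum.
 * Names are illustrative, not taken from the rule engine implementation.
 */
enum TriState {
    TRUE, FALSE, IGNORE;

    /** ∧: IGNORE behaves like TRUE against a non-IGNORE operand. */
    TriState and(TriState other) {
        if (this == IGNORE && other == IGNORE) return IGNORE;
        TriState a = (this == IGNORE) ? TRUE : this;
        TriState b = (other == IGNORE) ? TRUE : other;
        return (a == TRUE && b == TRUE) ? TRUE : FALSE;
    }

    /** ∨: IGNORE behaves like FALSE against a non-IGNORE operand. */
    TriState or(TriState other) {
        if (this == IGNORE && other == IGNORE) return IGNORE;
        TriState a = (this == IGNORE) ? FALSE : this;
        TriState b = (other == IGNORE) ? FALSE : other;
        return (a == TRUE || b == TRUE) ? TRUE : FALSE;
    }

    /** ¬: IGNORE stays IGNORE, otherwise the result is negated. */
    TriState not() {
        if (this == IGNORE) return IGNORE;
        return (this == TRUE) ? FALSE : TRUE;
    }
}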
The missing ACT-comparison-operators are not implemented due to time constraints of
the thesis. However, they can already be expressed as a combination of the existing comparison operators, although this is unlikely to happen by chance, because the attributes would need to be the same and in the same order in two correctly combined comparison operators. For example,
≤ can be simulated by combining the < and the = operators with a ∨ logical operator,
where both < and = have to have the same attributes in the same order. Another possible
way would be to negate a > operator to achieve a ≤ operator with only one operator instead
of two. Nevertheless, it is advisable to implement these missing operators and by doing
so enable the Genetic Programming algorithm to create more complex relations between
attributes more easily.
Coming back to the Event Condition Components and the currently unsupported operators
with negation: ¬ and the excluding sequence. In combination with the ACT, the negation
in the ECT can be used to further specify the requirements for an event of a certain type
to not be allowed. For example, (¬(A as A0)) ∧ (A0.a = 0) defines an ECT which fires
whenever an event of a different type than A is encountered and an ACT that specifies
that the overall rule should also be true when the event is of type A but its attribute a
is not equal to 0. This evaluation of the ACT currently would not take place because the
ECT was already evaluated to false if an A was encountered. The problem lies within the
iterative evaluation of the ECT and the ACT. Without the ACT the negation works like
intended. As soon as the ACT uses aliases to events under a subtree with a negation, the
ECT would need to postpone its ultimate evaluation result and first pass the alias-map to
the ACT for further investigation. If the referenced events under negation subtrees fulfill the
requirements within the ACT than the rule should not fire. It gets even more complicated
because the implementation needs to consider a tree with multiple negations in its subtrees.
This contradiction is a remaining problem in the current implementation of the rule engine
and could not be solved due to time constraints. One possible solution could be to evaluate
the ACT and the ECT synchronously instead of iteratively. Whenever, during the evaluation
of the ECT, an operator of the ACT can be resolved, it should do so and remember these
partial solutions. In this case, the evaluation of the ECT already includes the results of the
ACT. However, this is a complex approach which needs further research.
Arithmetical operators are currently not implemented in the rule engine and so far are also
not part of the conceptual work of the CepGP Genetic Programming algorithm. This remains
a field for future research and could be implemented as a new leaf node type for the ACT.
This adds more complexity to the type-safe property of the proposed algorithm and needs
a thorough understanding of the evolutionary processes involved and the evaluation of rules
with these components. Crossover and mutation would also need to be adapted to meet the
type-safe constraint.
The aggregation functions are neither considered in the CepGP Genetic Programming algorithm nor in the rule engine implementation, due to time constraints. One idea to integrate them into the algorithm is to add a new leaf node to the ECT which represents the aggregation function. This is a new type and therefore needs to be integrated into the type-system as well, which is a challenging task. Every time the type-system is altered, the operations based on it, like the evolutionary operations, need to be adapted as well.
6.4 CepGP
This section starts by describing the structure of the implementation of the algorithm presented in chapter 5 on page 35. Afterwards, this section explains the implementation of
the preparation phase, the population initialization and the evolutionary operators selection,
crossover and mutation. It then proceeds with the evaluation procedure and concludes with a
parameter description and states the limitations of the current version and their remediation
in the summary.
The composition of the CepGP program, including the rule engine (cep-package), is depicted
in figure 6.2 on the following page. The util-package includes modules and classes that
are used to encapsulate convenience features, mainly for data parsing, initialization, traversing and
altering the trees for events and attributes and classes to handle the meta-information from
the preparation phase. The gp-package contains the classes and packages needed for the
Genetic Programming algorithm. The components and the shown interrelations are explained
during the following sections.
Figure 6.2: CepGP Class Diagram; showing the most important packages, their interrelation
and the most important classes where for the main class CepGP and the class
GeneticProgrammingAlgorithm the methods are also displayed
6.4.1 Preparation Phase
The starting point of the program is the class CepGP and its main-method. This method
controls the program flow and initializes the parameters which are presented in section 6.4.5
on page 96. It proceeds with the preparation phase and uses the EventDataParser to
read and parse the input file while also extracting the meta-information. The result of
this step is an instance of the EventHandler which manages the extracted information.
Another product of the preparation phase is a WindowBuilder-instance which contains the
extracted information about the windows from the input file to generate valid windows during
population initialization and mutation.
Afterwards, it creates an instance of the class GeneticProgrammingAlgorithm with the
needed parameters:
• EventHandler contains information about the parsed events from the input file like
the used event types, the attributes, number of events, boundaries for windows and
attributes and so on. As described several times before, this is much-needed information for the evolutionary processes and the population initialization.
• ConditionFitnessFunction, WindowFitnessFunction and
ComplexityFitnessFunction are instances of the fitness functions which quantify
the fitness of the individuals (rules) according to their respective responsibilities.
• Crossover is an instance of the Crossover-interface in the crossover-package. As
described in section 5.4.2 on page 51, CepGP uses subtree crossover.
• PointMutation is an instance of the Mutation-interface in the mutation-package
that implements the point mutation as described in section 5.4.3 on page 63.
• elitismRate is a value of type double between [0, 1] describing the portion of the population that will survive according to their overall fitness. For example, an elitismRate
of 0.1 means that the best 10% of each generation definitely survive into the following
generation.
attributeConditionTreeRate indicates, as a double value in [0, 1], the portion of rules which are initially generated with an ACT. This only affects the initial
population. Whether the following generations also include an ACT is up to the evolutionary process to decide. If the individuals with ACTs come out to be fitter then
ACTs will eventually find their way into more rules.
• maxAttributeConditionTreeHeight defines the maximum height of the ACTs in
the initial population. Again, whether higher or smaller ACTs are better for the ultimate
rule is decided according to the fitness.
6.4.2 Population Initialization
After the instantiation of the GeneticProgrammingAlgorithm, the program starts the initialization process via the buildInitialPopulation-method of the
GeneticProgrammingAlgorithm to randomly generate the first population of the Genetic
Programming algorithm while producing only valid individuals. This method requires an instance of the PopulationInitializer-interface which uses instances of the RuleBuilder
abstract class (for full, grow or half-and-half initialization), the size of the population and
the maximum ECT height for the initial population. As described in 5.3 on page 44, CepGP
uses the ramped half-and-half initialization method. While doing so, it ensures that
• the ECT height does not exceed the specified maximum.
• ACTs are added to individuals according to the probability specified by the
attributeConditionTreeRate.
• and that the ACT height also does not exceed the specified maximum while being
built via the grow-method.
The initialization of the windows is done with the help of the WindowBuilder-instance.
After this process of generating the first population, the fitness of each individual is measured
according to section 6.4.4 on the next page and the population is sorted by the fitness in
descending order.
6.4.3 Evolutionary Process
Following the preparation phase and the population initialization, the program executes the
loop of evolutionary operations until the number of maximum generations is reached. Each
run of the loop is done by the evolve-method of the GeneticProgrammingAlgorithminstance which follows these steps:
1. Calculate the number of elites that survive this generation into the next one.
2. Build a new generation via the Crossover-instance.
3. Copy the elites of the previous into the next generation.
4. Fill the remaining individuals from the new generation created by the crossover-process
in order of their creation.
5. Execute the mutation operation of the PointMutation-instance on the newly built
generation.
6. Evaluate the fitness of the individuals.
7. Sort the population according to the fitness of each individual in descending order.
The crossover-operation is done as described in 5.4.2 on page 51:
1. A TournamentSelection-instance from the selection-package determines the participating individual (a minimal sketch of such a tournament selection is given at the end of this subsection).
2. With a given probability (crossover rate), the individual undergoes crossover.
3. The ConditionTreeTraverser helps to select the crossover points and to insert the
subtree of the copy of the second into the copy of the first selected individual. The
crossover point is identified following the proposed process in 5.4.2 on page 51.
4. The AttributeConditionTreeTraverser helps to repair the ACT after the crossover operation according to section 5.4.2 on page 60.
The mutation-operation follows the steps described in 5.4.3 on page 63:
1. Iterate over the population and decide for each individual whether mutation takes place
according to the given mutation rate.
2. If the individual is chosen to do so, mutate by means of the PointMutation-instance
which implements the proposed point mutation algorithm.
3. Repair the ACT after the mutation with the help of the
AttributeConditionTreeTraverser as described in section 5.4.2 on page 60.
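The tournament selection referenced in the crossover steps above could look like the following minimal sketch with a tournament size of 2. The Individual interface and the fitness accessor are placeholders for the actual CepGP classes.

import java.util.Random;

/** Sketch of a tournament selection with tournament size 2; names are illustrative. */
class TournamentSelectionSketch {

    interface Individual {
        double totalFitness();
    }

    private final Random random = new Random();

    /** Draws two random individuals and returns the fitter one. */
    Individual select(Individual[] population) {
        Individual first = population[random.nextInt(population.length)];
        Individual second = population[random.nextInt(population.length)];
        return first.totalFitness() >= second.totalFitness() ? first : second;
    }
}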
6.4.4 Evaluation
The implementation of CepGP is done in Java version 8 to exploit the newly introduced streaming-API¹ and time-API². The streaming-API enables an efficient and reliable concurrent evaluation of the individuals in a population, which is illustrated in the following listing:
Consumer<RuleWithFitness> calcFitness = individual -> {
    individual.conditionFitness = condFf.fitnessOf(individual,
            eh.getWithoutComplexEvent(), eh.getIndicesOfComplexEvent());
    individual.windowFitness = winFf.fitnessOf(individual.getWindow());
    individual.complexityFitness = complFf.fitnessOf(individual);
};
Arrays.stream(population).parallel().forEach(calcFitness);
The population is stored in an array of the length of the number of individuals therein.
Exploiting the streaming-API on the population means that the individuals are concurrently
processed and for each individual the defined Consumer is invoked. The consumer calls the
fitness functions for the three objectives condition (condFf), window (winFf) and complexity (complFf) respectively and updates the fitness values within the individual. Since,
in CEP, rules do not have a fitness attribute, the only conversion needed between the rule
representation in CEP and in the Genetic Programming algorithm is the RuleWithFitness-class in the wrappers-package. It inherits all methods and attributes from the CEP rule
representation and adds
¹ https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
² https://docs.oracle.com/javase/8/docs/api/java/time/package-summary.html
• Attributes for the fitnesses.
• A condition threshold attribute.
• Attributes for the weights of the window fitness and the complexity fitness (in relation
to the condition fitness) as described in section 5.5 on page 67 and section 6.4.5.
• An implementation of the Comparable-interface³ to enable easier sorting of the population.
• The condition threshold which is the limit the condition fitness of the individual needs
to exceed so that its window fitness and complexity fitness are also factored in.
• A method to calculate the total fitness.
As already described in the section 6.3 on page 85 about the rule engine, the evaluation of the
condition fitness includes the individual and the necessary information about the event stream
without the complex event and the original indices of that specific event. The window fitness
function only needs the window of the individual, whereas the complexity fitness function
determines the tree complexity of the ECT and the ACT of the individual.
6.4.5 Parameters
CepGP is designed to work with the input file and the name of the special event type as
the minimal parameters needed to start and find the most appropriate rule. However, to
tweak the quality and performance of the algorithm and to enable a better evaluation of the
algorithm, the CepGP program provides several additional optional parameters to be set by the user.
The minimum parameters are in this order:
1. input file
2. complex event type
3. “true” or “false” if the algorithm shall consider ACTs at all or not
If the user wants to set more parameters, she has to give all of the following ones in order:
4. population size (default: 5,000)
5. number of generations (default: 30)
6. crossover rate (default: 0.8)
7. mutation rate (default: 0.05)
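As a hypothetical illustration of this parameter order, a main method could read the arguments as sketched below; the actual CepGP main method may differ in details.

public class ParameterParsingSketch {

    public static void main(String[] args) {
        String inputFile = args[0];                       // 1. input file
        String complexEventType = args[1];                // 2. complex event type
        boolean useAttributeConditionTrees =              // 3. consider ACTs at all
                Boolean.parseBoolean(args[2]);

        // Optional parameters with the documented default values.
        int populationSize   = args.length > 3 ? Integer.parseInt(args[3]) : 5000;
        int generations      = args.length > 4 ? Integer.parseInt(args[4]) : 30;
        double crossoverRate = args.length > 5 ? Double.parseDouble(args[5]) : 0.8;
        double mutationRate  = args.length > 6 ? Double.parseDouble(args[6]) : 0.05;

        System.out.printf("Run on %s for %s (ACTs: %b): population %d, generations %d, "
                + "crossover %.2f, mutation %.2f%n", inputFile, complexEventType,
                useAttributeConditionTrees, populationSize, generations,
                crossoverRate, mutationRate);
    }
}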
Currently only alterable in the source code of the program in the class CepGP are the following
further options:
³ https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html
• MAX_EVENT_CONDITION_TREE_HEIGHT is set to 1 for the initial population. Thus, in
the first population, individuals have an ECT consisting of either an event type or
one ECT-operator with event types as operands. During the evolutionary process,
the ECTs will inevitably grow by combining their subtrees in the crossover operations,
what is known as bloat. During the 30 default generations, the individuals usually
still have a reasonable length to be interpretable by a domain expert. Furthermore,
unnecessarily long and complex rules are graded with a lower fitness value because of
the complexity fitness function.
• MAX_ATTRIBUTE_CONDITION_TREE_HEIGHT is also set to 1 for the initial population for the same reason as the MAX_EVENT_CONDITION_TREE_HEIGHT.
• WINDOW_TIME_UNIT is set to ChronoUnit.SECONDS⁴, meaning that time windows
of CepGP will always use seconds as the unit for its values since this is the minimum
time unit covered in the proposed input format for events presented in section 6.2.1
on page 83.
• TOURNAMENT_SELECTION_SIZE is set to 2, which is the minimum and most common
value for the tournament selection size. This value usually is not greater than 5.
• ELITISM_RATE is set to 0.1, which means that the best 10% of 5,000 = 500 individuals
survive at each generation to the next.
• ConditionFitnessFunction is set to the described Informedness fitness function.
However, there are already several more fitness functions implemented in the module
EvaluationMeasures in the evaluation-package.
• WindowFitnessFunction is set to the proposed logarithmic fitness function.
• ComplexityFitnessFunction uses the proposed fitness function to determine
the simpleness of the rules.
• CONDITION_THRESHOLD is set to 0.5; only rules whose condition fitness exceeds this value include the window and complexity fitness in the total fitness calculation.
• WINDOW_FACTOR is the weight of the window fitness compared to the condition fitness.
A value of 1 means equal weight, but this value is set to 0.1.
• COMPLEXITY_FACTOR is the weight of the complexity fitness and is set to 0.001 so it
does not disrupt the overall fitness of the whole population but is decisive between
rules of otherwise almost equal fitness.
The amount of changeable values and options shows how much there is to consider when
designing the implementation and tweaking the algorithm to fit a specific or the general case.
The presented default values and options have been found reasonable during the evaluation in
different scenarios. That is not to say that these are the best options, especially considering a specific case. Chapter 7 on page 100 goes into more detail on the evaluation to give more insight into why this choice of parameters is considered reasonable.
⁴ https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoUnit.html
6.5 Summary
The concept of CepGP has been implemented in all of its aspects:
• Within the preparation phase, the program reads and parses the events from the input
file and extracts all the meta-information.
• The population initialization is implemented with the ramped half-and-half initialization
while ensuring valid individuals. It provides options to switch ACTs on and off and
options to tweak the quality of the first population by varying maximum heights of the
ECT and ACT, and the population size.
• The tournament selection algorithm was implemented with the option to adapt the
tournament size. The program also enables other selection implementations via the
Selection-interface. Elitism is implemented as well, and the program allows the user to adjust the number of elites in proportion to the population size.
• The subtree crossover was implemented and the program is flexible enough to enable
other crossover implementations as long as they adhere to the Crossover-interface.
It encapsulates the type-safe realizations of the crossover-operations of the different
components ECT, ACT and the window.
• The point mutation was implemented as described in this thesis. A different implementation for mutation can be used which has to fulfill the requirements of the
Mutation-interface and may encapsulate the execution of different mutation algorithms as proposed for future work.
• The repairing algorithm for the ACT after evolutionary operations has been implemented as described in this thesis.
• The proposed fitness functions are implemented as described while providing an interface for each objective to enable other fitness functions to be implemented in future
works.
• It is easy to use because the program only needs three minimal inputs: the input file
name, the name of the special complex event and a boolean value indicating whether
attributes shall be considered during the search. But it also can be used to define a
number of more advanced settings to individualize the search according to the problem
by the user.
All of this is used to evaluate the performance and quality of the concept and the implementation. The main goal of providing the foundation to have a better insight into the details
of the concept and enable a systematic evaluation has been successfully achieved.
Although the implementation of the concept was successful, the evaluation is limited due to
the incompleteness of the self-implemented rule engine. The missing features are:
• Event condition operators with a negation (the ¬ and the excluding sequence
operator): The Genetic Programming algorithm can handle these operators, but problems in the implementation of the evaluation of rules with these operators prevented
their usage in the evaluation of the algorithm.
• Arithmetical Operations: Combining attribute values via arithmetical operations
like addition, subtraction, multiplication or division are currently also not part of the
CepGP algorithm and may be subject to future research. They may be added as new
leaf nodes in the ACT. But thorough analysis has to show how type-safe mechanism
have to be implemented into the evolutionary operators as well.
• Additional Comparison Operators (≤, ≥, ≠): Their underlying operators (<, >,
=) are implemented and can lead to the logical equivalent forms already. Combining <
with an ∨-operator and an = leads to the equivalent of ≤ for example. The ≥ can be
done analogously. The ≠ is a combination of the = and a ¬-operator. However, the
attributes have to be the same and in the same order within the underlying comparison
operators to truly form the equivalent form. Since this is a rather unlikely event during
the Genetic Programming search, it is a reasonable task for future improvements of
the rule engine. The CepGP algorithm already supports these comparison operators.
• Aggregation Functions (sum, avg, min, and max): The integration of the aggregation functions into the rule engine and the CepGP algorithm can be a challenging
task. One option might be to add the aggregation function as an additional leaf node
to the ECT and derive it from the Event class. Every time the encountered event is
of the correct type, recalculate the value of the function and save it as an attribute
which can be accessed via the ACT. But this and other ideas need further research
and analysis to completely understand their consequences.
Although the missing features prevent a fully scaled analysis of the CepGP algorithm, the
successfully implemented features still provide a good foundation to build the evaluation on
and make meaningful statements about the CepGP algorithm based on the implementation.
7 Evaluation
The CepGP algorithm offers a way to search through an event stream for a rule which
leads to a specific event type in the stream by using Genetic Programming at its core. The
implementation just presented incorporates most of its features and enables an evaluation of
the proposed approach with several options that can be manipulated.
This chapter presents the findings in practical use of the proposed algorithm in combination
with the provided implementation. It starts by elaborating the test data and afterwards guides
through the evaluation process and the results while also explaining the consequences. The
chapter concludes with a summary of the overall findings.
7.1 Test Data
The evaluation of such optimization or search algorithms normally is based on a common
data set that allows comparison of different approaches in various aspects. The UCI provides
a repository of real data sets for various research fields, including time series data sets, which are close to CEP data sets [1].
However, this work uses artificially created data sets to evaluate the algorithm, mainly because of the time constraints of the thesis. A benefit of this approach is the flexibility to create data with varying numbers of attributes and event types and to tailor the data set so that specific characteristics of the algorithm can be evaluated. It also eliminates pitfalls of real data such as noise or missing attributes.
To protect the artificial data from accusations that they may have been created in a way
that would skew the result in a positive way for the algorithm, the DataCreator-class of
the util/data-package (see figure 6.2 on page 92) uses the SecureRandom-class¹, which
provides a cryptographically strong random number generator for its data creation whenever
random input is needed.
The DataCreator creates data sets that comply with the input specification described in
section 6.2.1 on page 83. It can create up to 26 event types (number of alphabet letters)
and at least one attribute. That is because CEP data sets normally have events with at least
a sensor ID or similar attributes. The creation process creates the same attributes for all events of the same type but uses random integer numbers between [-100, 100] for each attribute. After the specified number of events has been created, the creation algorithm applies a user-defined rule to them and inserts the special events into the data before writing the result to a file.

¹ https://docs.oracle.com/javase/8/docs/api/java/security/SecureRandom.html
For the evaluation, there are three data sets created:
small uses 500 events with three event types and the following numbers of attributes: {A=1, B=1, C=1}. The applied rule is

((A as A0 ∨ (C as C0 ∧ B as B0)) ∧ (B0.ID = C0.ID))[win:time:180Seconds] =⇒ HIT

and it fired 42 times.

medium uses 1000 events with five event types and the following numbers of attributes: {A=2, B=2, C=2, D=2, E=2}. The applied rule is

((A as A0 → E as E0) ∧ (A0.ID = E0.ID))[win:time:600Seconds] =⇒ HIT

and it fired 10 times.

large uses 2500 events with eight event types and the following numbers of attributes: {A=2, B=2, C=3, D=4, E=2, F=4, G=5, H=5}. The applied rule is

((C as C0 ∧ (B as B0 → (A as A0 ∨ D as D0))) ∧ (C0.ID = B0.ID))[win:length:15] =⇒ HIT

and it fired 158 times.
During the testing phase, the algorithm is supposed to rediscover the rule that was applied during data creation.
7.2 Testing
The tests evaluate some properties of the algorithm to validate assumptions and identify
drawbacks and strengths. First, the overall convergence to optimal solutions is tested.
Afterwards follows the determination of the default parameters for the algorithm. The
section proceeds with a comparison of CepGP and a random-walk approach through the problem space and concludes with a consideration of noise in the data and a discussion
of the findings.
7.2.1 Convergence
To see if the algorithm works, it should converge from the first rather poor results to better
and better solutions from generation to generation. Figures 7.1 to 7.3 show one run each of
the program implementing CepGP on the data sets small, medium, and large respectively.
Figure 7.1: CepGP Converging in Small Data Set; best and average fitness over 30 generations
Figure 7.1 illustrates that the algorithm was able to converge slowly but steadily to better and better solutions, although it was not able to find really good solutions. There can be different reasons for this: the algorithm is not suited for this problem, the problem space is too big, or there are only very few really good solutions within the problem space while the rest is almost equally mediocre. The following tests suggest that the last of these reasons might be true in this case. It is hard for any search algorithm to converge to the best results when the fitness of all individuals in the problem space is almost equal and the few peaks are surrounded by solutions that do not offer better than average fitness either; it ends up being the search for a needle in a haystack. Nevertheless, as described later, the algorithm is still useful in these rather uncommon scenarios.
The average fitness of the population increased slowly but steadily from generation to generation and stagnated at a fitness value of about 0.45 from generation 23 onwards. The difference between best and average fitness indicates that even after 25 and more generations there is still moderate diversity in the population, which can lead to even better solutions in the long run.
Figure 7.2 shows an already good starting point for the algorithm on the medium data set, on which it steadily improved during the following evolutions. The average fitness expectedly started at a value of about 0 and increased rapidly. In the long run it stabilized at a fitness value of about 0.7. Here, the algorithm was not only able to find a good solution but also the optimal one.
Figure 7.2: CepGP converging in the medium data set; best and average fitness over 30 generations

Figure 7.3 shows the convergence of the CepGP implementation on the large data set.

Figure 7.3: CepGP converging in the large data set; best and average fitness over 30 generations

As expected of this complex data set and the more complicated rule involved,
although the algorithm had a good starting point, it could only manage to reach a good
but not optimal solution. The average fitness of the population, on the other hand, could
catch up to the best individuals midway through the search process. At the end of the run,
the best and average fitness were very close, which indicates that the population's diversity is rather low and a lot of very similar individuals reside in it. If no measures are taken at that point, further optimization is very unlikely to yield better results later on.
The overall findings on convergence of the algorithm show that on all three data sets the
implementation manages to improve from the early to the late generations. This
indicates that the algorithm itself works. The section continues by evaluating different
parameter settings to find the most suitable default parameters.
7.2.2 Parameter influence
The parameters for population size, generations, crossover rate, and mutation rate differ for every Genetic Programming algorithm and strongly depend on the problem domain.
In their field guide ([39] p. 26f. and p. 30f.), Poli et al. found the parameter guidelines in
table 7.1 to be widely used and a good starting point:
Population size    several thousands and at least 500
Generations        10 to 50
Crossover rate     0.9
Mutation rate      0.01

Table 7.1: Parameter Suggestions by Poli et al. [39]
The limiting factor for the population size is the computation time of the fitness evaluation. Whatever amount can be evaluated reasonably fast is good, but it should be at least 500 individuals. Poli et al. also give advice on the number of generations: "[. . . ] the most productive search is usually performed in [the] early generations, and if a solution hasn't been found then, it's unlikely to be found in a reasonable amount of time." ([39] p. 27)
This work uses these guidelines as a starting point to find good parameters for the problem domain of CEP rule search. The result is shown in figure 7.4 on the next page, obtained with 500 individuals, 30 generations (the middle of the suggested range), and the crossover and mutation rates from the table. Even though the overall result is mediocre, the figure already shows that the algorithm converges towards better results and thus works in general.
Population Size
To improve the performance, the first parameter to adapt is the population size. Figure 7.5 on the facing page displays the effect of different population sizes while keeping
[Figure 7.4: CepGP Result on Small Data Set With Initial Parameters (best and average fitness over generations 0 to 30)]
the other values as they are suggested (generations: 30, crossover rate: 0.9, mutation rate:
0.01).
[Figure 7.5: CepGP Result on Small Data Set With Varying Population Sizes (fitness over generations 0 to 30 for population sizes 500, 1000, 2000, 3000, 5000, and 10000)]
The more individuals a population contains, the higher the chance that better solutions are created. Although that is not surprising, the figure is not unambiguous in this respect. This may be an indicator that in the small data set only a tiny fraction of solutions provide really good fitness values. Even then, the displayed trend remains and supports the assumption that more individuals per population lead to better results. To be able to process data sets with more information than the small data set, the default population size is set to 5000, which also achieves the overall best results in this example.
Generations
The next parameter that may lead to better results when increased is the number of generations. There are different schools of thought on this matter: some argue that larger population sizes can reduce the number of generations needed, whereas others say that smaller populations with many more generations yield better results. ([39] p. 27) Figure 7.6 depicts the effects of different numbers of generations with a population size of 5000, a crossover rate of 0.9, and a mutation rate of 0.01.
[Figure 7.6: CepGP Result on Small Data Set With Varying Amounts of Generations (fitness over generations 0 to 40 for the small, medium, and large data sets)]
The figure illustrates that runs with more than 30 generations seem to get stuck at a fitness level without significant improvements. Therefore, this work suggests 30 generations as the best choice for the general case. More generations mean more crossover and longer rules, which in turn causes longer evaluation times and less intuitive rules. This is also why only up to 40 generations have been evaluated: the rules in later generations turned out to be too long and took too much time (several hours for the large data set) to be processed. Thus, it is advisable to stop the evolutionary process before the evaluation time gets unreasonably long. This can be remedied on many levels. The stopping criterion can combine multiple conditions, such as the lack of significant improvement over a number of generations or a maximum amount of processing time. To stop bloat and overly long rules, the algorithm could also enforce hard limits on tree heights and sizes. The limits for ECT and ACT should be chosen carefully because they also restrict the search space of possible rules. Finding more suitable stopping criteria remains a task for future work.
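As an illustration of such a combined stopping criterion, the following sketch shows how a generation limit, a processing-time budget, and a stagnation check could be combined. The class and its parameters are hypothetical and not part of the CepGP implementation.

import java.util.*;

// Sketch of a combined stopping criterion (hypothetical helper class, not part
// of the CepGP implementation). The loop stops when a generation limit, a
// wall-clock budget, or a stagnation threshold is reached.
public class StoppingCriterion {

    private final int maxGenerations;      // e.g. 30
    private final long maxMillis;          // overall processing time budget
    private final int stagnationLimit;     // generations without significant improvement
    private final double minImprovement;   // what counts as "significant"

    private final long startMillis = System.currentTimeMillis();
    private double bestSoFar = Double.NEGATIVE_INFINITY;
    private int stagnantGenerations = 0;

    public StoppingCriterion(int maxGenerations, long maxMillis,
                             int stagnationLimit, double minImprovement) {
        this.maxGenerations = maxGenerations;
        this.maxMillis = maxMillis;
        this.stagnationLimit = stagnationLimit;
        this.minImprovement = minImprovement;
    }

    // Called once per generation with the best fitness found so far.
    public boolean shouldStop(int generation, double bestFitness) {
        if (bestFitness > bestSoFar + minImprovement) {
            bestSoFar = bestFitness;
            stagnantGenerations = 0;
        } else {
            stagnantGenerations++;
        }
        return generation >= maxGenerations
                || System.currentTimeMillis() - startMillis > maxMillis
                || stagnantGenerations >= stagnationLimit;
    }
}

Such a criterion could replace the fixed number of generations without changing the rest of the evolutionary loop.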
Crossover Rate
The crossover rate determines the fraction of individuals of the next generation that result from the crossover of two parents. The other individuals are probabilistically chosen from the current generation and survive unchanged into the next generation. This already allows good individuals to be present in future generations, with a higher chance for the absolute best.
In that sense, the crossover rate controls two properties of the algorithm at once. CepGP uses elitism to reduce the risk of losing the best individuals in the process and to enable a higher crossover rate without worrying about losing them. A given percentage is chosen to definitely survive the crossover stage, while the rest is filled up with crossover offspring. The crossover rate can then be set to a very high value if the elitism rate is chosen to cover enough of the best individuals to keep them alive. The elitism rate is set to 0.1, meaning that the best 10% of the current generation are always available in the mutation phase. During mutation, every individual can be affected again. Figure 7.7 illustrates the impact of varying crossover rates with a population size of 5000, 30 generations, and a mutation rate of 0.01.
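To make the interplay of elitism rate and crossover rate concrete, the following sketch builds the next generation from an elite share, crossover offspring, and probabilistically selected survivors. All type and method names are hypothetical and only illustrate the scheme described above; the actual CepGP classes may differ.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the survivor selection scheme (hypothetical names).
class SurvivorSelection {

    interface Rule { double fitness(); }
    interface Selector { Rule select(List<Rule> population); }   // probabilistic parent selection
    interface Crossover { Rule offspring(Rule a, Rule b); }      // crossover of two parent rules

    static List<Rule> nextGeneration(List<Rule> population, double elitismRate,
                                     double crossoverRate, Selector selector, Crossover crossover) {
        List<Rule> sorted = new ArrayList<>(population);
        sorted.sort(Comparator.comparingDouble(Rule::fitness).reversed());
        int size = sorted.size();

        // 1. Elitism: the best share survives unchanged (0.1 keeps the best 10%).
        int eliteCount = (int) Math.round(size * elitismRate);
        List<Rule> next = new ArrayList<>(sorted.subList(0, eliteCount));

        // 2. Crossover: a crossoverRate share of the new population consists of offspring.
        int offspringCount = (int) Math.round(size * crossoverRate);
        for (int i = 0; i < offspringCount && next.size() < size; i++) {
            next.add(crossover.offspring(selector.select(sorted), selector.select(sorted)));
        }

        // 3. Remaining slots are filled with probabilistically chosen survivors.
        while (next.size() < size) {
            next.add(selector.select(sorted));
        }
        return next; // the mutation phase is applied to this list afterwards
    }
}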
[Figure 7.7: CepGP Result on Small Data Set With Different Crossover Rates (fitness over generations 0 to 30 for crossover rates 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0)]
None of the tested crossover rates had a big impact on the overall results. However, 0.8 appears to be the best choice for the crossover rate, which is also close to the 0.9 proposed at the beginning of this section.
Mutation Rate
The mutation rate determines the ratio of the population resulting from elitism and crossover that undergoes random but minor changes in its structure (ECT, ACT, or window). It is the only way new information can be added to the population after initialization. Crossover combines the information of fit individuals into new structures, but the information within the individuals, the information the crossover operations can work on, is fixed or even decreases without the mutation operation. Hence, mutation is an important contributor to the search process. However, it also alters potentially valuable information in individuals by replacing it with information that may be almost useless. This may disrupt the search process and its convergence to optimal solutions if it happens too often.
The appropriate mutation rate very much depends on the amount of overall information in the data set. If there are many event types and/or attributes that cannot reasonably be covered by the initial population and a moderate mutation rate, the rate should be slightly increased. If that still does not produce satisfying results, the problem space might be too big for the algorithm as it is. Figure 7.8 shows the convergence with varying mutation rates and the parameters already discussed: 5000 individuals per population, 30 generations, and a crossover rate of 0.8.
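The following sketch illustrates how such a mutation rate could be applied per individual, picking one of the three rule components for a small random change. The interfaces are hypothetical and only stand in for the concrete CepGP mutation operators.

import java.util.List;
import java.util.Random;

// Sketch of the mutation phase (hypothetical names; the concrete operators for
// ECT, ACT and window mutation are assumed to exist elsewhere).
class MutationPhase {

    interface Rule {
        void mutateEventConditionTree(Random rnd);
        void mutateAttributeConditionTree(Random rnd);
        void mutateWindow(Random rnd);
    }

    static void mutate(List<Rule> population, double mutationRate, Random rnd) {
        for (Rule rule : population) {
            if (rnd.nextDouble() >= mutationRate) {
                continue; // for low mutation rates most individuals pass through unchanged
            }
            // Pick one of the three rule components and apply a small random change to it.
            int component = rnd.nextInt(3);
            if (component == 0) {
                rule.mutateEventConditionTree(rnd);
            } else if (component == 1) {
                rule.mutateAttributeConditionTree(rnd);
            } else {
                rule.mutateWindow(rnd);
            }
        }
    }
}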
[Figure 7.8: CepGP Result on Small Data Set With Different Mutation Rates (fitness over generations 0 to 30 for mutation rates 0.01, 0.03, 0.05, 0.08, 0.1, and 0.2)]
Although figure 7.8 is not entirely conclusive, the best results are achieved with a mutation rate below 0.1. Mutation rates greater than 0.1 tend to occasionally decrease in their best fitness during the search process because the probability of altering the best individuals of the previous generation is higher. The trade-off lies between adding more new information in each generation and disrupting the search process, depending on the amount of overall change. The best quality was achieved with a mutation rate of 0.05.
7.2.3 CepGP vs. Random Walk
The comparison of CepGP against a random walk through the problem space assures that the algorithm indeed uses a more sophisticated approach to search through the space of possible rules than a random guess-and-try algorithm.
The random algorithm works by randomly generating the same number of rules that CepGP would generate throughout the whole process. The parameters during the process are listed below; a sketch of the baseline follows the list.
• 30 Generations × 5000 Individuals = 150,000 rules
• maximal event and attribute condition tree height: 4
• attribute condition tree rate: 0.8
• Half-and-Half initialization for generating the rules
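Under these parameters, the baseline can be sketched as follows. The generator and evaluator interfaces are hypothetical placeholders for the rule creation and fitness evaluation described elsewhere in this work.

import java.util.Random;

// Sketch of the random guess-and-try baseline (hypothetical interfaces; the
// numbers correspond to the parameters listed above).
class RandomBaseline {

    interface Rule { }
    interface RuleGenerator { Rule randomRule(int maxTreeHeight, double actRate, Random rnd); }
    interface FitnessEvaluator { double evaluate(Rule rule); }

    static Rule bestOfRandomRules(RuleGenerator generator, FitnessEvaluator evaluator, Random rnd) {
        final int totalRules = 30 * 5000;   // 30 generations x 5000 individuals = 150,000 rules
        final int maxTreeHeight = 4;        // maximal ECT/ACT height
        final double actRate = 0.8;         // attribute condition tree rate

        Rule best = null;
        double bestFitness = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < totalRules; i++) {
            Rule candidate = generator.randomRule(maxTreeHeight, actRate, rnd);
            double fitness = evaluator.evaluate(candidate);
            if (fitness > bestFitness) {    // keep only the fittest rule seen so far
                bestFitness = fitness;
                best = candidate;
            }
        }
        return best;
    }
}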
Figure 7.9 compares the best individuals for the three sample data sets: small, medium, and large. The most noticeable aspect is the superiority of CepGP over the random guess-and-try algorithm. In every sample data set, CepGP manages to achieve better results.
[Figure 7.9: CepGP vs. Random Walk Comparing the Bests (best fitness of CepGP and Random on the small, medium, and large data sets)]
The best found rules are displayed in table 7.2 on page 111. The first column contains the data set and the hidden rule which was used to insert the complex events for which the algorithms should find the most appropriate rule. The second and third columns contain the best rules found by CepGP and the random approach, respectively. The first thing to notice is that the rules found by CepGP are not only fitter in general but also in every single objective: they are shorter and more concise, use a smaller window, and are more accurate. Better results can in general be achieved with more generations, which leads to longer runs because the rules get more specialized and longer. A faster convergence to the optimal solution can also be achieved by altering the parameters for crossover and mutation and by adding domain knowledge for a better starting point, for example via adjustments of the attribute condition rate.
Apart from the difference in quality, CepGP also processed the same number of individuals much faster. This is grounded in the way CepGP works: from generation to generation, the population gets fitter on average, which also leads to rules that are concise and closer to the optimum than the average random rule.
Small
  Hidden rule: ((A as A0 ∨ (C as C0 ∧ B as B0)) ∧ (B0.ID = C0.ID)) [win:time:180Seconds] =⇒ HIT
  CepGP (fitness 0.59515; condition: 0.58533, window: 0.69401, complexity: 0.52222):
    ((B as B0 ∧ C as C0) ∧ ((C0.ID < 8.0) ∧ ((B0.ID < 6.25) ∧ (B0.ID > 2.75)))) [win:time:93Seconds] =⇒ HIT
  Random (fitness 0.52464; condition: 0.51781, window: 0.59496, complexity: 0.32492):
    ((((A as A0 ∨ A as A1) → B as B0) ∧ (((C as C0 ∧ A as A2) ∨ (B as B1 → A as A3)) ∨ ((A as A4 → A as A5) ∧ (B as B2 ∨ B as B3)))) ∧ ((((¬ (A3.ID < 8.0)) ∧ (A1.ID > 2.75)) ∧ (C0.ID = B0.ID)) ∧ (B2.ID < 8.0))) [win:time:184Seconds] =⇒ HIT

Medium
  Hidden rule: ((A as A0 → E as E0) ∧ (A0.ID = E0.ID)) [win:time:600Seconds] =⇒ HIT
  CepGP (fitness 0.98569; condition: 1.00000, window: 0.84194, complexity: 1.05556):
    ((A as A0 → E as E0) ∧ (A0.ID = E0.ID)) [win:time:590Seconds] =⇒ HIT
  Random (fitness 0.83857; condition: 0.84600, window: 0.76770, complexity: 0.49048):
    (E as E0 ∧ ((E0.ID < 3.0) ∨ ((((E0.e1 > 50.25) ∧ (E0.ID > 1.0)) ∨ ((E0.ID > 1.0) ∧ (E0.e1 > E0.ID))) ∨ ((¬ (E0.ID = E0.ID)) ∨ (E0.e1 > -49.25))))) [win:time:1183Seconds] =⇒ HIT

Large
  Hidden rule: ((C as C0 ∧ (B as B0 → (A as A0 ∨ D as D0))) ∧ (C0.ID = B0.ID)) [win:length:15] =⇒ HIT
  CepGP (fitness 0.74394; condition: 0.75400, window: 0.64563, complexity: 0.51515):
    (((B → (A ∨ D)) ∧ C) ∧ ((C ∧ (((((C ∧ ((B → ((D ∨ (A ∨ A)) ∧ C)) ∨ C)) ∧ C) ∨ A) → C) ∨ D)) ∧ C)) [win:length:16] =⇒ HIT
  Random (fitness 0.65795; condition: 0.67200, window: 0.51928, complexity: 0.47312):
    ((((F → E) ∨ (E → G)) ∨ ((F → F) ∨ (E ∧ H))) → (((A ∧ B) ∨ (D → C)) → ((C ∧ C) ∧ (A ∨ D)))) [win:length:43] =⇒ HIT

Table 7.2: Comparison of the Original Hidden Rule and the Results of CepGP and Random
7.2.4 Noise influence
All of these tests have been conducted under ideal circumstances. The events have been artificially created with no missing values or measuring errors, and their order is always correct. In real-world environments, however, there are multiple sources of error that introduce slight shifts in the order of events, missing or erroneously read values, transmission errors, and so on. These errors are also called noise in the data and need to be treated before the actual algorithm can use the data. Due to time constraints, this thesis could not elaborate on remediations for CepGP. The current state of the algorithm and the implementation does not account for these errors and is therefore very prone to them. Especially inserting the special events manually into the recorded stream can have a huge impact on the results. Future work will have to address this matter.
7.3 Result Discussion
CepGP proved to be a working approach on three different artificially generated data sets. In all tested scenarios it converged towards the optimum and always achieved better outcomes than randomly guessing the same number of individuals would. Table 7.3 summarizes the default values for the algorithm that have been found to work best for the small data set. This particular data set seems to be a challenging task because good results are hard to find; even in this scenario, CepGP acquired better rules.
Population Size    5000
Generations        30
Crossover Rate     0.8
Mutation Rate      0.05
Elitism Rate       0.1

Table 7.3: Final Default Parameters for CepGP
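Expressed as a configuration sketch (a hypothetical class, shown only to make the defaults explicit), these values could be collected as follows:

// The final default parameters from table 7.3 as a configuration sketch
// (hypothetical class; field names are illustrative only).
class CepGpDefaults {
    static final int POPULATION_SIZE = 5000;
    static final int GENERATIONS = 30;
    static final double CROSSOVER_RATE = 0.8;
    static final double MUTATION_RATE = 0.05;
    static final double ELITISM_RATE = 0.1;
}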
The runs depicted in the figures throughout this section have been chosen as representative of the majority of outcomes tested during this thesis. Even though there were worse results, they were always still better than the randomly acquired rules.
This evaluation provides a basis for more research into this approach and future improvements. It demonstrates the overall suitability of the proposed algorithm and suggests parameter values for its application. Future work also needs to run the algorithm against real-world problems to see whether it can live up to the results achieved in this laboratory environment.
8 Conclusion
This thesis proposes a new algorithm to derive a Complex Event Processing (CEP) rule for a given event type within a recorded historical event stream. Such rule extraction algorithms have been proposed in different contexts, but only a few address the field of Complex Event Processing, and even among those, the approach of using Evolutionary Computation has not yet been elaborated. Complex Event Processing is based on rules that represent domain knowledge and enhance the meaning of the processed information to yield higher-level abstractions. An Evolutionary Computation algorithm needs to map the processes of crossover, mutation, and selection onto a rule representation that also corresponds to the real-world CEP rule domain. Genetic Programming is an Evolutionary Computation algorithm specifically designed to work on trees, which are also often used to represent rules. This thesis proposed an algorithm that respects the specifics of CEP rules while giving the underlying Genetic Programming search the freedom to apply its evolutionary operations on the tree representation in order to find the most appropriate rule for the occurrences of the specified event type.
This work first described the background of Evolutionary Computation, Complex Event Processing, and Rule Learning as the three main research fields that are bound together in this thesis. Because the related works suggest promising results from applying Genetic Programming to CEP rule derivation, the general approach elaborated the scenario of this work, narrowed the focus, and established the basis for the CepGP algorithm. The algorithm was introduced by defining the general process and analyzing the parts of CEP rules and how they can be represented within a tree. Each part of a general Genetic Programming algorithm was presented in a way that best suits the specifics of CEP rules while adhering to strong-typing and structure constraints so that only valid rules arise during the evolutionary process. After the concept of the algorithm was explained, the thesis presented an implementation including a rule engine, which showed the overall applicability of CepGP and that the algorithm is not bound to specific frameworks or other software. Even though the proposed algorithm indeed finds good results in various data sets, there is still a lot of room for improvement and enhancement.
8.1 Contributions
CepGP is the first Evolutionary Computation algorithm used to derive a CEP rule for a given event type within a given data set.
Genetic Programming algorithms need flexibility in the solutions they generate and in their operations to make use of their advanced way of searching through a problem space. Complex Event Processing rules, on the other hand, impose rather strong constraints on rule structure and components. This thesis accomplishes the challenging task of bringing CEP rules into the world of Genetic Programming by using strong-typing and structure constraints and by repairing invalid rules at every stage of the evolutionary process. CepGP manages to include the most important parts of the CEP rule specification in its proposed search algorithm. This work could also show that the algorithm is able to produce good results on artificially created data, and it hints at promising parameters for practical use.
8.2 Future Work
The presented algorithm and implementation of CepGP is only a beginning. It builds the foundation for future research in applying Genetic Programming specifically, and Evolutionary Computation in general, to automatic rule discovery in Complex Event Processing. Although this work could show that the approach may lead to promising results, some aspects remain unconsidered or in need of improvement.
The algorithm includes most parts of the specification for CEP rules; however, some important functions are still missing. Arithmetical operations and, most importantly, aggregation functions are the next steps to enhance the capabilities of the CepGP algorithm. The step after that would be to consider value types other than numbers. This imposes new, challenging problems for the type system, which may need to be loosened while other ways are found to ensure valid rules in the outcome.
So far, the action part of the CEP rule is a mere placeholder and could potentially be useful for future work, too. Multi-stage search processes or Learning Classifier Systems (LCS) may profit from this function.
The implementation of CepGP currently does not embody all the aspects the algorithm is capable of, although this is crucial to evaluate potential enhancements and updates of the algorithm itself. Of course, the rule engine could be enhanced and developed in parallel to the algorithm. Both the algorithm and the implementation, however, can work separately. This leaves room for other possibilities to validate some yet untested properties of the CepGP algorithm by using available fully fledged CEP engines for the condition fitness evaluation. Exchanging the rule engine also requires new functionality in the implementation: converting the tree representation used by the algorithm into the rule representation of the chosen CEP engine, and converting the evaluation result of the CEP engine back into the evaluation result the algorithm expects.
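Such an exchange could be captured in a small adapter interface. The following sketch uses hypothetical placeholder types and does not assume the API of any concrete CEP engine.

// Hypothetical placeholder types for the internal tree representation, the
// recorded event stream, and the result format the fitness evaluation expects.
interface RuleTree { }
interface EventStream { }
interface EvaluationResult { }

// Sketch of an adapter for plugging an external CEP engine into the condition
// fitness evaluation (hypothetical interface; no concrete engine API is assumed).
interface CepEngineAdapter {

    // Translate the internal rule tree into the rule language of the target engine,
    // for example an EPL statement string.
    String toEngineRule(RuleTree rule);

    // Run the translated rule against the recorded stream and map the engine's
    // matches back to the evaluation result the algorithm expects.
    EvaluationResult evaluate(String engineRule, EventStream recordedStream);
}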
In [26], Luckham also describes problems regarding causality between events. "Causality is dynamic" ([26] p. 241), and therefore the relation of one event being the cause of another depends on the circumstances and may hold in some but not necessarily all cases. If a rule is found by the algorithm, it may be less fit because it is true only most of the time or
sometimes, but not always, because the occurrences of the event may have different causes at different times. This has to be kept in mind at all times while working on the algorithm and interpreting the results it yields: it can only reliably find static causes in that sense. With growing dynamics between causes and effects, the outcome becomes less and less dependable. Added to this are real-world scenarios in distributed CEP systems, where timestamps do not always reflect the actual order of events because of the lack of a global clock and unsynchronized local clocks. ([26] p. 242f.)
But Luckham also provides a way to remedy the inability to know the cause-and-effect relationship between events: to find real causes in causally dynamic or distributed systems, one can use cause models. ([26] p. 242f.) A causal map adds a causal vector to each event, which describes the causality attribute of that event and references the actual events leading to it. This not only allows causes and effects between events to be identified reliably but also provides an even better understanding, because this cause-and-effect relation is processed for every event, not only for a single one, enabling a clear view of concurrency and synchronization of processes in the system. ([26] p. 243)
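A causal vector could be represented, for example, as a simple data structure attached to every event. The following sketch is hypothetical and only illustrates the idea of referencing the causing events; field names are illustrative, not taken from any existing implementation.

import java.util.List;

// Sketch of an event enriched with a causal vector as described by Luckham
// (hypothetical structure).
class CausalEvent {

    final String eventId;
    final String eventType;
    final long timestamp;
    final List<String> causedBy;   // the causal vector: ids of the events that caused this one

    CausalEvent(String eventId, String eventType, long timestamp, List<String> causedBy) {
        this.eventId = eventId;
        this.eventType = eventType;
        this.timestamp = timestamp;
        this.causedBy = causedBy;
    }
}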
Causal maps need to be recorded as well to be usable within the proposed algorithm in future work. If this information is available, CepGP could be improved by using it, finding information of higher abstraction within the event streams, and aiding the domain expert in the pursuit of a perfectly adapted CEP system for the specific environment.
Bibliography
[1] Center for machine learning and intelligent systems – machine learning repository. Website, August 2016. https://archive.ics.uci.edu/ml/datasets.html; Accessed:
2016-08-22.
[2] M. Atzmueller. Enterprise Big Data Engineering, Analytics, and Management. Advances
in Business Information Systems and Analytics. IGI Global, 2016.
[3] C. C. Bojarczuk, H. S. Lopes, and A. A. Freitas. Discovering comprehensible classification rules using genetic programming: a case study in a medical domain. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation-Volume
2, pages 953–958. Morgan Kaufmann Publishers Inc., 1999.
[4] R. Bruns and J. Dunkel. Complex Event Processing: Komplexe Analyse von massiven
Datenströmen mit CEP. Springer-Verlag, 2015.
[5] S.-H. Chen. Genetic algorithms and genetic programming in computational finance.
Springer Science & Business Media, 2012.
[6] I. De Falco, A. Della Cioppa, and E. Tarantino. Discovering interesting classification
rules with genetic programming. Applied Soft Computing, 1(4):257–269, 2002.
[7] M. Dempster and C. Jones. A real-time adaptive trading system using genetic programming. Quantitative Finance, 1(4):397–413, 2001.
[8] L. Ding, S. Chen, E. A. Rundensteiner, J. Tatemura, W.-P. Hsiung, and K. S. Candan.
Runtime semantic query optimization for event stream processing. In 2008 IEEE 24th
International Conference on Data Engineering, pages 676–685. IEEE, 2008.
[9] C. Donalek. Supervised and unsupervised learning. Website, April 2011. http://www.astro.caltech.edu/~george/aybi199/Donalek_classif1.pdf; Accessed: 2016-08-10.
[10] P. G. Espejo, S. Ventura, and F. Herrera. A survey on the application of genetic
programming to classification. IEEE Transactions on Systems, Man, and Cybernetics,
Part C, 40(2):121–144, 2010.
[11] A. A. Freitas. A genetic programming framework for two data mining tasks: classification and generalized rule induction. Genetic programming, pages 96–101, 1997.
[12] A. A. Freitas. Data mining and knowledge discovery with evolutionary algorithms.
Springer Science & Business Media, 2002.
[13] A. Frömmgen, R. Rehner, M. Lehn, and A. Buchmann. Fossa: Learning eca rules for
adaptive distributed systems. In Autonomic Computing (ICAC), 2015 IEEE International
Conference on, pages 207–210. IEEE, 2015.
[14] F. Gao, E. Curry, M. I. Ali, S. Bhiri, and A. Mileo. Qos-aware complex event service
composition and optimization using genetic algorithms. In International Conference on
Service-Oriented Computing, pages 386–393. Springer, 2014.
[15] D. E. Goldberg. Dynamic system control using rule learning and genetic algorithms. In
IJCAI, volume 85, pages 588–592, 1985.
[16] D. E. Goldberg. Genetic algorithms in search, optimization, and machine learning.
Addison Wesley Longman, 30th edition, 2012.
[17] J. J. Grefenstette, C. L. Ramsey, and A. C. Schultz. Learning sequential decision rules
using simulation models and competition. Machine Learning, 5(4):355–381, 1990.
[18] J. H. Holland. Escaping brittleness. In Proceedings Second International Workshop on
Machine Learning, pages 92–95. Citeseer, 1983.
[19] K. J. Holyoak and R. G. Morrison. The Cambridge handbook of thinking and reasoning.
Cambridge University Press, 2005.
[20] R. Huang. Evolving prototype rules and genetic algorithm in a combustion control. In Industrial Automation and Control, 1995 (I A & C'95), IEEE/IAS International Conference on (Cat. No.95TH8005), pages 243–248, Jan 1995.
[21] C. Z. Janikow. A knowledge-intensive genetic algorithm for supervised learning. Machine
Learning, 13(2):189–228, 1993.
[22] C. M. Johnson and S. Feyock. A genetics-based technique for the automated acquisition
of expert system rule bases. In Developing and Managing Expert System Programs,
1991., Proceedings of the IEEE/ACM International Conference on, pages 78–82, Sep
1991.
[23] J. R. Koza. Genetic programming: on the programming of computers by means of
natural selection, volume 1. MIT press, 1992.
[24] H.-L. Liu, Q. Chen, and Z.-H. Li. Optimization techniques for rfid complex event
processing. Journal of computer science and technology, 24(4):723–733, 2009.
[25] D. Lohpetch and D. Corne. Discovering effective technical trading rules with genetic
programming: Towards robustly outperforming buy-and-hold. In Nature & Biologically
Inspired Computing, 2009. NaBIC 2009. World Congress on, pages 439–444. IEEE,
2009.
[26] D. Luckham. The power of events an introduction to complex event processing in
distributed enterprise systems. Addison-Wesley, Boston, Mass. [u.a.], 3rd print edition,
2005.
[27] D. Luckham and W. R. Schulte. Event processing technical society - event processing
glossary version 2.0. Event Processing Technical Society, July 2011.
[28] D. Mallick, V. C. S. Lee, and Y. S. Ong. An empirical study of genetic programming
generated trading rules in computerized stock trading service system. In 2008 International Conference on Service Systems and Service Management, pages 1–6, June
2008.
[29] A. Margara, G. Cugola, and G. Tamburrelli. Learning from the past: automated rule
generation for complex event processing. In Proceedings of the 8th ACM International
Conference on Distributed Event-Based Systems, pages 47–58. ACM, 2014.
[30] J. A. Marin, R. Radtke, D. Innis, D. R. Barr, and A. C. Schultz. Using a genetic
algorithm to develop rules to guide unmanned aerial vehicles. In Systems, Man, and
Cybernetics, 1999. IEEE SMC ’99 Conference Proceedings. 1999 IEEE International
Conference on, volume 1, pages 1055–1060 vol.1, 1999.
[31] N. Mehdiyev, J. Krumeich, D. Enke, D. Werth, and P. Loos. Determination of rule
patterns in complex event processing using machine learning techniques. Procedia
Computer Science, 61:395–401, 2015.
[32] D. J. Montana. Strongly typed genetic programming. Evolutionary computation,
3(2):199–230, 1995.
[33] R. Mousheimish, Y. Taher, and K. Zeitouni. Automatic learning of predictive rules
for complex event processing: Doctoral symposium. In Proceedings of the 10th ACM
International Conference on Distributed and Event-based Systems, DEBS ’16, pages
414–417, New York, NY, USA, 2016. ACM.
[34] R. Mousheimish, Y. Taher, and K. Zeitouni. Complex event processing for the nonexpert with autocep: Demo. In Proceedings of the 10th ACM International Conference
on Distributed and Event-based Systems, DEBS ’16, pages 340–343, New York, NY,
USA, 2016. ACM.
[35] C. Mutschler and M. Philippsen. Learning event detection rules with noise hidden markov
models. In Adaptive Hardware and Systems (AHS), 2012 NASA/ESA Conference on,
pages 159–166. IEEE, 2012.
[36] C. Neely, P. Weller, and R. Dittmar. Is technical analysis in the foreign exchange market
profitable? a genetic programming approach. Journal of financial and Quantitative
Analysis, 32(04):405–426, 1997.
[37] C. J. Neely and P. A. Weller. Technical trading rules in the european monetary system.
Journal of International Money and Finance, 18(3):429 – 458, 1999.
[38] M. Oussaidene, B. Chopard, O. V. Pictet, and M. Tomassini. Parallel genetic programming and its application to trading model induction. Parallel Computing, 23(8):1183–
1198, 1997.
[39] R. Poli, W. B. Langdon, N. F. McPhee, and J. R. Koza. A field guide to genetic programming, March 2008. Published via http://lulu.com and freely available at http://www.gp-field-guide.org.uk.
[40] J.-Y. Potvin, P. Soriano, and M. Vallée. Generating trading rules on the stock markets
with genetic programming. Computers & Operations Research, 31(7):1033–1047, 2004.
[41] D. M. Powers. Evaluation: from precision, recall and f-measure to roc, informedness,
markedness and correlation. Journal of Machine Learning Technologies, 2011.
[42] E. Rabinovich, O. Etzion, and A. Gal. Pattern rewriting framework for event processing
optimization. In Proceedings of the 5th ACM international conference on Distributed
event-based system, pages 101–112. ACM, 2011.
[43] I. Rechenberg. Evolution strategies. Website, 2016. http://www.bionik.tu-berlin.de/institut/xs2evost.html; Accessed: 2016-07-30.
[44] S. Sakprasat and M. C. Sinclair. Classification rule mining for automatic credit approval
using genetic programming. In 2007 IEEE Congress on Evolutionary Computation, pages
548–555, Sept 2007.
[45] S. Sen, N. Stojanovic, and L. Stojanovic. An approach for iterative event pattern recommendation. In Proceedings of the Fourth ACM International Conference on Distributed
Event-Based Systems, DEBS ’10, pages 196–205, New York, NY, USA, 2010. ACM.
[46] O. Sigaud and S. W. Wilson. Learning classifier systems: a survey. Soft Computing,
11(11):1065–1078, 2007.
[47] W. M. Spears and K. A. De Jong. Using genetic algorithms for supervised concept
learning. In Tools for Artificial Intelligence, 1990., Proceedings of the 2nd International
IEEE Conference on, pages 335–341. IEEE, 1990.
[48] D. R. B. Stockwell. Genetic Algorithms II, pages 123–144. Springer US, Boston, MA,
1999.
[49] J. C. Tay and N. B. Ho. Evolving dispatching rules using genetic programming for
solving multi-objective flexible job-shop problems. Computers & Industrial Engineering,
54(3):453–473, 2008.
[50] Y. Turchin, A. Gal, and S. Wasserkrug. Tuning complex event processing rules using
the prediction-correction paradigm. In Proceedings of the Third ACM International
Conference on Distributed Event-Based Systems, DEBS ’09, pages 10:1–10:12, New
York, NY, USA, 2009. ACM.
[51] R. J. Urbanowicz and J. H. Moore. Learning classifier systems: a complete introduction,
review, and roadmap. Journal of Artificial Evolution and Applications, 2009:1, 2009.
[52] J. Wang. Trading and hedging in S&P 500 spot and futures markets using genetic programming. Journal of Futures Markets, 20(10):911–942, 2000.
[53] K. Weicker. Evolutionäre Algorithmen. Springer Vieweg, 2015.
[54] K. Weicker. Evolutionäre Algorithmen. Website, 2016. http://www.imn.htwk-leipzig.de/~weicker/publications/sctreff_ea.pdf; Accessed: 2016-04-19.
[55] G. M. Weiss and H. Hirsh. Learning to predict rare events in event sequences. In KDD,
pages 359–363, 1998.
[56] T. Yu, S.-H. Chen, and T.-W. Kuo. Discovering financial technical trading rules using
genetic programming with lambda abstraction. In Genetic programming theory and
practice II, pages 11–30. Springer, 2005.
[57] H. Zhang, Y. Diao, and N. Immerman. On complexity and optimization of expensive queries in complex event processing. In Proceedings of the 2014 ACM SIGMOD
international conference on Management of data, pages 217–228. ACM, 2014.