Efficient Event Pattern Matching with Match Windows

Bruno Cadonna, Free University of Bozen-Bolzano, Italy, [email protected]
Johann Gamper, Free University of Bozen-Bolzano, Italy, [email protected]
Michael H. Böhlen, University of Zurich, Switzerland, [email protected]
ABSTRACT
In event pattern matching a sequence of input events is matched
against a complex query pattern that specifies constraints on extent,
order, values, and quantification of matching events.
In this paper we propose a general pattern matching strategy
that consists of a pre-processing step and a pattern matching step.
Instead of eagerly matching incoming events, the pre-processing
step buffers events in a match window to apply different pruning
techniques (filtering, partitioning, and testing for necessary match
conditions). In the second step, an event pattern matching algorithm, A, is called only for match windows that satisfy the necessary match conditions. This two-phase strategy with a lazy call of
the matching algorithm significantly reduces the number of events
that need to be processed by A as well as the number of calls to
A. This is important since pattern matching algorithms tend to be
expensive in terms of runtime and memory complexity, whereas
the pre-processing can be done very efficiently. We conduct extensive experiments on real-world data with two pattern matching algorithms, one based on automata and one based on join trees. The experimental
results confirm the effectiveness of our strategy for both types of
pattern matching algorithms.
Categories and Subject Descriptors
H.2.4 [Database Management Systems]: Systems - Query Processing

Keywords
Event pattern matching, Window, Optimization

1. INTRODUCTION
In event pattern matching a sequence of input events is matched
against a complex query pattern that specifies constraints on extent,
order, values, and quantification of matching events. Due to its
wide applicability in different application domains, such as financial
services [3, 8, 12, 15], click stream analysis [12, 15], RFID-based
tracking and monitoring [3, 9], RSS feed monitoring [8], health
services [5, 11], and workflow monitoring [10], an increasing amount
of research is being conducted in this field.
In this paper we propose a two-phase strategy that leverages existing
pattern matching algorithms. To illustrate our framework, we use
sequenced event set (SES) pattern matching [5], where a query
specifies a sequence of event sets and a maximal time interval during
which the events must occur. The order of input events that match a
single event set is irrelevant, whereas the order of input events that
match different event sets must correspond to the order of the sets in
the pattern.
Example 1. As a running example, we consider the analysis of
chemotherapy data. A chemotherapy is a treatment for cancer patients
and consists of a sequence of events, such as the administration of
medications and laboratory examinations. Figure 1 shows a sample
event relation, Chemo. The attributes represent patient ID (PID),
event type (L), value (V) with measurement unit (U), and occurrence
time (T) of an event, respectively. For instance, event e1 represents
the administration of 1672.5 mg of Ciclofosfamide to patient 1 on
July 3. To investigate the effect of the medications on the blood
count the following query is issued:
Q1: For each patient, retrieve the events that match
(in any order) one administration of Ciclofosfamide
(C) and one or more administrations of Prednisone (P)
with monotonically increasing values, followed by a
single blood count measurement (B); all events must
occur within fifteen days.

Chemo
       PID   L   V        U      T
e1     1     C   1672.5   mg     3 Jul
e2     1     B   7100     1/µl   4 Jul
e3     1     P   111.5    mg     5 Jul
e4     2     B   10100    1/µl   6 Jul
e5     2     P   88       mg     8 Jul
e6     2     D   84       mgl    9 Jul
e7     2     C   1320     mg     10 Jul
e8     2     P   98       mg     11 Jul
e9     1     P   116.5    mg     12 Jul
e10    2     P   88       mg     15 Jul
e11    1     B   3400     1/µl   17 Jul
e12    2     B   4000     1/µl   18 Jul
e13    2     B   4900     1/µl   19 Jul
e14    1     B   3000     1/µl   22 Jul
Figure 1: Events of Chemotherapy Treatments.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
KDD’12, August 12–16, 2012, Beijing, China.
Copyright 2012 ACM 978-1-4503-1462-6/12/08... $15.00.
We propose a novel pattern matching strategy that consists of
a pre-processing phase and a pattern matching phase. In the pre-processing
phase, incoming events are buffered in the match window
to apply different pruning techniques, such as filtering, partitioning, and testing for necessary match conditions. The aim of
the comparably cheap pre-processing is to reduce the number of
input events that need to be processed in the much more expensive
pattern matching phase.
First, to eliminate irrelevant events that will never participate in
a match we propose a filtering mechanism, which can be done very
efficiently. Filtering out irrelevant events is particularly effective
for the skip-till-next-match and skip-till-any-match event selection
strategies that both skip all irrelevant events [3].
Second, a critical aspect of current solutions is that each input
event is immediately processed by the pattern matching algorithm.
For example, in automaton-based algorithms, each event creates
a new automaton instance since each event can potentially start a
match. This leads to a large number of automaton instances with
a significant amount of state information to be maintained. A first
step to improve this is to start an automaton only when it leaves the
start state [3]. We achieve a substantial reduction of the number
of automata by buffering input events in the match window, W ,
and lazily instantiating an automaton (or, equivalently, calling a pattern
matching algorithm A) only when the buffered events satisfy a set
of necessary match conditions and the window contains all events
that need to be considered for matches starting at the first event in
the window. We adopt summary statistics based on event counting
to check the necessary match conditions efficiently.
Third, although filtering removes irrelevant events, the match
window might still contain events that cannot be part of any match
starting at the first event in W . For instance, in our example all
events in a match must be of the same patient. Thus, all events in
W with a patient ID different from the one of the first event do not
need to be forwarded to A. We achieve this by partitioning incoming events into different windows, based on equality constraints that
are derived from the query specification.
In the second phase, an event pattern matching algorithm, A,
is called for each match window that satisfies the necessary match
conditions. This two-phase strategy with a lazy call of the matching
algorithm significantly reduces the number of events that need to
be processed by A and the number of calls to A. While pattern
matching algorithms tend to be expensive in terms of runtime and
memory complexity, pre-processing can be done very efficiently.
We conduct extensive experiments on two real-world datasets,
using the automaton-based SES pattern matching algorithm and
ZStream, which is based on join trees. The experiments show the
effectiveness of our strategy for both types of pattern matching algorithms.
The technical contributions can be summarized as follows:
• We propose a general, two-phase pattern matching strategy
that can be combined with different pattern matching algorithms.
• We propose an efficient and effective pre-processing step that
buffers input events in the match window and applies various
pruning techniques (filtering, partitioning, and necessary
match conditions). Pre-processing is comparatively cheap and
significantly reduces the number of events that need to be
processed by a more expensive pattern matching algorithm
A as well as the number of calls of A.
• We report the results of an empirical evaluation with an
automaton-based and a join tree-based pattern matching algorithm.
For both algorithms, our strategy shows clear performance
improvements.
The rest of the paper is organized as follows. Section 2 discusses
related work. In Sec. 3 we introduce SES pattern matching. In
Sec. 4 we present our two-phase pattern matching strategy together
with the pruning techniques. An algorithm that implements this
strategy is described in Sec. 5. In Section 6 we report experimental
results. Section 7 concludes the paper and points to future work.
2. RELATED WORK
In the past, various approaches have been proposed for processing
pattern matching queries, most prominently automaton-based
solutions [3, 5, 8, 10] and solutions that use join trees [12]. A
common characteristic of these approaches is that incoming events are
immediately processed by the pattern matching algorithm. In this
paper we advocate a strategy that applies a pre-processing phase
followed by a pattern matching phase. Pre-processing significantly
reduces the number of events that need to be processed in the pattern
matching phase. Our strategy leverages existing pattern matching
algorithms and achieves significant performance improvements
since pre-processing is cheap when compared to the more expensive
pattern matching algorithms.
Sequenced event set (SES) pattern matching [5] allows a query to
specify a sequence of event sets rather than a sequence of
single events. For query processing, an approach that uses
nondeterministic automata has been proposed. An automaton is created
for each input event that matches the beginning of the pattern,
which might lead to orders of magnitude more automaton instances
than matches, and each input event is passed to all active automaton
instances. In this paper we propose an effective pre-processing
phase that significantly reduces the number of events that need to
be processed by the automaton instances as well as the number of
automaton instances.
SASE+ [3] translates a query to a nondeterministic finite state
automaton. Specific optimization techniques include partitioning
and merging of automaton instances during query execution. Similar
to SES automata, SASE+ suffers from the many automaton
instances that are created for each incoming event that matches the
beginning of the pattern.
The publish/subscribe system Cayuga [8] uses a regular-expression-like
algebra and translates a query into a nondeterministic finite
state automaton. Optimization techniques focus on the simultaneous
evaluation of multiple queries using indices. By indexing
automaton instances, an input event is only processed by those
automaton instances that may change state. This is similar to our
filter mechanism, but many active automaton instances need to be
maintained in memory. To reduce the number of transitions that are
evaluated for each incoming event, transitions are indexed on their
condition. Cayuga instantiates an automaton for each input event
that matches the beginning of the query pattern, and therefore
exhibits the same drawbacks as SASE+.
The C-CEP system [10] adopts an automaton-based solution that
checks for pattern query unsatisfiability according to pre-existing
constraints that are evaluated on the input stream. These constraints
originate from the business logic and make it possible to detect, at
runtime, optimal points to terminate automaton instances that will
never produce a match. Different from this work, our pruning techniques
need no extra constraints, but exploit the query pattern to filter out
irrelevant events and to delay the instantiation of automata until
some necessary conditions are satisfied.
The DejaVu system [9] implements a subset of the SQL extension [16]
for pattern matching in MySQL. The query processor is
extended to use finite state automata for the evaluation of pattern
matching queries such that an automaton runs as integral part of
the query plan. Specific optimizations for pattern queries (apart
from the optimizations of MySQL) are not presented.
$\Theta^p$, have the form (v.A φ C), where v.A refers to an attribute of
a matching event, C is a constant, and φ ∈ {=, ≠, <, ≤, >, ≥}
is a comparison operator. Properties can be further partitioned by
variable, $\Theta^p = \Theta^p_{v_1} \cup \cdots \cup \Theta^p_{v_m}$. Each set $\Theta^p_{v_i}$ (also referred
to as properties of variable vi) contains all constraints of the form
(vi.A φ C). Relationships, $\theta \in \Theta^r$, have the form (vi.Ak φ vj.Al)
or (prev(v.A) φ v.A). Finally, τ is the maximal time span within
which all matching events must occur.
ZStream [12] is a cost-based query processor using join trees for
matching sequential patterns that are enriched with sequence,
conjunction, disjunction, negation, and Kleene closure operators.
Operators are arranged in a query tree that processes events from the
leaves to the root. Hashing is used to evaluate equality predicates,
and a cost-based reordering algorithm is used to find an efficient
order of the operators. The algorithm finds an optimal order for
patterns that consist of a sequence of single events; this is not
guaranteed for queries that consist of sets of more than two events since
the reordering does not consider different orders of binary
conjunction operators.
Other research work on event pattern matching includes SQL-TS [15],
an extension of SQL to process complex sequential patterns in
database systems. For efficient query processing, a solution based
on the Knuth-Morris-Pratt string matching algorithm
has been adopted and optimized. The Event Analyzer [11] is a data
warehouse component that supports the analysis of event sequences.
Systems for stream processing include Aurora [2, 6], Borealis [1],
STREAM [4, 13], and TelegraphCQ [7]. Querying in these
systems is limited to a combination of selection, join, and
aggregation; pattern queries are not supported. Based on these
operations, optimization techniques focus on efficient resource
management and load balancing.
3. EVENT PATTERN MATCHING
This section gives an overview of sequenced event set pattern
matching (or simply pattern matching), which we use to illustrate
our framework. The notation is summarized in Table 1.
Example 2. Query Q1 is formulated as pattern P1 =
(⟨{c, p+}, {b}⟩, Θ, 15 d). The first set, B1 = {c, p+}, contains two
variables with the quantifier + applied to p. The second set,
B2 = {b}, contains one variable. The maximal time
span is fifteen days. The constraints, separated into properties and
relationships, are $\Theta = \Theta^p_c \cup \Theta^p_p \cup \Theta^p_b \cup \Theta^r$ with properties
$\Theta^p_c$ = {c.L = 'C'}, $\Theta^p_p$ = {p.L = 'P'}, and $\Theta^p_b$ = {b.L = 'B'}
and relationships $\Theta^r$ = {c.PID = p.PID, p.PID = b.PID,
prev(p.V) < p.V}. Variable c binds a single administration of
Ciclofosfamide (c.L = 'C'), p+ binds one or more administrations
of Prednisone (p.L = 'P'), and b binds a single blood count
measurement (b.L = 'B'). The first two relationships force the matched
events to refer to the same patient. The third relationship requires
the amount of Prednisone to be increasing.
Symbol      Description                     Example
E           event relation                  Chemo in Fig. 1
e           event                           e1 = (1, 'C', 1672.5, mg, 3 Jul)
A           attribute of e                  PID with e1.PID = 1
T           occurrence time of e            e1.T = 3 Jul
$\vec{e}$   chron. ordered event sequence   ⟨e1, e2, e3⟩
P           pattern                         (⟨{c, p+}, {b}⟩, Θ, 15 d)
Bi          set of variables                {c, p+}
v           variable in P                   c
Θ           set of constraints              {c.L = 'C', c.PID = p.PID}
θ           constraint                      c.L = 'C'
τ           maximal time span               15 d
γ           substitution                    {c/⟨e1⟩, p/⟨e3⟩, b/⟨e11⟩}
Table 1: Notation.

An event is represented as a tuple with schema E =
(A1, ..., Al, T), where T is a temporal attribute that stores the
occurrence time of an event. For T we assume a totally ordered time
domain. An event relation, E, is a set of events with a total order
given by attribute T (see relation Chemo in Fig. 1). A chronologically
ordered sequence of events is represented as $\vec{e} = \langle e_1, \ldots, e_n \rangle$
with e1 and en referring to the first and last event in $\vec{e}$, respectively.
To define the matching of a pattern P and an event relation E, we
use a substitution $\gamma = \{v_1/\vec{e}_1, \ldots, v_m/\vec{e}_m\}$.
Each pair $v/\vec{e}$ represents a binding of variable v to a sequence,
$\vec{e} = \langle e_1, \ldots, e_n \rangle$, n > 0, of events in E. A substitution contains
exactly one binding for each variable in P, and an event can appear
in at most one of the bindings. For a constraint θ ∈ Θ, θγ denotes
the instantiation of θ by γ and is obtained from θ by simultaneously
replacing all variables vi by the corresponding event sequence $\vec{e}_i$.
The instantiation of a set of constraints Θ is Θγ = {θ1γ, ..., θlγ}.
The truth value of an instantiation is defined through an
interpretation, I(Θγ):
• $I(\Theta\gamma) \equiv I(\{\theta_1\gamma, \ldots, \theta_l\gamma\}) \equiv I(\theta_1\gamma) \wedge \cdots \wedge I(\theta_l\gamma)$,
• $I(\vec{e}.A \;\phi\; C) \equiv \forall x \in \vec{e}\; (x.A \;\phi\; C)$,
• $I(\vec{e}_i.A_i \;\phi\; \vec{e}_j.A_j) \equiv \forall x \in \vec{e}_i, y \in \vec{e}_j\; (x.A_i \;\phi\; y.A_j)$,
• $I(\mathrm{prev}(\vec{e}.A) \;\phi\; \vec{e}.A) \equiv \forall x_{i-1}, x_i \in \vec{e}\; (x_{i-1}.A \;\phi\; x_i.A)$.
The interpretation of Θγ is the conjunction of the interpretations
of the individual constraints θiγ. If θ is a property, i.e., (v.A φ C),
the attribute A of every event bound to v is compared to constant C.
If θ is a relationship of the form (vi.Ai φ vj.Aj), all combinations of
attributes Ai and Aj of all events bound to vi and vj, respectively,
are compared. If θ is a relationship of the form (prev(v.A) φ v.A),
the attribute A of each pair of consecutive events that are bound to
v is compared.
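The three interpretation rules can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Event` tuple, the `OPS` table, and the function names are hypothetical, and the occurrence time T is assumed to be an integer day of July. The substitution and attribute values are those of Example 3 and Figure 1.

```python
import operator
from typing import NamedTuple

class Event(NamedTuple):
    eid: int   # index i of e_i in Figure 1
    PID: int
    L: str
    V: float
    T: int     # occurrence time as day of July (assumption: integer days)

# Comparison operators phi; the string keys are illustrative.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def holds_property(seq, attr, op, const):
    """I(e.A phi C): every event in the bound sequence is compared to C."""
    return all(OPS[op](getattr(x, attr), const) for x in seq)

def holds_relationship(seq_i, attr_i, op, seq_j, attr_j):
    """I(e_i.A_i phi e_j.A_j): all combinations of bound events are compared."""
    return all(OPS[op](getattr(x, attr_i), getattr(y, attr_j))
               for x in seq_i for y in seq_j)

def holds_prev(seq, attr, op):
    """I(prev(e.A) phi e.A): each pair of consecutive events is compared."""
    return all(OPS[op](getattr(a, attr), getattr(b, attr))
               for a, b in zip(seq, seq[1:]))

# Substitution of Example 3, with values taken from Figure 1.
e5, e7 = Event(5, 2, "P", 88, 8), Event(7, 2, "C", 1320, 10)
e8, e12 = Event(8, 2, "P", 98, 11), Event(12, 2, "B", 4000, 18)
gamma = {"c": [e7], "p": [e5, e8], "b": [e12]}

# I(Theta gamma) is the conjunction over all instantiated constraints of P1.
satisfied = all([
    holds_property(gamma["c"], "L", "=", "C"),
    holds_property(gamma["p"], "L", "=", "P"),
    holds_property(gamma["b"], "L", "=", "B"),
    holds_relationship(gamma["c"], "PID", "=", gamma["p"], "PID"),
    holds_relationship(gamma["p"], "PID", "=", gamma["b"], "PID"),
    holds_prev(gamma["p"], "V", "<"),
])
```

Note that `holds_prev` is vacuously true for a sequence with a single event, which matches the universally quantified definition.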
Definition 1. (Pattern) A pattern, P, is a triple
P = (B, Θ, τ),
where B = ⟨B1, ..., Bk⟩, k ≥ 1, is a sequence of pairwise disjoint
sets of variables of the form v or v+, Θ = {θ1, ..., θl} is a
set of constraints over variables in B1, ..., Bk, and τ is a duration.
A variable, v ∈ Bi, binds a sequence of a single event, ⟨e1⟩.
A quantified variable, v+ ∈ Bi, binds a sequence of one or more
events, ⟨e1, ..., en⟩, n ≥ 1. Θ is a set of constraints over variables
that must be satisfied by matching events. We distinguish between
properties and relationships, i.e., $\Theta = \Theta^p \cup \Theta^r$. Properties, θ ∈
Example 3. Let Θ be the set of constraints of pattern P1 =
(⟨{c, p+}, {b}⟩, Θ, 15 d) and γ = {c/⟨e7⟩, p/⟨e5, e8⟩, b/⟨e12⟩}
be a substitution. The instantiation of Θ (cf. Example 2) with γ is
Θγ = {⟨e7⟩.L = 'C', ⟨e5, e8⟩.L = 'P', ⟨e12⟩.L = 'B',
      ⟨e7⟩.PID = ⟨e5, e8⟩.PID, ⟨e5, e8⟩.PID = ⟨e12⟩.PID,
      prev(⟨e5, e8⟩.V) < ⟨e5, e8⟩.V}.
The interpretation of Θγ is
I(Θγ) ≡ I(⟨e7⟩.L = 'C') ∧ I(⟨e5, e8⟩.L = 'P') ∧ ... ∧
        I(⟨e5, e8⟩.PID = ⟨e12⟩.PID) ∧
        I(prev(⟨e5, e8⟩.V) < ⟨e5, e8⟩.V)
      ≡ e7.L = 'C' ∧ e5.L = 'P' ∧ e8.L = 'P' ∧ ... ∧
        e5.PID = e12.PID ∧ e8.PID = e12.PID ∧
        e5.V < e8.V
      ≡ 'C' = 'C' ∧ 'P' = 'P' ∧ 'P' = 'P' ∧ ... ∧
        2 = 2 ∧ 2 = 2 ∧ 88 < 98
      ≡ true
been proposed in the literature [3]. In the running example of
this paper we use earliest and maximal matches. A match,
$\gamma = \{v_1/\vec{e}_1, \ldots, v_m/\vec{e}_m\}$, is earliest iff each $\vec{e}_i$ starts with the earliest
possible event after the first event in γ. A match γ is maximal iff
no $\vec{e}_i$ can be extended with an event that occurs after the start of γ,
and γ remains still a match.
Definition 2. (Match) Let P = (B, Θ, τ) be a pattern with
B = ⟨B1, ..., Bk⟩ and E be an event relation. A substitution
$\gamma = \{v_1/\vec{e}_1, \ldots, v_m/\vec{e}_m\}$ is a match of P in E iff the following
holds:
  $I(\Theta\gamma)$ is true,                                                      (1)
  $\forall v_i/\vec{e}_i, v_j/\vec{e}_j \in \gamma\; (v_i \in B_i \wedge v_j \in B_{i+1} \rightarrow e_{i_n}.T < e_{j_1}.T)$,   (2)
  $\forall v_i/\vec{e}_i, v_j/\vec{e}_j \in \gamma\; (|e_{i_1}.T - e_{j_n}.T| \leq \tau)$.                 (3)
Condition 1 requires that a match satisfies all constraints in Θ.
Condition 2 ensures that all events in a match that are bound to a
variable in Bi must occur before all events that are bound to any
variable in Bi+1. No order is imposed on events that are bound
to variables in the same Bi, hence any permutation is matched.
Condition 3 constrains all events in a match to occur within a time
span of τ.
Example 4. Figure 2 illustrates a match of P1 in relation Chemo.
Variables c and b bind a single event each, p binds a sequence of
two events. The instantiation Θγ is satisfied (Cond. 1): e7 is a 'C'
event, e5 and e8 are 'P' events, and e12 is a 'B' event; all events
refer to the same patient; and the value of e5 is less than the
value of e8. Events e5, e7, and e8 that match the first event set
occur before e12 that matches the second event set (Cond. 2). The
time span between the first (e5) and last event (e12) is less than
fifteen days (Cond. 3). The complete list of matches is:
{c/⟨e1⟩, p/⟨e3⟩, b/⟨e11⟩},
{c/⟨e1⟩, p/⟨e3, e9⟩, b/⟨e11⟩},
{c/⟨e7⟩, p/⟨e5⟩, b/⟨e12⟩},
{c/⟨e7⟩, p/⟨e5⟩, b/⟨e13⟩},
{c/⟨e7⟩, p/⟨e8⟩, b/⟨e12⟩},
{c/⟨e7⟩, p/⟨e8⟩, b/⟨e13⟩},
{c/⟨e7⟩, p/⟨e5, e8⟩, b/⟨e12⟩},
{c/⟨e7⟩, p/⟨e5, e8⟩, b/⟨e13⟩}.
Example 5. Match γ in Fig. 2 is earliest and maximal. All bound
event sequences start with the earliest possible events and they
cannot be extended and still satisfy the conditions of a match. In
contrast, match {c/⟨e7⟩, p/⟨e5, e8⟩, b/⟨e13⟩} is not earliest, since b
binds e13 instead of the earlier event e12. Match {c/⟨e7⟩, p/⟨e5⟩,
b/⟨e12⟩} is not maximal, since the event sequence bound to p can
be extended with e8 and still remains a match. The complete list of
earliest and maximal matches is:
{c/⟨e1⟩, p/⟨e3, e9⟩, b/⟨e11⟩},
{c/⟨e7⟩, p/⟨e5, e8⟩, b/⟨e12⟩}.
4. LAZY EVALUATION USING MATCH WINDOWS
In this section we propose a lazy evaluation strategy for event
pattern matching. Instead of eagerly matching incoming events, we
advocate a two-phase strategy composed of a pre-processing step
and a pattern matching step (see Fig. 3). First, incoming events are
buffered in the match window, which allows pruning techniques to be
applied to the window, such as filtering, partitioning, and testing
for necessary match conditions. Second, an event pattern matching
algorithm, A, is called that needs to consider only the events in
the match window. The pre-processing phase aims at reducing the
number of events that need to be processed by A.
Figure 3: Two-phase Evaluation Strategy (pre-processing: filtering and
partitioning of E into Window 1, ..., Window k; pattern matching: A is
called for each window).
4.1 Match Window
First, we introduce the match window, which buffers incoming
events and aids a lazy call of A.
Definition 3. (Match Window) Let E be an event relation and P
be a pattern with duration τ. The match window starting at event
ei ∈ E is defined as Wi = ⟨e ∈ E | 0 ≤ e.T − ei.T ≤ τ⟩.
A match window W is a maximal subsequence of E that starts
at an event ei and includes all events that are within duration τ.
Example 6. Figure 4 illustrates two match windows, W1 and
W2, for pattern P1 = (⟨{c, p+}, {b}⟩, Θ, 15 d). The match windows
start at event e1 and e2, respectively. Both match windows
contain 12 events, each with a maximal time span of fifteen days
between the first and the last event.
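Definition 3 translates almost literally into code. The following is a minimal sketch under stated assumptions: the `Event` tuple and the function name `match_window` are hypothetical, T is encoded as an integer day of July, and the data is the Chemo relation of Figure 1.

```python
from typing import NamedTuple

class Event(NamedTuple):
    eid: int   # index i of e_i in Figure 1
    PID: int
    L: str
    V: float
    T: int     # occurrence time as day of July (assumption: integer days)

CHEMO = [Event(*row) for row in [
    (1, 1, "C", 1672.5, 3), (2, 1, "B", 7100, 4), (3, 1, "P", 111.5, 5),
    (4, 2, "B", 10100, 6), (5, 2, "P", 88, 8), (6, 2, "D", 84, 9),
    (7, 2, "C", 1320, 10), (8, 2, "P", 98, 11), (9, 1, "P", 116.5, 12),
    (10, 2, "P", 88, 15), (11, 1, "B", 3400, 17), (12, 2, "B", 4000, 18),
    (13, 2, "B", 4900, 19), (14, 1, "B", 3000, 22),
]]

def match_window(events, start, tau):
    """W_i = <e in E | 0 <= e.T - e_i.T <= tau>, for the start event e_i."""
    return [e for e in events if 0 <= e.T - start.T <= tau]

w1 = match_window(CHEMO, CHEMO[0], tau=15)   # window starting at e1
w2 = match_window(CHEMO, CHEMO[1], tau=15)   # window starting at e2
w1_ids = [e.eid for e in w1]
w2_ids = [e.eid for e in w2]
```

With the Figure 1 data, `w1` covers e1 to e12 and `w2` covers e2 to e13, so both windows contain 12 events, as stated in Example 6.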
Figure 2: Match for Query Q1 (γ = {c/⟨e7⟩, p/⟨e5, e8⟩, b/⟨e12⟩} for
P1 = (⟨{c, p+}, {b}⟩, Θ, 15 d); time span 10 days ≤ 15 days).
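The three conditions of Definition 2 can be checked mechanically for a given substitution. The sketch below does so for pattern P1; it is an illustration only. The `Event` tuple, `p1_constraints`, and `is_match` are hypothetical names, T is assumed to be an integer day of July, and Condition 3 is evaluated as the difference between the latest and the earliest bound event, which is equivalent to the pairwise formulation because bound sequences are chronologically ordered.

```python
from typing import NamedTuple

class Event(NamedTuple):
    eid: int   # index i of e_i in Figure 1
    PID: int
    L: str
    V: float
    T: int     # occurrence time as day of July (assumption: integer days)

E = {row[0]: Event(*row) for row in [
    (1, 1, "C", 1672.5, 3), (2, 1, "B", 7100, 4), (3, 1, "P", 111.5, 5),
    (4, 2, "B", 10100, 6), (5, 2, "P", 88, 8), (6, 2, "D", 84, 9),
    (7, 2, "C", 1320, 10), (8, 2, "P", 98, 11), (9, 1, "P", 116.5, 12),
    (10, 2, "P", 88, 15), (11, 1, "B", 3400, 17), (12, 2, "B", 4000, 18),
    (13, 2, "B", 4900, 19), (14, 1, "B", 3000, 22),
]}

SETS = [["c", "p"], ["b"]]   # B = <{c, p+}, {b}>
TAU = 15                     # maximal time span in days

def p1_constraints(g):
    """Condition 1 for P1: properties, same patient, increasing Prednisone."""
    bound = g["c"] + g["p"] + g["b"]
    return (all(e.L == "C" for e in g["c"])
            and all(e.L == "P" for e in g["p"])
            and all(e.L == "B" for e in g["b"])
            and len({e.PID for e in bound}) == 1
            and all(a.V < b.V for a, b in zip(g["p"], g["p"][1:])))

def is_match(g, sets=SETS, tau=TAU):
    if not p1_constraints(g):                        # Condition 1
        return False
    for left, right in zip(sets, sets[1:]):          # Condition 2
        for vi in left:
            for vj in right:
                if not g[vi][-1].T < g[vj][0].T:
                    return False
    times = [e.T for seq in g.values() for e in seq]
    return max(times) - min(times) <= tau            # Condition 3

fig2 = {"c": [E[7]], "p": [E[5], E[8]], "b": [E[12]]}
too_long = {"c": [E[1]], "p": [E[3]], "b": [E[14]]}
wrong_order = {"c": [E[7]], "p": [E[5], E[8]], "b": [E[4]]}
```

Here `fig2` is the match of Figure 2, `too_long` violates Condition 3 (19 days), and `wrong_order` violates Condition 2 because the blood count e4 precedes the first event set.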
Definition 2 specifies the set of all possible matches of P in
E, but applications might be interested in a subset only. This
can be controlled by different event selection strategies that have
Conceptually, a match window slides over the input stream
event-by-event and covers at each step all events that are within
duration τ from the earliest event in the window. Depending on the
equivalent to ei.L = 'C' ∨ ei.L = 'P' ∨ ei.L = 'B'. Referring
to relation Chemo, we can easily verify that all events except
e6 satisfy this condition. Thus, e6 can safely be filtered out and
need not be buffered. W1 and W2 from Fig. 4 become W1′ =
⟨e1, ..., e5, e7, ..., e12⟩ and W2′ = ⟨e2, ..., e5, e7, ..., e13⟩,
respectively.
Figure 4: Match Window for Pattern P1 (W1 and W2 over
E = e1, e2, ..., e14, ...).
4.3 Candidate Match Windows
While filtering checks for each individual event whether it satisfies
the properties of at least one variable, here we consider all
events in a match window together and formulate conditions that
must be satisfied for a match that starts with the first event in the
match window. Match windows that do not satisfy these conditions
need not be passed to A.
density of the input events along the time axis, the number of tuples
in the match window might vary.
A match window W = ⟨e1, ..., en⟩ is a buffer that collects all
incoming events that need to be considered for a match that starts
at e1. This includes all events until the time span between the first
event e1 and the current input event exceeds τ. At this point an
event pattern matching algorithm is called to compute all matches
that start at e1. Afterwards, e1 is removed from W.
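The buffering behaviour described above can be sketched as a generator that emits a window only once it is complete for its first event. This is a simplified illustration, not the algorithm of Sec. 5: the names `Event` and `lazy_windows` are hypothetical, T is assumed to be an integer day of July, and flushing the remaining windows at the end of a finite input is an added assumption.

```python
from collections import deque
from typing import NamedTuple

class Event(NamedTuple):
    eid: int   # index i of e_i in Figure 1
    PID: int
    L: str
    V: float
    T: int     # occurrence time as day of July (assumption: integer days)

CHEMO = [Event(*row) for row in [
    (1, 1, "C", 1672.5, 3), (2, 1, "B", 7100, 4), (3, 1, "P", 111.5, 5),
    (4, 2, "B", 10100, 6), (5, 2, "P", 88, 8), (6, 2, "D", 84, 9),
    (7, 2, "C", 1320, 10), (8, 2, "P", 98, 11), (9, 1, "P", 116.5, 12),
    (10, 2, "P", 88, 15), (11, 1, "B", 3400, 17), (12, 2, "B", 4000, 18),
    (13, 2, "B", 4900, 19), (14, 1, "B", 3000, 22),
]]

def lazy_windows(stream, tau):
    """Yield the match window of each start event once it is complete."""
    window = deque()
    for event in stream:
        # The window is complete for its first event as soon as the
        # incoming event lies more than tau after that first event.
        while window and event.T - window[0].T > tau:
            yield list(window)      # here A(P, W) would be called
            window.popleft()        # afterwards the first event is removed
        window.append(event)
    while window:                   # end of finite input (assumption)
        yield list(window)
        window.popleft()

windows = list(lazy_windows(CHEMO, tau=15))
first_ids = [e.eid for e in windows[0]]
second_ids = [e.eid for e in windows[1]]
```

A caller would invoke the matching algorithm on each yielded window, so that A runs once per start event instead of once per incoming event.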
Definition 4. (Event Pattern Matching Algorithm) Let W =
⟨e1, ..., en⟩ be a match window for a pattern P. An event pattern
matching algorithm, A(P, W), takes P and W as input and
returns the set of all matches that start at e1.
A can be any event pattern matching algorithm that returns
matches that start at event e1. To make the discussion more concrete,
we assume an automaton-based algorithm, such as the SES
pattern matching algorithm [5]. In the evaluation we also apply our
framework to ZStream, which uses join trees.
Example 7. Consider pattern P1 and the match windows W1
and W2 in Fig. 4. A(P1, W1) returns a single match {c/⟨e1⟩,
p/⟨e3, e9⟩, b/⟨e11⟩} that starts at e1. In contrast, A(P1, W2) for
the second match window returns the empty set since no match
starts at e2.
4.3.1 Necessary Match Conditions
The following theorem specifies necessary conditions for a
match window to contain a match that starts at the first event.
THEOREM 2. (Necessary Match Conditions) Let P = (B, Θ, τ) be a
pattern with B = ⟨B1, ..., Bk⟩ and variables
v1, ..., vm. A match window, W = ⟨e1, ..., en⟩, that contains
a match starting at e1 satisfies the following conditions:
  $|W| \geq m$,                                                            (4)
  $\exists v \in B_1\; (\Theta^p_v(e_1))$,                                              (5)
  $\forall v \in \{v_1, \ldots, v_m\}\; \exists e_i \in W\; (\Theta^p_v(e_i))$,                          (6)
  $\forall v_i \in B_i, v_j \in B_{i+1}\; (\exists e_i, e_j \in W\; (\Theta^p_{v_i}(e_i) \wedge \Theta^p_{v_j}(e_j) \wedge e_i.T < e_j.T))$.   (7)
PROOF: see Appendix A.
Theorem 2 allows match windows to be passed to algorithm A
selectively, thereby reducing the number of calls to A. A match
window that does not satisfy the necessary match conditions cannot
contain a match that starts with the first event in the window.
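Conditions 4 to 7 of Theorem 2 can be evaluated directly on a buffered window. The following sketch does this naively for pattern P1; it is an illustration under stated assumptions, not the summary-statistics based check the paper uses later. The names `Event`, `PROPS`, and `satisfies_necessary_conditions` are hypothetical, T is assumed to be an integer day of July, and each entry of `PROPS` stands for the conjunction of the properties of one variable.

```python
from typing import NamedTuple

class Event(NamedTuple):
    eid: int   # index i of e_i in Figure 1
    PID: int
    L: str
    V: float
    T: int     # occurrence time as day of July (assumption: integer days)

CHEMO = [Event(*row) for row in [
    (1, 1, "C", 1672.5, 3), (2, 1, "B", 7100, 4), (3, 1, "P", 111.5, 5),
    (4, 2, "B", 10100, 6), (5, 2, "P", 88, 8), (6, 2, "D", 84, 9),
    (7, 2, "C", 1320, 10), (8, 2, "P", 98, 11), (9, 1, "P", 116.5, 12),
    (10, 2, "P", 88, 15), (11, 1, "B", 3400, 17), (12, 2, "B", 4000, 18),
    (13, 2, "B", 4900, 19), (14, 1, "B", 3000, 22),
]]

SETS = [["c", "p"], ["b"]]                       # B = <{c, p+}, {b}>
PROPS = {"c": lambda e: e.L == "C",              # properties of c
         "p": lambda e: e.L == "P",              # properties of p
         "b": lambda e: e.L == "B"}              # properties of b

def satisfies_necessary_conditions(window, sets=SETS, props=PROPS):
    variables = [v for s in sets for v in s]
    if len(window) < len(variables):                             # Cond. 4
        return False
    if not any(props[v](window[0]) for v in sets[0]):            # Cond. 5
        return False
    if not all(any(props[v](e) for e in window) for v in variables):  # Cond. 6
        return False
    for left, right in zip(sets, sets[1:]):                      # Cond. 7
        for vi in left:
            for vj in right:
                if not any(props[vi](ei) and props[vj](ej) and ei.T < ej.T
                           for ei in window for ej in window):
                    return False
    return True

# Filtered windows W1' and W2' (e6 removed).
w1f = [e for e in CHEMO if e.eid <= 12 and e.eid != 6]
w2f = [e for e in CHEMO if 2 <= e.eid <= 13 and e.eid != 6]
w1_is_candidate = satisfies_necessary_conditions(w1f)
w2_is_candidate = satisfies_necessary_conditions(w2f)
```

With the Figure 1 data, the window starting at e1 is a candidate, whereas the window starting at e2 fails Condition 5 because e2 is a blood count event and B1 contains only c and p.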
An eager instantiation of automata for each incoming event
might lead to a large number of concurrent instances in memory,
whereas a lazy instantiation strategy reduces this number.
Example 9. Consider P1 = (⟨{c, p+}, {b}⟩, Θ, 15 d) and match
window W1′ = ⟨e1, ..., e5, e7, ..., e12⟩ after filtering out e6. W1′
is a candidate match window since the necessary match conditions
are satisfied: the cardinality of the match window, |W1′| = 11, is
greater than the number of variables, which is 3 (Cond. 4); event e1
satisfies all properties of c in B1 (Cond. 5); e1, e3, and e11 satisfy
all properties of c, p, and b, respectively (Cond. 6), and their order
corresponds to the order of B1 and B2 (Cond. 7). Next, consider
W2′ = ⟨e2, ..., e5, e7, ..., e13⟩, which is not a candidate match
window, since e2 does not satisfy all properties of c or p (Cond. 5).
Hence, W2′ cannot contain a match that starts at e2.
4.2 Filtered Match Windows
While the lazy initialization of automata reduces the number of
concurrently active automaton instances, filtering aims at reducing
the number of events that need to be processed by A. For this
we analyse each incoming event for the properties specified in the
pattern.
Not all events in the match window are candidates for matching
a variable in the query pattern. Events that cannot contribute to
any match can be filtered out before they are passed to A. The
following theorem specifies a condition that must be satisfied by
each event in a match.
4.3.2 Summary Statistics for Match Windows
Verifying Cond. 4 and 5 of the necessary match conditions is
efficient and can be done incrementally as the individual events
arrive. This is not the case for Cond. 6 and 7, for which all events
in the match window must be considered. We maintain summary
statistics over the match window that can be updated incrementally
and allow these conditions to be verified efficiently.
THEOREM 1. Let P = (B, Θ, τ) be a pattern with variables
v1, ..., vm and γ be a match of P in E. Each event e in γ satisfies
$\Theta^p_{v_1}(e) \vee \cdots \vee \Theta^p_{v_m}(e)$.
PROOF: see Appendix A.
From Theorem 1 we can conclude that an event e ∈ E can only
be part of a match if $\Theta^p_{v_1}(e) \vee \cdots \vee \Theta^p_{v_m}(e)$ holds, i.e., if e
satisfies the properties of at least one variable vi.
Therefore, we can filter out events that do not satisfy this condition,
which leads to a reduction of the events that must be processed by
algorithm A. Notice that only a subset of Θ needs to be checked.
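The filter condition of Theorem 1 amounts to a disjunction over the per-variable properties. A minimal sketch for pattern P1 follows; the names `Event`, `PROPS`, and `is_relevant` are hypothetical, T is assumed to be an integer day of July, and each predicate in `PROPS` stands for the conjunction of the properties of one variable.

```python
from typing import NamedTuple

class Event(NamedTuple):
    eid: int   # index i of e_i in Figure 1
    PID: int
    L: str
    V: float
    T: int     # occurrence time as day of July (assumption: integer days)

CHEMO = [Event(*row) for row in [
    (1, 1, "C", 1672.5, 3), (2, 1, "B", 7100, 4), (3, 1, "P", 111.5, 5),
    (4, 2, "B", 10100, 6), (5, 2, "P", 88, 8), (6, 2, "D", 84, 9),
    (7, 2, "C", 1320, 10), (8, 2, "P", 98, 11), (9, 1, "P", 116.5, 12),
    (10, 2, "P", 88, 15), (11, 1, "B", 3400, 17), (12, 2, "B", 4000, 18),
    (13, 2, "B", 4900, 19), (14, 1, "B", 3000, 22),
]]

# Properties of the variables of P1; relationships are not needed here.
PROPS = {"c": lambda e: e.L == "C",
         "p": lambda e: e.L == "P",
         "b": lambda e: e.L == "B"}

def is_relevant(event, props=PROPS):
    """True iff the event satisfies the properties of at least one variable."""
    return any(pred(event) for pred in props.values())

kept = [e for e in CHEMO if is_relevant(e)]
dropped_ids = [e.eid for e in CHEMO if not is_relevant(e)]
```

Only the properties are evaluated, so the check touches a single event at a time and can run before the event is buffered.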
Example 8. Consider pattern P1 = (⟨{c, p+}, {b}⟩, Θ, 15 d)
and match window W1 in Fig. 4. All events ei that contribute
to a match must satisfy $\Theta^p_c(e_i) \vee \Theta^p_p(e_i) \vee \Theta^p_b(e_i)$, which is
Definition 5. (Summary Statistics) Let W be a match window
and P = (B, Θ, τ) be a pattern with B = ⟨B1, ..., Bk⟩
and variables v1, ..., vm. A summary statistics,
S = ⟨(v1, cnt1), ..., (vm, cntm)⟩, is a set of variable-counter pairs
such that for each vi ∈ Bi
  $cnt_i = |\{e_i \in W \mid \Theta^p_{v_i}(e_i) \wedge \forall v_j \in B_{i-1}\; \exists e_j \in W\; (\Theta^p_{v_j}(e_j) \wedge e_j.T < e_i.T)\}|$.   (8)
The summary statistics maintains a counter for each variable.
The counter for a variable vi ∈ Bi records the number of events
in W that (1) satisfy all properties of vi and (2) chronologically
follow events in W that satisfy the properties of the variables in the
previous set Bi−1 .
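The counters of Definition 5 can be computed directly from their definition, which is useful as a reference when reasoning about the incremental maintenance. The sketch below is such a direct (non-incremental) evaluation for pattern P1. The names `Event`, `PROPS`, and `summary_statistics` are hypothetical, and T is assumed to be an integer day of July.

```python
from typing import NamedTuple

class Event(NamedTuple):
    eid: int   # index i of e_i in Figure 1
    PID: int
    L: str
    V: float
    T: int     # occurrence time as day of July (assumption: integer days)

E = {row[0]: Event(*row) for row in [
    (1, 1, "C", 1672.5, 3), (2, 1, "B", 7100, 4), (3, 1, "P", 111.5, 5),
    (9, 1, "P", 116.5, 12), (11, 1, "B", 3400, 17),
]}  # events of patient 1 from Figure 1 that are used below

SETS = [["c", "p"], ["b"]]                       # B = <{c, p+}, {b}>
PROPS = {"c": lambda e: e.L == "C",
         "p": lambda e: e.L == "P",
         "b": lambda e: e.L == "B"}

def summary_statistics(window, sets=SETS, props=PROPS):
    """Evaluate Eq. 8 for every variable; returns {variable: counter}."""
    counts = {}
    for idx, variables in enumerate(sets):
        previous = sets[idx - 1] if idx > 0 else []   # no B_0 for the first set
        for v in variables:
            counts[v] = sum(
                1 for e in window
                if props[v](e)
                and all(any(props[u](x) and x.T < e.T for x in window)
                        for u in previous))
    return counts

small = summary_statistics([E[1], E[2], E[3]])
full = summary_statistics([E[1], E[2], E[3], E[9], E[11]])
all_positive = all(cnt > 0 for cnt in full.values())
```

For the window ⟨e1, e2, e3⟩ the blood count e2 is not counted, because no Prednisone event precedes it; once e9 and e11 are added, every counter is positive.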
Example 10. Consider P1 = (⟨{c, p+}, {b}⟩, Θ, 15 d) and
match window W = ⟨e1, e2, e3⟩. The summary statistics for W
is S = ⟨(c, 1), (p, 1), (b, 0)⟩. Events e1 and e3 satisfy all
properties of c and p, respectively, and there is no previous set Bi−1 in
P1. Although e2 satisfies all properties of variable b, its counter is
zero since there is no event in W that occurs before e2 and satisfies
the properties of p.
relationships. If all events in a match must have identical values in
one or more attributes, the corresponding equality relationships in
Θ can be used to partition a match window W into a set of match
windows W1, ..., Wn.
To formalize the partitioning of match windows based on equality
relationships, we first define the transitive closure of these
relationships. Let P = (B, Θ, τ) be a pattern with variables
v1, ..., vm and $\Theta_{eq} = \{\theta \mid \theta \in \Theta \wedge \theta \equiv v_i.A_i = v_j.A_i\}$ be
the set of all relationships with equality constraints over a single
attribute. The transitive closure of $\Theta_{eq}$ is defined as
  $\Theta^+_{eq} = \bigcup_{l \in \mathbb{N}} \Theta^l_{eq}$,
where $\Theta^0_{eq} = \Theta_{eq}$ and
  $\Theta^l_{eq} = \{v_i.A_i = v_j.A_i \mid \exists v_k\, ((v_i.A_i = v_k.A_i) \in \Theta^{l-1}_{eq} \wedge (v_k.A_i = v_j.A_i) \in \Theta^{l-1}_{eq})\}$.
THEOREM 3. Let P be a pattern with variables v1, ..., vm, W
be a match window, and S = ⟨(v1, cnt1), ..., (vm, cntm)⟩ be a
summary statistics. The necessary match conditions 6 and 7 are
satisfied if cnt1 > 0 ∧ ... ∧ cntm > 0.
PROOF: see Appendix A.
The summary statistics is incrementally updated when input
events enter or exit W. When a new event e is added, the properties
are evaluated. If e satisfies all properties of a variable v ∈ Bi and
the counters of all variables from the previous set Bi−1 are greater
than zero (i.e., W contains a matching event for these variables),
cnti is incremented by one. When the first event, e, is dequeued
from the match window, the counter for each variable whose
properties are satisfied by e is decremented. If the counter for a variable
v ∈ Bi becomes zero, the counters of all variables in the sets Bj,
j > i, are reset to zero.
Definition 6. (Partitioned Match Windows) Let W be a match
window for a pattern P = (B, Θ, τ) with variables v1, ..., vm.
Furthermore, let $X = \{A \mid \forall i, j \in [1, m]\, ((v_i.A = v_j.A) \in \Theta^+_{eq})\}$.
The set of attributes X partitions W into W1, ..., Wn such
that for each Wi the following holds:
  $\forall e_i, e_j \in W_i\; \forall A \in X\; (e_i.A = e_j.A)$.
X is a maximal set of partitioning attributes that have to assume
identical values for all events in a match. Accordingly, all events in
a partition have identical values in the attributes X.
Example 12. Consider $P_1 = (\langle \{c, p^+\}, \{b\} \rangle, \Theta, 15\,\mathrm{d})$. The transitive closure of the equality relationships is $\Theta_{eq}^{+} = \{c.PID = p.PID,\; c.PID = b.PID,\; p.PID = b.PID,\; c.PID = c.PID,\; p.PID = p.PID,\; b.PID = b.PID\}$. The set of partitioning attributes is $X = \{PID\}$. That is, all events in a match must have the same patient ID. Partitioning $W_1' = \langle e_1, \ldots, e_5, e_7, \ldots, e_{12} \rangle$ by $PID$ results in two partitions, $\langle e_1, e_2, e_3, e_9, e_{11} \rangle$ containing all events of patient 1 and $\langle e_5, e_7, e_8, e_{10}, e_{12} \rangle$ with all events of patient 2.
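Definition 6 and Example 12 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: equality relationships are encoded as hypothetical (variable, variable, attribute) triples, equality is treated as symmetric when the closure is built, and events are plain dictionaries. None of these names are taken from the authors' implementation.

```python
from itertools import product


def transitive_closure(theta_eq):
    """Closure of equality relationships given as (v_i, v_j, attribute) triples.

    Assumption: v_i.A = v_j.A is symmetric, so the reverse triples are added first.
    """
    closure = set(theta_eq) | {(b, a, attr) for a, b, attr in theta_eq}
    while True:
        # Compose pairs that share the middle variable and the attribute.
        new = {(a, d, attr1)
               for (a, b, attr1), (c, d, attr2) in product(closure, closure)
               if b == c and attr1 == attr2}
        if new <= closure:
            return closure
        closure |= new


def partitioning_attributes(variables, theta_eq):
    """Attributes A with (v_i.A = v_j.A) in the closure for all variable pairs."""
    closure = transitive_closure(theta_eq)
    attrs = {attr for _, _, attr in closure}
    return {attr for attr in attrs
            if all((u, v, attr) in closure for u in variables for v in variables)}


def partition(window, attrs):
    """Split a match window into partitions with identical values in attrs."""
    parts = {}
    for event in window:
        key = tuple(event[a] for a in sorted(attrs))
        parts.setdefault(key, []).append(event)
    return list(parts.values())


# Hypothetical input resembling Example 12: c.PID = p.PID and p.PID = b.PID.
theta_eq = {("c", "p", "PID"), ("p", "b", "PID")}
X = partitioning_attributes(["c", "p", "b"], theta_eq)
```

Grouping by a tuple of attribute values keeps each partition in chronological order, because events are appended in the order in which they appear in the window.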
Example 11. Figure 5 shows the summary statistics for a match window $W$ that buffers events of patient 1. The last four columns show the fulfillment of the necessary match conditions. Event $e_1$ increments the counter for $c$. Event $e_2$ matches $b$, but has no effect on the statistics, since the counter for $p$ is zero. After reading $e_{11}$, all counters are greater than zero, and the necessary match conditions are satisfied. Since the next event exceeds the duration of 15 days, the pattern matching algorithm is called with $W$. Next, $e_1$ is removed from $W$ and the summary statistics is updated. Since the counter for $c$ becomes zero, the counter for $b$ is reset to zero.
W                       S                          Match conditions
                                                   (4)  (5)  (6)  (7)
⟨e1⟩                    ⟨(c, 1), (p, 0), (b, 0)⟩   F    T    F    F
⟨e1, e2⟩                ⟨(c, 1), (p, 0), (b, 0)⟩   F    T    F    F
⟨e1, e2, e3⟩            ⟨(c, 1), (p, 1), (b, 0)⟩   T    T    F    F
⟨e1, e2, e3, e9⟩        ⟨(c, 1), (p, 2), (b, 0)⟩   T    T    F    F
⟨e1, e2, e3, e9, e11⟩   ⟨(c, 1), (p, 2), (b, 1)⟩   T    T    T    T
⟨e2, e3, e9, e11⟩       ⟨(c, 0), (p, 2), (b, 0)⟩   T    F    F    F

Partitioning input events allows two types of optimizations. First, since each partition $W_i$ can be processed independently by algorithm A, parallel event processing is possible. Second, after partitioning, the relationships $v_i.A_i = v_j.A_i$ for all $A_i \in X$ no longer need to be considered by A. They can be removed from $\Theta$ before calling A to reduce the number of conditions that need to be verified in A.
5. ALGORITHM
Algorithm 1 implements the two-phase pattern matching strategy
from the previous section. The input parameters are a pattern P , an
event relation E, and an event pattern matching algorithm A. The
algorithm returns the set of all matches.
In the initialization phase, an empty hash table H and result set R
are created, and the set X of partitioning attributes is determined,
which serves as a key for the hash table. H stores triples of the
form (K, W, S), where K = X are the grouping attributes, W is a
match window, and S is the summary statistics over W .
The main loop iterates over all input events in chronological order. Events that do not satisfy the properties of any variable are
immediately filtered out. For events that pass the filter, the corresponding entry in H is retrieved; if no such entry exists, a new entry
is added with key set to the values of the partitioning attributes of
the current event, an empty match window, and a summary statistics with all counters set to zero. Before inserting the current event
e in W , the algorithm ensures that W does not exceed τ when the
Figure 5: Statistics for $P_1 = (\langle \{c, p^+\}, \{b\} \rangle, \Theta, 15\,\mathrm{d})$.
The summary statistics provides an approximate solution and
might produce false positives (e.g. due to relationships between
variables that are not encoded in the statistics). We show experimentally that the number of match candidates can be significantly
reduced.
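The incremental maintenance of the summary statistics described in this section can be sketched in Python as follows. This is a minimal illustration, not the authors' C implementation. The class name, the `matches` predicate, and the per-event bookkeeping of which counters an event incremented are assumptions; the bookkeeping is introduced here so that a dequeued event only undoes its own contribution.

```python
from collections import deque


class SummaryStatistics:
    """Counters per variable for a match window (hypothetical helper class)."""

    def __init__(self, event_sets, matches):
        # event_sets: e.g. [["c", "p"], ["b"]] for <{c, p+}, {b}>
        # matches(variable, event) -> True if the event satisfies the
        # properties of the variable (assumed to be supplied by the caller).
        self.event_sets = event_sets
        self.matches = matches
        self.set_index = {v: i for i, vs in enumerate(event_sets) for v in vs}
        self.cnt = {v: 0 for v in self.set_index}
        self.window = deque()  # entries: (event, variables counted for it)

    def enqueue(self, event):
        counted = set()
        for v, i in self.set_index.items():
            if not self.matches(v, event):
                continue
            # Count only if every variable of the previous set has a match.
            if i == 0 or all(self.cnt[u] > 0 for u in self.event_sets[i - 1]):
                counted.add(v)
        for v in counted:
            self.cnt[v] += 1
        self.window.append((event, counted))

    def dequeue(self):
        event, counted = self.window.popleft()
        emptied = []
        for v in counted:
            self.cnt[v] -= 1
            if self.cnt[v] == 0:
                emptied.append(v)
        if emptied:
            # Reset the counters of all variables in later event sets.
            first = min(self.set_index[v] for v in emptied)
            reset = {u for vs in self.event_sets[first + 1:] for u in vs}
            for u in reset:
                self.cnt[u] = 0
            for _, flags in self.window:
                flags.difference_update(reset)
        return event

    def all_positive(self):
        """True if every counter is greater than zero (cf. Theorem 3)."""
        return all(c > 0 for c in self.cnt.values())
```

Collecting the variables to increment before touching any counter ensures that a single event cannot satisfy the "previous set" requirement for itself.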
4.4 Partitioned Match Windows
The last optimization aims to further reduce the number of events
in a match window that are processed by algorithm A. The core
idea is to remove events that, while fulfilling the properties of a
variable, can be excluded from a match since they violate some
SES pattern matching algorithm [5] that finds earliest and maximal matches and on the join-based ZStream pattern matching algorithm [12] that finds all matches; (2) to show the scalability of the
match window in the pattern, and (3) to show the scalability of the
match window in the data.
Algorithm 1: Match(P, E, A)
Input: pattern P = (⟨B1, ..., Bk⟩, Θ, τ) with m variables, event relation E, event pattern matching algorithm A
Output: set of matches
  H ← ∅; R ← ∅;
  Compute set X = (X1, ..., Xk) of partitioning attributes;
  foreach e ∈ E ordered by T do
      if Θ^p_{v1}(e) or ... or Θ^p_{vm}(e) then
          K′ ← (e.X1, ..., e.Xk);
          if ∄(K, W, S) ∈ H with K = K′ then
              Add (K′, ∅, ⟨(v1, 0), ..., (vm, 0)⟩) to H;
          Let (K, W, S) be the entry in H with K = K′;
          while |W| > 0 and e.T − W[1].T > τ do
              // match window exceeds τ with event e
              if |W| ≥ m and ∃v ∈ B1 (Θ^p_v(W[1])) and ∀(v, cnt) ∈ S (cnt > 0) then
                  R ← R ∪ A(W, P);
              dequeue(W); Update S;
          enqueue(W, e); Update S;
  foreach (K, W, S) ∈ H do
      while |W| ≥ m and ∃v ∈ B1 (Θ^p_v(W[1])) and ∀(v, cnt) ∈ S (cnt > 0) do
          R ← R ∪ A(W, P);
          dequeue(W);
          Update S;
  return R;

6.1 Setup and Data
We implemented our two-phase matching strategy with the SES algorithm and ZStream algorithm in C. The event relation is stored in an Oracle database, Enterprise Edition 11.1, which is accessed over the OCI API. The experiments were performed on a PC with four AMD Opteron 285 processors with 1.8 and 2.6 GHz and 16 GB memory, on which a 64-bit Linux 2.6.32 is installed.
We use two different real-world data sets. The Onco data set contains 341055 chemotherapy events from the Department of Haematology at the Hospital Meran-Merano. The NYSE data set contains 1M share trades in stock markets over 34 hours [14].
In the experiments, we analyze the scalability by varying the number of variables, the length of τ, and the size of the event relation. For the experiments with a varying number of variables, we use the following two query patterns:
• $P_{seq} = (\langle \{v_1\}, \{v_2\}, \ldots, \{v_k\} \rangle, \Theta, \tau)$
• $P_{set} = (\langle \{v_1, v_2, \ldots, v_k\} \rangle, \Theta, \tau)$
The number of variables k varies from 1 to 10. For each step, we randomly choose ten distinct patterns out of all possible patterns with k variables, and we take the average of the measured throughput. The duration τ is 10 days for the Onco data set and 20 ms for the NYSE data set. $P_{seq}$ is a sequential pattern where the matching events must occur in the same order as the variables in the pattern. In $P_{set}$ the matching events may occur in any order.
For the experiments with a varying length of τ, we use the following patterns.
event is added. If e’s distance from the first event in W exceeds τ ,
the necessary match conditions are verified; if they are satisfied, algorithm A is called. Afterwards, the first event is removed from W
and the summary statistics is updated. This step is repeated until e
fits into W and can be added; the summary statistics is updated.
After reading all input events, the remaining events in the match
windows need to be processed. As long as the necessary match
conditions are satisfied, A is called and the first event is removed.
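The control flow of the two-phase strategy can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' implementation: events are dictionaries with a numeric timestamp `T`, `matches` and `algorithm` are caller-supplied stand-ins for the property test and for A, and, to keep the sketch short, the counters are recomputed by scanning the window instead of being maintained incrementally.

```python
from collections import deque


def match(event_sets, tau, events, matches, part_attrs, algorithm):
    """Two-phase matching: filter, partition, test conditions, then call A.

    event_sets: e.g. [["c", "p"], ["b"]]; events must be sorted by e["T"].
    algorithm(window) returns a list of matches (stand-in for A).
    """
    variables = [v for vs in event_sets for v in vs]
    m = len(variables)
    tables = {}   # hash table H: partition key -> match window
    result = []   # result set R

    def conditions_hold(window):
        if len(window) < m:                                   # condition 4
            return False
        if not any(matches(v, window[0]) for v in event_sets[0]):
            return False                                      # condition 5
        cnt = dict.fromkeys(variables, 0)
        for e in window:                                      # conditions 6, 7
            hits = [v for i, vs in enumerate(event_sets) for v in vs
                    if matches(v, e)
                    and (i == 0 or all(cnt[u] > 0 for u in event_sets[i - 1]))]
            for v in hits:
                cnt[v] += 1
        return all(c > 0 for c in cnt.values())

    for e in events:
        if not any(matches(v, e) for v in variables):
            continue                                          # filtering
        window = tables.setdefault(tuple(e[a] for a in part_attrs), deque())
        # Shrink the window until e fits within duration tau.
        while window and e["T"] - window[0]["T"] > tau:
            if conditions_hold(window):
                result.extend(algorithm(list(window)))
            window.popleft()
        window.append(e)

    # Process the events that remain after the input is exhausted.
    for window in tables.values():
        while conditions_hold(window):
            result.extend(algorithm(list(window)))
            window.popleft()
    return result
```

Rescanning the window costs time linear in its size for every check; an incrementally maintained summary statistics avoids this cost.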
The summary statistics can efficiently be implemented using bit
vectors. Let P = (B, Θ, τ ) be a pattern with B = hB1 , . . . , Bk i
and variables v1 , . . . , vm . We use a bit vector, ēi , of length m to
mask each incoming event ei . Each position in the bit vector corresponds to a variable in P . If a bit is 1, the event satisfies all
properties of the corresponding variable; otherwise it is set to 0.
Similarly, we use a bit vector, s̄, where each position corresponds
to a counter in the summary statistics. A 0 bit means that the corresponding counter is greater than zero; otherwise the counter is zero.
Finally, we specify a bit mask, b̄i , for each Bi . It has all bits set
to 0 except for the bits that represent the positions of the variables
in the previous event set, Bi−1 . Using these bit vectors in combination allows an efficient update and querying of the summary
statistics.
• $P_{onco} = (\langle \{c, d\}, \{p^+, r^+\}, \{b, h\} \rangle, \Theta_{onco}, \tau)$
• $P_{nyse} = (\langle \{a^+, g^+\}, \{i^+\} \rangle, \Theta_{nyse}, \tau)$
$P_{onco}$ roughly resembles a cycle of a chemotherapy treatment: b matches white blood cell counts, h matches red blood cell counts, and the remaining variables match different medication administrations. Partitioning is applied on the patient ID. $P_{nyse}$ specifies that a, g, and i match share trades of Apple, Google, and IBM, respectively. The price of the Apple and Google shares must be increasing between trades (i.e., prev(a.price) < a.price), followed by a decrease of IBM shares (i.e., prev(i.price) > i.price). Apple and Google share trades can occur interleaved. We vary τ from 5 days to 50 days for the Onco data set and from 10 to 100 ms for the NYSE data set.
For the experiments with a varying size of E, we use the pattern $P_{nyse}$ and vary the size of E from 100 to 1000000 events, increasing it by a factor of 10 at each step.
Example 13. Consider $P_1 = (\langle \{c, p^+\}, \{b\} \rangle, \Theta, 15\,\mathrm{d})$ and the events $e_1$, $e_2$, and $e_3$. Event $e_1$ satisfies all properties of variable c, but not those of p and b. The corresponding bit vector is $\bar{e}_1 = 100$. For the events $e_2$ and $e_3$ the bit vectors are, respectively, $\bar{e}_2 = 001$ and $\bar{e}_3 = 010$. The bit vector for the summary statistics is $\bar{s} = 111$ at the beginning and $001$ after processing $e_1$, $e_2$, and $e_3$ (cf. Fig. 5). The bit mask for event set $B_1$ is $\bar{b}_1 = 000$; for $B_2$ it is $\bar{b}_2 = 110$.
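One way the bit vectors of Example 13 might be combined is sketched below in Python, using integers as bit vectors. The text does not spell out the exact bit operations, so the update rule, the function names, and the additional per-set masks `SET_MASKS` are assumptions. The sketch covers only the arrival of an event; removing events would still require the counters themselves.

```python
# Bit positions follow Example 13: c = 100, p = 010, b = 001.
C, P, B = 0b100, 0b010, 0b001

# Assumed helper masks: the variables contained in each event set B_i.
SET_MASKS = [C | P, B]          # B1 = {c, p+}, B2 = {b}
# Masks from the text: the variables of the previous event set B_{i-1}.
PREV_MASKS = [0b000, 0b110]     # b1 = 000, b2 = 110


def enqueue_update(s, e_mask, set_masks=SET_MASKS, prev_masks=PREV_MASKS):
    """Return the new statistics vector after an event with mask e_mask arrives.

    In s, a 1 bit means that the corresponding counter is zero.
    """
    cleared = 0
    for set_mask, prev_mask in zip(set_masks, prev_masks):
        # All counters of the previous set are non-zero iff no masked bit is set.
        if (s & prev_mask) == 0:
            cleared |= e_mask & set_mask
    return s & ~cleared


def conditions_6_and_7_hold(s):
    """All counters are greater than zero iff no bit of s is set."""
    return s == 0


s = 0b111                       # all counters are zero at the beginning
for e_mask in (C, B, P):        # masks of e1, e2, e3 from Example 13
    s = enqueue_update(s, e_mask)
```

The test against `prev_mask` uses the vector as it was before the current event is applied, so an event cannot unlock a later event set for itself.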
6. EXPERIMENTS
In this section, we report the results of an empirical evaluation using real-world data. The experiments have three purposes: (1) to show the effect of the match window on the automaton-based
6.2 Scalability in the Pattern
In this experiment, we study the scalability of our framework by varying the number of variables and the length of τ in the pattern, respectively. Our hypothesis is that match windows significantly increase the throughput.
We first run the SES and ZStream algorithms with a plain match window without any pruning techniques (baseline), and then apply filtering (f), partitioning (p), and testing for necessary match conditions (c). The performance of the optimizations is measured in terms of throughput increase with respect to the baseline algorithm.
Figures 6 and 7 show the factor of the throughput increase of the various pruning techniques relative to the baseline algorithm.
Figure 6: Varying the Pattern with SES. Panels: (a) Pseq on Onco, (b) Pset on Onco, (c) Pseq on NYSE, (d) Pset on NYSE, (e) Ponco, (f) Pnyse. Each panel plots the factor of throughput increase against the number of event variables (a-d), τ in days (e), or τ in ms (f). Curves: f, fp, fpc on Onco; f, fc on NYSE.
Figure 7: Varying the Pattern with ZStream. Panels: (a) Pseq on Onco, (b) Pset on Onco, (c) Pseq on NYSE, (d) Pset on NYSE, (e) Ponco, (f) Pnyse. Each panel plots the factor of throughput increase against the number of event variables (a-d), τ in days (e), or τ in ms (f). Curves: f, fp, fpc on Onco; f, fc on NYSE.
The first observation is that the match windows achieve a significant improvement over the baseline solution, up to several orders of magnitude for Pset.
Second, the optimizations are more effective for SES than for ZStream. The reasons are that ZStream implicitly filters new events when they are added to the join tree, and it uses hash tables for the joins on equality predicates. However, filtering, partitioning, and testing for necessary conditions together still increase the throughput by a factor of three to four.
The third observation is that for SES the optimizations are much more effective for Pset (more than two orders of magnitude) than for Pseq. With Pset and without any pruning techniques, every event that matches any variable starts an automaton instance, yielding a large number of automaton instances, many of which do not lead to a match. With Pseq, only events that match the first variable in the pattern start an automaton instance. The match windows therefore have a larger potential to increase the throughput of Pset than of Pseq.
The fourth observation is that the throughput for Ponco increases with increasing duration τ for SES. Events in the Onco data set roughly follow a chemotherapy protocol. Individual chemotherapy treatments are shifted in time, hence the patients are not distributed uniformly over the data. With increasing τ, events of more patients are encountered, hence more partitions of match windows can be created. SES can fully exploit the increasing number of partitions.
6.3 Scalability in the Data
In this experiment, we study the scalability of our framework by varying the number of events in the event relation. Our hypothesis is that the throughput increase remains constant with increasing size of the event relation. Again, we first run the baseline SES and ZStream algorithms on NYSE, and then apply filtering (f) and testing for necessary match conditions (c).
Figure 8 shows the factor of the throughput increase relative to the baseline algorithm. As expected, the throughput increase for our two-phase matching strategy when the pruning techniques are applied remains roughly constant. The reason is that neither the filtering nor the testing for necessary match conditions depends on the length of the event relation. Notice the logarithmic scale on the horizontal axis.
Figure 8: Varying the Data Size. Panels: (a) Pnyse with SES, (b) Pnyse with ZStream. Each panel plots the factor of throughput increase (curves f and fc) against |E| [# of events], from 100 to 1M.
7. CONCLUSION
In this paper we presented a novel pattern matching strategy that consists of a pre-processing phase and a pattern matching phase. In the pre-processing phase, incoming events are buffered in a match window, which allows different pruning techniques to be applied, such as filtering, partitioning, and testing for necessary match conditions, and enables a lazy evaluation of a pattern matching algorithm. We conducted extensive experiments using two real-world data sets and two existing event pattern matching algorithms. The results show that our framework significantly increases the throughput for both algorithms.
Future work is possible in various directions, including the investigation of additional runtime optimizations to efficiently support other event selection strategies (e.g. contiguous matches) as well as the exploration of space optimizations.
8. ACKNOWLEDGEMENTS
This work has been done within the framework of the MEDAN project, which is funded by the Hospital of Meran-Merano.
9. REFERENCES
[1] D. J. Abadi, Y. Ahmad, M. Balazinska, M. Cherniack, J. hyon Hwang, W. Lindner, A. S. Maskey, E. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The design of the Borealis stream processing engine. In CIDR, pages 277–289, 2005.
[2] D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2):120–139, 2003.
[3] J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman. Efficient pattern matching over event streams. In SIGMOD, pages 147–160, 2008.
[4] A. Arasu, S. Babu, and J. Widom. CQL: A language for continuous queries over streams and relations. In DBPL, pages 1–19, 2003.
[5] B. Cadonna, J. Gamper, and M. H. Böhlen. Sequenced event set pattern matching. In EDBT, pages 33–44, 2011.
[6] D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams: a new class of data management applications. In VLDB, pages 215–226, 2002.
[7] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. A. Shah. Telegraphcq: Continuous dataflow processing for an uncertain world. In CIDR, 2003.
[8] A. J. Demers, J. Gehrke, M. Hong, M. Riedewald, and W. M. White. Towards expressive publish/subscribe systems. In EDBT, pages 627–644, 2006.
[9] N. Dindar, B. Güç, P. Lau, A. Ozal, M. Soner, and N. Tatbul. Dejavu: declarative pattern matching over live and archived streams of events. In SIGMOD, pages 1023–1026, 2009.
[10] L. Ding, S. Chen, E. A. Rundensteiner, J. Tatemura, W.-P. Hsiung, and K. S. Candan. Runtime semantic query optimization for event stream processing. In ICDE, pages 676–685, 2008.
[11] L. Harada and Y. Hotta. Order checking in a CPOE using event analyzer. In CIKM, pages 549–555, 2005.
[12] Y. Mei and S. Madden. Zstream: A cost-based query processor for adaptively detecting composite events. In SIGMOD, pages 193–206, 2009.
[13] R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. S. Manku, C. Olston, J. Rosenstein, and R. Varma. Query processing, approximation, and resource management in a data stream management system. In CIDR, 2003.
[14] NYSE. http://www.nyxdata.com/.
[15] R. Sadri, C. Zaniolo, A. Zarkesh, and J. Adibi. Expressing and optimizing sequence queries in database systems. ACM Trans. Database Syst., 29(2):282–318, 2004.
[16] F. Zemke, A. Witkowski, M. Cherniak, and L. Colby. Pattern matching in sequences of rows. Technical report, 2007.
APPENDIX
A. PROOFS
PROOF. (Theorem 1) The constraints $\Theta$ consist of relationships and properties, i.e., $\Theta = \Theta^r \cup \Theta^p$. The properties can further be partitioned by variable, i.e., $\Theta^p = \Theta^p_{v_1} \cup \cdots \cup \Theta^p_{v_m}$. By Def. 2, a match $\gamma$ satisfies $I(\Theta\gamma)$. This can be transformed into $I(\Theta^p_{v_1}\gamma) \wedge \cdots \wedge I(\Theta^p_{v_m}\gamma) \wedge I(\Theta^r\gamma)$. Since $I(\Theta^p_{v_i}\gamma) \equiv I(\Theta^p_{v_i}\{v_i/\langle e_1, \ldots, e_n \rangle\}) \equiv I(\Theta^p_{v_i}\{v_i/\langle e_1 \rangle\}) \wedge \cdots \wedge I(\Theta^p_{v_i}\{v_i/\langle e_n \rangle\})$, each event in a match satisfies all properties of the variable it is bound to.
PROOF. (Theorem 2) Condition 4: Since a given event can only appear once in a match, $W$ needs to contain at least as many input events as variables in the query pattern. Condition 5: By Def. 2, the earliest event in a match is bound to a variable in $B_1$. Thus, the first event in $W$ must satisfy the properties $\Theta^p_v$ of a variable $v \in B_1$. Condition 6: By Def. 2, each variable $v_1, \ldots, v_m$ is bound to at least one input event that satisfies the properties $\Theta^p_{v_i}$. Hence, $W$ must contain such an event for each variable. Condition 7: By Def. 2, input events that are bound to variables $v_i$, $v_j$ in consecutive $B_i$s need to occur in the same chronological order as the $B_i$s in $P$.
PROOF. (Theorem 3) $cnt_i$ is greater than zero iff there exists at least one event in the set specified by Cond. 8. Therefore, $cnt_1 > 0 \wedge \cdots \wedge cnt_m > 0$ is equivalent to
\[
\begin{aligned}
&\forall v \in B_1 \, \exists e \in W \, (\Theta^p_v(e)) \;\wedge \\
&\forall v \in B_2, v' \in B_1 \, \exists e, e' \in W \, (\Theta^p_v(e) \wedge \Theta^p_{v'}(e') \wedge e'.T < e.T) \;\wedge \cdots \wedge \\
&\forall v \in B_k, v' \in B_{k-1} \, \exists e, e' \in W \, (\Theta^p_v(e) \wedge \Theta^p_{v'}(e') \wedge e'.T < e.T).
\end{aligned}
\]
By applying the rule $\forall x (A \wedge B) \equiv \forall x (A) \wedge \forall x (A \wedge B)$ to all quantified conjuncts we get
\[
\begin{aligned}
&\forall v \in B_1 \, \exists e \in W \, (\Theta^p_v(e)) \wedge \cdots \wedge \forall v \in B_k \, \exists e \in W \, (\Theta^p_v(e)) \;\wedge \\
&\forall\, 1 < i \le k \, (\forall v \in B_i, v' \in B_{i-1} \, \exists e, e' \in W \, (\Theta^p_v(e) \wedge \Theta^p_{v'}(e') \wedge e'.T < e.T)),
\end{aligned}
\]
which corresponds to Conds. 6 and 7.