A Performance Enhancement Advisor for Event Processing Queries

Joong-Hyun Choi, Eun-Sun Cho
Dept. of Computer Sci. & Eng., Chungnam Nat’l Univ., Daejeon, Republic of Korea

ABSTRACT
Recent event stream processing systems such as Esper, Oracle Complex Event Processing (CEP), and MS StreamInsight show practically acceptable performance. Although this is good news for programmers writing event-based reactive programs, these systems still do not perform well on some types of queries, so programmers must watch out for them. In addition, some of these systems take no measures against such queries other than publishing reference manuals that tell programmers to avoid them, which adds to the burden of reactive programming. In this paper, we propose an improved querying module for event stream processing systems, which helps programmers by giving them hints to improve performance whenever their queries fall into a form that is likely to perform poorly. We expect that the proposed module will increase the productivity of writing reactive programs, where debugging, testing, and performance tuning are not straightforward.

Categories and Subject Descriptors
H.2.4 [Database Management]: Systems - Query processing; D.3.3 [Programming Languages]: Language Constructs and Features - Patterns

General Terms
Algorithms, Languages, Performance

Keywords
Events, Event streams, Queries, Reactive programs

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
REBLS’14 at SPLASH’14, October 21, 2014, Portland, OR, USA.
Copyright © 2014 ACM 1-58113-000-0/00/0010 …$15.00.

1. INTRODUCTION
Traditional computing environments are changing rapidly these days, as digital devices such as sensors and actuators become more reliable and cheaper, and large amounts of data are continuously collected and processed to yield analytic results. Accordingly, new programming paradigms suited to this environment have attracted growing interest from both industry and academia. Among them, event-based reactive programming is one of the most relevant paradigms and deserves close attention [1].

One of the difficult problems in this area is the performance of event processing. Above all, some applications require continuous processing of a stream of events, which entails semi-real-time monitoring and evaluation of series of events and yields prompt notifications. For instance, monitoring computer networks to detect denial of service or other security attacks requires very high-volume processing with low latency [2]. The same is true for fraud detection on streams of financial data.

Fortunately, recent event stream processing systems like Esper, Oracle Complex Event Processing (CEP), and MS StreamInsight show practically acceptable performance. They achieve decent processing times and throughputs for most stream queries on events. However, on some types of queries these systems do not yet perform as well as they do on others. For instance, as in the academic results on which they are based [3][4], filtering on complex event streams is not as efficient as filtering on naïve event streams. This problem could be solved by query rewriting techniques, as with SQL rewriting in relational databases, but it is not straightforward to extract general rules.

Currently, however, most event processing systems take no measures other than publishing reference manuals that tell programmers to avoid those queries, which adds to the burden of reactive programming. In this paper, we propose an improved querying module for event stream processing systems, which helps programmers by giving them hints to improve performance whenever their queries fall into a form that is likely to perform poorly. We expect that the proposed module will increase the productivity of writing reactive programs, where debugging, testing, and performance tuning are not straightforward.

We are currently focusing on the Java API for EPL, the query language of Esper, since it is open source and widely used, but we expect that the main idea of our module can be applied generally to other event processing systems and other programming languages.

2. EPL (Event Processing Language)
Esper is one of the most popular open-source complex event processing systems, developed by EsperTech [5]. It provides Java and C# interfaces that enable programmers to develop event-based reactive programs. It also tries to make it easy for relational DBMS users to migrate to event stream processing. When we compare Esper with relational DBMSs, tables, rows, and entities in relational DBMSs correspond to event stream views, events, and event properties in Esper, respectively, as shown in Table 1.

Table 1. Relational DBMSs vs. Esper
    Relational DBMSs    Esper
    Tables              Event stream views
    Rows                Events
    Entity              Event properties

An event stream is a series of events that serves as the source of data. Event stream views, briefly called ‘views,’ define the data available for querying and filtering. Internally, views are constructed from data windows. As shown in Figure 1, there are three types of methods to manipulate data windows. The length window method keeps only the last N events of a stream in a window. (N is two in Figure 1.)
The time window method slides the window when a new event arrives or expired events leave. The time batch method holds all the events for a given duration, and after that duration, the events are released in bulk and the window moves by the length of the duration [5]. (The window size is assumed to be 4 sec in both the time window method and the time batch method in Figure 1.)

Figure 1. Three types of methods manipulating data windows (The topmost depicts Length window, while the middle is for Time window, and the bottommost is for Time batch.)

The Event Processing Language (EPL) of Esper is a SQL-like query language with select, from, where, group by, having and order by clauses [5]. Thus, similar to SQL, all EPL statements require select and from clauses. A select clause specifies the event properties or events to retrieve. A from clause specifies the event stream definitions and stream names to use. A where clause specifies search conditions. Comparison operators and logical combinations are supported in a where clause [5].

Table 2. EPL Syntax
    [annotations]
    [expression_declarations]
    [context context_name]
    [insert into insert_into_def]
    select select_list
    from stream_def [as name] [, stream_def [as name]] [,...]
    [where search_conditions]
    [group by grouping_expression_list]
    [having grouping_search_conditions]
    [output output_specification]
    [order by order_by_expression_list]
    [limit num_rows]

EPL also has several characteristics for processing event streams that are not found in SQL. One of these facilities is pattern-based event stream query description, which filters a stream by defining specific patterns that are of interest to the application. Patterns are matched against actual event objects from the incoming stream at runtime. EPL has four types of pattern operators, as follows [5]:
- Operators that control pattern sub-expression repetition: every, every-distinct, [num] and until
- Logical operators: and, or, not
- Temporal operators that operate on event order: -> (followed-by)
- Guards, which are where-conditions that control the lifecycle of sub-expressions. Examples are timer:within, timer:withinmax and while-expression. Custom plug-in guards may also be used [5].

3. BASIC IDEA
Our proposal automatically gives developers hints for enhancing performance by analyzing given EPL statements. Currently our work is mainly based on the Performance Tips in the Esper Reference [5], focusing on the following five performance tips.
1. Individual field selection elimination: Select the underlying events rather than individual fields
2. Where-clause elimination: Prefer stream-level filtering over where-clause filtering
3. Unnecessary arithmetic expressions elimination: Reduce the use of arithmetic in expressions
4. Unnecessary group-by elimination: Remove unnecessary grouping constructs
5. Limiting lifetime on sub-expressions: Use “end pattern” sub-expressions if possible

We chose these five tips based on how relevant and useful they are to generic event query writers. Thus, tips that are too specific to the Esper implementation or to JVM performance tuning are currently excluded from our work. Each of the five tips is introduced and analyzed in the following subsections.

3.1 Individual Field Selection Elimination
The underlying event objects sent to the engine via the method sendEvent() carry associated data. In a select clause, we can select whole underlying events by using the wildcard (*), as follows:

    EPL statement 1: select * from RFIDEvent

Another form of the same query is obtained by enumerating the associated data fields of the events, as follows:

    EPL statement 1': select assetId, zone, xlocation, ylocation from RFIDEvent

Note that EPL statement 1 is better than EPL statement 1', because with the former the engine does not need to generate a new output for each input event. On the other hand, EPL statement 1' delivers an event to update listeners only after additional processing such as extracting field data and constructing new event objects as results.

Table 3 shows the result of a small experiment to measure the extra cost of these different object processing methods. It was conducted with Esper core library 5.0 on an Intel Core 2 Quad Q6600 2.4GHz with 4GB RAM, Windows 7, and Java(TM) SE Runtime Environment (build 1.8.0_05-b13). For the test data, we used the Esper benchmark kit [6]. As shown in the table, EPL statement 1 is better than statement 1' in both average processing time and throughput, as expected. Thus our module advises programmers to select underlying events using the wildcard (*) rather than selecting fields, if possible.

Table 3. Overhead of field selection
                    Average Processing Time (µs)    Processing unit (for 15sec)
    Statement 1     31.000                          5533104
    Statement 1'    33.606                          5468017

Figure 2 shows the query string and the corresponding warning message based on the result of analyzing the query. In this example, selecting individual fields yields exactly the same result as selecting the underlying event objects, so the query registration API issues such a message.

Figure 2. Screen shot when EPL statement 1' is registered to the system

However, note that when the select clause is a sub-query, a stream selection, or a built-in function (e.g. average, sum), we do not give this tip, because that is clearly what the programmer intended.

3.2 Where-Clause Elimination
Filtering methods in Esper are categorized into two types. The first type inserts a search condition into the where clause just as in SQL, while the second type does the same thing in the from clause instead of the where clause. The second type is called “stream-level filtering” [5]. The Esper stream processing engine provides well-optimized stream-level filtering, while it does not optimize where-clause filtering in general [5]. For example, consider the following EPL statement 2, which performs stream-level filtering:

    EPL statement 2: select * from Market(ticker = 'APPLE')

The same result would be obtained by the following EPL statement 2', but with worse performance than EPL statement 2.

    EPL statement 2': select * from Market where ticker = 'APPLE'

Note that the Esper engine can optimize only stream-level filtering, because with stream-level filtering it does not have to care about data windows. In other words, unless a where clause depends on some assumption about the data window (for instance, that the preset data window scheme affects its condition), it can safely be transformed into stream-level filtering.

Therefore, we recommend stream-level filtering when a given query turns out to be a where-clause query, as in Figure 3. Because the input query, EPL statement 2', has where-clause filtering with no data window, the analyzer gives a hint for performance improvement.

Figure 3. Screen shot when EPL statement 2' is registered to the system

To summarize, when an EPL statement has no data window and includes a where clause containing an equality comparison with a constant, our tool suggests stream-level filtering as a hint for performance improvement.

3.3 Unnecessary Arithmetic Expressions Elimination
Esper allows arithmetic expressions in EPL statements. However, stream processing engines, including Esper, do not yet aggressively pre-evaluate arithmetic expressions to generate constants [5], because doing so is neither trivial nor always beneficial, especially with dynamic queries. Therefore, EPL statement 3' below is left as it is, without optimization, in Esper. For performance, it is strongly recommended to use EPL statement 3 instead, so as to avoid repeated evaluation of the arithmetic expressions at each event.

    EPL statement 3: select price*(volume1 + volume2) from MarketData(price < 60)

    EPL statement 3': select price*volume1 + price*volume2 from MarketData(price - 10 < 50)

We examine whether there are arithmetic expressions in an EPL query; if there are any, we recommend evaluating the arithmetic expressions before query registration. Figure 4 shows the advising messages given for the query with a minus operator.

Figure 4. Screen shot when EPL statement 3' with a minus operation is registered to the system

3.4 Unnecessary group by Elimination
A group by clause divides the output of an EPL statement into several groups [5]. The following EPL statement 4 is a simple stream-level filtering query, and EPL statement 4' is an augmented version with a group by clause.

    EPL statement 4: select * from MarketData(symbol = 'GE')

    EPL statement 4': select * from MarketData(symbol = 'GE') group by symbol

Note that in EPL statement 4', “symbol”, the criterion of the stream-level filtering (in symbol = 'GE'), is also used as the criterion of the group by clause (in group by symbol). If the filter criterion of a query allows only one of the groups specified in the group by clause, as in EPL statement 4', the group by clause may be superfluous, since the stream-level filtering alone yields the same result without the extra work. Thus, in this case EPL statement 4 is preferable to EPL statement 4'.

We check whether the property used in the stream-level filter of a given query is the same as the grouping criterion of the group by clause. If they are the same, we recommend eliminating the group by clause. Figure 5 shows the messages for the redundant group by clause described above. In this case the programmer can simply remove the group by clause.

Figure 5. Screen shot when EPL statement 4' is registered to the system

Note, however, that if the query also contains an aggregation function over groups (using having clauses), group by should not be removed, because the semantics of those functions requires the group by clause.

3.5 Limiting Lifetime of Sub-expressions
An "every" keyword combined with the followed-by operator (->) in patterns leads to several sub-expressions simultaneously waiting for matching events. For instance, once an A-typed event arrives, “every A -> B” starts a new sub-expression evaluation that looks for any incoming B-typed event (as in EPL statement 5'). Such queries can be made more descriptive by adding end conditions like “where timer:within(1 sec)” (as in EPL statement 5).

    EPL statement 5: every A -> B where timer:within(1 sec)

    EPL statement 5': every A -> B

Since EPL statement 5' waits for B-typed events endlessly and might cause performance degradation, simply adding an end pattern relieves the burden on the Esper engine dramatically, by letting it know explicitly when to stop evaluating the sub-expression. Therefore, programmers are asked to specify an end pattern when our analyzer recognizes that a followed-by operator (->) occurs repetitively and there is no end pattern in the statement. Otherwise, as depicted in Figure 6, there is no warning message if the sub-expression already has a termination condition.

Figure 6. Screen shot of query registration when EPL statement 5 is attached with end condition

4. IMPLEMENTATIONS
As shown in Figure 7, our EPL Statement Analyzer gives programmers hints for performance improvement when they register EPL queries via the createEPL method. From the given EPL statement, the EPL Parser of our EPL Statement Analyzer first constructs a parse tree using the ANTLR parser generator [7].

Figure 7. Proposed System Overview

Then, the tree is traversed by our Tree Visitor. Finally, after Tree Visitor analyses the query, a better query is suggested to programmers by Query Advisor. Although our analyzer is developed independently of Esper, we expect that it is also applicable to other CEP systems without significant additional work.

Dashed arrows in Figure 7 represent control flows via method invocations or callbacks. The big shaded rectangle shows how our query API, named EPL Statement Analyzer, interacts with the programmer and Esper. In the testing and tuning phase, the programmer may register event queries via createEPL and get feedback from EPL Statement Analyzer repeatedly, until no better queries are suggested. Since our analyzer works at runtime, dynamic queries are handled properly as well.

Figure 8 depicts the result from EPL Parser. Nonterminals indicate grammatical structures, represented by nested hierarchies in the figure. The rectangle nodes in the hierarchy are terminals, which collectively show the original EPL query. regularJoin, which facilitates manipulation of more than one stream, is optional (marked with a dashed line), so the corresponding nonterminal does not have children in this example.

The Esper client is built using the Service Provider Framework (SPF) for extensibility [8]. The most essential interface is Esper’s EPServiceProvider, which is used to register EPL queries with the Esper engine. Table 4 shows how to analyze the example EPL query select * from StockTick where symbolName = 'GOOG'. In this example, the interface EPServiceProvider has two implementations: the default provider (at line 4) and our proposed provider (at line 7). Lines 2, 3, and 4 describe how to create the default EPServiceProvider. Line 2 makes a configuration object, the event type is added to it at line 3, and then the default EPServiceProvider object is constructed at line 4.
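As a rough illustration of the kind of check that Tree Visitor and Query Advisor perform for the where-clause tip of Section 3.2, the following Java sketch flags a where clause that compares a single property with a constant on a stream without a data window, and proposes the stream-level form. It is not the implementation described in this paper: the analyzer walks an ANTLR parse tree, whereas this sketch substitutes a regular expression, and the class name WhereClauseAdvisor and the method advise are hypothetical names chosen for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Esper or of the analyzer in this paper.
final class WhereClauseAdvisor {

    // Assumption: only the simplest shape is recognized, i.e.
    //   select <list> from <Stream> where <property> = <constant>
    // A stream followed by a view (e.g. ".win:length(2)"), an alias, or an
    // existing stream-level filter does not match, so no hint is produced.
    private static final Pattern SIMPLE_WHERE = Pattern.compile(
        "^\\s*select\\s+(.+?)\\s+from\\s+(\\w+)\\s+where\\s+(\\w+)\\s*=\\s*"
        + "('[^']*'|\\d+(?:\\.\\d+)?)\\s*$",
        Pattern.CASE_INSENSITIVE);

    // Returns the suggested stream-level form, or empty when no hint applies.
    static Optional<String> advise(String epl) {
        Matcher m = SIMPLE_WHERE.matcher(epl);
        if (!m.matches()) {
            return Optional.empty();
        }
        String rewritten = "select " + m.group(1) + " from " + m.group(2)
            + "(" + m.group(3) + " = " + m.group(4) + ")";
        return Optional.of(rewritten);
    }

    public static void main(String[] args) {
        String query = "select * from Market where ticker = 'APPLE'";
        advise(query).ifPresent(s ->
            System.out.println("Hint: prefer stream-level filtering: " + s));
    }
}
```

For EPL statement 2' the sketch proposes the text of EPL statement 2, while queries that already use a stream-level filter or that declare a data window are left alone. A parse-tree based check, as used by the analyzer, can cover far more query shapes than this pattern does.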
Our proposed provider, which notifies developers of performance tips, is constructed through lines 5, 6, 7, and 8. Unlike the default EPServiceProvider, it is wrapped with EPAdministrator. Note that the providers are created via getDefaultProvider and getProvider, respectively, and both are static methods of the class EPServiceProviderManager. The provider objects are assigned to the variables epService and epServiceWithWarn, respectively (at line 4 and line 7), and used to invoke the createEPL method at line 10 and line 13, respectively. Figure 9 shows the results from Table 4: developers are notified of performance tips through standard I/O before a given EPL query is registered with the engine via the createEPL method.

Figure 8. Example Parsing Tree for EPL statement 1

Figure 9. Screenshot: execution of TestEPL() in Table 4

5. RELATED WORK
Similar to Esper, the Oracle Complex Event Processing (CEP) [9] engine provides Oracle Continuous Query Language (Oracle CQL), which is based on SQL extended with new constructs for streaming data [9]. However, it does not seem to provide warnings or optimization methods guided by query syntax. Instead, Oracle CEP Visualizer [10] provides Query Plan functionality, which enables CQL query optimization and helps programmers rewrite their queries.

Microsoft StreamInsight [11] is also a well-known event stream processing engine similar to Esper and Oracle CEP. It uses Language Integrated Query (LINQ) as a query language [11]. Its stand-alone debugging tool, named Event Flow Debugger, is similar to Oracle CEP Visualizer in that it shows query plans for a given query to help programmers rewrite their queries. However, none of these tools is satisfactory, in that programmers themselves are still responsible for gathering statistics and rewriting queries. On the other hand, our proposed adviser gives explicit directions for rewriting according to Esper’s performance tips.

The Cayuga [12] CEP system is a non-commercial CEP system.
It provides the Cayuga Event Language, which has a SQL-like syntax. Cayuga converts a query into a non-deterministic finite state automaton (NFA). Our approach might be applied to Cayuga in this transformation phase; our analysis would show how a conversion scheme affects the performance of a query.

Another non-commercial CEP system, named SASE [13][14], has been proposed for complex event processing in the research area. It is known for a plan-based and stack-based approach to query processing, which gives significant performance improvement and has influenced many later stream processing systems like Esper and StreamInsight. However, unlike our techniques, SASE itself does not give any performance hints to its programmers. In addition, compared to CEP systems in actual use like Esper or StreamInsight, the query language of SASE has many limitations for practical use.

There are efforts to extend existing programming languages like Java to process streams of events [15][16]. However, this approach is dedicated to language constructs rather than queries, so our query optimization support would be complementary to it. Some other existing works focus on frequently used queries on event streams under specific circumstances, trying more aggressive optimizations dedicated to the characteristics of events and applications [17][18]. These works are also orthogonal to our approach.

In relational DBMSs, automatic query rewriting is one of the promising performance optimization mechanisms. Since Esper EPL has a grammatical structure similar to SQL in relational DBMSs, we envision that some extension of SQL rewriting would also accelerate EPL query processing. Currently, we are elaborating on this direction, by rewriting queries based on the tips instead of giving advice to programmers. Although far more experiments with various queries and more data with various attributes remain to be done, we have gained some insights from the partial results so far. Most of all, among the five performance tips that this paper concerns, “unnecessary arithmetic expressions” (Section 3.3) and “unnecessary group-by expressions” (Section 3.4) are programmers’ faults in all cases, which means that query rewriting can be safely applied to all queries of these patterns. However, for the remaining three tips, simple blind application of query rewriting is not possible without more sophisticated analyses, since in some cases a programmer might intentionally create an inefficient-looking query in a reactive program.

Table 4. Code fragment to register the query “select * from StockTick where symbolName='GOOG'” to Esper

        public static void TestEPL() {
     1    String query = "select * from StockTick where symbolName = 'GOOG'";
     2    Configuration defaultConfiguration = new Configuration();
     3    defaultConfiguration.addEventType("StockTick", StockTick.class.getName());
     4    EPServiceProvider epService = EPServiceProviderManager
              .getDefaultProvider(defaultConfiguration);
     5    Configuration wrapConfiguration = new Configuration();
     6    wrapConfiguration.addEventType("StockTick", StockTick.class.getName());
     7    wrapConfiguration.getEngineDefaults()
              .getAlternativeContext()
              .setAdmin("com.plas.esper.epl.tune.WrapEPAdministratorImpl");
     8    EPServiceProvider epServiceWithWarn = EPServiceProviderManager
              .getProvider("WrapEPAdmin", wrapConfiguration);
     9    System.out.println("----default EPStatement construction----");
    10    EPStatement statement1 = epService.getEPAdministrator().createEPL(query);
    11    System.out.println("----complete\n\n");
    12    System.out.println("----Wrap EPStatement construction----");
    13    EPStatement statement2 = epServiceWithWarn.getEPAdministrator().createEPL(query);
    14    System.out.println("----complete----");
        }

6. CONCLUSION
In this paper, we propose an improvement for event stream processing systems that analyzes EPL queries and advises programmers to rewrite a query when it matches a pattern that is likely to perform poorly. We expect that the proposed approach will increase the productivity of writing reactive programs, where debugging, testing, and performance tuning are not straightforward.

We believe our work is a good starting point for work on effective automatic query rewriting systems for EPLs. We also plan to study additional performance tips not covered in this paper, and to develop corresponding methods by extending our tool.

7. ACKNOWLEDGMENTS
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2010-0013386).

8. REFERENCES
[1] Cugola, G., & Margara, A. 2012. Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44, 3, Article 15 (June 2012), 62 pages.
[2] Stonebraker, M., Çetintemel, U., & Zdonik, S. The 8 requirements of real-time stream processing. SIGMOD Rec. 34, 4 (December 2005), 42-47.
[3] Wu, E., Diao, Y., Rizvi, S. 2006. High-performance complex event processing over streams. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD '06). ACM, New York, NY, USA, 407-418.
[4] Mei, Y., Madden, S. 2009. ZStream: a cost-based query processor for adaptively detecting composite events. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 193-206.
[5] Esper Reference (Version 5.0.0). Retrieved Apr 28, 2014 from EsperTech Inc.: http://esper.codehaus.org/esper5.0.0/doc/reference/en-US/html/index.html, by Esper Team and EsperTech Inc., 2014.
[6] Performance-Related Information, http://esper.codehaus.org/esper/performance/performance.html, by Esper Team and EsperTech Inc., 2014.
[7] Terence Parr. 2013. The Definitive ANTLR 4 Reference (2nd ed.). Pragmatic Bookshelf.
[8] Esper 5.0.0 API Documentation, http://esper.codehaus.org/esper5.0.0/doc/api/index.html, by Esper Team and EsperTech Inc., 2014.
[9] Purich, P. 2011. Oracle Complex Event Processing CQL Language Reference, 11g Release 1 (11.1.1.4.0). Retrieved June 15, 2014 from http://docs.oracle.com/cd/E23943_01/apirefs.1111/e12048/toc.htm
[10] Purich, P. 2011. Oracle Complex Event Processing Visualizer User's Guide, 11g Release 1 (11.1.1.4.0). Retrieved June 15, 2014 from http://docs.oracle.com/cd/E15523_01/doc.1111/e14302/toc.htm
[11] Grabs, T., Schindlauer, R., Krishnan, R., Goldstein, J., & Fernández, R. 2009. Introducing Microsoft StreamInsight. Technical report. Microsoft.
[12] A. Demers, J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. White. 2007. Cayuga: a high-performance event processing engine. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data (SIGMOD '07). ACM, New York, NY, USA, 1100-1102.
[13] E. Wu, Y. Diao, and S. Rizvi. 2006. High-performance complex event processing over streams. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD '06). ACM, New York, NY, USA, 407-418.
[14] J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman. Efficient Pattern Matching over Event Streams. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD '08). ACM, New York, NY, USA, 147-160.
[15] P. T. Eugster and K. Jayaram. 2009. EventJava: An Extension of Java for Event Correlation. In Proceedings of the 23rd European Conference on ECOOP 2009, Springer-Verlag, Berlin, Heidelberg, 570-594.
[16] K. W. Lee, E. S. Cho and H. Kim. An ECA Rule-based Task Programming Language for Ubiquitous Environments. In Proceedings of the 6th International Conference On Advanced Communication Technology (ICACT’06), IEEE, USA.
[17] E. S. Cho, J. H. Choi, S. Helal. Dynamic Parameter Filling for Semantic Exceptions in Context-Aware Systems. In Proceedings of the 10th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC-2013), IEEE Computer Society, Washington, DC, USA, 293-300.
[18] Thanh T. L. Tran, Y. Diao, C. Sutton, A. Liu. Supporting user-defined functions on uncertain data. In Proceedings of VLDB Endowment, 6, 6 (April 2013), VLDB Endowment, 469-480.