Processing Event-Monitoring Queries in Sensor Networks Vassilis Stoumpos University of Athens Athens, Greece [email protected] Antonios Deligiannakis Yannis Kotidis University of Athens Athens University of Athens, Greece Economics and Business [email protected] [email protected] Abstract needs of multiple, concurrent data acquisition requests in an efficient manner. It is generally agreed that one cannot simply move the readings necessary for processing an application request out of the network and then perform the required processing in a designated node such as a base station. Wireless sensor nodes have limited energy capacity and such an approach will not only result in overburdening their radio links, but will also quickly drain their energy as radio transmission is by far the most important factor in energy consumption [14]. Thus, most recent proposals rely on building some type of ad-hoc interconnect for answering a query such as the aggregation tree [13, 24]. This is a paradigm of in-network processing that can be applied to non-aggregate queries as well [7]. In this paper we concentrate on building and maintaining efficient data collection trees that will provide the conduit to disseminate all data required for processing many concurrent queries in a sensor network, including long-term and ad-hoc type of queries, while minimizing important resources such as the number of messages exchanged among the nodes or the overall energy consumption. While prior work [4, 20, 22] has also tackled similar problems, previous techniques base their operation on the assumption that the sensor nodes that collect data relevant to the specified query need to include their measurements (and, thus, perform transmissions) in the query result at every query epoch. However, in many monitoring applications such an assumption is not valid. Monitoring nodes are often interested in obtaining either the actual readings, or their aggregate values, from sensor nodes that detect interesting events. The detection of such events can often be identified by the readings of each sensor node. For example, in vehicle tracking and monitoring applications high noise levels may indicate the proximity of a vehicle. In military applications, high levels of detected chemicals can be used to warn nearby troops. In other applications, as in the case of approximate evaluation of queries over the sensor data [6, 15, 18], an event is defined when the current sensor reading deviates by more than a given threshold from the last transmitted value. In all of these scenarios, each sensor node is not forced to include its measurements in the query output at each epoch, but rather such a query participation is evaluated on a per epoch basis, depending on its readings and the definition In this paper we present algorithms for building and maintaining efficient collection trees that provide the conduit to disseminate data required for processing monitoring queries in a wireless sensor network. While prior techniques base their operation on the assumption that the sensor nodes that collect data relevant to a specified query need to include their measurements in the query result at every query epoch, in many event monitoring applications such an assumption is not valid. We introduce and formalize the notion of event monitoring queries and demonstrate that they can capture a large class of monitoring applications. We then show techniques which, using a small set of intuitive statistics, can compute collection trees that minimize important resources such as the number of messages exchanged among the nodes or the overall energy consumption. Our experiments demonstrate that our techniques can organize the data collection process while utilizing significantly lower resources than prior approaches. 1 Alex Delis University of Athens Athens, Greece [email protected] Introduction Many pervasive applications rely on sensory devices that are able to observe their environment and perform simple computational tasks. Driven by constant advances in microelectronics and the economy of scale it is becoming increasingly clear that our future will incorporate a plethora of such sensing devices that will participate and help us in our daily activities. Even though each sensor node will be rather limited in terms of storage, processing and communication capabilities, they will be able to accomplish complex tasks through intelligent collaboration. Nevertheless, building a viable sensory infrastructure cannot be achieved through mass production and deployment of such devices without addressing first the technical challenges of managing such networks. In this paper we focus in developing the necessary data collection infrastructure for supporting data-hungry applications that need to acquire and process readings from a large scale sensor network. While previous work has focused on optimizing specific types of queries such as aggregate [13], join [2], model-based [8, 12] and select-all [7, 19] queries, we propose a data dissemination framework that can address the 1 SELECT FROM WHERE SAMPLE Aggregate Query AggrFun(s.value) Sensors s inclusionConditions(s) = true PERIOD e FOR t Non-Aggregate Query SELECT s.id, s.value FROM Sensors s WHERE inclusionConditions(s) = true SAMPLE PERIOD e FOR t Table 1: An Aggregate and a non-Aggregate Query over the Values Collected by the Sensor Nodes. of interesting events. In this paper we term the monitoring queries where the participation of a node is based on the detection of an event of interest as event monitoring queries (EMQs). Our techniques base their operation on collecting simple statistics during the operation of the sensor nodes. The collected statistics involve the number of events (or, equivalently, their frequency) that each sensor detected in the recent past. Our algorithms utilize these statistics as hints for the behavior of each sensor in the near future and periodically reorganize the collection tree in order to minimize certain metrics of interest, such as the overall number of transmissions or the overall energy consumption in the network. Our contributions are summarized as follows: 1. We introduce the notion of EMQs in sensor networks. EMQs are a superset of existing monitoring queries, but are handled uniformly in our framework, irrespectively of the minimization metric of interest. 2. We present detailed algorithms for minimizing important network resources such as the number of messages exchanged or the energy consumption during the execution of an EMQ. The presented algorithms are based on the collection and transmission of a small, and of constant size, set of statistics. We introduce our algorithms along with a succinct mathematical justification. 3. We extend our framework for the case of multiple concurrent EMQs of different types. 4. We present a detailed experimental evaluation of our algorithms. Our results demonstrate that our techniques can achieve a significant reduction in the number of transmitted messages, or the overall energy consumption, compared to alternative algorithms. 2 Motivational Example In Table 1 we present examples of the two main classes of monitoring queries in sensor networks. We borrow the syntax of TinyOS [13] to denote the epoch duration (e) and the lifetime of the query (t). The predicate inclusionConditions has been added in order to specify which sensor nodes will participate in the query evaluation per epoch. At each query epoch, all the sensor nodes that include their collected data in the query result are termed in our framework as epoch participating nodes. For queries that wish to collect data from all the sensor nodes at each epoch, the above predicate always evaluates to true. When a monitoring query specifies inclusion predicates, these may contain either static or dynamic predicates (or both) regarding the sensor nodes. Examples of static predicates may involve, but are not limited to, the collection of measurements from: (i) Sensors with specific identifiers; (ii) Immobile sensors in a specific area; or (iii) Sensors monitoring a specific quantity, in cases of sensor networks with diverse types of sensor nodes that monitor different quantities. Static predicates are very useful in a variety of applications and have received the focus of the bulk of past research [13, 24]. However, there is a large class of monitoring queries that cannot be expressed using static inclusion conditions. Examples include vehicle tracking and equipment monitoring applications where inclusion predicates need to be conditioned on readings taken by the sensor nodes such as noise levels or temperature readings. In its most simple form a dynamic inclusion predicate may be a condition of the form “current reading > threshold”. More complex forms may require the evaluation of a user defined function over a history of accumulated readings. We call such predicates, whose evaluation depends also on the readings taken by the nodes, as dynamic predicates as they specify which nodes should include their response in the query evaluation at each epoch (i.e., nodes whose values exceed a given threshold, or deviate significantly from previous readings). We term those monitoring queries that contain dynamic predicates as event monitoring queries (EMQs). Given a monitoring query, existing techniques seek to develop collection trees that specify the way that the data is forwarded from the sensor nodes to the Root node. Periodically these collection trees may be reorganized in order to adapt to evolving data characteristics [18]. An important characteristic of EMQs, which is not taken into account by existing algorithms that design collection trees, is that each sensor node may participate in the query evaluation, by including its reading in the query result, only a limited number of times, based on how often the inclusion conditions are satisfied. We can thus associate an epoch participation frequency Pi with each sensor node Si , which specifies the fraction of epochs that this node participated in the query result in the recent past. Given estimates of the epoch participation frequencies, one can design significantly more efficient collection trees than prior approaches. Consider the sample scenario depicted in Figure 1(a). In this figure, 36 sensor nodes are placed in a grid. The sensor identifiers appear next to each sensor node. We also distinguish the Root node at the lower left corner, a monitoring node that performs queries over the data collected by the sensor nodes. In our sample network we assume that each sensor node can communicate with its immediate horizontal, vertical or diagonal neighbors, while only node S30 can communicate with the Root node. In Figure 1(b) we depict sample estimates for the number of times each sensor node will participate in the query result within the next 100 epochs. In the above scenario, given the presented epoch participation frequencies, two interior nodes along with all the boundary nodes on the upper and right- (a) (b) (c) (d) Figure 1: (a) Identifiers of sensors in grid arrangement; (b) Estimated number of participations in query result in 100 epochs; (c) Collection tree for MinHops algorithm. Cost = 3130 transmissions; (d) Collection tree for our algorithm. Cost = 1900 transmissions. most edges of the network always detect events, while the remaining interior nodes detect events with a lower probability, whose average value is about 5%. For the aforementioned sample scenario, in Figure 1(c) we depict a sample collection tree chosen by an algorithm, termed as MinHops that seeks to minimize the number of hops that each node’s data needs to traverse until it reaches the Root node. Next to each node we depict the actual number of transmissions that each node performed within these 100 epochs. Similarly, in Figure 1(d) we present the collection tree that our algorithms created for the evaluation of the SUM aggregate. A significant observation is that our algorithm seeks to forward the query results from nodes with high epoch participation frequencies through a limited number of interior nodes, compared to the MinHops algorithm. One can easily establish the significant reduction in the number of transmissions that our algorithm achieved (1900 vs 3130 or, equivalently, a 65% reduction). ing sensor nodes. A good classification of aggregate functions is presented in [13], depending on the amount and type of state required in non-leaf nodes in order for them to calculate the aggregate result for the partition of descendant, in the collection tree, participating sensors. Table 2 summarizes this classification. A crucial part of the operation of our algorithms is the estimation of the amount of data that will be transmitted in a given (or candidate) collection tree. In order to accurately estimate this information, the aggregate function being used needs to be distributive, algebraic or holistic (see Table 2). Unique and content-sensitive aggregate functions can only be supported by using a worst case estimate for the amount of transmitted data. Please note that holistic aggregate queries share similar characteristics with non-aggregate queries and, thus, are treated in a similar way in our framework. 3 In this paper we seek to develop dissemination protocols for the classes of EMQs described in Section 3.1. The goal is, given the type of query at question, to design the collection tree so as to minimize either: (1) The number of transmitted messages in the network; or (2) The overall energy consumption in the network. The minimization of additional metrics of interest is discussed in Section 4.4. Our algorithms do not make any assumptions about the placement of the sensor nodes, their characteristics or their radio models. Problem Formulation We first introduce the types of EMQs that our framework supports and then present the optimization problems tackled in this paper. We then present the cost model used in our algorithms in order to estimate the energy consumption of a sensor node during the transmission process. 3.1 Supported Queries In Table 1 we presented the two main classes of SQL queries that our framework supports. It is important to emphasize at this point that even non-participating nodes may take part in the query evaluation process by forwarding messages towards the Root node. However, the collected values of non-participating nodes influence neither the reported query result nor its size. The first class of supported queries involve nonaggregate queries over the values of epoch participating sensor nodes. In this type of queries the amount of data transmitted by any node of the collection tree depends on the number of epoch-participating sensors that are descendants of that sensor node. The second class of supported queries involves aggregate functions over the measurements collected by the participat- 3.2 3.3 Problem Definition Energy Consumption Cost Model A sensor node consumes energy at all stages of its operation. However, because our algorithms do not require any significant computational effort by the sensor nodes, we ignore in the cost model used in this paper the power consumption when the sensor node is idle and the consumption due to computations. The notation that will be used in our discussion here, and later in the description of our algorithms, is presented in Table 4. Additional definitions and explanations are presented in appropriate areas of the text. We first describe the cost model used to estimate the energy consumption of a node Si during the data transmission Category Distributive Algebraic Holistic Type of Partial State Needed Aggregate values for descendants Aggregate values for descendants, but for different aggregate function Entire Data of descendants Unique Content-Sensitive Distinct Values of descendants Aggregate-Specific State Size Constant Constant Examples MAX, MIN, COUNT, SUM AVG Proportional to #epoch-participating descendants Data-Dependent Data-Dependent MEDIAN COUNT DISTINCT Histogram of Values Table 2. Characteristics and Examples of Aggregate Function Types. Symbol Root Si Pi Di |aggr| Etri, j DEtri, j ACi, j CFi , DCFi , HCFi Description The node that initiates a query and which collects the relevant data of the sensor nodes The i-th sensor node The epoch participation frequency of Si The minimum distance, in number of hops, of Si from the Root The size of the (non-)aggregate values transmitted by a node Energy spent by Si to transmit a new packet of |aggr| bits to S j Energy spent by Si to transmit additional |aggr| bits to S j (on an existing packet). Attachment cost of Si to a candidate parent S j Cost factors utilized by neighboring nodes of Si when estimating their attachment cost to Si Table 4. Symbols Used in our Algorithm of |aggr| > 0 bits of data to node S j , which lies in distance disti, j from Si . The energy cost can be estimated using a linear model [17] as: Etri, j = SCi + (H + |aggr|) × (ET Xi + ERFi × disti,2 j ), where: (i) SCi denotes the energy startup cost for the data transmission of Si . This cost depends on the radio used by the sensor node; (ii) H denotes the size of the packet’s header; (iii) ET Xi denotes the per bit power dissipation of the transmitter electronics; and (iv) ERFi denotes the per bit and squared distance power delivered by the power amplifier. This power depends on the maximum desired communication range and, thus, from the distance of the nodes with which Si desires to communicate. Thus, the additional energy consumption required to augment an existing packet from Si to S j with additional |aggr| bits can be calculated as: DEtri, j = |aggr| × (ET Xi + ERFi × disti,2 j ). For the case when each sensor node receives data, we need to keep in mind that each sensor must open its radio in order to receive data or queries transmitted by neighboring nodes. This startup cost is incurred when the node wakes up from its sleep mode and, in contrast to the data transmission case, is not directly related to the reception of data (since the sensor may receive no data). Thus, this mandatory cost is not taken into account in our model. When a sensor node Si receives H + b j bits from node S j , then the energy consumed by Si is given by: Ereci = ERXi × (H + b j ), where the value of ERXi depends on the radio model. Some typical values [17] of SC, ET X , ERX and ERF are presented in Table 3. 4 Algorithm Overview We now present our algorithms for creating and maintaining a collection tree that minimizes the desired metric Symbol SC ET X ERF ERX Typical Value 1µJ 50nJ/bit 100pJ/bit/m2 50nJ/bit Table 3: Typical Radio Parameters. (number of messages or energy consumption). We also provide detailed pseudocode in addition to a formal analysis. 4.1 Construction/Update of the Collection Tree The algorithm is initiated with the query propagation phase. The query is propagated from the base station through the network using a flooding algorithm. In densely populated sensor networks, a node Si may receive the announcement of the query from several of its neighbors. As in [13, 24] the node will select one of these nodes as its parent node. The chosen parent will be the one that exhibits the lowest attachment cost, meaning the lowest expected increase in the objective minimization function. For example, if our objective is to minimize the total number of transmitted messages, then the selection will be the node that is expected to result in the lowest increase in the number of transmitted messages in the entire path from that sensor until the Root node (and similarly for the rest of the minimization metrics). At this point we simply note that in order for other nodes to compute their attachment cost, node Si transmits a small set of statistics Statsi and defer their exact definition for Section 4.2. The result of this process is a collection tree towards the base station that initiated the flooding process. A key point in our framework is that the preliminary selection of a parent node may be revised in a second step where each node evaluates the cost of using one of its sibling nodes as an alternative parent. Due to the nature of the query propagation, and given simple synchronization protocols, such as those specified in [13], the nodes lying k hops from the Root node will receive the query announcement before the nodes that lie one hop further from the Root node. Let RecSk denote the set of nodes that receive the query announcement for the first time during the k-th step of the query propagation phase. At step k of the query propagation phase, after the preliminary parent selection has been performed, each node Si in set RecSk , needs to consider whether it is preferable to alter its current selection and choose as its parent a sibling node within set RecSk − Si . Each node calculates a new set of statistics Statsi , based on its preliminary parent selection, and transmits an invitation, which also includes the node’s newly calculated Statsi values, that other nodes in RecSk (and only these nodes) may accept. Of course, we need to be careful at this point and make sure that at least one node within RecSk will not accept any invitation, as this would create a disconnected network and prevent nodes from RecSk to forward their results to nodes belonging in RecSk−1 . We will achieve this by imposing a simple set of rules regarding when an invitation may be accepted by a sensor node. Let CandPari denote the set of nodes in RecSk that transmitted an invitation that Si received. Let Sm be the preliminary parent node of Si , as decided during query propagation. Amongst the nodes in CandPari , node Si considers the node S p such as the attachment cost ACi,p is minimized. If ties occur, then these are broken using the node identifiers (i.e., prefer the node with the highest id). Then S p is selected as the parent of Si instead of the preliminary choice Sm only if all of the following conditions apply: • ACi,p < ACi,m . This conditions ensures that S p seems as a better candidate parent than the current selection Sm . • ACi,p ≤ AC p,i . This conditions ensures that it is better to select S p as the parent of Si , than to select Si as the parent of S p . • If ACi,p = AC p,i , then the identifier of S p is also larger than the identifier of Si . This condition is useful in order to allow nodes to forward messages through neighbor nodes in RecSk and also helps break ties amongst nodes and to prevent the creation of loops. The collection tree may periodically get updated, either because of a significant change in data distribution or because of the addition/termination of queries in a multi-query setup discussed in section 5. Such updates are triggered by the base station using the same protocol used in the initial creation. In this case, the nodes compute and transmit their computed statistics in the same manner, but do not need to propagate the query itself. 4.2 Calculating the Attachment Cost Determining the candidate parent with the lowest attachment cost is not an easy decision, as it depends on several parameters. For example, it is hard to quantify the resulting transmission probability of S j , if a node Si decides to select S j as its parent node. In general, the transmission frequency of S j (please note that this is different than the epoch participation frequency of the node) may end up being as high as min{Pi + Pj , 1} (when nodes transmit on different epochs) and as low as Pj (when transmissions happen on the same epochs and Pi ≤ Pj ). A commonly used technique that we have adopted in our work is to consider that the epoch participation by each node is determined by independent events. Using this independence assumption, node S j will end up transmitting with a probability Pi + Pj − Pi Pj , an increase of Pi (1 − Pj ) over Pj . Similarly, if S j−1 is the parent of S j , this increase will also result in an increase in the transmission frequency of S j−1 by Pi (1 − Pj )(1 − Pj−1 ), etc. In our following discussion, for ease of presentation, when considering the attachment cost of Si to a node S j , we will assume that the nodes in the path from S j to the Root node are the nodes S j−1 , S j−2 , . . . , S1 . 4.2.1 Minimizing the Number of Transmissions The attachment cost of Si when selecting S j as its parent node can be calculated by the increase in the transmission frequency of each link from Si to the Root node as: ACi, j = Pi + Pi (1 − Pj ) + Pi (1 − Pj )(1 − Pj−1 ) + . . . A significant problem concerning the above estimation of ACi, j is that its value depends on the epoch participation frequencies of all the nodes in the path of S j to the Root node. Since the number of these values depends on the actual distance, in number of hops, of S j to the Root node, such a solution does not scale in large sensor networks. Fortunately, there exists an alternative formula to calculate the above attachment cost. Our technique is based on a recursive calculation based on a single cost factor CFi at each node Si . In our example discussed above, the values of CFi and ACi, j can be easily calculated as: CFi ACi, j = (1 − Pi ) × (1 +CFj ) = Pi × (1 +CFj ) One can verify that expanding the above recursive formula and setting as the boundary condition that the CF value of the Root node is zero gives the desired result. Thus, only the cost factor, which is a single statistic, is needed at each node S j in order for all the other nodes to be able to estimate their attachment cost to S j . We also need to note that the formulas presented above also address the case of non-aggregate or holistic aggregate queries. In these cases the size of the transmitted data increases proportionally to the number of each node’s epochparticipating descendants in the collection tree, as we approach the Root node. Thus, sometimes the transmitted data by a node may be split into multiple messages due to the maximum packet size. However, we first note that such cases typically occur in higher levels of the collection tree (and, thus, by a potentially small subset of the sensor nodes) and that, more importantly, our techniques seek to compute and utilize simple statistics. Our study of alternative cost models that incorporated this factor yielded only minor improvements while significantly increasing the communication cost during the collection tree formation. We thus omit such extensions from our presentation. 4.2.2 Minimizing Total Energy Consumption, Distributive and Algebraic Aggregates This case is very similar to the case described above. When considering the attachment cost of Si to a candidate parent S j , we note that additional energy is consumed by nodes in the path of S j to the Root node only if a new transmission takes place. This is because each node aggregates the partial results transmitted by its children nodes and transmits a new single partial aggregate for its sub-tree [13]. Thus, the size of the transmitted data is independent of the number of nodes in the subtree, only the frequency of transmission may get affected. Let Etri, j denote the energy consumption when Si transmits a message to S j consisting of a header and the desired aggregate value(s) - based on whether this is a distributive or an algebraic aggregate function. The energy consumption follows the cost model presented in Section 3.3, where the PRFi value may depend on the distance between Si and S j (thus, the two indices used above). Using the above notation, and similarly to the previous discussion, the attachment cost ACi, j is calculated as: ACi, j = Pi × Etri, j + Pi × (1 − Pj ) × Etr j, j−1 + Pi × (1 − Pj ) × (1 − Pj−1 ) × Etr j−1, j−2 + . . . = Pi × (Etri, j +CFj ), CFi where = (1 − Pi ) × (Etri, j +CFj ) If one wishes to take the receiving cost of messages into account, all that is required is to replace in the above formulas the symbols of the form Etrk,p with (Etrk,p + Erec p ), since each message transmitted by Sk to S p will consume energy during its reception by S p . 4.2.3 Minimizing Total Energy Consumption, Holistic Aggregate and Non-Aggregate Queries When considering the attachment cost of Si to a candidate parent S j , we need not only consider the new messages generated in the path from S j to the Root node, but also the energy consumption due to the increase in the length of messages that would have been transmitted anyway. Please recall that the energy consumption for each transmission of |aggr| bits by Si to S j is given by: DEtri, j = |aggr| × (PT Xi + PRFi × disti,2 j ). Calculating the aforementioned number of messages is simple, as we have already discovered a similar recursive formula that estimates the attachment cost when only considering the transmission of new messages. So, we will utilize two new recursively computed statistics. The DCF value of a node will be similar to the CF value, but will use the DEtr∗,∗ transmission costs, instead of the Etr∗,∗ transmission costs used in the CF formula. The HCF value of a node will be equal to the sum of the DEtr∗,∗ values in the nodes path to the Root node. One can verify that the energy consumption due to the enlargement of messages, because of the attachment of Si to S j , that would have been transmitted anyway is: Pi × (HCFj − DCFj ). The required formulas are presented below: CFi HCFi = (1 − Pi ) × (Etri, j +CFj ) = DEtri, j + HCFj DCFi = (1 − Pi ) × (DEtri, j + DCFj ) ACi, j = Pi × (Etri, j +CFj ) + Pi × (HCFj − DCFj ) 4.2.4 Summary Table 5 summarizes the statistics required to be transmitted by each node during the query propagation. Please note that the invitation phase always requires one more transmitted statistic, as the nodes need to check whether it is more beneficial to be attached to another node or the reverse (see Minimization Metric Transmissions Energy Consumption Energy Consumption Type of Aggregate Aggregate Non-Aggregate Distributive Algebraic Holistic Non-Aggregate Decision Invitation CFi Pi ,CFi CFi Pi ,CFi CFi , HCFi , DCFi Pi ,CFi , HCFi , DCFi Table 5. Statistics Attached to Messages the last two rules in Section 4.1) As it can be clearly seen from this table, our algorithms utilize only a limited number of statistics, which are computed using only information transmitted by neighboring sensor nodes. Due to space constraints, the proof of the following Theorem can be found in the full paper [21]. Theorem 1 For sensor networks that satisfy the connectivity requirements of Section 3 our algorithm always creates a connected routing path that avoids loops. 4.3 Algorithm Implementation In Algorithm 1 we present the complete algorithm for the decisions of a sensor node. This algorithm is invoked both at the query propagation phase and when updating the collection tree. Each node first waits to receive the decisions by nodes that lie one hop closer to the Root node (Line 2). Based on the received decisions it performs an initial parent selection using the ProcessDecisions subroutine described in Algorithm 2 (Lines 3-4). It then calculates some necessary statistics and transmits an invitation to neighboring nodes (Lines 5-6). The node then waits (Line 7) to receive invitations from neighboring nodes and makes a final decision on its parent selection using the ProcessInvitations subroutine presented in Algorithm 3 (Lines 8-9). The node then transmits its final decision (Line 10) to neighboring nodes and ignores any received decisions or invitations until the next update period when the collection tree will be reorganized (a counter denoting the reorganization period can be attached to the queries transmitted by the Root node in order to help the nodes understand the transition to a new update period). An interesting observation that we have not mentioned so far involves the nodes with zero epoch participation frequencies. For these nodes, the computed attachment costs to any neighboring node will also be zero. In such cases we select the candidate parent which produces the lowest Etri, j +CFj + HCFj − DCFj value. This decision is expected to minimize the attachment cost, if the sensor at some point starts observing events. 4.4 Extensions In the full paper [21] we describe extensions on refining the statistics utilized by the sensor nodes. Furthermore, we show that our techniques can be easily adapted to incorporate different minimization metrics, than the ones presented in Section 3.2. For example, the formulas for minimizing the number of transmitted bits can be derived using the formulas for the energy minimization for the corresponding Algorithm 1 BuildCollectionTree() Subroutine 1: {Si is the node being examined} 2: Wait to receive decisions by neighboring nodes −→ 3: Set Dec as the received decisions by the nodes with minimum D values (ignore other decisions). 4: 5: 6: 7: 8: −→ k = ProcessDecisions(Dec) {Returns index of selected parent} Di = 1 + Dk Transmit invitation to neighboring nodes Wait to receive invitations by neighboring nodes −→ Set Inv as the received invitations by the nodes with D values equal to Di (ignore other invitations). −→ 9: m = ProcessInvitations(Inv) {Returns index of selected parent} 10: Transmit decision 11: Ignore received decisions and invitations until next reorganization. −→ Algorithm 2 ProcessDecisions(Dec) Subroutine 1: {Si is the node being examined} 2: Select Deck as the decision with the minimum attachment cost. If Pi = 0 utilize in the calculations a non-zero value at this step to prevent all nodes from having the same (zero) attachment cost. 3: Let Sk be the sender of Deck 4: Set parent(Si ) = Sk 5: Calculate statistics (cost factors) for current node based on current parent selection 6: Return k {Index of selected parent node} type of query (i.e., distributive, non-aggregate). In these formulas one simply has to substitute the term Etri, j with the size of a packet (including the packet’s header) and to substitute the term DEtri, j with the size of each transmitted aggregate value (thus, ignoring the header size). In the case where the goal is to maximize the minimum energy amongst the sensor nodes, the attachment cost can be derived from the minimum energy, amongst the nodes in a sensor’s path to the Root node, raised to −1 (since our algorithms select the candidate parent with the minimum attachment cost). 5 −→ Algorithm 3 ProcessInvitations(Inv, k) Subroutine 1: {Si is the node being examined} 2: {Sk is the current parent node} 3: In the following discussion, all estimations of the attachment cost utilize the same ERFi value, as discussed at the end of Section 4.2.3. 4: Select Invm as the invitation with the minimum attachment cost. If Pi = 0 utilize in the calculations a non-zero value at this step to prevent all nodes from having the same (zero) attachment cost. 5: Let Sm be the sender of Invm 6: if ACi,m ≤ ACi,k then 7: Return k {No benefit in changing parent node} 8: end if 9: Calculate ACm,i using information from Invm 10: if ACi,m > ACm,i then 11: Return k {Reverse decision is better} 12: else if ACi,m == ACm,i AND i > m then 13: Return k {Base decision on identifier} 14: end if 15: Set parent(Si ) = Sm 16: Calculate statistics (cost factors) for current node based on current parent selection 17: Return m {Index of selected parent node} queries in order based on their identifier (i.e., from 1 to M), we demonstrate in the full paper [21] that the attachment cost ACi,k j of Si to S j regarding the k-th query is calculated as follows: • Minimizing Total Number of Transmissions. partialPRODki, j M and ∏ x=1 f (x) = f (k) x (1 − Pi ), and by processing the Using the notation =∏ x<k f (x) = S j PRODki = = Pik × (JCFjk + partialProdi,k j ) JCFik = PRODki + (1 − Pik ) × JCFjk • Minimizing Total Energy Consumption: Distributive and Algebraic Aggregates. ACi,k j =Pik × partialProdi,k j × Etri, j + Multi-Query Optimization In the multi-query scenario, each node Si may choose different parent nodes for each posed query. Thus, the resulting network topology may not be a tree after-all but a directed acyclic graph. In the case of multiple concurrent queries we need to introduce some additional (or augmented) notation for the presentation of our algorithms. Let Pik denote the epoch participation frequency of Si regarding the k-th query. Let fi (k) denote the index of the selected parent node of Si for the k-th posed query. In order to be able to derive recursive formulas for the estimation of the attachment cost, in our approach we break the posed queries into two groups. The first group of queries contains the distributive and algebraic aggregate queries, while the second group contains the holistic and non-aggregate queries. In our discussion below we assume that the group of queries handled in each case contains a total of M queries of similar type. ACi,k j Pik × (1 − partialProdi,k j ) × DEtri, j Pik × JCFjk + Pik × (JECFjk − JDCFjk ) JCFik = PRODki × Etri, f (k) + (1 − Pik ) × JCFfk(k) JECFik = (1 − Pik ) × (DEtri, f (k) + JECFfk(k) ) JDCFik = PRODki × DEtri, f (k) + (1 − Pik ) × JDCFfk(k) • Minimizing Total Energy Consumption: Holistic Aggregate and Non-Aggregate Queries. ACi,k j =Pik × partialProdi,k j × Etri, j + Pik × (1 − partialProdi,k j ) × DEtri, j + Pik × JCFjk + Pik × (JHCFjk − JDCFjk ) JCFik =PRODki × Etri, f (k) + (1 − Pik ) × JCFfk(k) JHCFik =DEtri, f (k) + JHCFfk(k) (1 − Pix ) JDCFik =PRODki × DEtri, f (k) + (1 − Pik ) × JDCFfk(k) 6 Experiments We developed a simulator for testing the algorithms proposed in this paper under various conditions. In our discussion we term our algorithm for minimizing the number Figure 2: Messages and Message Overhead for Synthetic Data Set. Results for MinEnergy Presented for both Aggregate SUM and NonAggregate Query. of transmissions as MinMesg, and our algorithm for minimizing the overall energy consumption as MinEnergy. Our techniques are compared against two intuitive algorithms. In the MinHops algorithm, each sensor node that receives the query announcement randomly selects as its parent node a sensor amongst those with the minimum distance, in number of hops, from the Root node [13]. In the MinCost algorithm, each sensor seeks to minimize the sum of the squared distances amongst the sensors in its path to the Root node, when selecting its parent node. Since the energy consumed by the power amplifier in many radio models depends on the square of the communication range, the MinCost algorithm aims at selecting paths with low communication cost. In all sets of experiments we place the sensor nodes on random locations over a rectangular area. The radio parameters were set accordingly to the values in Table 3. The message header was set to 32 bits, similarly to the size of each statistic and aggregate value. In all figures we account for the overhead of transmitting statistics and invitation messages during the creation of the collection tree in our algorithms. 6.1 Experiments with Synthetic Data Sets We initially placed 36 sensor nodes in a 300x300 area, and then scaled up to the point of having 900 sensors. We set the maximum broadcast range of each sensor to 90m. In all cases the Root node was placed on the lower left part of the sensor field. In each case we set the epoch participation frequency of the sensor nodes with the maximum distance, in hop count, from the Root to 1. Unless specified otherwise, with probability 8% some interior node assumed an epoch participation frequency of 1, while the epoch participation frequency of the remaining interior nodes was set to 5%. We first evaluated a non-aggregate “SELECT *” query over the measurements obtained by the epoch participating sensor nodes. Since in this query the measurements of each epoch participating node need to be propagated all the way to the Root node, and the only sharing that can be achieved in combined messages involves the message’s header, we expect little energy savings in this case. We also evaluated a SUM aggregate query over the values of epoch participating sensor nodes using all algorithms. We present the total number of transmissions for each type of query, algorithm and number of sensors in Figure 2. Please note that the MinEnergy algorithm built very different collection trees for the two types of queries. For the remaining algorithms, the number of transmitted messages was the same for both types of queries. The corresponding average energy consumption by the sensor nodes for each case is presented in Table 6. As we can see, our MinMesg algorithm achieves a significant reduction in the number of transmitted messages compared to the MinHops and MinCost algorithms. The reduction in messages is up to 64% and 105%, respectively, with an average gain of 48% and 93.7%, respectively, compared to the the MinHops and MinCost algorithms. However, since these gains depend on the number of transmissions that epoch-participating nodes perform, it is perhaps more interesting to measure the routing overhead of each technique. We define the routing overhead of each algorithm as the relative increase in the number of transmissions when compared to the number of epoch participations by the sensor nodes. Note that the latter number is a mandatory cost that represents the transmissions in the network if each sensor could communicate directly with the Root node. For example, if the total number of epoch participations by the sensor nodes was 1000, but the overall number of transmissions was 1700, then the routing overhead would have been equal to (1700 − 1000)/1000 = 70%. As we observe from Figure 2, our MinMesg algorithm often results in 3 times smaller routing overhead compared to the alternative algorithms considered. We also observe that the MinEnergy algorithm in the aggregate case produced results very close to the ones of MinMesg. A main difference between these two algorithms is that amongst candidate parents with similar cost factors, the MinEnergy algorithm is less likely to select a distant neighbor than the MinMesg algorithm, which only considers epoch participation frequencies. This is a trend that we observed in all our experiments. However, in the case of non-aggregate queries, the MinEnergy algorithm formed very different collection trees, as it avoided routing measurements through very long paths. The MinEnergy algorithm performs very well in both types of queries. Compared to the MinHops algorithm, it achieves up to a 2-fold reduction in the power drain for aggregate queries and up to 19% for non-aggregate queries. Compared to the MinCost algorithm the energy savings are smaller but still significant (i.e., up to 79% in the aggregate query). The MinMesg algorithm is obviously a very poor choice, with respect to the energy consumption, for nonaggregate queries. We expect that the more the epoch participation frequencies of sensor nodes increase, the less likely that out tech- Sensors MinMesg 36 144 324 576 900 109.339 70.129 71.662 65.921 67.107 Aggregate SUM Query MinEnergy MinHops 109.341 68.971 68.703 64.717 64.077 161.278 139.821 146.425 127.315 128.299 MinCost MinMesg 136.354 121.640 106.416 104.156 102.708 381.483 515.215 687.083 624.817 756.902 Non-Aggregate “SELECT *” Query MinEnergy MinHops MinCost 292.231 344.489 444.157 457.788 549.262 335.920 390.806 523.670 547.147 640.830 303.270 344.213 448.816 471.845 559.183 Table 6: Average Power Consumption (in mJ) for Synthetic Dataset # Sensors MinMesg MinEnergy MinHops MinCost 150 600 1350 73.607 58.131 50.418 67.821 58.273 49.350 111.751 97.958 89.231 91.990 85.158 76.099 Table 7: Average Power Consumption (in mJ) for SchoolBuses Dataset 1300000 1200000 Messages 1100000 1000000 900000 MinMesg MinHops MinCost 800000 700000 600000 500000 400000 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 Participation Frequency of Nodes with P <> 1 Figure 3: Transmissions Varying the Epoch Participation Frequency Figure 5. Transmissions - SchoolBuses data a node’s sibling as its parent node in the collection tree. As we can see, the benefits from utilizing the 2-step process are important in all aspects (transmitted messages and power consumption). 6.2 Figure 4. Transmissions - Trucks data # Sensors 36 144 324 576 900 Transmissions MinEnergy-1Step MinEnergy 106474 405279 749013 1258888 1812024 79887 232298 487889 827959 1279129 Average Energy Consumption MinEnergy-1Step MinEnergy 147.882 126.374 112.964 101.935 93.273 109.341 68.971 68.703 64.717 64.077 Table 8: Comparison of 1-Step and 2-Step Parent Selection for MinEnergy Algorithm. Number of Transmissions and Average Power Consumption (in mJ) for Synthetic Data Set niques will be able to provide substantial savings compared to the MinHops and MinCost algorithms. In Figure 3 we repeat the aggregate query of Figure 2 at the sensor network with 324 nodes, but vary the epoch participation frequency Pi of those nodes that do not make a transmission at each epoch (i.e., of those nodes with Pi < 1). While Figure 3 validates our intuition, it also demonstrates that significant savings can be achieved even when sensor nodes have large Pi values (i.e., Pi ≥ 0.5). A novel feature of our technique is the 2-step parent selection phase. In Table 8 we compare the performance of our MinEnergy algorithm in the aggregate SUM query described above versus a variant that was not allowed to select Experiments with Real Data Sets We also experimented with the following two real data sets. The Trucks data set contains trajectories of 276 moving trucks [1]. Similarly, the SchoolBuses data set contains trajectories of 145 moving schoolbuses [1]. For each data set we initially overlaid a sensor network of 150 nodes over the monitored area. We set the broadcast range such that interior sensor nodes could communicate with at least 5 more sensor nodes. Moreover, each sensor could detect objects within a circle centered at the node and with radius equal to 60% of the broadcast range. We then scaled the data set up to a network of 1350 sensors, while keeping the sensing range steady. In Figures 4 and 5 we depict the total number of transmissions by all algorithms for the Trucks and SchoolBuses data sets, correspondingly, when computing the SUM of the number of detected objects. In our scenario, nodes that do not observe an event make a transmission only if they need to propagate measurements/aggregates by descendant nodes. Due to space constraints we present the average energy consumption of the sensor nodes in the same experiment for only the SchoolBuses data set in Table 7. As it is evident, our algorithms achieve significant savings in both metrics. For example, the MinCost algorithm, which exhibits lower power consumption than the MinHops algorithm, still drains about 50% more energy than our MinEnergy algorithm. Moreover, both our MinMesg and MinEnergy algorithms significantly reduce the amount of transmitted messages by up to 42% and 73% when compared to the MinHops and MinCost algorithms, respectively. We then decided to mix the data sets. We separated # Sensors 150 600 1350 MinMesg 70,583 286,894 430,094 MinEnergy 125,686 401,431 774,530 MinHops 107,887 336,143 606,105 MinCost 122,532 479,544 1,038,093 Table 9. Messages for Multi-Query Scenario the SchoolBuses into two categories (by randomly coloring each schoolbus as either color A and B) and overlaid this data set with the trucks data set. We then performed three simultaneous queries requesting the total number of trucks, schoolbuses of color A and schoolbuses of color B observed in the network. We used the same topology, network scale and placement of the Root node as above and compared our MinMesg and MinEnergy algorithm with the MinHops and MinCost algorithms, which were modified to select a single parent node for all queries (this produced the best results for them). Due to space constraints, in Table 9 we depict only the total number of transmitted messages for all algorithms. 7 Related Work The database community has long been the advocate of using an embedded database management system for data acquisition in sensor networks [13, 24]. The use of a declarative SQL-like query interface allows rapid development of applications in such systems without the need to manage hand-coded programs at each sensor node [14]. In the database community different types of popular queries have been discussed, such as aggregate [5, 6, 13, 18, 16], join [2], model-based [8, 12] and select-all queries [7, 19]. Tracking queries that seek to determine the spatial extent of a particular phenomenon have also been considered [9, 23]. Many of the low-level networking details have already been discussed in the networking community and, thus, can be utilized in our framework. As an example, nodes in unattended wireless networks must be able to self-configure [3] and discover their surrounding nodes [10]. Prior work on computing energy-efficient data routing paths (such as the aggregation tree) [11, 20, 22] have tackled similar problems, but these techniques base their operation on the assumption that the sensor nodes that collect data relevant to the specified query need to include their measurements in the query result at every query epoch. However, this assumption does not hold in event monitoring queries that are the scope of our framework. Due to space constraints, a more elaborate discussion of the related work can be found in [21]. 8 Conclusions In this paper we presented algorithms for building and maintaining efficient collection trees in support of event monitoring queries in wireless sensor networks. We demonstrated that is it possible to create efficient collection trees that minimize important network resources using a small set of statistics that are communicated in a localized manner during the construction of the tree topology. Furthermore, our techniques utilize a novel 2-step refinement process that significantly increases the quality of the created trees. We have also demonstrated that our algorithms can handle a mix of event monitoring queries (EMQs) including aggregate and non-aggregate queries. References [1] Rtree Pportal. http://www.rtreeportal.org. [2] D.J. Abadi, S. Madden, and W. Lindenr. REED: Robust, Efficient Filtering and Event Detection in Sensor Networks. In VLDB, 2005. [3] A. Cerpa and D. Estrin. ASCENT: Adaptive Self-Configuring sEnsor Network Topologies. In INFOCOM, 2002. [4] Jae-Hwan Chang and Leandros Tassiulas. Energy Conserving Routing in Wireless Ad-hoc Networks. In INFOCOM, 2000. [5] J. Considine, F. Li, G. Kollios, and J. Byers. Approximate Aggregation Techniques for Sensor Databases. In ICDE, 2004. [6] A. Deligiannakis, Y. Kotidis, and N. Roussopoulos. Hierarchical InNetwork Data Aggregation with Quality Guarantees. In EDBT, 2004. [7] A. Deligiannakis, Y. Kotidis, and N. Roussopoulos. Dissemination of Compressed Historical Information in Sensor Networks. VLDB Journal, 2007. [8] A. Deshpande, C. Guestrin, S. Madden, J.M. Hellerstein, and W. Hong. Model-Driven Data Acquisition in Sensor Networks. In VLDB, 2004. [9] M. Duckham, S. Nittel, and M. Worboys. Monitoring Dynamic Spatial Fields Using Responsive Geosensor Networks. In GIS, 2005. [10] D. Estrin, R. Govindan, J. Heidermann, and S. Kumar. Next Century Challenges: Scalable Coordination in Sensor Networks. In MobiCOM, 1999. [11] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidermann. Impact of Network Density on Data Aggregation in Wireless Sensor Networks. In ICDCS, 2002. [12] Y. Kotidis. Snapshot Queries: Towards Data-Centric Sensor Networks. In ICDE, 2005. [13] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. Tag: A Tiny Aggregation Service for ad hoc Sensor Networks. In OSDI Conf., 2002. [14] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. The Design of an Acquisitional Query processor for Sensor Networks. In ACM SIGMOD, 2003. [15] C. Olston and J. Widom. Offering a Precision-Performance Tradeoff for Aggregation Queries over Replicated Data. In VLDB, 2000. [16] S. Pattem, B. Krishnamachari, and R. Govindan. The Impact of Spatial Correlation on Routing with Compression in Wireless Sensor Networks. In IPSN, 2004. [17] V. Raghunathan, C. Schurgers, S. Park, and M. Srivastava. Energy aware wireless microsensor networks. IEEE Signal Processing Magazine, 19(2), 2002. [18] A. Sharaf, J. Beaver, A. Labrinidis, and P. Chrysanthis. Balancing Energy Efficiency and Quality of Aggregate Data in Sensor Networks. VLDB Journal, 2004. [19] A. Silberstein, R. Braynard, and J. Yang. Constraint Chaining: On EnergyEfficient Continuous Monitoring in Sensor Networks. In SIGMOD, 2006. [20] S. Singh, M. Woo, and C. S. Raghavendra. Power-aware routing in mobile ad hoc networks. In ACM/IEEE International Conference on Mobile Computing and Networking, 1998. [21] V. Stoumpos, A. Deligiannakis, Y. Kotidis, and A. Delis. Processing Event-Monitoring Queries in Sensor Netwrks. Technical Report, University of Athens, June 2007. Available at http://www.cs.umd.edu/ adeli/TR07.pdf. [22] N. Trigoni, Y. Yao, A.J. Demers, J. Gehrke, and R. Rajaraman. Multiquery Optimization for Sensor Networks. In DCOSS, 2005. [23] W. Xue, Q. Luo, L. Chen, and Y. Liu. Contour Map Matching for Event Detection in Sensor Networks. In SIGMOD, 2006. [24] Y. Yao and J. Gehrke. The Cougar Approach to In-Network Query Processing in Sensor Networks. SIGMOD Record, 31(3):9–18, 2002.
© Copyright 2026 Paperzz