Preserving Query Privacy in Urban Sensing Systems

Emiliano De Cristofaro (1) and Roberto Di Pietro (2)

(1) Palo Alto Research Center, [email protected]
(2) Università di Roma Tre, Italy, [email protected], [email protected]

Abstract. Urban Sensing is an emerging paradigm that combines the ubiquity of smartphones with the measurement capabilities of sensor networks. While this concept is still in development, the related security and privacy concerns are becoming increasingly relevant. In this paper, we focus on a number of scenarios where nodes of an Urban Sensing system are subject to individual queries. We address the problem of protecting query privacy (i.e., hiding which node matches the query) and data privacy (i.e., hiding sensed data). We introduce a realistic network model and two novel adversarial models: resident and non-resident adversaries. For each of them, we propose a distributed privacy-preserving technique and evaluate its effectiveness via analysis and simulation. To the best of our knowledge, this is the first attempt to define and address both query and data privacy in the context of Urban Sensing. Our techniques are tunable, trading off the level of privacy assurance against a small overhead increase. They additionally provide a relevant improvement of data reliability and availability, while relying only on standard symmetric cryptography. The practicality of our proposals is demonstrated both analytically and experimentally.

1 Introduction and Motivation

Urban Sensing enables the seamless collection of data from a large number of user-carried devices. By embedding a sensor into a wireless-enabled device (e.g., a mobile phone), Urban Sensing targets dynamic information about environmental trends, e.g., ambient air quality [20], urban traffic patterns [17], parking availability [15], sound events [13], etc. As often happens, however, the increasing amount of entailed information prompts a number of security and privacy concerns.
In an attempt to reach the full potential of Urban Sensing, researchers are proposing platforms for application developers [7] and devising business models based on incentive mechanisms for capitalizing on sensed data [12,22]. As information collected by "sensors" is made accessible to third-party entities, not only the anonymity of smartphone users, but also the privacy of queriers must be protected. Observe that, while traditional Wireless Sensor Networks (WSNs) assume that the network operator and the sensor owners are the same entity, in Urban Sensing this might not be the case. Multiple users and organizations often collaborate, but they may not trust each other. As individual sensors can be subject to queries, we face two privacy issues: (1) queriers might not be willing to disclose their interests, and (2) sensed data should be protected against unauthorized access.

[L. Bononi et al. (Eds.): ICDCN 2012, LNCS 7129, pp. 218-233, 2012. (c) Springer-Verlag Berlin Heidelberg 2012]

Urban Sensing systems present several other differences compared to WSNs. In the former, sensing devices are mobile and equipped with more powerful resources, their batteries can be charged frequently, and data transmission is not as expensive. Nonetheless, we borrow some WSN terminology: e.g., we use the words sensors and nodes interchangeably to denote sensing devices. We are motivated by a number of privacy-sensitive applications. Consider, for instance, smartphone users measuring environmental data, e.g., pushed by some incentive, such as discounts on their phone bills [12]. Members of the program and external entities may access sensed data and query specific sensors on demand. However, queriers do not want to reveal which sensor is being interrogated, as this would reveal their interests and/or expose sensitive information to other users, the network operator, or eavesdroppers.
Even though the identity of queried sensors is hidden, queriers should be prevented from unconditionally querying all sensors: they should only interrogate those sensors they are entitled to (e.g., based on the program's policies). In order to guarantee query privacy, one could collect all readings at some (external) server and privately query the database server using, for instance, Private Information Retrieval (PIR) techniques [4]. However, it is well known that the state of the art in PIR has not reached the point of being practical (see [25]). Also, it is not clear how to adapt PIR to our scenario: in PIR, the server's database is public and all information can be retrieved, whereas in Urban Sensing the querier should only get data from the sensors she interrogates and is entitled to.

Intended Contributions. This paper introduces and analyzes different adversarial strategies to violate privacy in Urban Sensing systems. We distinguish between resident and non-resident adversaries: the former controls a fraction of the sensors at all times, while the latter corrupts sensors after they have sensed data. For each considered model, we propose a corresponding distributed privacy-preserving technique. Our techniques employ replication of sensed data combined with probabilistic algorithms. As a positive side effect, replication enhances data reliability and tackles possible disconnections and sensor failures, which are likely to happen in such a mobile environment. Also, our techniques rely on very efficient cryptographic tools and minimize the query preparation overhead. We do not require any specific assumption on the network topology, and we minimize bandwidth requirements on the link between the network gateway and the querier. Finally, a thorough analysis supports our claims.

Paper Organization. The next section reviews related work. After preliminaries in Section 3, we present building blocks in Section 4.
In Section 5, we describe the non-resident adversary, and in Section 6 the resident adversary. Section 7 discusses the proposed techniques. The paper concludes in Section 8.

2 Related Work

In the last few years, research interest in Urban Sensing has ramped up; however, little work has focused on privacy and security aspects. The recent proposals in [6] and [11] are, to the best of our knowledge, the work most closely related to ours that addresses privacy-related problems. They aim at protecting the anonymity of users, using Mix Network techniques [3], and provide either k-anonymity [26] or l-diversity [14]. However, they only guarantee limited confidentiality, as reports are encrypted under the public key of a Report Service (RS), a trusted party collecting reports and distributing them to queriers, which learns both sensors' reports and queriers' interests.

Other work focuses on other (somewhat) related problems. In [24], the authors study privacy-preserving data aggregation, e.g., to compute sums, averages, or variances. Similarly, [10] presents a mechanism to compute community statistics over time-series data, while protecting anonymity (using data perturbation in a closed community with a known empirical data distribution). One common feature of these techniques is that the underlying infrastructure used to collect and deliver reports consists of almost ubiquitous 802.11x access points. Specifically, the work in [6] uses standard MAC/IP address recycling techniques to guarantee user unlinkability between reports (with respect to the access point). However, the assumption of WiFi infrastructures imposes severe limitations on the scope of Urban Sensing applications. Indeed, a ubiquitous presence of open WiFi networks is neither currently available nor anticipated in the near future. Another limitation of prior work, such as [6], concerns the use of Mix Networks [3].
Mix Networks serve as anonymizing channels between sensors and report servers: they de-link reports submitted by sensors before they reach the applications. In other words, Mix Networks act as proxies that forward user reports only when some system-defined criteria are met. However, it is not clear how to deploy these proxies in Urban Sensing. The recent PEPSI project [9] introduces a cryptographic model and a provably-secure infrastructure for privacy protection in participatory sensing [1]. Finally, related results exist in the context of WSNs. In particular, [8] targets the problem of privately querying the sensor network in a sensing-as-a-service model, i.e., where network operators offer on-demand access to sensor readings: it combines Onion Routing [23] and k-anonymity techniques [26]. Although addressing a somewhat similar problem, this technique cannot be adapted to the Urban Sensing paradigm, since the network topology is assumed to be static and known in advance to the querier. Also, the querier can interrogate any sensor at will, i.e., sensor data is publicly available. Finally, an overhead of O(k) messages is always generated between the sensor network gateway and the external querier; moreover, that approach requires the presence of trusted storage nodes. Also, recent work in [19,18] focuses on event privacy in WSNs.

3 Preliminaries

We now overview the network model and assumptions, and define privacy properties.

Network Model. While there is no universally accepted model for Urban Sensing systems, in this paper we use reasonable assumptions from the literature. Thus, we assume a large-scale system resembling a mesh network, similar to [24]. We consider an Urban Sensing system consisting of n sensors, S = {S1, ..., Sn}, and a gateway, GW. The network operator is denoted with OPR. Each sensor Si is owned by OWNi. Note that we consider settings where the OWNi's and OPR are distinct entities. We assume that each sensor Si is securely initialized by OWNi.
Each Si is preloaded with a pairwise key ki shared with OWNi only. Moreover, we optionally assume that sensors can securely establish pairwise keys with other sensors in the network. We denote with K(Si, Sl) a symmetric key shared between Si and Sl.

Query Privacy. Let St be the target of a query to the Urban Sensing system. We say that query privacy is guaranteed if any malicious adversary has only a negligible advantage over a random guess of the identity of the queried sensor St. In other words, only OWNt (or any entity authorized by OWNt) can successfully identify St as the target of a query issued to the Urban Sensing system. We anticipate that our solution facing a non-resident adversary (see Sec. 4.3) provides a degree of query privacy very close to the optimal one (i.e., where an adversary has only a negligible advantage over a random guess). We do not consider the querier's anonymity, i.e., hiding the fact that she is querying the network, as standard techniques, such as Tor, could be used to build an anonymous channel between GW and the querier.

Data Privacy. Let St be the target of a query to the Urban Sensing system, and OWNt the owner of the corresponding sensor. We say that data privacy is guaranteed if any malicious adversary has a negligible advantage over a random guess of the reading reported by St. In other words, only OWNt can access St's reading. Therefore, data privacy ensures that: (1) any eavesdropper in the network has a negligible probability of learning any reading provided by any sensor, and (2) although the identity of the queried sensor is hidden by the query privacy guarantees, external parties cannot query arbitrary sensors, but only those they "own" or are entitled to.

Adversarial Capabilities. In the rest of the paper, we denote the adversary with ADV. We introduce our adversarial models in Sec. 4.3; however, we assume the following features to be common to any adversary.
ADV controls a coalition of up to z sensors, where 0 < z < n. Also, ADV is assumed to be honest-but-curious. Specifically, such an adversary can: (1) eavesdrop on packets in the physical proximity of corrupted sensors, and (2) compromise all information stored on the sensor, i.e., readings and secret keys. ADV does not interfere with sensors' behavior, but she learns their secrets and uses them to try to compromise query and data privacy. Indeed, if ADV limits her activity to reading sensor memory contents, there might be no way to tell whether a given sensor has ever been corrupted; should the sensor's code be compromised, this action could be detected, for instance, using code attestation techniques, e.g., [2,21]. Observe that, in the context of Urban Sensing, sensors are essentially users' smartphones; thus, users themselves may try to violate the privacy properties discussed above, and may even collude with each other. The above adversarial model (controlling up to z sensors) characterizes the class of adversaries considered in our applications. Finally, note that we aim at hiding the query target even from the queried sensor, as it might be under the control of the adversary.

We only focus on query and data privacy. Additional well-known issues, such as pairwise key establishment, key management, data integrity, and routing authentication, can be addressed by standard solutions. Also note that Denial-of-Service (DoS) attacks are out of the scope of this paper.

4 Building Blocks

4.1 Components

Sensing. At each round, every sensor Si produces a reading. The reading sensed at round j is denoted with d_i^j. Subsequently, the sensor encrypts the data reading and produces a data tag, as presented in Algorithm 1. We assume a secure encryption scheme (Enc(.), Dec(.)), such as AES.

Algorithm 1. Sensing [executed by sensor Si at round j]
1. Sense d_i^j
2. E_i^j = Enc_ki(d_i^j)   /* Data Encryption */
3. T_i^j = Enc_ki(i, j)    /* Data Tag */

Dissemination. At each round j, each sensor Si disseminates the encrypted reading and data tag produced in Algorithm 1 to beta other sensors. We call them replica sensors and denote the corresponding set with R_i^j = {r_i^j1, ..., r_i^jbeta}. The data dissemination strategy depends on the adversarial model and is detailed in Sec. 5 and Sec. 6. At the end of the dissemination, each sensor in R_i^j stores (E_i^j, T_i^j). Optionally, each sensor deletes both the reading and the data tag after dissemination. We call this option the disseminate-and-delete mode.

Query. We consider the case where OWNt queries the sensor St for the reading at round j. Algorithm 2 presents the sequence of steps that are performed; the text in square brackets indicates the entity performing the corresponding operation. First, OWNt sends to GW an encrypted tag corresponding to the sensor (and round) of interest. Then, GW forwards it to a random set of alpha sensors in the network. Finally, each of the alpha sensors verifies whether any of the tags it stores matches the incoming encrypted tag. If so, the corresponding encrypted reading is forwarded back to GW, which further relays it to OWNt.

Algorithm 2. Query [assume the query target is St at round j]
1. [OWNt] Compute T* = Enc_kt(t, j)
2. [OWNt] Send T* to GW
3. [GW, on receiving T*] Randomly select alpha sensors, s.t. Q = {q1, ..., q_alpha}
4. [GW] Forward T* to the sensors in Q
5. [Each ql in Q, on receiving T*] If (T_i^j = T*) then send (E_i^j, T_i^j) to GW
6. [GW, on receiving (E_i^j, T_i^j)] Forward E_i^j to OWNt
7. [OWNt] Obtain d_t^j = Dec_kt(E_i^j)

We remark that, given the probabilistic query phase (where the alpha sensors to interrogate are randomly selected), the query could also fail, i.e., none of the alpha sensors queried by GW actually stores a tag matching T*. The success probability of the query depends on the values alpha and beta, which introduce a tension in the protocol.
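The steps of Algorithms 1 and 2 can be sketched end-to-end. The following is a minimal illustrative sketch, not the authors' implementation: the paper assumes AES, while here HMAC-SHA256 serves as a stand-in for the deterministic tag function and a SHA-256 keystream as a stand-in cipher, so that the example needs only the standard library; all parameter values and names are illustrative.

```python
import hashlib, hmac, os, random

def tag(k, i, j):
    # Data tag T_i^j = Enc_k(i, j): HMAC-SHA256 as a stand-in deterministic PRF
    return hmac.new(k, f"{i}|{j}".encode(), hashlib.sha256).digest()

def xor_cipher(k, j, m):
    # Stand-in symmetric cipher: XOR with a per-key, per-round keystream (enc = dec)
    ks = hashlib.sha256(k + j.to_bytes(4, "big")).digest()
    return bytes(a ^ b for a, b in zip(m, ks))

n, alpha, beta, rnd = 64, 8, 8, 1
keys = [os.urandom(16) for _ in range(n)]     # k_i, shared between S_i and OWN_i only
store = {}                                    # replica storage: sensor -> [(T, E)]
readings = {}

# Algorithm 1 + dissemination: each sensor senses, encrypts, tags, replicates
for i in range(n):
    readings[i] = os.urandom(8)               # placeholder reading d_i^j
    E, T = xor_cipher(keys[i], rnd, readings[i]), tag(keys[i], i, rnd)
    for r in random.sample(range(n), beta):   # replica set R_i
        store.setdefault(r, []).append((T, E))

# Algorithm 2: OWN_t recomputes T*, GW probes alpha random sensors for a match
def query(t, j):
    t_star = tag(keys[t], t, j)
    for q in random.sample(range(n), alpha):  # the random set Q
        for T, E in store.get(q, []):
            if T == t_star:
                return xor_cipher(keys[t], j, E)   # OWN_t decrypts
    return None                               # query failed: GW can simply retry
```

Running `query` repeatedly exhibits the probabilistic failure just mentioned: with these illustrative parameters a single attempt succeeds roughly two times out of three, and a retry with a fresh random set Q almost always completes it.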
On the one hand, alpha and beta should be set as high as possible, in order to decrease the probability of a query failing. On the other hand, high values of alpha and beta would increase the message overhead. We choose, as an acceptable trade-off, alpha = c1*sqrt(n) and beta = c2*sqrt(n), where c1 and c2 are small constants. Assuming c1 = c2 = c, the probability of a successful query is:

    Pr(Query Success) >= 1 - 1/e^c    (1)

Note that this analytical lower bound has also been validated experimentally through the methodology we introduce in Sec. 4.2.

Corruption. As mentioned in Sec. 3, we assume ADV corrupts z sensors, i.e., eavesdrops on all the traffic in their physical proximity and compromises all their secrets.

4.2 Settings

Throughout the paper, all analytical findings are confirmed by experimental simulations. To this end, we developed a network simulator that represents different combinations of network parameters and adversarial strategies. To validate each event, we performed 1,000 simulations and report the average. Furthermore, in the sequel of the paper we select c1 = c2 = 1. This choice of parameters leads to at least a 63% success probability for the query. Should the query fail, GW can issue another query, selecting another random set of sensors. Query failure can be detected by associating a timeout with the issued query. In the following, however, we will assume that queries do not fail. This assumption does not affect, in any way, the query or data privacy properties, while having just a small effect on performance (on average, issuing the query twice, each time with a fresh randomly selected set of alpha sensors, is enough for the query to succeed).

Notation. Table 1 summarizes the terminology and notation used in the paper. Note that, when clear from the context, we omit the index specifying the round (j). Table 1.
Notation

    Symbol      Meaning
    K(Si, Sl)   Pairwise symmetric key between Si and Sl
    OWN         Owner of the network / querier
    OPR         Network operator
    S           Set of sensors
    n           # of sensors in S
    Si in S     Generic sensor
    St          Target sensor
    Z           Set of sensors controlled by ADV at round j
    z           # of sensors in Z
    ki          Pairwise symmetric key between OWN and Si
    T_i^j       Data tag produced by sensor Si
    d_i^j       Data reading of sensor Si at round j
    E_i^j       Encryption of d_i with key ki
    T*          Data tag produced by OWN to query
    Q           Set of sensors receiving T*
    ql          Generic sensor in Q
    r_i^l       Generic replica sensor in Ri
    alpha       # of sensors in Q
    beta        # of replica sensors

4.3 Adversarial Strategies

With regard to the presence of ADV in the network, we distinguish the two following strategies.

Non-Resident Adversary. In this adversarial model, we assume that ADV is not always present in the network, but corrupts sensors after both the sensing phase and the dissemination phase have been completed. Moreover, before the next round of sensing starts, she releases the corrupted sensors in order to maximize her chances of going undetected. Further, note that sensor release implies that ADV restores the original code run by the sensor. Indeed, if the code executed by a sensor were left altered by ADV, this would eventually be detected [2]. To summarize, the operating cycle can be decomposed into the following sequence: (1) Sensing, (2) Dissemination, (3) Corruption, (4) Query, and (5) Sensor Release. We anticipate that the degree of privacy achieved against this type of ADV is higher than the one achieved against the resident one. Indeed, the fact that ADV is not present during the dissemination phase prevents her from acquiring valuable information.

Algorithm 3. Dissemination, Non-Resident Adversary [executed by sensor Si]
1. Randomly select Ri = {r_i^1, ..., r_i^beta}
2. MST_i = MinimumSpanningTree(Ri)
3. Send (Ei, Ti) to each sensor in Ri according to MST_i
4. (Optional) Delete (d_i^j, Ei, Ti)
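The MST construction at the heart of Algorithm 3 can be sketched in a few lines. The snippet below is an illustrative sketch using Prim's algorithm (the paper mentions Kruskal's; either yields the same tree weight) over Manhattan hop distances on a sqrt(n) x sqrt(n) grid, and returns the total number of messages used to reach all replicas; all parameters are illustrative.

```python
import math, random

def mst_hops(points):
    # Prim's algorithm: total Manhattan edge weight of an MST over `points`,
    # i.e., the hops needed to deliver (E_i, T_i) to every replica sensor
    pts = list(points)
    dist = {p: abs(p[0] - pts[0][0]) + abs(p[1] - pts[0][1]) for p in pts[1:]}
    total = 0
    while dist:
        p = min(dist, key=dist.get)       # closest replica not yet reached
        total += dist.pop(p)
        for q in dist:
            d = abs(q[0] - p[0]) + abs(q[1] - p[1])
            dist[q] = min(dist[q], d)
    return total

n = 1024
side = math.isqrt(n)
beta = side                               # beta = sqrt(n), as chosen in Sec. 4.1
cells = [(x, y) for x in range(side) for y in range(side)]
replicas = random.sample(cells, beta)     # step 1 of Algorithm 3
print("dissemination messages:", mst_hops(replicas))
```

Averaged over many runs, the total stays far below the O(n) worst case of random-order delivery, consistent with the overhead analysis of Sec. 5.3.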
Resident Adversary. We consider the case where ADV is always in the network, i.e., she controls z sensors at all times. The operating cycle of the system is: (1) Sensing, (2) Dissemination, and (3) Query. Compared to the non-resident ADV, two steps are "missing": corruption, since sensors are corrupted just once, and sensor release, as compromised sensors are never released. While privacy is harder to achieve, we show how to trade off a reduction in the privacy level for a corresponding overhead decrease during dissemination. An efficient dissemination strategy is presented in Sec. 6.

5 Non-Resident Adversary

We start by considering the adversary to be non-resident.

5.1 Dissemination Strategy (Non-Resident Adversary)

When facing a non-resident adversary, the dissemination protocol can take advantage of the fact that ADV is not present in the network. If replica sensors are randomly selected and the adversary does not observe the sequence of dissemination messages, then the smartest choice for sensor Si is to build a Minimum Spanning Tree [5] to reach all the replica sensors in Ri. This minimizes the total number of messages needed for the dissemination. Assuming such a mechanism for the dissemination phase, Algorithm 3 shows the sequence of steps run by each sensor.

5.2 Privacy Analysis (Non-Resident Adversary)

For each of the adversarial models presented in the paper, we analyze the degree of query privacy provided by our technique, i.e., the probability that ADV successfully guesses the sensor that originated the data (i.e., the target sensor). We do not analyze the confidentiality of the sensed data, since it is assured by the encryption algorithm adopted, which we assume to be secure. In the non-resident model, after sensor compromise, ADV can partition the whole set of sensors into two subsets. The first is composed of the sensors she controls and which
store T*; we denote it with T, i.e., the corrupted replica sensors, T = Rt ∩ Z. The second set is the complement of T, i.e., S\T. Finally, we denote with tau the size of T (i.e., tau = |T|).

Disseminate-and-Delete Mode. We consider the privacy degree assuming the tags and the encrypted readings are deleted by the sensors right after the dissemination phase has taken place, but before compromise. In this model, it is interesting to note that if a sensor controlled by ADV holds a tag matching T*, then this sensor cannot be the target sensor, due to the deletion strategy. However, apart from this information, ADV cannot do anything but randomly choose the target sensor among all sensors except those in T. Therefore:

    Pr(ADV wins | |T| = tau) = 1/(n - tau)    (2)

Hence, leveraging Eq. 2 and recalling that |Rt| = beta:

    Pr(ADV wins) = Sum_{i=0}^{beta} Pr(ADV wins | tau = i) * Pr(tau = i)
                 = Sum_{i=0}^{beta} 1/(n - i) * Pr(tau = i)
                 = Sum_{i=0}^{beta} 1/(n - i) * [C(beta, i) * C(n - beta, z - i)] / C(n, z)    (3)

We point out that the analysis of the above probability has been validated via simulation, according to the methodology described in Sec. 4.2. Simulation results are plotted in Fig. 1, with respect to different values of n (ranging from 512 to 2048) and z (ranging from 0.5% to 5% of the sensors).

[Fig. 1. Probability non-resident ADV wins in disseminate-and-delete mode]

The probability in Eq. 3 differs from our experimental evaluation by an error of at most 10^-4 (we do not overlap the plot of Eq. 3 with the plot of the experimental results, to ease figure reading). As the size of the network increases, the information available to ADV to make an educated guess of the query target decreases (the ratio of controlled sensors being equal).
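Since tau follows a hypergeometric distribution, Eq. 3 is cheap to evaluate exactly. The following sketch (illustrative parameters; the function name is ours) computes the probability and checks that it always lies between the two extreme guesses:

```python
from math import comb

def pr_adv_wins_dad(n, beta, z):
    """Eq. 3 (disseminate-and-delete): sum over tau = |T| of
    Pr(win | tau = i) * Pr(tau = i), with tau hypergeometric in (n, beta, z)."""
    return sum(comb(beta, i) * comb(n - beta, z - i) / comb(n, z) / (n - i)
               for i in range(min(beta, z) + 1))

n, beta = 1024, 32                        # beta = sqrt(n)
for z in (5, 10, 51):                     # ~0.5%, ~1%, ~5% of sensors corrupted
    p = pr_adv_wins_dad(n, beta, z)
    # ADV barely improves on a blind random guess over all n sensors
    assert 1 / n < p < 1 / (n - beta)
    print(f"z = {z:3d}: Pr(ADV wins) = {p:.6f}")
```

The sum stops at min(beta, z) because C(n - beta, z - i) vanishes for i > z; the original summation to beta is equivalent.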
Thus, we conclude that the degree of privacy provided by our technique scales with the number of sensors in the network. Note that the way ADV selects the sensors to compromise does not influence her capability of guessing the query target.

[Fig. 2. Probability non-resident ADV wins when disseminate-and-delete mode is not enforced]

Disseminate-and-Delete Mode Not Enforced. We now relax the assumption introduced by the disseminate-and-delete mode, i.e., sensors do not delete the disseminated data. We now discuss the choices ADV has with respect to T in order to guess the target sensor. If (tau = 0), then the best choice ADV can make (to guess the target sensor) is to select the target at random from the sensors in the set S\Z; hence: Pr(ADV wins) = 1/(n - z). If (tau > 0), ADV has two options: (1) she could select the target sensor from T, thus Pr(ADV wins) = 1/beta, or (2) she could select the target sensor at random from the set S\T, thus Pr(ADV wins) = 1/(n - z). Combining the above results, under the rational hypothesis that ADV wishes to maximize her winning probability (we assume 1/(n - z) < 1/beta), we have:

    Pr(ADV wins) = Pr(ADV wins | tau > 0) * Pr(tau > 0) + Pr(ADV wins | tau = 0) * Pr(tau = 0)
                 = (1/beta) * (1 - Pr(tau = 0)) + (1/(n - z)) * Pr(tau = 0)
                 = (1/beta) * (1 - C(n - beta, z)/C(n, z)) + (1/(n - z)) * C(n - beta, z)/C(n, z)    (4)

Again, the above model has been simulated. The results of our experimental evaluation are plotted in Fig. 2, with respect to different values of n (ranging from 512 to 2048) and z (ranging from 0.5% to 5% of the sensors). The probability in Eq. 4 differs from our experimental evaluation by an error of at most 10^-3.

Remark. Analyzing the different privacy degrees offered by the two proposed techniques for the non-resident adversary (see Fig.
1 and 2), we conclude that enforcing the disseminate-and-delete mode remarkably reduces the ability of the adversary to correctly guess the query target. Note that her success probability degrades rapidly for increasing network sizes (x-axis), while it is only slightly affected by an increasing number of compromised nodes (z-axis). Thus, we recommend its use at all times. Nonetheless, we conclude that the probability of ADV violating query privacy in both modes is low enough for most realistic applications.

5.3 Overhead Analysis (Non-Resident Adversary)

We now analyze the overhead (in terms of exchanged messages) resulting from our protocol in the non-resident adversarial model.

[Fig. 3. Message overhead of dissemination in Algorithm 3]

Sensing. In this phase (Alg. 1), each sensor acquires a reading dependent on the environment and the physical phenomena of interest. Our protocol only requires the sensor to perform two symmetric-key encryptions, and the resulting overhead can be considered negligible. Indeed, in our experiments performed on a Nokia N900 smartphone (equipped with a 600MHz CPU), it took less than 0.01ms to encrypt integers using AES with 128-bit keys. Note that, by contrast, the result closest to our work [8] requires nodes to perform public-key operations, specifically, Elliptic-Curve ElGamal decryption [16] (which takes 20ms on the Nokia N900).

Dissemination. During this phase, a number of dissemination messages are exchanged among sensors. Note that non-privacy-preserving querying mechanisms do not incur this overhead. Each sensor replicates its sensed data to beta other sensors reached through a Minimum Spanning Tree. In order to estimate the resulting message overhead, we consider the following lower and upper bounds.
The lower bound is reached if all replica sensors are neighboring sensors; hence the number of messages exchanged is beta. As an upper bound, we consider that all replica sensors are chosen at random and reached in a random order (i.e., without constructing the MST). Since on average the distance between two random sensors in an n-sized network is O(sqrt(n)), and beta = sqrt(n), the upper bound is O(beta * sqrt(n)) = O(n). As a result, the message overhead of the dissemination presented in Algorithm 3 is bounded as:

    beta < Message overhead of Algorithm 3 < O(n)

In order to provide a more accurate measure, we simulated the behavior of the dissemination using a Minimum Spanning Tree. Fig. 3 shows that the resulting message overhead can be experimentally upper-bounded by O(sqrt(n*sqrt(n))). Note that we simulated networks whose sizes range from 512 to 2048. We randomly selected beta sensors and counted the number of hops needed to reach them all following the Minimum Spanning Tree. To ease simulation, we assumed the network to be represented as a sqrt(n) x sqrt(n) grid, thus making the routing algorithm straightforward. Observe that dissemination imposes a limited computational overhead on each sensor, specifically, related to the computation of the Minimum Spanning Tree. This takes O(beta log beta), using, for instance, Kruskal's algorithm [5].

Query. During this phase, GW forwards the encrypted query to alpha random sensors. These sensors can be reached with a minimum number of messages—again, through a Minimum Spanning Tree. Thus, as alpha = sqrt(n), the number of messages exchanged in this phase can be upper-bounded by O(sqrt(n*sqrt(n))). Since the query algorithm (Alg. 2) is independent of the adversarial model, for all the protocols presented in the rest of the paper, we will estimate the message overhead due to this algorithm as O(sqrt(n*sqrt(n))) messages. Furthermore, the computational overhead imposed by the query algorithm (Alg.
2) to the involved parties can be considered negligible, as it only involves: (i) forwarding and comparisons (GW, sensors in Q), and (ii) symmetric encryption/decryption operations (OWNt). (Recall that—as discussed in the analysis of the sensing algorithm in Sec. 4—it takes less than 0.01ms to perform AES operations on a smartphone.)

Take Away. In conclusion, we observe that our protocol for non-resident adversaries introduces a total average overhead of O(sqrt(n*sqrt(n))) messages per sensor, whereas the computational overhead incurred by our techniques is negligible when compared to any non-privacy-preserving mechanism.

6 Resident Adversary

We now consider the case where ADV is resident. In the following, we assume that the adversary is local to a specific area of the network.

6.1 Dissemination (Resident Adversary)

To ease exposition, we now consider the network to be deployed as a grid: sensors are placed at the intersections of a sqrt(n) x sqrt(n) matrix. We recall that ADV in this model occupies a square-shaped region of the network: we consider such an adversary to be deployed on a sqrt(z) x sqrt(z) square. Next, we assume that n and z are perfect squares. We stress that this topology is assumed only to ease presentation, while our techniques are not restricted to it. We consider two different dissemination strategies in order to show the feasibility of the proposed approach.

Local Chain. In this approach, at dissemination time each sensor initializes a counter to beta and selects the neighbor at its right. When the latter receives the replica to retain, it decreases the counter and forwards the message to its own right neighbor. The dissemination stops when the counter reaches zero. Two special cases may occur: (1) one of the sensors at the right border of the network is reached; we cope with this situation by assuming that this sensor continues the dissemination on its bottom neighbor.
Thus, it is possible that a sensor at the bottom-right corner is reached; the dissemination then continues on its left neighbor; and (2) if a sensor at the bottom border of the network is reached, it performs the dissemination on its upper neighbors.

Local Diffusion. In this approach, each sensor Si populates the set of replica nodes, Ri, with neighbors, by selecting an inner sqrt(beta) x sqrt(beta) square in the grid network.

Fig. 4 depicts the two dissemination strategies for the resident adversary, on a small network composed of 8x8 sensors where each sensor relies on beta = sqrt(8*8) = 8 replica sensors.

[Fig. 4. Dissemination in the presence of a resident local adversary in an 8x8 network: local chain (in gray) and local diffusion (in blue)]

The sensors in the lighter color are the replica sensors selected with the local chain approach, whereas the darker ones are those selected with the local diffusion approach. (Note that the sensor immediately to the right of Si is selected by both approaches.) We choose a small network only to improve the figure's clarity; as shown later, our dissemination techniques also apply, efficiently, to significantly larger networks.

6.2 Privacy Analysis (Resident Adversary)

We now analyze the privacy provided by the above approaches. We consider the adversary to break query privacy whenever she controls a sensor storing a tag matching T* (i.e., OWNt's query). Indeed, due to the deterministic routing adopted, ADV could easily compute the sensor originating the message (i.e., St). Informally, ADV wins if Rt ∩ Z is not empty. The probability of this event differs between the two approaches presented above.

Local Chain. If the dissemination is performed by always choosing the right neighbor (we do not consider the special case of a message reaching the border of the deployment area), the compromise probability can be upper-bounded as follows.
Since ADV occupies a square area, it is enough to consider where to place the center of such a square for ADV to win; such a center can be abstracted as a sensor (cA). Indeed, once cA is placed, the positions of the other adversarial sensors follow. In particular, let us denote with d the discrete distance of cA from its border (that is, d = √z/2). In the same way, we can consider just the placement of the target sensor (St) to derive the positions of all the other sensors in Rt. Now, since the sensors in Rt form a chain, for ADV to compromise at least one sensor in that set, it is sufficient that cA is placed at a distance less than d from any of the sensors in Rt. This area is a rectangle of length β + 2d and height 2d. Hence, the probability for ADV to compromise a sensor in Rt is given by:

Pr(ADV wins) < (β + 2d) · 2d / n = (2dβ + z) / n    (5)

The upper bound of the above equation has been validated by experiments as described in Sec. 4.2. Fig. 5 plots the probability of ADV guessing the target in our experimental results, as a function of the number of sensors in the network (side √n of the network grid) and of the sensors controlled by the adversary (side √z of the adversarial inner square). To ease presentation, we do not plot the analytical values, as they tightly upper-bound the empirical probabilities, with a maximum error of 10^-2.

Fig. 5. Probability that a resident ADV compromises query privacy for local chain (plotted against √n and √z)

Fig. 6. Probability that a resident ADV compromises query privacy for local diffusion (plotted against √n and √z)

Local Diffusion. Similarly to the previous technique, we say that ADV breaks query privacy if she controls a sensor in the square formed by the sensors in Rt.
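The chain bound of Eq. (5) can be spot-checked with a quick Monte Carlo experiment. The sketch below uses our own simplifying assumptions: borders are ignored by shrinking β so the chain fits on the grid, the adversarial square may partially clip off-grid, and the parameter values are purely illustrative.

```python
import random

def chain_privacy_break_prob(sqrt_n, sqrt_z, beta, trials=100_000, seed=1):
    """Estimate Pr[resident ADV's square intersects the replica chain]."""
    random.seed(seed)
    d = sqrt_z // 2                        # discrete half-side of ADV's square
    wins = 0
    for _ in range(trials):
        # Target sensor St, placed so the chain to its right stays on-grid
        # (the analysis likewise ignores the border special cases).
        r = random.randrange(sqrt_n)
        c = random.randrange(sqrt_n - beta)
        # ADV's center cA, uniform over the n grid positions; its square
        # spans rows [cy-d, cy+d-1] and columns [cx-d, cx+d-1].
        cy = random.randrange(sqrt_n)
        cx = random.randrange(sqrt_n)
        # The replica chain occupies row r, columns c+1 .. c+beta.
        row_hit = cy - d <= r <= cy + d - 1
        col_hit = (cx - d <= c + beta) and (cx + d - 1 >= c + 1)
        wins += row_hit and col_hit
    return wins / trials

sqrt_n, sqrt_z = 32, 4                     # n = 1024 sensors, z = 16 corrupted
beta = sqrt_n // 2                         # shrunk so the chain avoids borders
d = sqrt_z / 2
bound = (beta + 2 * d) * 2 * d / (sqrt_n ** 2)   # right-hand side of Eq. (5)
est = chain_privacy_break_prob(sqrt_n, sqrt_z, beta)
print(f"empirical: {est:.4f}  bound (5): {bound:.4f}")
```

With these illustrative parameters the empirical probability stays below the analytical bound, mirroring the validation reported for Fig. 5.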
With a reasoning similar to the one adopted for the chain deployment, the above event happens if the following sufficient condition is verified: ADV places cA within a square of side length (√β + 2d) that has St at its center. The probability of this event can be upper-bounded as follows:

Pr(ADV wins) < (√β + 2d)² / n = (β + z + 2√(βz)) / n    (6)

The upper bound of the above equation has been validated empirically as described in Sec. 4.2. Fig. 6 plots the probability for ADV to compromise query privacy in our experimental results, as a function of the sensors in the network (side √n of the network grid) and of the sensors controlled by the adversary (side √z of the adversarial inner square). To ease presentation, we do not plot the analytical values, as the difference between analytical and experimental results is, at any plotted point, at most 10^-2.

6.3 Overhead Analysis (Resident Adversary)

Sensing. The communication and computational overhead due to sensing is the same as for the non-resident adversary; thus, we do not repeat its analysis and comparison to related work here.

Dissemination. During this phase, each sensor replicates its reading to β replica sensors. Since these sensors are always selected among neighboring sensors (in both the local chain and local diffusion modes), the total number of messages needed for the dissemination amounts to O(β) = O(√n) per sensor. Computational overhead is negligible, as it only involves forwarding.

Query. The overhead for this action (Alg. 2) is the same as for the non-resident adversary, hence O(n√n) total messages. Again, computational overhead is negligible, as it only involves forwarding and symmetric-key operations.

Take Away.
We conclude that our protocol for resident adversaries (following either the local chain or the local diffusion approach) introduces a total average overhead of O(√n) messages per sensor, which is smaller than that incurred by our technique for non-resident adversaries. Again, the computational overhead incurred by our techniques is negligible when compared to any non-privacy-preserving mechanism.

7 Discussion

In the previous sections, we have presented a set of probabilistic techniques to enforce query and data privacy in Urban Sensing. Our protocols face different adversarial strategies, and provide different degrees of privacy and bandwidth overhead. Observe that the techniques against a non-resident ADV provide a higher degree of privacy than those against a resident ADV, but they also incur a higher message overhead. Also note that, in the non-resident adversary model, the way ADV is distributed does not influence the attained privacy degree. Interestingly, privacy guarantees increase for larger networks, even with the same percentage of corrupted sensors. In the case of resident adversaries, we have assumed that ADV is locally distributed in a specific area of the network. We plan to include, in the extended version of the paper, the analysis of privacy (and the related dissemination techniques) for resident adversaries that occupy different areas of the network. Note that communication overhead is evenly distributed among sensors, while the communication between the gateway and the querier is kept to the bare minimum: just one message is routed to the GW. Also, to the best of our knowledge, our work is the first to: (1) formalize the problem of protecting query and data privacy in Urban Sensing systems; and (2) analyze the aforementioned adversarial models. Finally, remark that all proposed techniques rely on generating replicas of the sensed data.
These replicas, beyond influencing the achieved degree of privacy, have the positive side effect of enhancing data reliability and fault tolerance.

8 Conclusion

To the best of our knowledge, this work is the first to define and address query and data privacy in the context of Urban Sensing. We have proposed two novel adversary models and, for each of them, a distributed technique that trades off the achieved query privacy level against a (slight) communication overhead. Data privacy is achieved using standard (inexpensive) symmetric cryptography. Our techniques have been thoroughly analyzed and have proven efficient and effective against the considered threats. Finally, we have highlighted some problems that call for further research in this developing area.

Acknowledgments. This paper has been partially supported by: the grant HPC-2011 from CASPUR (Italy); the Prevention, Preparedness and Consequence Management of Terrorism and other Security-related Risks Programme of the European Commission - Directorate-General Home Affairs, under the ExTrABIRE project, HOME/2009/CIPS/AG/C2-065; and ACC1O, the Catalan Business Competitiveness Support Agency.

References

1. Burke, J., Estrin, D., Hansen, M., Parker, A., Ramanathan, N., Reddy, S., Srivastava, M.: Participatory Sensing. In: World Sensor Web Workshop (2006)
2. Chang, K., Shin, K.G.: Distributed authentication of program integrity verification in wireless sensor networks. ACM Trans. Inf. Syst. Secur. 11 (2008)
3. Chaum, D.L.: Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM 24(2) (1981)
4. Chor, B., Kushilevitz, E., Goldreich, O., Sudan, M.: Private information retrieval. Journal of the ACM 45(6) (1998)
5. Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)
6. Cornelius, C., Kapadia, A., Kotz, D., Peebles, D., Shin, M., Triandopoulos, N.: AnonySense: Privacy-aware people-centric sensing. In: MobiSys (2008)
7. Das, T., Mohan, P., Padmanabhan, V., Ramjee, R., Sharma, A.: PRISM: Platform for Remote Sensing using Smartphones. In: MobiSys (2010)
8. De Cristofaro, E., Ding, X., Tsudik, G.: Privacy-preserving querying in wireless sensor networks. In: ICCCN (2009)
9. De Cristofaro, E., Soriente, C.: PEPSI: Privacy-Enhanced Participatory Sensing Infrastructure. In: WiSec (2011)
10. Ganti, R., Pham, N., Tsai, Y., Abdelzaher, T.: PoolView: Stream Privacy for Grassroots Participatory Sensing. In: SenSys (2008)
11. Huang, K., Kanhere, S., Hu, W.: Preserving Privacy in Participatory Sensing Systems. Computer Communications 33(11) (2010)
12. Lee, J., Hoh, B.: Sell Your Experiences: A Market Mechanism based Incentive for Participatory Sensing. In: PerCom (2010)
13. Lu, H., Pan, W., Lane, N., Choudhury, T., Campbell, A.: SoundSense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones. In: MobiSys (2009)
14. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity: Privacy beyond k-anonymity. ACM Trans. on Knowledge Discovery from Data (TKDD) 1(1) (2007)
15. Mathur, S., Jin, T., Kasturirangan, N., Chandrasekaran, J., Xue, W., Gruteser, M., Trappe, W.: ParkNet: Drive-by Sensing of Road-side Parking Statistics. In: MobiSys (2010)
16. Menezes, A.: Elliptic curve public key cryptosystems. Kluwer (2002)
17. Mohan, P., Padmanabhan, V., Ramjee, R.: Rich Monitoring of Road and Traffic Conditions using Mobile Smartphones. In: SenSys (2008)
18. Ortolani, S., Conti, M., Crispo, B., Di Pietro, R.: Event Handoff Unobservability in WSN. In: Camenisch, J., Kisimov, V., Dubovitskaya, M. (eds.) iNetSec 2010. LNCS, vol. 6555, pp. 20–28. Springer, Heidelberg (2011)
19. Ortolani, S., Conti, M., Crispo, B., Di Pietro, R.: Events privacy in WSNs: A new model and its application. In: WoWMoM (2011)
20. Paulos, E., Honicky, R., Goodman, E.: Sensing Atmosphere. In: SenSys Workshops (2007)
21. Perito, D., Tsudik, G.: Secure Code Update for Embedded Devices Via Proofs of Secure Erasure. In: Gritzalis, D., Preneel, B., Theoharidou, M. (eds.) ESORICS 2010. LNCS, vol. 6345, pp. 643–662. Springer, Heidelberg (2010)
22. Reddy, S., Estrin, D., Srivastava, M.: Recruitment Framework for Participatory Sensing Data Collections. In: Floréen, P., Krüger, A., Spasojevic, M. (eds.) Pervasive Computing. LNCS, vol. 6030, pp. 138–155. Springer, Heidelberg (2010)
23. Reed, M.G., Syverson, P.F., Goldschlag, D.M.: Anonymous connections and onion routing. IEEE Journal on Selected Areas in Communications 16(4) (1998)
24. Shi, J., Zhang, R., Liu, Y., Zhang, Y.: PriSense: Privacy-Preserving Data Aggregation in People-Centric Urban Sensing Systems. In: INFOCOM (2010)
25. Sion, R., Carbunar, B.: On the Computational Practicality of Private Information Retrieval. In: NDSS (2007)
26. Sweeney, L.: k-Anonymity: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(5) (2002)