Preserving Query Privacy in Urban Sensing Systems

Emiliano De Cristofaro1 and Roberto Di Pietro2
1 Palo Alto Research Center
[email protected]
2 Università di Roma Tre, Italy
[email protected], [email protected]
Abstract. Urban Sensing is an emerging paradigm that combines the ubiquity
of smartphones with measurement capabilities of sensor networks. While this
concept is still in development, the related security and privacy concerns become
increasingly relevant. In this paper, we focus on a number of scenarios
where nodes of an Urban Sensing system are subject to individual queries. We
address the problem of protecting query privacy (i.e., hiding which node matches
the query) and data privacy (i.e., hiding sensed data). We introduce a realistic
network model and two novel adversarial models: resident and non-resident adversaries. For each of them, we propose a distributed privacy-preserving technique and evaluate its effectiveness via analysis and simulation. To the best of
our knowledge, this is the first attempt to define and address both query and data
privacy in the context of Urban Sensing. Our techniques are tunable, trading off
the level of privacy assurance with a small overhead increase. We additionally
provide a significant improvement in data reliability and availability, while relying only on standard symmetric cryptography. The practicality of our proposals is
demonstrated both analytically and experimentally.
1 Introduction and Motivation
Urban Sensing enables the seamless collection of data from a large number of user-carried devices. By embedding a sensor into a wireless-enabled device (e.g., a mobile
phone), Urban Sensing targets dynamic information about environmental trends, e.g.,
ambient air quality [20], urban traffic patterns [17], parking availabilities [15], sound
events [13], etc. As often happens, however, the increasing amount of information involved prompts a number of security and privacy concerns. In the attempt to reach
the full potential of Urban Sensing, researchers are proposing platforms for application
developers [7] and devising business models based on incentive mechanisms for the
capitalization on sensed data [12,22]. As information collected by “sensors” is made
accessible to third-party entities, not only anonymity of smartphone users, but also privacy of queriers must be protected.
Observe that, while traditional Wireless Sensor Networks (WSNs) assume that network operator and sensor owners are the same entity, in Urban Sensing this might not
be the case. Multiple users and organizations often collaborate, but they may not trust
each other. As individual sensors can be subject to queries, we face two privacy issues:
(1) queriers might not be willing to disclose their interests, and (2) sensed data should
be protected against unauthorized access. Urban Sensing systems present several other
L. Bononi et al. (Eds.): ICDCN 2012, LNCS 7129, pp. 218–233, 2012.
© Springer-Verlag Berlin Heidelberg 2012
differences compared to WSNs. In Urban Sensing, sensing devices are mobile and equipped
with more powerful resources, their batteries can be charged frequently, and data transmission is not as expensive. Nonetheless, we borrow some WSN terminology, e.g., we
use the words sensors and nodes interchangeably to denote sensing devices.
We are motivated by a number of privacy-sensitive applications. Consider, for
instance, smartphone users measuring environmental data, e.g., pushed by some incentives, such as discounts on their phone bills [12]. Members of the program and
external entities may access sensed data and query specific sensors, on demand. However, queriers do not want to reveal which sensor is being interrogated, as this reveals
their interests and/or exposes sensitive information to other users, network operator, or
eavesdroppers. Even though the identity of queried sensors is hidden, queriers should
be prevented from unconditionally querying all sensors; rather, they should only
interrogate those sensors they are entitled to (e.g., based on the program's policies).
In order to guarantee query privacy, one could collect all readings at some (external)
server and privately query the database server using, for instance, Private Information
Retrieval (PIR) techniques [4]. However, it is well-known that the state-of-the-art in
PIR has not reached the point of being practical (see [25]). Also, it is not clear how to
adapt PIR to our scenario: in PIR, server’s database is public and all information can be
retrieved, whereas, in Urban Sensing the querier should only get data from the sensors
she interrogates and she is entitled to.
Intended Contributions. This paper introduces and analyzes different adversarial
strategies to violate privacy in Urban Sensing systems. We distinguish between resident and non-resident adversaries: the former controls a fraction of the sensors at all
times, while the latter corrupts sensors after they have sensed data. For each considered model, we propose a corresponding distributed privacy-preserving technique. Our
techniques employ replication of sensed data combined with probabilistic algorithms.
As a positive effect, replication enhances data reliability and tackles possible disconnections and sensor failures, which are likely to happen in such a mobile environment.
Also, our techniques rely on very efficient cryptographic tools and minimize the query
preparation overhead. We do not require any specific assumption on the network topology and we minimize bandwidth requirements on the link between network gateway
and querier. Finally, a thorough analysis supports our claims.
Paper Organization. Next section reviews related work. After preliminaries in Section
3, we present building blocks in Section 4. In Section 5, we describe the non-resident
adversary, while Section 6 addresses the resident adversary. Section 7 discusses the proposed
techniques. The paper concludes in Section 8.
2 Related Work
In the last few years, research interest in Urban Sensing has ramped up; however, little work
has focused on its privacy and security aspects.
Recent proposals in [6] and [11] are, to the best of our knowledge, the most closely
related work to address privacy-related problems. They aim at protecting anonymity
of users, using Mix Network techniques [3], and provide either k-anonymity [26] or
l-diversity [14]. However, they only guarantee limited confidentiality, as reports are
encrypted under the public key of a Report Service (RS), a trusted party collecting reports and distributing them to queriers, which learns both sensors’ reports and queriers’
interests. Other efforts focus on (somewhat) related problems. In [24], the authors
study privacy-preserving data aggregation, e.g., to compute sum, average, or variance.
Similarly, [10] presents a mechanism to compute community statistics of time-series
data, while protecting anonymity (using data perturbation in a closed community with
a known empirical data distribution).
One common feature of these techniques is that the underlying infrastructure used to
collect and deliver reports consists of almost ubiquitous 802.11x access points. Specifically, the work in [6] uses standard MAC-IP address recycling techniques to guarantee
user unlinkability between reports (with respect to the access point). However, the assumption of WiFi infrastructures imposes severe limitations on applications’ scope in
Urban Sensing. Indeed, a ubiquitous presence of open WiFi networks is neither currently available nor anticipated in the near future. Another limitation of prior work,
such as [6], concerns the use of Mix Networks [3]. Mix Networks serve as anonymizing
channels between sensors and report servers: they de-link reports submitted by sensors
before they reach the applications. In other words, Mix Networks act as proxies to forward user reports only when some system-defined criteria are met. However, it is not
clear how to deploy these proxies in Urban Sensing.
The recent PEPSI project [9] introduces a cryptographic model and a provably-secure infrastructure for privacy protection in participatory sensing [1].
Finally, related results are in the context of WSNs. In particular, [8] targets the problem of privately querying the sensor network in a sensing-as-service model, i.e., where
network operators offer on-demand access to sensor readings: it combines Onion Routing [23] and k-anonymity techniques [26]. Although addressing a somewhat similar
problem, this technique cannot be adapted to the Urban Sensing paradigm, since the
network topology is assumed to be static and known in advance to the querier. Also, the
querier can interrogate any sensor, at will, i.e., sensor data is publicly available. Finally,
an overhead of O(k) messages is always generated between the sensor network gateway and the external querier; moreover, the approach requires the presence of trusted storage
nodes. Also, recent work in [19,18] focuses on event privacy in WSNs.
3 Preliminaries
We now overview network model and assumptions, and define privacy properties.
Network Model. While there is no universally accepted model for Urban Sensing systems, in this paper we use reasonable assumptions from the literature. Thus, we assume
a large-scale system resembling a mesh network, similar to [24]. We consider an Urban
Sensing system consisting of n sensors, S = {S1 , · · · , Sn }, and a gateway, GW. The
network operator is denoted with OPR. Each sensor Si is owned by OWN i . Note
that we consider settings where the OWN i 's and OPR are distinct entities. We assume
that each sensor Si is securely initialized by OWN i . Each Si is preloaded with a pairwise key ki shared with OWN i only. Moreover, we optionally assume that sensors
can securely establish pairwise keys with other sensors in the network. We denote with
K(Si , Sl ) a symmetric key shared between Si and Sl .
Query Privacy. Let St be the target of a query to the Urban Sensing system. We
say that query privacy is guaranteed if any malicious adversary has just a negligible
advantage over a random guess of the identity of queried sensor St . In other words,
only OWN t (or any entity authorized by OWN t ) can successfully identify St as the
target of a query issued to the Urban Sensing system.
We anticipate that our solution for a non-resident adversary (see Sec. 4.3) provides a degree of query privacy very close to the optimal one (i.e., when an adversary has only negligible advantage over a random guess). We do not consider querier's
anonymity, i.e., hiding the fact that she is querying the network, as standard techniques,
such as Tor, could be used to build an anonymous channel between GW and the querier.
Data Privacy. Let St be the target of a query to the Urban Sensing system, and OWN t
the owner of the corresponding sensor. We say that data privacy is guaranteed if any
malicious adversary has a negligible advantage over a random guess of the reading
reported by St . In other words, only OWN t can access St ’s reading. Therefore, data
privacy ensures that: (1) any eavesdropper in the network has a negligible probability
of learning any reading provided by any sensor, and (2) although the identity of queried
sensor is hidden by query privacy guarantees, external parties cannot query arbitrary
sensors, but only those they “own”, or they are entitled to.
Adversarial Capabilities. In the rest of the paper, we denote the adversary with ADV.
We introduce our adversarial models in Sec. 4.3; however, we assume the following
features to be common to any adversary. ADV controls a coalition of up to z sensors,
where 0 < z < n. Also, ADV is assumed to be honest-but-curious. Specifically, such
an adversary can: (1) eavesdrop on packets in physical proximity of corrupted sensors,
and (2) compromise all information stored on the sensor, i.e., readings and secret keys.
ADV does not interfere with sensors' behavior, but she learns their secrets and uses them
to compromise query and data privacy. Indeed, if ADV limits its activity to reading
sensor memory contents, there might be no way to tell if a given sensor has ever been
corrupted; should the sensor's code be compromised, this could be detected, for
instance, using code attestation techniques, e.g., [2,21]. Observe that, in the context
of Urban Sensing, sensors are essentially users’ smartphones, thus, users themselves
can be trying to violate privacy properties discussed above and even collude with each
other. The above adversarial model (controlling up to z sensors) characterizes the class
of adversaries considered in our applications. Finally, note that we aim at hiding the
query target even to the queried sensor, as it might be under the control of the adversary.
We only focus on query and data privacy. Additional well-known issues, such as
pairwise key establishment, key management, data integrity, and routing authentication
can be addressed by standard solutions. Also note that Denial-of-Service (DoS) attacks
are out of the scope of this paper.
4 Building Blocks
4.1 Components
Sensing. At each round, any sensor Si produces a reading. The reading sensed at
round j is denoted with dji . Subsequently, the sensor encrypts the data reading and
Algorithm 1. Sensing [Executed by Sensor Si at round j]
1. Senses dji
2. Eij = Encki (dji ) /* Data Encryption */
3. Tij = Encki (i, j) /* Data Tag */
Algorithm 2. Query (Assume the query target is St at round j)
1. [OWN t ] Compute T ∗ = Enckt (t, j)
2. [OWN t ] Send T ∗ to GW
3. [GW, on receiving T ∗ ] Randomly select α sensors, s.t. Q = {q1 , · · · , qα }
4. [GW] Forward T ∗ to sensors in Q
5. [Each ql ∈ Q, on receiving T ∗ ] If (Tij = T ∗ ) Then Send (Eij , Tij ) to GW
6. [GW, on receiving (Eij , Tij )] Forward Eij to OWN t
7. [OWN t ] Obtain djt = Deckt (Eij )
produces a data tag, as presented in Algorithm 1. We assume a secure encryption
scheme (Enc, Dec), e.g., AES.
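For concreteness, the sensing steps of Algorithm 1 and the tag-matching step of Algorithm 2 can be sketched as follows. HMAC-SHA256 and a hash-derived keystream are illustrative stand-ins for the AES-based Enc assumed in the text, not the exact instantiation; the key property they preserve is that the data tag is deterministic, so the owner can recompute it for the query:

```python
import hashlib
import hmac

def tag(k: bytes, i: int, j: int) -> bytes:
    # Deterministic data tag: the owner, who knows k_i, can recompute it
    # to query sensor i at round j (HMAC-SHA256 stands in for AES here).
    return hmac.new(k, f"{i}|{j}".encode(), hashlib.sha256).digest()

def xor_cipher(k: bytes, nonce: bytes, msg: bytes) -> bytes:
    # Toy hash-derived keystream standing in for AES encryption of the
    # reading; applying it twice decrypts.
    stream = hashlib.sha256(k + nonce).digest()
    return bytes(m ^ s for m, s in zip(msg, stream))

# Sensor S_i at round j = 5 (Algorithm 1)
k_i = bytes(32)                              # pairwise key shared with OWN_i
E_ij = xor_cipher(k_i, b"5", b"21.5C")       # E_i^j: encrypted reading
T_ij = tag(k_i, i=7, j=5)                    # T_i^j: data tag

# OWN_t queries sensor 7 at round 5 (Algorithm 2, steps 1, 5, and 7)
T_star = tag(k_i, i=7, j=5)                  # T*: recomputed by the owner
match = T_ij == T_star                       # a replica sensor's comparison
reading = xor_cipher(k_i, b"5", E_ij)        # owner decrypts E_i^j
```

Note that a replica sensor holding only (E_ij, T_ij) can answer the query by comparing tags, without ever learning the reading or which sensor is the target.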
Dissemination. At each round j, each sensor Si disseminates the encrypted reading and
data tag produced in Algorithm 1 to β other sensors. We call them replica sensors and
denote the corresponding set with Rij = {ri1 , · · · , riβ }. The data dissemination strategy depends on the adversarial model and is detailed in Sec. 5 and Sec. 6. At
the end of dissemination, each sensor in Rij will store (Eij , Tij ). Optionally, after
dissemination, each sensor deletes both the reading and the data tag. We call this option the
disseminate-and-delete mode.
Query. We consider the case where OWN t queries the sensor St for the reading
at round j. Algorithm 2 presents the sequence of steps that are performed. Note that the
text in square brackets indicates the entity performing the corresponding operation.
First, OWN t sends to GW an encrypted tag corresponding to the sensor (and round)
of interest. Then, GW forwards it to a random set of α sensors in the network. Finally,
each of the α sensors verifies if any of the tags that it stores matches the incoming
encrypted tag. In this case, the corresponding encrypted reading is forwarded back to
GW that further relays it to OWN t .
We remark that, given the probabilistic query phase (where the α sensors to interrogate
are randomly selected), the query could also fail, i.e., none of the α sensors queried by
GW actually stores a tag matching T ∗ . The success probability of the query depends on
the values α and β, which introduce a tension in the protocol. On the one hand, α and β
should be set as high as possible, in order to decrease the probability for a query
to fail. On the other hand, high values for α and β would increase the message overhead.
We choose, as an acceptable trade-off, α = c1 √n and β = c2 √n, where c1 and c2 are
small constants. Assuming c1 = c2 = c, the probability of a successful query is:

Pr(Query Success) ≥ 1 − 1/e^c    (1)
Note that this analytical lower bound has also been validated experimentally through
the methodology we introduce in Sec. 4.2.
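The lower bound in Eq. 1 is easy to check empirically. The sketch below (illustrative parameters: n = 1024 and c1 = c2 = 1, so α = β = 32) samples the replica set and the queried set uniformly at random and counts how often they intersect:

```python
import math
import random

def query_succeeds(n: int, alpha: int, beta: int, rng: random.Random) -> bool:
    # The query succeeds iff at least one of the alpha sensors picked by GW
    # is among the beta replica sensors storing (E, T).
    replicas = set(rng.sample(range(n), beta))
    queried = rng.sample(range(n), alpha)
    return any(q in replicas for q in queried)

n = 1024
alpha = beta = math.isqrt(n)        # c1 = c2 = 1
rng = random.Random(42)
trials = 5000
rate = sum(query_succeeds(n, alpha, beta, rng) for _ in range(trials)) / trials
# Eq. 1 predicts Pr(success) >= 1 - 1/e, i.e., roughly 0.63, for c = 1
```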
Corruption. As mentioned in Sec. 3, we assume ADV corrupts z sensors, i.e., she
eavesdrops on all traffic in their physical proximity and compromises all their secrets.
4.2 Settings
Throughout the paper, all analytical findings are confirmed by experimental simulations. To this end, we developed a network simulator that represents different combinations of network parameters and adversarial strategies. To validate each event, we
performed 1,000 simulations and reported the average. Furthermore, note that in the
sequel of the paper we select c1 = c2 = 1. This choice of parameters leads to at least
a 63% success probability for the query. Should the query fail, GW can issue another
query, selecting another random set of sensors. Query failure can be detected by associating a timeout with the issued query. In the following, however, we will assume that
queries do not fail. This assumption does not affect, in any way, query or data privacy
properties, while having just a small effect on performance (on average, issuing the
query twice, each time with a fresh randomly selected set of α sensors, is enough
for the query to succeed).
Notation. Table 1 summarizes terminology and notation used in the paper. Note that
when it is clear from the context, we will not use the index specifying the round (j).
Table 1. Notation

K(Si , Sl ) : pairwise symmetric key between Si and Sl
ki : pairwise symmetric key between OWN and Si
OWN : owner of the network / querier
OPR : network operator
S : set of sensors
n : # of sensors in S
Si ∈ S : generic sensor
St : target sensor
Z : set of sensors controlled by ADV at round j
z : # of sensors in Z
Tij : data tag produced by sensor Si
dji : data reading of sensor Si at round j
Eij : encryption of dji with key ki
T ∗ : data tag produced by OWN to query
Q : set of sensors receiving T ∗
ql : generic sensor in Q
ril : generic replica sensor in Ri
α : # of sensors in Q
β : # of replica sensors
4.3 Adversarial Strategies
With regard to the presence of ADV in the network, we distinguish the two following
strategies.
Non-Resident Adversary. In this adversarial model, we assume that ADV is not always present in the network but it corrupts sensors after both the sensing phase and the
dissemination phase have been completed. Moreover, before the next round of sensing
starts, it releases the corrupted sensors in order to maximize its chances to go undetected. Further, note that sensor release implies that ADV restores the original code
run by the sensor. Indeed, if the code executed by a sensor is left altered by ADV this
Algorithm 3. Dissemination, Non-Resident Adversary [Executed by Sensor Si ]
1. Randomly select Ri = {ri1 , · · · , riβ }
2. MSTi = MinimumSpanningTree(Ri )
3. Send (Ei , Ti ) to each sensor in Ri according to MSTi
4. (Optional) Delete (dji , Ei , Ti )
would be eventually detected [2]. To summarize, the operating cycle can be decomposed
into the following sequence: (1) Sensing, (2) Dissemination, (3) Corruption, (4) Query,
and (5) Sensor Release. We anticipate that the degree of privacy achieved against this
type of ADV is higher than the one achieved against the resident one. Indeed, the fact
that ADV is not present during the dissemination phase prevents ADV from acquiring
valuable information.
Resident Adversary. We consider the case where ADV is always on the network,
i.e., it controls z sensors at all times. The operating cycle of the system is: (1) Sensing, (2) Dissemination, and (3) Query. Compared to the non-resident ADV, two steps
are “missing”: corruption, since sensors are corrupted just once, and sensor release, as
compromised sensors are not released. While privacy is harder to achieve, we show how
to trade off a reduction in the privacy level for a corresponding overhead decrease
during dissemination. An efficient dissemination strategy is presented in Sec. 6.
5 Non-resident Adversary
We start by considering the adversary to be non-resident.
5.1 Dissemination Strategy (Non-resident Adversary)
When facing a non-resident adversary, the dissemination protocol can take advantage
of the fact that ADV is not present in the network. If replica sensors are randomly
selected and the adversary does not observe the sequence of dissemination messages,
then the best choice for sensor Si is to build a Minimum Spanning Tree [5] to reach
all the replica sensors in Ri . This minimizes the total number of messages needed for
the dissemination. Assuming such a mechanism for the dissemination phase, Algorithm
3 shows the sequence of steps run by sensors.
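Step 2 of Algorithm 3 can be sketched with Kruskal's algorithm over pairwise grid (Manhattan) distances; the coordinates and the distance metric are illustrative assumptions, not fixed by the protocol:

```python
from itertools import combinations

def manhattan(a, b):
    # Hop distance between two grid positions.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mst_edges(nodes):
    # Kruskal's algorithm with a union-find structure; sorting dominates,
    # giving O(beta log beta) for beta replica sensors on a fixed grid.
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    edges = sorted(combinations(nodes, 2), key=lambda e: manhattan(*e))
    tree = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                         # adding (u, v) creates no cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Replica sensors R_i as grid coordinates (illustrative)
replicas = [(0, 0), (0, 2), (3, 0)]
tree = mst_edges(replicas)
total = sum(manhattan(u, v) for u, v in tree)
```

The sensor then sends (Ei, Ti) along the tree edges, so the number of dissemination messages equals the total tree weight rather than the sum of point-to-point distances.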
5.2 Privacy Analysis (Non-resident Adversary)
For each of the adversarial models presented in the paper, we analyze the degree of
query privacy provided by our technique, i.e., the probability that ADV successfully
guesses the sensor that originated the data (i.e., the target sensor). We do not analyze the
confidentiality of the sensed data, since it is assured by the encryption algorithm
adopted, which we assume to be secure.
In the non-resident model, after sensor compromise ADV can partition the whole set
of sensors into two subsets. The first is composed of the sensors she controls and which
Fig. 1. Probability non-resident ADV wins in disseminate-and-delete mode
store the tag matching T ∗ – we denote it with T . We remark that T = Rt ∩ Z. The second set is the
complement of T , i.e., S\T . Finally, we denote with τ the size of T (i.e., τ = |T |).
Disseminate-and-Delete Mode. We consider the privacy degree assuming the tags and
the encrypted readings are immediately deleted by the sensors right after the dissemination phase has taken place, but before the compromise.
In this model it is interesting to note that if a sensor controlled by ADV holds the
value T ∗ , then this sensor cannot be the target sensor, due to the deletion strategy.
However, apart from this information, ADV cannot do anything but randomly choose
the target sensor among all sensors but those in T . Therefore:

Pr(ADV wins | |T | = τ ) = 1/(n − τ )    (2)
Hence, leveraging Eq. 2 and recalling that |Rt | = β:

Pr(ADV wins) = Σ_{i=0..β} Pr(ADV wins | τ = i) · Pr(τ = i)
            = Σ_{i=0..β} Pr(τ = i)/(n − i)
            = Σ_{i=0..β} [1/(n − i)] · [C(β, i) · C(n − β, z − i) / C(n, z)]    (3)

where C(·, ·) denotes the binomial coefficient, i.e., τ follows a hypergeometric distribution.
We point out that the analysis of the above probability has been validated via simulation,
according to the methodology described in Sec. 4.2. Simulation results are plotted in
Fig. 1, with respect to different values for n (ranging from 512 to 2048) and z (ranging
from 0.5% to 5% of the sensors). We remark that the probability in Eq. 3 only differs
from our experimental evaluation by an error of at most 10^−4 (we do not overlay the
plot of Eq. 3 with that of the experimental results, to ease figure reading).
As the size of the network increases, information available to ADV to make an educated guess of the query target sensor decreases (ratio of controlled sensors being
equal). Thus, we conclude that the degree of privacy provided by our technique scales
with the number of sensors in the network.
Note that the way ADV selects the sensors to compromise does not influence its
capability of guessing the query target.
Fig. 2. Probability non-resident ADV wins when disseminate-and-delete mode is not enforced
Disseminate-and-Delete Mode Not Enforced. We now relax the assumption introduced by the disseminate-and-delete mode, i.e., sensors do not delete the disseminated data. We discuss the choices ADV has with respect to T in order to guess
the target sensor. If (τ = 0), then the best choice ADV can make is to select the
target at random from the sensors in the set S\Z, hence: Pr(ADV wins) = 1/(n − z).
If (τ > 0), ADV has two options: (1) she could select the target
sensor from T , thus: Pr(ADV wins) = 1/β, or (2) she could select the target
sensor at random from the set S\T , thus: Pr(ADV wins) = 1/(n − z).
Combining the above results, under the rational hypothesis that ADV wishes to
maximize her winning probability (we assume 1/(n − z) < 1/β), we have:

Pr(ADV wins) = Pr(ADV wins | τ > 0) · Pr(τ > 0) + Pr(ADV wins | τ = 0) · Pr(τ = 0)
            = (1/β) · (1 − Pr(τ = 0)) + (1/(n − z)) · Pr(τ = 0)
            = (1/β) · (1 − C(n − β, z)/C(n, z)) + (1/(n − z)) · C(n − β, z)/C(n, z)    (4)
Again, the above model has been simulated. The results of our experimental evaluation
are plotted in Fig. 2, with respect to different values of n (ranging from 512 to 2048)
and z (ranging from 0.5% to 5% of the sensors). The probability in Eq. 4 only differs
from our experimental evaluation by an error of at most 10^−3 .
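Both closed forms (Eq. 3 and Eq. 4) can be evaluated exactly with integer binomial coefficients; a sketch with illustrative parameters (n = 1024, z = 51, i.e., roughly 5% corrupted sensors, and β = 32):

```python
from math import comb

def pr_win_delete(n: int, z: int, beta: int) -> float:
    # Eq. 3: tau = |T| is hypergeometric (z corrupted sensors drawn from n,
    # i of them falling among the beta replica sensors).
    return sum(
        comb(beta, i) * comb(n - beta, z - i) / comb(n, z) / (n - i)
        for i in range(min(beta, z) + 1)
    )

def pr_win_no_delete(n: int, z: int, beta: int) -> float:
    # Eq. 4: ADV guesses inside T when tau > 0, at random in S\Z otherwise.
    p0 = comb(n - beta, z) / comb(n, z)      # Pr(tau = 0)
    return (1 - p0) / beta + p0 / (n - z)

n, z, beta = 1024, 51, 32                    # ~5% corrupted, beta = sqrt(n)
p_delete = pr_win_delete(n, z, beta)
p_keep = pr_win_no_delete(n, z, beta)
```

For these parameters, p_delete is on the order of 10^−3 while p_keep is a few times 10^−2, consistent with the scales of Figs. 1 and 2.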
Remark. Comparing the privacy degrees offered by the two proposed techniques for the
non-resident adversary (see Figs. 1 and 2), we conclude that enforcing
the disseminate-and-delete mode remarkably reduces the adversary's ability to
correctly guess the query target. Note that her success probability degrades rapidly for
increasing network sizes (x-axis), while it is only slightly affected by an increasing number
of compromised nodes (z-axis). Thus, we recommend its use at all times. Nonetheless,
we conclude that the probability of ADV violating query privacy in both modes is low
enough for most realistic applications.
5.3 Overhead Analysis (Non-Resident Adversary)
We now analyze the overhead (in terms of exchanged messages) resulting from our
protocol for the non-resident adversarial model.
Fig. 3. Message overhead of dissemination in Algorithm 3
Sensing. In this phase (Alg. 1), each sensor acquires a reading dependent on the environment and the physical phenomena of interest. Our protocol only requires sensors
to perform two symmetric-key encryptions, whose overhead can be considered negligible. Indeed, in our experiments performed on a Nokia N900 smartphone
(equipped with a 600MHz CPU), it only took less than 0.01ms to encrypt integers using AES with 128-bit keys. Note that, by contrast, the closest result to our work [8]
requires nodes to perform public-key operations, specifically, Elliptic-Curve ElGamal
decryption [16] (which takes 20ms on the Nokia N900).
Dissemination. During this phase, a number of dissemination messages are exchanged
among sensors. Note that non-privacy-preserving querying mechanisms do not incur this
overhead. Each sensor replicates its sensed data to β other sensors reached through
a Minimum Spanning Tree. In order to estimate the resulting message overhead, we
consider the following lower and upper bounds. The lower bound is reached if all
replica sensors are neighboring sensors, hence the number of messages exchanged is
β. As an upper bound, we consider that all replica sensors are chosen at
random and reached in a random order (i.e., without constructing the MST). Since, on
average, the distance between two random sensors in an n-sized network is O(√n), and
β = √n, the upper bound is O(β · √n) = O(n). As a result, the message
overhead for the dissemination presented in Algorithm 3 is bounded as:

β < Message overhead from Algorithm 3 < O(n)
In order to provide a more accurate measure, we decided to simulate the behavior of the
dissemination using a Minimum Spanning Tree. Fig. 3 shows that the resulting message
overhead can be experimentally upper-bounded by O(√(n√n)). Note that we simulated
networks whose sizes range from 512 to 2048. We randomly selected β sensors and
counted the number of hops needed to reach them all following the Minimum Spanning
Tree. To ease simulation, we assumed the network to be represented as a √n × √n grid,
thus making the routing algorithm straightforward.
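This simulation can be reproduced in a few lines. The sketch below measures the Manhattan-metric MST weight of β randomly placed replica sensors on a √n × √n grid; Prim's algorithm replaces the routing simulation, and the 1.25 slack factor on the √(n√n) bound is our own allowance for constant factors:

```python
import math
import random

def mst_weight(points):
    # Prim's algorithm over Manhattan distances (O(k^2) work per step,
    # fine for small k).
    points = list(points)
    in_tree = {points[0]}
    weight = 0
    while len(in_tree) < len(points):
        d, nxt = min(
            (abs(p[0] - q[0]) + abs(p[1] - q[1]), q)
            for p in in_tree
            for q in points
            if q not in in_tree
        )
        weight += d
        in_tree.add(nxt)
    return weight

n = 1024
side = math.isqrt(n)
beta = side                                  # beta = sqrt(n) replica sensors
rng = random.Random(7)
cells = [(r, c) for r in range(side) for c in range(side)]
trials = [mst_weight(rng.sample(cells, beta)) for _ in range(50)]
avg = sum(trials) / len(trials)
bound = math.sqrt(n * math.isqrt(n))         # sqrt(n * sqrt(n)), ~181 for n = 1024
```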
Observe that dissemination imposes a limited computational overhead on each sensor,
specifically, related to the computation of the Minimum Spanning Tree. This incurs
O(β log β), using, for instance, Kruskal's algorithm [5].
Query. During this phase, GW forwards the encrypted query to α random sensors.
These sensors can be reached with a minimum number of messages, again through
a Minimum Spanning Tree. Thus, as α = √n, the number of messages exchanged in
this phase can be upper-bounded by O(√(n√n)). Since the query algorithm (Alg. 2)
is independent of the adversarial model, for all the protocols presented in the
rest of the paper, we will estimate the message overhead due to this algorithm as
O(√(n√n)) messages.
Furthermore, the computational overhead imposed by the query algorithm (Alg. 2)
on the involved parties can be considered negligible, as it only involves: (i) forwarding and comparisons (GW, sensors in Q), and (ii) symmetric encryption/decryption operations (OWN t ). (Recall that, as discussed in the analysis of the sensing algorithm in
Sec. 4, it takes less than 0.01 ms to perform AES operations on a smartphone.)
Take Away. In conclusion, we observe that our protocol for non-resident adversaries
introduces a total average overhead of O(√(n√n)) messages per sensor, whereas the
computational overhead incurred by our techniques is negligible when compared to
any non-privacy-preserving mechanism.
6 Resident Adversary
We now consider the case when ADV is resident. In the following, we assume that the
adversary is local to a specific area of the network.
6.1 Dissemination (Resident Adversary)
To ease exposition, we now consider the network to be deployed as a grid: sensors are
then placed at the intersections of a √n × √n matrix. We recall that ADV in this
model occupies a square-shaped region of the network. We consider such an adversary
to be deployed on a √z × √z square. Next, we assume that n and z are perfect squares.
We stress that this topology is assumed only to ease presentation, while our techniques
are not restricted to it. We consider two different dissemination strategies in order to
show the feasibility of the proposed approach.
Local Chain. In this approach, at dissemination time each sensor initializes a counter
to β and selects the neighbor at its right. When the latter receives the replica to retain,
it decreases the counter and forwards the message to the neighbor at its right. The
dissemination stops when the counter reaches zero. Two special cases may occur: (1)
a sensor at the right border of the network is reached; we cope with this
situation by having that sensor continue the dissemination on its bottom neighbor
(it is then possible that a sensor at the bottom-right corner is reached, in which case the
dissemination continues to its left neighbor); and (2) a sensor at the bottom border of
the network is reached, in which case it continues the dissemination on its upper neighbor.
Local Diffusion. In this approach, each sensor Si populates the set of replica nodes,
Ri , with neighbors, by selecting an inner √z × √z square in the grid network.
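Under the grid layout, the two replica-selection strategies can be sketched as follows. Border handling is simplified to wrap-around (the paper's rules walk down and then left instead), and the patch size stands in for √z, so this is only an illustration:

```python
def local_chain(si, side, beta):
    # Replica set: beta successive sensors to the right of si = (row, col);
    # wrap-around is a simplified stand-in for the paper's border rules.
    row, col = si
    return [(row, (col + k) % side) for k in range(1, beta + 1)]

def local_diffusion(si, side, patch):
    # Replica set: a patch x patch block of neighbors starting just right
    # of si (the "inner square"; patch plays the role of sqrt(z)).
    row, col = si
    return [
        ((row + dr) % side, (col + dc) % side)
        for dr in range(patch)
        for dc in range(1, patch + 1)
    ]

si = (3, 3)
chain = local_chain(si, side=8, beta=4)
square = local_diffusion(si, side=8, patch=2)
```

Note that both strategies include the sensor immediately to the right of Si, matching the remark made about Fig. 4 below.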
In Fig. 4, we depict the two dissemination strategies for the resident adversary in
a small network composed of 8 × 8 sensors, where each sensor relies on 8
Fig. 4. Dissemination in the presence of a resident local adversary in an 8×8 network: local chain (in
gray) and local diffusion (in blue)
(β = √(8 · 8) = 8) replica sensors. The sensors in lighter color are the replica sensors selected with the local chain approach, whereas the darker ones are those selected
with the local diffusion approach. (Note that the sensor immediately right of Si is selected with both approaches). We choose a limited number of sensors only to improve
figure’s clarity; as shown later, our dissemination techniques also apply, efficiently, to
significantly larger networks.
6.2 Privacy Analysis (Resident Adversary)
We now analyze the privacy provided by the above approaches. We consider the adversary to break query privacy whenever she controls a sensor storing a tag matching T* (i.e., OWNt's query). Indeed, due to the deterministic routing adopted, ADV could easily compute the sensor originating the message (i.e., St). Informally, ADV wins if Rt ∩ Z is not empty. The probability of this event differs between the two approaches presented above.
Local Chain. If the dissemination always chooses the right neighbor (we do not consider the special case of a message reaching the border of the deployment area), the compromising probability can be upper-bounded as follows. Since ADV occupies a square area, it is enough to consider where the center of that square must be placed for ADV to win; such a center can be abstracted as a sensor (cA). Indeed, once cA is placed, the distribution of the other sensors follows. In particular, let us denote with d the discrete distance of cA from its border (that is, d = √z/2). In the same way, we can consider just the placement of the target sensor (St) to derive the distribution of all the other sensors in Rt. Now, since the sensors in Rt form a chain, for ADV to compromise at least one sensor in that set, it is sufficient that cA is placed at a distance less than d from any of the sensors in Rt. This area is equal to a rectangle of length β + 2d and height 2d. Hence, the probability for ADV to compromise a sensor in Rt is given by:
Pr(ADV wins) < ((β + 2d) · 2d) / n = (2dβ + z) / n    (5)
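The bound in (5) can be sanity-checked with a small Monte Carlo experiment (a sketch of ours, not the simulation of Sec. 4.2; as in the analysis, borders are ignored when laying out the chain, "distance less than d" is taken strictly, and √z is assumed even):

```python
import random

def chain_break_prob(sqrt_n, sqrt_z, trials=20000, seed=1):
    """Estimate the probability that a resident ADV (a sqrt(z) x sqrt(z)
    square of corrupted sensors, abstracted by its center cA) intersects
    the local chain of beta = sqrt(n) replicas, and return it together
    with the upper bound (2*d*beta + z)/n from Eq. (5)."""
    rng = random.Random(seed)
    beta, d, n = sqrt_n, sqrt_z // 2, sqrt_n ** 2
    wins = 0
    for _ in range(trials):
        tr, tc = rng.randrange(sqrt_n), rng.randrange(sqrt_n)  # target St
        ar, ac = rng.randrange(sqrt_n), rng.randrange(sqrt_n)  # ADV center cA
        chain = [(tr, tc + i) for i in range(1, beta + 1)]     # borders ignored
        if any(abs(ar - r) < d and abs(ac - c) < d for r, c in chain):
            wins += 1
    bound = (2 * d * beta + sqrt_z ** 2) / n
    return wins / trials, bound
```

For √n = 30 and √z = 6, the bound evaluates to 216/900 = 0.24, and the empirical frequency stays below it.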
Fig. 5. Probability resident ADV compromises query privacy for local chain (axes: √n from 25 to 45, √z from 2 to 8; vertical axis: probability ADV breaks query privacy)
Fig. 6. Probability resident ADV compromises query privacy for local diffusion (axes: √n from 25 to 45, √z from 2 to 8; vertical axis: probability ADV breaks query privacy)
The upper bound in the above equation has been validated by the experiments described in Sec. 4.2. Fig. 5 plots the probability of ADV guessing the target in our experimental results, as a function of the number of sensors in the network (side of the network grid) and of the sensors controlled by the adversary (side of the adversarial inner square). To ease presentation, we do not plot the analytical values, as they tightly upper-bound the empirical probabilities, with a maximum error of 10⁻².
Local Diffusion. Similarly to the previous technique, we say that ADV breaks query privacy if she controls a sensor in the square formed by the sensors in Rt. With a reasoning similar to the one adopted for the chain deployment, the above event happens if the following sufficient condition is verified: ADV places cA within a square of side-length (√β + 2d) that has St at its center. The probability of such an event can be upper-bounded as follows:
Pr(ADV wins) < (√β + 2d)² / n = (β + z + 2√(βz)) / n    (6)
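As for the chain, bound (6) can be sanity-checked with a small Monte Carlo sketch (ours, not the simulation of Sec. 4.2; borders are ignored when laying out the replica square, "distance less than d" is strict, √z is assumed even, and β is chosen a perfect square so that √β is integral):

```python
import random

def diffusion_break_prob(sqrt_n, sqrt_z, trials=20000, seed=1):
    """Estimate the probability that a resident ADV (center cA, half-side
    d = sqrt(z)/2) hits the sqrt(beta) x sqrt(beta) replica square around
    the target St, and return it with the bound (sqrt(beta) + 2d)^2 / n."""
    rng = random.Random(seed)
    n, beta, d = sqrt_n ** 2, sqrt_n, sqrt_z // 2
    side = round(beta ** 0.5)                 # sqrt(beta), assumed integral
    wins = 0
    for _ in range(trials):
        tr, tc = rng.randrange(sqrt_n), rng.randrange(sqrt_n)
        ar, ac = rng.randrange(sqrt_n), rng.randrange(sqrt_n)
        block = [(tr + i, tc + j)             # replica square around St
                 for i in range(-(side // 2), side - side // 2)
                 for j in range(-(side // 2), side - side // 2)]
        if any(abs(ar - r) < d and abs(ac - c) < d for r, c in block):
            wins += 1
    bound = (side + 2 * d) ** 2 / n
    return wins / trials, bound
```

For √n = 36 and √z = 6, the bound evaluates to 144/1296 ≈ 0.11, and the empirical frequency stays below it.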
The upper bound in the above equation has been validated empirically, as described in Sec. 4.2. Fig. 6 plots the probability for ADV to compromise query privacy in our experimental results, as a function of the sensors in the network (side of the network grid) and of the sensors controlled by the adversary (side of the adversarial inner square). To ease presentation, we do not plot the analytical values, as the difference between analytical and experimental results is, at any plotted point, an absolute error of at most 10⁻².
6.3 Overhead Analysis (Resident Adversary)
Sensing. The communication and computational overhead due to sensing is the same as for the non-resident adversary; thus, we do not repeat its analysis and comparison to related work here.
Dissemination. During this phase, each sensor replicates its reading to β replica sensors. Since these sensors are always selected among neighboring sensors (in both the local chain and local diffusion modes), the total number of messages needed for the dissemination amounts to O(β) = O(√n) for each sensor. Computational overhead is negligible, as it only involves forwarding.
Query. The overhead for this action (Alg. 2) is the same as for the non-resident adversary, hence O(n√n) total messages. Again, computational overhead is negligible, as it only involves forwarding and symmetric-key operations.
Take Away. We conclude that our protocol for resident adversaries (following both the local chain and the local diffusion approaches) introduces a total average overhead of O(√n) messages per sensor, which is smaller than that incurred by our technique for non-resident adversaries. Again, the computational overhead incurred by our techniques is negligible when compared to any non-privacy-preserving mechanism.
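For concreteness, the asymptotic costs above can be instantiated for a few grid sizes (illustrative figures only; the constants hidden by the O-notation are set to 1, so these are orders of magnitude, not exact message counts):

```python
import math

def resident_overheads(n):
    """Per-sensor dissemination messages, O(beta) = O(sqrt(n)), and total
    query messages, O(n * sqrt(n)), for an n-sensor grid (unit constants)."""
    sqrt_n = math.isqrt(n)
    return sqrt_n, n * sqrt_n

for n in (64, 900, 10000):
    per_sensor, query_total = resident_overheads(n)
    print(f"n={n}: dissemination msgs/sensor={per_sensor}, "
          f"query msgs total={query_total}")
```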
7 Discussion
In previous sections, we have presented a set of probabilistic techniques to enforce
query and data privacy in Urban Sensing. Our protocols face different adversarial strategies, and provide different privacy degrees and bandwidth overheads.
Observe that the degree of privacy provided by the techniques against a non-resident ADV is higher than that provided by the techniques against a resident ADV, but the former also incur a higher message overhead. Also note that, in the non-resident adversary model, the way ADV is distributed does not influence the attained privacy degree. Interestingly, privacy guarantees
increase for larger networks, even with the same percentage of corrupted sensors. In
the case of resident adversaries, we have assumed that ADV is locally distributed in a
specific area of the network. We plan to include, in the extended version of the paper,
the analysis of privacy (and related dissemination techniques) for resident adversaries
that occupy different areas of the network.
Note that the communication overhead is evenly distributed among the sensors, while the communication between the gateway and the querier is kept to the bare minimum: just one message is routed to the GW. Also, to the best of our knowledge, our work
is the first to: (1) formalize the problem of protecting query and data privacy in Urban
Sensing systems; and (2) analyze the aforementioned adversarial models. Finally, we remark that all the proposed techniques rely on generating replicas of the sensed data. These replicas, beyond influencing the achieved degree of privacy, have the positive side effect of enhancing data reliability and fault tolerance.
8 Conclusion
To the best of our knowledge, this work is the first to define and address query and data
privacy in the context of Urban Sensing. We have proposed two novel adversary models
and, for each of them, a distributed technique that trades off the achieved query privacy level against a (slight) communication overhead. Data privacy is achieved using standard (inexpensive) symmetric cryptography. Our techniques have been thoroughly analyzed and, when compared to the threat, proved efficient and effective. Finally, we have highlighted some problems that call for further research in this developing area.
Acknowledgments. This paper has been partially supported by: the grant HPC-2011 from CASPUR (Italy); the Prevention, Preparedness and Consequence Management of Terrorism and other Security-related Risks Programme of the European Commission, Directorate-General Home Affairs, under the ExTrABIRE project, HOME/2009/CIPS/AG/C2-065; and ACC1O, the Catalan Business Competitiveness Support Agency.
References
1. Burke, J., Estrin, D., Hansen, M., Parker, A., Ramanathan, N., Reddy, S., Srivastava, M.:
Participatory Sensing. In: World Sensor Web Workshop (2006)
2. Chang, K., Shin, K.G.: Distributed authentication of program integrity verification in wireless
sensor networks. ACM Trans. Inf. Syst. Secur. 11 (2008)
3. Chaum, D.L.: Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM 24(2) (1981)
4. Chor, B., Kushilevitz, E., Goldreich, O., Sudan, M.: Private information retrieval. Journal of the ACM 45(6) (1998)
5. Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)
6. Cornelius, C., Kapadia, A., Kotz, D., Peebles, D., Shin, M., Triandopoulos, N.: AnonySense:
Privacy-aware people-centric sensing. In: MobiSys (2008)
7. Das, T., Mohan, P., Padmanabhan, V., Ramjee, R., Sharma, A.: PRISM: Platform for Remote
Sensing using Smartphones. In: MobiSys (2010)
8. De Cristofaro, E., Ding, X., Tsudik, G.: Privacy-preserving querying in wireless sensor networks. In: ICCCN (2009)
9. De Cristofaro, E., Soriente, C.: PEPSI: Privacy-Enhanced Participatory Sensing Infrastructure. In: WiSec (2011)
10. Ganti, R., Pham, N., Tsai, Y., Abdelzaher, T.: PoolView: Stream Privacy for Grassroots Participatory Sensing. In: SenSys (2008)
11. Huang, K., Kanhere, S., Hu, W.: Preserving Privacy in Participatory Sensing Systems. Computer Communications 33(11) (2010)
12. Lee, J., Hoh, B.: Sell Your Experiences: A Market Mechanism based Incentive for Participatory Sensing. In: PerCom (2010)
13. Lu, H., Pan, W., Lane, N., Choudhury, T., Campbell, A.: SoundSense: Scalable Sound Sensing for People-Centric Applications on Mobile Phones. In: MobiSys (2009)
14. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity: Privacy
beyond k-anonymity. ACM Trans. on Knowledge Discovery from Data (TKDD) 1(1) (2007)
15. Mathur, S., Jin, T., Kasturirangan, N., Chandrasekaran, J., Xue, W., Gruteser, M.,
Trappe, W.: ParkNet: Drive-by Sensing of Road-side Parking Statistics. In: MobiSys (2010)
16. Menezes, A.: Elliptic curve public key cryptosystems. Kluwer (2002)
17. Mohan, P., Padmanabhan, V., Ramjee, R.: Rich Monitoring of Road and Traffic Conditions
using Mobile Smartphones. In: SenSys (2008)
18. Ortolani, S., Conti, M., Crispo, B., Di Pietro, R.: Event Handoff Unobservability in WSN.
In: Camenisch, J., Kisimov, V., Dubovitskaya, M. (eds.) iNetSec 2010. LNCS, vol. 6555,
pp. 20–28. Springer, Heidelberg (2011)
19. Ortolani, S., Conti, M., Crispo, B., Di Pietro, R.: Events privacy in WSNs: A new model and
its application. In: WoWMoM (2011)
20. Paulos, E., Honicky, R., Goodman, E.: Sensing Atmosphere. In: SenSys Workshops (2007)
21. Perito, D., Tsudik, G.: Secure Code Update for Embedded Devices Via Proofs of Secure Erasure. In: Gritzalis, D., Preneel, B., Theoharidou, M. (eds.) ESORICS 2010. LNCS, vol. 6345,
pp. 643–662. Springer, Heidelberg (2010)
22. Reddy, S., Estrin, D., Srivastava, M.: Recruitment Framework for Participatory Sensing Data
collections. In: Floréen, P., Krüger, A., Spasojevic, M. (eds.) Pervasive Computing. LNCS,
vol. 6030, pp. 138–155. Springer, Heidelberg (2010)
23. Reed, M.G., Syverson, P.F., Goldschlag, D.M.: Anonymous connections and onion routing.
IEEE Journal on Selected Areas in Communications 16(4) (1998)
24. Shi, J., Zhang, R., Liu, Y., Zhang, Y.: PriSense: Privacy-Preserving Data Aggregation in
People-Centric Urban Sensing Systems. In: INFOCOM (2010)
25. Sion, R., Carbunar, B.: On the Computational Practicality of Private Information Retrieval.
In: NDSS (2007)
26. Sweeney, L.: k-Anonymity: A model for Protecting Privacy. Int. J. Uncertain. Fuzziness
Knowl.-Based Syst. 10(5) (2002)