Approximate Data Collection in Sensor
Networks using Probabilistic Models
ICDE 2006
David Chu, Amol Deshpande, Joseph M. Hellerstein, Wei Hong
UC Berkeley
University of Maryland
UC Berkeley
Intel Research Berkeley
Arched Rock Corp.
klhsueh 09.11.03
Outline
 Introduction
 Ken architecture
 Replicated Dynamic Probabilistic Model
 Choosing the Prediction Model
 Evaluation
 Conclusion
Introduction
 Sensing data is collected continuously at the source nodes; the source and the sink keep replicated prediction models in sync so that the sink can approximate the readings without constant reporting.
Ken Operation
 At each time step, the source checks: are the expected values from the shared model accurate enough? If not, it finds the attributes that are useful to the prediction and reports their values to the sink.
Ken Operation: source (at time t)
1. Compute the probability distribution function (pdf) of the attributes.
2. Compute the expected values according to the pdf.
3. If every expected value is within the error bound of the true reading, then stop.
4. Otherwise:
   a. Find the smallest subset of attributes X such that the expected values according to the pdf, conditioned on the true values of X, are accurate enough.
   b. Send the values of the attributes in X to the sink.
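A minimal sketch of the source-side steps, assuming the replicated pdf is a multivariate Gaussian over the node's attributes; the function and parameter names (condition_gaussian, eps, source_step) are illustrative, not from the paper.

```python
import itertools
import numpy as np

def condition_gaussian(mu, Sigma, obs_idx, obs_vals):
    """Condition a multivariate Gaussian on observed attribute values.
    Returns the indices and conditional mean of the unobserved attributes."""
    all_idx = np.arange(len(mu))
    rest = np.setdiff1d(all_idx, obs_idx)
    mu_o, mu_r = mu[obs_idx], mu[rest]
    S_ro = Sigma[np.ix_(rest, obs_idx)]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    cond_mu = mu_r + S_ro @ np.linalg.solve(S_oo, obs_vals - mu_o)
    return rest, cond_mu

def source_step(mu, Sigma, readings, eps):
    """One Ken source-side step: return the (possibly empty) set of
    attribute indices whose true values must be sent to the sink."""
    # Steps 1-3: if every expected value is already within eps, send nothing.
    if np.all(np.abs(mu - readings) <= eps):
        return []
    # Step 4(a): find the smallest subset X such that conditioning the pdf
    # on the true readings of X makes all remaining expectations accurate.
    n = len(mu)
    for size in range(1, n + 1):
        for subset in itertools.combinations(range(n), size):
            obs_idx = np.array(subset)
            rest, cond_mu = condition_gaussian(mu, Sigma, obs_idx, readings[obs_idx])
            if np.all(np.abs(cond_mu - readings[rest]) <= eps):
                return list(subset)   # Step 4(b): send these values
    return list(range(n))             # worst case: report everything
```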
Ken Operation: sink (at time t)
1. Compute the probability distribution function (pdf), exactly as the source does.
2. If the sink received from the source the values of the attributes in X, then condition the pdf using these values, as described in the source's Step 4(a) above.
3. Compute the expected values of the attributes, and use them as the approximation to the true values.
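A matching sketch of the sink side under the same Gaussian assumption; it reuses condition_gaussian from the source-side sketch above, and the names (sink_step, received) are illustrative.

```python
import numpy as np

def sink_step(mu, Sigma, received):
    """One Ken sink-side step.
    `received` maps attribute index -> value sent by the source (may be empty).
    Returns the sink's approximation of all attribute values."""
    approx = mu.copy()                       # Step 3 default: prior expected values
    if received:                             # Step 2: condition on reported values
        obs_idx = np.array(sorted(received))
        obs_vals = np.array([received[i] for i in sorted(received)])
        approx[obs_idx] = obs_vals           # reported values are exact
        rest, cond_mu = condition_gaussian(mu, Sigma, obs_idx, obs_vals)
        approx[rest] = cond_mu               # conditional expectations for the rest
    return approx
```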
Replicated Dynamic Probabilistic Model
 Ex1: very simple prediction model
 Assumes that the data value remains constant over time.
 Ex2: linear prediction model
 Utilizes temporal correlations, but ignores spatial correlations (both baselines are sketched below).
 To consider both temporal and spatial correlations, Ken uses a dynamic probabilistic model.
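A minimal sketch of the two baseline predictors, assuming scalar readings; the class names are illustrative.

```python
class ConstantPredictor:
    """Ex1: predict that the reading stays at its last reported value."""
    def __init__(self, initial):
        self.last = initial
    def predict(self):
        return self.last
    def update(self, reported):
        self.last = reported

class LinearPredictor:
    """Ex2: extrapolate the last observed trend (temporal correlation only)."""
    def __init__(self, initial, slope=0.0):
        self.last, self.slope = initial, slope
    def predict(self):
        return self.last + self.slope
    def update(self, reported):
        self.slope = reported - self.last
        self.last = reported
```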
Replicated Dynamic Probabilistic Model
 A dynamic probabilistic model consists of:
 a probability distribution function (pdf) for the initial state, and
 a transition model.
 The pdf at time t+1 is computed from the pdf at time t, the transition model, and the observations communicated to the sink (a sketch of the prediction step follows).
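A minimal sketch of how the replicated pdf is rolled forward one step, assuming a linear-Gaussian transition model; the matrices A and Q and the function name are assumptions for illustration.

```python
import numpy as np

def predict_next_pdf(mu, Sigma, A, Q):
    """Advance the replicated pdf one time step with a linear-Gaussian
    transition model:  X_{t+1} = A @ X_t + w,  w ~ N(0, Q).
    Source and sink both run this with identical inputs, so their pdfs stay
    in sync; any values actually communicated are then folded in by
    conditioning (see condition_gaussian in the earlier sketch)."""
    mu_next = A @ mu
    Sigma_next = A @ Sigma @ A.T + Q
    return mu_next, Sigma_next
```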
Replicated Dynamic Probabilistic Model
 Ex3: 2-dimensional linear Gaussian model
 Compute the expected values from the pdf. If they are not accurate enough, we only have to communicate one value to the sink, because the spatial correlations also correct the other attribute's estimate (worked example below).
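A small worked example of this idea, assuming a 2-D Gaussian with strongly correlated attributes (the numbers are made up for illustration, and condition_gaussian is the helper from the earlier sketch): sending one attribute's true value also pulls the other attribute's conditional expectation close to its true value.

```python
import numpy as np

# Hypothetical 2-attribute Gaussian pdf with strong positive correlation.
mu = np.array([20.0, 20.0])
Sigma = np.array([[4.0, 3.6],
                  [3.6, 4.0]])

true_vals = np.array([23.0, 22.8])   # both prior expectations are off by ~3

# Communicate only attribute 0 and condition the pdf on its true value.
rest, cond_mu = condition_gaussian(mu, Sigma, np.array([0]), true_vals[:1])
print(cond_mu)   # ~[22.7]: attribute 1's estimate is now within a 0.5 error bound
```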
Choosing the Prediction Model
 Total communication cost (a toy breakdown is sketched below):
 intra-source: checking whether the prediction is accurate.
 source-sink: sending a set of values to the sink.
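A toy sketch of this cost breakdown; the per-message costs and the weighting are assumptions for illustration, not the paper's exact objective.

```python
def total_cost(n_clique_attrs, n_reported, c_intra=1.0, c_sink=10.0):
    """Toy cost model: every attribute in a clique pays an intra-source hop
    for the accuracy check, while each value actually reported pays the
    (multi-hop, hence pricier) source-to-sink cost."""
    return c_intra * n_clique_attrs + c_sink * n_reported
```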
Choosing the Prediction Model
 Ex3: Disjoint-Cliques Model
 Reduces intra-source cost while still utilizing spatial correlations between attributes.
 Exhaustive algorithm for finding the optimal solution.
 Greedy heuristic algorithm (sketched below).
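A hedged sketch of a greedy clique-building heuristic in the spirit of the Greedy-k algorithm named on the evaluation slides; the benefit function and the seeding strategy are assumptions and may differ from the paper's exact procedure.

```python
def greedy_disjoint_cliques(attrs, benefit, k):
    """Greedily partition `attrs` into disjoint cliques of size at most k.
    `benefit(clique)` scores a candidate clique (e.g. expected communication
    savings).  Repeatedly grow the current clique with the attribute that
    improves the score most, stopping when no attribute helps or size k is hit."""
    remaining = set(attrs)
    cliques = []
    while remaining:
        clique = [remaining.pop()]           # seed a new clique arbitrarily
        while len(clique) < k:
            best, best_gain = None, 0.0
            for a in remaining:
                gain = benefit(clique + [a]) - benefit(clique)
                if gain > best_gain:
                    best, best_gain = a, gain
            if best is None:
                break
            clique.append(best)
            remaining.remove(best)
        cliques.append(clique)
    return cliques
```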
Choosing the Prediction Model
 Ex4: Average Model
Evaluation
 Real-world sensor network data
 Lab: Intel Research Lab in Berkeley, consisting of 49 mica2 motes
 Garden: UC Berkeley Botanical Gardens, consisting of 11 mica2 motes
 Three attributes: {temperature, humidity, voltage}
 Readings modeled as time-varying multivariate Gaussians
 We estimated the model parameters using the first 100 hours of data (training data), and used traces from the next 5000 hours (test data) for evaluating Ken.
 Error bounds of 0.5°C for temperature, 2% for humidity, and 0.1 V for battery voltage.
Evaluation
 Comparison Schemes
 TinyDB:
 always reports all sensor values to the base station.
 Approximate Caching (ApC):
 caches the last reported reading at the sink and source; sources do not report if the cached reading is within the threshold of the current reading (a sketch of this check follows the list).
 Ken with Disjoint-Cliques (DjC) and Average (Avg) models:
 Greedy-k heuristic algorithm to find the Disjoint-Cliques model (DjCk).
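A minimal sketch of the Approximate Caching rule described above; the threshold parameter name is illustrative.

```python
def approximate_caching_step(cached, current, threshold):
    """Report only when the current reading drifts outside the threshold
    of the cached reading; source and sink then update their caches."""
    if abs(current - cached) > threshold:
        return current, True    # new cached value, report to the sink
    return cached, False        # stay silent, sink keeps using the cache
```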
Evaluation
 Ken and ApC both achieve significant savings over TinyDB.
 Average reports at a higher rate than Disjoint-Cliques with max clique size restricted to 2 (DjC2).
 Capturing and modeling temporal correlations alone may not be sufficient to outperform caching; utilizing spatial correlations helps.
 The Garden dataset shows more data reduction (21% vs. 36%).
Evaluation
 Disjoint-Cliques Models
Evaluation
 Quantify the merit of various clique sizes.
 The physical deployment may not have sufficiently strong spatial correlations.
Evaluation
 The base station resides at the east end of the network.
 The areas closer to the base station do not benefit from larger cliques.
Conclusion
 We propose a robust approximate technique called Ken that
uses replicated dynamic probabilistic models to minimize
communication from sensor nodes to the network’s PC base
station.