Approximate Data Collection in Sensor
Networks using Probabilistic Models
ICDE 2006
David Chu, Amol Deshpande, Joseph M. Hellerstein, Wei Hong
UC Berkeley
University of Maryland
UC Berkeley
Intel Research Berkeley
Arched Rock Corp.
klhsueh 09.11.03
Outline
 Introduction
 Ken architecture
 Replicated Dynamic Probabilistic Model
 Choosing the Prediction Model
 Evaluation
 Conclusion
Introduction
 Sensing data is collected continuously at the source nodes; the source and the sink keep replicated prediction models in sync so that the sink can approximate the readings without constant reporting.
Ken Operation
 At each time step, the source checks: are the expected values from the shared model accurate enough? If not, it finds the attributes that are useful to the prediction and reports their values to the sink.
Ken Operation: source (at time t)
1. Compute the probability distribution function (pdf) of the attributes.
2. Compute the expected values according to the pdf.
3. If every expected value is within the error bound of the true reading, then stop.
4. Otherwise:
   a. Find the smallest subset of attributes X such that the expected values according to the pdf, conditioned on the true values of X, are accurate enough.
   b. Send the values of the attributes in X to the sink.
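A minimal sketch of the source-side steps, assuming the replicated pdf is a multivariate Gaussian over the node's attributes; the function and parameter names (condition_gaussian, eps, source_step) are illustrative, not from the paper.

```python
import itertools
import numpy as np

def condition_gaussian(mu, Sigma, obs_idx, obs_vals):
    """Condition a multivariate Gaussian on observed attribute values.
    Returns the indices and conditional mean of the unobserved attributes."""
    all_idx = np.arange(len(mu))
    rest = np.setdiff1d(all_idx, obs_idx)
    mu_o, mu_r = mu[obs_idx], mu[rest]
    S_ro = Sigma[np.ix_(rest, obs_idx)]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    cond_mu = mu_r + S_ro @ np.linalg.solve(S_oo, obs_vals - mu_o)
    return rest, cond_mu

def source_step(mu, Sigma, readings, eps):
    """One Ken source-side step: return the (possibly empty) set of
    attribute indices whose true values must be sent to the sink."""
    # Steps 1-3: if every expected value is already within eps, send nothing.
    if np.all(np.abs(mu - readings) <= eps):
        return []
    # Step 4(a): find the smallest subset X such that conditioning the pdf
    # on the true readings of X makes all remaining expectations accurate.
    n = len(mu)
    for size in range(1, n + 1):
        for subset in itertools.combinations(range(n), size):
            obs_idx = np.array(subset)
            rest, cond_mu = condition_gaussian(mu, Sigma, obs_idx, readings[obs_idx])
            if np.all(np.abs(cond_mu - readings[rest]) <= eps):
                return list(subset)   # Step 4(b): send these values
    return list(range(n))             # worst case: report everything
```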
Ken Operation: sink (at time t)
1. Compute the probability distribution function (pdf), exactly as the source does.
2. If the sink received from the source the values of the attributes in X, then condition the pdf using these values, as described in the source's Step 4(a) above.
3. Compute the expected values of the attributes, and use them as the approximation to the true values.
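A matching sketch of the sink side under the same Gaussian assumption; it reuses condition_gaussian from the source-side sketch above, and the names (sink_step, received) are illustrative.

```python
import numpy as np

def sink_step(mu, Sigma, received):
    """One Ken sink-side step.
    `received` maps attribute index -> value sent by the source (may be empty).
    Returns the sink's approximation of all attribute values."""
    approx = mu.copy()                       # Step 3 default: prior expected values
    if received:                             # Step 2: condition on reported values
        obs_idx = np.array(sorted(received))
        obs_vals = np.array([received[i] for i in sorted(received)])
        approx[obs_idx] = obs_vals           # reported values are exact
        rest, cond_mu = condition_gaussian(mu, Sigma, obs_idx, obs_vals)
        approx[rest] = cond_mu               # conditional expectations for the rest
    return approx
```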
Replicated Dynamic Probabilistic Model
 Ex1: very simple prediction model
 Assumes that the data value remains constant over time.
 Ex2: linear prediction model
 Utilizes temporal correlations, but ignores spatial correlations (both baselines are sketched below).
 To consider both temporal and spatial correlations, Ken uses a dynamic probabilistic model.
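A minimal sketch of the two baseline predictors, assuming scalar readings; the class names are illustrative.

```python
class ConstantPredictor:
    """Ex1: predict that the reading stays at its last reported value."""
    def __init__(self, initial):
        self.last = initial
    def predict(self):
        return self.last
    def update(self, reported):
        self.last = reported

class LinearPredictor:
    """Ex2: extrapolate the last observed trend (temporal correlation only)."""
    def __init__(self, initial, slope=0.0):
        self.last, self.slope = initial, slope
    def predict(self):
        return self.last + self.slope
    def update(self, reported):
        self.slope = reported - self.last
        self.last = reported
```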
Replicated Dynamic Probabilistic Model
 A dynamic probabilistic model consists of:
 a probability distribution function (pdf) for the initial state, and
 a transition model.
 The pdf at time t+1 is computed from the pdf at time t, the transition model, and the observations communicated to the sink (a sketch of the prediction step follows).
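A minimal sketch of how the replicated pdf is rolled forward one step, assuming a linear-Gaussian transition model; the matrices A and Q and the function name are assumptions for illustration.

```python
import numpy as np

def predict_next_pdf(mu, Sigma, A, Q):
    """Advance the replicated pdf one time step with a linear-Gaussian
    transition model:  X_{t+1} = A @ X_t + w,  w ~ N(0, Q).
    Source and sink both run this with identical inputs, so their pdfs stay
    in sync; any values actually communicated are then folded in by
    conditioning (see condition_gaussian in the earlier sketch)."""
    mu_next = A @ mu
    Sigma_next = A @ Sigma @ A.T + Q
    return mu_next, Sigma_next
```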
Replicated Dynamic Probabilistic Model
 Ex3: 2-dimensional linear Gaussian model
 Compute the expected values from the pdf. If they are not accurate enough, we only have to communicate one value to the sink, because the spatial correlations also correct the other attribute's estimate (worked example below).
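A small worked example of this idea, assuming a 2-D Gaussian with strongly correlated attributes (the numbers are made up for illustration, and condition_gaussian is the helper from the earlier sketch): sending one attribute's true value also pulls the other attribute's conditional expectation close to its true value.

```python
import numpy as np

# Hypothetical 2-attribute Gaussian pdf with strong positive correlation.
mu = np.array([20.0, 20.0])
Sigma = np.array([[4.0, 3.6],
                  [3.6, 4.0]])

true_vals = np.array([23.0, 22.8])   # both prior expectations are off by ~3

# Communicate only attribute 0 and condition the pdf on its true value.
rest, cond_mu = condition_gaussian(mu, Sigma, np.array([0]), true_vals[:1])
print(cond_mu)   # ~[22.7]: attribute 1's estimate is now within a 0.5 error bound
```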
Choosing the Prediction Model
 Total communication cost (a toy breakdown is sketched below):
 intra-source: checking whether the prediction is accurate.
 source-sink: sending a set of values to the sink.
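A toy sketch of this cost breakdown; the per-message costs and the weighting are assumptions for illustration, not the paper's exact objective.

```python
def total_cost(n_clique_attrs, n_reported, c_intra=1.0, c_sink=10.0):
    """Toy cost model: every attribute in a clique pays an intra-source hop
    for the accuracy check, while each value actually reported pays the
    (multi-hop, hence pricier) source-to-sink cost."""
    return c_intra * n_clique_attrs + c_sink * n_reported
```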
Choosing the Prediction Model
 Ex3: Disjoint-Cliques Model
 Reduces intra-source cost while still utilizing spatial correlations between attributes.
 Exhaustive algorithm for finding the optimal solution.
 Greedy heuristic algorithm (sketched below).
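A hedged sketch of a greedy clique-building heuristic in the spirit of the Greedy-k algorithm named on the evaluation slides; the benefit function and the seeding strategy are assumptions and may differ from the paper's exact procedure.

```python
def greedy_disjoint_cliques(attrs, benefit, k):
    """Greedily partition `attrs` into disjoint cliques of size at most k.
    `benefit(clique)` scores a candidate clique (e.g. expected communication
    savings).  Repeatedly grow the current clique with the attribute that
    improves the score most, stopping when no attribute helps or size k is hit."""
    remaining = set(attrs)
    cliques = []
    while remaining:
        clique = [remaining.pop()]           # seed a new clique arbitrarily
        while len(clique) < k:
            best, best_gain = None, 0.0
            for a in remaining:
                gain = benefit(clique + [a]) - benefit(clique)
                if gain > best_gain:
                    best, best_gain = a, gain
            if best is None:
                break
            clique.append(best)
            remaining.remove(best)
        cliques.append(clique)
    return cliques
```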
Choosing the Prediction Model
 Ex4: Average Model
Evaluation
 Real-world sensor network data
 Lab: Intel Research Lab in Berkeley, consisting of 49 mica2 motes
 Garden: UC Berkeley Botanical Gardens, consisting of 11 mica2 motes
 Three attributes: {temperature, humidity, voltage}
 Readings modeled as time-varying multivariate Gaussians
 We estimated the model parameters using the first 100 hours of data (training data), and used traces from the next 5000 hours (test data) for evaluating Ken.
 Error bounds of 0.5°C for temperature, 2% for humidity, and 0.1 V for battery voltage.
Evaluation
 Comparison Schemes
 TinyDB:
 always reports all sensor values to the base station.
 Approximate Caching (ApC):
 caches the last reported reading at the sink and source; sources do not report if the cached reading is within the threshold of the current reading (a sketch of this check follows the list).
 Ken with Disjoint-Cliques (DjC) and Average (Avg) models:
 Greedy-k heuristic algorithm to find the Disjoint-Cliques model (DjCk).
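A minimal sketch of the Approximate Caching rule described above; the threshold parameter name is illustrative.

```python
def approximate_caching_step(cached, current, threshold):
    """Report only when the current reading drifts outside the threshold
    of the cached reading; source and sink then update their caches."""
    if abs(current - cached) > threshold:
        return current, True    # new cached value, report to the sink
    return cached, False        # stay silent, sink keeps using the cache
```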
Evaluation
 Ken and ApC both achieve significant savings over TinyDB.
 Average reports at a higher rate than Disjoint-Cliques with max clique size restricted to 2 (DjC2).
 Capturing and modeling temporal correlations alone may not be sufficient to outperform caching; utilizing spatial correlations helps.
 The Garden dataset shows more data reduction (21% vs. 36%).
Evaluation
 Disjoint-Cliques Models
Evaluation
 Quantify the merit of various clique sizes.
 The physical deployment may not have sufficiently strong spatial correlations.
Evaluation
 The base station resides at the east end of the network.
 The areas closer to the base station do not benefit from larger cliques.
Conclusion
 We propose a robust approximate technique called Ken that
uses replicated dynamic probabilistic models to minimize
communication from sensor nodes to the network’s PC base
station.