Sensor Data Management - DKE

Korea University of Technology and Education (한국기술교육대학교)
Jun-Ki Min (민준기)

Wireless Sensor Network
◦ Limited Energy
◦ Limited Computing Power

Sensor Data Management
◦ Naive Approach
  ▪ Each sensor sends its data to the base station
  ▪ Data processing is done at the base station
◦ Problem
  ▪ Each sensor quickly wastes its energy by continuously sending its readings
◦ → Minimize energy consumption
◦ → In-network processing

Data Aggregation

Data Gathering

Query Processing

TAG (Tiny Aggregation)
◦ In-Network Aggregation
◦ Tree Routing Based
◦ Simple Approach
◦ → Cost for median is very high
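[Figure: TAG computing SUM over a routing tree. The node with reading 4 computes Sum(4,5,3) = 12 from its children's readings 5 and 3; the node with reading 3 computes Sum(3,2,2) = 7 from its children's readings 2 and 2; the root, with reading 2, computes Sum(2,12,7) = 21]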
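A minimal Python sketch of TAG-style in-network SUM on the example tree of this slide (root reading 2 with children 4 and 3; the node with reading 4 has children 5 and 3, the node with reading 3 has children 2 and 2). The node labels "r", "a", "b", ... are hypothetical. Each non-root node sends exactly one message carrying its partial sum, while the naive approach forwards every raw reading hop by hop to the base station. SUM works because its partial state is a single number; a holistic aggregate such as MEDIAN has no fixed-size partial state, which is why its cost is high.

```python
# Hypothetical labels for the example tree: node -> children, node -> reading.
TREE = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
READINGS = {"r": 2, "a": 4, "a1": 5, "a2": 3, "b": 3, "b1": 2, "b2": 2}


def tag_sum(tree, readings, node):
    """Return (partial SUM of the subtree rooted at node, messages sent inside it)."""
    total, messages = readings[node], 0
    for child in tree.get(node, []):
        child_sum, child_messages = tag_sum(tree, readings, child)
        total += child_sum
        messages += child_messages + 1  # the child sends one partial-sum message
    return total, messages


def naive_messages(tree, node, depth=0):
    """Messages needed when each raw reading is forwarded hop by hop to the root."""
    return depth + sum(naive_messages(tree, c, depth + 1) for c in tree.get(node, []))


print(tag_sum(TREE, READINGS, "r"))  # (21, 6)
print(naive_messages(TREE, "r"))     # 10
```

On this small tree the in-network plan needs 6 messages instead of 10, and the gap grows with tree depth.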

Q-Digest [2]
◦ Approximately captures the distribution of sensor data
◦ Digest property
  ▪ $\mathrm{count}(v) \le \lfloor n/k \rfloor$ (except leaf nodes)
  ▪ $\mathrm{count}(v) + \mathrm{count}(v_p) + \mathrm{count}(v_s) > \lfloor n/k \rfloor$ (except the root node)
  where v is a node, $v_p$ is the parent of v, and $v_s$ is the sibling of v; n is the number of data values, k is the compression parameter, and σ is the size of the value range (values lie in [1, σ])
◦ Size of a q-digest ≤ 3k


Each sensor builds a q-digest
Parent node
◦ Merges the q-digests of its children
◦ Compresses the merged q-digest
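The build, merge, and compress steps can be sketched in Python as below. This is a simplified sketch, not the implementation from [2]. It assumes integer values in [1, σ] with σ a power of two, stores the complete binary tree in heap numbering (root 1, children 2i and 2i+1, value x at leaf σ + x − 1), and compresses bottom-up by folding a sibling pair into its parent whenever count(v) + count(v_s) + count(v_p) ≤ ⌊n/k⌋. The class and method names are hypothetical.

```python
from collections import defaultdict


class QDigest:
    """Simplified q-digest over integer values in [1, sigma] (sigma a power of two)."""

    def __init__(self, sigma, k):
        self.sigma, self.k = sigma, k
        self.counts = defaultdict(int)  # node id -> count (heap numbering, root = 1)
        self.n = 0

    def insert(self, value, count=1):
        self.counts[self.sigma + value - 1] += count  # leaf holding this value
        self.n += count

    def node_range(self, node):
        """Value range [lo, hi] covered by a node."""
        lo = hi = node
        while lo < self.sigma:  # descend to the leftmost / rightmost leaf
            lo, hi = 2 * lo, 2 * hi + 1
        return lo - self.sigma + 1, hi - self.sigma + 1

    def compress(self):
        """Bottom-up: fold a sibling pair into its parent if the three counts are small."""
        threshold = self.n // self.k
        level = self.sigma  # first node id on the current level
        while level > 1:
            for left in range(level, 2 * level, 2):
                right, parent = left + 1, left // 2
                pair = self.counts.get(left, 0) + self.counts.get(right, 0)
                if pair and pair + self.counts.get(parent, 0) <= threshold:
                    self.counts[parent] += pair
                    self.counts.pop(left, None)
                    self.counts.pop(right, None)
            level //= 2

    def merge(self, child):
        """Parent node: add a child's digest node by node, then re-compress."""
        for node, count in child.counts.items():
            self.counts[node] += count
        self.n += child.n
        self.compress()


a = QDigest(sigma=8, k=2)  # digest built by one sensor
for x in [1, 1, 2, 3, 3, 3, 4, 5, 8]:
    a.insert(x)
a.compress()
b = QDigest(sigma=8, k=2)  # digest built by a sibling sensor
for x in [6, 6, 7, 8]:
    b.insert(x)
b.compress()
a.merge(b)                 # the parent merges and compresses
print(dict(a.counts))      # {4: 3, 5: 4, 1: 6}
```

Merging two digests only adds counts and re-runs the same compression, so a parent never needs the children's raw readings.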

Quantile Query
◦ Find the value whose rank among the n values is qn, where q ∈ (0, 1)
  ▪ If q = 0.5, find the median
Example q-digest (n = 15): <[1,8],1> <[5,6],2> <[7,8],2> <[3,3],4> <[4,4],6>
Sort in increasing order of right end point (ties broken by putting smaller ranges first):
<[3,3],4> <[4,4],6> <[5,6],2> <[7,8],2> <[1,8],1>
The cumulative count reaches 4 + 6 = 10 at <[4,4],6>, which exceeds 0.5 × 15 = 7.5
Thus, 4 is the estimated median
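The quantile procedure on this slide can be sketched as a small Python function. It takes a q-digest as a list of ((lo, hi), count) pairs, which is an assumed representation, and returns the right end point of the first range at which the cumulative count exceeds qn. The function name is hypothetical.

```python
def qdigest_quantile(digest, q):
    """Estimate the q-quantile from a q-digest given as [((lo, hi), count), ...]."""
    n = sum(count for _, count in digest)
    # increasing right end point; ties broken by putting smaller ranges first
    ordered = sorted(digest, key=lambda item: (item[0][1], -item[0][0]))
    running = 0
    for (lo, hi), count in ordered:
        running += count
        if running > q * n:
            return hi
    return ordered[-1][0][1]


# The example q-digest from the slide (n = 15)
example = [((1, 8), 1), ((5, 6), 2), ((7, 8), 2), ((3, 3), 4), ((4, 4), 6)]
print(qdigest_quantile(example, 0.5))  # 4
```

Other quantiles come from the same pass: for q = 0.9 the cumulative count first exceeds 13.5 at <[7,8],2>, giving 8.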

Multiple Aggregation
◦ Equivalence Class Reduction [3]
  ▪ Q = {q1 = {1+2+3}, q2 = {1+2}, q3 = {3}}
  ▪ Equivalence class = set of sensors that support the same set of queries
  ▪ EC1 = {1, 2}, EC2 = {3}
  ▪ Bit vectors: EC1 = [1,1,0]^T, EC2 = [1,0,1]^T
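$$
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix},
\qquad \text{basis: } v_1 = \{1+2\}\ (\text{EC1}),\quad v_2 = \{3\}\ (\text{EC2})
$$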
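The equivalence-class reduction on this slide can be sketched for SUM queries in Python: group sensors by the bit vector of queries they belong to, aggregate one partial value per class, and rebuild every query by adding the class values whose bit vector has a 1 in that query's position. The sensor readings below are made up for illustration, and the function names are hypothetical.

```python
def equivalence_classes(queries):
    """Group sensors by the bit vector of queries they participate in."""
    sensors = sorted(set().union(*queries.values()))
    classes = {}
    for s in sensors:
        bits = tuple(int(s in members) for members in queries.values())
        classes.setdefault(bits, []).append(s)
    return classes


def answer_sum_queries(queries, readings):
    """Send one partial SUM per equivalence class, then rebuild every query from them."""
    classes = equivalence_classes(queries)
    basis = {bits: sum(readings[s] for s in members) for bits, members in classes.items()}
    answers = {name: sum(value for bits, value in basis.items() if bits[i])
               for i, name in enumerate(queries)}
    return answers, len(basis)


QUERIES = {"q1": {1, 2, 3}, "q2": {1, 2}, "q3": {3}}
print(equivalence_classes(QUERIES))  # {(1, 1, 0): [1, 2], (1, 0, 1): [3]}
print(answer_sum_queries(QUERIES, {1: 10, 2: 20, 3: 5}))
```

Only two basis values (one per class) travel through the network, yet all three query answers can be reconstructed.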

Multiple Aggregation
◦ Segmentation Based Method[4]




Dynamic routing, not tree routing
Segment = equivalence class
A sensor sends its data to a node in the same segment whenever possible
STG vs. STS


Node 6 can send data to node 5 or node 7; in this example, node 6 sends its data to node 7
STG: node 4 sends data for q2 (= 4, 7, 8) and q1 + q2 (= 4, 5)
Node 1 receives 3 messages (1 message from node 2 and 2 messages from node 4)
STS: multiple routes
Node 4 sends data for q2 (= 4, 5, 6, 7) to node 1 and for q1 (= 4, 5) to node 2
Node 1 receives 2 messages




In-network aggregation provides a great opportunity to reduce communication overhead.
However, since a single aggregated value summarizes the whole sensing field, it may be insufficient for analyzing the correlations among subregions of the sensor field.
Sensor Data Gathering
◦ Exact data gathering → wastes energy
◦ Solution → reduce the number of transmissions

Basic Approach
◦ Temporal Suppression
  ▪ A node does not transmit a value if it has not changed since it was last reported
◦ Spatial Suppression
  ▪ A node suppresses its value if it is identical to the values of its neighbors
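Both suppression rules can be sketched in a few lines of Python. Temporal suppression follows the slide directly. For spatial suppression the slide does not say which of two identical neighbors stays silent, so this sketch assumes a hypothetical tie-breaking rule: nodes are visited in id order, and a node stays silent when a neighbor that already transmits has an identical value.

```python
def temporal_suppression(readings):
    """Return the (time, value) pairs one node actually transmits."""
    sent, last_reported = [], None
    for t, value in enumerate(readings):
        if value != last_reported:  # changed since it was last reported
            sent.append((t, value))
            last_reported = value
    return sent


def spatial_suppression(values, neighbors):
    """Return the nodes that transmit in one round.

    Assumed rule (hypothetical): visit nodes in id order; a node stays silent
    when a neighbor that already transmits has an identical value.
    """
    transmitting = []
    for node in sorted(values):
        if any(n in transmitting and values[n] == values[node] for n in neighbors[node]):
            continue
        transmitting.append(node)
    return transmitting


print(temporal_suppression([20, 20, 21, 21, 21, 20]))  # [(0, 20), (2, 21), (5, 20)]
print(spatial_suppression({1: 20, 2: 20, 3: 25}, {1: [2], 2: [1, 3], 3: [2]}))  # [1, 3]
```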

Approximate Gathering
◦ Sensor readings inherently contain errors
◦ Sensor readings are strongly correlated

Approximate Data Gathering
◦ Each sensor has a tool (model) to estimate its future values
◦ The base station keeps the same tools
  ▪ If a sensor does not send data → its estimate is correct (within the error bound)
  ▪ If a sensor sends data → its estimate is incorrect
  ▪ Update the tools of the sensor and the base station
◦ Model Based
  ▪ BBQ [5]
  ▪ KEN [6]
  ▪ PAQ [7]
◦ Filter Based
  ▪ Dual Kalman [9]
◦ Compression Based
  ▪ Wavelet, DFT, SBR [8], etc.
  ▪ A collection of a sensor's readings is transmitted periodically
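The shared-tool protocol above can be sketched in Python. The sensor and the base station keep identical histories of reported (time, value) pairs and run the same predictor; the sensor transmits only when the prediction misses the real reading by more than e, and both copies then update. The predictor here (extrapolating the trend of the last two reported points) is a hypothetical stand-in for the real models, and the first reading is assumed to be sent.

```python
def predict_trend(history, t):
    """Hypothetical shared model: extrapolate the trend of the last two reported points."""
    t1, x1 = history[-1]
    if len(history) < 2:
        return x1
    t0, x0 = history[-2]
    return x1 + (x1 - x0) * (t - t1) / (t1 - t0)


def approximate_gathering(readings, e):
    """Sensor and base station keep identical histories of reported (time, value) pairs."""
    sensor_history = [(0, readings[0])]  # the first reading is always sent
    base_history = [(0, readings[0])]
    estimates, messages = [readings[0]], 1
    for t in range(1, len(readings)):
        if abs(predict_trend(sensor_history, t) - readings[t]) > e:
            # estimation incorrect: send the reading and update both copies of the model
            sensor_history.append((t, readings[t]))
            base_history.append((t, readings[t]))
            messages += 1
        estimates.append(predict_trend(base_history, t))
    return estimates, messages


estimates, messages = approximate_gathering(list(range(10)), e=0.1)
print(messages)  # 2: a linear ramp is predicted exactly after two reports
```

Because both sides compute the same prediction from the same history, every base-station estimate stays within e of the true reading.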

Model Based Approach
◦ Linear Regression
  ▪ $X_{t+1} = aX_t + b$
◦ BBQ, KEN
  ▪ Multivariate Gaussian model
  ▪ Probability density function: $p(X_1, X_2, X_3, \ldots, X_n)$
  ▪ $X_i$: random variable for the reading of sensor i
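A multivariate Gaussian lets the base station estimate unobserved sensors from observed ones through the conditional distribution. A minimal two-sensor sketch of that conditioning step in Python; the means and covariances are made-up numbers, and BBQ and KEN use larger and time-varying models than this.

```python
def condition_bivariate(mu, cov, observed_x1):
    """Conditional mean and variance of X2 given X1 = observed_x1.

    mu = (mu1, mu2) and cov = ((s11, s12), (s21, s22)) describe a bivariate Gaussian.
    """
    (mu1, mu2), ((s11, s12), (s21, s22)) = mu, cov
    mean = mu2 + s21 / s11 * (observed_x1 - mu1)
    var = s22 - s21 * s12 / s11
    return mean, var


print(condition_bivariate((20.0, 22.0), ((4.0, 3.0), (3.0, 9.0)), 22.0))  # (23.5, 6.75)
```

Observing sensor 1 two units above its mean shifts the estimate of sensor 2 by 1.5 and shrinks its variance from 9 to 6.75; the stronger the correlation, the less sensor 2 needs to transmit.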

Approximate Gathering
◦ PAQ
  ▪ Linear regression and Gaussian models require much time and much data to construct an accurate model
  ▪ AutoRegression(3) model
  ▪ A reading $v_t = m_t + X(t)$, so $v_t - m_t = X(t)$
  ▪ $X(t) = aX(t-1) + bX(t-2) + cX(t-3) + b(w)N(0,1)$
  ▪ $m_t$ is the mean of the readings up to time t, a, b, c are real constants, and $b(w)N(0,1)$ is white noise with standard deviation $b(w)$
  ▪ Predictor $P(t) = m_t + a(v_{t-1} - m_{t-1}) + b(v_{t-2} - m_{t-2}) + c(v_{t-3} - m_{t-3})$
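The AR(3) predictor P(t) can be sketched in Python. The readings v_0 … v_{t-1} are assumed known; since m_t depends on v_t, which has not arrived yet, this sketch approximates m_t by m_{t-1} (an assumption, not necessarily what PAQ does). Function names are hypothetical.

```python
def running_means(values):
    """m_i: mean of the readings v_0..v_i."""
    means, total = [], 0.0
    for i, value in enumerate(values, start=1):
        total += value
        means.append(total / i)
    return means


def paq_predict(history, a, b, c):
    """P(t) = m_t + a(v_{t-1} - m_{t-1}) + b(v_{t-2} - m_{t-2}) + c(v_{t-3} - m_{t-3}).

    history holds v_0..v_{t-1} (at least three readings). Assumption: m_t is not
    known before v_t arrives, so it is approximated by m_{t-1}.
    """
    m = running_means(history)
    x1, x2, x3 = (history[-k] - m[-k] for k in (1, 2, 3))
    return m[-1] + a * x1 + b * x2 + c * x3


print(paq_predict([1.0, 2.0, 3.0], a=1.0, b=0.0, c=0.0))  # 3.0
```

Each term $v_{t-i} - m_{t-i}$ is the deviation X(t−i), so the predictor adds a weighted sum of recent deviations to the current mean.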

PAQ
◦ Lemma) Let $e = v\,b(w)$, where $v > 1$. Then the actual value at time t lies outside $[P(t) - e, P(t) + e]$ with probability at most $1/v^2$.
Proof) Chebyshev's inequality, since the prediction error $v_t - P(t)$ has variance $b(w)^2$:
$$P(|v_t - P(t)| > e) \le \frac{b(w)^2}{e^2} = \frac{b(w)^2}{v^2\, b(w)^2} = \frac{1}{v^2}$$
◦ Generally, v is 6 or 7
◦ Using the lemma above, PAQ decides when to update its model.
[Figure: prediction error regions around P(t) bounded by ±d and ±e, labeled "Well fit", "Partial fit", and "Outlier"]
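The lemma can be checked numerically. Chebyshev's bound holds for any error distribution; this Python sketch assumes Gaussian errors b(w)N(0,1) and measures how often they fall outside ±e for e = v·b(w), which turns out far below 1/v². The decision function is a hypothetical simplification that ignores the partial-fit band shown in the figure.

```python
import random


def outside_rate(v, trials=20_000, seed=1):
    """Empirical P(|v_t - P(t)| > e) with e = v * b(w), assuming Gaussian errors b(w) N(0, 1)."""
    rng = random.Random(seed)
    b_w = 1.0  # the rate does not depend on the noise scale
    e = v * b_w
    misses = sum(1 for _ in range(trials) if abs(b_w * rng.gauss(0.0, 1.0)) > e)
    return misses / trials


def needs_model_update(actual, predicted, v, b_w):
    """Hypothetical simplified rule: flag the model when the error leaves [P(t) - e, P(t) + e]."""
    return abs(actual - predicted) > v * b_w


for v in (2, 3, 6):
    print(v, outside_rate(v), "Chebyshev bound:", 1 / v**2)
```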

Filter Based
◦ Model-based approaches require much data to construct models
◦ Each node has a filter based on its last reported sensor reading
  ▪ If $|V_{new} - V_{old}| > e$, the reading is sent to the base station
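The filter rule $|V_{new} - V_{old}| > e$ can be sketched in Python for several sensors at once, with the base station holding one filter (here simply $V_{old}$) per sensor. The stream layout and names are assumptions for illustration, and each sensor's first reading is assumed to be sent.

```python
def filter_based_gathering(streams, e):
    """streams: sensor id -> readings over time (equal lengths).

    Each sensor keeps V_old, its last reported reading, and sends V_new only when
    |V_new - V_old| > e. The base station keeps one such filter (V_old) per sensor.
    """
    sensor_old = {s: r[0] for s, r in streams.items()}  # first readings are always sent
    base_old = dict(sensor_old)
    messages = len(streams)
    estimates = {s: [base_old[s]] for s in streams}
    steps = len(next(iter(streams.values())))
    for t in range(1, steps):
        for s, readings in streams.items():
            if abs(readings[t] - sensor_old[s]) > e:
                sensor_old[s] = base_old[s] = readings[t]
                messages += 1
            estimates[s].append(base_old[s])
    return estimates, messages


streams = {"s1": [20.0, 20.2, 20.9, 21.0], "s2": [15.0, 15.0, 15.0, 16.0]}
estimates, messages = filter_based_gathering(streams, e=0.5)
print(estimates["s1"], messages)  # [20.0, 20.0, 20.9, 20.9] 4
```

Unlike the model-based approaches, this filter needs no training data: it only remembers the last reported value.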

Dual Kalman Filter
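◦ The base station keeps as many filters as there are sensors
◦ Discrete Kalman Filter
◦ Ex) moving object
  ▪ State model: $x_t = x_{t-1} + v_{t-1}\,dt$, $v_t = v_{t-1}$
  ▪ Measurement model: $z_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} x_t \\ v_t \end{bmatrix} + n_t$, where $z_t$ is the measured (real) position and $n_t$ is white Gaussian measurement noise
[Figure: discrete Kalman filter cycle from the initial state through a prediction step and a correction step; boxes: "project current state", "compute Kalman gain", "estimate next state", "update system state", "update error covariance"]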
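The prediction and correction steps for the moving-object example can be sketched in pure Python, together with the dual-filter idea of [9]: the sensor and the base station run identical filters, and the sensor transmits only when the shared prediction misses the real position by more than e. This is a simplified sketch under assumptions: the noise levels q and r, the diagonal process noise, the initial covariance, and the rule that the first position is known to both sides are illustrative choices, not taken from [9]; class and function names are hypothetical.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


class ConstantVelocityKalman:
    """State [x, v]: x_t = x_{t-1} + v_{t-1} dt, v_t = v_{t-1}; z = [1 0][x v]^T + n."""

    def __init__(self, position, velocity, dt=1.0, q=1e-3, r=0.25):
        self.state = [position, velocity]
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # error covariance
        self.F = [[1.0, dt], [0.0, 1.0]]   # state transition
        self.q, self.r = q, r              # assumed process / measurement noise

    def predict(self):
        x, v = self.state
        self.state = [x + self.F[0][1] * v, v]  # project current state
        P = matmul(matmul(self.F, self.P), transpose(self.F))
        self.P = [[P[0][0] + self.q, P[0][1]], [P[1][0], P[1][1] + self.q]]
        return self.state[0]

    def correct(self, z):
        s = self.P[0][0] + self.r                 # innovation variance (H = [1 0])
        k = [self.P[0][0] / s, self.P[1][0] / s]  # Kalman gain
        residual = z - self.state[0]
        self.state = [self.state[0] + k[0] * residual, self.state[1] + k[1] * residual]
        P = self.P                                # P = (I - K H) P
        self.P = [[(1 - k[0]) * P[0][0], (1 - k[0]) * P[0][1]],
                  [P[1][0] - k[1] * P[0][0], P[1][1] - k[1] * P[0][1]]]


def dual_kalman(positions, e):
    """Sensor and base station run identical filters; send only if the prediction misses by > e."""
    sensor = ConstantVelocityKalman(positions[0], 0.0)
    base = ConstantVelocityKalman(positions[0], 0.0)
    estimates, sent = [positions[0]], 0
    for z in positions[1:]:
        guess = sensor.predict()
        base.predict()  # the base station's copy computes the same guess
        if abs(guess - z) > e:
            sensor.correct(z)
            base.correct(z)
            sent += 1
            estimates.append(z)  # the reported value is known exactly
        else:
            estimates.append(guess)
    return estimates, sent
```

For a stationary object the sensor never transmits; for a moving object the number of transmissions depends on e and on the assumed noise settings.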

Join Operation
◦ An important operator
◦ It relates measurements taken at different nodes.
[Figure: two sensor regions, L and R, whose readings are joined]

General Join Plans [12,13]
[Figure: three plans for joining readings from sensor regions L and R, panels "Sequential", "Naive", and "Centroid"]

Optimal Join Location[14]
◦ Weighted Fermat Problem
  ▪ Find the point that minimizes the weighted sum of the distances from the point to the vertices of a triangle.
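The weighted Fermat point can be computed numerically with Weiszfeld's algorithm, a standard iterative method for this problem; it is swapped in here for illustration, and [14] may solve it differently. The vertices could stand for region L, region R, and the base station, with weights reflecting how much data each exchanges (an assumption). The stop-at-a-vertex shortcut is a simplification.

```python
import math


def weighted_fermat_point(points, weights, iterations=1000, tol=1e-12):
    """Weiszfeld iteration for the point minimizing sum(w_i * dist(p, points[i]))."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for (px, py), w in zip(points, weights):
            d = math.hypot(x - px, y - py)
            if d < tol:
                return px, py  # simplification: stop if the iterate hits a vertex
            num_x += w * px / d
            num_y += w * py / d
            denom += w / d
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y


# Equal weights on an equilateral triangle: the optimum is the centroid.
print(weighted_fermat_point([(0.0, 0.0), (2.0, 0.0), (1.0, math.sqrt(3))], [1, 1, 1]))
```

Each iteration moves to a weighted average of the vertices, with weights w_i divided by the current distance, so vertices that send more data pull the join location toward themselves.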

Synopsis Join[13]
◦ Prunes non-candidate tuples and joins only candidate tuples
◦ Preliminary Join
  ▪ Eliminates non-candidate tuples
◦ Final Join
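The two phases of a synopsis join can be sketched in Python for an equi-join. As an assumption, each region's synopsis is simply the exact set of its join-key values; [13] may use more compact summaries. Only tuples whose keys survive the preliminary join are shipped to the final join. Names and data are hypothetical.

```python
def synopsis_join(left, right, key):
    """Equi-join tuples from regions L and R using a synopsis-based preliminary join."""
    syn_l = {t[key] for t in left}   # synopsis of region L (assumed: exact key set)
    syn_r = {t[key] for t in right}  # synopsis of region R
    candidates = syn_l & syn_r       # preliminary join of the synopses
    ship_l = [t for t in left if t[key] in candidates]   # non-candidates are pruned
    ship_r = [t for t in right if t[key] in candidates]
    result = [(a, b) for a in ship_l for b in ship_r if a[key] == b[key]]  # final join
    return result, len(ship_l) + len(ship_r)


L = [{"id": 1, "temp": 20}, {"id": 2, "temp": 25}, {"id": 3, "temp": 30}]
R = [{"id": 7, "temp": 25}, {"id": 8, "temp": 31}]
result, shipped = synopsis_join(L, R, "temp")
print(result, shipped)  # one matching pair; 2 tuples shipped instead of 5
```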

TPSJ [10]
◦ Preprocessing: Query Decomposition
  ▪ Query Q
  ▪ Decomposed queries: Q1, Q2
TPSJ

First phase
◦ Query Q1 is executed
Second phase
◦ Query Q2 is executed by injecting R1 into the network

Sensor
◦ Lightweight
◦ Wireless

Sensor Data Management
◦ Reduce energy consumption
  ▪ In-network processing
  ▪ Aggregation
  ▪ Gathering
  ▪ Query processing












[1] S. Madden et al., "TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks," OSDI, 2002.
[2] N. Shrivastava et al., "Medians and Beyond: New Aggregation Techniques for Sensor Networks," ACM SenSys, 2004.
[3] N. Trigoni et al., "Multi-Query Optimization for Sensor Networks," DCOSS, 2005.
[4] N. Trigoni et al., "Routing and Processing Multiple Aggregate Queries in Sensor Networks," ACM SenSys, 2006.
[5] A. Deshpande et al., "Model-Driven Data Acquisition in Sensor Networks," VLDB, 2004.
[6] D. Chu et al., "Approximate Data Collection in Sensor Networks using Probabilistic Models," ICDE, 2006.
[7] D. Tulone et al., "PAQ: Time Series Forecasting for Approximate Query Answering in Sensor Networks," European Conference on Wireless Sensor Networks, 2006.
[8] A. Deligiannakis et al., "Compressing Historical Information in Sensor Networks," ACM SIGMOD, 2004.
[9] A. Jain et al., "Adaptive Stream Resource Management Using Kalman Filters," ACM SIGMOD, 2004.
[10] X. Yang et al., "In-Network Execution of Monitoring Queries in Sensor Networks," ACM SIGMOD, 2007.
[11] M. Stern et al., "Towards Efficient Processing of General-Purpose Joins in Sensor Networks," ICDE, 2009.
[12] A. Pandit et al., "Communication-Efficient Implementation of Range-Joins in Sensor Networks," International Conference on Database Systems for Advanced Applications (DASFAA), 2006.
[13] H. Yu et al., "In-Network Join Processing for Sensor Networks," APWeb, 2006.
[14] A. Coman et al., "On Join Location in Sensor Networks," MDM, 2007.