The Minimal Communication Cost of Gathering Correlated Data over Sensor Networks
EL 736 Final Project
Bo Zhang
Motivation: Correlated Data Gathering
Correlated data gathering is a core component of many applications and real-life information processes
Large-scale sensor applications
Scientific data collection: habitat monitoring
Highly redundant data: temperature, humidity, vibration, rain, etc.
Surveillance videos
Resource Constraint
Data collection at one or more sinks
Network: Limited Resources
Wireless Sensor Networks
Energy constraint (limited battery)
Communication cost >> computation cost
Internet
Cost metrics: bandwidth, delay, etc.
Problem:
What is the minimum total cost (e.g., communication) of collecting correlated data at a single sink?
Model Formalization
Source Graph: GX
Undirected graph G(V, E)
Source nodes {1, 2, …, N }, sink t
e = (i, j) ∈ E - communication link with weight we
Discrete Sources: X={ X1, X2, …, XN }
Arbitrary distribution p( X1=x1, X2=x2, …, XN=xN )
Generate i.i.d. samples, arbitrary sample rate
Task: collect source data with negligible loss at t
[Figure: example source graph with source nodes 1-12 and sink t]
Model Formalization: continued
Linear costs
g(Re, we) = Re · we, e ∈ E
Re - data rate on edge e, in bits/sample
we - weight, depends on the application
For the communication cost of wireless links:
we ∝ l^α, α ∈ [2, 4], l - Euclidean length of the link
Goal: minimize the total cost
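As a concrete illustration of this cost model, here is a minimal Python sketch; the node positions, rates, and the choice α = 2 are hypothetical, not taken from the slides.

```python
import math

# Hypothetical sensor positions (x, y) in meters and a path-loss exponent alpha.
positions = {1: (0.0, 0.0), 2: (3.0, 4.0), "t": (6.0, 8.0)}
alpha = 2  # wireless links typically have 2 <= alpha <= 4

def link_weight(u, v):
    """w_e proportional to l^alpha, with l the Euclidean distance between u and v."""
    return math.dist(positions[u], positions[v]) ** alpha

def link_cost(rate_bits_per_sample, u, v):
    """Linear cost g(R_e, w_e) = R_e * w_e for a single edge."""
    return rate_bits_per_sample * link_weight(u, v)

# Example: node 1 sends 8 bits/sample to node 2, which forwards them to the sink t.
total = link_cost(8, 1, 2) + link_cost(8, 2, "t")
print(total)  # 8 * 25 + 8 * 25 = 400
```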
Minimal Communication Cost – Uncapacitated, Data Correlation Ignored
Link-Path Formulation
ECMP Shortest-Path Routing: Uncapacitated Minimum Cost
indices
d = 1, 2, ..., D  demands
p = 1, 2, ..., Pd  paths for demand d
e = 1, 2, ..., E  links
constants
hd - volume of demand d
δedp = 1 if link e belongs to path p realizing demand d; 0 otherwise
variables
We - metric of link e, w = (W1, W2, ..., WE)
Xdp(w) - (non-negative) flow induced by link metric system w for demand d on path p
minimize
F = Σe We Σd Σp δedp Xdp(w)
constraints
Σp Xdp(w) = hd,  d = 1, 2, ..., D
[Figure: example network with source nodes 1-12 and sink t]
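Since the uncapacitated problem (with correlation ignored) is solved by routing every demand along a shortest path under the link metric w, the objective can be evaluated as Σd hd · dist(d, t). The following Python sketch illustrates this on a small hypothetical topology; Dijkstra is written out only to keep the example self-contained.

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances from src under the link metrics W_e.
    adj maps each node to a list of (neighbor, W_e) pairs."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical undirected graph: (u, v, W_e) edges.
edges = [(1, 2, 1.0), (2, "t", 1.0), (1, 3, 2.0), (3, "t", 1.0), (2, 3, 1.0)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))

# Demand volumes h_d in bits/sample: each source node sends to the single sink t.
demands = {1: 4.0, 2: 6.0, 3: 3.0}
dist_to_t = dijkstra(adj, "t")

# F = Σe We Σd Σp δedp Xdp(w) collapses to Σd h_d * dist(d, t) when every demand
# rides shortest paths (ECMP splitting over equal-cost paths leaves F unchanged).
F = sum(h * dist_to_t[s] for s, h in demands.items())
print(F)  # 4*2 + 6*1 + 3*1 = 17.0
```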
Data Correlation – Tradeoffs: Path Length vs. Data Rate
Routing vs. Coding (Compression)
Shorter path or fewer bits?
Example:
Two sources X1, X2
Three relaying nodes 1, 2, 3
R - data rate in bits/sample
Joint compression reduces redundancy: R3 < R1 + R2
[Figure: sources X1, X2 delivered to sink t either over separate paths at rates R1 and R2, or jointly compressed at a relay and forwarded at rate R3 < R1 + R2]
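A hedged numeric illustration of this tradeoff (the rates and hop counts below are hypothetical, not read off the slide's figure): routing each source separately pays the full rate on short paths, while detouring one source to join the other lets the shared hops carry only the jointly compressed rate.

```python
# Hypothetical rates in bits/sample; all edge weights are 1.
R1, R2 = 8.0, 8.0      # individual source rates
R3 = 11.0              # jointly compressed rate, R3 < R1 + R2

# Hypothetical topology: X1 reaches the sink t in 3 hops via a relay that is
# 2 hops from t; X2 has its own 2-hop shortest path, but is also 1 hop away
# from X1's relay.

# Option A: each source uses its own shortest path, no joint compression.
cost_separate = 3 * R1 + 2 * R2                  # 40.0

# Option B: X2 detours to X1's relay (a longer, 3-hop path); the 2 shared
# hops from the relay to t carry only the jointly compressed rate R3.
cost_joint = 1 * R1 + 1 * R2 + 2 * R3            # 8 + 8 + 22 = 38.0

print(cost_separate, cost_joint)  # fewer bits beats the shorter path here
```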
Data Correlation – Previous Work
Explicit Entropy Encoding (EEC)
Joint encoding possible only with side information
H(X1, X2, X3) = H(X1) + H(X2|X1) + H(X3|X1, X2)
Coding depends on the routing structure
Routing: Spanning Tree (ST)
Finding the optimal ST is NP-hard
[Figure: example spanning tree over nodes 1-12 rooted at sink t, and a Venn diagram of H(X1), H(X2), H(X3) inside the joint entropy H(X1, X2, X3)]
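To make the chain-rule rate allocation concrete, here is a minimal sketch that computes H(X1), H(X2|X1), and H(X3|X1,X2) from a joint distribution, i.e. the per-node rates explicit entropy coding would assign when each node sees the upstream data as side information. The toy binary joint pmf is purely an assumption for illustration.

```python
from math import log2

# Toy joint pmf p(x1, x2, x3) over three binary sources (illustrative only).
p = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.10,
    (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.10, (1, 1, 1): 0.20,
}

def H(var_idx):
    """Joint entropy (in bits) of the variables whose indices are in var_idx."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in var_idx)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

rate_1 = H((0,))                     # H(X1)
rate_2 = H((0, 1)) - H((0,))         # H(X2 | X1)
rate_3 = H((0, 1, 2)) - H((0, 1))    # H(X3 | X1, X2)

# Chain rule: the three rates add up to the joint entropy H(X1, X2, X3).
print(rate_1, rate_2, rate_3, rate_1 + rate_2 + rate_3, H((0, 1, 2)))
```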
Data Correlation – Previous Work (Cont'd)
Slepian-Wolf Coding (SWC):
Optimal SWC scheme
Routes? Shortest-path routing
Rates? LP formulation
(Cristescu et al., INFOCOM 2004)
[Figure: shortest-path tree over nodes 1-12 rooted at sink t]
Correlation Factor
For each node in the graph G(V, E), find correlation factors with its neighbors.
The correlation factor ρuv represents the correlation between nodes u and v:
ρuv = 1 - r / R
R - data rate before joint compression
r - data rate after joint compression
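For instance (hypothetical numbers): if a node's raw rate is R = 8 bits/sample and joint compression with a neighbor brings it down to r = 3 bits/sample, then ρ = 1 - 3/8 = 0.625.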
Correlation Factor (Cont’d)
Shortest Path Tree (SPT):
Total cost: 4R + r
Joint compression:
Total cost: 3R + 3r
All edge weights are 1
As long as ρ = 1 - r/R > 1/2, the SPT is no longer optimal
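The threshold follows directly from the two costs above: 3R + 3r < 4R + r exactly when 2r < R, i.e. when ρ = 1 - r/R > 1/2. A small sketch (the rate values are placeholders) checks this numerically:

```python
def spt_cost(R, r):
    return 4 * R + r          # shortest-path tree cost from the example

def joint_cost(R, r):
    return 3 * R + 3 * r      # joint-compression tree cost from the example

for R, r in [(10.0, 6.0), (10.0, 4.0)]:   # placeholder rates
    rho = 1 - r / R
    winner = "joint compression" if joint_cost(R, r) < spt_cost(R, r) else "SPT"
    print(f"rho = {rho:.2f}: {winner} is cheaper")
# rho = 0.40 -> SPT is cheaper; rho = 0.60 -> joint compression is cheaper
```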
Minimal Communication Cost – Local Data Correlation: Add Heuristic Algorithm
Step 0: Initially, collect data at sink t via shortest paths. Compute the cost Fi(0) = Σe Ri We, where the sum runs over the links e of the shortest path realizing demand Ri and We is the weight of link e. Set Si(0) = {j'}, where j' is the next hop of node i; i, j = 1, 2, …, N, i ≠ j. Set the iteration count k = 0. Let Mi denote the neighbors of node i.
Step 1: For each j ∈ Mi \ Si(k), compute
Fij(k+1) = Fi(k) - Ri Wij' + Ri Wij + Σe (Ri - ρij) We
Step 2: Determine a new j such that
Fij(k+1) = min {Fij(k+1)} < Fi(k).
If there is no such j, go to Step 4.
Step 3: Update
Si(k+1) = {j}
Set Fi(k+1) = Fij(k+1), k := k + 1, and go to Step 1.
Step 4: No more improvement possible; stop.
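Below is a minimal Python sketch of one pass of Steps 1-3 for a single node i. It interprets the cost update as: rerouting i through neighbor j pays the full rate Ri on link (i, j) but only the jointly compressed rate Ri(1 - ρij) on the remaining shortest path from j to the sink. That reading of Step 1, along with the toy topology, rates, and correlation factors, is an assumption made for illustration rather than a literal transcription of the formula above.

```python
# Hypothetical undirected graph: neighbors of each node with link weights W_e,
# plus precomputed shortest-path metrics dist_to_t[v] toward the sink "t".
adj = {1: [(2, 1.0), (3, 2.0)],
       2: [(1, 1.0), (3, 1.0), ("t", 1.0)],
       3: [(1, 2.0), (2, 1.0), ("t", 1.0)]}
dist_to_t = {"t": 0.0, 1: 2.0, 2: 1.0, 3: 1.0}

def add_heuristic_step(i, R_i, rho):
    """One pass of Steps 1-3 for node i under the interpretation above."""
    best_hop, best_cost = None, R_i * dist_to_t[i]   # F_i(0): shortest-path cost
    for j, w_ij in adj[i]:
        if j == "t":
            continue
        # Full rate on link (i, j); jointly compressed rate R_i*(1 - rho_ij)
        # on the remaining shortest path from j to the sink.
        cand = R_i * w_ij + R_i * (1 - rho.get((i, j), 0.0)) * dist_to_t[j]
        if cand < best_cost:                          # Step 2: strict improvement
            best_hop, best_cost = j, cand             # Step 3: update S_i and F_i
    return best_hop, best_cost                        # best_hop None -> Step 4 (stop)

rho = {(1, 2): 0.7, (1, 3): 0.2}                      # hypothetical correlation factors
print(add_heuristic_step(1, R_i=4.0, rho=rho))        # -> (2, 5.2)
```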
Add Heuristic: example
First step: shortest-path routing
[Figure: initial shortest-path tree over nodes 1-12 rooted at sink t]
After the heuristic:
[Figure: routing tree after the add heuristic]
When ρij > 1/2, j will be the next hop of i.
Local data correlation: analysis
Information from neighbors is needed
Is the result optimal?
Approximation algorithm
Other factors to take into account: energy, capacity, …
Thanks!