Selecting Observations against Adversarial Objectives

Simultaneous Placement and Scheduling of Sensors
Andreas Krause, Ram Rajagopal, Anupam Gupta, Carlos Guestrin
rsrg@caltech: …where theory and practice collide
Traffic monitoring
CalTrans wants to deploy wireless sensors under highways and arterial roads.
Deploying sensors is expensive (need to close and reopen roads, etc.)
→ Where should we place the sensors?
Battery lifetime ≈ 3 years, but a feasible deployment needs a 10+ year lifetime.
Solution: sensor scheduling (e.g., activate each sensor every 4 days)
→ When should we activate each sensor?
Monitoring water networks
Contamination of drinking water could affect millions of people.
[Figure: contamination spreading through a water network, monitored by YSI 6600 Sonde sensors (~$7K; 75 days); water flow simulator from EPA]
Place sensors to detect contaminations ("Battle of the Water Sensor Networks" competition).
→ Where and when should we sense to detect contamination?
Traditional approach
1.) Sensor placement: find the most informative locations
2.) Sensor scheduling: find the most informative activation times (e.g., assign sensors to groups + round robin)
If we know that we need to schedule, why not take that into account during placement?
Our approach
Rather than placing first and scheduling second, simultaneously optimize over placement and schedule.
Model-based sensing
Utility of sensing is based on a model of the world:
For traffic monitoring: learn probabilistic models from data (later)
For water networks: water flow simulator from EPA
For each subset A ⊆ V of all network junctions, the model predicts the impact of a contamination, giving a "sensing quality" F(A).
[Figure: sensors S1–S4 placed in a network. A placement that reduces impact through early detection has high sensing quality, F(A) = 0.9; a placement that misses high-impact contamination locations has low sensing quality, F(A) = 0.01.]
Problem formulation
Sensor placement:
Given: finite set V of locations, sensing quality F
Want: A* ⊆ V with |A*| ≤ m maximizing F(A*)
Sensor scheduling:
Given: sensor placement A* ⊆ V
Want: a partition A* = A1* ∪ A2* ∪ … ∪ Ak* (At* = sensors activated at time t) maximizing the average performance over time, (1/k) Σt F(At*)
The SPASS Problem
Simultaneous placement and scheduling (SPASS):
Given: finite set V of locations, sensing quality F
Want: disjoint sets A1*, …, Ak* (At = sensors activated at time t) with |A1* ∪ … ∪ Ak*| ≤ m, maximizing the average performance (1/k) Σt F(At*)
Typically NP-hard!
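Written out (with the notation defined above), the SPASS problem asks for:

```latex
\max_{A_1,\dots,A_k \subseteq V}\; \frac{1}{k}\sum_{t=1}^{k} F(A_t)
\quad\text{s.t.}\quad A_i \cap A_j = \emptyset \;(i \neq j),\qquad
\Bigl|\,\textstyle\bigcup_{t=1}^{k} A_t\Bigr| \le m
```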
Greedy average-case placement and scheduling (GAPS)
Greedily choose a sensor location s and a time step t to add it to:
Start with A1, …, Ak = ∅
For i = 1 to m:
  (s*, t*) := argmax(s,t) F(At ∪ {s}) − F(At)
  At* := At* ∪ {s*}
[Figure: sensors s1, …, s13 assigned to buckets A1, …, A4 with scores F(Ai); the contribution of s2 to A4 is F(A4 ∪ {s2}) − F(A4).]
How well can this simple heuristic do?
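The greedy loop can be sketched in Python. The coverage-style sensing quality F and the sensor names below are toy illustrations (the paper's F comes from the EPA simulator or a learned model); this is a minimal sketch, not the authors' implementation.

```python
def gaps(V, F, m, k):
    """Greedy Average-case Placement and Scheduling: repeatedly add the
    (sensor, time step) pair with the largest marginal gain
    F(A_t + {s}) - F(A_t), m times."""
    A = [set() for _ in range(k)]      # A[t] = sensors active at time t
    placed = set()
    for _ in range(m):
        best_gain, best = -1.0, None
        for s in V:
            if s in placed:
                continue
            for t in range(k):
                gain = F(A[t] | {s}) - F(A[t])
                if gain > best_gain:
                    best_gain, best = gain, (s, t)
        if best is None:               # no sensors left to place
            break
        s_star, t_star = best
        A[t_star].add(s_star)
        placed.add(s_star)
    return A

# Toy sensing quality: number of events covered (monotone submodular)
events = {"s1": {1, 2}, "s2": {2, 3}, "s3": {4}, "s4": {1, 4}}
F = lambda A: len(set().union(*(events[s] for s in A))) if A else 0
schedule = gaps(list(events), F, m=4, k=2)
```

With a budget m = 4 and k = 2 time steps, all four toy sensors end up placed into two disjoint buckets.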
Key property: Diminishing returns
Compare placement A = {S1, S2} with placement B = {S1, S2, S3, S4}, so A ⊆ B.
Adding a new sensor S' to the small placement A improves the sensing quality a lot; adding S' to B yields only a small improvement, since a new sensor doesn't help much once many sensors are placed.
Submodularity: for A ⊆ B, F(A ∪ {S'}) − F(A) ≥ F(B ∪ {S'}) − F(B)
Theorem [Krause et al., J Wat Res Mgt '08]: Sensing quality F(A) in water networks is submodular!
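Diminishing returns can be checked numerically for a small coverage-style objective (a toy stand-in for the water-network F; the sensor names and events are illustrative):

```python
def subsets(xs):
    """All subsets of xs, as frozensets."""
    out = [frozenset()]
    for x in xs:
        out += [s | {x} for s in out]
    return out

# Toy sensing quality: number of events covered by the chosen sensors
events = {"s1": {1, 2}, "s2": {2, 3}, "s3": {3, 4}, "s4": {4, 5}}
V = list(events)
F = lambda A: len(set().union(*(events[s] for s in A))) if A else 0

# Submodularity: for every A <= B and s outside B,
# F(A + {s}) - F(A) >= F(B + {s}) - F(B)
submodular = all(
    F(A | {s}) - F(A) >= F(B | {s}) - F(B)
    for A in subsets(V)
    for B in subsets(V) if A <= B
    for s in V if s not in B
)
print(submodular)  # prints True: coverage has diminishing returns
```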
Performance guarantee
Theorem: GAPS provides a constant-factor approximation:
Σt F(A_GAPS,t) ≥ (1/2) Σt F(At*)
Proof sketch: SPASS requires maximizing a monotonic submodular function over a truncated partition matroid; the theorem then follows from a result by Fisher et al. '78.
→ Generalizes the analysis of the k-cover problem (Abrams et al., IPSN '04)
→ A slightly better guarantee (≈ 0.63) is possible using a more involved algorithm by Vondrák et al. '08
Average-case scheduling can be unfair
Consider V = {s1, …, sn}, k = 4, m = 10.
[Figure: two schedules over buckets A1, …, A4 with scores F(Ai). In the first, Σt F(At) is high but mint F(At) is low: poor coverage at t = 4! In the second, both Σt F(At) and mint F(At) are high.]
→ Want to ensure balanced coverage
Balanced SPASS
Want: disjoint sets A1*, …, Ak* with |A1* ∪ … ∪ Ak*| ≤ m, maximizing the balanced performance mint F(At*).
The greedy algorithm performs arbitrarily badly on this objective.
We now develop an approximation algorithm for this balanced SPASS problem!
Key idea: Reduce worst-case to average-case
Suppose we learn the value attained by the optimal solution: c* = mint F(At*) = OPT.
Then we need to find a feasible solution A1, …, Ak such that F(At) ≥ c* for all t.
If we can check feasibility for any c, we can find the optimal c* using binary search!
How can we find such a feasible solution?
Trick: Truncation
Need to find a feasible solution such that F(At) ≥ c for all t.
For the truncated objective Fc(A) = min{F(A), c}:
F(At) ≥ c for all t  ⟺  Σt Fc(At) = k·c
[Figure: F(A) and Fc(A) as functions of |A|; Fc levels off at c.]
Truncation preserves submodularity!
Hence, to check whether OPT = mint F(At*) ≥ c, we only need to solve an average-case problem.
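The equivalence can be verified exhaustively on a toy coverage objective (all names illustrative): every way of splitting the sensors into k buckets satisfies mint F(At) ≥ c exactly when Σt Fc(At) = k·c.

```python
from itertools import product

# Toy sensing quality: number of events covered
events = {"s1": {1, 2}, "s2": {2, 3}, "s3": {1, 3}}
V = list(events)
F = lambda A: len(set().union(*(events[s] for s in A))) if A else 0

c, k = 2, 2
Fc = lambda A: min(F(A), c)   # truncation: still monotone submodular

equivalent = True
for assign in product(range(k), repeat=len(V)):   # every split into k buckets
    A = [{V[i] for i in range(len(V)) if assign[i] == t} for t in range(k)]
    balanced = min(F(At) for At in A) >= c        # F(A_t) >= c for all t
    saturated = sum(Fc(At) for At in A) == k * c  # sum_t Fc(A_t) = k*c
    if balanced != saturated:
        equivalent = False
assert equivalent
```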
Challenge: Use of approximation
We only have a 1/2-approximation algorithm (GAPS) for the average-case problem.
[Figure: with truncated scores Fc(Ai), the optimal solution attains score c in each of the four buckets A1, …, A4, for total value 4c. The approximate solution guarantees only total value 2c, and may leave one bucket with no coverage at all.]
→ Can lead to an unbalanced solution! mint F(At) = 0
Remedy: Rebalance the solution
We can attempt to rebalance the solution to obtain a uniformly high score for all buckets.
[Figure: sensors reallocated among buckets A1, …, A4 so that every bucket reaches a score Fc(Ai) close to c.]
Is rebalancing always possible?
If there are elements s where F({s}) is large, rebalancing may not be possible:
[Figure: a single element fills one bucket to score c by itself; the rebalanced solution still has mint F(At) = 0.]
Distinguishing big and small elements
An element s ∈ V is big if F({s}) ≥ αc, for some fixed 0 < α < 1 (we will see how to choose α later).
[Figure: truncated scores Fc({s}) for s1, …, sn; elements with score at least αc are "big".]
If we can ensure that F(At) ≥ αc for all t, then we get an α-approximation guarantee!
→ Can remove big elements from the problem instance!
How large should α be?
[Figure: the GAPS solution on the small elements yields "satisfied" time steps with score Fc(Ai) ≥ αc; its remaining sensors are reallocated to rebalance the unsatisfied steps.]
Lemma: If α = 1/6, we can always successfully rebalance (i.e., ensure all time steps are satisfied).
eSPASS Algorithm
eSPASS: Efficient Simultaneous Placement and Scheduling of Sensors
Initialize cmin = 0, cmax = F(V)
Repeat (binary search): c = (cmin + cmax)/2
  Allocate big elements to separate time steps (and remove them)
  Run GAPS with Fc to find A1, …, Ak', where k' = k − #big elements
  Reallocate small elements to obtain a balanced solution
  If mint F(At) ≥ c/6: increase cmin
  If mint F(At) < c/6: decrease cmax
Until convergence
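A simplified, runnable sketch of this loop in Python. It uses a toy coverage objective, and it omits the reallocation (rebalancing) step, so the feasibility check only asks whether the GAPS output already meets the αc threshold; the instance, names, and greedy subroutine are illustrative, not the authors' implementation.

```python
def greedy_avg(V, F, m, k):
    """GAPS subroutine: greedy average-case placement and scheduling."""
    A = [set() for _ in range(k)]
    placed = set()
    for _ in range(min(m, len(V))):
        cands = [(F(A[t] | {s}) - F(A[t]), t, s)
                 for s in V if s not in placed for t in range(k)]
        if not cands:
            break
        _, t, s = max(cands)
        A[t].add(s)
        placed.add(s)
    return A

def espass(V, F, m, k, alpha=1/6, iters=30):
    """Binary search over the target score c; keep the best feasible schedule."""
    c_lo, c_hi = 0.0, F(set(V))
    best = [set() for _ in range(k)]
    for _ in range(iters):
        c = (c_lo + c_hi) / 2
        big = [s for s in V if F({s}) >= alpha * c][:k]  # one big element per step
        small = [s for s in V if s not in big]
        k2 = k - len(big)
        Fc = lambda A: min(F(A), c)                      # truncated objective
        A = greedy_avg(small, Fc, m - len(big), k2) if k2 > 0 else []
        sched = [{s} for s in big] + A                   # rebalancing step omitted
        if min(F(At) for At in sched) >= alpha * c:
            best, c_lo = sched, c
        else:
            c_hi = c
    return best

events = {"s1": {1, 2}, "s2": {2, 3}, "s3": {1, 3}, "s4": {1, 2, 3}}
F = lambda A: len(set().union(*(events[s] for s in A))) if A else 0
schedule = espass(list(events), F, m=4, k=2)
```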
Performance guarantee
Theorem: eSPASS provides a constant-factor 6 approximation:
mint F(A_eSPASS,t) ≥ (1/6) mint F(At*)
We can also obtain data-dependent bounds, which are often much tighter.
Experimental studies
Questions we ask:
How much does simultaneous optimization help?
Is optimizing the balanced performance a good idea?
How does eSPASS compare to existing algorithms (for the special case of sensor scheduling)?
Case studies:
Contamination detection in water networks
Traffic monitoring
Community sensing
Selecting informative blogs on the web
Traffic monitoring
Goal: predict normalized road speeds on unobserved road segments from sensor data.
Approach:
Learn a probabilistic model (Gaussian process) from data
Use eSPASS to optimize the sensing quality F(A) = expected reduction in MSE when sensing at locations A
Data: 357 sensors deployed on highway I-880 South (PeMS), sampled between 6am and 11am on work days.
Benefit of simultaneous optimization
[Figure: minimum variance reduction (higher is better) vs. lifetime improvement (number of time slots k, i.e., k groups) on traffic data. eSPASS outperforms OP/OS, RP/OS, OP/RS, and RP/RS (OP: optimized placement, OS: optimized schedule, RP: random placement, RS: random schedule).]
≈ 30% lifetime improvement for the same accuracy!
For large k, random scheduling hurts more than random placement.
Average-case vs. balanced score
[Figure: variance reduction (higher is better) vs. lifetime improvement (5 sensors / time slot) on traffic data, showing average and balanced scores for eSPASS and GAPS.]
Optimizing for the balanced score leads to good average-case performance, but not vice versa.
Data-dependent bounds
[Figure: minimum variance reduction (higher is better) vs. lifetime improvement (number of time slots k) on traffic data, comparing the eSPASS solution, the bound from Theorem 4.1, and the data-dependent bound.]
Our data-dependent bounds show that eSPASS solutions are typically much closer to optimal than the worst-case factor 1/6 suggests.
Water network monitoring
Real metropolitan-area network (12,527 nodes)
Water flow simulator provided by EPA
3.6 million contamination events
Multiple objectives: detection time, affected population, …
Place sensors that detect well "on average"
Benefit of simultaneous optimization
[Figure: minimum population protected (higher balanced score is better) vs. number m of sensors (k = 3) on water networks. eSPASS outperforms OP/OS, OP/RS, RP/OS, and RP/RS (OP: optimized placement, OS: optimized schedule, RP: random placement, RS: random schedule).]
Simultaneous optimization significantly outperforms the traditional approaches.
E.g., ~3x reduction in affected population when m = 24, k = 3.
Comparison with existing techniques
Comparison of eSPASS with existing algorithms for scheduling (m = |V|):
MIP: mixed integer program for domatic partitioning with accuracy requirements (Koushanfar et al. '06)
SDP: approximation algorithm for domatic partitioning (Deshpande et al. '08)
Results on the temperature monitoring (Intel Berkeley) data set with 46 sensors.
Goal: minimize expected MSE.
Comparison with existing techniques
[Figure: worst-case and average-case error (MSE, lower is better) on temperature data.]
eSPASS outperforms the existing approaches for sensor scheduling.
Trading off power and accuracy
Suppose that we sometimes activate all sensors (e.g., to determine the boundary of a traffic jam, or to localize the source of a contamination).
We want to simultaneously optimize
mint F(At)   ("balanced performance")
and
F(A1 ∪ … ∪ Ak)   ("high-density performance")
Scalarization: for some 0 < λ < 1, optimize
λ · mint F(At) + (1 − λ) · F(A1 ∪ … ∪ Ak)
Theorem: Our algorithm mcSPASS (multicriterion SPASS) guarantees a factor 8 approximation!
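The scalarized objective is straightforward to evaluate for a given schedule. The coverage F and the two-bucket schedule below are toy illustrations, not the paper's objective:

```python
# Toy sensing quality: number of events covered
events = {"s1": {1, 2}, "s2": {3}}
F = lambda A: len(set().union(*(events[s] for s in A))) if A else 0

def scalarized(schedule, lam):
    """lam * min_t F(A_t) + (1 - lam) * F(A_1 u ... u A_k)"""
    union = set().union(*schedule)
    return lam * min(F(At) for At in schedule) + (1 - lam) * F(union)

schedule = [{"s1"}, {"s2"}]
scalarized(schedule, 1.0)    # balanced performance: min(2, 1) = 1
scalarized(schedule, 0.0)    # high-density performance: F({s1, s2}) = 3
scalarized(schedule, 0.25)   # tradeoff: 0.25*1 + 0.75*3 = 2.5
```

λ = 1 recovers the balanced objective (eSPASS) and λ = 0 the high-density objective, matching the extremes on the next slide.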
Tradeoff results
[Figure: schedules maximizing λ · mint F(At) + (1 − λ) · F(A1 ∪ … ∪ Ak): stage-wise (λ = 0), eSPASS (λ = 1), and mcSPASS (λ = 0.25).]
Tradeoff results
[Figure: high-density performance vs. scheduled performance on water networks, for λ = 0, λ = 0.25, and λ = 1.]
We can simultaneously obtain high performance in scheduled and high-density mode.
Conclusions
Introduced the simultaneous placement and scheduling (SPASS) problem
Developed efficient algorithms with strong guarantees:
GAPS: 1/2 approximation for average performance
eSPASS: 1/6 approximation for balanced performance
mcSPASS: 1/8 approximation for trading off high-density and balanced performance
Data-dependent bounds show solutions are close to optimal
Presented results on several real-world sensing tasks