Estimation for Monotone Sampling:
Competitiveness and Customization
Edith Cohen
Microsoft Research
A Monotone Sampling Scheme
• Data domain V (⊆ ℝⁿ); data vector v ∈ V
• Random seed u ~ U[0,1]
• Outcome S(v, u): a function of the data and the seed
Monotone: fixing v, the information in S(v, u)
(the set of all data vectors consistent with S and u) is
non-increasing in u.
Monotone Estimation Problem (MEP)
A monotone sampling scheme (V, S):
• Data domain V (⊆ ℝⁿ)
• Sampling scheme S: V × [0,1]
• A nonnegative function f: V → ℝ≥0
Goal: estimate f(v).
Specify an estimator f̂(S) that is:
unbiased, nonnegative, (Pareto) "optimal".
What we know about f(v) from S
Fix the data v. The lower the seed u is, the more we
know about v and hence about f(v).
[Figure: information about f(v) plotted as a function of the seed u over [0,1]; it is non-increasing in u.]
MEP applications in data analysis:
Scalable but perhaps approximate query processing:
the data is sampled/sketched/summarized, and queries
posed over the data are answered by applying an
estimator to the sample. We give an example below.
Example: Social/Communication data
An activity value v(i,j) is associated with each node pair
(i,j) (e.g., number of messages, communication volume).
Pairs are PPS sampled (Probability Proportional to Size):
for a threshold τ > 0 and iid u(i,j) ~ U[0,1],

  (i,j) ∈ S ⟺ v(i,j) ≥ τ · u(i,j)

Monday activity:
(a,b)  40
(f,g)   5
(h,c)  20
(a,z)  10
……
(h,f)  10
(f,s)  10
Monday sample:
(a,b)  40
(a,z)  10
……
(f,s)  10
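A minimal sketch of this threshold-PPS rule in Python (the dictionary layout, variable names, and the drawn seeds are illustrative assumptions, not from the talk):

import random

def pps_sample(activity, tau, seeds):
    """Threshold-PPS sample: keep pair e iff v(e) >= tau * u(e)."""
    return {e: v for e, v in activity.items() if v >= tau * seeds[e]}

# Monday data from the slide; seeds are iid u(e) ~ U[0,1].
monday = {('a','b'): 40, ('f','g'): 5, ('h','c'): 20,
          ('a','z'): 10, ('h','f'): 10, ('f','s'): 10}
seeds = {e: random.random() for e in monday}
print(pps_sample(monday, tau=100, seeds=seeds))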
Samples of multiple days
Coordinated samples: each pair is sampled with the same
seed u(i,j) on different days.
[Slide: activity tables for Monday, Tuesday, and Wednesday alongside their samples; the same per-pair seed u(i,j) is used on every day, so pairs with similar activity across days are sampled consistently.]
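A sketch of one common way to realize coordinated samples, assuming a deterministic per-key seed derived from a hash (the hashing choice is my assumption; any shared per-key u(i,j) gives coordination):

import hashlib

def shared_seed(key):
    """Deterministic per-key seed in [0,1): the same u(i,j) on every day."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], 'big') / 2.0**64

def coordinated_sample(day_activity, tau):
    """Threshold-PPS sample of one day's activity with shared seeds."""
    return {e: v for e, v in day_activity.items()
            if v >= tau * shared_seed(e)}

Hashing the key (rather than drawing fresh randomness per day) makes the seed reproducible across days and machines, which is exactly what coordination requires.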
Matrix view: keys × instances
In our example, keys (such as (a,b)) are user-user pairs;
instances are days.
         Su   Mo   Tu   We   Th   Fr   Sa
(a,b)    40   30   10   43   55   30   20
(g,c)     0    5    0    0    4    0   10
(h,c)     5    0    0   60    3    0    2
(a,z)    20   10    5   24   15    7    4
(h,f)     0    7    6    3    8    5   20
(f,s)     0    0    0   20  100   70   50
(d,h)    13   10    8    0    0    5    6
Matrix view: keys × instances
Coordinated PPS sample, τ = 100. An entry with value v is
sampled on a given day iff v ≥ τ · u = 100u.

  u    key    Su   Mo   Tu   We   Th   Fr   Sa
0.33  (a,b)   40   30   10   43   55   30   20
0.22  (g,c)    0    5    0    0    4    0   10
0.82  (h,c)    5    0    0   60    3    0    2
0.16  (a,z)   20   10    5   24   15    7    4
0.92  (h,f)    0    7    6    3    8    5   20
0.16  (f,s)    0    0    0   20  100   70   50
0.77  (d,h)   13   10    8    0    0    5    6
Example Queries
• Total communication from users in California to users in New York on Wednesday.
• Lp distance (change) in activity of male-male users over 30 between Friday and Monday.
• Breakdown: total increase, total decrease.
• Average of median/max/min activity over days.
We would like to estimate the query result from the sample.
Estimate one key at a time
Queries are often (functions of) sums Σ_h f(v(h)) over
selected keys h of a function f applied to the values
tuple of h:

  v(h) = (v_{h,1}, v_{h,2}, v_{h,3}, …)

For Lp distance: f(v) = |v1 − v2|^p.
Estimate one key at a time: the estimator f̂ for f(v(h))
is applied to the sample of v(h).
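A tiny sketch of this sum-query pattern (the per-key estimator f_hat, the sample layout, and the key set are placeholders of mine):

def query_estimate(selected_keys, sample_by_key, f_hat):
    """Estimate a sum-query: add up a per-key estimator over selected keys."""
    return sum(f_hat(sample_by_key.get(h, ())) for h in selected_keys)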
"Warmup" queries: estimate a single entry at a time
• Total communication from users in California to users in New York on Wednesday.
Inverse probability estimate (Horvitz-Thompson) [HT52]:
over sampled entries e that match the predicate (CA to NY,
Wednesday), add up the value divided by its inclusion
probability in the sample: v(e)/p(e).
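A sketch of the HT estimate under the threshold-PPS scheme above, where an entry with value v is sampled iff v ≥ τ·u, so its inclusion probability is p = min(1, v/τ); the predicate argument is a hypothetical selection function:

def ht_estimate(day_values, seeds, tau, predicate):
    """Horvitz-Thompson estimate of a selected sum from a threshold-PPS sample."""
    total = 0.0
    for e, v in day_values.items():
        if predicate(e) and v >= tau * seeds[e]:  # e is in the sample
            p = min(1.0, v / tau)                 # inclusion probability
            total += v / p                        # inverse-probability weight
    return total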
HT estimator (single-instance)
Coordinated PPS sample, τ = 100 (the same keys × days matrix as above).
HT estimator (single-instance)
τ = 100. Select Wednesday, CA-NY (same matrix as above).
HT estimator for single-instance
τ = 100. Select Wednesday, CA-NY.

  u    key    We
0.33  (a,b)   43
0.22  (g,c)    0
0.82  (h,c)   60
0.16  (a,z)   24
0.92  (h,f)    3
0.16  (f,s)   20
0.77  (d,h)    0

The HT estimate is 0 for keys that are not sampled and v/p when the key is sampled.
Exact (matching keys (a,b), (h,c), (f,s)): 43 + 60 + 20 = 123.
Sampled matching keys: (a,b) with p = 0.43 and (f,s) with p = 0.20; (h,c) is not sampled (60 < 100 · 0.82).
HT estimate: 43/0.43 + 20/0.20 = 100 + 100 = 200.
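Using the ht_estimate sketch above, the slide's numbers can be reproduced (the CA-NY key set is read off the slide's exact sum and is an assumption of this illustration):

wednesday = {('a','b'): 43, ('g','c'): 0, ('h','c'): 60, ('a','z'): 24,
             ('h','f'): 3, ('f','s'): 20, ('d','h'): 0}
u = {('a','b'): 0.33, ('g','c'): 0.22, ('h','c'): 0.82, ('a','z'): 0.16,
     ('h','f'): 0.92, ('f','s'): 0.16, ('d','h'): 0.77}
ca_ny = {('a','b'), ('h','c'), ('f','s')}  # assumed CA-NY pairs
print(ht_estimate(wednesday, u, 100, lambda e: e in ca_ny))  # ~200.0 vs exact 123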
Inverse-Probability (HT) estimator
• Unbiased: important because bias adds up and we are estimating sums.
• Nonnegative: important because f is.
• Bounded variance (for all v).
• Monotone: more information ⟹ higher estimate.
• Optimality: UMVU, the unique minimum-variance estimator among unbiased nonnegative (sum) estimators.
Works when f depends on a single entry.
What about general f?
Queries involving multiple columns
• Lp distance (change) in activity of "male users over 30" between Friday and Monday: f(v) = |v1 − v2|^p
• Breakdown: total increase, total decrease: f(v) = max{0, v1 − v2}^p
HT may not work at all now, and may not be optimal when it does.
We want estimators with the same nice properties.
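For concreteness, the two multi-column functions above, written for a two-instance values tuple (the function names and default exponent are mine):

def f_lp(v, p=2):
    """Lp-distance query: f(v) = |v1 - v2|^p."""
    return abs(v[0] - v[1]) ** p

def f_increase(v, p=2):
    """Breakdown query, total increase: f(v) = max(0, v1 - v2)^p."""
    return max(0.0, v[0] - v[1]) ** p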
Sampled data
Coordinated PPS sample, τ = 100 (same matrix as above).
Want to estimate (55 − 43)² + (8 − 3)² + (24 − 15)², the sum of squared We-to-Th changes over the selected keys (a,b), (h,f), (a,z).
Let's look at key (a,z) and estimate |v_We − v_Th|².
Information on f
Fix the data v. The lower u is, the more we know about v
and about f(v) = (24 − 15)² = 81.
We plot the lower bound we have on f(v) as a function of the seed u:
[Figure: the lower bound is 81 for u ≤ 0.15 (both entries sampled), decreases on (0.15, 0.24) where only the We entry is sampled, and is 0 for u ≥ 0.24 where nothing is sampled.]
This is a MEP!
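A small sketch that evaluates this lower bound for the (a,z) example (the piecewise form is inferred from the slide's breakpoints 0.15 and 0.24 and the value 81; the entry values and τ come from the matrix):

def lower_bound(u, v=(24.0, 15.0), tau=100.0):
    """Lower bound on f(v) = (v1 - v2)^2 given the coordinated PPS outcome.

    Entry v_i is revealed iff v_i >= tau * u; an unrevealed entry is only
    known to lie in [0, tau * u).
    """
    v1, v2 = max(v), min(v)
    if u <= v2 / tau:                      # both entries revealed: f is known
        return (v1 - v2) ** 2
    if u <= v1 / tau:                      # only the larger entry revealed
        return max(0.0, v1 - tau * u) ** 2
    return 0.0                             # nothing revealed: f could be 0

print(lower_bound(0.10), lower_bound(0.20), lower_bound(0.50))  # 81.0 16.0 0.0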
Monotone Estimation Problem
A monotone sampling scheme (V, S):
• Data domain V (⊆ ℝⁿ)
• Sampling scheme S: V × [0,1]
• A nonnegative function f: V → ℝ≥0
Goal: estimate f(v); specify a good estimator f̂(S).
Our results:
General estimator derivations for any MEP for which such an estimator exists:
• Unbiased, nonnegative, bounded variance
• Admissible: "Pareto optimal" in terms of variance
The solution is not unique: there is an optimal range.
Our results:
General estimator derivations
• Order-optimal estimators: for an order ≺ on the data domain V, any estimator with lower variance on v must have higher variance on some z ≺ v.
The L* estimator:
• The unique admissible monotone estimator
• Order-optimal for: z ≺ v ⟺ f(z) < f(v)
• 4-variance-competitive
The U* estimator:
• Order-optimal for: z ≺ v ⟺ f(z) > f(v)
Summary
• Defined Monotone Estimation Problems (motivated by coordinated sampling).
• Studied the range of Pareto-optimal (admissible) unbiased and nonnegative estimators:
  ▪ L* (lower end of the range: the unique monotone estimator; dominates HT)
  ▪ U* (upper end of the range)
  ▪ Order-optimal estimators (optimized for certain data patterns)
Follow-up and open problems
• Tighter bounds on the universal ratio: L* is 4-competitive; 3.375-competitiveness is achievable; the known lower bound is 1.44.
• Instance-optimal competitiveness: give an efficient construction for any MEP.
• MEPs with multiple seeds (independent samples).
• Applications:
  ▪ Estimating Euclidean and Manhattan distances from samples [C KDD '14]
  ▪ Sketch-based similarity in social networks [CDFGGW COSN '13]
  ▪ Timed-influence oracle [CDPW '14]
[Figure: L1 difference, independent vs. coordinated PPS sampling; #IP flows to a destination in two time periods. [C KDD '14]]
[Figure: L2 difference, independent vs. coordinated PPS sampling; surname occurrences in 2007 and 2008 books (Google ngrams). [C KDD '14]]
Why Coordinate Samples?
• Minimize overhead in repeated surveys (also storage): Brewer, Early, Joyce 1972; Ohlsson '98 (statistics); …
• Can get better estimators: Broder '97; Byers et al. Trans. Networking '04; Beyer et al. SIGMOD '07; Gibbons VLDB '01; Gibbons, Tirthapura SPAA '01; Gionis et al. VLDB '99; Hadjieleftheriou et al. VLDB 2009; Cohen et al. '93-'13; …
• Sometimes cheaper to compute: samples of the neighborhoods of all nodes in a graph in linear time (Cohen '93, …).
Variance Competitiveness [CK13]
An estimator f̂(S, u) is c-competitive if, for any data v,
the expectation of its square is within a factor c of the
minimum possible for v by an unbiased and nonnegative
estimator:

  for all unbiased nonnegative ĝ:  E[f̂²(S, u) | v] ≤ c · E[ĝ²(S, u) | v]
Optimal estimates f̂(v) for data v
The optimal estimates f̂(v) are the negated derivative of
the lower hull of the lower bound function.
[Figure: the lower bound function for v over u ∈ [0,1], together with its lower hull.]
Intuition: the lower bound tells us, on outcome S, how "high" we
can go with the estimate in order to optimize variance for v
while still being nonnegative on all other consistent data vectors.
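A sketch of this construction on a grid, applied to the (a,z) lower bound from earlier (the grid discretization and function names are my assumptions; an exact treatment would work with the piecewise form directly):

import numpy as np

def optimal_estimates(lb, grid):
    """Negated slope of the lower convex hull of a lower-bound function.

    Builds the hull with a monotone-chain sweep over (u, lb(u)) points and
    returns (u-interval, estimate) pairs, one per hull segment.
    """
    hull = []
    for u in grid:
        p = (float(u), float(lb(u)))
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or above the chord from hull[-2] to p.
            if (x2 - x1) * (p[1] - y1) <= (p[0] - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append(p)
    return [((x1, x2), -(y2 - y1) / (x2 - x1))
            for (x1, y1), (x2, y2) in zip(hull, hull[1:])]

# Lower bound for the (a,z) example: 81 up to u = 0.15, then decaying to 0.
lb = lambda u: 81.0 if u <= 0.15 else max(0.0, 24.0 - 100.0 * u) ** 2
for (a, b), est in optimal_estimates(lb, np.linspace(0.0, 1.0, 1001))[:3]:
    print(f"u in [{a:.3f}, {b:.3f}): estimate {est:.1f}")

On the flat prefix of the lower bound the hull is a straight chord, so the optimal estimate is constant there: on low-u (information-rich) outcomes the estimate can be high while remaining nonnegative on all consistent data.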
[Figure slides: Manhattan distance; Euclidean distance.]