Information Technology
Selecting Representative Objects
Considering Coverage and Diversity
Shenlu Wang1, Muhammad Aamir Cheema2, Ying Zhang3, Xuemin Lin1
1 The University of New South Wales, Australia
2 Monash University, Australia
3 The University of Technology, Australia
Outline
Influence Sets
Reverse k Nearest Neighbors Queries
Reverse Top-k Queries
Reverse Skyline Queries
Representative Objects using Influence Sets
Techniques
Experiment Results
Summary
Faculty of Information Technology
Influence Set
Influence
In a data set consisting of facilities and users, a facility 𝒇 influences a user 𝒖 if 𝒖 considers 𝒇 one of its most “important” facilities.
Influence Set
The set of users influenced by 𝒇 is called the influence set of 𝒇.
[Figure: the influence set of Coles, with facilities f1 and f2 and users U1 and U2]
Influence Set
Who are my potential customers?
Important facility?
A facility f is important for a user u if it is one of the top-k facilities for u according to her preferences, e.g.,
• Distance
• Rating
• Price
Influence Set
Significance
Important to identify potential users/customers
Used in various applications such as marketing, cluster and
outlier analysis, and decision support systems
Types
Reverse 𝒌 Nearest Neighbors
Reverse Top-𝒌
Reverse Skyline
Reverse k Nearest Neighbors (RkNN)
• Definition of importance
– A facility f is important to a user if f is one of its k closest facilities
• Reverse k Nearest Neighbors
– Find every user u for which the query facility q is important, i.e., q is one of its k closest facilities.
Example (k = 1): the influence set of f1 is {u1, u2} and the influence set of f2 is {u3}.
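The RkNN definition above can be checked directly with a brute-force pass over the users. The following is a minimal sketch, not one of the algorithms from the literature: the function name `rknn_influence_sets` and all coordinates are hypothetical, chosen only so that the result matches the k = 1 example (influence set of f1 is {u1, u2}, influence set of f2 is {u3}).

```python
from math import dist


def rknn_influence_sets(facilities, users, k):
    """Map each facility to the set of users that have it among their k closest facilities."""
    influence = {name: set() for name in facilities}
    for user, u_pos in users.items():
        # Rank all facilities by Euclidean distance from this user.
        ranked = sorted(facilities, key=lambda name: dist(u_pos, facilities[name]))
        for name in ranked[:k]:
            influence[name].add(user)
    return influence


# Hypothetical coordinates (not taken from the slides).
facilities = {"f1": (0.0, 0.0), "f2": (10.0, 0.0)}
users = {"u1": (1.0, 1.0), "u2": (2.0, -1.0), "u3": (9.0, 0.5)}

influence = rknn_influence_sets(facilities, users, k=1)
```

This costs one sort per user, which is why the pruning-based algorithms on the following slides exist.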
RkNN Algorithms
Phases: Pruning, Verification
• Six-regions (Stanoi et al., SIGMOD 2000)
• TPL (Tao et al., VLDB 2004)
• FINCH (Wu et al., VLDB 2008)
• Boost (Emrich et al., SIGMOD 2010)
• InfZone (Cheema et al., ICDE 2011)
• SLICE (Yang et al., ICDE 2014)
Region-based: Six-regions (SIGMOD 2000), SLICE (ICDE 2014)
Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
RkNN Algorithms
• Regions-based Pruning: Six-regions [Stanoi et al., SIGMOD 2000]
1. Divide the whole space centred at the query q into six equal regions.
2. Find the k-th nearest neighbor in each partition.
3. The k-th nearest facility of q in each region defines the area that can be pruned.
The user points that cannot be pruned should be verified by a range query.
[Figure: example with k = 2; labels q, a, b, c, d, u1, u2]
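The three steps above can be sketched for points in the plane as follows. This is an illustrative simplification under stated assumptions, not the published algorithm: it scans plain lists instead of an index, the names `sector`, `six_region_candidates` and `is_rknn` are hypothetical, and the verification step is a linear scan rather than a range query. The pruning rule relies on the fact that, within one 60-degree sector, a facility that is closer to q than a user is also closer to that user than q is.

```python
from math import atan2, dist, pi


def sector(q, p, sectors=6):
    """Index of the equal-angle sector around q that contains point p."""
    angle = atan2(p[1] - q[1], p[0] - q[0]) % (2 * pi)
    # Clamp guards against the modulo rounding up to exactly 2*pi.
    return min(int(angle / (2 * pi / sectors)), sectors - 1)


def six_region_candidates(q, facilities, users, k):
    """Users that survive six-region pruning; they still need verification."""
    per_sector = {i: [] for i in range(6)}
    for f in facilities:
        per_sector[sector(q, f)].append(dist(q, f))
    radius = {}
    for i, ds in per_sector.items():
        ds.sort()
        # The k-th nearest facility of q in this sector bounds the unpruned part.
        radius[i] = ds[k - 1] if len(ds) >= k else float("inf")
    return [u for u in users if dist(q, u) <= radius[sector(q, u)]]


def is_rknn(q, facilities, u, k):
    """Verification: fewer than k facilities are strictly closer to u than q."""
    return sum(1 for f in facilities if dist(u, f) < dist(u, q)) < k


# Hypothetical data: q is the query facility, the list holds the other facilities.
q = (0.0, 0.0)
other_facilities = [(2.0, 0.0), (0.0, -3.0)]
users = [(0.5, 0.1), (3.0, 0.5), (-1.0, -1.0), (1.2, -1.0)]

candidates = six_region_candidates(q, other_facilities, users, k=1)
result = [u for u in candidates if is_rknn(q, other_facilities, u, k=1)]
```

In this example the user at (3.0, 0.5) is pruned without any distance computation to the facilities, while (1.2, -1.0) survives pruning and is only rejected during verification.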
RkNN Algorithms
• Half-space Pruning: the space that is contained by k half-spaces can be pruned
– TPL [Tao et al., VLDB 2004]
1. Find the nearest facility f in the unpruned area.
2. Draw a bisector between q and f, and prune using the half-space.
3. Go to step 1 unless all facilities in the unpruned area have been accessed.
Checking which k half-spaces prune a point/node is expensive.
TPL++ [Yang et al., PVLDB 2015]
[Figure: example with k = 2; labels q, a, b, c, d, u]
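The half-space rule itself reduces to a distance comparison: a point lies on the far side of the bisector between q and f exactly when it is closer to f than to q. The sketch below shows only this point-level test, under the assumption that all facilities are available in a list; it does not reproduce TPL's incremental traversal or its pruning of index nodes, and the names `pruned_by` and `is_pruned` are hypothetical.

```python
from math import dist


def pruned_by(p, q, f):
    """True if p lies in the half-space of the q-f bisector on the side of f."""
    return dist(p, f) < dist(p, q)


def is_pruned(p, q, facilities, k):
    """A point can be pruned once at least k half-spaces contain it."""
    count = 0
    for f in facilities:
        if pruned_by(p, q, f):
            count += 1
            if count >= k:
                return True
    return False


# Hypothetical layout: query at the origin, three other facilities around it.
q = (0.0, 0.0)
facilities = [(4.0, 0.0), (0.0, 4.0), (-4.0, 0.0)]
```

The loop makes the cost noted on the slide visible: in the worst case every bisector has to be checked for every point.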
RkNN Algorithms
• FINCH [Wu et al., VLDB 2008]
– Approximate the unpruned area by a convex polygon
[Figure: example with k = 2; labels q, a, b, c, d]
RkNN Algorithms
• InfZone [Cheema et al., ICDE 2011]
1. The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning.
2. A user u is an RkNN of q if and only if u lies inside the influence zone.
3. No verification phase.
[Figure: example with k = 2; labels q, a, b, c, d]
RkNN Algorithms
• SLICE [Yang et al., ICDE 2014]
1. Divide the whole space centred at the query q into t equal regions.
2. Draw arcs for each facility.
3. The k-th arc in each partition defines the pruning region.
Pruning requires checking only one distance.
[Figure: example with k = 2; labels q, f1, f2]
Influence Set based on Reverse Top-k
• Definition of importance
– Each user u has a preference function
– A facility f is important to a user u if f is one of the top-k facilities for u
• Reverse Top-k Query (RTk)
– Find every user u for which the query facility q is one of her top-k facilities.
Example (k = 1): the influence set of f1 is {u2} and the influence set of f2 is {u1, u3}.
[Figure: facilities f1 (Price=2) and f2 (Price=1) and users u1, u2, u3; the preference functions shown are 1*distance, 0.9*price + 0.1*distance, and 0.5*price + 0.5*distance]
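A brute-force way to obtain these influence sets is to compute the top-k facilities of every user and record each user under the facilities it selects. The sketch below assumes linear preference functions over price and distance where a lower score is better. The prices and weights follow the slide's example, but the assignment of weights to users, all coordinates, and the function names are assumptions made for illustration.

```python
from math import dist


def score(weights, user_pos, facility):
    """Weighted sum of price and distance; lower is better (assumed convention)."""
    w_price, w_dist = weights
    return w_price * facility["price"] + w_dist * dist(user_pos, facility["pos"])


def reverse_topk_influence_sets(facilities, users, k):
    """Map each facility to the users that rank it among their top-k facilities."""
    influence = {name: set() for name in facilities}
    for uname, user in users.items():
        ranked = sorted(
            facilities,
            key=lambda name: score(user["weights"], user["pos"], facilities[name]),
        )
        for name in ranked[:k]:
            influence[name].add(uname)
    return influence


# Prices as on the slide; positions are hypothetical.
facilities = {
    "f1": {"price": 2.0, "pos": (0.0, 0.0)},
    "f2": {"price": 1.0, "pos": (5.0, 0.0)},
}
# weights = (weight on price, weight on distance); the mapping to users is assumed.
users = {
    "u1": {"weights": (0.9, 0.1), "pos": (1.0, 1.0)},
    "u2": {"weights": (0.0, 1.0), "pos": (2.0, 0.0)},
    "u3": {"weights": (0.5, 0.5), "pos": (3.0, 1.0)},
}

influence = reverse_topk_influence_sets(facilities, users, k=1)
```

With these values u2 cares only about distance and picks the nearby f1, while u1 and u3 weight price enough to pick the cheaper f2, which reproduces the influence sets of the example.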
Existing work on Reverse Top-k
• Vlachou et al., “Reverse top-k queries”, ICDE 2010
• Chester et al., “Indexing reverse top-k queries in two dimensions”, DASFAA 2013
• Cheema et al., “A Unified Framework for Efficiently Processing Ranking Related Queries”, EDBT 2014
• Vlachou et al., “Branch-and-bound algorithm for reverse top-k queries”, SIGMOD 2013
• Ge et al., “Efficient all top-k computation: A unified solution for all top-k, reverse top-k and top-m influential queries”, TKDE 2013
• Vlachou et al., “Monitoring reverse top-k queries over mobile devices”, MobiDE 2011
• Yu et al., “Processing a large number of continuous preference top-k queries”, SIGMOD 2012
Influence Set based on Reverse Skyline
• Dominance
– A facility x dominates another facility y w.r.t. a user u if, for every attribute, u prefers x over y
• Definition of importance
– A facility f is important to a user u if f is not dominated by any other facility
• Reverse Skyline
– Find every user u for which the query facility q is not dominated by any other facility.
Example: the influence set of f1 is {u1, u2} and the influence set of f2 is {u1, u2, u3}.
[Figure: facilities f1 and f2 with price labels Price=2 and Price=1, and users u1, u2, u3]
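The dominance test and the reverse skyline can be sketched as below. Two assumptions are made that the slide does not state: a user "prefers" an attribute value that is closer to the user's own ideal value, and dominance is taken literally as strictly preferred on every attribute. The function names and the data are hypothetical.

```python
def dominates(x, y, u):
    """x dominates y w.r.t. u if x is strictly closer to u's ideal on every attribute."""
    return all(abs(xi - ui) < abs(yi - ui) for xi, yi, ui in zip(x, y, u))


def reverse_skyline(q, facilities, users):
    """Users for whom no other facility dominates the query facility q."""
    return {
        name
        for name, u in users.items()
        if not any(dominates(f, q, u) for f in facilities if f != q)
    }


# Hypothetical two-attribute data; each user is described by ideal attribute values.
q = (2.0, 2.0)
facilities = [q, (4.0, 4.0), (1.0, 5.0)]
users = {
    "ua": (2.0, 3.0),
    "ub": (4.0, 5.0),
    "uc": (1.0, 6.0),
    "ud": (3.0, 3.0),
}

influenced = reverse_skyline(q, facilities, users)
```

Unlike the top-k variants, no parameter k is needed here: a user belongs to the influence set as soon as q is not dominated for that user.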
Existing work on Reverse Skylines
• Dellis et al., “Efficient computation of reverse skyline queries”, VLDB 2007
• Lian et al., “Reverse skyline search in uncertain databases”, TODS 2010
• Prasad et al., “Efficient reverse skyline retrieval with arbitrary non-metric similarity measures”, EDBT 2011
• Wang et al., “Energy-efficient reverse skyline queries processing over wireless sensor networks”, TKDE 2012
• Wu et al., “Finding the influence set through skylines”, EDBT 2009
Representative Objects
Given a set of facilities and a set of users, choose t representative facilities considering coverage and diversity
Coverage
Let I(f) denote the influence set of a facility f.
Given a set of facilities F, its coverage is the measure of the total number of distinct users that are influenced by the facilities in F
• Koh et al., “Finding k most favorite products based on reverse top-t queries”, VLDB J. 2014
• Gkorgkas et al., “Finding the most diverse products using preference queries”, EDBT 2015
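Coverage as described above is the size of the union of the influence sets. A minimal sketch follows; the function name `coverage` is hypothetical, the influence sets reuse the reverse skyline example, and whether the authors normalize the count (for instance by the number of users) is not stated in the slides, so the raw count is returned.

```python
def coverage(selected, influence):
    """Number of distinct users influenced by at least one selected facility."""
    covered = set()
    for f in selected:
        covered |= influence[f]
    return len(covered)


# Influence sets from the reverse skyline example.
influence = {"f1": {"u1", "u2"}, "f2": {"u1", "u2", "u3"}}
```

Because users are counted once, adding f1 to a set that already contains f2 does not increase coverage in this example.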
Representative Objects
Diversity
Let I(f) denote the influence set of a facility.
Dissimilarity between two facilities is defined based on the Jaccard
similarity of their influence sets
Diversity of a set of facilities F is the minimum of the pair-wise dissimilarities between the facilities in the set
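The diversity definition can be sketched as follows. The exact dissimilarity formula is not preserved in the slides; the code assumes the common choice of one minus the Jaccard similarity. The handling of empty sets and of selections with fewer than two facilities is also an assumption, and all names and data are hypothetical.

```python
from itertools import combinations


def dissimilarity(a, b):
    """Assumed form: 1 - Jaccard similarity of two influence sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty influence sets count as identical
    return 1.0 - len(a & b) / len(union)


def diversity(selected, influence):
    """Minimum pair-wise dissimilarity among the selected facilities."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 1.0  # assumption: fewer than two facilities are maximally diverse
    return min(dissimilarity(influence[x], influence[y]) for x, y in pairs)


# Hypothetical influence sets.
influence = {
    "f1": {"u1", "u2"},
    "f2": {"u1", "u2", "u3"},
    "f3": {"u4"},
}
```

Taking the minimum means a single pair of facilities with heavily overlapping influence sets, here f1 and f2, determines the diversity of the whole selection.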
Representative Objects
Problem Definition
The score of a set of facilities F combines its coverage and its diversity.
Given a set of facilities and a set of users, return a set of t facilities
with maximum score.
Techniques
Challenges
Problem is NP-Hard
Requires computing influence sets for many facilities
Requires set intersection and union operations to compute diversity
Techniques
Phase 1: Compute influence sets
Prune the facilities that cannot be among the representative facilities
Compute influence sets of remaining facilities
Phase 2: Greedy Algorithm
Iteratively select a facility f that maximizes the score of current set
Stop when t facilities have been selected
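Phase 2 can be sketched as a standard greedy loop. The score formula is not preserved in the slides, so the code assumes a weighted sum of normalized coverage and minimum pair-wise Jaccard dissimilarity with a hypothetical weight `lam`; the pruning of Phase 1 is omitted, and the function names and data are made up for illustration.

```python
from itertools import combinations


def score(selected, influence, n_users, lam=0.5):
    """Assumed score: lam * normalized coverage + (1 - lam) * diversity."""
    covered = set().union(*(influence[f] for f in selected))
    cov = len(covered) / n_users
    dissims = [
        1 - len(influence[a] & influence[b]) / len(influence[a] | influence[b])
        for a, b in combinations(selected, 2)
        if influence[a] | influence[b]
    ]
    div = min(dissims) if dissims else 1.0
    return lam * cov + (1 - lam) * div


def greedy_select(influence, n_users, t, lam=0.5):
    """Repeatedly add the facility that maximizes the score of the current set."""
    selected = []
    remaining = set(influence)
    while remaining and len(selected) < t:
        # Sorting makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda f: score(selected + [f], influence, n_users, lam),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical influence sets over five users.
influence = {
    "f1": {"u1", "u2", "u3"},
    "f2": {"u1", "u2"},
    "f3": {"u4", "u5"},
    "f4": {"u3"},
}

chosen = greedy_select(influence, n_users=5, t=2)
```

Here f1 is picked first for its coverage, and f3 second because it adds two new users and shares none with f1, whereas f2 would add nothing and lower the diversity.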
Techniques
Phase 1: Compute influence sets of the remaining facilities
1. RTK: Apply an existing reverse top-k algorithm for each remaining facility
2. Compute top-k facilities for each user and populate the influence sets of each facility
a) TK: Use a branch-and-bound top-k algorithm for each user
b) NBF: Use a brute-force algorithm to compute top-k for each user
Techniques
Phase 2: Greedy Algorithm
Iteratively select a facility f that maximizes the score of current set
Stop when t facilities have been selected
Selecting f requires computing set intersection and union operations
1. ESO: Compute exact set operations
2. MK: Compute approximate set intersection and union
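The slides do not say which approximation the second option uses. One standard technique for estimating the Jaccard similarity of two sets without intersecting them is MinHash, sketched below as an illustration only; it is not claimed to be the authors' method. The helper names, the choice of BLAKE2b as the hash family, and the number of hash functions are all assumptions.

```python
import hashlib


def _hash(i, item):
    """i-th hash function, derived from BLAKE2b with the index as salt."""
    digest = hashlib.blake2b(
        repr(item).encode(), digest_size=8, salt=i.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def signature(items, num_hashes=256):
    """MinHash signature of a non-empty set: the minimum value under each hash."""
    return [min(_hash(i, x) for x in items) for i in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    return sum(1 for x, y in zip(sig_a, sig_b) if x == y) / len(sig_a)


def estimate_union_and_intersection(size_a, size_b, jaccard):
    """Recover approximate |A ∪ B| and |A ∩ B| from the set sizes and the estimate."""
    union = (size_a + size_b) / (1 + jaccard)
    return union, jaccard * union


# Hypothetical influence sets of user ids; the true Jaccard similarity is 1/3.
set_a = set(range(0, 300))
set_b = set(range(150, 450))
sig_a, sig_b = signature(set_a), signature(set_b)
approx = estimate_jaccard(sig_a, sig_b)
```

Signatures are computed once per facility, so each candidate evaluation in the greedy loop compares fixed-length lists instead of intersecting full influence sets. The price is an estimation error that shrinks as the number of hash functions grows.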
Experimental Results
Summary
We studied the problem of computing representative objects using influence sets based on reverse top-k queries
We proposed a two-phase greedy algorithm with an approximation guarantee
Experimental results demonstrate that the greedy algorithms produce high-quality results
Thanks