
Information Technology
Selecting Representative Objects
Considering Coverage and Diversity
Shenlu Wang¹, Muhammad Aamir Cheema², Ying Zhang³, Xuemin Lin¹
¹ The University of New South Wales, Australia
² Monash University, Australia
³ The University of Technology, Australia
Outline
• Influence Sets
  – Reverse k Nearest Neighbors Queries
  – Reverse Top-k Queries
  – Reverse Skyline Queries
• Representative Objects using Influence Sets
• Techniques
• Experimental Results
• Summary
Faculty of Information Technology
Influence Set
Influence
In a data set consisting of facilities
and users, a facility 𝒇 influences a
user 𝒖 if 𝒖 considers 𝒇 as one of its
most “important” facilities
Influence Set
The set of users influenced by 𝒇 is called the influence set of 𝒇
[Figure: influence set of Coles; facilities f1 and f2, users U1 and U2]
Influence Set
Who are my potential customers?
Important facility?
A facility f is important for a user u if it is one of the top-k
facilities for u considering her preferences, e.g.,
• Distance
• Rating
• Price
Influence Set
Significance
• Important to identify potential users/customers
• Used in various applications such as marketing, cluster and
outlier analysis, and decision support systems
Types
• Reverse 𝒌 Nearest Neighbors
• Reverse Top-𝒌
• Reverse Skyline
Reverse k Nearest Neighbors (RkNN)
• Definition of importance
– A facility f is important to a user if f is one of its k closest facilities
• Reverse k Nearest Neighbors
– Find every user u for which the query facility q is important, i.e., q is one of its k closest facilities.
Example (k = 1)
– Influence set of f1 is {u1, u2}
– Influence set of f2 is {u3}
[Figure: facilities f1 and f2, users u1, u2 and u3]
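The RkNN definition can be checked directly by brute force. The following is a minimal sketch, assuming 2-D points and Euclidean distance; the coordinates and the function name `rknn` are hypothetical and only chosen so that the output matches the influence sets listed on this slide.

```python
from math import dist

def rknn(q, facilities, users, k):
    """Return the users for which facility q is one of the k closest facilities."""
    result = []
    for name, u in users.items():
        d_q = dist(u, facilities[q])
        # Count the facilities that are strictly closer to u than q is.
        closer = sum(1 for f, p in facilities.items() if f != q and dist(u, p) < d_q)
        if closer < k:
            result.append(name)
    return result

# Hypothetical coordinates (not from the slides).
facilities = {"f1": (0.0, 0.0), "f2": (10.0, 0.0)}
users = {"u1": (1.0, 1.0), "u2": (2.0, -1.0), "u3": (9.0, 0.5)}

print(rknn("f1", facilities, users, k=1))  # ['u1', 'u2']
print(rknn("f2", facilities, users, k=1))  # ['u3']
```

This scan costs one distance computation per user-facility pair, which is the cost the pruning-based algorithms aim to avoid.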
RkNN Algorithms
Phases: Pruning, Verification
Algorithms
• Six-regions (Stanoi et al., SIGMOD 2000): region-based
• TPL (Tao et al., VLDB 2004): half-space
• FINCH (Wu et al., VLDB 2008): half-space
• Boost (Emrich et al., SIGMOD 2010)
• InfZone (Cheema et al., ICDE 2011): half-space
• SLICE (Yang et al., ICDE 2014): region-based
RkNN Algorithms
• Region-based pruning: Six-regions [Stanoi et al., SIGMOD 2000]
1. Divide the whole space centred at the query q into six equal regions
2. Find the k-th nearest neighbor in each partition.
3. The k-th nearest facility of q in each region defines the area that can be pruned
The user points that cannot be pruned should be verified by range query
[Figure: k = 2; query q, points a, b, c and d, users u1 and u2]
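The three steps and the verification can be sketched as follows. This is an illustrative sketch only: it assumes 2-D Euclidean points, `facilities` excludes q, verification is done by a linear scan rather than a range query, and all function names and the random data are hypothetical.

```python
import random
from math import atan2, dist, pi

def sector(q, p):
    """Index (0..5) of the 60-degree sector around q that contains p."""
    angle = atan2(p[1] - q[1], p[0] - q[0]) % (2 * pi)
    return min(int(angle // (pi / 3)), 5)

def closer_facilities(u, q, facilities):
    """Number of facilities strictly closer to u than q is."""
    return sum(dist(u, f) < dist(u, q) for f in facilities)

def six_regions_rknn(q, facilities, users, k):
    # Steps 1-2: collect the distances from q to the facilities in each sector.
    per_sector = [[] for _ in range(6)]
    for f in facilities:
        per_sector[sector(q, f)].append(dist(q, f))
    # Step 3: the k-th nearest facility of a sector bounds its unpruned area.
    bound = [sorted(d)[k - 1] if len(d) >= k else float("inf") for d in per_sector]
    candidates = [u for u in users if dist(q, u) <= bound[sector(q, u)]]
    # Verification (a linear scan here instead of a range query).
    result = [u for u in candidates if closer_facilities(u, q, facilities) < k]
    return result, candidates

random.seed(0)
q = (0.5, 0.5)
facilities = [(random.random(), random.random()) for _ in range(50)]
users = [(random.random(), random.random()) for _ in range(200)]
result, candidates = six_regions_rknn(q, facilities, users, k=2)
brute = [u for u in users if closer_facilities(u, q, facilities) < 2]
print(len(candidates), len(result), result == brute)
```

The pruning is safe because a user and a facility in the same 60-degree sector, with the facility nearer to q, are always closer to each other than the user is to q.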
RkNN Algorithms
• Half-space pruning: the space that is contained by k half-spaces can be pruned
– TPL [Tao et al., VLDB 2004]
1. Find the nearest facility f in the unpruned area.
2. Draw a bisector between q and f, prune by using the half-space
3. Go to step 1 unless all facilities in the unpruned area have been accessed
Checking which k half-spaces prune a point/node is expensive
– TPL++ [Yang et al., PVLDB 2015]
[Figure: k = 2; query q, points a, b, c and d, user u]
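The half-space test itself is simple: a point lies in the half-space of facility f if it is closer to f than to q. The sketch below only illustrates that test and the "k half-spaces" rule, not the TPL index traversal; it assumes 2-D Euclidean points, and the function names and coordinates are hypothetical.

```python
from math import dist

def pruning_count(p, q, facilities):
    """Number of half-spaces (one per facility f) that contain p,
    i.e. the number of facilities that are closer to p than q is."""
    return sum(1 for f in facilities if dist(p, f) < dist(p, q))

def is_pruned(p, q, facilities, k):
    """p cannot be an RkNN of q if at least k half-spaces contain it."""
    return pruning_count(p, q, facilities) >= k

q = (0.0, 0.0)
facilities = [(4.0, 0.0), (0.0, 4.0)]  # hypothetical facilities

print(is_pruned((3.0, 3.0), q, facilities, k=2))  # True
print(is_pruned((3.0, 0.0), q, facilities, k=2))  # False
```

The cost noted on the slide comes from having to find which combination of k half-spaces contains a point or index node; the linear count above sidesteps that only because it tests single points against every facility.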
RkNN Algorithms
• FINCH [Wu et al., VLDB 2008]
– Approximate the unpruned area by a convex polygon
[Figure: k = 2; query q, points a, b, c and d]
RkNN Algorithms
• InfZone [Cheema et al., ICDE 2011]
1. The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning.
2. A user u is an RkNN of q if and only if u lies inside the influence zone
3. No verification phase.
[Figure: k = 2; query q, points a, b, c and d]
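Stated as a formula, with notation that is assumed here rather than taken from the slides (F is the set of facilities, dist is Euclidean distance, H_{f:q} is the half-space of points closer to f than to q, and Z_k(q) is the influence zone):

```latex
H_{f:q} = \{\, p : \mathrm{dist}(p, f) < \mathrm{dist}(p, q) \,\}
\qquad
Z_k(q) = \bigl\{\, p : \bigl|\{\, f \in F \setminus \{q\} : p \in H_{f:q} \,\}\bigr| < k \,\bigr\}
```

A user u is then an RkNN of q exactly when u ∈ Z_k(q), which is why a point-in-zone test can replace the verification phase.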
RkNN Algorithms
• SLICE [Yang et al., ICDE 2014]
1. Divide the whole space centred at the query q into t equal regions
2. Draw arcs for each facility
3. k-th arc in each partition defines the pruning region
Pruning requires checking only one distance
[Figure: k = 2; query q, facilities f1 and f2]
Influence Set based on Reverse Top-k
• Definition of importance
– Each user u has a preference function
– A facility f is important to a user u if f is one of the top-k facilities for u
• Reverse Top-k Query (RTk)
– Find every user u for which the query facility q is one of her top-k facilities.
Example (k = 1)
– Influence set of f1 is {u2}
– Influence set of f2 is {u1, u3}
[Figure: facilities f1 (Price=2) and f2 (Price=1); users u1, u2 and u3; preference functions shown: 1*distance, 0.9*price + 0.1*distance, 0.5*price + 0.5*distance; distance labels 2 and 3]
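A brute-force sketch of the reverse top-k query follows. It assumes linear preference functions over price and Euclidean distance where a lower score is better. The prices and weights follow the slide, but the coordinates and the assignment of each preference function to a user are assumptions chosen so that the output matches the influence sets above.

```python
from math import dist

def reverse_top_k(q, facilities, users, k):
    """Users for which facility q has one of the k lowest scores."""
    result = []
    for name, (loc, w_price, w_dist) in users.items():
        def score(f):
            f_loc, price = facilities[f]
            return w_price * price + w_dist * dist(loc, f_loc)
        # Count the facilities that this user ranks strictly better than q.
        better = sum(1 for f in facilities if f != q and score(f) < score(q))
        if better < k:
            result.append(name)
    return result

# facility -> (location, price); locations are hypothetical.
facilities = {"f1": ((0.0, 0.0), 2.0), "f2": ((5.0, 0.0), 1.0)}
# user -> (location, price weight, distance weight); assignment is assumed.
users = {
    "u1": ((1.0, 0.0), 0.9, 0.1),  # 0.9*price + 0.1*distance
    "u2": ((2.0, 0.0), 0.0, 1.0),  # 1*distance
    "u3": ((4.0, 0.0), 0.5, 0.5),  # 0.5*price + 0.5*distance
}

print(reverse_top_k("f1", facilities, users, k=1))  # ['u2']
print(reverse_top_k("f2", facilities, users, k=1))  # ['u1', 'u3']
```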
Existing work on Reverse Top-k
• Vlachou et al., “Reverse top-k queries”, ICDE 2010
• Chester et al., “Indexing reverse top-k queries in two dimensions”, DASFAA 2013
• Cheema et al., “A Unified Framework for Efficiently Processing Ranking Related Queries”, EDBT 2014
• Vlachou et al., “Branch-and-bound algorithm for reverse top-k queries”, SIGMOD 2013
• Ge et al., “Efficient all top-k computation: A unified solution for all top-k, reverse top-k and top-m influential queries”, TKDE 2013
• Vlachou et al., “Monitoring reverse top-k queries over mobile devices”, MobiDE 2011
• Yu et al., “Processing a large number of continuous preference top-k queries”, SIGMOD 2012
Influence Set based on Reverse Skyline
• Dominance
– A facility x dominates another facility y w.r.t. a user u, if for every attribute, u prefers x over y
• Definition of importance
– A facility f is important to a user u if f is not dominated by any other facility w.r.t. u
• Reverse Skyline
– Find every user u for which the query facility q is not dominated by any other facility.
Example
– Influence set of f1 is {u1, u2}
– Influence set of f2 is {u1, u2, u3}
[Figure: facilities f1 and f2 with labels Price=2 and Price=1; users u1, u2 and u3]
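A brute-force sketch of the reverse skyline query, following the dominance definition on this slide (strictly preferred on every attribute). It assumes two attributes, Euclidean distance to the user and price, with lower values preferred; the coordinates and function names are hypothetical and chosen so that the output matches the influence sets above.

```python
from math import dist

def dominates(x, y, u):
    """x dominates y w.r.t. user u if u prefers x on every attribute
    (x is both closer to u and cheaper than y)."""
    (x_loc, x_price), (y_loc, y_price) = x, y
    return dist(u, x_loc) < dist(u, y_loc) and x_price < y_price

def reverse_skyline(q, facilities, users):
    """Users for which facility q is not dominated by any other facility."""
    return [name for name, u in users.items()
            if not any(dominates(f, facilities[q], u)
                       for key, f in facilities.items() if key != q)]

# facility -> (location, price); locations are hypothetical.
facilities = {"f1": ((0.0, 0.0), 2.0), "f2": ((5.0, 0.0), 1.0)}
users = {"u1": (1.0, 0.0), "u2": (2.0, 0.0), "u3": (4.0, 0.0)}

print(reverse_skyline("f1", facilities, users))  # ['u1', 'u2']
print(reverse_skyline("f2", facilities, users))  # ['u1', 'u2', 'u3']
```

Here f2 is the cheapest facility, so no facility can dominate it and every user is in its influence set.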
Existing work on Reverse Skylines
• Dellis et al., “Efficient computation of reverse skyline queries”, VLDB 2007
• Lian et al., “Reverse skyline search in uncertain databases”, TODS 2010
• Prasad et al., “Efficient reverse skyline retrieval with arbitrary non-metric similarity measures”, EDBT 2011
• Wang et al., “Energy-efficient reverse skyline queries processing over wireless sensor networks”, TKDE 2012
• Wu et al., “Finding the influence set through skylines”, EDBT 2009
Representative Objects
Given a set of facilities and a set of users, choose t representative
facilities considering coverage and diversity
Coverage
• Let I(f) denote the influence set of a facility f.
• Given a set of facilities F, its coverage is the measure of the total number of distinct users that are influenced by the facilities in F
Related work
• Koh et al., “Finding k most favorite products based on reverse top-t queries”, VLDB J. 2014
• Gkorgkas et al., “Finding the most diverse products using preference queries”, EDBT 2015
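The coverage measure can be computed as the size of the union of the influence sets. A minimal sketch, assuming influence sets are stored as Python sets keyed by facility name; the data and the function name `coverage` are hypothetical.

```python
def coverage(selected, influence):
    """Number of distinct users influenced by at least one selected facility."""
    covered = set()
    for f in selected:
        covered |= influence[f]
    return len(covered)

# Hypothetical influence sets I(f).
influence = {"f1": {"u1", "u2"}, "f2": {"u1", "u2", "u3"}, "f3": {"u4"}}

print(coverage(["f1", "f2"], influence))  # 3
print(coverage(["f1", "f3"], influence))  # 3
```

Although f2 influences more users than f3, adding either to f1 yields the same coverage because f2 overlaps with f1.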
Representative Objects
Diversity
• Let I(f) denote the influence set of a facility f.
• Dissimilarity between two facilities is defined based on the Jaccard similarity of their influence sets
• Diversity of a set of facilities F is the minimum of the pair-wise dissimilarities between the facilities in the set
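A minimal sketch of the diversity measure. The slide only says that dissimilarity is "based on" Jaccard similarity, so the form 1 − Jaccard used below is an assumption, as are the function names, the example data, and the value returned for sets with fewer than two facilities.

```python
from itertools import combinations

def dissimilarity(a, b):
    """Assumed form: 1 - |a ∩ b| / |a ∪ b| (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def diversity(selected, influence):
    """Minimum pairwise dissimilarity among the selected facilities."""
    pairs = combinations(selected, 2)
    # default=1.0 for fewer than two facilities is an arbitrary choice.
    return min((dissimilarity(influence[x], influence[y]) for x, y in pairs),
               default=1.0)

# Hypothetical influence sets I(f).
influence = {"f1": {"u1", "u2"}, "f2": {"u1", "u2", "u3"}, "f3": {"u4"}}

print(diversity(["f1", "f3"], influence))                 # 1.0
print(round(diversity(["f1", "f2", "f3"], influence), 3))  # 0.333
```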
Representative Objects
Problem Definition
• Score of a set of facilities F is
• Given a set of facilities and a set of users, return a set of t facilities
with maximum score.
Techniques
Challenges
• The problem is NP-hard
• Requires computing influence sets for many facilities
• Requires set intersection and union operations to compute diversity
Techniques
Phase 1: Compute influence sets
• Prune the facilities that cannot be among the representative facilities
• Compute influence sets of the remaining facilities
Phase 2: Greedy Algorithm
• Iteratively select a facility f that maximizes the score of the current set
• Stop when t facilities have been selected
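Phase 2 can be sketched as a plain greedy loop over precomputed influence sets. The score formula is not preserved in this text, so the `score` function below (a weighted sum of normalized coverage and minimum pairwise dissimilarity, with weight `lam`) is purely a hypothetical stand-in, as are all names and data.

```python
from itertools import combinations

def dissimilarity(a, b):
    """Assumed form: 1 - Jaccard similarity."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def score(selected, influence, n_users, lam=0.5):
    """Hypothetical score: lam * normalized coverage + (1 - lam) * diversity."""
    covered = set().union(*(influence[f] for f in selected))
    diversity = min((dissimilarity(influence[a], influence[b])
                     for a, b in combinations(selected, 2)), default=1.0)
    return lam * len(covered) / n_users + (1 - lam) * diversity

def greedy(influence, n_users, t):
    """Repeatedly add the facility that maximizes the score of the current set."""
    selected, remaining = [], sorted(influence)
    while remaining and len(selected) < t:
        best = max(remaining,
                   key=lambda f: score(selected + [f], influence, n_users))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical influence sets produced by Phase 1.
influence = {"f1": {"u1", "u2"}, "f2": {"u1", "u2", "u3"}, "f3": {"u4", "u5"}}

print(greedy(influence, n_users=5, t=2))  # ['f2', 'f3']
```

In this example f2 is picked first for its coverage, and f3 is picked next because it adds two new users without overlapping f2.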
Techniques
Phase 1: Compute influence sets
• Prune the facilities that cannot be among the representative facilities
• Compute influence sets of remaining facilities
1. RTK: Apply existing reverse top-k algorithm for each remaining facility
2. Compute top-k facilities for each user and populate the influence sets of each facility
a) TK: Use branch-and-bound top-k algorithm for each user
b) NBF: Use brute-force algorithm to compute top-k for each user
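Option 2(b) can be sketched as follows: compute each user's top-k facilities by scanning all facilities, then add the user to the influence set of each of them. This is an illustrative sketch only; the scoring function, the data, and all names are hypothetical, and a lower score is assumed to be better.

```python
import heapq
from collections import defaultdict
from math import dist

def influence_sets(facilities, users, k, score):
    """Populate I(f) for every facility from each user's top-k facilities."""
    influence = defaultdict(set)
    for u in users:
        # Brute force: score every facility and keep the k best (lowest).
        for f in heapq.nsmallest(k, facilities, key=lambda f: score(u, f)):
            influence[f].add(u)
    return dict(influence)

# Hypothetical data: facility -> (location, price),
# user -> (location, price weight, distance weight).
facility_data = {"f1": ((0.0, 0.0), 2.0), "f2": ((5.0, 0.0), 1.0)}
user_data = {"u1": ((1.0, 0.0), 0.9, 0.1),
             "u2": ((2.0, 0.0), 0.0, 1.0),
             "u3": ((4.0, 0.0), 0.5, 0.5)}

def linear_score(u, f):
    loc, w_price, w_dist = user_data[u]
    f_loc, price = facility_data[f]
    return w_price * price + w_dist * dist(loc, f_loc)

sets = influence_sets(facility_data, user_data, 1, linear_score)
print({f: sorted(s) for f, s in sorted(sets.items())})
# {'f1': ['u2'], 'f2': ['u1', 'u3']}
```

One pass over the users fills the influence sets of all facilities at once, whereas option 1 issues a separate reverse top-k query per facility.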
Techniques
Phase 2: Greedy Algorithm
• Iteratively select a facility f that maximizes the score of current set
• Stop when t facilities have been selected
• Selecting f requires computing set intersection and union operations
1. ESO: Compute exact set operations
2. MK: Compute approximate set intersection and union
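The slides do not say which approximation is used. As one common possibility, and purely as an assumed illustration, MinHash signatures estimate the Jaccard similarity of two sets without intersecting them; the function names, parameters, and data below are hypothetical, and the sets are assumed to be non-empty.

```python
import hashlib

def signature(items, num_hashes=256):
    """MinHash signature: for each salted hash function, the minimum
    hash value over all items in the set."""
    sig = []
    for i in range(num_hashes):
        salt = i.to_bytes(4, "big")
        sig.append(min(hashlib.blake2b(x.encode(), digest_size=8, salt=salt).digest()
                       for x in items))
    return sig

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = {f"u{i}" for i in range(0, 100)}
b = {f"u{i}" for i in range(50, 150)}
exact = len(a & b) / len(a | b)  # 50 / 150
approx = estimate_jaccard(signature(a), signature(b))
print(round(exact, 3), round(approx, 3))
```

Given an estimate J and the known sizes |A| and |B|, the union size follows as (|A| + |B|) / (1 + J) and the intersection size as J times that value, so both set operations can be approximated from fixed-size signatures.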
Experimental Results
Summary
• We studied the problem of computing representative objects using influence sets based on reverse top-k queries
• We proposed a two-phase greedy algorithm with an approximation guarantee
• Experimental results demonstrate that the greedy algorithms produce high-quality results
Thanks