Efficient Respondents Selection for Biased Survey using Online Social Networks
Donghyun Kim1, Jiaofei Zhong2, Minhyuk Lee1, Deying Li3, Alade O. Tokuta1
1North Carolina Central University, Durham, NC, USA
2California State University, East Bay, Hayward, CA, USA
3Renmin University of China, Beijing, China
Presenter: Donghyun (David) Kim
Presented at the 2nd Workshop on Computational Social Networks (CSoNet 2014)
Agenda
1. Motivation
2. Preliminaries
3. Problem Statement
4. Algorithm
5. Experiment
6. Conclusion
1. Motivation
Motivation
• Growing popularity of social networking web sites such as Facebook, Google+, Twitter, etc.
• Online social networks are getting a lot of attention
• Useful applications of online social networks include online advertising, information propagation, online surveys, etc.
• This work investigates the potential of online social networks for surveys
Motivation – cont’
• The U.S. spent more than $1.8 billion on survey research in 2012
• Online surveys are useful for collecting data for marketing or political decision making
• It is hard to find the right sample group of respondents in an online survey
• The respondents should represent each community or group
Motivation – cont’
• A person who belongs to the majority group in a community, is active, and has many friends (e.g. B)
• A person who belongs to a minority group in a community and has fewer friends (e.g. A)
Motivation – cont’
• Sometimes the minority opinion is more important than the majority one
Motivation – cont’
• Representative group in a social network graph
• Frequently modeled as a dominating set problem
• Minimum dominating set
• A good choice of representative group in general
• Mostly a subset of the majority
• Not suitable for our purpose
• We need a new kind of dominating set
2. Preliminaries
Preliminaries
• Notations
• G = (V, E): represents an online social network graph
• V = V(G): node set
• E = E(G): edge set
• |V|: number of nodes in V
• G[D]: subgraph of G induced by D, for any subset D ⊆ V
• N_{v,V}(G): set of nodes in V neighboring v in G, for each node v ∈ V; |N_{v,V}(G)| is its size
Preliminaries – cont’
• Definition 1 (DS)
Given a graph G, a subset D ⊆ V is a dominating set (DS) of G
if for each node u ∈ V \ D, there exists v ∈ D such that (v, u) ∈ E
[Figure: example graph with six nodes labeled 1 to 6]
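Definition 1 can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each node to its set of neighbors; the function name and the six-node graph are hypothetical and are not the graph drawn on the slide.

```python
def is_dominating_set(adj, D):
    """Return True if every node outside D has at least one neighbor in D."""
    D = set(D)
    return all(u in D or bool(adj[u] & D) for u in adj)


# Hypothetical 6-node graph (an assumption, not the slide's figure).
adj = {
    1: {2, 4},
    2: {1, 3, 5},
    3: {2, 6},
    4: {1, 5},
    5: {2, 4, 6},
    6: {3, 5},
}

print(is_dominating_set(adj, {2, 5}))  # node 2 covers 1, 3, 5; node 5 covers 2, 4, 6
print(is_dominating_set(adj, {1, 3}))  # node 5 has no neighbor in {1, 3}
```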
Preliminaries – cont’
• Definition 2 (MDSP)
Given a graph G, the goal of the minimum dominating set
problem (MDSP) is to find a minimum size DS of G
[Figure: example graph with six nodes labeled 1 to 6]
Preliminaries – cont’
• Definition 3 (Inverse k‐core)
Given a graph G, a subset D ⊆ V, and a positive integer k
such that 0 ≤ k ≤ Δ, where Δ is the maximum degree of G, D is an
inverse k-core in G if for each v ∈ D, |N_{v,D}(G)| ≤ k,
i.e., each node in D has no more than k neighbors in D
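The condition in Definition 3 only looks at nodes inside D. A minimal sketch, assuming the same neighbor-set dictionary representation; the function name and the 4-node path graph are hypothetical.

```python
def is_inverse_k_core(adj, D, k):
    """Return True if every node of D has at most k neighbors inside D."""
    D = set(D)
    return all(len(adj[v] & D) <= k for v in D)


# Hypothetical path graph 1 - 2 - 3 - 4 (an assumption for illustration).
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

print(is_inverse_k_core(path, {1, 2}, 1))     # each node has one neighbor in D
print(is_inverse_k_core(path, {1, 2, 3}, 1))  # node 2 has two neighbors in D
print(is_inverse_k_core(path, {1, 3}, 0))     # no two chosen nodes are adjacent
```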
3. Problem Statement
Problem Statement
• Definition 4 (IkCDS)
Given a graph G, a subset D ⊆ V, and a positive integer k,
D is an inverse k-core dominating set (IkCDS) of G if (a) D
is a DS of G and (b) for each v ∈ D, |N_{v,D}(G)| ≤ k
• Definition 5 (MIkCDSP)
Given a graph G and a positive integer k, the goal of
the minimum inverse k-core dominating set problem
(MIkCDSP) is to find a minimum size IkCDS of G.
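To see how k changes the optimum, the two definitions can be combined into an exhaustive search. This is only a sketch for tiny graphs (it tries every subset, so it is exponential) and is not the method of this work; the function names and the "double star" graph are assumptions made for illustration.

```python
from itertools import combinations


def is_ikcds(adj, D, k):
    """Check (a) D dominates the graph and (b) each node of D has <= k neighbors in D."""
    D = set(D)
    dominating = all(u in D or bool(adj[u] & D) for u in adj)
    inverse_core = all(len(adj[v] & D) <= k for v in D)
    return dominating and inverse_core


def brute_force_min_ikcds(adj, k):
    """Return a minimum-size IkCDS by trying subsets in order of increasing size."""
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        for cand in combinations(nodes, size):
            if is_ikcds(adj, cand, k):
                return set(cand)
    return set()  # only reached for an empty graph


# Hypothetical double star: hubs 0 and 1 are adjacent, each hub has two leaves.
double_star = {0: {1, 2, 3}, 1: {0, 4, 5}, 2: {0}, 3: {0}, 4: {1}, 5: {1}}

print(brute_force_min_ikcds(double_star, 1))  # both hubs are allowed together
print(brute_force_min_ikcds(double_star, 0))  # hubs may not both be chosen
```

With k = 1 the two hubs form a solution of size 2; with k = 0 the hubs cannot both be selected, so leaves (low-degree nodes) have to enter the set and the optimum grows to size 3.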
Problem Statement – cont’
• MIkCDSP is NP-hard
A special case of MIkCDSP with k = n (n being the number of nodes, so that
condition (b) of Definition 4 always holds) is equivalent to the
minimum dominating set problem, which is proven to be
NP-hard. As a result, MIkCDSP is NP-hard.
• Our Approach: Greedy Approximation
4. Algorithm
Algorithm
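The pseudocode of this slide is not reproduced in the text. As a stand-in, here is a generic greedy heuristic for MIkCDSP. It is an assumption made for illustration, not necessarily the algorithm presented in this work: it repeatedly adds the node that newly dominates the most nodes, among the nodes whose addition keeps every member of D at no more than k neighbors in D. The function name and the test graph are hypothetical.

```python
def greedy_ikcds(adj, k):
    """Generic greedy heuristic (illustrative assumption, not the authors' method)."""
    D = set()
    undominated = set(adj)
    while undominated:
        best, best_gain = None, 0
        for v in sorted(adj):
            if v in D:
                continue
            nbrs_in_D = adj[v] & D
            # v itself must end up with at most k neighbors in D ...
            if len(nbrs_in_D) > k:
                continue
            # ... and no current member of D may be pushed above k.
            if any(len(adj[w] & D) >= k for w in nbrs_in_D):
                continue
            gain = len((adj[v] | {v}) & undominated)
            if gain > best_gain:
                best, best_gain = v, gain
        # An undominated node has no neighbor in D, so it is always a
        # feasible choice with gain >= 1; best is therefore never None.
        D.add(best)
        undominated -= adj[best] | {best}
    return D


# Hypothetical double star: hubs 0 and 1 are adjacent, each hub has two leaves.
double_star = {0: {1, 2, 3}, 1: {0, 4, 5}, 2: {0}, 3: {0}, 4: {1}, 5: {1}}

print(greedy_ikcds(double_star, 1))
print(greedy_ikcds(double_star, 0))
```

The loop always terminates with a feasible IkCDS, because adding an undominated node never violates the neighbor constraint. It gives no guarantee on the size of the result.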
Example
G = (V,E), 1
[Figure: example graph with nodes v1, v2, v3, v4, v5, v6]
Example – cont’
D = { }
X_0 = {v1, v2, v3, v4, v5, v6}
Node  v1  v2  v3  v4  v5  v6
n_i    0   0   0   0   0   0
Example – cont’
• The candidates are v2 and v5 (every node still has n_i = 0)
Example – cont’
• Let’s select v2, hence D = {v2}
X_0 = {v2, v4, v5}
X_1 = {v1, v3, v6}
Node  v1  v2  v3  v4  v5  v6
n_i    1   0   1   0   0   1
Example – cont’
• Since X_0 ≠ ∅, select a node
• We can select v1
D = {v1, v2}
X_0 = {v4}
X_1 = {v1, v2, v3, v5, v6}
Node  v1  v2  v3  v4  v5  v6
n_i    1   1   1   0   1   1
Example – cont’
• Since X_0 ≠ ∅, select a node
• We can select v4
D = {v1, v2, v4}
X_0 = {v4} (before the selection)
X_1 = {v1, v2, v3, v4, v6}
X_2 = {v5}
• Now X_0 = ∅
• And the result is D = {v1, v2, v4}
Performance Analysis
Proof idea: the algorithm produces a DS D such that D
is also a feasible solution of MIkCDSP.
5. Experiment
Experiment
• Case Study #1 - The Jazz Musician Network
• The Jazz Musician Network is a collaboration network of jazz musicians
• P. M. Gleiser and L. Danon, “Community Structure in Jazz”, 2003
• 198 jazz musicians who performed between 1912 and 1940, most of them famous in the 1920s
Experiment – cont’
• Cytoscape – visualization tool
Experiment – cont’
• Compare average degree among all nodes
• MIkCDSP vs. Minimum Dominating Set
Experiment – cont’
• Compare average degree between MIkCDSP and Minimum Dominating Set D
[Figure: chart comparing MIkCDSP and Minimum Dominating Set; values shown: 29%, 21%, 50%, 65%, 13%, 20%]
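The comparison metric itself is simple to compute. A minimal sketch, assuming a neighbor-set dictionary; the function name, the graph, and the two node sets below are hypothetical and are not the jazz network data or the reported results.

```python
def average_degree(adj, nodes):
    """Average degree (in the whole graph) of the given nodes."""
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes)


# Hypothetical double star: hubs 0 and 1 are adjacent, each hub has two leaves.
double_star = {0: {1, 2, 3}, 1: {0, 4, 5}, 2: {0}, 3: {0}, 4: {1}, 5: {1}}

mds = {0, 1}          # a minimum dominating set of this toy graph
ikcds_k0 = {0, 4, 5}  # an inverse 0-core dominating set of the same graph

print(average_degree(double_star, double_star))  # all nodes
print(average_degree(double_star, mds))
print(average_degree(double_star, ikcds_k0))
```

On this toy graph the inverse k-core dominating set has a lower average degree than the minimum dominating set, which is the kind of effect the comparison is meant to expose.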
Experiment – cont’
• Compare average degree for various values of k
Experiment – cont’
• Visualized Result
6. Conclusion
Conclusion
• The hard problem is picking the right sample for an online survey
• We focus on picking a sample that holds minority opinions, i.e., people who have fewer friends
• To the best of our knowledge, this is the first attempt to use online social networks to improve the results of online surveys
• The results show that our algorithm can select nodes that represent minority opinions
Thank You
Questions?