Mining Social Ties Beyond Homophily Hongwei Liang* Ke Wang* Feida Zhu # * Simon # Singapore 16-05-18 Fraser University, Canada Management University, Singapore ICDE2016 Helsinki, Finland 1 Outline Introduction & Motivation Problem Formulation Solution Evaluation Conclusion & Future Work 16-05-18 ICDE2016 Helsinki, Finland 2 Integrating graphs and demographic data 5 ID SEX RACE LOCATOIN 1 F Asian US 2 F Latino US … … … … (a) Graph topology (b) User profile integration Graph data and demographic data are everywhere But the two aspects of data maybe incomplete within single social network More comprehensive analysis can be done by integrating them 16-05-18 ICDE2016 Helsinki, Finland 3 Facebook Dating App Example All except black women preferred white men, while all men except Asians preferred Asian women. 16-05-18 ICDE2016 Helsinki, Finland 4 Social Ties (Group Relationships) Form: 1 2 3 4 5 6 7 F Asian US M 8 9 R1 : (Sex: M) 10 11 12 13 Latino White Canada Finland 14 (Sex: F, Race: Asian) conf = 7 /14; supp = 7/15 R2 : (Sex: M, Race: Asian) (Sex: F, Race: Asian) conf = 0; supp = 0 16-05-18 ICDE2016 Helsinki, Finland 5 Homophily in Social Network Homophily principle: love of the same Homophily effect is well-known and is often “dominant” R3 : (Sex: F, Location: US) (Sex: M, Location: US) conf = 4/6; supp = 4/15 Homophily captures “primary” bond Literature largely focuses on applications based on homophily e.g.: community detection, link prediction, friend/product recommendation 16-05-18 ICDE2016 Helsinki, Finland 6 Beyond Homophily Unearth the treasures beyond homophily? Assume “Location” is homophilic in a dating network R4: (Sex: F, Location: US) standard confidence? (Sex: M, Location: Canada) VS conf = 2/6, not interesting US 1 2 Canada 3 4 nhp = 2/ (6 – 4) = 100%, interesting ! support of the homophily effect is 4/15 (Sex: F, Location: US) (Sex: M, Location: US) Finland 5 new metric that remove homophily? 6 7 F Reads as: if a female from US does NOT want her partner to be from US, there is a high chance that she prefers a partner from Canada. M 8 16-05-18 9 10 11 12 13 14 ICDE2016 Helsinki, Finland 7 Potential Applications Target advertising Homophily pattern: (JOB : Lawyer, PRODUCT : Stocks) → (PRODUCT : Stocks) Non-Homphily pattern: (JOB : Lawyer, PRODUCT : Stocks) → (PRODUCT : Bond) Helpful in link predicting, beyond homphily Friend/dating Recommendation User behaviors/habits analysis Profile completion Criminal investigation 16-05-18 ICDE2016 Helsinki, Finland 8 Non-homophily Preference Non-homophily preference: a probability of links going to a node described by 𝑟, given 𝑙 ∧ 𝑤 and exclude the homophily effect 𝑤 𝑛ℎ𝑝 𝑙 𝑤 𝑠𝑢𝑝𝑝(𝑙 𝑟) 𝑟 = 𝑠𝑢𝑝𝑝 𝑙 ∧ 𝑤 − 𝑠𝑢𝑝𝑝(ℎ𝑜𝑚𝑜𝑝ℎ𝑖𝑙𝑦 𝑒𝑓𝑓𝑒𝑐𝑡) Example: (Sex: F, Location: US) (Sex: M, Location: Canada) (Sex: F, Location: US) (Sex: M, Location: US) Captures “secondary bonds” beyond “primary bonds” nhp does not have the regular anti-monotonicity Adding an attribute on the RHS may increase supp(homophily effect) 16-05-18 ICDE2016 Helsinki, Finland 9 Outline Introduction & Motivation Problem Formulation Solution Evaluation Conclusion & Future Work 16-05-18 ICDE2016 Helsinki, Finland 10 Problem - Mining Top-k GRs Given an multi-dimensional information network the setting of homophily for attributes a supp threshold, a nhp threshold and an integer k Goal discover the top-k GRs, ranked by nph followed by supp, and each of them satisfies the supp and nhp thresholds 16-05-18 ICDE2016 Helsinki, Finland 11 Challenges Storage Space = , if single table Computation Exponential order of attributes value combination nhp does not have anti-monotonicity If only supp pruning: small threshold, and post-processing is needed How to deal with? Storage: favourable data modeling Computation: ingenious enumeration with efficient pruning strategies 16-05-18 ICDE2016 Helsinki, Finland 12 Outline Introduction & Motivation Problem Formulation Solution Evaluation Conclusion & Future Work 16-05-18 ICDE2016 Helsinki, Finland 13 for mining t op-k GRs by leveraging the enumeration and pruning st rategies presented in Section 4, but representing the graph data using a more efficient data struct ure. We shall st ore the node and edge information separately. Fig. 3 shows t he data st ruct ure for our running example. LArray cont ains the records for individuals could occur Compact 3-table data presentation: combines profilet hat data and in the LHS of GRs and RArray cont ains t he records for individuals that could occur in the RHS of GRs. Out is t he graph topology together out -degree of a record and I nd is t he st art ing position of the outgoing edges in EArray. EArray cont ains one record No redundant records, data for each edgeare andlinked Ptr is tby he pointers point er to the record for t he destination node in RArray. This data st ruct ure has the size Space complexity V (# Attr V + 2) + E (# Attr E + 1) + V # Attr V , which is more compact t han t he single t able at t he beginning of Section 4 by eliminating t he term E 2 # Attr V . We assume t hat this struct ure is held in memory. We use these tables t o part ition t he dat a for counting the support s for GRs. For example, the first row in LArray represents t he record 1 for LHS, which connects to the dest inat ion records 2, 4 and 5 for RHS, found by the point ers Ptr kept in t he ent ries [I nd, I nd + Out 1] of EArray. Our algorit hm enumerates each attribut e subset L W R following t he subset-first dept h-first order as described in t he previous sect ion. To comput e supp and di vConf of t he GRs at the node for L W R, it part it ions the data using t he at- Data Model 16-05-18 ICDE2016 Helsinki, Finland 14 SFDF Enumeration Subset-First Depth-First (SFDF) Enumeration Subset-First: some kind of reverse order, all parts of supp, including that for homophily effect, are available when computing nhp Depth-First: only materialize the current branch 16-05-18 ICDE2016 Helsinki, Finland 15 Dynamic Ordering How to make nhp anti-monotone? Dynamically order the homophily attributes, on the basis of whether the same homophily attributes were enumerated in the LHS dynamic ordering assume both A and B are homophily attributes for the GRs with same is anti-monotone, with the help of dynamic ordering 16-05-18 ICDE2016 Helsinki, Finland 16 Multiple Pruning Strategies supp based pruning nhp based pruning enabled with the help of dynamic ordering Top-k pruning tights up the nph threshold The mining task finishes in one phase 16-05-18 ICDE2016 Helsinki, Finland 17 Data partition and Pruning Partition attributes while computing supp and nhp? Recursive partition with linear CountingSort At each node, GR representing the homophily effect is generated first, e.g. is generated earlier than 8B b1 10b B 21 b21b …... 11b 1b2 1A b1b21a12 b1b21a3 16-05-18 ICDE2016 Helsinki, Finland b2 b3 2 1 b21b 3 2 b21b 1 3 18 Outline Introduction & Motivation Problem Formulation Solution Evaluation Conclusion & Future Work 16-05-18 ICDE2016 Helsinki, Finland 19 Experimental Evaluation Implementation: C++ Platform: CentOS 6.4 with Intel 8-core processors 2.53GHz and 12G of RAM Real Datasets Pokec Social Network data1 o 1,436,515 users and 21,078,140 edges, 6 node attributes DBLP co-authorship data2 o 28,702 authors and 66,832 directed edges, 2 node and 1 edge attributes Evaluation Measures Interestingness Efficiency (runtime) 1http://snap.stanford.edu/data/soc-pokec.html 2[Zhao 16-05-18 et al. SIGMOD 11] ICDE2016 Helsinki, Finland 20 Interestingness: Case Study Case study A top GR from Pockec data derives: This pair suggests a big difference in the preference of opposite sex partners by males and females when looking for sexual partners A top GR from DBLP data Authors in the DB area often collaborate with those in the DM area when collaborating with those not in their own area 16-05-18 ICDE2016 Helsinki, Finland 21 Efficiency Study: Pokec Data 600 GRMiner(k) GRMiner BL2 BL1 A+B+C+D A+B+C A+B A 550 500 300 A: supp based pruning 250 B: compact 3-table data storage 200 C: nhp based pruning 450 Time (sec) 400 350 300 150 D: top-k pruning 250 100 0 200 20 40 60 80 100 minNhp (%) 150 3000 100 0 10 1 2 10 3 10 Time (sec) Time (sec) 10 2500 250 200 150 100 0 Default Parameters Setting 4 10 20 10000 40 60 minNhp (%) minSupp = 50 (absolute value) 2000 1500 minNhp = 50% 1000 500 k = 100 100 80 100 1 k 0 4 6 8 10 12 Dimension 16-05-18 ICDE2016 Helsinki, Finland 22 Outline Introduction & Motivation Problem Formulation Solution Evaluation Conclusion & Future Work 16-05-18 ICDE2016 Helsinki, Finland 23 Conclusion, Extensions and Future Work Conclusion Mining social ties beyond homophily, many potential applications Compact data presentation Novel enumeration with multiple pruning strategies Interestingness and efficiency study on real data Extensions and future work Alternative metrics other than nhp, such as lift, laplace, gain, etc Deal with unstructured data Predictive model 16-05-18 ICDE2016 Helsinki, Finland 24 Q&A? Thanks ! 16-05-18 ICDE2016 Helsinki, Finland 25
© Copyright 2026 Paperzz