Friends and Neighbors on the Web
Presentation for Web Information Retrieval
Bruno Lepri

Outline
– Objectives
– Data Used
– Small World Graphs
– Predicting Friendship
– Results
– Future Work and Applications
– Conclusions

Objectives
– To devise techniques for mining the Internet in order to predict relationships between individuals
– To show that some pieces of information (e.g. terms on homepages) are better indicators of social connections than others

Information Side Effects
– By-products of data intended for one use, which can be mined to understand tangential and larger-scale phenomena
– Our case: extracting large social networks from individuals' homepages

Data Used
– Text on the user's homepage (co-occurrence of text → common interest)
– Out-links: from the user's homepage to other pages
– In-links: from other pages to the user's homepage
– Mailing lists

Small World Phenomenon
– Real-world social networks are described by the Small World Phenomenon
– Stanley Milgram's experiment ("The Small World Problem", 1967): Six Degrees of Separation

Small World Phenomenon (cont'd)
– Adamic: the World Wide Web is a Small World Graph ("The Small World Web", 1999)
– Our hypothesis (confirmed by the Stanford and MIT personal homepage networks): networks of personal homepages are Small World Graphs

Stanford Graph

MIT Graph

Small World Graph Properties
Watts & Strogatz ("Collective Dynamics of 'Small-World' Networks", 1998):
– The Clustering Coefficient C is much larger than that of a Random Graph with the same number of vertices and the same average number of edges per vertex
– The Characteristic Path Length L is almost as small as L for the corresponding Random Graph

Clustering Coefficient (Watts & Strogatz, 1998)
– If a vertex v has k_v neighbors, then at most k_v * (k_v - 1) directed edges can exist between them
– C_v denotes the fraction of these allowable edges that actually exist
– C is the average of C_v over all v

Clustering Coefficient in Friendship Graphs
– C_v reflects the extent to which the friends of v are also friends of each other
– C measures the cliquishness of a typical friendship circle

Predicting Friendship
– To predict whether one person is a friend of another, we rank all users by their similarity to that person
– Hypothesis: friends are more similar to each other than non-friends are

Similarity Measurement
– Similarity is measured by analyzing text, links, and mailing lists
– To evaluate the likelihood that A is linked to B, we sum the number of items the two users have in common
– Weighting scheme: items shared by only a few users are weighted more than common items

Evaluating the Friendship Prediction Algorithm
To evaluate the algorithm's performance:
– we compute how many friends have a non-zero similarity score
– we see what similarity rank the friends were assigned
Problem: friends can appear to have no items in common (little information about one of the two users, or homepages used to express different interests)

Coverage and Predictive Ability of Data Sources
– The average rank was computed for matches above a threshold chosen so that all four data sources ranked an equal number of users

Do friends have more in common than friends of friends?
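Sketch: similarity scoring and ranking
The Similarity Measurement and evaluation slides can be illustrated with a small Python sketch. The slides say only that rare items weigh more than common ones; the 1 / log(count) weight used here is an assumption, as are the function names and the toy data.

```python
import math
from collections import Counter


def item_weights(profiles):
    """Weight each item by 1 / log(number of users who have it) (assumed scheme)."""
    counts = Counter(item for items in profiles.values() for item in items)
    # An item held by a single user can never be shared, so it gets no weight.
    return {item: 1.0 / math.log(n) for item, n in counts.items() if n > 1}


def similarity(a, b, profiles, weights):
    """Sum the weights of the items that users a and b have in common."""
    shared = profiles[a] & profiles[b]
    return sum(weights[item] for item in shared)


def rank_candidates(user, profiles, weights):
    """Return (other_user, score) pairs with a non-zero score, best first."""
    scores = [
        (other, similarity(user, other, profiles, weights))
        for other in profiles
        if other != user
    ]
    scores = [(other, s) for other, s in scores if s > 0]
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data: user -> set of items (terms, links, mailing lists).
profiles = {
    "alice": {"chess-club", "cs-announce", "hiking"},
    "bob": {"chess-club", "cs-announce"},
    "carol": {"cs-announce"},
    "dave": {"sailing"},
}

weights = item_weights(profiles)
ranking = rank_candidates("alice", profiles, weights)
for other, score in ranking:
    print(f"{other}: {score:.3f}")
```

In this toy data dave shares no item with alice, so he receives no rank at all, which is the coverage problem named on the evaluation slide.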
Individual Item's Predictive Ability
Metric used: the number of linked user pairs associated with an item divided by the total number of possible pairs of users associated with that item
Some interesting findings:
– Shared items unique to a community are at the top, and popular terms are at the bottom, of the MIT and Stanford lists
– Different shared items are at the top of the Stanford and MIT lists (in the MIT list, 5 of the top 10 terms are fraternity names)
– The Stanford and MIT in-link lists are dominated by individual homepages
– The poorly predictive MIT and Stanford mailing lists are very general discussion lists, announcement lists, and social activity lists

Individual Item's Predictive Ability (cont'd)

Future Work
– New data sources: demographic information such as address, year in school, major, …
– Individuals interact with many people regularly but do not link to all of them through web pages (possible solution: obtain social links directly from users)

Applications
– Mining the correlations between groups of people (see the work of Pentland and Eagle)
– Facilitating networking inside a community (see LinkedIn)
– Marketing research: identifying groups interested in a product, and relying on the social network to propagate information about products

Conclusions
– Personal homepages provide a glimpse into the social structure of university communities
– Important: personal homepages reveal not only who knows whom but also the context (e.g. shared hobbies, shared dorm)

Thank You For Your Attention!
Questions?
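Supplementary sketch: an item's predictive ability
The metric from the Individual Item's Predictive Ability slide can be computed as below. This is a minimal sketch under stated assumptions: links are treated as undirected, "possible pairs" is read as all pairs of users who share the item, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def item_link_ratio(users_with_item, links):
    """Fraction of user pairs sharing an item that are also linked."""
    pairs = list(combinations(sorted(users_with_item), 2))
    if not pairs:
        return 0.0  # fewer than two users: there is no pair to evaluate
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in links)
    return linked / len(pairs)


# Hypothetical toy data: undirected links and the users associated with each item.
links = {frozenset(p) for p in [("alice", "bob"), ("bob", "carol")]}
item_users = {
    "chess-club": {"alice", "bob"},
    "cs-announce": {"alice", "bob", "carol", "dave"},
}

ratios = {item: item_link_ratio(users, links) for item, users in item_users.items()}
ranked_items = sorted(ratios, key=ratios.get, reverse=True)
for item in ranked_items:
    print(f"{item}: {ratios[item]:.3f}")
```

In the toy data the small, community-specific item ranks above the general announcement list, in line with the findings reported on the slide.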