You Are What You Link
Lada Adamic, Eytan Adar
WWW 10 – May, 2001

Outline
Graph structures of social networks
• How person to person links on the web create observable social networks
Understanding and predicting links
• Additional online info (text, links, email subscriptions) gives context to social links
• Predict social links even where there is no explicit hyperlink
Understanding communities through links

Julie
Hi, I’m Julie! I’m studying... I like ... My friends are... My favorite links:

Becky
Hey, I’m Becky. I study... I live in ... My favorite books are... Here are some photos...

Becky and Julie aren’t the only ones to link to each other
[Figure: Stanford Social Web]

Graph Structure of Social Networks

Differences in cohesiveness of communities
[Figure: social web graphs, Stanford and MIT]

Links among personal homepages at MIT and Stanford
                                          MIT     Stanford
Users with non-empty WWW directories      2302    7473
Percent with links in either direction    69%     29%
Percent with links in both directions     22%     7%

The number of links/person is uneven
[Figure: number of users vs. number of links to or from users, logarithmic axes; series: given, received, undirected]

Interesting social networks analysis
Largest connected component
• MIT: 86%
• Stanford: 58%
Shortest path from one person to another
• MIT: 6.4 hops
• Stanford: 9.2 hops

Clustering Coefficient
$$C = \frac{\text{\# of links among neighbors}}{\text{max \# of links among neighbors}}$$
Example: $C = \frac{3}{4 \cdot 3 / 2} = \frac{1}{2}$
MIT: 0.22
Stanford: 0.21
70x that of a random graph!
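
The clustering coefficient on the slide above can be computed per person. What follows is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets. The function names (`build_undirected`, `local_clustering`) and the tiny example graph are hypothetical and not from the talk, and returning 0.0 for people with fewer than two neighbors is an assumed convention.

```python
from itertools import combinations


def build_undirected(edges):
    """Build an adjacency map {node: set(neighbors)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """C = (# of links among neighbors) / (max # of links among neighbors)."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        # Assumed convention: the ratio is undefined here, so report 0.0.
        return 0.0
    # Count neighbor pairs that are themselves linked.
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    # A node with k neighbors can have at most k*(k-1)/2 links among them.
    return links / (k * (k - 1) / 2)


# Hypothetical graph mirroring the slide's example: 4 neighbors, 3 links among them.
edges = [
    ("me", "a"), ("me", "b"), ("me", "c"), ("me", "d"),
    ("a", "b"), ("b", "c"), ("c", "d"),
]
adj = build_undirected(edges)
example_c = local_clustering(adj, "me")
print(example_c)
```

With four neighbors and three links among them, the sketch reproduces the slide's arithmetic, 3 / (4*3/2) = 1/2. Averaging `local_clustering` over all people would give one graph-level figure, though the talk does not say exactly how its reported values were aggregated.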
Understanding and Predicting Links

Information available online
[Figure labels: email list, common text, outlink]

How information was collected
• Users’ web directories were crawled
• Outlinks were extracted
• Text was passed through ThingFinder to extract things like people, places, companies
• Mailing list subscriptions were obtained from the mailing list servers (95% public for Stanford, internal to MIT)
• Inlinks were obtained by querying search engines (equivalent URLs): Google for Stanford, AltaVista for MIT

Comparison with traditional means of gathering information on social networks
Advantages
• Easily and automatically gathered (no phone, live, or mail surveys)
• Data sets are orders of magnitude larger
• Information is already public
Disadvantages
• Data sets are incomplete, i.e., you don’t get to ask the questions, just take down the answers

Friends have more in common
[Figure: homepages with speech bubbles]
I love Prince!
Prince is the coolest!
I play basketball
I live in Terra House
Find me in Terra.
I live in Kimball
I play volleyball
Wanna play volleyball?
I play a lot of computer games

user 1: kpsounis (Konstantinos Psounis)
user 2: stoumpis (Stavros Toumpis)

Things in common
CITIES: Escondido, Cambridge, Athens
NOUN GROUPS: birth date, undergraduate studies, student association
MISC: general lyceum, NTUA, Ph.D., electrical engineering, computer science, TOEFL, computer
COUNTRIES: Greece

Out links in common
http://www.stanford.edu/group/hellas    Hellenic association
http://www.kathimerini.gr               Athens news
http://ee.stanford.edu                  Electrical Engineering Department
http://www.ntua.gr                      National Technical University of Athens

In links in common
http://www.stanford.edu/~dkarali        Dora Karali's homepage
http://171.64.54.173/filarakia.html     Dimitrios Vamvatsikos friends list

Mailing lists in common
greek-sports    Soccer/Basketball mailing lists for members of Hellas
hellenic        Hellenic association members
ee261-list      Fourier transform class list
ee376b          Information theory class list

http://negotiation.parc.xerox.com/web10/

So can we guess who’s friends with whom from the information gathered online?
• Choose person A
• Rank everybody else according to their likeness to that person
• See how “friends” (people who are linked to A) were ranked
• Evaluate for text, outlinks, inlinks, mailing lists separately

$$\mathrm{likeness}(A, B) = \sum_{\text{shared items}} \frac{1}{\log[\mathrm{frequency}(\text{shared item})]}$$

Example, top matches for a particular user
annaken: Clifford Hsiang Chao
Linked (friends)    Likeness score    Person
NO                  8.25              Eric Liao
YES                 3.96              John Vestal
NO                  3.27              Desiree Ong
YES                 2.82              Stanley Lin
NO                  2.66              Daniel Chai
NO                  2.55              Wei Hsu
YES                 2.42              David Lee
NO                  2.41              Byung Lee

Coverage in ability to predict user-user links, i.e.
friends had at least one item in common
Method           Pairs ranked (Stanford)    Pairs ranked (MIT)
inlinks          24%                        17%
outlinks         35%                        53%
mailing lists    53%                        41%
text             53%                        64%

Performance of friend matching algorithm
The most common ranking for a friend is #1

[Figure: Stanford, frequency vs. rank (1 to 10); series: in link, out link, mailing list, thing]
Stanford
method           average rank
inlinks          6.0
outlinks         14.2
mailing lists    11.1
text             23.6

[Figure: MIT, frequency vs. rank (1 to 10); series: in link, out link, mailing list, thing]
MIT
method           average rank
inlinks          9.3
outlinks         18.0
mailing lists    22.0
text             31.6

We don’t have that much in common with our friend’s friend’s friends
[Figure: Stanford]

Understanding Communities Through Links

What are good and bad link predictors?
• What you would expect…
• Very unique things are only relevant to individuals
• Very general things (“MIT” “Stanford”) are relevant to everyone
• Some top 10 lists…

Text Based Predictors
MIT Top Things
  Union Chicana (student group)
  Phi Beta Epsilon (fraternity)
  Bhangra (traditional dance, practiced within a club at MIT)
  neurosci (appears to be the journal Neuroscience)
  Phi Sigma Kappa (fraternity)
  PBE (fraternity)
  Chi Phi (fraternity)
  Alpha Chi Omega (sorority)
  Stuyvesant High School
  Russian House (living group)
Stanford Top Things
  NTUA (National Technical University of Athens)
  Project Aiyme (mentoring Asian American 8th graders)
  pearl tea (popular drink among members of a sorority)
  clarpic (section of marching band)
  KDPhi (sorority)
  technology systems (computer networking services)
  UCAA (Undergraduate Asian American Association)
  infectious diseases (research interest)
  viruses (research interest)
  home church (religious phrase)
• Bad phrases: general organizations, cities (Oakland, Cambridge, etc.), departments (CS)

Out-link Based Predictors
[Table: two columns, MIT Top Out-links and Stanford Top Out-links; entries listed in extraction order]
  MIT Campus Crusade for Christ*
  alpha Kappa Delta Phi (Sorority)*
  The Church of Latter Day Saints
  National Technical University Athens
  The Review of Particle Physics
  Ackerly Lab (biology)*
  New House 4 (dorm floor, home page)*
  Hellenic Association*
  MIT Pagan Student Group*
  Iranian Cultural Association*
  Web Communication Services*
  Mendicants (a cappella group)*
  Tzalmir (role playing game)*
  Phi_Kappa_Psi (fraternity)*
  Russian house (living group)
  comedy team *
  Magnetic Resonance Systems Research Lab*
  Sigma Chi (fraternity)*
  Applications assistance group*
  La Unión Chicana por Aztlán
  ITSS instructional programs*
• Worst ranked sites are search engines and portals (AltaVista, Lycos, Yahoo, etc.), and top-level homepages such as www.mit.edu and www.stanford.edu.

In-link Based Predictors
• The top predictors are almost exclusively individual home pages pointing to lists of friends
• Poor predictors: long lists (all homepages, department listings)

Mailing List Based Predictors
MIT Top Mailing Lists
  Summer social events for residents of specific dorm floor
  Religious group
  Religious group
  Religious group
  Intramural sports team from a specific dorm
  Summer social events for residents of specific dorm floor
  Religious a cappella group
  Intramural sports team from a specific dorm
  “…discussion of MIT life and administration.”
  Religious group
Stanford Top Mailing Lists
  Kairos97 (dorm)
  mendicant-members (a cappella group)
  Cedro96 (dorm summer mailing list)
  first-years (first year economics doctoral students)
  local-mendicant-alumni (local a cappella group alumni)
  john-15v13 (Fellowship of Christ class of 1999)
  stanford-hungarians (Hungarian students)
  serra95-96 (dorm)
  metricom-users (network services employees who use metricom)
  science-bus (science education program organized by engineering students)
• Bad lists: general announcement lists at MIT, non-housing-based activities (theater), job lists

Future Work
• Use other pieces of available information
  • demographic information (where people live, department, year, etc.)
  • combine information
• Label structures (Flake et al.
2000)
  • Given structures determined by graph algorithms
  • Label them using extracted information

Summary
• Homepage graph structure varies depending on community
• Possible to predict (to some degree) where links will exist
• Good predictors seem unique to communities
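
The friend-matching procedure from the talk (choose person A, rank everybody else by likeness, see where A's friends land) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the natural logarithm is assumed (the slides only say "log"), item frequency is taken to be the number of users who have the item, and all function names, user names, and sample items are hypothetical.

```python
import math
from collections import Counter


def item_frequencies(items_by_user):
    """Count how many users have each item (assumed meaning of 'frequency')."""
    freq = Counter()
    for items in items_by_user.values():
        freq.update(set(items))
    return freq


def likeness(a, b, items_by_user, freq):
    """Sum of 1 / log(frequency) over the items A and B share."""
    shared = set(items_by_user[a]) & set(items_by_user[b])
    # A shared item belongs to at least two users, so log(freq) > 0.
    return sum(1.0 / math.log(freq[item]) for item in shared)


def rank_candidates(a, items_by_user, freq):
    """Rank everyone else by likeness to A; pairs with no shared item are unranked."""
    scored = [(b, likeness(a, b, items_by_user, freq))
              for b in items_by_user if b != a]
    ranked = [(b, s) for b, s in scored if s > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def friend_ranks(a, friends, items_by_user, freq):
    """Return the 1-based rank of each friend of A that could be ranked."""
    order = [b for b, _ in rank_candidates(a, items_by_user, freq)]
    return sorted(order.index(f) + 1 for f in friends if f in order)


# Hypothetical data: one item type (e.g. mailing lists), evaluated on its own.
items_by_user = {
    "alice": {"stanford", "hellas", "ee261-list"},
    "bob": {"stanford", "hellas", "ee261-list"},
    "carol": {"stanford", "ee261-list"},
    "dave": {"stanford"},
}
freq = item_frequencies(items_by_user)
ranking = rank_candidates("alice", items_by_user, freq)
ranks = friend_ranks("alice", {"carol"}, items_by_user, freq)
print(ranking, ranks)
```

Rare items dominate the score: the item shared by only two users contributes 1/log 2, while the item everyone has contributes only 1/log 4. That matches the talk's observation that very general things are poor predictors. Running the same functions once per information source (text, outlinks, inlinks, mailing lists) corresponds to the separate evaluation described in the slides.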