Analysis of Topological Characteristics of Huge Online Social Networking Services
2007.3.9. Friday 10am
Telefonica Barcelona
Yong-Yeol Ahn Seungyeop Han Haewoon Kwak
Sue Moon
Hawoong Jeong
To be presented at WWW 2007
Online Social Networking Services

Portal for people to …
 Stay in touch with friends
 Share photos and personal news
 Find others of common interests
 Establish a forum for discussion
CyWorld

Largest SNS in South Korea
 Started in September 2001
 10 million users in 2004
 16 million users out of 48 million population

Front runner of many features
 Friend (il-chon) relationship
 Guestbook
 Testimonial
 Photos - scraps
 Avatar in cyber home
My CyWorld “Mini-Homepage”
Overview of the Talk







Snowball sampling
CyWorld, MySpace, orkut data sets
Metrics representative of topological characteristics
Related works
Analysis of CyWorld, MySpace, orkut
Summary
Future work
Snowball Sampling (I)

Only feasible sampling method for crawling the web
Select a seed node
Snowball Sampling (II)
Pick all nodes directly connected to the seed node (1st layer)
Snowball Sampling (III)
Pick all nodes directly connected to 1st layer nodes (2nd layer)
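The layer-by-layer procedure on the three snowball slides can be sketched as a breadth-first crawl. This is a minimal illustration, not the crawler used for the data sets: the function name `snowball_sample`, the `max_nodes` cutoff, and the toy graph are all assumptions made for the example.

```python
from collections import deque

def snowball_sample(neighbors, seed, max_nodes):
    """Breadth-first (snowball) sample starting from `seed`.

    `neighbors` maps each node to an iterable of its friends.
    Returns a dict node -> layer index (the seed is layer 0).
    """
    layer = {seed: 0}
    queue = deque([seed])
    while queue and len(layer) < max_nodes:
        node = queue.popleft()
        for friend in neighbors.get(node, ()):
            if friend not in layer:
                layer[friend] = layer[node] + 1
                queue.append(friend)
                if len(layer) >= max_nodes:
                    break  # stop as soon as the sample is full
    return layer

# Toy friendship graph (hypothetical data).
toy = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
sample = snowball_sample(toy, "a", max_nodes=4)
```

Here `sample` records which layer each sampled node was found in, so the 1st and 2nd layers can be read off directly.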
CyWorld Data Sets

Complete snapshot (Nov 2005)
 191 million friend relationships between 12 million users
 Two additional snapshots (Apr/Sep 2005)

Testimonial network
 100,000 users by snowball sampling
MySpace Data Set

Largest in the world
 Began in Jul 2003
 Had 130 million users by Nov 2006

Snowball sampled
 During Sep/Oct 2006
 From a random seed to 100,000 users
 About 23% of users had their friend list hidden
Orkut Data Set

Google SNS
 Began in Sep 2002
 Became official Google service in Jan 2004
 Began as invitation-only; open now
 Has 33 million users

Snowball sampled
 During Jun to Sep 2006
 100,000 users
Metrics of Interest

Degree distribution
 “Power-law”
Small number of nodes have large number of links

Clustering coefficient C(k)
 # of existing links / # of all possible links between a node’s adjacent neighbors
Close to 1: the neighborhood is close to a full mesh

Degree correlation knn
 Degree k vs. mean degree of adjacent neighbors of nodes with degree k
Assortativity: knn rising with k is assortative, falling with k is disassortative
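A small sketch of how C(k) and knn could be computed from an adjacency map, averaging over all nodes of the same degree k. The function name `clustering_and_knn` and the toy graph are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def clustering_and_knn(adj):
    """Return (C_by_k, knn_by_k) for an undirected graph.

    `adj` maps node -> set of neighbors.
    """
    c_sum = defaultdict(float)
    knn_sum = defaultdict(float)
    count = defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue
        # Links that actually exist among this node's neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        possible = k * (k - 1) / 2
        c_sum[k] += links / possible if possible else 0.0
        # Mean degree of this node's neighbors.
        knn_sum[k] += sum(len(adj[n]) for n in nbrs) / k
        count[k] += 1
    c_by_k = {k: c_sum[k] / count[k] for k in count}
    knn_by_k = {k: knn_sum[k] / count[k] for k in count}
    return c_by_k, knn_by_k

# Toy graph (hypothetical): a triangle a-b-c plus a pendant node d on a.
toy_adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
c_by_k, knn_by_k = clustering_and_knn(toy_adj)
```

In this toy graph knn falls as k grows, which is the disassortative pattern.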
Assortative Mixing
“Social” + “nonsocial” networks
M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002)
[Figure: assortative mixing by degree]
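Assortative mixing by degree is commonly summarized by a single coefficient r: the Pearson correlation of the degrees at the two ends of a link. A minimal sketch, assuming an undirected edge list; the function name `degree_assortativity` and the star example are illustrative only.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each link."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every undirected link in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")

# A star is the extreme disassortative case: the hub links only to leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
r_star = degree_assortativity(star)
```

Positive r means high-degree nodes tend to link to other high-degree nodes; negative r means hubs mostly link to low-degree nodes.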
Previous Works

Other networks have been examined
 Milgram’s seminal degree-of-separation experiment
By snail mail
 Movie actors, human sexual contacts, scientific collaborators
All are heavy-tailed and show assortative mixing
Work on online SNS
 Pussokram.com
Internet dating community of 30,000 users
 LiveJournal
1.3 million bloggers with email address/geography
 Both show a super-heavy tail
Questions We Raise
What are the main characteristics of online SNSs?
 How representative is a sample network?
 How does a social network evolve?

CyWorld

Complete data set
 Degree distribution
 Clustering coefficient
 Degree correlation
 Average path length

Historical analysis
 Growth in numbers
 Degree distribution, clustering coefficient, degree correlation
 Average path length over time
Testimonial network
Degree Distribution
Two scaling regions
Figure 1-(a): degree distribution, CCDF
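For reference, a CCDF of a degree sequence can be built as below. This is a sketch under the assumption that the CCDF is defined as P(K ≥ k); some authors use P(K > k) instead. The function name `degree_ccdf` and the toy degrees are hypothetical.

```python
from collections import Counter

def degree_ccdf(degrees):
    """Return sorted (k, P(K >= k)) pairs for a list of node degrees."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf = []
    remaining = n  # number of nodes with degree >= current k
    for k in sorted(counts):
        ccdf.append((k, remaining / n))
        remaining -= counts[k]
    return ccdf

# Toy degree sequence (hypothetical data).
toy_degrees = [1, 1, 1, 2, 2, 3, 10]
ccdf = degree_ccdf(toy_degrees)
```

Plotted on log-log axes, a power-law region of the degree distribution shows up as a straight segment, so two scaling regions would appear as two segments with different slopes.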
Clustering Coefficient Distribution
Degree Correlation
Not assortative
Average Path Length
Path length < 5 for about 90% of node pairs
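On a large network, path lengths are usually estimated by running breadth-first search from a set of source nodes rather than from every node. A minimal sketch, assuming an adjacency map; the names `path_length_histogram` and `fraction_below` and the chain graph are illustrative, not taken from the original analysis.

```python
from collections import Counter, deque

def path_length_histogram(adj, sources):
    """Count shortest-path lengths from each source to every reachable node."""
    hist = Counter()
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Skip the zero-length path from the source to itself.
        hist.update(d for d in dist.values() if d > 0)
    return hist

def fraction_below(hist, limit):
    """Fraction of counted paths that are strictly shorter than `limit`."""
    total = sum(hist.values())
    return sum(c for d, c in hist.items() if d < limit) / total

# Toy chain 0-1-2-3 (hypothetical data), sampled from both ends.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hist = path_length_histogram(chain, sources=[0, 3])
share_short = fraction_below(hist, 3)
```

The same histogram gives the average path length as the count-weighted mean of the distances.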
Historical Analysis
Evolution of Degree Distributions
Two kinds of driving forces
Evolution of Clustering Coefficient
Becoming more dense
Evolution of Degree Correlation
Transition from highly disassortative to slightly assortative mixing: a sign of forming ‘real-world social relationships’?
Evolution of Path Length
Start of densification?
CyWorld Testimonials
# of friends ≤ 100 in CyWorld testimonials
Degree Correlation
Not assortative
Assortative
Sample Networks

Rare opportunity to validate snowball sampling
Degree Distributions
Clustering Coefficients
Degree Correlations
Sample Networks

Degree distribution
 Overestimate exponents

Not fit for clustering coefficient/degree correlation
CyWorld, MySpace, and orkut
Three major online SNSs of the world
 Common traits?

Degree Distributions
Clustering Coefficient
Restriction of orkut’s testimonial network system (# of friends is less than about 1000)
Degree Correlation
Disassortative
Assortative
“Real” Social Network
Figure 8-(c): The degree correlation of two social networks: orkut and MySpace
Summary and Conclusions

Cyworld has two scaling regions in the degree distribution.
 Other measurements (C(k), degree correlation) also support the existence of two regions.
 Boundary between the two regions ~ 100s
 Dunbar’s Law: limit on human social relationships
 Mature online social community
orkut shows a fast-decaying degree distribution and an assortative mixing pattern, similar to the small-degree region of Cyworld.
 Close-knit, real-world-like social network structure.
MySpace shows a heavy tail and a disassortative mixing pattern, similar to the large-degree region of Cyworld.
 Popularity is more important than human interaction.

Future Work

Friendship network topology not representative of activities
 Identify steady core and study its evolution

Explicit vs implicit communities
 Clubs, towns vs cliquish behavior

Growth model
 Existing preferential attachment-based models do not fit
 Forest fire model? Extensions?
BACKUP SLIDES
Signature of Two Scaling Regions
Petter Holme, Christofer R. Edling, and Fredrik Liljeros, Structure and time evolution of an Internet dating community, Social Networks, 26, 155 (2004)