Linked. Current research directions on social networks

Mikołaj Morzy, Politechnika Poznańska
The Oracle of Bacon

the distance from Kevin Bacon, counted in shared movies
http://oracleofbacon.org/
Bacon number    # people
0               1
1               2512
2               263345
3               846332
4               207260
5               15601
6               1444
7               176
8               29
average Bacon number: 2.981
number of actors: 1 336 700
average number of movies: 27
but 41% of actors played in fewer than 10 movies
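The average Bacon number is a weighted mean of the distances in the distribution above. A minimal sketch of that arithmetic follows; the names `bacon_counts` and `average_distance` are illustrative and do not come from the Oracle of Bacon.

```python
# Bacon number -> number of actors, copied from the distribution above.
bacon_counts = {0: 1, 1: 2512, 2: 263345, 3: 846332, 4: 207260,
                5: 15601, 6: 1444, 7: 176, 8: 29}


def average_distance(counts):
    """Mean distance, weighted by the number of people at each distance."""
    total_people = sum(counts.values())
    total_distance = sum(d * c for d, c in counts.items())
    return total_distance / total_people


total_actors = sum(bacon_counts.values())
average_bn = average_distance(bacon_counts)
print(total_actors, round(average_bn, 3))
```

The same function would work for any distance histogram of this shape, for example a table of collaboration distances between authors.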
Kevin Bacon is not an exception

Analysis of the IMDb database reveals that
- Kevin Bacon played with 1800 other actors in 72 movies
- David Carradine played with 4000 actors in 229 movies
- Robert Mitchum played with 2900 actors in 135 movies



average distance to all other actors: R. Steiger (2.53), D. Pleasence (2.54), M. Sheen, Ch. Lee, R. Mitchum, Ch. Heston (2.57)
Kevin Bacon is 876th
M. Blanc (1035), T. Byron (1563), P. North (1616), J. Silvera (968), R. Jeremy (1275)
Scale-free networks

The most popular form of social networks
$P(k) \sim k^{-\gamma}$

node degree distribution

immune to random failures of nodes (up to 80%)
low clustering coefficient
practically constant network diameter
small world phenomenon
$2 < \gamma < 3$
$d \sim \ln \ln n$
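To make the two formulas concrete, here is a small sketch that normalises a power-law degree distribution and evaluates ln ln n for a few network sizes. The exponent 2.5, the cut-off `k_max` and the function names are assumptions chosen for illustration only.

```python
import math


def power_law_pmf(gamma, k_max):
    """Normalised P(k) proportional to k**(-gamma) for k = 1..k_max."""
    weights = [k ** -gamma for k in range(1, k_max + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


def tail_probability(pmf, k_min):
    """P(degree >= k_min); pmf[0] is the probability of degree 1."""
    return sum(pmf[k_min - 1:])


pmf = power_law_pmf(gamma=2.5, k_max=10_000)
for k_min in (10, 100, 1000):
    # Hubs are rare, but far from impossible.
    print(k_min, tail_probability(pmf, k_min))

for n in (10**3, 10**6, 10**9):
    # ln ln n grows so slowly that the diameter looks almost constant.
    print(n, math.log(math.log(n)))
```

Growing the network from a thousand to a billion nodes moves ln ln n only from about 1.9 to about 3.0, which is what "practically constant" means here.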
Rethinking the basics



We are social creatures
We live in small social groups
We communicate



why do we talk?
what do we talk about?
who do we talk to?
Why do we talk?

to survive


to form social bonds



over 80% of tweets mentioning brands are questions
conversation after status update
to help others
to manage how others perceive us

95% of conversations are positive
What do we talk about?

about people
who is doing what with whom, whether it is a good or a bad thing,
who is in, who is out, and why?

about feelings, not facts

7500 articles from NYT, 6 months
the most emailed stories had content that aroused emotions

J. Berger, K. Milkman, „Arousal increases social transmission of information”, 2011


R. Dunbar
about things around us
We talk of brands in passing



orange products sell best close to Halloween
22% of products cued by the surrounding (vs 4%)
19% of products that are visible (vs 2%)
Brand talk distribution
[Pie chart: share of brand talk by channel (face to face, on the phone, online, rest); slice labels: 71%, 17%, 9%, 3%]
Who do we talk to?

In fact, we have a very small set of people we talk to




7-15 people we talk to on a regular basis
5-10 people account for 80% of our conversations
on Facebook we talk with 4 people per week, 6 people per month
80% of phone calls go to only 4 people
[Chart: who we talk to, by relationship (spouse/partner, family, best friend, acquaintance, stranger); labels: 20%, 27%, 25%, 50%, 10%]
Social circles
circle                        contact frequency
5 people, inner circle        at least once a …
15 people, sympathy group     once a month
50 people, communication      once every few months
150 people, social network    at least once a year
500 people, weak ties         less than once a year
The Dunbar number

Our core social network consists of 150 people





Neolithic villages splitting
Roman army cohorts
Wikipedia administrators limit
online games lose cohesion
number of sick days in business increases
Ch. Allen, „The Dunbar number as a limit to group sizes”, blog Life With Alacrity
How do social circles appear?
development
We all have 4-6 social groups organized
along the most important dimensions of our lives
family
experience
hobby
A.-L. Barabási, „Linked: How Everything Is Connected to Everything Else and What It Means”, Plume, 2003
Unique sets of people
Asia
Bob
Canada
Wayne
I Liceum Ogólnokształcące
im. Karola Marcinkowskiego
Me
NHL
N. Christakis, J. Fowler, „Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives”, Little, Brown, 2009, p. 13
Myth of „Six degrees of separation”




1929, Frigyes Karinthy, "Láncszemek"
1967, Stanley Milgram, distance between US citizens
2001, Duncan Watts, email distance between 150 countries
2007, Jure Leskovec, Microsoft Messenger’s 240 million users
[Map: Omaha, Wichita, Boston and Sharon, MA]
Rules of the game
1. Add your name at the bottom of the list
2. Detach the postcard and mail it to Harvard University
3. If you know the recipient personally, send the package
4. Otherwise, do not try to contact the recipient, but instead think of a personal acquaintance who could know the recipient

the first letter arrived after three days and two jumps

42 out of 160 letters reached their recipients

median number of connections was 5.5

Milgram never used the term "six degrees of separation"
Everybody on this planet is separated by only six other people. Six degrees of separation. Between us and everybody else on
this planet. The president of the USA. A gondolier in Venice... It's not just the big names. It's anyone. A native in the rain
forest. A Tierra del Fuegan. An Eskimo. I am bound to everyone on this planet by a trail of six people. It's a profound thought.
How every person is the new door opening up into other worlds.
John Guare, "Six degrees of separation"
Real world relationships

We seek strong ties (L. Spencer, R. Pahl)








associates
useful
fun friends
favor friends
helpmates
comforters
confidants
soulmates
Mark S. Granovetter, "The Strength of Weak Ties"
American Journal of Sociology 78 (6) (May, 1973), pp. 1360-1380
Social networks focus on weak ties: they provide artificial bonding and remove nuance and subtlety from tie management
The myth of influencers

Malcolm Gladwell of The New Yorker published the book „The Tipping Point”, which contains a simple test of how well connected a person is
- 248 randomly chosen names from the Manhattan phone book
- 1 point for each recognized name

group         average    range
students      21         2 – 95
professors    39         9 – 118
In reality…

Influence on Twitter




simple experiment on 75 million tweets
fewer than 100 had more than 1000 retweets
fewer than 10 had more than 10 000 retweets
more than 98% of attempted cascades failed


„Measuring user influence in Twitter: The million follower fallacy”
Conditions for a real viral cascade


low adoption thresholds
many ordinary people
Process of adoption
often mistaken for influentials
innovative hubs
follower hubs
D. Watts, „Everything is Obvious: Once You Know the Answer”, Crown Business, 2011
mass adoption
What influences us?

what is around




what was before us


eating in a group of 2 makes you eat 35% more
eating in a group of 4 makes you eat 75% more
friends influence whether you smoke, study, are happy, buy cars, …
D. Watts, Music Lab experiment
what people in our group are doing

it takes 170 ms to identify someone from our group…
Unfortunately, not only ideas spread

email



bugs



Claire Swire writing to Bradley Chait
7 million readers in a few days
March 1999, Melissa worm
100 000 computers in 300 organizations within a weekend
Patient Zero




Gaetan Dugas (1953-84), steward on Air Canada flights
40 out of 248 people diagnosed with AIDS by April 1982 had had sex with Gaetan
ca. 250 partners per year
ca. 2500 partners during lifetime
Traditional model of infectious disease

Phases of infection



susceptible (S): can be infected
infectious (I): actively infects others
removed (R): either cured or dead
it was assumed that contacts between the S, I and R groups occur at random
transitions: S → I (infection), I → R (recovery), R → S (loss of immunity)
W. Kermack, A.G. McKendrick
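The compartment model above, with its assumption of random contacts, can be sketched as a simple Euler integration. The rate names `beta`, `gamma` and `xi`, the step size and the population numbers are all assumptions made for illustration; the slides give no parameter values.

```python
def simulate_sirs(s0, i0, r0, beta, gamma, xi=0.0, dt=0.01, steps=10_000):
    """Euler integration of a well-mixed S -> I -> R (-> S) model.

    beta  : infection rate (assumed name)
    gamma : recovery rate (assumed name)
    xi    : rate of loss of immunity; xi = 0 means immunity is never lost
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        # Random mixing: new infections are proportional to S * I / N.
        new_infections = beta * s * i / n * dt
        recoveries = gamma * i * dt
        lost_immunity = xi * r * dt
        s += lost_immunity - new_infections
        i += new_infections - recoveries
        r += recoveries - lost_immunity
        history.append((s, i, r))
    return history


history = simulate_sirs(s0=990, i0=10, r0=0, beta=0.5, gamma=0.1)
peak_infected = max(i for _, i, _ in history)
print(round(peak_infected))
```

Because every person leaving one compartment enters another, S + I + R stays constant throughout the run, which is a convenient sanity check on any implementation.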
Logistic growth model

slow growth phase
explosive phase
burn-out phase
reproduction rate: how many new infections are created by a single infected individual
[Plot: # infected over time, showing the slow growth, explosion and burn-out phases]
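For reference, the standard logistic growth equation and its solution are given below. The symbols r (growth rate), N (population size) and I_0 (initial number of infected) are assumptions, since the slide names no parameters.

```latex
\frac{dI}{dt} = r\, I \left(1 - \frac{I}{N}\right),
\qquad
I(t) = \frac{N}{1 + \frac{N - I_0}{I_0}\, e^{-r t}}
```

While I is much smaller than N the growth is nearly exponential but starts from a small base (slow growth), it is fastest near I = N/2 (explosion), and it flattens out as I approaches N (burn-out).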
Paul Erdös (1913-1996)


authored the largest number of scientific
mathematical papers (ca. 1500)
owner of a single suitcase



„hello my friend, my brain is open"
on coffee
- „a mathematician is a machine for turning coffee into theorems"
on amphetamine (after winning a $500 bet)
- „before, when I looked at a blank sheet of paper I saw hundreds of ideas; now all I see is a blank sheet of paper”
Erdös number

Erdös number is the distance measure computed on the basis of coauthorship of scientific papers
Erdös number    # people
0               1
1               504
2               6593
3               33605
4               83642
5               87760
6               40014
7               11591
8               3146
9               819
10              244
11              68
12              23
MathSciNet contains 1.9 million publications written by 400 000 people
50 000 authors have an infinite Erdös number
84 000 authors have never co-authored a paper
source: http://www.oakland.edu/enp/trivia/
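A distance of this kind can be computed with a breadth-first search over the co-authorship graph. The sketch below uses a toy graph: only the Erdös-Rényi link is taken from the slides, while the authors A, B, C and D and the function name `collaboration_distances` are hypothetical.

```python
from collections import deque


def collaboration_distances(coauthors, source):
    """Breadth-first search over an undirected co-authorship graph.

    coauthors maps each author to the set of people they have written a
    paper with. Authors missing from the result are unreachable, i.e.
    their distance to source is infinite.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in coauthors.get(author, ()):
            if other not in distance:
                distance[other] = distance[author] + 1
                queue.append(other)
    return distance


# Toy data: A, B, C and D are hypothetical authors.
toy_graph = {
    "Erdös": {"Rényi", "A"},
    "Rényi": {"Erdös", "B"},
    "A": {"Erdös"},
    "B": {"Rényi", "C"},
    "C": {"B"},
    "D": set(),  # never co-authored a paper
}
erdos_numbers = collaboration_distances(toy_graph, "Erdös")
print(erdos_numbers)
```

Running the same search from Kevin Bacon over a graph of shared movie casts would yield Bacon numbers instead.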
Alfréd Rényi (1921-1970)



great Hungarian mathematician, founder of the Mathematical Research Institute of the Hungarian Academy of Sciences
co-authored 32 papers with Paul Erdös
on mathematics
- „when I’m miserable I do mathematics to become happy; when I’m happy I do mathematics to remain happy"
Two models for random graph creation

G(n, M)


a graph is chosen uniformly at random from the collection of all graphs that have n
nodes and M edges
G(n, p)

a graph is constructed by randomly linking nodes with edges, where p is the probability of an edge creation (independent of other edges), so the expected number of edges is $\binom{n}{2} p$
the probability of obtaining a particular graph with n nodes and M edges is $p^{M} (1 - p)^{\binom{n}{2} - M}$
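Both constructions can be sketched in a few lines. The function names, the seed and the values of n and p below are arbitrary choices made for illustration.

```python
import random
from itertools import combinations
from math import comb


def gnp_random_graph(n, p, rng):
    """G(n, p): keep each of the comb(n, 2) possible edges with probability p."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def gnm_random_graph(n, m, rng):
    """G(n, M): pick M distinct edges uniformly from all possible edges."""
    return rng.sample(list(combinations(range(n), 2)), m)


rng = random.Random(42)  # fixed seed, an arbitrary choice
n, p = 200, 0.05
edges = gnp_random_graph(n, p, rng)
expected_edges = comb(n, 2) * p
print(len(edges), expected_edges)
```

In G(n, p) the number of edges varies from sample to sample around the expected value, whereas G(n, M) fixes it exactly.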
Random graph model

a universal model for random graph creation

despite random edge creation, each node has approximately the same degree, governed by the Poisson distribution
$f(k; \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$
consequences

everyone has approximately the same number of friends
each company cooperates with a similar number of partners
each website records a similar number of visits

… clearly not true
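Evaluating the Poisson formula shows why the consequences above follow: degrees far from the mean are essentially impossible. A short sketch, where the average degree of 10 is an assumed value:

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """f(k; lambda) = lambda**k * exp(-lambda) / k!"""
    return lam ** k * exp(-lam) / factorial(k)


lam = 10  # assumed average degree
for k in (0, 5, 10, 20, 50, 100):
    print(k, poisson_pmf(k, lam))
```

A node with ten times the average degree is practically ruled out under this model, so a random graph of this kind has no hubs.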


source: www.boost.org
The birth of the connected component

percolation happens in random networks
$np < 1$: no components larger than $O(\log n)$
$np = 1$: largest component of size of order $n^{2/3}$
$np = \mathrm{const} > 1$: a single giant connected component
$p < \frac{(1 - \varepsilon) \ln n}{n}$: disconnected graph
$p > \frac{(1 + \varepsilon) \ln n}{n}$: connected graph
[Plot over the average node degree, annotated: large deviation during the formation of a single connected component]
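The thresholds above can be observed in a small simulation that samples G(n, p) and tracks the largest connected component with a union-find structure. This is only a sketch: n = 1000, the seed and the tested values of the average degree are arbitrary assumptions.

```python
import random
from itertools import combinations


def largest_component_size(n, p, rng):
    """Size of the largest connected component of one G(n, p) sample."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(u)] = find(v)

    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


rng = random.Random(7)  # fixed seed, an arbitrary choice
n = 1000
for c in (0.5, 1.0, 2.0, 10.0):  # c = n * p, the average node degree
    print(c, largest_component_size(n, c / n, rng))
```

Below an average degree of 1 the largest component stays tiny; above it a single component absorbs most of the nodes, and once the average degree exceeds ln n (about 6.9 for n = 1000) isolated nodes become unlikely.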
Watts-Strogatz model

nodes placed in a circle, each node is connected to its 4 nearest neighbours
- clustering coefficient cc = 0.5 (in a random network the clustering coefficient depends on the number of nodes)
- random nodes are further connected with an edge with a very small probability
- added edges hardly affect the clustering coefficient, but they enable the small world phenomenon
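The construction above can be checked numerically: build the ring, measure the clustering coefficient and the average path length, then add a few random shortcuts and measure again. As a simplification, this sketch adds a fixed number of shortcuts instead of drawing each one with a small probability; the function names, n = 200 and the 10 shortcuts are assumptions.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k=4):
    """Each node is linked to its k nearest neighbours on a circle (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            adj[v].add((v + step) % n)
            adj[(v + step) % n].add(v)
    return adj


def clustering_coefficient(adj):
    """Average fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        pairs = list(combinations(nbrs, 2))
        if pairs:
            total += sum(1 for a, b in pairs if b in adj[a]) / len(pairs)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs (connected graph)."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    n = len(adj)
    return total / (n * (n - 1))


def add_shortcuts(adj, count, rng):
    """Add `count` random edges between nodes that are not yet linked."""
    nodes = list(adj)
    added = 0
    while added < count:
        a, b = rng.sample(nodes, 2)
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1


graph = ring_lattice(200)
print(clustering_coefficient(graph), average_path_length(graph))
add_shortcuts(graph, 10, random.Random(1))  # fixed seed, an arbitrary choice
print(clustering_coefficient(graph), average_path_length(graph))
```

In the plain ring each node has 6 pairs of neighbours, of which 3 are linked, which gives cc = 0.5. The shortcuts change that value only slightly while shortening the average path.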
SNA model for disease infection

The infection model takes the structure of the network into consideration


infections by „disease fronts” at the boundary of groups
random links allow the disease to spread
explains the Black Death, HIV, foot-and-mouth disease in Britain
[Plot: # infected against the reproduction rate, with R=1 marked and curves for β=0 and β=1]
Conclusions

Current research on SNA has some serious flaws





myth of influential actors
myth of six degrees of separation
focus on weak ties
not taking sociology and psychology seriously
SNA methods can be very beneficial

when applied to novel research areas
when used imaginatively to solve old problems
