Influence and Correlation in Social Networks
Presented by Pnina Nissim

Correlation in Social Networks
A correlation is a single number that describes the degree of relationship between two variables. It is of great interest to interpret users’ actions in the context of their online friends and to correlate the actions of socially connected users.
[Figure: users of a social network posting opinions such as "I love Game of Thrones" and "I hate Game of Thrones".]

Previous Work
o L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan (2006) examined the membership problem in an online community.
o C. Marlow, M. Naaman, D. Boyd, and M. Davis (2006) considered the tag usage problem in Flickr.
These studies established the existence of correlation between user actions and social affiliations, but they do not address the source of the correlation.

Causes of Correlation in Social Networks
o Influence (induction): the action of a user is triggered by a recent action of one of their friends.
o Homophily: individuals often befriend others who are similar to them, and hence perform similar actions.
o Environment (confounding factors, external influence): external factors are correlated both with the event that two individuals become friends and with their actions.

Social Influence
Social influence occurs when one’s opinions, emotions, or behaviors are affected by others. In the presence of social influence, an idea, a norm of behavior, or a product diffuses through the social network like an epidemic. Being able to identify the cases in which influence prevails is an important step toward strategy design.

Models of Social Correlation
𝐺 is a directed graph generated from an unknown probability distribution. The nodes are the agents in the social network. After an agent performs the action for the first time, we say that the agent has become active.
Let 𝑊 denote the set of agents that are active at the end of a certain time period [0, 𝑇].
[Figure: Models of Social Correlation; example network whose nodes are labeled with numbers from 0 to 4.]

Homophily Model
o The set 𝑊 of active nodes is first selected according to some distribution.
o The graph 𝐺 is then picked from a distribution that depends on 𝑊.

Confounding Model
o There is a confounding variable 𝑋.
o Both the network 𝐺 and the set of active individuals 𝑊 come from distributions correlated with 𝑋.

Generalization: Correlation Model
o The pair (𝐺, 𝑊) is selected according to a joint probability distribution.
o The time of activation for individuals in 𝑊 is picked i.i.d. according to a distribution 𝜏 on [0, 𝑇].
o The main assumption: the probability that an agent is active can be affected by whether their friends become active, but not by when they become active.

Influence Model
o The graph 𝐺 is drawn according to some distribution.
o In each of the time steps 1, …, 𝑇, each non-active agent decides whether to become active.
o The probability of becoming active for each agent 𝑢 is a function 𝑝(𝑥) of the number 𝑥 of other agents 𝑣 that have an edge to 𝑢 and are already active.
[Figure: example with agents 𝑣, 𝑢 and 𝑤.]

The Function p(·)
A logistic function of the logarithm of the number of active friends provides a good fit for the probability. The probability 𝑝(𝑎) of activation for an agent with 𝑎 already-active friends is
$$p(a) = \frac{e^{\alpha \ln(a+1) + \beta}}{1 + e^{\alpha \ln(a+1) + \beta}}$$
where 𝛼 and 𝛽 are coefficients.

Measuring Social Correlation
The coefficient 𝛼 measures social correlation: a large value of 𝛼 indicates a large degree of correlation.
$Y_{a,t}$: the number of users who at the beginning of time 𝑡 had 𝑎 active friends and started using the tag at time 𝑡.
$N_{a,t}$: the number of users who had 𝑎 active friends at time 𝑡 but did not start using the tag at time 𝑡.
$$Y_a = \sum_t Y_{a,t}, \qquad N_a = \sum_t N_{a,t}$$

Example
[Figure: example network whose nodes are labeled with numbers from 0 to 4.]
$Y_{0,0} = 1$, $Y_{0,1} = 1$, $Y_{0,4} = 1$, so $Y_0 = 3$
$Y_{1,2} = 2$, $Y_{1,4} = 1$, so $Y_1 = 3$
$N_{0,0} = 13$, $N_{0,1} = 11$, $N_{0,2} = 8$, $N_{0,3} = 7$, $N_{0,4} = 6$, so $N_0 = 45$

Maximum Likelihood Method
We compute the values of 𝛼 and 𝛽 that maximize the expression
$$\prod_a p(a)^{Y_a} \, \bigl(1 - p(a)\bigr)^{N_a}, \qquad p(a) = \frac{e^{\alpha \ln(a+1) + \beta}}{1 + e^{\alpha \ln(a+1) + \beta}}$$

The Shuffle Test
The test: shuffle the timestamps of user activities and check whether the new estimate of social correlation is significantly different from the estimate based on the user activity log.
o 𝛼: the social correlation coefficient where user $w_i$ is first activated at time $t_i$.
o 𝛼′: the social correlation coefficient where user $w_i$ is first activated at time $t'_i := t_{\pi(i)}$ for a random permutation $\pi$.
The shuffle test declares that the model exhibits no social influence if the values of 𝛼 and 𝛼′ are close to each other.

Why Does It Work?
o In an instance generated from the correlation model, the time stamps $t_i$ are independent, identically distributed (i.i.d.) draws from a distribution 𝜏 over [0, 𝑇].
o The second instance constructed above only permutes all time stamps, and hence the new $t'_i$ are still i.i.d. from the same distribution 𝜏.
o The two instances come from the exact same distribution, and hence they should lead to the same expected social correlation coefficient 𝛼.

The Edge-Reversal Test
In this test we reverse the direction of all the edges and run logistic regression on the data using the new graph as well. Social influence spreads in the direction specified by the edges of the graph, and hence reversing the edges should intuitively change the estimate of the correlation.

Simulations
Three generative models. In each model, we try to keep other aspects of the model as close to Flickr’s data as possible:
o The number of users and connections
o The number of users that become active in each time step

The No-Correlation Model
There is no social correlation, influence or otherwise, in the pattern of activations.
In each time step, we look at the real data to see how many new agents use the tag, and pick the same number of agents uniformly at random from the set of agents that have already joined the network and have not been picked yet.

The Influence Model
Influence is the only form of social correlation. This model is parameterized in terms of two parameters, 𝛼 and 𝛽. In every time step, each node that has joined the network but has not yet been activated flips a coin independently to decide whether to become active in this time step.

The Correlation (no-influence) Model
Agents that are close to each other in the network are affected by the same external factors that make them more likely to be activated. The model is parameterized in terms of one parameter 𝐿. Select a set 𝑆 of 𝐿 nodes as follows:
o Pick a number of centers at random.
o Add a ball of radius 2 around each node in 𝑆 to 𝑆.
o Stop this process as soon as the size of 𝑆 reaches the prespecified number 𝐿.
Then generate the set of agents that become active in each time step in a manner similar to the one in the no-correlation model, except that in each time step the agents to become active are picked uniformly at random from 𝑆.

Measuring Correlation
The first set of experiments focuses on the measurement of correlation in the network. We can compute the social correlation coefficient by applying logistic regression to each model.
[Figure: results for the correlation model, the influence model, and the no-correlation model.]

Shuffle Test for Influence Model
The value of 𝛼 decreases after shuffling the tagging timestamps.

Shuffle Test for Correlation Model
For almost all tags, the values of 𝛼 retrieved with and without the shuffle are very close.

Edge-Reversal Test for Influence Model
As in the previous test, there is a significant difference in the values of 𝛼.
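For reference, the influence model behind these simulated results can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions that are not in the slides: every agent is present from the first time step (the slides only let agents that have already joined the network flip a coin), time steps are numbered from 0, and the graph is passed as a hypothetical `in_neighbors` dictionary. The function names are made up for this example.

```python
import math
import random


def activation_probability(a, alpha, beta):
    """p(a) = e^z / (1 + e^z) with z = alpha * ln(a + 1) + beta."""
    z = alpha * math.log(a + 1) + beta
    return 1.0 / (1.0 + math.exp(-z))


def simulate_influence(in_neighbors, alpha, beta, steps, rng):
    """Simulate activations; return {agent: time step of first activation}.

    in_neighbors[u] lists the agents v that have an edge v -> u
    (a hypothetical input format chosen for this sketch).
    """
    active_at = {}
    for t in range(steps):
        already_active = set(active_at)  # state at the beginning of step t
        for u in in_neighbors:
            if u in already_active:
                continue
            # Number of friends of u that were active before this step.
            a = sum(1 for v in in_neighbors[u] if v in already_active)
            # Independent coin flip with bias p(a).
            if rng.random() < activation_probability(a, alpha, beta):
                active_at[u] = t
    return active_at


if __name__ == "__main__":
    # Toy graph: v and w both point to u, and v points to w.
    graph = {"u": ["v", "w"], "v": [], "w": ["v"]}
    print(simulate_influence(graph, alpha=1.0, beta=-2.0, steps=5,
                             rng=random.Random(0)))
```

Passing a seeded `random.Random` instance keeps a run reproducible; a larger `alpha` makes agents with many active friends much more likely to activate, which is what the shuffle and edge-reversal tests are meant to detect.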
Edge-Reversal Test for Correlation Model
The values of 𝛼 essentially coincide.

Experiments on Real Data
The techniques are effective for the simulated data. Are they also effective for real-world data, namely the Flickr social network?
Experiment: analyzing the tagging behavior of users.

Real Data
Images may be tagged either by the uploader or by other users, if the uploader permits it.

The Flickr Dataset
o 16 months
o 800K users
o We restricted our attention to the set of users who have tagged any photo with any tag, which is about 340K users.
o The proportion of 𝑢’s contacts that do not have 𝑢 as a contact is 28.5%.
Out of a collection of about 10K tags that users had used, a set of 1,700 was selected, and each tag was analyzed independently. The selected tags cover:
o various types (events, colors, objects, etc.)
o various numbers of users (most of the tags were used by more than 1,000 users)
o various growth patterns: bursty (e.g., “halloween”, “katrina”), smooth (e.g., “photos”), and periodic (e.g., “moon”)

Measuring Correlation
For almost all the tags the value of 𝛼 is higher than 1, suggesting that correlation is prevalent in users’ tagging activities.

Distinguishing Influence: The Shuffle Test
The correlation cannot be attributed to influence.

Distinguishing Influence: The Edge-Reversal Test
The correlation cannot be attributed to influence.
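The whole procedure used above (count activations by number of active friends, fit 𝛼 and 𝛽 by maximum likelihood, then repeat after shuffling timestamps or reversing edges) can be sketched as follows. This is a minimal illustration, not the original analysis code: the function names, the edge-list input format, and the use of a plain Newton iteration for the logistic fit are all assumptions made for this example. The slides do not say how close 𝛼 and 𝛼′ must be, so no threshold is hard-coded.

```python
import math
import random
from collections import defaultdict


def count_outcomes(edges, active_at, users, steps):
    """Return (Y, N): activations and non-activations among not-yet-active
    users, keyed by their number a of already-active friends."""
    friends = defaultdict(set)
    for v, u in edges:  # assumed convention: edge v -> u, v may influence u
        friends[u].add(v)
    Y, N = defaultdict(int), defaultdict(int)
    for t in range(steps):
        for u in users:
            tu = active_at.get(u)
            if tu is not None and tu < t:
                continue  # u was already active before step t
            a = sum(1 for v in friends[u] if active_at.get(v, steps) < t)
            if tu == t:
                Y[a] += 1
            else:
                N[a] += 1
    return dict(Y), dict(N)


def fit(Y, N, iters=50):
    """Maximize prod_a p(a)^Y_a (1 - p(a))^N_a with Newton's method.
    No step-size control: it may diverge on perfectly separable counts."""
    alpha = beta = 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for a in set(Y) | set(N):
            x = math.log(a + 1)
            p = 1.0 / (1.0 + math.exp(-(alpha * x + beta)))
            y, n = Y.get(a, 0), Y.get(a, 0) + N.get(a, 0)
            w = n * p * (1.0 - p)
            g_a += (y - n * p) * x  # gradient of the log-likelihood
            g_b += y - n * p
            h_aa += w * x * x       # negated Hessian entries
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det < 1e-12:
            break  # degenerate, e.g. only one distinct value of a
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta


def shuffle_times(active_at, rng):
    """Shuffle test: permute activation times among the active users."""
    users = list(active_at)
    times = [active_at[u] for u in users]
    rng.shuffle(times)
    return dict(zip(users, times))


def reverse_edges(edges):
    """Edge-reversal test: flip the direction of every edge."""
    return [(u, v) for v, u in edges]


def alpha_under_tests(edges, active_at, users, steps, rng):
    """Return (alpha, alpha_shuffled, alpha_reversed) for one tag."""
    alpha, _ = fit(*count_outcomes(edges, active_at, users, steps))
    alpha_s, _ = fit(*count_outcomes(edges, shuffle_times(active_at, rng),
                                     users, steps))
    alpha_r, _ = fit(*count_outcomes(reverse_edges(edges), active_at,
                                     users, steps))
    return alpha, alpha_s, alpha_r
```

Comparing the three returned values per tag mirrors the slides: if the shuffled and edge-reversed estimates stay close to the original 𝛼, the observed correlation cannot be attributed to influence.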