Models for a network

• In this section we will study several mathematical and statistical models for networks
• We will first develop models for edge formation:
• We will also explore models for predicting attributes on a network:
• Assume the number of nodes is fixed, and we must propose a stochastic model for Yuv , where
Yuv = 1 if nodes u and v are connected and Yuv = 0 otherwise
• The simple random sample model fixes the total number of edges, D = \sum_{u<v} Y_{uv}, and then places equal probability on all graphs with D edges
• Similarly, the Bernoulli model treats the Y_{uv} as independent Bernoulli(p) random variables
• These models are often used to test null hypotheses
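As a concrete illustration, here is a minimal Python sketch of both null models; the function names and the small graph sizes are my own choices, not from the lecture:

```python
import itertools
import random

def bernoulli_graph(n, p, seed=0):
    """Bernoulli model: each pair (u, v) with u < v is connected
    independently with probability p."""
    rng = random.Random(seed)
    return {(u, v): int(rng.random() < p)
            for u, v in itertools.combinations(range(n), 2)}

def srs_graph(n, d, seed=0):
    """Simple random sample model: place equal probability on all
    graphs with exactly d edges."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    chosen = set(rng.sample(pairs, d))
    return {pair: int(pair in chosen) for pair in pairs}

Y = bernoulli_graph(20, 0.1)   # edge count is random, Binomial(190, 0.1)
Z = srs_graph(20, d=15)        # edge count is exactly 15
```

Under the null, any observed statistic (clustering, degree spread) can be compared against its distribution over many such simulated graphs.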
(5) Networks - Part 2
Page 1
Extension of the Bernoulli model
• Let Xu be a vector of attributes for node u
• For example, if the nodes are people in a social network, then Xu might be:
• To learn about the effect of node attributes, we could use logistic regression:
logit[Prob(Y_{uv} = 1 | X_u, X_v)] = \gamma + \sum_{j=1}^{p} (X_{uj} + X_{vj}) \beta_j + \sum_{j=1}^{p} |X_{uj} - X_{vj}| \alpha_j
• Interpretation:
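A sketch of how the edge-level design matrix could be built and the model fit; the plain gradient-ascent fitter and all names here are illustrative, not from the lecture:

```python
import numpy as np

def edge_design(X):
    """Edge-level design matrix from node attributes X (n x p):
    for each pair u < v, covariates are (X_u + X_v) and |X_u - X_v|."""
    rows = []
    for u in range(len(X)):
        for v in range(u + 1, len(X)):
            rows.append(np.concatenate([X[u] + X[v], np.abs(X[u] - X[v])]))
    return np.asarray(rows)

def fit_logistic(Z, y, steps=2000, lr=0.1):
    """Gradient-ascent logistic regression; w[0] is gamma,
    then the beta's, then the alpha's."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    w = np.zeros(Z1.shape[1])
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-Z1 @ w))
        w += lr * Z1.T @ (y - prob) / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                     # 30 nodes, p = 2 attributes
Z = edge_design(X)                               # 435 pairs, 4 covariates
true_w = np.array([-1.0, 0.5, 0.0, -1.0, 0.0])   # gamma, betas, alphas
Z1 = np.column_stack([np.ones(len(Z)), Z])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-Z1 @ true_w)))
w_hat = fit_logistic(Z, y)
```

Here a positive β_j means nodes with large X_j form more edges overall, while a negative α_j means dissimilar nodes connect less often (homophily).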
Extension of the Bernoulli model
• This model assumes that all edges are independent of each other.
• However, are Yu1 and Yu2 really independent? Both edges involve node u and this suggests
possible dependence.
• One way to account for this dependence is using random effects:
• Interpretation of the random effect for node u:
• This is a generalized linear mixed effects model
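To see how a node-level random effect induces dependence, here is a simulation sketch (the parameter values are arbitrary assumptions):

```python
import numpy as np

def simulate_random_effects_graph(n, gamma=-2.0, sigma=1.0, seed=0):
    """logit P(Y_uv = 1) = gamma + a_u + a_v, with node random effects
    a_u ~ N(0, sigma^2); edges sharing a node are no longer independent."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma, size=n)
    Y = np.zeros((n, n), dtype=int)
    for u in range(n):
        for v in range(u + 1, n):
            p = 1.0 / (1.0 + np.exp(-(gamma + a[u] + a[v])))
            Y[u, v] = Y[v, u] = rng.binomial(1, p)
    return Y, a

Y, a = simulate_random_effects_graph(100)
deg = Y.sum(axis=1)
# nodes with a large random effect a_u tend to have high degree
```

A large a_u raises the probability of every edge incident to u, so edges sharing a node become positively correlated, which shows up as correlation between a_u and the degree of u.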
Huge networks
• When the number of nodes is large, the number of potential edges is huge!
• Subsampling can reduce the computational burden
• One sampling scheme is a simple random sample of nodes
• Another sampling scheme is a simple random sample of edges
• Neither is efficient for sparse networks, where the vast majority of node pairs are not connected.
• You could sample the 1’s with a higher probability than the 0’s (King and Zeng, 2001):
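A minimal sketch of such unequal-probability sampling, assuming all 1's are kept and each 0 is retained with probability r; the intercept correction noted in the docstring follows the standard case-control adjustment, and all names are my own:

```python
import numpy as np

def case_control_sample(y, r, seed=0):
    """Keep every pair with y = 1; keep each y = 0 pair with probability r.
    Logistic regression fit on the subsample leaves slope estimates
    consistent, and the intercept is corrected as gamma = gamma_hat + log(r)."""
    rng = np.random.default_rng(seed)
    keep = (y == 1) | (rng.random(len(y)) < r)
    return np.flatnonzero(keep)

y = np.concatenate([np.zeros(990, dtype=int), np.ones(10, dtype=int)])
idx = case_control_sample(y, r=0.05)  # all 10 ones kept, roughly 50 of 990 zeros
```

This shrinks the working data set by roughly a factor of 1/r while retaining every observed edge.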
Other models
• Small world model
• Preferential attachment model
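Preferential attachment can be simulated in a few lines; this hand-rolled sketch (my own implementation, not from the lecture) attaches each new node to m existing nodes chosen with probability proportional to their current degree:

```python
import random

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph node by node; each new node attaches m edges to
    existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # start from a complete graph on m + 1 nodes
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    targets = [u for e in edges for u in e]  # each node listed once per degree
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))  # degree-proportional choice
        for v in chosen:
            edges.append((v, new))
        targets.extend(chosen)
        targets.extend([new] * m)
    return edges

edges = preferential_attachment(500)
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
# a few hub nodes acquire much higher degree than the average of about 4
```

The "rich get richer" mechanism produces the heavy-tailed degree distributions seen in many real networks.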
Attribute prediction
• Now assume the network is fixed, and we wish to study an attribute assigned to each node
• For example, let Xi be the movie rating given by person i
• Consider the case where we have observed the movie rating for many people on the network:
• Our objective is to impute the missing ratings while exploiting the network structure
K-nearest neighbor prediction
• The simplest approach is KNN
• Define a distance between the prediction node and each training node:
• The KNN prediction for Xi is then:
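A sketch using hop-count (shortest-path) distance as the network distance; the distance choice, the value of k, and the toy data are assumptions for illustration:

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop-count (shortest-path) distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def knn_predict(adj, ratings, node, k=3):
    """Average the ratings of the k observed nodes closest to `node`."""
    dist = bfs_distances(adj, node)
    observed = sorted((d, u) for u, d in dist.items()
                      if u in ratings and u != node)
    nearest = observed[:k]
    return sum(ratings[u] for _, u in nearest) / len(nearest)

# toy path graph 0-1-2-3-4 with ratings observed at nodes 0, 1, 4
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ratings = {0: 5.0, 1: 4.0, 4: 1.0}
pred = knn_predict(adj, ratings, node=2, k=2)  # mean of nodes 1 and 0 -> 4.5
```

Any graph distance (weighted shortest path, commute time) can be swapped in without changing the KNN logic.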
Gaussian Markov random field model
• A more powerful approach is to assume the attribute vector follows a multivariate normal distribution
• A common model for the covariance is the inverse of the Laplacian:
• Prediction then follows from the Kriging conditional expectation formula we studied in
Gaussian process regression
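A sketch of the prediction step; since the graph Laplacian L is singular, a small ridge τI is added here to make the precision matrix invertible (the ridge and the zero-mean assumption are my choices, not necessarily the lecture's convention):

```python
import numpy as np

def laplacian(adj, n):
    """Graph Laplacian L = D - A from an adjacency-list dict."""
    A = np.zeros((n, n))
    for u, nbrs in adj.items():
        for v in nbrs:
            A[u, v] = 1.0
    return np.diag(A.sum(axis=1)) - A

def gmrf_predict(adj, n, observed, tau=1e-3):
    """Zero-mean GMRF with precision Q = L + tau*I.  The kriging /
    conditional mean of the missing block given the observed block
    is -Q_mm^{-1} Q_mo x_o."""
    Q = laplacian(adj, n) + tau * np.eye(n)
    obs = sorted(observed)
    mis = [i for i in range(n) if i not in observed]
    x_o = np.array([observed[i] for i in obs])
    mean_mis = -np.linalg.solve(Q[np.ix_(mis, mis)], Q[np.ix_(mis, obs)] @ x_o)
    return dict(zip(mis, mean_mis))

# path graph 0-1-2-3-4 with ratings observed at the two endpoints
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
pred = gmrf_predict(adj, 5, observed={0: 5.0, 4: 1.0})
# nearly linear interpolation along the path: pred ~ {1: 4.0, 2: 3.0, 3: 2.0}
```

On this path graph the conditional mean is essentially the harmonic interpolation between the two observed endpoints, which is the smoothing behavior the Laplacian prior encodes.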
Gaussian Markov random field model
• This is a Markov model since nodes are independent of all other nodes given their neighbors
• This is also called a conditionally autoregressive (CAR) model
• Proof of Markov property:
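A sketch of the standard argument, writing Q = L + τI for the precision matrix (the inverse-Laplacian covariance made proper): the joint density is proportional to exp(−xᵀQx/2), and collecting the terms that involve x_u gives

```latex
p(x_u \mid x_{-u}) \propto
  \exp\!\Big( -\tfrac{1}{2}\, Q_{uu}\, x_u^2 \;-\; x_u \sum_{v \neq u} Q_{uv}\, x_v \Big)
```

Since Q_{uv} = −A_{uv} is zero unless v is a neighbor of u, this full conditional depends on x_{−u} only through the neighbors of u, which is exactly the Markov property.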