
Algorithms for Solving the History
Sensitive Cascade Model in
Diffusion Networks
Research Proposal
Georgi Smilyanov, Maksim Tsikhanovich
Advisor: Dr. Yu Zhang
Trinity University CS REU, 05.June.2009
Motivation
•Network Diffusion: the process by
which some nodes in a network
influence their neighbors and
change their state
•Applications
•Brand recognition
•Diffusion in other domains
•Infectious diseases
•Ideas
•New technologies
Modeling Network Diffusion
•Common Models
•Linear Threshold Model: node
activates when a certain (weighted)
fraction of its neighbors is active
•Independent Cascade Model: an active
node has a one-time chance of
activating each neighbor, succeeding
with a certain probability
Modeling Network Diffusion
•New Model
•History Sensitive Cascade Model
(HSCM)
•Main idea: Allows nodes to try to
activate neighbors multiple times
•Benefit: more plausible, since in
reality people interact with each
other repeatedly
History Sensitive Cascade Model
•Application: A company releases a
new product -- what should the
advertising target audience be?
•Consumers with the highest
willingness to pay?
•More influential consumers?
•Model consumers as nodes that have
both “intrinsic” value and “network”
value.
History Sensitive Cascade Model
•Application: A company releases a
new product -- what should the
advertising target audience be?
•A consumer with low intrinsic value
may be worth marketing to just because
of her network value
•Marketing to a profitable consumer
may be redundant if network effect
already makes her likely to buy
History Sensitive Cascade Model
•Problems
•Given a node, what is the probability
of this node becoming active at a given
time? (Vertex Activation Problem)
•What is the best subset of nodes to
activate initially so as to maximize
the number of active nodes after a
given interaction time? (Optimization
Problem)
History Sensitive Cascade Model
•Problems
•The current algorithm implementing
HSCM runs in exponential time
•We hope to devise an approximation
algorithm that runs in polynomial time
3. Problem Definition
The problems we are trying to solve
Outline
• Vertex Activation Problem
– Approximating it
• Optimization Problems
– Time Minimization
– Activation Maximization
– Approximating them
Vertex Activation Problem
• Given a directed, weighted
graph G
– Each edge weight is the
probability that the edge's
source activates its target in
one time step.
– What is the probability that a
certain vertex v is active on
the kth time step?
[Figure: example graph on vertices 0, 1, 2 with edge probabilities 0.2 and 0.5]
Vertex Activation Approximation
Problem
• Given a directed, weighted graph G, a
vertex v, and a time step k
• Suppose we have a program P that takes
(G,v,k) and returns the exact probability
of v being active by the kth time step
• Create a program A such that
– |P(G,v,k)-A(G,v,k)| ≤ ε
– 0 < ε < 1
– The bound is guaranteed for all G, v, k.
Possible Problems With the
Approximation
• We may not be able to create a polynomial
time approximation algorithm for general
graphs for any ε<1 because of the complexity
of the HSCM model
– We will explore this; if it proves
impossible in general, we will target
restricted graph classes
– A polynomial-time solution for tree
graphs was created during last year's REU.
What we can do with a Vertex
Activation Solver
• Use the concept of Θ-Certitude
– We are Θ-certain that a particular vertex
v is active by the kth time step if P(G,v,k) ≥ Θ
• Determine whether we are Θ-certain that a
subset U of V is active by time step k
– We simply check that P(G,u,k) ≥ Θ for all u in U.
• We use Θ-Certitude to define two
optimization problems.
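The subset check above is a one-liner. Here is a minimal Python sketch, assuming some solver P(G, v, k) (exact or approximate) is available; the function and parameter names are illustrative, not part of the model:

```python
def theta_certain_subset(P, G, U, k, theta):
    """Check Θ-certitude of every vertex in U at time step k.

    P is any assumed solver returning the probability that a vertex
    is active by step k; its signature here is illustrative.
    """
    return all(P(G, u, k) >= theta for u in U)
```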
Time Minimization Problem
• Given G, and a number m < |V|
– Which subset U of V with |U| ≤ m should be
selected
– So that k is minimized, where k is the first
time step at which all v in V are activated
with Θ-Certitude.
Activation Maximization Problem
• Given G, and m < |V|
– Which subset U of V with |U| ≤ m should be
selected such that at the kth time step
– The number of nodes activated with
Θ-Certitude, |AΘ|, is maximized.
• Both optimization problems are NP-Complete,
so in order to work with large data sets, we
need to create approximations.
Approximating the Activation
Maximization Problem
• Given G, and m<|V|, which subset of V, U
should be selected such that
– At time step k, the size of the set of vertices
activated with Θ-certitude, |AΘ| is at least of size
ε|AΘ*|
– 0<ε<1
– |AΘ*| denotes the size of set of vertices activated
with Θ-certitude if the optimal U is chosen.
4. Proposed Solution
The strategies we expect to use to
solve our problems
Solving the Vertex Activation Problem
• Building on the work of last year's REU, we
have created and implemented an algorithm
• It uses a Markov chain to calculate the probability
of a vertex being activated by the kth time step
• It involves repeatedly multiplying by a state transition
matrix; since the graph can be in 2^|V| states, this
matrix has 2^|V| × 2^|V| = 2^(2|V|) entries
• The multiplication takes time polynomial in the matrix
size, but that size forces the algorithm overall to run
in time exponential in |V|.
A Graph and the State Transition
Matrix
[Figure: triangle graph on vertices 0, 1, 2; each directed edge carries activation probability 0.5]

State transition matrix (rows: current state; columns: next state):

           []    [0]   [1]   [0,1] [2]   [0,2] [1,2] [0,1,2]
[]         1     0     0     0     0     0     0     0
[0]        0     0.25  0     0.25  0     0.25  0     0.25
[1]        0     0     0.25  0.25  0     0     0.25  0.25
[0,1]      0     0     0     0.25  0     0     0     0.75
[2]        0     0     0     0     0.25  0.25  0.25  0.25
[0,2]      0     0     0     0     0     0.25  0     0.75
[1,2]      0     0     0     0     0     0     0.25  0.75
[0,1,2]    0     0     0     0     0     0     0     1
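The exponential-time algorithm can be sketched in Python. This is a minimal illustration (not the actual implementation), assuming HSCM semantics where active vertices never deactivate and each active vertex independently retries its inactive out-neighbors every step; all names are illustrative. On the triangle example (every ordered pair carrying probability 0.5, consistent with the matrix rows above), it reproduces the transition matrix shown.

```python
from itertools import combinations, product
from math import prod

def hscm_states(n):
    """All 2^n Markov chain states; each is a frozenset of active vertices."""
    return [frozenset(c) for r in range(n + 1)
            for c in combinations(range(n), r)]

def transition_matrix(n, edges):
    """Build the 2^n x 2^n one-step state transition matrix.

    edges maps (u, w) to the probability that an active u activates an
    inactive w in one time step; active vertices never deactivate.
    """
    states = hscm_states(n)
    index = {s: i for i, s in enumerate(states)}
    M = [[0.0] * len(states) for _ in states]
    for S in states:
        # P(inactive w activates this step) =
        #   1 - product over active in-neighbors u of (1 - p(u, w))
        act = {w: 1.0 - prod(1.0 - edges.get((u, w), 0.0) for u in S)
               for w in range(n) if w not in S}
        inactive = sorted(act)
        # Enumerate every subset of inactive vertices that switches on.
        for bits in product([0, 1], repeat=len(inactive)):
            newly = {w for w, b in zip(inactive, bits) if b}
            p = prod(act[w] if w in newly else 1.0 - act[w]
                     for w in inactive)
            M[index[S]][index[S | newly]] += p
    return states, M

def activation_probability(n, edges, initial, v, k):
    """Exact probability that vertex v is active by the kth time step."""
    states, M = transition_matrix(n, edges)
    index = {s: i for i, s in enumerate(states)}
    dist = [0.0] * len(states)
    dist[index[frozenset(initial)]] = 1.0
    for _ in range(k):  # dist <- dist * M, k times
        dist = [sum(dist[i] * M[i][j] for i in range(len(states)))
                for j in range(len(states))]
    return sum(p for s, p in zip(states, dist) if v in s)
```

The matrix row for a state S has at most 2^(number of inactive vertices) nonzero entries, and states only ever gain vertices, which is why the matrix is upper-triangular under the subset ordering of states.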
Empirical Evidence of Intractability
[Chart: running time (ms) vs. number of vertices, growing exponentially;
fit y = 3.3966e^(0.4843x), R² = 0.9944]
Wrapping Up the Vertex Activation
Problem
• Provide a rigorous analysis of the space and
time complexities
• Optimize the matrix calculation and matrix
multiplication
– It is easy to determine that the graph cannot
move from certain states to certain others
(active vertices never deactivate), so many
matrix entries are always zero.
– Take advantage of the fact that the matrix is
upper-triangular.
Some (unexplored) ideas for
approximating the Vertex Activation
Problem
• Instead of using the Vertex Activation Problem in order
to decide how good a set U is, heuristically determine a
set of the most influential nodes in the graph
– This might be done using standard graph search, path, or
spanning tree algorithms.
• Simulate the History Sensitive Cascade Model, without
paying too much attention to the cyclical nature of the
graph
• Use Bayesian Networks to solve the Vertex Activation
Problem, and determine whether they are easier to
simulate.
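A simulation of this kind could look like the following Monte Carlo sketch, again under the assumed HSCM semantics (active vertices persist and retry their inactive neighbors independently each step); the function name and parameters are illustrative:

```python
import random

def simulate_hscm(n, edges, initial, k, trials=10000, seed=0):
    """Monte Carlo estimate of each vertex's probability of being
    active by the kth time step, starting from the initial active set.

    edges maps (u, w) to the per-step activation probability.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    counts = [0] * n
    for _ in range(trials):
        active = set(initial)
        for _ in range(k):
            newly = set()
            # every active vertex retries each inactive neighbor
            for (u, w), p in edges.items():
                if u in active and w not in active and rng.random() < p:
                    newly.add(w)
            active |= newly
        for v in active:
            counts[v] += 1
    return [c / trials for c in counts]
```

Unlike the exact Markov chain algorithm, one trial costs O(k·|E|), so the estimate scales to graphs far beyond 2^|V| states, at the price of sampling error that shrinks as 1/√trials.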
Approximating the Optimization
Problems
• The solutions we have in mind depend on us being able
to determine how good some proposed solution U is
(U is a subset of V).
– Hopefully we will be able to do this with our
approximation to the Vertex Activation Problem, otherwise
we might use a heuristic as described before.
• Given this, we hope to explore several strategies for
calculating U:
– Algorithms that greedily add vertices to U
– Hill-Climbing and Simulated-Annealing algorithms
– A Genetic Algorithm
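As an illustration of the first strategy, here is a greedy sketch in Python. It assumes some evaluator score(U) of a candidate seed set is available, e.g. the number of vertices activated with Θ-certitude by step k; both names are hypothetical:

```python
def greedy_seed_set(vertices, m, score):
    """Greedily grow U, each round adding the vertex that most
    improves score(U), until |U| = m.

    score is an assumed evaluator of a candidate seed set U.
    """
    U = set()
    while len(U) < m:
        best = max((v for v in vertices if v not in U),
                   key=lambda v: score(U | {v}), default=None)
        if best is None:  # no candidates left
            break
        U.add(best)
    return U
```

Each round costs one score evaluation per remaining vertex, so the whole loop makes O(m·|V|) calls to the (possibly expensive) evaluator.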
Proposed Experiment Domain
•Difficult to test
•Need two snapshots of the network: feed
the initial state to the algorithm and
compare its prediction against the
observed final state
•Vertex Activation Problem is NP-Complete:
the approximation algorithm will not fully
reflect the expressive power of the model
Proposed Experiment Domain
•Simulation
•Test approximations against optimal
predictions
(Kempe et al., Maximizing the Spread of Influence through a Social Network)
Proposed Experiment Domain
•Comparison of HSCM with collected data
•The arXiv database
•Contains citations between scientific papers
•Probability of a given author being cited at a
given point in time, depending on whom he has
cited and who has cited him.
•A Keyboard
•The keys you press influence which keys you
will press next
•Interesting optimization problems: Dvorak vs
QWERTY, etc.
Timeline
•End of next week: Whole system up
and running; using the exponential-time
algorithm
•In three weeks: Approximation of the
Vertex Activation Problem
•In four weeks: Genetic algorithm to
approximate the Optimization Problem
•In five weeks: Other ways to
approximate the Optimization Problem
Conclusion
•Novel research
•We understand the problem
•But maybe not in its whole complexity?
•Venture into algorithm design
•Haven’t had much experience in this
•Learn a lot
•Even if goal fails
•Algorithms + AI: approximation
techniques + applications of model
(future work)
• Thank you for listening
• We will take questions