Algorithms for Solving History Sensitive Cascade in Diffusion Networks
Research Proposal
Georgi Smilyanov, Maksim Tsikhanovich
Advisor: Dr. Yu Zhang
Trinity University CS REU, 5 June 2009

Motivation
• Network diffusion: the process by which some nodes in a network influence other, neighboring, nodes and change their state
• Applications
  – Brand recognition
  – Diffusion in other domains: infectious diseases, ideas, new technologies

Modeling Network Diffusion
• Common models
  – Linear Threshold Model: a node activates when a certain (weighted) fraction of its neighbors is active
  – Independent Cascade Model: an active node has a one-time chance of activating a neighbor, and succeeds with a certain probability
• New model: the History Sensitive Cascade Model (HSCM)
  – Main idea: allows nodes to try to activate neighbors multiple times
  – Benefit: more plausible, as in reality people have multiple interactions with each other

History Sensitive Cascade Model
• Application: a company releases a new product -- what should the advertising target audience be?
  – Consumers with the highest willingness to pay, or more influential consumers?
  – Model consumers as nodes that have both "intrinsic" value and "network" value
  – A consumer with low intrinsic value may be worth marketing to just because of her network value
  – Marketing to a profitable consumer may be redundant if the network effect already makes her likely to buy
• Problems
  – Given a node, what is the probability of this node becoming active at a given time? (Vertex Activation Problem)
  – What is the best subset of nodes to activate initially so as to maximize the number of active nodes, given a certain time for interaction?
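To make the model's core mechanic concrete, here is a minimal Monte Carlo sketch of one HSCM cascade in Python. This is our illustrative reconstruction from the model description above, not the authors' implementation; the edge-weight dictionary `w`, the synchronous update, and the function names are assumptions.

```python
import random

def hscm_step(active, w, rng):
    """One synchronous HSCM step (illustrative sketch).

    active: set of currently active vertices.
    w: dict mapping a directed edge (u, v) to the probability that an
       active u activates v in a single time step.
    Unlike the one-shot Independent Cascade Model, every active node
    re-tries each still-inactive neighbor on every step -- this repeated
    attempting is the "history sensitive" part of the model.
    """
    newly_active = set(active)
    for (u, v), p in w.items():
        if u in active and v not in active and rng.random() < p:
            newly_active.add(v)
    return newly_active

def simulate(seed_set, w, steps, rng):
    # Run the cascade for a fixed number of steps; active sets only grow.
    active = set(seed_set)
    for _ in range(steps):
        active = hscm_step(active, w, rng)
    return active
```

Averaging `simulate` over many runs would estimate exactly the activation probabilities the Vertex Activation Problem (defined below) asks for.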
    (Optimization Problem)
  – The current algorithm implementing the HSCM runs in exponential time; we hope to invent a polynomial-time approximation algorithm

3. Problem Definition
The problems we are trying to solve

Outline
• Vertex Activation Problem, and approximating it
• Optimization Problems: Time Minimization, Activation Maximization, and approximating them

Vertex Activation Problem
• Given a directed, weighted graph G
  – Each edge weight represents the probability of that edge's source activating its target in one time step
  – What is the probability that a certain vertex v is active at the kth time step?
• [Figure: a small example graph on vertices 0, 1, 2 with edge weights 0.2 and 0.5]

Vertex Activation Approximation Problem
• Given a directed, weighted graph G, a vertex v, and a time step k
• Suppose we have a program P that takes (G, k, v) and returns the exact probability of v being active by the kth time step
• Create a program A such that
  – |P(G,k,v) − A(G,k,v)| ≤ ε, where 0 < ε < 1
  – The bound ε is guaranteed for all G, k, v

Possible Problems with the Approximation
• We may not be able to create a polynomial-time approximation algorithm for general graphs for any ε < 1, because of the complexity of the HSCM
  – We will explore this, and if we cannot do it for general graphs, we will do it for restricted graphs
  – A polynomial-time solution was created during last year's REU for tree graphs

What We Can Do with a Vertex Activation Solver
• Use the concept of Θ-certitude
  – We are Θ-certain that a particular vertex v is active by the kth time step if P(G,v,k) ≥ Θ
• Determine whether we are Θ-certain that a subset U of V is active by time step k
  – We simply check that P(G,u,k) ≥ Θ for all u in U
• We use Θ-certitude to define two optimization problems

Time Minimization Problem
• Given G and a number m < |V|
  – Which subset U of V, with |U| ≤ m, should be selected
  – So that k is minimized, where k is the time step at which all v in V are activated with Θ-certitude?
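The Θ-certitude definitions above translate directly into code. In this sketch, `solver` stands in for the exact program P described above; treating it as a black box and the helper names are our assumptions.

```python
def theta_certain(solver, G, U, k, theta):
    # A set U is Θ-certain active by step k iff P(G, u, k) >= Θ
    # holds for every u in U.
    return all(solver(G, u, k) >= theta for u in U)

def min_certain_time(solver, G, vertices, theta, k_max):
    # Smallest k (up to k_max) at which every vertex is Θ-certain
    # active -- the quantity the Time Minimization Problem asks to
    # minimize over choices of the initial seed set.
    for k in range(k_max + 1):
        if theta_certain(solver, G, vertices, k, theta):
            return k
    return None  # not Θ-certain within k_max steps
```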
Activation Maximization Problem
• Given G and m < |V|
  – Which subset U of V should be selected such that, at the kth time step,
  – The size of the set of nodes activated with Θ-certitude, |A_Θ|, is maximized?
• Both optimization problems are NP-complete, so in order to work with large data sets we need to create approximations

Approximating the Activation Maximization Problem
• Given G and m < |V|, which subset U of V should be selected such that
  – At time step k, the size of the set of vertices activated with Θ-certitude, |A_Θ|, is at least ε|A_Θ*|
  – 0 < ε < 1
  – |A_Θ*| denotes the size of the set of vertices activated with Θ-certitude if the optimal U is chosen

4. Proposed Solution
The strategies we expect to use to solve our problems

Solving the Vertex Activation Problem
• Building on the work of last year's REU, we have created and implemented an algorithm
• It uses Markov chains to calculate the probability of a vertex being activated by the kth time step
• It involves repeatedly multiplying a state transition matrix; since there are 2^|V| states the graph can take, this matrix has 2^|V| × 2^|V| = 2^(2|V|) entries
• The matrix can be multiplied in time polynomial in its size, but that size forces the algorithm overall to run in exponential time

A Graph and the State Transition Matrix
• [Figure: a three-vertex example graph on vertices 0, 1, 2 in which every ordered pair is joined by an edge of weight 0.5]
• The corresponding state transition matrix, with states indexed by the set of active vertices:

           []    [0]   [1]   [0,1] [2]   [0,2] [1,2] [0,1,2]
  []       1     0     0     0     0     0     0     0
  [0]      0     0.25  0     0.25  0     0.25  0     0.25
  [1]      0     0     0.25  0.25  0     0     0.25  0.25
  [0,1]    0     0     0     0.25  0     0     0     0.75
  [2]      0     0     0     0     0.25  0.25  0.25  0.25
  [0,2]    0     0     0     0     0     0.25  0     0.75
  [1,2]    0     0     0     0     0     0     0.25  0.75
  [0,1,2]  0     0     0     0     0     0     0     1

Empirical Evidence of Intractability
• [Figure: running time (ms) versus graph size; the exponential fit y = 3.3966·e^(0.4843x) with R² = 0.9944 shows exponential growth]

Wrapping Up the Vertex Activation Problem
• Provide a rigorous analysis of the space and time complexities
• Optimize the matrix construction and matrix multiplication
  – It is easy to determine that the graph cannot move from some states to others, or cannot leave some states at all
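The transition matrix above can be reproduced by the following sketch, our reconstruction of the described Markov-chain algorithm with states encoded as bitmasks; the function names are ours, and the quadratic-in-matrix-size multiplication makes the exponential blow-up explicit.

```python
def transition_matrix(n, w):
    """Build the 2^n x 2^n HSCM state transition matrix.

    States are subsets of active vertices encoded as bitmasks; w maps a
    directed edge (u, v) to its one-step activation probability.
    """
    size = 1 << n
    T = [[0.0] * size for _ in range(size)]
    for s in range(size):
        active = [u for u in range(n) if s >> u & 1]
        inactive = [v for v in range(n) if not s >> v & 1]
        # Probability each inactive v stays inactive this step:
        # every active u must independently fail to activate it.
        stay = {}
        for v in inactive:
            q = 1.0
            for u in active:
                q *= 1.0 - w.get((u, v), 0.0)
            stay[v] = q
        # Enumerate every subset of inactive vertices that flips on.
        for flips in range(1 << len(inactive)):
            p, t = 1.0, s
            for i, v in enumerate(inactive):
                if flips >> i & 1:
                    p *= 1.0 - stay[v]
                    t |= 1 << v
                else:
                    p *= stay[v]
            T[s][t] += p
    return T

def activation_probability(T, start_state, v, k):
    # Push the start distribution through the chain k times, then sum
    # the probability mass of all states whose bitmask contains v.
    dist = [0.0] * len(T)
    dist[start_state] = 1.0
    for _ in range(k):
        nxt = [0.0] * len(T)
        for s, ps in enumerate(dist):
            if ps:
                for t, pt in enumerate(T[s]):
                    nxt[t] += ps * pt
        dist = nxt
    return sum(ps for s, ps in enumerate(dist) if s >> v & 1)
```

For the three-vertex example with all weights 0.5, this reproduces the rows shown above, e.g. T[{0}][{0}] = 0.25 and T[{0,1}][{0,1,2}] = 0.75; note also that since active sets only grow, T is upper triangular under the bitmask ordering.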
  – Take advantage of the fact that the matrix is upper triangular

Some (Unexplored) Ideas for Approximating the Vertex Activation Problem
• Instead of using the Vertex Activation Problem to decide how good a set U is, heuristically determine a set of the most influential nodes in the graph
  – This might be done using standard graph search, path, or spanning tree algorithms
• Simulate the History Sensitive Cascade Model without paying too much attention to the cyclical nature of the graph
• Use Bayesian networks to solve the Vertex Activation Problem, and determine whether they are easier to simulate

Approximating the Optimization Problems
• The solutions we have in mind depend on being able to determine how good a proposed solution U (a subset of V) is
  – Hopefully we will be able to do this with our approximation to the Vertex Activation Problem; otherwise, we might use a heuristic as described before
• Given this, we hope to explore several strategies for calculating U:
  – Algorithms that greedily add vertices to U
  – Hill-climbing and simulated-annealing algorithms
  – A genetic algorithm

Proposed Experiment Domain
• Difficult to test
  – We need two datasets: feed the initial state of the network to the algorithm and compare against the final state
  – The Vertex Activation Problem is NP-complete: the approximation algorithm will not fully reflect the expressive power of the model
• Simulation
  – Test approximations against optimal predictions (Kempe et al., "Maximizing the Spread of Influence through a Social Network")
• Comparison of the HSCM with collected data
  – The arXiv database: citations between scientific papers; the probability of a certain author being cited at a given point, depending on the set of all others he cited and who cited him
  – A keyboard: the keys you press influence which keys you will press next; interesting optimization problems (Dvorak vs. QWERTY, etc.)
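The first strategy listed, greedily adding vertices to U, could be sketched as follows. The black-box `score` function, the stopping rule, and all names here are our assumptions; in practice `score(U)` might count how many vertices become Θ-certain active by step k under seed set U, estimated with the approximate solver or a heuristic.

```python
def greedy_seed_set(vertices, m, score):
    """Greedily build a seed set U with |U| <= m (illustrative sketch).

    score(U) is a black-box quality estimate of a candidate seed set;
    each round adds the vertex with the largest marginal gain.
    """
    U = set()
    while len(U) < m:
        best_v, best_gain = None, 0.0
        for v in vertices:
            if v in U:
                continue
            gain = score(U | {v}) - score(U)
            if best_v is None or gain > best_gain:
                best_v, best_gain = v, gain
        if best_v is None:
            break
        U.add(best_v)
    return U
```

The hill-climbing, simulated-annealing, and genetic-algorithm strategies would plug into the same `score` oracle, differing only in how they explore the space of subsets.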
Timeline
• End of next week: whole system up and running, using the exponential-time algorithm
• In three weeks: approximation of the Vertex Activation Problem
• In four weeks: genetic algorithm to approximate the Optimization Problem
• In five weeks: other ways to approximate the Optimization Problem

Conclusion
• Novel research
  – We understand the problem, but maybe not in its whole complexity
• A venture into algorithm design
  – We have not had much experience in this
  – We will learn a lot even if the goal fails
• Algorithms + AI: approximation techniques + applications of the model (future work)
• Thank you for listening; we will take questions