
Distributed Adaptive Importance Sampling on Graphical Models using MapReduce
Ahsanul Haque*, Swarup Chandra*, Latifur Khan* and Charu Aggarwal+
* Department of Computer Science, University of Texas at Dallas
+ IBM T. J. Watson Research Center, Yorktown, NY, USA
This material is based upon work supported by the University of Texas at Dallas.
Agenda
• Brief overview of inference techniques
• Problem
• Proposed Approaches
• Experiments
• Discussion
Graphical Models
• A probabilistic graphical model G is a collection of functions over a set of random variables.
• It is generally represented as a network of nodes:
  • Each node denotes a random variable (e.g., a data feature).
  • Each edge denotes a relationship between two random variables.
• Two types of representations:
  • A Bayesian network is represented by a directed graph.
  • A Markov network is represented by an undirected graph.
Example Graphical Model
[Figure: a graphical model over variables A, B, C, D, E, F with pairwise factors φ(A,B), φ(A,C), φ(B,D), φ(C,D), φ(C,E), φ(D,F), φ(E,F).]

Sample factor φ(A, C):

  A | C | φ(A,C)
  --|---|-------
  0 | 0 |   5
  0 | 1 | 100
  1 | 0 |  15
  1 | 1 |  20
• Inference is needed to evaluate Probability of Evidence, Prior and Posterior Marginal, Most Probable Explanation (MPE), and Maximum a Posteriori (MAP) queries.
• Probability of Evidence needs to be evaluated in classification problems.
Exact Inference
• Exact inference algorithms, e.g., Variable Elimination, provide accurate results for Probability of Evidence.
• Challenges:
  • Exponential time and space complexity.
  • Computationally intractable on large graphs.
• Approximate inference algorithms are widely used in practice to evaluate queries within resource limits.
  • Sampling based, e.g., Gibbs Sampling, Importance Sampling.
  • Propagation based, e.g., Iterative Join Graph Propagation.
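
Variable Elimination also reappears later as the exact sub-step inside RB-AIS, so a compact sum-product elimination sketch may help. This is an illustrative Python sketch assuming binary variables and a (scope, table) factor representation; it is not the authors' implementation.

from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors; a factor is (scope, {assignment: value})."""
    v1, t1 = f1
    v2, t2 = f2
    scope = list(dict.fromkeys(list(v1) + list(v2)))   # ordered union of scopes
    table = {}
    for values in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, values))
        table[values] = (t1[tuple(a[v] for v in v1)] *
                         t2[tuple(a[v] for v in v2)])
    return tuple(scope), table

def sum_out(f, var):
    """Marginalize var out of factor f."""
    scope, t = f
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    table = {}
    for values, val in t.items():
        key = values[:i] + values[i + 1:]
        table[key] = table.get(key, 0.0) + val
    return new_scope, table

def eliminate(factor_list, order):
    """Sum-product Variable Elimination: eliminating every variable yields the
    partition function (or the unnormalized probability of evidence, if the
    evidence is already clamped into the factor tables)."""
    for var in order:
        touched = [f for f in factor_list if var in f[0]]
        factor_list = [f for f in factor_list if var not in f[0]]
        prod = touched[0]
        for f in touched[1:]:
            prod = multiply(prod, f)
        factor_list.append(sum_out(prod, var))
    result = 1.0
    for _, t in factor_list:
        result *= t[()]   # remaining factors have empty scope
    return result

With the factors dictionary from the previous sketch, eliminate(list(factors.items()), ["A", "B", "C", "D", "E", "F"]) would return the partition function Z; clamping evidence into the tables first gives the unnormalized probability of evidence.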
Adaptive Importance Sampling (AIS)
• Adaptive Importance Sampling (AIS) is an approximate inference algorithm in which:
  • Samples are generated from a known distribution Q, called the proposal distribution.
  • Q is updated periodically based on the sample weights.
  • The probability of evidence is evaluated from the samples drawn from the proposal distribution, as the following expected value with respect to Q:

    P(E = e) = E_Q[ P(R = r, E = e) / Q(R = r) ],  where R = X \ E

• Weighting each sample reduces the variance of this estimate caused by the occurrence of rare events.
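
A self-contained Python sketch of this estimator on a toy model (the single binary variable R, the Bernoulli proposal, and the numbers in `joint` are assumptions made for illustration, not from the paper):

import random

# Toy unnormalized joint P(R = r, E = e) over one binary variable R,
# with the evidence already fixed; the numbers are purely illustrative.
def joint(r):
    return {0: 0.03, 1: 0.27}[r]

# Proposal distribution Q over R (here, a Bernoulli(0.5)).
def q_prob(r):
    return 0.5

def sample_q():
    return random.randint(0, 1)

def estimate_prob_of_evidence(n_samples=100_000):
    """Importance sampling estimate of P(E = e) = E_Q[ P(R, E = e) / Q(R) ]."""
    total = 0.0
    for _ in range(n_samples):
        r = sample_q()
        total += joint(r) / q_prob(r)   # importance weight of this sample
    return total / n_samples

print(estimate_prob_of_evidence())  # converges to 0.03 + 0.27 = 0.30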
RB-AIS
• In this paper, we focus on a special type of AIS called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS).
• In RB-AIS, only a set of variables Xw ⊂ X \ Xe (called the w-cutset variables) is sampled.
• Xw is chosen in such a way that exact inference over X \ (Xw ∪ Xe) is tractable.
  • A large |Xw| results in quicker query evaluation but a more erroneous result.
  • A small |Xw| results in a more accurate result but takes more time.
  • Trade-off!
V. Gogate and R. Dechter, “Approximate inference algorithms for hybrid Bayesian networks with discrete constraints,” in UAI, AUAI Press, 2005, pp. 209–216.
RB-AIS: Steps

1. Start with an initial proposal Q on Xw.
2. Generate samples from Q.
3. Calculate the sample weights:

   Ψ_x = VE(G_e^w) / Q(Xw = x)

   where VE(G_e^w) is the result of exact Variable Elimination on the remaining network with Xw clamped to the sampled value x and the evidence fixed.

4. Update Q and Z from the weighted samples:

   Q(Xw = x) = [ Σ_{t=1..n} δ_x(X^t = x) · Ψ_x^t ] / [ Σ_{t=1..n} Ψ_x^t ]

5. If not converged, go back to step 2; otherwise stop.
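
Putting the steps together, a minimal Python sketch of the RB-AIS loop is shown below. It is illustrative only: exact_inference_given_cutset stands in for the VE(G_e^w) call, and the categorical proposal over w-cutset assignments and the fixed iteration count are assumptions rather than the paper's implementation.

import random

def rb_ais(cutset_assignments, exact_inference_given_cutset,
           n_samples=1000, n_iterations=10):
    """Rao-Blackwellized Adaptive Importance Sampling (sketch).

    cutset_assignments: list of possible assignments x to the w-cutset Xw.
    exact_inference_given_cutset(x): VE result on the remaining network with
    Xw clamped to x and the evidence fixed."""
    # Initial proposal Q: uniform over w-cutset assignments.
    q = {x: 1.0 / len(cutset_assignments) for x in cutset_assignments}
    z_estimate = 0.0
    for _ in range(n_iterations):
        weight_sum = {x: 0.0 for x in cutset_assignments}
        total = 0.0
        for _ in range(n_samples):
            # Sample x ~ Q.
            x = random.choices(cutset_assignments,
                               weights=[q[a] for a in cutset_assignments])[0]
            w = exact_inference_given_cutset(x) / q[x]   # sample weight Ψ_x
            weight_sum[x] += w
            total += w
        # Update Z (probability-of-evidence estimate) and the proposal Q.
        z_estimate = total / n_samples
        q = {x: weight_sum[x] / total for x in cutset_assignments}
    return z_estimate, q

# Toy usage: two w-cutset assignments with hypothetical VE results.
z, q = rb_ais([(0,), (1,)], lambda x: {(0,): 0.1, (1,): 0.4}[x])
print(z)   # converges to 0.1 + 0.4 = 0.5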
Problem
• Real-world applications require good quality results within time constraints.
• Typically, real-world networks are large and complex (i.e., have large treewidth).
  • For instance, a graphical model of Facebook users would have billions of nodes!
• Even RB-AIS may fail to provide a quality estimate within the time limit.
  • For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network with only 67 nodes and 271 factors.
Challenges
• To design a parallel and distributed approach for RB-AIS, the following challenges need to be addressed:
  • RB-AIS updates Q periodically.
  • Since the values of Q and Z at iteration i depend on those at iteration i − 1, a proper synchronization mechanism is needed.
  • The task of generating samples on Xw must be distributed over the worker nodes.
Proposed Approaches
• We design and implement two MapReduce-based approaches for distributed and parallel computation of inference queries using RB-AIS:
  • Distributed Sampling in Mappers (DSM)
    • Parallel sampling.
    • Sequential weight calculation.
  • Distributed Weight Calculation in Mappers (DWCM)
    • Sequential sampling.
    • Parallel weight calculation.
Distributed Sampling in Mappers (DSM)
Input to the i-th MR job: Xw and the current proposal Qi.

• Each w-cutset variable Xj, together with its proposal Qi[Xj] and Z⁻¹, is assigned to one mapper (Map 1 ... Map m).
• Mapper j emits one record per sample index t = 1..n, keyed by t: (Xj, xj^t, Qi[Xj]).
• Shuffle and sort aggregate the records of all mappers by sample index s.
• The single reducer:
  • combines x1^s, x2^s, ..., xm^s into the full sample x^s, for s = 1..n;
  • calculates the sample weights Ψ^1, Ψ^2, ..., Ψ^n;
  • updates Z and Qi to Qi+1.
• Output: Qi+1[Xj] for each w-cutset variable Xj and the updated Z⁻¹, which feed the next MR job.
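
A minimal Python sketch of this dataflow, with plain functions standing in for Hadoop mappers and reducers (the binary-variable proposal, weight_fn, and the toy usage at the end are illustrative assumptions, not the paper's code):

import random
from collections import defaultdict

def dsm_map(variable, q_i, n_samples):
    """Mapper for one w-cutset variable Xj: draw n values from Qi[Xj] and emit
    records keyed by sample index t."""
    p_one = q_i[variable]                     # Qi[Xj = 1] for a binary variable
    for t in range(n_samples):
        value = 1 if random.random() < p_one else 0
        yield t, (variable, value)

def prob_under_q(x, q_i):
    """Proposal probability Qi(x) of a full w-cutset assignment x."""
    p = 1.0
    for v, val in x.items():
        p *= q_i[v] if val == 1 else 1.0 - q_i[v]
    return p

def dsm_reduce(grouped, q_i, weight_fn):
    """Single reducer: combine per-variable values into full samples x^t,
    compute the weights Ψ^t sequentially, and update Z and Q."""
    weights, samples = [], []
    for t in sorted(grouped):
        x = dict(grouped[t])                  # x^t assembled from x_1^t .. x_m^t
        w = weight_fn(x) / prob_under_q(x, q_i)   # Ψ^t
        samples.append(x)
        weights.append(w)
    z = sum(weights) / len(weights)
    # Qi+1[Xj = 1]: weighted fraction of samples with Xj = 1.
    q_next = {v: sum(w for x, w in zip(samples, weights) if x[v] == 1) / sum(weights)
              for v in q_i}
    return z, q_next

def run_dsm_job(w_cutset, q_i, n_samples, weight_fn):
    """One MR iteration: per-variable mappers (conceptually parallel),
    shuffle by sample index, then a single reducer."""
    grouped = defaultdict(list)
    for variable in w_cutset:
        for t, record in dsm_map(variable, q_i, n_samples):
            grouped[t].append(record)         # shuffle & sort by key t
    return dsm_reduce(grouped, q_i, weight_fn)

# Toy usage: weight_fn stands in for exact VE on the rest of the network.
z, q1 = run_dsm_job(["A", "B"], {"A": 0.5, "B": 0.5}, 1000,
                    weight_fn=lambda x: 2.0 if x["A"] == x["B"] else 1.0)

Note that the single reducer is the sequential bottleneck here, since all weight calculations happen inside it; DWCM below moves that cost into the mappers.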
Distributed Weight Calculation in Mappers (DWCM)
Input to the i-th MR job: Xw and the list of samples x^1, ..., x^n (generated sequentially from Qi).

• Each sample x^t, together with Qi[Xw = x^t] and Z⁻¹, is assigned to one mapper (Map 1 ... Map n).
• Mapper t calculates the weight Ψ^t and emits (x^t, Ψ^t) under a common key wv.
• Shuffle and sort aggregate the weighted samples by key.
• The single reducer updates Z and Qi to Qi+1 from the weighted samples.
• Output: Qi+1[Xw = x^t] for each sample and the updated Z⁻¹, which feed the next MR job.
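
A matching Python sketch of DWCM, where sampling happens sequentially up front and the expensive weight computation is mapped over the samples (again illustrative; weight_fn stands in for the exact VE call):

import random
from collections import defaultdict

def prob_under_q(x, q_i):
    """Proposal probability Qi(x) of a full w-cutset assignment x."""
    p = 1.0
    for v, val in x.items():
        p *= q_i[v] if val == 1 else 1.0 - q_i[v]
    return p

def dwcm_map(sample_index, x, q_i, weight_fn):
    """Mapper for one sample x^t: compute its weight Ψ^t (the expensive VE step)
    and emit it under a common key so the reducer sees all weights together."""
    psi = weight_fn(x) / prob_under_q(x, q_i)
    yield "wv", (sample_index, x, psi)

def dwcm_reduce(records):
    """Single reducer: update Z and the proposal Q from the weighted samples."""
    weights = [psi for _, _, psi in records]
    z = sum(weights) / len(weights)
    q_next = {v: 0.0 for v in records[0][1]}
    for _, x, psi in records:
        for v, val in x.items():
            if val == 1:
                q_next[v] += psi
    total = sum(weights)
    q_next = {v: w / total for v, w in q_next.items()}
    return z, q_next

def run_dwcm_job(samples, q_i, weight_fn):
    """One MR iteration: per-sample mappers (conceptually parallel),
    shuffle by key, then a single reducer."""
    grouped = defaultdict(list)
    for t, x in enumerate(samples):
        for key, record in dwcm_map(t, x, q_i, weight_fn):
            grouped[key].append(record)
    return dwcm_reduce(grouped["wv"])

# Toy usage: samples are drawn sequentially from Qi before the job is launched.
q0 = {"A": 0.5, "B": 0.5}
samples = [{v: (1 if random.random() < q0[v] else 0) for v in q0} for _ in range(1000)]
z, q1 = run_dwcm_job(samples, q0, weight_fn=lambda x: 2.0 if x["A"] == x["B"] else 1.0)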
Setup
• Performance metrics:
  • Speedup = Tsq / Td
    • Tsq = execution time of the sequential approach.
    • Td = execution time of the distributed approach.
  • Scaleup = Ts / Tp
    • Ts = execution time using a single mapper.
    • Tp = execution time using multiple mappers.
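
For example (with hypothetical timings, not results from the paper): if the sequential run takes 600 s and the distributed run takes 150 s, Speedup = 600 / 150 = 4; if a single-mapper run takes 400 s and a multi-mapper run takes 80 s, Scaleup = 400 / 80 = 5.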
• Hadoop version 1.2.1.
• 8 data nodes, 1 name node.
• Each machine has a 2.2 GHz processor and 4 GB of RAM.

Benchmark networks:

  Network       | Number of Nodes | Number of Factors
  --------------|-----------------|------------------
  54.wcsp [1]   | 67              | 271
  29.wcsp [1]   | 82              | 462
  404.wcsp [1]  | 100             | 710

[1] “The probabilistic inference challenge (PIC2011),” http://www.cs.huji.ac.il/project/PASCAL/showNet.php, 2011, last updated on 10.23.2014.
Speedup
[Speedup results plot omitted.]
Scaleup
[Scaleup results plot omitted.]
Discussion
• Both approaches achieve substantial speedup and scaleup compared with sequential execution.
• DWCM has better speedup and scalability than DSM.
  • Weight calculation is computationally more expensive than sample generation.
  • DWCM calculates the weights in parallel, so it outperforms DSM.
• Asymptotically, both approaches show accuracy similar to that of sequential execution.
Questions?