Distributed Adaptive Importance Sampling on
Graphical Models using MapReduce
Ahsanul Haque*, Swarup Chandra*, Latifur Khan* and Charu Aggarwal+
* Department of Computer Science, University of Texas at Dallas
+ IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
This material is based upon work supported by
Agenda
Brief overview of inference techniques
Problem
Proposed Approaches
Experiments
Discussion
Graphical Models
A probabilistic graphical model G is a collection of functions over a set of random variables.
It is generally represented as a network of nodes and edges:
Each node denotes a random variable (e.g., a data feature).
Each edge denotes a relationship between two random variables.
There are two types of representations:
A Bayesian network is represented by a directed graph.
A Markov network is represented by an undirected graph.
Example Graphical Model
[Figure: a graphical model over variables A–F, with factors φ(A,B), φ(A,C), φ(B,D), φ(C,D), φ(C,E), φ(D,F), φ(E,F) on its edges]

Sample factor φ(A,C):

A   C   φ(A,C)
0   0   5
0   1   100
1   0   15
1   1   20
Inference is needed to evaluate queries such as Probability of Evidence, Prior and Posterior Marginals, Most Probable Explanation (MPE), and Maximum a Posteriori (MAP).
Probability of Evidence, in particular, needs to be evaluated in classification problems.
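The sample factor above can be stored directly as a lookup table. A minimal Python sketch (the dict layout and the factor_value helper are illustrative choices, not from the paper):

```python
# Factor phi(A, C) from the slide, stored as a mapping from
# assignments (a, c) to potential values.
phi_AC = {
    (0, 0): 5.0,
    (0, 1): 100.0,
    (1, 0): 15.0,
    (1, 1): 20.0,
}

def factor_value(factor, assignment, scope):
    """Look up a factor's value for a full assignment given as a dict."""
    return factor[tuple(assignment[v] for v in scope)]

print(factor_value(phi_AC, {"A": 0, "C": 1}, ("A", "C")))  # 100.0
```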
Exact Inference
Exact inference algorithms, e.g., Variable Elimination, compute exact answers for Probability of Evidence queries (a minimal sketch of the core VE operations appears below).
Challenges:
Exponential time and space complexity.
Computationally intractable on large graphs.
Approximate inference algorithms are therefore widely used in practice to evaluate queries within resource limits.
Sampling based, e.g., Gibbs Sampling and Importance Sampling.
Propagation based, e.g., Iterative Join Graph Propagation.
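Here is the promised sketch of the two operations Variable Elimination is built from, factor product and summing out a variable, using the table representation shown earlier; function names are illustrative, the sketch assumes binary variables, and the values of φ(C,E) are made up:

```python
from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors, each given as (scope, table).
    Assumes all variables are binary (domain {0, 1})."""
    (s1, t1), (s2, t2) = f1, f2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = t1[tuple(a[v] for v in s1)] * t2[tuple(a[v] for v in s2)]
    return scope, table

def sum_out(f, var):
    """Eliminate var from factor f = (scope, table) by summation."""
    scope, table = f
    i = scope.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], out

# Two factors from the example model; eliminate C from their product.
phi_AC = (("A", "C"), {(0, 0): 5.0, (0, 1): 100.0, (1, 0): 15.0, (1, 1): 20.0})
phi_CE = (("C", "E"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})
print(sum_out(multiply(phi_AC, phi_CE), "C"))  # unnormalized factor over (A, E)
```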
Adaptive Importance Sampling (AIS)
Adaptive Importance Sampling (AIS) is an approximate inference algorithm in which:
Samples are generated from a known distribution Q, called the proposal distribution.
Q is updated periodically based on the sample weights.
The probability of evidence is evaluated from the generated samples by computing the following expected value with respect to Q:
P(E = e) = E_Q[ P(R = r, E = e) / Q(R = r) ],  where R = X \ E.
Weighting each sample reduces the variance in the expected value caused by rarely occurring events.
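A minimal sketch of this estimator in Python, using a one-variable toy model in place of P(R = r, E = e) and a uniform proposal Q (all names and numbers are illustrative):

```python
import random

def estimate_evidence(p_joint, q_prob, q_sample, n=100000):
    """Importance-sampling estimate of P(E = e):
    the average of P(R = r, E = e) / Q(R = r) over samples r ~ Q."""
    total = 0.0
    for _ in range(n):
        r = q_sample()                      # draw r from the proposal Q
        total += p_joint(r) / q_prob(r)     # weight of this sample
    return total / n

# Toy model: R is a single binary variable and Q is uniform over {0, 1}.
# P(R = r, E = e) is given by the table below, so the true P(E = e) = 0.40.
p_table = {0: 0.12, 1: 0.28}
est = estimate_evidence(lambda r: p_table[r],
                        lambda r: 0.5,
                        lambda: random.randint(0, 1))
print(est)  # close to 0.40
```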
RB-AIS
We focus on a special type of AIS in this paper, called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS).
In RB-AIS, only a subset of variables Xw ⊂ X \ Xe (called the w-cutset variables) is sampled.
Xw is chosen in such a way that exact inference over X \ (Xw ∪ Xe) is tractable.
A large |Xw| results in quicker query evaluation but a more erroneous result.
A small |Xw| results in a more accurate result but takes more time.
Trade-off!
V. Gogate and R. Dechter, "Approximate inference algorithms for hybrid Bayesian networks with discrete constraints," in UAI. AUAI Press, 2005, pp. 209–216.
RB-AIS: Steps

Start → Initialize Q on Xw → Generate samples → Calculate sample weights → Update Q and Z → Converged? If no, generate more samples; if yes, end.

Weight of the t-th sample:  Ψ^t = VE(G_e^w) / Q(X = x^t)

Proposal update:  Q(X = x) = ( Σ_{t=1..n} δ_{x^t}(X = x) Ψ^t ) / ( Σ_{t=1..n} Ψ^t )
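To make the loop concrete, here is a schematic Python sketch for a single binary cutset variable; ve_weight stands in for the VE(G_e^w) computation, and a fixed iteration count replaces the convergence test:

```python
import random

def rb_ais(ve_weight, iterations=20, n=1000):
    """Schematic RB-AIS loop for one binary cutset variable.
    ve_weight(x) stands in for VE(G_e^w), the exact-inference value
    of the rest of the network given cutset assignment x."""
    q = {0: 0.5, 1: 0.5}                    # initial proposal Q on Xw
    z = 0.0
    for _ in range(iterations):
        # Generate samples from the current proposal Q.
        xs = [0 if random.random() < q[0] else 1 for _ in range(n)]
        # Calculate sample weights: psi_t = VE(G_e^w) / Q(x_t).
        psi = [ve_weight(x) / q[x] for x in xs]
        # Update Z and Q:  Q(x) = sum_t delta_{x_t}(x) psi_t / sum_t psi_t.
        z = sum(psi) / n
        total = sum(psi)
        q = {v: sum(p for x, p in zip(xs, psi) if x == v) / total
             for v in (0, 1)}
    return z, q

z, q = rb_ais(lambda x: {0: 0.12, 1: 0.28}[x])
print(z, q)  # z near 0.40; q approaches the optimal proposal {0: 0.3, 1: 0.7}
```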
Problem
Real-world applications require good quality results within a time constraint.
Typically, real-world networks are large and complex (i.e., have large treewidth).
For instance, modeling Facebook users with a graphical model would require billions of nodes!
Even RB-AIS may fail to provide a quality estimate within the time limit.
For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network with only 67 nodes and 271 factors.
Challenges
To design a parallel and distributed approach for RB-AIS, the following challenges need to be addressed:
RB-AIS updates Q periodically. Since the values of Q and Z at iteration i depend on their values at iteration i - 1, a proper synchronization mechanism is needed.
The task of generating samples on Xw must be distributed over the worker nodes.
Proposed Approaches
We design and implement two MapReduce-based approaches for the distributed and parallel computation of inference queries using RB-AIS:
Distributed Sampling in Mappers (DSM):
Parallel sampling.
Sequential weight calculation.
Distributed Weight Calculation in Mappers (DWCM):
Sequential sampling.
Parallel weight calculation.
Distributed Sampling in Mappers (DSM)
Input to the i-th MR job: Xw, Q_i (together with the current Z^-1)

Map phase: mapper j (one mapper per cutset variable X_j, j = 1, …, m) draws n values x_j^1, …, x_j^n of X_j from Q_i[X_j] and, for each sample index s, emits the key-value pair (s, (X_j, x_j^s, Q_i[X_j])).

Shuffle and sort: values are aggregated by sample index s, bringing (X_1, x_1^s, Q[X_1]), (X_2, x_2^s, Q[X_2]), …, (X_m, x_m^s, Q[X_m]) together.

Reduce phase:
Combine x_1^s, x_2^s, …, x_m^s to form the full sample x^s, for s = 1, 2, …, n.
Calculate Ψ^1, Ψ^2, …, Ψ^n.
Update Z, and Q_i to Q_{i+1}.

Output: Q_{i+1}[X_1], …, Q_{i+1}[X_m] and the updated Z^-1. A simplified local sketch follows.
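This is a plain-Python simulation of the DSM job structure, a sketch only: the function names mimic the map/reduce roles (no Hadoop API is used), and the constant weight function is a placeholder for the actual VE-based weight calculation:

```python
import random
from collections import defaultdict

def dsm_map(var, q_var, n):
    """Mapper for one cutset variable: draw n values from Q_i[var] and
    emit (sample index, (variable, sampled value)) pairs."""
    return [(s, (var, 0 if random.random() < q_var[0] else 1))
            for s in range(n)]

def dsm_reduce(emitted, weight_fn, n):
    """Reducer: join per-variable values by sample index into full
    samples, then compute their weights (used to update Z and Q_i)."""
    samples = defaultdict(dict)
    for s, (var, val) in emitted:           # shuffle/sort: group by index s
        samples[s][var] = val
    weights = [weight_fn(samples[s]) for s in range(n)]
    return samples, weights

# Two cutset variables with uniform proposals; constant placeholder weight.
q = {"X1": {0: 0.5, 1: 0.5}, "X2": {0: 0.5, 1: 0.5}}
emitted = [kv for var in q for kv in dsm_map(var, q[var], n=4)]
samples, weights = dsm_reduce(emitted, lambda x: 1.0, n=4)
print(dict(samples[0]), weights)
```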
Distributed Weight Calculation in Mappers (DWCM)

Input to the i-th MR job: Xw, List[x] (the samples x^1, …, x^n, generated sequentially)

Map phase: mapper t (one mapper per sample x^t, t = 1, …, n) computes the weight Ψ^t of its sample from Q_i[Xw = x^t] and Z^-1, and emits the key-value pair (x^t, Ψ^t).

Shuffle and sort: the weighted samples (x^1, Ψ^1), (x^2, Ψ^2), …, (x^n, Ψ^n) are aggregated by key.

Reduce phase: update Z, and Q_i to Q_{i+1}.

Output: Q_{i+1}[Xw = x^1], …, Q_{i+1}[Xw = x^n] and the updated Z^-1. A matching local sketch follows.
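Again a simulation rather than a Hadoop job, with stand-in functions for Q(x) and the VE-based weight (names and numbers are illustrative):

```python
def dwcm_map(sample, q_prob, ve_weight):
    """Mapper for one sample x_t: compute psi_t = VE(G_e^w) / Q(x_t)."""
    return sample, ve_weight(sample) / q_prob(sample)

def dwcm_reduce(pairs):
    """Reducer: aggregate the weights to update Z and Q_i -> Q_{i+1}."""
    total = sum(psi for _, psi in pairs)
    z = total / len(pairs)                  # estimate of P(E = e)
    q_next = {}
    for x, psi in pairs:
        q_next[x] = q_next.get(x, 0.0) + psi / total
    return z, q_next

# Sequentially generated cutset samples; uniform Q; toy VE values.
xs = [0, 1, 1, 0, 1]
q_prob = lambda x: 0.5
ve = lambda x: {0: 0.12, 1: 0.28}[x]
pairs = [dwcm_map(x, q_prob, ve) for x in xs]
print(dwcm_reduce(pairs))
```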
Setup
Performance metrics:
Speedup = T_sq / T_d
T_sq = execution time of the sequential approach.
T_d = execution time of the distributed approach.
Scaleup = T_s / T_p
T_s = execution time using a single mapper.
T_p = execution time using multiple mappers.
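As a trivial worked example of these ratios (the timings are hypothetical):

```python
def speedup(t_seq, t_dist):
    """Speedup = Tsq / Td."""
    return t_seq / t_dist

def scaleup(t_single, t_multi):
    """Scaleup = Ts / Tp."""
    return t_single / t_multi

# Hypothetical timings in seconds, for illustration only.
print(speedup(t_seq=3600, t_dist=600))      # 6.0
print(scaleup(t_single=3600, t_multi=600))  # 6.0
```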
Hadoop version 1.2.1.
Cluster: 8 data nodes and 1 name node; each machine has a 2.2 GHz processor and 4 GB of RAM.

Benchmark networks [1]:

Network     Number of nodes   Number of factors
54.wcsp     67                271
29.wcsp     82                462
404.wcsp    100               710

[1] "The probabilistic inference challenge (PIC2011)," http://www.cs.huji.ac.il/project/PASCAL/showNet.php, 2011, last updated on 10.23.2014.
Speedup
[Plots: speedup of DSM and DWCM on the benchmark networks]
Scaleup
[Plots: scaleup of DSM and DWCM on the benchmark networks]
Discussion
Both approaches achieve substantial speedup and scaleup compared with the sequential execution.
DWCM has better speedup and scalability than DSM:
Weight calculation is computationally more expensive than sample generation.
DWCM parallelizes the weight calculation, so it outperforms DSM.
Asymptotically, both approaches reach accuracy similar to that of the sequential execution.
Questions?