Particle Filtering in Network Tomography
Mark Coates
McGill University
Brain mapping (opening it up can disturb the system)
Network mapping (opening it up can disturb the system)
Brain Tomography

[Diagram: an unknown object is observed through a statistical projection model governed by counting physics (Poisson measurements); prior knowledge enters through an MRF model; the maximum likelihood estimate is formed by maximizing the likelihood of the data.]
Link-level Network Tomography

[Diagram: an unknown object is observed through a routing & statistical counting model governed by queuing behaviour (binomial/multinomial measurements); the maximum likelihood estimate is formed by maximizing the likelihood of the data.]
Likelihood Formulation

$$y = A\theta + \epsilon$$

$A$ = routing matrix (graph)
$\theta$ = packet loss probabilities or queuing delays for each link
$y$ = packet losses or delays measured at the edge
$\epsilon$ = randomness inherent in traffic measurements

The statistical likelihood function:

$$l(A, \theta) = f(y \mid A, \theta)$$
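To make the formulation concrete, here is a minimal Python sketch; the tree topology, link values, and noise scale are assumed for illustration only:

```python
import numpy as np

# Hypothetical 3-link tree: link 1 from the sender to an internal
# node, links 2 and 3 from that node to two receivers.
# Row r of A lists the links traversed by path r.
A = np.array([[1, 1, 0],   # path to receiver 1: links 1, 2
              [1, 0, 1]])  # path to receiver 2: links 1, 3

theta = np.array([2.0, 5.0, 1.0])  # per-link mean delays (assumed units)

rng = np.random.default_rng(0)
eps = rng.normal(scale=0.5, size=2)  # measurement randomness

y = A @ theta + eps  # end-to-end delays observed at the edge
print(y)
```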
Classical Problem

Solve the linear system

$$y = A\theta + \epsilon$$

Interesting if $A$, $\theta$, or $\epsilon$ have special structures.

Maximize the likelihood function

$$l(\theta) = f(y \mid A, \theta)$$

or:

$$l(A, \theta) = f(y \mid A, \theta)$$
Network Tomography: The Basic Idea

[Figure: probes travel from a sender through the network tree to multiple receivers.]
Loss Rate Network Tomography

Measure end-to-end losses of packets: record '0' for a loss and '1' for a success on each path.

Identifiability problem: end-to-end measurements alone cannot isolate where losses occur!
Multicast or Packet-Pair Measurement

[Figure: a back-to-back packet pair, packet(1) and packet(2), traverses the network amid cross-traffic; the delay of each packet is measured at the edge.]

packet(1) and packet(2) experience (nearly) identical losses and/or delays on shared links.
Loss Rate Estimation

Measure end-to-end losses of packet-pairs. The two packets experience the same fate on the shared link 1.

Possible outcomes ('0' = loss, '1' = success):

(link 2, link 3) ∈ { (0,0), (0,1), (1,0), (1,1) }
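A short sketch of why these outcomes resolve the identifiability problem; the per-link success probabilities a1, a2, a3 are assumed values for the two-receiver tree:

```python
# Outcome probabilities for a multicast packet-pair on the
# two-receiver tree: link 1 is shared, links 2 and 3 lead to
# receivers 1 and 2.  '0' = loss, '1' = success.
a1, a2, a3 = 0.95, 0.90, 0.80   # assumed per-link success probabilities

p11 = a1 * a2 * a3                          # both packets arrive
p10 = a1 * a2 * (1 - a3)                    # only receiver 1 sees its packet
p01 = a1 * (1 - a2) * a3                    # only receiver 2 sees its packet
p00 = (1 - a1) + a1 * (1 - a2) * (1 - a3)   # neither arrives

assert abs(p11 + p10 + p01 + p00 - 1.0) < 1e-12
# The (1,0) and (0,1) outcomes implicate links 2 and 3 specifically,
# which is what makes the per-link rates identifiable.
print(p11, p10, p01, p00)
```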
Modelling Time Variations

[Figure: cross-traffic entering and leaving the network.]

• Nonstationary cross-traffic induces time-variation.
• Directly model the dynamics (but maybe not the traffic!).
• Goal is to perform online tracking (and prediction) of network link characteristics.
Non-stationary behaviour

Introduce time-dependence in the parameters:

$$y_t = A_t \theta_t + \epsilon_t$$

Filtering exercise (track $\theta_t$):
(1) Describe the dynamic behaviour of $\theta_t$.
(2) Form the MMSE estimate:

$$\hat{\theta}_t = E_{p(\theta_t \mid y_{1:t})}[\theta_t]$$
Particle Filtering

Objective: estimate expectations

$$\int h(\theta_{0:t})\, \pi_t(d\theta_{0:t})$$

with respect to a sequence of distributions $(\pi_t)_{t \ge 0}$ known up to a normalizing constant, i.e.

$$\pi_t(d\theta_{0:t}) \propto \pi_t(\theta_{0:t})\, d\theta_{0:t}$$

Monte Carlo: obtain $N$ weighted samples $\{\theta_{0:t}^{(i)}, w_t^{(i)}\}_{i=1,\dots,N}$, where $w_t^{(i)} \ge 0$ and $\sum_{i=1}^N w_t^{(i)} = 1$, such that

$$\sum_{i=1}^N w_t^{(i)}\, h\big(\theta_{0:t}^{(i)}\big) \;\xrightarrow[N \to \infty]{}\; \int h(\theta_{0:t})\, \pi_t(d\theta_{0:t})$$
Sequential Monte Carlo Methods

• With $\{\theta_{0:t-1}^{(i)}, w_{t-1}^{(i)}\}_{i=1,\dots,N}$ from $\pi_{t-1}$ in hand, the goal is to obtain $\{\theta_{0:t}^{(i)}, w_t^{(i)}\}_{i=1,\dots,N}$ from $\pi_t$.
• Sequential methods do not repeat work.
• Combine importance sampling, resampling, MCMC.
Importance Sampling (1)

• Cannot sample directly from $\pi_t(\theta_{0:t})$.
• Introduce an importance function (pdf) $q_t(\theta_{0:t})$.
• Ensure the supports match: $\pi_t(\theta_{0:t}) > 0 \Rightarrow q_t(\theta_{0:t}) > 0$.

$$\pi_t(\theta_{0:t}) = \frac{w_t(\theta_{0:t})\, q_t(\theta_{0:t})}{\int w_t(\theta_{0:t})\, q_t(\theta_{0:t})\, d\theta_{0:t}}$$

where the importance weight is

$$w_t(\theta_{0:t}) = \frac{\pi_t(\theta_{0:t})}{q_t(\theta_{0:t})}$$
Importance Sampling (2)

• Sample $\theta_{0:t}^{(i)} \sim q_t$.
• Then

$$\hat{\pi}_t^N(d\theta_{0:t}) = \sum_{i=1}^N w_t^{(i)}\, \delta_{\theta_{0:t}^{(i)}}(d\theta_{0:t})$$

where $w_t^{(i)} \propto w_t\big(\theta_{0:t}^{(i)}\big)$ and $\sum_{i=1}^N w_t^{(i)} = 1$.
Sequential Importance Sampling

• Compute the weights sequentially:

$$w_t(\theta_{0:t}) = \frac{\pi_t(\theta_{0:t})}{q_t(\theta_{0:t})} = w_{t-1}(\theta_{0:t-1})\, \frac{\pi_t(\theta_{0:t})\, q_{t-1}(\theta_{0:t-1})}{q_t(\theta_{0:t})\, \pi_{t-1}(\theta_{0:t-1})}$$

• At time $t$, choose $q_t(\theta_{0:t}) = q_{t-1}(\theta_{0:t-1})\, q_t(\theta_t \mid \theta_{0:t-1})$, so that

$$w_t(\theta_{0:t}) = w_{t-1}(\theta_{0:t-1})\, \frac{\pi_t(\theta_{0:t})}{\pi_{t-1}(\theta_{0:t-1})\, q_t(\theta_t \mid \theta_{0:t-1})}$$
Optimal Filtering

• Evolution of the parameters is described by the function $f(\theta_t \mid \theta_{t-1})$.
• The observation is described by the function $g(y_t \mid \theta_t)$.
• We have $\pi_t(\theta_{0:t}) = p(\theta_{0:t} \mid y_{1:t})$.
• Importance weight update rule:

$$w_t(\theta_{0:t}) \propto w_{t-1}(\theta_{0:t-1})\, \frac{f(\theta_t \mid \theta_{t-1})\, g(y_t \mid \theta_t)}{q_t(\theta_t \mid \theta_{0:t-1}, y_{1:t})}$$
Optimal Filtering Algorithm

• At time $t$: for $i = 1, \dots, N$,
  • Sample $\theta_t^{(i)} \sim q_t\big(\cdot \mid \theta_{0:t-1}^{(i)}, y_{1:t}\big)$.
  • Update the importance weights:

$$w_t^{(i)} \propto w_{t-1}^{(i)}\, \frac{f\big(\theta_t^{(i)} \mid \theta_{t-1}^{(i)}\big)\, g\big(y_t \mid \theta_t^{(i)}\big)}{q_t\big(\theta_t^{(i)} \mid \theta_{0:t-1}^{(i)}, y_{1:t}\big)}$$

• Form an estimate (see the sketch below):

$$\hat{\theta}_t = \sum_{i=1}^N w_t^{(i)}\, \theta_t^{(i)}$$
Key Issues

• Choice of importance function:
  • Make $q_t(\theta_{0:t})$ as close to $\pi_t(\theta_{0:t})$ as possible.
  • Options: prior distribution, optimal distribution, locally optimal distributions, bridging techniques, etc.
  • The choice should attempt to ensure that particles focus on likely regions of the state space.
• Mechanisms to avoid degeneracy (sample impoverishment).
Resampling

• As time goes by, some weights become dominant.
• Many particles are wasted (sample impoverishment).
• The number of effective particles $N_{\text{eff}} \ll N$. Estimate:

$$N_{\text{eff}} = \frac{1}{\sum_i \big(w_t^{(i)}\big)^2}$$

• Resampling: each particle spawns a number of child particles (copies); see the sketch below.
• The number of children $C^{(i)}$ is related (proportional) to the weight.
• Jitter may be introduced in the children to reduce clustering effects.
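A sketch of the effective-sample-size test together with one common resampling scheme (systematic resampling; the slide does not commit to a specific one). The resampling threshold and jitter scale are assumed:

```python
import numpy as np

def n_eff(w):
    """Effective sample size: 1 / sum_i (w_i)^2 for normalized weights."""
    return 1.0 / np.sum(w ** 2)

def systematic_resample(theta, w, rng):
    """Systematic resampling: particle i spawns C(i) children, with
    C(i) roughly proportional to its weight w(i)."""
    N = len(w)
    positions = (rng.random() + np.arange(N)) / N
    children = np.searchsorted(np.cumsum(w), positions)
    children = np.minimum(children, N - 1)   # guard against float round-off
    return theta[children], np.full(N, 1.0 / N)

# Usage inside the filter loop, after the weight update:
rng = np.random.default_rng(3)
theta = rng.normal(size=1000)
w = rng.random(1000)
w /= w.sum()
if n_eff(w) < 0.5 * len(w):                  # assumed threshold of N/2
    theta, w = systematic_resample(theta, w, rng)
    theta += rng.normal(0.0, 0.01, size=len(theta))  # optional jitter
print(n_eff(w))
```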
Delay Distribution Tracking

• Time-varying delay distribution of window size $R$ at time $m$: $T_{m,R}(k)$, over discrete delay units $k$.
• In each window, $R$ probe measurements.
• Form estimates of average delay and jitter over short time intervals.

[Figure: delay distribution (probability vs. delay units) evolving over time.]
Optimal Filtering

• Evolution of the parameters is described by a dynamic model $f(\alpha_m, \mu_m \mid \alpha_{m-1}, \mu_{m-1})$.
• Observations are described by the function $g(y_m \mid \alpha_m, \mu_m)$.
• Interested in forming an estimate of

$$p(x_{j,m} \mid y_{1:m}) = \int p(x_{j,m} \mid \theta_m, y_m)\, \pi(d\theta_m \mid y_{1:m})$$

where $\theta_m = \{\alpha_m, \mu_m\}$.
• The estimate is:

$$\hat{p}(x_{j,m} \mid y_{1:m}) = \sum_{i=1}^N p\big(x_{j,m} \mid \alpha_m^{(i)}, \mu_m^{(i)}, y_m\big)\, w_m^{(i)}$$
Dynamic model

• Queue/traffic model (sketched below):

$$\log \alpha_{j,m+1} = \log \alpha_{j,m} + N(0, \sigma^2)$$

$\mu_{j,m}$: reflected random walk on $[0, \text{max\_del}]$

$$p_{j,m}(k) \propto \exp\big(-\alpha_{j,m}(k - \mu_{j,m})\big)$$

[Figure: link delay pmf, probability vs. delay units.]
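A sketch of one step of this dynamic model, under assumed values for the walk variances and max_del, and with one plausible reading of the delay pmf as a shifted exponential:

```python
import numpy as np

rng = np.random.default_rng(4)
MAX_DEL = 20          # assumed maximum delay units on a link
SIGMA = 0.1           # assumed std dev of the log-alpha walk
MU_STEP = 0.5         # assumed step size of the mu walk

def propagate(alpha, mu):
    """One step of the dynamics: log-Gaussian random walk on alpha,
    reflected random walk on mu confined to [0, MAX_DEL]."""
    alpha = np.exp(np.log(alpha) + rng.normal(0.0, SIGMA, size=alpha.shape))
    mu = mu + rng.normal(0.0, MU_STEP, size=mu.shape)
    mu = np.abs(mu)                    # reflect at 0
    mu = MAX_DEL - np.abs(MAX_DEL - mu)  # reflect at MAX_DEL
    return alpha, mu

def delay_pmf(alpha, mu, k_max=MAX_DEL):
    """Normalized pmf over delay units 0..k_max: flat up to mu, then
    exponentially decaying (one plausible reading of the slide's model)."""
    k = np.arange(k_max + 1)
    p = np.exp(-alpha * np.clip(k - mu, 0.0, None))
    return p / p.sum()

alpha, mu = np.array([0.5]), np.array([3.0])
alpha, mu = propagate(alpha, mu)
print(delay_pmf(alpha[0], mu[0]))
```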
Observations

• Measurements: $x_{j,m} \sim p_{j,m}(k)$, the delay on link $j$ at time $m$.
• Observe the end-to-end delay of each packet in the pair (see the sketch below):

$$y_{j,m}^{(i)} = \sum_{s \in \text{Path}(j,m)} x_{s,m}, \qquad i = 1, 2$$

[Figure: packet(1) and packet(2) of measurement $m$ yield end-to-end delays $y^{(1)}(m)$ and $y^{(2)}(m)$.]
Tracking Algorithm (Particle Filter)

[Figure: particle-filter flow diagram: the particle set $\{\alpha_{i,m}, \mu_{i,m}\}_{i=1}^{L}$ at time $m$ is obtained from the previous set $\{\alpha_{i,m-1}, \mu_{i,m-1}\}_{i=1}^{L}$.]
Estimation of Delay Distributions

• Sequential Monte Carlo approximation to the posterior mean estimate:

$$\hat{p}_{j,m}(k) = \sum_{i=1}^N p\big(x_{j,m} = k \mid y_m, \alpha_m^{(i)}, \mu_m^{(i)}\big)\, w_m^{(i)}$$

(the conditional probabilities come from a message-passing algorithm; the $w_m^{(i)}$ are the particle weights)

• Estimate of the time-varying delay distribution (see the sketch below):

$$\hat{T}_{m,R}(k) = \frac{1}{R} \sum_{l=m-R+1}^{m} \hat{p}\big(x_{j,l} = k \mid y_{1:l}\big)$$
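The sliding-window average itself is straightforward; a sketch with stand-in per-interval pmf estimates in place of the particle-filter output:

```python
import numpy as np

def windowed_delay_estimate(p_hat, m, R):
    """Sliding-window estimate T_hat_{m,R}(k): the average of the last
    R per-interval posterior delay pmfs p_hat[l] (each an array over k).
    p_hat is assumed indexed 0..m."""
    window = p_hat[m - R + 1 : m + 1]
    return np.mean(window, axis=0)

# Usage with assumed per-interval pmf estimates over 5 delay units:
rng = np.random.default_rng(6)
p_hat = rng.dirichlet(np.ones(5), size=100)  # stand-in for PF output
print(windowed_delay_estimate(p_hat, m=99, R=10))
```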
Analysis

• Complexity: $O(L K^2 N)$, where $L$ is the average number of unique links per measurement, $K$ is the maximum number of delay units per link, and $N$ is the number of particles.
• The convergence analysis of [Crisan, Doucet 01] applies.
• The approximation to the posterior mean estimate converges to the true estimate as $N \to \infty$.
Simulation Results

[Figure: tracked vs. true delay distributions, and mean delay over time.]
Tracking shadow prices

• Explicit congestion notification (ECN) pricing mechanisms.
• A price variable is maintained at each queue in the network.
• It is related to congestion, but is not a specific performance measure (such as loss rate or queuing delay).
• REM (random exponential marking), sketched below: with price $p$, marking probability $m$, total link traffic $y$, target queue length $b^*$, measured queue length $b$, and link capacity $c$:

$$p_{t+1} = \Big[\, p_t + \gamma \big( \alpha (b_t - b^*) + y_t - c \big) \Big]^+$$

$$m_t = 1 - \phi^{-p_t}$$
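A minimal sketch of one REM step; the step size, queue weight, base φ, target, and capacity values are all assumed:

```python
# A sketch of the REM price update and marking probability
# (parameter values are assumed, not from the talk).
GAMMA, ALPHA, PHI = 0.01, 0.1, 1.1   # step size, queue weight, REM base
B_STAR, CAPACITY = 50.0, 100.0       # target queue length, link capacity

def rem_update(p, b, y):
    """One REM step: the price rises when the queue exceeds its target
    or arriving traffic y exceeds capacity; [.]^+ keeps the price >= 0."""
    p = max(0.0, p + GAMMA * (ALPHA * (b - B_STAR) + y - CAPACITY))
    m = 1.0 - PHI ** (-p)            # marking probability
    return p, m

p = 0.0
for b, y in [(60.0, 110.0), (55.0, 105.0), (48.0, 95.0)]:
    p, m = rem_update(p, b, y)
    print(f"price={p:.4f}  marking prob={m:.4f}")
```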
Tracking shadow prices (2)

• Observations (relatively easy to collect!). For one path:
  • $n_t$: total traffic along a path, defined by a row of the routing matrix $A$, during time period $t$.
  • $x_t$: marked packets along the same path.

$$x_t \sim \text{Bi}(n_t, q_t), \qquad q_t = 1 - \phi^{-(A p_t)}$$

where $A p_t$ sums the link prices along the path; a sketch follows.
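A sketch of how this observation model would weight price particles in a filter; the path, base φ, and particle prior are assumed, and scipy's binomial pmf stands in for g:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(7)
PHI = 1.1                 # assumed REM base

# Hypothetical path crossing 3 links, with a particle cloud over
# the link price vector p_t.
a = np.array([1, 1, 1])                         # row of the routing matrix A
particles = rng.gamma(2.0, 1.0, size=(500, 3))  # assumed price particles
w = np.full(500, 1.0 / 500)

# Path marking probability per particle: q_t = 1 - phi^{-(a . p_t)}
q = 1.0 - PHI ** (-(particles @ a))

# Observation: x_t marked packets out of n_t along the path.
n_t, x_t = 200, 35
w *= binom.pmf(x_t, n_t, q)              # g(x_t | p_t) weight update
w /= w.sum()

print("posterior mean link prices:", (w[:, None] * particles).sum(axis=0))
```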
Summary

Why Dynamic Models / Particle Filtering?

• Dynamic models allow us to account for non-stationarity,
  but it is difficult to generate and incorporate dynamic models derived from realistic traffic models.
• Particle filtering is appropriate only when analytical techniques fail:
  non-Gaussian or non-linear dynamics or observations.
• The sequential structure allows on-line implementation;
  care must be taken to reduce computation at each step.
• Convergence and optimality results are available,
  provided the particle filters satisfy fairly mild constraints.