Sensor Fusion
Rosalyn Moran
Short Course on Bayesian Inference,
Virginia Tech, 26th January 2012
Outline
Bayes Rule for Gaussians
Sensory Integration
Perception as Statistical Inference
Updating over time
A Brief Introduction to the Free Energy Principle
Gaussian Probability Density
new data: likelihood p(y | θ)
prior knowledge: p(θ)

p(θ | y) ∝ p(y | θ) p(θ)
posterior ∝ likelihood ∙ prior
Bayes theorem allows one to formally
incorporate prior knowledge into
computing statistical probabilities.
The “posterior” probability of the
parameters given the data is an optimal
combination of prior knowledge and new
data, weighted by their relative precision.
Bayes Rule for Gaussians
Parametric Techniques:
Posit a specific functional form for the probability density function.
The central limit theorem states that (under general circumstances) the mean of M random variables tends to be normally distributed in the limit as M tends to infinity.
Form specified in terms of adjustable parameters
The mean μ and variance σ²: x ∼ N(μ, σ²) (univariate case)

p(x) = 1/(2πσ²)^(1/2) · exp( −(x − μ)² / (2σ²) )
Bayes Rule for Gaussians
Parametric Techniques:
Posit a specific functional form for the probability density function.
Form specified in terms of adjustable parameters
The mean μ (vector) and covariance Σ (matrix): X ∼ N(μ, Σ), x = [x1 … xn]T (multivariate case)

Σ = E[(X − μ)(X − μ)T]

p(x) = (2π)^(−n/2) |Σ|^(−1/2) · exp( −½ (x − μ)T Σ⁻¹ (x − μ) )
Bayes Rule for Gaussians
Parametric Techniques:
Posit a specific functional form for the probability density function.
Form specified in terms of adjustable parameters: the mean μ (vector) and covariance Σ (matrix).

Multivariate case: X ∼ N(μ, Σ), x = [x1 … xn]T, Σ = E[(X − μ)(X − μ)T]

p(x) = (2π)^(−n/2) |Σ|^(−1/2) · exp( −½ (x − μ)T Σ⁻¹ (x − μ) )

[Figure: coherence matrix (normalised covariance) displayed as an image plot with colour scale from −1 to 1.]
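As a quick numerical check of the multivariate density above, a minimal Python sketch (the function name mvn_pdf and the example numbers are illustrative, not from the slides):

import numpy as np

# evaluate p(x) = (2*pi)^(-n/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))
def mvn_pdf(x, mu, Sigma):
    n = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (-n / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])      # covariance E[(X-mu)(X-mu)^T]
print(mvn_pdf(np.array([0.2, 0.8]), mu, Sigma))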
Bayes Rule for Gaussians
Theorem
The Product of two Gaussian PDFs is also a Gaussian
[Figure: two Gaussian densities P(x1) and P(x2) over x, and their product P(x3), which is also Gaussian.]
Bayes Rule for Gaussians
Eg. Combining Multi Subject Probabilities via
Bayesian Parameter Averaging for Fixed Effects Analysis of DCM Parameters
Assuming the likelihood distributions from different subjects are independent, the group posterior covariance and mean are built from the individual posterior covariances and means:

Cθ|y1,…,yN = ( Σ i=1..N Cθ|yi⁻¹ )⁻¹

μθ|y1,…,yN = Cθ|y1,…,yN ( Σ i=1..N Cθ|yi⁻¹ μθ|yi )
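A minimal Python sketch of this fixed-effects averaging, assuming each subject's posterior is summarised by a mean vector and covariance matrix (the function name and the example numbers are illustrative):

import numpy as np

def bayesian_parameter_average(means, covs):
    # sum the per-subject posterior precisions, then precision-weight the means
    precisions = [np.linalg.inv(C) for C in covs]
    group_prec = sum(precisions)
    group_cov = np.linalg.inv(group_prec)
    group_mean = group_cov @ sum(P @ m for P, m in zip(precisions, means))
    return group_mean, group_cov

# two hypothetical subjects' posteriors over two DCM parameters
m1, C1 = np.array([0.2, 1.0]), np.diag([0.05, 0.20])
m2, C2 = np.array([0.4, 0.8]), np.diag([0.10, 0.10])
print(bayesian_parameter_average([m1, m2], [C1, C2]))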
Bayes Rule for Gaussians
Theorem
The Product of two Gaussian PDFs is also a Gaussian
[Figure: two Gaussian densities P(x1) and P(x2) and their Gaussian product P(x3).]
When a prior and likelihood combine to give a posterior in the same family, this property is called conjugacy.
Bayes Rule for Gaussians
Moments of Posterior can be found analytically:
Precision is inverse variance eg. a variance of 0.1 is a
precision of 10.
For a Gaussian prior with mean μ0 and precision λ0, and a Gaussian likelihood with mean μD and precision λD, the posterior is Gaussian with

λ = λ0 + λD

μ = (λ0 / λ) μ0 + (λD / λ) μD
So, (1) precisions add and (2) the posterior mean is the
sum of the prior and data means, but each weighted by
their relative precision.
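For example, in Python (with made-up numbers):

# Gaussian prior N(mu0, 1/lam0) combined with Gaussian likelihood N(muD, 1/lamD)
mu0, var0 = 0.0, 0.1      # prior mean and variance  -> precision 10
muD, varD = 1.0, 0.05     # data mean and variance   -> precision 20

lam0, lamD = 1 / var0, 1 / varD
lam = lam0 + lamD                              # precisions add
mu = (lam0 / lam) * mu0 + (lamD / lam) * muD   # precision-weighted mean
print(lam, mu)   # posterior precision 30, posterior mean ~0.667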
Bayes Rule for Gaussians
So, (1) precisions add and (2) the posterior mean is the
sum of the prior and data means, but each weighted by
their relative precision.
Since:

p(x) = 1/(2πσ0²)^(1/2) · exp( −(x − μ0)² / (2σ0²) )

p(D | x) = 1/(2πσD²)^(1/2) · exp( −(x − μD)² / (2σD²) )

p(x | D) = p(D | x) p(x) / p(D)
Bayes Rule for Gaussians
So, (1) precisions add and (2) the posterior mean is the
sum of the prior and data means, but each weighted by
their relative precision.
Taking logs:

log p(x | D) = log p(D | x) + log p(x) − log p(D)

log p(x | D) = −(1/(2σD²)) (x² + μD² − 2xμD) − (1/(2σ0²)) (x² + μ0² − 2xμ0) + …
Bayes Rule for Gaussians
So, (1) precisions add and (2) the posterior mean is the
sum of the prior and data means, but each weighted by
their relative precision.
Some algebra:

log p(x | D) = log p(D | x) + log p(x) − log p(D)

log p(x | D) = −(1/(2σD²)) (x² + μD² − 2xμD) − (1/(2σ0²)) (x² + μ0² − 2xμ0) + …

= −(x²/2) (1/σ0² + 1/σD²) + x (μ0/σ0² + μD/σD²) − ( μ0²/(2σ0²) + μD²/(2σD²) ) + …

= −(1/(2σ²)) (x − μ)² + …
Bayes Rule for Gaussians
So, (1) precisions add and (2) the posterior mean is the
sum of the prior and data means, but each weighted by
their relative precision.
Matching moments:

−(x²/2) (1/σ0² + 1/σD²) + x (μ0/σ0² + μD/σD²) + … = −(1/(2σ²)) (x − μ)² + …

Matching coefficients in x²:

1/σ² = 1/σ0² + 1/σD²,   i.e.   λ = λ0 + λD
Bayes Rule for Gaussians
So, (1) precisions add and (2) the posterior mean is the
sum of the prior and data means, but each weighted by
their relative precision.
Matching moments:

−(x²/2) (1/σ0² + 1/σD²) + x (μ0/σ0² + μD/σD²) + … = −(1/(2σ²)) (x − μ)² + …

Matching coefficients in x:

μ/σ² = μ0/σ0² + μD/σD²

μ = σ² (μ0/σ0² + μD/σD²) = (σD² μ0 + σ0² μD) / (σ0² + σD²)

i.e.   μ = (λ0 / λ) μ0 + (λD / λ) μD
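The matching of coefficients can also be checked symbolically; a short sketch using sympy (assuming sympy is available; symbol names are illustrative):

import sympy as sp

x = sp.symbols('x', real=True)
mu0, muD, s0, sD = sp.symbols('mu0 muD sigma0 sigmaD', positive=True)

# log prior + log likelihood, up to terms not involving x
log_post = -(x - muD)**2 / (2*sD**2) - (x - mu0)**2 / (2*s0**2)

# posterior precision and mean claimed on the slide
lam = 1/s0**2 + 1/sD**2
mu = (mu0/s0**2 + muD/sD**2) / lam

# the difference should not depend on x (completing the square)
diff = sp.expand(log_post + (lam/2) * (x - mu)**2)
print(sp.simplify(sp.diff(diff, x)))   # prints 0: the x-dependence matches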
Outline
Bayes Rule for Gaussians
Sensory Integration
Perception as Statistical Inference
Updating over time
A Brief Introduction to the Free Energy Principle
Visual and Haptic Sensors
Ernst and Banks (Nature, 2002) asked subjects which of two sequentially presented
blocks was the taller. Subjects used either vision alone, touch alone or a
combination of the two. If vision v and touch h information are independent given
the height x then,
[Figure: graphical model in which the height x generates the visual (v) and touch (t) observations.]

p(v, h, x) = p(v | x) p(h | x) p(x)

Bayesian fusion of the sensory information produces a posterior density

p(x | v, h) = p(v | x) p(h | x) p(x) / p(v, h)
p(x | v, h) ∝ p(v | x) p(h | x) p(x)

Then, under the uniform prior p(x) = const.,

x̂MAP = x̂ML = arg max_x p(x | v, h)
Visual and Haptic Sensors
Fusion through different noisy sensory channels:
Bayes-optimal sensory fusion: the lowest-variance estimate is obtained by adding the sensor estimates weighted by their normalised reciprocal variances, and the inverse variances add to give a more precise posterior estimate:

x̂ = Σi wi ŝi = wv v̂ + wh ĥ,   wi = (1/σi²) / Σj (1/σj²)

σvh² = σv² σh² / (σv² + σh²)
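A minimal Python sketch of this precision-weighted fusion (the estimates and variances are made up):

import numpy as np

def fuse(estimates, variances):
    # combine independent noisy estimates by their normalised inverse variances
    w = 1 / np.asarray(variances)
    w = w / w.sum()
    fused = np.dot(w, estimates)
    fused_var = 1 / np.sum(1 / np.asarray(variances))
    return fused, fused_var

v_hat, v_var = 55.0, 4.0    # visual height estimate and its variance
h_hat, h_var = 53.0, 9.0    # haptic height estimate and its variance
print(fuse([v_hat, h_hat], [v_var, h_var]))   # fused estimate, smaller variance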
Is this true for human agents?
Visual and Haptic Sensors
Testing Fusion through different noisy sensory channels:
1. Measure unimodal thresholds
[Figure: psychometric function (probability judged "taller" vs. ∆Height) for each single modality; the 84% threshold is Tv/h = √2 σv/h, measured relative to the point of subjective equality.]
Visual and Haptic Sensors
Testing Fusion through different noisy sensory channels:
2. Predict bimodal weighting

x̂ = Σi wi ŝi,   wi = (1/σi²) / Σj (1/σj²)

TH² / TV² = σH² / σV² = wv / wh

wv = TH² / (TV² + TH²),   wH = TV² / (TV² + TH²)

[Figure: predicted bimodal psychometric function (probability "taller" vs. ∆Height).]
Visual and Haptic Sensors
Testing Fusion through different noisy sensory channels:
3. Predict bimodal threshold

x̂ = Σi wi ŝi,   wi = (1/σi²) / Σj (1/σj²)

TH² / TV² = σH² / σV² = wv / wh

wv = TH² / (TV² + TH²),   wH = TV² / (TV² + TH²)

TVH² = TV² TH² / (TV² + TH²)

[Figure: predicted bimodal psychometric function (probability "taller" vs. ∆Height).]
Visual and Haptic Sensors
Testing Fusion through different noisy sensory channels:
4. Present Bimodal Stimuli:
Visually and Haptically specified heights differ by ∆
5. Measure TVH
Visual and Haptic Sensors
6. Repeat for Different levels of
Visual Noise
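The predictions in steps 2 and 3 follow directly from the measured unimodal thresholds; a small Python sketch (the threshold values are invented):

import numpy as np

def predict_bimodal(T_V, T_H):
    # predicted weights and bimodal 84% threshold from unimodal thresholds (T proportional to sigma)
    w_v = T_H**2 / (T_V**2 + T_H**2)
    w_h = T_V**2 / (T_V**2 + T_H**2)
    T_VH = np.sqrt(T_V**2 * T_H**2 / (T_V**2 + T_H**2))
    return w_v, w_h, T_VH

for T_V in [2.0, 4.0, 8.0]:          # visual threshold grows with the visual noise level
    print(T_V, predict_bimodal(T_V, T_H=4.0))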
Visual and Haptic Sensors
wv = TH² / (TV² + TH²),   wH = TV² / (TV² + TH²)

TVH² = TV² TH² / (TV² + TH²)

[Figure: measured and predicted bimodal threshold TVH as a function of visual noise level (%).]
Outline
Bayes Rule for Gaussians
Sensory Integration
Perception as Statistical Inference
Updating over time
A Brief Introduction to the Free Energy Principle
Helmholtz
Perception as Inference
In Helmholtz’s view our percepts are our best guess as to
what is in the world, given both sensory data and
prior experience. He proposed that perception is
unconscious inference.
E.g. The Helmholtz Machine,
Dayan, Hinton, Neal & Zemel, 1995
The Free Energy Principle, Friston
Perception as Inference
Laboratory of Dale Purves MD
http://purveslab.net/seeforyourself/
Perception as Inference
Bumps and holes
Bayes rule in perception, action and cognition
Wolpert and Ghahramani, 2005
Perception as Inference
Bumps and holes
Flip the image!
P(state|sensory input)=[P(sensory input|state)P(state)]/P(sensory input)
Perception as Inference
Bumps and holes
P(state|sensory input)=[P(sensory input|state)P(state)]/P(sensory input)
State 1 = light source
State 2 = bumps & holes
Implies hierarchy … see later
Outline
Bayes Rule for Gaussians
Sensory Integration
Perception as Statistical Inference
Updating over time
A Brief Introduction to the Free Energy Principle
Updating Prior Beliefs: Decision Making
Dynamics
Yu, Dayan & Cohen, 2009
In the Eriksen Flanker task subjects have to implement
the following stimulus-response mappings
Stimulus → Response
1: HHH → Right
2: SHS → Right
3: SSS → Left
4: HSH → Left
The subject should press the right button if the central cue is H and left if it is S.
On trial type one and three the flankers are compatible (M = C) and on two
and four they are incompatible (M = I).
Decision Making Dynamics
Generative Model:
Given a noisy pattern of visual inputs comprising the three-letter stimulus s = [s1, s2, s3] (s1 = s3), assume on each trial a Gaussian neuronal response x from three populations:
xt := [x1(t), x2(t), x3(t)]
p(x|s) = p(x1|s1) p(x2|s2) p(x3|s3) = N[μ(s1), σ²] · N[μ(s2), σ²] · N[μ(s3), σ²]
Yu, Dayan & Cohen, 2009
Decision Making Dynamics
p(x|s)= N [μ(s1), σ2 ] .N [μ(s2), σ2 ] .N [μ(s3), σ2 ]
Some numbers on Sensory Response:
Assume that for stimuli:
H: s1 = s3, μ(s1) = μ(s3) = −1
S: s1 = s3, μ(s1) = μ(s3) = +1
Assume independence of successive observations of the stimulus while it is on-screen, i.e. accumulating evidence during perception:
p(xt=1, xt=2, …, xt=T | s) = p(xt=1 | s) p(xt=2 | s) … p(xt=T | s)
[Figure: stimulus si generating independent noisy responses xi at t = 1, 2, 3, … over time.]
Yu, Dayan & Cohen, 2009
Decision Making Dynamics
For a stream of inputs, the ideal observer’s belief about the identity of the target s2 and
compatibility M at time t, is a function of the belief at the previous time point, and
the latest input:
P(s2, M | Xt) = p(xt | s2, M) P(s2, M | Xt−1) / Σ s2′,M′ p(xt | s2′, M′) P(s2′, M′ | Xt−1)

P(s2 = H | Xt) = P(s2 = H, M = C | Xt) + P(s2 = H, M = I | Xt)

Initialise with P(s2, M | X0) = (0.5) × (0.5)
Then the agent will update their beliefs based on sensory inputs and make a response
eg.
when P(s2=H|Xt) > q (some threshold probability)
Yu, Dayan & Cohen, 2009
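A minimal Python sketch of this sequential update under the uniform prior, using the stimulus coding μ(H) = −1, μ(S) = +1 from the earlier slide (the noise level, trial settings and function names are illustrative, not the paper's exact values; numpy and scipy assumed available):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, q = 2.0, 0.9                            # sensory noise, response threshold

def means(s2, M):
    # each joint hypothesis (target identity, compatibility) implies a stimulus triplet
    flank = s2 if M == 'C' else ('S' if s2 == 'H' else 'H')
    mu = {'H': -1.0, 'S': 1.0}
    return np.array([mu[flank], mu[s2], mu[flank]])

hyps = [('H', 'C'), ('H', 'I'), ('S', 'C'), ('S', 'I')]
belief = np.array([0.25, 0.25, 0.25, 0.25])    # P(s2, M | X0) = 0.5 * 0.5

true_mu = means('H', 'I')                      # an incompatible SHS trial
for t in range(200):
    x = true_mu + sigma * rng.standard_normal(3)                      # noisy input x_t
    lik = np.array([norm.pdf(x, means(s2, M), sigma).prod() for s2, M in hyps])
    belief = lik * belief
    belief /= belief.sum()                                            # Bayes update
    p_H = belief[0] + belief[1]                                       # marginalise over M
    if p_H > q or (1 - p_H) > q:
        print('respond at t =', t, 'with P(s2=H|Xt) =', round(p_H, 3))
        break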
Decision Making Dynamics
P(s2, M | Xt) = p(xt | s2, M) P(s2, M | Xt−1) / Σ s2′,M′ p(xt | s2′, M′) P(s2′, M′ | Xt−1)

P(s2 = H | Xt) = P(s2 = H, M = C | Xt) + P(s2 = H, M = I | Xt)

When P(s2 = H | Xt) > q (some threshold):
Make a response
Yu, Dayan & Cohen, 2009
Compatibility Bias Hypothesis
What if there was a compatibility bias? P(M = C) > 0.5 ?
Initialise with P(s2, M | X0) = (0.5) × p(M = C), with p(M = C) = 0.9.
Then, according to the same updates with these new priors:
1: HHH → Right
2: SHS → Right
3: SSS → Left
4: HSH → Left
[Figure: time course of P(s2 = H | Xt) for each trial type.]
Yu, Dayan & Cohen, 2009
Compatibility Bias Hypothesis
Is there a compatibility bias in this task in human agents?
Initialise with P(s2, M | X0) = (0.5) × p(M = C), p(M = C) = 0.9.
[Figure: P(s2 = H | Xt) over time.]
Yu, Dayan & Cohen, 2009
Spatial Uncertainty Hypothesis
What if each x also responded to neighbouring stimuli?
Before:
p(x|s)= N [μ(s1), σ2 ] .N [μ(s2), σ2 ] .N [μ(s3), σ2 ]
Now:
x1(t) ~ N [a1μ1 + a2μ2, σ12 + σ22]
x2(t) ~ N [a1μ2 + a2μ1 + a2μ3 , σ12 + 2σ22]
x3(t) ~ N [a1μ3 + a2μ2 , σ12 + σ22]
If x is driven equally by all stimuli (a1 = a2), there is no spatial discrimination: the H/S response is based on a majority vote, i.e. it follows the flankers.
With some discrimination ability (a1 > a2), the same update applies:

P(s2, M | Xt) = p(xt | s2, M) P(s2, M | Xt−1) / Σ s2′,M′ p(xt | s2′, M′) P(s2′, M′ | Xt−1)

a1: primary stimulus signal; a2: neighbouring stimulus signal; σ1: primary stimulus noise; σ2: neighbouring stimulus noise.

Initialise with a1 = 1.7; a2 = 0.3; σ1 = 6; σ2 = 3.5; 0.5; 0.03; q = 0.9
Spatial Uncertainty Hypothesis
P(s2, M | Xt) = p(xt | s2, M) P(s2, M | Xt−1) / Σ s2′,M′ p(xt | s2′, M′) P(s2′, M′ | Xt−1)

Initialised with a1 = 1.7; a2 = 0.3; σ1 = 6; σ2 = 3.5; 0.5; 0.03; q = 0.9

vs. no spatial uncertainty
Yu, Dayan & Cohen, 2009
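Only the likelihood changes under this hypothesis; a small Python sketch of the modified response means and variances, using the a1, a2, σ1, σ2 values from the slide (the example display is arbitrary):

import numpy as np

a1, a2 = 1.7, 0.3          # primary and neighbouring stimulus signal
s1, s2 = 6.0, 3.5          # primary and neighbouring stimulus noise (sigma_1, sigma_2)

def response_stats(mu):
    # means and variances of x1, x2, x3 when each unit also picks up its neighbours
    m = np.array([a1*mu[0] + a2*mu[1],
                  a1*mu[1] + a2*mu[0] + a2*mu[2],
                  a1*mu[2] + a2*mu[1]])
    v = np.array([s1**2 + s2**2,
                  s1**2 + 2*s2**2,
                  s1**2 + s2**2])
    return m, v

print(response_stats(np.array([1.0, -1.0, 1.0])))   # an incompatible SHS display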
Outline
Bayes Rule for Gaussians
Sensory Integration
Perception as Statistical Inference
Updating over time
A Brief Introduction to the Free Energy Principle
Review
Nature Reviews Neuroscience 11, 127-138 (February 2010)
The free-energy principle: a unified brain theory?
Karl Friston

A free-energy principle has been proposed recently that accounts for action, perception and learning. This Review looks at some key brain theories in the biological (for example, neural Darwinism) and physical (for example, information theory and optimal control theory) sciences from the free-energy perspective. Crucially, one key theme runs through each of these theories: optimization. Furthermore, if we look closely at what is optimized, the same quantity keeps emerging, namely value (expected reward, expected utility) or its complement, surprise (prediction error, expected cost). This is the quantity that is optimized under the free-energy principle, which suggests that several global brain theories might be unified within a free-energy framework.
... in the sensorium
There was a particular sound. The sound has dynamics determined by its properties: frequency x1 and amplitude x2.

ỹ: sensations
The Tautology : Biological agents act/perceive
to preclude phase transitions: minimise entropy of y
… Why do we not walk into fire?
There was a particular sound. The sound has dynamics determined by its properties: frequency and amplitude.

min −ln p(ỹ)

ỹ: sensations
Predictions: DEM inversion of HDM
EEG Responses
DCM for ERPs
The Tautology : Biological agents act/perceive
to preclude phase transitions: minimise entropy of y
… Why do we not walk into fire?
There was a particular sound. The sound has dynamics determined by its properties: frequency and amplitude.

F = −ln p(ỹ | m)

ỹ: sensations
Predictions: DEM inversion of HDM
EEG Responses
DCM for ERPs
The Tautology : Biological agents act/perceive
to preclude phase transitions: minimise entropy of y
…What does this tell us about the brain?
There was a particular sound. The sound has dynamics determined by its properties: frequency and amplitude.

F = −ln p(ỹ | m)
F = −ln p(ỹ | m) + D( q(θ) ‖ p(θ | ỹ) )

This implies:
A generative model, m
An ensemble/recognition density, q: the probability of an environmental state given the brain state

ỹ: sensations
The Tautology : Biological agents act/perceive
to preclude phase transitions: minimise entropy of y
…What does this tell us about the brain?
There was a particular sound. The sound has dynamics determined by its properties: frequency and amplitude.

F = −ln p(ỹ | m)
min F = −ln p(ỹ | m) + D( q(θ) ‖ p(θ | ỹ) )

This implies:
A generative model, m
An ensemble/recognition density, q: the probability of an environmental state given the brain state

ỹ: sensations
Perception: Minimising divergence between
recognition density and posterior probability
of causes, move towards the bound
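The two free-energy expressions above are related by a standard rearrangement (written here in LaTeX as a sketch; θ denotes the environmental causes over which the recognition density q is defined):

F = \langle -\ln p(\tilde{y}, \theta \mid m) \rangle_{q} + \langle \ln q(\theta) \rangle_{q}
  = -\ln p(\tilde{y} \mid m) + D\big( q(\theta) \,\|\, p(\theta \mid \tilde{y}, m) \big)
  \;\geq\; -\ln p(\tilde{y} \mid m)

Because the divergence is non-negative, F is an upper bound on surprise: minimising F with respect to q (perception) tightens the bound, while minimising it by acting on the environment reduces surprise itself.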
The Tautology : Biological agents act/perceive
to preclude phase transitions: minimise entropy of y
…What does this tell us about the brain?
There was a particular sound. The sound has dynamics determined by its properties: frequency and amplitude.

F = −ln p(ỹ | m)
min F = −ln p(ỹ | m) + D( q(θ) ‖ p(θ | ỹ) )

This implies:
A generative model, m
An ensemble/recognition density, q: the probability of an environmental state given the brain state
Action: Resampling the environment

ỹ: sensations
Perception: Minimising divergence between
recognition density and posterior probability
of causes, move towards the bound
The Tautology : Biological agents act/perceive
to preclude phase transitions: minimise entropy of y
…What does this tell us about the brain?
There was a particular sound. The sound has dynamics determined by its properties: frequency and amplitude.

F = −ln p(ỹ | m)
min F = −ln p(ỹ | m) + D( q(θ) ‖ p(θ | ỹ) )

This implies:
1. A generative model, m
2. An ensemble/recognition density, q: the probability of an environmental state given the brain state
3. Perception: minimising the divergence between the recognition density and the posterior probability of causes, moving towards the bound

ỹ: sensations
… in the sensorium
Hierarchies and dynamics
ỹ: sensations
1. Generative Model:
Hierarchical Dynamical Model in Generalised coordinates
1. Generative Model: Hierarchical Dynamical Model in Generalised coordinates (hierarchies and dynamics)

ṽ(m) = η̃ + z̃(m+1)
⋮
dx̃(i)/dt = f̃(i)(x̃(i), ṽ(i)) + w̃(i)
ṽ(i−1) = g̃(i)(x̃(i), ṽ(i)) + z̃(i)
⋮
dx̃(1)/dt = f̃(1)(x̃(1), ṽ(1)) + w̃(1)
ỹ = g̃(1)(x̃(1), ṽ(1)) + z̃(1)

ỹ: sensations

The prediction at the ith level is a prior at the level below.
… These causes are probabilistic: many-to-one mappings ("four candles" vs "fork handles").

ỹ: sensations

1. Generative Model: HDM, probabilistic form
p(ỹ, x̃(1), ṽ(1)) = p(ỹ | x̃(1), ṽ(1)) p(x̃(1), ṽ(1))

p(ỹ | x̃(1), ṽ(1)) = N(ỹ : g̃, Σ(λ)z)

p(x̃(1), ṽ(1)) = p(x̃(1) | ṽ(1)) p(ṽ(1))

p(x̃(1) | ṽ(1)) = N(Dx̃(1) : f̃, Σ(λ)w)

p(ṽ(1)) = N(ṽ(1) : η̃, Σv)

ỹ: sensations

Gaussian noise accounting for higher-order correlations within levels.
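For illustration only, a minimal two-level simulation of such a generative model in Python, using ordinary (not generalised) coordinates, a simple Euler scheme, and invented functions f, g and parameter values:

import numpy as np

rng = np.random.default_rng(1)
dt, T = 0.01, 1000

# level 2: a slowly varying cause v around a prior expectation eta
eta = 1.0
v = eta + np.cumsum(0.02 * rng.standard_normal(T)) * np.sqrt(dt)

# level 1: hidden-state dynamics dx/dt = f(x, v) + w, observation y = g(x, v) + z
def f(x, v):                  # illustrative equation of motion (damped oscillator)
    return np.array([x[1], -v * x[0] - 0.5 * x[1]])

def g(x, v):                  # illustrative observation function
    return x[0]

x = np.array([1.0, 0.0])
y = np.empty(T)
for t in range(T):
    w = 0.05 * rng.standard_normal(2)      # state noise w
    z = 0.05 * rng.standard_normal()       # observation noise z
    x = x + dt * f(x, v[t]) + np.sqrt(dt) * w
    y[t] = g(x, v[t]) + z

print(y[:5])   # sensations generated by the causes at the level above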
Hypothesis
Model Inversion in Brain Dynamics
(tomorrow)
ỹ: sensations
Summary
Conjugacy of a Gaussian prior and Gaussian likelihood allows simple updates of the sufficient statistics
Maximum likelihood is equivalent under uniform priors
Sensory fusion through ML estimates is observed experimentally
Evidence accumulation models are available through Bayes rule
Parameter priors can dramatically change updating behaviour
Similar behaviours can be produced by different models – coming up…