
Line fitting example
Consider the fitting of a straight line
y_m = ax + b
to data D = {(xi , yi ), i = 1, . . . , N }.
Consider an (improper) uninformative prior

    π(a, b) = Const

providing no prior information on (a, b).
Assume iid additive unbiased Gaussian noise in y with a given constant noise variance σ^2, thus the data model is:

    y = ax + b + ε,    ε ∼ N(0, σ^2)

with no noise in the independent variable x.
Line fitting example
Presuming σ known, we have the likelihood

    L(a, b) = p(D|a, b) = ∏_{i=1}^{N} p(y_i|a, b)

where

    p(y_i|a, b) = 1/(√(2π) σ) exp( −(y_i − a x_i − b)^2 / (2σ^2) )

and, per Bayes' formula, the posterior density p(a, b|D) is

    p(a, b|D) = p(D|a, b) π(a, b) / p(D) ∝ p(D|a, b) π(a, b)
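This posterior can be evaluated numerically on a grid. Below is a minimal numpy sketch; the ground-truth parameters, noise level, data range, and grid bounds are illustrative assumptions, not values taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed ground truth and noise level (illustrative only)
a_true, b_true, sigma, N = 2.0, -3.0, 0.25, 20
x = rng.uniform(0.0, 1.0, N)
y = a_true * x + b_true + rng.normal(0.0, sigma, N)

# Unnormalized log-posterior: with pi(a, b) = Const it equals the
# log-likelihood up to an additive constant
def log_post(a, b):
    return -np.sum((y - a * x - b) ** 2) / (2.0 * sigma**2)

# Evaluate on a grid over (a, b) and locate the MAP estimate
a_grid = np.linspace(0.0, 4.0, 201)
b_grid = np.linspace(-5.0, -1.0, 201)
lp = np.array([[log_post(a, b) for b in b_grid] for a in a_grid])
ia, ib = np.unravel_index(lp.argmax(), lp.shape)
a_map, b_map = a_grid[ia], b_grid[ib]
```

With a flat prior the MAP estimate coincides with the least-squares fit; the grid of `lp` values is exactly what the density plots in the following slides visualize (after exponentiation and normalization).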
Hyperparameter
If σ is unknown, it has to be inferred from the data.
Infer ln σ so that σ is strictly positive by construction:

    L(a, b, ln σ) = p(D|a, b, ln σ) = ∏_{i=1}^{N} p(y_i|a, b, σ)

where

    p(y_i|a, b, σ) = 1/(√(2π) σ) exp( −(y_i − a x_i − b)^2 / (2σ^2) )

Then, using an uninformative prior π(a, b, ln σ) = Const, the posterior on (a, b, ln σ) is the tri-variate density

    p(a, b, ln σ|D) ∝ (2πσ^2)^(−N/2) exp( −∑_{i=1}^{N} (y_i − a x_i − b)^2 / (2σ^2) )
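The tri-variate log-posterior can be evaluated directly as a function of (a, b, ln σ). A minimal sketch, with illustrative (assumed) ground-truth values; note that maximizing over ln σ at fixed (a, b) recovers the familiar mean-squared-residual estimate of σ^2:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed ground truth (illustrative values, not from the slides)
a_true, b_true, sigma_true, N = 2.0, -3.0, 0.5, 50
x = rng.uniform(0.0, 2.0, N)
y = a_true * x + b_true + rng.normal(0.0, sigma_true, N)

# Tri-variate log-posterior in (a, b, ln sigma) under a flat prior:
# log p = -(N/2) log(2 pi sigma^2) - sum_i (y_i - a x_i - b)^2 / (2 sigma^2) + const
def log_post(a, b, ln_sigma):
    s2 = np.exp(2.0 * ln_sigma)
    r = y - a * x - b
    return -0.5 * N * np.log(2.0 * np.pi * s2) - np.sum(r**2) / (2.0 * s2)

# Setting d(log p)/d(ln sigma) = 0 at fixed (a, b) gives
# sigma_hat^2 = mean squared residual
r = y - a_true * x - b_true
ln_sigma_hat = 0.5 * np.log(np.mean(r**2))
```

Working in ln σ means any grid or sampler over this variable automatically respects σ > 0, which is exactly why the slide parameterizes the hyperparameter this way.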
Line fitting example – Effect of data size on p(a, b|D)
Low data noise: σ = 0.25

[Figure: posterior density p(a, b|D); left panel: N = 20, right panel: N = 200]

More data ⇒ more accurate parameter estimates
Line fitting example – Effect of data size on p(a, b|D)
Medium data noise: σ = 0.5

[Figure: posterior density p(a, b|D); left panel: N = 20, right panel: N = 200]

More data ⇒ more accurate parameter estimates
Higher noise amplitude ⇒ higher uncertainty
Line fitting example – Effect of data size on p(a, b|D)
High data noise: σ = 1.0

[Figure: posterior density p(a, b|D); left panel: N = 20, right panel: N = 200]

More data ⇒ more accurate parameter estimates
Higher noise amplitude ⇒ higher uncertainty
Bayesian inference illustration: noise ↑ ⇒ uncertainty ↑
data: y = 2x^2 − 3x + 5 + ε,  ε ∼ N(0, σ^2),  σ = {0.1, 0.5, 1.0}
Fit model y = ax^2 + bx + c
Marginal posterior density p(a, c):
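Because the fit model is linear in (a, b, c), the posterior under a flat prior with known σ is Gaussian, and the marginal p(a, c) is read off from the corresponding sub-vector and 2×2 sub-block of the covariance. A minimal sketch (the generating model and noise level follow the slide; N and the data range are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Data-generating model from the slide: y = 2 x^2 - 3 x + 5 + eps
sigma, N = 0.5, 100            # one of the slide's noise levels; N is assumed
x = rng.uniform(-1.0, 1.0, N)  # assumed data range
y = 2.0 * x**2 - 3.0 * x + 5.0 + rng.normal(0.0, sigma, N)

# Design matrix for the fit model y = a x^2 + b x + c
G = np.column_stack([x**2, x, np.ones(N)])

# With a flat prior and known sigma, the posterior on theta = (a, b, c) is
# Gaussian: mean = least-squares solution, covariance = sigma^2 (G^T G)^-1
theta_hat = np.linalg.solve(G.T @ G, G.T @ y)
cov = sigma**2 * np.linalg.inv(G.T @ G)

# Marginal posterior p(a, c): keep components 0 (a) and 2 (c)
mean_ac = theta_hat[[0, 2]]
cov_ac = cov[np.ix_([0, 2], [0, 2])]
```

For a Gaussian, marginalizing out b requires no integration in practice: dropping the row and column of b from the mean and covariance gives the exact marginal, which is what the p(a, c) contour plots show.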
Line fitting example – Effect of data range on p(a, b|D)
Medium data noise: σ = 0.5

[Figure: posterior density p(a, b|D); left panel: x ∈ [−2, 0], right panel: x ∈ [0, 2]]

Posterior correlation structure depends on subjective details of the experiment
Illustration: Data range ⇒ correlation structure
data: y = 2x^2 − 3x + 5 + ε,  ε ∼ N(0, 0.04)
ranges: x ∈ {[−2, 0], [−1, 1], [0, 2]}
Fit model y = ax^2 + bx + c
Marginal posterior density p(b, c):
Line fitting – Effect of data realization on p(a, b|D)
Medium data noise: σ = 0.5

[Figure: posterior density p(a, b|D) for two different data sets, each with N = 20]

Posterior depends on the specific measured data set
Bayesian illustration: Data realization ⇒ posterior
data: y = 2x^2 − 3x + 5 + ε,  ε ∼ N(0, 1),  3 different random seeds
Fit model y = ax^2 + bx + c
Marginal posterior density p(b, c):
Effect of Prior
Consider next an informative prior

    a ∼ N(µ_a, σ_a^2),    b ∼ N(µ_b, σ_b^2),    ln σ ∼ U(s_ℓ, s_h)

such that, with

    π(ln σ) = 1/(s_h − s_ℓ)  for s_ℓ < ln σ < s_h,  and 0 otherwise,

we have

    π(a, b, ln σ) = π(a) π(b) π(ln σ)
                  = 1/(√(2π) σ_a) exp( −(a − µ_a)^2 / (2σ_a^2) )
                    · 1/(√(2π) σ_b) exp( −(b − µ_b)^2 / (2σ_b^2) ) · π(ln σ)
Effect of prior – cont’d
Then, with θ = (a, b)^T,

    π(θ, ln σ) = 1/(2π σ_a σ_b) exp( −(θ_1 − µ_a)^2/(2σ_a^2) − (θ_2 − µ_b)^2/(2σ_b^2) ) π(ln σ)
               = 1/(2π |Σ_pr|^(1/2)) exp( −(1/2)(θ − µ_θ)^T Σ_pr^(−1) (θ − µ_θ) ) π(ln σ)

where µ_θ = (µ_a, µ_b)^T, and Σ_pr is the prior covariance matrix given by

    Σ_pr = ( σ_a^2    0
             0        σ_b^2 )
Effect of prior – cont’d
Then, the posterior is given by

    p(θ, ln σ|D) ∝ (2π)^(−1) |Σ_pr|^(−1/2) exp( −(1/2)(θ − µ_θ)^T Σ_pr^(−1) (θ − µ_θ) ) π(ln σ)
                   × (2π)^(−N/2) |Σ_obs|^(−1/2) exp( −(1/2)(y − Gθ)^T Σ_obs^(−1) (y − Gθ) )

and the marginal posterior on (a, b) is again obtained by marginalizing over ln σ,

    p(θ|D) = ∫ p(θ, ln σ|D) d ln σ
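With σ fixed, the Gaussian prior and Gaussian likelihood above combine in closed form: the posterior on θ is Gaussian with precision equal to the sum of the prior and data precisions. A minimal sketch (the prior means/variances and the synthetic data values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic straight-line data (assumed ground truth, illustrative only)
a_true, b_true, sigma, N = 2.0, -3.0, 0.5, 20
x = rng.uniform(0.0, 2.0, N)
y = a_true * x + b_true + rng.normal(0.0, sigma, N)

G = np.column_stack([x, np.ones(N)])   # forward map: y = G theta, theta = (a, b)
mu_pr = np.array([1.5, -2.5])          # assumed prior means (mu_a, mu_b)
Sigma_pr = np.diag([1.0**2, 1.0**2])   # assumed prior variances (sigma_a^2, sigma_b^2)
Sigma_obs = sigma**2 * np.eye(N)

# Gaussian prior x Gaussian likelihood => Gaussian posterior:
#   Sigma_post^-1 = Sigma_pr^-1 + G^T Sigma_obs^-1 G
#   mu_post = Sigma_post (Sigma_pr^-1 mu_pr + G^T Sigma_obs^-1 y)
prec = np.linalg.inv(Sigma_pr) + G.T @ np.linalg.inv(Sigma_obs) @ G
Sigma_post = np.linalg.inv(prec)
mu_post = Sigma_post @ (np.linalg.inv(Sigma_pr) @ mu_pr
                        + G.T @ np.linalg.inv(Sigma_obs) @ y)
```

Because precisions add, the posterior covariance is never larger than the prior covariance; as N grows the data term dominates and the informative prior matters less, which is the behavior the following slides illustrate.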
Line fitting example – prior vs. data-size
20 data points

[Figure: posterior p(a, b|D); left panel: constant uninformative prior, right panel: Gaussian prior]
Line fitting example – prior vs. data-size
80 data points

[Figure: posterior p(a, b|D); left panel: constant uninformative prior, right panel: Gaussian prior]
Line fitting example – prior vs. data-size
200 data points

[Figure: posterior p(a, b|D); left panel: constant uninformative prior, right panel: Gaussian prior]
Line fitting example – prior vs. data-size
2000 data points
180
160
140
120
100
80
60
40
20
0
- 1. 5
- 1. 5
-2
-2
- 2. 5
- 2. 5
-3
-3
- 3. 5
- 3. 5
-4
-4
- 4. 5
- 4. 5
- 5100
200
180
160
140
120
80
60
40
20
0
-5
0 0. 5 1 1. 5 2 2. 5 3 3. 5
0 0. 5 1 1. 5 2 2. 5 3 3. 5
Constant uninformative prior
Gaussian prior