Inverse Problems: Exercices
With Mathematica, Matlab, and Scilab solutions

Albert Tarantola
Université de Paris, Institut de Physique du Globe
4, place Jussieu; 75005 Paris; France
E-mail: [email protected]

March 12, 2007

© A. Tarantola, 2006. Students and professors are invited to freely use this text.
4.2 Ray Tomography Using Blocks
Executable notebook at
http://www.ipgp.jussieu.fr/~tarantola/exercices/chapter_04/RayTomography.nb
This exercise corresponds to a highly idealized version of an X-ray tomography experiment. A 2D medium is characterized by a parameter m(x, y) whose physical dimension is the inverse of a length. This parameter may take any real value (including negative values). Assume that when a ray R_i is materialized in the medium (using a source and a receiver), we are able to measure the observable parameter

    d^i = \int_{R_i} d\ell \, m(x, y) ,    (4.1)

where \ell is the length along the ray. The goal of the exercise is to use some observed values d^i_\text{obs} to infer the values of the function m(x, y).
Figure 4.1: Geometry of the 'X-ray experiment': 16 square blocks (numbered 01 to 16) crossed by 22 rays (numbered 01 to 22). This drawing is at scale 1:1 (the length of the side of the blocks is 1 cm).
To simplify the problem, the medium is divided into blocks, numbered from 1 to 16 (see figure 4.1), so, instead of evaluating the function m(x, y), we only need to evaluate
the discrete values \{ m_1 , m_2 , \ldots , m_{16} \} . With this discretization, the relation (4.1) becomes

    d^i = \sum_{\alpha=1}^{16} G^i{}_\alpha \, m^\alpha ,    (4.2)

where, as we have seen in the last lesson, G^i{}_\alpha is the length of the ray i in the block \alpha . We shall use 22 rays, so the index i runs from 1 to 22. For short, equation (4.2) shall be written

    d = G \, m .    (4.3)
With the geometry of the 16 blocks and the 22 rays represented in figure 4.1, it is easy to see that the matrix G is (only the nonzero elements are indicated)

    G = \begin{pmatrix}
    - & - & - & - & - & - & - & - & - & - & - & - & - & - & - & a \\
    - & - & - & - & - & - & - & - & - & - & - & a & - & - & a & - \\
    - & - & - & - & - & - & - & a & - & - & a & - & - & a & - & - \\
    - & - & - & a & - & - & a & - & - & a & - & - & a & - & - & - \\
    - & - & a & - & - & a & - & - & a & - & - & - & - & - & - & - \\
    - & a & - & - & a & - & - & - & - & - & - & - & - & - & - & - \\
    a & - & - & - & - & - & - & - & - & - & - & - & - & - & - & - \\
    - & - & - & b & - & - & - & b & - & - & - & b & - & - & - & b \\
    - & - & b & - & - & - & b & - & - & - & b & - & - & - & b & - \\
    - & b & - & - & - & b & - & - & - & b & - & - & - & b & - & - \\
    b & - & - & - & b & - & - & - & b & - & - & - & b & - & - & - \\
    - & - & - & a & - & - & - & - & - & - & - & - & - & - & - & - \\
    - & - & a & - & - & - & - & a & - & - & - & - & - & - & - & - \\
    - & a & - & - & - & - & a & - & - & - & - & a & - & - & - & - \\
    a & - & - & - & - & a & - & - & - & - & a & - & - & - & - & a \\
    - & - & - & - & a & - & - & - & - & a & - & - & - & - & a & - \\
    - & - & - & - & - & - & - & - & a & - & - & - & - & a & - & - \\
    - & - & - & - & - & - & - & - & - & - & - & - & a & - & - & - \\
    b & b & b & b & - & - & - & - & - & - & - & - & - & - & - & - \\
    - & - & - & - & b & b & b & b & - & - & - & - & - & - & - & - \\
    - & - & - & - & - & - & - & - & b & b & b & b & - & - & - & - \\
    - & - & - & - & - & - & - & - & - & - & - & - & b & b & b & b
    \end{pmatrix} ,    (4.4)

with a = \sqrt{2} \, \text{cm} and b = 1 \, \text{cm} .
In a real-life application, the components of the matrix G should be automatically computed using the positions of the end points of the rays (a sketch of such a computation is given after the listing below), but, as writing the associated code is tedious, in this simple exercise we can enter the components "by hand" as follows¹:
(* The matrix G in the d = G m relation *)
G = Table [0 , {i ,1 ,22} , {j ,1 ,16}];
sq2 = Sqrt [2];
G [[01 ,16]] = sq2 ;
G [[02 ,12]] = sq2 ; G [[02 ,15]] = sq2 ;
G [[03 ,08]] = sq2 ; G [[03 ,11]] = sq2 ; G [[03 ,14]] = sq2 ;
G [[04 ,04]] = sq2 ; G [[04 ,07]] = sq2 ; G [[04 ,10]] = sq2 ; G [[04 ,13]] = sq2 ;
G [[05 ,03]] = sq2 ; G [[05 ,06]] = sq2 ; G [[05 ,09]] = sq2 ;
G [[06 ,02]] = sq2 ; G [[06 ,05]] = sq2 ;
G [[07 ,01]] = sq2 ;
G [[08 ,04]] = 1; G [[08 ,08]] = 1; G [[08 ,12]] = 1; G [[08 ,16]] = 1;
G [[09 ,03]] = 1; G [[09 ,07]] = 1; G [[09 ,11]] = 1; G [[09 ,15]] = 1;
G [[10 ,02]] = 1; G [[10 ,06]] = 1; G [[10 ,10]] = 1; G [[10 ,14]] = 1;
1 The units are assumed by default to always be in centimeters, and are not input into the code (see remark in
footnote XXX).
G [[11 ,01]] = 1; G [[11 ,05]] = 1; G [[11 ,09]] = 1; G [[11 ,13]] = 1;
G [[12 ,04]] = sq2 ;
G [[13 ,03]] = sq2 ; G [[13 ,08]] = sq2 ;
G [[14 ,02]] = sq2 ; G [[14 ,07]] = sq2 ; G [[14 ,12]] = sq2 ;
G [[15 ,01]] = sq2 ; G [[15 ,06]] = sq2 ; G [[15 ,11]] = sq2 ; G [[15 ,16]] = sq2 ;
G [[16 ,05]] = sq2 ; G [[16 ,10]] = sq2 ; G [[16 ,15]] = sq2 ;
G [[17 ,09]] = sq2 ; G [[17 ,14]] = sq2 ;
G [[18 ,13]] = sq2 ;
G [[19 ,01]] = 1; G [[19 ,02]] = 1; G [[19 ,03]] = 1; G [[19 ,04]] = 1;
G [[20 ,05]] = 1; G [[20 ,06]] = 1; G [[20 ,07]] = 1; G [[20 ,08]] = 1;
G [[21 ,09]] = 1; G [[21 ,10]] = 1; G [[21 ,11]] = 1; G [[21 ,12]] = 1;
G [[22 ,13]] = 1; G [[22 ,14]] = 1; G [[22 ,15]] = 1; G [[22 ,16]] = 1;
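As an illustration of the remark above, here is one possible way such an automatic computation could be organized. This is a sketch only, not part of the original notebook: the functions segmentLengthInBlock and blockCorner, the list rayEndPoints, and the convention that block 01 sits at the top-left corner of figure 4.1 are assumptions introduced for this example.

(* Sketch of an automatic computation of G from the ray end points *)
blockSide = 1. ;   (* side of the blocks , in cm *)

(* length of the segment p1 - p2 inside the axis - aligned square with
   lower - left corner { x0 , y0 } , obtained by clipping the segment
   against the four sides ( slab clipping ) *)
segmentLengthInBlock [{ p1_ , p2_ } , { x0_ , y0_ }] :=
  Module [{ d = p2 - p1 , lo = { x0 , y0 } , tmin = 0. , tmax = 1. , t1 , t2 } ,
    Do [
      If [ d [[ k ]] == 0. ,
        If [ p1 [[ k ]] < lo [[ k ]] || p1 [[ k ]] > lo [[ k ]] + blockSide ,
          tmin = 1. ; tmax = 0. ] ,
        { t1 , t2 } = Sort [{ ( lo [[ k ]] - p1 [[ k ]]) / d [[ k ]] ,
                              ( lo [[ k ]] + blockSide - p1 [[ k ]]) / d [[ k ]] }] ;
        tmin = Max [ tmin , t1 ] ; tmax = Min [ tmax , t2 ] ] ,
      { k , 2 }] ;
    If [ tmax > tmin , ( tmax - tmin ) Norm [ d ] , 0. ] ]

(* block alpha ( alpha = 1 ... 16 ) : lower - left corner of its square ,
   assuming block 01 at the top left of figure 4.1 *)
blockCorner [ alpha_ ] := { Mod [ alpha - 1 , 4] , 3 - Quotient [ alpha - 1 , 4] }

(* with rayEndPoints = {{{ x1 , y1 } , { x2 , y2 }} , ...} ( 22 pairs of points ,
   to be filled with the actual source / receiver positions ) , one would get *)
(* Gauto = Table [ segmentLengthInBlock [ rayEndPoints [[ i ]] , blockCorner [ alpha ]] ,
                   { i , 1 , 22} , { alpha , 1 , 16}] ; *)

With the 22 pairs of source/receiver coordinates in rayEndPoints, the commented line would produce a matrix that can be compared with the hand-entered G above.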
Let us assume that we have the a priori information that, for all the blocks,

    m = 5 \, \text{cm}^{-1} \pm 3 \, \text{cm}^{-1} .    (4.5)

As we are planning to use least-squares theory, we put this a priori information in the form of an a priori model

    m_\text{prior} = ( 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 , 5 )^t    (in cm^{-1} units) ,    (4.6)

and an a priori covariance matrix

    C_\text{prior} = \sigma_m^2 \, I ,    (4.7)

where, interpreting the symbol ± in equation (4.5) as twice the standard deviation of a Gaussian distribution, we take

    \sigma_m = 1.5 \, \text{cm}^{-1} .    (4.8)
We take a diagonal covariance matrix because we are given no information about some other
possible probabilistic model that would better represent the a priori information.
At this point, we have the choice to keep in mind the vector mprior and the matrix Cprior, and proceed directly to the examination of the measurements, or, alternatively, we may pause to examine exactly which kind of a priori information we have introduced.

Least-squares formulas are only interesting when all uncertainties are of the Gaussian type. This means that we are, in fact, assuming that we have an a priori probability density in the model space that is a Gaussian distribution, with mean mprior and covariance Cprior. The best we can do to grasp exactly the a priori information we are using is to generate a few random models from such a distribution. After introducing mprior and Cprior
(* A priori information *)
mprior = {5. ,5. ,5. ,5. ,5. ,5. ,5. ,5. ,5. ,5. ,5. ,5. ,5. ,5. ,5. ,5.} ;
sm = 1.5 ;
Cprior = sm ^2 IdentityMatrix [16] ;
we perform the Cholesky decomposition of the prior covariance matrix (in this case, this is
trivial, as Cprior is diagonal)
(* Square root of the covariance matrix *)
(* One has cov = L . Transpose [ L ] *)
Lprior = Transpose [ CholeskyDecomposition [ Cprior ]] ;
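As a quick sanity check (a small addition to the listing above, not part of the original notebook), one may verify that this factor indeed reproduces the covariance matrix:

(* Check of the factorization : this number should be ( practically ) zero *)
Max [ Abs [ Lprior . Transpose [ Lprior ] - Cprior ] ]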
With the matrix L at hand, we generate nine pseudo-random realizations of the prior (Gaussian) distribution as explained in section 1.7.2:
(* Generation of random models ( prior distribution ) *)
SeedRandom [1]
rnd := Sqrt [2] InverseErf [ Random [ Real , { -1 , 1}]]
Do [ {
mrandom0 = Table [ rnd , {16}] ,
mrandom = mprior + Lprior . mrandom0 ,
PLOT [ mrandom ]
} , {i ,1 ,9} ]
Using the color scale in figure 4.2, the nine models so obtained are displayed in figure 4.3.

Figure 4.2: The scale used for the model parameter values, ranging from 2 cm^{-1} to 9 cm^{-1}. All models represented in this section use this scale.

Figure 4.3: Least-squares formulas are only interesting when all uncertainties are of Gaussian nature. This is true, in particular, for the prior distribution in the model space. This prior distribution is completely described by a model vector mprior and a covariance matrix Cprior. At the right, nine pseudorandom samples of the prior Gaussian distribution.
To set up the exercise, let us now generate "artificial data", to be later used as "observations". To do that, let us invent an arbitrary model

    m_\text{true} = ( 7 , 3 , 3 , 3 , 7 , 3 , 5 , 3 , 7 , 3 , 3 , 3 , 7 , 3 , 5 , 5 )^t    (in cm^{-1} units) .    (4.9)
Using the scale in figure 4.2, this "true model" can be represented as in figure 4.4.

Figure 4.4: The "true model" invented for artificially generating some "data", to be later used in the inversion exercise.

The reader
may perhaps read the two letters "IP" in this model. This shows that I am violating the assumption that the true model is a random model from the prior Gaussian distribution, i.e., one model that could appear among models like those in figure 4.3, while it is extremely unlikely
that one of those models shows any “readable letter”. So, when using this “true model” we
are certainly violating the a priori assumption. But the least-squares method is so practical
that we often use it, even when we know that other, more expensive, methods should be used instead. So, this (mild) violation of the a priori assumptions is part of the exercise.

Computing the artificial data associated with the model mtrue just amounts to implementing the computation dtrue = G mtrue. These true values can then be converted into pseudo-observed values by adding some noise, here independent random Gaussian values with zero mean and with standard deviation

    \sigma_d = 0.15 \, \text{cm}^{-1} .    (4.10)
When implementing this,
(* Calculating " true data " , then adding Gaussian random errors *)
dtrue = G . mtrue ;
dobs = dtrue ;
SeedRandom [123]
rnd := Sqrt [2] InverseErf [ Random [ Real , { -1 , 1}]]
sd = 0.15;
Do [ dobs [[ i ]] = dobs [[ i ]] + sd rnd , {i , 1 , 22} ]
we obtain the 22 values

    d_\text{obs} = ( 6.93 , 11.29 , 12.89 , 25.72 , 18.28 , 14.09 , 10.06 , 14.05 , 15.91 , 12.07 , 28.14 ,
                     4.21 , 8.32 , 15.52 , 25.50 , 21.10 , 14.12 , 9.98 , 15.95 , 17.89 , 16.18 , 20.07 )^t .    (4.11)
This vector of "observed values" is accompanied by the 22 × 22 diagonal² covariance matrix

    C_\text{obs} = \sigma_d^2 \, I ,    (4.12)

that describes our "experimental" uncertainties. The value of \sigma_d is that in equation (4.10).
As explained in the introduction to this chapter, in least-squares theory, one demonstrates
that the posterior probability density for the model parameters is a Gaussian, whose center
is given by any of the three equivalent equations
    m_\text{post} = ( G^t C_\text{obs}^{-1} G + C_\text{prior}^{-1} )^{-1} ( G^t C_\text{obs}^{-1} d_\text{obs} + C_\text{prior}^{-1} m_\text{prior} )
                  = m_\text{prior} + ( G^t C_\text{obs}^{-1} G + C_\text{prior}^{-1} )^{-1} G^t C_\text{obs}^{-1} ( d_\text{obs} - G \, m_\text{prior} )
                  = m_\text{prior} + C_\text{prior} G^t ( G C_\text{prior} G^t + C_\text{obs} )^{-1} ( d_\text{obs} - G \, m_\text{prior} ) ,    (4.13)
and whose covariance is given by any of the two equivalent expressions
    C_\text{post} = ( G^t C_\text{obs}^{-1} G + C_\text{prior}^{-1} )^{-1}
                  = C_\text{prior} - C_\text{prior} G^t ( G C_\text{prior} G^t + C_\text{obs} )^{-1} G C_\text{prior} .    (4.14)
In this small-scale problem there is not much computational difference between these equivalent expressions, so let us use for mpost the expression
    m_\text{post} = ( G^t C_\text{obs}^{-1} G + C_\text{prior}^{-1} )^{-1} ( G^t C_\text{obs}^{-1} d_\text{obs} + C_\text{prior}^{-1} m_\text{prior} ) ,    (4.15)

² The matrix is diagonal because the noise values generated are independent of each other.
and for Cpost the expression
    C_\text{post} = ( G^t C_\text{obs}^{-1} G + C_\text{prior}^{-1} )^{-1} .    (4.16)
After introducing the vector dobs and the matrix Cobs ,
dobs = {6.93 , 11.29 , 12.88 , 25.65 , 18.28 , 14.09 , 10.05 , 14.06 , 15.90 ,
12.07 , 28.13 , 4.20 , 8.33 , 15.51 , 25.51 , 21.09 , 14.11 , 9.99 ,
15.94 , 17.89 , 16.16 , 20.08} ;
Cobs = sd ^2 IdentityMatrix [22] ;
we compute Cpost and mpost using the commands
Cpost = Inverse [ Transpose [ G ]. Inverse [ Cobs ]. G + Inverse [ Cprior ] ] ;
mpost = Cpost .( Transpose [ G ]. Inverse [ Cobs ]. dobs
+ Inverse [ Cprior ]. mprior )
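As a consistency check (a small addition, not part of the original listing; the name mpost2 is introduced here for illustration), the third, equivalent expression of equation (4.13) must give the same posterior model:

(* Optional check : the alternative expression of equation (4.13)
   must give ( up to rounding errors ) the same result *)
mpost2 = mprior + Cprior . Transpose [ G ] .
           Inverse [ G . Cprior . Transpose [ G ] + Cobs ] . ( dobs - G . mprior ) ;
Max [ Abs [ mpost - mpost2 ] ]   (* should be ( practically ) zero *)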
Impatient people may now rush to look at the model mpost, and to examine the standard deviations and correlations in Cpost, but I prefer that we proceed in an orderly way. The philosophy of our approach is to pass from the prior Gaussian distribution in the model space to the posterior Gaussian distribution. In the same way we obtained samples of the prior distribution (figure 4.3), the commands³
(* Generation of random models ( posterior distribution ) *)
Cpost = (1/2) ( Cpost + Transpose [ Cpost ]) ;
Lpost = Transpose [ CholeskyDecomposition [ Cpost ]];
SeedRandom [1]
rnd := Sqrt [2] InverseErf [ Random [ Real , { -1 , 1}]]
Do [{
mrandom0 = Table [ rnd , {16}] ,
mrandom = mpost + Lpost . mrandom0 ,
PLOT [ mrandom ]
} , {i , 1 , 9}]
produce the 9 models displayed in figure 4.5. This figure gives a quite clear idea of the resolution allowed by our data. The expert eye can directly guess what the posterior uncertainties and the posterior correlations are (and, of course, a larger number of posterior realizations would allow these to be computed with great accuracy).

The average of a large enough set of posterior samples would be mpost, and an evaluation of the statistical covariances would give Cpost (a numerical check of this is sketched below). But these quantities are already available to us, so we only need to plot them. The vector mpost (the output of the least-squares formulas) is plotted in figure 4.6.
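For readers who want to verify this numerically, a possible sketch (not part of the original notebook; the names nsamples, samples, mmean, and Cstat are introduced here for illustration) is to draw a large number of posterior samples and compare their statistics with mpost and Cpost:

(* Statistics of a large number of posterior samples *)
nsamples = 10000 ;
samples  = Table [ mpost + Lpost . Table [ rnd , {16}] , { nsamples }] ;
mmean    = Sum [ samples [[ k ]] , { k , nsamples }] / nsamples ;
Cstat    = Sum [ Outer [ Times , samples [[ k ]] - mmean , samples [[ k ]] - mmean ] ,
             { k , nsamples }] / ( nsamples - 1) ;
Max [ Abs [ mmean - mpost ] ]   (* should be small *)
Max [ Abs [ Cstat - Cpost ] ]   (* should be small *)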
To examine the matrix Cpost we can first compute the associated standard deviations, which are the square roots of the diagonal elements,

    \sigma_i = \sqrt{ C_{ii} } .    (4.17)
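In the notebook, these standard deviations can be extracted with a line like the following (a minimal sketch; the name spost is introduced here for illustration):

(* Posterior standard deviations : square roots of the diagonal of Cpost *)
spost = Sqrt [ Table [ Cpost [[ i , i ]] , { i , 1 , 16}] ] ;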
Using the color scale in figure 4.7 the posterior uncertainties can be represented as in figure 4.8.
³ Although the matrix Cpost is, in principle, symmetric, rounding errors may cause the Cholesky decomposition to fail. For this reason, it is better to "resymmetrize" it.
Figure 4.5: Nine pseudorandom models that are samples of the posterior Gaussian distribution in the model space, obtained using the data of the 22 rays.

Figure 4.6: Mean posterior model mpost. This is the output of the least-squares formulas. We would also obtain this model by taking the average of a large number of models like those in figure 4.5.

Figure 4.7: The scale used for the model parameter uncertainties, ranging from 0.08 cm^{-1} to 1.5 cm^{-1}. The same scale is used for all the displays of uncertainties in this section.

Figure 4.8: The posterior uncertainties, obtained as the square roots of the diagonal elements of the matrix Cpost. They could also be obtained as the statistical standard deviations from a large number of models like those in figure 4.5. Note that the uncertainties in the blocks that are in the corners and in the center are smaller than those of the other blocks, due to the special geometry of the rays: the blocks at the corners are well resolved because there is, for each such block, one ray that explores only that block. As a consequence, the central blocks are also well resolved (because of the existence of the diagonal rays).
The off-diagonal elements of the posterior covariance matrix Cpost are better analyzed
by introducing the correlations
    \rho_{ij} = \frac{ C_{ij} }{ \sigma_i \, \sigma_j } ,    (4.18)

whose values satisfy the constraints

    -1 \le \rho_{ij} \le +1 ,    (4.19)
with the value +1 meaning perfect correlation, and the value −1 meaning perfect anticorrelation. Using the color scale in figure 4.9, the correlations can be displayed as in figure 4.10.
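From Cpost and the standard deviations spost introduced above, the correlations of equation (4.18) can be computed with a line like the following (again a sketch; the name rho is introduced here for illustration):

(* Posterior correlations , equation (4.18) *)
rho = Table [ Cpost [[ i , j ]] / ( spost [[ i ]] spost [[ j ]]) , { i , 1 , 16} , { j , 1 , 16}] ;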
Figure 4.9: The scale used for the correlation values, ranging from −1 to +1. The value +1 means perfect correlation, and the value −1 means perfect anticorrelation.
It is not of much interest to represent a crude 16 × 16 matrix. Rather, as shown in figure 4.10, I have plotted a 4 × 4 array of 4 × 4 matrices, each submatrix representing the correlation of the uncertainty of a given block with those of all the other blocks. In each submatrix, the correlation of the block to which the submatrix is associated with the block itself is always +1, by definition. So, what matters are the other correlations. They are quite large in absolute value, this being due to the fact that we only have '22 data for 16 unknowns'. The pattern of correlations in the blocks that have large posterior uncertainties is particularly interesting, with correlations both close to +1 and close to −1. Once again, this results from the particular geometry of the rays. One may anticipate that adding more and more data, with paths that are not only the horizontal, vertical, and diagonal ones of this exercise, would make (i) the uncertainties tend to zero, and (ii) the correlations between distinct blocks tend to zero.

Figure 4.10: Posterior correlations. This 4 × 4 array corresponds to the 4 × 4 blocks in the model: at position (i, j) of this array are the correlations of the posterior value (i, j) with all other posterior values. As an example, the green cross corresponds to the correlation between the value for block (1, 4) and itself (necessarily +1), while the red cross corresponds to the correlation between the value for block (1, 4) and the value for block (3, 4). Although these correlations have been obtained from the matrix Cpost, they could have been obtained as the statistical correlations from a large number of models like those in figure 4.5.
To further explore this exercise, let us suppress the data corresponding to the rays 07, 11, 15, and 19, and solve the problem again. When the data corresponding to these four rays is suppressed, one is left with the geometry indicated in figure 4.11.
Figure 4.11: Geometry of the 'X-ray experiment' when the rays 07, 11, 15, and 19 are suppressed. Figure at scale 1:3.

There are two ways of suppressing these data: setting up a new problem with fewer rays (redefining the matrix G by suppressing 4 of its rows, as sketched below), or artificially putting enormous uncertainties on the data associated with these rays. In the accompanying computer code both possibilities are implemented, and they give practically indistinguishable results.
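A possible implementation of the first possibility (a sketch only, not the listing of the accompanying notebook; the names kept, Gred, dred, Cred, Cpostred, and mpostred are introduced here for illustration) simply deletes the four rows of G together with the corresponding entries of dobs and Cobs:

(* First possibility ( sketch ) : remove rays 07 , 11 , 15 and 19 from the problem *)
kept = Complement [ Range [22] , {7 , 11 , 15 , 19}] ;
Gred = G [[ kept ]] ;        (* 18 x 16 matrix *)
dred = dobs [[ kept ]] ;     (* 18 observed values *)
Cred = sd ^2 IdentityMatrix [18] ;
Cpostred = Inverse [ Transpose [ Gred ]. Inverse [ Cred ]. Gred + Inverse [ Cprior ]] ;
mpostred = Cpostred .( Transpose [ Gred ]. Inverse [ Cred ]. dred
             + Inverse [ Cprior ]. mprior ) ;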
To implement the second possibility we only need to replace a few of the lines in the
previous code, those defining the observations:
(* The observations again , four of the data with huge uncertainties *)
dobs = {6.93 , 11.29 , 12.88 , 25.65 , 18.28 , 14.09 , 100. , 14.06 , 15.90 ,
12.07 , 100. , 4.20 , 8.33 , 15.51 , 100. , 21.09 , 14.11 , 9.99 ,
100. , 17.89 , 16.16 , 20.08} ;
Cobs = sd ^2 IdentityMatrix [22] ;
Cobs [[07 , 07]] = 100000000 ; Cobs [[11 , 11]] = 100000000 ;
Cobs [[15 , 15]] = 100000000 ; Cobs [[19 , 19]] = 100000000 ;
The data corresponding to the missing rays have been set to an arbitrary value (100), and the associated standard deviations have been set to a large value (10 000).
Proceeding as before, the posterior models obtained are displayed in figure 4.12, the mean posterior model in figure 4.13, the posterior uncertainties in figure 4.14, and the posterior correlations in figure 4.15.

Figure 4.12: Posterior models.

Figure 4.13: Mean posterior model.

Figure 4.14: Posterior uncertainties.

Figure 4.15: Posterior correlations.

When comparing the two figures 4.13 and 4.14 to the two figures 4.6 and 4.8, we observe the following. The posterior model is similar, except for the block (1, 1), which now takes the a priori value (5 cm^{-1}), with a posterior uncertainty equal to the prior one (1.5 cm^{-1}). This is because the suppression of the four rays means that the block (1, 1) is no longer explored by any ray. The least-squares solution in this case gives a consistent a posteriori value for this block, identical to the a priori value, and figure 4.14 confirms that its posterior uncertainty is identical to the prior one, as it should be. Note also, in figure 4.15, that the uncertainty of this block is completely uncorrelated with the uncertainties of the other blocks.
As a further example, let us now suppress the data corresponding to the rays 02, 05, 08, 10, 13, 16, 19, and 22. When the data corresponding to these eight rays is suppressed, one is left with the geometry indicated in figure 4.16.

Figure 4.16: Geometry of the 'X-ray experiment' when the rays 02, 05, 08, 10, 13, 16, 19, and 22 are suppressed. Figure at scale 1:3.

As above, we simply replace a few of the lines in the previous code, those defining the observations, simulating the absence of data as data with huge uncertainties:
(* The observations again , eight data with huge uncertainties *)
dobs = {6.93 , 100. , 12.88 , 25.65 , 100. , 14.09 , 10.05 , 100. , 15.90 ,
100. , 28.13 , 4.20 , 100. , 15.51 , 25.51 , 100. , 14.11 , 9.99 ,
100. , 17.89 , 16.16 , 100.} ;
Cobs = sd ^2 IdentityMatrix [22] ;
Cobs [[02 , 02]] = 100000000 ; Cobs [[05 , 05]] = 100000000 ;
Cobs [[08 , 08]] = 100000000 ; Cobs [[10 , 10]] = 100000000 ;
Cobs [[13 , 13]] = 100000000 ; Cobs [[16 , 16]] = 100000000 ;
Cobs [[19 , 19]] = 100000000 ; Cobs [[22 , 22]] = 100000000 ;
As above, the data corresponding to the missing rays have been set to an arbitrary value (100), and the associated standard deviations have been set to a large value (10 000).
In this new configuration, the posterior models obtained are displayed in figure 4.17, the mean posterior model in figure 4.18, the posterior uncertainties in figure 4.19, and the posterior correlations in figure 4.20.

The most remarkable fact about this solution is that the a posteriori model is very similar to that obtained using the whole set of 22 data. We are computing the values of 16 parameters, and we are only using 14 data, so we have 'less data than unknowns'. Of course, it is the use of the a priori information that allows us to obtain reasonable results even in this situation. Note that two of the blocks have large posterior uncertainties (note: I have to explain this better).
Figure 4.17: Posterior models.

Figure 4.18: Mean posterior model.

Figure 4.19: Posterior uncertainties.

Figure 4.20: Posterior correlations.