Individual Sections of the Book
Inverse Problems: Exercises with Mathematica, Matlab, and Scilab Solutions
Albert Tarantola
Université de Paris, Institut de Physique du Globe
4, place Jussieu; 75005 Paris; France
E-mail: [email protected]
March 12, 2007
© A. Tarantola, 2006. Students and professors are invited to freely use this text.

4.2 Ray Tomography Using Blocks

Executable notebook at http://www.ipgp.jussieu.fr/~tarantola/exercices/chapter_04/RayTomography.nb

This exercise corresponds to a highly idealized version of an X-ray tomography experiment. A 2D medium is characterized by a parameter m(x, y) whose physical dimension is the inverse of a length. This parameter may take any real value (including negative values). Assume that when a ray R_i is materialized in the medium (using a source and a receiver), we are able to measure the observable parameter

    d^i = \int_{R_i} d\ell \, m(x, y) ,                                        (4.1)

where \ell is the length along the ray. The goal of the exercise is to use some observed values d^i_{obs} to infer the values of the function m(x, y).

Figure 4.1: Geometry of the 'X-ray experiment', with the 22 rays (numbered 01 to 22) and the 16 blocks (numbered 01 to 16). This drawing is at scale 1:1 (the length of the side of the blocks is 1 cm).

To simplify the problem, the medium is divided into blocks, numbered from 1 to 16 (see figure 4.1), so, instead of evaluating the function m(x, y), we only need to evaluate the discrete values {m_1, m_2, ..., m_16}. With this discretization, the relation (4.1) becomes

    d_i = \sum_{\alpha=1}^{16} G_{i\alpha} m_\alpha ,                          (4.2)

where, as we have seen in the last lesson, G_{i\alpha} is the length of the ray i in the block \alpha. We shall use 22 rays, so the index i runs from 1 to 22. For short, equation (4.2) shall be written

    d = G m .                                                                  (4.3)

With the geometry of the 16 blocks and the 22 rays represented in figure 4.1, it is easy to see that the matrix G is (only the nonzero elements are indicated, dots standing for zeros)

    G = \begin{pmatrix}
    \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & a      \\
    \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & a      & \cdot \\
    \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & a      & \cdot & \cdot & a      & \cdot & \cdot \\
    \cdot & \cdot & \cdot & a      & \cdot & \cdot & a      & \cdot & \cdot & a      & \cdot & \cdot & a      & \cdot & \cdot & \cdot \\
    \cdot & \cdot & a      & \cdot & \cdot & a      & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
    \cdot & a      & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
    a      & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
    \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      \\
    \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot \\
    \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot \\
    b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot & b      & \cdot & \cdot & \cdot \\
    \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
    \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
    \cdot & a      & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot \\
    a      & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & a      \\
    \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & a      & \cdot \\
    \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot \\
    \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & a      & \cdot & \cdot & \cdot \\
    b      & b      & b      & b      & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
    \cdot & \cdot & \cdot & \cdot & b      & b      & b      & b      & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
    \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & b      & b      & b      & b      & \cdot & \cdot & \cdot & \cdot \\
    \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & b      & b      & b      & b
    \end{pmatrix} ,                                                            (4.4)

with a = \sqrt{2} cm and b = 1 cm.
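As a quick worked example of equation (4.2), reading two rows directly off the matrix (4.4): ray 05 has the entry a in columns 3, 6 and 9, and ray 19 has the entry b in columns 1 to 4, so

    d_5 = a (m_3 + m_6 + m_9) = \sqrt{2} (m_3 + m_6 + m_9) ,
    d_19 = b (m_1 + m_2 + m_3 + m_4) = m_1 + m_2 + m_3 + m_4 ,

with a and b expressed in cm and the m_\alpha in cm^-1.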
In a real-life application, the components of the matrix G should be automatically computed from the positions of the end-points of the rays but, as writing the associated code is tedious, in this simple exercise we can enter the components "by hand" as follows (footnote 1):

(* The matrix G in the d = G m relation *)
G = Table[0, {i, 1, 22}, {j, 1, 16}];
sq2 = Sqrt[2];
G[[01,16]] = sq2;
G[[02,12]] = sq2; G[[02,15]] = sq2;
G[[03,08]] = sq2; G[[03,11]] = sq2; G[[03,14]] = sq2;
G[[04,04]] = sq2; G[[04,07]] = sq2; G[[04,10]] = sq2; G[[04,13]] = sq2;
G[[05,03]] = sq2; G[[05,06]] = sq2; G[[05,09]] = sq2;
G[[06,02]] = sq2; G[[06,05]] = sq2;
G[[07,01]] = sq2;
G[[08,04]] = 1; G[[08,08]] = 1; G[[08,12]] = 1; G[[08,16]] = 1;
G[[09,03]] = 1; G[[09,07]] = 1; G[[09,11]] = 1; G[[09,15]] = 1;
G[[10,02]] = 1; G[[10,06]] = 1; G[[10,10]] = 1; G[[10,14]] = 1;
G[[11,01]] = 1; G[[11,05]] = 1; G[[11,09]] = 1; G[[11,13]] = 1;
G[[12,04]] = sq2;
G[[13,03]] = sq2; G[[13,08]] = sq2;
G[[14,02]] = sq2; G[[14,07]] = sq2; G[[14,12]] = sq2;
G[[15,01]] = sq2; G[[15,06]] = sq2; G[[15,11]] = sq2; G[[15,16]] = sq2;
G[[16,05]] = sq2; G[[16,10]] = sq2; G[[16,15]] = sq2;
G[[17,09]] = sq2; G[[17,14]] = sq2;
G[[18,13]] = sq2;
G[[19,01]] = 1; G[[19,02]] = 1; G[[19,03]] = 1; G[[19,04]] = 1;
G[[20,05]] = 1; G[[20,06]] = 1; G[[20,07]] = 1; G[[20,08]] = 1;
G[[21,09]] = 1; G[[21,10]] = 1; G[[21,11]] = 1; G[[21,12]] = 1;
G[[22,13]] = 1; G[[22,14]] = 1; G[[22,15]] = 1; G[[22,16]] = 1;

Footnote 1: The units are assumed by default to always be in centimeters, and are not input into the code (see remark in footnote XXX).
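As a quick sanity check on the hand-entered matrix (a small sketch of my own, not part of the original notebook), one can apply G to a constant model equal to 1 cm^-1: each component of d is then simply the total length of the corresponding ray inside the medium.

(* Sanity check of the hand-entered G: for a constant model equal to 1 (in cm^-1),
   each datum equals the total length of the corresponding ray inside the medium *)
mtest = Table[1., {16}];
dtest = G . mtest;
Print[dtest]
(* the straight rays 08-11 and 19-22 should give 4., the corner rays 01, 07, 12, 18
   should give Sqrt[2] (about 1.414), and the main diagonals 04 and 15 should give 4 Sqrt[2] *)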
Let us assume that we have the a priori information that, for all the blocks,

    m = 5 cm^-1 ± 3 cm^-1 .                                                    (4.5)

As we are thinking of using the least-squares theory, we put this a priori information into the form of an a priori model

    m_prior = (5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5)^t   (in cm^-1 units) ,   (4.6)

and an a priori covariance matrix

    C_prior = s_m^2 I ,                                                        (4.7)

where, interpreting the symbol ± in equation (4.5) as twice the standard deviation of a Gaussian distribution, we take

    s_m = 1.5 cm^-1 .                                                          (4.8)

We take a diagonal covariance matrix because we are given no information about some other possible probabilistic model that would better represent the a priori information.

At this point, we have the choice to keep in mind the vector m_prior and the matrix C_prior and proceed directly to the examination of the measurements, or, alternatively, we may pause to examine exactly which kind of a priori information we have introduced. Least-squares formulas are only interesting when all uncertainties are of the Gaussian type. This means that we are, in fact, assuming that we have an a priori probability density in the model space that is a Gaussian distribution, with mean m_prior and covariance C_prior. The best we can do to grasp exactly the a priori information we are using is to generate a few random models from such a distribution. After introducing m_prior and C_prior,

(* A priori information *)
mprior = {5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5.};
sm = 1.5;
Cprior = sm^2 IdentityMatrix[16];

we perform the Cholesky decomposition of the prior covariance matrix (in this case, this is trivial, as Cprior is diagonal):

(* Square root of the covariance matrix *)
(* One has cov = L . Transpose[L] *)
Lprior = Transpose[CholeskyDecomposition[Cprior]];

With the matrix L at hand, we generate nine pseudo-random realizations of the prior (Gaussian) distribution, as explained in section 1.7.2:

(* Generation of random models (prior distribution) *)
SeedRandom[1]
rnd := Sqrt[2] InverseErf[Random[Real, {-1, 1}]]
Do[{mrandom0 = Table[rnd, {16}],
    mrandom  = mprior + Lprior . mrandom0,
    PLOT[mrandom]}, {i, 1, 9}]

Using the color scale in figure 4.2, the nine models so obtained are displayed in figure 4.3.

Figure 4.2: The scale used for the model parameter values, going from 2 cm^-1 to 9 cm^-1. All models represented in this section use this scale.

Figure 4.3: Least-squares formulas are only interesting when all uncertainties are of Gaussian nature. This is true, in particular, for the prior distribution in the model space. This prior distribution is completely described by a model vector m_prior and a covariance matrix C_prior. At the right, nine pseudorandom samples of the prior Gaussian distribution.

To set the exercise, let us now generate "artificial data", to be later used as "observations". To do that, let us invent an arbitrary model

    m_true = (7, 3, 3, 3, 7, 3, 5, 3, 7, 3, 3, 3, 7, 3, 5, 5)^t   (in cm^-1 units) .   (4.9)

Using the scale in figure 4.2, this "true model" can be represented as in figure 4.4.

Figure 4.4: The "true model" invented for artificially generating some "data", to be later used in the inversion exercise.

The reader may perhaps read the two letters "IP" in this model. This shows that I am violating the assumption that the true model is a random model from the prior Gaussian distribution, i.e., one model that could sit among models like those in figure 4.3, while it is extremely unlikely that one of those models shows any "readable letter". So, when using this "true model" we are certainly violating the a priori assumption. But the least-squares method is so practical that we often use it, even when we know that other, more expensive, methods should be used instead. So, this (mild) violation of the a priori assumptions is part of the exercise.

Computing the artificial data associated to the model m_true just amounts to implementing the computation d_true = G m_true. These true values can then be converted into pseudo-observed values by adding some noise, here independent random Gaussian values with zero mean and with standard deviation

    s_d = 0.15 cm^-1 .                                                         (4.10)

When implementing this,

(* Calculating "true data", then adding Gaussian random errors *)
mtrue = {7., 3., 3., 3., 7., 3., 5., 3., 7., 3., 3., 3., 7., 3., 5., 5.};  (* the model of equation (4.9) *)
dtrue = G . mtrue;
dobs = dtrue;
SeedRandom[123]
rnd := Sqrt[2] InverseErf[Random[Real, {-1, 1}]]
sd = 0.15;
Do[dobs[[i]] = dobs[[i]] + sd rnd, {i, 1, 22}]

we obtain the 22 values

    d_obs = (6.93, 11.29, 12.89, 25.72, 18.28, 14.09, 10.06, 14.05, 15.91, 12.07, 28.14,
             4.21, 8.32, 15.52, 25.50, 21.10, 14.12, 9.98, 15.95, 17.89, 16.18, 20.07)^t .   (4.11)

This vector of "observed values" is accompanied by the 22 × 22 diagonal covariance matrix (footnote 2)

    C_obs = s_d^2 I ,                                                          (4.12)

that describes our "experimental" uncertainties. The value of s_d is that in equation (4.10).

Footnote 2: The matrix is diagonal because the noise values generated are independent from each other.
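A side note on the pseudo-random generator used above (an explanation of mine, not in the original text): if u is uniformly distributed on (-1, 1), then Sqrt[2] InverseErf[u] is a standard normal deviate, because the cumulative distribution of an N(0, 1) variable is (1 + Erf[x/Sqrt[2]])/2. A quick empirical check:

(* Empirical check that rnd produces N(0,1) deviates; sketch, not in the original notebook *)
SeedRandom[7]
sample = Table[Sqrt[2] InverseErf[Random[Real, {-1, 1}]], {10000}];
Print[{(Plus @@ sample)/10000., Sqrt[(Plus @@ (sample^2))/10000.]}]   (* should be close to {0, 1} *)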
As explained in the introduction to this chapter, in least-squares theory one demonstrates that the posterior probability density for the model parameters is a Gaussian, whose center is given by any of the three equivalent expressions

    m_post = (G^t C_obs^{-1} G + C_prior^{-1})^{-1} (G^t C_obs^{-1} d_obs + C_prior^{-1} m_prior)
           = m_prior + (G^t C_obs^{-1} G + C_prior^{-1})^{-1} G^t C_obs^{-1} (d_obs - G m_prior)            (4.13)
           = m_prior + C_prior G^t (G C_prior G^t + C_obs)^{-1} (d_obs - G m_prior) ,

and whose covariance is given by any of the two equivalent expressions

    C_post = (G^t C_obs^{-1} G + C_prior^{-1})^{-1}
           = C_prior - C_prior G^t (G C_prior G^t + C_obs)^{-1} G C_prior .                                 (4.14)

In this small-scale problem there is not much computational difference between these equivalent expressions, so let us use for m_post the expression

    m_post = (G^t C_obs^{-1} G + C_prior^{-1})^{-1} (G^t C_obs^{-1} d_obs + C_prior^{-1} m_prior)           (4.15)

and for C_post the expression

    C_post = (G^t C_obs^{-1} G + C_prior^{-1})^{-1} .                                                       (4.16)

After introducing the vector d_obs and the matrix C_obs,

dobs = {6.93, 11.29, 12.88, 25.65, 18.28, 14.09, 10.05, 14.06, 15.90, 12.07, 28.13,
        4.20, 8.33, 15.51, 25.51, 21.09, 14.11, 9.99, 15.94, 17.89, 16.16, 20.08};
Cobs = sd^2 IdentityMatrix[22];

we compute C_post and m_post using the commands

Cpost = Inverse[Transpose[G] . Inverse[Cobs] . G + Inverse[Cprior]];
mpost = Cpost . (Transpose[G] . Inverse[Cobs] . dobs + Inverse[Cprior] . mprior)
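Before moving on, one may cross-check this result against the third, data-space form of equation (4.13), which only requires solving a 22 × 22 linear system (this small verification snippet is mine, not part of the original notebook):

(* Cross-check of mpost using the data-space form (third line of equation 4.13);
   sketch, not in the original notebook *)
S = G . Cprior . Transpose[G] + Cobs;                               (* 22 x 22 matrix *)
mpost2 = mprior + Cprior . Transpose[G] . LinearSolve[S, dobs - G . mprior];
Print[Max[Abs[mpost2 - mpost]]]                                     (* should be at rounding-error level *)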
Impatient people may now rush to look at the model m_post, and to examine the standard deviations and correlations in C_post, but I prefer we proceed orderly. The philosophy of our approach is to pass from the prior Gaussian distribution in the model space to the posterior Gaussian distribution. In the same way we obtained samples of the prior distribution (figure 4.3), the commands (footnote 3)

(* Generation of random models (posterior distribution) *)
Cpost = (1/2) (Cpost + Transpose[Cpost]);
Lpost = Transpose[CholeskyDecomposition[Cpost]];
SeedRandom[1]
rnd := Sqrt[2] InverseErf[Random[Real, {-1, 1}]]
Do[{mrandom0 = Table[rnd, {16}],
    mrandom  = mpost + Lpost . mrandom0,
    PLOT[mrandom]}, {i, 1, 9}]

produce the 9 models displayed in figure 4.5.

Footnote 3: Although the matrix Cpost is, in principle, symmetric, rounding errors may cause the Cholesky decomposition to fail. For this reason, it is better to "resymmetrize" it.

Figure 4.5: Nine pseudorandom models that are samples of the posterior Gaussian distribution in the model space obtained using the data of the 22 rays.

This figure gives a quite clear idea of the resolution allowed by our data. The expert eye can directly guess what the posterior uncertainties and the posterior correlations are (and, of course, a larger number of posterior realizations would allow computing these with great accuracy). The average of a large enough set of posterior samples would be m_post, and an evaluation of the statistical covariances would give C_post. But these are already available to us, so we only need to plot them. The vector m_post (the output of the least-squares formulas) is plotted in figure 4.6. To examine the matrix C_post we can first compute the associated standard deviations, that are the square roots of the diagonal elements,

    \sigma_i = \sqrt{ C_{ii} } .                                               (4.17)

Using the color scale in figure 4.7, the posterior uncertainties can be represented as in figure 4.8.

Figure 4.6: Mean posterior model m_post. This is the output of the least-squares formulas. We would also obtain this model if taking the average of a large number of models like those in figure 4.5.

Figure 4.7: The scale used for the model parameter uncertainties, going from 0.08 cm^-1 to 1.5 cm^-1. The same scale is used for all the displays of uncertainties in this section.

Figure 4.8: The posterior uncertainties, obtained as the square roots of the diagonal elements of the matrix C_post. They could also be obtained as the statistical standard deviations from a large number of models like those in figure 4.5.

Note that the uncertainties in the blocks that are in the corners and in the center are smaller than those of the other blocks, due to the special geometry of the rays: the blocks at the corners are well resolved because there is, for each of them, one ray that only explores that block. As a consequence, the central blocks also are well resolved (because of the existence of the diagonal rays).

The off-diagonal elements of the posterior covariance matrix C_post are better analyzed by introducing the correlations

    \rho_{ij} = C_{ij} / ( \sigma_i \sigma_j ) ,                               (4.18)

whose values satisfy the constraints

    -1 \le \rho_{ij} \le +1 ,                                                  (4.19)

with the value +1 meaning perfect correlation, and the value -1 meaning perfect anticorrelation. Using the color scale in figure 4.9, the correlations can be displayed as in figure 4.10.

Figure 4.9: The scale used for the correlation values, going from -1 to +1. The value +1 means perfect correlation, and the value -1 means perfect anticorrelation.

It is not of much interest to represent a crude 16 × 16 matrix. Rather, I have plotted a 4 × 4 array of 4 × 4 submatrices, each submatrix representing the correlation between the uncertainty of a given block and those of all other blocks.

Figure 4.10: Posterior correlations. This 4 × 4 array corresponds to the 4 × 4 blocks of the model: at position (i, j) of this array are the correlations of the posterior value (i, j) with all other posterior values. As an example, the green cross corresponds to the correlation between the value for block (1, 4) and itself (necessarily +1), while the red cross corresponds to the correlation between the value for block (1, 4) and the value for block (3, 4). Although these correlations have been obtained from the matrix C_post, they could have been obtained as the statistical correlations from a large number of models like those in figure 4.5.

In each submatrix, the correlation of the block to which the submatrix is associated with the block itself is always 1, by definition. So, what matters are the other correlations. They are quite large in absolute value, this being due to the fact that we only have '22 data for 16 unknowns'. The pattern of correlations for the blocks that have large posterior uncertainties is particularly interesting, with values both close to +1 and close to -1. Once again, this results from the particular geometry of the rays. One may anticipate that adding more and more data, with paths that are not only the horizontal, vertical, and diagonal ones of this exercise, would make (i) the uncertainties tend to zero, and (ii) the correlations between distinct blocks tend to zero.
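The quantities of equations (4.17) and (4.18) are obtained from Cpost with two short commands (a sketch of my own; the variable names sigmaPost and rhoPost are not from the original notebook):

(* Posterior standard deviations (4.17) and correlations (4.18) from Cpost;
   sketch, not part of the original notebook *)
sigmaPost = Sqrt[Table[Cpost[[i, i]], {i, 1, 16}]];
rhoPost   = Table[Cpost[[i, j]]/(sigmaPost[[i]] sigmaPost[[j]]), {i, 1, 16}, {j, 1, 16}];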
To further explore this exercise, let us suppress the data corresponding to the rays 07, 11, 15, and 19, and solve the problem again. When the data corresponding to these four rays are suppressed, one is left with the geometry indicated in figure 4.11.

Figure 4.11: Geometry of the 'X-ray experiment' when the rays 07, 11, 15, and 19 are suppressed (blocks numbered 01 to 16). Figure at scale 1:3.

There are two ways of suppressing these data: setting a new problem with fewer rays (redefining the matrix G, suppressing 4 of its rows) or artificially putting enormous uncertainties on the data associated to these rays. In the accompanying computer code both possibilities are implemented, and they give practically indistinguishable results. To implement the second possibility we only need to replace a few of the lines in the previous code, those defining the observations:

(* The observations again, four of the data with huge uncertainties *)
dobs = {6.93, 11.29, 12.88, 25.65, 18.28, 14.09, 100., 14.06, 15.90, 12.07, 100.,
        4.20, 8.33, 15.51, 100., 21.09, 14.11, 9.99, 100., 17.89, 16.16, 20.08};
Cobs = sd^2 IdentityMatrix[22];
Cobs[[07, 07]] = 100000000;
Cobs[[11, 11]] = 100000000;
Cobs[[15, 15]] = 100000000;
Cobs[[19, 19]] = 100000000;

The data corresponding to the missing rays have been set to an arbitrary value (100) and the associated standard deviations have been set to a large value (10 000). Redoing the computations with these values, the posterior models are displayed in figure 4.12, the mean posterior model in figure 4.13, the posterior uncertainties in figure 4.14, and the posterior correlations in figure 4.15.

Figure 4.12: Posterior models.
Figure 4.13: Mean posterior model.
Figure 4.14: Posterior uncertainties.
Figure 4.15: Posterior correlations.

When comparing the two figures 4.13 and 4.14 to the two figures 4.6 and 4.8, we observe the following. The posterior model is similar, except for the block (1, 1), which now takes the a priori value (5 cm^-1), with a posterior uncertainty equal to the prior one (1.5 cm^-1). This is because the suppression of the four rays leaves the block (1, 1) unexplored by any ray. The least-squares solution in this case gives a consistent a posteriori value for this block, identical to the a priori value, with a posterior uncertainty identical to the prior one. In addition, we see in the central panel that the posterior uncertainty for this block is also identical to the prior one, as it should be. Note also, in figure 4.15, that the uncertainty of this block is completely uncorrelated with the uncertainties of the other blocks.

As a further example, let us now suppress the data corresponding to the rays 02, 05, 08, 10, 13, 16, 19, and 22. When the data corresponding to these eight rays are suppressed, one is left with the geometry indicated in figure 4.16.

Figure 4.16: Geometry of the 'X-ray experiment' when the rays 02, 05, 08, 10, 13, 16, 19, and 22 are suppressed (blocks numbered 01 to 16). Figure at scale 1:3.

As above, we simply replace a few of the lines in the previous code, those defining the observations, simulating the absence of data as data with huge uncertainties:

(* The observations again, eight data with huge uncertainties *)
dobs = {6.93, 100., 12.88, 25.65, 100., 14.09, 10.05, 100., 15.90, 100., 28.13,
        4.20, 100., 15.51, 25.51, 100., 14.11, 9.99, 100., 17.89, 16.16, 100.};
Cobs = sd^2 IdentityMatrix[22];
Cobs[[02, 02]] = 100000000;
Cobs[[05, 05]] = 100000000;
Cobs[[08, 08]] = 100000000;
Cobs[[10, 10]] = 100000000;
Cobs[[13, 13]] = 100000000;
Cobs[[16, 16]] = 100000000;
Cobs[[19, 19]] = 100000000;
Cobs[[22, 22]] = 100000000;

As above, the data corresponding to the missing rays have been set to an arbitrary value (100) and the associated standard deviations have been set to a large value (10 000).
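For completeness, the first possibility mentioned above (actually removing rows of G and the corresponding data) can be written in a few lines. This sketch is mine, not the code of the accompanying notebook, and the variable names are hypothetical:

(* Alternative implementation: remove the rows of G (and the corresponding data)
   instead of inflating their uncertainties; sketch, not the original notebook code *)
removed = {2, 5, 8, 10, 13, 16, 19, 22};             (* or {7, 11, 15, 19} for the first case *)
keep    = Complement[Range[22], removed];
Gred    = G[[keep]];                                  (* here a 14 x 16 matrix *)
dobsRed = dobs[[keep]];                               (* the dummy values 100. are dropped *)
CobsRed = sd^2 IdentityMatrix[Length[keep]];
CpostRed = Inverse[Transpose[Gred] . Inverse[CobsRed] . Gred + Inverse[Cprior]];
mpostRed = CpostRed . (Transpose[Gred] . Inverse[CobsRed] . dobsRed + Inverse[Cprior] . mprior)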
The posterior models are now displayed in figure 4.17, the mean posterior model in figure 4.18, the posterior uncertainties in figure 4.19, and the posterior correlations in figure 4.20.

Figure 4.17: Posterior models.
Figure 4.18: Mean posterior model.
Figure 4.19: Posterior uncertainties.
Figure 4.20: Posterior correlations.

The most remarkable fact about this solution is that the a posteriori model is very similar to that obtained using the whole set of 22 data. We are computing the value of 16 parameters, and we are only using 14 data, so we have 'less data than unknowns'. Of course, it is the use of the a priori information that allows obtaining reasonable results even in this situation. Note that two of the blocks have large posterior uncertainties; inspection of the reduced geometry suggests why: each of them is crossed by only one of the remaining rays (in fact the same ray), so the data essentially constrain only the sum of their values, not each value separately (see the sketch below).
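A quick way to see which blocks are poorly covered is to count, for each block, how many of the remaining rays cross it (a sketch of my own, not part of the original notebook):

(* Counting, for each block, the number of remaining rays that cross it;
   sketch, not part of the original notebook *)
removed  = {2, 5, 8, 10, 13, 16, 19, 22};
keep     = Complement[Range[22], removed];
coverage = Table[Count[G[[keep, j]], x_ /; x > 0], {j, 1, 16}]
(* the blocks with a count of 1 are the ones with the largest posterior uncertainties *)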