Problem Set 4: Covariance functions and Gaussian random fields

GEOS 627: Inverse Problems and Parameter Estimation, Carl Tape
Assigned: February 13, 2017 — Due: February 20, 2017
Last compiled: July 7, 2017
Overview and instructions
1. This problem set deals with three probability distributions: the uniform distribution, the
exponential distribution, and the Gaussian distribution.
2. Reading:
• Tarantola (2005): Ch. 2 (note Example 2.1) and Sections 5.1, 5.2, 5.3 (note 5.3.3).
• Aster et al. (2013): Appendix B
• class notes
3. Run git pull, then copy the template script: cp covrand_template.m covrand.m
Problem 1 (1.0). Gaussian and exponential PDFs
1. (0.5) Consider the standard normal distribution, $f_N(x)$.
(a) (0.1) What is the exact expression and approximate value of $f_N(\pm\sigma)$?
(b) (0.1) What is the exact expression and approximate value of $f_N(\mu)$?
(c) (0.1) Make a plot of $f_N(x)$.
(d) (0.2) Label σ, µ, and $f_N(\pm\sigma)$ on your plot, and plot the three points $(\mu, f_N(\mu))$, $(-\sigma, f_N(-\sigma))$, and $(\sigma, f_N(\sigma))$.
2. (0.2) Consider the exponential probability density function
$$ f(x) = k \exp\left(-\frac{\sqrt{2}\,|x - \mu|}{\sigma}\right) \tag{1} $$
Show that $k = 1/(\sigma\sqrt{2})$.
Hint: Split the integration into two intervals in order to eliminate the absolute values.
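A quick numerical check of the normalization in Eq. (1) can complement the analytic result. The sketch below is in Python (standard library only) rather than Matlab. The values of µ and σ are arbitrary assumptions. It integrates each half of the density separately, mirroring the hint, using a hand-written trapezoidal rule.

```python
import math

def exp_pdf(x, mu, sigma):
    """Exponential PDF of Eq. (1) with the claimed constant k = 1/(sigma*sqrt(2))."""
    k = 1.0 / (sigma * math.sqrt(2.0))
    return k * math.exp(-math.sqrt(2.0) * abs(x - mu) / sigma)

def trapezoid(f, lo, hi, n=100_000):
    """Composite trapezoidal rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

mu, sigma = 1.5, 2.0  # arbitrary test values (assumption)

def pdf(x):
    return exp_pdf(x, mu, sigma)

# Split at x = mu, as in the hint, so each piece is smooth (no absolute value).
# The tails beyond 40 sigma contribute less than exp(-56) and are ignored.
area = trapezoid(pdf, mu - 40 * sigma, mu) + trapezoid(pdf, mu, mu + 40 * sigma)
print(f"integral of f(x): {area:.8f}")
```

If k were wrong, the printed integral would differ from 1 by the same factor.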
3. (0.3) Consider the two Gaussian probability density functions
$$ f_X(x) = k_x \exp\left(-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right) \tag{2} $$
$$ f_Y(y) = k_y \exp\left(-\frac{(y - \mu_y)^2}{2\sigma_y^2}\right) \tag{3} $$
(a) (0.1) Assuming that the variables X and Y are independent, what is the joint probability density function $f(x, y)$?
(b) (0.2) Assuming that the mean is zero and that $f(x, y)$ has circular level curves (contours), show that the normalization factor for $f(x, y)$ is $h = 1/(2\pi\sigma^2)$, such that $f(x, y) = h\,g(x, y)$, where $g(x, y)$ is the unnormalized exponential factor.
Hints:
• Aster et al. (2013, eq. B.28)
• What does “mean zero” and “circular Gaussian” imply about f (x, y)?
• Try the integration using polar coordinates (it is clean).
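The polar-coordinate route can also be checked numerically. The check assumes zero means and σ_x = σ_y = σ. Under that assumption, g(x, y) = exp(−(x² + y²)/(2σ²)) depends only on r² = x² + y². In polar coordinates dx dy = r dr dθ, and the θ integral contributes a factor 2π, leaving a one-dimensional integral in r. The Python sketch below (standard library only, arbitrary σ) evaluates that integral with h = 1/(2πσ²).

```python
import math

sigma = 3.0                               # arbitrary test value (assumption)
h = 1.0 / (2.0 * math.pi * sigma**2)      # normalization factor to be checked

def radial_integrand(r):
    """h * g * r after integrating over theta: 2*pi * r * h * exp(-r^2 / (2 sigma^2))."""
    return 2.0 * math.pi * r * h * math.exp(-r**2 / (2.0 * sigma**2))

# Trapezoidal rule in r on [0, 12 sigma]; the tail beyond 12 sigma is about exp(-72).
n, r_max = 100_000, 12.0 * sigma
dr = r_max / n
total = 0.5 * (radial_integrand(0.0) + radial_integrand(r_max))
total += sum(radial_integrand(i * dr) for i in range(1, n))
total *= dr
print(f"integral of h*g(x, y) over the plane: {total:.8f}")
```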
Problem 2 (2.5). Uniform PDF (and central limit theorem)
The formulas for expected value and variance are given by
$$ E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \tag{4} $$
$$ \mathrm{Var}[X] = E[X^2] - (E[X])^2 \tag{5} $$
where $f_X(x)$ is a probability density function. The expected value of $g(X)$ is given by
$$ E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx \tag{6} $$
1. (0.2) Write the expression for a uniform distribution, $f_U(x)$, on the interval $[a, b]$. Write the Matlab command (or the equivalent in whatever computing language you are using) to generate n samples of $f_U(x)$.
2. (1.0) Using Equations (4)–(6) (with your $f_U(x)$ in place of $f_X(x)$), show that the expected value and variance for $f_U(x)$ are given by
$$ E[X] = \frac{a + b}{2} \tag{7} $$
$$ \mathrm{Var}[X] = \frac{(b - a)^2}{12} \tag{8} $$
Hint: You will probably need to use polynomial long division.
3. (0.0) In Matlab, generate $10^5$ or so samples of $f_U(x)$ (remember: a sample of $f_U(x)$ will be a random number between a and b), and check that the mean (µ) and variance (σ²) of the samples are close to the theoretical values, i.e., µ ≈ E[X] and σ² ≈ Var[X]. For the sake of comparison, use
$$ a = -\sqrt{12}, \qquad b = 5\sqrt{12}. $$
Plot a histogram of your samples to check that the distribution is flat over the appropriate interval. (No need to turn in this plot.)
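The check in item 3 can be sketched as follows. This version is Python (standard library only) rather than Matlab. The random seed is fixed only so the run is reproducible (an assumption), and the sample variance uses the n − 1 normalization.

```python
import math
import random

random.seed(0)                                   # fixed seed for reproducibility (assumption)
a, b = -math.sqrt(12.0), 5.0 * math.sqrt(12.0)   # values from item 3
n = 100_000

samples = [random.uniform(a, b) for _ in range(n)]   # n samples of f_U(x)

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / (n - 1)   # sample variance, n - 1 normalization

print(f"sample mean     {mean:8.4f}   E[X]   = {(a + b) / 2:8.4f}")
print(f"sample variance {var:8.4f}   Var[X] = {(b - a) ** 2 / 12:8.4f}")
```

With these values of a and b, Eqs. (7) and (8) give E[X] = 2√12 ≈ 6.93 and Var[X] = 36, so σ = 6.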
4. (1.3) The central limit theorem is stated in Aster et al. (2013, Section B.6):
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (IID) random variables with a finite expected value µ and variance σ². Let
$$ Z_n = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}. \tag{9} $$
In the limit as n approaches infinity, the distribution of $Z_n$ approaches the standard normal distribution.
The central limit theorem holds for any distribution with a finite mean and variance. You will demonstrate it using the uniform distribution, $f_U(x)$ (Problem 2-1), for which you know µ and σ.
(a) (0.1)
• Write the expression for $Z_1$.
• What are the minimum and maximum values of $Z_1$?
Note: Your answer should not have µ or σ in the expressions, since these can be written in terms of a and b.
(b) (0.4)
• Write the expression for $Z_2$.
• What are the minimum and maximum possible values of the sum $X_1 + X_2$?
• What are the minimum and maximum values of $Z_2$?
Note: Your answer should not have µ or σ in the expressions, since these can be written in terms of a and b.
(c) (0.8) By generating samples (X) from your $f_U(x)$, demonstrate the central limit theorem by showing a set of histograms of $Z_1$, $Z_2$, $Z_3$, and $Z_{10}$. To obtain each distribution of $Z_n$ (Eq. 9), you will need to repeat the experiment p times; try $p = 10^5$. Center your histograms between ±4.
Hint:
• Consider the case of n = 2. The first “experiment” will involve generating two random samples, $X_1$ and $X_2$, of $f_U(x)$. You can then compute $Z_2$ using Equation (9). You then repeat this process p times and plot a histogram of the p values of $Z_2$.
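Item 4(c) can be sketched as below, in Python (standard library only) rather than Matlab. Each realization of $Z_n$ sums n fresh uniform samples and standardizes them with Eq. (9), using µ and σ from Eqs. (7) and (8). The text histogram is a stand-in for a plotted histogram. The choice p = 2 × 10⁴, smaller than the suggested 10⁵, is an assumption made to keep the run short.

```python
import math
import random

random.seed(1)                       # fixed seed for reproducibility (assumption)
a, b = -math.sqrt(12.0), 5.0 * math.sqrt(12.0)
mu = (a + b) / 2.0                   # Eq. (7)
sigma = (b - a) / math.sqrt(12.0)    # square root of Eq. (8)

def z_samples(n, p):
    """Return p realizations of Z_n (Eq. 9), each from n IID samples of f_U(x)."""
    z = []
    for _ in range(p):
        s = sum(random.uniform(a, b) for _ in range(n))
        z.append((s - n * mu) / (sigma * math.sqrt(n)))
    return z

def text_histogram(z, lo=-4.0, hi=4.0, nbins=16, width=40):
    """Print a crude text histogram of the values of z that fall in [lo, hi)."""
    counts = [0] * nbins
    for v in z:
        if lo <= v < hi:
            counts[min(int((v - lo) / (hi - lo) * nbins), nbins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        left = lo + i * (hi - lo) / nbins
        print(f"{left:5.1f} | " + "#" * round(width * c / peak))

p = 20_000                           # smaller than the suggested 10^5, to run quickly
for n in (1, 2, 3, 10):
    z = z_samples(n, p)
    print(f"n = {n}: min(Z) = {min(z):.3f}, max(Z) = {max(z):.3f}")
    text_histogram(z)
```

The histogram for n = 1 should be flat. As n increases, the histograms should approach the bell shape of $f_N$.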
Problem 3 (3.5). Estimating a covariance matrix from a set of samples
See the template script covrand.m. Let P be the number of samples and M be the number of model parameters describing a single sample. The $i$th sample is represented by the $M \times 1$ vector $m_i$.
It may (or may not) help to attach some physical meaning to these samples. Think of each
sample as the functional variation in a single dimension. The set of samples might represent, for
example:
• the variation in topography along different transects.
• the variation in height of an interval of an oscillating wire: each profile represents a different
time.
• the variation of vertical ground displacement with time, as captured by a seismogram: each
profile is for a different earthquake.
In this problem, the goal is to compute a sample covariance matrix, $C_P$, and to use it to estimate the covariance function, $C(d)$, that characterizes the samples.
1. (0.0) Run covrand.m. Identify the key variables (and their dimensions) that are loaded into Matlab, then comment out the break statement and proceed.
2. (0.3) Plot 8 samples in a 4 × 2 subplot figure, with one sample per subplot and with the same axis scale for each subplot (use ax0 from covrand.m). Plot each sample using the spatial discretization given by x. Use either a default plotting style or '-.' (but not '.').
In the rest of the problem, plot all M-dimensional vectors using the same y-axis range and the spatial discretization given by x.
3. (1.5) Use the first P = 10 samples to do the following:
(a) (0.3) Compute and plot the mean, $\mu_{10}$.
(b) (0.7) Compute and plot the covariance matrix, $C_{10}$ (use imagesc). Show your code to compute $C_{10}$, and do not use the black-box cov function.²
(c) (0.5) Make a scatterplot (use plot) of $(C_{10})_{kk'}$ versus $D_{kk'} = |x_k - x_{k'}|$, where D is provided in covrand.m.
Hint: Try plot(D,Csamp,'b.'); where Csamp represents $C_{10}$.
4. (0.5) Repeat the previous problem (include plots), but use all 1000 samples. How do the estimated mean and covariance change as the number of samples increases?
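Items 3 and 4 can be sketched in Python (standard library only) as below. The samples loaded by covrand.m are not reproduced here, so the sketch builds hypothetical stand-in samples (moving averages of white noise) on an assumed grid x. Only the mean, covariance, and (D, C) pairing steps carry over to the problem. The covariance uses the 1/(P − 1) normalization, which is also the default in Matlab's cov.

```python
import random

random.seed(2)
M, P = 30, 1000                        # entries per sample, number of samples
x = [float(k) for k in range(M)]       # hypothetical grid; covrand.m provides the real x

def synthetic_sample(width=5):
    """Hypothetical stand-in for one loaded sample m_i: a moving average of white noise."""
    w = [random.gauss(0.0, 1.0) for _ in range(M + width - 1)]
    return [sum(w[k:k + width]) / width for k in range(M)]

samples = [synthetic_sample() for _ in range(P)]

def sample_mean(ms):
    """Entry-by-entry average of the samples: the mean vector mu_P."""
    return [sum(m[k] for m in ms) / len(ms) for k in range(len(ms[0]))]

def sample_cov(ms):
    """C_P = 1/(P-1) * sum_i (m_i - mu_P)(m_i - mu_P)^T, written out without a cov routine."""
    n, mdim = len(ms), len(ms[0])
    mu = sample_mean(ms)
    dev = [[m[k] - mu[k] for k in range(mdim)] for m in ms]
    return [[sum(d[k] * d[j] for d in dev) / (n - 1) for j in range(mdim)]
            for k in range(mdim)]

mu_10, C_10 = sample_mean(samples[:10]), sample_cov(samples[:10])
mu_P, C_P = sample_mean(samples), sample_cov(samples)

# (D_kk', (C_P)_kk') pairs for the scatterplot, with D_kk' = |x_k - x_k'|.
pairs = [(abs(x[k] - x[j]), C_P[k][j]) for k in range(M) for j in range(M)]
```

For these stand-in samples the true covariance is 0.2 on the diagonal and falls to zero for separations of 5 or more grid points. Comparing $C_{10}$ and $C_P$ with that reference shows how the estimate tightens as P grows.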
² If you use cov to check, you may need to transpose your matrix of samples to ensure that the resulting matrix is $M \times M$.
5. (0.2) Examine the script covC.m. Some example plots using covC.m are shown in Figure 1. Two of the functions plotted are
$$ C_{\mathrm{gaus}}(d) = \sigma^2 \exp\left(-\frac{2d^2}{L'^2}\right) \tag{10} $$
$$ C_{\mathrm{exp}}(d) = \sigma^2 \exp\left(-\frac{2d}{L'}\right), \tag{11} $$
where d is the distance between x and x′. In our 1D example, $d(x, x') = |x - x'|$.³
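C takes in a distance between two points and outputs a value. It can alternatively be written as a function of the two input points, x and x′:
$$ C_{\mathrm{gaus}}(x, x') = \sigma^2 \exp\left(-\frac{2(x - x')^2}{L'^2}\right) \tag{12} $$
$$ C_{\mathrm{exp}}(x, x') = \sigma^2 \exp\left(-\frac{2|x - x'|}{L'}\right), \tag{13} $$
or in discrete form
$$ (C_{\mathrm{gaus}})_{kk'} = C_{\mathrm{gaus}}(x_k, x_{k'}) = \sigma^2 \exp\left(-\frac{2(x_k - x_{k'})^2}{L'^2}\right) \tag{14} $$
$$ (C_{\mathrm{exp}})_{kk'} = C_{\mathrm{exp}}(x_k, x_{k'}) = \sigma^2 \exp\left(-\frac{2|x_k - x_{k'}|}{L'}\right). \tag{15} $$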
(a) What are $C_{\mathrm{gaus}}$ and $C_{\mathrm{exp}}$ for two points separated by $d = L'$?
(b) What are $C_{\mathrm{gaus}}$ and $C_{\mathrm{exp}}$ for two points separated by $d = L'/2$?
(c) What are $C_{\mathrm{gaus}}$ and $C_{\mathrm{exp}}$ for two points separated by $d = 0$?
(d) What values of σ² and L′ were used for Figure 1?
Note: Only integers and variables should appear in your answers.
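Equations (10), (11), (14), and (15) translate directly into code. The Python sketch below is not covC.m, whose interface is not reproduced here. The grid x and the values of σ² and L′ are hypothetical. The sketch builds both discrete matrices by evaluating C at $|x_k - x_{k'}|$.

```python
import math

def c_gaus(d, sigma2, Lp):
    """Gaussian covariance, Eq. (10): sigma^2 * exp(-2 d^2 / L'^2)."""
    return sigma2 * math.exp(-2.0 * d**2 / Lp**2)

def c_exp(d, sigma2, Lp):
    """Exponential covariance, Eq. (11): sigma^2 * exp(-2 d / L'), for d >= 0."""
    return sigma2 * math.exp(-2.0 * d / Lp)

def cov_matrix(x, cfun, sigma2, Lp):
    """Discrete covariance matrix, Eqs. (14)-(15): C_kk' = C(|x_k - x_k'|)."""
    return [[cfun(abs(xk - xj), sigma2, Lp) for xj in x] for xk in x]

x = [0.5 * k for k in range(41)]   # hypothetical 1D grid
sigma2, Lp = 4.0, 6.0              # hypothetical amplitude and length scale
C_gaus = cov_matrix(x, c_gaus, sigma2, Lp)
C_exp = cov_matrix(x, c_exp, sigma2, Lp)
```

Evaluating c_gaus and c_exp at d = L′, L′/2, and 0 gives a numerical check on symbolic answers to (a) to (c).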
6. (0.0) Run the example listed in covC.m and make sure you understand what the input
parameters are.
7. (0.5) Use covC.m to find a covariance function, C(d), that reasonably fits the scatterplot of $(C_{1000})_{kk'}$ versus $D_{kk'}$ from Problem 3-4.
(a) List your values of the parameters that describe C(d).
(b) Include a plot with C(d) superimposed on the scatterplot of $(C_{1000})_{kk'}$ versus $D_{kk'}$.
(c) Let C be the covariance matrix corresponding to C(d). What are the diagonal entries of C and why?
(d) Include a plot of C (use imagesc).
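Fitting C(d) by eye can be complemented with a brute-force grid search. The search picks the candidate C(d) that minimizes the sum of squared differences from the scatterplot values. This is one possible approach, not the method prescribed by the problem or implemented in covC.m. The points below are hypothetical, generated from a known exponential curve plus a deterministic wiggle, and the grids of σ² and L′ are arbitrary.

```python
import math

def c_exp(d, sigma2, Lp):
    """Exponential covariance, Eq. (11)."""
    return sigma2 * math.exp(-2.0 * d / Lp)

def misfit(pairs, cfun, sigma2, Lp):
    """Sum of squared differences between scatterplot values and a candidate C(d)."""
    return sum((c - cfun(d, sigma2, Lp)) ** 2 for d, c in pairs)

def grid_fit(pairs, cfun, sigma2_values, Lp_values):
    """Brute-force search for the (sigma^2, L') pair with the smallest misfit."""
    candidates = [(s2, Lp) for s2 in sigma2_values for Lp in Lp_values]
    return min(candidates, key=lambda cand: misfit(pairs, cfun, *cand))

# Hypothetical scatterplot points: a known curve (sigma^2 = 9, L' = 12) plus a
# deterministic wiggle. In the problem these would be (D_kk', (C_1000)_kk') pairs.
pairs = [(float(d), c_exp(d, 9.0, 12.0) + 0.3 * math.sin(1.7 * d)) for d in range(50)]

best = grid_fit(pairs, c_exp,
                sigma2_values=[0.5 * i for i in range(2, 41)],
                Lp_values=[float(L) for L in range(2, 41)])
print("best (sigma^2, L') on the grid:", best)
```

In the real scatterplot, fewer (k, k′) pairs lie at large separations, so a least-squares fit is less constrained there. Treat the result as a starting point for judgment.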
Problem 4 (3.0). Generating samples from a prescribed covariance
Tarantola (2005, p. 45):
. . . a large enough number of realizations completely characterizes the [Gaussian random] field. . . Displaying the mean of the Gaussian random field and plotting the covariance is not an alternative to displaying a certain number of realizations, because
the mean and covariance do not relate in an intuitive way to the realizations.
In Problem 3, you used a set of 1000 samples and computed a mean, $\mu_{1000}$, and a covariance matrix, $C_{1000}$. You used $C_{1000}$ to estimate a covariance function, C(d), with corresponding covariance matrix C. Here you will use $\mu_{1000}$ and C (not $C_{1000}$) to generate a set of samples that (hopefully) resembles the original samples.
³ I have used the notation L′ = 2L to distinguish our L′ from the L that appears in Tarantola (2005).
1. (1.5)
(a) (1.0) Generate 2000 samples of C, and save these as a set of $m_C$ (each $m_C$ is still $M \times 1$). Include the pertinent lines of your code.
Hints:
• A = chol(C,'lower');
• If $x = Aw$ is a sample of C, what are A and w?
(b) (0.4) Add $\mu_{1000}$ to each $m_C$, then plot the first 8 samples (as in Problem 3-2). Superimpose $\mu_{1000}$ in each subplot.
(c) (0.1) Do your samples resemble those provided in Problem 3? (yes or no)
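The Cholesky approach in the hints can be sketched as follows. Suppose C = AAᵀ with A lower triangular, and w holds independent standard normal values. Then $m_C = Aw$ has covariance A E[wwᵀ] Aᵀ = AAᵀ = C.

The sketch is Python (standard library only), so a hand-written Cholesky factorization stands in for Matlab's chol(C,'lower'). The grid, σ², and L′ are hypothetical. An exponential covariance (Eq. 15) is used because its matrix is well conditioned.

```python
import math
import random

def cholesky_lower(C):
    """Lower-triangular A with A A^T = C (Cholesky-Banachiewicz, no pivoting).

    math.sqrt raises ValueError if C is not (numerically) positive definite.
    """
    n = len(C)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                A[i][j] = math.sqrt(C[i][i] - s)
            else:
                A[i][j] = (C[i][j] - s) / A[j][j]
    return A

def draw(A, w):
    """One sample of C: m_C = A w, with w a vector of independent N(0, 1) values."""
    return [sum(A[i][k] * w[k] for k in range(i + 1)) for i in range(len(w))]

random.seed(3)
M, sigma2, Lp = 25, 4.0, 8.0                  # hypothetical size and covariance parameters
x = [float(k) for k in range(M)]
C = [[sigma2 * math.exp(-2.0 * abs(xk - xj) / Lp) for xj in x] for xk in x]  # Eq. (15)
A = cholesky_lower(C)                          # plays the role of chol(C,'lower')

W = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(2000)]  # the w_i
mC = [draw(A, w) for w in W]                   # 2000 samples, each of length M
```

Adding $\mu_{1000}$ to each entry of mC gives samples comparable to those plotted in Problem 3-2.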
2. (0.5) Consider the samples of the covariance matrix, $m_C$.
Compute the mean (mean), standard deviation (std), and norm of each of the 2000 $m_C$, and show your results in three histogram plots. Do your results agree with what you expect?
NOTE: Matlab's norm command will not be useful here. In calculating the norm, you will need to use a modified covariance matrix, MC (the covariance matrix C multiplied by M), where C is $M \times M$. This will ensure that the norm of each $m_C$ is about 1.
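One reading of the NOTE, stated here as an assumption, is that the norm is the covariance-weighted norm ‖m‖ = (mᵀ(MC)⁻¹m)^{1/2}. Because $m_C = Aw$, we have mᵀC⁻¹m = wᵀw, whose expected value is M. Dividing by M therefore puts the norm near 1.

The Python sketch below (standard library only) computes mᵀC⁻¹m by forward substitution (y = A⁻¹m, then yᵀy) instead of forming C⁻¹. The grid and parameters are hypothetical.

```python
import math
import random

def cholesky_lower(C):
    """Lower-triangular A with A A^T = C (no pivoting)."""
    n = len(C)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            A[i][j] = math.sqrt(C[i][i] - s) if i == j else (C[i][j] - s) / A[j][j]
    return A

def forward_solve(A, b):
    """Solve A y = b for lower-triangular A by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(A[i][k] * y[k] for k in range(i))) / A[i][i])
    return y

random.seed(4)
M, sigma2, Lp = 25, 4.0, 8.0                   # hypothetical size and parameters
x = [float(k) for k in range(M)]
C = [[sigma2 * math.exp(-2.0 * abs(xk - xj) / Lp) for xj in x] for xk in x]
A = cholesky_lower(C)

means, stds, norms = [], [], []
for _ in range(2000):
    w = [random.gauss(0.0, 1.0) for _ in range(M)]
    m = [sum(A[i][k] * w[k] for k in range(i + 1)) for i in range(M)]   # m_C = A w
    mu = sum(m) / M
    means.append(mu)
    stds.append(math.sqrt(sum((v - mu) ** 2 for v in m) / (M - 1)))
    y = forward_solve(A, m)                                 # y = A^{-1} m_C
    norms.append(math.sqrt(sum(v * v for v in y) / M))     # sqrt(m^T (M C)^{-1} m)
```

The lists means, stds, and norms hold the three quantities to be shown as histograms.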
3. (0.8) Now generate a new C using covC.m by making only one change: change icov to either 1 or 2. Repeat Problem 4-1 using the same set of Gaussian random vectors, $w_i$, as before. This will allow for a true comparison between samples from the Gaussian and exponential covariance functions.
(a) Generate samples of the new C and add $\mu_{1000}$ to each sample. Plot the first 8 samples.
(b) Describe the differences and similarities between the samples from the two different distributions.
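Reusing the same $w_i$ for both covariance matrices isolates the effect of the covariance function, since only A changes between the two sets of samples. The sketch below uses Python (standard library only) with a hypothetical grid and parameters. A small diagonal "nugget" is added to the Gaussian covariance matrix. This is an assumption: that matrix is nearly singular, and a plain Cholesky factorization of it can fail in floating point.

```python
import math
import random

def cholesky_lower(C):
    """Lower-triangular A with A A^T = C (no pivoting)."""
    n = len(C)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            A[i][j] = math.sqrt(C[i][i] - s) if i == j else (C[i][j] - s) / A[j][j]
    return A

def cov_matrix(x, cfun):
    """C_kk' = cfun(|x_k - x_k'|), as in Eqs. (14)-(15)."""
    return [[cfun(abs(xk - xj)) for xj in x] for xk in x]

def draw(A, w):
    """One sample m = A w."""
    return [sum(A[i][k] * w[k] for k in range(i + 1)) for i in range(len(w))]

random.seed(5)
M, sigma2, Lp = 25, 4.0, 8.0                   # hypothetical size and parameters
x = [float(k) for k in range(M)]
C_exp = cov_matrix(x, lambda d: sigma2 * math.exp(-2.0 * d / Lp))
C_gaus = cov_matrix(x, lambda d: sigma2 * math.exp(-2.0 * d**2 / Lp**2))

nugget = 1e-6 * sigma2                         # assumption: tiny jitter for a stable factorization
for k in range(M):
    C_gaus[k][k] += nugget

A_exp, A_gaus = cholesky_lower(C_exp), cholesky_lower(C_gaus)

W = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(8)]   # shared w_i
samples_exp = [draw(A_exp, w) for w in W]
samples_gaus = [draw(A_gaus, w) for w in W]
```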
4. (0.2) Repeat Problem 4-2 for the set of 2000 samples from the new C.
Problem
Approximately how much time outside of class and lab time did you spend on this problem set?
Feel free to suggest improvements here.
References
Aster, R. C., B. Borchers, and C. H. Thurber (2013), Parameter Estimation and Inverse Problems, 2nd ed., Elsevier, Waltham, Mass., USA.
Tarantola, A. (2005), Inverse Problem Theory and Methods for Model Parameter Estimation,
SIAM, Philadelphia, Penn., USA.
[Figure 1 panels: Gaussian covariance; Exponential covariance; Matérn covariance (ν = 0.2, 0.5, 1.5, 100.0); Circular covariance. Vertical axes run from 0 to 16; horizontal axes show Distance from 0 to 50. Reference levels are labeled exp(−0.50), exp(−0.71), exp(−1.00), exp(−1.41), and exp(−2.00).]
Figure 1: Covariance functions from covC.m characterized by length scale L′ and amplitude σ². See Tarantola (2005, Section 5.3.3, p. 113). Some reference e-folding levels are labeled; for example, the y-value of the top line is $y = \sigma^2 e^{-1/2} \approx 9.70$. The Matérn covariance functions include an additional parameter, ν, that influences the shape: ν → ∞ gives the Gaussian function (upper left), and ν = 0.5 gives the exponential function (upper right).