Technische Universität München
Department of Mathematics
Master's Thesis
Approximation and Analysis of Probability
Densities using Radial Basis Functions
Fabian Fröhlich
Supervisor: Prof. Dr. Dr. Fabian Theis
Advisor: Dr. Jan Hasenauer
Submission Date: ...
I hereby declare that I composed this Master's thesis on my own, supported only by the declared resources.
Garching,
Zusammenfassung
In this thesis we treat a novel method for the approximation of probability densities. Although the method is applicable to a broad spectrum of problems, we will focus on probability densities that arise in the analysis of dynamical systems. The probability densities arising there result from the uncertainties in parameter estimation. This class of problems is of particular interest, since the analysis of the probability densities can be non-trivial even for problems with low-dimensional parameter spaces. For problems with nonlinear correlations between parameters or heavy-tailed probability densities of the parameters, classical methods for the approximation of marginals, such as Markov chain Monte Carlo in combination with kernel density estimators, perform only moderately well.
Meshfree approximation schemes, such as interpolation with radial basis functions, are known to be well suited for the approximation of high-dimensional, irregularly distributed data. This makes them a promising alternative to Markov chain Monte Carlo methods. In this thesis we present an alternative approach that approximates the probability densities of the parameters by a linear combination of Gaussian radial basis functions. The resulting approximant can be viewed as a mixture of Gaussians, so that the marginals and moments of the approximant can be stated in closed form.
In contrast to the interpolation of irregularly distributed data, we are not restricted to fixed interpolation nodes. This opens up the possibility of also optimizing the choice of nodes. We will show the optimality properties of certain lattices and subsequently derive from them an adaptive method for node selection. Furthermore, we will introduce several improvements that significantly increase the approximation accuracy of the adaptive scheme.
Using the example of a bivariate mixture of two Gaussians, we show that the L2 approximation error for marginals obtained with radial basis functions is several orders of magnitude smaller than that of kernel density estimators for a limited number of function evaluations. Moreover, we successfully applied the method to problems with up to 4-dimensional parameter spaces. Due to the restrictions on the dimensionality of the parameter space, we see the main field of application of the method in low-dimensional problems of high complexity.
Abstract
In this thesis we will present a novel method for the approximation of probability densities. Although the method is applicable to a wide range of problems, we will focus on the approximation of probability densities that arise in the estimation of parameters of dynamical systems. These probability densities reflect the uncertainty of the parameter estimates. This class of problems is of special interest, as the analysis of the arising densities can easily be non-trivial, even for problems with low-dimensional parameter spaces. In the presence of nonlinear correlations between parameters or heavy tails in the probability density of the parameters, classical approximation methods for marginals, such as Markov chain Monte Carlo combined with kernel density estimators, perform only moderately well.
Approximation schemes using radial basis function networks are known to perform well for the interpolation of high-dimensional scattered data. This makes them a promising alternative to kernel density estimation methods. In this thesis we will present a novel approach which approximates the probability densities of parameters with a linear combination of Gaussian radial basis functions. As the resulting approximation can be seen as a mixture of Gaussians, the expressions for marginals and moments of the approximant can be given in closed form.
In contrast to scattered data interpolation and kernel density estimation, one is not limited to fixed interpolation nodes. This facilitates the optimization of the choice of interpolation nodes. We will show the optimality properties of certain lattices and propose a novel algorithm for the generation of lattices restricted to superlevel sets. Subsequently we will motivate and present an adaptive method for the generation of nodes based on interacting particles. Moreover, we will introduce several improvements to the adaptive method that significantly increase the approximation accuracy.
Based on the example of a bivariate mixture of two Gaussians, we will show that our method yields an L2 approximation error for marginals that is several orders of magnitude lower compared to approximations using Markov chain Monte Carlo combined with kernel density estimators. Moreover, we will show that the method works for dynamical systems with up to 4-dimensional parameter spaces. Due to limitations in terms of the dimensionality of the parameter space, we currently envision the method to be mainly applied to low-dimensional problems of high computational complexity.
Contents

1 Introduction
  1.1 Parameter Inference
    1.1.1 Maximum Likelihood
    1.1.2 Identifiability
    1.1.3 Bayesian Inference
    1.1.4 Marginals, Moments and Profiles
  1.2 Mathematical Tools
    1.2.1 Optimization
    1.2.2 Markov Chain Monte Carlo
    1.2.3 Kernel Density Estimate
  1.3 Contributions of this Thesis
  1.4 Example of Gaussian Mixture

2 Interpolation using Radial Basis Functions
  2.1 Interpolation Conditions
  2.2 Positive Definite Functions
  2.3 Error Estimates
    2.3.1 The Native Space
    2.3.2 Error Estimates for the Native Space
  2.4 Stability
    2.4.1 Trade-off Principle
    2.4.2 Finding the optimal Shape Parameter
    2.4.3 Improved Stability
  2.5 Non-negativity of the Density
    2.5.1 Generalized Hermite Interpolation
    2.5.2 Compactly Supported Radial Basis Functions
    2.5.3 Moving Least Squares
    2.5.4 Scaled Moving Least Squares
  2.6 Conclusion

3 Generation of Centers
  3.1 Connection to Voronoi Cells
  3.2 Lattices
    3.2.1 Restriction of Interpolation Area
    3.2.2 Algorithm to generate restricted lattices
    3.2.3 Discussion of the Algorithm
    3.2.4 Choice of Lattice
    3.2.5 Reparametrisation
    3.2.6 Local Reparametrisation
  3.3 Adaptive Refinement
    3.3.1 Self Organizing Potential
    3.3.2 Original Algorithm
  3.4 Improvements to the Algorithm
    3.4.1 Modified Interpolation of Dp
    3.4.2 Modified Density Function
    3.4.3 Improved Stopping Criterion
    3.4.4 Extension to Higher Dimensions
    3.4.5 Extension to Unknown Domains
    3.4.6 Modified Spawning
    3.4.7 Iterative Reparametrisation
    3.4.8 Local Kernel Shape
  3.5 Conclusion

4 Analysis of the Interpolant
  4.1 Formulas
  4.2 Numerical Evaluation
  4.3 Conclusion

5 Application of the Algorithm
  5.1 Reversible First Order Reaction
  5.2 Enzymatic Catalysation
  5.3 Stochastic Gene Expression
  5.4 Conclusion

6 Conclusion and Outlook
  6.1 Conclusion
  6.2 Outlook

Bibliography
Chapter 1
Introduction
The probability density function p(x) of a continuous random variable X is a fundamental concept in probability theory. The function p(x) describes the relative likelihood of X to take the value x. For a random variable X, the probability that X takes values in any measurable set A is given as follows:

\[ P(X \in A) = \int_A p(x)\, d\mu. \]
In statistics, one often deals with the problem where one is given a set of samples {ξ^k}_{k=1}^N, where the ξ^k are independent, identically distributed realizations of the random variable X, and wants to estimate the corresponding probability density of X from these samples. Methods commonly used to achieve this can be divided into two distinct categories. There are parametric estimation techniques, where it is assumed that the samples are drawn from a known parametric family of distributions; in this context the goal is to estimate the parameters of the specific realization of the parametric family. In the setting of nonparametric density estimation, one does not make any strong assumptions about the probability distribution, except for the existence of a corresponding density. Two basic references covering parametric as well as nonparametric density estimation are the books (Ramsay and Scott, 1993) and (Silverman, 1986).
In this thesis we will consider a related problem, where we are given a density function p(x) for which we have no closed-form expression but which is numerically evaluable up to some scaling factor. To analyze this density, it is usually necessary to construct an approximation to the density and carry out the analysis on the approximant. One possible approach to this problem is to generate independent, identically distributed samples from the random variable associated with the density p(x) and use classical methods for density estimation. In this thesis we will investigate an alternative approach that enforces certain regularity conditions on the sampling scheme and employs a parametric density estimation technique that only assumes a certain smoothness of the density function.
In Chapter 1 we will introduce the problem and present the commonly used techniques for the analysis of the probability density. In Chapter 2 we will present the relevant theory for the approximation using radial basis functions. Subsequently, Chapter 3 treats the problem of generating optimal interpolation nodes, and Chapter 4 gives the closed-form expressions for the analysis of the approximant. In Chapter 5 we apply the proposed method to several models of dynamical and stochastic systems. Finally, we present our conclusion and an outlook on the method in Chapter 6.
1.1 Parameter Inference

Given a set of (multi-dimensional) time-resolved data points D = {(t_k, ȳ(t_k))}_{k=1}^{n_t}, where t = (t_1, ..., t_{n_t})^T is the vector containing the time points where the measurements ȳ(t_k) were taken. We are interested in explaining the measured data with the help of a model that is governed by ordinary differential equations:

\[ \Sigma(\theta):\ \begin{cases} \dot c(t) = f(t, c, \theta), \quad c(0) = c_0(\theta) \\ y(t;\theta) = h(c, \theta). \end{cases} \]

Here c(t): [0, ∞) → R^{n_c} is the function describing the time-dependent evolution of the physical states of the system, such as the concentration of a certain substance. These physical states follow the differential equations described by the parameters θ ∈ Ω_θ ⊆ R_+^{n_θ}, the vector field f: [0, ∞) × R^{n_c} × Ω_θ → R^{n_c} and the initial conditions c_0(θ): Ω_θ → R^{n_c}. Normally it is not possible to directly measure the physical states of a system; hence it is necessary to introduce the function y(t; θ): [0, ∞) × R^{n_c} → R^{n_y} describing the observable quantities of the system. The physical states c(t) and the observables y(t) are related by the filter h(c, θ): R^{n_c} × Ω_θ → R^{n_y}, which in many cases is simply a projection. In the following, we will assume that all involved functions have sufficient smoothness for existence and uniqueness of solutions.
1.1.1 Maximum Likelihood

When not stated otherwise, we will assume that the data D can be explained by the system Σ(θ) under normally distributed measurement noise:

\[ \bar y(t_k) = y(t_k; \theta_{true}) + \epsilon_k, \quad y(t; \theta_{true}) \text{ sol. to } \Sigma(\theta_{true}), \quad \epsilon_{kj} \sim \mathcal N(0, \sigma_{jk}^2(\theta_{true})). \]

Hence we can state the conditional probability density of observing the measured data given a set of parameters θ and the model Σ(θ):

\[ p(D|\theta) = \prod_{j=1}^{n_y} \prod_{k=1}^{n_t} \frac{1}{\sigma_{jk}\sqrt{2\pi}} \exp\left(-\frac{(\bar y_j(t_k) - y_j(t_k;\theta))^2}{2\sigma_{jk}^2}\right). \tag{1.1.1} \]
This expression is called the likelihood. It is possible to interpret it as a function in θ and to find the maximizer of this function. The resulting maximum-likelihood estimate (MLE)

\[ \hat\theta_{MLE} = \operatorname*{argmax}_{\theta \in \Omega_\theta}\, p(D|\theta) \]

gives the parameter that maximizes the probability of observing the data D. The MLE returns a single point estimate and does not assess the uncertainty of the returned value. Although it is possible to compute confidence intervals (Joshi et al., 2006) and thus quantify the uncertainty of the estimator, we still need to find a way to evaluate this uncertainty. To do this in a sophisticated fashion we will introduce the concept of identifiability.
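To make the computation concrete, the following minimal Python sketch obtains the MLE by minimizing the negative log-likelihood corresponding to (1.1.1). It is illustrative only; the routine simulate(t, theta), standing in for a numerical ODE solver, is hypothetical.

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(theta, t, y_bar, sigma, simulate):
        # -log p(D|theta) for the Gaussian noise model (1.1.1);
        # simulate(t, theta) is a placeholder returning the observables y(t; theta)
        y = simulate(t, theta)
        res = (y_bar - y) / sigma
        return 0.5 * np.sum(res ** 2) + np.sum(np.log(sigma * np.sqrt(2.0 * np.pi)))

    def mle(theta0, t, y_bar, sigma, simulate):
        # theta_MLE = argmax p(D|theta) = argmin -log p(D|theta)
        result = minimize(neg_log_likelihood, theta0,
                          args=(t, y_bar, sigma, simulate), method="Nelder-Mead")
        return result.x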
1.1.2 Identifiability

For this thesis, it is sufficient to introduce two different kinds of identifiability. First there is structural identifiability:

Definition 1.1 (Global structural identifiability; Hasenauer and Theis, 2012). A parameter θ_i is globally structurally identifiable if for almost any θ ∈ Ω_θ

\[ \forall t:\ y(t, \theta) = y(t, \theta') \implies \theta = \theta'. \]

Structural identifiability does not depend on the measured data but is a property of the model. It tells us whether the observables of the model are injective almost everywhere with respect to the parameters. Hence the only possibility to remove structural non-identifiabilities is to make changes to the model.
Second, there is practical identifiability, which depends on the available data as well as on the model and the employed estimator. As practical identifiability is not clearly defined in the literature, we will give an independent definition here:

Definition 1.2 (Practical identifiability (MLE); Hasenauer and Theis, 2012). A parameter θ_i is practically identifiable from data D if there exist θ_min, θ_max ∈ Ω_θ such that

\[ \nexists\, \theta_i < \theta_{i,min}:\ p(D|\theta) > \exp\left(-\tfrac{\Delta_\alpha}{2}\right) p\big(D|\hat\theta_{MLE}\big) \quad\text{and}\quad \nexists\, \theta_i > \theta_{i,max}:\ p(D|\theta) > \exp\left(-\tfrac{\Delta_\alpha}{2}\right) p\big(D|\hat\theta_{MLE}\big). \tag{1.1.2} \]

For a system to be practically identifiable, we look at the spread of the probability density above a certain threshold. This is illustrated in Figure 1.1.1, where one example of an identifiable parameter and one of a non-identifiable parameter are given. Practical identifiability of a system essentially depends on the choice of this threshold; therefore we should make a reasonable choice. When we have a likelihood as in (1.1.1), we can use a result from linear regression models which is also a good approximation for ODE models (Raue et al., 2009):

\[ 2\left(-\log p(D|\theta_{true}) + \log p(D|\hat\theta_{MLE})\right) \sim \chi^2(\,\cdot\,|n_\theta). \]
Figure 1.1.1: Comparison of examples of identifiable and non-identifiable parameters. (a) Parameter θ_i is identifiable; here both θ_{i,min} and θ_{i,max} exist. (b) Parameter θ_i is not identifiable; here θ_{i,max} does not exist.
When we rewrite (1.1.2),

\[ \nexists\, \theta_i < \theta_{i,min}:\ 2\left(-\log p(D|\theta) + \log p\big(D|\hat\theta_{MLE}\big)\right) < \Delta_\alpha \quad\text{and}\quad \nexists\, \theta_i > \theta_{i,max}:\ 2\left(-\log p(D|\theta) + \log p\big(D|\hat\theta_{MLE}\big)\right) < \Delta_\alpha, \]

and use the α-quantile ∆α of the χ² distribution,

\[ \int_0^{\Delta_\alpha} \chi^2(\vartheta|n_\theta)\, d\vartheta = \alpha, \]

we can interpret practical identifiability as having a bounded confidence region. At this point the reader should be reminded that these confidence regions are only approximate confidence regions, as the real distribution is approximated with a χ²-distribution. An alternative approach is the approximation of confidence regions via bootstrapping (Joshi et al., 2006).
1.1.3 Bayesian Inference

In many applications more information about the parameters is already available before the measurement is taken and inference is carried out. The information could be the interval of reasonable parameter values or the probability densities of parameters based on previous experiments. With the help of Bayes' theorem

\[ p(A|B) = \frac{p(B|A)\, p(A)}{p(B)} \]
we can formulate a so-called posterior density p(θ|D) for the parameters. The posterior is expressed in terms of the previously introduced likelihood p(D|θ) and the prior density p(θ):

\[ p(\theta|D) = \frac{p(D|\theta)\, p(\theta)}{p(D)}. \]

As the name suggests, the prior density carries all information available a priori. The evidence p(D) can be reformulated with the help of the law of total probability:

\[ p(D) = \int_{\Omega_\theta} p(D|\theta)\, p(\theta)\, d\theta. \]

This term no longer depends on θ but merely acts as a normalization factor and will be omitted in the following. We will write

\[ p(\theta|D) \propto p(D|\theta)\, p(\theta). \]
When the prior is only used to constrain parameter bounds, it can have the form of a multivariate uniform distribution with compact support,

\[ p(\theta) = \prod_{i=1}^{n_\theta} \mathcal U\left(\left[\theta_i^{min}, \theta_i^{max}\right]\right), \qquad \theta_i^{min}, \theta_i^{max} \in \mathbb R_+, \tag{1.1.3} \]

where U([a, b]) is the density of the uniform distribution on the interval [a, b]. For this kind of prior density, the domain Ω_θ is the orthotope \(\bigotimes_{i=1}^{n_\theta} \Omega_{\theta_i}\), which is the Cartesian product of the intervals \(\Omega_{\theta_i} = \left[\theta_i^{min}, \theta_i^{max}\right]\). Naturally, more complex domains Ω_θ are obtainable by choosing a prior with the respective support.
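As an illustration, a small Python sketch of the unnormalized log-posterior with the box prior (1.1.3); log_likelihood is a hypothetical callable evaluating the logarithm of (1.1.1).

    import numpy as np

    def log_prior_uniform(theta, lb, ub):
        # log of the box prior (1.1.3); -inf outside the orthotope [lb, ub]
        if np.all((theta >= lb) & (theta <= ub)):
            return -np.sum(np.log(ub - lb))
        return -np.inf

    def log_posterior(theta, log_likelihood, lb, ub):
        # log p(theta|D) up to the constant -log p(D)
        lp = log_prior_uniform(theta, lb, ub)
        return lp + log_likelihood(theta) if np.isfinite(lp) else -np.inf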
A prior that emanated from previous experiments can be interpreted as a regularization. Looking at the logarithm of the posterior, the multiplication translates into an addition:

\[ \log p(\theta|D) = \log p(D|\theta) + \log p(\theta) - \log p(D). \]

When the likelihood is a product of normal distributions, as in (1.1.1), the log-likelihood is the l2 distance between the simulated and the measured data. The prior term can then be interpreted as a Tikhonov regularization. Here the weighting factor of the regularization term, which determines the strength of the regularization, is given by the sharpness of the prior. Regularizations are often employed to eliminate ill-posedness due to non-uniqueness of the least-squares solution. In the context of Bayesian inference, the inclusion of a prior can eliminate non-identifiabilities, which can be construed as a generalization of non-uniqueness. At this point it should be noted that non-identifiabilities are essential information about the system, and one should not include artificial prior information to remove them. The inclusion of a prior generally also introduces a bias and should hence be well justified.
As for the likelihood, we can formulate a maximum a posteriori estimate (MAP):

\[ \hat\theta_{MAP} = \operatorname*{argmax}_{\theta}\, p(\theta|D). \]

For this estimator we can define practical identifiability by replacing p(D|θ) by p(θ|D) and θ̂_MLE by θ̂_MAP in (1.1.2).
1.1.4 Marginals, Moments and Profiles

In the preceding paragraphs we demonstrated that it is of crucial importance not only to look at the value which the estimator of choice returns, but also to assess the uncertainty of this value. Identifiability is one way to assess this uncertainty, but there are several others. One can use confidence and credibility intervals or the variance of the estimators to assess uncertainties. These quantities are all mappings of properties of the density to lower-dimensional quantities. When the parameter dimension n_θ is high, such mappings are necessary for the investigator to be able to analyze the density. In most cases these mappings lead to a loss of information; for example, a general density cannot be fully described by mean and variance. Hence the goal is to maximize the information that is conveyed by the mapping, while keeping the dimension of the mapped quantities sufficiently low.
This is usually achieved by studying the marginals, moments and profiles of the density. The one-dimensional marginals are given by

\[ p(\theta_i|D) = \int_{\Omega_{\theta_{j\neq i}}} p(\theta|D)\, d\theta_{j\neq i}, \]

where \(\Omega_{\theta_{j\neq i}}\) denotes the (n_θ − 1)-dimensional orthotope \(\bigotimes_{j=1,\, j\neq i}^{n_\theta} \Omega_{\theta_j}\) and \(d\theta_{j\neq i}\) denotes the (n_θ − 1)-dimensional product measure \(\bigotimes_{j=1,\, j\neq i}^{n_\theta} d\theta_j\). In the same fashion we can define the two-dimensional marginals

\[ p(\theta_i, \theta_j|D) = \int_{\Omega_{\theta_{k\neq i,j}}} p(\theta|D)\, d\theta_{k\neq i,j}, \]

where \(\Omega_{\theta_{k\neq i,j}}\) denotes the (n_θ − 2)-dimensional orthotope \(\bigotimes_{k=1,\, k\neq i,\, k\neq j}^{n_\theta} \Omega_{\theta_k}\) and \(d\theta_{k\neq i,j}\) denotes the (n_θ − 2)-dimensional product measure \(\bigotimes_{k=1,\, k\neq i,\, k\neq j}^{n_\theta} d\theta_k\). The moments of order d are given by

\[ m_i^d = \int_{\Omega_{\theta_i}} \theta_i^d\, p_i(\theta_i|D)\, d\theta_i \]

and the profiles are given by

\[ q_i(\theta_i) = \max_{\theta_{j\neq i} \in \Omega_{\theta_{j\neq i}}} p(\theta|D). \]
The formulas presented above only hold for cases where Ω_θ is an n_θ-dimensional orthotope, but it is straightforward to extend them to more general Ω_θ.

In most cases where no closed-form analytical solution to the ODE problem exists, there is also no closed-form analytical expression for the posterior. Thus, analyzing the density in terms of moments and marginals requires numerical evaluation of the likelihood and numerical integration schemes. For profiles, on the other hand, we will need numerical optimization schemes. In the following section we will present some techniques that are commonly applied in this context.
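For a density tabulated on a tensor grid, the marginals, moments and profiles above reduce to one-dimensional quadratures and maximizations. A minimal sketch for the two-dimensional case (illustrative only; p is a hypothetical tabulated density):

    import numpy as np
    from scipy.integrate import trapezoid

    def marginal_1d(p, grid1, grid2):
        # p[i, j] = p(grid1[i], grid2[j] | D); integrate out the second coordinate
        return trapezoid(p, grid2, axis=1)

    def moment(marg, grid, d):
        # d-th raw moment of a tabulated one-dimensional marginal
        return trapezoid(grid ** d * marg, grid)

    def profile_1d(p):
        # profile q_i: maximize instead of integrating over the remaining coordinate
        return p.max(axis=1)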
1.2 Mathematical Tools

1.2.1 Optimization

For the computation of the MAP and the MLE, as well as for the computation of profiles, we need to rely on robust optimization schemes. In practice, local optimizers such as interior point methods or trust region algorithms outperform most other global or stochastic optimizers when information on first and second derivatives is provided (Raue, 2013). For ODE models, the computation of the gradient and the Hessian via finite differences is ill-conditioned and will most likely produce bad results (Raue, 2013). Instead, gradients can be computed based on the sensitivities

\[ S^c(t) = \begin{pmatrix} \nabla_\theta c_1(t)^T \\ \vdots \\ \nabla_\theta c_{n_c}(t)^T \end{pmatrix} = \big(s_1^c(t), \ldots, s_{n_\theta}^c(t)\big). \]
The gradient of the posterior is given by

\[ \frac{\partial}{\partial\theta_i}\, p(\theta|D) = p(\theta|D)\left( 2\, p(\theta) \sum_{j=1}^{n_y}\sum_{k=1}^{n_t} \frac{\bar y_j(t_k) - y_j(t_k;\theta)}{2\sigma_{jk}^2}\, \frac{\partial y_j}{\partial\theta_i}\Big|_{t_k} + \frac{\partial p(\theta)}{\partial\theta_i} \right), \]

where

\[ \frac{\partial y}{\partial\theta_i} = \frac{\partial h}{\partial c}\, s_i^c + \frac{\partial h}{\partial\theta_i}, \qquad \frac{\partial s_i^c}{\partial t} = \frac{\partial f}{\partial c}\, s_i^c + \frac{\partial f}{\partial\theta_i}, \qquad s_i^c(0) = \frac{\partial c_0}{\partial\theta_i}. \]
Efficient computation of these gradients has been implemented in the CVODES framework (Serban and Hindmarsh, 2005). Nevertheless, the computation of gradients in more complex models becomes time consuming, as the number of differential equations scales as O(n_θ), whereas the number of terms in the ODE scales as O(n_θ n_x). For the Hessian, the number of differential equations scales as O(n_θ²) and the number of terms in the ODE scales as O(n_θ² n_x). With this approach, computation of the Hessian is even more expensive and rather unattractive. When gradient information is available, a more robust approximation of the Hessian can be achieved using a dense quasi-Newton method (Shanno and Kettler, 1970; Goldfarb, 1970). The quasi-Newton method has the compelling advantage of producing symmetric, positive definite Hessians. We will later see that this property is indispensable for our application (cf. Section 3.2.5).
When the posterior is of the form (1.1.1), we can also approximate the Hessian via the Fisher information matrix of the estimator (Wynn and Parkin, 2001):

\[ F = \sum_{k=1}^{n_t} \sum_{j=1}^{n_y} \frac{1}{\sigma_{jk}^2}\, \nabla_\theta y_j\big|_{t_k,\hat\theta}\; \nabla_\theta y_j\big|_{t_k,\hat\theta}^{\,T}. \]
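To illustrate the sensitivity equations on a toy problem, the following sketch uses scipy instead of CVODES and the scalar model c' = −θc, c(0) = 1, for which both the state and the sensitivity s = ∂c/∂θ are known analytically; the model choice is purely illustrative.

    import numpy as np
    from scipy.integrate import solve_ivp

    def augmented_rhs(t, z, theta):
        # state c and sensitivity s = dc/dtheta for c' = -theta*c:
        # s' = (df/dc) s + df/dtheta = -theta*s - c, with s(0) = 0
        c, s = z
        return [-theta * c, -theta * s - c]

    theta = 0.7
    sol = solve_ivp(augmented_rhs, (0.0, 5.0), [1.0, 0.0], args=(theta,),
                    t_eval=np.linspace(0.0, 5.0, 11), rtol=1e-8, atol=1e-10)
    c, s = sol.y
    # analytic check: c = exp(-theta*t) and s = -t*exp(-theta*t)
    assert np.allclose(s, -sol.t * np.exp(-theta * sol.t), atol=1e-6)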
1.2.2 Markov Chain Monte Carlo

The most common integration scheme for high-dimensional functions is Markov chain Monte Carlo (MCMC). This method generates samples {ξ^{(j)}}_{j=1}^N from a distribution by constructing a Markov chain that has the posterior p(θ|D) as its equilibrium density (Hastings, 1970). The moments can then be obtained using Monte Carlo integration:

\[ \int g(p(\theta_i|D), \theta_i)\, d\theta_i \approx \frac{1}{N} \sum_{j=1}^N g\big(\xi_i^{(j)}\big). \]

A lot of effort has been put into improving this method by developing adaptive schemes (Haario et al., 2006; Andrieu et al., 2010; Gilks and Berzuini, 2001; Schmidl et al., 2013), making MCMC a powerful tool. Nevertheless, the intrinsic stochasticity of the algorithm makes it hard to evaluate the convergence of the estimators. There are several heuristic approaches (Cowles and Carlin, 1996) to assess the convergence of Markov chains, but they can only identify cases where the chain has not yet converged; they cannot prove that the chain has converged. Moreover, when the target density is multi-modal or, as is often the case in the presence of non-identifiabilities, heavy-tailed, MCMC has only poor convergence properties (Jarner and Roberts, 2007).
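A minimal random-walk Metropolis sampler, one simple instance of the MCMC schemes discussed above (not the adaptive DRAM algorithm used later in this thesis):

    import numpy as np

    def metropolis(log_post, theta0, n_samples, step, seed=0):
        # random-walk Metropolis targeting the density proportional to exp(log_post)
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        lp = log_post(theta)
        chain = np.empty((n_samples, theta.size))
        for j in range(n_samples):
            proposal = theta + step * rng.standard_normal(theta.size)
            lp_prop = log_post(proposal)
            # accept with probability min(1, p(proposal)/p(theta))
            if np.log(rng.uniform()) < lp_prop - lp:
                theta, lp = proposal, lp_prop
            chain[j] = theta
        return chain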
1.2.3 Kernel Density Estimate

We discussed in Section 1.1.4 that marginals give a visual representation of the density and allow the investigator to visually assess the uncertainty of the parameter estimates. To estimate marginals, kernel density estimators (KDEs) are commonly used. KDEs approximate the posterior as a convex combination of N normalized kernels Φ with centers ξ^{(k)}:

\[ p(\theta|D) \approx p_{\Phi,N}(\theta) = \frac{1}{\varepsilon^{n_\theta} N} \sum_{k=1}^N \Phi\left(\frac{\theta - \xi^{(k)}}{\varepsilon}\right), \]

with normalization \(\frac{1}{\varepsilon^{n_\theta}} \int_{\mathbb R^{n_\theta}} \Phi\left(\frac{\theta - \xi^{(k)}}{\varepsilon}\right) d\theta = 1\).
In practice, when estimating the marginal p(θ_i|D), we can produce samples from this density by using the projection {ξ_i^{(k)}}_{k=1}^N of the previously obtained MCMC samples. In general these estimators are evaluated based on the asymptotic mean integrated squared error (AMISE),

\[ \mathrm{AMISE} = \int_{-\infty}^{+\infty} E\big(p_{\Phi,N}(\theta_i) - p(\theta_i|D)\big)^2\, d\theta_i, \]

for which the Epanechnikov kernel

\[ \Psi(\theta) = \begin{cases} \frac{3}{4}\left(1 - \theta^2\right) & |\theta| \le 1 \\ 0 & |\theta| > 1 \end{cases} \]

has been shown to be optimal (Epanechnikov, 1969). This kernel has the compelling advantage of having compact support on the domain [−1, 1]. This makes evaluation of the approximant efficient, even for large sample sizes, when fast k-nearest-neighbor algorithms such as kd-trees (Bentley, 1975) are utilized. Nevertheless, the Epanechnikov kernel is non-differentiable at the boundaries of its support, which raises questions of applicability for smooth densities.
In contrast, Gaussian kernels,

\[ \frac{1}{\varepsilon}\, \Phi\left(\frac{\theta_i - \xi_i^{(k)}}{\varepsilon}\right), \qquad \Phi = \mathcal N(0, 1), \tag{1.2.1} \]

are analytic functions, and with recent developments in fast summation techniques such as the fast Gauss transform (FGT) (Yang et al., 2003) or the nonequispaced fast Fourier transform (NFFT) (Potts and Steidl, 2003), they can also be implemented efficiently. This makes them a viable alternative, and in this thesis we will always use Gaussian kernels when speaking of KDEs.
Independent of the choice of kernel, we still have to determine the optimal bandwidth ε. One popular approach is to do this via leave-one-out cross-validation (LOOCV) (Celisse, 2008). The computational cost of this method is the cost of one evaluation of the approximant times N times the number of iterations necessary to find the optimal bandwidth ε. For large sample sizes N this approach is unattractive, and instead we can use Scott's rule for one-dimensional densities (Ramsay and Scott, 1993):

\[ \varepsilon = \sigma_i\, N^{-\frac{1}{5}}, \]

where σ_i is the standard deviation of the samples {ξ_i^{(k)}}_{k=1}^N. When estimating d-dimensional marginals, the generalized Scott's rule for the multivariate bandwidth,

\[ \Gamma = N^{-\frac{1}{d+4}}\, \Sigma^{\frac{1}{2}}, \]

can be used. Here Σ is the multivariate covariance matrix.
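A small sketch of a one-dimensional Gaussian KDE with the Scott's-rule bandwidth (direct summation; for large N the inner sum would be replaced by FGT or NFFT):

    import numpy as np

    def kde_scott(samples):
        # Gaussian KDE for a 1-D marginal, bandwidth eps = sigma * N^(-1/5)
        samples = np.asarray(samples, dtype=float)
        eps = samples.std() * samples.size ** (-0.2)

        def estimate(theta):
            u = (np.atleast_1d(theta)[:, None] - samples[None, :]) / eps
            k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
            return k.sum(axis=1) / (samples.size * eps)

        return estimate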
Despite the fact that KDEs require a bandwidth, they are non-parametric kernel models. This already suggests some advantages and disadvantages of the method: non-parametric implies that, when Scott's rule is used, they do not require any optimization of parameters and can be directly applied at low computational cost. On the other hand, classical KDEs are also non-adaptive and do not use possibly available information about the density that could improve the estimation. In fact, since the AMISE has the following lower asymptotic bound in the univariate case, all non-negative univariate KDEs converge more slowly than linearly (Ramsay and Scott, 1993):

\[ \mathrm{AMISE}^* = O\left(N^{-\frac{4}{5}}\right). \]
1.3 Contributions of this Thesis

There are some problematic aspects of the previously described method. The classical KDE is non-adaptive and thus does not make use of the values of p(θ|D) at the samples ξ^{(k)}, which would be readily available. This non-adaptive approach requires a large number of samples to produce reliable results. In the following chapters, we will present a parametric kernel density estimator for the multivariate density. By using a parametric and thereby adaptive approximation, we hope to produce the same results with a significantly smaller number of samples. The key idea is to approximate p(θ|D) via a mixture of Gaussians:

\[ p(\theta|D) \approx P_{p,\Phi}(\theta) = \sum_{k=1}^N w_k\, \Phi_k(\theta), \qquad \Phi_k(\theta) = \mathcal N\big(\theta - \xi^{(k)}, \Sigma\big). \]

In contrast to classical kernel density estimation, this approximation introduces weighting terms for the individual kernels. The addition of weights introduces more degrees of freedom, making the approach parametric. Moreover, it allows the use of general sets of samples {ξ^{(k)}}_{k=1}^N, and no longer restricts us to sets of samples which are distributed according to the density we want to approximate.

This approximation scheme is a special case of interpolation using radial basis functions (RBFs). Figure 1.3.1 shows the approximation of a univariate Gaussian density using RBF approximation as well as KDE approximation. As one can see, being parametric, the RBF approach has superior approximation properties for this example.
In this thesis an overview of error estimates for RBF interpolation is given, and with their help, more regular sets of samples X = {ξ^k}_{k=1}^N are motivated. In this context we will investigate the behavior of RBFs on lattices and with adaptive sampling schemes. We will present a novel algorithm to generate lattices in superlevel sets in arbitrary dimensions, and we will motivate and improve the adaptive particle-based sampling scheme introduced in (Reboux et al., 2012). Moreover, we will evaluate the performance of several reparametrisation schemes and localized kernel sizes, as well as methods to ensure non-negativity of the approximate density.

Figure 1.3.1: Comparison of kernel density estimator and radial basis function approximation for a one-dimensional Gaussian based on the same 5 randomly generated samples. (a) Kernel density estimator approximation of the density: single kernels in dashed red, approximation in solid red, true density in dashed black; crosses represent sample locations. (b) Radial basis function approximation of the density: single kernels in dashed blue, approximation in solid blue, true density in dashed black; crosses represent sample locations.
1.4 Example of Gaussian Mixture

In the following chapters, the performance of our approximations will repeatedly be investigated based on one example problem:

Example 1.3. We will approximate the marginals of a bivariate mixture of two Gaussians:

\[ p(k_1, k_2) = \frac{4}{5}\, \mathcal N\big(\mu^{(1)}, \Sigma^{(1)}\big) + \frac{1}{5}\, \mathcal N\big(\mu^{(2)}, \Sigma^{(2)}\big) \]

with covariance matrices

\[ \Sigma^{(1)} = \begin{pmatrix} 0.1 & 0.25 \\ 0.25 & 1 \end{pmatrix}, \qquad \Sigma^{(2)} = \begin{pmatrix} 0.01 & -0.01 \\ -0.01 & 0.5 \end{pmatrix} \]

and means

\[ \mu^{(1)} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mu^{(2)} = \begin{pmatrix} 0.5 \\ -1.5 \end{pmatrix}. \]
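For reference, the mixture of Example 1.3 takes only a few lines of Python (a sketch using scipy; it exploits the fact that marginals of a Gaussian mixture are again Gaussian mixtures):

    import numpy as np
    from scipy.stats import multivariate_normal

    mu1, cov1 = np.array([1.0, 1.0]), np.array([[0.10, 0.25], [0.25, 1.00]])
    mu2, cov2 = np.array([0.5, -1.5]), np.array([[0.01, -0.01], [-0.01, 0.50]])

    def p(k):
        # joint density p(k1, k2) = 4/5 N(mu1, cov1) + 1/5 N(mu2, cov2)
        return (0.8 * multivariate_normal.pdf(k, mean=mu1, cov=cov1)
                + 0.2 * multivariate_normal.pdf(k, mean=mu2, cov=cov2))

    def p_k1(k1):
        # exact first marginal: mix the first components of means and covariances
        return (0.8 * multivariate_normal.pdf(k1, mean=mu1[0], cov=cov1[0, 0])
                + 0.2 * multivariate_normal.pdf(k1, mean=mu2[0], cov=cov2[0, 0]))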
Figure 1.4.1: Posterior density with marginals for Example 1.3.

The domain Ω_θ was chosen to be R² to allow exact computation of errors for the approximation of marginals without the necessity to account for truncation errors (cf. Section 4.1). The joint density as well as the marginals for Example 1.3 are shown in Figure 1.4.1.
For Example 1.3 we can analytically compute the marginals and moments and thus carry out an exact error analysis. Profiles are computed only in areas of significant mass of the density; hence we are not interested in effects occurring outside these areas (cf. Section 2.5 and Section 3.2.1). Therefore we will use N = 10^5 samples {τ^k}_{k=1}^{10^5} generated by MCMC for the error analysis of the complete density. The MCMC samples are generated using the DRAM toolbox (Haario et al., 2006). The L∞-error will be computed by taking the maximum over the complete interval. The L2-error is computed using Monte Carlo integration:

\[ E_{L_2} = \frac{1}{N} \sum_{k=1}^N \left( \frac{p_{true}(\tau^k) - p_{approx}(\tau^k)}{p_{true}(\tau^k)} \right)^2. \]
The behavior of the approximant outside these areas of significant mass plays an important role for the marginals, as we need to integrate over Ω_θ, which in general is larger than the area where significant amounts of mass are located (cf. Section 2.5 and Section 3.2.1). Hence both the L∞-error and the L2-error of the marginals are approximated based on 256 equidistant points on the domains Ω_{k1} = [−1.5, 3.5] and Ω_{k2} = [−5, 6], respectively.
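Both error measures are straightforward to implement; a sketch assuming vectorized callables p_true and p_approx (hypothetical names):

    import numpy as np

    def relative_l2_error(p_true, p_approx, samples):
        # Monte Carlo estimate of E_{L2} over MCMC samples tau^k
        r = (p_true(samples) - p_approx(samples)) / p_true(samples)
        return np.mean(r ** 2)

    def linf_error(p_true, p_approx, grid):
        # maximum absolute deviation on a grid of equidistant points
        return np.max(np.abs(p_true(grid) - p_approx(grid)))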
Chapter 2
Interpolation using Radial Basis Functions
In this chapter an introduction to interpolation using RBFs is given. For a more detailed treatment of the subject, we refer the reader to the books (Wendland, 2005) and (Fasshauer, 2007).

RBFs enjoy great popularity in the interpolation of multivariate scattered data, as well as in integration schemes for partial differential equations in complex geometries (Sarra and Kansa, 2009). RBF methods interpolate using a univariate function that depends on the Euclidean distance between the evaluation points and the locations of individually chosen centers. This technique turns every multi-dimensional problem into a quasi one-dimensional problem. It has been shown that it is possible to give dimension-independent convergence rates for RBF interpolation (Fasshauer et al., 2010). As we can arbitrarily choose the locations of the centers, RBF interpolation does not require any grid, which makes the method geometrically flexible and explains its classification as a meshfree method.

This chapter is organized as follows. We will start by motivating and defining RBFs in Section 2.1 and then give a further characterization of Gaussian RBFs in Sections 2.2 and 2.3. Sections 2.4 and 2.5 will focus on numerical considerations for the interpolation process.
2.1 Interpolation Conditions

We will now investigate the approximation of a function f(x) by a linear combination of basis functions:

\[ f(x) \approx P_f(x) = \sum_{k=1}^N w_k\, \Phi_k(x). \tag{2.1.1} \]

We are especially interested in basis functions Φ that are radial basis functions (RBFs).
Definition 2.1 (Fasshauer, 2007, Definition 2.1). A function Φ(x): R^s → R is called radial provided there exists a univariate function φ: [0, ∞) → R such that

\[ \Phi(x) = \varphi(\|x - \xi\|), \]

where ξ is the center of the radial function.

Writing Φ_k(x) = φ(‖x − ξ^{(k)}‖_2), one can see that Gaussian functions with mean ξ^k and orthonormal covariance matrix are radial functions. The corresponding univariate function for Gaussian RBFs is

\[ \varphi(r) = \exp(-\varepsilon^2 r^2). \]
In this thesis we will only consider the case of the centers X = {ξ^{(k)}}_{k=1}^N coinciding with the points where the function values {f(ξ^{(k)})}_{k=1}^N are known. In this case the coefficients w_k can be computed by interpolation. The respective interpolation condition for every center is

\[ \sum_{k=1}^N w_k\, \varphi\left(\left\|\xi^j - \xi^k\right\|_2\right) = f(\xi^j), \qquad j = 1, \ldots, N. \]

Hence we can determine the optimal coefficients by solving the following linear system:

\[
\underbrace{\begin{pmatrix}
\varphi(\|\xi^1 - \xi^1\|_2) & \cdots & \varphi(\|\xi^1 - \xi^N\|_2) \\
\vdots & \ddots & \vdots \\
\varphi(\|\xi^N - \xi^1\|_2) & \cdots & \varphi(\|\xi^N - \xi^N\|_2)
\end{pmatrix}}_{A_{X,\Phi}}
\begin{pmatrix} w_1 \\ \vdots \\ w_N \end{pmatrix}
=
\underbrace{\begin{pmatrix} f(\xi^1) \\ \vdots \\ f(\xi^N) \end{pmatrix}}_{f_X}. \tag{2.1.2}
\]
}
to be invertible, but from a numerical
point of view, it is even more attractive to have this matrix positive denite. The advantage of positive denite matrices is that ecient inversion techniques such as Cholesky
factorization (Golub and van Loan, 1996) can be employed. For general functions
spectively
Φ,
the matrix
AX ,Φ
will not be positive denite for all choices of
we will now introduce the class of
X.
φ
re-
Therefore,
positive denite functions, which always yields positive
denite interpolation matrices.
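The interpolation problem (2.1.2) takes only a few lines of code; a minimal Gaussian-RBF sketch that exploits the positive definiteness via a Cholesky factorization:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve
    from scipy.spatial.distance import cdist

    def rbf_interpolant(centers, f_vals, eps):
        # A_jk = phi(||xi^j - xi^k||_2) with phi(r) = exp(-(eps*r)^2), cf. (2.1.2)
        A = np.exp(-(eps * cdist(centers, centers)) ** 2)
        w = cho_solve(cho_factor(A), f_vals)  # valid since A is positive definite

        def evaluate(x):
            return np.exp(-(eps * cdist(np.atleast_2d(x), centers)) ** 2) @ w

        return evaluate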
2.2 Positive Definite Functions

First we recall the definition of a positive definite matrix.

Definition 2.2 (Fasshauer, 2007, Definition 3.1). A real symmetric matrix A is called positive definite if its associated quadratic form is positive, i.e.,

\[ \sum_{j=1}^N \sum_{k=1}^N c_j\, c_k\, A_{jk} > 0 \]

for all c = [c_1, ..., c_N]^T ∈ R^N \ {0}.
As one can see, this definition is quite similar to the definition of positive definite functions:

Definition 2.3 (Fasshauer, 2007, Definition 3.2). A complex-valued continuous function Φ: R^s → C is called positive definite on R^s if

\[ \sum_{j=1}^N \sum_{k=1}^N w_j\, \bar w_k\, \Phi(\xi^j - \xi^k) \ge 0 \]

for any N pairwise different points ξ^1, ..., ξ^N ∈ R^s and w = [w_1, ..., w_N]^T ∈ C^N. The function Φ is called strictly positive definite on R^s if the associated quadratic form is zero only for w ≡ 0.

At this point the reader should note that a positive definite matrix will only be generated by a strictly positive definite function. Furthermore, one should note that every strictly positive definite function is also positive definite.
To further characterize the class of positive definite functions, the following list of properties is useful:

Theorem 2.4 (Fasshauer, 2007, Theorem 3.1). Some properties of positive definite functions are:

1. Non-negative finite linear combinations of positive definite functions are positive definite. If Ψ_1, ..., Ψ_n are positive definite on R^s and w_1, ..., w_n ≥ 0, then

\[ \Phi(x) = \sum_{k=1}^n w_k\, \Psi_k(x), \qquad x \in \mathbb R^s, \]

is also positive definite. Moreover, if at least one of the Ψ_k is strictly positive definite and the corresponding w_k > 0, then Φ is strictly positive definite.

2. Φ(0) ≥ 0.

3. Φ(−x) = \overline{Φ(x)}.

4. |Φ(x)| ≤ Φ(0).

5. If Φ is positive definite with Φ(0) = 0, then Φ ≡ 0.

6. The product of (strictly) positive definite functions is (strictly) positive definite.

To assess whether a given function is (strictly) positive definite, we can use the following two theorems:
Theorem 2.5 (Bochner; Fasshauer, 2007, Theorem 3.3). A (complex-valued) function Φ ∈ C(R^s) is positive definite on R^s if and only if it is the Fourier transform of a finite non-negative Borel measure µ on R^s, i.e.,

\[ \Phi(x) = \hat\mu(x) = \frac{1}{\sqrt{(2\pi)^s}} \int_{\mathbb R^s} \exp\left(-i\langle x, y\rangle\right) d\mu(y), \qquad x \in \mathbb R^s. \]
Apart from providing the possibility to check a function for positive definiteness, Bochner's theorem also gives a characterization of positive definite functions as convex combinations of the functions exp(−i⟨x, y⟩). For strictly positive definite functions a similar characterization can be given:

Theorem 2.6 (Schoenberg; Fasshauer, 2007, Theorem 3.8). A continuous function φ: [0, ∞) → R is strictly positive definite and radial on R^s for all s if and only if it is of the form

\[ \varphi(r) = \int_0^\infty \exp\left(-r^2 t^2\right) d\mu(t), \]

where µ is a finite non-negative Borel measure on [0, ∞) not concentrated at the origin.
We immediately see that Gaussians are strictly positive definite on R^s for all s when we apply Schoenberg's theorem to a point evaluation measure µ. As for KDEs, it can be attractive to use compactly supported radial basis functions (CSRBFs) (cf. Section 1.2.3). Nevertheless, the following theorem shows that we need a different choice of basis function depending on the dimension of the problem.

Theorem 2.7 (Fasshauer, 2007, Theorem 3.9). There are no oscillatory univariate continuous functions which are strictly positive definite and radial on R^s for all s. Moreover, there are no compactly supported univariate continuous functions which are strictly positive definite and radial on R^s for all s.
2.3 Error Estimates

The interpolation conditions (2.1.2) ensure that the interpolant attains the same values as f on the set of centers X. For points x ∈ Ω \ X there will always be an approximation error. To derive a good approximation to a function f via RBF interpolation, the function should ideally lie in the space spanned by the employed RBF basis. For an RBF Φ this space is usually called the native space N_Φ. In the following, we will characterize this space as a reproducing kernel Hilbert space (RKHS) and present the resulting error estimates for RBF approximation.

2.3.1 The Native Space

Following Fasshauer (2007), an RKHS is a Hilbert space equipped with a reproducing kernel:

Definition 2.8 (Fasshauer, 2007, Definition 13.1). Let H be a real Hilbert space of functions f: Ω(⊆ R^s) → R with inner product ⟨·,·⟩_H. A function K: Ω × Ω → R is called a reproducing kernel for H if

1. K(·, x) ∈ H for all x ∈ Ω,

2. f(x) = ⟨f, K(·, x)⟩_H for all f ∈ H and all x ∈ Ω.
So for every kernel K we can define the space

\[ H_K(\Omega) = \operatorname{span}\{K(\cdot, y) : y \in \Omega\} \]

and equip it with the bilinear form

\[ \left\langle \sum_{j=1}^{N_K} w_j\, K(\cdot, x_j),\ \sum_{k=1}^{N_K} v_k\, K(\cdot, y_k) \right\rangle_K = \sum_{j=1}^{N_K} \sum_{k=1}^{N_K} w_j\, v_k\, K(x_j, y_k). \]

Taking the completion of this space with respect to the norm ‖·‖_K induced by the inner product ⟨·,·⟩_K, we will call this space the native space of K.

Definition 2.9 (Fasshauer, 2007, Equation 13.1). For every function K: Ω × Ω → R we define the native space of the function as

\[ \mathcal N_K = \overline{\operatorname{span}\{K(\cdot, y) : y \in \Omega\}}^{\ \|\cdot\|_K}. \]
In the context of strictly positive definite and translation-invariant functions Φ(x − y) = K(x, y) and Ω = R^s, we can then obtain a characterization of this native space via Fourier transforms.

Theorem 2.10 (Fasshauer, 2007, Theorem 13.4). Suppose Φ ∈ C(R^s) ∩ L_1(R^s) is a real-valued strictly positive definite function. Define

\[ \mathcal G = \left\{ f \in L_2(\mathbb R^s) \cap C(\mathbb R^s) : \frac{\hat f}{\sqrt{\hat\Phi}} \in L_2(\mathbb R^s) \right\} \]

and equip this space with the bilinear form

\[ \langle f, g \rangle_{\mathcal G} = \frac{1}{\sqrt{(2\pi)^s}} \left\langle \frac{\hat f}{\sqrt{\hat\Phi}},\ \frac{\hat g}{\sqrt{\hat\Phi}} \right\rangle_{L_2(\mathbb R^s)}. \]

Then G is a real Hilbert space with inner product ⟨·,·⟩_G and reproducing kernel Φ(·,·). Hence, G is the native space of Φ on R^s, i.e. G = N_Φ, and both inner products coincide.

This characterization gives us an approximate idea of which functions lie in the native space of a specific RBF. In the case of Gaussian RBFs the native space will be small, as the Fourier transform of every function in the native space needs to decay faster than the Fourier transform of the Gaussian, which is again a Gaussian. This means that all functions from this native space are necessarily entire and vanish at infinity for real-valued arguments (Platte, 2011). This might not necessarily be fulfilled by all probability densities investigated in this thesis. For functions not lying in the native space, estimates similar to those presented in the following can be given (Narcowich et al., 2005; Platte, 2011).
2.3.2 Error Estimates for the Native Space

For functions lying in the native space, we can state the following theorem:

Theorem 2.11 (Fasshauer, 2007, Theorem 14.2). Let Ω ⊆ R^s, let Φ ∈ C(Ω × Ω) be strictly positive definite on R^s, and suppose that the points X = {x_1, ..., x_N} are distinct. Denote the interpolant of f ∈ N_Φ(Ω) on X by P_f. Then for every x ∈ Ω

\[ |f(x) - P_f(x)| \le P_{\Phi,X}(x)\, \|f\|_{\mathcal N_\Phi(\Omega)}. \]
The error splits into two distinct terms. The first term, P_{Φ,X}(x), is called the power function and depends only on the RBF Φ and on the set of points X. The second term is the native space norm of the interpolated function and depends only on the interpolated function and the native space. We will focus our attention on the first term, as this offers the greatest flexibility.
To provide an upper bound for the power function, we need Ω to satisfy the interior cone condition:

Definition 2.12 (Fasshauer, 2007, Definition 14.2). A region Ω ⊆ R^s satisfies an interior cone condition if there exist an angle θ ∈ (0, π/2) and a radius r > 0 such that for every x ∈ Ω there exists a unit vector ζ(x) such that the cone

\[ C = \left\{ x + \lambda y : y \in \mathbb R^s,\ \|y\|_2 = 1,\ y^T \zeta(x) \ge \cos(\theta),\ \lambda \in [0, r] \right\} \]

is contained in Ω.

If Ω satisfies an interior cone condition, we can derive an upper bound on the approximation error using the so-called fill distance \(h_{\Omega,X} = \sup_{x\in\Omega} \min_{\xi^k\in X} \|x - \xi^k\|_2\):

Theorem 2.13 (Fasshauer, 2007, Theorem 14.5). Suppose Ω ⊆ R^s is bounded and satisfies an interior cone condition. Suppose Φ ∈ C^{2k}(Ω × Ω) is symmetric and strictly positive definite. Denote the interpolant to f ∈ N_Φ(Ω) on the set X by P_f. Then there exist positive constants h_0, c_2 and C (independent of x, f and Φ) such that

\[ \|f(x) - P_f(x)\|_{L_\infty(\Omega)} \le C\, h_{\Omega,X}^k\, \sqrt{C_\Phi(x)}\, \|f\|_{\mathcal N_\Phi(\Omega)}, \tag{2.3.1} \]

provided h_{Ω,X} ≤ h_0. Here

\[ C_\Phi(x) = \max_{|\beta| = 2k}\ \max_{y,z \in \Omega \cap B(x,\, c_2 h_{\Omega,X})} \left| D_2^\beta \Phi(y, z) \right|, \]

with B(x, c_2 h_{Ω,X}) denoting the ball of radius c_2 h_{Ω,X} centered at x.
In our setting we are not strictly bound to a fixed domain. Hence we do not need to check the interior cone condition on the domain Ω, as we could simply change the domain to \(\tilde\Omega = \cup_{\xi^k\in X} B(\xi^k, h_{\Omega,X})\), which always satisfies the interior cone condition. The order of approximation will remain the same, as Ω ⊆ Ω̃ and h_{Ω,X} = h_{Ω̃,X}.

For the infinitely smooth Gaussian RBF, we can obtain arbitrarily high approximation orders k. It is possible to show that if Ω is a cube in R^s, we obtain the improved bound

\[ \|f(x) - P_f(x)\|_{L_\infty(\Omega)} \le \exp\left( -\frac{c\, |\log h_{\Omega,X}|}{h_{\Omega,X}} \right) \|f\|_{\mathcal N_\Phi(\Omega)} \]

for Gaussian RBFs. This approximation order is commonly referred to as super-spectral convergence.
For now, all bounds provided rely on the global parameter h_{Ω,X}. For adaptive methods it is interesting to also give an approximation based on the respective local fill distance

\[ h_\rho(x) := \max_{y \in B(x, \rho)}\ \min_{1 \le i \le M} \|y - x_i\| \le h_0. \]

In (Wu and Schaback, 1993) a refined version of Theorem 2.13 is given using the local density measure h_ρ(x): the bound

\[ \|f(x) - P_f(x)\|_{L_\infty(\Omega)} \le c_f\, C\, h_\rho^k(x) \tag{2.3.2} \]

holds, provided that h_ρ^k(x) ≤ h_0. Here the constant c_f depends on f, φ and N_φ only. This statement enables us to give error estimates when a local refinement of the point density is carried out. In contrast to this behavior, the dependences of c_f and ‖f‖_{N_Φ(Ω)} on f are both global and not local relationships. This is clearly visible in the characterization of the native space norm in terms of Fourier transforms. We are not aware of any error estimates depending on the local behavior of f. Therefore we cannot give explicit statements on how and where to locally refine the center density at this point.
In contrast to Gaussian RBFs, CSRBFs in general only have finite smoothness. One example of such functions are the Wendland CSRBFs, which are piecewise polynomial, compactly supported functions (Wendland, 1995). As suggested by Theorem 2.7, Wendland functions require an explicit construction depending on the spatial dimension d and the smoothness parameter k. Due to their finite smoothness, their native space is much larger, as their Fourier coefficients decay more slowly. On the other hand, their power function also decays more slowly in h_{Ω,X}, and we cannot expect super-spectral convergence rates.
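The two geometric quantities driving these estimates are easily computed for a given center set; a sketch in which the domain Ω is assumed to be discretized by a dense point cloud omega_points:

    import numpy as np
    from scipy.spatial.distance import cdist

    def fill_distance(centers, omega_points):
        # h_{Omega,X}: largest distance from a point of Omega to its nearest center
        return cdist(omega_points, centers).min(axis=1).max()

    def separation_distance(centers):
        # q_X: half the smallest distance between two distinct centers
        D = cdist(centers, centers)
        np.fill_diagonal(D, np.inf)
        return 0.5 * D.min()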
2.4 Stability

In the KDE setting the parameter ε is called the bandwidth; in the RBF setting it is commonly referred to as the shape parameter. Madych (1991) states that there exists a parameter λ < 1 such that

\[ |f(x) - P_f(x)| \le C\, \lambda^{\frac{1}{\varepsilon\, h_{\Omega,X}}}. \]

This result suggests that we can theoretically obtain an approximation of arbitrary precision by letting ε → 0. In practice this is not possible, as the condition number of the matrix A_{X,Φ} also becomes arbitrarily large with decreasing ε, which makes problem (2.1.2) increasingly ill-conditioned.
2.4.1 Trade-off Principle

For a matrix A, we know that the l2-condition number for solving the corresponding linear system is given by the ratio between the largest and the smallest eigenvalue:

\[ \operatorname{cond}_2(A) = \|A\|_2 \left\|A^{-1}\right\|_2 = \frac{\lambda_{max}}{\lambda_{min}}. \]

Using Theorem 2.4(4) and the Gerschgorin circle theorem (Gerschgorin, 1931), it is possible to calculate the following upper bound for the largest eigenvalue:

\[ \lambda_{max} \le N\, \Phi(0), \]

where N is the number of centers.

In most practical applications the contribution of the largest eigenvalue to the condition number is only marginal. This is due to the fact that the smallest eigenvalue scales exponentially with ε: for Gaussian RBFs the smallest eigenvalue decreases exponentially with the separation distance (Wendland, 2005)

\[ q_X = \frac{1}{2} \min_{x,y \in X,\ y \ne x} \|x - y\|, \]

the shape parameter ε and the dimension s:

\[ \lambda_{min} \ge \frac{1}{2\,\Gamma\left(\frac{s+2}{2}\right)} \left(\frac{M_s}{\sqrt 8}\right)^{s} \left(\sqrt 2\, \varepsilon\right)^{-s} \exp\left(-\frac{40.71\, s^2}{(q_X\, \varepsilon)^2}\right) q_X^{-s}, \qquad\text{where}\qquad M_s = 12 \left(\frac{\pi\, \Gamma^2\left(\frac{s+2}{2}\right)}{9}\right)^{\frac{1}{s+1}}. \]

Hence decreasing ε will lead to an increase in the numerical error due to the ill-conditioning of the problem. As ε determines the balance between approximation error and numerical error, this behavior is often termed uncertainty principle or trade-off principle.
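The trade-off is easy to observe numerically; a small sketch that prints the condition number of the Gaussian interpolation matrix on a fixed set of centers for decreasing shape parameters:

    import numpy as np
    from scipy.spatial.distance import cdist

    centers = np.linspace(0.0, 1.0, 20)[:, None]  # 20 equidistant centers in [0, 1]
    for eps in (10.0, 5.0, 2.0, 1.0, 0.5):
        A = np.exp(-(eps * cdist(centers, centers)) ** 2)
        print(f"eps = {eps:4.1f}   cond(A) = {np.linalg.cond(A):.2e}")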
2.4.2 Finding the optimal Shape Parameter

As the RBF interpolation is an exact solution to problem (2.1.2), it is not possible to find a good shape parameter based on the solution of the linear problem. Instead we can use the previously mentioned LOOCV algorithm from the context of KDEs (cf. Section 1.2.3). When naively applied to the RBF setting, the computational cost of this cross-validation technique is of order O(N^4), as the cost of the matrix inversion for the interpolation is of order O(N^3) and it needs to be carried out once for every center left out. This renders cross-validation impractical, even for problems with an only moderately high number of centers N. However, the computational cost of the algorithm can be reduced to O(N^3) when Rippa's method (Rippa, 1999) is employed.

Figure 2.4.1: Exemplary behavior of the error for Example 1.3. The errors are given relative to the maximal function value p(θ_MAP|D). The L∞-error as well as the L2-error were computed on 10^5 Markov chain Monte Carlo samples for the density based on Monte Carlo integration (cf. Section 1.4).
We will denote by E_k the approximation error at center k when the k-th center is left out. Rippa showed that this error can be computed based on the respective coefficient w_k and the respective diagonal entry of the inverse A = A_{X,Φ}^{-1} of the interpolation matrix:

\[ E_k = \frac{w_k}{A_{kk}}. \]
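A direct transcription of Rippa's formula (for clarity via an explicit inverse; in practice one would reuse a factorization of A_{X,Φ}):

    import numpy as np

    def rippa_loocv_errors(A, f_vals):
        # E_k = w_k / (A^{-1})_kk, cf. Rippa (1999)
        A_inv = np.linalg.inv(A)
        w = A_inv @ f_vals
        return w / np.diag(A_inv)

    # the shape parameter can then be chosen by minimizing, e.g., ||E||_2 over eps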
Figure 2.4.1 shows a comparison of several error measures for Example 1.3. One can see that the shape parameter ε* approximated via Rippa's method produces reasonable approximations of the optimal shape parameter with respect to the L∞ norm as well as the L2 norm. The oscillations which appear in the L∞-error and the L2-error for ε < ε* are typical for an ill-conditioned problem.
2.4.3 Improved Stability

The most intuitive way to improve the conditioning of the linear system (2.1.2) is via preconditioning. Good preconditioners are mostly based on problem-specific properties. In the case of RBF interpolation, it has been suggested that the previously stated uncertainty principle only occurs because the interpolation is carried out in the wrong basis (Fasshauer and McCourt, 2012). The best basis in terms of conditioning is the Lagrange basis, consisting of cardinal functions {Ψ_k}_{k=1}^N. In that case, the interpolation matrix is the identity matrix and the coefficients are exactly the function values at the corresponding centers:

\[ P_f(x) = \sum_{k=1}^N f(\xi^k)\, \Psi_k(x - \xi^k). \]

These cardinal functions Ψ_k can be approximated as linear combinations of our original RBFs:

\[ \Psi_k(x) = \sum_{j=1}^N b_{kj}\, \Phi_j(x). \]

When the preconditioned system

\[ B A_{\Phi,X}\, c = B f_X, \qquad B = \{b_{jk}\}_{j,k=1}^N, \]

is solved via the generalized minimal residual method (GMRES) (Saad and Schultz, 1986), a dramatic improvement of the condition numbers can be achieved (Beatson et al., 1999).
Another promising approach for Gaussian RBFs is based on a truncated Mercer expansion of the Gaussian kernel (Fasshauer and McCourt, 2012):

\[ \exp\left(-\varepsilon^2 (x - z)^2\right) \approx \sum_{n=1}^M \lambda_n\, \varphi_n(x)\, \varphi_n(z), \]

where the φ_n(x) are orthonormal functions in L_2(R, ρ) with respect to the density

\[ \rho(x) = \frac{\alpha}{\sqrt\pi} \exp\left(-\alpha^2 x^2\right). \]

This approach introduces two new free parameters, the truncation length M and the global scale parameter α. For this method the computational cost for determining the coefficients λ_n is O(N^3). As Rippa's method is not applicable in this setting, the method is only practical for a fixed shape parameter ε. Similar to this method, there is also the Contour-Padé algorithm, which represents the interpolant as a Laurent series (Fornberg and Wright, 2004), but it suffers from the same problem that cross-validation has not yet been implemented efficiently.
In terms of conditioning, CSRBFs also offer an attractive alternative. The corresponding interpolation matrices will be sparse and have limited bandwidth when the centers are ordered using kd-trees (cf. Section 1.2.3). Sparse matrices perform especially well in terms of data storage, whereas matrices with limited bandwidth are usually well-conditioned. In contrast to the other methods, for CSRBFs improved conditioning always comes at the cost of a loss of approximation quality. This is due to the fact that sparseness depends on the size of the support and thus on the shape parameter ε.

Figure 2.5.1: Comparison between true density and approximated density using radial basis function interpolation for Example 1.3. (a) True posterior density. (b) Approximated posterior density using radial basis functions with centers indicated in black. The approximation was carried out using the A₂* lattice with κ = 0.3 (172 points). The approximation attains slightly negative function values between the centers as well as outside the interpolation domain. In areas of high density, the approximation matches the true density well. The maximum error computed on 10^5 MCMC samples was 0.2059.

Another method is based on spatially varying shape parameters (Fornberg and Zuev, 2007). The trade-off for this method is that the positive definiteness of the interpolation matrix is no longer guaranteed. We will discuss this method again in more detail in Section 3.4.8.
2.5 Non-negativity of the Density

RBF interpolation can be seen as an unconstrained optimization problem in the coefficients w_k, which allows the w_k to take negative values. This raises the question of the plausibility of the approximation, as the interpolant is not guaranteed to be non-negative. Equation (2.3.1) only holds on the domain Ω; thus we obtain positivity of the solution on Ω once the approximation error becomes smaller than \(\min_{\theta\in\Omega} p(\theta|D)\). In practice we will generally want to limit the interpolation domain Ω to a small fraction of the parameter space Ω_θ where p(θ|D) has significant mass. It is not possible to give any error estimates for the approximation outside of Ω. Figure 2.5.1 shows the true joint density as well as the RBF approximation with the corresponding centers. As one can see, the negativity of the interpolation mainly arises outside of Ω, where no centers are placed.

Even when the approximation of the density on Ω is good, the approximation of marginals and moments can be arbitrarily poor, as it requires integration of the interpolant over the complete support of the employed basis functions.

A natural approach to enforce the non-negativity of the solution would be to impose a constraint on the coefficients during the optimization process:

\[ \min_{w \in \mathbb R_+^N} \left\| A_{\Phi,X}\, w - f_X \right\|. \]

In this case we can no longer resort to Rippa's method for choosing a good shape parameter, and for large N this approach becomes impractical.
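For moderate N, this constrained problem can nevertheless be solved with the Lawson-Hanson algorithm; a sketch using scipy's non-negative least squares:

    import numpy as np
    from scipy.optimize import nnls
    from scipy.spatial.distance import cdist

    def nonnegative_rbf_weights(centers, f_vals, eps):
        # min ||A_{Phi,X} w - f_X||_2 subject to w >= 0
        A = np.exp(-(eps * cdist(centers, centers)) ** 2)
        w, residual = nnls(A, f_vals)
        return w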
In the following we will discuss three methods which are closely related to RBF interpolation and are possible candidates for non-negative approximation techniques.
2.5.1 Generalized Hermite Interpolation
Another idea to ensure non-negativity is to incorporate information on derivatives into the interpolant. The motivation here is that the gradient information should drive the interpolant as well as the derivative of the interpolant smoothly to zero at the edge of the domain, thus reducing oscillations outside the domain. In the context of RBFs we can include gradient information via generalized Hermite interpolation (GHI) (Fasshauer, 2007, Chapter 36).

For GHI we have a modified interpolant which also includes a term depending on the gradient of the RBF:
$$\mathcal{P}f(x) = \sum_{k=1}^{N} w_k \Phi_k(x) - \sum_{k=1}^{N} \left\langle \beta^k, \nabla \Phi_k(x) \right\rangle.$$
The corresponding interpolation conditions are
$$\underbrace{\begin{pmatrix}
A_{\Phi,X} & -G_1 & -G_2 & \cdots & -G_{s-1} & -G_s \\
G_1 & L_1 & M_{1,2} & \cdots & M_{1,s-1} & M_{1,s} \\
G_2 & M_{2,1} & L_2 & \ddots & & M_{2,s} \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\
G_{s-1} & M_{s-1,1} & \cdots & M_{s-1,s-2} & L_{s-1} & M_{s-1,s} \\
G_s & M_{s,1} & \cdots & \cdots & M_{s,s-1} & L_s
\end{pmatrix}}_{H_{\Phi,X}}
\begin{pmatrix} w \\ \beta_1^1 \\ \vdots \\ \beta_1^N \\ \vdots \\ \beta_s^1 \\ \vdots \\ \beta_s^N \end{pmatrix}
=
\begin{pmatrix} f_X \\ \frac{\partial}{\partial x_1} f|_{\xi^1} \\ \vdots \\ \frac{\partial}{\partial x_1} f|_{\xi^N} \\ \vdots \\ \frac{\partial}{\partial x_s} f|_{\xi^1} \\ \vdots \\ \frac{\partial}{\partial x_s} f|_{\xi^N} \end{pmatrix},$$
where the G_j are the matrices containing the first order partial derivatives with respect to x_j,
$$G_j = \begin{pmatrix} \frac{\partial}{\partial x_j}\Phi_1|_{\xi^1} & \cdots & \frac{\partial}{\partial x_j}\Phi_N|_{\xi^1} \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial x_j}\Phi_1|_{\xi^N} & \cdots & \frac{\partial}{\partial x_j}\Phi_N|_{\xi^N} \end{pmatrix},$$
the M_{j,k} are the matrices containing the mixed second order partial derivatives with respect to x_j and x_k,
$$M_{j,k} = M_{k,j} = \begin{pmatrix} \frac{\partial^2}{\partial x_j \partial x_k}\Phi_1|_{\xi^1} & \cdots & \frac{\partial^2}{\partial x_j \partial x_k}\Phi_N|_{\xi^1} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial x_j \partial x_k}\Phi_1|_{\xi^N} & \cdots & \frac{\partial^2}{\partial x_j \partial x_k}\Phi_N|_{\xi^N} \end{pmatrix},$$
and the L_j are the matrices which contain the remaining second order partial derivatives
$$L_j = \begin{pmatrix} \frac{\partial^2}{\partial x_j^2}\Phi_1|_{\xi^1} & \cdots & \frac{\partial^2}{\partial x_j^2}\Phi_N|_{\xi^1} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial x_j^2}\Phi_1|_{\xi^N} & \cdots & \frac{\partial^2}{\partial x_j^2}\Phi_N|_{\xi^N} \end{pmatrix}.$$
The involved derivatives of the RBFs can be expressed in general as derivatives of the corresponding univariate radial function, where the formula for the first order partial derivative is given by
$$\frac{\partial}{\partial x_k}\,\phi\!\left(\|x - \xi^j\|\right) = \frac{x_k - \xi_k^j}{\|x - \xi^j\|}\,\frac{d}{dr}\phi(r),$$
and the second order partial derivatives are given by the two formulas
$$\frac{\partial^2}{\partial x_k \partial x_l}\,\phi\!\left(\|x - \xi^j\|\right) = \frac{(x_k - \xi_k^j)(x_l - \xi_l^j)}{\|x - \xi^j\|^2}\,\frac{d^2}{dr^2}\phi(r)\Big|_{\|x - \xi^j\|} - \frac{(x_k - \xi_k^j)(x_l - \xi_l^j)}{\|x - \xi^j\|^3}\,\frac{d}{dr}\phi(r)$$
and
$$\frac{\partial^2}{\partial x_k^2}\,\phi\!\left(\|x - \xi^j\|\right) = \frac{\|x - \xi^j\|^2 - (x_k - \xi_k^j)^2}{\|x - \xi^j\|^3}\,\frac{d}{dr}\phi(r) + \frac{(x_k - \xi_k^j)^2}{\|x - \xi^j\|^2}\,\frac{d^2}{dr^2}\phi(r).$$
Explicitly, for Gaussian RBFs φ(r) = exp(−ε²r²) the first order derivative is given by
$$\frac{d}{dr}\phi(r) = -2\varepsilon^2 r\,\phi(r)$$
and the second order derivative is given by
$$\frac{d^2}{dr^2}\phi(r) = 2\varepsilon^2\left(2(\varepsilon r)^2 - 1\right)\phi(r).$$
The size of the interpolation matrix H_{Φ,X} is N(s+1) × N(s+1), which means that the computational cost for the matrix inversion is O((N(s+1))³). Hence this approach scales poorly in higher dimensions.
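The following minimal sketch assembles H_{Φ,X} for Gaussian RBFs in s = 2 dimensions using the derivative formulas above; the centers are illustrative and the sign conventions of the off-diagonal blocks follow the block system given earlier, up to the ordering of the β unknowns:

import numpy as np

eps = 2.0
xi = np.random.default_rng(1).uniform(-1.0, 1.0, size=(30, 2))  # illustrative centers
N, s = xi.shape

diff = xi[:, None, :] - xi[None, :, :]     # diff[i, j] = xi^i - xi^j
Phi = np.exp(-(eps ** 2) * (diff ** 2).sum(axis=-1))

# first order partials of the Gaussian: d/dx_k phi = -2 eps^2 (x_k - xi_k) phi
G = [-2.0 * eps ** 2 * diff[..., k] * Phi for k in range(s)]

def second(j, k):
    # second order partials: 4 eps^4 (x_j - xi_j)(x_k - xi_k) phi - 2 eps^2 delta_jk phi
    out = 4.0 * eps ** 4 * diff[..., j] * diff[..., k] * Phi
    if j == k:
        out -= 2.0 * eps ** 2 * Phi
    return out

H = np.block([[Phi] + [-G[k] for k in range(s)]]
             + [[G[j]] + [second(j, k) for k in range(s)] for j in range(s)])
assert H.shape == (N * (s + 1), N * (s + 1))

Already for this small example with N = 30 and s = 2 the system has 90 unknowns, which illustrates the unfavorable scaling noted above.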
Figure 2.5.2 shows the comparison between the true density and the approximation via
GHI. For this example the GHI did not yield a good approximation to the density. We
attribute the poor performance of this method to the bad conditioning of the problem.
Figure 2.5.2: (a) True posterior density for Example 1.3; approximated posterior density using generalized Hermite interpolation with centers indicated in black. (b) Comparison between true density and approximated density using generalized Hermite interpolation. Approximation was carried out using the A∗2 lattice with κ = 0.3 (172 points) (cf. Section 3.2.2). The approximation attains negative function values between the centers as well as outside the interpolation domain. The approximation does not seem to match the true density anywhere. The maximum error computed on 10^5 MCMC samples was 0.5438.
2.5.2 Compactly Supported Radial Basis Functions
As previously discussed, CSRBFs have several interesting features (cf. Section 2.4). For
non-negativity they are also an interesting alternative as their compact support will only
allow negativity in a narrow hull around the interpolation domain.
In this setting we used the CSRBF
$$\phi(r) = \begin{cases} (1 - \varepsilon r)^4 & r \le \frac{1}{\varepsilon} \\ 0 & r > \frac{1}{\varepsilon} \end{cases}.$$
Rippa's method returned a quite small shape parameter ε* = 0.2. As the support of the CSRBF scales inversely with the shape parameter, this results in a large hull around the interpolation domain where negativity can occur. Figure 2.5.3 shows the comparison between the true density and the approximation via CSRBF interpolation. As one can see, the interpolant is negative almost everywhere around the interpolation domain for our Example 1.3. Thus we can conclude, as for the stability of the interpolation with CSRBFs, that the size of the area where negativity can occur can only be reduced at the cost of goodness of approximation. Eventually, this method is also not able to guarantee positivity of the solution.
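As a brief illustration of this trade-off, the following sketch evaluates the truncated CSRBF above and builds the (sparse) interpolation matrix; the centers and the value ε = 0.2 are illustrative:

import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cdist

def csrbf(r, eps):
    # phi(r) = (1 - eps*r)^4 for r <= 1/eps and 0 otherwise
    return np.where(eps * r <= 1.0, (1.0 - eps * r) ** 4, 0.0)

centers = np.random.default_rng(2).uniform(-1.0, 1.0, size=(200, 2))
A = csr_matrix(csrbf(cdist(centers, centers), eps=0.2))
print(f"fill-in: {A.nnz / A.shape[0] ** 2:.1%}")

For ε = 0.2 the support radius 1/ε = 5 exceeds the diameter of the sampled domain, so the matrix becomes completely dense; this mirrors the observation that small shape parameters undo both the sparsity benefit and the narrow negativity hull.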
Figure 2.5.3: (a) True posterior density for Example 1.3; approximated posterior density using compactly supported radial basis functions with centers indicated in black. (b) Comparison between true density and approximated density using compactly supported radial basis function interpolation. Approximation was carried out using the A∗2 lattice with κ = 0.3 (172 points) (cf. Section 3.2.2). The approximation attains negative function values outside the interpolation domain. The approximation has lower function values than the true density on the whole domain. The maximum error computed on 10^5 MCMC samples was 0.2436.
2.5.3 Moving Least Squares
In contrast to the two previously presented methods, we will now investigate a method which strictly enforces non-negativity of the solution. For every point y ∈ R^s this method finds the best local polynomial approximation to the locally weighted least squares problem
$$\operatorname*{argmin}_{\Lambda}\ \Big\| f_X - \sum_{j=1}^{m} \Lambda_j\, p_j(\cdot - y) \Big\|_{2,w_y}, \qquad p_j \in \Pi_d^s,\ \Lambda_j \in \mathbb{R},$$
where Π_d^s is the space of polynomials of order d in R^s and ‖·‖_{2,w_y} is the locally weighted discrete l2 norm
$$\|f\|_{2,w_y}^2 = \sum_{k=1}^{N} \big(f(\xi^k) - u(\xi^k, y)\big)^2\, w(\xi^k, y).$$
In this context y can be interpreted as a local variable and x can be seen as a global variable. In general y will be fixed and we will obtain a globally defined approximation as a function in x for every choice of y. As we can move our approximation in the local variable y, this approach is called moving least squares (MLS) (Fasshauer, 2007, Chapter 22). The solution of this optimization problem is given by
$$\mathcal{P}f_y(x) = \sum_{k=1}^{N} f(\xi^k)\, w(\xi^k, x) \sum_{l=1}^{m} \lambda_l(x, x)\, p_l(\xi^k - x), \qquad m = \dim(\Pi_d^s), \tag{2.5.1}$$
where λ(x, y) is the solution of a Gram system which is locally defined in y,
$$G(y)\,\lambda(x, y) = p(x - y), \tag{2.5.2}$$
with the Gram matrix G(y) of weighted l2 inner products for Π_d^s,
$$G_{jl}(y) = \sum_{k=1}^{N} p_j(\xi^k - y)\, p_l(\xi^k - y)\, w(\xi^k, y).$$
In general it is not possible to give analytical solutions to (2.5.2) for every space dimension s and polynomial degree d. Hence no closed-form analytical expression for (2.5.1) is available. Moreover, for general d this method does not ensure non-negative solutions. Non-negativity can only be guaranteed in the special case when we fix d = 0. The method is then also referred to as Shepard's method (Shepard, 1968). In that case we locally reproduce the target functions using constant polynomials and the Gram matrix is scalar. Hence we have λ(x, y) = 1 as global solution, and the previously local variable y becomes the global variable:
$$\mathcal{P}f_y(x) = \sum_{k=1}^{N} f(\xi^k)\, \frac{w(\xi^k, y)}{\sum_{l=1}^{N} w(\xi^l, y)}.$$
We see that if we use our Gaussian kernels Φ_k(x) as local weights w(ξ^k, x), this solution is a special case of (2.1.1). Given that all values f(ξ^k) are positive, as is the case for probability densities, we obtain guaranteed non-negativity of the approximation. This method is again quasi parameter-free and thus does not require solving a linear system. Unfortunately, this approach in general only has approximation order O(h_{Ω,X}) and requires that the centers ξ^k are quasi-uniformly distributed (Fasshauer, 2007, Chapter 22).
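A minimal sketch of Shepard's method with Gaussian weights w(ξ^k, x) = exp(−ε²‖x − ξ^k‖²) could look as follows; all names are illustrative:

import numpy as np

def shepard(x, centers, f_centers, eps):
    # MLS with d = 0: non-negative whenever the data values are non-negative
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-(eps ** 2) * d2)                  # weights w(xi^k, x)
    return (w * f_centers[None, :]).sum(axis=1) / w.sum(axis=1)

Since both numerator and denominator are sums of non-negative terms, non-negativity of the output is immediate, and no linear system has to be solved.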
When using special kernels such as Laguerre-Gaussians
$$\Psi(x) = \frac{1}{\sqrt{\pi}^{\,s}}\, \exp\!\left(-\|x\|^2\right) L_d^{s/2}\!\left(\|x\|^2\right),$$
where L_d^{s/2}(t) are Laguerre polynomials of order d, we do not need to restrict ourselves to regular centers and also obtain significantly better approximation orders (Fasshauer, 2007, Chapter 26). But again, in this case we no longer have guaranteed non-negativity, as the Laguerre polynomials can assume negative values.

In the context of density estimation the one-dimensional case s = 1 of the MLS approach is used to compute regression curves between samples of two independent random variables and is known as Nadaraya-Watson kernel regression (Nadaraya, 1964; Watson, 1964).

Figure 2.5.4 shows the comparison between the MLS approximation and the true density. One can see that although MLS yields a non-negative approximant, the scaling of the approximation is bad. This suggests that the approximation could be improved by rescaling the approximant. This idea will be further investigated in the next section.
Figure 2.5.4: (a) True posterior density for Example 1.3; approximated posterior density using moving least squares with centers indicated in black. (b) Comparison between true density and approximated density using moving least squares approximation. Approximation was carried out using the A∗2 lattice with κ = 0.3 (172 points) (cf. Section 3.2.2). The approximation attains no negative function values. The function value of the approximation is higher than the true density on the whole domain. The maximum error computed on 10^5 MCMC samples was 0.5691.
2.5.4 Scaled Moving Least Squares
One of the disadvantages of the MLS approach is that it is not an interpolation and hence does not give the exact value of the density at the centers. This is due to the fact that for ξ^j the single term
$$f(\xi^j)\, \frac{w(\xi^j, \cdot)}{\sum_{l=1}^{N} w(\xi^l, \cdot)}$$
already gives an exact approximation for f(ξ^j), and the remaining N − 1 terms, being non-negative functions, only deteriorate the approximation at this center. Therefore, we will introduce the additional free scaling parameter α ∈ [0, 1] which rescales the MLS approximation. The resulting approximant is:
$$\mathcal{P}f(x) = \sum_{k=1}^{N} \alpha f(\xi^k)\, \frac{w(\xi^k, x)}{\sum_{l=1}^{N} w(\xi^l, x)}.$$
Numerical experiments in which α was optimized via LOOCV, after finding the optimal ε, yielded an optimal scaling parameter of approximately α* ≈ 1. When the optimization was carried out using the l2 difference between the true function values and the MLS approximant on the set of points X, the optimal scaling factor was α* ≈ 0.5, which resulted in a better approximation. This suggests that cross-validation is not a good method to estimate the optimal scaling factor.
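If one uses the l2 fit on the centers, the optimal α has a simple closed form, since the objective is a one-parameter least squares problem. A sketch, reusing the hypothetical shepard helper from the previous section:

import numpy as np

def optimal_alpha(centers, f_centers, eps):
    m = shepard(centers, centers, f_centers, eps)  # unscaled MLS at the centers
    return float(m @ f_centers) / float(m @ m)     # argmin_a sum_k (f(xi^k) - a m(xi^k))^2

This requires no additional function evaluations beyond the values f(ξ^k) that are already available at the centers.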
The comparison between the scaled MLS approximant and the true density is shown in Figure 2.5.5. We will introduce an abuse of notation here and write MLS for the scaled version of MLS.

Figure 2.5.5: (a) True posterior density for Example 1.3; approximated posterior density using scaled moving least squares with centers indicated in black. (b) Comparison between true density and approximated density using the scaled moving least squares approximation. Approximation was carried out using the A2 lattice with κ = 0.3 (172 points) (cf. Section 3.2.2). The approximation attains no negative function values. The function value of the approximation matches the true density on the whole domain. The maximum error computed on 10^5 MCMC samples was 0.2167.
2.6 Conclusion
In this chapter we introduced all the theory on RBF interpolation of arbitrary functions which is required for this thesis. In the previous section we considered the special case of approximating probability densities, where we need to assure non-negativity of the approximant.

What we have not yet considered is the choice of centers X. We showed that the goodness of approximation using RBFs essentially depends on the two quantities h_{Ω,X} and q_X, which are both determined by the choice of centers in the domain Ω. Thus the goodness of approximation will essentially depend on the choice of X. The next chapter will deal with the optimization of the center locations.
Chapter 3
Generation of Centers
In the previous chapter we reviewed error estimates for interpolation using RBFs. In the setting of scattered data interpolation, we are given a set of fixed interpolation centers. When approximating probability densities we do not have this limitation and it is therefore interesting to investigate smart choices for the interpolation centers. For this we will resort to methods employed when solving partial differential equations (PDEs).

When numerically solving parabolic PDEs it is desirable to adapt the discretization of the respective differential operator over time. This is often achieved by using meshfree methods where discretization nodes are interpreted as particles and their locations are dynamically updated over time. Popular particle (meshfree) methods include smoothed particle hydrodynamics (SPH) (Monaghan, 1992), the diffuse element method (DEM) (Nayroles et al., 1992) and the reproducing kernel particle method (RKPM) (Liu et al., 1995). All three of these methods are closely related to RBF interpolation and MLS approximation (cf. Section 2.5). This makes the algorithms which generate the particle locations also interesting for interpolation using RBFs.
This chapter is organized as follows. In Section 3.1 we will motivate the usage of lattices for RBF interpolation. We will characterize lattices in Section 3.2 and introduce an algorithm for lattice generation on superlevel-sets. We will then give a characterization of these lattices as ground states of interacting particles. Based on this characterization we will present the algorithm introduced by Reboux et al. (2012) in Section 3.3 and analyze several modifications in Section 3.4, before we give a final discussion of the algorithm in Section 3.5.
3.1 Connection to Voronoi Cells
In Chapter 2 we demonstrated that the two properties q_X and h_{Ω,X} of the set of centers determine the approximation error and the stability of the interpolation. As we will see in the following, these two properties are closely related to Voronoi cells.
Figure 3.1.1: Illustration of an exemplary Ω (shaded in grey) with 10 centers and the corresponding Voronoi cells. The red circles are centered around the centers and have radius q_X. The minimal distance between two centers is indicated by a red connecting line between the centers. The blue circle has radius h_{Ω,X} and is centered on the point for which the outer radius of the Voronoi cell is maximal.
The Voronoi cell V(ξ^k) of ξ^k is the set of points in Ω for which ξ^k is the nearest neighbor in the set X:
$$\mathcal{V}(\xi^k) = \left\{ x \in \Omega \subseteq \mathbb{R}^s \;\middle|\; \|x - \xi^k\| \le \|x - \xi^l\| \text{ for all } \xi^l \in X \right\}.$$
In the remainder we will use different properties of the Voronoi cells. More precisely, q_X is the smallest inner radius of the Voronoi cells,
$$q_X = \min_{\xi^k \in X}\, \sup_r \left\{ r \in \mathbb{R}_+ \;\middle|\; B(\xi^k, r) \subseteq \mathcal{V}(\xi^k) \right\},$$
and h_{Ω,X} is the largest outer radius of the Voronoi cells,
$$h_{\Omega,X} = \max_{\xi^k \in X}\, \inf_r \left\{ r \in \mathbb{R}_+ \;\middle|\; \mathcal{V}(\xi^k) \cap \Omega \subseteq B(\xi^k, r) \right\}.$$
Figure 3.1.1 shows an exemplary set X, its corresponding Voronoi cells and the respective circles with smallest inner radius and largest outer radius.
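Both quantities can be estimated numerically without constructing the Voronoi diagram explicitly. The following sketch takes q_X as half the separation distance (which equals the smallest inner radius for interior cells) and approximates h_{Ω,X} by densely sampling Ω; the domain and centers are illustrative:

import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(10, 2))           # centers
omega = rng.uniform(0.0, 1.0, size=(100000, 2))   # dense samples of Omega = [0, 1]^2

q_X = pdist(X).min() / 2.0                        # half the minimal center distance
h_Omega_X = cKDTree(X).query(omega)[0].max()      # sup over Omega of distance to nearest center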
It is natural to relate q_X to the sphere packing problem, which is to fill the volume Ω with the largest possible number of non-overlapping spheres with centers ξ^k. On the other hand, h_{Ω,X} is related to the covering problem, which is to completely cover the volume Ω with the smallest possible number of possibly overlapping spheres centered around the ξ^k. For dimensions s < 17 the best known solutions to the sphere packing and covering problems are given by lattices (Torquato, 2010). From this we can conclude that our interpolation will perform optimally when the centers X are given by the respective lattices.
3.2 Lattices
First, we will give a definition of lattices, characterizing them as integer-linear combinations of a chosen basis.

Definition 3.1 (Martinet, 2003, Definition 1.1.1). A lattice in R^s is a subgroup Λ of R^s which satisfies the following property: there exists a basis E = (e_1, ..., e_s) of R^s such that Λ is the set of all Z-linear combinations of the e_i.

The matrix M whose columns are the vectors e_1, ..., e_s is a generator matrix of the lattice Λ_M. Each vector x ∈ Λ_M can be written as x = Mz, where z ∈ Z^s.
For dimensions s ≤ 5 the best known solutions to the sphere covering problem as well as the sphere packing problem are given by root lattices (Conway and Sloane, 1998). Root lattices are a special case of lattices where the basis is generated from the roots of Lie algebras, so-called root systems (Kac, 2010, Definition 15.1). All integer linear combinations of the roots of Lie algebras constitute the special class of root lattices:

Proposition 3.2 (Kac, 2010, Corollary 16.3). If (V, ∆) is a root system, then Q := Z∆ is a lattice, called the root lattice.
To illustrate the behavior of the lattices in higher dimensions we will investigate the three quantities:

• The packing density ϱ = the portion of space that is occupied by the densest packing associated to the lattice.

• The covering density Θ = the average number of spheres that contain a point of the space for the thinnest covering associated to the lattice.

• The kissing number K = the highest number of other spheres that one sphere in the sphere packing of the lattice touches.
The three quantities are illustrated in Figure 3.2.1. Figure 3.2.1a shows the densest sphere packing for the A∗2 lattice. For this lattice the number of other spheres one sphere touches is the same for every sphere. For the grey sphere the touching points are indicated by red circles. The points of the space which are not covered by the sphere packing are colored in blue. The packing density ϱ is 1 minus the fraction of the space which is occupied by those points. Figure 3.2.1b shows the thinnest sphere covering for the A∗2 lattice. The points in space which are contained in two spheres are colored in red. Here the covering density Θ is one times the fraction of space which is occupied by only one sphere plus two times the fraction of space which is occupied by two spheres.

Figure 3.2.1: Illustration of the kissing number, packing density and covering density on an A∗2 lattice. (a) Illustration of the kissing number and the packing density. Points of the lattice are indicated in black. The total number of touching points for the grey sphere, which is equal to the kissing number, is 6. The touching points are indicated by red circles. The space which is not covered by the packing is colored blue. (b) Illustration of the covering density. Points of the lattice are indicated in black. Points of the space which are contained in two spheres are colored in red. Points of the space which are contained in only one sphere are colored white.
For the packing density ϱ, Kabatiansky and Levenshtein (1978) provided the bound
$$\varrho \lesssim 2^{-0.5990\, s}$$
for large s, for the covering density Θ, Coxeter et al. (1959) provided the lower bound
$$\Theta \ge \left(\frac{2s}{s+1}\right)^{\frac{s}{2}} \varrho,$$
and for the kissing number K, Wyner (1965) provided the lower bound
$$K \ge 1.1547^s.$$
From this we can conclude that the number of spheres required for the solutions of the sphere packing as well as the sphere covering problem will increase exponentially with respect to the dimension s. As the number of centers of the spheres corresponds to the number of function evaluations, it is essential to make a reasonable choice for the lattice, especially in high dimensions. The significant difference in density of lattices can be exemplarily illustrated by the ratio of the covering densities of the A∗s and the Z^s lattices
Table 3.1: Best known lattices for the covering and sphere packing problems (Conway and Sloane, 1998).

Dimension s | Best Packing | ϱ           | Kissing Number | Best Covering | Θ
1           | A1 ≡ Z       | 1           | 2              | A1 ≡ Z        | 1
2           | A∗2 ≡ A2     | π/√12       | 6              | A∗2 ≡ A2      | 1.2092
3           | D3 ≡ A3      | π/√18       | 12             | D3∗ ≡ A∗3     | 1.4635
4           | D4 ≡ D4∗     | π²/16       | 24             | A∗4           | 1.7655
5           | D5           | 2π²/(30√2)  | 40             | A∗5           | 2.124
for sufficiently large s (Torquato, 2010):
$$\frac{\Theta(A_s^*)}{\Theta(\mathbb{Z}^s)} \sim \frac{\sqrt{se}}{3^{s/2}}. \tag{3.2.1}$$
This equation suggests that the A∗s lattice has exponentially thinner coverings compared to the Z^s lattice in the large-s limit. Here ∗ denotes the dual lattice. In the interest of simplicity, we will give an autonomous definition of dual lattices at this point:

Definition 3.3 (Martinet, 2003, Proposition 1.3.2). The dual lattice Λ∗ to Λ is given by the transposed inverse generator matrix (M^{-1})^T = M^{-T}:
$$\Lambda_M^* = \Lambda_{M^{-T}}.$$
Although several different root lattices exist, the two root lattices A_s and D_s are of biggest interest in this thesis, as they and their respective dual lattices are the best known solutions to the sphere packing and covering problems for dimensions s ≤ 5 (Conway and Sloane, 1998) (cf. Table 3.1).

For the lattice A_s one of the possible choices for the generator matrix is
$$M = \begin{pmatrix} \alpha & 1 & \cdots & 1 \\ 1 & \alpha & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & \alpha \end{pmatrix}$$
with α = √(s+1) + 2. The generator for the dual lattice A∗s can be obtained by setting α = √(s+1) − s (Agrell and Eriksson, 1998). For the lattice D_s one of the possible choices for the generator matrix is given by
$$M = \begin{pmatrix} 2 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 1 & 0 & \cdots & 0 & 1 \end{pmatrix}$$
and the generator for the dual lattice D∗s can be given by
$$M = \begin{pmatrix} 1 & 0 & \cdots & 0 & \frac{1}{2} \\ 0 & 1 & \ddots & \vdots & \frac{1}{2} \\ \vdots & \ddots & \ddots & 0 & \vdots \\ 0 & \cdots & 0 & 1 & \frac{1}{2} \\ 0 & \cdots & \cdots & 0 & \frac{1}{2} \end{pmatrix}.$$
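A sketch constructing these generator matrices, following the α values from Agrell and Eriksson (1998) quoted above; note that for D_s the displayed matrix is read with its rows (2,0,…,0), (1,1,0,…), …, (1,0,…,0,1) as the lattice vectors:

import numpy as np

def gen_A(s, dual=False):
    # A_s for alpha = sqrt(s+1) + 2, its dual A_s* for alpha = sqrt(s+1) - s
    alpha = np.sqrt(s + 1.0) - s if dual else np.sqrt(s + 1.0) + 2.0
    return np.ones((s, s)) + (alpha - 1.0) * np.eye(s)

def gen_D(s):
    M = np.eye(s)
    M[0, 0] = 2.0
    M[1:, 0] = 1.0   # rows: (2,0,...,0), (1,1,0,...), ..., (1,0,...,0,1)
    return M

M_D_dual = np.linalg.inv(gen_D(4)).T   # dual generator via Definition 3.3: M^{-T}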
We will now introduce methods to generate lattices for the approximation of densities
using these generator matrices.
3.2.1 Restriction of Interpolation Area
In most cases it will not be practical to interpolate the density on the whole domain Ω_θ, as the probability density can have most of its mass located in a small confined region. It would thus require a large number of nodes in Ω_θ to obtain a certain resolution in the confined area of high mass. It is therefore natural to interpolate the density only in a confined region where it has high mass. This is achieved by choosing the domain Ω to be the superlevel set to the level δ:
$$\Omega_\delta = \{\theta \in \Omega_\theta \mid p(\theta|D) > \delta\}.$$
We can choose δ according to the respective approximate confidence region which was introduced in Section 1.1.2:
$$\delta = p\!\left(\hat{\theta}^{\mathrm{MAP}} \,\middle|\, D\right) \exp\!\left(-\frac{q_{\chi^2(n_\theta)}(\alpha)}{2}\right). \tag{3.2.2}$$
The previous estimates for the approximation error (2.3.1) and (2.3.2) only hold in the interpolation domain Ω = Ω_δ. Therefore the choice (3.2.2) ensures that profiles computed based on our approximant are correct up to approximation error in the respective confidence regions.
We analyzed the approximation error based on the choice of δ. Figure 3.2.2 shows the comparison of the L2-error as well as the resulting number of points for several choices of δ. As one can see, the restriction of the lattice based on the confidence level α = 0.999 (indicated in red) produces a reasonable result, as the error is not significantly lower for lower values of δ while the corresponding number of centers is relatively low. We will now present the algorithm which was used to carry out this analysis.
3.2.2 Algorithm to generate restricted lattices
Figure 3.2.2: Error analysis as well as number of points in the lattice for different values of δ. The respective confidence levels for this two-dimensional example can be obtained using the formula for n_θ = 2: α = 2δ. Analysis was carried out on an A∗2 lattice with κ = 0.5 using the Lenstra-Lenstra-Lovász reduced lattice basis (cf. Section 3.2.3). The L2-error was computed based on 10^5 Markov chain Monte Carlo samples for the density via Monte Carlo integration (cf. Section 1.4).

As the size and shape of Ω_δ is not known a priori, it is not possible to restrict a sufficiently large, pre-generated lattice X to X_δ = X ∩ Ω_δ. Instead, the shape and size of Ω_δ must be explored during the lattice generation process. As we were not able to find any previous work concerned with the generation of lattices in unknown superlevel-sets, we propose the following algorithm:
Algorithm 3.1 Pseudocode algorithm for lattice generation
1: Choose the basis e_1, ..., e_s of the lattice from the columns of the generator matrix M
2: Add initial point ξ^0 to the set Y
3: while Y ≠ Ø do
4:   Choose ξ^i from Y
5:   if p(ξ^i|D) > δ then
6:     for j = 1 to s do
7:       Add ξ^i + e_j to Y
8:       Add ξ^i − e_j to Y
9:     end for
10:    Add ξ^i to X
11:    Remove duplicates in Y
12:  end if
13:  Remove ξ^i from Y
14:  Remove duplicates in X
15: end while
16: return X
The key idea of this algorithm is to interpret the lattice as an infinite, simply connected graph for which the edges are given by the basis vectors. On this graph we perform a branch search on the vertices to find all points of the lattice which lie inside of Ω_δ. In general, ξ^0 should be the mode of the density.

It is obvious that this algorithm will fail when Ω_δ is not connected. Non-connected superlevel sets Ω_δ can arise, for example, when the density is multimodal. In this case we can partition Ω_δ into several disjoint connected domains Ω_δ = ⋃̇_j Ω_δ,j, apply the algorithm on each connected domain Ω_δ,j and eventually join the individual lattices. In this setting one should choose ξ^0 to be the mode corresponding to the domain Ω_δ,j.

To obtain lattices of different resolutions we can introduce the scaling factor κ and use the columns of the rescaled generator matrix M̃ = κM as basis vectors.
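The branch search translates almost directly into code. The following is a runnable sketch of Algorithm 3.1 for a hypothetical density p and level δ; lattice points are tracked by their integer coordinate vectors z, with ξ = ξ^0 + κMz, so that duplicate removal reduces to set membership instead of floating point comparisons:

import numpy as np
from collections import deque

def generate_lattice(p, delta, xi0, M, kappa=1.0):
    s = M.shape[1]
    B = kappa * M                          # rescaled generator M~ = kappa * M
    start = (0,) * s
    visited, accepted = {start}, []
    queue = deque([start])
    while queue:                           # branch search on the lattice graph
        z = queue.popleft()
        xi = xi0 + B @ np.array(z)
        if p(xi) > delta:
            accepted.append(xi)
            for j in range(s):             # step along +/- each basis vector
                for step in (1, -1):
                    zn = z[:j] + (z[j] + step,) + z[j + 1:]
                    if zn not in visited:
                        visited.add(zn)
                        queue.append(zn)
    return np.array(accepted)

# illustrative usage with a hexagonal basis and a Gaussian-like density
M = np.array([[1.0, 0.5], [0.0, np.sqrt(3.0) / 2.0]])
X = generate_lattice(lambda x: float(np.exp(-(x ** 2).sum())), delta=1e-3,
                     xi0=np.zeros(2), kappa=0.3)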
3.2.3 Discussion of the Algorithm.
In this section we will exemplarily demonstrate certain behaviors of Algorithm 3.1 in two dimensions for the hexagonal lattice A∗2 and an arbitrary superlevel-set Ω_δ of a bivariate Gaussian density in the form of an ellipse. The algorithm will in general not produce X_δ = Λ ∩ Ω_δ, which would be the restriction of the infinite lattice Λ to the interpolation domain Ω_δ. One reason for this is that there will also be several points in X_δ which lie outside of Ω_δ. This could be avoided by simply removing points outside of Ω_δ. We will nevertheless include those points, as they should serve as additional information in the interpolation process. Another, more problematic, reason why the generated lattice is not equal to X_δ is that, depending on the scaling of the lattice basis, the algorithm might not generate all points of the lattice which lie in Ω_δ. This behavior is illustrated in Figure 3.2.3. In Figure 3.2.3a we see that some points in Ω_δ are not added to the lattice, as there exists no connection by basis vectors from previously generated points to them. We could try to alleviate this problem by allowing the algorithm to make several steps at once in its branch search, but this would require some estimate of how many steps are necessary. Moreover, the number of points which are added in each search step (Algorithm 3.1, lines 5-10) would scale as s^k for k steps. As every additional point requires one additional function evaluation, this approach becomes impractical for higher dimensions and larger numbers of steps.

Moreover, the resulting lattice X_δ depends on the choice of basis vectors e_j for the lattice. Figure 3.2.4 illustrates this behavior. When comparing Figures 3.2.4a and 3.2.4b, one can see that the choice of basis vectors affects the generation of points outside of Ω_δ. For different examples the generation of points inside Ω_δ could also be affected, as the previously described connectivity between points in Ω_δ depends on the choice of basis.
Figure 3.2.3: Comparison of generated lattices depending on the choice of the scaling factor. (a) Generated lattice with higher scaling factor. (b) Generated lattice with lower scaling factor. Blue dots indicate points of the infinite lattice A2; the shaded red area is the domain Ω_δ. The generated lattice X_δ is indicated by grey lines which connect points of the lattice. (a) shows that X_δ is not equal to Ω_δ ∩ Λ when the grid spacing is not sufficiently high.

Figure 3.2.4: Comparison of generated lattices depending on the choice of basis. (a) Generated lattice for the first choice of basis. (b) Generated lattice for the second choice of basis. Blue dots indicate points of the infinite lattice A∗2; the shaded red area is the domain Ω_δ. The generated lattice X_δ is indicated by grey connecting lines. The basis vectors v1 and v2 are indicated by red arrows.

Figure 3.2.5: Comparison of generated lattices depending on the choice of initial point. (a) Generated lattice for the first choice of initial point. (b) Generated lattice for the second choice of initial point. Blue dots indicate points of the infinite lattice A∗2. The shaded red area is the domain Ω_δ. The generated lattice X_δ is indicated by grey connecting lines. (a) and (b) show the difference in X_δ depending on the choice of the initial point ξ^0.

In other application areas, the choice of basis is treated under the term lattice reduction. For channel codes several approaches have been investigated (Luk et al., 2010), but none of them have been tested with respect to their performance for RBF approximation. A rigorous analysis of the performance of different algorithms is beyond the scope of this thesis and we will use the popular Lenstra-Lenstra-Lovász (LLL) algorithm (Lenstra et al., 1982) for basis reduction.
The last issue we will treat in this section is the effect of the choice of the initial point ξ^0. In Figure 3.2.5 one can see that the choice of the initial point significantly affects both the number of points inside of Ω_δ as well as outside of Ω_δ. We will not further investigate this problem in this thesis and we will always assume that we use sufficiently many points, such that effects depending on the choice of the initial point have only a marginal influence on the approximation.
3.2.4 Choice of Lattice
In the previous sections, we first presented the theoretical motivation for lattices and then presented an algorithm to generate these lattices. We will now use the lattice generation algorithm to evaluate the performance of different lattices. As an illustrative example we will investigate the performance of the A∗2 lattice with the Z² lattice as reference for Example 1.3.

Figure 3.2.6 shows the comparison of the respective L2-errors for the two lattices for Example 1.3. We see that even for low-dimensional problems, the A∗2 lattice performs slightly better than the Z² lattice.
Figure 3.2.6: Comparison of the performance of the Z² and the A∗2 lattice. (a) Comparison of the L2-error on the A∗2 lattice and the Z² lattice. (b) Fraction of the L2-error for the Z² lattice divided by the L2-error for the A∗2 lattice. Analysis was carried out for the two lattices using 40 different values for the parameter κ, logarithmically equispaced between 2 and 0.2. The L2-error was computed on 10^5 Markov chain Monte Carlo samples for the density based on Monte Carlo integration (cf. Section 1.4).
3.2.5 Reparametrisation
For many applications reasonable parameter values can range over several orders of magnitude. Hence it is usually advisable to carry out all analysis of the density in logarithmic parameters ξ = log(θ) (Tarantola, 2005). In this section we will explore further transformations based on the covariance structure of the parameters.

In the presence of correlations between the log-parameters, it is intuitive to include the same correlation structure in the basis function. In Chapter 2 we only gave a characterization of functions which are radial with respect to the Euclidean norm. This limits us to Gaussian functions with orthonormal covariance matrices. For most other choices of the norm it is not possible to give the results presented in Chapter 2. For the Mahalanobis norm (Mahalanobis, 1936)
$$\|\xi\|_{M,\mu} = \sqrt{(\xi - \mu)^T M^{-1} (\xi - \mu)},$$
which is a globally weighted l2-norm, it is possible to give the same estimates. This is possible as, equivalently to using the Mahalanobis norm, we can transform the parameters using the principal square root of the Mahalanobis matrix M,
$$\tilde{\xi} = M^{-\frac{1}{2}}(\xi - \mu), \tag{3.2.3}$$
and then interpolate on the transformed parameters using standard Gaussian RBFs. To include the correlation structure one should use the vector of means as µ and the covariance matrix as Mahalanobis matrix. For non-degenerate densities the covariance matrix is positive definite and the principal square root exists. This transformation is commonly known as whitening transformation or decorrelation transformation.
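A sketch of the transformation (3.2.3), computing the principal square root via an eigendecomposition; the mean and covariance here are illustrative placeholders:

import numpy as np

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])     # Mahalanobis matrix M (covariance)

evals, evecs = np.linalg.eigh(Sigma)
T = evecs @ np.diag(evals ** -0.5) @ evecs.T   # principal root M^{-1/2}, symmetric

whiten = lambda xi: (xi - mu) @ T              # xi_tilde = M^{-1/2} (xi - mu)
unwhiten = lambda xi_t: xi_t @ np.linalg.inv(T) + mu

The eigendecomposition is preferable to a generic matrix square root here because the covariance is symmetric positive definite, so the result is guaranteed to be real and symmetric.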
It has been shown that in the context of protein structure prediction, the use of such a metric yields an improvement over standard Gaussian-RBF interpolation (Ibrikci and Brandt, 2002). Equation (3.2.3) suggests that, if we want to use covariance information in such a fashion, the approximation quality depends on the regularity of the centers in the transformed coordinates ξ̃. Hence both the lattice generation as well as the particle method should be carried out in the transformed coordinates. In the interest of simplicity we will therefore introduce an abuse of notation and still denote the centers/particles in the transformed coordinates by ξ^k and the parameters ξ̃ by ξ. Moreover, the domain of the logarithmic parameters corresponding to Ω_θ will be denoted by Ω_ξ. The analysis of the density will still be carried out in the untransformed coordinates ξ.

When the Fisher information matrix at the optimal point yields a good approximation to the covariance matrix of the density p(ξ|D), it can be used as Mahalanobis matrix. In cases where the Fisher information is not available, it is also possible to approximate the covariance of the density via approximations of the Hessian of the density. For details on how to robustly approximate the Hessian see Section 1.2.1.
For Example 1.3 we can compute the exact covariance matrix using the law of conditional covariance. Figure 3.2.7 shows the comparison of the approximation in transformed and untransformed coordinates. Using the exact covariance we were able to improve the L2-error of our approximation by over one order of magnitude by interpolating in the transformed coordinates.

Figure 3.2.7: Comparison between the approximation error in transformed and untransformed coordinates. (a) Comparison of the L2-error with transformation and the L2-error without transformation. (b) Fraction of the L2-error with transformation divided by the L2-error without transformation. The curves were generated using the A∗2 lattice with logarithmically equispaced values for κ between 0.2 and 2 for the transformed coordinates and 0.1 and 1 for the untransformed coordinates. The L2-error was computed on 10^5 Markov chain Monte Carlo samples for the density based on Monte Carlo integration (cf. Section 1.4).
3.2.6 Local Reparametrisation
When there is nonlinear correlation between the log-parameters, it is attractive to give every single Gaussian RBF a locally defined covariance. For our Example 1.3, one would intuitively want to choose at least two different covariances, depending on which of the two mixture terms is locally dominating.

The kernels could be defined via the corresponding univariate function with a localized Mahalanobis norm for every center of a basis function:
$$\Phi_k\!\left(\xi^k - \xi\right) = \phi\!\left(\varepsilon \left\|\xi^k - \xi\right\|_{M(\xi^k),\mu}\right).$$
For this kernel it is not possible to give an equivalent transformation, as we do not know how to transform non-center coordinates. But we can exploit the symmetry of the radial function and localize the norm with respect to the evaluation variable ξ instead of the center ξ^k:
$$\Phi_k\!\left(\xi^k - \xi\right) = \phi\!\left(\varepsilon \left\|\xi^k - \xi\right\|_{M(\xi),\mu}\right).$$
The corresponding transformation is
$$\tilde{\xi} = M(\xi)^{-\frac{1}{2}}\,\xi,$$
where M(ξ) is a local approximation to the covariance in ξ. In contrast to the transformation (3.2.3), this transformation is non-linear. Moreover, the corresponding functional determinant is not constant, which complicates the computation of integrals in the untransformed coordinates. Therefore this approach was not pursued further. Nevertheless, this approach in combination with moving least squares approximation might be interesting for further research (cf. Section 6.2).
3.3 Adaptive Refinement
In Chapter 2 we introduced the two error estimates (2.3.1) and (2.3.2). The error estimate (2.3.1) gives an approximation in terms of the global mesh distance h_{Ω,X}, whereas (2.3.2) gives an approximation in terms of the local mesh distance h_ρ(x). Lattices offer us great control over the global mesh distance, but it is not straightforward to change the local mesh distance.

Driscoll and Heryudono (2007) propose an algorithm to locally refine a rectangular mesh. With this approach the local refinement of h_ρ(x) is limited to a division by a factor of 2. Moreover, the algorithm requires a pre-generated rectangular lattice which covers the complete domain Ω_δ, which we cannot guarantee when the domain Ω_δ is not known a priori. Instead of refining the lattice itself, we propose an approach which exploits the fact that lattices constitute the ground states of interacting particle systems (Torquato, 2010). It can be shown that two-body interactions are sufficient to produce lattice ground states. Two-body interactions can be described by a pairwise potential V.

Figure 3.3.1: Plot of the potential function V1(r) for r ∈ [0, 2].
3.3.1 Self Organizing Potential
Ground states are the minima of the energy of the system. For systems of interacting particles where only two-body interactions are considered, the energy E is defined as the sum over the pairwise potentials between the single particles ξ^p, ξ^q ∈ X in the system:
$$E = \sum_{p=1}^{N} \sum_{q=1}^{N} V\!\left(\|\xi^q - \xi^p\|\right).$$
Based on the choice of the potential V, minimizing E with respect to the particle locations can result in different lattices (Rechtsman et al., 2006b,a; Marcotte et al., 2013). In the following we will use the potential V1 proposed in (Reboux et al., 2012):
$$V_1(r) = 0.8 \cdot 2.5^{1-5r} - 2.5^{-4r}.$$
In the same paper a local rescaling of the potential is proposed, using a local resolution function depending on the gradient of the density p(ξ) that we want to characterize:
$$\tilde{D}(\xi) = \frac{D_0}{\sqrt{1 + \|\nabla p(\xi)\|}}.$$
Based on this local resolution function it is possible to locally rescale the potential by D_pq:
$$E = \sum_{p} \sum_{q} D_{pq}^2\, V_1\!\left(\frac{\|\xi^q - \xi^p\|}{D_{pq}}\right), \tag{3.3.1}$$
where D_pq is the pairwise minimum
$$D_{pq} = \min(D_p, D_q) \tag{3.3.2}$$
between the recursively defined values D_p, which give the lowest value of the local resolution D̃ assigned to the particles ξ^q that lie in the neighborhood N_p = {ξ^q ∈ X | ‖ξ^q − ξ^p‖ ≤ r* D_p} of the particle ξ^p:
$$D_p = \min_{q:\,\|\xi^q - \xi^p\| \le r^* D_p} \tilde{D}(\xi^q). \tag{3.3.3}$$
Here r*, which scales the neighborhood radius r_{c,p} = r* D_p, is a free parameter of the algorithm which defines the neighborhood size. The symmetric definition of D_pq is necessary to assure the symmetry of the pairwise interactions. Another possibility would be to take the mean of the two values (Gossard, 1995). Moreover, the definition of D_p ensures the symmetric relationship
$$\xi^p \in N_q \iff \xi^q \in N_p.$$
Figure 3.3.1 shows the potential V1. As one can see, the potential V1 has its minimum at 1. Thereby, the local rescaling of the potential makes ‖ξ^p − ξ^q‖ = D_pq the minimizer of the local pairwise potential. This produces low energy configurations where particles ξ^p are locally equidistant to their neighbors ξ^q with distance D_pq. Thus we can interpret D̃(ξ^q) as a local distance which the particle configuration should resolve (Reboux et al., 2012).
3.3.2 Original Algorithm
It is computationally expensive to find global minimizers of the energy (3.3.1) (Reboux et al., 2012). To reduce this computational cost the potential V1 should be restricted to have compact support on [0, r*] such that only the interaction between neighboring particles is considered (Reboux et al., 2012). Moreover, we can add spawning of particles, when there are fewer than N* particles in the neighborhood N_p, and fusion of particles, when particles get too close to each other, to accelerate the convergence to the ground state. This spawning/fusion technique should increase the convergence of the algorithm especially when the initial particle configuration X^0 is significantly different from the final, adapted one. In such a setting the algorithm can generate more particles in areas where higher density is required and remove particles where lower density is required. Reboux et al. (2012) suggest to fuse particles where ‖ξ^p − ξ^q‖/D_pq < 1/2. The potential V1 attains extremely high values for r < 1/2, which means those states are high-energy states. As we minimize the energy, we will prevent high-energy states and therefore prevent fusion. To allow fusion to happen, the authors suggest that the potential should linearly decrease to zero for r < 1/2. The resulting potential is
$$V_1(r) = \begin{cases} 0.0847\, r & r < 0.5 \\ 0.8 \cdot 2.5^{(1-5r)} - 2.5^{-4r} & 0.5 \le r < r^* \\ 0 & r^* \le r \end{cases}.$$
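A sketch of this piecewise potential and of the locally rescaled energy (3.3.1); r* and the inputs are illustrative. Note that the linear branch matches the middle branch continuously at r = 0.5 (both evaluate to about 0.0424):

import numpy as np

def V1(r, r_star=2.0):
    r = np.asarray(r, dtype=float)
    mid = 0.8 * 2.5 ** (1.0 - 5.0 * r) - 2.5 ** (-4.0 * r)
    return np.where(r < 0.5, 0.0847 * r, np.where(r < r_star, mid, 0.0))

def energy(xi, D, r_star=2.0):
    # locally rescaled energy (3.3.1) with D_pq = min(D_p, D_q); the p = q
    # terms contribute nothing, since V1(0) = 0 on the linear branch
    r = np.sqrt(((xi[:, None, :] - xi[None, :, :]) ** 2).sum(axis=-1))
    Dpq = np.minimum(D[:, None], D[None, :])
    return float((Dpq ** 2 * V1(r / Dpq, r_star)).sum())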
Algorithm 3.2 Pseudocode algorithm for particle generation as proposed in (Reboux et al., 2012)
1: Generate initial particle configuration X^0
2: Compute D_p for the initial configuration
3: while stopping criterion is not met do
4:   Fuse particles where ‖ξ^p − ξ^q‖ < D_pq/2
5:   Insert new particles at random locations in the neighborhood of particles which have fewer than N* neighbors
6:   Compute D_p of the new particles X^new by first-order interpolation from D_p of the initial particles X^0
7:   Compute the gradient of the energy E
8:   Perform a line search for the gradient-descent step size and advect the particles accordingly
9: end while

The final algorithm Reboux et al. (2012) proposed is presented in Algorithm 3.2.
3.4 Improvements to the Algorithm
The original algorithm was proposed to generate collocation nodes for the numerical solution of PDEs. As we want to use the algorithm in a different setting, we found it necessary to introduce several modifications to the algorithm to improve its performance in this setting. The modifications we investigated are presented in the following.
3.4.1 Modified Interpolation of D_p
Reboux et al. (2012) proposed to compute D_p using first order interpolation. In the setting of density approximation for ODE models it is possible to compute ∇p(ξ) at every particle location. When the computational cost of computing the norm of the gradient is low, and the computational complexity of the particle generation and the density approximation are the dominating factors contributing to the runtime of the complete method, computing the exact values should be considered. When the evaluation of ∇p(ξ) takes a significant amount of time, and the number of function evaluations is the dominating factor for the runtime of the complete method, interpolation should be considered. We will now consider the case when interpolation is carried out.
As the potential V1(r) is only defined for positive values of r, it is crucial to ensure the positivity of the interpolation of D_p and thereby of D_pq. With first order interpolation, non-negativity is not ensured for particles outside of the convex hull of the initial particle configuration. Therefore we propose to use scaled MLS approximation for D_p. As the density is decreasing outside the domain Ω_δ, the values of D_p should be increasing outside of Ω_δ, so the approximation of D_p would also have to be increasing outside of Ω_δ. Hence it is not possible to directly interpolate the values of D_p, as the MLS approximation decreases towards zero outside of Ω_δ. Instead we will interpolate ‖∇p(ξ)‖, which should decrease towards zero outside of Ω_δ.

When interpolation is used for D_p, the initial particle configuration needs to be sufficiently good for the interpolation to be reliable. As MLS interpolation requires a regular grid, the previously introduced lattices are a good choice for the initialization, and we will always use an initialization with the A∗s lattice.

Figure 3.4.1: Overview of different methods for the approximation of D_p for Example 1.3. (a) Comparison of the log(L∞-error) for the two methods using a low resolution initialization. Here the MLS method yields a slightly better approximation. (b) Comparison of the final number of particles for the two methods using a low resolution initialization. Here the MLS method produces a configuration with more points. (c) Comparison of the log(L∞-error) for the two methods using a high resolution initialization. Here the MLS method yields a slightly better approximation. (d) Comparison of the final number of particles for the two methods using a high resolution initialization. Here the MLS method produces a configuration with more points. The refinement was initialized using an A∗2 lattice with κ = 10^(−0.6) ≈ 0.25 (high res.) and κ = 1 (low res.). The parameter D_0 was 0.6, the parameter d_0 was 0.5. To compute the statistics, we repeated the analysis 20 times.
Figure 3.4.1 shows the comparison of the two interpolation methods for a high resolution initialization and a low resolution initialization. In both cases the MLS method yields a lower L∞ approximation error in the marginals than linear interpolation, but it also produces particle configurations with a higher number of points. This makes the comparison of the two methods difficult.
3.4.2 Modified Density Function
In the context of density approximation we will often deal with non-normalized density functions where the values of ∇p(ξ) can become large. To account for this fact we will use a modified resolution function which includes an additional free parameter d_0:
$$\tilde{D}(\xi) = \frac{D_0}{\sqrt{1 + \left(\left(\frac{D_0}{d_0}\right)^2 - 1\right) \dfrac{\|\nabla p(\xi)\|}{\max_{\xi \in \Omega_\delta} p(\xi)}}}. \tag{3.4.1}$$
This modified resolution function introduces the additional scaling factor d_0. Here D_0 provides an upper bound to the resolution function and d_0 tunes the scaling with the norm of the gradient of the function. In settings where the gradient of the function cannot be computed reliably, or is computationally expensive to compute on all particles, we can also use the modified resolution function
$$\tilde{D}(\xi) = \frac{D_0}{\sqrt{1 + \left(\left(\frac{D_0}{d_0}\right)^2 - 1\right) \dfrac{p(\xi)}{\max_{\xi \in \Omega_\delta} p(\xi)}}}, \tag{3.4.2}$$
which merely employs p(ξ). In this setting we have more control, as we know the upper bound of the values which p(ξ) can attain. Thus we can choose values D_0 and d_0 such that the resolution function has lower bound d_0 and upper bound D_0. The minimum d_0 will be attained at the maximum of p(ξ) in Ω_δ and the maximum D_0 is attained for p(ξ) = 0. The resolution will be closest to D_0 at the edge of the domain Ω_δ. The normalization with max_{ξ∈Ω_δ} p(ξ) was employed in both cases to account for the fact that the density is not normalized.
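A sketch of the two resolution functions; the density values, gradient norms and the bound max p are assumed to be given:

import numpy as np

def D_tilde_gradient(grad_norm, p_max, D0=0.8, d0=0.6):
    # resolution (3.4.1), driven by ||grad p(xi)|| / max p
    return D0 / np.sqrt(1.0 + ((D0 / d0) ** 2 - 1.0) * grad_norm / p_max)

def D_tilde_value(p_val, p_max, D0=0.8, d0=0.6):
    # resolution (3.4.2): equals D0 where p = 0 and d0 where p = p_max
    return D0 / np.sqrt(1.0 + ((D0 / d0) ** 2 - 1.0) * p_val / p_max)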
We will now qualitatively investigate the performance of the two different resolution functions. We will start by analyzing the performance of the lattice shown in Figure 3.4.2.
Figure 3.4.2: Illustration of the lattice in non-transformed and transformed coordinates for Example 1.3. (a) Lattice in the true particle coordinates. (b) Lattice in the transformed coordinates (cf. Section 3.2.5). Every lattice point is colored according to the respective log function value. The lattice is the A∗2 lattice with κ = 0.3.
For this lattice we can estimate the correlation between function value and error, as well as between the norm of the gradient and the error, using MCMC samples of the density. As the MCMC samples have an innately higher density for higher function values, it is hard to analyze correlation based on this density alone. Therefore we computed the conditional expectation of the error given a certain function value or a certain value of the norm of the gradient.

To make the different setups comparable, the parameters for the algorithm were tuned such that the density at the edges of the domain is approximately the same for all experiments. Figure 3.4.3 shows the distribution of the pointwise approximation error depending on the function value and on the norm of the gradient in the lattice setting. One can see that many of the high error values are located in areas where the norm of the gradient is high, whereas the correlation between function value and error is not as pronounced. This analysis suggests that a particle density adapted according to (3.4.1) should perform better than one adapted according to (3.4.2).
Figure 3.4.4 shows the resulting particle configuration for a density based on (3.4.2). The regularity as well as the increased density of the particles is clearly visible in the transformed coordinates (cf. Figure 3.4.4b). Moreover, one can see that the lattice structure was preserved in the areas where no local refinement took place. This suggests that the lattice constitutes a ground state of the chosen potential.

The performance of this particle configuration is analyzed in the same fashion as before (cf. Figure 3.4.5). As one would expect, the local refinement drastically reduced the error for high function values. In contrast to that, the expected error given the norm of the gradient seems to drop by a flat amount for a large range of higher values.
Figure 3.4.3: Comparison between the conditionally expected log error based on the log function value as well as the log norm of the gradient for centers on the lattice shown in Figure 3.4.2. (a) Conditional expectation of the log error given the log function value. (b) Conditional expectation of the log error given the log norm of the gradient. The computations are based on 10^5 Markov chain Monte Carlo samples from the distribution. The density given as colored background is computed using a traditional kernel density estimator; the conditional expectation was computed using a Nadaraya-Watson kernel estimator.
Figure 3.4.4: Illustration of the particle refinement using (3.4.2) in non-transformed and transformed coordinates for Example 1.3. (a) Particle refinement using a density based on the function value in non-transformed coordinates. (b) Particle refinement using a density based on the function value in transformed coordinates (cf. Section 3.2.5). Every particle is colored according to the respective log function value. Refinement was carried out using the parameters d_0 = 0.4 and D_0 = 0.6.
Figure 3.4.5: Comparison between the conditionally expected log error based on the log function value as well as the log norm of the gradient on particles adapted using (3.4.2). (a) Conditional expectation of the log error given the log function value. (b) Conditional expectation of the log error given the log norm of the gradient. The computations are based on 10^5 Markov chain Monte Carlo samples from the distribution. The density given as colored background is computed using a traditional kernel density estimator; the conditional expectation was computed using a Nadaraya-Watson kernel estimator.
Figure 3.4.6 shows the resulting particle configuration for a density based on (3.4.1). One can see that the local refinements were carried out in different locations (cf. Figure 3.4.4). One can identify 3 distinct modes of high particle concentration. The covariance matrices of both Gaussians in Example 1.3 have one significantly larger and one significantly smaller eigenvalue. Thus one can expect steep gradients close to the modes in the positive and negative direction of the eigenvector corresponding to the smaller eigenvalue. This means that we should expect a total of 4 modes. Due to the definition of D_p as the minimal value over the neighborhood (cf. Equation (3.3.3)), the particle resolution will increase in a neighborhood around the occurrence of steep gradients. For sufficiently large values of r* these neighborhoods can overlap and the modes merge together. We suppose that this is responsible for the merging of two modes in Figure 3.4.6a. Again, the particles in areas where no refinement was carried out remain close to their initial locations.

Figure 3.4.7 shows the distribution of the pointwise approximation error depending on the function value and on the norm of the gradient for the particles which were locally refined according to (3.4.1). One can see a drastic change in both correlations of the error, with respect to the function value as well as the norm of the gradient. The expected error significantly decreases for high function values as well as for high values of the gradient norm.

In conclusion, we can say that for Example 1.3 local refinement based on the gradient of the function seems to be the more promising approach. Nevertheless, in numerical experiments for other probability densities that were almost normally distributed, local refinement based on the function value and based on the gradient performed equally well. Hence we suggest that the choice of refinement should be made problem-specific.

We will now investigate the performance of the algorithm with respect to different ratios D_0/d_0.
52
Particle renement using a density based on
the norm of the gradient in transformed coordinates.
(a)
CHAPTER 3.
GENERATION OF CENTERS
Particle renement using a density based on
the norm of the gradient in transformed coordinates. (cf. Section 3.2.5)
(b)
Illustration of the particle renement using a density based on the norm of
the gradient in non-transformed and transformed coordinates for Example 1.3. Every particle
is colored according to the respective log function value.Renement was carried out using the
parameters d0 = 0.5 and D0 = 0.6.
Figure 3.4.6:
Figure 3.4.7: Comparison between the conditionally expected log error based on the log function value as well as the log norm of the gradient. (a) Conditional expectation of the log error given the log function value. (b) Conditional expectation of the log error given the log norm of the gradient. The computations are based on 10^5 Markov chain Monte Carlo samples from the distribution. The density given as colored background is computed using a traditional kernel density estimator; the conditional expectation was computed using a Nadaraya-Watson kernel estimator.
Figure 3.4.8 shows the comparison of the L2-error, the L∞-error as well as the resulting number of particles for Example 1.3. In Figure 3.4.8a one can see that the number of points increases faster than linearly with the ratio D_0/d_0. As the L2-error and the L∞-error seem to increase linearly with the ratio D_0/d_0 (cf. Figures 3.4.8b and 3.4.8c), we decided to always use the ratio D_0/d_0 = 1.2 for the rest of this thesis.
3.4.3 Improved Stopping Criterion
Reboux et al. (2012) proposed that the stopping criterion for the gradient descent should be based on the localized separation distance
$$d_c = \max_{p}\, \max_{q:\, \xi^q \in N_p} \frac{D_{pq}}{\|\xi^p - \xi^q\|_2}.$$
In our numerical experiments for Example 1.3, it was not possible to show that this is a reasonable stopping criterion. Figure 3.4.9 shows the evolution of the L2-error and of d_c over the course of 12 iterations, as well as the correlation between the two values. In these numerical experiments d_c did not show the tendency to decrease with an increasing number of iterations (cf. Figure 3.4.9a). In contrast, the L2-error dropped significantly over the course of several iterations (cf. Figure 3.4.9b). In Figure 3.4.9c one can see that there even is a weak inverse correlation between d_c and the L2-error, which questions the applicability of the suggested stopping criterion d_c < 2.5 for the problems we consider. Moreover, the L2-error for low values of d_c is spread over one order of magnitude, which makes the criterion unreliable.
Instead of a stopping criterion based on d_c, we tried to develop a stopping criterion based on the energy E^(k) of the particles in iteration k. For this we tried to base it either on the current absolute average energy
$$E_{\mathrm{avg}}^{(k)} = \frac{\text{energy in iteration } k}{\text{number of particles in iteration } k}$$
or on the relative change in the energy
$$\Delta E_{\mathrm{rel}}^{(k)} = \left|\frac{E^{(k)} - E^{(k-1)}}{E^{(k)}}\right|.$$
The motivation for E_avg^(k) is that the energy of the system should converge to a lower bound, the ground state of the system, which is the target configuration we want to attain. The lower the energy is, the closer the system is to its ground state and the closer we are to the target configuration. The same applies for the change in average energy ΔE_rel^(k). This change should vanish as the particle configuration converges towards its ground state and therefore indicates how close we are to the ground state.
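A sketch of the resulting stopping rule, applied to the list of per-iteration total energies E^(k) recorded during the gradient descent:

def should_stop(energies, tol=1e-2):
    # stop once Delta E_rel^(k) = |E^(k) - E^(k-1)| / |E^(k)| <= tol
    if len(energies) < 2:
        return False
    return abs(energies[-1] - energies[-2]) / abs(energies[-1]) <= tol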
Figure 3.4.10 shows the evolution of E_avg^(k) over the course of 12 iterations as well as its correlation with the L2-error. Figure 3.4.11 shows the evolution of ΔE_rel^(k) over the course of 12 iterations as well as its correlation with the L2-error.
Figure 3.4.8: Change of different quantities with respect to the ratio D_0/d_0 for Example 1.3. (a) Change of the number of points with respect to the ratio D_0/d_0. (b) Change of the L2-error with respect to the ratio D_0/d_0. (c) Change of the L∞-error with respect to the ratio D_0/d_0. L∞- and L2-errors were computed based on 10^5 Markov chain Monte Carlo samples. The refinement was initialized with an A∗2 lattice with κ = 0.5. The refinement was carried out with parameter D_0 = 0.8 and 6 different values for d_0 equally spaced between 0.6 and 0.8. To compute the statistics, we repeated the analysis 20 times.
Figure 3.4.9: Analysis of the correlation between $d_c$ and the $L^2$-error for Example 1.3. (a) Distribution of $d_c$ over several iterations; (b) distribution of the $L^2$-error over several iterations; (c) linear regression of the $L^2$-error given the value of $d_c$; the correlation coefficient is $-0.2752$. The refinement was initialized using an $A_2^*$ lattice with $\kappa = 0.5$. The parameter $D_0$ was 0.8. To compute the statistics, we repeated the analysis 20 times.
Figure 3.4.10: Analysis of the correlation between the average energy and the $L^2$-error for Example 1.3. (a) Distribution of the average energy over several iterations; (b) linear regression of the $L^2$-error given the average energy; the correlation coefficient is 0.6808. The refinement was initialized using an $A_2^*$ lattice with $\kappa = 0.5$. The parameter $D_0$ was 0.8. To compute the statistics, we repeated the analysis 20 times.
Figure 3.4.11: Analysis of the correlation between the change in average energy and the $L^2$-error for Example 1.3. (a) Distribution of the change in average energy over several iterations; (b) linear regression of the $L^2$-error given the change in average energy; the correlation coefficient is 0.3283. The refinement was initialized using an $A_2^*$ lattice with $\kappa = 0.5$. The parameter $D_0$ was 0.8. To compute the statistics, we repeated the analysis 20 times.
The average energy depends on the average number of particles in every neighborhood $N_p$. As this average neighborhood size depends on the dimension of the problem, the scale of $E_{\mathrm{avg}}^{(k)}$ changes with the dimension of the problem. This means the stopping criterion for $E_{\mathrm{avg}}^{(k)}$ would need to be adapted to every dimension. As $\Delta E_{\mathrm{rel}}^{(k)}$ is a relative quantity, it is invariant with respect to the scale of the energy and thereby also invariant with respect to the dimension. Therefore we will use $\Delta E_{\mathrm{rel}}^{(k)} \leq 10^{-2}$ as stopping criterion for the rest of this thesis.
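For illustration, the check below is a minimal sketch of this dimension-independent stopping rule in Python; the energy history and the tolerance $10^{-2}$ follow the text, while the function name and interface are hypothetical.

```python
def should_stop(energy_history, tol=1e-2):
    """Stop once the relative energy change |E(k) - E(k-1)| / |E(k)| <= tol."""
    if len(energy_history) < 2:
        return False  # need two iterations to form a relative change
    e_prev, e_curr = energy_history[-2], energy_history[-1]
    return abs(e_curr - e_prev) / abs(e_curr) <= tol
```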
3.4.4 Extension to Higher Dimensions
In the original algorithm, further particles were added when the size of the neighborhood of a particle $|N_p|$ was smaller than a certain threshold $N^*$. Reboux et al. (2012) proposed several choices of $N^*$ based on the scaling factor for the neighborhood size $r^*$ and the dimension $s$, but only gave an intuition for their choice based on triangular lattices. As they only presented values of $N^*$ for up to 3 dimensions, it was necessary to motivate a reasonable choice for higher dimensions. We were able to identify the triangular lattices in 2D and 3D as the $A_2^*$ and $A_3^*$ lattices and thereby identify $N^*$ for $r^* = 1$ as the kissing number of the respective lattice. This motivated us to use the kissing number of the $A_s^*$ lattice as choice for $N^*$ for dimensions $s > 3$.

As we do not know how good this heuristic is in practice, we checked how robust the algorithm is with respect to the choice of $N^*$. Figure 3.4.12 shows the $L^2$-error, the $L^\infty$-error, the required number of iterations, as well as the final number of particles for several choices of $N^*$ for Example 1.3. As one can see, all four quantities remain at the same level for all choices of $N^*$. We expect the same behavior in higher dimensions and will therefore use the kissing number of the $A_s^*$ lattices as choice for $N^*$.
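As a small illustration of this heuristic, the helper below assumes that the kissing number of the $A_s^*$ lattice is $2(s+1)$, which reproduces the values 6 ($A_2^*$, hexagonal) and 8 ($A_3^*$, body-centered cubic) tabulated in Conway and Sloane (1998); it is a sketch of the rule, not code from the thesis.

```python
def n_star(s):
    """Threshold N*: kissing number of the A_s* lattice, assumed to be 2(s + 1)."""
    if s < 2:
        raise ValueError("dimension s must be at least 2")
    return 2 * (s + 1)  # s = 2 -> 6, s = 3 -> 8, s = 4 -> 10, ...
```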
Another approach to assess whether the neighborhood of a particle is sufficiently large was presented in (Gossard, 1995). It is based on the fraction of the sphere around the particle which overlaps with spheres around neighboring particles. This approach is only mentioned at this point as a reference and was not further pursued in this thesis.
3.4.5 Extension to Unknown Domains
Reboux et al. (2012) suggest that the algorithm can be used to generate particle configurations in unbounded domains when an appropriate potential is chosen. In our numerical experiments we were not able to obtain satisfying results for the problem at hand, as the area covered by particles grew indefinitely. To constrain the particles to bounded domains, we chose a similar approach as for the lattices: we limited the spawning of new particles in $N_p$ to neighborhoods of particles $\xi^p \in \Omega_\delta$. This automatically creates a set of boundary particles $\xi^p \notin \Omega_\delta$ that keeps the particles inside of $\Omega_\delta$ confined.
Figure 3.4.12: Overview of the performance of the algorithm for different choices of $N^*$ for Example 1.3. (a) Resulting $L^\infty$-error; (b) resulting $L^2$-error; (c) required number of iterations; (d) resulting final number of particles. The refinement was initialized using an $A_2^*$ lattice with $\kappa = 0.5$. The parameter $D_0$ was 0.8. To compute the statistics, we repeated the analysis 20 times.
Figure 3.4.13: Comparison of the original (a) and the modified (b) spawning algorithm. Black dots indicate locations of particles. The blue, dashed circles enclose the area where particle fusion occurs. The sphere $S(\xi^p, D_p)$ is the black dashed circle and $S(\xi^p - (\bar{\xi}^{p,\mathrm{loc}} - \xi^p), D_p)$ is the orange dashed circle. Grey dots indicate locations where new particles could be spawned. Orange dots indicate the locations of the new particles before the projection. The straight green line shows the projection trajectory. Newly generated particles which will subsequently be removed by fusion are crossed out in red.
3.4.6 Modied Spawning
Reboux et al. (2012) propose to spawn new particles in the neighborhood $N_p$ by placing them at uniformly random locations on the surface $S(\xi^p, D_p)$ of the ball $B(\xi^p, D_p)$. When most of the particles in the neighborhood of $\xi^p$ are clustered in a confined area, newly generated particles which are spawned close to those clusters have a high chance of being subsequently removed by fusion. Hence it is desirable to bias the randomly chosen spawning location towards the direction opposite to a possible cluster. This is achieved by computing the mean $\bar{\xi}^{p,\mathrm{loc}}$ of the particle locations in $N_p$, generating the new particles on the sphere $S(\xi^p - (\bar{\xi}^{p,\mathrm{loc}} - \xi^p), D_p)$, and projecting those particles onto the sphere $S(\xi^p, D_p)$. For homogeneously distributed particles we have $\bar{\xi}^{p,\mathrm{loc}} \approx \xi^p$ and there is only a marginal change compared to the original algorithm.
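A minimal sketch of this biased spawning step, assuming Euclidean coordinates and NumPy arrays; the uniform direction sampling and the radial projection follow the description above, while the interface is illustrative.

```python
import numpy as np

def spawn_biased(xi_p, neighbors, D_p, rng):
    """Spawn one particle on S(xi_p, D_p), biased away from local clusters.

    xi_p      : (s,) position of particle p
    neighbors : (m, s) positions of the particles in N_p
    D_p       : spawning radius
    """
    xi_loc = neighbors.mean(axis=0)        # local mean of the neighborhood
    center = xi_p - (xi_loc - xi_p)        # mirrored center, away from the cluster
    u = rng.standard_normal(xi_p.shape)
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    candidate = center + D_p * u           # point on S(center, D_p)
    direction = candidate - xi_p
    direction /= np.linalg.norm(direction)
    return xi_p + D_p * direction          # radial projection onto S(xi_p, D_p)
```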
The algorithm is sketched in Figure 3.4.13. Figure 3.4.14 shows the comparison of the resulting $L^\infty$-error, the $L^2$-error, the total number of spawned particles, as well as the resulting number of particles for both methods for Example 1.3. As one can see, we were not able to show a significant improvement by using this method.
Figure 3.4.14: Comparison of the original and the modified spawning algorithm for Example 1.3. (a) Resulting $L^\infty$-error; (b) resulting $L^2$-error; (c) total number of spawned particles; (d) required number of iterations to reach the stopping criterion. The refinement was initialized using an $A_2^*$ lattice with $\kappa = 0.5$. The parameter $D_0$ was 0.8. To compute the statistics, we repeated the analysis 20 times.
3.4.7 Iterative Reparametrisation
In the context of iterative refinement, we tried to implement an iteratively adaptive computation of the covariance based on the current particle configuration. As the particle generation is carried out in the transformed coordinates, it is necessary to transform back to the original logarithmic coordinates, compute the new approximation $\hat{M}$, and transform the particles again using this matrix. The combination of the two transformations is given by
$$\theta_{\mathrm{new}} = \hat{M}_{\mathrm{new}}^{-\frac{1}{2}} \hat{M}_{\mathrm{old}}^{\frac{1}{2}} \theta_{\mathrm{old}}.$$
From this combined transformation we can give upper and lower bounds on the resulting change of the norm of the parameters:
$$\lambda_{\min}\!\left(\hat{M}_{\mathrm{new}}^{-\frac{1}{2}} \hat{M}_{\mathrm{old}}^{\frac{1}{2}}\right) \|\theta\| \;\leq\; \|\theta_{\mathrm{new}}\| \;\leq\; \lambda_{\max}\!\left(\hat{M}_{\mathrm{new}}^{-\frac{1}{2}} \hat{M}_{\mathrm{old}}^{\frac{1}{2}}\right) \|\theta\|.$$
This change in norm results in a change of the neighborhood structure, as the particles in the new coordinates might be closer to each other or further apart. So if $\lambda_{\max} \gtrsim r^*$ or $\lambda_{\min} \lesssim \frac{1}{2}$, the transformation will most likely lead to drastic changes of $N_p$, which could result in the spawning of many new particles or the fusion of many particles. Numerical experiments showed that once a transformation led to either many spawnings or many fusions, the subsequent transformations alternately gave rise to a large number of newly spawned particles and a large number of fused particles. This unstable behavior with respect to the transformation led us to not further pursue the idea of an adaptive approximation of the covariance matrix.
3.4.8 Local Kernel Shape
As suggested in Section 2.4, we can improve the conditioning of the interpolation by locally varying the shape parameter. Reboux et al. (2012) proposed that the radius $r_{c,p}$ is a good value for the shape parameter. As that paper does not treat RBF interpolation but the related discretization-corrected particle strength exchange (DC-PSE) method (Schrader et al., 2010), we will slightly modify the approach and introduce an additional parameter $\varsigma$:
$$\tilde{\varepsilon} = \varepsilon\, r_{c,p}^{\varsigma}.$$
Using $\varsigma$ as exponent enables us to determine the strength of the influence of the value of $r_{c,p}$ on the local shape parameter $\tilde{\varepsilon}$. An exponent $\varsigma = 0$ corresponds to no influence, whereas positive and negative values correspond to positive and negative correlation of the values.
To investigate how well this localized kernel size performs, we fixed $\varsigma$ to several values and computed the corresponding optimal $\varepsilon$ via Rippa's method for Example 1.3. Figure 3.4.15 shows the comparison of the $L^\infty$-error as well as the $L^2$-error for values of $\varsigma$ between $-1$ and $1$. Moreover, Figure 3.4.15 also shows an exemplary analysis of the $L^\infty$ value for different values of $\varepsilon$.
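For reference, Rippa's leave-one-out criterion admits a closed form: with interpolation matrix $A$ and weights $w = A^{-1}f$, the vector of leave-one-out residuals has components $e_k = w_k / (A^{-1})_{kk}$ (Rippa, 1999). The sketch below assumes a Gaussian kernel; the grid search over $\varepsilon$ is purely illustrative.

```python
import numpy as np

def rippa_loocv_cost(eps, centers, f):
    """Rippa's leave-one-out cost ||e|| with e_k = w_k / (A^{-1})_{kk}."""
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    A = np.exp(-eps**2 * d2)      # Gaussian RBF interpolation matrix
    A_inv = np.linalg.inv(A)
    w = A_inv @ f                 # interpolation weights
    e = w / np.diag(A_inv)        # leave-one-out residuals
    return np.linalg.norm(e)

# illustrative search for the shape parameter:
# eps_grid = np.logspace(-1, 1, 50)
# eps_opt = min(eps_grid, key=lambda e: rippa_loocv_cost(e, centers, f))
```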
In Figure 3.4.15a and Figure 3.4.15b one can see that both the $L^2$-error and the $L^\infty$-error remain at the same level for $\varsigma$ between $-1$ and $0.14$. For higher values of $\varsigma$, both errors increase dramatically. We attribute this behavior to the fact that Rippa's method is only a good approximation for $\varsigma = 0$; consequently, the minima of Rippa's method and of the true error no longer coincide. Moreover, for larger values of $\varsigma$, Rippa's method seems to have only a narrow basin of attraction in which a local minimizer converges to the global minimum (cf. Figure 3.4.15c). The very high errors in Figure 3.4.15b can thus be explained by the local minimizer for the shape parameter getting stuck in local minima. As the obtained improvement in the approximation error was quite small, we decided not to further investigate the performance of localized kernel sizes.
3.5 Conclusion
One of the compelling advantages of the refined particles over the lattice is that a previously generated particle configuration can be used as initialization for further refinement. This allows easy-to-implement, flexible refinement schemes in which the particle configuration is subsequently refined to reach higher precision. Such arbitrary subsequent refinements are not possible with lattices, as it is only possible to increase the density by a factor of 2 (cf. Section 3.3). Moreover, this allows us to reuse the generated particles as initialization for subsequent parameter estimation tasks with similar models or changes in the experimental setup.
Until now we have only considered the low-dimensional Example 1.3, which can be approximated reasonably well with less than $10^4$ particles. For higher-dimensional settings, the number of particles will increase significantly. In the current setting, the dominating factor in the computational cost of the generation of centers is the computation of the pairwise differences $\|\xi^p - \xi^q\|$, which scales as $\mathcal{O}(N^2)$. This computational cost can be reduced by using domain decomposition, which is, for example, implemented in the parallel particle-mesh (PPM) library (Sbalzarini et al., 2006).
Nevertheless, the computational cost of RBF interpolation will be $\mathcal{O}(N^3)$ and will be the dominating factor for the complete method in the large-$N$ limit. The Gaussian MLS approximation requires a regular grid and is thus not applicable on the refined set of particles. Moreover, with RBF interpolation we do not obtain guaranteed positivity of the density on the complete domain $\Omega_\theta$. Until now we did not carry out any analysis of our approximant and therefore do not know how it is affected by possibly negative approximants. We dedicate the next chapter to the analysis of the approximant in terms of marginals and moments.
Figure 3.4.15: Error analysis for a localized kernel size for Example 1.3. (a) $L^\infty$-error for several choices of $\varsigma$; (b) $L^2$-error for several choices of $\varsigma$; (c) exemplary analysis of the performance of Rippa's method for $\varsigma = 1$. The refinement was initialized using an $A_2^*$ lattice with $\kappa = 0.5$. The parameter $D_0$ was 0.8. To compute the statistics, we repeated the analysis 20 times.
Chapter 4
Analysis of the Interpolant
The compelling advantage of Gaussian RBFs over other approximation methods is that we can derive analytical expressions for the computation of moments and marginals. In the following we will give formulas for the RBF approximation which are exact for the domain $\Omega_\theta = \mathbb{R}^s$. The formulas for the MLS approximant as well as the CSRBF approximant are the same up to constant factors. For GHI approximation it is also possible to give similar closed-form expressions for the marginals, but these formulas will not be presented in this thesis. For bounded domains $\Omega_\theta$, these formulas produce a truncation error, as they assume integration over $\mathbb{R}^s$ and not over $\Omega_\theta$. For sufficiently large domains $\Omega_\theta$ and for densities which are not heavy-tailed, we can expect the truncation error to be below machine precision. For densities which have significant mass at the edge $\partial\Omega_\theta$ of the domain, the truncation error will dominate the approximation of the marginals at the edge of the domain and also have some impact on the computation of the moments. As classical KDEs experience the same truncation error, we decided that investigating methods to correct for it is beyond the scope of this thesis.
This chapter is organized as follows: In Section 4.1 we will present the formulas for the computation of moments and marginals for the RBF approximant. In Section 4.2 we will present the respective numerical convergence analysis for KDE, RBF on a lattice, RBF on refined particles, and MLS, for both moments and marginals.
4.1 Formulas
We will illustrate the computation of the zeroth-order moment and the first-order moments, also known as the expected values. Using the affine linear transform presented in Section 3.2.5, the RBF approximant has the form
$$P_p = \sum_{k=1}^{N} w_k \exp\left(-\varepsilon^2 \left\|M^{-\frac{1}{2}}\left(\xi - \xi^k\right)\right\|^2\right).$$
From standard probability theory we know that the integral over a single term is given by
$$\int_{\mathbb{R}^s} \exp\left(-\varepsilon^2 \left\|M^{-\frac{1}{2}}\left(\xi - \xi^j\right)\right\|^2\right) \mathrm{d}\xi = \left(\frac{\pi}{\varepsilon^2}\right)^{\frac{s}{2}} \sqrt{|\det(M)|}.$$
Thus we can, exploiting the linearity of the integral, compute the approximate zeroth-order moment using the formula
$$\hat{m}^0_p = \sum_{k=1}^{N} w_k \left(\frac{\pi}{\varepsilon^2}\right)^{\frac{s}{2}} \sqrt{|\det(M)|}.$$
Furthermore, from basic probability theory we know that the $i$-th one-dimensional marginal of a multivariate Gaussian $\mathcal{N}(\xi^k \,|\, C)$ is given by the one-dimensional Gaussian $\mathcal{N}(\xi_i^k \,|\, C_{ii})$. Hence we can give the following formula for the $i$-th one-dimensional, normalized approximate marginal $\hat{p}(\xi_i \,|\, \mathcal{D})$:
$$\hat{p}(\xi_i \,|\, \mathcal{D}) = \sqrt{\frac{\varepsilon^2}{\pi M_{ii}}}\;\frac{\sum_{k=1}^{N} w_k \exp\left(-\frac{\varepsilon^2}{M_{ii}}\left(\xi_i - \xi_i^k\right)^2\right)}{\sum_{k=1}^{N} w_k}.$$
Along the same lines we can compute two-dimensional marginals, exploiting the fact that the two-dimensional marginals of a multivariate Gaussian are again Gaussian with restricted mean and covariance matrix:
$$\hat{p}(\xi_i, \xi_j \,|\, \mathcal{D}) = C\;\frac{\sum_{k=1}^{N} w_k \frac{\varepsilon^2}{\pi} \exp\left(-\varepsilon^2 \left\|\begin{pmatrix} M_{ii} & M_{ij} \\ M_{ji} & M_{jj} \end{pmatrix}^{-\frac{1}{2}} \begin{pmatrix} \xi_i - \xi_i^k \\ \xi_j - \xi_j^k \end{pmatrix}\right\|^2\right)}{\sum_{k=1}^{N} w_k},$$
where $C$ is the normalization factor
$$C = \frac{1}{\sqrt{\det\begin{pmatrix} M_{ii} & M_{ij} \\ M_{ji} & M_{jj} \end{pmatrix}}}.$$
Using the formula for the one-dimensional marginals, it is straightforward to compute the respective $i$-th first-order approximate moment:
$$\hat{m}^1_i = \frac{\sum_{k=1}^{N} w_k\, \xi_i^k}{\sum_{k=1}^{N} w_k}.$$
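The closed forms above translate directly into vectorized code. The sketch below assumes Gaussian RBF weights `w`, centers `xi` (one per row), the transform matrix `M`, and the shape parameter `eps` as defined in this section; it is an illustration of the formulas, not code from the thesis.

```python
import numpy as np

def moment0(w, M, eps, s):
    """Zeroth-order moment: sum_k w_k (pi / eps^2)^(s/2) sqrt(|det M|)."""
    return w.sum() * (np.pi / eps**2) ** (s / 2) * np.sqrt(abs(np.linalg.det(M)))

def marginal_1d(t, i, w, xi, M, eps):
    """Normalized i-th one-dimensional marginal, evaluated at the points t."""
    pref = np.sqrt(eps**2 / (np.pi * M[i, i]))
    g = np.exp(-(eps**2 / M[i, i]) * (t[:, None] - xi[None, :, i]) ** 2)
    return pref * (g @ w) / w.sum()

def moment1(i, w, xi):
    """i-th first-order moment: weighted mean of the i-th center coordinates."""
    return (w @ xi[:, i]) / w.sum()
```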
4.2 Numerical Evaluation
For profiles it is sufficient to ensure positivity of the approximant on the domain $\Omega_\delta$, which is guaranteed if the maximal approximation error on $\Omega_\delta$ is smaller than $\delta$: given an approximation error satisfying
$$|p(\xi) - \hat{p}(\xi)| \leq \delta \quad \forall\, \xi \in \Omega_\xi : p(\xi) \geq \delta,$$
we have
$$\hat{p}(\xi) \geq p(\xi) - \delta \geq 0 \quad \forall\, \xi \in \Omega_\xi : p(\xi) \geq \delta.$$
For the computation of marginals and moments, negativity of the approximant outside the domain $\Omega_\delta$ can have a significant impact, as the computation requires integration over the complete domain $\Omega_\theta$. As we were not able to derive error bounds for the approximation of marginals and moments, we carried out a numerical analysis of the error. We considered the $L^\infty$-error and the sum of the $L^2$-errors on the individual marginals (denoted as $l^1$-$L^2$-error), computed as described in Section 1.4. Moreover, we considered the sum of the relative errors in the estimation of the first-order moments,
$$\sum_{j=1}^{n_\theta} \left|\frac{m_j^1 - \hat{m}_j^1}{m_j^1}\right|.$$
Figure 4.2.1 shows the convergence analysis for the RBF and MLS approximation on $A_2^*$ lattices for several choices of the scaling parameter $\kappa$, in comparison to a KDE on an increasing number of MCMC samples. One can see that in settings with more than $10^{2.1} \approx 125$ function evaluations, RBF as well as MLS approximation always yield better results than KDE. We suppose that the oscillations in the error for RBF approximation in Figures 4.2.1a, 4.2.1b and 4.2.1c arise due to negative values of the approximant outside the domain $\Omega_\delta$. We are not able to explain the parabolic shape of the error for the moment approximation in Figure 4.2.1c.
To illustrate the necessity of the regularity of the set of centers in both MLS and RBF approximation, the approximation of marginals was also carried out on subsets of $10^5$ MCMC samples. For both approximations, the $L^\infty$-error and the $L^2$-error in the marginals as well as the relative error in the moments do not show any improvement for an increased number of function evaluations. Moreover, the $L^\infty$-error is for both approximations of the same magnitude as the function values themselves (cf. Figure 1.4.1). Similarly, the relative error in the moments is close to 1 for both approximations. These results suggest that MLS as well as RBF approximations of marginals and moments are meaningless when carried out on MCMC samples.
Figure 4.2.1: Convergence analysis for different error measures for Example 1.3. (a) Convergence of the $L^\infty$-error in marginal approximation for different methods; (b) convergence of the $l^1$-$L^2$-error in marginal approximation for different methods; (c) convergence of the sum of relative errors in first-order moment approximation for different methods. Radial basis function (RBF) as well as moving least squares (MLS) approximation was carried out on an $A_3^*$ lattice with 20 different values for $\kappa$ logarithmically equispaced between $10^{-0.9} \approx 0.125$ and 1. RBF and MLS approximation as well as kernel density estimation (KDE) was also carried out on subsets of a set of $10^5$ Markov chain Monte Carlo samples obtained using the DRAM toolbox (Haario et al., 2006).
Figure 4.2.2: Convergence analysis for different error measures. (a) Convergence of the $L^\infty$-error in marginal approximation for different methods; (b) convergence of the $l^1$-$L^2$-error in marginal approximation for different methods; (c) convergence of the sum of relative errors in first-order moment approximation for different methods. Radial basis function approximation on the lattice was carried out on an $A_3^*$ lattice with 20 different values for $\kappa$ logarithmically equispaced between $10^{-0.9}$ and 1. Radial basis function approximation on particles was carried out for three different initialization lattices (low resolution: $\kappa = 1$, 63 points; mid resolution: $\kappa = 0.5$, 193 points; high resolution: $\kappa = 10^{-0.6}$, 672 points). Particle refinement was then carried out for 20 different values of $D_0$ equally spaced between 0.6 and 2 with a ratio $\frac{D_0}{d_0} = 1.2$. The number of runs for every parameter setting was 20. Kernel density estimation (KDE) was carried out on subsets of a set of $10^5$ Markov chain Monte Carlo samples obtained using the DRAM toolbox (Haario et al., 2006).
Figure 4.2.3: Required time for interpolation with different numbers of centers. The test function was Himmelblau's function (Himmelblau et al., 1972) and the points were generated by the Halton sequence (Halton, 1964) rescaled to $[-5, 5] \times [-5, 5]$. The measured values were then fitted using a cubic polynomial. The analysis was performed on a Mid 2011 MacBook Air with a dual-core 1.7 GHz Intel Core i5 processor and 4 GB 1333 MHz DDR3 RAM running MATLAB R2012b.
Figure 4.2.2 shows the convergence analysis for the RBF approximation on the particle refinement for several different choices of lattice initializations and several different refinement resolutions, compared to the RBF approximation on $A_2^*$ lattices for several choices of the scaling parameter $\kappa$. The number of total function evaluations for the particle setting is the sum of the number of points on the initialization lattice and the final number of particles. The numerical results show that, for Example 1.3 and our choice of parameters, we were able to obtain approximations to marginals and moments using particle refinements that are comparable to those obtained using lattices. Nevertheless, both methods yield better results than KDE on MCMC samples.
To assess the required computation time for interpolation, we approximated Himmelblau's function (Himmelblau et al., 1972), using the quasi-random Halton sequence (Halton, 1964) to generate centers. Himmelblau's function is a commonly used test function and the Halton sequence is a popular choice to generate centers for scattered data approximation (Wendland, 2005). The theoretically predicted computational complexity is $\mathcal{O}(N^3)$, as the dominating operation is the inversion of the interpolation matrix. The measured values are shown in Figure 4.2.3, including the cubic fit to those values. The equation for the cubic fit is $T(N) = 3.9 \cdot 10^{-8} N^3 + 1.2 \cdot 10^{-5} N^2 - 9.7 \cdot 10^{-3} N + 0.69$.

The evaluation time for both KDE and RBF approximation of marginals is the same. Therefore we need to compare the time required to perform the RBF interpolation to the time gained by requiring fewer function evaluations. Hence, using RBF interpolation over KDE will only result in a speedup if
$$\varrho \tilde{N} \geq T(N),$$
where $\tilde{N}$ is the number of function evaluations that RBF interpolation saves compared to KDE, $\varrho$ is the time required for one function evaluation, and $N$ is the number of centers that were employed. Due to the curse of dimensionality (Novak and Woźniakowski, 2009), $N$ will increase exponentially with the dimension of the parameter space, which limits the application of the method to low-dimensional systems where $\varrho$ is high.
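As a quick illustration of this break-even condition, the snippet below plugs the fitted cubic $T(N)$ into the inequality; the example values for $\varrho$ and $\tilde{N}$ are made up for demonstration.

```python
def interpolation_time(n):
    """Cubic fit T(N) of the measured interpolation time (seconds)."""
    return 3.9e-8 * n**3 + 1.2e-5 * n**2 - 9.7e-3 * n + 0.69

def rbf_pays_off(rho, n_saved, n_centers):
    """Speedup condition rho * N~ >= T(N)."""
    return rho * n_saved >= interpolation_time(n_centers)

# e.g. a 50 ms likelihood evaluation, 10^4 saved evaluations, 10^3 centers
print(rbf_pays_off(rho=0.05, n_saved=10_000, n_centers=1_000))
```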
4.3 Conclusion
In this chapter we investigated the error for the approximation of marginals and moments of the approximated densities. In contrast to the error in the approximation of the density itself, we were not able to show any advantage of the particle refinement over the lattice approximation. Moreover, for Example 1.3, we did not have any noticeable problems with the positivity of the approximation. One possible explanation for this is the fact that the density in Example 1.3 lies in the native space of the RBF approximation. In the following chapter we will consider the approximation of more complex examples.
Chapter 5
Application of the Algorithm
Until now, all analysis was carried out on the two-dimensional Example 1.3. This example had the compelling advantage that closed-form expressions for marginals and moments were available. In the following we will consider problems where this is not the case. Therefore, we will not carry out a rigorous quantitative analysis in this chapter; we will only show that the method we proposed is capable of producing results similar to those obtained by KDE in combination with MCMC. To demonstrate the behavior of the algorithm on ODE models, we will apply the method to a reversible first-order reaction model in Section 5.1, for which the parameter dimension is 2, and to an enzymatic catalysation model in Section 5.2, for which the parameter dimension is 3. We will conclude by applying the method to a stochastic gene expression model in Section 5.3, for which the parameter dimension is 4.
5.1 Reversible First Order Reaction
In this section we will consider a reversible first-order reaction between two substances $A$ and $B$:
$$A \underset{k_{-1}}{\overset{k_{+1}}{\rightleftharpoons}} B.$$
Here the kinetic rate of the reaction $A \to B$ is given by $k_{+1}$ and the kinetic rate of the reaction $B \to A$ is given by $k_{-1}$. Assuming mass action kinetics, we can formulate the following set of differential equations for the concentrations $c_A$ and $c_B$:
$$\begin{aligned} \dot{c}_A &= -k_{+1} c_A + k_{-1} c_B, \\ \dot{c}_B &= +k_{+1} c_A - k_{-1} c_B. \end{aligned}$$
We will assume that only $y_1(t) = c_A(t)$ is an observable of the system and generate noise-free synthetic data for 6 time points $t_k = 0, \dots, 5$ for the parameters $k_{+1} = 0.6$ and $k_{-1} = 0.4$.
Figure 5.1.1: Synthetic measurement data. For the observed quantity $c_A$, the uncertainty is indicated by error bars of length $2\sigma$.
Instead of generating several realizations by adding artificial noise to the data, we will use the noise-free data and assume a standard deviation of $\sigma = 0.1$. Thereby we avoid that the stochastic realization of the noise has any effect on our results. The generated synthetic data is shown in Figure 5.1.1.
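For concreteness, here is a minimal sketch of how this forward model could be simulated in Python with SciPy; the initial concentrations are illustrative assumptions, as the thesis does not state them.

```python
import numpy as np
from scipy.integrate import solve_ivp

def reversible_reaction(t, c, k_plus, k_minus):
    """Right-hand side of the mass-action ODE for A <-> B."""
    cA, cB = c
    return [-k_plus * cA + k_minus * cB,
            +k_plus * cA - k_minus * cB]

# parameters from the text; initial concentrations assumed for illustration
t_eval = np.linspace(0, 5, 6)                  # the 6 time points t_k = 0, ..., 5
sol = solve_ivp(reversible_reaction, (0, 5), [1.0, 0.0],
                args=(0.6, 0.4), t_eval=t_eval)
y1 = sol.y[0]                                  # observable y_1(t) = c_A(t)
```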
This example was designed such that there is a pronounced uncertainty in the estimation of all parameters. This was achieved by providing only a small number of samples of the concentration time course.
To analyze how close the approximations are to the true marginals, we used a KDE approximation on $10^6$ samples. We suppose that this approximation is close to the true marginals. We will therefore compare both the RBF approximation of the marginals and the KDE approximation on the same number of points against the KDE approximation on $10^6$ samples. Figure 5.1.2 shows the RBF approximation of the joint density as well as the respective RBF and KDE approximations of the marginals. The uncertainty of the parameters is visible in the joint density shown in Figure 5.1.2, as the mass of the density does not decrease to zero at the border of the investigated domain. Analyzing the marginals in Figure 5.1.2, one can see that all three methods exhibit significant truncation errors at the upper bound of both parameter intervals. Moreover, there are some slight negativities in the RBF approximation of the density, which result in small oscillations in the marginal of $\log(k_{-1})$ at the lower bound of the investigated domain. One can see that the RBF approximation is much closer to the KDE approximation with $10^6$ samples than the KDE approximation using the same number of samples. Hence we can conclude that in this setting the RBF approximation is better than the KDE approximation.
Figure 5.1.2: Comparison of the approximation of marginals using radial basis functions (RBF) on a particle refinement and a kernel density estimator (KDE) on Markov chain Monte Carlo samples. The particle refinement was initialized on an $A_2^*$ lattice with $\kappa = 2$ (112 points) and refined to 325 particles ($D_0 = 3$, $d_0 = 2.5$). Kernel density estimation was performed on $10^6$ samples obtained by Markov chain Monte Carlo using the DRAM toolbox (Haario et al., 2006). The surface plot shows the radial basis function approximation of the density on the complete domain $\Omega_\theta = [\log(k_{+1}) - 3, \log(k_{+1}) + 3] \times [\log(k_{-1}) - 3, \log(k_{-1}) + 3]$.
Figure 5.2.1: Synthetic measurement data. For the observed concentrations $c_A$ and $c_P$, the uncertainty is indicated by error bars of length $2\sigma$.
5.2 Enzymatic Catalysation
We will now increase the parameter dimension by one and consider the enzymatic catalysation of the conversion of the substrate $A$ to the product $P$, which is catalyzed by the enzyme $E$:
$$A + E \underset{k_{-1}}{\overset{k_{+1}}{\rightleftharpoons}} C \overset{k_2}{\longrightarrow} P + E.$$
Here the kinetic rate for the formation of the enzyme-substrate complex $C$ is given by $k_{+1}$, the dissociation rate is given by $k_{-1}$, and the rate of the catalyzed reaction is given by $k_2$. Using the law of mass action, we can formulate the following set of ordinary differential equations for the concentrations $c_A$, $c_E$, $c_C$ and $c_P$:
$$\begin{aligned} \dot{c}_A &= -k_{+1} c_A c_E + k_{-1} c_C, \\ \dot{c}_E &= -k_{+1} c_A c_E + (k_{-1} + k_2)\, c_C, \\ \dot{c}_C &= +k_{+1} c_A c_E - (k_{-1} + k_2)\, c_C, \\ \dot{c}_P &= k_2\, c_C. \end{aligned}$$
We will assume that we can observe $y_1(t) = c_A(t)$ as well as $y_2(t) = c_P(t)$. We generate synthetic data for 30 time points $t_k = 0, \dots, 5$ for the parameters $k_{+1} = 0.6$, $k_{-1} = 0.4$ and $k_2 = 1$. Along the lines of Section 5.1, we will assume a standard deviation of $\sigma = 0.1$ in the measured data. The generated synthetic data is shown in Figure 5.2.1.
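Analogously to the sketch in Section 5.1, the right-hand side below encodes this system in Python; the construction makes the conservation of total enzyme $c_E + c_C$ explicit, which is what fixes the $-(k_{-1} + k_2)\,c_C$ term in $\dot{c}_C$. The function name and interface are illustrative.

```python
def enzymatic_rhs(t, c, k_plus, k_minus, k2):
    """Mass-action right-hand side for A + E <-> C -> P + E."""
    cA, cE, cC, cP = c
    v_bind = k_plus * cA * cE    # complex formation
    v_unbind = k_minus * cC      # dissociation of the complex
    v_cat = k2 * cC              # catalytic conversion to product
    return [-v_bind + v_unbind,              # dc_A/dt
            -v_bind + v_unbind + v_cat,      # dc_E/dt
            +v_bind - v_unbind - v_cat,      # dc_C/dt (conserves c_E + c_C)
            +v_cat]                          # dc_P/dt
```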
Figure 5.2.2 shows the approximated two-dimensional marginals using RBF interpolation. One can see pronounced negative areas in the approximate two-dimensional marginals.
Figure 5.2.2: Two-dimensional normalized marginals approximated via radial basis function interpolation. (a) 2D joint RBF marginal for $\log(k_{+1})$ and $\log(k_{-1})$; (b) 2D joint RBF marginal for $\log(k_{+1})$ and $\log(k_2)$; (c) 2D joint RBF marginal for $\log(k_{-1})$ and $\log(k_2)$. Radial basis function interpolation was carried out on a particle refinement. Initialization was carried out on an $A_3^*$ lattice with $\kappa = 5$ (353 points) and refined to 778 particles ($D_0 = 5$).
Figure 5.2.3: Comparison of one-dimensional approximate normalized marginals using radial basis function (RBF) interpolation and kernel density estimation (KDE). (a) Marginal for $\log(k_{+1})$; (b) marginal for $\log(k_{-1})$; (c) marginal for $\log(k_2)$. RBF interpolation was carried out on a particle refinement, initialized on an $A_3^*$ lattice with $\kappa = 5$ (353 points) and refined to 778 particles ($D_0 = 5$). Kernel density estimation was performed on $5 \cdot 10^4$ samples obtained by Markov chain Monte Carlo using the DRAM toolbox (Haario et al., 2006).
Figure 5.2.4: Two-dimensional normalized marginals approximated via moving least squares. (a) 2D joint MLS marginal for $\log(k_{+1})$ and $\log(k_{-1})$; (b) 2D joint MLS marginal for $\log(k_{+1})$ and $\log(k_2)$; (c) 2D joint MLS marginal for $\log(k_{-1})$ and $\log(k_2)$. Approximation was carried out on an $A_3^*$ lattice with $\kappa = 2$ (3440 points).
Figure 5.2.5: Comparison of one-dimensional approximate normalized marginals using moving least squares (MLS) and kernel density estimation (KDE). (a) Marginal for $\log(k_{+1})$; (b) marginal for $\log(k_{-1})$; (c) marginal for $\log(k_2)$. Approximation was carried out on an $A_3^*$ lattice with $\kappa = 2$ (3440 points). Kernel density estimation was performed on $10^6$ samples and 3340 samples obtained by Markov chain Monte Carlo using the DRAM toolbox (Haario et al., 2006).
Figure 5.3.1: Overview of the stochastic system described in (Kazeroonian et al., 2013).
Figure 5.2.3 shows the corresponding one-dimensional marginals. One can see that the negative areas also appear in the one-dimensional marginals and lead to strong oscillations in the approximate marginals, which renders this approximation meaningless.

To ensure the non-negativity of the density, we approximated the density using MLS on an $A_3^*$ lattice with 353 points. The resulting two-dimensional marginals are shown in Figure 5.2.4. As expected, the approximate marginals are non-negative. As one can see in Figure 5.2.5, the resulting approximations to the one-dimensional marginals are also non-negative and are comparable to those obtained via KDE on MCMC samples.

To again analyze how close the approximations are to the true marginals, we used a KDE approximation on $10^6$ samples and suppose that this approximation is close to the true marginals. In this case the KDE approximation on the same number of points is closer to the KDE approximation with $10^6$ samples. Hence we can conclude that in this setting the MLS approximation does not perform better than the KDE approximation.
5.3 Stochastic Gene Expression
As a final application we will examine the higher-dimensional ($n_\theta = 4$) stochastic system described in (Kazeroonian et al., 2013). The system describes an active and an inactive state of the DNA, where the on-switching occurs with rate $\tau_{\mathrm{on}}$ and the off-switching occurs with rate $\tau_{\mathrm{off}}$. Only the active DNA is transcribed to mRNA; the rate of transcription is given by $k_r$. The translation of mRNA to proteins occurs with rate $k_p$, whereas mRNA degradation occurs with rate $\gamma_r$. Degradation also occurs for the protein, with rate $\gamma_p$. The chemical master equation for this system is solved via finite state projection (FSP) (Munsky and Khammash, 2008). Using the finite state projection transforms this problem into an ODE model with a high-dimensional state space.
Figure 5.3.2: Bubble chart of the synthetic data for the observed number of proteins of the system. Samples were generated for 6 different time points. The number of samples was 61 for every time point. The size of the circles indicates the number of samples with the corresponding number of proteins at the corresponding point in time. The MATLAB code to generate this graphic was kindly provided by Atefeh Kazeroonian.
The observable of this system is the protein concentration. For the details on the computation of the likelihood for this system we refer the reader to the original paper (Kazeroonian et al., 2013).
The approximate marginals obtained via MLS on an $A_4^*$ lattice with 6396 points and via KDE on $10^4$ MCMC samples are shown in Figure 5.3.3. One can see that the approximation via MLS is comparable to the approximation via KDE. Due to the large number of lattice points required to obtain a satisfying approximation to the marginals, and the unsatisfying results in the previous section, we did not carry out a particle refinement and a subsequent RBF approximation.

In this setting we also analyzed how close the approximations are to the true marginals: we used a KDE approximation on $10^4$ samples and suppose that this approximation is close to the true marginals. In this case the KDE approximation is much closer to the KDE approximation with $10^4$ samples than the MLS approximation is. Hence we can conclude that in this setting the MLS approximation does not perform better than the KDE approximation.
5.4 Conclusion
In this chapter we investigated the performance of the method on two dynamical systems as well as one stochastic system. For the two-dimensional problem presented in Section 5.1, the approximation of marginals using RBF interpolation on a particle refinement yielded better approximations to the marginals than KDE on MCMC samples. In contrast, for the example in Section 5.2, the RBF interpolation on a particle refinement does not yield meaningful approximations to the marginals. For both examples presented in Section 5.2 and Section 5.3, the approximation with MLS on a lattice yields approximations to the marginals comparable to those obtained using KDE on MCMC samples.
Figure 5.3.3: Comparison of one-dimensional approximate normalized marginals using moving least squares and kernel density estimation. (a) 1D approximate marginals for $\log(\tau_{\mathrm{on}})$; (b) 1D approximate marginals for $\log(\tau_{\mathrm{off}})$; (c) 1D approximate marginals for $\log(k_m)$; (d) 1D approximate marginals for $\log(k_p)$. Moving least squares approximation was carried out on the $A_4^*$ lattice with $\kappa = 0.01$ (6396 points). Kernel density estimation was performed on $10^4$ samples obtained by Markov chain Monte Carlo using the DRAM toolbox (Haario et al., 2006).
Chapter 6
Conclusion and Outlook
6.1 Conclusion
In this thesis we considered the approximation of probability densities using radial basis functions in combination with lattices and particle methods. We proposed a novel algorithm for lattice generation on superlevel sets. Moreover, we were able to increase the accuracy of radial basis function interpolation by several orders of magnitude by using the Mahalanobis norm with approximations to the covariance of the density. We were able to provide a comprehensible motivation for the algorithm proposed in Reboux et al. (2012) and introduced several improvements to the algorithm. Furthermore, based on a bivariate mixture of two Gaussians, we were able to show that radial basis function approximation as well as moving least squares approximation on lattices can outperform kernel density estimators in terms of the approximation error for marginals and moments for a given number of function evaluations. We were not yet able to show any significant improvement in the approximation error from using the particle refinement instead of the lattice as interpolation nodes for the radial basis function approximation.
In low-dimensional systems, we were able to obtain acceptable approximation orders using a small number of function evaluations. For higher-dimensional systems, the required number of function evaluations necessarily increases. Although radial basis function approximation should yield better approximations than kernel density estimators, the computational cost of computing RBF approximants with acceptable approximation error renders the method unattractive in this regime. Although we were able to make certain improvements to the method, they were not sufficient to ensure the applicability of the method in higher dimensions.
The numerical experiments in Chapter 5 suggest that non-negativity plays an increasingly important role in higher dimensions. The only reliable method to ensure non-negativity we were able to provide was the approximation using moving least squares. Unfortunately, using moving least squares instead of radial basis function interpolation comes with a significant sacrifice in terms of approximation order and the loss of the flexibility to use the particle method for local refinement. Nevertheless, numerical experiments showed that, as theory suggests, moving least squares applied on a lattice seems to perform better than or at least comparable to kernel density estimators.
6.2 Outlook
In Chapter 5 we briefly investigated the performance of the method on several examples. The investigation of those examples could be carried out in more detail and with other choices of parameters. Moreover, it could be interesting to extend the set of examples to a wider class of problems, as we did not consider partial differential equation models in this thesis.
Concerning the non-negativity of the approximant, it would be interesting to carry out a more rigorous analysis of the conditions under which negative approximations arise. Consequently, one should also further investigate methods to ensure the non-negativity of the approximation.
We have not yet used the full set of available tools in the context of moving least squares approximation, such as approximate moving least squares approximation (Fasshauer, 2007, Chapter 26), iterated approximate moving least squares approximation (Fasshauer, 2007, Chapter 31) and multilevel iterations (Fasshauer, 2007, Chapter 32). Moreover, Wendland (2005) suggests the use of localized kernel sizes to deal with non-uniform sets of nodes $X$. These methods could be used to obtain higher-order approximation schemes and improve the approximation quality of moving least squares. Furthermore, these methods could enable us to also use moving least squares in combination with the adaptive particle refinement.
Although we mentioned interpolation using compactly supported radial basis functions, we did not carry out any rigorous analysis of it. Using compactly supported radial basis functions yields sparse interpolation matrices, which can be inverted efficiently even for large numbers of interpolation nodes. Therefore, compactly supported radial basis functions are an interesting alternative for problems where a high number of interpolation nodes is necessary.
In the context of node generation, the particle method could be implemented more efficiently using the PPM library (Sbalzarini et al., 2006), which implements domain decomposition and faster algorithms to compute neighborhoods (Awile et al., 2012). Moreover, we briefly discussed the idea of using radial basis function interpolation as well as moving least squares in combination with Markov chain Monte Carlo samples in Section 4.2. This idea could be further pursued by using greedy methods to select near-optimal interpolation nodes from a set of Markov chain Monte Carlo samples (Marchi et al., 2005).
As previously mentioned, the method is currently not suited for application to high-dimensional problems. Moreover, the adaptivity of the method comes at a certain computational cost. Therefore, we envision the method being applied to low-dimensional problems where the evaluation of the likelihood takes a large amount of time, or where other methods have problems due to pronounced non-linear correlations between parameters or heavy tails in the density. When parameters are orthogonal and an independent partition of the parameter space into low-dimensional spaces is possible, the method could also be applied to those low-dimensional spaces individually.
Bibliography
Agrell E and Eriksson T. Optimization of Lattices for Quantization. IEEE Transactions on Information Theory, 44 (1998):pp. 1814–1828.

Andrieu C, Doucet A and Holenstein R. Particle Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72 (2010):pp. 269–342.

Awile O, Büyükkeçeci F, Reboux S and Sbalzarini IF. Fast Neighbor Lists for Adaptive-Resolution Particle Simulations. Computer Physics Communications, 183 (2012):pp. 1073–1081.

Beatson R, Cherrie J and Mouat C. Fast Fitting of Radial Basis Functions: Methods Based on Preconditioned GMRES Iteration. Advances in Computational Mathematics, 11 (1999):pp. 253–270.

Bentley JL. Multidimensional Binary Search Trees Used for Associative Searching. Communications of the ACM, 18 (1975):pp. 509–517.

Celisse A. Optimal Cross-Validation in Density Estimation. arXiv preprint arXiv:0811.0802, (2008):pp. 1–37.

Conway J and Sloane N. Sphere Packings, Lattices and Groups. Springer, 1998.

Cowles M and Carlin B. Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review. Journal of the American Statistical Association, 91 (1996):pp. 883–904.

Coxeter H, Few L and Rogers C. Covering Space with Equal Spheres. Mathematika, 6 (1959):pp. 147–157.

Driscoll TA and Heryudono AR. Adaptive Residual Subsampling Methods for Radial Basis Function Interpolation and Collocation Problems. Computers & Mathematics with Applications, 53 (2007):pp. 927–939.

Epanechnikov V. Nonparametric Estimation of a Multidimensional Probability Density. Teoriya Veroyatnostei i ee Primeneniya, 14 (1969):pp. 156–161.

Fasshauer G. Meshfree Approximation Methods with MATLAB. World Scientific, 2007.

Fasshauer GE, Hickernell FJ and Woźniakowski H. Rate of Convergence and Tractability of the Radial Function Approximation Problem. arXiv preprint arXiv:1012.2605, (2010):pp. 1–28.

Fasshauer GE and McCourt MJ. Stable Evaluation of Gaussian RBF Interpolants. SIAM Journal on Scientific Computing, 34 (2012):pp. 737–762.

Fornberg B and Wright G. Stable Computation of Multiquadric Interpolants for All Values of the Shape Parameter. Computers & Mathematics with Applications, 48 (2004):pp. 853–867.

Fornberg B and Zuev J. The Runge Phenomenon and Spatially Variable Shape Parameters in RBF Interpolation. Computers & Mathematics with Applications, 54 (2007):pp. 379–398.

Gerschgorin S. Über die Abgrenzung der Eigenwerte einer Matrix. Bulletin de l'Académie des Sciences de l'URSS, Classe des Sciences Mathématiques et Naturelles, 25 (1931):pp. 749–754.

Gilks W and Berzuini C. Following a Moving Target – Monte Carlo Inference for Dynamic Bayesian Models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63 (2001):pp. 127–146.

Goldfarb D. A Family of Variable-Metric Methods Derived by Variational Means. Mathematics of Computation, 24 (1970):pp. 23–26.

Golub GH and van Loan CF. Matrix Computations. The Johns Hopkins University Press, 1996.

Gossard C. Bubble Mesh: Automated Triangular Meshing of Non-Manifold Geometry by Sphere Packing. In Proceedings of the Third ACM Symposium on Solid Modeling and Applications, pp. 409–419. 1995.

Haario H, Laine M, Mira A and Saksman E. DRAM: Efficient Adaptive MCMC. Statistics and Computing, 16 (2006):pp. 339–354.

Halton J. Algorithm 247: Radical-Inverse Quasi-Random Point Sequence. Communications of the ACM, 7 (1964):pp. 701–702.

Hasenauer J and Theis F. Lecture Notes: Parameter Inference for Stochastic and Deterministic Dynamic Biological Processes. Technical University of Munich, (2012).

Hastings W. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57 (1970):pp. 97–109.

Himmelblau D, Clark B and Eichberg M. Applied Nonlinear Programming. 1972.

Ibrikci T and Brandt M. Mahalanobis Distance with Radial Basis Function Network on Protein Secondary Structures. Proceedings of the 24th Annual Conference on Engineering in Medicine and Biology and the Annual Fall Meeting of the Biomedical Engineering Society, EMBS/BMES Conference, (2002):pp. 2184–2185.

Jarner S and Roberts G. Convergence of Heavy-Tailed Monte Carlo Markov Chain Algorithms. Scandinavian Journal of Statistics, 34 (2007):pp. 781–815.

Joshi M, Seidel-Morgenstern A and Kremling A. Exploiting the Bootstrap Method for Quantifying Parameter Confidence Intervals in Dynamical Systems. Metabolic Engineering, 8 (2006):pp. 447–455.

Kabatiansky G and Levenshtein V. On Bounds for Packings on a Sphere and in Space. Problemy Peredachi Informatsii, (1978):pp. 3–25.

Kac V. Lecture Notes: 18.745 Introduction to Lie Algebras. Massachusetts Institute of Technology, (2010).

Kazeroonian A, Hasenauer J and Theis F. Parameter Estimation for Stochastic Biochemical Processes: A Comparison of Moment Equation and Finite State Projection. In Proceedings of the 10th Workshop on Computational Systems Biology. 2013.

Lenstra A, Lenstra H and Lovász L. Factoring Polynomials with Rational Coefficients. Mathematische Annalen, 261 (1982):pp. 515–534.

Liu W, Jun S and Li S. Reproducing Kernel Particle Methods for Structural Dynamics. International Journal for Numerical Methods in Engineering, 38 (1995):pp. 1655–1679.

Luk F, Qiao S and Zhang W. A Lattice Basis Reduction Algorithm. Institute for Computational Mathematics Technical Report, 10 (2010):pp. 1–20.

Madych W. Error Estimates for Interpolation by Generalized Splines. Curves and Surfaces, (1991):pp. 297–306.

Mahalanobis P. On the Generalized Distance in Statistics. Proceedings of the National Institute of Science, India, 2 (1936):pp. 49–55.

Marchi S, Schaback R and Wendland H. Near-Optimal Data-Independent Point Locations for Radial Basis Function Interpolation. Advances in Computational Mathematics, 23 (2005):pp. 317–330.

Marcotte E, Stillinger FH and Torquato S. Communication: Designed Diamond Ground State via Optimized Isotropic Monotonic Pair Potentials. The Journal of Chemical Physics, 138 (2013):p. 061101.

Martinet J. Perfect Lattices in Euclidean Spaces. Springer, 2003.

Monaghan J. Smoothed Particle Hydrodynamics. Annual Review of Astronomy and Astrophysics, 30 (1992):pp. 543–547.

Munsky B and Khammash M. The Finite State Projection Approach for the Analysis of Stochastic Noise in Gene Networks. IEEE Transactions on Automatic Control, 53 (2008):pp. 201–214.

Nadaraya E. On Estimating Regression. Theory of Probability & Its Applications, 9 (1964):pp. 141–142.

Narcowich F, Ward J and Wendland H. Sobolev Bounds on Functions with Scattered Zeros, with Applications to Radial Basis Function Surface Fitting. Mathematics of Computation, 74 (2005):pp. 743–764.

Nayroles B, Touzot G and Villon P. Generalizing the Finite Element Method: Diffuse Approximation and Diffuse Elements. Computational Mechanics, 10 (1992):pp. 307–318.

Novak E and Woźniakowski H. Approximation of Infinitely Differentiable Multivariate Functions Is Intractable. Journal of Complexity, (2009):pp. 1–9.

Platte R. How Fast Do Radial Basis Function Interpolants of Analytic Functions Converge? IMA Journal of Numerical Analysis, 31 (2011):pp. 1578–1597.

Potts D and Steidl G. Fast Summation at Nonequispaced Knots by NFFT. SIAM Journal on Scientific Computing, 24 (2003):pp. 2013–2037.

Ramsay PH and Scott DW. Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, 1993.

Raue A. Quantitative Dynamic Modeling: Theory and Application to Signal Transduction in the Erythropoietic System. Ph.D. thesis, 2013.

Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U and Timmer J. Structural and Practical Identifiability Analysis of Partially Observed Dynamical Models by Exploiting the Profile Likelihood. Bioinformatics, 25 (2009):pp. 1923–1929.

Reboux S, Schrader B and Sbalzarini IF. A Self-Organizing Lagrangian Particle Method for Adaptive-Resolution Advection-Diffusion Simulations. Journal of Computational Physics, 231 (2012):pp. 3623–3646.

Rechtsman M, Stillinger F and Torquato S. Designed Interaction Potentials via Inverse Methods for Self-Assembly. Physical Review E, 73 (2006a):p. 011406.

Rechtsman M, Stillinger F and Torquato S. Self-Assembly of the Simple Cubic Lattice with an Isotropic Potential. Physical Review E, 74 (2006b):p. 021404.

Rippa S. An Algorithm for Selecting a Good Value for the Parameter c in Radial Basis Function Interpolation. Advances in Computational Mathematics, 11 (1999):pp. 193–210.

Saad Y and Schultz M. GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems. SIAM Journal on Scientific and Statistical Computing, 7 (1986):pp. 856–869.

Sarra SA and Kansa EJ. Multiquadric Radial Basis Function Approximation Methods for the Numerical Solution of Partial Differential Equations. Advances in Computational Mechanics, 2 (2009).

Sbalzarini IF, Walther JH, Bergdorf M, Hieber SE, Kotsalis EM and Koumoutsakos P. PPM – A Highly Efficient Parallel Particle-Mesh Library for the Simulation of Continuum Systems. Journal of Computational Physics, 215 (2006):pp. 566–588.

Schmidl D, Czado C, Hug S and Theis FJ. A Vine-Copula Based Adaptive MCMC Sampler for Efficient Inference of Dynamical Systems. Bayesian Analysis, 8 (2013):pp. 1–22.

Schrader B, Reboux S and Sbalzarini IF. Discretization Correction of General Integral PSE Operators for Particle Methods. Journal of Computational Physics, 229 (2010):pp. 4159–4182.

Serban R and Hindmarsh A. CVODES: An ODE Solver with Sensitivity Analysis Capabilities. ACM Transactions on Mathematical Software, 31 (2005):pp. 363–396.

Shanno D and Kettler P. Optimal Conditioning of Quasi-Newton Methods. Mathematics of Computation, 24 (1970):pp. 657–664.

Shepard D. A Two-Dimensional Interpolation Function for Irregularly-Spaced Data. In Proceedings of the 1968 23rd ACM National Conference, pp. 517–524. 1968.

Silverman B. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.

Tarantola A. Inverse Problem Theory and Methods for Model Parameter Estimation. Society for Industrial and Applied Mathematics, 2005.

Torquato S. Reformulation of the Covering and Quantizer Problems as Ground States of Interacting Particles. Physical Review E, 82 (2010):pp. 1–52.

Watson G. Smooth Regression Analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26 (1964):pp. 359–372.

Wendland H. Piecewise Polynomial, Positive Definite and Compactly Supported Radial Functions of Minimal Degree. Advances in Computational Mathematics, 4 (1995):pp. 389–396.

Wendland H. Scattered Data Approximation. Cambridge University Press, 2005.

Wu Z and Schaback R. Local Error Estimates for Radial Basis Function Interpolation of Scattered Data. IMA Journal of Numerical Analysis, 13 (1993):pp. 1–15.

Wyner AD. Capabilities of Bounded Discrepancy Decoding. The Bell System Technical Journal, 44 (1965):pp. 1061–1122.

Wynn H and Parkin N. Sensitivity Analysis and Identifiability for Differential Equation Models. In Proceedings of the 40th IEEE Conference on Decision and Control, pp. 3116–3121. 2001.

Yang C, Duraiswami R, Gumerov NA and Davis L. Improved Fast Gauss Transform and Efficient Kernel Density Estimation. In Proceedings of the Ninth IEEE International Conference on Computer Vision, pp. 664–671. 2003.