
Sparse Dynamic SPECT
Modelling and Reconstruction
Master Thesis
submitted in fulfillment of the requirements for the degree of
Master of Science
Institute for Numerical and Applied Mathematics
University of Münster, Germany
Supervisors:
Prof. Dr. Martin Burger
Prof. Dr. Xiaoqun Zhang
Submitted by:
Carolin Maria Roßmanith
Münster, September 2014
Abstract
This work deals with the reconstruction of dynamic Single Photon Emission Computed
Tomography (SPECT). We make use of a basis approach for the unknown tracer concentration by assuming that the region of interest can be divided into subregions with
spatially constant concentration curves. Applying a regularized variational framework
we simultaneously reconstruct both the indicator functions of the subregions as well
as the subconcentrations within each region. We are going to present some analysis of
the variational model and derive a necessary optimality condition as well as a source
condition which allows some basic convergence estimates. Furthermore, two different
algorithms are tested on exact and noisy artificial data. In order to verify the validity of the reconstruction approach and the chosen variational model we finally
analyze and compare the results and present some error measures.
Statutory Declaration
I hereby declare that I, Carolin Maria Roßmanith, have written this thesis independently and have not used any sources or aids other than those indicated. Content taken over in thought, substance or wording has been acknowledged by citing its origin in the text or in a note. The same applies to images, tables, drawings and sketches that were not created by myself.
All programs included on the enclosed CD were written by myself, with the exception of Radontransform.c (provided by Prof. Dr. Xiaoqun Zhang, Shanghai Jiaotong University, China) and projsplxMatrix.m (provided by Hendrik Dirks, Westfälische Wilhelms-Universität Münster).
Münster, 24 September 2014
Carolin Roßmanith
Acknowledgements
I would like to thank everyone who supported me by whatever means on my way
through several lectures, seminars and exams until I finally reached the point of being
able to finish this thesis, especially
• Prof. Dr. Martin Burger for being a great supervisor, having the patience to
always answer all my questions and for setting the stage for an unforgettable
research stay on the other side of the world.
• Prof. Dr. Xiaoqun Zhang from the Shanghai Jiaotong University for her irreplaceable support before, during and after my semester abroad, for introducing
me to a fascinating new world and for offering me an interesting and challenging
topic for my thesis.
• Qiaoqiao Ding for sharing her ideas with me and for taking me by the hand when
I was alone in a new environment.
• Hendrik Dirks for always being there for me, even when I was thousands of miles
away, for always believing in me, even when I lost all my courage, and for his
great technical and mental support in all kinds of situations.
• Dr. Martin Benning for his advice and his patience when answering my questions.
• Elin Sandberg and Dr. Rachel Hegemann for their extraordinary proofreading
skills.
• My family, Magdalene and Ulrich Roßmanith, for always being there for me and
for holding me when I stumbled.
• My fellow students and friends for all their advice and mental support.
• The Heinrich-Hertz-Foundation for offering me the great opportunity to study
abroad.
Contents

1 Introduction
2 Introduction to emission tomography
  2.1 The Principles of PET and SPECT
  2.2 PET versus SPECT
  2.3 Clinical Applications
  2.4 Mathematical Model
    2.4.1 SPECT Modelling
    2.4.2 Dynamic SPECT
    2.4.3 Discretized Model
3 Mathematical Background
  3.1 The Concepts of Mathematical Imaging
    3.1.1 The Definition of an Image
    3.1.2 Image Noise
    3.1.3 Error Measures
  3.2 Inverse Problems
  3.3 Variational Methods
    3.3.1 Construction of the Data Term Based on Statistical Modelling
    3.3.2 Regularization Theory
    3.3.3 Basic Functional Analysis
    3.3.4 Bregman Distances
    3.3.5 Error Measures and Source Condition
4 Dynamic SPECT Image Reconstruction Using a Basis Approach
  4.1 Basis Representation
    4.1.1 Basis Pursuit and Sparsity
    4.1.2 Common Reconstruction Methods
    4.1.3 A Slightly Different Approach
  4.2 Selection of the Variational Model
  4.3 Discretization of the Model
  4.4 Analysis of the Model
    4.4.1 Optimality Condition
    4.4.2 Source Condition
    4.4.3 Existence
5 Numerical Realization
  5.1 A Forward-Backward EM-Type Algorithm
    5.1.1 The EM Algorithm
    5.1.2 The Regularized Problem
    5.1.3 The Weighted Denoising Subproblems
  5.2 A Primal-Dual Algorithm for Constrained Minimization
6 Computational Results
  6.1 Test Data
  6.2 Reconstruction Results
    6.2.1 Reconstruction Without Regularization
    6.2.2 Reconstruction With Regularization Via Algorithm 2
    6.2.3 Reconstruction With Regularization Via Algorithm 5
  6.3 Comparison of the Methods
7 Conclusion and Outlook
List of Figures
List of Tables
Bibliography
1 Introduction
Recent years have brought along several improvements in modern medicine. Such improvements go hand in hand with an increasing demand for better medical technology,
such as imaging techniques like Computed Tomography (CT) or Magnetic Resonance
Imaging (MRI). These tools have attained supreme importance since they allow us to
gain an insight into a living organism without any surgical operation. We are able to
provide a medical diagnosis based on frames or videos provided by a combination of
high technology imaging hardware and mathematical reconstruction techniques. The
mathematics behind a medical imaging procedure still leaves a lot of room for improvement, such as providing a better spatial or temporal resolution, faster algorithms and
lower error sensitivity.
Among all common medical imaging tools, the so-called emission tomography with
its main branches Positron Emission Tomography (PET) and Single Photon Emission Computed Tomography (SPECT) has attracted increasing attention within the last
decades. Unlike CT, emission tomography is a branch of functional imaging methods,
which visualize biochemical and physiological processes instead of physical structures.
Therefore it is of special importance in cancer diagnosis and research, since it enables
the detection of tumor cells within healthy tissue.
This work deals with dynamic SPECT reconstruction, a branch of emission tomography which aims at providing a series of images visualizing time-dependent processes
within an organism. The advantages of this method are quite obvious: In contrast to
a static reconstructed image, which is certainly unable to include information about
blood flow etc., it gives more freedom and enhanced possibilities. Nevertheless, dynamic SPECT reconstruction is a task of high complexity and is not possible without
including certain a-priori information. In this thesis we present a new reconstruction
approach, which slightly differs from existing methods: We assume that the region of
interest can be subdivided into areas with temporally constant borders and that within
each subregion, the tracer concentration remains spatially independent. Then we reconstruct the indicator functions of the subregions as well as the subregional tracer
concentrations without further modelling.
The thesis is organized as follows. In the second chapter we go deeper into the background of PET and SPECT and present some clinical applications as well as the
mathematical model. Chapter 3 contains the mathematical background knowledge,
which is necessary to understand the following work. In chapter 4 we present our
reconstruction approach in detail and create an appropriate variational model in order to solve the inverse problem of dynamic SPECT reconstruction. Furthermore, we
present some analysis of the model. In chapter 5 we outline two suitable algorithms
and present the results in chapter 6. Some error measures and a discussion about the
methods follow. Finally we end with a conclusion and an outlook to possible future
works in chapter 7.
2 Introduction to emission tomography
The aim of this chapter is to provide an overview of the field of medical imaging
via emission tomography. We will first define the basic concepts of PET and SPECT,
outline the differences and similarities, and then give examples of popular applications.
Afterwards, we will introduce the reader to the mathematical model of SPECT. The
whole chapter is mainly based on [41].
2.1 The Principles of PET and SPECT
Emission tomography is a form of medical imaging which uses radioactive materials
to visualize the distribution of a tracer in a patient’s body. It encompasses two main
techniques - Positron Emission Tomography (PET) and Single Photon Emission Computed Tomography (SPECT), which both have several applications in modern nuclear
medicine. Both methods are based on the so-called tracer principle, which states that
the behaviour of a substance with added radioactive components is the same as the behaviour of the ’raw’ substance. In both PET and SPECT such a radioactive substance,
the so-called radiotracer, is injected into a living organism and starts to participate in
its natural processes. Thereby it is carried to a particular region according to the
properties of the tracer substance. Due to the radioactive decay it permanently emits
gamma ray photons. These photons are detected by special gamma cameras surrounding the patient, which rotate step by step around the object of interest and count
photons from several projection angles. These camera measurements enable us to locate and visualize the tracer in the organism and therefore provide an insight into the
patient’s body. This makes it possible, for example, to indicate tumor corrupted cell
tissue, using a radiotracer which is likely to be attracted by tumor cells. The arising
mathematical task now is to find a way to deduce the tracer distribution in the organism from the given measurements of the gamma cameras, i.e. to locate the exact
position where a counted photon came from.

Figure 2.1: A typical double detector SPECT scanner [1]
Figure 2.2: 3D SPECT reconstruction of the brain: Transaxial slices (first row), coronal
slices (second row), sagittal slices (third row) [41]
2.2 PET versus SPECT
While PET and SPECT share the same basic idea and practical realization, the main
difference between the two techniques lies in the radioisotope which is used in the process.
PET requires so-called positron emitters. As the name already suggests, a positron is
ejected from the nucleus and almost immediately hits an electron in the surrounding
cloud, which leads to annihilation and as a result, to the production of two gamma
ray photons moving in opposite directions. In PET studies, the patient is surrounded
by a circular camera consisting of a set of detector bins.

Figure 2.3: A circular PET camera [41]

This type of camera is able
to measure the timing of the counted photons. When two photons are detected from
different bins approximately at the same time, it is assumed that they originate from
the same emitted positron. Therefore, one can conclude that the decay must have happened somewhere along the line in between the two detector bins. This assumption is
incorporated into the mathematical model and mainly influences the attenuation term,
which is part of the attenuated Radon transform (see section 2.4) [41], [21].
On the contrary, in SPECT imaging so-called single photon emitters are applied, where
only one photon per decay is emitted. This requires a different type of camera. Whereas
in PET, the two-photon principle immediately indicates the line on which the origin
of the decay must be located, in SPECT imaging we do not know without additional information from which direction the detected photon came. This information is provided by the so-called collimator principle. The gamma camera contains a
honeycomb-like sheet consisting of a gamma ray blocking material and is perforated
with small channels called collimator bins. These channels only let photons coming
from a certain direction pass the sheet. Others are either blocked, meaning that they
hit the sheet outside or inside of the channels, or completely miss the collimator. Thus,
if a photon is detected by a particular bin, one immediately knows that the corresponding decay must have happened somewhere along the vertical line through the collimator
channel. This principle is illustrated in figure 2.4 [41].
Figure 2.4: The collimator principle in SPECT imaging [41]
To discuss the advantages and disadvantages of PET and SPECT, several different
aspects must be considered: sensitivity, spatial and, in case of dynamic imaging, temporal resolution, as well as specific problems that can emerge during the imaging process (in the following cp. [27]).
In SPECT, the usage of collimators significantly decreases the sensitivity by simply sorting out a high percentage of emitted photons which do not pass the collimator channels,
whereas PET does not require such an artificial limitation. To overcome this problem,
there are several approaches that attempt to find an optimal design of the collimator
cameras. One approach, for example, is to use shorter collimator channels so that the
photons can pass the sheet at a slightly wider angle. In turn, however, the resulting
image has a lower spatial resolution due to the fact that the origin of the detected
photon is defined less precisely.
In both methods, the spatial resolution is more or less limited by technological constraints, such as collimator design in SPECT's case. Additionally, PET suffers from some
inaccuracies caused by positron range (the average distance a positron travels until
it annihilates) and non-collinearity of the photons (a small deviation from the 180°
angle between the two emitted photons). These effects produce resolution
blurring. Especially in dynamic SPECT imaging, one also faces the problem of finding a good temporal resolution, which is a more challenging task than in PET due to
the rotation period of the camera. We will have a closer look at this problem in the
later course of this work. To overcome the problem of low spatial resolution, PET and
SPECT are often used in combination with CT in the development of hybrid imaging
techniques such as PET/CT and SPECT/CT [26].
Other important influential factors are attenuation correction and random coincidences
in PET. Indeed, tissue-dependent attenuation of the gamma rays on their way through
the body plays an important role in both methods. However, in SPECT, this is far
more difficult to address because, in contrast to PET, the attenuation function also depends on the origin of the counted photon. Random coincidences in PET can produce
small reconstruction errors when two photons from different simultaneously occurring
decay events are detected and therefore, assumed to originally come from the same
event.
The main advantage of SPECT over PET imaging, as often mentioned in today's
literature, is its lower cost [11]. This is due to the usage of very short-lived radiotracer substances in PET studies. The isotopes most commonly employed for
practical use in PET are ¹¹C with a half-life of 20.3 minutes and ¹⁸F with 110 minutes.
In contrast, SPECT studies commonly work with ⁹⁹ᵐTc (6 hours) and ¹²³I (13.2 hours)
[26].
2.3 Clinical Applications
Unlike CT, SPECT belongs to the field of so-called functional imaging, meaning that
it does not visualize anatomical structures but biochemical and physiological processes
of an organism. Therefore, a common application of SPECT (often in combination
with CT) is the investigation of tumor cells. While CT is only able to
visualize the outward appearance of the tumor cells, SPECT makes it possible to see
whether a tumor cell is still active (i.e. its glucose consumption is relatively high) and
therefore dangerous for the organism [21].
Another widespread application of SPECT studies is the imaging of regional cerebral blood flow [11],
which was first used for the detection of strokes. Usually, a stroke is caused by a decreased
blood flow rate within the brain. This effect can be measured in the form of a decreased signal using a dynamic SPECT scan. Such a scan can also determine the size of the
tissue which is likely to be affected by a subsequent infarction and therefore might
help to prevent secondary damage [11].
Certainly, there exists a broad variety of other clinical applications of SPECT imaging,
which show that SPECT plays an important role in today's medicine. The
interested reader will find more information on this topic in [11], [25] and [4].
2.4 Mathematical Model
2.4.1 SPECT Modelling
In this section we want to give a description of the common mathematical model of
Single Photon Emission Computed Tomography (SPECT), which explains the relationship between the radiotracer concentration in the body and the measured data from a
mathematical point of view. This model is mainly based on [41] and [39].
In a two-dimensional space, the distribution of the radioactive substance in a patient's body can be described by a continuous function f : R² → R such that f(x)
equals the concentration in x ∈ R². Each gamma camera position is specified by the
angle φ ∈ [0, 2π), while s ∈ R defines the position of a collimator. Thereby, every
data collection point in the camera can be described by the coordinates (θ, s), where
θ = (cos φ, sin φ)ᵀ, so our measured data can be represented by g(θ, s) with a continuous function g : C² → R defined on

C² := {(θ, s) : θ ∈ S¹, s ∈ R},    (2.1)

where S¹ describes the unit sphere in R².
Applying the notations above we can now model the relationship between f and g.
Every given measured data entry g (θ, s) can be mathematically seen as a sum of
counted photons, which were emitted somewhere along a straight line perpendicular to
the camera angle ϕ and counted in the collimator with distance s. In our continuous
model, it therefore makes sense to describe the data as a line integral over the tracer
distribution f, i.e.
g(θ, s) = ∫_{L(θ,s)} f(x) dx    (2.2)

where L(θ, s) := {tθ + sθ⊥ : t ∈ R}. This line integral is known as the so-called Radon
transform:

Definition 2.1 (Radon transform).
For f : R² → R, the Radon transform is defined by

Rf(θ, s) := ∫_{L(θ,s)} f(x) dx.
In order to obtain a more realistic model we also have to consider attenuation of gamma
rays. While passing through different regions in the body the intensity of the emitted
gamma rays becomes weakened to various extents depending on the tissue type.
To consider the presence of attenuation in our model, we want to include an attenuation term in the line integral, which as a result, turns into the so-called attenuated
Radon transform:
Figure 2.5: Visualization of a two-dimensional object and the projection process [41]
Definition 2.2 (Attenuated Radon transform).
For f, µ : R² → R, the attenuated Radon transform is defined by

R_µ f(θ, s) := ∫_{L(θ,s)} f(x) e^{−∫_x^∞ µ(y) dy} dx.
µ is the attenuation function which describes the intensity of attenuation in every point
in the image domain. Note that this point causes the main difference between PET
and SPECT: In SPECT, the attenuation function depends on the point where the
counted photon came from, while in PET, the attenuation takes place along the whole
integration line. Commonly the attenuation term is chosen to be exponential because
with increasing distance from its location to the detector the gamma rays become
exponentially weaker. In this thesis we want to assume that the attenuation function
µ is known a priori, so we can ignore it in the following work. If µ were unknown, we
would have to deal with the problem of simultaneously determining f and µ, preferably
without the need of additional measurements, which could cause more stress or damage
for the patient. For a more detailed discussion on this issue, see [32].
2.4.2 Dynamic SPECT
One of the main drawbacks of ’static’ SPECT is the fact that the measurement process
naturally takes a certain period of time, during which the patient is told to avoid any
movement. Nevertheless, the patient is not able to suppress small motions like heartbeat, breathing
or shivering. This will cause an error in the measured data which is not easy to control.
Additionally one is often interested in measuring dynamic processes like blood flow,
which can not be recorded by static SPECT imaging. Therefore it makes sense to
extend the basic idea of SPECT by adding a time component to our previous model,
i.e. our tracer distribution function f as well as the Radon integral and measured data
function g depend on time [24]:
g(θ, s, t) = ∫_{L(θ,s)} f(x, t) e^{−∫_x^∞ µ(y) dy} dx = Rf(θ, s, t)    (2.3)
In this case, the task of reconstructing the tracer distribution reaches a higher level of
difficulty. Since we have significantly more unknown components, i.e. degrees of freedom
in our unknown function f, while the amount of given information hardly changes, we
are not able to simply apply the standard methods of static SPECT in order to reconstruct
every time step image independently. Moreover, obtaining a high temporal resolution
means dealing with low quality data since we have fewer projections for each time step.
It is clear that a way to combine the different time step images and obtain more a priori
information must be found in order to acquire a suitable solution.
2.4.3 Discretized Model
In order to develop a practical method for solving the reconstruction problem described
above, we need to adapt our model to the discretized (for our observations, the two-dimensional) case. For that purpose, let us divide the region of interest into n₁ · n₂ = n
pixels, and assume that the number of available time steps is N with spacing ∆t.
The collimator camera consists of m bins s1 , . . . , sm altogether, where every bin has
length ∆s. Thus, we can compute the projection data in detector bin i at time step tk
as
g_i(t_k) = ∫_{s_i}^{s_i+∆s} ∫_{t_k}^{t_k+∆t} g(θ_k, s, t) dt ds    (2.4)

         = ∫_{s_i}^{s_i+∆s} ∫_{t_k}^{t_k+∆t} ∫_{L(θ_k,s)} f(x, t) µ̃(x) dx dt ds    (2.5)

where µ̃(x) := e^{−∫_x^∞ µ(y) dy} (cp. [24]). We are interested in reconstructing the tracer
value in every pixel i at every time step tk , so by discretizing the integrals using a
suitable quadrature and considering the Radon transform R as a linear operator, we
can transform our ideal model into a matrix equation, i.e.

        ⎛ f_11 ⋯ f_1N ⎞   ⎛ g_11 ⋯ g_1N ⎞
    R · ⎜  ⋮   ⋱   ⋮  ⎟ = ⎜  ⋮   ⋱   ⋮  ⎟    (2.6)
        ⎝ f_n1 ⋯ f_nN ⎠   ⎝ g_m1 ⋯ g_mN ⎠
Here, the k-th column of the matrix g contains the measured projections at the k-th
time step. R is an m × n matrix. Usually, this model provides a high number of degrees
of freedom, since naturally the number of image pixels n is significantly higher than the
number of detector bins m. Therefore, this matrix equation is highly underdetermined
in most cases.
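To make the structure of (2.6) concrete, consider the following minimal Python sketch (added for illustration, not part of the original thesis; the random matrix R is merely a stand-in for a real discretized Radon operator):

    import numpy as np

    # Toy dimensions: m detector bins, n pixels, N time steps.
    m, n, N = 60, 256, 10
    rng = np.random.default_rng(0)

    # Stand-in for the (attenuated) Radon system matrix; in practice R is
    # assembled from the line integrals through the pixel grid.
    R = rng.random((m, n))

    f = rng.random((n, N))   # unknown tracer values, one column per time step
    g = R @ f                # measured projections as in (2.6), shape (m, N)

Since n is much larger than m, each column of the system g = R · f admits infinitely many solutions, which is precisely the underdeterminedness described above.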
3 Mathematical Background
In this chapter, we want to provide a brief introduction to the mathematical background. Beginning with the basic definitions of a mathematical image and noise, we
will then shift to the main concepts of inverse problems and the application of variational methods as an appropriate solving technique. In this context, we are going
to present some basic functional analysis, which is necessary to fully understand the
mathematical task we are going to solve within this work.
3.1 The Concepts of Mathematical Imaging
3.1.1 The Definition of an Image
In mathematical image modelling, we usually distinguish between two types of images:
The ’ideal’ continuous image as a function and the ’real’ digital image, which can be
seen as a discretization of the continuous type [35].
Definition 3.1 (Mathematical image).
(a) A continuous image is a function f : Ω → R, where Ω ⊂ R^d is the image region.
(b) A digital image is a matrix F ∈ R^{N₁ × … × N_d}.
d is called the image dimension.
Typically, the dimension d is an element of {2, 3, 4} for a 2D, 3D or 3D time-dependent
image. There exists a close relationship between these two definitions, which allows
the transition from one to another: If we divide the image region Ω into N1 × . . . × Nd
pixels (or voxels) and let every matrix entry of F equal the average of f over the corresponding pixel, we can easily transform the continuous image into an image matrix.
For the other way round, starting from a matrix F , we can define f as a piecewise
constant function in every pixel which together form the continuous region Ω. This
relation is obviously not bijective: Using the previously described methods to turn a
continuous image into a matrix and then back into a function will generally not lead
to the same continuous image as before, due to the loss of information about f by
computing the average over certain pixels [35].
In the context of mathematically modelling an image, the question of how to measure certain image properties arises. Therefore, both definitions of an image can be
useful. We want to mention some examples for the main tasks in mathematical imaging
which will play a role in this thesis:
• Denoising: In practice, one is often confronted with an image where some pixels
are noisy or fully destroyed due to several reasons, for example measuring errors
in emission tomography. The goal of denoising is to ’correct’ the value of these
pixels and therefore to improve the image quality [35]. For the basic concept and
mathematical description of noise, see section 3.1.2.
• Segmentation: In some cases we are not interested in an image itself, but
only in some parts of it, like locating tumor cells in organic tissue. For that
purpose image segmentation can be applied to separate the image into subregions
according to different criteria, for example finding imaged objects or locating sharp edges
[35].
• Reconstruction: Image reconstruction is, in the mathematical sense, the main
task in emission tomography and the foundation of this thesis. The goal is to
transform given measurements (which usually appear in the form of an image
itself) into the image one is originally interested in, like the distribution of a
tracer in a patient’s body. In general, the task of image reconstruction is an
ill-posed inverse problem (see section 3.2 for further information) [14].
3.1.2 Image Noise
In mathematical imaging, one can be faced with different types of image noise. Typically, noise is defined as a random process with an underlying probability distribution.
The noisy image can be modelled as a modification of the exact one, either by simply adding or multiplying with a random variable (additive/multiplicative noise) or by
defining an operator acting on the exact image. The content of this section is based
on [35].
In case of additive noise and an exact (possibly unknown) image f, the noisy image f̃ is given by

f̃ = f + δ    (3.1)

with a random variable δ. If the underlying distribution of δ is the Gaussian distribution, we simply speak of Gaussian noise, which is added pointwise to the exact value of
a discrete image f. In case of, for example, a Gamma distributed variable we assume
that the noise is multiplicative, i.e.

f̃ = f · δ    (3.2)

Especially in many medical imaging techniques, a third way of noise modelling is used,
which is neither additive nor multiplicative. Here, the noisy image is modelled as

f̃ = δ(f)    (3.3)
A common example which will play a major role in this work is Poisson noise, which
typically occurs in combination with radioactive decay and is caused by errors in the
number of counted photons.
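As a minimal illustration of the three noise models (3.1)–(3.3), the following numpy sketch (added here; the chosen distribution parameters are arbitrary) corrupts a constant test image:

    import numpy as np

    rng = np.random.default_rng(0)
    f = np.full((64, 64), 10.0)                        # exact test image

    f_add  = f + rng.normal(0.0, 1.0, f.shape)         # additive Gaussian, (3.1)
    f_mult = f * rng.gamma(16.0, 1.0 / 16.0, f.shape)  # multiplicative Gamma, (3.2)
    f_pois = rng.poisson(f).astype(float)              # Poisson noise delta(f), (3.3)

Note that the Gamma parameters are chosen such that the multiplicative factor has mean one.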
3.1.3 Error Measures
To measure the quality of a reconstructed image, one is able to determine the absolute
or relative error (if the exact image is known). Let u denote the image after applying
some denoising or reconstruction method, f the exact image and f̃ the noisy image;
then we can compute

e_abs(u) := ‖u − f‖    (3.4)

e_rel(u) := ‖u − f‖ / ‖f̃ − f‖    (3.5)
A common modification of the relative error is the so-called signal-to-noise ratio (SNR)
or peak signal-to-noise ratio (PSNR):

SNR(u) := −10 log₁₀( ‖u − f‖² / ‖f‖² )    (3.6)

PSNR(u) := −10 log₁₀( ‖u − f‖² / ‖f‖²_∞ )    (3.7)
The PSNR compares the difference between the exact and the reconstructed image
with the maximum value of the exact image, the so-called peak. Due to the fact that
the range of the fraction can be very high, the logarithmic scale is applied. At the
same time, the negative of the logarithm causes the SNR or PSNR to be high in case
of a good quality of the reconstructed image and low in case of a poorly reconstructed
image.
Another way of measuring the image quality is to compare two images via structural
similarity (SSIM) [38]. This method computes the mean µu , µf , variance σu , σf and
covariance σuf of two images u and f and then determines
SSIM(u, f) := ( (2µ_u µ_f + C₁)(2σ_uf + C₂) ) / ( (µ_u² + µ_f² + C₁)(σ_u² + σ_f² + C₂) )    (3.8)

with C₁ = (K₁L)² and C₂ = (K₂L)², where L describes the dynamic range of the image
(e.g. 255 in a gray scale image with an intensity range of 0, …, 255) and K₁, K₂ are
scalar constants [38].
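The error measures above translate directly into code. The following sketch (added for illustration; it implements (3.6)–(3.8) literally, with a single global window for the SSIM and the common default choices K1 = 0.01, K2 = 0.03, which are an assumption here) is one possible realization:

    import numpy as np

    def snr(u, f):
        # signal-to-noise ratio (3.6); f is the exact image
        return -10 * np.log10(np.sum((u - f) ** 2) / np.sum(f ** 2))

    def psnr(u, f):
        # peak signal-to-noise ratio (3.7); the peak is the maximum of f
        return -10 * np.log10(np.sum((u - f) ** 2) / np.max(np.abs(f)) ** 2)

    def ssim(u, f, L=255.0, K1=0.01, K2=0.03):
        # global (single-window) variant of the SSIM (3.8)
        C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
        mu_u, mu_f = u.mean(), f.mean()
        cov = ((u - mu_u) * (f - mu_f)).mean()
        num = (2 * mu_u * mu_f + C1) * (2 * cov + C2)
        den = (mu_u ** 2 + mu_f ** 2 + C1) * (u.var() + f.var() + C2)
        return num / den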
3.2 Inverse Problems
One of the basic concepts in mathematical imaging as well as in many natural sciences is
an inverse problem. This general framework is applied to turn observed measurements
back into the underlying cause. Like in Emission Tomography and many other medical
applications, one is often interested in a certain distribution or parameter, but can
only measure its effect under certain conditions. Mathematically, an inverse problem
is usually described by an equation of the form
K(x) = y    (3.9)
where y is the measured data or the effect and x is the unknown variable we are interested in. K is an operator which represents the relationship between data and unknown
[7]. A common example for an inverse problem is the already mentioned Radon transform [39].
In order to find a suitable solution to an inverse problem, one is commonly confronted
with a frequent property, namely ill-posedness. A mathematical problem is
said to be well-posed, if the following three conditions are satisfied [21]:
1. Existence: There exists at least one solution to the problem.
2. Uniqueness: The existing solution is unique.
3. Stability: The solution depends continuously on the data.
If one or more conditions are violated, the problem is called ill-posed. In many cases,
we will face problems where the third condition is not satisfied even if the given data is
free of noise. If the problem is not stable, the application of simple numerical methods
will often lead to an unsuitable solution. To overcome instability, we need to search
for a technique to include some a-priori information to further specify the solution.
This can be done by so-called regularization methods, which are supposed to lead to
an approximation of the solution [35].
In order to solve an inverse problem, a frequently applied technique that comes along
with several benefits are variational methods. In the following section, we want to give
an introduction to variational methods including regularization theory in order to gain
a greater insight into the mathematical components of our problem [35].
3.3 Variational Methods
We now want to introduce the basic concepts of variational methods, which play a
major role in modern medical imaging and form the foundation of this work. This
section is mainly based on [35] and [21].
Generally speaking, a variational method is a technique in mathematical optimization that formulates the energy of a solution as a so-called energy functional and leads
to an optimal solution by minimizing the energy. This energy basically consists of a
data term and one or more regularization terms, which means, in mathematical terms,
J(x) = D(x, y) + αR(x).    (3.10)

or in terms of finding a solution of an inverse problem of the form (3.9)

J(x) = D(K(x), y) + αR(x).    (3.11)
The data term D has to make sure that the energy becomes small if the solution x
’fits’ the given data y, whereas any additional a priori information can be included in
the regularization term R. α is a parameter used for weighting the relation between the
two terms. The goal is to minimize the functional J with respect to x to obtain the
optimal solution.
Similar to inverse problems, the questions arising from the examination of variational
methods are mainly the following [21]:
• Existence: Does a minimizer of the functional exist?
• Uniqueness: Is the minimizer of the functional unique?
• Stability: Does the solution depend continuously on the data for a fixed parameter α?
3.3.1 Construction of the Data Term Based on Statistical Modelling
An intuitive way to design a variational model with an appropriate data term is based
on statistical modeling and makes use of Bayes’ theorem. This section is based upon
[35] and [14].
Let us first assume that we want to find a solution x of the inverse problem K (x) = y
with given data y and an operator K, where the data may be corrupted by any kind
of noise. The idea of the Bayesian model is to define both x and y as random variables
and find an optimal solution by maximizing the a posteriori probability of x under
measured data y, which means that x must be the solution which makes the observation y most probable. According to the Bayes’ formula, this probability can be written
as
P(x|y) = P(y|x) P(x) / P(y).    (3.12)
According to the present noise, every possible solution x can be attributed with an a
priori probability P (x). Additionally, we need to compute the conditional probability
of our observed data under the assumption that the measurements were caused by the
effect x.
Due to the high-dimensionality of the underlying probability densities, we are interested in constructing suitable estimators for these probabilities. By maximizing the a
posteriori probability, we obtain the so-called maximum a posteriori probability (MAP)
estimator for the solution x∗ :
x∗ = argmax_x P(x|y)    (3.13)

Applying Bayes' theorem, we obtain

x∗ = argmax_x P(y|x) P(x)    (3.14)
The probability P(y) does not depend on the value of interest x and can therefore be
ignored or seen as a scaling parameter. To find our solution x∗ , we now need to find a
suitable description of P (y|x) and the a priori probability P (x). Since both depend on
the presence of noise, we want to outline the procedure by means of the simple example
of denoising a Gaussian noise-corrupted image.
The Gaussian Denoising Problem
In the presence of Gaussian noise, the relationship between the exact image x and
the given noisy image y can be described by y = x + n with a normally distributed
random variable n that describes the noise added to the image. In the case of a two-dimensional discrete image, addition occurs element-wise, which means that for every
pixel (i, j) of the noisy image we have y_ij = x_ij + n_ij. Hence, the n_ij are independent
and identically normally distributed with expected value 0 and variance σ 2 , and we
obtain
P(y|x) = ∏_{i,j} 1/(σ√(2π)) · e^{−(y_ij − x_ij)²/(2σ²)}.    (3.15)
For the a priori probability P (x), we can apply a so-called Gibbs model, a general way
to describe which image best fits all a priori known information about the structure of
the solution:
P(x) = C e^{−βR(x)}    (3.16)

with constants C and β and a functional R. This model will naturally lead to the
regularization term R (x). Combining these two terms, we obtain an equation for our
maximization problem:
x∗ = argmax_x ∏_{i,j} 1/(σ√(2π)) · e^{−(y_ij − x_ij)²/(2σ²)} · C e^{−βR(x)}.    (3.17)
In order to simplify this expression, one can apply the negative logarithm on the right
side, which will turn the maximization problem into the problem of finding a minimizer.
Aside from that, it will not influence the solution, i.e. (ignoring some constant terms)
x∗ = argmin_x −log( ∏_{i,j} 1/(σ√(2π)) e^{−(y_ij − x_ij)²/(2σ²)} ) − log( C e^{−βR(x)} )    (3.18)

   = argmin_x (1/2) ∑_{i,j} (y_ij − x_ij)² + σ²βR(x)    (3.19)
For N → ∞, where N equals the number of pixels, we receive the continuous data
term
(1/(2N)) ∑_{i,j} (y_ij − x_ij)²  →  (1/2) ∫_Ω (y − x)².    (3.20)
For the second part, this procedure will lead to the regularization term
(1/N) σ²βR(x)  →  αR(x).    (3.21)
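To make the MAP estimate (3.19) concrete, the following sketch (an illustration added here, not an algorithm from this thesis) minimizes the discrete functional ½‖x − y‖² + α·½‖∇x‖² by plain gradient descent, anticipating the quadratic smoothness regularizer of section 3.3.2:

    import numpy as np

    def denoise_gaussian(y, alpha=1.0, tau=0.1, steps=200):
        # gradient descent on J(x) = 0.5*||x - y||^2 + 0.5*alpha*||grad x||^2
        x = y.copy()
        for _ in range(steps):
            # discrete Laplacian with periodic boundaries; (x - y) is the
            # gradient of the data term, -alpha*lap that of the regularizer
            lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                   + np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)
            x -= tau * ((x - y) - alpha * lap)
        return x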
The Kullback-Leibler Divergence
In order to solve reconstruction problems like those arising in emission tomography,
another type of data term is of special interest. In a radioactive decay process, the typically occurring noise is Poisson-distributed. Hence, we need to modify our previously
described method to find a data term which suits this type of corruption and finds an
optimal solution of the inverse reconstruction problem K (x) = y.
In general, a random variable X is Poisson distributed, if we have
P(X = k) = (λ^k / k!) · e^{−λ}    (3.22)
where λ = EX is the expected value. As before, we can compute the probability of the
observation y under given x and the assumption of Poisson noise with Ey = K (x) as
P(y|x) = ∏_{i,j} ( (K(x))_ij^{y_ij} / y_ij! ) · e^{−(K(x))_ij}.    (3.23)

Again referring to the Gibbs model, for our optimal solution x∗ it holds that
x∗ = argmax_x P(y|x) P(x)    (3.24)

   = argmin_x −log P(y|x) − log P(x)    (3.25)

   = argmin_x −log( ∏_{i,j} ( (K(x))_ij^{y_ij} / y_ij! ) e^{−(K(x))_ij} ) − log( C e^{−βR(x)} )    (3.26)

   = argmin_x ∑_{i,j} ( (K(x))_ij − y_ij log(K(x))_ij + log(y_ij!) ) + βR(x)    (3.27)
Hence, we obtain for the continuous model with N → ∞
(1/N) ∑_{i,j} ( (K(x))_ij − y_ij log(K(x))_ij )  →  ∫_Ω ( K(x) − y log(K(x)) )    (3.28)
To ensure that the data term is always non-negative, we add the constant term −(y − y log(y)),
which does not influence the estimator but leads to

D(x, y) = ∫_Ω K(x) − y log(K(x)) − (y − y log(y)) ≥ 0    (3.29)
The resulting data term is called Kullback-Leibler divergence or Kullback-Leibler Functional :
KL(K(x), y) := ∫_Ω ( K(x) − y + y log( y / K(x) ) ).    (3.30)
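For completeness, a discrete evaluation of (3.30) could look as follows (a sketch added for illustration, assuming K(x) > 0 pointwise and using the convention 0 · log 0 = 0):

    import numpy as np

    def kl_divergence(Kx, y):
        # discrete Kullback-Leibler data term (3.30) for Kx = K(x) and data y;
        # entries with y = 0 contribute only Kx - y = Kx
        pos = y > 0
        return np.sum(Kx - y) + np.sum(y[pos] * np.log(y[pos] / Kx[pos]))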
3.3.2 Regularization Theory
In addition to finding an appropriate data term, in most cases it is also essential
to specify the solution by adding a suitable regularization. Depending on the desired
properties of the solution, it is a crucial task to define the vector space in which the solution
lives and therefore to specify the model by adding a certain norm minimization term.
Function Spaces
One of the first steps for the choice of a suitable regularizer is the selection of the
function space in which the regularized solution should live. The simplest and most intuitive
idea is to choose a general space in which every image lies, i.e. the space of integrable
functions L¹(Ω). However, this does not make sense for many tasks in mathematical imaging:
the space L¹(Ω) admits smooth functions as well as noisy ones, so that every
image is part of it independent of its structure; in particular, one cannot distinguish between
a clean and a noisy signal, since the L¹-norm does not change significantly if we add
noise to an image [13]. This is illustrated in figure 3.1.
Figure 3.1: A constant and a Poisson noise-corrupted signal do not have significantly
different integrals
To restrict the solution, another natural choice could be the space of weakly differentiable functions, the so-called Sobolev spaces. Let us first recall the definition as
given in [35], Def. 9.34.
Definition 3.2 (Sobolev space).
Let 1 ≤ p ≤ ∞ and Ω be an open subset of Rn . Then the Sobolev space W 1,p (Ω) is
defined as
W 1,p (Ω) := {u ∈ Lp (Ω) : ∇u ∈ Lp (Ω)}
where ∇u denotes the weak gradient of u.
Together with the norm
‖u‖_{W^{1,p}(Ω)} := ( ∫_Ω |u|^p + |∇u|^p )^{1/p}    (3.31)
for 1 ≤ p < ∞, W^{1,p}(Ω) is a Banach space [35]. One of its major properties is the fact
that, in one dimension and for p > 1, every function in W^{1,p} is continuous:

Theorem 3.3.
Let u ∈ W^{1,p}(Ω) for p > 1, where Ω ⊂ R is an interval. Then u is continuous.
Proof.
Let u ∈ C 1 (Ω) ⊂ W 1,p (Ω). Then we obtain from the Hölder inequality
|u(y) − u(x)| = | ∫_x^y u′(z) dz | ≤ ∫_x^y |u′(z)| dz ≤ ( ∫_x^y |u′(z)|^p dz )^{1/p} · |x − y|^{1/q}

with 1/p + 1/q = 1. It follows that

|u(y) − u(x)| ≤ ‖u‖_{W^{1,p}(Ω)} |x − y|^{1/q}.

Since C¹(Ω) is a dense subset of W^{1,p}(Ω), there exists a sequence (u_n)_n in C¹(Ω) such
that lim_{n→∞} u_n(x) = u(x) and lim_{n→∞} ‖u_n‖_{W^{1,p}(Ω)} = ‖u‖_{W^{1,p}(Ω)}. Thus it follows that

|u(y) − u(x)| = lim_{n→∞} |u_n(y) − u_n(x)| ≤ lim_{n→∞} ‖u_n‖_{W^{1,p}(Ω)} |x − y|^{1/q} = ‖u‖_{W^{1,p}(Ω)} |x − y|^{1/q},

hence u is continuous.
Hence, the choice of the Sobolev space as a function space for a regularizer makes
sense if the solution is favoured to be smooth. However, we will see in the following
section that there is another function space that plays an important role especially for
non-smooth functions.
TV Regularization
In many cases in mathematical imaging such as image denoising one is interested in
improving the image quality while simultaneously preserving sharp edges. Many image
denoising techniques at the same time cause a blurring of the image, so the structure
of sharp objects in the original image might get lost. This automatically means that
it might not be practical to search for a solution which is continuous in every point,
which for example is the case in the Sobolev spaces W 1,p for p > 1. In the following,
we want to present the basic concepts of TV regularization (see for example [13] or [21]).
Since W 1,p does not make sense as a regularizer for non-smooth functions, a very
popular and successful method has been developed by Rudin, Osher and Fatemi [33].
In their so-called ROF model the solution is chosen out of the space of functions with
bounded variation, i.e.
BV(Ω) := { u ∈ L¹(Ω) : |u|_BV < ∞ }    (3.32)
where |·|BV denotes the total variation
|u|_BV := sup_{v ∈ C₀^∞(Ω,R^d), ‖v‖_∞ ≤ 1} ∫_Ω u ∇·v    (3.33)
which is a seminorm on BV (Ω) and is also denoted as T V (u). Together with the norm
‖u‖_BV := ‖u‖_{L¹} + |u|_BV    (3.34)
BV (Ω) forms a Banach space which is a subset of L1 (Ω) by definition. Unlike W 1,1 (Ω),
BV(Ω) also contains piecewise constant functions like for example indicator functions
of subregions of Ω,

u(x) = 1 if x ∈ D,  0 otherwise    (3.35)

for D ⊂ Ω with C¹-boundary ∂D. This is shown in the following theorem.
Theorem 3.4.
u as defined in (3.35) is in BV (Ω).
Proof.
|u|_BV = sup_{v ∈ C₀^∞(Ω,R^d), ‖v‖_∞ ≤ 1} ∫_Ω u ∇·v dx

       = sup_{v ∈ C₀^∞(Ω,R^d), ‖v‖_∞ ≤ 1} ∫_D ∇·v dx

       = sup_{v ∈ C₀^∞(Ω,R^d), ‖v‖_∞ ≤ 1} ∫_{∂D} v·n dσ

       = ∫_{∂D} dσ  <  ∞
If we regard ∂D as a curve of finite length, we see that in this case the BV-seminorm
can be interpreted as the total length of the boundaries of a subregion. In the context
of image denoising, this explains why it makes sense to search for the denoised image
in the BV space: Whereas the contours of objects in a noisy image will be unclear and
thus have longer boundary curves, regularization via the BV-seminorm will lead to a
solution with smooth contours and therefore preserves sharp edges. Figure 3.2 shows
the result of TV regularization in contrast to a smoothness-favouring regularizer via a
linear variational scheme, i.e. the squared L2 -norm of the gradient (cp. section 3.3.2).
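On a discrete image, the BV-seminorm (3.33) reduces to a finite sum of gradient magnitudes. A minimal sketch (added here; it uses forward differences with replicated boundary values, one of several possible discretizations) reads:

    import numpy as np

    def total_variation(u):
        # isotropic discrete total variation of a 2D image u, cf. (3.33)
        du_x = np.diff(u, axis=0, append=u[-1:, :])  # forward difference, rows
        du_y = np.diff(u, axis=1, append=u[:, -1:])  # forward difference, columns
        return np.sum(np.sqrt(du_x ** 2 + du_y ** 2))

Applied to a noisy and a piecewise constant version of the same image, this functional is considerably larger for the noisy one, which is exactly the behaviour exploited by the ROF model.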
Sparsity Regularization
Another important and often applied way to specify an unknown solution comes into
play when it is known a-priori that the solution has a sparse representation in a certain
basis (or dictionary, i.e. a given set of functions) {Φ_i}_{i=1}^∞, so we can write u = ∑_{i=1}^∞ α_i Φ_i
where only a few coefficients α_i are nonzero. This knowledge can be incorporated into
the variational model, which now aims at finding the coefficients of a basis representation of the solution such that it can be written with as few basis functions as possible
[36].
Figure 3.2: Comparison of the effect of different regularizers for denoising [14]
An intuitive idea is to simply penalize the number of non-zero elements in the set
of coefficients α = {α_i}_{i=1}^∞. For this purpose an ℓ⁰-seminorm regularization term would
be suitable, which counts the number of non-zero elements [21]:

‖α‖_{ℓ⁰} := ∑_{i=1}^∞ |α_i|⁰    (3.36)
where we define 0⁰ := 0. However, this functional is highly non-convex, which makes it
difficult to find an appropriate minimizer which fits the given data. Hence the ℓ⁰-term
can be replaced by the ℓ¹-norm of the scalar products

∑_{i=1}^∞ |⟨u, Φ_i⟩|    (3.37)
which in case of an orthogonal basis {Φ_i}_{i=1}^∞ is equivalent to the ℓ¹-norm regularization
of the coefficients

‖α‖_{ℓ¹} := ∑_{i=1}^∞ |α_i|    (3.38)

which in turn is the convex relaxation of the ℓ⁰-seminorm [21].
In the discrete case one is often interested in reconstructing a sparse matrix where
only a few matrix entries are nonzero. In the same way as before this can be realized
by the convex relaxation of the norm which counts the number of nonzero elements.
For an m × n matrix A, this convex relaxation is given by the sum over all matrix
entries, i.e.

‖A‖₁ := ∑_{i=1}^m ∑_{j=1}^n |a_ij|    (3.39)
For more information on sparsity constraint regularization in variational methods, see
[36].
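In the discrete setting, both penalties are one-liners. The following sketch (added for illustration) contrasts the non-convex counting seminorm (3.36) with its convex relaxation (3.39):

    import numpy as np

    def l0_seminorm(A):
        # number of non-zero entries, cf. (3.36); non-convex
        return np.count_nonzero(A)

    def l1_norm(A):
        # entrywise 1-norm (3.39), the convex relaxation of the l0-seminorm
        return np.sum(np.abs(A))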
Smoothness Regularization
In contrast to the general denoising problem, where preservation of sharp edges is
important, some imaging problems favour a solution which is smooth, meaning that its
gradient is not significantly high and the function does not have any 'jumps'. In this
case it naturally makes sense to penalize this gradient, hence the regularizer

‖∇u‖²_{L²(Ω)} = ∫_Ω |∇u|² dx    (3.40)
leads to the favoured solution. We refer again to figure 3.2, where the effect of a linear
variational scheme using the L2 -norm regularizer is visualized.
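The discrete counterpart of (3.40) can be evaluated analogously to the total variation above (again a sketch with forward differences, added for illustration):

    import numpy as np

    def smoothness_penalty(u):
        # squared L2-norm of the discrete gradient of u, cf. (3.40)
        du_x = np.diff(u, axis=0)
        du_y = np.diff(u, axis=1)
        return np.sum(du_x ** 2) + np.sum(du_y ** 2)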
3.3.3 Basic Functional Analysis
In the following we want to have a closer look into the underlying functional analysis
of a variational model in general. Thereby we want to find an answer to the questions
of existence and uniqueness of a solution; furthermore, we want to analytically examine
optimality conditions which finally lead to a minimum of the energy functional.
Existence and Uniqueness of a Solution
The first and most important question in the context of variational methods naturally
is whether there generally exists a minimizer of the energy. Depending on the underlying
functional space (see section 3.3.2), the answer can be achieved via the fundamental
theorem of optimization. The content of this section is mainly based upon [13] and
[14]. Before we can give the theorem, we need to define two important properties of a
functional in relation to the existence of a solution.
Definition 3.5 (Lower semi-continuity).
A functional J : (X, τ) → R ∪ {+∞} on a topological space (X, τ) is lower semi-continuous, if for every sequence x_k in X with x_k → x in the sense of the topology τ it holds
that

J(x) ≤ liminf_k J(x_k).
Definition 3.6 (Compactness of sub-level sets).
Let J : (X, τ) → R ∪ {+∞} be a functional on a topological space (X, τ). We say that
the sub-level sets are compact, if there exists α ∈ R such that
Sα := {x ∈ X : J (x) ≤ α}
is not empty and compact in the sense of the topology τ .
Using these two definitions, we can now state the conditions for the existence of a
minimizer [13].
Theorem 3.7 (Fundamental theorem of optimization).
Let J : (X, τ) → R ∪ {+∞} be a functional on a topological space X with topology τ.
Furthermore, let the following two conditions be satisfied:
• J is lower semi-continuous.
• The sub-level sets of J are compact in τ .
Then there exists x∗ ∈ X such that J(x∗) = inf_{x∈X} J(x), i.e. x∗ is a minimizer of J.
Proof.
Let (x_k)_k be a sequence in X with J(x_k) → inf_{x∈X} J(x) as k → ∞. Then there exists a k₀ ∈ N
such that x_k ∈ S_α for all k ≥ k₀. Hence, (x_k)_{k≥k₀} is a sequence in S_α and S_α is compact,
so (x_k)_{k≥k₀} has a convergent subsequence (x_{k_l})_l with x_{k_l} → x∗ as l → ∞.
With the lower semi-continuity it follows that

inf_{x∈X} J(x) ≤ J(x∗) ≤ liminf_{l→∞} J(x_{k_l}) = inf_{x∈X} J(x)

Thus x∗ is a global minimizer of J.
A difficulty in the application of this theorem lies in the second condition. Whereas
in finite-dimensional spaces one can easily deduce compactness from boundedness, this
relation does not hold in function spaces due to their infinite dimension. Therefore
we need to find other conditions which can help to prove compactness of sub-level sets
under certain circumstances. This will not always be possible for any topological space
(X, τ), for the conclusion of compactness from boundedness only holds in the so-called
weak-* topology. Before we give the definition let us first recall the concept of a dual
space.
Definition 3.8 (Dual space and dual norm).
The dual space of a real topological vector space X is the set of all continuous linear
functionals f : X → R and is denoted as X∗. If X is a normed space with norm ‖·‖,
then the dual norm of f ∈ X∗ is defined as

‖f‖_∗ := sup_{x∈X, ‖x‖≤1} ⟨f, x⟩

where ⟨·, ·⟩ denotes the dual pairing.
Definition 3.9 (Convergence in weak and weak-* topology).
Let X be a Banach space and X ∗ its dual space.
• A sequence x_k in X converges to x in the weak topology (x_k ⇀ x), if
f(x_k) → f(x) ∀ f ∈ X∗.
• A sequence f_k in X∗ converges to f in the weak-* topology (f_k ⇀∗ f), if
f_k(x) → f(x) ∀ x ∈ X.
With the weak-* topology we are now able to state the theorem of Banach-Alaoglu,
which is of central importance in addition to the fundamental theorem of optimization
[13]:
Theorem 3.10 (Theorem of Banach-Alaoglu).
Let X = Z ∗ be the dual space of a Banach space Z and M ⊂ X bounded. Then M is
compact in the sense of the weak-* topology.
Proof.
See [34], theorem 3.15.
The theorem of Banach-Alaoglu finally allows us to conclude at least the existence of
a minimizer in the sense of the weak-* topology, if the conditions are satisfied. At this
point we want to note that the condition of X being the dual space of a Banach
space Z is not always trivial, as in the case of the bounded variation space (see section
4.4.3), but it is essential for the definition of the weak-* topology.
After having found some circumstances which guarantee the existence of a minimizer, we
now want to have a closer look at the uniqueness of a solution. This question is far
easier to answer, since we know from basic analysis that a functional has only one
minimizer if it is strictly convex. Let us first recall the definition of convexity and
subsequently give the central result on uniqueness.
Definition 3.11 (Convexity and strict convexity).
A functional J : X → R ∪ {+∞} is convex, if

J(βx + (1 − β)y) ≤ βJ(x) + (1 − β)J(y) ∀ x, y ∈ X, β ∈ [0, 1].

J is strictly convex, if

J(βx + (1 − β)y) < βJ(x) + (1 − β)J(y) ∀ x, y ∈ X, x ≠ y, β ∈ (0, 1).
Theorem 3.12 (Uniqueness of a minimizer).
Let J : X → R ∪ {+∞} be strictly convex. Then there exists at most one x∗ ∈ X with
J(x∗) = inf_{x∈X} J(x).
Proof.
Let x, y ∈ X with x ≠ y be two global minimizers of J. Then for β ∈ (0, 1)

J(βx + (1 − β)y) < βJ(x) + (1 − β)J(y) = inf_{z∈X} J(z)

which is a contradiction to the definition of an infimum.
Convex Conjugate and Fenchel Duality
In the context of existence and uniqueness of a solution, another concept in mathematical imaging is of particular interest, namely the convex conjugate of a function [31], [13].
Definition 3.13 (Convex conjugate).
Let X be a normed vector space, X∗ the corresponding dual space and ⟨·, ·⟩ : X∗ × X → R
denote the dual pairing. Then for a functional J : X → R ∪ {+∞} the convex conjugate
(also known as Legendre-Fenchel transformation) J∗ : X∗ → R ∪ {+∞} is defined by

J∗(x∗) := sup_{x∈X} ( ⟨x∗, x⟩ − J(x) )
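As a simple illustration (a standard example added here, not taken from [31] or [13]): for J(x) = ½‖x‖² on a Hilbert space X, identifying X∗ with X via the Riesz representation, the supremum in the definition is attained at x = x∗ and yields

J∗(x∗) = ⟨x∗, x∗⟩ − ½‖x∗‖² = ½‖x∗‖²,

so J coincides with its own conjugate.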
The convex conjugate is especially useful for minimization problems of the form
min_x f(x) + g(Ax) with a linear operator A. In this case the so-called primal problem can be replaced by the corresponding dual problem. This idea is summarized by the following
theorem, which can be found including a detailed proof in [23].
Theorem 3.14 (Fenchel's duality theorem).
Let X, Y be locally convex Hausdorff topological vector spaces, f : X → R ∪ {+∞},
g : Y → R ∪ {+∞} be proper functions, and A : X → Y a linear continuous operator. If
the condition A(dom f) ∩ dom g ≠ ∅ holds, then the primal and the dual problem

(P)  p∗ = inf_{x∈X} f(x) + g(Ax)

(D)  d∗ = sup_{y∈Y∗} −f∗(−A∗y) − g∗(y)

satisfy weak duality, i.e. p∗ ≥ d∗. If additionally the condition

(f + g ∘ A)∗(0) ≥ (f∗ □ A∗g∗)(0)

holds, where (g □ h)(x) := inf_{z∈X} g(z) + h(x − z) denotes the infimal convolution, then (P) and (D) satisfy strong duality,
i.e. p∗ = d∗.

Proof.
See [23].
Differentiability and Optimality Conditions
While investigating existence and uniqueness of a minimizer of a functional we are
often faced with terms which are not differentiable in the classical sense. However it
is necessary to deal with the concepts of derivatives and differentiability to find an
optimal solution. This procedure is comparable to the general case of finding extreme
values of a (classically) differentiable function by setting its first derivative to zero.
The definitions and contents of this section are based on [13] and [35].
The easiest concept of differentiability is the so-called directional derivative.
Definition 3.15 (Directional derivative).
Let J : X → Y be a functional between Banach spaces. The directional derivative of J
in u ∈ X in direction v ∈ X is defined as
d_v J(u) := lim_{t↘0} ( J(u + tv) − J(u) ) / t
The directional derivative in any point u ∈ X is a functional in v. If it exists for at
least one v ∈ X we speak of the Gateaux-derivative:
Definition 3.16 (Gateaux-derivative).
Let J : X → Y be a functional between Banach spaces. The collection of directional
derivatives
dJ (u) = {dv J (u) < ∞ : v ∈ X }
is called Gateaux-derivative in u ∈ X . J is called Gateaux-differentiable if dJ (u) is
not empty.
If J is Gateaux-differentiable and the Gateaux-derivative contains only one single element, we can define another form of differentiability, the Fréchet-derivative:
Definition 3.17 (Fréchet-derivative).
Let J : X → Y be a functional between Banach spaces such that d_v J(u) exists for all v ∈ X. If
there exists a continuous linear operator J′(u) : X → Y such that

J′(u) v = d_v J(u) ∀ v ∈ X

and

lim_{‖v‖_X → 0} ‖J(u + v) − J(u) − J′(u) v‖_Y / ‖v‖_X = 0,

then J is called Fréchet-differentiable in u ∈ X and J′ is called the Fréchet-derivative.
One is often interested in optimality conditions of convex functionals which are not
Fréchet-differentiable. In this case the concept of a subdifferential is of special importance.
Definition 3.18 (Subdifferential).
Let X be a Banach space, J : X → R ∪ {+∞} be a convex functional. J is called
subdifferentiable in u ∈ X if there exists p ∈ X* such that

J(v) − J(u) − ⟨p, v − u⟩ ≥ 0  ∀ v ∈ X.

Then p is called a subgradient of J in u and

∂J(u) := { p ∈ X* : J(v) − J(u) − ⟨p, v − u⟩ ≥ 0 ∀ v ∈ X }

is called the subdifferential of J in u.
The subgradient is an element of the dual space of X, and ∂J(u) ⊂ X*. If J is Fréchet-differentiable, the subdifferential contains only a single element, which equals the Fréchet-derivative, i.e. ∂J(u) = {J'(u)}.
In some cases, as for the computation of an optimality condition for the BV seminorm (see section 4.4.1), another formulation of the subdifferential for a special case might be useful (cp. [14], Lemma 3.12):
Theorem 3.19.
Let J : X → R ∪ {+∞} be a convex functional on a Banach space X which satisfies
J (αu) = αJ (u) for every α > 0. Then the subdifferential of J is given by
∂J(u) = { p ∈ X* : ⟨p, u⟩ = J(u), ⟨p, v⟩ ≤ J(v) ∀ v ∈ X }.
Proof.
From the definition of the subdifferential 3.18 we know that every element p must satisfy

⟨p, v − u⟩ ≤ J(v) − J(u)  ∀ v ∈ X.

In particular, the inequality holds for v = 0, which leads to

⟨p, −u⟩ ≤ −J(u)  ⇔  ⟨p, u⟩ ≥ J(u),

and for v = 2u we obtain

⟨p, 2u − u⟩ = ⟨p, u⟩ ≤ J(2u) − J(u) = 2J(u) − J(u) = J(u),

so together we obtain ⟨p, u⟩ = J(u). Inserting this result into the initial definition yields

J(v) ≥ ⟨p, v⟩  ∀ v ∈ X.

It follows that the subdifferential is given as stated in the theorem.
For every concept of differentiability one can think of a condition a minimizer of the functional must satisfy. If a functional is Gateaux-differentiable and the directional derivative exists for every direction v, one can easily see that a condition for a local minimum of J in û must be

d_v J(û) ≥ 0  ∀ v ∈ X,    (3.41)

since starting from a local minimum the functional must increase in every direction. This automatically leads to a condition for a Fréchet-differentiable functional: because for a minimum in û we have both J'(û) v = d_v J(û) ≥ 0 and −J'(û) v = d_{−v} J(û) ≥ 0, we directly obtain that J'(û) = 0 must be satisfied.
In case of subdifferentiability one has to keep in mind that ∂J(u) might consist of a set of different elements. Thus one can derive the condition that at least one element of the subdifferential must be equal to zero. This is stated by the following result.
Theorem 3.20 (Optimality condition for convex functionals).
Let X be a Banach space, J : X → R ∪ {+∞} be a convex functional. û ∈ X is a
minimizer of J if and only if 0 ∈ ∂J (û).
Proof.
For the forward implication, assume that 0 ∉ ∂J(û). From the definition of the subdifferential we know that there exists at least one v ∈ X such that

J(v) − J(û) < ⟨0, v − û⟩ = 0  ⇔  J(v) < J(û),

so û cannot be a minimizer of J.
If 0 ∈ ∂J(û), it follows that

0 = ⟨0, v − û⟩ ≤ J(v) − J(û)  ∀ v ∈ X,

which is equivalent to

J(û) ≤ J(v)  ∀ v ∈ X,

so û is a minimizer of J.
In case of a constrained minimization problem we are faced with a more complicated
situation. Hence we need to find a way to include the constraint into the previous
optimality condition to make sure that the optimal solution fulfills every condition.
Usually we distinguish between equality and inequality constraints. A general framework to handle both at the same time is given by the so-called Karush-Kuhn-Tucker
(KKT) necessary conditions [12].
Theorem 3.21 (KKT conditions).
Let J : X → R ∪ {+∞} be a differentiable functional on a Banach space X . Let x̂ be
an optimal solution of the following constrained minimization problem:
min_{x∈X} J(x)  s.t.  h_i(x) = 0 ∀ i = 1, …, p,
                      g_i(x) ≤ 0 ∀ i = 1, …, q,

where h_i, g_i are differentiable. Then there exist constants µ_i, λ_i (the so-called KKT multipliers) such that

0 = ∇J(x̂) + \sum_{i=1}^{p} µ_i ∇h_i(x̂) + \sum_{i=1}^{q} λ_i ∇g_i(x̂),
h_i(x̂) = 0  ∀ i = 1, …, p,
g_i(x̂) ≤ 0  ∀ i = 1, …, q,
λ_i ≥ 0  ∀ i = 1, …, q,
λ_i g_i(x̂) = 0  ∀ i = 1, …, q.
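As a simple illustration of how the multipliers interact, consider minimizing J(x) = x² over R subject to the single inequality constraint g₁(x) = 1 − x ≤ 0. Stationarity requires 0 = 2x̂ − λ₁, and complementary slackness requires λ₁(1 − x̂) = 0. The choice λ₁ = 0 would force x̂ = 0, which violates g₁(0) = 1 ≤ 0; hence the constraint is active, x̂ = 1 and λ₁ = 2 ≥ 0, so all KKT conditions hold.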
Embeddings
Proving existence of a minimizer of a variational model includes showing lower semi-continuity of the corresponding functional. Amongst other things, this motivates a short introduction to different types of embeddings of vector spaces (see [20]).
Definition 3.22 (Continuous and compact embedding).
Let X and Y be two normed vector spaces with norms ‖·‖_X and ‖·‖_Y, respectively, and X ⊂ Y. Then
• X is said to be continuously embedded in Y (and we write X ↪ Y) if there exists a constant C ≥ 0 such that ‖x‖_Y ≤ C ‖x‖_X for all x ∈ X.
• X is said to be compactly embedded in Y (and we write X ⊂⊂ Y) if X ↪ Y and every sequence in a bounded set of X has a subsequence which is Cauchy in ‖·‖_Y.
For the space of functions with bounded variation there exists an interesting example
which can help to prove the lower semi-continuity of the BV-seminorm (see section
4.4.3):
Example 3.23.
The space of functions with bounded variation BV (Ω) is compactly embedded in L1 (Ω).
Proof.
See [5].
3.3.4 Bregman Distances
In order to find a suitable distance function to distinguish two elements of a Banach
space and therefore to measure the quality of regularized solutions, a very important
tool is the so-called Bregman distance, which is not only applied for error measures in
the context of variational methods, but can also be used as a regularizer itself [8]. Let
us first introduce the basic definitions of generalized Bregman distances as given in [8].
Definition 3.24 (Bregman distance).
Let X be a Banach space, J : X → R ∪ {+∞} convex with ∂J ≠ ∅. Then the Bregman distance is defined as

D_J(u, v) := { J(u) − J(v) − ⟨p, u − v⟩_X : p ∈ ∂J(v) }.

For a single element ρ ∈ ∂J(v) the Bregman distance is a functional D_J^ρ : X × X → R with

D_J^ρ(u, v) := J(u) − J(v) − ⟨ρ, u − v⟩_X.
The canonical example of a Bregman distance is generated by the squared L²-norm and can easily be verified by direct computation:

Example 3.25 (Bregman distance of squared L²-norm).
For the convex functional J(u) := (1/2) ‖u‖²_{L²(Ω)} we have ∂J(v) = {v}, hence the Bregman distance is given by

D_J(u, v) = (1/2) ‖u‖²_{L²(Ω)} − (1/2) ‖v‖²_{L²(Ω)} − ⟨v, u − v⟩ = (1/2) ‖u − v‖²_{L²(Ω)}.
For any convex J one can easily see that D_J(u, u) = 0 for any u ∈ X. Furthermore, from the convexity of the functional J we obtain the non-negativity of the Bregman distance, i.e. D_J(u, v) ≥ 0 for every u, v ∈ X. Hence, the Bregman distance can be seen as a 'distance' in some sense. Note that it is also possible that 0 ∈ D_J(u, v) for u ≠ v, thus D_J differs from a metric [15].
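As a small numerical illustration of example 3.25, the following Python sketch (our own addition; the uniform grid and all names are assumptions, not part of the thesis code) evaluates the Bregman distance generated by the squared 2-norm and confirms that it coincides with half the squared distance:

```python
import numpy as np

def bregman_sq_l2(u, v, h=1.0):
    """Bregman distance D_J(u, v) for J(u) = 0.5 * ||u||^2 on a uniform grid.

    For this J the (sub)gradient in v is v itself, so
    D_J(u, v) = J(u) - J(v) - <v, u - v>.
    The grid spacing h plays the role of the measure dx.
    """
    J = lambda w: 0.5 * h * np.sum(w ** 2)
    return J(u) - J(v) - h * np.sum(v * (u - v))

u = np.random.rand(100)
v = np.random.rand(100)
# Example 3.25: the Bregman distance of the squared L2-norm
# coincides with half the squared L2-distance.
assert np.isclose(bregman_sq_l2(u, v), 0.5 * np.sum((u - v) ** 2))
```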
As mentioned, the use of the Bregman distance as an error measure for regularized
solutions compared to the exact one is a common application. Although it does not
fulfill the criteria of a distance in the classical (metrical) sense and therefore seems
to be a weak tool to measure the difference between two elements, it bears certain
advantages in contrast to, for example, the norm error of the corresponding Banach
space. In the context of regularization methods, one is often interested in measuring
the difference between a minimizer of a regularized variational model which satisfies
the optimality condition and the exact solution of an inverse problem. In this case the
choice of a Bregman distance instead of the Banach space norm as a distance measure
seems more natural: In contrast to the Banach space norm, the Bregman distance
only considers errors that can be distinguished by the regularization term. Assume for
example that we have a regularizer R which satisfies

R(u + v) = R(u)  ∀ v ∈ U ⊂ X.    (3.42)

Then one can show that ⟨p, v⟩ = 0 for every p ∈ ∂R(u) and therefore also D_R(u + v, u) = 0, but at the same time ‖u + v − u‖ = ‖v‖ can be arbitrarily large. Hence the Bregman distance does not register any difference between elements with the same regularization energy. Those differences have to be measured by the data term, but they do not play any role in the regularization process [9].
3.3.5 Error Measures and Source Condition
In the following we want to outline a popular application of the Bregman distance as stated in [15] in order to derive error measures for minimizers of regularized variational models in the context of inverse problems. Let ũ be the exact solution of the inverse problem K(u) = g. In reality we are often faced with the problem that we only have access to noisy data g^δ. We are interested in a solution of K(u) = g^δ which is close to the exact solution ũ of K(u) = g. Therefore we apply a regularized variational framework of the general form

argmin_u F_{g^δ}(K(u)) + α R(u)    (3.43)

with an arbitrary Fréchet-differentiable data fidelity F : Y → R ∪ {+∞} and convex regularization term R : X → R ∪ {+∞}. Let û be a minimizer of (3.43), which satisfies optimality due to the KKT conditions (see theorem 3.21). In this context, one question naturally arises: how does û compare to the exact solution ũ? Note that a solution of the variational model strongly depends on the choice of the regularizer R, hence the Bregman distance is an appropriate tool for error measures in this case. To obtain a basic result about the error of û in the sense of the Bregman distance, a certain
smoothness assumption is necessary. This assumption is called a source condition and
can be formulated as follows [8], [30]:
∃ ρ ∈ ∂R(ũ), ∃ q ∈ Y* \ {0} : ρ = K'(ũ)* q.    (3.44)
The source condition will ensure the existence of a function h such that the exact
solution minimizes the functional
F_h(K(u)) + α R(u).    (3.45)
This is a necessary condition for the choice of the regularizer: Obviously, the exact
solution should contain features that can be distinguished by the regularization term
[9], otherwise the choice of R would be inappropriate. This is shown by the following
result, which can be found in the linear case in [8]:
Lemma 3.26.
The source condition (3.44) is equivalent to the existence of a function h ∈ Y such that

ũ ∈ argmin_u F_h(K(u)) + α R(u)

with a Fréchet-differentiable F_h such that (F_h')^{−1} exists.
Proof.
"⇒": Let the source condition (3.44) be satisfied, i.e.

∃ ρ ∈ ∂R(ũ), ∃ q ∈ Y* \ {0} : ρ = K'(ũ)* q
⇒ 0 = αρ − α K'(ũ)* q.    (3.46)

Define G_h(K(u) − h) := F_h(K(u)). Then G_h is Fréchet-differentiable. For any Fréchet-differentiable functional G : X → R it holds that (G*)' = (G')^{−1}: the convex conjugate is given by

G*(v) = sup_{u∈X} ( ⟨v, u⟩ − G(u) )
      = ⟨v, (G')^{−1}(v)⟩ − G((G')^{−1}(v))

with Fréchet-derivative G'. Then the derivative of the convex conjugate can be computed as

(G*)'(v) = (G')^{−1}(v) + v ((G')^{−1})'(v) − ((G')^{−1})'(v) G'((G')^{−1}(v)) = (G')^{−1}(v),

since G'((G')^{−1}(v)) = v. Applying this result to our functional G_h, it follows that

G_h'((G_h*)'(p)) = p  ⇒  G_h'((G_h*)'(−αq)) = −αq,

and from equation (3.46) we obtain

0 = αρ − α K'(ũ)* q
  = αρ + K'(ũ)* (−αq)
  = αρ + K'(ũ)* G_h'((G_h*)'(−αq)).

If we define h := K(ũ) − (G_h*)'(−αq) ⇔ K(ũ) − h = (G_h*)'(−αq), it follows that

0 = αρ + K'(ũ)* G_h'(K(ũ) − h)
⇔ 0 = αρ + K'(ũ)* F_h'(K(ũ))

⇒ ũ minimizes the functional F_h(K(u)) + α R(u).
"⇐": Let ũ ∈ argmin_u F_h(K(u)) + α R(u). Then ũ satisfies optimality, i.e.

0 = K'(ũ)* F_h'(K(ũ)) + αρ

for a ρ ∈ ∂R(ũ). This is equivalent to

ρ = K'(ũ)* q  for  q = −F_h'(K(ũ)) / α.
We obtain the following error estimate, which is based on [15] and [30]:
Theorem 3.27 (Error estimate in the Bregman distance).
Let ũ be the exact solution of the inverse problem K(u) = g and let the source condition (3.44) be satisfied. Let R : X → R ∪ {+∞} be convex. Furthermore, let the nonlinearity condition

⟨q, K(u) − K(v) − K'(v)(u − v)⟩ ≤ C ‖q‖_{Y*} ‖K(u) − K(v)‖_Y

hold for q. If there exists a minimizer û of the variational model (3.43) for α > 0 which satisfies the KKT optimality conditions, then

F_{g^δ}(K(û)) + α D_R^ρ(û, ũ) ≤ F_{g^δ}(g) + α (C + 1) ‖q‖_{Y*} ‖K(û) − K(ũ)‖_Y.
Proof.
From the definition of û we obtain that û ∈ argmin_u F_{g^δ}(K(u)) + α R(u), hence it follows that

F_{g^δ}(K(û)) + α R(û) ≤ F_{g^δ}(K(ũ)) + α R(ũ),

and by adding −α⟨ρ, û − ũ⟩_X to both sides and using K(ũ) = g we obtain

F_{g^δ}(K(û)) + α ( R(û) − R(ũ) − ⟨ρ, û − ũ⟩_X ) ≤ F_{g^δ}(g) − α⟨ρ, û − ũ⟩_X,

where the bracket on the left-hand side equals D_R^ρ(û, ũ). By inserting the source condition ρ = K'(ũ)* q it follows that

F_{g^δ}(K(û)) + α D_R^ρ(û, ũ) ≤ F_{g^δ}(g) − α⟨K'(ũ)* q, û − ũ⟩_X
  = F_{g^δ}(g) + α⟨q, −K'(ũ)(û − ũ)⟩_Y
  = F_{g^δ}(g) + α⟨q, K(û) − K(ũ) − K'(ũ)(û − ũ)⟩_Y − α⟨q, K(û) − K(ũ)⟩_Y
  ≤ F_{g^δ}(g) + α C ‖q‖_{Y*} ‖K(û) − K(ũ)‖_Y + α ‖q‖_{Y*} ‖K(û) − K(ũ)‖_Y
  = F_{g^δ}(g) + α (C + 1) ‖q‖_{Y*} ‖K(û) − K(ũ)‖_Y,

where the first term was estimated by the nonlinearity condition and the second one by the dual pairing.
This result provides an estimate for the Bregman distance between the exact and the reconstructed solution. An interesting special case is Poisson noise, i.e. the given data g^δ is corrupted by Poisson noise. Then the appropriate data fidelity term is the Kullback-Leibler divergence

F_g(K(u)) := KL(K(u), g).    (3.47)

For this case we obtain an error estimate for the Bregman distance which only depends on the data error in the sense of the fidelity F_{g^δ}(g).
Example 3.28 (Error estimate in case of Poisson noise).
Let ũ be the exact solution of the inverse problem K(u) = g with an operator K : L¹(Ω) → L¹(Σ) and let the source condition (3.44) be satisfied. Let R : L¹(Ω) → R ∪ {+∞} be convex. Furthermore, let the nonlinearity condition

⟨q, K(u) − K(v) − K'(v)(u − v)⟩ ≤ C ‖q‖_{L^∞(Σ)} ‖K(u) − K(v)‖_{L¹(Σ)}

hold for q. If there exists û ∈ argmin_u KL(K(u), g^δ) + α R(u) for α > 0 which satisfies the KKT optimality conditions, then

D_R^ρ(û, ũ) ≤ (1/α) F_{g^δ}(g) + α C₁ + C₂ √(F_{g^δ}(g))

with constants C₁, C₂.
Proof.
From theorem 3.27 we obtain the following estimate for F_g(K(u)) := KL(K(u), g):

F_{g^δ}(K(û)) + α D_R^ρ(û, ũ) ≤ F_{g^δ}(g) + α (C + 1) ‖q‖_{L^∞(Σ)} ‖K(û) − K(ũ)‖_{L¹(Σ)}
  ≤ F_{g^δ}(g) + α c₁ ( ‖K(û) − g^δ‖_{L¹(Σ)} + ‖K(ũ) − g^δ‖_{L¹(Σ)} )

with c₁ := (C + 1) ‖q‖_{L^∞(Σ)}. From a variant of the Csiszár-Kullback inequality (see for example [37]) we obtain

‖K(û) − g^δ‖_{L¹(Σ)} ≤ c₂ √(F_{g^δ}(K(û)))

and

‖K(ũ) − g^δ‖_{L¹(Σ)} ≤ c₃ √(F_{g^δ}(K(ũ))) = c₃ √(F_{g^δ}(g)).

It follows that

F_{g^δ}(K(û)) + α D_R^ρ(û, ũ) ≤ F_{g^δ}(g) + α c₄ √(F_{g^δ}(K(û))) + α c₅ √(F_{g^δ}(g))

with c₄ := c₁ c₂ and c₅ := c₁ c₃. Completing the square on the left-hand side yields

( √(F_{g^δ}(K(û))) − (1/2) α c₄ )² + α D_R^ρ(û, ũ) ≤ F_{g^δ}(g) + (1/4) c₄² α² + α c₅ √(F_{g^δ}(g)),

and dropping the non-negative square and dividing by α gives

D_R^ρ(û, ũ) ≤ (1/α) F_{g^δ}(g) + α C₁ + C₂ √(F_{g^δ}(g))

with C₁ := (1/4) c₄² and C₂ := c₅.
4 Dynamic SPECT Image Reconstruction Using a Basis Approach
In this chapter, we present the main reconstruction approach which forms the basis of
this work and briefly outline the differences to other approaches and methods. We derive an appropriate variational model, which is suitable for solving the inverse problem
of dynamic SPECT reconstruction. Furthermore, some analysis is performed, such as
establishing the optimality and source condition and proving existence of a minimizer
of the proposed variational model.
4.1 Basis Representation
4.1.1 Basis Pursuit and Sparsity
As we already pointed out, the reconstruction of a dynamic, time-dependent tracer distribution bears several problems. Due to the limited given data, we cannot properly reconstruct f out of the equations given by the attenuated Radon transform. Since the gamma camera stops at every projection angle to count emitted photons, every movement of the camera must be seen as the beginning of a new time step in the model. This means that we only have access to measurements from one (or two, depending on the actual number of rotating cameras) projection angle per time step. Therefore, the simplest and most intuitive idea of reconstruction, namely to handle each projection at every time step as a static SPECT projection and reconstruct f for every t_i independently of the others, will probably not lead to an adequate solution.
Thus, it makes sense to search for a suitable way to effectively combine the different measurements. A common approach is the so-called basis pursuit. It makes use
of the assumption that f can be written as a linear combination of space-independent basis functions with time-independent coefficients, i.e.

f(x, t) = \sum_{k=1}^{K} u_k(x) c_k(t).    (4.1)
Many reconstruction algorithms aim at modelling the ck first to obtain a dictionary of
possible basis functions and afterwards try to reconstruct the corresponding coefficients
with respect to sparsity. Furthermore, there exist strategies of so-called joint estimation
[28], which try to alternatingly update coefficients and basis functions. We want to
give a brief overview of some examples of the main techniques for these approaches
in the following section.
4.1.2 Common Reconstruction Methods
There exist several approaches to define a set of basis functions a priori before computing the corresponding coefficients. A common example, which provides a realistic
set of possible concentration curves, is kinetic modelling (see section 4.1.2), which is
usually applied for dynamic PET reconstruction as in [10] or [21], but also for SPECT
imaging [22], [24]. Another approach uses splines as temporal basis functions [6].
Another simultaneous reconstruction approach for dynamic PET, comparable to the one proposed in this work, was presented by Reader et al. in [28]; it alternatingly estimates the coefficients and updates the temporal basis, where only the number of basis functions must be specified.
Kinetic Modelling
A common approach to incorporate a priori information about the structure of the basis functions is to model the physiological parameters of the tracer for emission tomography [10]. This procedure is called kinetic modelling and is described in detail in [41].
As explained in the previous section, the region of interest is assumed to be separated into so-called compartments: regions in which the tracer concentration varies
in time, but not in space. One compartment can either be one pixel (or voxel) or a whole region consisting of the same tissue. One of the main model assumptions is furthermore that the tracer input is represented by the concentration in the blood, which is modelled as a function of time and is assumed to be known (since it can be measured separately). Hence, the blood vessel is not seen as a compartment as such, but as an additional region where the concentration is given a priori.
The compartmental modelling starts from a very simple model, which describes only the tracer flux between the blood and a single tissue type via a differential equation for the unknown concentration in the tissue. In a subsequent approach, the blood
compartment is extended by distinguishing between an arterial and a venous part (see
figure 4.1). A relation between these parts is modelled applying the Fick principle.
Figure 4.1: Compartmental model including blood flow [41]
This leads to an extended differential equation for the concentration in the tissue C_T:

d/dt C_T(t) = F ( C_A(t) − C_T(t)/λ ),    (4.2)
where C_A and C_V are the concentrations in the arterial and, likewise, the venous part of the blood vessel, F is the blood flow and λ = C_T / C_V the ratio between tissue and venous concentration, which is assumed to be constant since the concentration reaches an equilibrium state very fast. This model can again be extended by separating the tissue region into several independent compartments and therefore adding a space component to the concentration in the tissue C_T. This way, equation (4.2) turns into a partial differential equation

∂/∂t C_T(x, t) = F(x) ( C_A(t) − C_T(x, t)/λ )    (4.3)

and the solution with initial value C_T(x, 0) = 0 can easily be computed as

C_T(x, t) = F(x) \int_0^t C_A(τ) e^{−(F(x)/λ)(t−τ)} dτ.    (4.4)
Under the assumption that we have prior knowledge about F(x)/λ, we can provide a set of possible values c̃_k for this factor and therefore obtain a set of possible concentration curves. This set might not contain the true concentration curve in every single compartment, but it still makes sense to conclude that every concentration curve can be written as a linear combination of at most a few of these basis functions. This leads to the common approach

f(x, t) = \sum_{k=1}^{K} u_k(x) \underbrace{\int_0^t C_A(τ) e^{−c̃_k (t−τ)} dτ}_{=: c_k(t)},    (4.5)

where for every x only a few coefficients u_k(x) are nonzero. Thus, many common reconstruction methods make use of a sparsity-based approach.
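A minimal Python sketch of how such a dictionary of basis functions could be tabulated from (4.5) is given below; the arterial input curve C_A, the candidate rates c̃_k and the Riemann-sum quadrature are purely illustrative assumptions:

```python
import numpy as np

def kinetic_basis(c_tilde, t, C_A):
    """Discretize c_k(t) = int_0^t C_A(tau) * exp(-c_tilde_k * (t - tau)) dtau
    for every candidate rate c_tilde_k via a simple Riemann sum."""
    dt = t[1] - t[0]
    basis = np.zeros((len(c_tilde), len(t)))
    for k, ck in enumerate(c_tilde):
        for i, ti in enumerate(t):
            tau = t[: i + 1]
            basis[k, i] = np.sum(C_A(tau) * np.exp(-ck * (ti - tau))) * dt
    return basis

t = np.linspace(0.0, 60.0, 240)              # time samples (assumed)
C_A = lambda tau: tau * np.exp(-tau / 4.0)   # assumed arterial input curve
c_tilde = [0.05, 0.2, 0.8]                   # candidate values for F(x)/lambda
C = kinetic_basis(c_tilde, t, C_A)           # one concentration curve per rate
```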
Whereas this model makes sense for the estimation of blood flow components in different tissue types, there exist other approaches of kinetic modelling for more specialized fields of interest, for example the measurement of glucose use in different brain regions [41].
These models mostly have in common that the behaviour of the radiotracer is described
by one or more differential equations, which include some physiological properties of
the specified region.
4.1.3 A Slightly Different Approach
The approach we want to introduce in this thesis is, with regard to the underlying philosophy, a slightly different one, but it will lead to a similar model. We assume that the contemplated region Ω of the patient's body can be separated into a certain number of disjoint regions Ω₁, …, Ω_K, whose borders remain constant over time. The tracer distribution in every region does not change spatially, so that every Ω_k has its own space-independent concentration curve c_k(t).
In the mathematical sense, this approach leads to a very similar model to the one
emerging from the basis pursuit. Under the assumption that every pixel or voxel x
belongs to exactly one region, f can be written as a sum of the regional concentrations
c_k(t) and spatial indicator functions u_k(x), i.e.

f(x, t) = \sum_{k=1}^{K} u_k(x) c_k(t) = ( u_1(x), …, u_K(x) ) ( c_1(t), …, c_K(t) )^T =: u(x) c(t),    (4.6)

where

u_k(x) = { 1 if x ∈ Ω_k; 0 otherwise }.    (4.7)
Here, we write u, respectively c, as the collection of all indicator or basis functions.
This way of writing f is similar to (4.1), with the only difference that the uk are not
denoted as coefficients but as indicator functions. Hence we will also speak of the ck
as basis functions of the tracer distribution f . Figure 4.2 shows the typical shape of
the concentration curves ck , which can also be used as a priori information to further
define the basis functions.
Figure 4.2: The typical shape of the concentration curves of the radiotracer in different
types of tissue
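The representation (4.6) is straightforward to realize numerically. The following Python sketch (an illustration under an assumed toy region geometry and assumed curves, not the thesis implementation) assembles a dynamic image from two indicator functions and two concentration curves:

```python
import numpy as np

# Assemble f(x, t) = sum_k u_k(x) c_k(t) for a toy 2-region phantom.
n_side, N = 32, 50
t = np.linspace(0.0, 1.0, N)

yy, xx = np.mgrid[0:n_side, 0:n_side]
disk = (xx - 16) ** 2 + (yy - 16) ** 2 < 8 ** 2
U = np.stack([disk, ~disk], axis=-1).reshape(-1, 2).astype(float)  # n x K indicators

C = np.stack([t * np.exp(-5 * t),        # fast wash-out region
              1 - np.exp(-3 * t)])       # slow uptake region; C plays the role of C^T

f = U @ C                                # n x N dynamic image, cf. (4.6)
assert np.allclose(U.sum(axis=1), 1.0)   # each pixel belongs to exactly one region
```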
In practice, this model makes sense and is easily transferable to the anatomical reality,
e.g. thinking of roughly separating the patient’s body into different tissue types, where
each has its own unique chemical texture, which in turn causes a different behaviour
of the added radiotracer. Nevertheless, we have to keep in mind that this type of ap-
4
Dynamic SPECT Image Reconstruction Using a Basis Approach
49
proach still bears several risks due to the inaccuracy typically occuring in mathematical
models describing real natural processes.
The task arising from this model as well as from the basis pursuit model is now either
to reconstruct the coefficients or indicator functions uk based on a known set of basis
functions ck or to simultaneously search for every uk and ck . In this thesis, we want to
concentrate on the simultaneous reconstruction, which means that both the regions and the basis functions are completely unknown.
4.2 Selection of the Variational Model
We now want to search for an appropriate variational model which adequately describes the relationship between data and unknown variables and includes a suitable regularization to sort out infeasible solutions. Let us first have a look at the data term and then discuss the different ways of regularizing the problem.
As already mentioned in section 3.3.1, reconstruction problems occurring in emission tomography are typically corrupted with Poisson noise. Therefore, the appropriate data term for our model is the Kullback-Leibler divergence adapted to our case, i.e.
KL(Ruc, g) := \int_Σ \int_0^T ( Ruc − g + g \log( g/(Ruc) ) ) dt dx,    (4.8)

where R is the Radon transform acting on f = uc as denoted in definition 2.2.
Now we need to find a way to further define the unknowns u and c. The general
approach for our variational model therefore contains the data term as well as different
regularization terms either dependent on u or c, so we obtain a minimization problem
of the form
argmin_{u,c} KL(Ruc, g) + Reg(u, c).    (4.9)
At first, we have to determine what the subregions defined by uk should look like.
To preserve sharp edges in the resulting image, a bounded variation term for each
subregion seems practical (cp. section 3.3.2). Therefore, we include the regularization term

\sum_{k=1}^{K} |u_k|_{BV}    (4.10)

in our model, where we try to minimize the sum over all occurring edges in our image.
Furthermore, we would like u to satisfy the structure of an indicator function. Since in practice it might happen that the reconstructed c does not contain the true concentration curve of every subregion, it makes sense not to enforce u_k(x) ∈ {0, 1}, but to also allow u_k(x) ∈ [0, 1], as long as \sum_{k=1}^{K} u_k(x) = 1 is satisfied. This also makes sense because we avoid a highly non-convex constraint and replace it by the convex constraint

u ∈ S := { v : \sum_{k=1}^{K} v_k(x) = 1 ∀x, v_k(x) ∈ [0, 1] ∀x }.    (4.11)
Additionally, we insert a sparsity regularization term (cp. section 3.3.2), which, in the continuous case, is represented by the sum of the L¹-norms of the indicator functions, i.e.

\sum_{k=1}^{K} ‖u_k‖_{L¹}.    (4.12)

Transferred to the discrete model, in which the indicator functions are represented by an n × K matrix, where n is the number of pixels and K the number of subregions (cp. section 4.1.3), this type of regularization favours a sparse solution, where many matrix entries equal zero.
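Numerically, the constraint u ∈ S can be realized by projecting, for every pixel x, the vector (u₁(x), …, u_K(x)) back onto the unit simplex. The following Python sketch uses the standard sort-based Euclidean simplex projection from the optimization literature; names, sizes and the row-wise layout are our own assumptions:

```python
import numpy as np

def project_simplex_rows(U):
    """Euclidean projection of every row of U onto the unit simplex
    { x : x_i >= 0, sum_i x_i = 1 }, via the standard sort-based algorithm.
    One way to realize the convex constraint (4.11)/(4.18)."""
    n, K = U.shape
    s = -np.sort(-U, axis=1)                        # rows sorted in decreasing order
    css = np.cumsum(s, axis=1) - 1.0
    idx = np.arange(1, K + 1)
    cond = s - css / idx > 0                        # active part of each row
    rho = K - np.argmax(cond[:, ::-1], axis=1) - 1  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1.0)
    return np.maximum(U - theta[:, None], 0.0)

U = np.random.randn(5, 3)
P = project_simplex_rows(U)
assert np.allclose(P.sum(axis=1), 1.0) and (P >= 0).all()
```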
The limitations for the basis functions c are intuitive. First, as every c_k represents the concentration curve of one subregion, it makes sense to enforce c to be non-negative. Secondly, the temporal behaviour of the radiotracer (independent of space) is in practice likely to be smooth (see also figure 4.2), so it makes sense to include a smoothness regularization term. This happens in form of an L²-norm penalty on the time derivative of each concentration curve, so we obtain the regularization functional

(1/2) \sum_{k=1}^{K} ‖ ∂c_k/∂t ‖²_{L²}.    (4.13)
All in all, the variational model in the continuous case looks as follows:

argmin_{u∈S, c≥0} KL(Ruc, g) + α \sum_{k=1}^{K} |u_k|_{BV(Ω)} + β \sum_{k=1}^{K} ‖u_k‖_{L¹(Ω)} + (δ/2) \sum_{k=1}^{K} ‖ ∂c_k/∂t ‖²_{L²((0,T))}.    (4.14)
4.3 Discretization of the Model
For the numerical realization of the minimization problem described above, we first discretize the model and then apply an appropriate optimization method. Referring to section 2.4.3, we are already able to denote the problem in its discrete version Rf = g, where f is an n × N and g an m × N matrix with n pixels, N time steps and m detector bins. Therefore, the matrix equation for f separated into K indicator functions and concentration curves is
\begin{pmatrix} f_{11} & \cdots & f_{1N} \\ \vdots & & \vdots \\ f_{n1} & \cdots & f_{nN} \end{pmatrix} = \begin{pmatrix} u_{11} & \cdots & u_{1K} \\ \vdots & & \vdots \\ u_{n1} & \cdots & u_{nK} \end{pmatrix} \begin{pmatrix} c_{11} & \cdots & c_{1N} \\ \vdots & & \vdots \\ c_{K1} & \cdots & c_{KN} \end{pmatrix}    (4.15)

or

f = U C^T,    (4.16)

where U ∈ R^{n×K}, C^T ∈ R^{K×N}. In this setting, having knowledge of U representing
the indicator function matrix means that we have strong a priori information about the structure of U, namely

U ∈ { H ∈ R^{n×K} : \sum_{j=1}^{K} h_{ij} = 1 ∀i, h_{ij} ∈ {0, 1} ∀i, j }.    (4.17)
As in the continuous case, this setting is not practical because of its missing convexity, so we again replace it by the convex constraint

U ∈ S := { H ∈ R^{n×K} : \sum_{j=1}^{K} h_{ij} = 1 ∀i, h_{ij} ≥ 0 ∀i, j }.    (4.18)

Thereby, we do not enforce U to have only 0 or 1 entries, but also allow values in between as long as the constraint \sum_{j=1}^{K} U_{ij} = 1 is satisfied.
For our investigations, we previously penalized the BV- and L¹-norm of every indicator function. Since every column of the matrix U represents one indicator function u_k, we now have to minimize over the spatial gradient of each column U_k to enforce sharp edges in the resulting image. This leads to the discrete regularization term

\sum_{k=1}^{K} ‖∇U_k‖_1,    (4.19)

where ‖·‖_1 represents the matrix 1-norm, i.e. the case p = 1 in

‖A‖_p := ( \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^p )^{1/p}.    (4.20)

Thereby one also turns the sparsity regularization via the L¹-norm in the continuous case into its natural discrete version, which in turn forms the convex relaxation of the intuitive sparsity-enforcing regularization via the ℓ⁰-norm (see section 3.3.2). In the same way we transform the regularizations depending on C^T into their discretized counterpart.
All in all, we receive the optimal values Û, Ĉ^T by minimizing the constrained optimization problem

argmin_{U∈S, C^T≥0} KL(R U C^T, g) + α \sum_{k=1}^{K} ‖∇U_k‖_1 + β ‖U‖_1 + (δ/2) \sum_{k=1}^{K} ‖∇_t C_k‖²_2    (4.21)

or the unconstrained version

argmin_{U, C^T} KL(R U C^T, g) + α \sum_{k=1}^{K} ‖∇U_k‖_1 + β ‖U‖_1 + δ_S(U) + (δ/2) \sum_{k=1}^{K} ‖∇_t C_k‖²_2 + δ_+(C^T),    (4.22)

where δ_S (or likewise δ_+) is given by

δ_S(U) = { 0 if U ∈ S; ∞ otherwise }.    (4.23)
4.4 Analysis of the Model
In this section we want to investigate the continuous variational model

J : BV(Ω)^K × W₀^{1,2}((0,T))^K → R ∪ {+∞},    (4.24)

J(u, c) = KL(Ruc, g) + α \sum_{k=1}^{K} |u_k|_{BV(Ω)} + β \sum_{k=1}^{K} ‖u_k‖_{L¹(Ω)} + (δ/2) \sum_{k=1}^{K} ‖ ∂c_k/∂t ‖²_{L²((0,T))},    (4.25)

with regard to existence and uniqueness of a solution. Here, with W₀^{1,2}((0,T)) we denote the space of functions in W^{1,2}((0,T)) with left boundary value equal to zero. We first derive an optimality condition referring to section 3.3.3 and the source condition (see section 4.4.2). Afterwards, we will prove the necessary tools to obtain existence of a minimizer.
4.4.1 Optimality Condition
To derive an optimality condition for the variational model given by J, we have to
investigate the derivatives of the different parts of the functional. We start with the
Kullback-Leibler data term and first compute the directional derivative of the general
form
F(Kx) = \int_D ( Kx − y \log(Kx) )    (4.26)
for a linear operator K, which only differs from the KL divergence by some additional terms, which are independent of the argument Kx and hence do not change the
directional derivative. The directional derivative is given by
d_{Kv} F(Kx) = lim_{t↘0} (1/t) ( F(Kx + tKv) − F(Kx) )    (4.27)
  = lim_{t↘0} (1/t) \int_D ( Kx + tKv − y \log(Kx + tKv) − Kx + y \log(Kx) )    (4.28)
  = lim_{t↘0} (1/t) \int_D ( tKv − y \log( (Kx + tKv)/(Kx) ) )    (4.29)
  = \int_D Kv − lim_{t↘0} \int_D (1/t) y \log( 1 + t Kv/(Kx) )    (4.30)
  = \int_D Kv − lim_{t↘0} \int_D y \log( ( 1 + t Kv/(Kx) )^{1/t} )    (4.31)
  = \int_D Kv − lim_{n→∞} \int_D y \log( ( 1 + (1/n) Kv/(Kx) )^{n} )    (4.32)
  = \int_D Kv − \int_D y \log( e^{Kv/(Kx)} )    (4.33)
  = \int_D ( Kv − y Kv/(Kx) ).    (4.34)
Since the functional F is only defined for Kx ≥ 0, the definition of the directional
derivative requires Kx + tKv ≥ 0 for t sufficiently small and Kx > 0. Furthermore, we
can only compute the Fréchet-derivative of the functional F : X → R ∪ {+∞}, if the
directional derivative is defined for every direction in X . In our case, this is satisfied
for X := L^∞(D): for Kx, Kv ∈ L^∞(D), let

0 < a := inf_z Kx(z),  0 < b := sup_z |Kv(z)|.    (4.35)

Then it follows that

Kx(z) + tKv(z) ≥ a − tb > 0  ⇔  t < a/b.    (4.36)

Thus, the condition Kx + tKv > 0 for t sufficiently small is satisfied and the directional derivative is defined for arbitrary directions for every Kx > 0. In this case we see that the directional derivative consists of only one functional, and from
d_{Kv} F(Kx) = \int_D ( Kv − y Kv/(Kx) )
  = \int_D ( 1 − y/(Kx) ) Kv
  = ⟨ 1 − y/(Kx), Kv ⟩_{L²(D)}
  = ⟨ K* ( 1 − y/(Kx) ), v ⟩_{L²(D)},

where ⟨·,·⟩_{L²(D)} is the L²-scalar product on the Hilbert space L²(D) and K* is the adjoint operator of K, we obtain
Lemma 4.1 (Fréchet-derivative of KL).
The Fréchet-derivative of the functional F : L^∞(D) → R ∪ {+∞} is given by

(d/dx) F(Kx) = K* ( 1 − y/(Kx) ).
Now let us recall the structure of the true data term of our variational model, which is

F(Ruc) = \int_Σ \int_0^T ( Ruc − g \log(Ruc) ).    (4.37)

Since we have the condition u ∈ S, in particular u_k(x) ∈ [0, 1] for every x ∈ Ω, we can assume that u ∈ L^∞(Ω)^K. For c we have c ∈ W^{1,2}((0,T))^K ↪ L^∞((0,T))^K. Hence, we can assume that uc ∈ L^∞(Ω × (0,T)) and therefore also Ruc ∈ L^∞(Σ × (0,T)); thus the conditions for the existence of a Fréchet-derivative described above are satisfied and we can compute the partial derivatives with respect to u and c. Here, we make use of the fact that R(uc) = (Ru)c, since c does not depend on space.
d_{Rvc} F(Ruc) = lim_{t↘0} (1/t) ( F(Ruc + tRvc) − F(Ruc) )    (4.38)
  = \int_Σ \int_0^T ( Rvc − g Rvc/(Ruc) ) dt dx  (by (4.34))    (4.39)
  = \int_Σ ( \int_0^T c ( 1 − g/(Ruc) ) dt ) Rv dx    (4.40)
  = ⟨ \int_0^T c ( 1 − g/(Ruc) ) dt, Rv ⟩_{L²(Σ)}    (4.41)
  = ⟨ \int_0^T c R* ( 1 − g/(Ruc) ) dt, v ⟩_{L²(Ω)}    (4.42)

⇒ ∂F/∂u (Ruc) = \int_0^T c R* ( 1 − g/(Ruc) ) dt.    (4.43)
In the same way for c, we obtain

d_{Rud} F(Ruc) = lim_{t↘0} (1/t) ( F(Ruc + tRud) − F(Ruc) )    (4.44)
  = \int_Σ \int_0^T ( Rud − g Rud/(Ruc) ) dt dx  (by (4.34))    (4.45)
  = \int_0^T ( \int_Σ ( Ru − g Ru/(Ruc) ) dx ) d dt    (4.46)
  = ⟨ \int_Σ ( Ru − g Ru/(Ruc) ) dx, d ⟩_{L²((0,T))}    (4.47)

⇒ ∂F/∂c (Ruc) = \int_Σ Ru ( 1 − g/(Ruc) ) dx.    (4.48)
Next we want to characterize the subdifferential of the BV seminorm. The concepts of this part can be found in [14]. We use the reformulation of the subdifferential given in theorem 3.19. In case of the bounded variation, we have

∂|u|_{BV(Ω)} = { p ∈ BV(Ω)* : ⟨p, u⟩ = |u|_{BV(Ω)}, ⟨p, v⟩ ≤ |v|_{BV(Ω)} ∀ v ∈ BV(Ω) }.    (4.49)
For each element p ∈ BV(Ω)*, applying the definition of the dual norm (definition 3.8), the following inequality holds:

‖p‖_* = sup_{v ∈ BV(Ω), |v|_{BV(Ω)} ≤ 1} ⟨p, v⟩ ≤ sup_{v ∈ BV(Ω), |v|_{BV(Ω)} ≤ 1} |v|_{BV(Ω)} = 1.    (4.50)

Hence, it follows that

∂|u|_{BV(Ω)} ⊂ { p ∈ BV(Ω)* : ‖p‖_* ≤ 1, ⟨p, u⟩ = |u|_{BV(Ω)} } =: M.    (4.51)

For the other inclusion, let p ∈ M, i.e. ‖p‖_* ≤ 1. Then

1 ≥ ‖p‖_* = sup_{v ∈ BV(Ω), |v|_{BV(Ω)} ≤ 1} ⟨p, v⟩ = sup_{v ∈ BV(Ω)} ⟨p, v / |v|_{BV(Ω)}⟩.    (4.52)

Since we have found an upper bound for the supremum over every v ∈ BV(Ω), it follows that

⟨p, v / |v|_{BV(Ω)}⟩ ≤ 1  ∀ v ∈ BV(Ω)    (4.53)
⇔ ⟨p, v⟩ ≤ |v|_{BV(Ω)}  ∀ v ∈ BV(Ω),    (4.54)

and thus M ⊂ ∂|u|_{BV(Ω)}. We conclude that

∂|u|_{BV(Ω)} = { p ∈ BV(Ω)* : ‖p‖_* ≤ 1, ⟨p, u⟩ = |u|_{BV(Ω)} }.    (4.55)
Now let us regard the structure of the dual space of BV(Ω): from

|u|_{BV(Ω)} = sup_{p ∈ BV(Ω)*, ‖p‖_* ≤ 1} ⟨p, u⟩    (4.56)
           = sup_{g ∈ C₀^∞(Ω,R^d), ‖g‖_∞ ≤ 1} \int_Ω u ∇·g dx    (4.57)

we obtain that for every p ∈ BV(Ω)* with ‖p‖_* ≤ 1 there exists g ∈ L^∞(Ω, R^d) such that p = ∇·g and ‖g‖_{L^∞} ≤ 1. Inserting this into (4.55) finally leads to
Lemma 4.2 (Subdifferential of BV).
The subdifferential of |·|_{BV(Ω)} is given by

∂|u|_{BV(Ω)} = { ∇·g : ‖g‖_{L^∞} ≤ 1, ⟨∇·g, u⟩ = |u|_{BV(Ω)} }.
To characterize the subdifferential of the L¹-term, we again make use of the equivalent definition given in theorem 3.19. With (L¹(Ω))* = L^∞(Ω) this means

∂‖u‖_{L¹(Ω)} = { p ∈ L^∞(Ω) : ⟨p, u⟩ = ‖u‖_{L¹(Ω)} (i), ⟨p, v⟩ ≤ ‖v‖_{L¹(Ω)} ∀ v ∈ BV(Ω) (ii) }.    (4.58)
Thus, every element of the subdifferential has to fulfill two conditions. From condition (i) we directly obtain the basic structure of a subgradient:

(i) ⟨p, u⟩ = ‖u‖_{L¹(Ω)} ⇔ \int_Ω p u dx = \int_Ω |u| dx ⇔ p(x) = { 1 if u(x) > 0; −1 if u(x) < 0; a otherwise }.    (4.59)
The first condition is satisfied for an arbitrary a ∈ R. More information can be obtained from the second condition:

(ii) ⟨p, v⟩ ≤ ‖v‖_{L¹(Ω)} ⇔ \int_Ω p v dx ≤ \int_Ω |v| dx  ∀ v ∈ BV(Ω).    (4.60)
The inequality must hold for every v. In particular, for

v(x) = { u(x) if x ∈ Ω₁; 1 if x ∈ Ω₂ }    (4.61)

with Ω₁ = {x : u(x) ≠ 0} and Ω₂ = Ω \ Ω₁ it follows
\int_Ω p(x) v(x) dx ≤ \int_Ω |v(x)| dx    (4.62)
⇔ \int_{Ω₁} p(x) u(x) dx + \int_{Ω₂} p(x) dx ≤ \int_{Ω₁} |u(x)| dx + \int_{Ω₂} 1 dx    (4.63)
⇔ \int_{Ω₁} p(x) u(x) dx + \int_{Ω₂} a dx ≤ \int_{Ω₁} |u(x)| dx + \int_{Ω₂} 1 dx    (4.64)
⇔ \int_{Ω₁} |u(x)| dx + \int_{Ω₂} a dx ≤ \int_{Ω₁} |u(x)| dx + \int_{Ω₂} 1 dx    (4.65)
⇔ \int_{Ω₂} a dx ≤ \int_{Ω₂} 1 dx    (4.66)
⇔ a ≤ 1.    (4.67)
In the same way, with

v(x) = { u(x) if x ∈ Ω₁; −1 if x ∈ Ω₂ }    (4.68)

we obtain a ≥ −1. It follows that p(x) ∈ [−1, 1] for all x, for which condition (ii) is satisfied automatically, also for arbitrary functions v: since for p as defined above we have |p(x)| |v(x)| ≤ |v(x)|, it follows that

\int_Ω p(x) v(x) dx ≤ \int_Ω |p(x) v(x)| dx = \int_Ω |p(x)| |v(x)| dx ≤ \int_Ω |v(x)| dx  ∀ v ∈ BV(Ω).    (4.69)
Finally we obtain

Lemma 4.3 (Subdifferential of L¹).
The subdifferential of ‖u‖_{L¹(Ω)} is given by

∂‖u‖_{L¹(Ω)} = { p ∈ L^∞(Ω) : p(x) = 1 if u(x) > 0, p(x) = −1 if u(x) < 0, p(x) ∈ [−1, 1] otherwise }.
For the remaining L²-term we can directly compute the Fréchet-derivative, starting from the directional derivative for F(c) := (1/2) ‖∂c/∂t‖²_{L²((0,T))}:

d_v F(c) = lim_{t↘0} (1/t) ( F(c + tv) − F(c) )    (4.70)
  = lim_{t↘0} (1/t) \int_0^T ( (1/2) |∂_s(c + tv)|² − (1/2) |∂_s c|² ) ds    (4.71)
  = lim_{t↘0} (1/t) \int_0^T ( (1/2) |∂_s c|² + t ∂_s c ∂_s v + (1/2) t² |∂_s v|² − (1/2) |∂_s c|² ) ds    (4.72)
  = lim_{t↘0} (1/t) \int_0^T ( t ∂_s c ∂_s v + (1/2) t² |∂_s v|² ) ds    (4.73)
  = \int_0^T ∂_s c ∂_s v ds    (4.74)
  = ⟨ ∂c/∂t, ∂v/∂t ⟩_{L²((0,T))}    (4.75)
  = ⟨ −∂²c/∂t², v ⟩_{L²((0,T))},    (4.76)

where the last equality follows from integration by parts.
We have shown

Lemma 4.4 (Fréchet-derivative of squared L²).
The Fréchet-derivative of the functional F(c) := (1/2) ‖∂c/∂t‖²_{L²((0,T))} is given by

F'(c) = −∂²c/∂t² ∈ W^{−1,2}((0,T)).
Since we are faced with a constrained minimization problem, we still have to consider
the conditions u ∈ S and c ≥ 0. For c we can apply the KKT conditions (see theorem
3.21). In the unconstrained minimization with respect to u we added the function
δ_S(u) = { 0 if u ∈ S; ∞ otherwise }    (4.77)

to the variational model. Thus we can compute its subdifferential:
Lemma 4.5 (Subdifferential of indicator functions).
For any closed convex set S ⊂ X of a Banach space X, the subdifferential of δ_S is given by

∂δ_S(u) = N_S(u) if u ∈ S, and ∂δ_S(u) = ∅ otherwise,

where

N_S(u) := { p ∈ X* : ⟨p, u − v⟩ ≥ 0 ∀ v ∈ S }

denotes the so-called normal cone.

Proof.
We distinguish between two cases:

Case 1: u ∉ S. Every subgradient p would have to satisfy δ_S(v) − δ_S(u) − ⟨p, v − u⟩ ≥ 0 for all v ∈ X. Since δ_S(u) = ∞, this is impossible for v ∈ S. Hence, ∂δ_S(u) = ∅.

Case 2: u ∈ S. Then

∂δ_S(u) = { p ∈ X* : δ_S(v) − δ_S(u) − ⟨p, v − u⟩ ≥ 0 ∀ v ∈ X }
        = { p ∈ X* : ⟨p, v − u⟩ ≤ δ_S(v) ∀ v ∈ X }.

For v ∉ S the condition is always fulfilled; for v ∈ S it is equivalent to ⟨p, u − v⟩ ≥ 0. Hence, it follows that for u ∈ S

∂δ_S(u) = { p ∈ X* : ⟨p, u − v⟩ ≥ 0 ∀ v ∈ S }.
Finally, we are almost ready to state the optimality condition of the continuous variational model given in (4.14). Before that, we need some rules for the subdifferential of a sum of functionals. This is provided by the theorem of Moreau-Rockafellar, which can be found in [31].
Theorem 4.6 (Moreau-Rockafellar).
Let F₁, …, F_n be proper convex functionals on a Banach space X. Then

∂F₁(x) + ⋯ + ∂F_n(x) ⊂ ∂(F₁ + ⋯ + F_n)(x)

for every x ∈ X. If additionally all functionals F₁, …, F_n except for possibly one are continuous at a point x₀ ∈ dom(F₁) ∩ … ∩ dom(F_n), then

∂F₁(x) + ⋯ + ∂F_n(x) = ∂(F₁ + ⋯ + F_n)(x)

for every x ∈ X.
Every term in our functional J is convex except the KL data term, which is at least convex in u for fixed c (and vice versa). Thus, to show that the theorem holds in our case, we have to find u ∈ S and c ≥ 0 where every term of the functional is continuous. For this, we can simply choose the constant functions u_k(x) = 1/K and arbitrary c_k(t) ≥ 0. With these tools, we are finally able to state the (necessary, but not sufficient, since J is not convex) optimality condition for our minimization problem (recalling the KKT conditions in theorem 3.21 for c ≥ 0):
Theorem 4.7 (Necessary optimality condition).
If the pair (u, c), u = (u₁, …, u_K), c = (c₁, …, c_K)^T, is a minimizer of the model (4.14), then

0 = \int_0^T c^T R* ( 1 − g/(Ruc) ) dt + α p^{BV} + β p^{L¹} + p^{δ_S},
0 = \int_Σ (Ru)^T ( 1 − g/(Ruc) ) dx − δ ∂²c/∂t² − (λ₁, …, λ_K)^T,
c_k ≥ 0  ∀ k = 1, …, K,
λ_k ≥ 0  ∀ k = 1, …, K,
λ_k c_k = 0  ∀ k = 1, …, K,

where p^{BV} = (p₁^{BV}, …, p_K^{BV}) with p_k^{BV} ∈ ∂|u_k|_{BV(Ω)}, p^{L¹} = (p₁^{L¹}, …, p_K^{L¹}) with p_k^{L¹} ∈ ∂‖u_k‖_{L¹(Ω)}, and p^{δ_S} ∈ ∂δ_S(u).
4.4.2 Source Condition
In addition to the optimality condition we derived in the previous section, another interesting concept is the source condition as formulated in section 3.3.5. In our case, we denote the exact solution of the inverse problem given by the operator

F(u, c) := R(uc) = g    (4.78)

as the pair (ũ, c̃). The sum of the regularization terms of the unconstrained model is given by

H(u, c) := α \sum_{k=1}^{K} |u_k|_{BV(Ω)} + β \sum_{k=1}^{K} ‖u_k‖_{L¹(Ω)} + δ_S(u) + (δ/2) \sum_{k=1}^{K} ‖∂c_k/∂t‖²_{L²((0,T))} + δ_+(c).    (4.79)

Then the source condition (3.44) can be reformulated as

∃ p ∈ ∂H(ũ, c̃), ∃ q ∈ L^∞(Σ × (0,T)) \ {0} : p = F'(ũ, c̃)* q.    (4.80)
In order to derive the necessary optimality condition, we already computed the subgradients of the regularization parts, i.e.

∂_u H(u, c) = α p^{BV} + β p^{L¹} + p^{δ_S},    (4.81)
∂_c H(u, c) = −δ ∂²c/∂t².    (4.82)
Hence the source condition reads

∃ p^{BV} = (p₁^{BV}, …, p_K^{BV}), p_k^{BV} ∈ ∂|ũ_k|_{BV(Ω)}, ∃ p^{L¹} = (p₁^{L¹}, …, p_K^{L¹}), p_k^{L¹} ∈ ∂‖ũ_k‖_{L¹(Ω)}, ∃ p^{δ_S} ∈ ∂δ_S(ũ):

( α p^{BV} + β p^{L¹} + p^{δ_S}, −δ ∂²c̃/∂t² ) = F'(ũ, c̃)* q  for a q ∈ L^∞(Σ × (0,T)) \ {0}.    (4.84)
This equation is equivalent to

α p^{BV} + β p^{L¹} + p^{δ_S} = F_u'(ũ, c̃)* q,    (4.85)
−δ ∂²c̃/∂t² = F_c'(ũ, c̃)* q.    (4.86)

Let us have a closer look at (4.86). Since c does not appear on the right-hand side of the equation, we can solve for c:
c̃ = −(1/δ) \int_0^t \int_T^s F_c'(ũ, c̃)* q dz ds    (4.87)
  = −(1/δ) \int_0^t \int_T^s \int_Σ (Ru)^T q dx dz ds    (4.88)
  = −(1/δ) \int_Σ (Ru)^T \underbrace{\int_0^t \int_T^s q dz ds}_{=: Q(x,t)} dx    (4.89)
  = −(1/δ) \int_Σ (Ru)^T Q dx.    (4.90)
Inserting this formula for c into equation (4.85) leads to

α p^{BV} + β p^{L¹} + p^{δ_S} = F_u'(ũ, c̃)* q    (4.91)
  = \int_0^T c̃^T R* q dt    (4.92)
  = \int_0^T ( −(1/δ) \int_Σ (Ru)^T Q dx )^T R* q dt    (4.93)
  = −(1/δ) \int_0^T \int_Σ Q (Ru) R* q dx dt.    (4.94)

Hence, we obtain the source condition
∃ p^{BV} = (p₁^{BV}, …, p_K^{BV}), p_k^{BV} ∈ ∂|ũ_k|_{BV(Ω)}, ∃ p^{L¹} = (p₁^{L¹}, …, p_K^{L¹}), p_k^{L¹} ∈ ∂‖ũ_k‖_{L¹(Ω)}, ∃ p^{δ_S} ∈ ∂δ_S(ũ):

α p^{BV} + β p^{L¹} + p^{δ_S} = −(1/δ) \int_0^T \int_Σ Q (Ru) R* q dx dt  for a q ∈ L^∞(Σ × (0,T)) \ {0}.    (4.95)
4.4.3 Existence
We now want to state the existence of a minimizer of the given variational model (4.14). For this purpose, we apply the fundamental theorem of optimization 3.7. Hence, to obtain existence of a minimizer of the functional J, we need to prove

• compactness of the sub-level sets S_a = { (u, c) ∈ BV(Ω)^K × W₀^{1,2}((0,T))^K : J(u, c) ≤ a },
• lower semi-continuity,
• convergence of the constraint.

Let us first have a look at the compactness of sub-level sets. From the theorem of Banach-Alaoglu 3.10 we know that, if BV(Ω) and W₀^{1,2}((0,T)) are both the dual space of a Banach space, it is enough to show that S_a is bounded in BV(Ω)^K × W^{1,2}((0,T))^K to obtain compactness in the weak-* topology. This first requires
Lemma 4.8.
We define

Y = { ∇·g : g ∈ C₀^∞(Ω, R^d) }

with the norm

‖p‖_Y := inf_{g ∈ C₀^∞(Ω,R^d), ∇·g = p} ‖g‖_{L^∞}.

Then BV(Ω) can be identified with the dual space of Y.

Proof.
The proof can be found in [14], Prop. 3.4.
W^{1,2}((0,T)) is a reflexive Hilbert space, hence W^{1,2}((0,T)) and its dual space W^{−1,2}((0,T)) are isometrically isomorphic to each other. In particular, W^{−1,2}((0,T)) is a Banach space and therefore the first condition of the Banach-Alaoglu theorem holds for W^{1,2}((0,T)) [3].
Now that we have shown one of the conditions from the theorem of Banach-Alaoglu, we need to prove the existence of a constant λ ∈ R₊ such that

‖u‖_{BV(Ω)^K} = \sum_{k=1}^{K} ‖u_k‖_{BV(Ω)} ≤ λ,  ‖c‖_{W^{1,2}((0,T))^K} = \sum_{k=1}^{K} ‖c_k‖_{W^{1,2}((0,T))} ≤ λ.    (4.96)

This is done by the following result.
Theorem 4.9 (Compactness of sub-level sets).
Let u ∈ BV(Ω)^K, c ∈ W₀^{1,2}((0,T))^K with J(u, c) ≤ a and K ∈ N finite. Then there exists λ ∈ R₊ such that

‖u‖_{BV(Ω)^K} ≤ λ,  ‖c‖_{W^{1,2}((0,T))^K} ≤ λ,

and consequently the sub-level set

S_a = { (u, c) ∈ BV(Ω)^K × W₀^{1,2}((0,T))^K : J(u, c) ≤ a }

is not empty and compact in the weak-* topology.
Proof.
We start with a bound for ‖u‖_{BV(Ω)^K}. From the conditions we have 0 ≤ J(u, c) ≤ a, which is hence satisfied for every term of the sum; in particular, it follows that

α \sum_{k=1}^{K} |u_k|_{BV(Ω)} + β \sum_{k=1}^{K} ‖u_k‖_{L¹(Ω)} ≤ a,

so especially

\sum_{k=1}^{K} |u_k|_{BV(Ω)} ≤ a/α,  \sum_{k=1}^{K} ‖u_k‖_{L¹(Ω)} ≤ a/β,

and this implies

|u_k|_{BV(Ω)} ≤ a/α,  ‖u_k‖_{L¹(Ω)} ≤ a/β  ∀ k = 1, …, K.

Hence, it follows that there exists a constant λ₁ ∈ R₊ such that

‖u_k‖_{BV(Ω)} = ‖u_k‖_{L¹(Ω)} + |u_k|_{BV(Ω)} ≤ λ₁  ∀ k = 1, …, K.

Finally we obtain a bound for the norm of u:

‖u‖_{BV(Ω)^K} = \sum_{k=1}^{K} ( ‖u_k‖_{L¹(Ω)} + |u_k|_{BV(Ω)} ) ≤ K λ₁ =: λ₂ ∈ R₊.
For c, we have to find a bound for the W^{1,2}-norm. For every c_k we have

‖c_k‖²_{W^{1,2}((0,T))} = \int_0^T ( |c_k|² + |∂c_k/∂t|² ) = ‖c_k‖²_{L²((0,T))} (i) + ‖∂c_k/∂t‖²_{L²((0,T))} (ii).

(ii) is bounded since J(u, c) ≤ a, so especially

(δ/2) \sum_{k=1}^{K} ‖∂c_k/∂t‖²_{L²((0,T))} ≤ a  ⇒  \sum_{k=1}^{K} ‖∂c_k/∂t‖²_{L²((0,T))} ≤ 2a/δ,

and therefore there exists a constant λ₃ ∈ R₊ such that

‖∂c_k/∂t‖²_{L²((0,T))} ≤ λ₃.

To obtain an upper bound for (i), we consider the Friedrichs inequality for W^{1,2}. Since (0,T) is bounded with diameter T and c_k(0) = 0, we obtain from the Friedrichs inequality

‖c_k‖_{L²((0,T))} ≤ T ‖∂c_k/∂t‖_{L²((0,T))},

and it follows that there exists a constant λ₄ ∈ R₊ such that

‖c_k‖²_{L²((0,T))} ≤ T² ‖∂c_k/∂t‖²_{L²((0,T))} ≤ T² λ₃ =: λ₄.

Adding together the bounds for (i) and (ii), we obtain

‖c_k‖²_{W^{1,2}((0,T))} ≤ λ₄ + λ₃  ⇒  ‖c_k‖_{W^{1,2}((0,T))} ≤ λ₅ ∈ R₊,

so finally there exists a bound for c in the W^{1,2}-norm:

‖c‖_{W^{1,2}((0,T))^K} = \sum_{k=1}^{K} ‖c_k‖_{W^{1,2}((0,T))} ≤ K λ₅ =: λ₆ ∈ R₊.

From the theorem of Banach-Alaoglu we obtain the compactness of the sub-level sets in the sense of the weak-* topology.
In the next step we need to prove lower semi-continuity of J(u, c) with respect to the weak-* topology on BV(Ω)^K and W₀^{1,2}((0,T))^K. Therefore we consider two sequences (u_n)_n in BV(Ω) with u_n ⇀* u and (c_n)_n in W^{1,2}((0,T)) with c_n ⇀ c (since W^{1,2}((0,T)) is reflexive, weak convergence directly implies weak-* convergence). We initially show the lower semi-continuity for the single parts of the functional.
Lemma 4.10 (KL is l.s.c.).
Let K be a linear and bounded operator satisfying Kf ≥ 0 for any f ≥ 0, with equality if and only if f = 0. Then, for any fixed g, the Kullback-Leibler divergence KL(Kf, g) is lower semi-continuous with respect to the weak topology.

Proof.
The proof can be found in [29], Lemma 3.4.
Now let (u_n)_n be a sequence with u_n ⇀* u in BV(Ω) and (c_n)_n with c_n ⇀ c in W₀^{1,2}((0,T)). Then (u_n)_n is also strongly convergent in L¹: due to the compact embedding BV(Ω) ⊂⊂ L¹(Ω) (cp. 3.23), every sequence in a bounded subset of BV(Ω) has a subsequence which is Cauchy in L¹(Ω). L¹(Ω) is a Banach space and thus every Cauchy sequence converges. Since (u_n)_n is convergent itself, every subsequence converges to the same limit and therefore we obtain

‖u_n − u‖_{L¹(Ω)} → 0,

which proves the strong L¹-convergence, and in particular the weak L¹-convergence, of the sequence (u_n)_n. From c_n ⇀ c in W₀^{1,2}((0,T)) we obtain that c_n → c in L¹. With these two results we can show that the product of u_n and c_n converges as well:
\int_Ω \int_0^T |u_n(x) c_n(t) − u(x) c(t)| dt dx    (4.97)
  = \int_Ω \int_0^T |u_n(x) c_n(t) − u_n(x) c(t) + u_n(x) c(t) − u(x) c(t)| dt dx    (4.98)
  ≤ \int_Ω \int_0^T ( |u_n(x) (c_n(t) − c(t))| + |c(t) (u_n(x) − u(x))| ) dt dx    (4.99)
  ≤ \int_Ω |u_n(x)| dx \int_0^T |c_n(t) − c(t)| dt + \int_0^T |c(t)| dt \int_Ω |u_n(x) − u(x)| dx    (4.100)
  → 0,    (4.101)

since both \int_0^T |c_n(t) − c(t)| dt → 0 and \int_Ω |u_n(x) − u(x)| dx → 0 while the remaining factors stay bounded. Hence, it follows that u_n c_n → uc in L¹(Ω × (0,T)) and in particular u_n c_n ⇀ uc. From lemma 4.10 it follows that

KL(Ruc, g) ≤ liminf_n KL(R u_n c_n, g)    (4.102)

for u_n ⇀* u in BV(Ω) and c_n ⇀ c in W₀^{1,2}((0,T)). The proof carries over to (u_n)_n ∈ BV(Ω)^K and (c_n)_n ∈ W₀^{1,2}((0,T))^K.
A similar result for the BV seminorm can be found in [14] and is briefly recalled here.
Lemma 4.11 (BV is l.s.c.).
|·|BV (Ω) is lower semi-continuous with respect to the weak-* topology.
Proof.
Let (u_n)_n be a sequence in BV(Ω) with u_n ⇀* u and let (φ_k)_k ⊂ C₀^∞(Ω, R^d) be a sequence with ‖φ_k‖_∞ ≤ 1 and

\int_Ω u ∇·φ_k dx → sup_{v ∈ C₀^∞(Ω,R^d), ‖v‖_∞ ≤ 1} \int_Ω u ∇·v = |u|_{BV(Ω)}.

From the weak-* convergence of u_n, we obtain that for every w ∈ Y
\int_Ω u_n w dx → \int_Ω u w,

so especially with the choice of φ_k

\int_Ω u ∇·φ_k dx = lim_n \int_Ω u_n ∇·φ_k dx
  = liminf_n \int_Ω u_n ∇·φ_k dx
  ≤ liminf_n sup_{φ ∈ C₀^∞(Ω,R^d), ‖φ‖_∞ ≤ 1} \int_Ω u_n ∇·φ dx
  = liminf_n |u_n|_{BV(Ω)}

⇒ |u|_{BV(Ω)} = lim_k \int_Ω u ∇·φ_k dx ≤ liminf_n |u_n|_{BV(Ω)}.

Thus |·|_{BV(Ω)} is lower semi-continuous in the weak-* topology.
Next we want to state the lower semi-continuity of the L¹-term. Therefore we make use of the following result.

Lemma 4.12 (Norm is l.s.c.).
Every norm ‖·‖ on a Banach space X is lower semi-continuous with respect to the weak topology.

Proof.
Let (x_n)_n ⊂ X be a sequence with x_n ⇀ x. From a corollary of the Hahn-Banach theorem (see for example [40], Corollary III.1.6) we get that for every x ∈ X there exists x* ∈ X* such that ‖x*‖ = 1 and x*(x) = ‖x‖. Then it follows that

‖x‖ = |x*(x)| = lim_{n→∞} |x*(x_n)| = liminf_{n→∞} |x*(x_n)| ≤ liminf_{n→∞} ‖x_n‖,  (∗)

where the inequality (∗) holds because

|x*(x_n)| ≤ ‖x*‖ ‖x_n‖ = ‖x_n‖,
since x∗ is a bounded operator.
Now we are able to prove the weak-* lower semi-continuity of the L¹-term.

Lemma 4.13 (L¹-norm is l.s.c.).
‖·‖_{L¹(Ω)} is lower semi-continuous in BV(Ω) with respect to the weak-* topology.

Proof.
Let (u_n)_n be a sequence in BV(Ω) with u_n ⇀* u. We have already shown that (u_n)_n converges strongly (and especially also weakly) in L¹. From the previous result we know that every norm on a Banach space is weakly lower semi-continuous; hence, from the weak convergence in L¹, we directly obtain that

‖u‖_{L¹(Ω)} ≤ liminf_n ‖u_n‖_{L¹(Ω)}

for u_n ⇀* u in BV(Ω).
Finally we state the result for the remaining L2 -term. This requires
Lemma 4.14.
Let J : X → R ∪ {+∞} be a convex functional on a reflexive Banach space X . Then
J is lower semi-continuous in the sense of the weak topology on X .
Proof.
We assume that J is not weakly lower semi-continuous. Then there exists a sequence (x_k)_k in X with x_k ⇀ x and J(x) > lim_k J(x_k). J is convex, hence the epigraph

epi(J) := { (x, a) ∈ X × R : J(x) ≤ a }

is convex. Since J(x) > lim_k J(x_k), there exists b ∈ R such that

J(x) > b > lim_k J(x_k),

hence it follows that (x, b) ∉ epi(J). From the Hahn-Banach theorem (see for example [40], Corollary III.1.6) we obtain that there exist c, d ∈ R and p ∈ X* such that

c b + ⟨p, x⟩ + d ≤ 0,
c a + ⟨p, y⟩ + d ≥ 0

for every (y, a) ∈ epi(J). If we choose (y, a) = (x, J(x)), it follows from the two inequalities that

c b ≤ c J(x)

and therefore c > 0. Hence, we can divide by c and combine the inequalities, where we set a = J(x_k) and y = x_k:

b + (1/c) ⟨p, x⟩ ≤ J(x_k) + (1/c) ⟨p, x_k⟩.

Taking the limit on the right-hand side leads to

b + (1/c) ⟨p, x⟩ ≤ lim_k J(x_k) + (1/c) lim_k ⟨p, x_k⟩ = lim_k J(x_k) + (1/c) ⟨p, x⟩,

where the last step follows from the weak convergence of x_k. We obtain

b ≤ lim_k J(x_k),

which is a contradiction.
Lemma 4.15 (Squared L²-norm is l.s.c.).
‖∂c/∂t‖²_{L²((0,T))} is lower semi-continuous in W^{1,2}((0,T)) with respect to the weak topology.

Proof.
Let c ∈ W^{1,2}((0,T)). Then ‖∂c/∂t‖_{L²((0,T))} is convex, since for β ∈ (0, 1) we have

‖∂/∂t (β c₁ + (1−β) c₂)‖_{L²((0,T))} = ‖β ∂c₁/∂t + (1−β) ∂c₂/∂t‖_{L²((0,T))}
  ≤ β ‖∂c₁/∂t‖_{L²((0,T))} + (1−β) ‖∂c₂/∂t‖_{L²((0,T))}.

Convexity holds for squares of non-negative convex functionals, hence ‖∂c/∂t‖²_{L²((0,T))} is convex as well. From lemma 4.14 we know that every convex functional on a reflexive Banach space is weakly lower semi-continuous. Since W^{1,2}((0,T)) is reflexive, we directly obtain lower semi-continuity in the sense of the weak (equivalently weak-*) topology on W^{1,2}((0,T)).
Finally we have shown the condition for every part of the functional J and are now able to prove the central result:

Theorem 4.16 (Lower semi-continuity).
Let u ∈ BV(Ω)^K and c ∈ W^{1,2}((0,T))^K. Then the functional J is lower semi-continuous with respect to the weak-* topology.

Proof.
We have already shown in 4.10 that the Kullback-Leibler divergence is weakly lower semi-continuous, likewise |·|_{BV(Ω)} (see 4.11), ‖·‖_{L¹(Ω)} (see 4.13) and the L²-term (see 4.15). Hence, it remains to be shown that any sum of weak-* lower semi-continuous functionals is weak-* lower semi-continuous. Therefore let J₁ and J₂ be two weak-* lower semi-continuous functionals, i.e. for a weak-* convergent sequence u_k ⇀* u we have

J₁(u) ≤ liminf_k J₁(u_k),  J₂(u) ≤ liminf_k J₂(u_k).

Then it follows that

J₁(u) + J₂(u) ≤ liminf_k J₁(u_k) + liminf_k J₂(u_k) ≤ liminf_k ( J₁(u_k) + J₂(u_k) ),

where the last inequality holds because of the superadditivity of the limit inferior. Since J consists of a sum of weak-* lower semi-continuous functionals, J is weak-* lower semi-continuous itself.
Finally we need to prove the convergence of the constraint to finish the proof of existence. For the positivity constraint on c this is trivial. For u, we provide the following result. Note that with convergence in the sense of the weak-* topology, we can only state that for the weak-* limit u the constraint conditions are satisfied for almost every x ∈ Ω (meaning that the set of x for which the conditions are violated is a Lebesgue null set).
Theorem 4.17 (Convergence of the constraint).
Let (u^j)_j be a sequence in S := { v ∈ BV(Ω)^K : \sum_{k=1}^{K} v_k(x) = 1 ∀x, v_k(x) ∈ [0, 1] ∀x } with u^j ⇀* u in BV(Ω)^K for j → ∞. Then u ∈ S.
Proof.
We need to verify the two conditions of the constraint for u:

(i) From u^j ⇀* u in BV(Ω)^K we obtain strong convergence in L¹(Ω)^K, hence

\sum_{k=1}^{K} \int_Ω |u_k^j(x) − u_k(x)| dx → 0
⇒ \int_Ω \sum_{k=1}^{K} |u_k^j(x) − u_k(x)| dx → 0
⇒ \sum_{k=1}^{K} |u_k^j(x) − u_k(x)| → 0 a.e. (along a subsequence)
⇒ 1 = \sum_{k=1}^{K} u_k^j(x) → \sum_{k=1}^{K} u_k(x) a.e.
⇒ \sum_{k=1}^{K} u_k(x) = 1 a.e.
(ii) In order to show that u_k(x) ∈ [0, 1] a.e., let M ⊂ Ω be a subset which is not a Lebesgue null set and assume that u_k(x) < 0 for x ∈ M. From the strong L¹-convergence u_k^j → u_k we obtain on the one hand that

\int_Ω |u_k^j(x) − u_k(x)| dx → 0,

and on the other hand

\int_Ω |u_k^j(x) − u_k(x)| dx ≥ \int_M |u_k^j(x) − u_k(x)| dx = \int_M ( u_k^j(x) + |u_k(x)| ) dx > 0,

where the last expression is bounded away from zero since u_k^j(x) ≥ 0 and |u_k(x)| > 0 on M and M is not a Lebesgue null set; this contradicts the strong L¹-convergence. The upper bound u_k(x) ≤ 1 a.e. then follows from \sum_{k=1}^{K} u_k(x) = 1 and u_k(x) ≥ 0 a.e. Thus, u satisfies the constraint conditions for almost every x ∈ Ω.
5 Numerical Realization
In this chapter, we want to have a closer look at a suitable numerical reconstruction method for our previously introduced variational problem. We apply a forward-backward type algorithm based on an expectation-maximization (EM) method, which is outlined in section 5.1. For comparison, we introduce a second method in section 5.2, namely a primal-dual algorithm designed for constrained minimization problems. We will compare the numerical results for both methods in the following chapter.
5.1 A Forward-Backward EM-Type Algorithm
The content of this section as well as the algorithm used for our application is based on
[14] and [17]. We want to provide a step-by-step understanding of the applied method
and the arising subproblems. Therefore, the section is organized as follows: In section
5.1.1, we remind the reader of the basic concepts of the expectation-maximization algorithm and its application to our problem. In section 5.1.2, we extend the method to include the regularization. The subproblems arising from the main algorithm are solved in section 5.1.3.
5.1.1 The EM Algorithm
We aim at solving the previously introduced variational model
$$\operatorname*{argmin}_{U, C^T} \; KL\left(RUC^T, g\right) + \alpha \sum_{k=1}^K \|\nabla U_k\|_1 + \beta \|U\|_1 + \delta_S(U) + \frac{\delta}{2} \sum_{k=1}^K \|\nabla_t C_k\|_2^2 + \delta_+\left(C^T\right) \tag{5.1}$$
simultaneously with respect to both variables U and C^T. This means we divide the model into two parts, which are optimized alternatingly, so we consider the two minimization problems
$$\operatorname*{argmin}_{U} \; KL\left(RUC^T, g\right) + \alpha \sum_{k=1}^K \|\nabla U_k\|_1 + \beta \|U\|_1 + \delta_S(U) \tag{5.2}$$
$$\operatorname*{argmin}_{C^T} \; KL\left(RUC^T, g\right) + \frac{\delta}{2} \sum_{k=1}^K \|\nabla_t C_k\|_2^2 + \delta_+\left(C^T\right) \tag{5.3}$$
To find the minimum value of the Kullback-Leibler divergence, disregarding the regularization parts, a popular technique is the so-called EM algorithm [14]. For instance, in the non-regularized case of static SPECT reconstruction, the minimization problem to be solved is given by
$$\operatorname*{argmin}_{f \ge 0} \; KL(Rf, g) \tag{5.4}$$
Applying the Karush-Kuhn-Tucker (KKT) conditions for the non-negativity constraint will lead to the existence of a Lagrange multiplier λ, so the solution has to satisfy the two equations
$$0 = R^* \left( 1 - \frac{g}{Rf} \right) - \lambda \tag{5.5}$$
$$0 = \lambda f \tag{5.6}$$
which can be solved by the iteration scheme
$$f_{k+1} = \frac{f_k}{R^* 1} \, R^* \left( \frac{g}{R f_k} \right) \tag{5.7}$$
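As an illustration, a minimal Matlab sketch of iteration (5.7) could look as follows, assuming the system matrix R is stored as a (sparse) matrix, g is the vectorized data and f0 is a strictly positive start value; the guard against division by zero is an implementation detail, not part of the derivation:

% Minimal sketch of the static EM iteration (5.7)
function f = em_iteration(R, g, f0, maxiter)
    sens = R' * ones(size(g));           % sensitivity image R^T 1
    f = f0;
    for k = 1:maxiter
        ratio = g ./ max(R * f, eps);    % g ./ (R f_k), guarded against zero
        f = (f ./ sens) .* (R' * ratio); % multiplicative EM update
    end
end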
In this simple case, R acts as a linear operator on f. For our application, we are confronted with the Kullback-Leibler divergence between RUC^T and g. Therefore, we need to discretize the problem to allow a more convenient implementation. R can be interpreted as a downscaled 2-dimensional matrix emerging from a 3-dimensional (m × n × N)-matrix $\mathcal{R}$, which provides the measured data g through
$$\mathcal{R}(:,:,j) \, f(:,j) = g(:,j) \quad \forall\, j = 1, \dots, N \tag{5.8}$$
In other words, R arises from $\mathcal{R}$ by applying a downscaling operator $D_\downarrow$ on $\mathcal{R}$, i.e. $D_\downarrow(\mathcal{R}) = R$.
For our application in Matlab, we use a three-dimensional matrix to produce the data. To allow a simple updating procedure for the algorithm, we aim at writing $RUC^T = g$ as a matrix-vector equation. In other words, we want to find A and B such that $\operatorname{vec}(RUC^T) = A \cdot \operatorname{vec}(U)$ and $\operatorname{vec}(RUC^T) = B \cdot \operatorname{vec}(C^T)$. Let us consider the so-called Kronecker product [21].
Definition 5.1 (Kronecker product).
Let A be an (n × m)-matrix and B a (k × l)-matrix. Then the Kronecker product of A and B is defined by
$$A \otimes B := \begin{pmatrix} a_{11} B & \dots & a_{1m} B \\ \vdots & \ddots & \vdots \\ a_{n1} B & \dots & a_{nm} B \end{pmatrix}$$
where $A \otimes B$ is an (nk × ml)-matrix.
Using the Kronecker product, we are able to rewrite the expression $RUC^T = g$ as a matrix-vector equation, taking the following result into consideration:
Theorem 5.2.
Let A be an (n × m)-, B an (m × k)- and C an (l × k)-matrix. Then solving the matrix equation $ABC^T = D$ is equivalent to solving
$$(C \otimes A) \operatorname{vec}(B) = \operatorname{vec}(D)$$
where $\operatorname{vec}(B)$ is a column vector arising from the column-by-column vectorization of the matrix B.
Proof.
See [21].
A special case of the equation above is $UC^T = f$. If we consider $A := E_n$, the (n × n) identity matrix, we have
$$f = UC^T \;\Leftrightarrow\; \operatorname{vec}(f) = \operatorname{vec}\left(UC^T\right) = (C \otimes E_n) \operatorname{vec}(U) \tag{5.9}$$
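A quick numerical check of identity (5.9) in Matlab, with small random matrices of assumed dimensions, could look like this:

% Check vec(U C^T) = (C ⊗ E_n) vec(U) on random data
n = 4; K = 3; N = 5;                 % assumed small dimensions
U = rand(n, K); C = rand(N, K);
lhs = reshape(U * C', [], 1);        % vec(U C^T), column by column
rhs = kron(C, speye(n)) * U(:);      % (C ⊗ E_n) vec(U)
disp(norm(lhs - rhs))                % zero up to rounding errors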
Subsequently, we can rewrite the expression $RUC^T$ as follows (keeping in mind that for the application we have $\mathcal{R}(:,:,j) f(:,j) = g(:,j)$ for all $j = 1, \dots, N$):
$$\operatorname{vec}\left(RUC^T\right) = \begin{pmatrix} RUC^T(:,1) \\ \vdots \\ RUC^T(:,N) \end{pmatrix} = \begin{pmatrix} g(:,1) \\ \vdots \\ g(:,N) \end{pmatrix} = \begin{pmatrix} \mathcal{R}(:,:,1) \cdot UC^T(:,1) \\ \vdots \\ \mathcal{R}(:,:,N) \cdot UC^T(:,N) \end{pmatrix} \tag{5.10}$$
$$= \underbrace{\begin{pmatrix} \mathcal{R}(:,:,1) & & \\ & \ddots & \\ & & \mathcal{R}(:,:,N) \end{pmatrix}}_{=: \tilde{R}} \begin{pmatrix} UC^T(:,1) \\ \vdots \\ UC^T(:,N) \end{pmatrix} \tag{5.11}$$
$$= \tilde{R} \cdot \operatorname{vec}\left(UC^T\right) = \underbrace{\tilde{R} \cdot (C \otimes E_n)}_{=: A} \cdot \operatorname{vec}(U) \tag{5.12}$$
Finally, we obtain the linear operator $A \in \mathbb{R}^{mN \times nK}$, which turns the vectorized indicator function matrix U into $\operatorname{vec}(RUC^T) = \operatorname{vec}(g)$.
For the minimization with respect to $C^T$, we need to apply the same trick to $C^T$, which leads to the operator B: According to theorem 5.2, we can rewrite $UC^T = f$ as
$$f = UC^T \;\Leftrightarrow\; \operatorname{vec}(f) = \operatorname{vec}\left(UC^T\right) = (E_N \otimes U) \operatorname{vec}\left(C^T\right) \tag{5.13}$$
and therefore
$$\operatorname{vec}\left(RUC^T\right) = \tilde{R} \cdot \operatorname{vec}\left(UC^T\right) = \underbrace{\tilde{R} \cdot (E_N \otimes U)}_{=: B} \cdot \operatorname{vec}\left(C^T\right) \tag{5.14}$$
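In Matlab, both operators can be assembled directly from the per-time-step projection matrices; a sketch, assuming these are stored in a cell array Rj (a name introduced here purely for illustration) and that n, N, C and U are the quantities defined above:

% Assemble R~, A and B as in (5.11)-(5.14)
Rtilde = blkdiag(Rj{:});            % block-diagonal matrix R~
A = Rtilde * kron(C, speye(n));     % A = R~ (C ⊗ E_n), acts on vec(U)
B = Rtilde * kron(speye(N), U);     % B = R~ (E_N ⊗ U), acts on vec(C^T)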
Defining the two linear operators A and B finally allows us to denote the EM algorithm applied to the minimization problem of the Kullback-Leibler divergence with respect to U and $C^T$:
$$\operatorname*{argmin}_{U \ge 0} \; KL\left(A \cdot \operatorname{vec}(U), \operatorname{vec}(g)\right) \tag{5.15}$$
$$\operatorname*{argmin}_{C^T \ge 0} \; KL\left(B \cdot \operatorname{vec}\left(C^T\right), \operatorname{vec}(g)\right) \tag{5.16}$$
Hence, the EM algorithm for the non-regularized functional, solved alternatingly with respect to U and $C^T$, is given by the iteration scheme

Algorithm 1 Alternating EM algorithm
Require: $U_0, C_0^T > 0$
1: while (k < maxiteration) and (error > tolerance) do
2:   (i) Compute $A_k$ s.t. $A_k \operatorname{vec}(U_k) = \operatorname{vec}\left(RU_k C_k^T\right)$
3:   (ii) Update $\operatorname{vec}(U_{k+1}) = \frac{\operatorname{vec}(U_k)}{A_k^T 1} \, A_k^T \, \frac{\operatorname{vec}(g)}{A_k \operatorname{vec}(U_k)}$
4:   (iii) Compute $B_k$ s.t. $B_k \operatorname{vec}\left(C_k^T\right) = \operatorname{vec}\left(RU_{k+1} C_k^T\right)$
5:   (iv) Update $\operatorname{vec}\left(C_{k+1}^T\right) = \frac{\operatorname{vec}(C_k^T)}{B_k^T 1} \, B_k^T \, \frac{\operatorname{vec}(g)}{B_k \operatorname{vec}(C_k^T)}$
6:   (v) k ← k + 1
7: end while
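A minimal Matlab sketch of one sweep of Algorithm 1, assuming the operators Ak and Bk have been reassembled for the current iterates (as in the assembly sketch above) and that the guard eps is again an implementation detail:

% One sweep of the alternating EM scheme (Algorithm 1)
gv = g(:); u = U(:); c = Ct(:);          % vectorized quantities
u  = (u ./ (Ak' * ones(size(gv)))) .* (Ak' * (gv ./ max(Ak * u, eps)));
U  = reshape(u, n, K);                   % updated indicator functions
c  = (c ./ (Bk' * ones(size(gv)))) .* (Bk' * (gv ./ max(Bk * c, eps)));
Ct = reshape(c, K, N);                   % updated concentration matrix C^T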
5.1.2 The Regularized Problem
Let us now proceed with a closer look at the regularized functional and provide an algorithm for the whole model based on the method proposed in [14], chapter 3.1. For the sake of simplicity, we first consider the simple case of reconstructing f out of $Rf = g$ with the regularized variational model
$$\operatorname*{argmin}_{f \ge 0} \; KL(Rf, g) + \operatorname{Reg}(f) \tag{5.17}$$
where $\operatorname{Reg}(f)$ contains all regularization terms. The optimality condition can easily be computed as
$$R^T \left( 1 - \frac{g}{Rf} \right) + p = 0, \quad p \in \partial \operatorname{Reg}(f) \tag{5.18}$$
where $\partial \operatorname{Reg}$ is the subdifferential of the regularization terms. So a simple iteration scheme according to the EM algorithm is given by
$$1 - \frac{1}{R^T 1} R^T \left( \frac{g}{R f_k} \right) + \frac{1}{R^T 1} p_{k+1} = 0, \quad p_{k+1} \in \partial \operatorname{Reg}(f_{k+1}) \tag{5.19}$$
This iteration scheme is problematic since the update of the primal variable $f_{k+1}$ does not appear directly in the equation, so we still have to find an explicit iteration formula. Furthermore, this scheme does not ensure the solution to be positive, which will be necessary for the later application to $C^T$. To overcome both problems, one can substitute the constant 1 by $\frac{f_{k+1}}{f_k}$, which leads to
$$f_{k+1} = \frac{f_k}{R^T 1} R^T \left( \frac{g}{R f_k} \right) - \frac{f_k}{R^T 1} p_{k+1}, \quad p_{k+1} \in \partial \operatorname{Reg}(f_{k+1}) \tag{5.20}$$
If we now separate the reconstruction (EM) step from the regularization step, equation (5.20) can be written as
$$\text{(i)} \quad f_{k+\frac{1}{2}} = \frac{f_k}{R^T 1} R^T \left( \frac{g}{R f_k} \right) \tag{5.21}$$
$$\text{(ii)} \quad f_{k+1} = f_{k+\frac{1}{2}} - \frac{f_k}{R^T 1} p_{k+1}, \quad p_{k+1} \in \partial \operatorname{Reg}(f_{k+1}) \tag{5.22}$$
The second part (ii) can be interpreted as an iteration scheme for a modified denoising problem with a weighted Gaussian data term and the regularization term included in $\operatorname{Reg}(f)$:
$$f_{k+1} = \operatorname*{argmin}_{f} \; \frac{1}{2} \int_\Omega \frac{R^T 1}{f_k} \left( f - f_{k+\frac{1}{2}} \right)^2 + \operatorname{Reg}(f) \tag{5.23}$$
This two-step iteration scheme is similar to a modified forward-backward splitting method, so we denote it as a forward-backward EM algorithm. An advantage of the two-step structure of this method is the fact that we are able to additionally control the relation between the current iterate $f_k$, which already includes the regularization part, and the current reconstruction update emerging from the first step. For this purpose, we can introduce an additional weighting parameter $\omega_k$ and replace the update (5.22) by a damped update
$$f_{k+1} = (1 - \omega_k) f_k + \omega_k f_{k+\frac{1}{2}} - \alpha \frac{f_k}{R^T 1} p_{k+1} \tag{5.24}$$
Hence, we obtain a three-step iteration scheme given by
$$\text{(i)} \quad f_{k+\frac{1}{2}} = \frac{f_k}{R^T 1} R^T \left( \frac{g}{R f_k} \right) \tag{5.25}$$
$$\text{(ii)} \quad \tilde{f}_{k+\frac{1}{2}} = \omega_k f_{k+\frac{1}{2}} + (1 - \omega_k) f_k \tag{5.26}$$
$$\text{(iii)} \quad f_{k+1} = \operatorname*{argmin}_{f} \; \frac{1}{2} \int_\Omega \frac{R^T 1}{f_k} \left( f - \tilde{f}_{k+\frac{1}{2}} \right)^2 + \omega_k \operatorname{Reg}(f) \tag{5.27}$$
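In Matlab pseudocode, one outer iteration of this scheme might read as follows, where denoise_weighted is a hypothetical helper name standing for the solution of the weighted denoising problem (5.27):

% One outer iteration of the three-step scheme (5.25)-(5.27)
sens  = R' * ones(size(g));                          % R^T 1
fhalf = (f ./ sens) .* (R' * (g ./ max(R*f, eps)));  % EM step (5.25)
ftil  = om * fhalf + (1 - om) * f;                   % damped update (5.26)
f     = denoise_weighted(ftil, sens ./ f, om);       % subproblem (5.27)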
Before we have a closer look at the solution of the subproblem (5.27), let us denote the main algorithm for our model, considering our direct representation $A \cdot \operatorname{vec}(U)$, respectively $B \cdot \operatorname{vec}(C^T)$, of $\operatorname{vec}(RUC^T)$. The two resulting denoising subproblems arising in the last step of the described method for both U and $C^T$ are
$$\operatorname*{argmin}_{U} \; \frac{1}{2} \int_\Omega \frac{A^T 1}{U_k} \left( U - \tilde{U}_{k+\frac{1}{2}} \right)^2 + \omega_k \alpha \sum_i \|\nabla U_i\|_1 + \omega_k \beta \|U\|_1 + \omega_k \delta_S(U) \tag{5.28}$$
$$\operatorname*{argmin}_{C^T} \; \frac{1}{2} \int \frac{B^T 1}{C_k^T} \left( C^T - \tilde{C}^T_{k+\frac{1}{2}} \right)^2 + \omega_k \frac{\delta}{2} \sum_i \|\nabla_t C_i\|_2^2 + \omega_k \delta_+\left(C^T\right) \tag{5.29}$$
Hence, the algorithm for the regularized problem is denoted in algorithm 2.

Algorithm 2 Alternating forward-backward EM regularized algorithm
Require: $U_0, C_0^T > 0$, α, β, δ
1: while (k < maxiteration) and (error > tolerance) do
2:   Solve for U:
3:   (i) Compute $A_k$ s.t. $A_k \operatorname{vec}(U_k) = \operatorname{vec}\left(RU_k C_k^T\right)$
4:   (ii) Update $\operatorname{vec}\left(U_{k+\frac{1}{2}}\right) = \frac{\operatorname{vec}(U_k)}{A_k^T 1} \, A_k^T \, \frac{\operatorname{vec}(g)}{A_k \operatorname{vec}(U_k)}$
5:   Reshape $\operatorname{vec}\left(U_{k+\frac{1}{2}}\right)$ to matrix
6:   (iii) Update $\tilde{U}_{k+\frac{1}{2}} = \omega_k U_{k+\frac{1}{2}} + (1 - \omega_k) U_k$
7:   (iv) Solve subproblem (5.28) → $U_{k+1}$
8:   Solve for $C^T$:
9:   (v) Compute $B_k$ s.t. $B_k \operatorname{vec}\left(C_k^T\right) = \operatorname{vec}\left(RU_{k+1} C_k^T\right)$
10:  (vi) Update $\operatorname{vec}\left(C^T_{k+\frac{1}{2}}\right) = \frac{\operatorname{vec}(C_k^T)}{B_k^T 1} \, B_k^T \, \frac{\operatorname{vec}(g)}{B_k \operatorname{vec}(C_k^T)}$
11:  Reshape $\operatorname{vec}\left(C^T_{k+\frac{1}{2}}\right)$ to matrix
12:  (vii) Update $\tilde{C}^T_{k+\frac{1}{2}} = \omega_k C^T_{k+\frac{1}{2}} + (1 - \omega_k) C_k^T$
13:  (viii) Solve subproblem (5.29) → $C^T_{k+1}$
14:  k ← k + 1
15: end while
5.1.3 The Weighted Denoising Subproblems
Now we have to deal with the solution of the arising subproblems (5.28) and (5.29).
The proposed solution is based on a primal-dual algorithm introduced by Chambolle
and Pock in [17].
We generally consider a primal minimization problem, which can be written as
$$\operatorname*{argmin}_{x \in X} \; F(Kx) + G(x) \tag{5.30}$$
where $K : X \to Y$ is a continuous linear operator and X, Y are finite-dimensional real vector spaces. The associated dual problem is therefore
$$\operatorname*{argmax}_{y \in Y} \; -\left( F^*(y) + G^*(-K^* y) \right) \tag{5.31}$$
Here $F^*$ and $G^*$ denote the convex conjugates of the corresponding primal functionals, and $K^*$ is the adjoint operator of K. By combining the primal and dual problem, we attain the common saddle point problem, i.e.
$$\operatorname*{argmin}_{x \in X} \operatorname*{argmax}_{y \in Y} \; \langle Kx, y \rangle - F^*(y) + G(x) \tag{5.32}$$
If we split this expression into subproblems referring to either x or y, this leads to
$$\hat{y} = \operatorname*{argmax}_{y \in Y} \; -F^*(y) + \langle Kx, y \rangle \tag{5.33}$$
$$= \operatorname*{argmin}_{y \in Y} \; F^*(y) + \langle Kx, -y \rangle \tag{5.34}$$
$$\hat{x} = \operatorname*{argmin}_{x \in X} \; G(x) + \langle Kx, -\hat{y} \rangle \tag{5.35}$$
This is the final setting on which we want to apply a forward-backward splitting method using the proximity operator. For a general minimization problem consisting of a sum of two functions f and g, such as
$$\operatorname*{argmin}_{x} \; f(x) + g(x) \tag{5.36}$$
it can be shown (see [19]) that the solution is given by the fixed point equation
$$x = \operatorname{prox}_{\gamma f} \left( x - \gamma \nabla g(x) \right) \tag{5.37}$$
which makes use of the so-called proximity operator
$$\operatorname{prox}_f(x) := \operatorname*{argmin}_{y} \; f(y) + \frac{1}{2} \|y - x\|^2 \tag{5.38}$$
Therefore, the following iteration scheme arises intuitively:
$$x_{k+1} = \operatorname{prox}_{\gamma f} \left( x_k - \gamma \nabla g(x_k) \right) \tag{5.39}$$
Adopting this method to our saddle point problem (5.32) finally leads to a three-step algorithm, where the last step performs a relaxation between the old and the new iterate:
$$y_{k+1} = \operatorname{prox}_{\sigma F^*} \left( y_k - \sigma \nabla_y \langle K\hat{x}_k, -y \rangle \right) = \operatorname{prox}_{\sigma F^*} \left( y_k + \sigma K \hat{x}_k \right) \tag{5.40}$$
$$x_{k+1} = \operatorname{prox}_{\tau G} \left( x_k - \tau \nabla_x \langle Kx, y_{k+1} \rangle \right) = \operatorname{prox}_{\tau G} \left( x_k - \tau K^* y_{k+1} \right) \tag{5.41}$$
$$\hat{x}_{k+1} = x_{k+1} + \theta \left( x_{k+1} - x_k \right) \tag{5.42}$$
In the update (5.40), we need to compute the convex conjugate of the functional F. To avoid the direct computation, the famous Moreau identity provides a solution which is quite easy to handle. It indicates the relationship between primal and dual minimization (see [17]):
Theorem 5.1.1 (Moreau's identity).
$$x = \operatorname{prox}_{\tau f}(x) + \tau \operatorname{prox}_{\frac{1}{\tau} f^*} \left( \frac{x}{\tau} \right)$$
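Moreau's identity can be verified numerically, for instance for f = ‖·‖₁, where both proximity operators have closed forms (soft shrinkage, and the componentwise projection onto [−1, 1], since f* is the indicator of the l∞ unit ball); a small Matlab check:

% Numerical check of Moreau's identity for f = ||.||_1
tau  = 0.7; x = randn(5, 1);
soft = @(z, t) sign(z) .* max(abs(z) - t, 0);  % prox of t*||.||_1
proj = @(z) min(max(z, -1), 1);                % prox of f* (a projection)
disp(norm(x - (soft(x, tau) + tau * proj(x / tau))))  % zero up to rounding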
Now we want to look back at our weighted denoising subproblems (5.28) and (5.29) arising from algorithm 2. For both we want to specify F, G and K, so that we are able to apply the previously proposed method. For (5.28), we define
$$F_U(K_U U) := \omega_k \alpha \sum_i \|\nabla U_i\|_1 + \omega_k \beta \|U\|_1 + \omega_k \delta_S(U) \tag{5.43}$$
$$G_U(U) := \frac{1}{2} \int_\Omega \frac{A^T 1}{U_k} \left( U - \tilde{U}_{k+\frac{1}{2}} \right)^2 \tag{5.44}$$
$$K_U U := (\nabla U, U, U)^T \tag{5.45}$$
and, in the same way, for subproblem (5.29)
$$F_C\left(K_C C^T\right) := \omega_k \frac{\delta}{2} \sum_i \|\nabla_t C_i\|_2^2 + \omega_k \delta_+\left(C^T\right) \tag{5.46}$$
$$G_C\left(C^T\right) := \frac{1}{2} \int \frac{B^T 1}{C_k^T} \left( C^T - \tilde{C}^T_{k+\frac{1}{2}} \right)^2 \tag{5.47}$$
$$K_C C^T := \left( \nabla C^T, C^T \right)^T \tag{5.48}$$
According to the iteration scheme described above, we obtain an algorithm for solving (5.28) with respect to U. For the sake of simplicity, we precomputed the input argument of the first proximity step. Furthermore, we apply Moreau's identity within the second update, which decomposes into three independent parts due to the three-dimensionality of the linear operator $K_U$. The full algorithm is denoted in algorithm 3.

Algorithm 3 Algorithm for subproblem (5.28)
Require: $U_0, y_0, v_0$, α, β
1: while (k < maxiteration) and (error > tolerance) do
2:   (i) Compute $z_k = y_k + \sigma K_U U_k$
3:   (ii) Update $y_{k+1} = z_k - \sigma \operatorname*{argmin}_y \frac{1}{\sigma} F_U(y) + \frac{1}{2} \left\| y - \frac{1}{\sigma} z_k \right\|_2^2$
4:   (iii) Update $v_{k+1} = \operatorname*{argmin}_v \tau G_U(v) + \frac{1}{2} \left\| v - \left( v_k - \tau K_U^* y_{k+1} \right) \right\|_2^2$
5:   (iv) Update $U_{k+1} = v_{k+1} + \theta (v_{k+1} - v_k)$
6:   k ← k + 1
7: end while
There are still two minimization problems which have to be solved in steps (ii) and (iii). As mentioned before, the first one consists of three minimization subproblems, due to the fact that the iterate $y_{k+1}$ actually consists of three components $y_{k+1}^1$, $y_{k+1}^2$ and $y_{k+1}^3$. Therefore, we are able to solve the minimization with respect to each component independently. In the first and second case, this leads to a simple $L^2$-norm data term plus $L^1$-norm regularization, for which the solution is easily computed by the well-known soft shrinkage operator. The subproblems for $y_{k+1}^1$ and $y_{k+1}^2$ are
$$y_{k+1}^1 = z_k^1 - \sigma \operatorname*{argmin}_{y^1} \frac{1}{\sigma} \omega_k \alpha \left\| y^1 \right\|_1 + \frac{1}{2} \left\| y^1 - \frac{1}{\sigma} z_k^1 \right\|_2^2 \tag{5.49}$$
$$y_{k+1}^2 = z_k^2 - \sigma \operatorname*{argmin}_{y^2} \frac{1}{\sigma} \omega_k \beta \left\| y^2 \right\|_1 + \frac{1}{2} \left\| y^2 - \frac{1}{\sigma} z_k^2 \right\|_2^2 \tag{5.50}$$
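The soft shrinkage operator has a simple closed form; a Matlab sketch for the update (5.49), and analogously for (5.50) with β instead of α, could read (variable names like z1 and om are illustrative):

% Soft shrinkage: argmin_y mu*||y||_1 + 0.5*||y - x||_2^2
soft = @(x, mu) sign(x) .* max(abs(x) - mu, 0);
% Update (5.49): the inner argmin equals soft(z1/sigma, om*alpha/sigma)
y1 = z1 - sigma * soft(z1 / sigma, om * alpha / sigma);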
In the case of $y_{k+1}^3$, we are faced with one data term plus the projection onto the convex set S, which can be easily solved:
$$y_{k+1}^3 = z_k^3 - \sigma \operatorname*{argmin}_{y^3} \frac{1}{\sigma} \omega_k \delta_S\left(y^3\right) + \frac{1}{2} \left\| y^3 - \frac{1}{\sigma} z_k^3 \right\|_2^2 \tag{5.51}$$
Here, the solution of the minimization problem is simply given by
$$y^3 = \operatorname{Proj}_S \left( \frac{1}{\sigma} z_k^3 \right) \tag{5.52}$$
In step (iii), the new iterate $v_{k+1}$ is given by the minimizing argument of two $L^2$-norm data terms, so it can be determined directly by solving the following pointwise multiplication equation with respect to v:
$$\left( \frac{A_k^T 1}{U_k} + \frac{1}{\tau} \right) v = \frac{A_k^T 1}{U_k} \tilde{U}_{k+\frac{1}{2}} + \frac{1}{\tau} \left( v_k - \tau K_U^* y_{k+1} \right) \tag{5.53}$$
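Since (5.53) is a pointwise equation, it can be solved by elementwise division; a one-line sketch, where w, Util and KUadj_y are illustrative names for $A_k^T 1 / U_k$, $\tilde{U}_{k+1/2}$ and $K_U^* y_{k+1}$:

% Direct pointwise solution of (5.53); w = A_k^T 1 ./ U_k is the weight
v = (w .* Util + (vk - tau * KUadj_y) / tau) ./ (w + 1 / tau);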
The similar method for the second subproblem (5.29) with respect to $C^T$ is derived equivalently and summarized in algorithm 4. As in the first one, the update in step (ii) can be done by solving two separate minimization problems, of which one is composed of two $L^2$ data terms and one of a data term and the projection onto the positive orthant:
$$y_{k+1}^1 = z_k^1 - \sigma \operatorname*{argmin}_{y^1} \frac{1}{\sigma} \omega_k \frac{\delta}{2} \left\| y^1 \right\|_2^2 + \frac{1}{2} \left\| y^1 - \frac{1}{\sigma} z_k^1 \right\|_2^2 \tag{5.54}$$
$$y_{k+1}^2 = z_k^2 - \sigma \operatorname*{argmin}_{y^2} \frac{1}{\sigma} \omega_k \delta_+\left(y^2\right) + \frac{1}{2} \left\| y^2 - \frac{1}{\sigma} z_k^2 \right\|_2^2 \tag{5.55}$$
As before, both allow the direct computation of a minimizer. In the same way, the update in step (iii) can be done by direct computation.
Algorithm 4 Algorithm for subproblem (5.29)
Require: $C_0^T, y_0, v_0$, δ
1: while (k < maxiteration) and (error > tolerance) do
2:   (i) Compute $z_k = y_k + \sigma K_C C_k^T$
3:   (ii) Update $y_{k+1} = z_k - \sigma \operatorname*{argmin}_y \frac{1}{\sigma} F_C(y) + \frac{1}{2} \left\| y - \frac{1}{\sigma} z_k \right\|_2^2$
4:   (iii) Update $v_{k+1} = \operatorname*{argmin}_v \tau G_C(v) + \frac{1}{2} \left\| v - \left( v_k - \tau K_C^* y_{k+1} \right) \right\|_2^2$
5:   (iv) Update $C_{k+1}^T = v_{k+1} + \theta (v_{k+1} - v_k)$
6:   k ← k + 1
7: end while
Finally, the whole reconstruction method as summarized in algorithm 2 consists of two alternating update steps, each of which contains a full updating procedure via algorithm 3, respectively 4.
5.2 A Primal-Dual Algorithm for Constrained Minimization

As in the previous section, we are facing the two minimization problems (5.2) and (5.3), which can be solved alternatingly. In this section, we are going to present an alternative way of solving (5.2) with respect to U, which makes use of a primal-dual method especially designed for a (constrained) problem of the general form
$$\operatorname*{argmin}_{U \in S} \; F \circ B(U) + G(U) \tag{5.56}$$
The following algorithm for the unconstrained problem was first presented by Chen, Huang and Zhang in [18] and is then adapted to the constrained version.
A simple way of providing an iteration scheme is given by the forward-backward splitting method introduced by Combettes and Wajs [19], which makes use of the proximity operator (see (5.38)). We assume that G has a Lipschitz-continuous gradient with Lipschitz constant $\frac{1}{\beta}$, $\beta \in (0, \infty)$. Then the iteration scheme is given by
$$U_{k+1} = \operatorname{Prox}_{\gamma F \circ B} \left( U_k - \gamma \nabla G(U_k) \right) \tag{5.57}$$
with a proximal constant $\gamma \in (0, 2\beta)$. To apply this general framework to our model, we simplify the model by defining $G(U) = 0$, so that all terms are included in F, and therefore avoid the computation of $\nabla G$ and its Lipschitz constant. Hence, for the application we use the setting
$$F(BU) := KL\left(RUC^T, g\right) + \alpha \sum_{k=1}^K \|\nabla U_k\|_1 + \beta \|U\|_1 \tag{5.58}$$
$$BU := \left( UC^T, \nabla U, U \right)^T \tag{5.59}$$
The remaining iteration scheme simply consists of computing the proximity operator $\operatorname{prox}_{\gamma F \circ B}(U_k)$ (while in case of $G(U) \neq 0$, there is an additional forward step which includes the computation of the gradient of G and is the reason why we speak of a forward-backward type method). Since there is no explicit expression for the proximity operator of $\gamma F \circ B$, we have to find a way to further describe the new update $U_{k+1}$. Therefore, we define the operator
$$H_{U_k}(v) := \left( I - \operatorname{prox}_{\frac{1}{\lambda} F} \right) \left( B U_k + \left( I - \lambda B B^T \right) v \right) \tag{5.60}$$
where $\lambda_{\max}$ is the largest eigenvalue of $BB^T$ and $\lambda \in (0, \lambda_{\max})$. Then one can prove that, if $v^*$ is a fixed point of $H_{U_k}$, i.e. $v^* = H_{U_k}(v^*)$, then $\operatorname{prox}_{\gamma F \circ B}(U_k) = U_k - \lambda B^T v^*$ (see [18]). Therefore, within each outer iteration, we need to compute the fixed point of $H_{U_k}$ to update the variable $U_k$. This results in several inner iterations per outer iteration step to reach a good approximation of the fixed point. Thus, an intuitive idea to accelerate the method is to simply perform one inner fixed point update using the old iterate as an initial guess and then directly proceed with the update of U. So basically we obtain a two-step iteration scheme:
$$\text{(i)} \quad v_{k+1} = \left( I - \operatorname{Prox}_{\frac{1}{\lambda} F} \right) \left( B U_k + \left( I - \lambda B B^T \right) v_k \right) \tag{5.61}$$
$$\text{(ii)} \quad U_{k+1} = U_k - \lambda B^T v_{k+1} \tag{5.62}$$
Now we finally want to consider the constrained version of our general problem (5.56). This constraint can easily be respected by inserting a projection term into our algorithmic framework. For this purpose, we first rewrite the operator H as
$$H_{U_k}(v_k) = \left( I - \operatorname{Prox}_{\frac{1}{\lambda} F} \right) \left( B U_k + \left( I - \lambda B B^T \right) v_k \right) \tag{5.63}$$
$$= \left( I - \operatorname{Prox}_{\frac{1}{\lambda} F} \right) \Big( B \underbrace{\left( U_k - \lambda B^T v_k \right)}_{=: y_{k+1}} + \, v_k \Big) \tag{5.64}$$
Finally, we obtain a three-step algorithm for the variable U by
$$\text{(i)} \quad y_{k+1} = \operatorname{Proj}_S \left( U_k - \lambda B^T v_k \right) \tag{5.65}$$
$$\text{(ii)} \quad v_{k+1} = \left( I - \operatorname{Prox}_{\frac{1}{\lambda} F} \right) \left( B y_{k+1} + v_k \right) \tag{5.66}$$
$$\text{(iii)} \quad U_{k+1} = \operatorname{Proj}_S \left( U_k - \lambda B^T v_{k+1} \right) \tag{5.67}$$
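A compact Matlab sketch of one iteration of (5.65)-(5.67), where proj_S and prox_F are assumed helper functions for the projection onto S and the proximity operator of $\frac{1}{\lambda}F$ (both named here purely for illustration):

% One iteration of the constrained primal-dual scheme (5.65)-(5.67)
y = proj_S(U - lambda * (B' * v));    % (5.65)
w = B * y + v;                        % argument of the prox step
v = w - prox_F(w);                    % (5.66): (I - Prox_{F/lambda})(By + v)
U = proj_S(U - lambda * (B' * v));    % (5.67)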
The full alternating primal-dual updating process is shown in algorithm 5. The remaining minimization subproblems from $v_{k+1}$ are split into two $L^2$-$L^1$-models, which can easily be solved using the soft shrinkage operator, and one functional containing the Kullback-Leibler data term plus $L^2$-norm. For the latter, we again applied the forward-backward EM-type method presented in the previous section to split the problem and replace the KL data term by another weighted $L^2$-norm Gaussian term.
Algorithm 5 Primal-dual algorithm for constrained minimization
Require: $U_0, C_0^T > 0$, α, β, δ
1: while (k < maxiteration) and (error > tolerance) do
2:   Solve for U:
3:   (i) Compute $B_k$ s.t. $B_k \operatorname{vec}(U_k) = \left( \operatorname{vec}\left(U_k C_k^T\right), \operatorname{vec}(\nabla U_k), \operatorname{vec}(U_k) \right)^T$
4:   (ii) Update $\operatorname{vec}(y_{k+1}) = \operatorname{Proj}_S \left( \operatorname{vec}(U_k) - \lambda B_k^T \operatorname{vec}(v_k) \right)$
5:   Reshape $\operatorname{vec}(y_{k+1})$ to matrix
6:   (iii) Solve subproblem (5.66) → $v_{k+1}$
7:   (iv) Update $\operatorname{vec}(U_{k+1}) = \operatorname{Proj}_S \left( \operatorname{vec}(U_k) - \lambda B_k^T \operatorname{vec}(v_{k+1}) \right)$
8:   Reshape $\operatorname{vec}(U_{k+1})$ to matrix
9:   Solve for $C^T$:
10:  (v) Compute $D_k$ s.t. $D_k \operatorname{vec}\left(C_k^T\right) = \operatorname{vec}\left(RU_{k+1} C_k^T\right)$
11:  (vi) Update $\operatorname{vec}\left(C^T_{k+\frac{1}{2}}\right) = \frac{\operatorname{vec}(C_k^T)}{D_k^T 1} \, D_k^T \, \frac{\operatorname{vec}(g)}{D_k \operatorname{vec}(C_k^T)}$
12:  Reshape $\operatorname{vec}\left(C^T_{k+\frac{1}{2}}\right)$ to matrix
13:  (vii) Update $\tilde{C}^T_{k+\frac{1}{2}} = \omega_k C^T_{k+\frac{1}{2}} + (1 - \omega_k) C_k^T$
14:  (viii) Solve subproblem (5.29) → $C^T_{k+1}$
15:  k ← k + 1
16: end while
6 Computational Results
In this chapter, we want to present some results of the previously presented methods on artificial data. For this purpose, we created two different data sets, which are presented in section 6.1. Afterwards, we show the reconstructed images and error measures for both algorithms in section 6.2. A comparison of both methods is given in section 6.3. All the results (with some marked exceptions) were computed with MATLAB® (The MathWorks, Inc., Natick, MA).
6.1 Test Data
To test our reconstruction method, we first created two different image phantoms consisting of 64 × 64 pixels and containing a different number of subregions. Based on these subregions, we created three-dimensional matrices containing the true image at one time step within each slice. The underlying concentration in each subregion is related to a realistic shape of the time-dependent behaviour of the tracer in different tissue types.
The first data set consists of three subregions, which are arranged in two circles (figure
6.1 (a)). The outer region represents the surrounding area around the object of interest, where the concentration equals zero. The concentration in every subregion of the
first data set is shown in figure 6.2 (a).
The second artificial data set is also a 64 × 64 pixel image, which consists of four
subregions with a slightly higher complexity (figure 6.1 (b)). As before, the tracer
concentration in the outer region is equal to zero. The two smaller circles are assumed
to belong to the same tissue type and therefore to the same subregion. Figure 6.2 (b)
shows the concentration curves.
[Figure 6.1: Image phantoms of the different data sets; (a) first data set, (b) second data set.]

[Figure 6.2: Concentration curves of the different data sets with 90 time steps; (a) first data set, (b) second data set.]

To receive the artificial SPECT data, we applied a Radon transform matrix written in C++. We assume a double detector gamma camera, which counts photons from two opposing projection angles per time step and then rotates clockwise by two degrees. Each collimator consists of 85 detector bins, so we obtain 190 data points per time step and projection angle. The resulting sinogram data of the two underlying data sets are shown in figure 6.3.
6.2 Reconstruction Results
6.2.1 Reconstruction Without Regularization
In section 4.1, we outlined the necessity of a priori information to obtain a suitable solution of the variational model. To verify the benefits of our assumptions and conditions by the computational results, we want to have a short look at the results without a priori information or the proposed regularization terms.

[Figure 6.3: Sinogram data of the two data sets; (a) first data set, (b) second data set.]

In this context, we pointed out that reconstruction without any a priori information cannot lead to a practical solution, since we naturally only have access to one or two projections per time step. To make this clear, we first minimized the Kullback-Leibler divergence of the image sequence f with given data g and applied the unregularized EM algorithm without using the basis representation. Obviously, the results cannot be compared to the favoured solution, which correlates with our knowledge about anatomical reality and the physical behaviour of a tracer substance in an organism. Figures 6.4 and 6.5 (second row) show the independently reconstructed images at certain time steps.
In a second step, we applied the basis representation of f and tested a simple alternating EM algorithm as denoted in section 5.1.1 without including any regularization
term. The results are visualized in the third row of figures 6.4 and 6.5.
Compared to the exact solution, neither approach provides a satisfying result. Hence, in the next step, we will show that the application of the proposed regularization terms included in our variational model (4.14) noticeably improves the image quality with regard to both the detection of the subregional borders and the tracer concentration curves.
6.2.2 Reconstruction With Regularization Via Algorithm 2
The algorithm presented in section 5.1 reconstructs both the borders of the subregions and the underlying tracer concentrations. We assume the sinogram data shown in figure 6.3, as well as the number of subregions (3 in the first and 4 in the second data set), as known a priori. Furthermore, we included all regularization terms as described in section 4.2.

[Figure 6.4: First data set: Exact solution (first row), solution of static SPECT reconstruction for each time step independently (second row) and reconstruction with basis representation without regularization (third row); columns show time steps 1, 5, 10, 15, 25, 50, 90.]

[Figure 6.5: Second data set: Exact solution (first row), solution of static SPECT reconstruction for each time step independently (second row) and reconstruction with basis representation without regularization (third row); columns show time steps 1, 5, 10, 15, 25, 50, 90.]
To initialize the algorithm, we first created start values for U and $C^T$. For this purpose, a static SPECT reconstruction was performed by neglecting the time component in the given sinogram data and assuming every projection to result from the same static image. Out of this time-independent reconstruction $f_{static}$, we computed an initial guess $U_0$ via a TV-$L^1$-regularized model with a Gaussian data term. To obtain an initial $C_0^T$, we first performed a static SPECT reconstruction on every time step image independently (where only two projection angles for each reconstruction were used) to receive $f_0$, and reconstructed $C_0^T$ out of $f_0 = U_0 C_0^T$ via least squares.
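In Matlab, this least-squares step is essentially a one-liner; a sketch, where the final nonnegativity clamp is an assumption on our side rather than part of the description above:

% Least-squares initialization of C_0^T from f_0 = U_0 C_0^T
C0t = U0 \ f0;       % solves the overdetermined system column by column
C0t = max(C0t, 0);   % assumed: keep the initial concentrations nonnegative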
We tested the method for both data sets and compared the results with the underlying true concentration at every point in time. Figures 6.6 and 6.7 show the comparison between the exact and the reconstructed image after a certain number of time steps. To test the behaviour of the method in the presence of noise, we added Poisson-distributed noise to the sinogram data. Therefore, we first scaled the sinogram by a certain scaling parameter, added Poisson noise and finally rescaled the image to the original range. Figures 6.6 and 6.7 also show the results for different scaling parameters, which cause an error in the sinogram data with different SNRs.
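A minimal Matlab sketch of this noise model, using poissrnd from the Statistics Toolbox; the value of the scaling parameter s is an assumption for illustration — larger s means more simulated counts and hence a higher SNR:

% Poisson noise with adjustable level via a scaling parameter s
s = 10;                          % assumed value of the scaling parameter
g_noisy = poissrnd(s * g) / s;   % scale, draw Poisson counts, rescale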
We see that the reconstruction via algorithm 2 works quite well for the simple data set consisting of an inner and an outer circle. Even with a higher level of noise, the reconstructed subregions are almost correct, and so is the tracer concentration in every region. Especially the reconstructed images resulting from data with a small noise level (the third row in figure 6.6) do not differ significantly from the reconstruction based on the exact sinogram (the second row in figure 6.6); at least the difference is not visible to the unaided eye. In both cases, only the borders of the subregions cause slight problems compared to the exact solution.
Although some limitations of the method become visible in case of a very high noise level (the last row in figure 6.6), the reconstructed images still look quite reasonable. With an increasing noise level, the exact detection of the subregional borders becomes more difficult, hence the edges in the reconstructed images are more blurry. Furthermore, the regions contain some 'false' pixels, which are assigned to the wrong
subregion. Nevertheless, the main structure of the exact image is preserved, which proves that the reconstruction method even works for highly noisy data.

[Figure 6.6: First data set: Comparison between exact images (first row), reconstructed images with exact data (second row) and with noisy data: SNR 41.7494 (third row), SNR 21.8651 (fourth row), SNR 11.6340 (fifth row); columns show time steps 1, 5, 10, 15, 25, 50, 90.]
In figure 6.7, one can see the comparison between the reconstructions of the second data set, which differs from the first one mainly in the complexity of the subregional borders. Even in this case, the reconstruction method performs very well. For noise-free data, the shape of every object, of which especially the heart is of higher interest, is clearly defined. As expected, we often observe errors at the edges of each region and where two regions are directly connected (the heart and the upper left circle), which causes the algorithm to incorrectly assign these pixels to another region.
Furthermore, the reconstruction difficulties increase with an increase in noise (cp. rows three, four and five of figure 6.7).

[Figure 6.7: Second data set: Comparison between exact images (first row), reconstructed images with exact data (second row) and with noisy data: SNR 46.6872 (third row), SNR 26.8069 (fourth row), SNR 16.7458 (fifth row); columns show time steps 1, 3, 10, 20, 30, 50, 90.]

Some more pixels are assigned to the wrong region,
which leads to a small hole-like structure within the heart region and causes a strong blurring effect, especially in case of a high noise level (the last row in figure 6.7). At this point, we want to mention that the reconstruction of regions which do not touch any of the others (except for the outer region with zero concentration) seems to be more effective, since even with a very low SNR, the edges of the two smaller circles are clearly defined in most time steps.
We also visualized the reconstructed concentration curves for exact and noisy data
(see figure 6.8). Here, the effect of the added noise becomes significantly more visible. What immediately attracts attention is that the concentration curves from noisy data appear more blurry and less smooth than the exact ones and also than the reconstructed ones based on exact data. Nevertheless, the overall behaviour of the concentration curves generally matches the underlying true functions. It is also noticeable that the peak in the highest concentration curve is heavily overestimated, especially in case of the second data set. At this point, we have to keep in mind that the reconstructed indicator functions are allowed to contain values in between 0 and 1, which means that at some points the value for one pixel can be small and is therefore compensated by the value of the concentration at one time step. We denote this observation as the compensation effect. It becomes visible in the resulting image sequence: we can see, for example, that the concentration in the heart-shaped region of the second data set does not become significantly higher than the true one, although the peak of the reconstructed concentration curve is much higher. In case of the second data set, the difference between the two reconstructed versions (either with exact or with noisy data) is smaller due to the lower signal-to-noise ratio of the noisy data (cp. tables 6.1 and 6.2).
[Figure 6.8: Concentration curves of the different data sets with 90 time steps; (a) first data set, (b) second data set.]
Error Measures
To compare the results of the reconstruction via algorithm 2 with the true image data, we need to have a look at the full three-dimensional image matrix as well as the underlying regions and concentration curves.
Since the reconstructed indicator functions do not only consist of zero or one entries, we compared the values of each indicator function in every pixel and set the highest value to one. Therefore, a pixel is assumed to belong to one region if the value of the indicator function of this region is the highest among all indicator functions. Then we computed the error as the number of incorrectly assigned pixels divided by the total number of image pixels, i.e. the percentage of incorrectly assigned pixels. Furthermore, we compared the subregional concentration curves as well as the whole three-dimensional (two space dimensions plus time) image by computing the signal-to-noise ratio of the exact and the reconstructed image with or without noise. Tables 6.1 and 6.2 show the quantified results of the experiments.
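A Matlab sketch of this error measure, where Urec and Utrue are illustrative names for the reconstructed and true indicator function matrices (one column per region, one row per pixel):

% Percentage of incorrectly assigned pixels
[~, rec] = max(Urec, [], 2);     % label = region with largest indicator value
[~, tru] = max(Utrue, [], 2);
errU = nnz(rec ~= tru) / numel(rec);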
              Exact       Noisy 1        Noisy 2        Noisy 3
                          SNR=41.7494    SNR=21.8651    SNR=11.6340
error(U)      0.0049      0.0049         0.0083         0.1479
SNR(C)        16.9667     17.3475        17.7693        3.8284
SNR(f)        18.2950     17.9783        15.4435        10.8089

Table 6.1: Error measures of the first data set with algorithm 2
              Exact       Noisy 1        Noisy 2        Noisy 3
                          SNR=46.6872    SNR=26.8069    SNR=16.7458
error(U)      0.0913      0.0796         0.0356         0.4146
SNR(C)        5.8281      6.1968         7.7954         -2.7950
SNR(f)        17.8015     15.7741        14.5378        11.9395

Table 6.2: Error measures of the second data set with algorithm 2
The value error(U) describes the percentage of falsely assigned pixels (after the reconstructed indicator functions were set to zero or one in every pixel). We see that in every case this percentage is relatively low; when a low noise level is added to the data, the number of incorrect pixels does not change at all. Nevertheless, this number is several times higher in case of the highest noise level (0.49% compared to 14.79%). Note here that we artificially enforced the zero-one structure of the reconstructed indicator functions, which naturally contain values in between zero and one as well. This automatically leads to another visible effect in the measurements, namely that the SNR of the tracer concentrations in the subregions might increase with a higher noise level while the SNR of f slightly worsens, due to the compensation between the values in U and C. Therefore, because of the compensation effect, it is most reasonable to compare the final results of the method via the whole reconstructed image f. One can see that the SNR of f significantly decreases when higher noise is added. While a lower noise level hardly influences the error (as we already pointed out by regarding the reconstructed images), the effect of the highest noise level becomes obvious in the resulting SNR.
The errors of the second data set again visualize the compensation effect described above: Comparing the values resulting from exact data with those from data with a low noise level (first and second column of table 6.2), one can see that both the error in the indicator functions and the error in the subregional concentration curves decrease, but nevertheless the SNR of the image sequence f worsens. Furthermore, the difficulties of detecting the more complicated edges within the second data set become visible, resulting in a very high percentage of falsely assigned pixels in case of a high noise level (41.46%). Still, the quantified results of the proposed method look quite promising.
6.2.3 Reconstruction With Regularization Via Algorithm 5
To compare the effectiveness and suitability of the proposed EM-type method, we also tested the second algorithm using the same data sets.
The performance of the second algorithm is quite promising as well, although there are some limitations, especially within the second data set. The subregions of the first data set are reconstructed preserving the original structure, and the tracer concentrations match the underlying true ones. One problem occurring especially with this method is a blurring effect which appears in the structure of the subregions and looks like an additional 'stripe' in the image. A natural explanation could be the following: since the tracer concentration in the middle region is high at the projection angle from which the stripe seems to originate, the algorithm assumes that the concentration is higher along this whole projection line. Another observation is that the reconstructed tracer concentrations in the subregions seem to become smaller when higher noise is added, since the colour intensity slightly decreases. This observation is confirmed by the comparison of the subregional concentrations (figure 6.11): We see that the peak of the concentration curves with medium or high level noise is significantly lower than the true one. Here, the compensation effect does not completely balance the values of the indicator function and tracer concentration.
As previously mentioned, some drawbacks of the method can be observed when viewing
the results of the algorithm applied to the second data set. Although the subregions
are more or less clearly identified, there are many artifacts which already appear in the
reconstruction without noise and become more obvious with a higher noise level.
[Figure 6.9: First data set: Comparison between exact images (first row), reconstructed images with exact data (second row) and with noisy data: SNR 41.7494 (third row), SNR 21.8651 (fourth row), SNR 11.6340 (fifth row); columns show time steps 1, 5, 10, 15, 25, 50, 90.]
As before, we can see that the reconstructed concentrations more or less match the true curves. Furthermore, the higher peak in the tracer concentration of the inner circle, respectively the heart, is still present. Figure 6.11 shows a comparison of the exact and reconstructed functions.
[Figure 6.10: Second data set: Comparison between exact images (first row), reconstructed images with exact data (second row) and with noisy data: SNR 46.6872 (third row), SNR 26.8069 (fourth row), SNR 16.7458 (fifth row); columns show time steps 1, 3, 10, 20, 30, 50, 90.]
Error Measures
As in case of the first method described in the previous section, we also quantified the results using algorithm 5. The errors are displayed in tables 6.3 and 6.4.

[Figure 6.11: Concentration curves of the different data sets with 90 time steps; (a) first data set, (b) second data set.]
              Exact       Noisy 1        Noisy 2        Noisy 3
                          SNR=41.7494    SNR=21.8651    SNR=11.6340
error(U)      0.0278      0.0278         0.0244         0.0752
SNR(C)        2.8113      2.7246         8.7975         7.3356
SNR(f)        14.9491     14.9380        12.5798        8.2739

Table 6.3: Error measures of the first data set with algorithm 5
              Exact       Noisy 1        Noisy 2        Noisy 3
                          SNR=46.6872    SNR=26.8069    SNR=16.7458
error(U)      0.0615      0.0669         0.1152         0.1846
SNR(C)        3.1283      3.2462         4.0798         2.9408
SNR(f)        14.5378     13.7105        14.1849        11.2836

Table 6.4: Error measures of the second data set with algorithm 5
Here, we obtain results similar to the ones before: With increasing noise, the assignment of pixels to the correct subregion becomes more and more difficult, which leads to an increasing value of error(U). While the SNR of the reconstructed subregional concentrations fluctuates, the value of SNR(f) mostly decreases due to the higher noise level.
6.3 Comparison of the Methods
In this section, we want to discuss and compare the results of the two proposed methods in order to verify their effectiveness. The reconstructed image sequences as well as the error tables serve as a basis for the following. We see that algorithm 2 is the method of choice for the problem we deal with in this thesis; however, algorithm 5 produced suitable results as well.
From the comparison of the image sequences given in figures 6.6, 6.7, 6.9 and 6.10, we conclude that the visible quality of the reconstruction seems to be better in case of the first method. Especially with noise-free data or data with a low or medium noise level, the method produces fewer artifacts and preserves the edges of the underlying true subregions very well.
When comparing the percentage of incorrectly assigned pixels of both methods, it is striking that the spread between noise-free and extremely noisy data is notably lower in case of the second method. Whereas the values for algorithm 2 change from 0.49% to 14.79% for the first and from 9.13% to a critical 41.46% for the second data set, the values for algorithm 5 lie closer to each other (from 2.78% to 7.52% for the first set and from 6.15% to 18.46% for the second set). Hence, the reconstruction of the indicator functions seems to be less susceptible to noise.
If we examine the different SNRs of the whole image sequence f, the forward-backward EM-type method performs better in every case. For the same number of iterations and for each reconstructed set, the algorithm provides a higher SNR compared to the corresponding reconstruction via algorithm 5. Also, the SNR of the concentration in the subregions mostly improves with the first method.
All in all, we can conclude that the overall performance of the EM-type method proposed in section 5.1 leads to promising results, and it is the preferred algorithm for this variational model.
7 Conclusion and Outlook
In this thesis, we presented a new simultaneous reconstruction approach in dynamic SPECT imaging, derived and implemented a suitable variational model and presented promising results on artificial data. In conclusion, we are able to state that the aim of providing an appropriate reconstruction scheme has been achieved.
After having introduced the basic tools to provide a detailed understanding of the main parts of this work, we presented a decomposition approach for the time-dependent tracer concentration. To this end, we made the assumption that the region of interest can be separated into fixed subregions with spatially constant subconcentration curves. In this context, we derived a constrained variational model which aims at reconstructing both the indicator functions of the subregions and the time concentration curves. Thereby, we avoided the need for certain basis functions for the reconstruction, which is a common approach in dynamic PET or SPECT imaging. After establishing the optimality and source condition for the model, we showed that a minimizer of the variational model exists.
In order to find a minimizer of the proposed model, we presented two different algorithmic frameworks, which simultaneously update the two variables in a reasonable time period. The first method, which consists of an alternating expectation-maximization updating structure, gives rise to several subproblems, which are in turn solved via a primal-dual method. The second algorithm is especially suitable for constrained minimization and was applied to the updating procedure of the indicator functions of the subregions.
Both methods yield promising results, which we have demonstrated on both exact and noisy artificial data. For this purpose, we created different data sets which satisfy the reconstruction assumptions and tested the proposed algorithms on different noise levels. We have seen that, even in the presence of a very low signal-to-noise ratio, all the results were quite plausible and the reconstructed image sequences significantly match the exact ones. Especially the first method provides outstanding results. The reconstructions prove that the choice of the regularization methods as well as the reconstruction approach is reasonable.
In the following, we want to outline some limitations and open questions arising from this thesis that could be addressed in future work:
• Due to the usage of the Kronecker product to simplify the implementation, this work has been limited to tests on low resolution data (i.e. image sequences of 64 × 64 pixels and a temporal resolution of 90 time steps). One enhancement of this work could be to test the behaviour of the proposed method on high resolution data. This would especially enable experiments with real clinical dynamic SPECT data, which is the main goal of all works concerning medical imaging.
• Another drawback we have not yet addressed in this work is the strong dependence on the parameters: the chosen variational framework contains at least three regularization weights, as in the constrained version. Furthermore, some of the proposed algorithms require additional proximal parameters, which have to be chosen with respect to certain properties of the data and the regularization functionals. A future task could be to discuss the parameter choice in detail and maybe to improve the model by eliminating some of the variables.
• In order to make the approach applicable to real data, one also has to face the problem of extending the idea to three-dimensional image sequences. Therefore, a slightly different implementation of the forward operator, i.e. the Radon transform, is necessary. Furthermore, this would significantly enlarge the number of pixels in each frame, which automatically leads to the problem we have addressed in the first point.
List of Figures

2.1 A typical double detector SPECT scanner [1]
2.2 3D SPECT reconstruction of the brain: Transaxial slices (first row), coronal slices (second row), sagital slices (third row) [41]
2.3 A circular PET camera [41]
2.4 The collimator principle in SPECT imaging [41]
2.5 Visualization of a two-dimensional object and the projection process [41]
3.1 A constant and a Poisson noise-corrupted signal do not have significantly different integrals
3.2 Comparison of the effect of different regularizers for denoising [14]
4.1 Compartmental model including blood flow [41]
4.2 The typical shape of the concentration curves of the radiotracer in different types of tissue
6.1 Image phantoms of different data sets
6.2 Concentration curves of the different data sets with 90 time steps
6.3 Sinogram data of the two data sets
6.4 First data set: Exact solution (first row), solution of static SPECT reconstruction for each time step independently (second row) and reconstruction with basis representation without regularization (third row)
6.5 Second data set: Exact solution (first row), solution of static SPECT reconstruction for each time step independently (second row) and reconstruction with basis representation without regularization (third row)
6.6 First data set: Comparison between exact images (first row), reconstructed images with exact data (second row) and with noisy data: SNR 41.7494 (third row), SNR 21.8651 (fourth row), SNR 11.6340 (fifth row)
6.7 Second data set: Comparison between exact images (first row), reconstructed images with exact data (second row) and with noisy data: SNR 46.6872 (third row), SNR 26.8069 (fourth row), SNR 16.7458 (fifth row)
6.8 Concentration curves of the different data sets with 90 time steps
6.9 First data set: Comparison between exact images (first row), reconstructed images with exact data (second row) and with noisy data: SNR 41.7494 (third row), SNR 21.8651 (fourth row), SNR 11.6340 (fifth row)
6.10 Second data set: Comparison between exact images (first row), reconstructed images with exact data (second row) and with noisy data: SNR 46.6872 (third row), SNR 26.8069 (fourth row), SNR 16.7458 (fifth row)
6.11 Concentration curves of the different data sets with 90 time steps
List of Tables

6.1 Error measures of the first data set with algorithm 2
6.2 Error measures of the second data set with algorithm 2
6.3 Error measures of the first data set with algorithm 5
6.4 Error measures of the second data set with algorithm 5
Bibliography

[1] SPECT gamma camera, http://img.medicalexpo.fr/images_me/photo-g/tomographie-gamma-cameras-spect-70717-3803911.jpg, date: 2014-07-17, 2:35 pm.
[2] R. Acar and C. R. Vogel. Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems, 10:1217–1229, 1994.
[3] R. A. Adams and J. J. F. Fournier. Sobolev Spaces. Academic Press, Oxford: 2003 (Second Edition).
[4] H. Ahmadzadehfar and H.-J. Biersack, editors. Clinical Applications of SPECT-CT. Springer, Berlin: 2014.
[5] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Clarendon Press, Oxford: 2000.
[6] E. Asma and R. M. Leahy. Mean and covariance properties of dynamic PET reconstructions from list-mode data. IEEE Transactions on Medical Imaging, 25(1):42–54, 2006.
[7] R. C. Aster, B. Borchers, and C. H. Thurber. Parameter Estimation and Inverse Problems. Elsevier Academic Press, Oxford: 2005 (Second Edition).
[8] M. Benning. Singular Regularization of Inverse Problems: Bregman Distances and their Applications to Variational Frameworks with Singular Regularization Energies. PhD thesis, Westfälische Wilhelms-Universität Münster, 2011.
[9] M. Benning and M. Burger. Error estimates for general fidelities. Electronic Transactions on Numerical Analysis, 38:44–68, 2011.
[10] M. Benning, P. Heins, and M. Burger. A solver for dynamic PET reconstructions based on forward-backward-splitting. Article, http://wwwmath.uni-muenster.de/num/publications/2010/BHB10/ICNAAM2010.pdf, date: 2014-9-19, 11:33 am.
[11] P. Blake, B. Johnson, and J. VanMeter. Positron emission tomography (PET) and single photon emission computed tomography (SPECT): Clinical applications. Journal of Neuro-Ophthalmology, 23:34–41, 2003.
[12] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge: 2009.
[13] C. Brune. 4D Imaging in Tomography and Optical Nanoscopy. PhD thesis, Westfälische Wilhelms-Universität Münster, 2010.
[14] M. Burger, A. C. G. Mennucci, S. Osher, and M. Rumpf. Level Set and PDE Based Reconstruction Methods in Imaging. Springer, Cetraro: 2008.
[15] M. Burger and S. Osher. Convergence rates of convex variational regularization. Inverse Problems, 20:1411–1421, 2004.
[16] M. Burger, E. Resmerita, and L. He. Error estimation for Bregman iterations and inverse scale space methods in image restoration. Computing, 81:109–135, 2007.
[17] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2010.
[18] P. Chen, J. Huang, and X. Zhang. A primal-dual fixed point algorithm for convex separable minimization with applications to image restoration. Inverse Problems, 29:025011, 2013.
[19] P. Combettes and V. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4(4):1168–1200, 2005.
[20] L. C. Evans. Partial Differential Equations. Oxford University Press, Oxford: 1998 (Second Edition).
[21] P. Heins. Sparse model-based reconstruction in dynamic positron emission tomography. Diploma thesis, Westfälische Wilhelms-Universität Münster, 2011.
[22] R. H. Huesmann, B. W. Reutter, G. L. Zeng, and G. T. Gullberg. Kinetic parameter estimation from SPECT cone-beam projection measurements. Physics in Medicine and Biology, 43(4):973–982, 1998.
[23] C. Li, D. Fang, G. López, and M. López. Stable and total Fenchel duality for convex optimization problems in locally convex spaces. SIAM Journal on Optimization, 20(2):1032–1051, 2009.
[24] J. Maeght, D. Noll, and A. Celler. Methods for dynamic SPECT tomography. Article, http://www.math.univ-toulouse.fr/~noll/PAPERS/methodsfordynamic.pdf, date: 2014-9-19, 11:36 am.
[25] G. Mariani, L. Bruselli, T. Kuwert, E. Kim, A. Flotats, O. Israel, M. Dondi, and N. Watanabe. A review on the clinical uses of SPECT/CT. European Journal of Nuclear Medicine and Molecular Imaging, 37:1959–1985, 2010.
[26] S. L. Pimlott and A. Sutherland. Molecular tracers for the PET and SPECT imaging of disease. Chem. Soc. Rev., 40:149–162, 2011.
[27] A. Rahmim and H. Zaidi. PET versus SPECT: Strengths, limitations and challenges. Nuclear Medicine Communications, 29:193–207, 2008.
[28] A. J. Reader, F. C. Sureau, C. Comtat, R. Trébossen, and I. Buvat. Joint estimation of dynamic PET images and temporal basis functions using fully 4D ML-EM. Physics in Medicine and Biology, 51:5455–5474, 2006.
[29] E. Resmerita and R. S. Anderssen. Joint additive Kullback-Leibler residual minimization and regularization for linear inverse problems. Mathematical Methods in the Applied Sciences, 30:1527–1544, 2007.
[30] E. Resmerita and O. Scherzer. Error estimates for non-quadratic regularization and the relation to enhancement. Inverse Problems, 22:801–814, 2006.
[31] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton: 1970.
[32] C. Rossmanith. SPECT-Dämpfungskorrektur ohne zusätzliche Messungen. Bachelor thesis, Westfälische Wilhelms-Universität Münster, 2011.
[33] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
[34] W. Rudin. Functional Analysis. McGraw-Hill Inc., Singapore: 1991 (Second Edition).
[35] O. Scherzer, O. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen. Variational Methods in Imaging. Springer, New York: 2009.
[36] D. Trede. Inverse Problems with Sparsity Constraints: Convergence Rates and Exact Recovery. Logos, Berlin: 2010.
[37] A. Unterreiter, A. Arnold, P. Markowich, and G. Toscani. On generalized Csiszár-Kullback inequalities. Monatshefte für Mathematik, 131:235–253, 2000.
[38] Z. Wang, E. P. Simoncelli, and A. C. Bovik. Multi-scale structural similarity for image quality assessment. In Proceedings of the 37th IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 9-12, 2003, https://ece.uwaterloo.ca/~z70wang/publications/msssim.pdf, date: 2014-09-22, 5:26 pm.
[39] F. Wübbeling and F. Natterer. Mathematical Methods in Image Reconstruction. SIAM, Philadelphia: 2001.
[40] D. Werner. Funktionalanalysis. Springer, Berlin: 2010.
[41] M. Wernick and J. Aarsvold. Emission Tomography - The Fundamentals of PET and SPECT. Elsevier Academic Press, Oxford: 2004.