University of Groningen
Wavelet-based methods for the analysis of fMRI time series
Wink, Alle Meije
IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to
cite from it. Please check the document version below.
Document Version
Publisher's PDF, also known as Version of record
Publication date:
2004
Link to publication in University of Groningen/UMCG research database
Citation for published version (APA):
Wink, A. M. (2004). Wavelet-based methods for the analysis of fMRI time series Groningen: s.n.
Copyright
Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the
author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).
Take-down policy
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately
and investigate your claim.
Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the
number of authors shown on this cover page is limited to 10 maximum.
Download date: 18-06-2017
Chapter 1
Introduction
The field of functional imaging uses medical imaging modalities, such as magnetic resonance imaging (MRI; see Fig. 1.1) and positron emission tomography (PET), to visualise physiological processes. Applications range from PET imaging of tumours, ultrasound imaging of coronary blood flow and magnetic resonance angiography (MRA),
to tracking metabolic activity of proteins, hormones and neurotransmitters. In functional neuroimaging, these modalities are used to visualise brain function (see Fig. 1.2).
Examples are the visualisation of neurodegenerative diseases with PET/MRI, electroencephalography (EEG) and magnetoencephalography (MEG) measurement of brain
action potentials, and visualisation of brain regions activated during the processing of
(visual, auditory, tactile) stimuli or during the execution of a specific task.
Figure 1.1. T1-weighted MR image of the head. (a) Transverse slice, (b) sagittal slice, (c) coronal slice.
Since its introduction in the early 1990s, functional magnetic resonance imaging (fMRI) has become the most influential modality for functional neuroimaging. Because of its flexibility, fMRI supports a very broad range of experiments, such as the localisation of brain regions to which pain stimuli are projected, the activation of different brain regions when listening to or reading correct words and non-words (such as ‘neuron’ vs. ‘noreun’), the localisation of brain regions activated while listening to sounds at different frequencies, or finding the centres in the brain responsible for dealing with emotions. While the first fMRI experiments were very simple and straightforward (measuring the difference between active periods and rest periods, without modelling temporal behaviour), today’s experimental setups are much more complex. Consider a visual fMRI experiment with an object recognition task. When scanning starts, the object is only vaguely visible, and as scanning proceeds, the image quality increases. As soon as the test person recognises the object, he or she presses a button. The time of the button press is recorded, and the scanning continues while the image keeps improving until the end of the run. Such an experiment, which usually involves multiple time series, requires a combination of analysis methods. The moment of recognition and the button press are events, while seeing only noise vs. recognising the object are states. A combined analysis of both the recognition event and the state difference gives very detailed information about how and where the images are processed inside the brain.
Figure 1.2. Surface rendering of the brain, with motor cortex activation displayed
on the brain surface using normal fusion, and on the orthographic planes using direct
volume rendering (thanks to Michel Westenberg for composing this picture).
Because of their size and complexity, fMRI data sets require powerful analysis methods. This thesis only treats the analysis of single-subject (first-level) experiments. Analyses of group studies (second-level experiments) require further statistical analysis.
The rest of this chapter is organised as follows. Section 1.1 introduces MR techniques relevant to fMRI and discusses fMRI analysis methods relevant to the rest of the
thesis. Section 1.2 introduces the concept of a wavelet transform and demonstrates the
algorithms to compute the wavelet transforms used in this thesis. Section 1.3 gives an
overview of the thesis and discusses its main contributions.
1.1 MRI and fMRI
This section introduces MRI in general and fMRI in particular. Some concepts in MR
physics are briefly discussed. The aim of this section is not to give a complete overview
of the MR image formation process, but to treat only the concepts relevant to fMRI. A
short overview of fMRI is given, and topics that are discussed in the next chapters are
highlighted.
1.1.1 MRI physics
Magnetic resonance imaging (MRI) is based on the fact that nuclei with an odd mass number or an odd atomic number have an intrinsic angular momentum, called spin. Each such nucleus has a magnetic moment with its axis parallel to the spin, inducing a small magnetic field (see Fig. 1.3a). In the absence of an external magnetic field, the axes of a group of nuclei are randomly oriented. In a static magnetic field, the nuclei assume preferred orientations, either parallel or anti-parallel to the static field. For nuclei whose spin number I has the form n + 1/2, n ∈ ℕ (these are the nuclei with an odd mass number), this alignment is not static: each nucleus experiences a torque, causing its magnetic axis to precess around the axis of the external field. This precession is called Larmor precession (Aine 1995, Birn et al. 1999), see Fig. 1.3b. The frequency of precession depends on the properties of the nucleus (described by the gyromagnetic ratio) and the strength of the magnetic field, and is called the Larmor frequency.
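As a numerical illustration of the Larmor relation f = γ̄B0 (with γ̄ = γ/2π the gyromagnetic ratio divided by 2π), the sketch below evaluates the precession frequency of hydrogen nuclei at two common field strengths. The value of about 42.58 MHz/T for hydrogen is a standard physical constant supplied here rather than taken from the text, and the function name is hypothetical.

```python
# Gyromagnetic ratio of the hydrogen nucleus divided by 2*pi, in MHz per tesla.
# (About 42.58 MHz/T; this constant is supplied for the sketch, not by the thesis.)
GAMMA_BAR_H_MHZ_PER_T = 42.577


def larmor_frequency_mhz(b0_tesla, gamma_bar=GAMMA_BAR_H_MHZ_PER_T):
    """Larmor frequency f = gamma_bar * B0 for a nucleus in a static field B0."""
    return gamma_bar * b0_tesla


for b0 in (1.5, 3.0):
    print(f"B0 = {b0} T -> f = {larmor_frequency_mhz(b0):.2f} MHz")
```

The frequency scales linearly with B0, which is why a field gradient (discussed below) gives every position along the gradient its own frequency.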
Figure 1.3. (a) Rapidly spinning nuclei possessing a magnetic moment induce a magnetic field (with direction µ), so that they act as tiny bar magnets. (b) Nuclei do not
align with a static magnetic field, but they precess about the axis of the field. From
“Basic principles of MR imaging” (1996), Courtesy Philips Medical Systems.
The hydrogen nucleus (a proton) has spin I = 1/2, is abundant in the human body, and has a high magnetic moment. An MR scanner consists of a number of magnetic coils generating magnetic fields with various orientations and strengths (see Fig. 1.4). A static field B0 is generated by a superconducting coil. If a person is placed inside an MR scanner, the protons inside the body will precess around B0 in two states: either aligned partly with B0 (see Fig. 1.3) or partly against B0, with a tiny preference for the first, because these nuclei are in a lower-energy state than those partly aligned against B0. This results in a net magnetisation M parallel to B0. The magnitude of this net magnetisation increases with the strength of the magnetic field, and decreases with temperature.
The net magnetisation is changed by a radio-frequency (RF) pulse, a short burst of electromagnetic energy. The direction of the RF field is orthogonal to that of B0, and it rotates around B0 with the Larmor frequency, which corresponds to the energy difference between the two preferred orientations of the nuclei. During the RF pulse, the direction of M rotates around the axis of the RF field. In the rotating frame of reference, in which the direction of the RF field is fixed, M has two components: Mz, the longitudinal component of M parallel to B0, and Mxy, the transverse component of M in the plane orthogonal to B0. In the equilibrium state, Mz is maximal and Mxy is minimal (zero). Immediately after the RF pulse, Mz is small and Mxy is large. The return to the equilibrium state that follows the RF pulse is called relaxation. Relaxation consists of two components: longitudinal relaxation (characterised by T1) and transverse relaxation (characterised by T2). Longitudinal relaxation is the recovery of the Mz component of the net magnetisation, and T1 denotes the time in which Mz regains 63% of its equilibrium value. T1 is typically short in environments where molecules can efficiently transfer energy, and a stronger magnetic field results in a longer T1. Transverse relaxation is the decay of Mxy, and T2 denotes the time in which Mxy decreases to 37% of its maximum value. Transverse relaxation results from the exchange of energy between nuclei. Short T2 values occur in solids and large molecules, long T2 values occur in fluids. T2 is largely independent of the magnetic field strength. Local field inhomogeneities greatly accelerate the decay of Mxy. The decay time T2∗ characterises the decay resulting from transverse relaxation in combination with field inhomogeneities, and is shorter than T2. The spin-echo method, which can be used to measure T2, applies a second RF pulse some time after the first. The effect of this pulse is that the spins refocus, which cancels the dephasing caused by static field inhomogeneities. This results in an echo signal at time TE (echo time). The amplitude of the echo signal depends on T2.
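The 63% and 37% figures follow from modelling both relaxation processes as exponentials: Mz(t) = M0(1 − e^(−t/T1)) after a 90° pulse, and Mxy(t) = Mxy(0) e^(−t/T2). The minimal sketch below assumes this idealised model; the echo time and the T2 and T2∗ values in the last comparison are hypothetical.

```python
import math


def longitudinal_recovery(t, t1, m0=1.0):
    """Mz(t) after a 90-degree pulse: recovery from 0 towards the equilibrium value m0."""
    return m0 * (1.0 - math.exp(-t / t1))


def transverse_decay(t, t2, mxy0=1.0):
    """Mxy(t): exponential decay from its value mxy0 just after the pulse."""
    return mxy0 * math.exp(-t / t2)


# At t = T1 and t = T2 the curves pass the 63% and 37% points quoted in the text.
print(round(longitudinal_recovery(1.0, 1.0), 3))  # 0.632
print(round(transverse_decay(1.0, 1.0), 3))       # 0.368

# Hypothetical values in ms: at the echo time, the refocused (spin-echo) signal is
# governed by T2, whereas without refocusing the signal decays with the shorter T2*.
te, t2, t2_star = 50.0, 80.0, 40.0
print(transverse_decay(te, t2) > transverse_decay(te, t2_star))  # True
```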
1.1.2 MR imaging
The formation of an image using these principles consists of a number of steps. Gradient fields are provided by extra coils. If a relatively small gradient in field strength along the longitudinal (z) axis is added to the B0 field (see Fig. 1.4), every transverse slice has its own Larmor frequency. The spacing of the slices (the distance between the centres of the slices along the z-axis) must be distinguished from the slice thickness. Spacing is controlled by the z-gradient and the central frequencies of the RF pulses; thickness is controlled by the shape and the bandwidth of each RF pulse.
Figure 1.4. Coil arrangement in an MR scanner. The direction of the gradients can be found via the right-hand rule. If the fingers of your right hand point in the direction of the current in the coil (light arrows), the thumb points in the direction of the field. In the case of the z-gradient coil, the field at one end of the B0 coil is parallel to B0, at the other end is anti-parallel to B0, creating a gradient in the z-direction. From “Basic principles of MR imaging” (1996), Courtesy Philips Medical Systems.
The (x, y)-position in a slice is determined by gradient fields orthogonal to B0 (see Fig. 1.4). With a gradient field in the x-direction, the Fourier transform of the detected signal is a projection onto the x-axis. The amplitude of each frequency along the x-axis is the sum of amplitudes measured along the y-axis at that frequency. The y-position is often determined by applying a phase-encoding gradient in the y-direction. The x-gradient is also used as the readout gradient, i.e., the field gradient that is active when the signal is received.
After termination of the RF pulse, M returns to its original (equilibrium) direction. After applying the z-gradient, an image of a 2D slice is measured in the so-called k-space, with frequencies encoded along the x-dimension and phases encoded along the y-dimension. Two-dimensional (2D) Fourier techniques are used to reconstruct an image of the slice. Using the field gradient parallel to B0 for slice-selective excitation (z) and field gradients orthogonal to B0 for frequency encoding (x) and phase encoding (y), a 2D image can be computed. By acquiring multiple slices at different values of z, a three-dimensional (3D) image is obtained (see Fig. 1.5).
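The 2D Fourier reconstruction can be imitated with NumPy's FFT routines. The sketch below assumes an idealised, noise-free, fully sampled Cartesian k-space of a hypothetical rectangular phantom; real acquisitions add noise, coil sensitivities and sampling imperfections that are ignored here.

```python
import numpy as np

# Hypothetical 64 x 64 phantom: a bright rectangle on a dark background.
image = np.zeros((64, 64))
image[20:44, 24:40] = 1.0

# Simulated k-space (axis 0: phase-encoding direction y, axis 1: frequency-encoding
# direction x), shifted so that the zero-frequency sample lies at the centre.
kspace = np.fft.fftshift(np.fft.fft2(image))

# Reconstruction: undo the shift, apply the 2D inverse FFT, and take the magnitude.
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))

print(np.allclose(recon, image))  # True
centre = tuple(int(i) for i in np.unravel_index(np.argmax(np.abs(kspace)), kspace.shape))
print(centre)  # (32, 32): the strongest k-space sample is the DC term at the centre
```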
The sequence of excitation pulses and readouts determines the image contrast. A sequence is determined by a number of scanner parameters, such as the repetition time TR and the echo time TE. A short TR and a short TE yield a T1-weighted image; a long TR and a long TE yield a T2-weighted image, or a T2∗-weighted image when a gradient-echo sequence is used. Sequences differ in speed and signal-to-noise ratio: some sequences have high contrast but require minutes to complete, others have low contrast but can be completed in less than a second. Echo planar imaging (EPI), discussed in the next subsection, is a fast sequence that is often used in functional MRI experiments.
Figure 1.5. The x-, y-, and z-dimensions of medical images.
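The effect of TR and TE on contrast can be illustrated with the simplified spin-echo signal model S ≈ ρ(1 − e^(−TR/T1)) e^(−TE/T2). This model, the equal proton density and the two tissues with their relaxation times are assumptions of the sketch, not values from the text; they are chosen only to show that the brighter tissue swaps between the short-TR/TE and long-TR/TE settings.

```python
import math


def spin_echo_signal(tr, te, t1, t2, rho=1.0):
    """Simplified spin-echo signal model S = rho * (1 - exp(-TR/T1)) * exp(-TE/T2)."""
    return rho * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)


# Hypothetical relaxation times in ms (equal proton density assumed).
tissue_a = dict(t1=600.0, t2=80.0)     # short T1, short T2
tissue_b = dict(t1=4000.0, t2=2000.0)  # long T1, long T2 (fluid-like)

for label, tr, te in [("short TR/TE", 500.0, 15.0), ("long TR/TE", 4000.0, 100.0)]:
    sa = spin_echo_signal(tr, te, **tissue_a)
    sb = spin_echo_signal(tr, te, **tissue_b)
    print(f"{label}: A = {sa:.3f}, B = {sb:.3f}")
```

With short TR and TE the tissue with the shorter T1 (A) is brighter; with long TR and TE the tissue with the longer T2 (B) is brighter.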
1.1.3 Functional MRI
Functional magnetic resonance imaging (fMRI) is one of the most versatile methods
to study human brain function. The discovery that the changes in the concentration
of oxyhaemoglobin induced by local brain activity are measurable in an MRI scanner
was made in the early 1990s by a number of independent groups (Ogawa et al. 1990,
Belliveau et al. 1991, Bandettini et al. 1992).
The transporter of oxygen in the blood is haemoglobin, a protein that binds oxygen.
Haemoglobin carrying oxygen is called oxyhaemoglobin, and haemoglobin not carrying
oxygen is called deoxyhaemoglobin. Oxyhaemoglobin is diamagnetic and deoxyhaemoglobin is paramagnetic. A paramagnetic substance becomes magnetised in the presence
of a magnetic field, causing field inhomogeneities and a faster T2∗ relaxation, which in
turn causes a local decrease in signal intensity.
The signal in fMRI, called the blood oxygenation level dependent (BOLD) contrast, increases during increased local activity. This is explained as follows. As local activity increases, so does the local oxygen consumption, demanding an increased supply of oxygen. Oxygen is delivered to the tissue by passive diffusion (i.e., not mediated by a transport mechanism) through the blood vessel walls. To increase this diffusion by the same amount as the extra consumption, the local supply of oxygenated blood needs to increase much more, leading to a relative increase in the oxyhaemoglobin level and a decrease in the deoxyhaemoglobin level. Because deoxyhaemoglobin is paramagnetic, this decrease reduces the local field inhomogeneities and slows down the T2∗ decay, which results in an increase in signal intensity (Birn et al. 1999). This BOLD contrast can therefore be measured in T2∗-weighted images.
[Figure 1.6 diagram; labels in panels (a) and (b): arteriole, capillaries, venule, blood flow, oxyhaemoglobin, deoxyhaemoglobin]
Figure 1.6. (a) Regional blood flow in the brain during rest. (b) Local activity increases the regional blood flow and the local concentration of oxyhaemoglobin.
Functional imaging requires fast acquisition of MRI images to obtain time series with
a high temporal resolution. Echo planar imaging (EPI) is a common MR sequence for
fMRI. Single-shot EPI scans a whole slice with one RF excitation by alternating the direction of the x-gradient after reading each line in k-space, using a zig-zag pattern. A complete 3D volume is typically acquired within three seconds. Howseman and Bowtell (1999)
give an overview of the BOLD contrast and the required MRI techniques. All functional
MR images used in this thesis are EPI scans.
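The zig-zag readout order of single-shot EPI can be written down directly. The sketch below assumes an idealised trajectory in which a short phase-encoding step moves the readout to the next ky line after each line; it only lists the order in which k-space samples are visited (the function name is hypothetical).

```python
def epi_readout_order(n_lines, n_samples):
    """(ky, kx) k-space indices in the order a single-shot zig-zag EPI readout visits them.

    Even lines are read left to right and odd lines right to left, because the
    readout (x) gradient changes sign after every line.
    """
    order = []
    for ky in range(n_lines):
        kx_values = range(n_samples) if ky % 2 == 0 else reversed(range(n_samples))
        order.extend((ky, kx) for kx in kx_values)
    return order


print(epi_readout_order(2, 3))
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```

Before the 2D inverse Fourier transform, the samples of every odd line therefore have to be reversed to put them on the regular Cartesian grid.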
1.1.4 The analysis of fMRI data
This subsection describes the analysis process of a single-subject fMRI data set. Consider an fMRI experiment in which the subject has to perform a very short task,
once every 30 seconds. The analysis of this event-related experiment (a short task may
be considered as a single event) takes a number of steps.
First, the images of the time series must be aligned, i.e., their voxel coordinates must
be relative to the same coordinate system. Scans of the head are relatively easy to align,
compared to images of soft tissues. The head does not deform: it is a rigid body. Realignment only requires a rigid transformation for each image, e.g. by aligning every
image to the first image of the time series. To compare time series of multiple subjects,
it is necessary to map one person’s (anatomical) coordinate system to another’s. In this
case, the images must be spatially normalised. Contrary to realignment, normalisation
usually requires nonlinear transformations (Ashburner and Friston 1997, 1999).
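A rigid transformation of a 3D image has six parameters: three translations and three rotations. The sketch below builds the corresponding 4 × 4 homogeneous matrix and checks that it preserves distances; the rotation order and the parameter values are assumptions, and estimating the parameters (the actual realignment) is not shown.

```python
import numpy as np


def rigid_transform(tx, ty, tz, rx, ry, rz):
    """4x4 homogeneous rigid-body matrix: rotations (radians) about x, y, z, then translation.

    The rotation order Rz @ Ry @ Rx is one convention among several (an assumption here).
    """
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    m = np.eye(4)
    m[:3, :3] = rot_z @ rot_y @ rot_x
    m[:3, 3] = (tx, ty, tz)
    return m


def apply(m, points):
    """Apply a 4x4 transform to an (N, 3) array of coordinates."""
    homogeneous = np.c_[points, np.ones(len(points))]
    return (homogeneous @ m.T)[:, :3]


# Hypothetical head-motion parameters (mm and radians) and voxel coordinates.
m = rigid_transform(2.0, -1.0, 0.5, 0.05, -0.02, 0.1)
pts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 20.0, 5.0]])
moved = apply(m, pts)
print(np.linalg.norm(pts[1] - pts[2]), np.linalg.norm(moved[1] - moved[2]))  # equal
```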
The next step is noise suppression. One of the traditional preprocessing steps in
fMRI is to smooth each image, which is done by convolving the image with a lowpass
filter kernel. The effect of smoothing is that the highest spatial frequencies in an image
are suppressed. High-frequency signals are assumed to be mainly noise, and by certain measures smoothing does reduce noise. The difficulty is that in fMRI the ground truth, the noise-free image, cannot be measured, so noise cannot be separated from true signal with certainty. In (experimental) situations where the ground truth is available, smoothing has been shown to seriously degrade the images while suppressing noise (see Chapter 3).
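The trade-off can be seen on a one-dimensional profile. The sketch below assumes a Gaussian kernel specified by its full width at half maximum (FWHM), with σ = FWHM / (2√(2 ln 2)), applied to a hypothetical step edge with additive Gaussian noise: smoothing lowers the noise in the flat region, but it also spreads the edge.

```python
import numpy as np


def gaussian_kernel(fwhm_voxels):
    """Normalised 1D Gaussian kernel for a given full width at half maximum (in voxels)."""
    sigma = fwhm_voxels / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


rng = np.random.default_rng(1)
truth = np.r_[np.zeros(100), np.ones(100)]        # hypothetical noise-free profile with a sharp edge
noisy = truth + rng.normal(0.0, 0.3, truth.size)  # additive Gaussian noise
kernel = gaussian_kernel(4.0)
smoothed = np.convolve(noisy, kernel, mode="same")
blurred_truth = np.convolve(truth, kernel, mode="same")

print(np.std(noisy[20:80]), np.std(smoothed[20:80]))  # noise in a flat region drops
print(truth[96:104], blurred_truth[96:104].round(2))  # but the edge is spread out
```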
After denoising, the images are ready for the statistical analysis. Different effects
measured in the time series can be predicted and modelled as time signals. These effects
may be experiment-related (e.g., the timing of the stimuli, or the description of different
states), or concern unwanted effects (e.g., signal changes due to head movements, or
signals originating from heart beat or blood flow). The residual noise in the analysis is
the unmodelled part of the signal, and the more effects can be modelled, the smaller the
residuals. During the statistical analysis, the role that each of the modelled effects plays
in the total signal at each location in the image is represented by a weighting factor.
Active regions are those regions where the weighting factors of the effects of interest are
significantly high.
The performance of this analysis method depends heavily on the quality of the data
after preprocessing and the precision of the model, which are discussed in the next
subsection.
1.1.5 Models and methods in fMRI
The first fMRI experiments were aimed at detecting a BOLD signal. The experiment was
usually a block design, which alternates between periods of rest and periods of stimulus
presentation. During periods of rest, the subject lay still in the scanner, and during
periods of stimulus presentation either a sound was played, a strong-contrast image
(like a black-and-white checkerboard) was shown, or the test person had to perform
a task. Images scanned during rest periods were averaged together into a mean rest
image, and images scanned during activity were averaged into a mean active image.
The difference between those images represented the BOLD contrast. If different active
states were used, a number of mean images were combined to produce the contrast
image.
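The early block-design analysis amounts to a difference of two averages. The sketch below assumes the scans are stacked in an array of shape (T, ...) with a boolean label per scan; the tiny 2 × 2 "images" and the single responding voxel are hypothetical.

```python
import numpy as np


def block_contrast(volumes, active):
    """Mean 'active' image minus mean 'rest' image.

    volumes: array of shape (T, ...) with one image per scan;
    active:  boolean array of length T, True for scans acquired during stimulation.
    """
    volumes = np.asarray(volumes, dtype=float)
    active = np.asarray(active, dtype=bool)
    return volumes[active].mean(axis=0) - volumes[~active].mean(axis=0)


# Hypothetical data: 4 rest scans followed by 4 active scans; only voxel (0, 0) responds.
rest = np.full((4, 2, 2), 100.0)
act = np.full((4, 2, 2), 100.0)
act[:, 0, 0] += 3.0
volumes = np.concatenate([rest, act])
labels = np.r_[np.zeros(4, dtype=bool), np.ones(4, dtype=bool)]
print(block_contrast(volumes, labels))
```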
Active regions are the parts of the contrast image that have significantly high values.
The term significance is usually defined in a statistical context, by assuming a distribution of the noise in the contrast image. The BOLD images are thresholded at a quantile
of the distribution, guaranteeing that the percentage of false positives does not exceed
a given level.
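If the contrast map is assumed to be standard-normal wherever there is no activation, the threshold is a quantile of the standard normal distribution. The sketch below uses Python's statistics.NormalDist for the quantile and a simulated noise-only map to check the per-voxel false-positive rate; the level α = 0.001 is an arbitrary example.

```python
import numpy as np
from statistics import NormalDist

alpha = 0.001                                    # hypothetical per-voxel false-positive level
z_threshold = NormalDist().inv_cdf(1.0 - alpha)  # one-sided threshold on a standard-normal map

rng = np.random.default_rng(2)
null_map = rng.standard_normal(100_000)          # contrast map containing noise only
false_positive_rate = np.mean(null_map > z_threshold)
print(round(z_threshold, 3), false_positive_rate)  # threshold ~3.09, rate close to 0.001
```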
A more general analysis is possible within the framework of the general linear model
(GLM), which treats the BOLD responses as the output of a linear, time-invariant (LTI)
system. Linear here indicates that the total response to multiple stimuli is the sum of the responses to the individual stimuli. Time invariance implies that the response to a stimulus does not depend on the time at which the stimulus is presented, apart from a corresponding shift in time. Under these assumptions, the fMRI response to a stimulus pattern can be modelled by convolving the stimulus pattern with the corresponding impulse response function, the so-called haemodynamic response function (HRF). Statistical parametric mapping (SPM, Friston et al. 1995c) is based on the GLM.
SPM assumes the noise to be additive and Gaussian distributed. The total fMRI signal
measured in the experiment is a weighted sum of explanatory signals (signals that model
components of the responses) plus noise. Given a Gaussian temporal distribution of
the residual noise, which follows from the GLM if the original data contains Gaussian
temporal noise, the significance of the weighting factors can be found in each voxel
via standard hypothesis testing. For spatially stationary noise, the threshold need only
be determined once, after which it can be applied to the entire map of statistic values,
making SPM a very efficient analysis method.
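For a single voxel, the GLM fit and the test of one weighting factor can be written in a few lines. The sketch below assumes ordinary least squares with white Gaussian noise, a hypothetical on/off block regressor plus a constant term, and a contrast vector c that selects the regressor of interest; it computes t = cᵀβ̂ / √(σ̂² cᵀ(XᵀX)⁻¹c). It is not the SPM implementation.

```python
import numpy as np


def glm_t_statistic(y, X, c):
    """OLS fit of y = X beta + noise, and the t statistic for the contrast c."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = residuals @ residuals / dof               # residual variance estimate
    var_c = sigma2 * c @ np.linalg.pinv(X.T @ X) @ c   # variance of c' beta
    return beta, (c @ beta) / np.sqrt(var_c), dof


rng = np.random.default_rng(0)
n = 120
block = np.tile(np.r_[np.zeros(10), np.ones(10)], n // 20)  # hypothetical off/on regressor
X = np.column_stack([block, np.ones(n)])                    # design matrix: effect + constant
y = 2.0 * block + 100.0 + rng.normal(0.0, 1.0, n)           # simulated voxel time series
beta, t, dof = glm_t_statistic(y, X, np.array([1.0, 0.0]))
print(beta.round(2), round(float(t), 1), dof)
```

The resulting t value is then compared with a threshold from the t distribution with the printed degrees of freedom.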
The GLM definition of the BOLD contrast as a measure of covariance rather than as a
difference image enables more advanced study designs. Bandettini et al. (1993) measure
the strength of the BOLD response by computing its correlation with the expected block
signal. The experiments have a blocked design, but the BOLD contrast is computed via
analysis of covariance (ANCOVA), rather than computing mean block images. Worsley
and Friston (1995, 2002) have developed a general statistical framework for fMRI time
series analysis based on Gaussian random field (GRF) theory. Boynton et al. (1996) test
the GLM by independently varying stimulus duration and contrast, and investigating
the additivity of the responses. They conclude that although deviations from linearity can
be measured, these are not strong enough to reject the GLM.
With faster scanners and a better temporal resolution, fMRI shifted from state-related,
i.e., comparing scans taken during different experimental conditions, to event-related:
explicitly modelling the time signal as a response to short stimuli. Josephs and Henson (1999) present an overview of event-related fMRI, demonstrating the benefits of
event-related fMRI over state-related fMRI, and describing how to optimise experimental designs for event-related analysis. The BOLD response during an active period
may be described by modelling the active state as a block signal. One step towards
event-related analysis is to treat the block state signal as a stimulus signal and convolve it with the HRF (Bandettini et al. 1993). Event-related fMRI involves stimuli much
shorter than the repetition time TR. According to the GLM, such a short stimulus will
evoke a signal with the shape of an HRF. The HRF is often modelled as a smooth curve,
rising about two seconds after the stimulus, peaking at approximately seven seconds
after the stimulus, followed by a negative undershoot and returning to baseline around
30 seconds after the stimulus. As a result, the responses to stimuli much shorter than
the acquisition time of an EPI volume can still be measured using fMRI.
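The convolution model can be sketched with a double-gamma curve as the HRF. The shape parameters below are assumptions, chosen so that the response peaks near 7 s and shows a negative undershoot, in line with the description above; they are not the parameters used in this thesis, and the event timing is hypothetical.

```python
import math
import numpy as np


def gamma_density(t, shape):
    """Gamma probability density with unit scale, zero for t <= 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / math.gamma(shape)
    return out


def double_gamma_hrf(t, peak_shape=8.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Positive lobe peaking near t = peak_shape - 1 s, minus a smaller, later undershoot."""
    return gamma_density(t, peak_shape) - ratio * gamma_density(t, undershoot_shape)


dt = 0.1                                 # time step in seconds
t = np.arange(0.0, 32.0, dt)
hrf = double_gamma_hrf(t)

# Hypothetical event-related design: a brief event every 30 s during 120 s.
stimulus = np.zeros(int(round(120.0 / dt)))
stimulus[:: int(round(30.0 / dt))] = 1.0

# Discrete approximation of the convolution of the stimulus pattern with the HRF.
predicted = np.convolve(stimulus, hrf)[: stimulus.size] * dt
print(t[np.argmax(hrf)], hrf.min() < 0)
```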
A number of studies use the GLM in an event-related setting (Dale and Buckner
1997, Miezin et al. 2000, Glover 1999). Friston et al. (1995b) test for differential responses,
i.e., responses in different brain regions that vary in temporal shape, by comparing the
responses in those regions to different temporal basis functions. In a later study, differential responses are captured by expanding the responses as a superposition of basis
functions (Friston et al. 1998a). Josephs et al. (1997) acquire high-resolution temporal
samples of the HRF by using interleaved post-stimulus sampling.
Besides detecting active regions, fMRI analysis also aims to describe the signals measured in those regions. A number of methods estimate and model the HRF itself (Hinrichs et al. 2000, Ollinger et al. 2001a, Ollinger et al. 2001b, Ciuciu et al. 2003). Other methods assume a fixed response waveform, and estimate the delay between stimulus and
the onset of the HRF (Menon et al. 1998, Calhoun et al. 2000, Liao et al. 2002, Henson
et al. 2002). Aguirre et al. (1998) show that although there is substantial variance in the
HRF between subjects, the differences between response functions measured within one
subject in subsequent experiments are much smaller. Jasdzewski et al. (2003) show that
the temporal shape of the HRF differs significantly between the motor cortex and the
visual cortex, but within those regions it does not vary significantly between subjects.
The next paragraphs discuss some assumptions in neuroimage analysis that were long accepted without question, but have recently been subjected to critical inspection.
Validity of the GLM
Experiments with varying stimulus durations and varying interstimulus intervals (ISI)
have shown that the BOLD response is in general not linear. A number of solutions have
been developed for the problem of modelling nonlinear effects in BOLD fMRI (Friston
et al. 2000a). Dynamical models have been used to describe the temporal changes in
the blood oxygenation levels (Buxton et al. 1998). Another solution is to estimate the
nonlinear component of event-related responses in fMRI by expanding the response as a
second-order Volterra series (Friston et al. 1998b). The GLM is still the most widely-used
model for fMRI time series, but other, more advanced methods are gaining popularity.
Distribution of temporal fMRI noise
The hypothesis testing based on the GLM assumes that temporal fMRI noise is (i) additive and Gaussian distributed, and (ii) temporally uncorrelated. The assumption of
Gaussian distributed temporal BOLD noise is widely accepted. In MR physics, however, noise in magnitude MR images has been shown to have a Rician distribution (Henkelman 1985, Gudbjartsson and Patz 1995, Sijbers et al. 1998a, Sijbers et al. 1998c). Recent research on BOLD noise has shown deviations from a Gaussian distribution (Hanson and Bly 2001), and validity tests for the assumption of Gaussian noise have been developed
(Luo and Nichols 2003). Nichols and Holmes (2002) have developed a nonparametric
hypothesis testing method for functional neuroimaging. This method does not make
any assumptions about the distribution of temporal noise. It is based on permutation
tests, which makes it useful for PET experiments and second-level analysis of fMRI data,
but not for (temporally correlated) fMRI time series. Raz et al. (2003) have developed a
permutation test which is suitable for fMRI time series analysis, permuting the stimulus
pattern rather than the sequence of scans.
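The Rician distribution arises because a magnitude image is the modulus of a complex signal with Gaussian noise in its real and imaginary channels. The simulation below, with hypothetical amplitudes and noise level, shows the consequence: where the true signal is zero the magnitude is biased upwards (to about 1.25σ), while at a high signal-to-noise ratio it is close to Gaussian around the true value.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0    # noise standard deviation in each of the real and imaginary channels
n = 200_000


def magnitude_samples(true_amplitude):
    """Magnitude of a complex signal with independent Gaussian noise in both channels."""
    real = true_amplitude + rng.normal(0.0, sigma, n)
    imag = rng.normal(0.0, sigma, n)
    return np.hypot(real, imag)


background = magnitude_samples(0.0)  # no signal: Rayleigh distributed (a special case of Rice)
bright = magnitude_samples(20.0)     # high SNR: approximately Gaussian around the true value

print(background.mean(), sigma * np.sqrt(np.pi / 2))  # mean well above the true value 0
print(bright.mean())                                  # close to 20
```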
Most testing methods assume the temporal noise to be uncorrelated (white). Fadili and Bullmore (2001) assume that fMRI time signals contain 1/f (pink) noise. They introduce a technique called wavelet-generalised least squares (WLS) to obtain unbiased estimators of the GLM parameters in the presence of temporally correlated noise. In another paper by the same group (Bullmore et al. 2001), temporal autocorrelations are removed by transforming the time signals to the wavelet domain, permuting the detail coefficients, and reconstructing the signals. Alternatives to removing autocorrelations (whitening) are high-pass filtering, i.e., removing only the low-frequency autocorrelations, and band-pass filtering, i.e., keeping only autocorrelations within a certain range of frequencies (Friston et al. 2000b).
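As a simple illustration of whitening (a swapped-in technique, not the wavelet-based methods cited above), the sketch below removes first-order autoregressive (AR(1)) correlation from a simulated noise series by estimating the lag-1 autocorrelation ρ and forming e_t = x_t − ρ x_(t−1). The AR coefficient of the simulated noise is hypothetical.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def ar1_prewhiten(x):
    """Remove first-order autocorrelation: e_t = x_t - rho * x_{t-1}, rho estimated from x."""
    x = np.asarray(x, dtype=float)
    rho = lag1_autocorrelation(x)
    return x[1:] - rho * x[:-1], rho


# Simulated AR(1) noise with a hypothetical coefficient of 0.6.
rng = np.random.default_rng(4)
n, phi = 5000, 0.6
innovations = rng.standard_normal(n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = phi * noise[i - 1] + innovations[i]

whitened, rho_hat = ar1_prewhiten(noise)
print(round(rho_hat, 2), round(lag1_autocorrelation(whitened), 3))  # ~0.6, ~0
```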
Gaussian spatial autocorrelations
The threshold selection scheme used for SPM needs to correct for multiple hypothesis
testing: one volume consists of thousands of voxels, which are all tested simultaneously. Large numbers of simultaneous tests entail an increased risk of false positives. Common procedures for multiple testing correction are (i) manually increasing
the threshold, (ii) controlling the familywise error (FWE), and (iii) controlling the false
discovery rate (FDR). Manually changing the threshold is not a favourable solution, because the results are then not reproducible. Two popular methods for controlling the FWE are Bonferroni correction and Gaussian random field (GRF) theory. Bonferroni correction is guaranteed to control the FWE, but is often considered too conservative, because spatial correlations in the noise are not taken into account. GRF-based tests assume the spatial correlations to be Gaussian, and control the FWE for a Gaussian random field with
parameters estimated from the data. This method is less conservative than Bonferroni
correction, but it relies heavily on Gaussian field assumptions. To bring the images into
agreement with these assumptions, they often require heavy smoothing, leading to deformed (rounded) regions of detected activity. Control of the FDR is a relatively recent
introduction in fMRI analysis (Genovese et al. 2002), and has rapidly gained popularity.
The FDR is the expected proportion of false positives among the total number of positives. Control of this measure is possible with spatially uncorrelated data (Benjamini and
Hochberg 1995) and spatially positively correlated data (Benjamini and Yekutieli 2001),
without the need for more specific knowledge about the autocorrelation function. This
method does not require the images to meet stringent shape criteria, which may explain
the popularity of FDR control in fMRI analysis.
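The two corrections that need no knowledge of the spatial autocorrelation function are easy to sketch. The code below applies Bonferroni correction (threshold α/m) and the Benjamini-Hochberg step-up procedure to a hypothetical set of ten voxel p-values; on this example the FDR procedure rejects more hypotheses, illustrating why it is less conservative.

```python
import numpy as np


def bonferroni(p_values, alpha):
    """Reject H0 where p <= alpha / m (controls the FWE)."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(p_values, q):
    """Benjamini-Hochberg step-up procedure (controls the FDR at level q)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True    # reject all hypotheses up to that rank
    return reject


p = np.array([0.5, 0.012, 0.9, 0.001, 0.3, 0.008, 0.7, 0.4, 0.6, 0.8])  # hypothetical
print(np.nonzero(bonferroni(p, 0.05))[0])          # [3]
print(np.nonzero(benjamini_hochberg(p, 0.05))[0])  # [1 3 5]
```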
Conclusion
A spirit of healthy criticism is found in many areas of fMRI analysis, including even the
assumption that an increased BOLD signal is an indicator for increased neuronal activity itself (Raichle 2001, Logothetis et al. 2001, Logothetis 2002). Petersson et al. (1999a)
present a good overview of the possibilities, and also of the limitations, assumptions
and risks in contemporary fMRI methodology.
1.2 Wavelets
The word wavelet was coined by Jean Morlet, a geophysical engineer working for an oil company. Morlet was French, and wavelet is a literal translation of ondelette. The suffix -let signifies that a wavelet is a small wave, where small in this case stands for transient: whereas a wave continues to oscillate, a wavelet is only a small ripple. A wavelet transform converts the time-domain representation of a signal to its wavelet-domain representation. In the wavelet domain, a signal is described as a superposition of localised basis functions that vary in offset and scale. The wavelet transform is introduced here via the more common Fourier transform, which is sometimes used to compute wavelet transforms.
1.2.1 Fourier transforms and wavelet transforms
Wavelet analysis is related to Fourier analysis. A Fourier transform of a discrete signal x(n) of length N decomposes the signal into N sines and N cosines of different
frequencies. The Fourier basis functions are represented by complex exponentials, the
cosines being the real part and the sines being the imaginary part. Given the signal x =
x(n), n = 0, . . ., N − 1, its discrete Fourier transform (DFT) X(k) is defined as:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i n k / N}, \qquad e^{ik} = \cos(k) + i \sin(k). \tag{1.1}$$
Every frequency coefficient X(k) contains the weights of the sinusoids at that frequency.
The real part represents the weight of the cosine, and the imaginary part represents the
weight of the sine. The inverse Fourier transform is given by:
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{2\pi i n k / N}, \tag{1.2}$$
which superimposes the sinusoids of the frequencies k = 0, . . ., N − 1 at every point n = 0, . . ., N − 1. Computed directly, both transforms have complexity O(N²). The DFT of a signal can be efficiently computed by the fast Fourier transform (FFT), and the signal can be reconstructed by the inverse fast Fourier transform (IFFT), with complexity O(N log₂ N) (Mallat 1998, chapter 3).
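Equations (1.1) and (1.2) can be checked directly against an FFT library. The sketch below evaluates the DFT as an O(N²) matrix-vector product and compares it with numpy.fft.fft, which uses the same sign convention; the inverse includes the factor 1/N, without which the round trip would return N·x(n) instead of x(n).

```python
import numpy as np


def dft(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) exp(-2 pi i n k / N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / x.size) @ x


def idft(X):
    """Inverse transform, including the 1/N factor needed for an exact round trip."""
    X = np.asarray(X, dtype=complex)
    k = np.arange(X.size)
    n = k.reshape(-1, 1)
    return np.exp(2j * np.pi * n * k / X.size) @ X / X.size


x = np.random.default_rng(5).standard_normal(64)
X = dft(x)
print(np.allclose(X, np.fft.fft(x)), np.allclose(idft(X), x))  # True True
```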
An orthogonal wavelet basis is defined by two basis functions: a scaling function φ
and the corresponding wavelet ψ. The basis itself consists of translated dilations of φ and
ψ:
$$\phi_{j,l}(n) = 2^{-j/2}\, \phi(2^{-j} n - l), \qquad \psi_{j,l}(n) = 2^{-j/2}\, \psi(2^{-j} n - l), \tag{1.3}$$
where j and l denote scale and translation, respectively. The basis is orthogonal in $L^2$ if coefficients $h_l$ exist such that
$$\phi(n) = \sum_l h_l\, \phi(2n - l), \qquad \psi(n) = \sum_l g_l\, \phi(2n + l), \tag{1.4}$$
where $g_l = (-1)^l\, h_{l+1}$.
This condition is called the two-scale relation (Daubechies 1988). Most of the algorithms
presented in this thesis use orthogonal wavelets. Other basis functions, like those based
on B-splines, can be used to constitute a biorthogonal wavelet basis (Cohen et al. 1992).
The conditions for biorthogonality are less strict than for orthogonality, and biorthogonal bases lack certain properties of orthogonal bases, like preserving the amount of
energy of a signal during the wavelet transform. In Chapters 4 and 5, biorthogonal basis
functions are used in a detrending algorithm for fMRI time signals.
The J-level biorthogonal wavelet decomposition of a discrete signal x(n) is given by:
$$x(n) = \sum_l c^J_l\,\tilde\varphi_{J,l}(n) + \sum_{j=1}^{J}\sum_l d^j_l\,\tilde\psi_{j,l}(n), \qquad c^J_l = \langle x, \varphi_{J,l}\rangle, \quad d^j_l = \langle x, \psi_{j,l}\rangle, \tag{1.5}$$
where $c^J_l$ is called the approximation signal and $d^j_l$ are called the detail signals. From this equation it follows that the signal $c^0$ represents the input signal and that $\varphi_{0,n} = \delta_{n,0}$.
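For the orthogonal case ($\tilde\varphi = \varphi$, $\tilde\psi = \psi$), (1.5) can be checked numerically with the Haar functions of (1.11), sampled on n = 0, . . ., N − 1 according to (1.3). The minimal sketch below (Python/NumPy, illustrative only; the function names are hypothetical) builds the basis for scales j = 1, . . ., J (j ≥ 1, so that both halves of ψ are sampled), computes the inner products $c^J_l$ and $d^j_l$, and reconstructs x from them.

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function (1.11): 1 on [0, 1), 0 elsewhere."""
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

def haar_psi(x):
    """Haar wavelet (1.11): 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return (np.where((x >= 0) & (x < 0.5), 1.0, 0.0)
            - np.where((x >= 0.5) & (x < 1), 1.0, 0.0))

def haar_basis(N, J):
    """Rows: psi_{j,l} for j = 1..J, then phi_{J,l}, sampled as in (1.3)."""
    n = np.arange(N)
    rows = []
    for j in range(1, J + 1):
        for l in range(N // 2**j):
            rows.append(2.0 ** (-j / 2) * haar_psi(2.0 ** (-j) * n - l))
    for l in range(N // 2**J):
        rows.append(2.0 ** (-J / 2) * haar_phi(2.0 ** (-J) * n - l))
    return np.array(rows)

N, J = 16, 3
W = haar_basis(N, J)
x = np.random.default_rng(1).standard_normal(N)
coeffs = W @ x          # inner products <x, psi_{j,l}> and <x, phi_{J,l}>
x_rec = W.T @ coeffs    # (1.5) with the dual functions equal to the primal ones
print(np.allclose(W @ W.T, np.eye(N)), np.allclose(x_rec, x))
```

Because the rows of W are orthonormal, the sum of squared coefficients equals the energy of x, the property that biorthogonal bases lack.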
In the discrete setting, φ and ψ are represented by the filters h and g, respectively. The reconstruction from the wavelet-domain representation of a signal back to its original form is possible with a dual scaling function $\tilde\varphi$ and a dual wavelet $\tilde\psi$, which are represented in the discrete setting by the filters $\tilde h$ and $\tilde g$, respectively. The dual basis functions must satisfy the same conditions, and it is also necessary that:
$$\langle \tilde\varphi_{j,l}, \psi_{j,m}\rangle = 0, \quad \langle \tilde\varphi_{j,l}, \varphi_{j,m}\rangle = \delta_{l,m}, \quad \langle \tilde\psi_{j,l}, \varphi_{j,m}\rangle = 0, \quad \langle \tilde\psi_{j,l}, \psi_{k,m}\rangle = \delta_{j,k}\,\delta_{l,m}. \tag{1.6}$$
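For an orthogonal basis, $\tilde\varphi = \varphi$ and $\tilde\psi = \psi$, and the filters and their duals satisfy $\tilde h_l = \overline{h_{-l}}$ and $\tilde g_l = \overline{g_{-l}}$, respectively, where $\overline{x}$ denotes the complex conjugate of x. The relation between the filters and the basis functions is defined by:
$$\varphi(n) = \sqrt{2}\sum_l h_l\,\varphi(2n - l), \qquad \psi(n) = \sqrt{2}\sum_l g_l\,\varphi(2n - l), \quad \text{and}$$
$$\tilde\varphi(n) = \sqrt{2}\sum_l \tilde h_l\,\tilde\varphi(2n - l), \qquad \tilde\psi(n) = \sqrt{2}\sum_l \tilde g_l\,\tilde\varphi(2n - l). \tag{1.7}$$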
Some basis functions, like Daubechies’ orthogonal wavelets with compact support (Daubechies
1988), are defined as recursive refinements of the filters, starting with φ0,n = δn,0 . Other
basis functions, like orthogonal spline wavelets, are defined in the frequency domain
(Mallat 1989). The scaling filters h are then found via the inverse Fourier transform.
Given a scaling filter, the wavelet filter g is found via (1.4), by reversing the filter h and
then multiplying it with the vector (−1, 1, −1, 1, . . . , −1, 1).
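This recipe for g can be sketched for the Daubechies-2 scaling filter, using its standard closed-form coefficients normalised so that $\sum_l h_l = \sqrt{2}$, as implied by the $\sqrt{2}$ factor in (1.7). The sketch (Python/NumPy; the helper name is hypothetical) checks that g sums to zero, that h and g are orthogonal, and that h has unit norm and is orthogonal to its own shift by two.

```python
import numpy as np

def wavelet_filter(h):
    """Wavelet filter g from scaling filter h: reverse h, then multiply
    element-wise by (-1, 1, -1, 1, ..., -1, 1). Assumes an even-length h."""
    h = np.asarray(h, dtype=float)
    signs = np.tile([-1.0, 1.0], len(h) // 2)
    return h[::-1] * signs

s3 = np.sqrt(3.0)
h_d2 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
g_d2 = wavelet_filter(h_d2)

print(np.sum(h_d2), np.sum(g_d2), np.dot(h_d2, g_d2))
```

The zero sum of g reflects that the wavelet has at least one vanishing moment; the double-shift orthogonality of h is what makes the resulting basis orthogonal.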
The fast wavelet transform (FWT) is an efficient wavelet transform based on multiresolution analysis (Mallat 1989). The algorithm repeatedly applies the filters h and g, each time followed by downsampling. Denoting downsampling and upsampling by a factor of 2 by $\downarrow_2$ and $\uparrow_2$, respectively:
$$(\downarrow_2 x)(n) \triangleq x(2n), \quad n = 0, \ldots, N/2 - 1, \qquad (\uparrow_2 x)(n) \triangleq \begin{cases} x(n/2), & n \text{ even} \\ 0, & n \text{ odd} \end{cases}, \quad n = 0, \ldots, 2N - 1, \tag{1.8}$$
and using ∗ to denote discrete circular convolution, the FWT algorithm is defined by the decomposition step at each level j:
$$c^j = \downarrow_2(h * c^{j-1}), \qquad d^j = \downarrow_2(g * c^{j-1}), \qquad j = 1, \ldots, J. \tag{1.9}$$
Reconstruction via the inverse fast wavelet transform (IFWT) is defined by:
$$c^{j-1} = \tilde h * (\uparrow_2 c^j) + \tilde g * (\uparrow_2 d^j), \qquad j = 1, \ldots, J. \tag{1.10}$$
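A minimal time-domain sketch of (1.8) to (1.10) follows. For real orthogonal filters the duals are the time-reversed filters ($\tilde h_l = h_{-l}$, $\tilde g_l = g_{-l}$), so the IFWT step is implemented as circular convolution with reversed h and g. Assumptions: Daubechies-2 filters, a signal length divisible by $2^J$ and no shorter than the filter at every level; all function names are hypothetical.

```python
import numpy as np

# Daubechies-2 filters; g from h by reversal and sign alternation.
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
g = h[::-1] * np.array([-1.0, 1.0, -1.0, 1.0])

def conv(f, x):
    """Circular convolution: (f * x)(n) = sum_l f[l] x[(n - l) mod N]."""
    y = np.zeros(len(x))
    for l, fl in enumerate(f):
        y += fl * np.roll(x, l)
    return y

def conv_dual(f, x):
    """Circular convolution with the dual filter f~[l] = f[-l] (real filters):
    (f~ * x)(n) = sum_l f[l] x[(n + l) mod N]."""
    y = np.zeros(len(x))
    for l, fl in enumerate(f):
        y += fl * np.roll(x, -l)
    return y

def down2(x):
    """(1.8): keep the even-indexed samples."""
    return x[::2]

def up2(x):
    """(1.8): insert a zero after every sample."""
    y = np.zeros(2 * len(x))
    y[::2] = x
    return y

def fwt(c0, h, g, J):
    """(1.9): returns c^J and the list [d^1, ..., d^J]."""
    c, details = np.asarray(c0, dtype=float), []
    for _ in range(J):
        details.append(down2(conv(g, c)))
        c = down2(conv(h, c))
    return c, details

def ifwt(cJ, details, h, g):
    """(1.10), applied from level J back to level 1."""
    c = cJ
    for d in reversed(details):
        c = conv_dual(h, up2(c)) + conv_dual(g, up2(d))
    return c

x = np.random.default_rng(2).standard_normal(16)
cJ, ds = fwt(x, h, g, J=3)
x_rec = ifwt(cJ, ds, h, g)
print(np.allclose(x_rec, x))
```

Since the analysis is an orthogonal operator here, the synthesis step is simply its transpose, which is why reversing the filters gives perfect reconstruction.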
Figure 1.7 shows the structures of both the FWT and the IFWT. The operators H, G, $\tilde H$, and $\tilde G$ represent the convolutions with the filters h, g, $\tilde h$, and $\tilde g$, respectively.
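[Figure 1.7 (a): $c^0$ is filtered by H and G, each followed by ↓2, giving $c^1$ and $d^1$; the approximation is split again, down to $c^J$ and $d^J$. (b): $c^J$ and $d^J$ are upsampled by ↑2, filtered by $\tilde H$ and $\tilde G$, and added to give $c^{J-1}$; this is repeated down to $c^0$.]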
Figure 1.7. Graphical representations of the FWT (a) and the IFWT (b).
The FWT is not shift-invariant, i.e., the coefficients of the FWT of a shifted version of a signal $c^0$ are not the shifted coefficients of the FWT of $c^0$. A shift-invariant discrete wavelet transform (SI-DWT) exists (Mallat 1991). Instead of subsampling at every level, the SI-DWT algorithm performs a polyphase decomposition, filters all phases separately, and then performs a monophase reconstruction. This operation is performed on two copies of the signal, once with filter h and once with g. A polyphase transform of Q phases subsamples the signal by a factor Q for the shifts 0, . . . , Q − 1. A monophase transform interleaves the Q phases again into one signal. At the first level of decomposition Q equals 1, and Q doubles at every subsequent level. The approximation $c^J$ and each $d^j$ have the same dimensions as the original $c^0$, so the total size of a J-level transform is (J + 1) × N. The complexity is $O(N \log_2 N)$. The SI-DWT is used in the second part of the thesis. Its inverse is denoted as SI-IDWT. Each step of the SI-IDWT filters the approximation and the detail at level j + 1, respectively, and adds them together into the new approximation. Figure 1.8 shows the decomposition step and the reconstruction step, respectively. The operators $\downarrow_{2^j}$ and $\uparrow_{2^j}$ perform down- and upsampling by a factor of $2^j$, respectively, in the same way as the operators in (1.8). The $/2^j$ operator divides the signal values by $2^j$.
[Figure 1.8 (a): one SI-DWT level: $c^j$ is split into $2^j$ polyphase components, each component is filtered by H and G and downsampled by 2, and the results are merged by a monophase step ($\uparrow_{2^j}$, delays $z^{-1}$, addition, $/2^j$) into $c^{j+1}$ and $d^{j+1}$. (b): one SI-IDWT level with the same structure, using $\tilde H$ and $\tilde G$ and producing $c^j$.]
Figure 1.8. One level of the SI-DWT (a) and of the SI-IDWT (b).
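Shift invariance can be illustrated with the "à trous" formulation of an undecimated wavelet transform, in which the filters are upsampled by $2^j$ instead of the signal being downsampled. This formulation is swapped in here for brevity; it is not the polyphase/monophase scheme of Fig. 1.8, and its normalisation (a factor 1/2 per level in the inverse) is an assumption that need not match the $/2^j$ convention used in the thesis. The sketch (Python/NumPy, hypothetical names) checks that circularly shifting the input shifts every coefficient sequence by the same amount, and that the inverse reconstructs the signal.

```python
import numpy as np

s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # Daubechies-2
g = h[::-1] * np.array([-1.0, 1.0, -1.0, 1.0])

def conv_dilated(f, x, step):
    """Circular convolution with f upsampled by `step` (taps at multiples of step)."""
    y = np.zeros(len(x))
    for l, fl in enumerate(f):
        y += fl * np.roll(x, l * step)
    return y

def corr_dilated(f, x, step):
    """Circular convolution with the time-reversed, upsampled filter."""
    y = np.zeros(len(x))
    for l, fl in enumerate(f):
        y += fl * np.roll(x, -l * step)
    return y

def si_dwt(c0, h, g, J):
    """Undecimated transform: c^J and [d^1, ..., d^J], each of length N."""
    c, details = np.asarray(c0, dtype=float), []
    for j in range(J):
        step = 2 ** j
        details.append(conv_dilated(g, c, step))
        c = conv_dilated(h, c, step)
    return c, details

def si_idwt(cJ, details, h, g):
    """Inverse: |H|^2 + |G|^2 = 2 for orthogonal filters, hence the factor 1/2."""
    c = cJ
    for j in reversed(range(len(details))):
        step = 2 ** j
        c = 0.5 * (corr_dilated(h, c, step) + corr_dilated(g, details[j], step))
    return c

x = np.random.default_rng(3).standard_normal(32)
cJ, ds = si_dwt(x, h, g, J=3)
cJ_s, ds_s = si_dwt(np.roll(x, 3), h, g, J=3)   # transform of the shifted signal
x_rec = si_idwt(cJ, ds, h, g)
```

Every level keeps N samples, so a J-level transform stores (J + 1) × N values, as stated above.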
1.2.2 Wavelet bases
The Haar basis is the simplest wavelet basis. The Haar basis functions are members of
many wavelet families, such as Daubechies wavelets and spline wavelets. The scaling
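function φ and the wavelet ψ are given by:
$$\varphi(x) = \begin{cases} 1, & 0 \le x < 1 \\ 0, & \text{otherwise} \end{cases}, \qquad \psi(x) = \begin{cases} 1, & 0 \le x < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le x < 1 \\ 0, & \text{otherwise} \end{cases} \tag{1.11}$$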
Figure 1.9 shows the Haar scaling function and wavelet, respectively. Haar basis functions have a number of favourable properties: they are symmetric and they have compact support. Their disadvantages are poor approximation and bad localisation in the frequency domain.
[Figure 1.9 (a): φ(x) plotted against x. (b): ψ(x) plotted against x.]
Figure 1.9. (a) Haar scaling function. (b) Haar wavelet.
Daubechies’ wavelets (Daubechies 1988) are among the most popular wavelets presently in use. They are identified by their number of vanishing moments. A wavelet with v vanishing moments is orthogonal to all polynomials of degree less than v; equivalently, the corresponding scaling function can reproduce polynomials up to degree v − 1. Daubechies shows that the minimal length of a filter h with v vanishing moments is 2v. The filter g follows from h via (1.4). Daubechies-1 is the Haar basis. Daubechies-2 has two vanishing moments, and h and g both have 4 filter taps. The corresponding basis functions φ(n) and ψ(n) are shown in Fig. 1.10. Daubechies’ filters are the shortest filters that generate an orthogonal wavelet basis, given a number of vanishing moments. As a result, they enable very efficient wavelet transforms via the FWT. Better localisation in the frequency domain and better approximation are obtained with higher orders (at the cost of longer filters). Disadvantages of these filters are that they are not symmetric and not well localised in the frequency domain. Variants of Daubechies wavelets are symlets, which are more symmetric (they are obtained from the same polynomial factorisation as Daubechies wavelets, with a different choice of roots), and coiflets, whose scaling functions also have vanishing moments. For the coiflet basis with index v, the scaling function has 2v − 1 vanishing moments and the wavelet has 2v vanishing moments. Both filters have a support of length 6v − 1.
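The vanishing-moment count can be read off the wavelet filter: ψ has v vanishing moments exactly when the discrete moments $\sum_l l^k g_l$ vanish for k = 0, . . ., v − 1. The sketch below (Python/NumPy, hypothetical names) computes these moments for Haar (Daubechies-1) and Daubechies-2, with g obtained from h by reversal and sign alternation as described above.

```python
import numpy as np

def moments(g, max_order):
    """Discrete moments sum_l l^k g_l for k = 0..max_order."""
    l = np.arange(len(g))
    return np.array([np.sum(l**k * g) for k in range(max_order + 1)])

s3 = np.sqrt(3.0)
h_d2 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
g_d2 = h_d2[::-1] * np.array([-1.0, 1.0, -1.0, 1.0])
g_haar = np.array([-1.0, 1.0]) / np.sqrt(2.0)   # reversed Haar h times (-1, 1)

m_d2 = moments(g_d2, 2)       # k = 0, 1 vanish; k = 2 does not
m_haar = moments(g_haar, 1)   # only k = 0 vanishes
print(m_d2, m_haar)
```

Haar has one vanishing moment and a 2-tap filter, Daubechies-2 has two and a 4-tap filter, in line with the minimal length 2v.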
[Figure 1.10 (a): φ(x) plotted against x. (b): ψ(x) plotted against x.]
Figure 1.10. (a) Daubechies-2 scaling function. (b) Daubechies-2 wavelet.
Wavelet basis functions based on spline bases are smooth, well localised in the frequency domain, and they have good approximation properties. Many types of spline wavelet bases exist, such as orthogonal, biorthogonal, causal, and symmetric ones. The spline wavelets used in this thesis are orthogonal. Orthogonal spline wavelets do not have compact support. Unser and Blu (2000) have made a spline wavelet basis construction tool based on fractional splines, which is used to produce the spline wavelet basis functions in this thesis. Examples of spline wavelets are the Battle-Lemarie wavelets (see Fig. 1.11), whose scaling functions are based on symmetric orthogonal cubic spline basis functions.
[Figure 1.11 (a): φ(x) plotted against x. (b): ψ(x) plotted against x.]
Figure 1.11. (a) Battle-Lemarie scaling function. (b) Battle-Lemarie wavelet.
1.2.3 Applications of the wavelet transform
Wavelet transforms are common tools in many signal and image processing tasks. In
this subsection, the applications relevant to the thesis are discussed: denoising and
waveform extraction.
Wavelet-based denoising uses the separation of the approximation $c^J$ and the detail signals $d^j$, j = 1, . . ., J. The idea is that the relevant part of the signal is mainly captured in the approximation and that, of the detail coefficients, only the largest are relevant. Wavelet-based denoising thresholds the detail signals: small detail coefficients are either removed or
shrunk. The main characteristic of the different wavelet-based denoising methods is the
choice of the thresholds. Many methods use a priori hypotheses about the distribution of
detail coefficients, usually assuming them to be N (0, 1) distributed. Thresholds may be
based on the false discovery rate (FDR) (Abramovich and Benjamini 1995, 1996) or other
statistical thresholding procedures. The WaveLab project (Buckheit and Donoho 1995)
aims to distribute wavelet-based algorithms via the literature, and to provide implementations of those algorithms for other researchers. WaveLab is a collection of MATLAB (The MathWorks, USA) routines for wavelet transforms and wavelet-related operations. This thesis uses a number of WaveLab routines for denoising signals (Donoho and Johnstone 1994, 1995). These methods assume white N (0, 1)-distributed noise and compute optimal thresholds for different criteria. If the noise is correlated, wavelet thresholding must be applied to each resolution channel separately (Johnstone and Silverman 1997). The detail coefficients within one channel of a signal with correlated noise are almost uncorrelated, so threshold selection schemes based on white noise can be used. More recent wavelet-based denoising techniques are based on the likelihood ratio of the wavelet coefficients (Pizurica et al. 2003). Wavelet-domain Wiener filtering (Ghael et al. 1997, Alexander et al. 2000a) combines the mean-square error (MSE) optimality of (frequency-domain) Wiener filtering with the minimax MSE properties of wavelet-domain thresholding. Nowak (1999) has developed a wavelet-domain
filtering procedure for removing Rician noise that is characterised as a data-adaptive
wavelet-domain Wiener filter. Wavelet-domain Wiener filtering is also used to denoise
fMRI time signals (LaConte et al. 2000, Alexander et al. 2000b). Hilton et al. (1996) denoise and analyse fMRI data in the wavelet domain. They compare the performance
of their own data analytic thresholding technique to a standard technique available in
WaveLab. Ruttimann et al. (1998) perform the statistical analysis of fMRI time series in
the wavelet domain, by thresholding the differences between block mean images. This
method is fast and statistically robust, although it is only suitable for blocked designs.
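The thresholding rules mentioned above can be sketched as follows. The universal threshold $\sigma\sqrt{2\log N}$ of Donoho and Johnstone and the median-absolute-deviation noise estimate are standard choices; using them here, and estimating σ separately per resolution channel as suggested for correlated noise, is an illustrative assumption rather than the exact WaveLab configuration used in this thesis. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(d, t):
    """Soft thresholding: shrink coefficients towards zero by t, zeroing |d| <= t."""
    return np.sign(d) * np.maximum(np.abs(d) - t, 0.0)

def hard_threshold(d, t):
    """Hard thresholding: keep coefficients with |d| > t, remove the rest."""
    return np.where(np.abs(d) > t, d, 0.0)

def mad_sigma(d):
    """Robust noise estimate from detail coefficients: median(|d|) / 0.6745."""
    return np.median(np.abs(d)) / 0.6745

def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 log n) for n samples."""
    return sigma * np.sqrt(2.0 * np.log(n))

def denoise_details(details, n, rule=soft_threshold):
    """Level-dependent thresholding: one noise estimate per resolution channel,
    as suggested for correlated noise; the approximation is left untouched."""
    return [rule(d, universal_threshold(mad_sigma(d), n)) for d in details]

d = np.array([-3.0, -0.5, 0.2, 2.0])
print(soft_threshold(d, 1.0), hard_threshold(d, 1.0))
```

Hard thresholding keeps the surviving coefficients unchanged, whereas soft thresholding also shrinks them, which tends to give smoother estimates.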
Recently, a number of wavelet-based methods have been introduced to remove noise,
while also deconvolving a blurring function (Dragotti and Vetterli 2002, Figueiredo and
Nowak 2003, Neelamani et al. 2004, Johnstone et al. 2004, Kalifa et al. 2003). This is the
main subject of the second part of the thesis. The difference between our application
and the ones mentioned above is that usually in image enhancement applications, the
effect of the impulse response function is simply removed. In our case, the signal is an
fMRI time series, and the response function is the HRF, which is the object of interest.
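To make the deconvolution setting concrete: if y = s ∗ h + noise, with s a known stimulus sequence and h the response of interest, the Fourier-domain step of a ForWaRD-type method is a regularised inversion $\hat H = \overline{S}\,Y / (|S|^2 + \alpha)$. The sketch below shows only this step, with a hypothetical gamma-shaped response, a hypothetical random event train and hypothetical α values; the wavelet-domain shrinkage that follows in ForWaRD, and the adaptation used in this thesis, are not shown.

```python
import numpy as np

def fourier_regularized_estimate(y, s, alpha):
    """Estimate h from y = s * h + noise (circular convolution) by
    Tikhonov-regularised Fourier inversion: H = conj(S) Y / (|S|^2 + alpha)."""
    Y = np.fft.fft(y)
    S = np.fft.fft(s)
    H = np.conj(S) * Y / (np.abs(S) ** 2 + alpha)
    return np.real(np.fft.ifft(H))

def circ_conv_fft(a, b):
    """Circular convolution of two equal-length real signals via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

N = 128
rng = np.random.default_rng(0)
t = np.arange(N)
h_true = (t / 4.0) ** 3 * np.exp(-t / 4.0)      # hypothetical gamma-shaped response
s = (rng.random(N) < 0.1).astype(float)         # hypothetical random event train
y = circ_conv_fft(s, h_true) + 0.05 * rng.standard_normal(N)

h_small = fourier_regularized_estimate(y, s, alpha=1e-10)  # nearly unregularised
h_reg = fourier_regularized_estimate(y, s, alpha=1.0)      # stronger regularisation
```

A small α inverts the stimulus almost exactly but amplifies noise at frequencies where |S| is small; a larger α suppresses those frequencies at the cost of bias, which the subsequent wavelet-domain step is meant to handle.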
Wavelet-based extraction of waveforms has successfully been applied to electroencephalographic (EEG) data (Zibulevsky and Zeevi 2002). A major difference between our approach and those methods is that the FWT is not used: the FWT is not shift-invariant, so the SI-DWT is used instead.
In some situations, variations on the FWT algorithm yield much faster implementations than the original algorithm (Rioul and Duhamel 1992, Vetterli and Herley 1992). A
wavelet transform with basis functions that do not have compact support is computed
most efficiently via the frequency domain. The fast wavelet transform in the frequency
domain is denoted by FWD, its inverse by FWR (Westenberg and Roerdink 2000). For
both the FWT used in the first part of the thesis, and the SI-DWT used in the second part,
implementations in both the frequency domain and the time domain are presented (see
appendix B).
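The frequency-domain route can be sketched for a single analysis channel of (1.9): circular convolution becomes a product of DFTs, and downsampling by 2 becomes the aliasing sum A(k) = (Y(k) + Y(k + N/2))/2 on the half-length spectrum. For a filter without compact support, the product would use its frequency response directly; here a compactly supported Daubechies-2 filter is used only so that the result can be compared with the time-domain version. This is not the FWD/FWR implementation of appendix B, and the function names are hypothetical.

```python
import numpy as np

s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # Daubechies-2
g = h[::-1] * np.array([-1.0, 1.0, -1.0, 1.0])

def fwt_level_time(c, f):
    """One analysis channel of (1.9) in the time domain: down2(f * c), circular."""
    y = np.zeros(len(c))
    for l, fl in enumerate(f):
        y += fl * np.roll(c, l)
    return y[::2]

def fwt_level_freq(c, f):
    """Same channel in the frequency domain: multiply spectra, then downsample
    via the aliasing identity A(k) = (Y(k) + Y(k + N/2)) / 2. Assumes even N."""
    N = len(c)
    Y = np.fft.fft(c) * np.fft.fft(f, N)   # f is zero-padded to length N
    A = 0.5 * (Y[: N // 2] + Y[N // 2 :])
    return np.real(np.fft.ifft(A))

c = np.random.default_rng(6).standard_normal(64)
print(np.allclose(fwt_level_time(c, h), fwt_level_freq(c, h)))
```

The per-level cost is dominated by the FFTs, so it does not grow with the filter length, which is what makes this route attractive for infinitely supported filters.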
1.3 Thesis contribution and organisation
The remainder of this thesis is divided into two parts.
The first part deals with noise in fMRI data. Chapter 2 introduces a definition of
BOLD noise derived from MR physics. MR (magnitude) images contain Rician noise
(Gudbjartsson and Patz 1995): MR signals are measured in frequency space, and the reconstructed image has a complex value at each voxel location. MR images are generally magnitude
images, and if the (complex) noise is Gaussian distributed, the magnitude of the noisy signal is Rician distributed. Most fMRI analysis methods, however, assume Gaussian distributed noise,
without mentioning the Rician distribution of MR data. We define each BOLD fMRI
image as the difference of two MR images: one measured during activation, the other
measured during the baseline condition. The difference between two images containing
Rician noise, with the same underlying image and the same signal-to-noise ratio (SNR) is,
to close approximation, Gaussian distributed. We find that the probability density function of this difference (i) approximates a Gaussian very well, and (ii) actually decays
faster to zero than a Gaussian. Therefore, this model agrees with the Gaussian noise
model used in fMRI analysis methods. The mathematical derivations used to characterise the difference distributions are given in appendix A.
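The near-Gaussian behaviour of a difference of two Rician variables can be explored by simulation. The Monte Carlo sketch below, with hypothetical values for the true magnitude A and the noise level σ, draws two independent Rician samples with the same A and σ and reports the mean, standard deviation and excess kurtosis of their difference; it is only an illustration and does not reproduce the analytic derivation of appendix A.

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma, n = 10.0, 1.0, 200_000   # hypothetical true magnitude, noise level, sample size

def rician_samples(A, sigma, n, rng):
    """Magnitude of A + complex noise with i.i.d. N(0, sigma^2) real and imaginary parts."""
    return np.abs(A + rng.normal(0.0, sigma, n) + 1j * rng.normal(0.0, sigma, n))

diff = rician_samples(A, sigma, n, rng) - rician_samples(A, sigma, n, rng)
z = (diff - diff.mean()) / diff.std()
excess_kurtosis = np.mean(z**4) - 3.0   # 0 for a Gaussian
print(diff.mean(), diff.std(), excess_kurtosis)
```

Varying A/σ shows how the shape of the difference distribution depends on the SNR.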
Chapter 3 tests a number of wavelet-based denoising schemes in the context of functional MR time series analysis, and compares them to Gaussian smoothing. Gaussian smoothing is the standard preprocessing method for removing noise from fMRI
data (Gold et al. 1998), but smoothing changes the images in an irreversible way; spatial
patterns of activity found in smoothed images are likely to differ from those found in
the true underlying signal. Wavelet-based denoising methods are shown to keep improving the
SNR at higher input SNR values than Gaussian smoothing does. In addition, the activation
patterns found after denoising remain closer, in terms of false positives and false negatives, to the original images. Gaussian smoothing is a prerequisite if GRF theory is used
for type I error control. The wavelet-based denoising routines are therefore combined
with FDR control, which does not require any form of smoothing prior to the analysis.
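FDR control can be sketched with the Benjamini-Hochberg step-up procedure, which sorts the m p-values and rejects all hypotheses up to the largest rank i with $p_{(i)} \le q\,i/m$. Whether the thesis uses exactly this variant is not stated here; the sketch (Python/NumPy, hypothetical function name) assumes independent tests.

```python
import numpy as np

def benjamini_hochberg(p, q):
    """Boolean mask of p-values declared significant at FDR level q
    (Benjamini-Hochberg step-up procedure)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank with p_(i) <= q i / m
        mask[order[: k + 1]] = True      # reject all hypotheses up to that rank
    return mask

p = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216])
print(benjamini_hochberg(p, 0.05))
```

Unlike GRF-based corrections, the procedure only needs the p-values, so it can be applied to data that have not been smoothed.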
The second part of the thesis introduces a new method to extract an HRF from an fMRI data set. The BOLD response is assumed to be LTI, and this property is used in Chapter 4 to extract the HRF from an fMRI time series with a combination of frequency-domain and wavelet-domain methods. The ForWaRD method (Neelamani et al. 2004) requires only a few assumptions about the shape of the HRF and is shown to be very robust. This method is adapted to extract the HRF from fMRI time series. The extracted HRF coefficients are used to fit a novel HRF model, which can be used in the analyses of other data sets. Combining the new model with the extracted HRFs proves to yield a powerful analysis tool in subsequent ANCOVAs of similar data sets.
Chapter 5 extends the ForWaRD-based HRF extraction routine to support families of
wavelet basis functions which do not have compact support in the time domain. An efficient algorithm to compute the SI-DWT in the frequency domain is given in this chapter,
using definitions from appendix B. The implementation of this algorithm facilitates the
use of fractional spline wavelet bases, as introduced by Unser and Blu (2000). Comparisons of the computation times of the time-domain SI-DWT and the frequency-domain
SI-DWT show that the frequency-domain version is much faster for long signals. The
frequency-domain extraction method is tested in the same way as the earlier version,
and test results confirm its usability.
Chapter 6 contains a summary and general conclusions of the thesis, and gives recommendations for future research.