Wavelet-Based Statistical Analysis of fMRI Data

Stat 233: Statistical Methods in Biomedical Imaging
Ivo Dinov, UCLA
http://www.stat.ucla.edu/~dinov

(fMRI simulation movie)
Activation Imaging using BOLD
• Resting state versus active state, e.g., finger tapping, word recognition.
• Whole brain scanned in ~3 seconds using a high-speed imaging technique (EPI).
• Perform analysis to detect regions that show a signal increase in response to the stimulus.

Problems with BOLD
• How localized is the BOLD response to the site of neuronal activity?
  - Is the signal from draining veins rather than the tissue itself?
• How is the signal change coupled to neuronal activity?
  - Do changes in the timing of BOLD responses reliably tell us about changes in the timing of neural activity?
fMRI Data Analysis Tools
• AFNI http://afni.nimh.nih.gov/afni/
• SPM http://www.fil.ion.ucl.ac.uk/spm/
• Brain Voyager http://www.brainvoyager.de/
• CARET http://brainmap.wustl.edu/resources/caretnew.html
• FSL http://www.fmrib.ox.ac.uk/fsl/
• SCIrun http://software.sci.utah.edu/scirun.html
• McGill BIC www.bic.mni.mcgill.ca/usr/local/matlab5/toolbox/fmri/
• LONI Pipeline http://www.loni.ucla.edu

Hypothesis-driven vs. Data-driven Approaches
Hypothesis-driven
• Examples: t-tests, correlations, general linear model (GLM)
• An a priori model of activation is suggested
• The data are checked to see how closely they match components of the model
• Most commonly used approach (e.g., SPM)
Data-driven: Independent Component Analysis (ICA)
• No prior hypotheses are necessary
• Multivariate techniques determine the patterns in the data that account for the most variance across all voxels
• Can be used to validate a model (see if the math comes up with the components you would have predicted)
• Can be inspected to see if there are things happening in your data that you did not predict
• Can be used to identify confounds (e.g., head motion)
• Need a way to organize the many possible components
• New and upcoming
Two Approaches: Regional vs. Global (Whole-Brain) Statistics

A. Region-Of-Interest (ROI) approach
1. Use one or more localizer runs to find a region (e.g., show moving rings to find the middle temporal motion area MT).
2. Extract time-course information from that region in separate, independent runs.
3. See if the trends in that region are statistically significant.
The runs used to localize the area are independent of those used to test the hypothesis.
Example study: Tootell et al., 1995, motion aftereffect. Localize "motion area" MT in a run comparing moving vs. stationary rings; extract time courses from MT in subsequent runs while subjects see illusory motion (the motion aftereffect).

B. Whole-volume statistical approach
1. Make predictions about what differences you should see if your hypotheses are correct.
2. Decide on statistical measures to test for the predicted differences (e.g., t-tests, correlations, GLMs).
3. Determine an appropriate statistical threshold.
4. See if the statistical estimates are significant.

Statistics available:
1. t-test
2. Correlation
3. Frequency-based (Fourier/wavelet/fractal) modeling
4. General Linear Model: an overarching statistical model that lets you perform many types of statistical analyses (incl. correlation/regression, ANOVA)
Comparing the Two Approaches

Region-of-Interest (ROI) analyses
• Give you more statistical power because you do not have to correct for the number of comparisons
• Hypothesis-driven
• The ROI is not smeared by intersubject averaging
• Easy to analyze and interpret
• Neglect other areas which may play a fundamental role
• Popular in North America

Whole-brain analysis
• Requires no prior hypotheses about the areas involved
• Includes the entire brain
• Can lose spatial resolution with intersubject averaging
• Can produce meaningless lists of areas that are difficult to interpret
• Depends highly on the statistics and threshold selected
• Popular in Europe

NOTE: Though different experimenters tend to prefer one method over the other, they are NOT mutually exclusive. You can check the ROIs you predicted and then check the data for other areas.

Why do we need statistics?
• MR signal intensities are random:
  - variation caused by the scanner (magnet, coil, time)
  - variation in neurophysiology (area, task, subject)
• Rest-state scans are used against activation tasks in a subtraction-paradigm setting to correct for some of these effects within the same run.
• Statistical analyses help us confirm whether "eyeball" tests of significance based on visual thresholding are real/correct.
• Because we do so many comparisons (~10^6), we need a way to compensate for the increase in Type I error.
fMRI of Young, Nondemented and Demented Adults
Paradigm: an event-related design to assess between-population differences in the amplitude and variance of the hemodynamic response to a visual stimulus (14 young, 14 old non-AD, and 13 AD subjects). Buckner et al., J. Cog. Neurosci. 2000.

Design:
• 4 runs per subject, 128 volumes per run
• 2 (randomly chosen) conditions per run (always 7+8)
• 1 trial = 21.44 sec (8 volumes); 15 random trials per run; 60 trials total per subject
• 1-trial condition: a 1.5-second 8 Hz B/W checkerboard flicker
• 2-trial condition: 2 visual stimuli 5.36 sec apart
• The stimulus required a right-hand index-finger click
(Figure: trial onsets across runs 1-4, volumes 1-128. Source: Tootell et al.)

Acquisition: 16 coronal slices, 64x64 in-plane voxels, 3 x 3 x 5 mm^3 voxels, volume time = 2 sec.

One Approach: Voxel-Based T-tests
For a given voxel, look for differences between 1-trial and 2-trial activation:
• Measure the average MR signal and SD for each volume in which 1-trial stimuli were presented (7-8 epochs x 8 volumes/epoch = 56-64 volumes).
• Measure the average MR signal and SD for each volume in which 2-trial stimuli were presented (56-64 volumes).
• Statistically significant mean difference? Calculate the T0 value and look up the p-value for that number of degrees of freedom (df ≈ 56 x 2 = 112). E.g., for ~112 df, T0 > 1.98 → p < 0.05; T0 > 3.39 → p < 0.001.
• To look for 1-trial minus 2-trial activation, look at the negative tail of the comparison.
• Repeat this process for each of the 65,536 voxels (64 x 64 x 16).

Multiple comparisons: if p = .001 and there are 65,536 voxels, then 65,536 x 0.001 ≈ 66 voxels could be significant purely by chance.
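The voxelwise t-test and the chance-level arithmetic above can be sketched in a few lines; the signal means and SDs here are invented stand-ins for illustration, not the Buckner data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for one voxel: 56 volumes under the 1-trial
# condition and 56 under the 2-trial condition (made-up values).
one_trial = rng.normal(loc=100.0, scale=2.0, size=56)
two_trial = rng.normal(loc=101.5, scale=2.0, size=56)

# Two-sample t-test for this voxel (df = 56 + 56 - 2 = 110).
t0, p = stats.ttest_ind(one_trial, two_trial)

# Repeating this over the whole 64 x 64 x 16 volume, the expected
# number of purely-by-chance "significant" voxels at alpha = .001:
n_voxels = 64 * 64 * 16                  # 65,536 voxels
expected_by_chance = n_voxels * 0.001    # about 66 voxels
```

A 1.5-unit mean difference against SD 2, with 56 volumes per condition, gives a clearly significant t; the last two lines are exactly the multiple-comparisons arithmetic from the slide.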
T-test: Maps
For each voxel in the brain, we can now color-code a map based on the computed t and p values:
• We can do this for the positive tail (1-trial minus 2-trial): orange = low significance, yellow = high significance.
• And we can also do this for the negative tail (2-trial minus 1-trial): blue = low significance, green = high significance.
• Schmutz or ESP voxels?

Another Statistical Analysis: Basic Correlation
• Finds voxels whose time course is correlated with a reference function.
• Can incorporate the hemodynamic response function (HRF) to predict the time course more accurately.
For each voxel:
• Find the correlation between the predictor (reference model) and the MR signal (56 data points each).
• Extract the correlation (r value) and find the corresponding p-value (Pearson/Spearman).
• Determine whether it is statistically significant.
• Similar in spirit to a t-test.
Remember that r^2 is the proportion of variance accounted for by our predictor; e.g., if r = 0.7, then r^2 = 0.49 → 49%.
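A minimal sketch of the per-voxel correlation test, with a hypothetical boxcar reference function and a simulated 56-volume voxel time course (all numbers invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical boxcar reference model over 56 volumes:
# "on" for 8 volumes after each of four trial onsets.
predictor = np.zeros(56)
for onset in (0, 16, 32, 48):
    predictor[onset:onset + 8] = 1.0

# Simulated voxel time course: baseline + scaled predictor + noise.
voxel = 100.0 + 1.5 * predictor + rng.normal(scale=1.0, size=56)

# Pearson correlation between the predictor and the MR signal.
r, p = stats.pearsonr(predictor, voxel)

# r^2 is the proportion of variance accounted for by the predictor.
var_explained = r ** 2
```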
Predictor: the Hemodynamic Response Function (HRF)
• Time to rise: the signal begins to rise soon after the stimulus begins.
• Time to peak: the signal peaks 4-6 sec after the stimulus begins.
• Post-stimulus undershoot: the signal is suppressed after stimulation ends.
• Initial dip: more focal and potentially a better measure, but somewhat elusive so far; not everyone can find it.
• % signal change = (point - baseline)/baseline, usually 0.5-3%.
(Figure: percent change from baseline vs. time from stimulus onset, in sec.)

Correlation: Incorporating the HRF
We can model the expected curve of the data by convolving our predictor m(x) with the hemodynamic response kernel g(x), giving the model (m*g)(x). To find a 1-trial-responsive area, we can correlate the convolved predictor (56 model points) with each voxel time course f(x) (56 data points).

Correlations: Negative Tail
Look for anything that shows a reduced response during the 1-trial condition (compared with both the 2-trial condition and fixation/rest).

Two Main fMRI Designs
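Convolving the stimulus predictor with an HRF kernel can be sketched as below; the gamma-shaped kernel is a generic illustrative choice that peaks about 5 s after onset, not the slides' exact function:

```python
import numpy as np

TR = 2.0                         # seconds per volume, as in the slides
t = np.arange(0, 30, TR)

# Illustrative gamma-shaped HRF g(x), peaking ~5 s after stimulus onset.
hrf = (t / 5.0) ** 2 * np.exp(-t / 2.5)
hrf /= hrf.sum()                 # unit-sum kernel

# Boxcar stimulus predictor m(x): one brief trial every 16 volumes.
m = np.zeros(56)
m[::16] = 1.0

# Expected BOLD time course: the convolution (m * g)(x),
# truncated to the length of the run.
model = np.convolve(m, hrf)[:len(m)]
```

The resulting `model` rises a few seconds after each trial onset rather than instantaneously, which is what makes the correlation with real voxel data more accurate.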
Block design
• Compare long periods (e.g., 16 sec) of one condition with long periods of another.
• The traditional approach.
• The most statistically powerful approach.
• Less dependent on how well you model the HRF.
• Habituation effects?!

Event-related design
• Compare brief trials (~1 sec) of one condition with brief trials of another.
• A newer approach (1997+).
• Less statistically powerful, but has some advantages.
• Trials can be well spaced to allow the MR signal to return to baseline between trials (e.g., 21+ sec apart), or closely spaced (e.g., every 2 sec).
Problems with T-tests and Correlation Analyses
1) How do we evaluate runs with different orders? We could average our 4 runs (60 trials) by 1-trial or 2-trial stimuli and then do stats comparing the functional differences between the two paradigms, but there is no way to collapse across orders.
2) If we test more subjects, how can we evaluate all subjects together? As with the single-subject runs, we could average all the subjects together (after morphing them into an atlas space), but that still means we have to run all of them in the same order.
3) We can get nice hemodynamic predictors for 1-trial or 2-trial stimuli, but how can we compare them accurately? If a predictor is significant, we won't know whether it is because 1-trial > 2-trial or because 1-trial > fixation.

A Solution: Use the General Linear Model (GLM) for fMRI
Parse out the variance in the voxel's time course into the contributions of six predictors plus residual noise (what the predictors cannot account for):
  fMRI signal = β1 x (predictor 1) + β2 x (predictor 2) + ... + β6 x (predictor 6) + residuals,
with predictors such as stimulus, trial, run, subject, group (AD/young), and ROI encoded as columns of a design matrix. (Adapted from Brain Voyager course slides.)

Completeness and Efficiency of Signal Representation: Wavelet Functions in 1D
(figure slide)
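A toy GLM fit illustrating the idea; this uses a three-column design matrix rather than the six predictors on the slide, and all data are simulated:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 128                       # volumes in one run

# Hypothetical design matrix: baseline plus two stimulus predictors.
x1 = rng.binomial(1, 0.3, size=n).astype(float)   # e.g., 1-trial regressor
x2 = rng.binomial(1, 0.3, size=n).astype(float)   # e.g., 2-trial regressor
X = np.column_stack([np.ones(n), x1, x2])

# Simulated voxel time course with known betas plus noise.
beta_true = np.array([100.0, 2.0, 1.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Least-squares GLM fit: split the variance in y into the predictors'
# contributions plus the residuals they cannot account for.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
```

Contrasts between the fitted betas (e.g., β for 1-trial minus β for 2-trial) then answer the comparison questions that plain t-tests and correlations cannot.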
SOCR Wavelet/Fourier Demo: http://socr.stat.ucla.edu/

Features of Natural Images
• Mostly no global multi-scale self-similarity.
• Contain both man-made and natural features.
• Mostly no simple, universal underlying physical or biological process generates the patterns in a general image.
Ref: www.math.umn.edu/~jhshen
Efficiency of Representation
David Field (Cornell U, vision psychologist): "To discriminate between objects, an effective transform (representation) encodes each image using the smallest possible number of neurons, chosen from a large pool."

Fourier Representation
Claim: harmonic waves are bad vision neurons, because a typical Fourier neuron is
  φ(x) = exp(iax).
To process a simple bright spot δ(x) in the visual field, all such neurons have to respond to it, since
  |⟨δ, φ⟩| = |∫ δ(x) e^(-iax) dx| = 1.
Asking our own "headtop"...
Psychologists show that visual neurons are spatially organized, and each behaves like a small sensor (receptor) that can respond strongly to spatial changes such as the edge contours of objects (Field, 1990).

Marr's Edge Detection
Detection of edge contours is a critical ability of human vision (Marr, 1982). Marr and Hildreth (1980) proposed a model for human detection of edges at all scales. This is Marr's theory of zero-crossings:
  G_σ(x, y) = (1 / (2πσ^2)) exp(-(x^2 + y^2) / (2σ^2)),  the Gaussian kernel.
The Laplacian of the Gaussian is:
  ΔG_σ = ∂^2 G_σ/∂x^2 + ∂^2 G_σ/∂y^2
       = -(1 / (πσ^4)) (1 - (x^2 + y^2) / (2σ^2)) exp(-(x^2 + y^2) / (2σ^2)).
An edge occurs in an image I where (ΔG_σ * I) = 0. Why?
Marr's Edge Detection (cont'd)
• Uses the second spatial derivatives of an image.
• The operator response will be zero in areas where the image has a constant intensity (i.e., where the intensity gradient is zero).
• Near a change in image intensity, the operator response will be positive on the darker side and negative on the lighter side.
• At a sharp edge between two regions of different uniform intensities, the Laplacian-of-Gaussian response will be: zero away from the edge; positive to one side of the edge; negative to the other; and zero on the edge in between.
Example: the response of a 1D Laplacian of Gaussian (σ = 3 pixels) to a 1D step edge (figure).

Laplacian of a Gaussian: Why an Edge at a Zero-Crossing?
• The operator is the Laplacian ΔG_σ of an (isotropic) Gaussian kernel G_σ (applet demos: http://www.dai.ed.ac.uk/HIPR2/flatjavasrc/; locally, C:\Ivo.dir\UCLA_Classes\Applets.dir\ImageProcessing\LaplacianOfGaussian: LaplacianOfGaussianDemo.html, LaplacianDemo.html).
• A zero-crossing detector looks for places where the Laplacian changes sign. Such points often occur around edges in images, but they also occur at places that are not as easy to associate with edges. Zero crossings always lie on closed contours.
• The starting point for the zero-crossing detector is an image which has been convolved with a Laplacian-of-Gaussian filter.
• The resulting zero crossings are strongly influenced by the size of the (isotropic) Gaussian (change the filter size in the applet demo). As the smoothing is increased, fewer and fewer zero-crossing contours are found, and those that remain correspond to features of larger and larger scale in the image.
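The zero-crossing behavior can be checked numerically with a 1D step edge and a σ = 3 Laplacian-of-Gaussian kernel, a small sketch of the figure's setup:

```python
import numpy as np

sigma = 3.0
x = np.arange(-15.0, 16.0)

# 1D Gaussian and its second derivative (the 1D Laplacian of Gaussian).
g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
log_kernel = g * (x**2 - sigma**2) / sigma**4

# 1D step edge: dark (0) on the left, bright (1) from index 32 on.
step = np.zeros(64)
step[32:] = 1.0

# Operator response: positive on the darker side, negative on the
# lighter side, ~0 far from the edge, crossing zero at the edge.
response = np.convolve(step, log_kernel, mode="same")
```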
A Good Representation Should Respect Edges: Marr vs. Haar's Edge Encoding
• Edges are important features of images. A good image representation (or basis) should be capable of providing the edge information easily.
• An edge is a local feature. Local operators, like differentiation, must be incorporated into the representation, as in the coding by the Haar basis.
• Marr's edge detector uses the second derivative to locate the maxima of the first derivative (through which the edge contours pass).
• The Haar basis (1909) encodes edges into the image representation via a first-difference operator (i.e., a moving average/difference):
  (..., x_2n, x_2n+1, ...) ← Haar → (..., a_n = (x_2n + x_2n+1)/2, d_n = (x_2n - x_2n+1)/2, ...)
• Wavelets improve on the Haar encoding while respecting this edge-representation principle.
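One Haar analysis step (the moving average/difference above) on a short signal with a step edge, in plain numpy:

```python
import numpy as np

# A short signal with a flat stretch and a step edge inside one pair.
x = np.array([4.0, 4.0, 4.0, 9.0, 9.0, 9.0, 9.0, 9.0])

# Haar encoding: a_n = (x_2n + x_2n+1)/2, d_n = (x_2n - x_2n+1)/2.
a = (x[0::2] + x[1::2]) / 2    # moving averages
d = (x[0::2] - x[1::2]) / 2    # moving differences

# The differences vanish on flat stretches and light up at the edge,
# and (a, d) losslessly encodes x: x_2n = a_n + d_n, x_2n+1 = a_n - d_n.
x_rec = np.empty_like(x)
x_rec[0::2] = a + d
x_rec[1::2] = a - d
```

The nonzero entry of `d` marks the pair containing the edge; everywhere the signal is stable the differences are exactly zero, which is the "differentiating" behavior the slide describes.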
What Is a Good Image Representation?
• Mathematically rigorous: a clean and stable analysis and synthesis program exists (cf. the FT & IFT).
• Has an independent, fast digital formulation framework (e.g., FFT, FWT, ...).
• Captures the characteristics of the input signals, so that many existing processing operators (e.g., image indexing, image searching, ...) are directly permitted on the representation.
• Improves statistical inference power.

Mathematics of Wavelet Image Representation
Let Ω denote the collection of all images. What is the mathematical structure of Ω? Suppose that f ∈ Ω is an observed image. Then Ω should be invariant under:
• Euclidean motion of the scanner/imager: f(x) → f(Qx + a), Q ∈ O(2), a ∈ R^2.
• Brightness/flashing: f(x) → μ f(x), μ ∈ R^+; or, more generally, a morphological transform f(x) → h(f(x)), h: R → R, h' > 0.
• Scaling: f(x) → f(λx), λ ∈ R^+.
What Is Zooming? (e.g., zooming in 2D)
• Zooming (aiming) center: a. Zooming scale: h.
• Zoom into the h-neighborhood of a in a given image I:
  I_a,h(x) = I(a + h·x), x ∈ Ω, the visual field;
  I_a,h((y - a)/h) = I(y) · 1_Ω((y - a)/h), the aperture.
• Zooming is one of the most fundamental and characteristic operators of image analysis and visual communication. It reflects the multi-scale nature of images and vision.
Zooming/Scaling and Wavelets
• The base function: ψ(x).
• Aiming (a) and zooming (h), in or out:
  ψ_a,h(x) = (1/√h) ψ((x - a)/h).
• Generating the response (to centering/scaling of signals):
  I_a,h = ⟨I, ψ_a,h⟩ = ∫ I(x) ψ_a,h(x) dx.
• The zooming space: (a, h) ∈ R × R^+.

A Good Signal Representation Should Be Adaptable
• A good representation should react strongly to abrupt changes, and weakly to stable intensities.
• That means, for a stable image, e.g., a constant I = c, the responses I_a,h should be approximately zero:
  I_a,h = c ∫_R ψ_a,h(x) dx ≈ 0, i.e., ∫ ψ(x) dx = 0.
• This is the "differentiating" property of the encoding (cf. d/dx).
The Continuous Wavelet Representation
Definition: a differentiating zooming function ψ(x) is said to be a (continuous) wavelet. Representing a given image I(x) by all the scale/shift responses
  I(a, h) = I_a,h = ⟨I, ψ_a,h⟩
is the corresponding wavelet representation.
Questions:
• Is there a best wavelet ψ(x)?
• Does such a wavelet representation allow perfect image reconstruction?

Synthesizing a Wavelet Representation
Goal: to recover an image signal I perfectly from its wavelet representation I(a, h).
(Continuous) wavelet synthesis (where ^ denotes the FT):
  I(a, h) = ⟨I, ψ_a,h⟩ = ⟨Î, ψ̂_a,h⟩ = ⟨Î · ψ̂(hξ), e^(-iaξ)⟩,
which is in the form of an IFT in a. Let J(ξ, h) = Î(ξ) ψ̂(hξ); then J can be perfectly recovered via the a-FT of I(a, h), and I can be perfectly recovered from J via
  Î(ξ) = ∫_[0,∞) J(ξ, h) ψ̂(hξ) dh/h,
as long as ∫_[0,∞) |ψ̂(h)|^2 dh/h = 1.

The Admissibility Condition & Differentiation
• The admissibility condition for a continuous wavelet:
  ∫_[0,∞) |ψ̂(h)|^2 / h dh < ∞.
• A differentiating zooming operator satisfies the admissibility condition, since
  ψ̂(0) = ∫ ψ(x) dx = 0, and ψ̂(h) = ch + o(h).
• Examples:
  - The Shannon wavelet: ψ(x) = 2 sinc(2x) - sinc(x), where sinc(t) = sin(πt)/(πt).
  - The Marr wavelet ("Mexican hat"): the (1D Laplacian-of-Gaussian) second derivative of the Gaussian.

The Discrete Set of Zooming Scales
• Make a log-linear discretization of the scale parameter h:
  h_j = 2^(-j), with j = -log2 h_j = 0, ±1, ±2, ...
• Make a scale-adaptive discretization of the zooming centers at scale j:
  k → a_k = k h_j = k / 2^j, k = 0, ±1, ±2, ...
• The discrete set of zooming operators:
  ψ_j,k(x) = (1/√h_j) ψ((x - k h_j)/h_j) = 2^(j/2) ψ(2^j x - k).
The Discrete Wavelet Representation
• The wavelet coefficients:
  d_j,k = ⟨I, ψ_j,k⟩ = 2^(j/2) ∫_R I(x) ψ(2^j x - k) dx,
i.e., d_j,k = I(2^(-j) k, 2^(-j)) in terms of the continuous WT.
• Questions:
  - Does the set of all wavelet coefficients still encode the complete information of each input image I?
  - Is the set of wavelets {ψ_j,k(x): j, k ∈ Z} a basis? Or, more precisely, of what space are these functions a basis?
  We don't know yet. But let's check out some examples...

Example 1: the Haar Wavelet
• The Haar "aperture" function is
  ψ^(Haar)(x) = 1{0 ≤ x < 1/2}(x) - 1{1/2 ≤ x < 1}(x),
i.e., +1 on [0, 1/2) and -1 on [1/2, 1).
• Haar's theorem (1909): all Haar wavelets {ψ^(Haar)_j,k : j, k ∈ Z}, together with the constant function 1_[0,1](x), form an orthonormal basis for the Hilbert space of all square-integrable functions on [0, 1].
Haar Wavelets
• Haar's mother wavelet:
  ψ^(Haar)(x) = 1{0 ≤ x < 1/2}(x) - 1{1/2 ≤ x < 1}(x).
• Why an orthonormal basis?
  ⟨ψ_j,k, ψ_l,m⟩ = 1 if j = l and k = m, and 0 otherwise.
• Completeness is due to the fact that all dyadically piecewise-constant functions are dense in L^2(0, 1).
• Example: for a piecewise-constant signal I(x) on four intervals, one constant plus three wavelets are enough. Three Haar wavelets and the mean (constant) encode all the information of the piecewise-constant approximation (i.e., the analog-to-digital transition).
Example 2: the Shannon Wavelets
• The Shannon "aperture" function is:
  ψ^(Shannon)(x) = 2 sinc(2x) - sinc(x), sinc(x) = sin(πx)/(πx).
• Theorem: {ψ^(Shannon)_j,k (x): j, k ∈ Z} is an orthonormal basis of L^2(R).
• This orthonormal basis is visualized better in Fourier space: the FT of 2 sinc(2x) is the indicator of (-2π, 2π) and the FT of sinc(x) is the indicator of (-π, π), so ψ̂^(Shannon) is supported on (-2π, -π) ∪ (π, 2π).
• Shannon proved that:
  - all (-π, π) band-limited signals can be represented by the sinc(x - n);
  - (-2π, -π) ∪ (π, 2π) band-limited signals are represented by ψ_0,n(x) = ψ(x - n);
  - (-4π, -2π) ∪ (2π, 4π) band-limited signals are represented by ψ_1,n = √2 ψ(2x - n);
  - ...
Partition of the Time-Frequency Space
• Continuing the Shannon decomposition: (-π, π) band-limited signals are represented by the sinc(x - n); the next frequency band by the ψ_0,n(x - n); the next by ψ_1,n = √2 ψ(2x - n); the previous (coarser) band by ψ_-1,n = (1/√2) ψ(x/2 - n); and so on.
• Heisenberg's uncertainty principle postulates that for each particle's position and momentum, Δt · Δx ≥ 2π. So, to get optimal spatial localization, we must influence the particle's scale or frequency content.
(Figure: the space/time-frequency plane tiled by sinc(x - n), sinc(2x - m), sinc(4x - k) at frequencies π, 2π, 4π.)
Multiresolution Analysis (MRA)
Mallat and Meyer (1986): an (orthogonal) multiresolution of L^2(R) is a chain of closed subspaces indexed by all integers:
  ... V_-2 ⊂ V_-1 ⊂ V_0 ⊂ V_1 ⊂ V_2 ...
subject to the following three conditions:
• (completeness) lim_{n→∞} V_n = L^2(R), lim_{n→-∞} V_n = {0}.
• (scale similarity) f(x) ∈ V_n ⇔ f(2x) ∈ V_{n+1}.
• (translation seed) V_0 has an orthonormal basis consisting of all integer translates of a single function φ(x): {φ(x - n): n ∈ Z}.

Equations for Designing an MRA; Wavelet Representation
• The refinement/dilation equation for the father wavelet (often called the scaling function or shape function):
  φ(x) = √2 Σ_n h_n φ(2x - n), for a suitable set of h_n's.
• Let W_0 denote the orthogonal complement of V_0 in V_1. Then W_0 is also orthogonally spanned by the integer translates of a single function ψ(x), the (mother) wavelet:
  ψ(x) = √2 Σ_n g_n φ(2x - n), for some suitable set of g_n's.
Wavelet Representation of a Signal
Theorem: {ψ_j,k = 2^(j/2) ψ(2^j x - k): j, k ∈ Z} is an orthonormal basis for L^2.
One-level wavelet decomposition of a 1-D signal:
  I ≈ I_j ∈ V_j, with I_j = I_{j-1} + d_{j-1}, I_{j-1} ∈ V_{j-1}, d_{j-1} ∈ W_{j-1}.
Iterating down the scales:
  I_j = d_{j-1} + d_{j-2} + ... + d_0 + I_0, with d_i ∈ W_i and I_0 ∈ V_0,
i.e., a coarse-scale part plus the wavelet detail coefficients.
2-Channel Filter Bank: Analysis Bank (Forward Discrete Wavelet Transform)
• H' is the low-pass filter; G' is the high-pass filter.
• ↓2 is the down-sampling operator: (1 3 4 6 5 8 7) → (1 4 5 7).
• The input x passes through the low-pass channel (H', then ↓2) to produce the scales s, and through the high-pass channel (G', then ↓2) to produce the details d.

2-Channel Filter Bank: Synthesis Bank (Inverse Discrete Wavelet Transform)
• H is the low-pass filter; G is the high-pass filter.
• ↑2 is the up-sampling operator: (1 4 5 7) → (1 0 4 0 5 0 7).
• The scales s (↑2, then H) and the details d (↑2, then G) are summed to produce the output y.

A Biorthogonal Filter Bank
A biorthogonal (or perfect-reconstruction) filter bank is one for which y = x for all inputs x.

An Orthogonal Filter Bank
Definition: an orthogonal filter bank is a biorthogonal filter bank in which both analysis filters H' and G' are the time reversals of the synthesis filters H and G: e.g., H = (1, 2, 3) → H' = (3, 2, 1).
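A minimal 2-channel filter bank with orthonormal Haar filters, showing the analysis bank (filter + down-sample) and the synthesis bank (up-sample + filter) achieving perfect reconstruction; the pairwise slicing below is the decimated form of the filtering:

```python
import numpy as np

r2 = np.sqrt(2.0)
x = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 8.0, 7.0, 2.0])

# Analysis bank with orthonormal Haar filters H' = (1, 1)/sqrt(2)
# and G' = (1, -1)/sqrt(2): filter, then down-sample by 2.
s = (x[0::2] + x[1::2]) / r2      # low-pass channel -> scales
d = (x[0::2] - x[1::2]) / r2      # high-pass channel -> details

# Synthesis bank: up-sample by 2, filter with H and G, and sum.
y = np.empty_like(x)
y[0::2] = (s + d) / r2
y[1::2] = (s - d) / r2
```

Since the 2-tap Haar analysis filters are the time reversals of the synthesis filters, this bank is orthogonal, hence biorthogonal: y = x exactly.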
The Fundamental Theorem of MRA
An orthogonal Mallat-Meyer MRA corresponds to an orthogonal filter bank with the synthesis filters
  H = (h_n: n ∈ Z), G = (g_n: n ∈ Z),
where the h_n's and g_n's are the 2-scale connection coefficients in the dilation and wavelet equations:
  φ(x) = √2 Σ_n h_n φ(2x - n), ψ(x) = √2 Σ_n g_n φ(2x - n).
The multiresolution wavelet decomposition of f corresponds to iterating the analysis bank with the φ-coefficients of f as the input digital data.

The Fundamental Theorem (cont'd)
Suppose j = 2 and I_2 = Σ_k c_2(k) φ_2,k(x), so that I_2 = d_1 + d_0 + I_0 with d_i ∈ W_i and I_0 ∈ V_0. The decomposition is computed by iterating the analysis bank:
  c_2 → (H', ↓2) → c_1 → (H', ↓2) → c_0,
  c_2 → (G', ↓2) → d_1, c_1 → (G', ↓2) → d_0.
The Discrete Wavelet Transform (DWT)
Alternating quadrature-mirror smooth (S) and detail (D) filters for the Daub4 filter bank. The N x N transform matrix has row pairs
  S: c0 c1 c2 c3
  D: c3 -c2 c1 -c0
shifted by two columns per pair, wrapping around at the bottom. Half the results of the S and D filters are discarded (decimation): N inputs → N/2 smooth/low-pass components and N/2 detail/high-pass components.

DWT Pyramidal Algorithm (forward)
Apply the transform matrix C to the data y, permute the elements to separate the smooth components from the mother-function (detail/wavelet) coefficients, then re-apply C to the smooth half and permute again:
  y → (s1 d1 s2 d2 s3 d3 s4 d4) → (s1 s2 s3 s4 | d1 d2 d3 d4) → (ss1 dd1 ss2 dd2 | d1 d2 d3 d4) → (ss1 ss2 | dd1 dd2 | d1 d2 d3 d4).
The final vector holds the coarse smooth components and the detail (wavelet) coefficients at each level.

DWT Pyramidal Algorithm (inverse)
Since C is an orthogonal matrix, the inverse DWT applies C^T and the inverse permutations in the reverse order:
  (w1 ... w8) → ... → (s1 d1 s2 d2 s3 d3 s4 d4) → C^T → y.

Major Applications of Wavelets
• FBI fingerprints.
• JPEG2000 image-compression file format.
• Image indexing and image search engines (for databanks).
• Signal modeling.
• Image denoising and restoration (optimal representation).
• Texture analysis.
• Brain mapping and neuroimaging.
• Algorithm speed-up based on multi-resolution representation (e.g., solving large sparse-matrix linear equations).
• Time-series analysis.

Completeness and Efficiency of Signal Representation: Wavelet Functions in 2D
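The alternating S/D Daub4 matrix described above can be built and checked directly; the c0..c3 values are the standard Daubechies-4 coefficients:

```python
import numpy as np

# Standard Daub4 smooth-filter coefficients c0..c3.
s3 = np.sqrt(3.0)
c = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

def daub4_matrix(n):
    """N x N matrix of alternating S and D rows, shifted by 2, wrapping."""
    C = np.zeros((n, n))
    for i in range(n // 2):
        cols = [(2 * i + j) % n for j in range(4)]
        C[2 * i, cols] = c                               # smooth (S) row
        C[2 * i + 1, cols] = [c[3], -c[2], c[1], -c[0]]  # detail (D) row
    return C

C = daub4_matrix(8)

# One forward step maps 8 samples to 4 interleaved smooth and 4 detail
# components; since C is orthogonal, C.T undoes it exactly.
x = np.arange(8.0)
w = C @ x
x_back = C.T @ w
```

Iterating this matrix on the permuted smooth half, as in the pyramidal algorithm, yields the full multilevel DWT.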
(Figure: a 2D Daubechies wavelet illustrating the compact support, fast decay, and oscillatory properties of wavelets.)

Completeness and Efficiency of Signal Representation: Wavelet Functions in 3D
(Figures: a signal and its 3D WT.)

Wavelets
(Figure: three curves representing the original signal (the HeavySine function, thin red curve), its wavelet transform (thin blue curve), and the function estimator reconstructed using only the largest 2% of the wavelet coefficients. Note the space-frequency decorrelation of the original data in wavelet space (blue curve).)
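The "largest 2% of the wavelet coefficients" estimator can be reproduced with a multilevel orthonormal Haar DWT (a simpler wavelet than in the figure) on a HeavySine-style test signal:

```python
import numpy as np

def haar_forward(x):
    """Multilevel orthonormal Haar DWT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=float)
    coeffs = []
    while len(x) > 1:
        coeffs.append((x[0::2] - x[1::2]) / np.sqrt(2))  # details
        x = (x[0::2] + x[1::2]) / np.sqrt(2)             # smooth part
    coeffs.append(x)                                     # coarsest scale
    return coeffs

def haar_inverse(coeffs):
    x = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        y = np.empty(2 * len(x))
        y[0::2] = (x + d) / np.sqrt(2)
        y[1::2] = (x - d) / np.sqrt(2)
        x = y
    return x

# HeavySine-style test signal: a sine with two jumps.
n = 1024
t = np.arange(n) / n
signal = 4 * np.sin(4 * np.pi * t) - np.sign(t - 0.3) - np.sign(0.72 - t)

coeffs = haar_forward(signal)
flat = np.concatenate(coeffs)

# Keep only the largest 2% of coefficients by magnitude (shrinkage).
k = int(0.02 * len(flat))
cutoff = np.sort(np.abs(flat))[-k]
shrunk = np.where(np.abs(flat) >= cutoff, flat, 0.0)

# Repack the levels and reconstruct the compressed estimator.
sizes = np.cumsum([len(c) for c in coeffs])[:-1]
estimate = haar_inverse(np.split(shrunk, sizes))
```

With all coefficients kept, the reconstruction is exact; keeping only ~20 of 1024 coefficients still tracks the signal closely, which is the decorrelation/compression property the figure illustrates.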
Atlas-Based Wavelet-Space Statistical Analysis
If time permits: C:\Ivo.dir\Research\movies\WaveletMovie.mpg, also available online at the Wavelet Analysis of Image Registration site: http://www.loni.ucla.edu/~dinov/WAIR.html
Design protocol for the wavelet-based construction and utilization of the fMRI functional atlas (F&A SVPA): the construction of the anatomical probabilistic atlas is augmented by including the regional distributions of the wavelet coefficients of the functional signals for one population. Statistical analysis of the functional variability between a new fMRI volume and the F&A SVPA atlas is assessed following wavelet-space shrinkage.
Distribution of the Wavelet Coefficients
(Figure: the frequency histogram, over roughly [-5, 5], of the wavelet coefficients at 1,000 randomly selected locations (random indices of w_j,k), averaged across all 578 MRI volumes in the ICBM database [Mazziotta et al., 1995]. Most of the wavelet coefficients are near the origin, with some having sporadic but large magnitudes. Heavy-tailed distribution models will be appropriate for these data.)

Distribution of the Wavelet Coefficients: 3 Volumes
(Figure: the frequency distributions of the wavelet coefficients for three separate MRI volumes, randomly selected from the pool of 578. There is little variation among these three individuals; the overall shape of each individual's distribution of wavelet-coefficient magnitudes across the 1,000 random locations is more regular, though still heavy-tailed, than the averaged across-subject distribution in the previous figure.)

Wavelet Coefficient Distributions, MRI Data: Across 578 Subjects
(Figure: a portion of all of the individual distributions of the wavelet coefficients for the 578 ICBM MRI volumes. The horizontal axis represents the magnitude of the wavelet coefficients in the range [-4, 4]; the vertical axis labels the subject index; and the row color map indicates the frequency of occurrence of a wavelet coefficient of a certain magnitude across all 1,000 voxels. Bright colors indicate high frequencies and dark colors low frequencies.)

fMRI Data: Wavelet Coefficient Distributions for 1 Subject
(Figure: the frequency histogram of the average magnitude of a wavelet coefficient, across all 128 fMRI 3D time-volumes (part of the fMRI study of Buckner and colleagues [Buckner et al., 2000]), at 1,000 randomly selected locations (random indices of w_j,k). Again, we observe the heavy tails of the data.)
Wavelet Coefficient Distribution: 1 Run
(Figure: a 2D image displaying all of the individual distributions for each of the 128 3D time points (1 run, 15 trials) of an fMRI time series. The horizontal axis is the magnitude of the wavelet coefficients, roughly [-4, 4]; the vertical axis labels the time point of the fMRI epoch (1-128). Colors indicate the frequency of occurrence of a wavelet coefficient of a certain magnitude across all 1,000 voxel locations. The across-row average is effectively what is shown in the previous figure.)

Distribution of 4D Wavelet Coefficients
(Figure: the frequency distribution of the wavelet coefficients of the complete 4D WT of the fMRI time series, over roughly [-10, 10]. Note the slow asymptotic decay of the tails of this distribution (data size: 64x-64y-16z-128t, floating point). The x-axis represents the magnitude of the wavelet coefficients and the vertical axis indicates the frequency of occurrence of a wavelet coefficient of the given magnitude within the 4D dataset. Sporadic behavior of the large-magnitude wavelet coefficients is clearly visible in the tails of this empirical distribution.)
Distribution of (and Models for) the Wavelet Coefficients of the 4D fMRI Volume
Several heavy-tail distribution models are fitted to the frequency histogram of the 4D wavelet coefficients. These include the double-exponential, Cauchy, T, double-Pareto and Bessel K form models. Because our statistical tests will be applied to the coefficients that survive wavelet shrinkage, it is important to utilize a distribution model that provides an accurate asymptotic approximation to the real data in the tail regions.
Slide 81

Right Tail of Distribution of (and Models for) the Wavelet Coefficients of the 4D fMRI Volume
Shows the extreme tail of the data distribution (the data are symmetric). The double-exponential and the T distribution models underestimate the tails of the data asymptotically, but provide good fits around the mean. The Cauchy, Bessel K form and double-Pareto distribution models, in that order, provide increasingly heavier tails, with Cauchy being the most likely candidate for the best fit to the observed wavelet coefficients across the entire range. The double-Pareto and the Bessel K form densities provide the heaviest tails; however, they are inadequate in the central range [-3, 3] and undefined near the mean of zero.
Slide 82
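The kind of model comparison described on these slides can be sketched by maximum-likelihood fitting of candidate densities and comparing log-likelihoods. This is a sketch under assumptions: a synthetic Student-t sample stands in for the observed 4D wavelet coefficients, and only the models available under these names in `scipy.stats` are fitted (`laplace` is the double-exponential; double-Pareto and Bessel K forms are not included).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-in for observed wavelet coefficients: Student-t draws are heavy-tailed.
coeffs = stats.t.rvs(df=3, size=20000, random_state=rng)

# Fit a light-tailed baseline and two heavy-tailed candidates by maximum likelihood.
models = ["norm", "laplace", "cauchy"]       # laplace = double-exponential
loglik = {}
for name in models:
    dist = getattr(stats, name)
    params = dist.fit(coeffs)                # MLE of location/scale (and shape, if any)
    loglik[name] = dist.logpdf(coeffs, *params).sum()
```

For heavy-tailed data the double-exponential fit attains a higher total log-likelihood than the normal baseline, which is exactly the tail behavior the slides emphasize; in practice one would also inspect the fitted tails directly, since a model can win overall while still misbehaving near zero (as noted for the double-Pareto and Bessel K forms).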
Wavelet-domain Atlas-based fMRI Analysis
• Anatomical Sub-Volume Probabilistic Atlas (SVPA)
• Analysis using conventional time-domain SVT (sub-volume thresholding)
Subtraction paradigm: AD-ElderlyNC
Slide 83

Wavelet-domain Atlas-based fMRI Analysis
• Raw results following analysis in the wavelet domain (has ringing effects outside of the true ROI where the activation supposedly occurred).
• Post-processed wavelet-domain statistics [final uniform thresholding (top 1%), reconstruction of all wavelet-space stats into a single volume – regional calculations]:
Slide 84
Bayesian Mixture Models for fMRI Analysis
• We use two-component mixture prior distributions on the wavelet coefficients θ_{j,k}, with

    \theta_{j,k} \mid \pi_j, \tau_j \sim \pi_j N(0, \tau_j^2) + (1 - \pi_j)\,\delta(0),

where π_j is a proportion between 0 and 1, δ(0) is the Dirac point mass at zero, and τ_j > 0. In other words, there is a level-dependent positive probability π_j a priori that each wavelet coefficient will be exactly zero. If not, the coefficient will be normally distributed with mean zero and a level-specific standard deviation τ_j.
Slide 85

Bayesian Mixture Models for fMRI Analysis
• Given the observed data over an ROI, y = f + ε, the corresponding wavelet representation w = θ + z (where W is the discrete WT, w = Wy, θ = Wf and z = Wε), and the above prior distribution for the true wavelet coefficients, the posterior distributions of θ_{j,k} are again independent two-component mixtures:

    p(\theta_{j,k} \mid w_{j,k}) \sim \lambda_{j,k}\, N\!\left( \frac{w_{j,k}\,\tau_j^2}{\sigma^2 + \tau_j^2},\; \frac{\sigma^2 \tau_j^2}{\sigma^2 + \tau_j^2} \right) + (1 - \lambda_{j,k})\,\delta(0),

where λ_{j,k} = 1/(1 + ρ_{j,k}) and the posterior odds that θ_{j,k} is exactly zero are

    \rho_{j,k} = \frac{1 - \pi_j}{\pi_j} \cdot \frac{\sqrt{\tau_j^2 + \sigma^2}}{\sigma} \cdot \exp\!\left( -\, \frac{\tau_j^2\, w_{j,k}^2}{2\sigma^2(\tau_j^2 + \sigma^2)} \right).

Slide 86
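Assuming the mixture-posterior formulas above, the resulting shrinkage rule can be sketched numerically. The function name `posterior_shrink` and the test values for π_j, τ_j, σ and w are illustrative, not from the slides.

```python
import numpy as np

def posterior_shrink(w, pi_j, tau_j, sigma):
    """Posterior quantities for the mixture prior pi_j*N(0, tau_j^2) + (1-pi_j)*delta(0),
    with observation w = theta + N(0, sigma^2) noise.
    Returns lam = P(theta_{j,k} != 0 | w_{j,k}) and the posterior mean of theta_{j,k}."""
    t2, s2 = tau_j**2, sigma**2
    # rho_{j,k}: posterior odds that theta_{j,k} is exactly zero
    rho = ((1 - pi_j) / pi_j) * (np.sqrt(t2 + s2) / sigma) \
          * np.exp(-t2 * w**2 / (2 * s2 * (t2 + s2)))
    lam = 1.0 / (1.0 + rho)
    # Mean of the non-zero component, weighted by its posterior probability lam
    post_mean = lam * w * t2 / (s2 + t2)
    return lam, post_mean

w = np.array([0.1, 0.5, 2.0, 5.0])
lam, shrunk = posterior_shrink(w, pi_j=0.5, tau_j=1.0, sigma=1.0)
```

Small observed coefficients get λ well below 1 and are pulled strongly toward zero, while large ones are trusted (λ → 1) and shrunk only by the fixed factor τ_j²/(σ² + τ_j²) — the Bayesian analogue of wavelet shrinkage.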
Completeness and Efficiency of Signal Representation
Slide 87

Completeness and Efficiency of Signal Representation – Functional Bases
Slide 88

Completeness and Efficiency of Signal Representation – Functional Bases
Slide 89

Completeness and Efficiency of Signal Representation – Functional Bases

Completeness and Efficiency of Signal Representation – Why use Wavelet functions?
Slide 90
Wavelet-space Shrinkage
[Figure: HeaviSine test signal; IWT (Uniform 5%)]
Slide 93

Wavelet-space Shrinkage
Slide 94
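The uniform-5% shrinkage illustrated on these slides (threshold all coefficients, keep only the top 5% by magnitude, then invert the WT) can be sketched end-to-end. The implementation details are mine: an orthonormal Haar transform stands in for whatever wavelet the slides use, applied to the standard Donoho–Johnstone HeaviSine test signal.

```python
import numpy as np

def heavisine(n):
    """Donoho-Johnstone 'HeaviSine' test signal: a sine with two jumps."""
    t = np.arange(n) / n
    return 4 * np.sin(4 * np.pi * t) - np.sign(t - 0.3) - np.sign(0.72 - t)

def haar_dwt(x):
    """Full orthonormal Haar decomposition: detail arrays plus final approximation."""
    coeffs, a = [], np.asarray(x, dtype=float)
    while len(a) > 1:
        coeffs.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    coeffs.append(a)
    return coeffs

def haar_idwt(coeffs):
    """Inverse of haar_dwt (exact, since the transform is orthonormal)."""
    a = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a

n = 1024
rng = np.random.default_rng(1)
clean = heavisine(n)
noisy = clean + 0.5 * rng.standard_normal(n)

coeffs = haar_dwt(noisy)
cut = np.quantile(np.abs(np.concatenate(coeffs)), 0.95)   # keep only the top 5% by magnitude
denoised = haar_idwt([np.where(np.abs(c) >= cut, c, 0.0) for c in coeffs])

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

Because the signal's energy concentrates in a few large coefficients while the noise spreads across all of them, discarding 95% of the coefficients removes mostly noise, and the reconstruction is closer to the clean signal than the noisy input was.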
Wavelet-space Shrinkage
Slide 95

Wavelet-space Shrinkage
Slide 96

Wavelet-space Shrinkage
Slide 97

Wavelet-space Shrinkage
Slide 98
Stereotactic Data Alignment – Warping
Which one is the best warp?
[Figure: Data; Warp 1; Warp 2; Warp 3; Target]
Slide 99

Wavelet-space Warp Classification
Five PET (left) and corresponding MRI (right) data volumes (left columns), their LS, MI and Affine warps (columns 2–4), and the target (right column).
Slide 100
Wavelet-space Warp Classification
Spread Group Classification (SGC) Analysis of PET Data – separation of saccade PET frequency clouds
Slide 101

Wavelet-space Warp Classification
Cluster Group Classification (CGC) Analysis of MRI Data – how close together are all 5 MRIs under each warp strategy?
Smaller classifying values → better warps.
[Figure: warp strategies with classification ranks 1–4]
Slide 102
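One plausible reading of the CGC idea — score each warp strategy by how tightly its five warped volumes cluster — can be sketched as a centroid-distance computation. This is an illustrative interpretation, not the authors' exact metric: the score, the tiny 8×8×8 "volumes", and the two synthetic warp strategies are all assumptions.

```python
import numpy as np

def cgc_score(volumes):
    """Mean squared distance of each (flattened) volume to the group centroid.
    Smaller score -> the warped volumes agree more closely -> better warp."""
    X = np.stack([np.ravel(v) for v in volumes])
    return float(np.mean(np.sum((X - X.mean(axis=0)) ** 2, axis=1)))

# Synthetic stand-in: five "warped MRIs" = shared template + residual misalignment noise.
rng = np.random.default_rng(7)
template = rng.standard_normal((8, 8, 8))
good_warp = [template + 0.1 * rng.standard_normal(template.shape) for _ in range(5)]
bad_warp = [template + 1.0 * rng.standard_normal(template.shape) for _ in range(5)]
```

Applying `cgc_score` to each strategy's five volumes and sorting the scores ascending yields the kind of ranking shown on the slide; in the wavelet-domain version, the volumes would be wavelet-transformed (and possibly thresholded) before scoring.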