New Concepts in Frame
Theory Motivated by
Acoustical Applications
Peter Balazs
Habilitationsschrift
Universität Wien, Fakultät für Mathematik
Wien, March 3, 2011
Chapter 1
Preface
Application-oriented mathematics develops theoretical results and new
mathematical concepts, motivated by application, in contrast to “applied
mathematics” focusing just on providing and applying mathematical tools
for the applied sciences. The application-oriented approach produces results
significant both for mathematics and the applied sciences.
In this context we developed new concepts in frame theory motivated by
signal processing and acoustical applications. Frames are generalizations of
bases, and give more freedom for the analysis and modification of information. The concept of frames is a theoretical background for signal processing.
On the other hand, signal processing algorithms and processes are essential
for application in audio and acoustics. Linking the mathematical frame
theory, the signal processing algorithms, their implementations and finally
acoustical applications leads to a very promising, synergetic combination of
research in different fields, which has not been fully exploited yet.
To establish that link a thorough investigation of the theory is important. So we have investigated topics in frame theory, extending the standard
mathematical concepts. As a particular case of analysis and synthesis systems we have researched mathematical topics in time-frequency analysis.
Furthermore a big focus was the mathematical theory of multipliers, which
are operators created by combining frame analysis, multiplication and resynthesis. To show that frame theory is important for applications we have
included two applied topics, which both apply Gabor frame multipliers in
acoustical projects.
The focus of our work is also the focus of this habilitation thesis and can
be summarized by the following grouping:
• Theory:
– Frame Theory [1-3]
– Time-Frequency Analysis [4-5]
– Frame Multipliers [6-8]
• Acoustical Applications:
– Time-Frequency Sparsity by Perceptual Irrelevance [9-10]
– Acoustic System Estimation [11-12]
Please note that we use numeric references, e.g. [1], for the papers included
in this habilitation thesis, while all other references are cited by name and
year of publication, e.g. [Balazs, 2007].
Scientific Achievement
One of the first scientific goals assigned to me at the Acoustics Research
Institute of the Austrian Academy of Sciences, after finishing my mathematical studies, was to find a precise formulation of the heuristic irrelevance
algorithm developed in [Eckel, 1989]. This was one of the reasons why I
became interested in the topic of multipliers, and it took me some
years to reach this goal in [9]. So the application-oriented mathematics approach was present right at the start of my career and it continues to be the
foundation of my scientific work.
With my PhD, I took the first steps towards connecting mathematics, signal processing and psychoacoustics. The novel ideas it contained have subsequently been extended in several journal publications.
In particular, the idea of double preconditioning [Balazs et al., 2006] resulted in a new approach for an efficient algorithm to calculate the perfect reconstruction window for a Gabor transform. While being useful for my own work, e.g. for [1,9], it has also inspired several colleagues in their research combining Gabor theory and numerical mathematics, as shown by the citations in [Werther et al., 2005] (citing the preprint) and [Søndergaard, 2007, Janssen and Søndergaard, 2007, Hampejs and Kracher, 2007, Chai et al., 2008, Mi et al., 2009b, Mi et al., 2009a, Cheng et al., 2009, Chai et al., 2010, Moreno-Picot et al., 2010, Dörfler, 2010].
Reading [Christensen, 2003, Casazza, 2000, Gröchenig, 2001], I was fascinated by the beautiful theory of frames and related sequences.
This concept was not only an enchanting abstract theory, it also had a
connection to applications in acoustics. The fascination with this mathematical theory led to an investigation of semi-frames [2] and of the relation between the properties of sequences and the associated (frame-related) operators [Balazs and El-Gebeily, 2008] [3], which were studied purely out
of mathematical interest. I am quite happy that even small results like
the investigation of the connection of frames and finite dimensionality
[Balazs, 2008a] have received some recognition by the scientific community [Cotfas and Gazeau, 2010, Rahimi, 2009, Špiřík et al., 2010]. Connected to the theory of frame multipliers, the concept of weighted frames
was investigated [1]. This work again was used as a basis for many of
my own papers, e.g. [2,6], but also found recognition in [Aceska, 2009,
Antoine and Vandergheynst, 2007] (as preprint).
I have developed the novel concept of frame multipliers in [Balazs, 2007]
by generalizing Gabor multipliers to the general frame case. This work, too, was the basis for many of my later papers, e.g. [6-8], but
was also cited by other authors [Rudol, 2011, Arias and Pacheco, 2008,
Ambroziski and Rudol, 2009, Rahimi, 2009, Dörfler, 2010]. Within this topic of frame multipliers, I have shown further mathematical results, for example finding the best approximation in the
Hilbert-Schmidt setting [Balazs, 2008b]. Here, too, I can point to several citations by other authors [Arias and Pacheco, 2008, Chen et al., 2009a,
Dörfler and Torrésani, 2010, Aceska, 2009, Li, 2009, Xiao et al., 2009,
Chen et al., 2009b, Ahmad and Iqbal, 2009, Rahimi, 2009]. This generalization of the concept of Gabor multipliers was motivated by applications,
as for many acoustical challenges other analysis systems like wavelets or
auditory filterbanks are often advantageous. Also in these cases a simple
modification by analysis, multiplication and resynthesis would be a powerful
tool. For the new concept of frame multipliers, basic properties were shown.
Furthermore it was established that these results do not depend on an
underlying group structure, but can be shown for general frames.
The natural extension of the standard approach to a frame representation of operators in [Balazs, 2008c] is an abstract mathematical topic
and was started as purely mathematical fundamental research. But
later the connection to the Galerkin approach in the boundary element
method (BEM), see e.g. [Gaul et al., 2003], became apparent. BEM
is used for finding numerical solutions to operator equations, and its
connection to frame theory will be further developed in future research
(sketched in [Rieckh et al., 2010a]). [Balazs, 2008c] was used in some
of my own work, but also was cited by [M. L. Arias and Pacheco, 2007,
Ambroziski and Rudol, 2009, Rahimi, 2009, Dörfler and Torrésani, 2010,
Rudol, 2011].
I have shown the importance of mathematical approaches for applications in the estimation of the perceptual irrelevance in the time-frequency
plane based on a simple simultaneous masking model [9]. This is the basis
for future work using current psychoacoustical experiments (as in [10]) and a
perceptual based filterbank (based on an implementation of a nonstationary
Gabor frame [5]).
Also mentioned in this habilitation thesis is the Multiple Exponential Sweep Method (MESM) [11], which is a system identification method for weakly non-linear, weakly time-variant systems. This method relies on a time-frequency motivated approach. It was
applied in the research of the Acoustics Research Institute several
times [Majdak et al., 2011, Majdak et al., 2010], but was also cited
by [Rébillat et al., 2011, Farina, 2009, Enzner, 2009, Søndergaard, 2007,
Rébillat et al., 2010, Weinzierl et al., 2009, Pulkki et al., 2010].
Due to my pluridisciplinary orientation, I was also able to introduce a
mathematical viewpoint to other applied topics and create novel methods
for them, for example in the simulation of vibrations, see [Balazs et al., 2007]
(cited by [Hähnel, 2010]) and [Kreuzer et al., 2011], and in the estimation
of a vocal tract model [Marelli and Balazs, 2010]. Currently the usefulness of the latter method for forensic speech comparison is being investigated
[Enzinger et al., 2011].
The actual relevance of the mathematical topics mentioned above is confirmed by a variety of projects realized in the last few years. In particular,
I was able to attract funding for the project Frame Multipliers: Theory
and Applications in Acoustics within the call “Mathematics and ....” as a
‘High Potential’, which gave me the possibility to create the working group
‘Mathematics and Signal Processing in Acoustics’ at the Acoustics Research
Institute.
I have been establishing active cooperations with internationally
renowned scientists. This eagerness to cooperate on an international level
can also be seen by numerous talks and 18 proceedings publications for
conferences and workshops, as well as by being a partner in funded projects
organized by other scientists. I have also been the organizer of a number
of workshops. At the start of my career, I was employed in Marseille and
Louvain-la-Neuve.
The subset of papers chosen for this habilitation thesis, out of the 15 published (or accepted) and 8 submitted journal papers, was selected because
of their topical connection, as well as some habilitation regulations. Please
note that the PhD thesis [Balazs, 2005] included a lot of material
that directly led to five successive journal publications [Balazs et al., 2006,
Balazs, 2007, Balazs, 2008c, Balazs, 2008b, Balazs, 2008a], which, due to habilitation thesis rulings, are not included here. Thus, although I can refer to
15 submitted or accepted journal papers as well as 7 peer-reviewed proceedings publications, rather many recently accepted or submitted papers can be
found in the list of the papers included in this thesis. For the updated status
of my publications, please refer to http://www.balazs.at/wissenen.html.
Frame Theory for Acoustical Applications
While we have addressed the mathematical importance of our work in
the last section, here we would like to explain in more detail why the
particular connection chosen between mathematics and acoustics in my
personal work had (and still has) a powerful synergetic effect.
We live in the age of information where the analysis, classification, and
transmission of information is of essential importance. Signal processing
tools and algorithms form the backbone of important technologies like MP3,
digital television, mobile phones and wireless networking. Many signal processing algorithms have been adapted for applications in audio engineering
and acoustics, also taking into account the properties of the human auditory
system.
The mathematical concept of frames is an important theoretical background for sampling theory and signal processing. Frames are generalizations of bases that give more freedom for the analysis and modification of
information - however, this concept is still not firmly rooted in applied research. Our past experience in the work on scientific projects has shown that
linking mathematical frame theory, signal processing algorithms, their implementations and finally acoustical applications leads to a very promising,
synergetic combination of research in different fields.
During the years I have been working in application-oriented mathematics for acoustics, I have made the following three observations regarding the
link of theoretical and applied research:
(1.) Frame theory is very useful but not fully understood in applications:
Frames very often occur in signal processing and acoustical applications. They have been implicitly used for many years without fully
exploiting the related theory. Using analysis / synthesis systems
other than orthonormal bases is sometimes seen as problematic in applied sciences. The mathematical theory provides enough knowledge
to establish the fact that frames are an applicable, stable and favorable
tool for applications. The link from frame theory to signal processing
and from signal processing to acoustical application is partially recognized, but needs further strengthening. The full link between all three
fields leads to very promising pluri-disciplinary research and is a novel
approach.
(2.) Understanding the mathematical theory improves modeling in applications: The results of frontier research in mathematical theory are often
not directly and immediately adaptable to given applications. But,
given a thematic framework, the abstraction level and deep understanding of the theory needed for those results are of essential importance in a modeling and implementation stage for applications. Many
applied sciences in acoustics measure empirical data and formulate
heuristic models, usually with a modest mathematical basis. Mathematically precise statements considerably enhance the precision and
stability of algorithms and models and can already be implemented at
an early stage.
(3.) Applications lead to interesting mathematical questions: On the other
hand the acoustical applications often raise mathematical questions,
which by themselves can be very interesting on an abstract mathematical level. Those questions might not have arisen in a purely theoretical
setting.
The work that led to this habilitation thesis started with the observation
that many of the methods developed and applied in acoustics employ time-frequency analysis / synthesis systems, often with possible modification in
between. A typical example is the phase vocoder, see e.g. [Dolson, 1986].
The importance of perfect reconstruction in analysis / synthesis systems and
the scientific interest in the abstract theory behind it led to investigations
in frame theory, also connected to frame multipliers. For audio applications
the natural setting for analysis is the time-frequency plane, so we also
studied Gabor frames and Gabor multipliers.
Being fascinated by the abstract frame theory led to the development
of the concept of frame multipliers [Balazs, 2007], extended and used in
[Balazs, 2008b] and [1,6]. While some of these investigations led to results
in an abstract setting, the basic motivation still came from acoustical applications. Even if this abstract analysis did not lead to results that could be
directly applied in the applications, the abstract and theoretical treatment
of this topic helped in handling these concepts in an applied setting, in the
sense of observation (2) above.
The inversion of a system is an important topic in many applications, such as
vibration modeling. This, again in the beautiful abstract frame theory
setting, led to the investigation of the invertibility of multipliers [7-8], which
is intended to be the basis for future implementations usable in acoustical
applications.
Applications need implementations. Within the Acoustics Research Institute all developed algorithms are integrated into the
STx software system [Balazs and Noll, 2003, Noll et al., 2007]. While the
investigation of the double preconditioning algorithm [Balazs et al., 2006]
was first motivated by the goal of speeding up algorithms in STx, it led
to numerical and mathematical fundamental research, of which only the most
basic approach is needed and implemented in STx. Integrating algorithms
in a supported software system keeps the code available and accessible. It
also shows the relevance of the developed methods. Therefore current (and
future) methods, e.g. based on research in [10] and [12], will be included
there.
Furthermore all my developed algorithms are and will be incorporated in
the Linear Time-Frequency Analysis Toolbox (LTFAT) [Søndergaard, 2007,
Søndergaard et al., 2010]. This is open source software, which is therefore
used by both applied and mathematical researchers.
Because of the above mentioned importance of analysis / synthesis systems with possible modification as well as time-frequency representation,
Gabor frame multipliers are a very useful method to realize time-variant
filters. They are applied in the topics of perceptual irrelevance models [9-10] and system identification by exponential sweeps [11-12]. As mentioned above,
not all theoretical results can be directly useful for applications, apart from
giving a better grasp of the basic ideas and concepts. In these two topics, however, theory and applications converge more directly, beyond the mere
conceptual connection mentioned in observation (2). In [9] and in [12] rather
recently investigated theoretical properties were utilized. This resulted in
methods which could not have been created without the mathematical background.
Acknowledgments
I thank all my co-authors and cooperation partners, who provided me with
a lot of productive ideas, comments and research projects. Because there
are too many to mention (and also for that I am very, very thankful) let me
just state it like this: Thank you, friends! This goes, in particular, to all
the co-authors of papers included in this habilitation thesis.
I thank the Acoustics Research Institute, in particular Werner A.
Deutsch, for providing me with perfect conditions for working in a productive and open pluri-disciplinary environment. I warmly thank Hans G.
Feichtinger for introducing me to this wonderful part of science, connecting
mathematics to applications, and his continuous support since the start of
my PhD.
I thank Hans G. Feichtinger, K. Gröchenig and G. Rieckh for providing
useful comments and suggestions on this document, as well as T. Krutzler
and D. Stoeva for proof-reading.
Part of the work leading to this thesis was supported by the European Union’s Human Potential Programme, under contract HPRN-CT-2002-00285 (HASSIP), the WWTF project MULAC (Frame Multipliers:
Theory and Application in Acoustics; MA07-025) and the WTZ Amadée
project 1/2006. I gratefully acknowledge the hospitality of the Groupe
de Traitement du Signal, Laboratoire d’Analyse Topologie et Probabilités,
CMI, Université de Provence, the group Modélisation, Synthèse et Contrôle
des Signaux Sonores et Musicaux, Laboratoire de Mécanique et d’Acoustique, CNRS Marseille and the Institut de Recherche en Mathématique et
Physique, Université catholique de Louvain.
For all my scientific life I was supported by the Acoustics Research Institute of the Austrian Academy of Sciences.
I especially thank my family, Claudia, Barbara and Michael for making
my life rich, also outside mathematics and acoustics.
List of Included Papers
Mathematical Theory
Frame Theory
[1] “Weighted and Controlled Frames: Mutual Relationship and first Numerical Properties” (with J.-P. Antoine and A. Grybos), International Journal of Wavelets, Multiresolution and Information Processing, Vol. 8 (1), pp. 109-132 (2010)
[2] “Frames and Semi-Frames” (with J.-P. Antoine), arXiv:1101.2859v1, submitted to Journal of Physics A: Mathematical and Theoretical (2011)
[3] “Classification of General Sequences by Frame-Related Operators” (with D. Stoeva and J.-P. Antoine), Sampling Theory in Signal and Image Processing (STSIP), to appear (2011)
Time-Frequency Analysis
[4] “The Phase Derivative Around Zeros of the Short-Time Fourier Transform” (with D. Bayer, F. Jaillet and P. Søndergaard), submitted to Advances in Pure and Applied Mathematics (2011)
[5] “Non-stationary Gabor Frames” (with F. Jaillet and M. Dörfler), SAMPTA’09, International Conference on SAMPling Theory and Applications proceedings, pp. 227-230 (2009) [http://hal.archives-ouvertes.fr/hal-00495456/en/] (peer-reviewed proceedings paper; an extended journal paper is in preparation)
Theory of Frame Multipliers
[6] “Multipliers for p-Bessel sequences in Banach spaces” (with A. Rahimi), Integral Equations and Operator Theory, Vol. 68 (2), pp. 193-205 (2010)
[7] “Unconditional convergence and invertibility of multipliers” (with D. Stoeva), arXiv:0911.2783v3, in revision for Applied and Computational Harmonic Analysis (2010)
[8] “Detailed characterization of conditions for the unconditional convergence and invertibility of multipliers” (with D. Stoeva), arXiv:1007.0673v1, submitted to Complex Analysis and Operator Theory (2011)
Applications in Acoustics
Time-Frequency Sparsity by Perceptual Irrelevance
[9] “Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking” (with B. Laback, G. Eckel and W. Deutsch), IEEE Transactions on Audio, Speech and Language Processing, Vol. 18 (1), pp. 34-49 (2010)
[10] “Additivity of nonsimultaneous masking for short Gaussian-shaped sinusoids” (with B. Laback, T. Necciari, S. Savel, S. Ystad, S. Meunier and R. Kronland-Martinet), The Journal of the Acoustical Society of America, to appear (2011)
Acoustic System Estimation
[11] “Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions” (with P. Majdak and B. Laback), Journal of the Audio Engineering Society, Vol. 55 (7/8), pp. 623-637 (2007)
[12] “A Time-Frequency Method for Increasing the Signal-To-Noise Ratio in System Identification with Exponential Sweeps” (with P. Majdak, W. Kreuzer and M. Dörfler), 36th International Conference on Acoustics, Speech and Signal Processing ICASSP 2011, Prague, to appear, 2011 (peer-reviewed proceedings paper; an extended journal paper is in preparation)
Contents
1 Preface
2 Introduction and Summary
2.1 Frame Theory
2.1.1 State of the Art
2.1.2 Weighted and Controlled Frames: Mutual Relationship and first Numerical Properties [1]
2.1.3 Frames and Semi Frames [2]
2.1.4 Classification of General Sequences by Frame-Related Operators [3]
2.2 Time-Frequency Analysis
2.2.1 State of the Art
2.2.2 The Phase Derivative Around Zeros of the Short-Time Fourier Transform [4]
2.2.3 Non-Stationary Gabor Frames [5]
2.3 Theory of Frame Multipliers
2.3.1 State of the Art
2.3.2 Multipliers for p-Bessel sequences in Banach spaces [6]
2.3.3 Unconditional Convergence and Invertibility of Multipliers [7]
2.3.4 Detailed characterization of conditions for the unconditional convergence and invertibility of multipliers [8]
2.4 Applications in Acoustics: Time-Frequency Sparsity by Perceptual Irrelevance
2.4.1 State of the art
2.4.2 Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking [9]
2.4.3 Additivity of nonsimultaneous masking for short Gaussian-shaped sinusoids [10]
2.5 Applications in Acoustics: Acoustic System Estimation
2.5.1 State of the art
2.5.2 Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions [11]
2.5.3 A Time-Frequency Method for Increasing the Signal-To-Noise Ratio in System Identification with Exponential Sweeps [12]
Chapter 2
Introduction and Summary
2.1 Frame Theory
2.1.1 State of the Art
A sequence $\Psi = (\psi_k)_{k \in K}$ in the Hilbert space $H$ is a frame for $H$, if there
exist positive constants $A_\Psi$ and $B_\Psi$ (called lower and upper frame bound,
respectively) that satisfy
\[
A_\Psi \|f\|^2 \le \sum_{k \in K} |\langle f, \psi_k \rangle|^2 \le B_\Psi \|f\|^2 \quad \forall f \in H. \tag{2.1}
\]
If at least the upper (or the lower) inequality is fulfilled this sequence is
called a Bessel sequence (or a lower frame sequence, respectively). A frame
that is not a basis is called over-complete. A frame where the two bounds
can be chosen to be equal, i.e. $A_\Psi = B_\Psi$, is called tight.
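As a small illustration of inequality (2.1): in finite dimensions the optimal frame bounds are the smallest and largest eigenvalue of the matrix $\sum_k \psi_k \psi_k^T$. The sketch below assumes real vectors in $\mathbb{R}^2$ and uses plain Python; the function names and the two example frames are chosen here for illustration and do not come from the thesis.

```python
import math

def frame_operator(frame):
    """Return the 2x2 matrix S = sum_k psi_k psi_k^T for vectors in R^2."""
    s11 = sum(x * x for x, _ in frame)
    s12 = sum(x * y for x, y in frame)
    s22 = sum(y * y for _, y in frame)
    return [[s11, s12], [s12, s22]]

def frame_bounds(frame):
    """Optimal frame bounds (A, B): extreme eigenvalues of the symmetric S."""
    (a, b), (_, d) = frame_operator(frame)
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

# Illustrative example: three unit vectors at 120 degrees in R^2.
mercedes = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
            for k in range(3)]
A, B = frame_bounds(mercedes)   # tight frame: both bounds are 3/2

# A non-tight frame: the standard basis plus one extra vector.
A2, B2 = frame_bounds([(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])   # bounds 1 and 3
```

Both examples are over-complete (three vectors in a two-dimensional space); only the first one is tight.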
By $C_\Psi : H \to l^2$ (footnote 1) we denote the analysis operator defined by $(C_\Psi f)_k = \langle f, \psi_k \rangle$. The adjoint of $C_\Psi$ is the synthesis operator $D_\Psi (c_k) = \sum_k c_k \psi_k$.
The frame operator $S_\Psi = D_\Psi C_\Psi$ can be written as $S_\Psi f = \sum_k \langle f, \psi_k \rangle \psi_k$.
The Gram matrix $(G^{(\Psi)}_{k,l})_{k,l}$ is defined by $G^{(\Psi)}_{k,l} = \langle \psi_l, \psi_k \rangle$, $k, l \in \mathbb{N}$. This
matrix defines an operator on $l^2$ by matrix multiplication, corresponding to
$G = CD$. If no confusion will arise, we will omit the indexes, writing, for
example, $S$ for $S_\Psi$.
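The operators just defined can be written down directly for a finite frame. The following is a minimal sketch for real vectors (so no complex conjugation is needed); the function names and the example frame are assumptions made for this illustration.

```python
def analysis(frame, f):
    """Analysis operator: C f = (<f, psi_k>)_k."""
    return [sum(fi * pi for fi, pi in zip(f, psi)) for psi in frame]

def synthesis(frame, c):
    """Synthesis operator: D c = sum_k c_k psi_k."""
    dim = len(frame[0])
    return [sum(ck * psi[i] for ck, psi in zip(c, frame)) for i in range(dim)]

def gram(frame):
    """Gram matrix with entries G[k][l] = <psi_l, psi_k>."""
    return [[sum(a * b for a, b in zip(psi_l, psi_k)) for psi_l in frame]
            for psi_k in frame]

frame = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
f = [2.0, -1.0]
coeffs = analysis(frame, f)        # frame coefficients of f
Sf = synthesis(frame, coeffs)      # frame operator: S f = D C f

c = [1.0, 2.0, 3.0]
Gc = [sum(g * ck for g, ck in zip(row, c)) for row in gram(frame)]
CDc = analysis(frame, synthesis(frame, c))   # agrees with Gc, since G = C D
```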
Frame theory gives a stable way to reconstruct the signal perfectly from
these coefficients by using the canonical dual frame $(\tilde\psi_k)$. It is found by
applying the inverse of the frame operator $S$ to the original frame elements,
i.e. $\tilde\psi_k = S^{-1} \psi_k$ for all $k$. Then for all $f \in H$ we have the reconstruction
\[
f = \sum_k \langle f, \psi_k \rangle \tilde\psi_k = \sum_k \langle f, \tilde\psi_k \rangle \psi_k .
\]
In this way, perfect reconstruction, which
is very often a goal in signal processing analysis/synthesis systems, can be
reached easily. The so-called canonical tight frame is defined by $\psi_k^{(t)} = S^{-1/2} \psi_k$.
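The reconstruction formula can be checked numerically for a small example. This is a sketch under the assumption of a real frame in $\mathbb{R}^2$, where the frame operator is a 2x2 matrix that can be inverted in closed form; all helper names are hypothetical.

```python
def frame_operator(frame):
    """S = sum_k psi_k psi_k^T as a 2x2 matrix."""
    s11 = sum(x * x for x, _ in frame)
    s12 = sum(x * y for x, y in frame)
    s22 = sum(y * y for _, y in frame)
    return [[s11, s12], [s12, s22]]

def inverse_2x2(m):
    """Closed-form inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(m, v):
    """Matrix-vector product."""
    return tuple(sum(mij * vj for mij, vj in zip(row, v)) for row in m)

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

frame = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
S_inv = inverse_2x2(frame_operator(frame))
dual = [apply(S_inv, psi) for psi in frame]   # canonical dual frame

f = (2.0, -1.0)
# f = sum_k <f, psi_k> dual_k
rec1 = [sum(inner(f, psi) * d[i] for psi, d in zip(frame, dual))
        for i in range(2)]
# f = sum_k <f, dual_k> psi_k
rec2 = [sum(inner(f, d) * psi[i] for psi, d in zip(frame, dual))
        for i in range(2)]
```

Both `rec1` and `rec2` agree with `f` up to rounding, although the three coefficients are redundant for a two-dimensional signal.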
Footnote 1: We denote by $l^p$ the $p$-summable sequences, and by $c_0$ the sequences converging to
zero.
We will denote an orthonormal basis (ONB) of the Hilbert space by
$E = (e_k)$, i.e. a complete sequence for which the Gram matrix is the identity. Frames are generalizations of bases. Contrary to bases, frames lead to
redundant representations. In general, the range of $C$ is a proper subset of
$l^2$. It is equal to all of $l^2$ if and only if the frame is a basis. Applying the Gram matrix $G = CD$ to an arbitrary sequence in $l^2$ corresponds
to a mapping from $l^2$ into $\mathrm{ran}(C)$. Moreover, if the canonical dual frame is used for synthesis, i.e. for $C_\Psi D_{\tilde\Psi}$ (which coincides with $G$ for a tight frame with bound 1), this mapping
is the orthogonal projection onto $\mathrm{ran}(C)$. This can be called the 'reproducing kernel' property of the
over-complete frame.
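One way to make the projection property concrete is to check idempotency numerically. The sketch below, for an assumed example frame in $\mathbb{R}^2$, compares the plain Gram matrix $G = CD$ with the matrix $P = C_\Psi D_{\tilde\Psi}$ built with the canonical dual frame; only the latter is idempotent for this non-tight frame, while both map into $\mathrm{ran}(C)$. The dual vectors are entered by hand and all names are illustrative.

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_diff(a, b):
    """Largest absolute entry of a - b."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

frame = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Canonical dual S^{-1} psi_k for S = [[2, 1], [1, 2]] (computed by hand).
dual = [(2 / 3, -1 / 3), (-1 / 3, 2 / 3), (1 / 3, 1 / 3)]

# Plain Gram matrix G = C D and the matrix P = C_Psi D_dual.
G = [[inner(psi_l, psi_k) for psi_l in frame] for psi_k in frame]
P = [[inner(d_l, psi_k) for d_l in dual] for psi_k in frame]

p_idempotent_error = max_diff(matmul(P, P), P)   # numerically zero
g_idempotent_error = max_diff(matmul(G, G), G)   # not zero for this frame

# Frame coefficients of a vector are reproduced by P.
coeffs = [inner((2.0, -1.0), psi) for psi in frame]
reproduced = [inner(row, coeffs) for row in P]
```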
Finding and constructing frames that satisfy certain a-priori properties
is often an easier task than doing so for bases. This can readily be
experienced in time-frequency analysis. The often used Gabor transform,
see Section 2.2, can be much better localized in the time-frequency domain
if the associated sequence has a frame rather than a basis property. This follows from
the well-known Balian-Low theorem, see e.g. [Gröchenig, 2001], which states
that it is impossible to have a Gabor sequence that has both good time-frequency localization and the basis property. This shows two things: First,
even if it is impossible to find a basis with certain properties, it can still be
possible to find a frame. Secondly, analysis with redundant frames instead
of bases can have the big advantage that it is easier to directly interpret the
coefficients (e.g. for Gabor sequences by the time-frequency localization).
This is advantageous for many applications in acoustics and therefore frames
have been implicitly used for many years, without the benefit of having the
mathematical background.
Frame theory is one of the most important foundations of Gabor theory [Feichtinger and Strohmer, 1998, Gröchenig, 2001] and wavelet theory
[Ali et al., 2000, Daubechies, 1992, Flandrin, 1999], see also Section 2.2. It
is a highly active mathematical discipline, whose results have also been
proved to be relevant for signal processing, see e.g. [Bölcskei et al., 1998].
Frames also emerged in the context of the theory of (generalized) coherent
states in quantum physics [Gazeau, 2009]. In this setting often continuous
frames are considered, i.e., loosely speaking, the index of the frame is continuous and in Equation 2.1 integrals instead of sums are considered, see
Definition 2.4.
We have investigated several topics in frame theory
[Balazs and El-Gebeily, 2008, Balazs, 2008a], some of them also included
in this habilitation thesis, see Sections 2.1.2-2.1.4, [1-3]. Several other
topics are under investigation, for example related to frames of translates,
where a publication is currently in submission (footnote 2).
Footnote 2: P. Balazs, C. Cabrelli, S. Heineken and U. Molter, Frames of Irregular Translates
The frame representation introduced above is applied to functions,
but frames can also be used to represent operators. In computational
acoustics, for example, one aims to solve operator equations numerically,
for example equations for vibration modeling [Balazs et al., 2007]. Here
the finite element [Hackbusch, 2003] and the boundary element method
[Sauter and Schwab, 2004] are widely used. One particular scheme to
discretize the operator equations is the Galerkin method [Gaul et al., 2003].
This corresponds to taking finite sections of the standard matrix description
[Gohberg et al., 2003] of operators $O$ using an ONB (or biorthogonal basis)
$(e_k)$ by constructing a matrix $M$ with the entries $M_{j,k} = \langle O e_k, e_j \rangle$. But,
as was indicated before, the search for bases with certain properties, like
sparsity of the system matrix, can be a very restrictive approach. The
relaxation and generalization to frames can lead to more stable and faster
algorithms. Using frames instead of bases led directly to the matrix
representation of operators using frames in [Balazs, 2008c]. In future work
this approach will be linked to adaptive frame methods [Dahlke et al., 2005]
and the Galerkin method. A particular way to define operators is to
apply frame analysis, multiplication and frame synthesis, which results in
the concept of frame multipliers, which will be the main topic in Section 2.3.
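The two constructions mentioned above can be sketched for a finite frame. The first part builds a matrix with entries $M_{j,k} = \langle O\psi_k, \psi_j\rangle$, replacing the ONB by a frame, and recovers $Of$ with the help of the canonical dual frame; the exact conventions in [Balazs, 2008c] may differ from this illustration. The second part is a frame multiplier: analysis, multiplication by a symbol, and synthesis. The frame, the operator, the symbols and all function names are assumptions of this sketch.

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(m, v):
    """Matrix-vector product."""
    return [inner(row, v) for row in m]

def combine(coeffs, vectors):
    """Return sum_k coeffs[k] * vectors[k]."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

frame = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Canonical dual of this frame (S = [[2, 1], [1, 2]], computed by hand).
dual = [(2 / 3, -1 / 3), (-1 / 3, 2 / 3), (1 / 3, 1 / 3)]

O = [[0.0, -1.0], [1.0, 0.0]]   # example operator: rotation by 90 degrees

# Frame matrix of O: M[j][k] = <O psi_k, psi_j>.
M = [[inner(apply(O, psi_k), psi_j) for psi_k in frame] for psi_j in frame]

f = [2.0, -1.0]
c = [inner(f, d) for d in dual]          # coefficients of f w.r.t. the dual
Of_frame = combine(apply(M, c), dual)    # O f recovered from the matrix M
Of_direct = apply(O, f)

def multiplier(symbol, f):
    """Frame multiplier: analysis with frame, weighting, synthesis with dual."""
    return combine([m * inner(f, psi) for m, psi in zip(symbol, frame)], dual)

identity_out = multiplier([1.0, 1.0, 1.0], f)   # constant symbol 1 returns f
damped_out = multiplier([1.0, 1.0, 0.0], f)     # third coefficient suppressed
```

With the constant symbol 1 the multiplier reduces to the reconstruction formula; any other symbol modifies the signal in the coefficient domain.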
Frame theory is also important for numerical purposes. At first it might
seem unfeasible to use redundant systems for numerical purposes. But
frame theory was already successfully applied in the field of compressed
sensing/sparsity [Gribonval and Nielsen, 2003, Dahlke and Teschke, 2008].
The basic idea, why frames are advantageous for sparsity can be seen
in the following motivation: In a ’rich’ dictionary, with a lot of entries,
it is much easier to find the correct pieces to have a short, i.e. sparse,
representation of a given sentence, i.e. signal. Furthermore, as we have
already noted, it is often much easier to construct frames. The concept
of sparsity was also shown to be significant for applications in audio and
acoustics, see e.g. [Daudet, 2010, Plumbley et al., 2010]. Sparsity is also a
topic when solving matrix equations efficiently. The hierarchical matrices
(or H-matrices) [Hackbusch, 1999] use a data-sparse approach. While
sparsity is included in this thesis only in the sense of perceptual sparsity,
see Section 2.4, in future work we aim to connect the two sparsity concepts.
In the future we also plan to use the data-sparse approach together
with adaptive frame approaches [Dahlke et al., 2005], already sketched in
[Rieckh et al., 2010a, Rieckh et al., 2010b].
We show recent expansions of frame theory in the next sections. For the
summary of these ideas we will need some further definition of sequences.
The sequence $\Psi$ is called
• a frame-sequence if it is a frame for its closed linear span;
• a Riesz sequence for $H$ with bounds $A$, $B$ if $A > 0$, $B < \infty$ and
$A \sum |c_k|^2 \le \| \sum c_k \psi_k \|^2 \le B \sum |c_k|^2$ for all finite scalar sequences $(c_k)$
(and hence, for all $(c_k) \in \ell^2$);
• a Riesz-Fischer sequence with bound $A$ if $A > 0$ and $A \sum |c_k|^2 \le \| \sum c_k \psi_k \|^2$ for all finite scalar sequences $(c_k)$ (and hence, for all $(c_k) \in \ell^2$ such that $\sum_{k=1}^{\infty} c_k \psi_k$ converges in $H$);
• a Riesz basis if it is a complete Riesz sequence;
• norm-bounded below (resp. norm-bounded above) if $\inf_n \|\psi_n\| > 0$
(resp. $\sup_n \|\psi_n\| < \infty$);
• norm-semi-normalized if $0 < \inf_n \|\psi_n\| \le \sup_n \|\psi_n\| < \infty$.
We will call a sequence of numbers $(m_n)$ semi-normalized if $0 < \inf_n |m_n| \le \sup_n |m_n| < \infty$.
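For a finite sequence the Riesz bounds can be read off the Gram matrix, because $\|\sum c_k \psi_k\|^2 = c^T G c$ in the real case: the optimal bounds are the extreme eigenvalues of $G$, and $A > 0$ holds exactly when the vectors are linearly independent. The sketch below treats two vectors in $\mathbb{R}^3$; the function name and the examples are illustrative assumptions.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def riesz_bounds(psi1, psi2):
    """Extreme eigenvalues of the 2x2 Gram matrix of (psi1, psi2)."""
    a, b, d = inner(psi1, psi1), inner(psi1, psi2), inner(psi2, psi2)
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

# Linearly independent vectors: a Riesz sequence (a Riesz basis of their span).
A, B = riesz_bounds((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))

# Linearly dependent vectors: still a frame for their span (a frame-sequence),
# but the lower Riesz bound collapses to zero.
A_dep, B_dep = riesz_bounds((1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```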
2.1.2 Weighted and Controlled Frames: Mutual Relationship and first Numerical Properties [1]
A sequence $\Psi = (\psi_n)$ and a complex weight $(\omega_n)$ are called a weighted frame,
if there exist constants $A > 0$ and $B < \infty$ such that
\[
A \|f\|^2 \le \sum_{n \in \Gamma} |\omega_n|^2 |\langle f, \psi_n \rangle|^2 \le B \|f\|^2 . \tag{2.2}
\]
These are sequences $(\psi_n)$ with complex weights $(\omega_n)$ such that the sequence $(\omega_n \psi_n)$ is a frame. They were introduced in the PhD thesis
[Jacques, 2004] and then taken over in [Bogdanova et al., 2005], in order
to get a numerically more efficient approximation algorithm for spherical wavelets. A similar but not equivalent concept are signed frames in
[Peng and Waldron, 2002]. Weighted frames also occur naturally in the
theory of fusion frames [Casazza and Kutyniok, 2004] as well as for Gabor
[Gabardo, 2009] or wavelet frames [Heil and Kutyniok, 2003]. This concept
lacked the investigation in the general frame theory context.
By decreasing the ratio of the frame bounds, weighting can improve the numerical efficiency of iterative algorithms like the ‘frame algorithm’ [Christensen, 2003] for the inversion of the frame operator. The
works [Jacques, 2004, Bogdanova et al., 2005] introduced and used controlled frames, that is, a frame $(\psi_n)$ and an operator T such that the combination of T with the frame operator is positive and invertible, i.e. there
exist positive constants $A_{TL}$ and $B_{TL}$, such that
\[ A_{TL} \|f\|^2 \le \sum_n \langle \psi_n, f \rangle \langle f, T\psi_n \rangle \le B_{TL} \|f\|^2 \quad \text{for all } f \in H. \tag{2.3} \]
Since these concepts were used there just as a tool for spherical wavelets,
they were not discussed in full detail.
In [1], we developed the related theory and derived some results, among
them properties used in [Jacques, 2004] and [Bogdanova et al., 2005] without proof, and reported the results of numerical experiments. We showed
that controlled frames are equivalent to standard frames and so this concept
gives a generalized way to check the frame condition. The operator T acts
as a preconditioning operator and so can improve the numerical properties
of the inversion of the frame operator. For general frames, it seems difficult to find an appropriate preconditioning matrix, but for wavelet frames
this technique is used in [Jacques, 2004, Bogdanova et al., 2005]. For Gabor
frames, a way to find advantageous preconditioning matrices is presented in
[Balazs et al., 2006].
In [1] we have put some emphasis on the mutual relationship between
weighted and controlled frames, showing in particular that weighted frames
cannot always be considered as controlled frames. We also have investigated how these concepts can improve the efficiency of iterative algorithms
for inverting the frame operator. As a special case, we have considered
semi-normalized weights, for which the concepts of frames and weighted
frames are interchangeable again. The connection to frame multipliers
[Balazs, 2007], see also Section 2.3, was addressed.
In particular we showed the following result:
Theorem 2.1.1 Let $(\psi_n)$ be a sequence of elements in H. Let $w = (w_n)$ be
a sequence of positive, semi-normalized weights. Then the following properties are equivalent:
1. $(\psi_n)$ is a frame.
2. $M_{w,\Psi,\Psi}$ is a positive and invertible operator.
3. The pair $(w_n)$, $(\psi_n)$ forms a weighted frame.
4. $(\sqrt{w_n}\, \psi_n)$ is a frame.
5. $M_{w',\Psi,\Psi}$ is a positive and invertible operator for any positive, semi-normalized sequence $(w'_n)$.
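The equivalences of Theorem 2.1.1 can be illustrated numerically in finite dimensions. The following is a minimal sketch, not taken from [1]: the sizes, the random sequence and the weight range are arbitrary assumptions. It checks that the multiplier $M_{w,\Psi,\Psi}$ coincides with the frame operator of $(\sqrt{w_n}\,\psi_n)$ and that its spectrum is bounded away from zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 9                        # dimension and number of elements (assumed sizes)
Psi = rng.standard_normal((N, M))  # columns are the vectors psi_n
w = rng.uniform(0.5, 2.0, size=M)  # positive, semi-normalized weights (assumed range)

def frame_bounds(vectors):
    """Optimal frame bounds: extreme eigenvalues of the frame operator."""
    S = vectors @ vectors.conj().T
    eig = np.linalg.eigvalsh(S)
    return eig[0], eig[-1]

# Multiplier M_{w,Psi,Psi} f = sum_n w_n <f, psi_n> psi_n as a matrix.
M_w = Psi @ np.diag(w) @ Psi.conj().T
eig_M = np.linalg.eigvalsh(M_w)

A, B = frame_bounds(Psi)                      # bounds of (psi_n)
A_sqrt, B_sqrt = frame_bounds(Psi * np.sqrt(w))  # bounds of (sqrt(w_n) psi_n)

print("bounds of (psi_n):          ", A, B)
print("bounds of (sqrt(w_n) psi_n):", A_sqrt, B_sqrt)
print("spectrum range of M_w:      ", eig_M[0], eig_M[-1])
```

In this finite setting every spanning sequence is a frame, so the sketch only illustrates how items 2 and 4 of the theorem describe the same operator.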
We investigated the concept of weighted frames in numerical experiments. We analyzed three different a-priori choices for weights with the aim
of making frames tighter, i.e., reducing the quotient of the frame bounds.
These choices were
1. $\omega_n^{(2)} = \dfrac{\|\psi_n\|}{\sqrt{\sum_k |\langle \psi_n, \psi_k \rangle|^2}}$.
2. $\omega_n^{(\infty)} = \dfrac{\|\psi_n\|}{\sup_k |\langle \psi_n, \psi_k \rangle|}$.
3. $\omega_n^{(mult)} = \sqrt{\sum_{k=1}^{M} \Big[ \big(G_\Psi^{(2)}\big)^{\dagger} \Big]_{nk} \|\psi_k\|^2}$,
where the last one corresponds to the best approximation of the identity
by frame multipliers [Balazs, 2008b]. Here $G_\Psi^{(2)}$ is the matrix $\big(G_\Psi^{(2)}\big)_{pq} = |\langle \psi_q, \psi_p \rangle|^2$
and † denotes the pseudo-inverse. In preliminary tests we found
that other ’p-weights’ are outperformed by $\omega^{(2)}$ or $\omega^{(\infty)}$. In [1] we gave the
results of some numerical experiments, showing that these weights very often
improve the condition number of the frame operator matrix. In particular
the weight $\omega^{(2)}$ nearly always improves the frame bounds, while the weight
$\omega^{(mult)}$ often, but not always, is the best choice of the given weights. We
saw that redundancy is an important parameter for the optimality of these
weights.
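The first two weight choices are easy to compute from the Gram matrix. The sketch below is an illustration only, under the assumption of a finite sequence stored as matrix columns; the function names and the random test sequence are hypothetical and are not the experimental setup of [1]. It evaluates $\omega^{(2)}$ and $\omega^{(\infty)}$ and compares the quotient of the frame bounds before and after weighting.

```python
import numpy as np

def weights_2(Psi):
    """omega_n^(2) = ||psi_n|| / sqrt(sum_k |<psi_n, psi_k>|^2)."""
    G = Psi.conj().T @ Psi                      # Gram matrix
    norms = np.linalg.norm(Psi, axis=0)
    return norms / np.sqrt(np.sum(np.abs(G) ** 2, axis=0))

def weights_inf(Psi):
    """omega_n^(inf) = ||psi_n|| / sup_k |<psi_n, psi_k>|."""
    G = Psi.conj().T @ Psi
    norms = np.linalg.norm(Psi, axis=0)
    return norms / np.max(np.abs(G), axis=0)

def bound_ratio(Psi, w):
    """Quotient B/A of the optimal frame bounds of (w_n psi_n)."""
    weighted = Psi * w
    eig = np.linalg.eigvalsh(weighted @ weighted.conj().T)
    return eig[-1] / eig[0]

rng = np.random.default_rng(0)
# Random sequence with deliberately unequal norms (assumed test case).
Psi = rng.standard_normal((5, 12)) * rng.uniform(0.2, 3.0, size=12)

print("unweighted:", bound_ratio(Psi, np.ones(12)))
print("omega^(2): ", bound_ratio(Psi, weights_2(Psi)))
print("omega^(inf):", bound_ratio(Psi, weights_inf(Psi)))
```

For an orthogonal but unnormalized basis both weights reduce to $1/\|\psi_n\|$, so the weighted sequence becomes orthonormal; for general sequences the improvement has to be checked case by case, as described above.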
We also examined the computational behavior of weighted Gabor
frames. In particular we investigated how well the canonical dual weighted
frame is approximated by the inversely weighted dual frame. We saw that
the error depends linearly on the amount of weighted elements and the
redundancy.
As shown above, the concept of weighted frames is naturally connected to the topic of multipliers [Balazs, 2007], [6-8]. The paper [1] has already been
cited in [Aceska, 2009, Balazs, 2008b, Antoine and Vandergheynst, 2007] (as a
preprint) and in [2,6].
2.1.3
Frames and Semi Frames [2]
There are situations where the notion of frame is too restrictive, in the
sense that one cannot satisfy both frame bounds simultaneously. The very
famous sequence Gabor dealt with in his original paper [Gabor, 1946], a
Gabor system with a Gaussian window and redundancy 1, is a complete
Bessel sequence, but does not fulfill the lower frame condition.
By symmetry, there is room for two natural generalizations. We will say
that a sequence Ψ is an upper (resp. lower) semi-frame, if
(i) it is total in H;
(ii) it satisfies the upper (resp. lower) frame inequality.
Note that the lower frame inequality automatically implies that the
sequence is total, i.e. (ii) ⇒ (i) for a lower semi-frame. Also, in the upper
case, S is bounded and $S^{-1}$ is unbounded, whereas, in the lower case, S
is unbounded and $S^{-1}$ is bounded. We may also remark that a discrete
upper semi-frame is nothing but a complete Bessel sequence. These are the
concepts we investigated in [2], also for continuous frames.
The definition of frames above, Equation 2.1, concerns sequences, as required in numerical analysis. However, more general objects, called continuous frames, emerged in the context of the theory of (generalized) coherent
states in theoretical and mathematical physics and were thoroughly studied
[Ali et al., 2000, Rahimi et al., 2006, Fornasier and Rauhut, 2005].
Let X be a locally compact space with measure ν. We assume that X
is σ-compact, that is, $X = \bigcup_n K_n$, $K_n \subset K_{n+1}$, $K_j$ relatively compact. Let
$\Psi := \{\psi_x, x \in X\}$ be a family of vectors from a Hilbert space H indexed by
points of X. Then we say that Ψ is a set of coherent states or a generalized
frame if the map $x \mapsto \langle f, \psi_x \rangle$ is measurable for all f ∈ H and
\[ \int_X \langle f, \psi_x \rangle \langle \psi_x, f' \rangle \, d\nu(x) = \langle f, S f' \rangle, \quad \forall\, f, f' \in H, \]
where S is a bounded, positive, self-adjoint, invertible operator on H, called
the frame operator. In Dirac’s notation, the frame operator S reads
\[ S = \int_X |\psi_x\rangle \langle \psi_x| \, d\nu(x). \]
The operator S is invertible, but its inverse $S^{-1}$, while still self-adjoint
and positive, need not be bounded. Thus, we say that Ψ is a frame if $S^{-1}$
is bounded or, equivalently, if the (optimal) frame bounds satisfy A > 0 and
B < ∞, so that
\[ A \|f\|^2 \le \langle f, S f \rangle = \int_X |\langle \psi_x, f \rangle|^2 \, d\nu(x) \le B \|f\|^2, \quad \forall\, f \in H. \tag{2.4} \]
For frames the spectrum Sp(S) of S is contained in the interval [m, M], these
two numbers being the infimum and the supremum of Sp(S), respectively.
These definitions are completely general. In particular, if X is a discrete
set with ν the counting measure, we recover the standard definition 2.1 of a
(discrete) frame.
If one has
\[ 0 < \int_X |\langle \psi_x, f \rangle|^2 \, d\nu(x) \le M \|f\|^2, \quad \forall\, f \in H,\ f \neq 0, \tag{2.5} \]
then Ψ is called a (continuous) upper semi-frame. In this case, $S^{-1}$ is unbounded, with dense domain $\mathrm{dom}(S^{-1})$.
By symmetry (in fact, duality), we will speak of a lower semi-frame if the
upper frame bound is missing. Note that, since S may now be unbounded,
a lower semi-frame is no longer a coherent state, as defined above.
In [2] we studied mostly upper semi-frames and gave some remarks on
the dual situation. In particular, we showed that reconstruction is still possible, in a certain sense. We covered the general (continuous) case and then
particularized the results to the discrete case. An important difference between these cases is how convergence is understood: weak convergence for
the continuous case, strong convergence for the discrete case.
In the discrete case, reconstruction on a dense subset works for
upper semi-frames that fulfill an additional condition:
Proposition 2.1.2 Let Ψ be a regular upper semi-frame for H, i.e. $\Psi \subseteq \mathrm{dom}(S^{-1})$. Then
\[ f = S S^{-1} f = \sum_k \langle S^{-1} \psi_k, f \rangle \psi_k, \quad \forall f \in R_S. \tag{2.6} \]
If we use the Gram matrix we can give a different reconstruction formula:
Proposition 2.1.3 For all $f \in R_D$, we have the reconstruction formula
\[ f = \sum_k G^{-1} \big( \langle f, \psi_k \rangle_H \big) \, \psi_k \tag{2.7} \]
with unconditional convergence.
For upper semi-frames, we can show that the following diagram is commutative:
[Diagram: the spaces $R_C$, $C(R_D)$, $C(R_S)$ and $H$, $R_D$, $R_S$, $S(R_D)$, connected by the operators $C$, $D$, $G^{1/2}$ and $S^{1/2}$.]
This connection can be described in the context of Gelfand triples. It
leads to an extension of the reconstruction formula. The reconstruction
formulas given above are only valid for every f ∈ RD or require regular
upper semi-frames. With the connections in the diagram we can give a
reconstruction formula valid for all f ∈ H, even in the case when $\Psi \not\subseteq \mathrm{dom}(S^{-1})$, if we allow the analysis coefficients to be altered.
Theorem 2.1.4 Let $(\psi_k)$ be an upper semi-frame. Then, for all f ∈ H, we
have the reconstruction formula
\[ f = S^{-1/2} \sum_k \Big[ G^{-1/2} \langle \psi_k, f \rangle \Big] \psi_k . \]
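The operator identity behind Theorem 2.1.4, $f = S^{-1/2} D\, G^{-1/2} C f$, can be checked in a finite-dimensional toy setting. This is only a sketch under strong assumptions: a finite spanning sequence is always a frame, and $G^{-1/2}$ is interpreted here as the pseudo-inverse square root of the Gram matrix (the helper `inv_sqrt_psd` is a hypothetical name). It does not reproduce the unbounded situation treated in [2].

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 3, 7                                   # assumed sizes
Psi = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

C = Psi.conj().T      # analysis operator: (C f)_k = <psi_k, f>
D = Psi               # synthesis operator: D c = sum_k c_k psi_k
S = D @ C             # frame operator
G = C @ D             # Gram matrix

def inv_sqrt_psd(A, tol=1e-10):
    """Pseudo-inverse square root of a Hermitian positive semi-definite matrix."""
    lam, U = np.linalg.eigh(A)
    inv_sqrt = np.zeros_like(lam)
    keep = lam > tol * lam.max()
    inv_sqrt[keep] = 1.0 / np.sqrt(lam[keep])
    return (U * inv_sqrt) @ U.conj().T

f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
altered_coeffs = inv_sqrt_psd(G) @ (C @ f)      # G^{-1/2} applied to the analysis coefficients
f_rec = inv_sqrt_psd(S) @ (D @ altered_coeffs)  # S^{-1/2} applied to the synthesis

print(np.linalg.norm(f - f_rec))
```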
2.1.4
Classification of General Sequences by Frame-Related
Operators [3]
The frame condition cannot always be satisfied, and so other classes
of sequences have been investigated, for example, frame sequences,
Bessel sequences, lower frame sequences, and Riesz-Fischer sequences
[Balazs and El-Gebeily, 2008, Casazza et al., 2002, Christensen, 1995,
Christensen, 2003]. For such sequences, which need not be frames in general,
the frame-related operators, i.e. the analysis, the synthesis and the frame operator, can still be defined, see e.g. [Casazza et al., 2002, Christensen, 1995].
In these cases, these operators can be unbounded.
In [3] we gave an overview of the connection between the properties of
those operators and those of the sequences. This paper is both a survey and
an original research paper. While some results about the connection
between the properties of the frame-related operators and the sequences existed,
they were spread out over many papers, and many gaps remained.
We collected existing results, extended them and added
new, original results, leading to statements like the following:
Proposition 2.1.5 Given a sequence Ψ, the following statements hold.
(a1) Ψ is a Bessel sequence if and only if the domain of D is all of $\ell^2$, i.e.
$\mathrm{dom}(D) = \ell^2$.
(a2) Ψ is a Bessel sequence with bound B if and only if $\mathrm{dom}(D) = \ell^2$ and
D is bounded with $\|D\| \le \sqrt{B}$.
(b1) Ψ is a frame sequence if and only if $\mathrm{dom}(D) = \ell^2$ and ran(D) is closed.
(b2) Ψ is a frame sequence if and only if ran(D) is closed and ran(D) ⊆
dom(C).
(b3) Ψ is a frame sequence if and only if $\mathrm{dom}(D) = \ell^2$ and ran(D) =
ran(S).
(c) Ψ is a frame if and only if $\mathrm{dom}(D) = \ell^2$ and D is surjective.
(d) Ψ is a Riesz basis for H if and only if $\mathrm{dom}(D) = \ell^2$ and D is bijective.
(e) Ψ is a lower frame sequence for H if and only if ran(D) is dense in
H and $\mathrm{ran}(D^*)$ is closed.
(f) Ψ is a Riesz-Fischer sequence if and only if D is injective and $D^{-1}$ is
bounded on ran(D).
(g) Ψ is complete in H if and only if ran(D) is dense in H.
Some of these connections are well known or rather apparent, while others
had to be proved. Similar results for C, S and G were also proved in [3].
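In finite dimensions the criteria of Proposition 2.1.5 reduce to rank conditions on the synthesis matrix, which gives a simple way to experiment with them. The following sketch is an illustration under that assumption (the function name `classify` and the dictionary keys are hypothetical): every finite sequence is a Bessel and a frame sequence, so only surjectivity and injectivity of D remain to be tested.

```python
import numpy as np

def classify(Psi, tol=1e-10):
    """Classify the columns of Psi (vectors in C^N) via the synthesis matrix D = Psi."""
    N, M = Psi.shape
    rank = int(np.linalg.matrix_rank(Psi, tol=tol))
    return {
        "frame": rank == N,           # (c): D is surjective
        "riesz_fischer": rank == M,   # (f): D is injective (bounded inverse is automatic here)
        "riesz_basis": rank == N == M,  # (d): D is bijective
    }

onb = np.eye(3)
redundant = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0]])     # three vectors spanning C^2
incomplete = np.array([[1.0], [0.0]])       # one vector in C^2

print(classify(onb))
print(classify(redundant))
print(classify(incomplete))
```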
Another way of classifying sequences is to consider them as images of
orthonormal bases under specific classes of operators. For this approach we
showed:
Proposition 2.1.6 Let $(e_k)_{k=1}^{\infty}$ be an orthonormal basis for H.
(a) The Bessel sequences for H are precisely the families $(V e_k)_{k=1}^{\infty}$, where
V : H → H is a bounded operator.
(b) The frame sequences for H are precisely the families $(V e_k)_{k=1}^{\infty}$, where
V : H → H is a bounded operator with closed range.
(c) The frames for H are precisely the families $(V e_k)_{k=1}^{\infty}$, where V : H →
H is a bounded and surjective operator.
(d) The Riesz bases for H are precisely the sequences $(V e_k)_{k=1}^{\infty}$, where
V : H → H is a bounded bijective operator.
(e) The lower frame sequences for H are precisely the families $(V e_k)_{k=1}^{\infty}$,
where V : dom(V) → H is a densely defined operator such that $e_k \in \mathrm{dom}(V)$, ∀k ∈ N,
$V^*$ is injective with bounded inverse on $\mathrm{ran}(V^*)$,
and $V\big(\sum_{k=1}^{n} c_k e_k\big) \to V\big(\sum_{k=1}^{\infty} c_k e_k\big)$ as n → ∞ for every $\sum_{k=1}^{\infty} c_k e_k \in \mathrm{dom}(V)$.
(f) The Riesz-Fischer sequences are precisely the families $(V e_k)_{k=1}^{\infty}$, where
V is an operator having all $e_k$ in the domain and which has a bounded
inverse $V^{-1}$ : ran(V) → H.
(g) The complete sequences are precisely the families $(V e_k)_{k=1}^{\infty}$, where V :
dom(V) → H is a densely defined operator such that $e_k \in \mathrm{dom}(V)$, ∀k ∈ N,
ran(V) is dense in H (equivalently, the adjoint $V^*$ is injective)
and $V\big(\sum_{k=1}^{n} c_k e_k\big) \to V\big(\sum_{k=1}^{\infty} c_k e_k\big)$ as n → ∞ for every $\sum_{k=1}^{\infty} c_k e_k \in \mathrm{dom}(V)$.
2.2
Time-Frequency Analysis
2.2.1
State of the Art
The Fourier transformation is a well known mathematical tool to analyze
the frequency content of a signal. It is defined in $L^1(\mathbb{R})$ by
\[ F(f)(\omega) = \hat{f}(\omega) = \int_{\mathbb{R}} f(t)\, e^{-2\pi i \omega t} \, dt. \]
It can be extended by density to $L^2(\mathbb{R})$.
Due to the very efficient algorithms of the fast Fourier transformation
(FFT), see e.g. [Walker, 1991], it has been used in many signal processing
methods. If humans listen to a sound, a voice or music, they do not only
hear frequencies and their amplitudes but also their dynamic development.
So it is very natural to search for a joint time-frequency analysis.
A well known method for a time-frequency representation is the short
time Fourier transformation (STFT). It is defined for $f, g \in L^2(\mathbb{R})$, see e.g.
[Gröchenig, 2001], by
\[ V_g(f)(\tau, \omega) = \int_{\mathbb{R}} f(t)\, \overline{g(t - \tau)}\, e^{-2\pi i \omega t} \, dt. \]
The STFT $V_g(f)(\tau, \omega)$ provides information about the frequency content
of the signal f at time τ and frequency ω. One possibility to look at this
method is the following: the signal f is multiplied with the shifted (and conjugated) window function $g(t - \tau)$. This results in a windowed version of the signal,
that is concentrated at the time τ (if the window is chosen accordingly,
localized around zero). Then the Fourier transformation is applied to the
result. Thus, the analyzing window g determines the resolution in time and
frequency, which is the same in the whole time-frequency domain.
This can also be seen as a projection of the signal f onto the time-frequency shifted Gabor atoms $M_\omega T_\tau g$, where T denotes the translation
operator $(T_\tau f)(t) = f(t - \tau)$ and M the modulation operator $(M_\omega f)(t) = e^{2\pi i \omega t} f(t)$:
\[ V_g(f)(\tau, \omega) = \langle f, M_\omega T_\tau g \rangle . \]
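The two viewpoints on the STFT (windowing followed by a Fourier transform, and inner products with time-frequency shifted atoms) can be compared in a discrete setting. The sketch below assumes signals on the cyclic group $\mathbb{Z}_L$, with the integral replaced by a finite sum and the window conjugated; the function names `stft_cyclic` and `atom` are hypothetical.

```python
import numpy as np

def stft_cyclic(f, g):
    """V[tau, omega] = sum_t f[t] * conj(g[t - tau]) * exp(-2*pi*i*omega*t/L) on Z_L."""
    L = len(f)
    V = np.empty((L, L), dtype=complex)
    for tau in range(L):
        # np.roll(g, tau)[t] equals g[(t - tau) mod L], i.e. the shifted window.
        windowed = f * np.conj(np.roll(g, tau))
        V[tau] = np.fft.fft(windowed)
    return V

def atom(g, tau, omega):
    """Time-frequency shifted atom M_omega T_tau g on Z_L."""
    L = len(g)
    t = np.arange(L)
    return np.exp(2j * np.pi * omega * t / L) * np.roll(g, tau)

rng = np.random.default_rng(0)
L = 16
f = rng.standard_normal(L) + 1j * rng.standard_normal(L)
g = rng.standard_normal(L) + 1j * rng.standard_normal(L)

V = stft_cyclic(f, g)
# np.vdot conjugates its first argument, so this is <f, M_omega T_tau g>.
inner = np.vdot(atom(g, 5, 3), f)
print(abs(V[5, 3] - inner))
```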
The STFT is invertible:
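Corollary 2.2.1 Let $g, \gamma \in L^2(\mathbb{R})$ and $\langle g, \gamma \rangle_{L^2(\mathbb{R})} \neq 0$. Then
\[ f(t) = \frac{1}{\langle \gamma, g \rangle_{L^2(\mathbb{R})}} \iint_{\mathbb{R}^2} V_g f(s, \omega)\, \gamma(t - s)\, e^{2\pi i \omega t} \, ds \, d\omega . \]
This is a direct consequence of the orthogonality relations for the STFT:
Theorem 2.2.2 Let $f_1, f_2, g_1, g_2 \in L^2(\mathbb{R})$, then $V_{g_j} f_j \in L^2(\mathbb{R}^2)$ for j =
1, 2 and
\[ \langle V_{g_1} f_1, V_{g_2} f_2 \rangle_{L^2(\mathbb{R}^2)} = \langle f_1, f_2 \rangle_{L^2(\mathbb{R})} \cdot \overline{\langle g_1, g_2 \rangle_{L^2(\mathbb{R})}} . \]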
If the STFT is not considered for continuous variables ω and τ, but in
a sampled version, $V_g f(ka, lb)$ for k, l ∈ Z and a, b fixed constants, it is
called a Gabor transform. A Gabor system with time shift parameter a and
frequency shift parameter b is given by:
\[ G(g, a, b) = \{ M_{bl} T_{ak}\, g : k, l \in \mathbb{Z} \} = \{ e^{2\pi i b l x} g(x - ka) : k, l \in \mathbb{Z} \}. \]
In this sampled version the inversion is not ’automatic’. Inversion is
possible if the Gabor system forms a frame. The canonical dual frame of a Gabor
frame is just the Gabor system of the dual window $\tilde{g} = S^{-1} g$. This is a
very special property of Gabor systems. For example, for wavelet frames,
the canonical dual need not be a wavelet frame again.
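The structure of the canonical dual of a Gabor frame can be observed directly in a small discrete example. The following is a sketch under assumptions that are not part of the text: signals live on $\mathbb{Z}_L$ with L = 24, the lattice steps a = 2 and b = 3 divide L, and the window is a sampled Gaussian. It builds the frame operator as a matrix, computes $\tilde{g} = S^{-1} g$, and checks that the Gabor system of $\tilde{g}$ coincides with the canonical dual frame.

```python
import numpy as np

L, a, b = 24, 2, 3                       # length, time step, frequency step (assumed values)
t = np.arange(L)
centered = (t + L // 2) % L - L // 2     # indices -12..11 with 0 at position 0
g = np.exp(-np.pi * centered**2 / 16.0)  # sampled Gaussian window (assumed width)

def gabor_system(window):
    """Rows are the atoms M_{bl} T_{ak} window on Z_L."""
    return np.array([np.exp(2j * np.pi * b * l * t / L) * np.roll(window, a * k)
                     for k in range(L // a) for l in range(L // b)])

atoms = gabor_system(g)
S = atoms.T @ atoms.conj()               # frame operator: S f = sum <f, g_kl> g_kl
dual_window = np.linalg.solve(S, g)      # canonical dual window S^{-1} g
dual_atoms = gabor_system(dual_window)   # Gabor system of the dual window

rng = np.random.default_rng(0)
f = rng.standard_normal(L)
coeffs = atoms.conj() @ f                # analysis with the original Gabor system
f_rec = dual_atoms.T @ coeffs            # synthesis with the dual Gabor system

print(np.linalg.norm(f - f_rec))
```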
Apart from the topics covered in this habilitation thesis, [4-5], we have
worked extensively on Gabor systems in the past. The PhD thesis [Balazs, 2005]
was focused on irregular Gabor frames, as well as an efficient way to invert
the Gabor frame operator by double preconditioning [Balazs et al., 2006].
We have investigated a particular property of the phase derivative of the
STFT in [4], which was first discovered in numerical experiments and then
proved mathematically. We also extended the standard Gabor approach to
a more general one [5] allowing an adaptive time-frequency resolution either
in time or in frequency.
2.2.2
The Phase Derivative Around Zeros of the Short-Time
Fourier Transform [4]
The interpretation of the modulus of the STFT is relatively easy, considering
the fact that the spectrogram (defined as the square absolute value of the
STFT) can be interpreted as a time-frequency distribution of the signal
energy. This interpretation led to the important success of the STFT in
signal processing.
But the interpretation of the phase of the STFT is less obvious, and
is often not considered in applications. In most analysis/synthesis schemes
that modify the STFT, the magnitude is modified, but the phase is not
changed. This is a problem, as it is known that amplitude and phase of
the STFT are not independent, but can even carry the same information; for Gaussian windows see [Gardner and Magnasco, 2006]. So a
modification of the amplitude alone, without controlling the effect on the
phase, can have unintended results. Therefore phase information is also pivotal
for applications modifying the STFT coefficients. So, for this type of application, in particular for applications using STFT or Gabor frame multipliers
[Feichtinger and Nowak, 2003, Balazs, 2007] a better understanding of the
structure of the phase is necessary to improve the processing possibilities.
It is known that a multiplier has a ’local effect’ in the time-frequency plane,
in the sense of a small time-frequency spread [Kozek, 1998]. But due to the
uncertainty principle, it can never be perfectly localized. This contradicts
the intuitive approach to a multiplier, where the multiplication would just
correspond to amplification or attenuation of single time-frequency components. As a particularly interesting consequence of this phenomenon a
time-frequency shift of a signal could be realized by a complex multiplier,
which manipulates the phase. To control this behavior and investigate how
to exploit it, for example in an optimization of the effect of a multiplier by
manipulating the phase, a thorough understanding of the effect of the phase
is essential.
It is known [Carmona et al., 1998] that the phase of the DFT becomes
arbitrary near zeros, see [Balazs et al., 2003]. So it could be expected that
the STFT shows a similar behavior. Interestingly, in [4] we observed
that the behavior of the phase derivative around zeros is far from being
arbitrary. The over-complete representation of the STFT and the resulting
reproducing kernel property are in contrast to the basis property of the DFT.
This difference, however, also leads to the afore-mentioned difference in the
phase.
The phase of the STFT is usually not considered directly. In fact, it
is more interesting to consider the phase derivative over time or frequency.
Indeed, these quantities appear naturally in the context of reassignment
[Auger and Flandrin, 1995] and manipulation of the phase derivative over
time is the idea behind the phase vocoder [Dolson, 1986]. The interpretation of these derivatives is easier, as the derivative of the phase over time can be interpreted as local
instantaneous frequency while the derivative of the phase over frequency can
be interpreted as a local group delay.
The phase derivative over time is of particular interest for analysis of
signals containing sinusoidal components, as often encountered in acoustics
[Dolson, 1986]. In [Auger and Flandrin, 1995] it is shown how the local
instantaneous frequency gives access to the exact frequency of a slowly
changing sinusoid, despite the usual spread in time and frequency normally
produced by the STFT.
Numerical experiments first presented in [Jaillet et al., 2009b] show the peculiar behavior of the derivative of the phase at zeros of the STFT. These experiments are included and
updated in [4]. When analyzing white noise, as can be seen in Figure 2.1,
the time-frequency distribution of the values appears to be structured and,
in particular, the values of the phase derivative with high absolute values are
concentrated around several time-frequency points, which can be identified
as the zeros of the transform when looking at the modulus. Furthermore,
the shape of the phase derivative seems to be very similar in the neighborhood of the zeros, with a typical pattern repeating at each zero. This typical
pattern is shown in the third image of Figure 2.1. When going from
low to high frequencies, it presents a negative peak followed by a positive
one.
In [4] the mathematical background was investigated and it was shown
that for STFTs with certain regularity the mentioned phenomenon occurs.
[Figure 2.1: three panels with axes Time and Angular Frequency.]
Figure 2.1: Observation for a Gaussian white noise, using a Gaussian window. Top: modulus of the STFT. Bottom-left: derivative over time of the
phase of the STFT using the definition (2.2.1). Bottom-right: mesh plot of
the derivative over time of the phase in the neighborhood of a zero of the
STFT.
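Theorem 2.2.3 (Phase derivatives of the STFT) Let $f, g \in L^2(\mathbb{R})$.
Assume that
• $V(f, g) = V = U + i \cdot W \in C^2(\mathbb{R}^2, \mathbb{R}^2)$
• $V(x_0, \omega_0) = 0$
• $\det J_V(x_0, \omega_0) \neq 0$, where
\[ J_V(x_0, \omega_0) = \begin{pmatrix} U_x(x_0, \omega_0) & U_\omega(x_0, \omega_0) \\ W_x(x_0, \omega_0) & W_\omega(x_0, \omega_0) \end{pmatrix} \]
denotes the Jacobian matrix of V at the point $(x_0, \omega_0)$. Here we use
the notation $U_x = \frac{\partial U}{\partial x}$, etc.
Denote $\psi(x, \omega) = \arg(V(f, g)(x, \omega))$. Then the phase derivative of the
STFT satisfies
\[ \lim_{\omega \to \omega_0} \frac{\partial \psi}{\partial x}(x_0, \omega) = \begin{cases} -\infty, & \text{if } \omega \uparrow \omega_0 \text{ from below} \\ +\infty, & \text{if } \omega \downarrow \omega_0 \text{ from above} \end{cases} \]
for $\det J_V(x_0, \omega_0) > 0$, respectively
\[ \lim_{\omega \to \omega_0} \frac{\partial \psi}{\partial x}(x_0, \omega) = \begin{cases} +\infty, & \text{if } \omega \uparrow \omega_0 \text{ from below} \\ -\infty, & \text{if } \omega \downarrow \omega_0 \text{ from above} \end{cases} \]
for $\det J_V(x_0, \omega_0) < 0$.
If $V(f, g) \in C^3(\mathbb{R}^2, \mathbb{R}^2)$, then the phase derivative of the STFT
converges to some real number $c \in \mathbb{R}$ if $x \to x_0$:
\[ \lim_{x \to x_0} \frac{\partial \psi}{\partial x}(x, \omega_0) = c \in \mathbb{R}. \]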
A similar result can be shown for the derivative along the other dimension,
i.e. the frequency axis. Furthermore it was shown that for windows in the
Schwartz class these differentiability conditions of the STFT are fulfilled.
2.2.3
Non-Stationary Gabor Frames [5]
Gabor analysis [Feichtinger and Strohmer, 1998] is widely used for applications in
signal processing. Nevertheless, when dealing with signals with
characteristics changing over the time-frequency plane, the fixed
time-frequency resolution over the whole time-frequency plane can be
very restrictive. This led to the use of alternative decompositions
with time-frequency resolution evolving with frequency, such as the
wavelet transform [Flandrin, 1999], the constant Q transform (CQT)
[Brown and Puckette, 1992] or filter banks based on perception, for
example gammatone filters [Hartmann, 1998].
Figure 2.2: Example of sampling grid of the time-frequency plane (axes: Time, Frequency) when
building a decomposition with time-frequency resolution evolving over
time.
The standard Gabor theory was extended in [5] (a corresponding journal paper is in preparation), building on ideas in
[Jaillet, 2005, Jaillet et al., 2009a], to provide some freedom of evolution of
the time-frequency resolution of the decomposition in either time or frequency. Furthermore, this extension is well suited for applications, because
it can easily be implemented using a fast algorithm based on the fast Fourier
transform [Walker, 1991]. We replaced the regular time translation in standard Gabor analysis by the use of different windows. For each time position
we still built atoms by regular frequency modulations:
\[ g_{m,n}(t) = g_n(t)\, e^{i 2\pi m b_n t} = (M_{m b_n} g_n)(t). \]
Assuming that the windows gn are centered at different temporal positions,
the sampling of the time-frequency plane is done on a grid, which is irregular
over time, but regular over frequency. Figure 2.2 shows an example of such
a sampling grid.
Here, as in the regular case, see e.g. [Gröchenig, 2001], we found conditions under which the canonical dual window can
be calculated efficiently (’painless reconstruction’). More precisely:
Theorem 2.2.4 For every n ∈ Z, let the function $g_n \in L^2(\mathbb{R})$ be compactly
supported with $\mathrm{supp}(g_n) \subseteq [c_n, d_n]$ and let $b_n$ be chosen such that $d_n - c_n \le \frac{1}{b_n}$.
Then the frame operator S of the system $g_{m,n}(t) = g_n(t)\, e^{i 2\pi m b_n t}$, m ∈ Z
and n ∈ Z, is given by a multiplication operator of the form
\[ S f(s) = \left( \sum_n \frac{1}{b_n} |g_n(s)|^2 \right) f(s). \]
When this condition is fulfilled, the canonical dual frame elements are given
by:
\[ \tilde{g}_{m,n}(t) = \frac{g_n(t)}{\sum_k \frac{1}{b_k} |g_k(t)|^2}\, e^{i 2\pi m b_n t}, \]
and the associated canonical tight frame elements can be calculated by:
\[ g^{(t)}_{m,n}(t) = \frac{g_n(t)}{\sqrt{\sum_k \frac{1}{b_k} |g_k(t)|^2}}\, e^{i 2\pi m b_n t}. \]
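A discrete analogue of the painless case can be verified with a few lines of code. The sketch below is not the implementation of [5]; it assumes signals in $\mathbb{C}^L$, windows supported on hypothetical index intervals $[c_n, d_n)$, and $M_n = d_n - c_n$ frequency channels per window, so that the factor $1/b_n$ of Theorem 2.2.4 becomes $M_n$. Under these assumptions the frame operator is diagonal, and the dual and tight atoms follow by pointwise division.

```python
import numpy as np

L = 64
supports = [(0, 8), (4, 20), (12, 44), (36, 64)]   # assumed supports [c_n, d_n)
t = np.arange(L)

# Windows of different lengths, strictly positive on their supports.
windows = []
for c, d in supports:
    g = np.zeros(L)
    g[c:d] = np.hanning(d - c + 2)[1:-1]
    windows.append(g)

# Atoms g_{m,n}[t] = g_n[t] * exp(2*pi*i*m*t/M_n), with M_n = d_n - c_n channels.
atoms = np.array([g * np.exp(2j * np.pi * m * t / (d - c))
                  for g, (c, d) in zip(windows, supports)
                  for m in range(d - c)])

S = atoms.T @ atoms.conj()                          # frame operator as a matrix
diag = sum((d - c) * np.abs(g) ** 2 for g, (c, d) in zip(windows, supports))

dual_atoms = atoms / diag                           # canonical dual frame elements
tight_atoms = atoms / np.sqrt(diag)                 # canonical tight frame elements

rng = np.random.default_rng(3)
f = rng.standard_normal(L)
f_rec = dual_atoms.T @ (atoms.conj() @ f)           # analysis, then dual synthesis

print(np.linalg.norm(S - np.diag(diag)), np.linalg.norm(f - f_rec))
```

The windows overlap so that the diagonal is strictly positive everywhere; without this coverage the system would not be a frame and the division would fail.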
An analogous construction is possible with a sampling of the time-frequency
plane irregular over frequency, but regular over time. In this case, we introduced a family of functions $\{h_m\}_{m \in \mathbb{Z}}$ of $L^2(\mathbb{R})$, and for m ∈ Z and n ∈ Z,
we define atoms of the form:
\[ h_{m,n}(t) = h_m(t - n a_m). \]
In practice each function $h_m$ will be chosen as a well localized pass-band
function having a Fourier transform centered around some frequency $b_m$.
In this case the frame operator is given by:
\[ S f = \sum_m \sum_n \langle f, h_{m,n} \rangle h_{m,n}, \]
[Figure 2.3: three spectrogram panels with axes Time (s) and Frequency (Hz).]
Figure 2.3: Two (stationary) spectrograms of the same ’glockenspiel’ signal
obtained using two different window lengths. On the left plot, a narrow
window of 6 ms is used, on the right plot, a wide window of 93 ms is used.
At the bottom a spectrogram using a non-stationary Gabor transform is
shown.
and the problem is completely analogous to the preceding one up to a Fourier transform:
\[ \widehat{S f} = \sum_m \sum_n \langle \hat{f}, \widehat{h_{m,n}} \rangle \widehat{h_{m,n}}, \]
and $\widehat{h_{m,n}}(\nu) = \widehat{h_m}(\nu)\, e^{-i 2\pi n a_m \nu}$. With this approach filter-banks can be implemented, in particular an invertible constant Q transform can be defined
and implemented. It can be shown that wavelet frames can be interpreted (and
therefore implemented) in this setting (a journal paper is in preparation).
2.3
Theory of Frame Multipliers
2.3.1
State of the Art
As mentioned above, frames need not only be used for analyzing functions,
but can also be used for the description of operators. One particular way
to define operators is the following: Let $H_1$ and $H_2$ be Hilbert spaces. Fix
a symbol $m = (m_k) \in \ell^\infty(K)$. Then the operator defined by
\[ M_{m,(\phi_k),(\psi_k)}(f) = \sum_k m_k \langle f, \psi_k \rangle \phi_k \tag{2.8} \]
is called the Bessel multiplier for the Bessel sequences (ψk ) and (φk ), or
frame multiplier, if the two sequences are frames. The sequence m = (mk ) is
called the symbol of the multiplier. In [Schatten, 1960], such operators were
investigated for orthonormal families $(\phi_k)$ and $(\psi_k)$. This kind of operator
was investigated for regular Gabor frames in [Feichtinger and Nowak, 2003].
In [Balazs, 2007] such operators were introduced for general Bessel and frame
sequences. Several basic properties of frame multipliers were investigated
there.
There, the implications of summability properties of the symbol
for the membership of the corresponding operators in certain operator
classes are specified. In particular, for Bessel sequences, symbols in $\ell^\infty$,
$c_0$, $\ell^1$ and $\ell^2$ induce bounded, compact, trace-class and Hilbert-Schmidt
operators, respectively. As a special case the multipliers for Riesz bases
are examined and it is shown that multipliers in this case can be easily
composed and inverted. The inverted multiplier is just the multiplier with
the inverted symbol and the bi-orthogonal sequences (in switched roles),
i.e.
\[ M^{-1}_{(m_k),(\phi_k),(\psi_k)} = M_{(1/m_k),(\tilde{\psi}_k),(\tilde{\phi}_k)} . \]
Finally the continuous dependence of
a Bessel multiplier on the parameters (i.e. the involved sequences and the
symbol in use) is verified, using a special measure of similarity of sequences.
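The inversion formula for Riesz basis multipliers is easy to test in finite dimensions, where a Riesz basis is simply a basis. The following sketch makes several assumptions for illustration: the bases are random invertible matrices, the symbol is a random complex sequence with moduli in [0.5, 2], and the helper names `multiplier` and `dual_basis` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
Phi = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))  # columns phi_k
Psi = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))  # columns psi_k
# Semi-normalized complex symbol (assumed range of moduli).
m = rng.uniform(0.5, 2.0, N) * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, N))

def multiplier(symbol, synthesis, analysis):
    """Matrix of f -> sum_k symbol_k <f, analysis_k> synthesis_k."""
    return synthesis @ np.diag(symbol) @ analysis.conj().T

def dual_basis(B):
    """Columns form the bi-orthogonal sequence of the basis given by the columns of B."""
    return np.linalg.inv(B).conj().T

M = multiplier(m, Phi, Psi)
# Inverted symbol, bi-orthogonal sequences in switched roles.
M_inv = multiplier(1.0 / m, dual_basis(Psi), dual_basis(Phi))

print(np.linalg.norm(M_inv @ M - np.eye(N)))
```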
Applications in acoustics traditionally use time-invariant filters. These
systems can be described by the multiplication of the frequency spectrum of the signal by a fixed function, the so-called transfer function
[Oppenheim and Schafer, 1999]. If the multiplication is done on the time-frequency plane, we naturally arrive at Gabor frame multipliers, which therefore are a particular choice to implement time-variant filtering, called ’Gabor
filters’ [Matz and Hlawatsch, 2002] in signal processing. In computational
auditory scene analysis they are known by the name ’time-frequency masks’
[Wang and Brown, 2006] and are used to extract a single sound source out of
a mixture of sounds in a way linked to human auditory perception.
Multipliers are interesting not only from a theoretical point of view,
see e.g. [Balazs, 2008b, Dörfler and Torrésani, 2010], but also for sound
morphing [Depalle et al., 2007], sound classification [Olivero et al., 2009],
psychoacoustical modeling [9], see Section 2.4, or denoising in the time-frequency plane [12], see Section 2.5.
This concept was extended to p-frames in Banach spaces in [6]. In [7]
sufficient and necessary conditions for the unconditional convergence and
invertibility of multipliers were investigated. In [8] an extensive list of examples and counter-examples for the invertibility of multipliers was collected.
To shorten notation in [7] and [8] we use the following abbreviations:
R.b. - Riesz basis, fr. - frame, B. - Bessel sequence, SN - semi-normalized,
‖·‖-SN - norm-semi-normalized, NBB - norm-bounded below, NBA - norm-bounded above, unc. conv.
- unconditionally convergent on H, INV. - invertible on H. Recall that a
Riesz basis is always ‖·‖-SN and a Bessel sequence is always NBA.
2.3.2
Multipliers for p-Bessel sequences in Banach spaces [6]
One way to extend the concept of frames (and related concepts) from Hilbert
spaces to Banach spaces is the following:
A countable family $(\psi_i)_{i \in I} \subseteq X^*$ is a p-frame for the Banach space X
(1 < p < ∞) if constants A, B > 0 exist such that
\[ A \|f\|_X \le \left( \sum_{i \in I} |\psi_i(f)|^p \right)^{1/p} \le B \|f\|_X \quad \text{for all } f \in X. \]
It is called a p-Bessel sequence with bound B if the second inequality holds.
In this Banach space setting we can define multipliers in the following
way:
Definition 2.3.1 Let $(\psi_k) \subseteq X_1^*$ be a p-Bessel sequence for $X_1$ with bound
$B_1$, let $(\phi_k) \subseteq X_2$ be a q-Bessel sequence for $X_2^*$ with bound $B_2$, let $m \in \ell^\infty$.
The operator $M_{m,(\phi_k),(\psi_k)} : X_1 \to X_2$ defined by
\[ M_{m,(\phi_k),(\psi_k)}(f) = \sum_k m_k \psi_k(f) \phi_k \]
is called a (p, q)-Bessel multiplier.
We obtained the following theorem which is a generalization of one of
the results in [Balazs, 2007]:
Theorem 2.3.1 Let $M = M_{m,(\phi_k),(\psi_k)}$ be a (p, q)-Bessel multiplier for the
p-Bessel sequence $(\psi_k) \subseteq X_1^*$, the q-Bessel sequence $(\phi_k) \subseteq X_2$ with bounds
$B_1$ and $B_2$. Then, the following hold.
1. If $m \in \ell^\infty$, M is a well defined bounded operator with
$\|M\|_{Op} \le B_2 B_1 \cdot \|m\|_\infty$.
Furthermore, the sum $\sum_k m_k \psi_k(f) \phi_k$ converges unconditionally for all
$f \in X_1$.
2. $M^*_{m,(\phi_k),(\psi_k)} = \sum_k m_k \psi_k \otimes \kappa(\phi_k) = M_{m,(\psi_k),(\kappa(\phi_k))}$.
3. If $m \in c_0$, M is a compact operator.
Also a perturbation result could be shown, as a generalization of the
results for the Hilbert space setting.
On the other hand the concept of p-Schatten class operators, like Hilbert-Schmidt operators, cannot be easily extended to the Banach frame case. For
the definition of nuclear operators as found in [Pietsch, 1980] it is easy to
show:
Corollary 2.3.2 Let $(\psi_k) \subseteq X_1^*$ be a p-Bessel sequence for $X_1$ with bound
$B_1$, let $(\phi_k) \subseteq X_2$ be a q-Bessel sequence for $X_2^*$ with bound $B_2$. Let r > 0
and $m \in \ell^r$. Then $M_{m,(\phi_k),(\psi_k)}$ is an (r, p, q)-nuclear operator.
2.3.3
Unconditional Convergence and Invertibility of Multipliers [7]
For a frame $(\phi_n)$ and a positive (resp. negative) semi-normalized sequence
$(m_n)$, the multiplier $M_{(m_n),(\phi_n),(\phi_n)}$ is the frame operator S (resp. −S) for
the frame $(\sqrt{|m_n|}\,\phi_n)$ and thus $M_{(m_n),(\phi_n),(\phi_n)}$ is invertible [1]. When $(\phi_n)$
and $(\psi_n)$ are Riesz bases and $(m_n)$ is semi-normalized, then $M_{(m_n),(\phi_n),(\psi_n)}$
is invertible and
$$M^{-1}_{(m_n),(\phi_n),(\psi_n)} = M_{(\frac{1}{m_n}),(\widetilde{\psi}_n),(\widetilde{\phi}_n)},$$
where $\widetilde{\phi}_n$ and $\widetilde{\psi}_n$ denote the canonical duals of $(\phi_n)$ and $(\psi_n)$, respectively, see [Balazs, 2007].
If $(\phi^d_n)$ is a dual frame of the frame $(\phi_n)$, then $M_{(1),(\phi_n),(\phi^d_n)}$ is the identity
operator and therefore invertible. If $m \in c_0$ and both $(\phi_n)$ and $(\psi_n)$
are Bessel sequences, then the multiplier $M_{(m_n),(\phi_n),(\psi_n)}$ is never invertible
on an infinite dimensional Hilbert space, because it is a compact operator
[Balazs, 2007].
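Two of these statements can be illustrated in a finite-dimensional real setting. The sketch below is an assumption-laden toy check (random frame, real positive symbol, hypothetical sizes), not code from [1] or [Balazs, 2007]: it compares the multiplier with positive symbol to the frame operator of the weighted frame, and verifies that the symbol (1) together with the canonical dual gives the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n = 4, 9                               # hypothetical sizes
phi = rng.standard_normal((dim, n))         # columns phi_n: a frame for R^4
m = rng.uniform(0.5, 2.0, n)                # positive, semi-normalized symbol

M = phi @ np.diag(m) @ phi.T                # M_{(m_n),(phi_n),(phi_n)}
weighted = phi * np.sqrt(m)                 # columns sqrt(m_n) * phi_n
S_weighted = weighted @ weighted.T          # frame operator of the weighted frame

S = phi @ phi.T                             # frame operator of (phi_n)
phi_dual = np.linalg.solve(S, phi)          # canonical dual frame S^{-1} phi_n
identity = phi @ phi_dual.T                 # M_{(1),(phi_n),(phi_n^d)}

print(np.allclose(M, S_weighted), np.allclose(identity, np.eye(dim)))
```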
In [7] we considered the question of the invertibility of multipliers
$M_{(m_n),(\phi_n),(\psi_n)}$ more closely. The sequences involved did not necessarily
have to be Bessel sequences, and the symbol was not always assumed to
be bounded. Different cases for $(\phi_n)$ and $(\psi_n)$ were considered: non-Bessel sequences,
Bessel sequences, overcomplete frames, and Riesz bases. The unconditional
convergence of multipliers was investigated; in particular, sufficient and/or necessary conditions were determined, which are needed for the results about
invertibility. As an example, let us mention the following result:
Proposition 2.3.3 For any sequences m, Φ, and Ψ, the multiplier Mm,Φ,Ψ
is unconditionally convergent on H if and only if Mm,Ψ,Φ is unconditionally
convergent on H.
For conditional convergence this result is no longer true; a counterexample is given in [7].
Sufficient and/or necessary conditions for the invertibility of
$M_{(m_n),(\phi_n),(\psi_n)}$ were given. In particular, if a multiplier for two Bessel
sequences and a bounded symbol is invertible, then the sequences involved are
already frames:
Theorem 2.3.4 Let $M_{m,\Phi,\Psi}$ be invertible on H. If Ψ and Φ are Bessel
sequences for H and $m \in \ell^\infty$, then Ψ and Φ are frames for H; mΦ and mΨ
are also (weighted) frames for H.
If the multipliers are invertible, formulas for $M^{-1}_{(m_n),(\phi_n),(\psi_n)}$ are determined, for example in the following case:
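Proposition 2.3.5 Let $\Phi = (\phi_k)$ be a frame for H. Assume that
P1: $\exists\, \mu \in [0, A_\Phi^2 / B_\Phi)$ such that $\sum_n |\langle f, m_n\psi_n - \phi_n\rangle|^2 \le \mu \|f\|^2$, ∀ f ∈ H.
Then mΨ is a frame for H, the multipliers $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ are invertible on H and
$$\frac{1}{B_\Phi + \sqrt{\mu B_\Phi}}\,\|h\| \le \|M^{-1}h\| \le \frac{1}{A_\Phi - \sqrt{\mu B_\Phi}}\,\|h\|, \quad \forall h \in H, \tag{2.9}$$
$$M^{-1} = \sum_{k=0}^{\infty} \left[S_\Phi^{-1}(S_\Phi - M)\right]^k S_\Phi^{-1}, \tag{2.10}$$
where M denotes any one of $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$.
As a consequence, if m is semi-normalized, then Ψ is also a frame for
H.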
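The inversion formula (2.10) and the bounds (2.9) can be tried out in a finite-dimensional real setting. The following is a minimal sketch under stated assumptions, not the algorithm of [7]: the frame is random, the symbol is real, the perturbation is scaled so that P1 holds with µ equal to a quarter of the admissible limit, and the series is simply truncated after 80 terms.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n = 4, 10                                 # hypothetical sizes
phi = rng.standard_normal((dim, n))            # frame Phi (columns)
S = phi @ phi.T                                # frame operator S_Phi
eig = np.linalg.eigvalsh(S)
A, B = eig[0], eig[-1]                         # optimal frame bounds A_Phi, B_Phi

m = rng.uniform(0.5, 2.0, n)                   # real symbol, for simplicity
E = rng.standard_normal((dim, n))
mu = 0.25 * A**2 / B                           # chosen so that P1 holds
E *= np.sqrt(mu) / np.linalg.norm(E, 2)        # sum_n |<f, e_n>|^2 <= mu ||f||^2
psi = (phi + E) / m                            # so that m_n psi_n - phi_n = e_n

M = phi @ np.diag(m) @ psi.T                   # M_{m,Phi,Psi}

S_inv = np.linalg.inv(S)
T = S_inv @ (S - M)
term, M_inv = S_inv.copy(), np.zeros((dim, dim))
for _ in range(80):                            # truncated series (2.10)
    M_inv += term
    term = T @ term

print(np.allclose(M_inv @ M, np.eye(dim)))
```

With this choice of µ the norm of $S_\Phi^{-1}(S_\Phi - M)$ is at most 1/2, so the truncated series converges geometrically; closer to the limit $A_\Phi^2/B_\Phi$ more terms would be needed.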
Several results in the spirit of the one above were shown. The sharpness
of the bounds of those results as well as the independence of them were
shown with examples and counter-examples. It is planned to use those in
numerical algorithms in the future, improve their efficiency and apply them
to acoustical applications.
Finally, for the case that one of the sequences is a Riesz basis, we give
a full classification of the situations in which multipliers can be invertible.
The results are collected in the following corollary:
Corollary 2.3.6 Let Φ be a Riesz basis for H. Then $M_{m,\Phi,\Psi}$ (resp.
$M_{m,\Psi,\Phi}$) is invertible on H if and only if mΨ is a Riesz basis for H.
Further, the following holds.
(i) If Ψ is a Riesz basis for H, then $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$) is invertible
on H if and only if m is SN.
(ii) If m is SN, then $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$) is invertible on H if and
only if Ψ is a Riesz basis for H.
(iii) If m is not SN, then $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$) can be invertible on H
only in the following cases:
• Ψ is non-NBB and Bessel for H, which is not a frame for H, and
m is NBB, but not in $\ell^\infty$;
• Ψ is non-NBA, NBB, and non-Bessel for H, m is non-NBB and
$m \in \ell^\infty$;
• Ψ is non-NBA, non-NBB, and non-Bessel for H, m is non-NBB
and $m \notin \ell^\infty$.
In the cases of invertibility, $M^{-1}_{m,\Phi,\Psi} = M_{(1),\widetilde{m\Psi},\widetilde{\Phi}}$ and $M^{-1}_{m,\Psi,\Phi} = M_{(1),\widetilde{\Phi},\widetilde{m\Psi}}$.
In the cases (i) and (ii) this corresponds to $M^{-1}_{m,\Phi,\Psi} = M_{1/m,\widetilde{\Psi},\widetilde{\Phi}}$ and
$M^{-1}_{m,\Psi,\Phi} = M_{1/m,\widetilde{\Phi},\widetilde{\Psi}}$.
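In finite dimensions a Riesz basis is simply a basis, so the inversion formulas of the corollary can be verified with matrices. The sketch below assumes real bases and a real symbol (so no complex conjugation appears); the helper `canonical_dual` and the test data are hypothetical.

```python
import numpy as np

def canonical_dual(basis):
    """Columns S^{-1} b_n, where S is the frame operator of the columns b_n."""
    return np.linalg.solve(basis @ basis.T, basis)

rng = np.random.default_rng(3)
dim = 5
phi = rng.standard_normal((dim, dim)) + 3 * np.eye(dim)   # basis Phi of R^5
psi = rng.standard_normal((dim, dim)) + 3 * np.eye(dim)   # basis Psi of R^5
m = rng.uniform(0.5, 2.0, dim)                            # semi-normalized symbol

M = phi @ np.diag(m) @ psi.T                              # M_{m,Phi,Psi}

# Cases (i)/(ii): inverse with symbol 1/m and the canonical duals, in swapped order.
M_inv = canonical_dual(psi) @ np.diag(1.0 / m) @ canonical_dual(phi).T

# General formula: symbol (1), dual of the weighted sequence m*Psi, dual of Phi.
M_inv_general = canonical_dual(psi * m) @ canonical_dual(phi).T

print(np.allclose(M_inv @ M, np.eye(dim)), np.allclose(M_inv, M_inv_general))
```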
2.3.4 Detailed characterization of conditions for the unconditional convergence and invertibility of multipliers [8]
In [7] the focus was on existence results and formulas for the inversion. In [8] we presented tables and investigated the unconditional convergence and the invertibility of multipliers $M_{m,\Phi,\Psi}$. There we gave a
complete set of examples, varying the type of the sequences $\Phi = (\phi_n)$,
$\Psi = (\psi_n)$ (non-Bessel, Bessel non-frames, frames non-Riesz bases, Riesz
bases; norm-semi-normalized, non-norm-semi-normalized, with all possible
combinations) and varying the symbol m (semi-normalized, in $\ell^\infty$ but non-semi-normalized, not in $\ell^\infty$). In this paper we decided to focus on the general
frame level, not including coherent frames [Ali et al., 2000] like Gabor systems [Feichtinger and Strohmer, 1998], wavelet systems [Flandrin, 1999] or
frames of translates [Casazza et al., 2001]. Therefore we constructed
the examples by manipulating abstract orthonormal sequences or bases. We
listed all possible combinations and gave a full classification of whether multipliers
under those conditions can be (‘POSSIBLE’), have to be (‘ALWAYS’) or
never can be (‘NOT POSSIBLE’) unconditionally convergent and invertible
(resp. non-invertible) on the given Hilbert space. Note that examples
of multipliers that are equal to the identity also give examples for those cases
where the sequences can be dual to each other. We only considered sequences
with non-zero elements since, if zero elements are allowed, both the invertible identity
operator and the zero operator, for example, can be described as multipliers by
putting zeros at appropriate places (see [7] for details).
To shorten notation, for $\nu = (\nu_n)$, $\Theta = (\theta_n)$, $\Xi = (\xi_n)$, we will write
$M_{m,\Phi,\Psi} \overset{\nabla}{=} M_{\nu,\Xi,\Theta}$ if there exist scalar sequences $(c_n)$, $(d_n)$ so that $\xi_n = c_n\phi_n$,
$\theta_n = d_n\psi_n$ and $m_n = \nu_n c_n d_n$ for every n. This means that the series of
the two multipliers have the same summands, element by element.
As an example for the results in [8] we present one table with the connected examples, where we only consider unconditionally convergent multipliers. For this we need the following results:
Lemma 2.3.7 Let $G_k$ denote the multiplier $M_{(1/n^k),(e_n),(e_n)}$, k ∈ N. Then
$G_k$ is unconditionally convergent on H and not invertible on H.
Proposition 2.3.8 Let Φ be an NBB Bessel sequence for H, which is not a frame
for H. Then, for any Ψ and any m, the multiplier $M_{m,\Phi,\Psi}$ (resp. $M_{m,\Psi,\Phi}$)
cannot be both unconditionally convergent on H and invertible on H.
Example 2.3.1 Let $\Phi = (e_2, e_3, e_4, e_5, \ldots)$.
(i) Let m = (1, 1, 1, 1, . . .). Then $M_{m,\Phi,\Phi}$ is clearly unconditionally convergent on H and not surjective.
(ii) Let $m = (\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots)$. Then $M_{m,\Phi,\Phi}$ is clearly unconditionally
convergent on H and not surjective.
Example 2.3.2 Let $\Phi = (e_2, e_3, e_4, e_5, \ldots)$ and $\Psi = (\frac{1}{2}e_2, \frac{1}{3}e_3, \frac{1}{4}e_4, \frac{1}{5}e_5, \ldots)$.
(i) Let m = (1, 1, 1, 1, . . .). Then $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ are clearly unconditionally convergent on H and not surjective.
(ii) Let $m = (\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots)$. Then $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ are clearly
unconditionally convergent on H and not surjective.
(iii) Let m = (2, 3, 4, 5, . . .). Then $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ are clearly unconditionally convergent on H and not surjective.
Example 2.3.3 Let $\Phi = (\frac{1}{n}e_n)$.
(i) Let m = (1). Then $M_{m,\Phi,\Phi} \overset{\nabla}{=} M_{(1/n^2),(e_n),(e_n)} = G_2$, which is unconditionally
convergent and non-invertible on H (see Lemma 2.3.7).
(ii) Let $m = (\frac{1}{n})$. Then $M_{m,\Phi,\Phi} \overset{\nabla}{=} M_{(1/n^3),(e_n),(e_n)} = G_3$, which is unconditionally
convergent and non-invertible on H (see Lemma 2.3.7).
(iii) Let $m = (n^2)$. Then $M_{m,\Phi,\Phi} \overset{\nabla}{=} M_{(1),(e_n),(e_n)} = I$.
(iv) Let m = (n). Then $M_{m,\Phi,\Phi} \overset{\nabla}{=} M_{(1/n),(e_n),(e_n)} = G_1$, which is unconditionally
convergent and non-invertible on H (see Lemma 2.3.7).
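Example 2.3.3 can be made tangible by truncation. With respect to the orthonormal basis $(e_n)$, the multiplier $M_{m,\Phi,\Phi}$ for $\Phi = (\frac{1}{n}e_n)$ is diagonal with entries $m_n/n^2$, so the norm of the inverse of its restriction to the span of $e_1, \ldots, e_N$ is the reciprocal of the smallest diagonal entry. The sketch below (helper name and truncation sizes are assumptions) shows this norm growing without bound for $G_2$ and $G_1$, but staying equal to 1 for $m = (n^2)$.

```python
import numpy as np

def truncated_diagonal(symbol, N):
    """Diagonal of M_{m,Phi,Phi} for Phi = (e_n / n), restricted to span(e_1..e_N)."""
    n = np.arange(1, N + 1, dtype=float)
    return symbol(n) / n**2

for N in (10, 100, 1000):
    d1 = truncated_diagonal(lambda n: np.ones_like(n), N)   # m = (1)   -> G_2
    d3 = truncated_diagonal(lambda n: n**2, N)              # m = (n^2) -> I
    d4 = truncated_diagonal(lambda n: n, N)                 # m = (n)   -> G_1
    # norms of the inverses of the truncated operators
    print(N, 1 / d1.min(), 1 / d3.min(), 1 / d4.min())
```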
Table 3: two Bessel sequences which are not frames. Each entry refers to both $M_{m,\Phi,\Psi}$ and $M_{m,\Psi,\Phi}$ as unconditionally convergent multipliers.

| Symbol m | Multiplier | Φ ‖·‖-SN, Ψ ‖·‖-SN | Φ ‖·‖-SN, Ψ non-NBB | Φ non-NBB, Ψ non-NBB |
|---|---|---|---|---|
| m SN | INV. | NOT POSSIBLE, see Prop. 2.3.4 | NOT POSSIBLE, see Prop. 2.3.4 | NOT POSSIBLE, see Prop. 2.3.4 |
| m SN | NON-INV. | ALWAYS, apply Prop. 2.3.4; Example 2.3.1(i) | ALWAYS, apply Prop. 2.3.4; Example 2.3.2(i) | ALWAYS, apply Prop. 2.3.4; Example 2.3.3(i) |
| $m \in \ell^\infty$, but non-SN | INV. | NOT POSSIBLE, see Prop. 2.3.4 | NOT POSSIBLE, see Prop. 2.3.4 | NOT POSSIBLE, see Prop. 2.3.4 |
| $m \in \ell^\infty$, but non-SN | NON-INV. | ALWAYS, apply Prop. 2.3.4; Example 2.3.1(ii) | ALWAYS, apply Prop. 2.3.4; Example 2.3.2(ii) | ALWAYS, apply Prop. 2.3.4; Example 2.3.3(ii) |
| $m \notin \ell^\infty$ | INV. | NOT POSSIBLE (not unc. conv., see [7]) | NOT POSSIBLE, see Prop. 2.3.8 | POSSIBLE, Example 2.3.3(iii) |
| $m \notin \ell^\infty$ | NON-INV. | NOT POSSIBLE (not unc. conv., see [7]) | POSSIBLE, Example 2.3.2(iii) | POSSIBLE, Example 2.3.3(iv) |
2.4 Applications in Acoustics: Time-Frequency Sparsity by Perceptual Irrelevance
2.4.1 State of the art
An interesting area of acoustics research is the field of human auditory perception. It is known in psychoacoustics [Zwicker and Fastl, 1990] that not
all time-frequency components of a “real-world” acoustic signal can be perceived by the human auditory system. More precisely, it turns out that some
time-frequency components mask other components, which are close in the
time-frequency domain.
Masking refers to the process where the threshold of audibility for one
sound (the target) is raised by the presence of another sound (the masker).
Masking can render the masked sound inaudible. Masking occurs in two
main signal configurations; simultaneous occurrence of target and masker
is referred to as simultaneous, frequency or spectral masking [Moore, 1989];
non-simultaneous occurrence of target and masker is referred to as temporal
masking, see e.g. [Fastl, 1976].
To investigate spectral masking, the frequency separation between target
and masker is varied. In the most common method of masking patterns,
the masker frequency is fixed and the amount of masking is measured for
various target frequencies. To investigate temporal masking, the frequencies
of masker and target are identical and the temporal separation between
masker and target is varied, resulting in the temporal masking function.
Backward masking (the target temporally precedes the masker) is weaker
than forward masking. The amounts of backward and forward masking
depend on the masker duration. Because of the specific demands in the
simultaneous and non-simultaneous masking experiments reported in the
literature, the experimental stimuli were almost always broad either in the
temporal domain (e.g., long-lasting sinusoids), the frequency domain (e.g.,
clicks), or both.
“Real-world” sounds are broadband signals and therefore involve mutual
masking effects between the individual narrowband components into which
the signal can be separated. This raises the question how the masking effects
of more than one simultaneous masker on a target add up. To a first approximation, the masked thresholds elicited by two individual maskers have
to be added linearly in the power domain to derive the combined masked
threshold [Moore, 1985]. For two equally effective maskers this means, in
the logarithmic scale, that the masked threshold in the presence of both
maskers is 3 dB higher than that for one masker alone. This rule may apply
if side effects are ruled out, such as the detection of combination products,
the detection of the target at a tonotopic place aside from the target frequency (so-called off-frequency listening), or detection during the minima
in the temporal envelope of the masker (dip-listening). In many configurations, however, particularly if the maskers do not overlap at the auditory
filter centered at the target, the amount of masking can be larger than predicted by the linear addition rule [Humes et al., 1992]. Furthermore, little
is known about the additivity of masking for more than two maskers [9].
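The linear addition rule in the power domain mentioned above can be written out as a short calculation. The function name and the 40 dB example value below are hypothetical; the sketch only reproduces the first approximation, not the deviations discussed in this paragraph.

```python
import math

def combined_masked_threshold_db(thresholds_db):
    """Linear addition in the power domain of masked thresholds given in dB."""
    total_power = sum(10.0 ** (t / 10.0) for t in thresholds_db)
    return 10.0 * math.log10(total_power)

single = 40.0                                   # hypothetical masked threshold in dB
both = combined_masked_threshold_db([single, single])
print(round(both - single, 2))                  # 3.01, i.e. about 3 dB
```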
Another effect complicating the prediction of masking effects for “real-world” sounds is that the auditory system integrates signal information
across frequencies to detect a signal. As an example, the masked threshold
for two simultaneously presented sinusoids equally contributing to detection
is about 2.5 dB lower than the masked thresholds for each sinusoid alone.
This implies that two (or more) spectral components of a complex signal may
be audible even if each of them separately is below the masked threshold.
In addition, the maximum bandwidth up to which spectral integration is efficient depends on the signal duration [van den Brink and Houtgast, 1990].
Furthermore, mutual suppression effects between individual spectral
components of a sound may reduce the effective masking effect evoked by
those components [Humes and Jesteadt, 1989].
A well-known technique to reduce the digital size of an audio file, the
MP3 audio codec [Brandenburg, 1999], uses a model of human auditory
perception for compression. This and similar perceptual audio codecs
allocate low bit rates to frequency channels which are subject to masking
effects and thus have little or no perceptual relevance. This technique
is very efficient in reducing the capacities required for transmitting and
storing audio files. In contrast to this coding approach, which results in
additional quantization noise, an irrelevance filter detects those time-frequency components whose removal causes no audible difference to the
original signal. Deleting such masked and thus perceptually irrelevant
components makes the signal representation sparser. An irrelevance
filter based on a Gabor frame multiplier approach was presented in [9].
This filter only considers a simple simultaneous masking model, so
many more irrelevant components could still be extracted. It will be extended
using new psychoacoustical time-frequency masking data for
well-concentrated atoms and models as in [10]. Such
an extension of the irrelevance algorithm, based on a non-stationary Gabor transform adapted to an auditory filterbank, is currently in development.
2.4.2 Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking [9]
A heuristic ‘irrelevance algorithm’ developed by Eckel and Deutsch
[Eckel, 1989] had existed at the Acoustics Research Institute for years. Providing
the mathematical and signal processing background for it was the starting point of our
whole research towards Gabor and frame multipliers.
It was also the goal of the work presented in [9], whose algorithm is referred to as the
“irrelevance filter”. The aim of this filter is to remove those time-frequency components
in a standard Gabor transform, whose removal causes no audible difference
to the original signal after resynthesis. Note the difference to perceptual
audio codecs; they use a low bit depth and thus introduce quantization
noise in frequency bands, where the signal falls below the masked threshold.
In contrast, in the proposed model we want to either keep a component or
remove it if irrelevant. Thus, we attempt to introduce “silence” in bands,
where the signal falls below the irrelevance threshold. In other words, the
algorithm searches for a time-frequency representation, which is sparser but
perceptually equivalent to the original representation after resynthesis.
The parameters of the algorithm were chosen to be suitable for most
‘every-day’ sounds, i.e. real-world music and speech signals. No calibration of the audio system should be necessary, so the algorithm should work with most
‘reasonable’ settings of a standard PC.
The proposed algorithm uses a simple model of simultaneous masking
(also referred to as spectral masking) which is based on data from the
psychoacoustic literature. As mentioned above, a basic model for the simultaneous masking effect, referred to as the excitation pattern model of
masking [Moore and Glasberg, 1983], is that the auditory system can detect a target presented simultaneously with a masker only if the excitation pattern of target plus masker significantly differs from that of the
masker alone. If the two excitation patterns do not differ in a way detectable by the auditory system, the target cannot be perceived; it is masked
[van der Heijden and Kohlrausch, 1994]. This basic model allows for the
prediction of the masked threshold of a target signal in the presence of
a masker signal, with certain constraints upon the stimuli. The masked
threshold is defined as the minimum level of the target at which it is audible
in the presence of the masker. A convolution kernel is defined by
$$F(x) = \frac{l-u}{2}\cdot x - \frac{l+u}{2}\cdot\sqrt{e + x^2} \tag{2.11}$$
shifted to the point (0, 0). Here the parameters are the lower frequency slope
l, the upper slope u, and the non-negative parameter e, which
controls the smoothness of the function at zero. Our method estimates
the excitation pattern by applying the kernel with l = 27, u = 24 and e = 0.3
to single frequency components (on the Bark frequency scale). It assumes
linear additivity of mutual effects (in the power domain). Thus, after
transforming a spectrum to the Bark scale, this estimate can be computed
by convolving the spectrum with the kernel.
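The kernel (2.11) and the convolution step can be sketched as follows. This is an illustration under assumptions, not the implementation of [9]: the spectrum is assumed to be already sampled on a uniform Bark grid, the kernel is truncated to ±8 Bark, and the function names and the toy spectrum are hypothetical. The kernel is given in dB and converted to power before the convolution, in line with the linear additivity in the power domain.

```python
import numpy as np

def masking_kernel_db(x, l=27.0, u=24.0, e=0.3):
    """Kernel (2.11) in dB over Bark distance x, shifted so that F(0) = 0."""
    f = (l - u) / 2.0 * x - (l + u) / 2.0 * np.sqrt(e + x**2)
    return f + (l + u) / 2.0 * np.sqrt(e)

def excitation_pattern_db(level_db, bark_step):
    """Spread a spectrum given on a uniform Bark grid, adding powers linearly."""
    half_width = int(round(8.0 / bark_step))        # assumed kernel support: +-8 Bark
    x = np.arange(-half_width, half_width + 1) * bark_step
    kernel_power = 10.0 ** (masking_kernel_db(x) / 10.0)
    power = 10.0 ** (np.asarray(level_db) / 10.0)
    return 10.0 * np.log10(np.convolve(power, kernel_power, mode="same"))

bark_step = 0.1
levels = np.full(240, -100.0)                       # hypothetical spectrum, 0..24 Bark
levels[100] = 60.0                                  # one 60 dB component at 10 Bark
pattern = excitation_pattern_db(levels, bark_step)
print(pattern[90], pattern[100], pattern[110])      # 1 Bark below, at, 1 Bark above
```

Far from the component, the pattern falls off with roughly l dB per Bark towards lower frequencies and u dB per Bark towards higher frequencies.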
The masked threshold function was shifted in level by a certain amount
corresponding to the results of a perceptual experiment, and all components
falling below the shifted function (the irrelevance threshold) are removed.
At the level shift determined in this way, the subjects could not discriminate the irrelevance-filtered signal from the original signal. This approach makes it possible to
cope with the uncontrolled effects associated with the removal of spectral
components, which result from the over-completeness of the Gabor frame used.
Furthermore, it makes it possible to cope with inaccuracies of the masking model itself. The masking model chosen for the current algorithm does not consider
the nonlinearities and complex interactions involved in auditory masking for
real-world sounds mentioned above.
The irrelevance filter algorithm is implemented as a time-varying, adaptive filter. The irrelevance threshold function is calculated for each consecutive spectrum of a running signal using the simple simultaneous masking model described above. Only the components exceeding the threshold are
included in the re-synthesis stage. This step is equivalent to multiplying
each time-frequency point by 0 or 1. Fig. 2.4 shows the perceptually relevant TF components. This procedure is an example of an adaptive Gabor
filter with a symbol consisting of zeros and ones. The underspread property
[Kozek, 1998] is important, since the induced time-frequency shift should
be as ‘local’ as possible. The approximation process, in which only single time-frequency points are removed from the signal, was performed as
accurately as possible. The goal was to obtain an operator with good time-frequency localization, i.e. an underspread operator. To achieve that goal,
and following Gabor theory, a high redundancy of 8 was chosen. At this high
redundancy, short on/off cycles of single components that are close to the
irrelevance threshold are smoothed out, which is desirable from a psychoacoustical point of view, as sharp on/off edges cause audible artifacts. For
the high redundancy and the chosen Hamming window the resulting frame
is ‘snug’, i.e. nearly tight, which allowed us to use the original window also
as a synthesis window, with nearly no numerical error. Together with the
chosen simple masking model, which could be implemented as a convolution,
this resulted in an efficient algorithm.
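The structure of such a 0/1 Gabor multiplier, and the ‘snug’ property of a Hamming window at redundancy 8, can be sketched with a small circular Gabor transform. Everything specific here is an assumption made for illustration: the window length, the test signal, the helper names, and in particular the fixed threshold, which merely stands in for the psychoacoustic irrelevance threshold of [9].

```python
import numpy as np

def frame_indices(L, N, hop):
    """Circular sample indices of all analysis frames (one row per time shift)."""
    return (np.arange(0, L, hop)[:, None] + np.arange(N)[None, :]) % L

def gabor_analysis(x, g, hop):
    return np.fft.fft(x[frame_indices(len(x), len(g), hop)] * g, axis=1)

def gabor_synthesis(c, g, hop, L):
    """Overlap-add with the analysis window itself (adequate for a nearly tight frame)."""
    frames = np.fft.ifft(c, axis=1).real * g
    y = np.zeros(L)
    np.add.at(y, frame_indices(L, len(g), hop), frames)
    return y * hop / np.sum(g**2)        # divide by the mean of sum_k g(n - k*hop)^2

N, hop, L = 256, 32, 4096                # redundancy N / hop = 8
g = np.hamming(N)
t = np.arange(L)
strong = np.sin(2 * np.pi * 200 * t / L)
x = strong + 0.01 * np.sin(2 * np.pi * 860 * t / L)

c = gabor_analysis(x, g, hop)
x_rec = gabor_synthesis(c, g, hop, L)    # unmodified coefficients: tests snugness

threshold = 0.05 * np.abs(c).max()       # hypothetical stand-in for the irrelevance threshold
symbol = (np.abs(c) > threshold).astype(float)   # symbol of zeros and ones
x_filtered = gabor_synthesis(symbol * c, g, hop, L)

print(np.linalg.norm(x - x_rec) / np.linalg.norm(x), symbol.mean())
```

The first printed number is the relative reconstruction error when the analysis window is reused for synthesis; the second is the fraction of coefficients kept by the binary symbol.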
2.4.3 Additivity of nonsimultaneous masking for short Gaussian-shaped sinusoids [10]
For an extension of the irrelevance algorithm using a multiplier based on a
Gabor or wavelet transform, data about the time-frequency masking spread
for well-concentrated atoms, like Gaussian-windowed sinusoids, as well as the
additivity of these masking effects are essential.
Auditory masking has been extensively studied for non-simultaneous
(temporal masking) and simultaneous (spectral masking) presentation of
masker and target. Because of the specific demands in the nonsimultaneous and simultaneous masking experiments, the experimental stimuli were
almost always broad either in the temporal domain, the frequency domain,
or both. Quite little is known about nonsimultaneous and simultaneous
masking effects for masker and target signals that are well-concentrated in
both the time and frequency domains. Such well-concentrated stimuli can
be more flexibly arranged in time-frequency space compared to temporally
or spectrally broad stimuli. Thus, they are well-suited for studying masking effects with various time-frequency relations between masker and target
stimuli. Compared to maskers that are broad in at least one domain, well-concentrated maskers may produce different masking effects.
[Figure 2.4, three panels: “Spectrogram of bach”, “Symbol for Irrelevance method”, “Amplitude of relevant components”; horizontal axis: time sample l, vertical axis: frequency sample k.]
Figure 2.4: TOP: The spectrogram of test signal ’bach’ (classical music by J.
S. Bach), high amplitudes are displayed darkly, low ones brightly; MIDDLE:
The symbol of the Gabor filter for the irrelevance filter, black = 1, white
= 0. BOTTOM: The result of the point-wise multiplication of these two
sets of coefficients, representing the amplitude of relevant components. To
get back to the signal domain, re-synthesis is applied.
In [10] we were concerned with the additivity of masking for multiple
well-concentrated maskers that are separated in time. For that purpose, extensive
psychoacoustical tests were performed. These results, among other psychoacoustically measured data, will be used in a future extension of the irrelevance
algorithm. In particular, the adaptation of the Gabor multiplier symbol
should consider the additivity of the masking effect of several Gabor atoms.
The averaged data for the additivity of four translated Gaussian-windowed sinusoids with the same frequency can be seen in Figure 2.5.
Figure 2.5: The open symbols show the average of the experimental results
for five listeners. Different masker combinations are indicated with symbols
shown in the legend. Error bars indicate 95% confidence intervals. The dotted lines indicate the predictions from linear masking additivity. The other
two lines in each panel show the predictions of the model of masking additivity proposed in [C. J. Plack and Drga, 2006], using I/O functions best
fitting their mean data obtained with long maskers (dashed) and I/O functions best fitting the data of the present study (solid).
As mentioned in a submitted paper (footnote 5), it can be shown that the current
models for including time-frequency masking effects are based on an inappropriate model for the combination of temporal and spectral masking.
In current work we are developing an irrelevance model based on the new psychoacoustical data and a non-stationary Gabor transform [5]. The parameters
will again be tested in psychoacoustical tests. When an effective irrelevance
model is found, this knowledge can be used to improve perceptual coders.
5 Necciari, T.; Savel, S.; Laback, B.; Meunier, S.; Balazs, P.; Kronland-Martinet, R.
and Ystad, S. Time-frequency spread of auditory masking for spectrally and temporally
maximally-compact stimuli.
2.5 Applications in Acoustics: Acoustic System Estimation
2.5.1 State of the art
In many acoustical applications the parameters of a system, like the coefficients of an LTI filter, have to be estimated from a recorded signal corrupted
by noise. In the special case of the estimation of head related transfer functions (HRTF), the use of exponential sweeps (ESs) as input signals has a lot
of advantages.
Figure 2.6: (LEFT) The HRTF measurement assembly at ARI. (RIGHT) In-ear
microphone.
HRTFs describe the sound transmission from the free field to a place in
the ear canal in terms of LTI systems and are crucial for sound localization
in virtual environments. Measurements of HRTFs (see Figure 2.6) can be
considered a system identification of the weakly non-linear electro-acoustic
chain sound-source - room - HRTF - microphone, and can be done with
ESs. Those input signals show many promising properties, among them
the separation of linear and nonlinear parts of weakly nonlinear systems
[Müller and Massarani, 2001].
The ‘multiple exponential sweep method’ (MESM), an optimization of this approach,
was developed in [11] for the measurement of HRTFs in a substantially
reduced amount of time. In [12] a method for denoising the measured responses using a Gabor frame multiplier was developed. In ongoing work we analyze the statistics of the amplitude of colored noise in an
overcomplete non-stationary Gabor analysis. Since a nonstationary Gabor transform based on a CQT
seems optimal for the analysis of an ES, we will base
the future denoising algorithm on such an analysis method.
2.5.2 Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions [11]
A head-related transfer function (HRTF) describes the sound transmission
from the free field to a place in the ear canal in terms of a linear time-invariant system [Møller et al., 1995]. HRTFs contain spectral and temporal
cues which vary according to the sound direction. A set of HRTFs measured
for different positions can be used to create virtual free-field stimuli. Measuring individual HRTF sets for each subject is necessary for most studies
on localization in virtual environments. Results of several studies imply that,
to get an accurate spatial resolution, the number of HRTFs in a set should
exceed 1000 positions for the upper hemisphere.
For the measurement of HRTFs the signal is presented from a given
position via a loudspeaker, using a digital-analog converter (DAC) and a
power amplifier. Acoustic waves propagate in the measurement room and
are altered by the torso, head, and pinna of a subject. Microphones are
placed in the ear canals of a subject and capture the arriving sound.
Many issues have to be considered when choosing a system identification
method for acoustic systems. When the measurements are performed in
noisy rooms, the background noise reduces the signal-to-noise ratio (SNR)
of the measurement. Also, the equipment, especially the power amplifier,
adds noise to the excitation signal.
In addition to the SNR, two further issues should be considered: nonlinear distortions and time variability. Presenting signals via loudspeakers
results in nonlinear distortions, mostly due to the saturation effects of the
loudspeaker membrane and the nonlinearity of the gain characteristics of the
power amplifier. Furthermore, since the subject’s head may move during the identification process, the HRTF measurement
must be considered as an identification of a weakly time-variant system.
Several system identification methods were taken into consideration.
The system identification with exponential sweeps (ES) showed some
promising properties [Müller and Massarani, 2001] like separation of linear
and nonlinear parts of weakly nonlinear systems, a high SNR (and therefore
robustness to noise) and fast processing using the FFT. Thus, system
identification with ES is a very good method for HRTF measurements.
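The basic identification step with a single exponential sweep can be sketched as follows. This is a generic illustration, not the measurement code of [11]: the sampling rate, the sweep range, the FIR test system, the noise level and the regularized spectral division are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def exponential_sweep(f1, f2, duration, fs):
    """Sine sweep whose instantaneous frequency rises exponentially from f1 to f2."""
    t = np.arange(int(duration * fs)) / fs
    rate = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0))

def estimate_frequency_response(x, y, n_fft, reg=1e-6):
    """Regularized spectral division Y/X (a simple FFT-based deconvolution)."""
    X, Y = np.fft.rfft(x, n_fft), np.fft.rfft(y, n_fft)
    eps = reg * np.max(np.abs(X)) ** 2
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)

fs, f1, f2 = 8000, 50.0, 3900.0                  # hypothetical measurement settings
x = exponential_sweep(f1, f2, 1.0, fs)
h_true = np.array([1.0, 0.0, -0.5, 0.0, 0.25])   # hypothetical LTI system (FIR)

rng = np.random.default_rng(4)
y = np.convolve(x, h_true)                       # system response to the sweep
y = y + 1e-3 * rng.standard_normal(len(y))       # additive measurement noise

n_fft = 16384                                    # long enough to avoid circular wrap
H_est = estimate_frequency_response(x, y, n_fft)
H_true = np.fft.rfft(h_true, n_fft)
freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
in_band = (freqs > 2 * f1) & (freqs < 0.9 * f2)  # compare only where the sweep has energy
max_error = np.max(np.abs(H_est - H_true)[in_band])
print(max_error)
```

The comparison is restricted to the band covered by the sweep, since outside of it the excitation carries almost no energy and the estimate is dominated by noise.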
The proposed method, MESM, uses an appropriate overlapping and interleaving of the excitation signals, see Figure 2.7.
Interleaving: Consider the effect of applying a sweep to a weakly nonlinear system and, after a small delay, applying the same sweep to a second
system. Recording the summed response signal and applying the deconvolution process will result in a signal where the harmonic impulse responses
(HIRs) of the two systems, which result from their weak nonlinearity, are
interleaved in time. The interleaving mechanism results from delaying the
excitation of the second system in such a way that its IR is placed between
the IR and the second-order HIR of the first system. Therefore the measurement of the linear IRs is not disturbed. This can be best analyzed and
designed in the time-frequency plane.
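The room available for interleaving can be estimated from a standard property of exponential sweeps, which is assumed here and not stated in this text: after deconvolution, the k-th order harmonic impulse response precedes the linear one by T·ln(k)/ln(f2/f1), where T is the sweep duration and f1, f2 are its start and stop frequencies. The sweep parameters, the IR length and the simple feasibility check below are hypothetical and do not reproduce the parameter optimization of [11].

```python
import math

def harmonic_advance(order, sweep_duration, f1, f2):
    """Time by which the order-th harmonic IR precedes the linear IR after deconvolution."""
    return sweep_duration * math.log(order) / math.log(f2 / f1)

T, f1, f2 = 2.0, 50.0, 20000.0                 # hypothetical sweep parameters
gap = harmonic_advance(2, T, f1, f2)           # distance between 2nd-order HIR and linear IR
ir_length = 0.05                               # hypothetical IR length in seconds

# Assumption: the 2nd-order HIR is no longer than the linear IR, so one further IR
# fits between them if the gap holds two IR lengths.
has_room = gap > 2 * ir_length
print(round(gap, 3), has_room)
```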
Overlapping: In the simplest and most straightforward method for the
system identification of multiple systems, one plays a single sweep,
waits for its end, waits for the length of the reverberation time, and then plays
the next sweep. However, in systems with a small number of harmonics,
which is the case for weakly non-linear systems, it is not necessary to wait
for the end of the previous sweep. As long as the highest harmonic of the
next sweep response does not interfere with the reverberation caused by the
previous sweep, the sweeps may overlap in time.
The combination of these two approaches forms MESM. Note that analyzing
the time-frequency representation of the method gives a good estimate of
the delays that can be used, see Figure 2.7. An optimization of the chosen
parameters was reached by an analytical solution of the involved parameter
equations.
Figure 2.7: Spectrogram of the recorded signal as an example of a system identification using MESM (two overlapping groups of two interleaving
sweeps)
Note that MESM is not restricted to HRTF measurements; it can be
used for the estimation of any system. It is particularly advantageous for the estimation of slowly
time-varying, weakly non-linear systems.
For the special case of HRTF measurements, it could be shown that the
measurement duration could be reduced by a factor of four, which is of
particular interest as human subjects are involved.
This method is connected to many projects at the Acoustics Research Institute, e.g. [Goupell et al., 2010, Kreuzer et al., 2009,
Majdak et al., 2011, Marelli et al., 2008], but it is also used by other scientists, see e.g. [Rébillat et al., 2011, Enzner, 2009, Farina, 2009].
2.5.3 A Time-Frequency Method for Increasing the Signal-To-Noise Ratio in System Identification with Exponential Sweeps [12]
Exponential sweeps (ESs), as mentioned above, are used in the field of audio engineering to measure impulse responses (IRs) of acoustic and electro-acoustic systems. Such measurements are usually contaminated by environmental noise. Even though environmental noise is often modeled as an
independent and identically distributed (i.i.d.) process, most environmental
noise sources have non-flat spectral characteristics (colored noise). In this
study, we propose a method that improves the SNR when systems with
frequency-dependent response decay are measured under colored-noise conditions.
In acoustics, most denoising methods have been developed for speech or
music. Many of these methods rely on the Wiener solution [Wiener, 1949]
which represents a mean-square-error (MSE) optimum for stationary signals
assuming a contamination with an i.i.d process. For colored noise, spectral
subtraction is used where the spectral noise signature is subtracted from
the recorded signal. Those methods modify the signal in each time window
independently which leads to artifacts like speech distortions or musical
noise [Vary, 1985]. A sweep-based method for improving the SNR has been
proposed in [Xia, 1997]. Even though this method shows promising results,
it is limited to very short IRs, relies on the frequency-independent variance
of the noise, and does not incorporate the properties of system-identification
with ESs.
In contrast to [Xia, 1997], in our method, we use the a priori knowledge
about the TF characteristic of ESs and the fact that the system response
is decaying with time. Further, by using frame theory, we approach perfect
reconstruction of clean signals and avoid artifacts like musical noise. The
method does not rely on any assumption of the noise but stationarity. It is
able to handle any arbitrary broadband delay in the recorded signal. In our
method, we represent the recorded response to the ES in the TF plane and
classify parts of that plane as either environmental noise or deterministic
34
signal, with the goal of obtaining a connected region defined as the signal region. In contrast to most speech-denoising methods, our method uses hard
thresholding: the parts considered as signal are not modified and the parts
considered as noise are removed. The classification into signal or noise
is done for each frequency band independently and thus does not rely on
any assumption about the spectral characteristic of the noise. By applying a
Gabor frame multiplier corresponding to the signal region, we obtain a denoised version of the recorded signal, which is used to estimate the IR of the
measured system. Ideally, this method provides both accurate identification
of the measured system and noise reduction. The method is evaluated by
comparing the SNR in the noisy and denoised IRs.
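In formulas, the denoising step can be sketched as follows. The notation is an assumption made for this illustration only: $f$ is the recorded signal, $g_{k,l}$ are the Gabor atoms of the analysis window, $\tilde g_{k,l}$ the atoms generated by the dual window, and $\Omega$ is the signal region in the TF plane.

```latex
\[
  \mathbf{M}_{m} f \;=\; \sum_{k,l} m_{k,l}\,
      \langle f, g_{k,l} \rangle \, \tilde g_{k,l},
  \qquad
  m_{k,l} \;=\;
  \begin{cases}
    1, & (k,l) \in \Omega, \\
    0, & \text{otherwise.}
  \end{cases}
\]
```

For $m \equiv 1$ the sum reduces to the reconstruction formula $f = \sum_{k,l} \langle f, g_{k,l} \rangle \, \tilde g_{k,l}$ of a dual frame pair, so the hard-thresholding symbol only removes the contribution of the coefficients outside $\Omega$.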
In particular, we choose a Gaussian Gabor window and its canonical
dual for reconstruction. To find the multiplier symbol, we have to define
the signal region: coefficients inside it are kept, and all other coefficients are
erased, i.e. multiplied by zero. The start of the signal region is set to the
analytically known time-frequency position of the input sweep, called ‘sparse
TF representation of the sweep’ in Figure 2.8 (all involved filters are assumed
to be causal). The end is found by analyzing each frequency band. Analyzing
a part of the recording that is known to contain only noise yields
an estimate of the statistical properties of the noise. The point where the smoothed
response in a frequency band falls below a threshold derived from the mean value
and standard deviation of the noise is taken as the end of the
signal region. The resulting symbol, called the sparse mask in Figure 2.8, is
broadened by convolving it with the absolute value of the Gram matrix to
take the reproducing property into account and to guarantee that no
significant signal coefficients are lost.
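The construction of the symbol described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: all function names, the moving-average smoother, the factor k = 3 and the tolerance eps are hypothetical choices, since the text only states that the threshold is derived from the mean and standard deviation of the noise. The kernel passed to broaden_mask stands in for the absolute values of the Gram matrix, which for a regular Gabor system depend only on the offset in the TF lattice.

```python
import math
from statistics import mean, stdev


def sweep_arrival_time(freq, f_start, f_stop, duration):
    """Time (s) at which an exponential sweep f_start -> f_stop reaches freq."""
    return duration * math.log(freq / f_start) / math.log(f_stop / f_start)


def moving_average(values, width):
    """Centered moving average; the window is truncated at the borders."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def signal_region_end(band, start, noise_slice, k=3.0, width=5):
    """First frame >= start whose smoothed magnitude drops below
    mean + k * std of the noise-only frames (len(band) if it never does)."""
    noise = band[noise_slice]
    threshold = mean(noise) + k * stdev(noise)  # k is an assumed factor
    smoothed = moving_average(band, width)
    for i in range(start, len(smoothed)):
        if smoothed[i] < threshold:
            return i
    return len(smoothed)


def sparse_mask_row(band, start, noise_slice, k=3.0, width=5):
    """0/1 symbol for one frequency band: 1 inside [start, end), else 0."""
    end = signal_region_end(band, start, noise_slice, k, width)
    return [1 if start <= i < end else 0 for i in range(len(band))]


def broaden_mask(mask, kernel, eps=1e-3):
    """Convolve a 0/1 mask (rows: bands, columns: frames) with a small
    non-negative kernel of odd size and keep every point above eps."""
    rows, cols = len(mask), len(mask[0])
    kr, kc = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in range(-kr, kr + 1):
                for dc in range(-kc, kc + 1):
                    rr, cc = r - dr, c - dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[dr + kr][dc + kc] * mask[rr][cc]
            out[r][c] = 1 if acc > eps else 0
    return out
```

In such a sketch, the start frame of each band would be obtained by dividing sweep_arrival_time at the band's center frequency by the time step of the Gabor lattice, and the rows returned by sparse_mask_row would be stacked into the sparse mask before broadening.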
The proposed method improves the SNR in the impulse response measured with exponential sweeps, as seen in Figure 2.8 and other simulation
results. In low-SNR conditions, the SNR improves compared to direct
measurement and/or block thresholding. In high-SNR conditions, the
method does not fail, i.e., it does not introduce artifacts. Assuming stationary noise, decaying system response, and an exponential sweep as the
excitation signal, our method shows promising results in denoising measurements of electro-acoustic systems.
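As an illustration of how such a comparison can be quantified, the following Python sketch computes an SNR in dB of an estimated IR relative to a clean reference IR. The definition used here (energy of the clean IR divided by the energy of the deviation from it) and the function name are assumptions for this example; the text does not specify the exact SNR definition used in the evaluation.

```python
import math


def snr_db(clean_ir, estimated_ir):
    """SNR in dB: energy of the clean IR over the energy of the error."""
    signal_energy = sum(x * x for x in clean_ir)
    error_energy = sum((x - y) ** 2 for x, y in zip(clean_ir, estimated_ir))
    if error_energy == 0.0:
        return math.inf  # the estimate matches the reference exactly
    return 10.0 * math.log10(signal_energy / error_energy)
```

Calling snr_db once with the noisy IR and once with the denoised IR, both against the same clean IR, gives the two values whose difference is the SNR improvement.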
However, our method still seems to be far from the optimal solution. For
example, the noise estimator does not use any statistical model. Exploiting
an appropriate statistical model for the amplitude of the overcomplete TF
representation of the noise can improve the estimation. Also, the separation
acuity is low in the low-frequency region and may be improved using the
non-stationary Gabor transform [16] in terms of a constant-Q transform.
Figure 2.8: TF representations of the recorded signal (a) and the exponential
sweep (b). Sparse TF representation of the sweep (c). Sparse (d) and
broadened mask (e) containing the signal region. TF representation of the
output of the Gabor multiplier (f). Spectral representation of the differences
in the noisy (g, black) and the denoised (g, colored) IRs relative to the clean
IR. Note the smaller errors in the denoised condition especially for the higher
frequencies.
Bibliography
[Aceska, 2009] Aceska, R. (2009). Functions of Variable Bandwidth: a Time-Frequency Approach. PhD thesis.
[Ahmad and Iqbal, 2009] Ahmad, M. K. and Iqbal, J. (2009). Vector-valued Weyl-Heisenberg wavelet frame. International Journal of Wavelets, Multiresolution
and Information Processing (IJWMIP), 7(5):605–615.
[Ali et al., 2000] Ali, S. T., Antoine, J.-P., and Gazeau, J.-P. (2000). Coherent
States, Wavelets and Their Generalization. Graduate Texts in Contemporary
Physics. Springer New York.
[Ambroziski and Rudol, 2009] Ambroziski, Z. and Rudol, K. (2009). Matrices defined by frames. Opuscula Math., 29(4):365–375.
[Antoine and Vandergheynst, 2007] Antoine, J.-P. and Vandergheynst, P. (2007).
Wavelets on the two-sphere and other conic sections. Journal of Fourier Analysis
and Applications, 13(4):369–386.
[Arias and Pacheco, 2008] Arias, M. L. and Pacheco, M. (2008). Bessel fusion multipliers. Journal of Mathematical Analysis and Applications, 348(2):581 – 588.
[Auger and Flandrin, 1995] Auger, F. and Flandrin, P. (1995). Improving the readability of time-frequency and time-scale representations by the method of reassignment. IEEE T. Signal. Proces., 43(5):1068–1089.
[Balazs, 2005] Balazs, P. (2005). Regular and Irregular Gabor Multipliers with Application to Psychoacoustic Masking. PhD thesis, University of Vienna.
[Balazs, 2007] Balazs, P. (2007). Basic definition and properties of Bessel multipliers. Journal of Mathematical Analysis and Applications, 325(1):571–585.
[Balazs, 2008a] Balazs, P. (2008a). Frames and finite dimensionality: Frame transformation, classification and algorithms. Applied Mathematical Sciences, 2(41–44):2131–2144.
[Balazs, 2008b] Balazs, P. (2008b). Hilbert-Schmidt operators and frames - classification, best approximation by multipliers and algorithms. International Journal
of Wavelets, Multiresolution and Information Processing, 6(2):315 – 330.
[Balazs, 2008c] Balazs, P. (2008c). Matrix-representation of operators using frames.
Sampling Theory in Signal and Image Processing (STSIP), 7(1):39–54.
[Balazs and El-Gebeily, 2008] Balazs, P. and El-Gebeily, M. (2008). A systematic
study of frame sequence operators and their pseudoinverses. International Mathematical Forum, 3(5):229 – 239.
[Balazs et al., 2006] Balazs, P., Feichtinger, H. G., Hampejs, M., and Kracher, G.
(2006). Double preconditioning for Gabor frames. IEEE T. Signal. Proces.,
54(12):4597–4610.
[Balazs et al., 2007] Balazs, P., Kreuzer, W., and Waubke, H. (2007). A stochastic
2D-model for calculating vibrations in liquids and soils. Journal of Computational
Acoustics, 15(3):271–283.
[Balazs and Noll, 2003] Balazs, P. and Noll, A. (2003). Masking filter, phase
vocoder and STx - an introduction. In SAMPTA03 (extended abstracts).
[Balazs et al., 2003] Balazs, P., Waubke, H., and Deutsch, W. A. (2003). Phasenanalyse mit akustischen Anwendungsbeispielen. In Proceedings DAGA 2003 Fortschritte der Akustik.
[Bogdanova et al., 2005] Bogdanova, I., Vandergheynst, P., Antoine, J.-P.,
Jacques, L., and Morvidone, M. (2005). Stereographic wavelet frames on the
sphere. Applied and Computational Harmonic Analysis, 19:223–252.
[Bölcskei et al., 1998] Bölcskei, H., Hlawatsch, F., and Feichtinger, H. G. (1998).
Frame-theoretic analysis of oversampled filter banks. IEEE Trans. Signal Processing, 46(12):3256–3268.
[Brandenburg, 1999] Brandenburg, K. (1999). MP3 and AAC explained. In Audio
Engineering Society (AES) 17th International Conference on High Quality Audio
Coding, Florence, Italy.
[Brown and Puckette, 1992] Brown, J. C. and Puckette, M. S. (1992). An efficient
algorithm for the calculation of a constant Q transform. J. Acoust. Soc. Am.,
92(5):2698–2701.
[C. J. Plack and Drga, 2006] Plack, C. J., Oxenham, A. J., and Drga, V. (2006). Masking
by inaudible sounds and the linearity of temporal summation. J. Neurosci.,
26:8767–8773.
[Carmona et al., 1998] Carmona, R., Hwang, W.-L., and Torrésani, B. (1998).
Practical Time-Frequency Analysis. Academic Press San Diego.
[Casazza et al., 2001] Casazza, P., Christensen, O., and Kalton, N. J. (2001).
Frames of translates. Collectanea Mathematica, 1:p. 35–54.
[Casazza et al., 2002] Casazza, P., Christensen, O., Li, S., and Lindner, A. (2002).
On Riesz-Fischer sequences and lower frame bounds. Z. Anal. Anwend.,
21(2):305–314.
[Casazza, 2000] Casazza, P. G. (2000). The art of frame theory. Taiwanese J.
Math., 4(2):129–202.
[Casazza and Kutyniok, 2004] Casazza, P. G. and Kutyniok, G. (2004). Frames of
subspaces. In Wavelets, Frames and Operator Theory (College Park, MD, 2003),
pages 87–113. Contemp. Math. 345, Amer. Math. Soc., Providence, RI.
[Chai et al., 2008] Chai, L., Zhang, J., Zhang, C., and Mosca, E. (2008). Improving
frame-bound-ratio for frames generated by oversampled filter banks. In Acoustics,
Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 3525 –3528.
[Chai et al., 2010] Chai, L., Zhang, J., Zhang, C., and Mosca, E. (2010). Bound
ratio minimization of filter bank frames. Signal Processing, IEEE Transactions
on, 58(1):209 –220.
[Chen et al., 2009a] Chen, H., Li, L., and Tang, Y. (2009a). Analysis of classification with a reject option. International Journal of Wavelets, Multiresolution
and Information Processing (IJWMIP), 7:375–385.
[Chen et al., 2009b] Chen, Z., Yao, T., and Li, L. (2009b). Fast clustering-based
kernel Foley-Sammon transform. International Journal of Wavelets, Multiresolution and Information Processing (IJWMIP), 7(1):75–87.
[Cheng et al., 2009] Cheng, X., Hart, J., and Walker, J. (2009). Time-frequency
analysis of musical rhythm. Notices Amer. Math. Soc., 56(3):356–372.
[Christensen, 1995] Christensen, O. (1995). Frames and pseudo-inverses. J. Math.
Anal. Appl, 195(2):401–414.
[Christensen, 2003] Christensen, O. (2003). An Introduction To Frames And Riesz
Bases. Birkhäuser.
[Cotfas and Gazeau, 2010] Cotfas, N. and Gazeau, J. P. (2010). Finite tight frames
and some applications. Journal of Physics A: Mathematical and Theoretical,
43(19):193001.
[Dahlke et al., 2005] Dahlke, S., Fornasier, M., and Raasch, T. (2005). Adaptive
Frame Methods for Elliptic Operator Equations. Adv. Comput. Math.
[Dahlke and Teschke, 2008] Dahlke, S. and Teschke, G. (2008). Coorbit theory,
multi-alpha-modulation frames and the concept of joint sparsity for medical
multi-channel data analysis. EURASIP Journal on Advances in Signal Processing, 2008:Article ID 471601, 19 pages.
[Daubechies, 1992] Daubechies, I. (1992). Ten Lectures On Wavelets. CBMS-NSF
Regional Conference Series in Applied Mathematics. SIAM Philadelphia.
[Daudet, 2010] Daudet, L. (2010). Audio sparse decompositions in parallel : let
the greed be shared ! IEEE Signal Processing Magazine, 27(2):90–96.
[Depalle et al., 2007] Depalle, P., Kronland-Martinet, R., and Torrésani, B. (2007).
Time-frequency multipliers for sound synthesis. In Proceedings of the Wavelet
XII conference, SPIE annual Symposium, San Diego.
[Dolson, 1986] Dolson, M. (1986). The phase vocoder: a tutorial. Computer Music Journal, 10(4):11–27.
[Dörfler, 2010] Dörfler, M. (2010). Quilted frames - a new concept for adaptive
representation. Advances in Applied Mathematics, to appear.
[Dörfler and Torrésani, 2010] Dörfler, M. and Torrésani, B. (2010). Representation
of operators in the time-frequency domain and generalized Gabor multipliers. J.
Fourier Anal. Appl., 16(2):261–293.
[Eckel, 1989] Eckel, G. (1989). Ein Modell der Mehrfachverdeckung für die Analyse
musikalischer Schallsignale. PhD thesis, University of Vienna.
[Enzinger et al., 2011] Enzinger, E., Balazs, P., Marelli, D., and Becker, T. (2011).
A logarithmic based pole-zero vocal tract model estimation for speaker verification. In Proceedings of the International Conference on Acoustics, Speech and
Signal Processing 2011, Prague.
[Enzner, 2009] Enzner, G. (2009). 3D-continuous-azimuth acquisition of head-related impulse responses using multi-channel adaptive filtering. In Applications
of Signal Processing to Audio and Acoustics, 2009. WASPAA ’09. IEEE Workshop on, pages 325 –328.
[Farina, 2009] Farina, A. (2009). Silence sweep: A novel method for measuring
electroacoustical devices. In Audio Engineering Society Convention 126.
[Fastl, 1976] Fastl, H. (1976). Temporal masking effects: I. broad band noise
masker. Acustica, 35:287–302.
[Feichtinger and Nowak, 2003] Feichtinger, H. G. and Nowak, K. (2003). A first
survey of Gabor multipliers, chapter 5, pages 99–128. Birkhäuser Boston.
[Feichtinger and Strohmer, 1998] Feichtinger, H. G. and Strohmer, T. (1998). Gabor Analysis and Algorithms - Theory and Applications. Birkhäuser Boston.
[Flandrin, 1999] Flandrin, P. (1999). Time-Frequency/Time-Scale Analysis. Academic Press, San Diego.
[Fornasier and Rauhut, 2005] Fornasier, M. and Rauhut, H. (2005). Continuous
frames, function spaces, and the discretization problem. J. Fourier Anal. Appl.,
11(3):245–287.
[Gabardo, 2009] Gabardo, J. P. (2009). Weighted irregular Gabor tight frames
and dual systems using windows in the Schwartz class. Journal of Functional
Analysis, 256(3):635 – 672.
[Gabor, 1946] Gabor, D. (1946). Theory of communications. J. IEE, III(93):429–457.
[Gardner and Magnasco, 2006] Gardner, T. and Magnasco, M. (2006). Sparse
time-frequency representations. Proc. Natl. Acad. Sci. USA, 103(16):6094–6099.
[Gaul et al., 2003] Gaul, L., Kögler, M., and Wagner, M. (2003). Boundary Element Methods for Engineers and Scientists. Springer.
[Gazeau, 2009] Gazeau, J.-P. (2009). Coherent states in quantum physics. Wiley,
Weinheim.
[Gohberg et al., 2003] Gohberg, I., Goldberg, S., and Kaashoek, M. (2003). Basic
Classes of Linear Operators. Birkhäuser.
[Goupell et al., 2010] Goupell, M. J., Majdak, P., and Laback, B. (2010). Median-plane sound localization as a function of the number of spectral channels using
a channel vocoder. J. Acoust. Soc. Am., 127(2):990–1001.
[Gribonval and Nielsen, 2003] Gribonval, R. and Nielsen, M. (2003). Sparse representations in unions of bases. IEEE Trans. Inform. Theory, 49:3320–3325.
[Gröchenig, 2001] Gröchenig, K. (2001). Foundations of Time-Frequency Analysis.
Birkhäuser Boston.
[Hackbusch, 1999] Hackbusch, W. (1999). A sparse matrix arithmetic based on
H-matrices. Part I: Introduction to H-matrices. Computing, 62(2):89–108.
[Hackbusch, 2003] Hackbusch, W. (2003). Elliptic Differential Equations. Theory
and Numerical Treatment. Springer-Verlag, Berlin.
[Hähnel, 2010] Hähnel, H. (2010). Stochastische Charakteristiken von Lösungen
parabolischer Randanfangswertprobleme mit zufälligen Koeffizienten. PhD thesis,
Technische Universität Chemnitz.
[Hampejs and Kracher, 2007] Hampejs, M. and Kracher, G. (2007). The inversion
of Gabor type matrices. Signal Process., 87(7):1670–1676.
[Hartmann, 1998] Hartmann, W. M. (1998). Signals, Sounds, and Sensation.
Springer.
[Heil and Kutyniok, 2003] Heil, C. and Kutyniok, G. (2003). Density of weighted
wavelet frames. J. Geom. Anal., 13(3):479–493.
[Humes and Jesteadt, 1989] Humes, L. and Jesteadt, W. (1989). Models of the
additivity of masking. J. Acoust. Soc. Am., 85(3):1288–1294.
[Humes et al., 1992] Humes, L., Lee, L., and Jesteadt, W. (1992). Two experiments on the spectral boundary conditions for nonlinear additivity of simultaneous masking. J. Acoust. Soc. Am., 92(5):2598–2606.
[Jacques, 2004] Jacques, L. (2004). Ondelettes, repères et couronne solaire. PhD
thesis, Univ. Cath. Louvain, Louvain-la-Neuve.
[Jaillet, 2005] Jaillet, F. (2005). Représentation et traitement temps-fréquence des
signaux audio numériques pour des applications de design sonore. PhD thesis,
Université de la Méditerranée - Aix-Marseille II.
[Jaillet et al., 2009a] Jaillet, F., Balazs, P., and Dörfler, M. (2009a). Nonstationary
Gabor frames. In SAMPTA’09, International Conference on SAMPling Theory
and Applications, pages 227–230.
[Jaillet et al., 2009b] Jaillet, F., Balazs, P., Dörfler, M., and Engelputzeder, N.
(2009b). On the structure of the phase around the zeros of the short-time Fourier
transform. In Boone, M. M., editor, NAG/DAGA 2009, pages 1584–1587, Rotterdam.
[Janssen and Søndergaard, 2007] Janssen, A. and Søndergaard, P. (2007). Iterative
algorithms to approximate canonical Gabor windows: Computational aspects.
The Journal of Fourier Analysis and Applications, 13(2):211–241.
[Kozek, 1998] Kozek, W. (1998). Adaption of Weyl-Heisenberg frames
to underspread environments, chapter 10, pages 323–352. In
[Feichtinger and Strohmer, 1998].
[Kreuzer et al., 2009] Kreuzer, W., Majdak, P., and Chen, Z. (2009). Fast multipole boundary element method to calculate head-related transfer functions for a
wide frequency range. J. Acoust. Soc. Am., 126(3):1280–1290.
[Kreuzer et al., 2011] Kreuzer, W., Waubke, H., Rieckh, G., and Balazs, P. (2011).
A 3D model to simulate vibrations in a layered medium with stochastic material
parameters. J. Comput. Acoust., to appear.
[Li, 2009] Li, L. (2009). Regularized least square regression with spherical polynomial kernels. International Journal of Wavelets, Multiresolution and Information
Processing (IJWMIP), 7(6):781–801.
[M. L. Arias and Pacheco, 2007] Arias, M. L., Corach, G., and Pacheco, M. (2007). Characterization of Bessel sequences. Extracta Mathematicae, 22(1):55–66.
[Majdak et al., 2010] Majdak, P., Goupell, M., and Laback, B. (2010). 3-D localization of virtual sound sources: Effects of visual environment, pointing method,
and training. Attention, Perception & Psychophysics, 72(2):454–469.
[Majdak et al., 2011] Majdak, P., Goupell, M. J., and Laback, B. (2011). Two-dimensional localization of virtual sound sources in cochlear-implant listeners.
Ear & Hearing.
[Marelli and Balazs, 2010] Marelli, D. and Balazs, P. (2010). On pole-zero model
estimation methods minimizing a logarithmic criterion for speech analysis. IEEE
Transactions on Audio, Speech and Language Processing, 18(2):237–248.
[Marelli et al., 2008] Marelli, D., Fu, M., Balazs, P., and Majdak, P. (2008). An
iterative method for approximating LTI systems using subbands. In 2008 IEEE
International Conference on Acoustics, Speech, and Signal Processing.
[Matz and Hlawatsch, 2002] Matz, G. and Hlawatsch, F. (2002). Linear Time-Frequency Filters: On-line Algorithms and Applications, chapter 6 in ‘Application in Time-Frequency Signal Processing’, pages 205–271. Ed. A. Papandreou-Suppappola, Boca Raton (FL): CRC Press.
[Mi et al., 2009a] Mi, T., Hou, C., Ma, X., and Cai, L. (2009a). A recursive algorithm approximating frame coefficients related to Riesz bases of translates. In
Information, Communications and Signal Processing, 2009. ICICS 2009. 7th International Conference on, pages 1 –4.
[Mi et al., 2009b] Mi, T., Hou, C., Ma, X., and Cai, L. (2009b). Transversal filter
synthesis based on frames of translates. In Image and Signal Processing, 2009.
CISP ’09. 2nd International Congress on, pages 1 –4.
[Møller et al., 1995] Møller, H., Sørensen, M. F., Hammershøi, D., and Jensen,
C. B. (1995). Head-related transfer functions of human subjects. J. Audio Eng.
Soc., 43:300–321.
[Moore, 1985] Moore, B. (1985). Additivity of simultaneous masking, revisited. J.
Acoust. Soc. Am., 78(2):488–494.
[Moore, 1989] Moore, B. (1989). An Introduction to the Psychology of Hearing.
Academic Press Limited London.
[Moore and Glasberg, 1983] Moore, B. and Glasberg, B. R. (1983). Suggested
formulae for calculating auditory-filter bandwidths and excitation patterns. J.
Acoust. Soc. Am., 74(3):750–753.
[Moreno-Picot et al., 2010] Moreno-Picot, S., Arevalillo-Herraez, M., and Diaz-Villanueva, W. (2010). A linear cost algorithm to compute the discrete Gabor
transform. Signal Processing, IEEE Transactions on, 58(5):2667 –2674.
[Müller and Massarani, 2001] Müller, S. and Massarani, P. (2001). Transfer-function
measurement with sweeps. J. Audio Eng. Soc., 49:443–471.
[Noll et al., 2007] Noll, A., White, J., Balazs, P., and Deutsch, W. A. (2007). STX
- Intelligent Sound Processing, Programmer’s Reference. Acoustics Research Institute, Austrian Academy of Sciences.
[Olivero et al., 2009] Olivero, A., Daudet, L., Kronland-Martinet, R., and
Torrésani, B. (2009). Analyse et catégorisation de sons par multiplicateurs temps-fréquence. In Actes de la conférence GRETSI’09, Septembre 2009, Dijon, France.
[Oppenheim and Schafer, 1999] Oppenheim, A. V. and Schafer, R. (1999).
Discrete-Time Signal Processing. Oldenbourg, 3 edition.
[Peng and Waldron, 2002] Peng, I. and Waldron, S. (2002). Signed frames and
Hadamard products of Gram matrices. Linear Algebra Appl., 347:131–157.
[Pietsch, 1980] Pietsch, A. (1980). Operator Ideals. North-Holland Publishing
Company.
[Plumbley et al., 2010] Plumbley, M., Blumensath, T., Daudet, L., Gribonval, R.,
and Davies, M. (2010). Sparse representations in audio and music: From coding
to source separation. Proceedings of the IEEE, 98(6):995 –1005.
[Pulkki et al., 2010] Pulkki, V., Laitinen, M.-V., and Sivonen, V. (2010). HRTF
measurements with a continuously moving loudspeaker and swept sines. In Audio
Engineering Society Convention 128.
[Rahimi, 2009] Rahimi, A. (2009). Multipliers of generalized frames. Bulletin of
Iranian Mathematical Society, 35:97–109.
[Rahimi et al., 2006] Rahimi, A., Najati, A., and Dehghan, Y. (2006). Continuous
frames in Hilbert spaces. Math. Funct. Anal. Topol., 12:170–182.
[Rébillat et al., 2011] Rébillat, M., Hennequin, R., Corteel, É., and Katz, B. F.
(2011). Identification of cascade of Hammerstein models for the description of
nonlinearities in vibrating devices. Journal of Sound and Vibration, 330(5):1018–1038.
[Rébillat et al., 2010] Rébillat, M., Hennequin, R., Corteel, É., and Katz, B. F. G.
(2010). Prediction of harmonic distortion generated by electro-dynamic loudspeakers using cascade of Hammerstein models. In Audio Engineering Society
Convention 128.
[Rieckh et al., 2010a] Rieckh, G., Balazs, P., and Kreuzer, W. (2010a). Frames and
acoustic BEM. In Proceedings of the DAGA 2010, Berlin, CD-ROM.
[Rieckh et al., 2010b] Rieckh, G., Balazs, P., Kreuzer, W., and Waubke, H.
(2010b). Frames for acoustic BEM. In Proceedings of the INTERNOISE 2010.
Lisbon, CD-ROM.
[Rudol, 2011] Rudol, K. (2011). Matrices related to some Fock space operators.
Opuscula Mathematica, 2:289–297.
[Sauter and Schwab, 2004] Sauter, S. and Schwab, C. (2004). Randelementmethoden: Analyse, Numerik und Implementierung schneller Algorithmen. B. G.
Teubner Verlag.
[Schatten, 1960] Schatten, R. (1960). Norm Ideals of Completely Continuous Operators. Springer Berlin.
[Soendergaard et al., 2010] Soendergaard, P., Torrésani, B., and Balazs, P. (2010).
The linear time frequency toolbox. Submitted, preprint.
[Søndergaard, 2007] Søndergaard, P. L. (2007). Finite Discrete Gabor Analysis.
PhD thesis, University of Denmark.
[Špiřík et al., 2010] Špiřík, J., Rajmic, P., and Veselý, V. (2010). Reprezentace
signálů: od bází k framům (in Czech). Elektrorevue, 2010/111(2):1–10.
[van den Brink and Houtgast, 1990] van den Brink, W. A. C. and Houtgast, T.
(1990). Spectro-temporal integration in signal detection. Journal of the Acoustical Society of America, 88:1703–1711.
[van der Heijden and Kohlrausch, 1994] van der Heijden, M. and Kohlrausch, A.
(1994). Using an excitation-pattern model to predict auditory masking. Hear
Res., (1):38–52.
[Vary, 1985] Vary, P. (1985). Noise suppression by spectral magnitude estimation
- mechanism and theoretical limits. Sig. Proc., 8:387–400.
[Walker, 1991] Walker, J. S. (1991). Fast Fourier Transforms. CRC Press.
[Wang and Brown, 2006] Wang, D. and Brown, G. J. (2006). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley-IEEE
Press.
[Weinzierl et al., 2009] Weinzierl, S., Giese, A., and Lindau, A. (2009). Generalized
multiple sweep measurement. In Audio Engineering Society Convention 126.
[Werther et al., 2005] Werther, T., Eldar, Y. C., and Subbanna, N. K. (2005). Dual
Gabor frames: Theory and computational aspects. IEEE Transactions on Signal
Processing, 53(11):4147–4158.
[Wiener, 1949] Wiener, N. (1949). Extrapolation, interpolation, and smoothing of
stationary time series.
[Xia, 1997] Xia, X.-G. (1997). System identification using chirp signals and time-variant filters in the joint time-frequency domain. Signal Processing, IEEE Transactions on, 45(8):2072–2084.
[Xiao et al., 2009] Xiao, X.-C., Zhu, Y.-C., and Zeng, X.-M. (2009). Generalized
p-frame in separable complex Banach spaces. International Journal of Wavelets,
Multiresolution and Information Processing (IJWMIP), 8(1):133–148.
[Zwicker and Fastl, 1990] Zwicker, E. and Fastl, H. (1990). Psychoacoustics.
Springer-Verlag, Berlin.