Statistical modelling of time series using
non-decimated wavelet representations
G.P. Nason†,
Department of Mathematics,
University of Bristol, U.K.
T. Sapatinas,
Institute of Mathematics and Statistics,
University of Kent, U.K.
and
A. Sawczenko,
Institute of Child Health,
Royal Hospital for Sick Children, Bristol, U.K.
17th September 1997
Revised 16th April 1998
† Address for correspondence: Department of Mathematics, University of Bristol, University Walk, Bristol BS8
1TW, England
Abstract
This article proposes the use of time-ordered non-decimated wavelet or non-decimated wavelet packet transforms to provide flexible representations of an (explanatory) time
series. The resulting representations are then used as variables in a statistical
model to provide predictions of another (response) time series. The statistical model
provides valuable information about which components in the explanatory time series
drive the response time series.
To represent our explanatory time series we use a collection of basis functions known
as wavelet packets. Each wavelet packet component of the explanatory time series
corresponds to a particular linear combination of the time series and its lagged versions
(regressive models). The construction of the wavelet packet transform ensures that
all possible regressive models on a grid are rapidly computed. Hence our model fully
explores “model-space” which may be parametrised in terms of the time-frequency plane.
Our modelling methodology is illustrated with examples from two different arenas:
(a) a wind power example using a generalized linear model relating wind speeds at
one weather station to a time-ordered non-decimated wavelet packet transform of wind
speeds and wind directions at another station; (b) a biomedical example showing how infant
sleep states can be successfully classified using the time-ordered non-decimated wavelet
packet transform of heart rate and linear discriminant analysis.
Keywords: time series modelling; wavelets; biomedical time series; wind time
series; variable selection
1 Introduction
Suppose we are interested in the relationship between a response time series, Yt , and an
explanatory time series, Xt , t = 1, . . . , N. We might be interested in predicting future values
of Yt given future values of Xt (suppose, for example, Yt is expensive to collect but Xt is
cheap). In addition we might also be interested in learning about which components of
Xt influence the behaviour of Yt . The route that we take represents Xt in terms of basis
functions called wavelet packets. More precisely, we use a collection of all possible time-ordered shifted wavelet packets (a non-decimated wavelet packet transform, NWPT) and use
standard regression-like methods for selecting the best of these wavelet packets to represent
the explanatory time series for our intended purpose. Occasionally we shall use instead the
time-ordered non-decimated wavelet transform (NWT) which is a particular subset of the
NWPT.
The time-ordered NWT/NWPT produces a matrix X of N observations (t = 1, . . . , N) on K
variables, where each variable contains coefficients with respect to a particular wavelet
packet. The great advantage of non-decimated transforms is that they produce a coefficient,
Xtk , on each variable k = 1, . . . , K at every time point t = 1, . . . , N. Thus standard multivariate
regression-like methods can be used to model the response, Y = (Y1 , . . . , YN )T ,
in terms of the data matrix, X. Wavelet packets may be organised into libraries: rich
collections of basis functions that can represent a wide range of time series sparsely.
Figures 4 and 9 show eight different wavelet packets and indicate how flexible they can
be in terms of scale, number of oscillations and position. The wavelet packet transform
determines how much of each basis function is present in a time series.
One novel aspect of our work is the use of time-ordered non-decimated transforms which
ensure that the coefficients are arranged in chronological order thus permitting direct and
simple application of standard regression-like techniques. Decimated wavelet transforms,
although already used and useful for some classification and regression problems, cannot
be used in our response/explanatory setup because at a given time point it is not possible to
evaluate the transform at all scales (so each variable contains fewer coefficients
than there are original observations).
Hence our work introduces wavelets to a whole new area of statistical problems and
points out a useful modification of an existing wavelet transform.
Regressive processes. Applying our modified wavelet method to such time series
problems is new. However, there are strong connections with existing time series
analysis methods. First, imagine every possible regressive time series model involving the
explanatory time series Xt :
    \alpha_1 X_t + \cdots + \alpha_N X_{t-N+1}    (1)
with real coefficients {α1 , . . . , αN }, as a candidate for explaining Yt . Clearly, it would be an
impossible task to fit every model so some sort of model selection procedure is usually used.
We claim that our methodology computes every possible model on a “grid” of models in
model space (and moreover computes it with an extremely efficient wavelet algorithm). So,
although the exact underlying model is not necessarily fitted we will come close with a few
of our models. The model space grid notion is made more precise in Section 4.3.
We discuss two motivating examples in Section 5. Section 5.1 discusses the prediction
over time of wind speed at a target site from a reference site. Here the explanatory time
series, Xt , is the wind speed at the reference site and the response time series, Yt , is the
wind speed at the target site. This sort of problem is extremely important when building
large structures or developing a wind farm as the wind regime at the target site is of great
interest for both safety and economic reasons. Our modelling allows us to predict future
values of Yt from future values of Xt but also provides a fascinating insight into the link
between the two wind series. We show that true wavelet packets are identified as important
model components (i.e. not just wavelets, or other simpler time series models). We also
contemplate multiple versions of our modelling where we use more than one explanatory
time series: in the wind example the wind direction is an important extra variable.
In Section 5.2 our second example concerns a biomedical time series arising from the
child health field. Here we are interested in predicting infant sleep state (asleep/awake)
from heart rate. The explanatory time series, Xt , is heart rate: it is cheap and easy to
measure directly. The response time series, Yt , is a binary variable measuring infant sleep
state. Sleep state is more difficult and expensive to measure. Figure 11 shows the response
and explanatory time series for a child which we use to build a model between Yt and the
wavelet packet transformed Xt . Future values of Yt may be predicted by transforming future
values of Xt and using our model. Again, like the wind example, we can use our model to
develop understanding about what sorts of components in Xt are important for predicting
sleep state.
The paper is organised as follows. Section 2 reviews the NWPT together with some
important historical special cases. Section 3 introduces the time-ordered NWPT necessary for
our modelling methodology. Section 4 shows how standard regression-like modelling can be
used to model a response time series, Yt , in terms of time-ordered non-decimated transforms
of an explanatory time series, Xt , and outlines the advantages of these transforms over other
methods. Discussion of some aspects of the methodology and ideas for future work are
presented in Section 6 and we conclude in Section 7.
2 Wavelet review
This section provides an overview of wavelets, the wavelet transform and discrete versions
of the transform. Wavelets have been found to be useful in a number of disciplines such
as physics, signal processing and geophysics. In statistics wavelets have mainly been
used in: curve estimation, see Donoho and Johnstone (1994a; 1994b; 1995), Donoho et
al. (1995; 1996), Hall and Patil (1995), Abramovich and Benjamini (1996), Antoniadis (1996),
Nason (1996), Ogden and Parzen (1996), Chipman et al. (1997), Clyde et al. (1998), Hall and
Nason (1997), Johnstone and Silverman (1997), Neuman and von Sachs (1997), Abramovich
et al. (1998), Crouse et al. (1998) and Vidakovic et al. (1998); time series analysis, see
Moulin (1994), McCoy et al. (1995), von Sachs (1996), von Sachs and Schneider (1996),
von Sachs et al. (1996), Gao (1997) and Nason et al. (1998); survival data analysis,
see Antoniadis et al. (1999); inverse problems, see Donoho (1995) and Abramovich and
Silverman (1998); classification, see Saito (1994), Coifman and Saito (1994), Buckheit and
Donoho (1995), Learned and Willsky (1995) and Saito and Coifman (1996). For a general
statistical introduction to wavelets see Nason and Silverman (1994), Bruce and Gao (1996),
and Ogden (1997) or see Strang (1993) for a more mathematical view. More detailed
comprehensive expositions are Daubechies (1992), Meyer (1992) and Chui (1992).
Wavelets are building blocks for constructing functions. More precisely wavelets form
bases for function spaces such as L2 (R). This article only considers orthonormal wavelets.
Functions f (t) can be represented by expansions such as
    f(t) = \sum_{k \in \mathbb{Z}} c_k \phi_{0k}(t) + \sum_{j \ge 0} \sum_{k \in \mathbb{Z}} d_{jk} \psi_{jk}(t),    (2)
where wavelets, ψjk (t), are represented in terms of a mother wavelet, ψ(t), using the formula
    \psi_{jk}(t) = 2^{j/2} \psi(2^j t - k)    (3)
for integers k and j ≥ 0. The resolution level, j, controls the scale of the wavelet, the
translation number, k, controls the location of the wavelet. The functions φ0k (t) are obtained
from a father wavelet φ(t) using formula (3) with j = 0. Typically the father wavelet looks
like a statistical kernel function and the mother wavelet is a localised oscillation (hence
the name wavelet). The standard wavelet basis functions {φ0k (t), ψjk (t)}j≥0,k∈Z form an
orthonormal family and so the coefficients of expansion (2) can be obtained by
    d_{jk} = \int_{\mathbb{R}} f(t) \psi_{jk}(t) \, dt    (4)
and similarly for the ck . Expansion (2) can be characterised in words by: the function, f (t),
consists of a smooth part represented in (2) by integer translated father wavelets (kernel
functions) and detail at different scales (j) represented by the wavelets ψj· (t). A wavelet
packet representation is similar to that given in (2) except that the basis functions have an
extra degree of flexibility: the number of oscillations of the basis function.
Typically a wavelet ψ(t) will be localised around the origin. This implies that the derived
wavelets ψjk (t) are located near k 2^{-j}. This means that for standard wavelets the location
of the basis functions depends on the resolution level j: if j is large then the wavelets will be
close together and each will oscillate very rapidly, enabling them to represent high-frequency,
rapidly changing phenomena; if j is small the opposite happens. For non-decimated
wavelets formula (3) for ψjk (t) changes slightly to
    \psi_{jk}(t) = 2^{j/2} \psi\{ 2^j (t - k) \}
for all integers k and j ≥ 0. Thus the location of non-decimated wavelets does not depend
on the resolution level: indeed non-decimated wavelets at each resolution level, j, appear at
all locations k.
The reason why wavelets are useful is that they provide sparse representations for a wide
set of different functions, including functions that exhibit discontinuities or time-varying
frequency behaviour. The sparsity of representation is due to the fact that wavelets provide
localised information about a function: the wavelet coefficient djk provides information
about a function at scale 2^j and near location k 2^{-j}. Indeed wavelets provide a way of
examining a function at different scales: a multiresolution analysis, see Mallat (1989a) or
Jawerth and Sweldens (1994). The non-decimated representations are no longer orthogonal
although orthonormal bases can be selected from them. However, the lack of orthogonality
is not a problem for the statistical modelling we perform later.
In many cases, including time series applications, data arise as discrete observations and
so discretized versions of (2) have been developed for functions on finite intervals. The best
of these transforms exploit relationships between wavelets at different scales and lead to
extremely fast algorithms for computing coefficients from discrete data, as we shall describe
next.
2.1 Notation for discrete wavelet transforms
The notation we adopt is taken from Nason and Silverman (1995). There are five key
operators: H — a low-pass filter; G — a high-pass filter; D0 — “even” dyadic decimation;
D1 — “odd” dyadic decimation; S — a shifting operation. These operators are defined on
doubly infinite sequences {xn }n∈Z, although in practice the operators are adapted here for
use on finite sequences assuming periodic boundary conditions.
The H filter has coefficients {hn }n∈Z and operates on an input {un }n∈Z as follows:
    (H u)_k = \sum_n h_{n-k} u_n.    (5)
The formula for G is identical but uses coefficients {gn }n∈Z with a high-pass frequency
response.
Most practical wavelet transforms rely on H and G being finite impulse-response
filters having only finitely many non-zero coefficients. Daubechies’ (1988) compactly
supported wavelets used in this article fall into this category and have L non-zero filter
coefficients, where L depends on the desired smoothness of the underlying wavelet. As a
consequence the filtering operation in (5) only requires O(L) computations per output coefficient.
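The filtering operation in (5) can be sketched as follows. This is a minimal illustration only: it assumes Haar filters (L = 2) rather than the Daubechies filters used in this article, zero-based indexing, and periodic boundary conditions; the function name periodic_filter and the sign convention for the high-pass filter are our own assumptions.

```python
import numpy as np

# Haar filters, chosen only for illustration (L = 2 non-zero coefficients).
HAAR_H = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass
HAAR_G = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass; sign convention assumed


def periodic_filter(coeffs, u):
    """Apply (H u)_k = sum_n h_{n-k} u_n with periodic boundary conditions.

    coeffs[m] holds h_m for m = 0, ..., L-1; all other h_m are taken as zero.
    """
    u = np.asarray(u, dtype=float)
    out = np.zeros(len(u))
    for m, h_m in enumerate(coeffs):
        # Substituting n = k + m gives (H u)_k = sum_m h_m u_{k+m};
        # np.roll(u, -m)[k] equals u[(k + m) mod N].
        out += h_m * np.roll(u, -m)
    return out


x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])
smooth = periodic_filter(HAAR_H, x)  # (x_k + x_{k+1}) / sqrt(2)
detail = periodic_filter(HAAR_G, x)  # (x_k - x_{k+1}) / sqrt(2)
```

Each output coefficient uses only the L filter taps, which is the O(L) cost per coefficient noted above.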
A discrete version of (2) along with a specified wavelet ψ(x) uniquely determines the
filter sequences and the discrete transform itself only uses the filter sequences. (In the
signal processing literature the filter sequences are known as quadrature mirror filters.)
The filter sequences satisfy certain internal relations as a consequence of being derived
from an orthonormal wavelet basis. For details on these relationships see Nason and
Silverman (1995), Mallat (1989b) or Daubechies (1988; 1992).
The shift operator, S, shifts the whole sequence along one position:
    (S u)_k = u_{k+1}    (6)
for all k ∈ Z. The “even”/“odd” dyadic decimation operator, D0 /D1 , selects every even/odd
member of a sequence. “Odd” decimation is equivalent to shifting followed by “even”
decimation thus:
    (D_1 u)_k = (D_0 S u)_k = u_{2k+1}    (7)
for all k ∈ Z. For simplicity the k subscript is sometimes dropped and the operator notation
reduces to a vector/matrix style. On finite length sequences the decimation operators
produce an output sequence which is half the length of the input, which makes
decimated transforms unsuitable for the modelling described in Section 4.
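The shift and decimation operators in (6) and (7) are simple to express on finite sequences. The sketch below assumes zero-based indexing and periodic wrap-around for the shift; the function names are ours and purely illustrative.

```python
import numpy as np


def shift(u):
    """(S u)_k = u_{k+1}, with periodic wrap-around on a finite sequence."""
    return np.roll(np.asarray(u), -1)


def decimate_even(u):
    """(D0 u)_k = u_{2k}: keep the even-indexed members."""
    return np.asarray(u)[0::2]


def decimate_odd(u):
    """(D1 u)_k = u_{2k+1}: keep the odd-indexed members."""
    return np.asarray(u)[1::2]


u = np.arange(8)
even_part = decimate_even(u)            # half the length of u
odd_part = decimate_odd(u)              # half the length of u
odd_via_shift = decimate_even(shift(u)) # D0 S u, as in (7)
```

The last line reproduces the identity in (7): odd decimation is shifting followed by even decimation.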
In discrete wavelet transforms the filter operators and decimation operators are paired up
to produce the operators D0 H , D0 G, D1 H and D1 G. The vector of coefficients produced
by any of these pairs is sometimes known as a “packet” of coefficients and therefore we
shall refer to the four operators collectively as packet operators. Wavelet transforms obtain
signal detail at different scales or frequency bands. They achieve this by chaining the low-pass D0 H and the high-pass D0 G operators together: for example, a band-pass filter can
be obtained by repeated application of D0 H followed by D0 G.
Finally all the transforms compute expansion coefficients with a fast algorithm that
computes coarser level j − 1 coefficients from finer level j coefficients. To initiate this
recursive algorithm a finest resolution level J is chosen and the coefficients, c^J, are computed
from the data, {X_t}_{t=1}^N, at this finest level. In general this can be done with interpolation or
projection methods such as direct computation as in (4), or see Donoho (1992), Delyon and
Juditsky (1995) and Kovac (1997). If there are N = 2^J (for some integer J) equally spaced
data points then the finest coefficients can be obtained approximately by just setting them
to be equal to the data. In other words
    c^J_n = X_n, for n = 1, . . . , N.    (8)
In all cases described below the smooth at level J, c^J, forms the input to all the wavelet
transforms and coarser wavelet coefficients are rapidly computed from them.
2.2 Non-decimated wavelet packet transform
The non-decimated wavelet packet transform (NWPT) applies the four packet operators
recursively to c^J, the smooth at level J. This is illustrated schematically in Figure 1 for N = 8
data points starting at the root of the tree. Each packet of coefficients in the NWPT tree can
be addressed by an index written in base 4. The number of digits in the index indicates the
level of the packet: packets at level j have J − j digits for j = 0, . . . , J − 1. The actual entries
in the index describe how that packet was reached from the root: application of D0 H , D0 G,
D1 H , or D1 G appends a 0, 1, 2 or 3 respectively to the index. For example, the indices 0,
1, 2 and 3 at level 2 and 01, 03, 21 and 23 at level 1 are indicated in Figure 1; the 23 packet
is so labelled because it is obtained by the operator D1 H followed by D1 G.
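The recursion and the base-4 addressing can be sketched as below. This is an illustration under stated assumptions, not the implementation used for the results in this article: it uses Haar filters, zero-based indexing and periodic boundaries, and the names nwpt and periodic_filter are hypothetical.

```python
import numpy as np

HAAR_H = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (illustrative choice)
HAAR_G = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass; sign convention assumed


def periodic_filter(coeffs, u):
    """(H u)_k = sum_m h_m u_{k+m} with periodic boundary conditions."""
    out = np.zeros(len(u))
    for m, h_m in enumerate(coeffs):
        out += h_m * np.roll(u, -m)
    return out


# Base-4 digit -> (filter, decimation offset): 0 = D0 H, 1 = D0 G, 2 = D1 H, 3 = D1 G.
PACKET_OPERATORS = {
    "0": (HAAR_H, 0),
    "1": (HAAR_G, 0),
    "2": (HAAR_H, 1),
    "3": (HAAR_G, 1),
}


def nwpt(x):
    """Return {base-4 index: packet} for every packet at levels J-1, ..., 0."""
    frontier = {"": np.asarray(x, dtype=float)}  # root of the tree: c^J
    packets = {}
    while len(next(iter(frontier.values()))) > 1:
        children = {}
        for index, parent in frontier.items():
            for digit, (coeffs, offset) in PACKET_OPERATORS.items():
                # Filter the parent, then keep even (offset 0) or odd (offset 1) members.
                children[index + digit] = periodic_filter(coeffs, parent)[offset::2]
        packets.update(children)
        frontier = children
    return packets


packets = nwpt(np.arange(8.0))     # N = 8, so J = 3
packet_23 = packets["23"]          # D1 H followed by D1 G
total_coefficients = sum(len(p) for p in packets.values())
```

A packet whose index has m digits sits at level J − m, so for N = 8 the two-digit index 23 addresses a level 1 packet of length 2.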
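[Figure 1 image: tree diagram with branches labelled D0 H, D1 H, D0 G and D1 G; vertical axis: Level (0 to 3); packets 0, 1, 2, 3 marked at level 2 and 01, 03, 21, 23 at level 1.]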
Figure 1: Schematic of the NWPT operating on N = 8 points, i.e. J = 3. The smooth at level
J = 3 enters the bottom of the tree and coefficients of successively coarser non-decimated
wavelet packets are computed by climbing the tree.
There are 4^{J-j} packets each of length 2^j for levels j = 0, . . . , J − 1 and, therefore, the total
number of coefficients and hence computational effort is
    \sum_{j=0}^{J-1} 4^{J-j} 2^j L = L 2^J \sum_{j=1}^{J} 2^j = L 2^{J+1} (2^J - 1) = 2LN(N - 1) = O(LN^2).
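The coefficient count is easy to check numerically. The short sketch below sums the packet sizes directly (omitting the factor L) so that the result can be compared with the closed form 2N(N − 1); the function name is ours.

```python
def nwpt_coefficient_count(J):
    """Sum of packet sizes: 4^(J-j) packets of length 2^j for j = 0, ..., J-1."""
    return sum(4 ** (J - j) * 2 ** j for j in range(J))


# Direct sum and closed form 2N(N - 1) for N = 2^J, J = 1, ..., 10.
direct = {J: nwpt_coefficient_count(J) for J in range(1, 11)}
closed_form = {J: 2 * (2 ** J) * (2 ** J - 1) for J in range(1, 11)}
```

For N = 8 (J = 3) both expressions give 64 + 32 + 16 = 112 coefficients.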
The NWPT was proposed by Pesquet et al. (1996) who also investigated its potential for
curve and time-delay estimation. The NWPT is a generalisation of the following important
and well-known wavelet transforms:
NWT The non-decimated wavelet transform can be derived from the NWPT by keeping only
packets whose indices are a string of even digits except for the last digit which may be
odd or even (so for example, 23 is kept, but 32 is not). Figure 2 shows a schematic of
the NWT. The computational cost of the NWT is O(LN log N).
A detailed description of the NWT and some of its statistical applications can be
found in Nason and Silverman (1995). See also Percival and Guttorp (1994), Bruce and
Gao (1996) and Percival and Mofjeld (1997). Coifman and Donoho (1995) developed a
translation-invariant curve estimation procedure using the NWT. See also Nason and
Silverman (1995), Lang et al. (1995) and Johnstone and Silverman (1997) for more on
curve estimation with the NWT. The NWT appeared earlier in the mathematical and
signal processing literature, see Beylkin (1992) and Shensa (1992).
WPT The wavelet packet transform can be derived from the NWPT by retaining coefficients
that have been produced by even dyadic decimation (i.e. keep those packets whose
indices consist only of digits that are 0 or 1). Figure 3 shows a schematic of the WPT.
The computational cost of the WPT is O(LN log N).
An excellent reference for wavelet analysis in general and wavelet packets in particular
is Wickerhauser (1994). The WPT has been used successfully for: compression
purposes, see Coifman and Wickerhauser (1992); curve estimation, see Donoho
[Figure 2 image: tree of c and d packets; arrows labelled HD0, HD1, GD0 and GD1; packet labels 000 to 111; vertical axis: Level (0 to 3).]
Figure 2: Schematic of the NWT operating on N = 8 points, i.e. J = 3. The c and d
boxes represent packets of father and mother wavelet coefficients respectively. Packets at
level j contain 2^j coefficients labelled in each packet from 0 to 2^j − 1. We have not shown
individual coefficients, only packets. The DWT is contained within the left-most coefficients
by following the solid arrows from level 3 up to level 0.
and Johnstone (1994a) and Pesquet et al. (1996); classification, see Coifman and
Saito (1994), Saito (1994), Learned and Willsky (1995), and Saito and Coifman (1996).
Practitioners of experimental design might recognise that the WPT using Haar wavelets
is exactly Yates’ method for efficiently organising computations for the analysis of
factorial effects (see Cochran and Cox (1957)).
DWT The discrete wavelet transform can be derived from the NWPT by imposing both of the
conditions described in the derivation of the NWT and WPT above — only even dyadic
decimation is used and recursion is only applied to the H derived packets. The DWT
is the intersection of coefficients from the NWT and WPT. A schematic of the DWT can
be found in Nason and Silverman (1994) or Mallat (1989b). The computational cost of
the DWT is O(LN).
The DWT had been known for some time in the signal processing community as
subband coding, see Mintzer (1982; 1985) or Smith and Barnwell (1986). The wavelet
interpretation and transform was developed by Mallat (1989a; 1989b). The DWT forms
the basis of most statistical methods to date (see list of papers at the beginning of this
section).
The DWT is one-to-one and onto and therefore has a unique inverse. Indeed the DWT
may be represented by an (orthogonal) matrix multiplication although computation is almost
always performed with the fast DWT. The DWT is essentially a discretized version of the
expansion in (2). The DWT provides a representation with respect to one wavelet basis
whereas the NWT, WPT and NWPT provide representations with respect to a library of
bases (wavelet packets). To represent functions in a library a basis needs to be selected.
Coifman and Wickerhauser (1992) and Cohen et al. (1997) describe basis selection methods
for wavelet packets and non-decimated wavelet packets respectively. Decompositions of
[Figure 3 image: tree of c and d packets; arrows labelled HD0 and GD0; frequency indices 0, 1, 2, 3 next to the packets at level 1; vertical axis: Level (0 to 3).]
Figure 3: Schematic of the WPT operating on N = 8 points, i.e. J = 3. At level j there are
2^{J-j} packets each containing 2^j points. The numbers 0,1,2,3 next to the packets at level 1
are the frequency indices of packets within that level from left to right. The DWT coefficients
are contained in the WPT and are shown in the dashed boxes and marked c and d. All other
coefficients are with respect to other wavelet packets such as those illustrated in Figure 4.
The total number of wavelet packets (excluding the original data) is 2N − 2 = 14.
functions using a library of bases are known in the literature as atomic decompositions, see
Mallat and Zhang (1993) and Chen et al. (1996).
Property: translation-equivariance of non-decimated transforms. The non-decimated
wavelet transforms have another useful property called translation equivariance which
means that
NT[SXt ] = S(NT[Xt ]),
where NT can be either the NWT or the NWPT. In other words a shift in the time series
is reflected by an identical shift in the non-decimated transform coefficients. The DWT
and WPT are neither translation-equivariant nor translation-invariant. Also, it is possible
to compute non-decimated transforms with an arbitrary number of points N, although our
transforms are restricted to N being a power of 2.
Summary. The NWT library contains all possible shifted wavelet bases and the WPT
library contains the richer set of wavelet packets (but no shifts). The NWPT library contains
all possible shifts of the richer WPT library. Figures 4 and 9 show eight such wavelet packets
with varying time-extent, number of oscillations and location. The NWPT is the best of both
NWT and WPT worlds. In our modelling, in Section 4, we can only use the non-decimated
wavelet transforms (NWT or NWPT, not the DWT or WPT). In our examples in Section 5 we
choose to use the NWPT over the NWT since it contains a richer set of basis functions.
2.3 A note on classification with wavelet methods
Wavelet methods have been used in classification problems following the usual paradigm:
a training set of samples is collected; a DWT is applied to all the training samples and the
important wavelet coefficients for classification are identified using some statistical method
[Figure 4 image: four panels, each plotting a wavelet packet basis function (vertical axis) against x from 0 to 250 (horizontal axis).]
Figure 4: Four wavelet packets for an analysis where N = 256. Wavelet packets are indexed
in a tree like that in Figure 3 as (j, i) where j is the resolution level and i is the frequency
index within a level. The wavelet packets here are (clockwise from top left) (4,2), (4,5), (2,2),
(1,64) and happen to form an orthonormal set.
[Figure 5 image: coefficients plotted by Resolution Level (5 to 10, vertical axis) against Packet Number (0 to 1000, horizontal axis). Filter: Daub cmpct on least asymm N=10.]
Figure 5: Plot of wavelet coefficients from a NWT analysis of a chirp signal (see text for
description).
(e.g. linear discriminant analysis or CART). Pioneering work by Saito (1994) and Learned and
Willsky (1995) extended the method for use with the WPT by inserting a “select best-basis”
step before the “best coefficients” were selected. Model verification can be performed or
future samples classified by applying the same wavelet transform and predicting the class of
the sample by applying the statistical classifier to the selected wavelet coefficients/packets.
See also Buckheit and Donoho (1995).
The advantage of using wavelet coefficients/packets is that they represent functions
sparsely and so provide an effective non-statistical dimension reduction. The sparse
representation and resultant low dimensionality of transformed data makes it easier for
the statistical classifiers to do their work. Our modelling is of a somewhat different type to
that just described as we build models in situ rather than have a large set of training samples.
However, the models we propose also gain from the effective dimension reduction of the
sparse wavelet representation.
3 Time-ordered non-decimated transforms
For our statistical modelling the reason for using non-decimated transforms is that the
number of coefficients at each level (NWT) or packet (NWPT) is the same as the number
of data points in the original data. However, the coefficients in each resolution level/wavelet
packet in the NWT/NWPT are not in chronological order.
For example, Figure 5 shows the NWT wavelet coefficients of a simulated reflected
chirp signal using Daubechies’ least-asymmetric wavelet with ten vanishing moments, see
Daubechies (1992, p. 198). The chirp time series itself is shown in Figure 5 at resolution
level 10: this is the smooth at level 10 and the series is of length N = 1024 = 2^{10}. The
resolution levels above level 10 show successively coarser resolution wavelet coefficients;
note that there are 1024 coefficients at each resolution level (the successive smooths c^j have
not been shown). Decimated wavelet transforms (DWT, WPT) halve the number of coefficients
at coarser scales. The layout of Figure 5 corresponds to the schematic in Figure 2, except that
there are ten resolution levels, not three, and also Figure 5 has not plotted any levels with
[Figure 6 image: rows of d coefficients labelled Even, Odd and Time ordered.]
Figure 6: How the even and odd decimated non-decimated wavelet coefficients may be woven
together into time-ordering for resolution level 9 in Figure 5.
resolutions less than 5. The vertical dotted lines in Figure 5 indicate packet boundaries. In
Figure 5 there are two packets of coefficients at resolution level 9 each of length 512. These
coefficients are in time-order within each packet but the left-hand coefficients correspond
to even dyadic decimation and the right-hand ones correspond to odd.
The time-ordering may be recovered by weaving the odd and even coefficients together as
illustrated by Figure 6. Weaving can also be carried out at lower resolution levels to recover
the time-ordering — although the weaving is not always a simple meshing of odd and even
coefficients at lower resolutions. The Appendix provides details of how to produce time-ordered coefficients at any given resolution level (NWT) or wavelet packet (NWPT). Figure 7
shows the NWT of the chirp signal after weaving of coefficients at each resolution level. The
horizontal axis of Figure 7 now corresponds exactly to the time-base of the original chirp
signal and each resolution level is aligned to this time-base. Moreover as the caption to
Figure 7 suggests the time-ordered NWT can form the basis of a time-frequency or time-scale diagram where information about the time and frequency properties of a signal can
be inferred simultaneously (in contrast to the classical periodogram which only reveals
frequency content over all time). For more details about statistical time-scale analysis with
the NWT see Nason and Silverman (1995), von Sachs et al. (1996) and Nason et al. (1998).
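Weaving at the finest wavelet level (the case illustrated by Figure 6) can be sketched as follows. The sketch assumes Haar filters, zero-based indexing and periodic boundaries, and it covers only the simple meshing of one even and one odd packet; it is not the general procedure of the Appendix, and the function names are ours.

```python
import numpy as np

HAAR_G = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass; illustrative choice


def periodic_filter(coeffs, u):
    """(G u)_k = sum_m g_m u_{k+m} with periodic boundary conditions."""
    out = np.zeros(len(u))
    for m, g_m in enumerate(coeffs):
        out += g_m * np.roll(u, -m)
    return out


def weave(even, odd):
    """Interleave two equal-length packets: even[0], odd[0], even[1], odd[1], ..."""
    woven = np.empty(len(even) + len(odd))
    woven[0::2] = even
    woven[1::2] = odd
    return woven


rng = np.random.default_rng(1)
x = rng.normal(size=16)
filtered = periodic_filter(HAAR_G, x)  # detail coefficient at every time point

# Packet ordering: the even packet D0 G x followed by the odd packet D1 G x.
packet_ordered = np.concatenate([filtered[0::2], filtered[1::2]])

half = len(x) // 2
time_ordered = weave(packet_ordered[:half], packet_ordered[half:])
```

After weaving, coefficient t of time_ordered is the detail coefficient attached to time point t, whereas in packet_ordered the first half holds the even time points and the second half the odd ones.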
Walden and Contreras Cristan (1997) also (independently) developed the time-ordered
NWPT using a different computation method. They used the time-ordered NWPT for
investigating the time-scale properties of a series and related it to the original series. Their
method uses the WPT schematic but pads out filter coefficients depending on the resolution
level (similar to the technique used by Nason and Silverman (1995) in producing time-ordered
NWT coefficients). Padding essentially modifies Mallat’s (1989b) DWT formulae to become
    c^j_k = \sum_n h_n c^{j+1}_{2^{J-j-1} n + k}    (9)
for j = J − 1, . . . , 0 (replace hn by gn for wavelet coefficients). Hess-Nielsen and
Wickerhauser (1996) and Walden and Contreras Cristan (1997) also specify a phase-correction technique to bring levels into precise time-alignment. Without phase-correction
each successive level of coefficients would be shifted in time with respect to the previous
level. We have applied an empirically determined shift factor to time-align our levels which
works well in our applications.
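Formula (9) can be sketched directly. The code below is a minimal illustration assuming Haar filters, zero-based indexing, periodic boundaries and N a power of 2; it applies neither a phase correction nor the empirical shift factor mentioned above, and the function name time_ordered_nwt is hypothetical.

```python
import numpy as np

HAAR_H = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass (illustrative choice)
HAAR_G = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass; sign convention assumed


def time_ordered_nwt(x, h=HAAR_H, g=HAAR_G):
    """Time-ordered NWT via c^j_k = sum_n h_n c^{j+1}_{2^(J-j-1) n + k}."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if N < 2 or N & (N - 1):
        raise ValueError("this sketch assumes N is a power of 2")
    J = N.bit_length() - 1
    smooth = x                      # c^J is set equal to the data, as in (8)
    details = {}
    for j in range(J - 1, -1, -1):
        gap = 2 ** (J - j - 1)      # spacing between filter taps at this level
        new_smooth = np.zeros(N)
        detail = np.zeros(N)
        for n in range(len(h)):
            tap = np.roll(smooth, -gap * n)  # tap[k] = c^{j+1}_{gap * n + k}
            new_smooth += h[n] * tap
            detail += g[n] * tap             # replace h_n by g_n for wavelet coefficients
        details[j] = detail
        smooth = new_smooth
    return details, smooth


rng = np.random.default_rng(0)
x = rng.normal(size=16)
J = 4
details, smooth = time_ordered_nwt(x)

# Translation equivariance: transforming the shifted series S x ...
shifted_details, _ = time_ordered_nwt(np.roll(x, -1))
# ... should give the shifted coefficients at every level.
equivariant = all(np.allclose(shifted_details[j], np.roll(details[j], -1)) for j in details)
```

Every level holds N coefficients on the original time-base, so the levels can be stacked as the variables of a data matrix.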
Summary. The time-ordered non-decimated wavelet transforms (NWT, NWPT) convert
a time series Xt into a K-dimensional multivariate time series Xt for t = 1, . . . , N (or,
alternatively, into a K × N data matrix, X). In the time-ordered NWT each variable of Xt
corresponds to a (non-decimated) level in the transform (e.g. each resolution level in Figure 7
corresponds to one variable in Xt ) and K = log_2 N. In the time-ordered NWPT each variable
corresponds to a (non-decimated) wavelet packet in the transform and K = 2N − 2.
[Figure 7 image: coefficients plotted by Resolution Level (0 to 9, vertical axis) against Translate (0 to 1024, horizontal axis). Stationary transform, Daub cmpct on least asymm N=10.]
Figure 7: Wavelet coefficients produced with the time-ordered NWT for the chirp signal. The
coefficients are ordered left to right in exact alignment with the time-base in the chirp time
series. The increase and decrease in frequency of the chirp over time is clearly visible with
coefficients moving from low to high and then back to low resolution bands as one sweeps
from left to right across the diagram.
4 Statistical model building
4.1 Model building procedures
It is unnatural to prescribe exactly how to build models between Y and X. Our motive
for using the time-ordered non-decimated transforms (NWT or NWPT) was to permit the
direct application of any appropriate regression-like method. The actual method used
depends on the task at hand (e.g. prediction or eliciting important wavelet packets). We
have found techniques such as linear and generalized linear modelling (GLM), generalized
additive modelling, discriminant analysis and CART techniques useful (see McCullagh and
Nelder (1989), Hastie and Tibshirani (1990), Mardia et al. (1979), and Breiman et al. (1984)
respectively) although this should not be taken as an exclusive list.
We offer no magic recipe for obtaining the “best” model in particular circumstances.
However, the non-decimated transforms do provide a framework that allows the above
well-known modelling techniques (and the wealth of experience that goes with them) to
be employed in extracting important components of an explanatory time series and relating
them to the response time series.
If the NWT is used then K = log2 N and any regression-like technique of our choice can be
used to model Y in terms of the K wavelet packets of X. If the NWPT is used then K = 2N − 2
and the problem of having more variables than observations arises. Again, we stress that
this is a general statistical modelling problem and this article does not pretend to find an
optimal solution. The examples in Section 5 use the time-ordered NWPT and the following
two strategies are applied to overcome the “excess variables” problem:
1. The naive method chooses a K1 < N and then correlates each variable in X directly
with Y. The variables that exhibit the largest K1 correlations then form the working
set. Since K1 < N we can then use any of the appropriate regression-like methods
described above on the working set. For the examples given in Section 5 we have
chosen K1 = 0.05K. We realise that this is completely arbitrary but it has worked
excellently for our two applied examples below.
2. Krzanowski et al. (1995) describe a sophisticated class of procedures for dealing
with classification problems when one has more variables than observations. We
have applied some of their antedependence models to the sleep state/heart rate
classification problem in Section 5.2.
Many variations on the naive scheme are possible: for example using forward stepwise
modelling techniques in a particular modelling situation (e.g. using forward stepwise GLM
in the wind example in Section 5.1). There are many other more sophisticated methods
for variable selection that might be useful here e.g. the Bayesian methods of George and
McCulloch (1993).
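The naive strategy in item 1 can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: the function names, the toy data and the use of the absolute value of the Pearson correlation for ranking are assumptions made here for the example.

```python
import math
import random


def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0.0 or svv == 0.0:
        return 0.0  # a constant column carries no linear information
    return suv / math.sqrt(suu * svv)


def naive_select(columns, y, k1):
    """Indices of the k1 columns most correlated (in absolute value) with y."""
    scores = [abs(correlation(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k1]


# Toy data (hypothetical): 40 noise columns, with column 7 driving the response.
random.seed(0)
n_obs, n_vars = 64, 40
columns = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
y = [2.0 * columns[7][t] + 0.1 * random.gauss(0, 1) for t in range(n_obs)]

k1 = max(1, round(0.05 * n_vars))  # the 5% rule, K1 = 0.05K
working_set = naive_select(columns, y, k1)
print(working_set)
```

The working set returned here would then be passed to whichever regression-like method is appropriate for the problem.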
Finally we sometimes transform the coefficients by taking the logarithm of the squared
values of the coefficients. This particular transform is of use when local oscillatory power
is thought to be important in driving the response time series: for instance, in the sleep
state/heart rate example in Section 5.2, it is the power of oscillation rather than the
oscillation itself that is related to changes of sleep state. Using a power-based statistic
is a standard signal processing trick, see Learned and Willsky (1995). Nason et al. (1998)
also advocate the use of power-based coefficients in local time-scale modelling.
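As a small illustration of the power-based transform, the sketch below maps each coefficient to the logarithm of its square. The `floor` argument is an assumption added here to guard against taking the logarithm of zero; the article does not say how zero coefficients are handled.

```python
import math


def log_power(coefficients, floor=1e-12):
    """Map each coefficient d to log(d**2); `floor` (hypothetical) avoids log(0)."""
    return [math.log(max(d * d, floor)) for d in coefficients]


coeffs = [0.5, -0.5, 2.0, 0.0]
print(log_power(coeffs))
```

The transform discards the sign of each coefficient, so two oscillations of equal power but opposite phase give the same value.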
4.2 Advantages of the time-ordered non-decimated transforms
With time-ordered non-decimated transforms (NWT, NWPT) the wavelet representation can
be determined at every time point, $t_0$: it will be $\mathbf{X}_{t_0}$. Therefore time-ordered non-decimated
transforms will be able to detect components within Xt that occur at any time point. With
decimated transforms it is not always possible to determine what is happening at arbitrary
time points and so interesting components of Xt that arise “between” decimated coefficients
might be missed.
A further advantage of the time-ordered non-decimated transforms is that each
resolution level (NWT) or wavelet packet (NWPT) can be treated as a variable containing
the same number of points as in the Yt series. Therefore the problem of modelling Yt
with Xt is straightforward and all the power of standard regression-like methods and
verification techniques can be used. Once a successful statistical model has been obtained
one can interrogate the model to find out which variables are influential in the model. Since
the variables contain coefficients with respect to certain wavelets or wavelet packets one
can identify what sorts and shapes of component within Xt are important for modelling
particular aspects of Yt .
The final advantage of the time-ordered non-decimated transforms is that they
are translation-equivariant. For modelling, the important consequence of translation-equivariance
is that components of $X_t$ have coefficients in $\mathbf{X}_t$ which will be the same wherever
the components arise in $X_t$. This will not be the case for the decimated transforms where
the wavelet coefficients of identical, but shifted, components could potentially be different
and thus “confuse” the classifiers.
4.3 Connections to time series analysis
The previous sections demonstrate that we can model Yt in terms of various wavelet packet
combinations of Xt . Let us look at some of these wavelet packet models in more detail and
in terms of familiar time series models. For simplicity we shall concentrate on the wavelet
packets generated by Haar wavelets. Here the filter coefficients for $H$ are $\frac{1}{\sqrt{2}}(1, 1)$ and
for $G$ are $\frac{1}{\sqrt{2}}(1, -1)$. Starting with a series $\{X_t\}_{t=1}^{N}$, where $N = 2^J$, the application of
$H$ and $G$, climbing the first level of the tree depicted in Figure 3, produces the two series:
$$W_t^0 = \frac{1}{\sqrt{2}} (X_t + X_{t-1}),$$
and
$$W_t^1 = \frac{1}{\sqrt{2}} (X_t - X_{t-1}).$$
Since the transforms we use are non-decimated both $W_t^0$ and $W_t^1$ exist for $t = 1, \ldots, N$. Each
of $W_t^0$ and $W_t^1$ appears as a variable in the transformed version of $X_t$, i.e. they form components
in $\mathbf{X}_t$. Both W-models are regressive precursors: noise is taken into consideration when the
model building described in Section 4.1 is executed. For example, if we were using linear
modelling then we could fit the model
$$Y_t = W_t^0 + \epsilon_t,$$
where the usual conditions on the noise $\epsilon_t$ were assumed to hold. So far, there is nothing
new and each of the W-models could be fitted by classical means. Climbing the next level in
the tree we would obtain the following four regressive models:
$$W_t^{00} = (X_t + X_{t-1} + X_{t-2} + X_{t-3})/2,$$
$$W_t^{01} = (X_t + X_{t-1} - X_{t-2} - X_{t-3})/2,$$
$$W_t^{10} = (X_t - X_{t-1} + X_{t-2} - X_{t-3})/2,$$
$$W_t^{11} = (X_t - X_{t-1} - X_{t-2} + X_{t-3})/2,$$
each of which could be fitted to $Y_t$ in a classical modelling exercise.
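The way the second-level Haar packets arise from the first-level ones can be checked numerically. The sketch below cascades the two-tap Haar filters without decimation, so the filter taps are two positions apart at the second level, and compares the result with direct lag combinations. The sign patterns used are (+,+,+,+), (+,+,-,-), (+,-,+,-) and (+,-,-,+). The periodic wrap-around at the start of the series, the helper names and the toy series are assumptions made for this illustration.

```python
import math


def lagged(x, t, k):
    """Value X_{t-k}, using periodic wrap-around at the start of the series."""
    return x[(t - k) % len(x)]


def haar_step(series, lag):
    """Apply the Haar filters H and G with taps `lag` positions apart."""
    n = len(series)
    smooth = [(series[t] + series[(t - lag) % n]) / math.sqrt(2) for t in range(n)]
    detail = [(series[t] - series[(t - lag) % n]) / math.sqrt(2) for t in range(n)]
    return smooth, detail


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # toy series, N = 8

# First level: taps one position apart.
w0, w1 = haar_step(x, lag=1)
# Second level: no decimation, so the taps are now two positions apart.
w00, w01 = haar_step(w0, lag=2)
w10, w11 = haar_step(w1, lag=2)
computed = {"00": w00, "01": w01, "10": w10, "11": w11}

# Direct evaluation of the four lag combinations of X_t, ..., X_{t-3}.
signs = {"00": (1, 1, 1, 1), "01": (1, 1, -1, -1),
         "10": (1, -1, 1, -1), "11": (1, -1, -1, 1)}
direct = {
    name: [sum(s * lagged(x, t, k) for k, s in enumerate(sgn)) / 2
           for t in range(len(x))]
    for name, sgn in signs.items()
}

agree = all(abs(a - b) < 1e-12
            for name in signs
            for a, b in zip(computed[name], direct[name]))
print(agree)
```

Each packet is a series of the same length as the original, which is what allows it to be used directly as a regression variable.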
There are two interlinked reasons why computing models with respect to the wavelet
packet tree is a good idea:
Variety of models. The NWPT computes all the W -models in the wavelet packet tree (there
are K = 2N − 2 of them). The utility of the particular W -models in the scheme can
be understood by examining the time duration/frequency properties of each (in time
series language think of these properties as a localized frequency response function of
each W -model). Each wavelet packet may be indexed by scale, frequency and position
(see Hess-Nielsen and Wickerhauser (1996)). In the above example: $W^0$ and $W^1$ are
both at the finest scale $J - 1$ and frequency index 0 and 1 respectively; $W^{00}$, $W^{01}$, $W^{10}$
and $W^{11}$ are at scale $J - 2$ and increasing frequency indices of 0, 1, 2 and 3. Packet
position is indexed by t in each case. The whole collection of W -models form a grid
over time–frequency space. Each model defines a small region in time-frequency space
and collections of models completely cover the time-frequency space.
The idea is not that we are able to exactly represent every model of the form (1) for
arbitrary $\alpha_1, \ldots, \alpha_N$ but we can compute a set of models spread out over model space.
Any arbitrary model should then be “near-to” one of our W-models or a modest linear
combination of them.
Computational efficiency. The NWPT is computationally efficient and rapidly computes all
W-models in $O(LN^2)$ as shown in Section 2.2. Classical time series methods might
compute individual W-models in $O(LN^2)$ time resulting in an overall computational
effort of $O(LN^3)$.
Finally, note that W -models themselves act as basis functions for all regressive processes
(less than or equal to order $N$, and including the process $X_t$ itself to complete the basis).
Indeed, this happens in the modelling described in Section 4.1. For example, the modelling
might identify that $X_t + \frac{\alpha}{\sqrt{2}} W_t^0 - \frac{\alpha}{\sqrt{2}} W_t^1$ is an important component for predicting $Y_t$
behaviour for some real $\alpha$. Of course
$$X_t + \frac{\alpha}{\sqrt{2}} W_t^0 - \frac{\alpha}{\sqrt{2}} W_t^1 = X_t + \alpha X_{t-1}.$$
This means that if our W -models are not adequate in themselves then any model of the form
of (1) can be fitted, in our case usually by the statistical modelling method used.
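The identity used in this example follows directly from the definitions of the two first-level Haar packets. A short worked derivation, using only the notation already introduced in this section:

```latex
\begin{align*}
\frac{\alpha}{\sqrt{2}} W_t^0 - \frac{\alpha}{\sqrt{2}} W_t^1
  &= \frac{\alpha}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} (X_t + X_{t-1})
   - \frac{\alpha}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} (X_t - X_{t-1}) \\
  &= \frac{\alpha}{2} \left\{ (X_t + X_{t-1}) - (X_t - X_{t-1}) \right\} \\
  &= \alpha X_{t-1},
\end{align*}
so that
\[
X_t + \frac{\alpha}{\sqrt{2}} W_t^0 - \frac{\alpha}{\sqrt{2}} W_t^1 = X_t + \alpha X_{t-1}.
\]
```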
5 Examples
The following two examples exhibit different features of our methodology. The wind speed
modelling example shows how two explanatory time series can be used to predict a response
time series (i.e. multiple explanatory variables) and also identifies the wavelet packets that
are important for prediction. The infant sleep state example is different because the
response time series Yt is a binary variable. In both examples we demonstrate that our
methodology is competitive with methods used in the respective fields (in terms of error rates).
Moreover, our methodology provides interpretable explanations about the important wavelet
packet components for linking the two time series unlike the methods currently used in the
respective fields.
5.1 Wind speed modelling and prediction
Before a wind farm is constructed a great deal of analysis is undertaken to establish whether
a particular target site is suitable. One aspect of this analysis involves the prediction of
the long-term mean wind speed at the target site. Typically, wind speeds are measured by
a pilot anemometer at a height of 10m at the target site for several months. These target
speeds are related to contemporaneous wind speeds measured at a nearby reference site
(UK Meteorological Office station) and a model predicting target speeds from the reference
speeds is constructed. The long-term mean wind speed at the target site can be estimated
from the model and the long-term mean at the Meteorological Office site. Modelling of this
kind is described by Cook (1985) and Haslett and Raftery (1989).
Figure 8 shows hourly wind speeds recorded at two Welsh Meteorological Office stations:
Valley and Aberporth. Valley is located approximately 120km north of Aberporth and they
are mostly separated by Cardigan Bay. In the following example our aim is to model Valley’s
wind speeds (Yt ) in terms of those at Aberporth (Xt ). We show how this model can be used
to predict the wind regime at Valley from future Aberporth values (and outperforms existing
methodology). More importantly, our model is interpretable and explains what types of wind
activity at Aberporth are important for predicting Valley wind speeds.
An established method. Linear regression is an extremely simple and effective
competitor to our methodology and it is widely used in practice (e.g. Hannah et al. (1996)).
First, the data is divided into (typically) twelve 30◦ direction sectors based on the direction
of the wind at Aberporth. Then 12 separate linear regression models are computed one for
each direction sector. Predictions of the wind speed are easily obtained by using the current
wind direction at Aberporth to select one of the twelve regression models and then predict
the windspeed at Valley by α̂ + β̂Xt where α̂ and β̂ are the fitted regression parameters for
that particular sector.
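The established sector-wise regression can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the noise-free toy data are hypothetical, sectors are indexed 0 to 11 (so index s corresponds to the (s+1)-th 30 degree sector), and ordinary least squares is assumed for each fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def sector_of(direction_deg, width=30):
    """Index (0..11) of the direction sector containing the given bearing."""
    return int(direction_deg % 360) // width


def fit_sector_models(ref_speed, ref_dir, target_speed):
    """Fit one regression line per direction sector of the reference site."""
    groups = {}
    for x, d, y in zip(ref_speed, ref_dir, target_speed):
        xs, ys = groups.setdefault(sector_of(d), ([], []))
        xs.append(x)
        ys.append(y)
    # Sectors whose reference speeds are all identical cannot be fitted.
    return {s: fit_line(xs, ys) for s, (xs, ys) in groups.items() if len(set(xs)) > 1}


def predict(models, speed, direction_deg):
    """Predicted target speed: select the sector model, then a + b * speed."""
    a, b = models[sector_of(direction_deg)]
    return a + b * speed


# Hypothetical toy data: two sectors with different linear relationships.
ref_speed = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
ref_dir = [10.0, 15.0, 20.0, 25.0, 185.0, 190.0, 200.0, 205.0]
target_speed = [3.0, 5.0, 7.0, 9.0, 0.5, 1.0, 1.5, 2.0]

models = fit_sector_models(ref_speed, ref_dir, target_speed)
print(predict(models, 5.0, 12.0))
```

In this sketch a direction sector with no training data has no model, so `predict` would raise a `KeyError` for it; a practical implementation would need a fallback.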
Wind speeds are usually non-normal, serially correlated and also subject to measurement
error so one might think that the above simple modelling strategy would not be very good.
On the contrary it does well as the residuals in Figure 10 show.
Using our methods. The time-ordered NWPT was applied to a segment of Xt of length
N = 512 using Haar wavelets. We then constructed the data matrix X using the coefficients
of the Haar NWPT as described in Section 4.1.
[Figure 8: line plot of wind speed (metres per second, 0 to 15) against days since 00:00 on 6th April 1995 (0 to 21).]
Figure 8: Hourly wind speeds from 00:00 on 6th April 1995 at Valley (solid line) and
Aberporth (dashed line). (Data provided by Micon Turbines UK Ltd.)
Initially we used various procedures to model Yt in terms of X. However, residual plots
showed that our models were systematically in error with the error magnitude strongly
related to the wind direction at Aberporth. To improve our model we inserted an extra
wind direction sector factor variable: DIR. (The DIR factor has twelve levels corresponding
to winds in the different 30◦ direction sectors. See Table 2 for a list.)
We then used generalized linear/additive modelling and CART to find a good model for
the Valley wind speeds Y in terms of X. The final model was a generalized linear model
(GLM) that assumed Gamma distributed Y values (wind speeds are positive and skewed to
the right) with a log link function. Initially, there were K = 1022 possible wavelet packets
(variables) plus the DIR factor that we could use. To obtain an idea of which of these wavelet
packets were useful we first used our “naive variable-selection” approach from Section 4.1
with K1 = 51 and labelled the resulting wavelet packets S1 to S51 (n.b. 51 is 5% of 1022). A
backwards selection GLM technique was applied to {S1, . . . , S51, DIR} and 4 variables were
included in the final generalized linear model along with the DIR factor.
Our model and interpretation. Tables 1 and 2 show the coefficients of the final GLM
model
log(Yt ) ∼ µ + S2 + S20 + S35 + S39 + DIR.
The final model is highly interpretable. The DIR factor can be interpreted as a multiplier
reflecting the strength of association between the wind speeds at the two sites. Since the two
sites are in a NNE/SSW line there is a large multiplier of 12 when the wind direction is in sector
DIR7 (when the wind is at right angles to this in DIR10 the effect of all the other variables
is shrunk by the multiplier -0.87). This effect is enhanced when the wind comes from a
southerly or westerly direction rather than a northerly or easterly direction which is natural
given the prevailing wind directions in the UK from the west and south.
However, the DIR factor only multiplies the linear predictor in the final model by a fixed
amount depending on the wind direction. The 4 wavelet packets actually model the variations
in wind speed and they too are interpretable.
Table 1: Significant wavelet packets included in the final GLM. The table shows the value of
the coefficient in the linear model along with the resolution level that the term corresponds
to and its frequency index within that resolution level.
Term        Packet level   Frequency index   Term coefficient (×1000)
Intercept   -              -                 8300
S2          7              0                 -13
S20         2              15                38
S35         1              6                 -10
S39         0              33                -47
Table 2: GLM coefficients for the factor DIR in the final GLM along with the associated
direction sectors.
Term    Direction sector (degrees)   Term coefficient (×1000)
DIR1    0-29                         -31
DIR2    30-59                        -18
DIR3    60-89                        22
DIR4    90-119                       21
DIR5    120-149                      5
DIR6    150-179                      -2
DIR7    180-209                      12000
DIR8    210-239                      -1300
DIR9    240-269                      -1100
DIR10   270-299                      -870
DIR11   300-329                      -720
DIR12   330-359                      aliased
[Figure 9: four panels, a to d, each plotting a basis function against time in days.]
Figure 9: The Haar wavelet packets used in the final model. The vertical dashed line in
each plot corresponds to time t. Each wavelet packet is indexed by a pair: (resolution level,
frequency index within a level). They are (clockwise, from top left): a. S2: A father Haar
wavelet (7,0); b. S20: The wavelet packet (2,15); c. S35: The wavelet packet (1,6); d. S39:
The wavelet packet (0,33).
Figure 9 shows the form of the Haar wavelet packets in the
model. Each plot in Figure 9 contains a vertical dashed line at t = 0 which serves as an origin
for obtaining wavelet packet coefficients of a series. Given this, the interpretation of each
of the plots in Figure 9 is as follows:
a. S2 is a Haar father wavelet at scale 7. At this scale information is averaged over
the previous 4 hours. Inclusion of this wavelet packet indicates that the series
Xt + Xt−1 + Xt−2 + Xt−3 is important for prediction.
b. S20 is a wavelet packet with average oscillation frequency of just over 23 hours. We
assume that this wavelet packet captures daily variation in wind speed. Note however,
that the oscillation only occurs over the previous 5 days. So daily variation is important
for prediction, but only the past 5 days are relevant.
c. S35 is a wavelet packet with average oscillation frequency of 4.7 days. It is well-known
that wind speeds oscillate at or near this frequency. Indeed, this frequency falls into
the middle of the “macrometeorological peak” and is associated with the large-scale
pressure systems passing overhead (Cook (1985), van der Hoven (1957)).
d. S39 is a wavelet packet which mostly oscillates over the whole series at a frequency of 16
hours except for the period around t = 0 where it averages over the immediate 8 hours
into the past and future. It is difficult to attach a direct meteorological interpretation
to this wavelet packet, although wind takes approximately 8 hours to travel between
Aberporth and Valley assuming a mean wind speed of 4 to 5 m s$^{-1}$.
The S39 wavelet packet takes values equally from the future as well as from the past
which is perfectly legitimate mathematically but would be a problem for real-time
prediction.
[Figure 10: plot of the difference between absolute values of residuals (about -4 to 4) against days since 00:00 on 6th April 1995 (22 to 42).]
Figure 10: Plot of difference between absolute values of residuals ($d_t$) against time t. The
solid horizontal line is centred at zero, the horizontal dashed line corresponds to the mean
of the $d_t$ (see text for description).
This example shows the necessity of using the time-ordered NWPT over the time-ordered
NWT in that wavelet packets were shown to be useful for prediction. (Methods based on
classical Fourier methods would not be able to generate this model — the packets included
here are true “time-scale” objects. In other words they give information about the extent to
which these oscillations last as well as the “average frequency” of oscillation.)
Model predictions.
For this example the differences in prediction between the
established method and our new methodology are not large. In fact, both methodologies
often make the same mistakes. Ours is slightly better but we also obtain a wealth of extra
interpretable information as outlined above.
To compare the two methods of prediction we first constructed the residuals $r_t^{\mathrm{EST}}$ and
$r_t^{\mathrm{NEW}}$ for the established and new methods of prediction. Then we computed the difference
between the absolute values of the residuals:
$$d_t = |r_t^{\mathrm{EST}}| - |r_t^{\mathrm{NEW}}|$$
and Figure 10 shows dt against time. If our model works well then its residuals will be small
compared to the established method and dt should be positive. Figure 10 shows that this is
indeed the case and our model is a little better. Our model is also formally better in terms of
mean residual sums of squares (MRSS, ours is 0.088, the established method is 0.094). The
other interesting feature is that our model is better over the early parts of the prediction
interval: over 10 days our MRSS is 0.12, the established MRSS is 0.14; over 5 days our MRSS
is 0.19, the established MRSS is 0.23. Roughly speaking our model is 10% better than the
established method. Although 10% does not sound very much in absolute terms it can make
a lot of difference to the wind power output statistics for a proposed wind farm (wind power
output is related to the cube of the speed) and hence to the economics and viability of the
farm.
[Figure 11: heart rate (beats per minute, 100 to 130; left-hand axis) and sleep state (asleep or awake; right-hand axis) against time since baby was put to bed (1.4 to 2.2 hours).]
Figure 11: Time-series of heart rate and sleep state for a four month old baby beginning at
23:09:42. Heart-rate is labelled by the left-hand axis. Sleep-state is labelled by the right-hand
axis and takes only two states: asleep and awake.
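The residual comparison used for the wind speed predictions in Section 5.1 can be sketched as below. The sketch assumes that a residual is the observed value minus the predicted value and that MRSS is the mean of the squared residuals; the function name and the toy numbers are hypothetical and are not the wind data.

```python
def residual_comparison(observed, pred_est, pred_new):
    """Return (d, mrss_est, mrss_new) for two competing sets of predictions.

    d[t] = |r_est[t]| - |r_new[t]|, so positive values favour the new method.
    """
    r_est = [y - p for y, p in zip(observed, pred_est)]
    r_new = [y - p for y, p in zip(observed, pred_new)]
    d = [abs(a) - abs(b) for a, b in zip(r_est, r_new)]
    mrss_est = sum(r * r for r in r_est) / len(r_est)
    mrss_new = sum(r * r for r in r_new) / len(r_new)
    return d, mrss_est, mrss_new


# Hypothetical toy values.
observed = [5.0, 6.0, 7.0, 8.0]
pred_est = [4.0, 6.5, 7.0, 9.5]
pred_new = [4.5, 6.0, 7.5, 8.5]

d, mrss_est, mrss_new = residual_comparison(observed, pred_est, pred_new)
print(d, mrss_est, mrss_new)
```

A positive mean of `d` together with a smaller MRSS for the new predictions corresponds to the situation described for Figure 10.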
5.2 Infant sleep-state classification
Background. Many pathological phenomena, such as breathing problems, are likely to occur
during sleep. Figure 11 shows a segment of two time series recorded from a four month old
infant who was placed to bed at night. The series shown are of heart rate, Xt , and sleep state,
Yt , (i.e. whether the infant was awake or asleep) sampled every 30 seconds. After about 1.5
hours the infant eventually fell asleep only to wake around half an hour later. Heart rate
was automatically measured using a standard commercial ECG (electrocardiogram) monitor.
Sleep state was manually determined the next day by a trained observer visually interpreting
each 30 second period of the infant’s EEG (electro-encephalogram — “brain-waves”) and EOG
(electro-oculogram — eye movements) that had been concurrently recorded, see Anders
et al. (1971). Whilst this is an accurate and reproducible method of sleep state analysis
(about 80% inter-observer agreement) the determination is time-consuming, laborious and
expensive. The attachment of the recording sensors to the infant’s scalp (EEG) and face (EOG)
may be distressing to both parents and infants, may lead to artifacts by interfering with the
infant’s sleep, and is not practicable in the home environment. Thus such recordings must be
performed in the hospital which further adds to cost and potential distress. By comparison,
ECG recording is relatively unobtrusive, since the leads are attached to the infant’s chest and
parents can be readily taught to do this. Thus a method which can reliably predict sleep
state from heart rate would be clinically valuable, and our method does just this.
Figure 11 shows that the heart rate is low when the baby is asleep and high when it
is awake. Therefore, the mean level of heart rate over certain time scales is likely to be
important for determining sleep state. As well as prediction our model gives us valuable
information about the relationship between sleep state and heart rate. The statistical model
tells us which sorts of wavelet components of Xt are most important for classifying sleep
state.
Table 3: Resolution levels and frequency indices of the time-ordered non-decimated wavelet
packets that were identified as being important for relating Yt to Xt . The “correlation” column
shows the correlation between Yt and the particular wavelet packet coefficients.
Packet ID   Resolution level j   Frequency index   Correlation with Yt
S1          4                    0                 0.92
S2          5                    0                 0.89
S3          3                    0                 0.89
S4          6                    0                 0.89
S5          1                    0                 0.79
Computerised analyses of adult sleep have been documented in the medical literature
using: multidimensional scaling, Burger et al. (1977); adaptive segmentation and fuzzy
subset theory, Gath and Baron (1980); expert system approaches, Ray et al. (1986). In
infants, attention has focussed on further subdividing sleep into two broad categories
“active” and “quiet” sleep — the former being equivalent to adult rapid eye movement
(REM) sleep — during which dreaming and many upper airway breathing disorders occur
(DeHann et al. (1977); Schechtman et al. (1988)). Harper et al. (1987) developed an off-line
system based on cardiac (4 variables from heart rate) and respiratory measures (3 variables
from respiration). Harper et al. (1987) quote success rates of 85% using all 7 variables, 82%
for the four cardiac and 80% for the three respiratory variables. We only concentrate on
two categories here: awake and asleep. More detailed sleep-state categorization could be
undertaken but this is only a matter of choice of statistical classification technique, it is not
because we are using wavelet methods.
Stoffer (1991) presents a series of interesting analyses of sleep-state time series using
Walsh-Fourier analysis. Walsh functions are orthogonal basis functions, similar in some
ways to the Haar basis. However, Walsh functions are not localized in time (like the Fourier
sine and cosine) and so are not really useful for the sorts of analyses we do here. The
Walsh basis does have its uses when the underlying time series is a categorical variable
like sleep-state. Amongst other analyses, Stoffer produces estimates of Walsh coherency
between sleep-state and total number of body movements over time. The Walsh coherency,
like the usual (Fourier) coherency, is a quantity defined in terms of frequency only and not
time-frequency (although it might be interesting to see work on time-localized coherency).
Using our methods. In all we experimented with three statistical modelling methods:
linear discriminant analysis (LDA), logistic regression and CART. Of these three methods
LDA working on the log-transformed absolute values of coefficients was most successful
and is described here. We mainly used segments of length 128 but longer segments of 512
were used with the more sophisticated antedependence models mentioned below.
The explanatory time series Xt from Figure 11 was transformed with the time-ordered
NWPT using Daubechies’ extremal-phase wavelets with 10 vanishing moments and the X
matrix formed. The matrix consisted of N = 128 observations on K = 254 variables (non-decimated
wavelet packets). Since we have more variables than observations we again used
our “naive” variable selection method with K1 = 5, labelled the resulting “top” variables by
S1, . . . , S5 and identified them in Table 3 along with their correlations.
Note that the best variables discovered by the naive selection strategy all have frequency
index 0 (indeed, the next best also has frequency index 0 at scale 2). The wavelet packets at
frequency index 0 are father wavelets corresponding to repeated application of the smoothing
filter H. The father wavelets resemble statistical kernel functions as shown in Figure 12 which
shows father wavelets corresponding to resolution levels of 3, 4 and 5 (S3, S1 and S2). The
appearance of the father wavelets suggests that averaging over scales 1 to 6 in the immediate
past is important for determining sleep state. This corresponds with the observation earlier
that the level of the heart rate over these scales is an important determining factor. The
prospects for real-time prediction are probably not as good, since the father wavelets also
average a short time into the future (e.g. S1 requires about 6 minutes, S2 about 2.5 minutes
and S3 about 11 minutes).
[Figure 12: basis functions (about -0.2 to 0.4) plotted against time in minutes (about -32 to 28).]
Figure 12: The top three wavelet packets for classifying the baby sleep state from heart rate.
The figure shows Daubechies’ extremal phase father wavelets at scales 4 (S1, solid line), 5
(S2, dotted line), and 3 (S3, dashed line). The vertical line shows current time t. (Only three
are shown for clarity).
The top five variables were then used in an LDA analysis which determined which linear
combinations were best for discrimination. The best linear combination turned out to be:
$$12\,S1 - 0.78\,S2 - 0.37\,S3 + 4.5\,S4 - 2.5\,S5. \qquad (10)$$
Thus for this data set S1 is very influential. S1 corresponds to averaging over periods of about
10 minutes (looking at the solid curve in Figure 12). This period of oscillation was found in
analyses carried out by Stoffer (1991) to be present in infants unexposed to maternal alcohol.
Note, however, that here we are saying that the 10 minute cycle is important for linking heart
rate and sleep state, whereas the Stoffer analysis identifies a 9 minute cycle in a spectral
analysis of sleep state alone.
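To illustrate how a linear combination such as (10) is turned into a classification, the sketch below scores each observation and compares the score with a threshold. The weights are the coefficients in (10). Everything else is an assumption made for the illustration: the threshold is taken as the midpoint between the two group mean scores (the usual two-group rule with equal priors), larger scores are taken to indicate the awake group, and the toy rows are hypothetical.

```python
WEIGHTS = (12.0, -0.78, -0.37, 4.5, -2.5)  # coefficients of S1..S5 in (10)


def score(row, weights=WEIGHTS):
    """Discriminant score: the linear combination (10) of one row (S1..S5)."""
    return sum(w * v for w, v in zip(weights, row))


def midpoint_threshold(rows, labels):
    """Midpoint between the mean scores of the groups 0 (asleep) and 1 (awake)."""
    s0 = [score(r) for r, g in zip(rows, labels) if g == 0]
    s1 = [score(r) for r, g in zip(rows, labels) if g == 1]
    return 0.5 * (sum(s0) / len(s0) + sum(s1) / len(s1))


def classify(rows, threshold):
    """Scores at or below the threshold -> asleep (0); above -> awake (1)."""
    return [1 if score(r) > threshold else 0 for r in rows]


def success_rate(predicted, truth):
    """Proportion of observations classified correctly."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical training rows (S1..S5) with known sleep states.
train_rows = [(0.0, 0, 0, 0, 0), (0.1, 0, 0, 0, 0), (1.0, 0, 0, 0, 0), (1.1, 0, 0, 0, 0)]
train_labels = [0, 0, 1, 1]
threshold = midpoint_threshold(train_rows, train_labels)

# Hypothetical new rows, classified with the rule built on the training rows.
new_rows = [(0.2, 0, 0, 0, 0), (0.9, 0, 0, 0, 0), (0.5, 0, 0, 0, 0)]
new_truth = [0, 1, 1]
predicted = classify(new_rows, threshold)
print(predicted, success_rate(predicted, new_truth))
```

The same rule, once fixed on one segment, can be applied unchanged to the variables extracted from a later segment.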
Prediction. To exercise our model we took the next 128 heart rate values, performed
the NWPT analysis, extracted the same top five variables and used the linear combinations
determined by the LDA in (10) to predict the sleep state for the next 128 time periods
(additionally the mean of the next 128 heart rate values was adjusted to be the same as the
previous 128 values to prevent this affecting the analysis as it adds no discriminatory value).
Figure 13 shows the new time-ordered NWPT values projected onto the first two discriminant
axes, the location of the discriminant rule and 13 misclassified observations. With this
classification we achieved a 90% overall success rate (thirteen observations misclassified the
infant to be asleep when it was really awake). Figure 14 shows the new heart rate series
with the true and predicted sleep state. Our method is fooled into thinking that the baby has
gone to sleep just after 2.4 hours, probably by the sharp drop in heart rate. Likewise, just
around 3.1 hours our method is a bit slow in noticing that the baby woke up, but the delay in
noticing is 2 minutes (however, the true record does note that during this period the human
judge was uncertain about the true sleep state).
[Figure 13: scatter plot of the second discriminant axis (about 2934 to 2944) against the first discriminant axis (about -60 to -52); points are labelled 0 or 1, with group means marked M0 and M1.]
Figure 13: Time-ordered NWPT values from the new heart rate time series projected onto
first and second linear discriminant axes. The label of each point shows its true group
membership. The vertical dashed line shows the discriminant rule: observations to the left
are assigned to the asleep group (0), those to the right are assigned to the awake group (1).
The 13 misclassified observations appear in the top-left of the plot. The M0 and M1 labels
refer to the means of the asleep and awake groups used to build the discriminant model.
[Figure 14: heart rate (beats per minute, 100 to 130) and sleep state (asleep or awake) against time since baby was put to bed (2.4 to 3.4 hours).]
Figure 14: New heart rate series with true sleep state (solid) and predicted sleep state
(dashed).
Evaluation. The biomedical time series used here is one of several recorded prospectively
at monthly intervals from a group of infants during the first 5 months of life. As infants
mature their EEG and EOG become easier to classify and conventionally determined sleep
state becomes more accurate with less disagreement between observers. This was reflected
by our LDA models which became better at predicting infant sleep state with increasing age
(around 75% success at 2-3 months, and 75% to 90% success at 4–5 months of age).
As an alternative to our “naive” variable selection method we used the first and second
order antedependence models from Krzanowski et al. (1995) to build discriminatory models
for the sleep state with segments of length N = 512 resulting in K = 1022. Table 4 shows
cross-validated success rates for a particular baby at different stages of development and
suggests that better classification may be possible with the older infant.
Although we have had some success in classifying sleep state by building a model in the
early part of a night and predicting what the sleep state is in later periods it may not be
possible to transfer the exact model to the same infant on a different night or to a different
baby of the same age. However, the same father wavelets nearly always recur in the best
model suggesting that averaging over certain time-scales is important. Only the coefficients
in the sleep state/heart rate model differ across nights or infants.
Table 4: Cross-validated classification success rates using antedependence models of order
1 (AD1) and 2 (AD2) for an infant at different ages. Rates show how accurately the categories
of asleep and awake were classified as well as the overall classification success rate.
Cross-validation was performed with the leave-one-out method of Lachenbruch and Mickey (1968).
Infant age (months)   Model   Success rate: Asleep   Awake   Overall
2                     AD1     0.89                   0.90    0.90
2                     AD2     0.89                   0.89    0.89
3                     AD1     0.95                   0.88    0.94
3                     AD2     0.96                   0.88    0.95
4                     AD1     0.94                   0.86    0.89
4                     AD2     0.95                   0.89    0.91
5                     AD1     0.97                   0.95    0.96
5                     AD2     0.97                   0.95    0.96
6 Discussion and further work
This article proposes the idea of using time-ordered non-decimated wavelet (NWT) or
non-decimated wavelet packet transforms (NWPT) to provide models of a response time series
in terms of flexible representations of an explanatory time series. There are many other
interesting issues some of which we briefly discuss here.
In statistical modelling it is usually the case that addition of extra informative variables
can only improve the model. For example, in the wind speed example wind direction
was found to be an extra important variable. It is perfectly possible to include extra
variables directly and/or insert their time-ordered non-decimated wavelet representations
in our methodology. Adding extra variables in this way could be termed a multiple variable
representation.
The wavelet transformed series could be included in standard time series models or
structural models, see Priestley (1981) or Harvey (1993). Indeed, explicitly lagged variables
included in the models could be beneficial. For example, in the infant example there may be
a lag between changes in heart rate and sleep state. However, due to the multi-scale nature
of the wavelet representations, time lags are implicitly included, although not necessarily to
the time resolution one might require. For example, lower- to mid-level coefficients contain
information about some past and future events as well as the current time, but the distance
into the past and future increases as the resolution level decreases.
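The growth of this time window can be made concrete with a small Python sketch. It assumes the usual non-decimated construction, in which the filter at each coarser level is the previous one with zeros inserted between its taps, and the indexing used in this article, where level J − 1 is the finest. The function name and the filter lengths below are illustrative only and are not taken from the software used for the examples.

```python
def effective_filter_length(level, J, filter_length):
    """Number of consecutive observations that influence one non-decimated
    coefficient at the given resolution level (level J - 1 is the finest).

    Assumes zeros are inserted between filter taps at each coarser level,
    so that after `scale` steps the span is (2**scale - 1)*(L - 1) + 1.
    """
    scale = J - level                      # 1 at the finest level
    return (2 ** scale - 1) * (filter_length - 1) + 1


# Segments of length N = 512 correspond to J = 9.
J = 9
haar_spans = {level: effective_filter_length(level, J, 2) for level in range(J)}
```

Under these assumptions a Haar coefficient at level 8 depends on 2 observations, whereas one at level 4 depends on 32, which is why explicit lagged variables may still be useful when a fine time resolution is needed at a coarse scale.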
The time-ordered NWPT results in a situation where there are many more variables
than observations and we have only used two variable selection procedures. There is a
great need for effective variable selection procedures driven by discrimination or regression
considerations. However, we stress that our contribution is the development of the
time-ordered NWPT and its application to making models between two time series.
We also intend to extend our methodology to two-dimensional processes. For example,
one might have image data concerning manufacturing details and end-product of an
industrial procedure and wish to relate them and provide predictions. Implementations of
two-dimensional “packets” NWTs have been used, see Lang et al. (1995), although presumably
not spatially ordered. We have implemented a two-dimensional spatially ordered NWT but
have not yet used it for the modelling described in this article. Implementation of the
two-dimensional spatially ordered NWPT would be a challenge.
7 Conclusion
This article proposes the use of time-ordered non-decimated wavelet or non-decimated
wavelet packet transforms to represent an explanatory time series. The resulting
representations are then used as variables in a statistical model to provide predictions of
a response time series. The statistical model usually provides valuable information about
which components in the explanatory time series drive the response time series. Standard
statistical models can be used to their full extent as the time-ordered non-decimated wavelet
transforms produce variables with as many coefficients as time series observations.
We exhibit our methodology by modelling the relationship between wind speeds at
two different sites and the relationship between infant sleep state and heart rate. In both
examples our models can predict future values of the response time series from future values
of the explanatory time series more accurately than established methods. More importantly,
our methodology permits a detailed investigation of which components of the explanatory
series drive features in the response series. In both examples the investigation revealed
interesting and interpretable components.
Acknowledgements
The authors would like to thank Peter Fleming and Jeanine Young of the Institute of Child
Health, The Royal Hospital for Sick Children, Bristol and Piers Guy of Micon Turbines, UK,
Ltd for supplying data. Many thanks are due to Wojtek Krzanowski, School of Mathematical
Sciences, Exeter University, who kindly provided software for computing and evaluating
antedependence models. We would like to thank Bernard Silverman and Peter Green for
reading an earlier version of this manuscript and for providing helpful comments.
Nason and Sapatinas were supported in part by EPSRC grant GR/K70236. This work was
completed whilst Sapatinas was a Research Associate in the Department of Mathematics,
University of Bristol, supported by GR/K70236. Sawczenko was supported by grant 186
from the Foundation for the Study of Infant Deaths, Shield Nationwide Ltd, The United Bristol
Healthcare NHS Trust Medical Research Fund and Cot Death Research.
Appendix: Weaving details
A distinction must be made between an “ordinary” NWPT packet (such as the ones in Figure 1)
and a time-ordered NWPT packet. Time-ordered NWPT packets follow the WPT indexing
scheme and are obtained by weaving together coefficients from ordinary NWPT packets.
NWPT packets are as long as the original data. For example, in the ordinary WPT at level
J − 1 there are two packets: packet 0 and 1, each of length $2^{J-1}$ (see Figure 3). With the
ordinary NWPT at level J − 1 there appear to be four packets (see Figure 1). However,
one can also visualise the ordinary NWPT packets at level J − 1 as two time-ordered
non-decimated packets corresponding to the ordinary WPT by interweaving the four packets in
the following way:
• weaving together the packets produced by H D0 and H D1. This produces the
time-ordered non-decimated packet H and corresponds to the time-ordered non-decimated
version of the ordinary WPT packet frequency index 0.
• weaving together the packets produced by G D0 and G D1. This produces the
time-ordered non-decimated packet G and corresponds to the time-ordered non-decimated
version of the ordinary WPT packet frequency index 1.
Therefore the weaving process is a two-stage procedure: choose which time-ordered
non-decimated NWPT packet you require (using the WPT indexing scheme) and then identify the
associated ordinary NWPT packets; weave the associated packets into time-order.
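The weaving at level J − 1 can be sketched in Python for the simplest case. The sketch assumes a Haar smoothing filter with periodic boundaries, that D0 keeps the even-indexed and D1 the odd-indexed filter outputs, and that "H D0" means filtering followed by decimation. These conventions, the function names and the data are illustrative assumptions rather than the implementation used in this article.

```python
import math


def haar_smooth_undecimated(x):
    """Haar smoothing filter applied at every time point (periodic boundary)."""
    n = len(x)
    return [(x[t] + x[(t + 1) % n]) / math.sqrt(2) for t in range(n)]


def decimate(y, shift):
    """shift=0 plays the role of D0 (even indices), shift=1 of D1 (odd indices)."""
    return y[shift::2]


def weave(packets):
    """Take the first coefficient from each packet in order, then the second, ..."""
    return [value for group in zip(*packets) for value in group]


x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]    # made-up 8 point series
filtered = haar_smooth_undecimated(x)
hd0 = decimate(filtered, 0)                      # ordinary packet from H D0
hd1 = decimate(filtered, 1)                      # ordinary packet from H D1
time_ordered_h = weave([hd0, hd1])               # time-ordered packet H
```

Under these assumptions the woven packet is as long as the data and equals the filter output evaluated at every time point, which is the sense in which weaving restores time order.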
In general, to obtain the correct time ordering the ordinary NWPT packets are not taken
sequentially but with reference to the root node. For example, let us refer to level 1 in
Figures 3 and 1. Suppose that we wished to obtain time-ordered non-decimated wavelet
packet frequency index 1 (or in operator notation the packet produced by H followed
by G). This corresponds to ordinary NWPT packet indices 01, 03, 21 and 23 using the
base 4 notation from Section 2.2 (each of the cases where a G operator follows a H
operator regardless of decimation). To produce correct time-ordering we take coefficients
successively from the ordinary NWPT packets in the order 01, 21, 03 and then 23. This
ordering occurs because the shift of wavelet packets is finer nearer the root node. The
transition from level 3 to 2 encodes a shift of one position, the transition from level 2 to 1
encodes a shift of two positions. So the “distance” of 21 to 01 is only 1, from 03 to 01 is 2
and from 23 to 01 is 3. So, relative to 01, 21 has undergone a unit shift, 03 a two unit shift
and 23 both a unit and two unit (= 3 unit) shift.
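The shift arithmetic in this example can be checked with a short Python sketch. It assumes, as the example above implies, that the leftmost base 4 digit corresponds to the step nearest the root, that digits 2 and 3 mark a shifted (D1) step, and that a shifted step k places from the root contributes 2^k positions. The function names are hypothetical.

```python
def base4_digits(index, depth):
    """Base 4 digits of a packet index, the digit nearest the root first."""
    digits = []
    for _ in range(depth):
        digits.append(index % 4)
        index //= 4
    return digits[::-1]


def packet_shift(index, depth):
    """Total shift of an ordinary NWPT packet relative to the unshifted one.

    Assumption: a digit of 2 or 3 at step k from the root (k = 0, 1, ...)
    denotes a shifted step and contributes 2**k positions.
    """
    return sum(2 ** k for k, d in enumerate(base4_digits(index, depth)) if d >= 2)


labels = ["01", "03", "21", "23"]
shifts = {label: packet_shift(int(label, 4), 2) for label in labels}
order = sorted(labels, key=lambda label: shifts[label])
```

Sorting the four packets by their total shift reproduces the order 01, 21, 03, 23 stated above.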
To obtain the ordinary NWPT indices associated with a time-ordered NWPT frequency
index r at level j, say ($r = 0, \ldots, 2^{J-j} - 1$, $j = 0, \ldots, J - 1$; see Figure 3 for
details of the WPT indexing scheme), the following recursive procedure can be used:
1. Convert the (decimal) time-ordered non-decimated wavelet packet frequency index r
into a binary string s. Convert s into decimal, but this time assuming that s is in base 4. Call
the result p. Writing $\xrightarrow[b]{a}$ for "convert from base a into base b", some example
conversions are:
$1 \xrightarrow[2]{10} 1 \xrightarrow[10]{4} 1$,
$2 \xrightarrow[2]{10} 10 \xrightarrow[10]{4} 4$,
$3 \xrightarrow[2]{10} 11 \xrightarrow[10]{4} 5$.
2. For i = j, ..., J − 1 do
       e <- 2^(2*J-2*i-1)
       p <- c(p, p+e)
This example contains partial S code (see Becker, Chambers and Wilks (1988)). The first line
sets $e = 2^{2J-2i-1}$; the second line uses the concatenation operator c that pastes together
two vectors, i.e.
$c(\{x_i\}_{i=1}^{n}, \{y_j\}_{j=1}^{m}) = (x_1, \ldots, x_n, y_1, \ldots, y_m)$.
As an example, suppose that the ordinary NWPT indices for the time-ordered non-decimated
wavelet packet at level 1, frequency index 1, for the 8 point data set are again required
(so J = 3 and j = 1). After the binary to base 4 conversion: p = 1. In the loop: setting i = 1
we obtain e = 8 and p = (1, 9). Then setting i = 2 we obtain e = 2 and p = (1, 9, 3, 11),
which are the required indices (in base 4: 01, 21, 03, 23). Time-ordered coefficients are
obtained from these four ordinary packets by taking the first coefficient from each in order,
then the second coefficient from each in order, and so on.
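The two steps and the final weaving can be collected into a short Python sketch (Python is used here for illustration, rather than the S of the fragment above). The function names are hypothetical and the packet contents in the usage lines are placeholders, not data from this article.

```python
def ordinary_packet_indices(r, j, J):
    """Ordinary NWPT packet indices, in weaving order, for the time-ordered
    packet with frequency index r at level j (data length 2**J)."""
    s = format(r, "b")                 # step 1: decimal r -> binary string s
    p = [int(s, 4)]                    # read s as if it were written in base 4
    for i in range(j, J):              # step 2: i = j, ..., J - 1
        e = 2 ** (2 * J - 2 * i - 1)
        p = p + [q + e for q in p]     # S: p <- c(p, p+e)
    return p


def to_base4(index, width):
    """Base 4 label of an index, zero-padded to `width` digits."""
    digits = ""
    for _ in range(width):
        digits = str(index % 4) + digits
        index //= 4
    return digits


def weave(packets):
    """First coefficient from each packet in order, then the second, and so on."""
    return [value for group in zip(*packets) for value in group]


indices = ordinary_packet_indices(r=1, j=1, J=3)
labels = [to_base4(q, 2) for q in indices]

# Placeholder packets of length 2, one per ordinary packet index.
packets = {1: ["a0", "a1"], 9: ["b0", "b1"], 3: ["c0", "c1"], 11: ["d0", "d1"]}
time_ordered = weave([packets[q] for q in indices])
```

For the 8 point example the function returns the indices 1, 9, 3 and 11 (base 4 labels 01, 21, 03, 23), in agreement with the worked example. At level J − 1 = 2 it returns the pairs (0, 2) and (1, 3) for frequency indices 0 and 1, matching the H D0/H D1 and G D0/G D1 pairs described earlier in this appendix.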
References
Abramovich, F., & Benjamini, Y. (1996). Adaptive thresholding of wavelet coefficients.
Computat. Stat. Data Anal., 22, 351–361.
Abramovich, F., & Silverman, B.W. (1998). Wavelet decomposition approaches to statistical
inverse problems. Biometrika, 85, (to appear).
Abramovich, F., Sapatinas, T., & Silverman, B.W. (1998). Wavelet thresholding via a Bayesian
approach. J. R. Statist. Soc. B, 60, (725–749).
Anders, T.F., Emde, R.N., & Parmelee, A.H. (eds). (1971). A Manual of Standardized
Terminology, Techniques and Criteria for Scoring of States of Sleep and Wakefulness
in Newborn Infants. Los Angeles: UCLA Brain Information Service.
Antoniadis, A. (1996). Smoothing noisy data with tapered Coiflets series. Scand. J. Statist.,
23, 313–330.
Antoniadis, A., Grégoire, G., & Nason, G.P. (1999). Density and hazard rate estimation for
right censored data using wavelet methods. J. R. Statist. Soc. B, 61, (to appear).
Becker, R.A., Chambers, J. M., & Wilks, A. R. (1988). The New S Language. Pacific Grove,
California: Wadsworth.
Beylkin, G. (1992). On the representation of operators in bases of compactly supported
wavelets. SIAM. J. Num. Anal., 29, 1716–1740.
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and Regression
Trees. Belmont, Calif.: Wadsworth.
Bruce, A., & Gao, H.-Y. (1996). Applied Wavelet Analysis with S-Plus. New York:
Springer-Verlag.
Buckheit, J.B., & Donoho, D.L. (1995). Improved linear discrimination using time-frequency
dictionaries. Pages 540–551 of: Laine, A.F., Unser, M.A., & Wickerhauser, M.V. (eds),
Proc. SPIE. Wavelet Applications in Signal and Image Processing III, vol. 2569. Bellingham,
Washington: SPIE.
Burger, D., Cantani, P., & West, J. (1977). Multidimensional analysis of sleep
electrophysiological signals. Biol. Cybern., 26, 131–139.
Chen, S.S., Donoho, D.L., & Saunders, M.A. (1996). Atomic decompositions by basis pursuit.
Tech. rept. 479. Department of Statistics, Stanford University, Stanford.
Chipman, H.A., Kolaczyk, E.D., & McCulloch, R.E. (1997). Adaptive Bayesian Wavelet
Shrinkage. J. Am. Statist. Ass., 92, 1413–1421.
Chui, C.K. (1992). An Introduction to Wavelets. London: Academic Press.
Clyde, M., Parmigiani, G., & Vidakovic, B. (1998). Multiple shrinkage and subset selection in
wavelets. Biometrika, 85, (391–401).
Cochran, W.G., & Cox, G.M. (1957). Experimental Designs. New York: Wiley.
Cohen, I., Raz, S., & Malah, D. (1997). Orthonormal shift-invariant wavelet packet
decomposition and representation. Sig. Proc., 57, 251–270.
Coifman, R.R., & Donoho, D.L. (1995). Translation-invariant de-noising. Pages 125–150 of:
Antoniadis, A., & Oppenheim, G. (eds), Wavelets and Statistics, Lecture Notes in Statistics
103. New York: Springer-Verlag.
Coifman, R.R., & Saito, N. (1994). Constructions of local orthonormal bases for classification
and regression. Compt. Rend. Acad. Sci. Paris Ser. A, 319, 191–196.
Coifman, R.R., & Wickerhauser, M.V. (1992). Entropy-based algorithms for best-basis
selection. IEEE Trans. Inf. Theor., 38, 713–718.
Cook, N.J. (1985). The Designer’s Guide to Wind Loading of Building Structures. London:
Butterworths.
Crouse, M.S., Nowak, R.D., & Baraniuk, R.G. (1998). Wavelet-based statistical signal processing
using hidden Markov models. IEEE Trans. Sig. Proc., 46, (to appear).
Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets. Comms. Pure
Appl. Math., 41, 909–996.
Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia: SIAM.
DeHann, R.J., Patrick, Chess, G., & Jaco, N. (1977). Definition of sleep state in the newborn
infant by heart rate analysis. Am. J. Obstet. Gynecol., 127, 753–758.
Delyon, B., & Juditsky, A. (1995). Estimating wavelet coefficients. Pages 151–168 of:
Antoniadis, A., & Oppenheim, G. (eds), Wavelets and Statistics, Lecture Notes in Statistics
103. New York: Springer-Verlag.
Donoho, D.L. (1992). Interpolating wavelet transforms. Tech. rept. 408. Department of
Statistics, Stanford University, Stanford.
Donoho, D.L. (1995). Nonlinear solution of linear inverse problems by wavelet-vaguelette
decomposition. Appl. Comput. Harm. Anal., 2, 101–126.
Donoho, D. L., & Johnstone, I.M. (1994a). Ideal denoising in an orthonormal basis chosen
from a library of bases. Compt. Rend. Acad. Sci. Paris Ser. A, 319, 1317–1322.
Donoho, D.L., & Johnstone, I.M. (1994b). Ideal spatial adaptation by wavelet shrinkage.
Biometrika, 81, 425–455.
Donoho, D.L., & Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet
shrinkage. J. Am. Statist. Ass., 90, 1200–1224.
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., & Picard, D. (1995). Wavelet shrinkage:
asymptopia? (with discussion). J. R. Statist. Soc. B, 57, 301–337.
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., & Picard, D. (1996). Density estimation by
wavelet thresholding. Ann. Statist., 24, 508–539.
Gao, H.-Y. (1997). Choice of thresholds for wavelet shrinkage estimate of the spectrum. J.
Time Series Anal., 18, 231–251.
Gath, I., & Baron, E. (1980). Computerized method for scoring of polygraphic sleep
recordings. Comput. Prog. Biomed., 11, 217–223.
George, E.I., & McCulloch, R.E. (1993). Variable selection via Gibbs sampling. J. Am. Statist.
Ass., 88, 881–889.
Hall, P., & Nason, G.P. (1997). On choosing a non-integer resolution level when using wavelet
methods. Statist. Probab. Lett., 34, 5–11.
Hall, P., & Patil, P. (1995). Formulae for mean integrated squared error of nonlinear
wavelet-based density estimators. Ann. Statist., 23, 905–928.
Hannah, P., Palutikof, J.P., Rainbird, P.B., & Shein, K. (1996). Prediction of extreme wind
speeds at wind energy sites. ETSU Report W/11/00427/REP.
Harper, R.M., Schechtman, V.L., & Kluge, K.A. (1987). Machine classification of infant sleep
state using cardiorespiratory measures. Electroenceph. Clin. Neurophysiol., 67, 379–387.
Harvey, A. (1993). Time Series Models. 2nd edn. New York: Harvester Wheatsheaf.
Haslett, J., & Raftery, A. E. (1989). Space-time modelling with long-memory dependence:
assessing Ireland’s wind power resource. J. R. Statist. Soc. C, 38, 1–50.
Hastie, T.J. & Tibshirani, R.J. (1990). Generalized Additive Models. London: Chapman and
Hall.
Hess-Nielsen, N., & Wickerhauser, M. V. (1996). Wavelets and time-frequency analysis. Proc.
IEEE, 84, 523–540.
Jawerth, B., & Sweldens, W. (1994). An overview of wavelet based multiresolution analyses.
SIAM Rev., 36, 377–412.
Johnstone, I.M., & Silverman, B.W. (1997). Wavelet threshold estimators for data with
correlated noise. J. R. Statist. Soc. B, 59, 319–351.
Kovac, A. (1997). Wavelet thresholding for unequally spaced data. Ph.D. thesis, Department
of Mathematics, University of Bristol, Bristol.
Krzanowski, W.J., Jonathan, P., McCarthy, W.V., & Thomas, M.R. (1995). Discriminant analysis
with singular covariance matrices: methods and application to spectroscopic data. J. R.
Statist. Soc. C, 44, 101–115.
Lachenbruch, P.A., & Mickey, M.R. (1968). Estimation of error rates in discriminant analysis.
Technometrics, 10, 1–11.
Lang, M., Guo, H., Odegard, J.E., Burrus, C.S., & Wells, R.O. (1995). Nonlinear processing of
a shift invariant DWT for noise reduction. Pages 640–651 of: Szu, H.H. (ed), Proc. SPIE.
Wavelet Applications II, vol. 2491. Bellingham, Washington: SPIE.
Learned, R.E., & Willsky, A.S. (1995). A wavelet packet approach to transient signal
classification. Appl. Comput. Harm. Anal., 2, 265–278.
Mallat, S. G. (1989a). Multiresolution approximations and wavelet orthonormal bases of
L2 (R). Trans. Am. Math. Soc., 315, 69–87.
Mallat, S. G. (1989b). A theory for multiresolution signal decomposition: the wavelet
representation. IEEE Trans. Pattn Anal. Mach. Intell., 11, 674–693.
Mallat, S.G., & Zhang, Z. (1993). Matching pursuit in a time-frequency dictionary. IEEE Trans.
Sig. Proc., 41, 3397–3415.
Mardia, K.V., Kent, J.T., & Bibby, J.M. (1979). Multivariate Analysis. London: Academic Press.
McCoy, E.J., Percival, D.B., & Walden, A.T. (1995). Spectrum estimation via wavelet
thresholding of multitaper estimators. Tech. rept. TR-95-14. Statistics Section, Imperial
College, London.
McCullagh, P., & Nelder, J.A. (1989). Generalized Linear Models. 2nd edn. London: Chapman
and Hall.
Meyer, Y. (1992). Wavelets and Operators. Cambridge: Cambridge University Press.
Mintzer, F. (1982). On half-band, third-band and Nth band FIR filters and their design. IEEE
Trans. Acoust. Speech Sig. Proc., 30, 734–738.
Mintzer, F. (1985). Filters for distortion-free two-band multirate filter banks. IEEE Trans.
Acoust. Speech Sig. Proc., 33, 626–630.
Moulin, P. (1994). Wavelet thresholding techniques for power spectrum estimation. IEEE
Trans. Sig. Proc., 42, 3126–3136.
Nason, G.P. (1996). Wavelet shrinkage using cross-validation. J. R. Statist. Soc. B, 58, 463–479.
Nason, G.P., & Silverman, B.W. (1994). The discrete wavelet transform in S. J. Comput. Graph.
Statist., 3, 163–191.
Nason, G.P., & Silverman, B.W. (1995). The stationary wavelet transform and some statistical
applications. Pages 281–300 of: Antoniadis, A., & Oppenheim, G. (eds), Wavelets and
Statistics, Lecture Notes in Statistics 103. New York: Springer-Verlag.
Nason, G.P., von Sachs, R., & Kroisandt, G. (1998). Adaptive estimation of the evolutionary
wavelet spectrum. (submitted for publication).
Neumann, M.H., & von Sachs, R. (1997). Wavelet thresholding in anisotropic function classes
and application to adaptive estimation of evolutionary spectra. Ann. Statist., 25, 38–76.
Ogden, R.T. (1997). Essential wavelets for statistical applications and data analysis. Boston:
Birkhäuser.
Ogden, R.T., & Parzen, E. (1996). Change-point approach to data analytic wavelet
thresholding. Statist. Comput., 6, 93–99.
Percival, D.B., & Guttorp, P. (1994). Long-memory processes, the Allan variance and wavelets.
Pages 325–344 of: Foufoula-Georgiou, E., & Kumar, P. (eds), Wavelets in Geophysics. San
Diego: Academic Press.
Percival, D.B., & Mofjeld, H.O. (1997). Analysis of subtidal coastal sea level fluctuations using
wavelets. J. Am. Statist. Ass., 92, 868–880.
Pesquet, J.C., Krim, H., & Carfantan, H. (1996). Time-invariant orthonormal wavelet
representations. IEEE Trans. Sig. Proc., 44, 1964–1970.
Priestley, M.B. (1981). Spectral Analysis and Time Series. London: Academic Press.
Ray, S., Lee, W., Morgan, C., & Airth-Kindree, W. (1986). Computer sleep state scoring — an
expert system approach. Int. J. Biom. Comput., 19, 43–61.
Saito, N. (1994). Local feature extraction and its applications using a library of bases. Ph.D.
thesis, Yale University, New Haven.
Saito, N., & Coifman, R.R. (1996). On local feature-extraction for signal classification. Z.
Angew. Math. Mech., 76, 453–456.
Schechtman, V.L., Kluge, K.L., & Harper, R.M. (1988). Time-domain system for assessing
variation in heart-rate. Med. Bio. Eng. Comput., 26, 367–373.
Shensa, M.J. (1992). Discrete wavelet transforms: wedding the à trous and Mallat algorithms.
IEEE Trans. Sig. Proc., 40, 2464–2482.
Smith, M.J.T., & Barnwell, T.P. (1986). Exact reconstruction techniques for tree-structured
subband coders. IEEE Trans. Acoust. Speech Sig. Proc., 34, 434–441.
Stoffer, D.S. (1991). Walsh-Fourier analysis and its statistical applications (with comments).
J. Am. Statist. Ass., 86, 461–485.
Strang, G. (1993). Wavelet transforms versus Fourier transforms. Bull. (New Series) Am. Math.
Soc., 28, 288–305.
van der Hoven, I. (1957). Power spectrum of horizontal wind speed in the frequency range
from 0.0007 to 900 cycles per hour. J. Meteorol., 14, 160–164.
Vidakovic, B. (1998). Nonlinear wavelet shrinkage with Bayes rules and Bayes factors. J. Am.
Statist. Ass., 93, 173–179.
von Sachs, R. (1996). Adaptively wavelet-smoothed Wigner estimates of evolutionary spectra.
Z. Angew. Math. Mech., 76, 71–74.
von Sachs, R., & Schneider, K. (1996). Wavelet smoothing of evolutionary spectra by nonlinear
thresholding. Appl. Comput. Harm. Anal., 3, 268–282.
von Sachs, R., Nason, G.P., & Kroisandt, G. (1996). Spectral representation and estimation for
locally-stationary wavelet processes. Proceedings of the workshop “Spline functions and
wavelets”: Montreal. (to appear).
Walden, A.T., & Contreras Cristan, A. (1997). The phase-corrected undecimated discrete
wavelet packet transform and the recurrence of high latitude interplanetary shock waves.
Tech. rept. TR-97-03. Statistics Section, Imperial College, London.
Wickerhauser, M.V. (1994). Adapted Wavelet Analysis from Theory to Software. Wellesley,
Massachusetts: A.K. Peters.