MOSAIC Group, Prof. Ivo F. Sbalzarini, ETH Zurich
Bachelor Thesis
Estimating the Hurst Exponent
Roman Racine
April 14, 2011
Abstract
The Hurst exponent is a dimensionless estimator for the self-similarity
of a time series. Initially defined by Harold Edwin Hurst to describe
regularities in the water level of the Nile, it now has applications in
medicine and finance. Meaningful values lie in the range [0, 1]. Different
methods for estimating the Hurst exponent have been evaluated: the
classical "rescaled range" method developed by Harold Edwin Hurst,
today's standard method, and two wavelet-based methods, one of which,
developed by Gloter and Hoffmann, is proven to have the best rate of
convergence [4]. A core part of the project was to write software
implementing and comparing the different algorithms.
Contents

1 Problem description
  1.1 Definition of the Hurst exponent
2 Small overview over wavelets
3 Wavelet estimator
4 Other estimators
  4.1 The standard method
  4.2 The optimal method
5 Implementation
6 Measurements
7 Results
  7.1 Bias
  7.2 Minimum input length
  7.3 Comparison of RS, standard, D4 and D16
  7.4 Errors depending on H
  7.5 Noise
  7.6 Runtime
8 Conclusions
9 Higher dimensional cases
  9.1 Standard algorithm
  9.2 Rescaled range algorithm
  9.3 Other algorithms
  9.4 Measurement results
    9.4.1 Synthetic data
    9.4.2 Real data
10 Future work
  10.1 Multidimensional wavelet estimator
  10.2 Wavelet packet algorithm
A Other applications
  A.1 Stock market
  A.2 Picking on seismic signals
  A.3 Images
B Software manual
  B.1 Overview
  B.2 Compiling and installing
  B.3 Use
    B.3.1 Binaries
    B.3.2 Web service
  B.4 Delivered files
1 Problem description
The Hurst exponent is a dimensionless estimator for the self-similarity of a time
series. Initially defined by Harold Edwin Hurst to describe regularities in the
water level of the Nile, it now finds applications in medicine and finance. The
Hurst exponent H is related to the fractal dimension D by the relation D = 2 − H.
For white noise H = 0, for Brownian noise H ≈ 0.5; H = 1 indicates directed
motion. Applied to financial data such as stock prices, the Hurst exponent can
be interpreted as a measure of trendiness: for H < 0.5 volatility is high and the
stock price is anti-trended; for H = 0.5 the stock price behaves like a Brownian
process and shows no trend; for H > 0.5 the stock price has a trend. Some people
believe that an estimate of the Hurst exponent may yield valuable information
on the long-term behaviour of a particular stock.
A similar interpretation exists for virology applications: H < 0.5 means
that a virus is locally confined, H ≈ 0.5 means the virus undergoes Brownian
(and therefore undirected) motion, and H > 0.5 indicates directed motion [9, 6].
1.1 Definition of the Hurst exponent

There are several ways to formally define the Hurst exponent. The oldest
description, developed by Harold Hurst himself, is as follows:

E[R(n)/S(n)] = C n^H,  n → ∞    (1)
The left-hand side is also known as the expected value of the rescaled range
(see [1]). R(n) is defined on a time series X_i, i = 1, 2, ..., n as follows:

R(n) = max(X_i, i = 1, 2, ..., n) − min(X_i, i = 1, 2, ..., n)    (2)

S(n) is the standard deviation and C an arbitrary constant.
This immediately yields a procedure for computing the Hurst exponent.
For several window sizes i, i = 1, 2, ..., N/5 (where N is the length of the
time series) one computes

a_i = E[R(i)/S(i)]

For every size i, the right-hand side is estimated by partitioning the time series
into chunks of size i. On every chunk, R(i) and S(i) for this particular chunk
are computed. The expected value E[R(i)/S(i)] for the whole time series is then
estimated by taking the average over the sub-results of all chunks. This leads
to the equation

E[a_i] = C i^H

and therefore

log(E[a_i]) = log(C) + H log(i)    (3)

Using more than two different i, these equations are generally overdetermined
provided there is enough input data, and they can be solved using a least squares
fit. The slope of the fit is the estimated value for H; the constant offset is the
estimated value for log(C), which is unimportant in this case. Unfortunately, this
procedure generally shows poor convergence and bias, as will be shown later.
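To make the procedure concrete, the following C++ sketch implements the chunking and the log-log fit described above. It is a minimal illustration, not the delivered implementation; the window schedule (powers of two from 8 up to N/5) is just one possible choice.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Rescaled-range sketch: for each window size i, cut the series into
// chunks of length i, average R/S over the chunks, then fit a line
// through (log i, log a_i); the slope of the fit estimates H.
double rs_hurst(const std::vector<double>& x) {
    const std::size_t N = x.size();
    std::vector<double> logi, loga;
    for (std::size_t i = 8; i <= N / 5; i *= 2) {        // window sizes up to N/5
        double sum = 0.0;
        std::size_t chunks = 0;
        for (std::size_t s = 0; s + i <= N; s += i, ++chunks) {
            auto first = x.begin() + s, last = first + i;
            double mean = std::accumulate(first, last, 0.0) / i;
            double var = 0.0;
            for (auto it = first; it != last; ++it) var += (*it - mean) * (*it - mean);
            double S = std::sqrt(var / i);               // standard deviation S(i)
            double R = *std::max_element(first, last)
                     - *std::min_element(first, last);   // range R(i), eq. (2)
            if (S > 0.0) sum += R / S;
        }
        if (chunks == 0) continue;
        logi.push_back(std::log(double(i)));
        loga.push_back(std::log(sum / chunks));          // log of estimated a_i
    }
    const std::size_t m = logi.size();
    if (m < 2) return 0.0;                               // not enough points to fit
    double sx = 0, sy = 0, sxx = 0, sxy = 0;             // closed-form LS slope
    for (std::size_t k = 0; k < m; ++k) {
        sx += logi[k]; sy += loga[k];
        sxx += logi[k] * logi[k]; sxy += logi[k] * loga[k];
    }
    return (m * sxy - sx * sy) / (m * sxx - sx * sx);    // slope = H, eq. (3)
}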
Using this method, Harold Hurst calculated the Hurst exponent of the yearly
Nile high-water level, which has been recorded for centuries. Interestingly, the
estimated value was H = 0.77, meaning that this time series shows some memory.
This makes sense, as for example large lakes on the upper Nile could act as
"memory" between different years.
The rest of this thesis focuses on wavelet-based estimators, which are supposed
to show better convergence as well as better computational behaviour. In addition,
a more modern method than rescaled range is introduced as a benchmark.
2 Small overview over wavelets
Wavelets are a tool used in signal analysis that is in some ways similar to the
Fourier transform. The main difference is that the wavelet transform is localized
in the time as well as the frequency domain, while the Fourier transform is only
localized in the frequency domain. Depending on the transformation method
chosen, there may also be advantages in computational effort: the wavelet
transform can be computed in O(n), as opposed to O(n log n) for the Fourier
transform. The result of a Fourier analysis shows which frequencies occur in the
signal, but contains no information about where in the signal they occur. The
wavelet transform gives more freedom: depending on the chosen wavelet function,
one can trade off resolution in the time domain against resolution in the
frequency domain (see [2]).
Figure 1: Three synthetic traces with different Hurst exponents
Throughout the rest of this thesis, only discrete wavelets will be used, as
the input signal is discrete as well. The wavelets used include the Daubechies
wavelets, named after Ingrid Daubechies, as well as Coiflets, which were
developed by Ingrid Daubechies and Ronald Coifman. The use of other discrete
wavelets would be possible as well.
A wavelet generally consists of two functions, a low-pass and a high-pass
filter. Depending on the design of the wavelet, these filters may have special
properties, e.g. the two filters being orthogonal. (All wavelets treated in this
thesis are orthogonal.)
A wavelet decomposition consists of several subsequent applications of
the wavelet to a time series. The most common way to do this is as follows
(a sketch in code follows the list):
1. Decompose the signal; this yields a low-pass filtered and a high-pass
filtered new signal.
2. Both signals are downsampled by two. The decomposed high-pass filtered
signal is called an "octave" of the time series, as it contains the
appropriately sampled signal for a certain octave.
3. Reiterate on the low-pass filtered signal.
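The following C++ sketch shows one such decomposition step. It assumes h and g hold the wavelet's low-pass and high-pass filter taps (e.g. the four D4 coefficients); edge handling is deliberately omitted here, as padding is discussed in section 5.

#include <cstddef>
#include <vector>

struct OctavePair { std::vector<double> approx, detail; };

// One pyramid step: convolve with low-pass h and high-pass g,
// then downsample by two (the step size of the outer loop).
OctavePair decompose_step(const std::vector<double>& x,
                          const std::vector<double>& h,
                          const std::vector<double>& g) {
    OctavePair out;
    for (std::size_t n = 0; n + h.size() <= x.size(); n += 2) {
        double a = 0.0, d = 0.0;
        for (std::size_t k = 0; k < h.size(); ++k) {
            a += h[k] * x[n + k];   // low-pass: next approximation signal
            d += g[k] * x[n + k];   // high-pass: one "octave" of detail
        }
        out.approx.push_back(a);
        out.detail.push_back(d);
    }
    return out; // reiterate decompose_step on out.approx for further octaves
}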
For the wavelets chosen in this thesis there are some clever techniques which
allow the decomposition to be done in O(n) time and O(n) space. However, this
is not the only decomposition in wide use. Another one is the so-called
stationary wavelet transform, which skips the downsampling described above.
The advantage is that no information is lost due to downsampling; the drawback
is that this decomposition needs O(n log n) time and space. (Strictly speaking,
from an information-theoretical point of view no information is lost by applying
the standard discrete wavelet transform either, as it is reversible. However,
some of the information is not accessible to algorithms operating on just one
octave at a time, which in some cases justifies the use of the stationary
wavelet transform.)

Figure 2: Scheme of a wavelet decomposition, source: Wikipedia
3 Wavelet estimator
The basic equation used to derive the wavelet estimator is as follows (from [4]):

E[Q_{j+1}/Q_j] = C 2^{−2H}    (4)

Q_j is the "energy level" of a certain "octave" (i.e. the high-pass filtered output
of a certain stage of the wavelet decomposition):

Q_j := Σ_k d²_{j,k}    (5)

where d_{j,k} are the coefficients corresponding to the jth octave. C is an arbitrary
unknown constant.
Knowing this, an estimator can be constructed by estimating Q_j for different j.
Defining

a_j := Q_{j+1}/Q_j

E[a_j] = C 2^{−2H}    (6)

log(E[a_j]) = log(C) − 2H log(2)    (7)

Doing this for several a_j leads to an overdetermined equation system, and
using a linear least squares fit, the Hurst exponent can be estimated as
−0.5 times the slope. This procedure can be run for arbitrary discrete wavelets.
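One way to realize the fit is to regress log2 of the energy levels against the octave index, so that the scaling law (4) makes the slope −2H; this regression variant is an assumption made for illustration, as is the assumption that the detail coefficients of each octave are already available from the decomposition.

#include <cmath>
#include <cstddef>
#include <vector>

// Wavelet estimator sketch: compute the energy level Q_j of every
// octave, regress log2(Q_j) against j, and recover H as -0.5 * slope.
double wavelet_hurst(const std::vector<std::vector<double>>& detail_octaves) {
    std::vector<double> js, logq;
    for (std::size_t j = 0; j < detail_octaves.size(); ++j) {
        double q = 0.0;
        for (double d : detail_octaves[j]) q += d * d;   // Q_j, eq. (5)
        if (q > 0.0) { js.push_back(double(j)); logq.push_back(std::log2(q)); }
    }
    const std::size_t m = js.size();
    if (m < 2) return 0.0;                               // not enough octaves
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t k = 0; k < m; ++k) {
        sx += js[k]; sy += logq[k];
        sxx += js[k] * js[k]; sxy += js[k] * logq[k];
    }
    double slope = (m * sxy - sx * sy) / (m * sxx - sx * sx);
    return -0.5 * slope;                                 // H estimate
}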
4 Other estimators

4.1 The standard method
What is called "the standard method" in this thesis is an algorithm based on
the following equations (see [9]), where x_i, i = 1, 2, ..., N is the time series. This
method makes use of two different power laws. First, one uses the fact that for
a time series x_1, x_2, ..., x_N with N elements, the right-hand side of the
following equation follows a power law scaling with Δn:

μ_ν(Δn) := 1/(N − Δn) Σ_{n=0}^{N−Δn−1} |x_{n+Δn} − x_n|^ν    (8)

For every ν, μ_ν(Δn) follows a power law in the following sense:

E[μ_ν(Δn)] = C_1 Δn^{α_ν}    (9)

α_ν can be estimated by computing μ_ν(Δn) for different Δn and doing a
linear regression of log(μ_ν(Δn)) against log(Δn).
Second, α_ν and ν are coupled through H:

E[α_ν] = Hν    (10)

By estimating α_ν for various ν, one can estimate H by using (10) for a
second linear regression.
I have chosen ν in the range ν = 1, 2, ..., 10 and Δn in the range
Δn = 1, 2, 4, 8, ..., N/5, where N is the number of elements in the time series.
This algorithm is in O(n²) and therefore considerably slower than all of the
others, as the measurement results will confirm.
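A minimal C++ sketch of this two-regression scheme, using the stated choices ν = 1..10 and Δn = 1, 2, 4, ..., N/5, might look as follows; the helper and its name are illustrative, not taken from the delivered code.

#include <cmath>
#include <cstddef>
#include <vector>

// least-squares slope through the points (x_k, y_k); used for both fits
static double ls_slope(const std::vector<double>& x, const std::vector<double>& y) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    const std::size_t m = x.size();
    for (std::size_t k = 0; k < m; ++k) {
        sx += x[k]; sy += y[k]; sxx += x[k] * x[k]; sxy += x[k] * y[k];
    }
    return (m * sxy - sx * sy) / (m * sxx - sx * sx);
}

// Standard method sketch: for each nu, fit log mu_nu(dn) against log dn
// to get alpha_nu (eq. 9); then fit alpha_nu against nu to get H (eq. 10).
double standard_hurst(const std::vector<double>& x) {
    const std::size_t N = x.size();
    std::vector<double> nus, alphas;
    for (int nu = 1; nu <= 10; ++nu) {
        std::vector<double> logdn, logmu;
        for (std::size_t dn = 1; dn <= N / 5; dn *= 2) {
            double mu = 0.0;
            for (std::size_t n = 0; n + dn < N; ++n)       // eq. (8)
                mu += std::pow(std::fabs(x[n + dn] - x[n]), double(nu));
            mu /= double(N - dn);
            logdn.push_back(std::log(double(dn)));
            logmu.push_back(std::log(mu));
        }
        nus.push_back(double(nu));
        alphas.push_back(ls_slope(logdn, logmu));          // alpha_nu
    }
    return ls_slope(nus, alphas);                          // slope = H
}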
4.2 The optimal method
The optimal method is described in [4]. The authors prove that this method
achieves the optimal rate of convergence under some assumptions. They
specifically treat noisy input data, for which the algorithm is shown to be
optimal. Note that the optimal method imposes some limitations on the choice of
wavelets: the wavelet needs to have compact support in [0, S] (S some integer)
and two vanishing moments, which rules out Coiflets and the D2 wavelet. While
the wavelet decomposition itself is in O(n), the optimal algorithm is in
O(n log n), because it does not wavelet-decompose the original input data
directly but pre-processes the input data in O(n log n) time before decomposing
it. As the results will show, this pre-processing pays off when the data is
noisy. However, also note that having the optimal rate of convergence does not
mean that the algorithm gives the best results for real-world sized inputs; it
just means that for inputs which are large enough, the algorithm will eventually
dominate all the others in precision.
5 Implementation
The code is written in C++. Besides helper functions to read and print
data, the outline of the code is as follows:
The base classes are HurstEstimator and Wavelet. The different estimators
for the Hurst exponent as well as the different types of wavelets are classes
derived from these two base classes.
The class HurstEstimator basically contains some methods for input and
output. It reads in data, calls an estimator and outputs the data. This is
standard C++ code without any sophisticated parts.
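A hedged sketch of how such a class layout could look; the method names and signatures below are assumptions for illustration and may differ from the delivered code.

#include <vector>

class HurstEstimator {
public:
    virtual ~HurstEstimator() {}
    // every concrete estimator (RS, standard, wavelet, optimal) implements this
    virtual double estimate(const std::vector<double>& series) = 0;
};

class Wavelet {
public:
    virtual ~Wavelet() {}
    // filter coefficients supplied by the derived classes (D4, D16, C6, ...)
    virtual const std::vector<double>& lowPass()  const = 0;
    virtual const std::vector<double>& highPass() const = 0;
};

// A concrete estimator would then combine the two hierarchies, e.g.
// class WaveletEstimator : public HurstEstimator { /* uses a Wavelet */ };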
The class Wavelet contains the basic wavelet methods described in the previous
sections: the standard decomposition as well as the stationary decomposition,
using what is called the "algorithme à trous": instead of downsampling the
data as in the standard wavelet decomposition, the wavelet coefficients are
upsampled by inserting zeros. Using the stationary decomposition, no information
is lost due to downsampling; on the other hand, more time and space is needed,
O(n log n) instead of O(n). (To state this again: from an information-theoretical
point of view, the information is not lost due to downsampling, but it is not
accessible to an algorithm which gets just a single octave as input.)
While the rescaled-range algorithm by Harold Hurst as well as the standard
algorithm can take inputs of arbitrary size, wavelet algorithms expect the input
length to be a power of two. Also, one needs to decide what to do at the edges
of the input: a wavelet filter needs input values for every filter coefficient,
and at the edge of the input these values do not exist. Several approaches exist
to overcome this problem. For this project, I have decided to pad the input with
the last known value. The justification for this choice is that doing so
introduces a constant signal into the original signal. The constant part of the
signal is discarded after the decomposition, so this approach should minimize
the distortion. However, the Matlab Wavelet Toolbox handles edges slightly
differently and produces one element more after downsampling. This leads to a
slightly different decomposition than with the code written for this thesis.
These differences are due to the different handling of the edges and are not
fundamental to the implementation of the algorithm.
Besides the described code, a linear least squares solver has been implemented,
which is needed by all estimators. The mathematics behind it is as follows: all
least squares problems in this thesis are of the kind

a_i m + c = b_i    (11)

where m is the value of interest. These equations are generally overdetermined,
i.e. more than two of them exist. They are solved by a linear least squares fit
by defining

A := [ a_1 1 ; a_2 1 ; ... ; a_n 1 ]    (12)

The equation system can then be written in matrix form:

Ax = r    (13)

where x = [m c]^t contains the values looked for.
This is then solved by multiplying both sides by A^t:

A^t A x = A^t r

The matrix B := A^t A is always positive definite and can therefore be
decomposed using the Cholesky decomposition B = R^t R, where R is an upper
triangular matrix:

Bx = R^t R x = A^t r    (14)

Defining b := Rx, one first forward-solves

R^t b = A^t r

After this, one backward-solves

Rx = b

This yields x. Knowing that the matrix B = A^t A is always positive definite
makes it possible to use the Cholesky decomposition instead of Gaussian
elimination and eliminates the need to handle the numerical stability issues
which are a common problem for Gaussian elimination. See [5], p. 145, for a
detailed description.
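For the 2x2 system used here, the normal equations and the Cholesky solve can be written out explicitly. The following sketch mirrors the steps above; the struct and function names are illustrative.

#include <cmath>
#include <cstddef>
#include <vector>

struct Fit { double m, c; };

// Solve a_i * m + c = b_i in the least-squares sense via B = A^t A = R^t R.
Fit ls_fit(const std::vector<double>& a, const std::vector<double>& b) {
    // accumulate B = [[s11, s12], [s12, s22]] and A^t r = [r1, r2]
    double s11 = 0, s12 = 0, s22 = double(a.size()), r1 = 0, r2 = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        s11 += a[i] * a[i]; s12 += a[i];
        r1  += a[i] * b[i]; r2  += b[i];
    }
    // Cholesky B = R^t R with upper triangular R = [[r11, r12], [0, r22]]
    double r11 = std::sqrt(s11);
    double r12 = s12 / r11;
    double r22 = std::sqrt(s22 - r12 * r12);
    // forward solve R^t y = A^t r (R^t is lower triangular)
    double y1 = r1 / r11;
    double y2 = (r2 - r12 * y1) / r22;
    // back solve R x = y for x = [m, c]
    double c = y2 / r22;
    double m = (y1 - r12 * c) / r11;
    return Fit{m, c};
}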
6 Measurements
The following algorithms have been compared:
1. Rescaled range
2. Standard method
3. Optimal method with wavelets D4, D8, D16 and D64 (described in [4])
4. Wavelet method with wavelets D2, D4, D8, D16, D64, C6 and C12 (DN:
Daubechies wavelet with N coefficients, CN: Coiflet with N coefficients)
5. Wavelet method using the stationary wavelet decomposition with wavelets D2,
D4, D16, D64 and C12
Artificial data has been created with Hurst exponents of H = 0.0, 0.1, ..., 1.0
and with trace lengths of 2^8 to 2^20. Thirty-two test sets for each combination
of H and length have been generated; average and standard deviation have been
computed. The relevant criteria were the accuracy of the estimator as well as
the standard deviation of the results.
The tests were run with the data as is, as well as with various levels of noise
added in a way described in the next section.
7 Results
Tests have been run on simulated data for all combinations of input length
2^6, 2^8, 2^12, 2^16, 2^20, H = 0.0, 0.1, ..., 1.0 and Hurst estimators, on
thirty-two test sets for each combination to compute the standard error,
yielding more than 1500 test results. Only a small subset of all the data is
shown in the following graphics. The error bars displayed on all images mark
the standard deviation, computed over all thirty-two test sets.
7.1 Bias
Some initial testing with "naive" implementations of "Rescaled Range" and
"Standard" showed significant bias. After some discussion, two main reasons
for the bias were identified:
Figure 3: Comparison of three different implementations of the standard
algorithm for H = 0.2 and different input sizes. Green: naive implementation;
blue: taking into account that window sizes should not be larger than N/5;
red: additionally limiting any window size to a maximum of 2^16/5. Red and
blue are identical up to 2^16.
• Taking into account window sizes which are too large compared to the
length of the time series. As a general rule of thumb, one should not use
windows which are larger than N/5, where N is the length of the time series.
• For very large time series (longer than 2^16), one should additionally limit
the upper size of any window to about 2^16/5.
Figure 3 shows a comparison between the naive implementation, the limitation
to N/5 and the second limitation to 2^16/5 for large inputs, for a Hurst
exponent of H = 0.2. From the data, it is clear that these two steps
significantly reduce the bias of these two algorithms.
After optimizing the maximum window size for the two window-based algorithms,
all of the measurements have been run with the optimized versions. Raw data
for the non-optimized versions is delivered as an add-on.
7.2 Minimum input length
The results clearly show that most of the tested algorithms are not suited for
small inputs (e.g. 2^6 data points). The wavelet-based algorithms need to fill
their filter banks in order to produce useful output; when running these
algorithms on short inputs, a large percentage of the data worked on consists
of padding. The rescaled range algorithm is also very unstable on inputs that
are too short and should therefore not be used there. Generally, the following
rules of thumb hold:
Figure 4: Results for synthetic data with H = 0.2
Up to input sizes of 256, only the standard algorithm can be used. Beyond this,
wavelet-based algorithms using wavelets with few coefficients (up to D16) as
well as rescaled range can be used as well. Wavelet-based algorithms using
bigger wavelets such as D32 or D64 should only be used on very large data sets,
as the requirement for padding grows with the number of wavelet coefficients.
Generally, larger input sets also reduce the standard error of the estimate,
which is evident in all graphs shown subsequently in this section. Estimates on
very short input data sets in particular should be treated with care, as the
standard error is significant. All of the graphs in this section contain
information for input sets of size 2^6 only for the standard algorithm; all of
the other algorithms either cannot produce any result because they need some
minimum input size, or have such low accuracy or such a large standard error
that they are completely unusable. Based on the results of these tests I have
set some minimum sizes for the input data length in the code of the delivered
software, so that users do not run unreasonable combinations of input data
length and algorithm. Of course this can easily be changed in the code if
really needed, but this should be done with care.
7.3 Comparison of RS, standard, D4 and D16
The first series of graphs shows a comparison of the Hurst estimators "Rescaled
Range", "Standard", "Wavelet" (using D4) and "Wavelet" (using D16). Results
are shown for synthetic traces with H chosen as 0.2, 0.5 and 0.8.
It is obvious from the graphs that the wavelet-based methods start with a huge
standard error for small data sets compared to the classical algorithms, but
will eventually outperform all other algorithms for long inputs. Judging by
the results, the best way to work with the estimators is to choose the
classical algorithms for small inputs and the wavelet-based algorithms for
large inputs. This becomes even clearer when one takes the optimal algorithm
by Gloter and Hoffmann into account (called "opt" in the image labels, see
Figure 7). Although only a graph for H = 0.2 is shown, this holds for all
other tested values as well.

Figure 5: Results for synthetic data with H = 0.5

Figure 6: Results for synthetic data with H = 0.8

Figure 7: Comparison of the optimal algorithm to others with H = 0.2
7.4 Errors depending on H

Another observation is that for the classical algorithms ("Rescaled Range",
"Standard") the accuracy depends on H. It tends to be better around H ≈ 0.5
and gets worse at the extremes. Both algorithms seem to have a bias towards
0.5, while the wavelet-based algorithms are bias-free. Figure 8 illustrates this.

Figure 8: Plot of true H against estimated H with 2^16 data points as input
7.5 Noise
In contrast to synthetic traces, most real-world data contains noise, as
measurements cannot be carried out with arbitrary precision. The authors of [4]
claim that their algorithm treats noise best. I have run all algorithms on the
same test sets with noise added. Noise has been modeled as a normally
distributed random factor with mean 1 and standard deviation l applied to each
data point:

y_i := x_i R    (15)

where R ~ N(1, l). I have run noise tests for l ∈ {0.05, 0.1, 0.2}.
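A minimal sketch of this noise model in C++; the function name and the fixed default seed (there only to make runs reproducible) are assumptions for illustration.

#include <random>
#include <vector>

// Noise model (15): every sample is multiplied by an independent
// factor drawn from N(1, l).
std::vector<double> add_noise(const std::vector<double>& x, double l,
                              unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> R(1.0, l);   // mean 1, std dev l
    std::vector<double> y;
    y.reserve(x.size());
    for (double xi : x) y.push_back(xi * R(gen));
    return y;
}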
On the one hand, it is clear that multiplying in randomly distributed noise
will influence the Hurst exponent of a time series. However, it is very
interesting to see that already for l = 0.05, most of the algorithms show very
poor performance (see Figure 9). As can be seen from the figures, RS and Std
are already unusable for l = 0.05. The standard wavelet algorithm starts to
give completely useless results at l = 0.1 (see Figure 10). Only the optimal
algorithm can handle this level of noise. However, keep in mind that the
optimal algorithm needs large input sets (size 2^16 or larger, see Figure 7)
to reach a reasonable standard error. If only small input sets are available,
the only thing to do is to reduce noise in the measurements as far as possible.
Otherwise, there is no way to estimate the Hurst exponent from the data, as
there is no suitable algorithm for small data sets which can cope with a
significant amount of noise.
Following these results, I have run more tests with only the optimal algorithm
for l = 0.4 and l = 0.6 to explore its limitations. Figure 11 shows the
results: the optimal algorithm gives useful results even at higher noise
levels. However, the standard errors keep growing, so very large input data
sets, well beyond 2^16, are needed to estimate the Hurst exponent with a
certain confidence.
The conclusion from the noise measurements is that for short input sets (such
as 64 data points), where the standard algorithm is the only one available,
measurement accuracy is crucial, as even small amounts of noise will distort
the results. Algorithms which cope well with noise need very large input sets
to reach a certain precision.
Figure 9: Plot of different algorithms on data sets of size 2^16 with
multiplicative noise normally distributed as N(1, 0.05)

Figure 10: Plot of different algorithms on data sets of size 2^16 with
multiplicative noise normally distributed as N(1, 0.1)
Figure 11: Plot of the optimal algorithm (using D16) on data sets of size 2^16
with multiplicative noise normally distributed with standard deviation l as
given in the legend.
7.6 Runtime
Run time measurements have been made for inputs between 2^8 and 2^20. The
measurements have been made on a machine with an Intel Core Duo Quad CPU and
4 GB RAM. The complete test bench is single-threaded. The given values are
pure CPU time at the maximum resolution available. Figure 12 shows the
run-time behaviour of the different algorithms. It is clearly visible that the
standard algorithm is in O(n²) and the optimal algorithm in O(n log n), while
the others are in O(n).
Another metric is the run time needed to achieve a certain precision. This is
based on the following consideration: the confidence interval for a series of
n measurements of a random variable X is computed as

ci = t_{p,n} σ/√n    (16)

where for any X

P(x̄ − ci ≤ X ≤ x̄ + ci) ≥ p    (17)

holds, with σ the standard deviation of the random variable X, x̄ its average
and t_{p,n} the corresponding value of the t-distribution for a certain
confidence level and a certain n (see [8], chapters 6 and 7).
This means that quadrupling the input will make the confidence interval shrink
by a factor of two. Depending on the inherent precision of the algorithm in
question (i.e. how much standard error it produces), more or less input data is
Figure 12: Runtime comparison of the different algorithms.

Figure 13: Runtime comparison of the different algorithms without the Std
algorithm, for a better overview.
needed to meet a certain target. If the desired confidence interval is
ci_des = 0.05 and the measured confidence interval with n input data points is
c, then the number n_des of input data points needed to reach ci_des is

n_des = n (c / ci_des)²    (18)
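The extrapolation (18) is a one-liner; the following sketch is given only to make the arithmetic explicit.

// Input points needed to shrink a confidence interval measured as c
// (obtained with n points) down to ci_des, per eq. (18).
double points_needed(double n, double c, double ci_des) {
    return n * (c / ci_des) * (c / ci_des);
}

For example, if 1000 points gave a confidence interval of c = 0.1, then reaching ci_des = 0.05 requires 1000 · (0.1/0.05)² = 4000 points.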
Table 1 shows the raw values for the amount of input data needed to reach a
confidence interval of at most 0.05 around the average value. These values are
to be interpreted as follows: for a given algorithm, measuring as many time
series of the run length given in the header that the total number of data
points (number of time series times length of each run) equals the value in the
table will guarantee that in 95% of all cases the resulting average will be
within ±0.05 of the average that the method estimates. However, this is just a
guarantee for the precision (being within a certain range around an average);
it does not guarantee the accuracy of the result (being within a certain range
around the true value). As has been shown in the previous sections, some
algorithms tend to have a bias. Looking at the table and keeping in mind that
some algorithms have super-linear run time, it might be tempting to cut long
data sets into several short ones, as these will run faster and a certain level
of precision is reached within a shorter time. However, the increased precision
is paid for with lower accuracy, as it is clear from the previous subsections
that runs with small data sets produce less accurate results. Depending on the
application, this price might of course be worth paying.
The next step is to take the run time into account. This is done by simply
extrapolating the measured run times to the number of data points calculated
in Table 1. This way, one knows not only how many input data points one needs,
but also how much time it costs to process them using a certain algorithm.
Depending on how long it takes to measure the data compared to processing it
(or how much it costs, or some other metric), different trade-offs might be
chosen. This information is displayed in Table 2.
Generally, one can see that for input sequences up to about 2^12, the standard
algorithm needs the least input to achieve the desired precision. For longer
inputs, wavelet-based algorithms start to be competitive. This becomes even
clearer when taking run time into account: as the wavelet-based algorithms are
in O(n) compared to the standard algorithm's O(n²), they need much less time
to achieve the desired precision. The optimal algorithm completely loses this
test, at least for input sizes up to 2^20. However, its strength is its
resilience against noise.
8 Conclusions
The following conclusions can be drawn from the measurements. There is no free
lunch: except for the rescaled range algorithm developed by Harold Hurst
himself, which is dominated by other algorithms in all tested areas, all
algorithms have their strengths and weaknesses.
Algo / Run length   2^6      2^8       2^12      2^16      2^20
RS                  -        14,710    23,850    39,560    694,430
Std                 1,459    2,095     3,994     40,605    91,119
Wavelet (D4)        -        6,730     8,520     66,200    276,510
Wavelet (D16)       -        31,920    7,910     27,450    128,490
Wavelet (D64)       -        -         9,593     12,542    50,170
Opt (D16)           -        17,600    347,400   897,600   3,054,600

Table 1: Total number of measured data points needed to achieve a confidence
interval of ±0.05, given a certain length of each individual run, shown for
different algorithms using synthetic traces with H = 0.5.
Algo / Run length   2^6      2^8      2^12     2^16     2^20
RS                  -        0.0103   0.0298   0.0698   1.204
Std                 0.0486   0.1039   0.3047   4.7592   10.8804
Wavelet (D4)        -        0.0066   0.0065   0.0284   0.1434
Wavelet (D16)       -        0.0249   0.0060   0.0484   0.2416
Wavelet (D64)       -        -        0.0293   0.0885   0.3677
Opt (D16)           -        0.0192   1.0602   5.8637   62.1483

Table 2: Time needed to achieve a confidence interval of ±0.05 using synthetic
traces with H = 0.5. All values are in seconds.
• The standard algorithm can be used for short inputs (such as 2^6). It is
the only algorithm which delivers reasonable results for small input sets.
However, it has quadratic run time and is therefore significantly slower than
the wavelet-based estimators on long inputs. On the positive side, it needs
the least input data of all algorithms for input sets up to 2^12 to achieve a
certain precision. On the other hand, it is also very prone to noise in the
data and has some bias, in contrast to the wavelet-based algorithms and the
optimal algorithm, which are bias-free.
• The wavelet-based algorithms have a broad range where they perform well.
Their good convergence means that they need less input data than other
algorithms, especially for large time series, and their linear run time makes
them fast. For very long input sets such as 2^20 or larger, wavelets with more
coefficients start to get interesting, as they need less input than smaller
wavelets. The wavelet-based algorithms also have some resilience against
noise. The drawback is that wavelet algorithms can only run on data sets with
a certain minimum length, e.g. 128 or even 256, as they need some input data
to fill their filter banks. They are not suitable for short input sets.
• The optimal algorithm shows weaker performance than the wavelet-based
algorithms in most tests up to the maximum input size tested. On the other
hand, its resilience against noise is superior to all other tested algorithms;
it is the only algorithm which can be used in the tested noise model with
multiplicative noise distributed as N(1, 0.1) or worse. However, the optimal
algorithm is also not suitable for small input sets, as it shares the problems
of the standard wavelet-based algorithms.
In general, different aspects must be taken into account before choosing an
algorithm. For short inputs, only the standard algorithm is suitable; as it is
prone to noise, it must be used with care. For large input
sets, wavelet-based algorithms show better overall performance. For noisy data
sets, depending on the level of noise, the optimal algorithm or the other
wavelet algorithms can be used; in this case, one needs to use larger input
data sets.
9 Higher dimensional cases

9.1 Standard algorithm
The standard method can easily be extended to two- and higher-dimensional
applications. For the two-dimensional case, equation (8) is adjusted as
follows:

μ_ν(Δn) := 1/(N − Δn) Σ_{n=0}^{N−Δn−1} ||x_{n+Δn} − x_n||₂^ν    (19)

Note that the absolute value has been replaced by the L2 norm.
9.2 Rescaled range algorithm
The rescaled range method can also be extended to the higher-dimensional case.
However, equation (2) needs to be reformulated:

R(n) = max_{i,j}(dist(x_i, x_j))    (20)

As this leads to significant computational effort, I have chosen to do the
following instead for every time window starting at index l and of size n:

x̄ = (1/n) Σ_{i=l}^{l+n} x_i    (21)

R(n) = 2 max_x(dist(x, x̄))    (22)

The standard deviation S(n) is computed as

S(n) := (1/(N − 1)) Σ_{i=1}^{N} ||x_i − x̄||₂
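A minimal sketch of the range surrogate (21)-(22) for the 2D case; the point type and function name are illustrative.

#include <algorithm>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Instead of the O(n^2) pairwise maximum distance of eq. (20), use
// twice the largest distance from the window's centroid, eqs. (21)-(22).
double range2d(const std::vector<Pt>& w) {
    Pt c{0.0, 0.0};
    for (const Pt& p : w) { c.x += p.x; c.y += p.y; }
    c.x /= w.size(); c.y /= w.size();                  // centroid x-bar, eq. (21)
    double maxd = 0.0;
    for (const Pt& p : w)
        maxd = std::max(maxd, std::hypot(p.x - c.x, p.y - c.y));
    return 2.0 * maxd;                                 // R(n) surrogate, eq. (22)
}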
9.3 Other algorithms
Using wavelet algorithms for multidimensional cases is not as easy, as
multidimensional wavelets would be needed. I have not followed this path.
9.4 Measurement results

9.4.1 Synthetic data
First, these algorithms have been run against synthetic data sets to verify
their correctness. The synthetic data sets have H ∈ {0.0, 0.5, 1.0} and lengths
l ∈ {2^6, 2^8, 2^12, 2^16, 2^20}. One graph for each value of H is shown. As one can
Figure 14: Results for synthetic data with H = 0.0. The error bars for Std are
so small that they are not visible.
Test case      estimated H   CI lower bound   CI upper bound
All3T6 2       0.112         0.0685           0.155
All3T6 4       0.0723        0.0519           0.0928
All3T6         0.1347        0.115            0.155
AllCtrl        -0.0205       -0.0293          -0.0117
AllLatA        0.373         0.352            0.395
AllMCD         0.0146        0.00679          0.0223
AllMCDLatA     0.0718        0.0401           0.103

Table 3: Measurements for real data.
see, the rescaled range algorithm shows much poorer convergence than in the
one-dimensional case. I suppose this is because of some of the design decisions
made, such as estimating the maximum distance between two points using
R(n) = 2 max_x(dist(x, x̄)), which of course is not exactly the true value.
However, I did not follow this path further, as the rescaled range algorithm
is known to perform poorly from the one-dimensional case and should generally
not be used.
9.4.2 Real data
The tests with real two-dimensional data have only been made with the standard
algorithm in 2D. The data consists of a number of different test cases with
several individual runs, each containing several hundred two-dimensional data
points. Table 3 shows the test case, the estimated Hurst exponent and the 95%
confidence interval bounds.
Figure 15: Results for synthetic data with H = 0.5
Figure 16: Results for synthetic data with H = 1.0
10 Future work

10.1 Multidimensional wavelet estimator
Multidimensional Hurst estimators have only been treated in a very limited way.
More work could be done to adapt wavelet-based estimators for 2D or even
higher-dimensional applications. The implementation made in this thesis is
based on the "standard algorithm" and is therefore very slow for large input
sets.
10.2 Wavelet packet algorithm
As described in [6], the Hurst exponent can also be estimated using the wavelet
packet algorithm. This algorithm is widely used in image compression and gives
more degrees of freedom than the classical wavelet decomposition: instead of
decomposing only the low-pass filtered output of one decomposition step, the
wavelet function is applied to the high-pass filtered output as well.
Afterwards, a best basis is built from all of the decomposed data.
A Other applications

A.1 Stock market
I have applied the Hurst estimators to historical closing prices of the blue
chips of the Swiss stock market. Some measurement results (using a wavelet
estimator with the static decomposition) are shown in the table below. The
data used is from June 2005 to May 2010.
ABB            0.44
Actelion       0.47
Baer           0.43
Credit Suisse  0.31
Holcim         0.40
Lonza          0.35
Nestlé         0.48
Novartis       0.38
Richemond      0.51
Roche          0.42
Swiss Re       0.34
Swisscom       0.42
Swiss Life     0.49
Syngenta       0.49
Synthes        0.42
UBS            0.48
Swatch         0.52
Zurich         0.45
This does not look very promising; if the values were clearly above 0.5 in all
cases, making money with stocks would be a lot easier. More information could
probably be extracted from the data if more than just the closing prices (i.e.
prices at the end of the day) were taken into account, or if a time window had
been chosen which does not contain huge turbulence on the financial markets.
It would
also be interesting to investigate whether the Hurst exponent has dropped after
the introduction of electronic trading, as well as after real-time trading was
made available to the broad public through on-line trading. However,
appropriate historical data is not freely available.
A.2 Picking on seismic signals
As I am working at the Swiss Seismological Service, where we are currently
evaluating new software to locate earthquakes, I have tested the wavelet-based
algorithm described below against other algorithms we are evaluating. Given a
measurement signal (basically a time series) as the input, the problem is to
find the onset time of an arriving P wave (primary wave, the fastest seismic
wave and therefore the first to hit a measurement station) as exactly as
possible. The complicated part is to cope with background noise while still
getting the onset time as exactly as possible. A usual problem for all of the
algorithms is that they might miss the P wave and only declare the onset time
when the larger but slower S wave (secondary wave) arrives, a problem to which
all algorithms using threshold values are prone. Having some picks on the P
and some on the S wave will distort the results. Various algorithms have been
developed to pick earthquakes and to filter mispicks.
I have implemented a wavelet-based picker which basically does the following
(a sketch of the per-octave trigger follows the list):
1. Compute a stationary wavelet transform. This gives several time series,
each containing one octave of the original signal.
2. For each octave, estimate the average and the standard deviation of the
energy (squared decomposed values) over the first n seconds. This is an
estimate of the noise in the respective frequency band.
3. Run a sliding window over the rest of the data and declare a pick if the
energy reaches at least m times the average plus l times the standard
deviation. This might result in no pick if there is too much noise or too
little signal.
4. Doing this on all octaves generally results in a number of picks. If there
are more than k picks, a linear least squares fit is run. As every filter
dilutes the signal in time (because the wavelet is a windowed function), the
least squares fit allows an estimate of what the pick would be on the
undecomposed signal. For every decomposition step, an equation of the
following form is produced:

t_i m + c = i    (23)

where i is the number of decompositions already done (e.g. i = 0 would be the
original signal), t_i is the picking time on the corresponding octave, and m
and c are unknowns. Solving the linear least squares problem and setting i = 0
leads to the desired value.
Good values for the parameters k, l, m and n need to be found by prior
knowledge and testing.
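A hedged C++ sketch of the per-octave trigger (steps 2-3 above); all names, the windowing details and the returned onset convention are illustrative assumptions, not the delivered picker.

#include <cmath>
#include <cstddef>
#include <vector>

// Estimate the noise energy statistics on the first noiseLen samples of
// one octave d, then slide a window of length win and declare a pick when
// the windowed mean energy exceeds m * average + l * stddev.
// Returns the pick index, or -1 if no pick is declared.
long pick_octave(const std::vector<double>& d, std::size_t noiseLen,
                 std::size_t win, double m, double l) {
    double mean = 0.0, var = 0.0;
    for (std::size_t i = 0; i < noiseLen; ++i) mean += d[i] * d[i];
    mean /= noiseLen;                                  // average noise energy
    for (std::size_t i = 0; i < noiseLen; ++i) {
        double e = d[i] * d[i] - mean;
        var += e * e;
    }
    double sd = std::sqrt(var / (noiseLen - 1));       // noise energy stddev
    double threshold = m * mean + l * sd;              // step 3 trigger level
    double energy = 0.0;
    for (std::size_t i = noiseLen; i < d.size(); ++i) {
        energy += d[i] * d[i];
        if (i >= noiseLen + win) {
            energy -= d[i - win] * d[i - win];         // keep window of size win
            if (energy / win > threshold)
                return long(i - win + 1);              // approximate onset
        }
    }
    return -1;   // too much noise or too little signal: no pick
}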
This procedure can be refined by using a wavelet with a high resolution in the
frequency domain (but a poor one in the time domain) to do a first round of
picking. This way, noise is more likely to be contained in just a few frequency
bands, but the picks are poor, as the resolution in the time domain is poor.
After this initial round, a second round is done with a wavelet with a high
resolution in the time domain (but a poor one in the frequency domain). Instead
of picking all over again, only fine adjustments of the original picks are made
within a tiny window. This way, noise is much less of a problem. The wavelets
D64 and D16 have been used for this purpose.
The assessment of the wavelet picker was made against the Baer-Kradolfer picker
from ETH (see [3]) and AR-AIC (see [10]), using a predefined set of various
events which had been reviewed manually beforehand. All of the algorithms have
been run as plugins to SeisComP (see [7]). Testing is still ongoing. Some
results show that the wavelet picker generally makes fewer picks than the other
available pickers and has far fewer false positives. However, it also misses
more picks than the other algorithms. Parameters can be tuned for all available
algorithms, and depending on the testing metric (how much to punish false
positives and how much to punish missed picks) the results can be quite
different.
A.3 Images
To illustrate the principle and the usefulness of a stationary wavelet
transform, two typical seismic signals as well as their static wavelet
decompositions are shown. The first one (BRANT) shows a typical signal of a
remotely located station which is distant from the earthquake. Most of the
noise is low-frequency. Also note that little energy is in the high-frequency
part of the earthquake signal. This shows that the earthquake is at quite some
distance, as the earth acts as a low-pass filter. The second shows a typical
signal of a station located close to houses and close to the earthquake (WILA).
A lot of high-frequency noise shows the presence of civilization (cars,
machines etc.). Also note that the energy is spread over the spectrum; this
shows that the earthquake happened close by. If the distance between earthquake
and station were bigger, less energy would be in the high-frequency part of the
earthquake signal. The colors have been adjusted to use the full possible range
and do not correspond to specific exact values.
The method for graphically displaying the information is taken from [2].
Figure 17: Seismic signal of station BRANT, vertical component; x axis in
1/120 s, y axis velocity in raw counts

Figure 18: Scalogram display of the static wavelet decomposed data for station
BRANT; x axis: 1/12 s, y axis: top highest octave, bottom lowest displayed
octave; red: a lot of energy, blue: little energy
Figure 19: Seismic signal of station WILA, vertical component; x axis in
1/120 s, y axis velocity in raw counts

Figure 20: Scalogram display of the static wavelet decomposed data for station
WILA; x axis: 1/12 s, y axis: top highest octave, bottom lowest displayed
octave; red: a lot of energy, blue: little energy
B Software manual
B.1 Overview
B.2 Compiling and installing
The core part of the software consists of C++ template classes. No libraries
other than the standard C++ library are needed to compile them, so this should
work on every common platform. The web front end is written in Perl and only
uses the libraries which are distributed together with the Perl interpreter; no
additional software installations are required. It makes use of the CGI
interface. The synthetic traces have been created by a Matlab routine provided
by Ivo Sbalzarini. The testing routines which drive the test benches partially
make use of the Perl library Math::Random, which is not part of the core
distribution of Perl but is available as a separate package on common Linux
platforms and on http://www.cpan.org. However, this part is not necessary for
using the software.
Compiling the software should be as simple as changing to the source directory
and typing "make" and "make install", which will copy the files to
/usr/local/bin.
To install the web interface, copy web interface.pl to a web server directory
and make sure that CGI is enabled for that particular web directory. You might
need to adjust the parameters in the first section of the script according to
your needs (maximum file size, location of the Hurst estimator binary). As
executing the web interface script with large volumes of data can impose a
heavy load on the system, you will probably want to limit access to the script
to particular users, e.g. by password protection or any other mechanism
supported by the web server.
B.3 Use

B.3.1 Binaries
After compiling, the following binaries should have been produced:
• hurst: This is the main binary. A help message is displayed when it is run
without parameters.
• test 1d: A binary for testing the library against 1D input data. It is
designed to be driven through testbench 1d.pl.
• test 2d: A binary for testing the library against 2D input data. It is
designed to be driven through testbench 2d.pl.
• static wavelet transform: This binary takes a time series as input and
creates output which can in turn be used as input for the Matlab "image"
command to produce a visual display of a static wavelet transform. The chosen
parameters are optimized to display seismic waveform signals in a format which
is typical at my workplace. For other inputs, the routines will need to be
fine-tuned in order to produce an optimal visual result.
B.3.2 Web service
The web service should be fairly straightforward to use. Files can be uploaded
in plain text format. The following input format is expected: in the case of a
1D algorithm, the input format may be either two values on one line or one
single value per line. In the case of a single value per line, the whole input
file is treated as a single time series, and the output will be the estimated
Hurst exponent for all chosen algorithms.
In the case of two values per line, the first value is assumed to be an index
and the second one the corresponding value. If an index is lower than its
predecessor, the program assumes that a new data set has begun. In the case of
multiple data sets, the output will be the average value, the standard
deviation, and the lower and upper 95% confidence interval.
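A hypothetical example of the two-values-per-line format containing two data sets (the index resets on the fourth line, so a new series begins there); the values themselves are made up for illustration:

1 0.42
2 0.57
3 0.49
1 1.02
2 0.98
3 1.10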
If a 2D algorithm has been chosen, two values per line are interpreted as a
single input set; with three values per line, the first value is treated as an
index. In all other respects, the same information is displayed as in the 1D
case. 1D and 2D algorithms cannot be mixed on the same data set (this would
make no sense anyway).
Blanks, tabulators, commas and semicolons, or any combination of them, are
accepted as separators. If the input is invalid or too short for a certain
Hurst estimator, the result will be -1.
B.4 Delivered files
As a general remark, the standard method is called "MSS" in all of the
delivered files as well as in the source code.
• measurement output: The measurement output resides here. The files
result 1d*.txt contain the one-dimensional measurements, result 2d*.txt the
two-dimensional measurements. For the one-dimensional case, noise measurements
have been made as described in section 7.5. In this case, the value at the end
of the file name describes the level of noise introduced into the data, e.g.
result 1d 0.2.txt is the measurement output for multiplicative noise
distributed as N(1, 0.2). The output files are formatted as follows: for every
measured combination of simulated H and input length, a section starts with
H l. Each subsequent line of the section is formatted like this: name of the
estimator, average, standard deviation, stddev/avg, lower bound of the 95%
confidence interval, upper bound of the 95% confidence interval, running time.
Example: RS: 0.12461 0.01705 0.13682 0.11871 0.13052 2.18125
This describes a measurement with the rescaled range algorithm with average
0.12461, standard deviation 0.01705, stddev/avg = 0.13682, lower bound of the
confidence interval 0.11871, upper bound 0.13052 and run time 2.18125 s.
• src: The source files as well as the Perl scripts described earlier in this
section reside here. The directories data 1d and data 2d contain the synthetic
traces used to test the algorithms as gzipped files. The content of the files
is coded like this: 0.2 10 27.data.gz means the file contains a synthetic
trace with H = 0.2 containing 2^10 data points; 27 is the run
number. data real contains real-world data from biological applications in 2D.
data seismic contains two sample files for the static wavelet transform. The
directories html and latex are generated by doxygen.
• report: Contains this thesis as well as its source files.
References
[1] Wikipedia article on rescaled range.
[2] W. Bäni. Wavelets, eine Einführung für Ingenieure (2. Auflage).
Oldenbourg, München, 2005.
[3] M. Bär and U. Kradolfer. An automatic phase picker for local and
teleseismic events. Bulletin of the Seismological Society of America,
77(4):1437–1445, 1987.
[4] Arnaud Gloter and Marc Hoffmann. Estimation of the Hurst parameter from
discrete noisy data. Annals of Statistics, 35(5):1947–1974, 2007.
[5] Gene H. Golub and Charles F. Van Loan. Matrix Computations (3rd ed.).
Johns Hopkins University Press, Baltimore, MD, USA, 1996.
[6] C. L. Jones, G. T. Lonergan, and D. E. Mainwaring. Wavelet packet
computation of the Hurst exponent. Journal of Physics A: Mathematical and
General, 29, 1996.
[7] GeoForschungsZentrum Potsdam. SeisComP3, http://www.seiscomp3.org.
[8] John A. Rice. Mathematical Statistics and Data Analysis, Third Edition.
Thomson Brooks/Cole, 2006.
[9] Ivo F. Sbalzarini. Moments of displacement and their spectrum.
[10] R. Sleeman and T. van Eck. Robust automatic P-phase picking: an on-line
implementation in the analysis of broadband seismogram recordings. Physics of
the Earth and Planetary Interiors, 113:265–275, 1999.